feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions
--- a/docs/DEPLOYMENT.md
+++ b/docs/DEPLOYMENT.md
@@ -0,0 +1,889 @@
+# CXInsights - Deployment Guide
+
+## Deployment Model
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         DEPLOYMENT MODEL                                    │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  CXInsights is designed to run as LONG-RUNNING BATCH JOBS on a dedicated    │
+│  server (physical or VM), NOT as an elastic microservice.                   │
+│                                                                             │
+│  ✅ Primary model: dedicated server, run via tmux/systemd                   │
+│  ⚠️ Secondary model: Cloud VM (same architecture, different hosting)        │
+│  📦 Optional: Docker (for portability, not for orchestration)               │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Prerequisites
+
+### Required software
+
+| Software | Version | Purpose |
+|----------|---------|---------|
+| Python | 3.11+ | Runtime |
+| Git | 2.40+ | Version control |
+| ffmpeg | 6.0+ | Audio validation (optional) |
+| tmux | 3.0+ | Persistent sessions for batch jobs |
+
+### Accounts and API keys
+
+| Service | URL | Required for |
+|---------|-----|--------------|
+| AssemblyAI | https://assemblyai.com | STT transcription |
+| OpenAI | https://platform.openai.com | LLM analysis |
+| Anthropic | https://console.anthropic.com | Backup LLM (optional) |
+
+---
+
+## Capacity Planning (Static Sizing)
+
+### Hardware requirements
+
+Sizing is **static** and provisioned for the maximum expected volume. There is no auto-scaling.
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    CAPACITY PLANNING                                        │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  VOLUME: 5,000 calls / batch                                                │
+│  ├─ CPU: 4 cores (transcription is I/O bound, not CPU bound)                │
+│  ├─ RAM: 8 GB                                                               │
+│  ├─ Disk: 50 GB SSD (audio + transcripts + outputs)                         │
+│  └─ Network: 100 Mbps (audio upload to the STT API)                         │
+│                                                                             │
+│  VOLUME: 20,000 calls / batch                                               │
+│  ├─ CPU: 4-8 cores                                                          │
+│  ├─ RAM: 16 GB                                                              │
+│  ├─ Disk: 200 GB SSD                                                        │
+│  └─ Network: 100+ Mbps                                                      │
+│                                                                             │
+│  NOTE: The bottleneck is the rate limit of the external APIs, not local     │
+│        hardware. Adding cores does not speed up the pipeline.               │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Disk space estimate
+
+```
+Per 1,000 calls (AHT = 7 min):
+├─ Original audio:         ~2-4 GB (depends on bitrate)
+├─ Raw transcripts:        ~100 MB
+├─ Compressed transcripts: ~40 MB
+├─ Features:               ~20 MB
+├─ Labels (processed):     ~50 MB
+├─ Final outputs:          ~10 MB
+└─ TOTAL:                  ~2.2-4.2 GB per 1,000 calls
+
+Recommendation:
+├─ 5K calls:  50 GB available
+└─ 20K calls: 200 GB available
+```
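+
+The per-1,000-call figures above can be turned into a quick sizing check. The following is a minimal sketch that assumes disk usage scales linearly with the number of calls; `estimate_disk_gb` and `PER_1000_CALLS_GB` are hypothetical names used only for illustration and are not part of CXInsights.
+
+```python
+# Hypothetical helper: scales the per-1,000-call figures from this guide linearly.
+# Each entry is a (low, high) range in GB per 1,000 calls.
+PER_1000_CALLS_GB = {
+    "audio": (2.0, 4.0),  # depends on bitrate
+    "transcripts_raw": (0.10, 0.10),
+    "transcripts_compressed": (0.04, 0.04),
+    "features": (0.02, 0.02),
+    "labels": (0.05, 0.05),
+    "outputs": (0.01, 0.01),
+}
+
+
+def estimate_disk_gb(num_calls: int) -> tuple[float, float]:
+    """Return the (low, high) disk estimate in GB for a batch of num_calls."""
+    scale = num_calls / 1000
+    low = sum(lo for lo, _ in PER_1000_CALLS_GB.values()) * scale
+    high = sum(hi for _, hi in PER_1000_CALLS_GB.values()) * scale
+    return round(low, 1), round(high, 1)
+
+
+if __name__ == "__main__":
+    for calls in (5_000, 20_000):
+        low, high = estimate_disk_gb(calls)
+        print(f"{calls:>6} calls: {low}-{high} GB")
+```
+
+For both batch sizes the estimate stays well below the recommended 50 GB and 200 GB, so the recommendations include ample headroom.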
+
+---
+
+## Standard Deployment (Dedicated Server)
+
+### 1. Prepare the server
+
+```bash
+# Ubuntu 22.04 LTS (or similar)
+sudo apt update
+sudo apt install -y python3.11 python3.11-venv git ffmpeg tmux
+```
+
+### 2. Clone the repository
+
+```bash
+# Recommended location: /opt/cxinsights or ~/cxinsights
+cd /opt
+git clone https://github.com/tu-org/cxinsights.git
+cd cxinsights
+```
+
+### 3. Create the virtual environment
+
+```bash
+python3.11 -m venv .venv
+source .venv/bin/activate
+```
+
+### 4. Install dependencies
+
+```bash
+# Base installation
+pip install -e .
+
+# With PII detection (recommended)
+pip install -e ".[pii]"
+
+# With development tools
+pip install -e ".[dev]"
+```
+
+### 5. Configure environment variables
+
+```bash
+cp .env.example .env
+nano .env
+```
+
+Contents of `.env`:
+
+```bash
+# === API KEYS ===
+ASSEMBLYAI_API_KEY=your_assemblyai_key_here
+OPENAI_API_KEY=sk-your_openai_key_here
+ANTHROPIC_API_KEY=sk-ant-your_anthropic_key_here  # Optional
+
+# === THROTTLING (tune manually according to your tier and testing) ===
+# These are INTERNAL LIMITS, not guarantees made by the APIs
+MAX_CONCURRENT_TRANSCRIPTIONS=30    # AssemblyAI: start conservatively
+LLM_REQUESTS_PER_MINUTE=200         # OpenAI: depends on your tier
+LLM_BACKOFF_BASE=2.0                # Base seconds for retry backoff
+LLM_BACKOFF_MAX=60.0                # Maximum backoff in seconds
+LLM_MAX_RETRIES=5
+
+# === LOGGING ===
+LOG_LEVEL=INFO
+LOG_DIR=./data/logs
+
+# === PATHS ===
+DATA_DIR=./data
+CONFIG_DIR=./config
+```
+
+### 6. Create the persistent data structure
+
+```bash
+# Initialization script (run only once)
+./scripts/init_data_structure.sh
+```
+
+Or manually:
+
+```bash
+mkdir -p data/{raw/audio,raw/metadata}
+mkdir -p data/{transcripts/raw,transcripts/compressed}
+mkdir -p data/features
+mkdir -p data/processed
+mkdir -p data/outputs
+mkdir -p data/logs
+mkdir -p data/.checkpoints
+```
+
+### 7. Verify the installation
+
+```bash
+python -m cxinsights.pipeline.cli --help
+```
+
+---
+
+## Throttling Configuration
+
+### Key concept
+
+The `MAX_CONCURRENT_*` and `*_REQUESTS_PER_MINUTE` parameters are **internal throttles** that you tune manually based on:
+1. Your tier on each API (OpenAI, AssemblyAI)
+2. Real-world behavior tests
+3. Observed 429 errors
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    THROTTLING CONFIGURATION                                 │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  ASSEMBLYAI:                                                                │
+│  ├─ Default: 100 concurrent transcriptions (per the docs)                   │
+│  ├─ Initial recommendation: 30 (conservative)                               │
+│  └─ Adjust based on observed errors                                         │
+│                                                                             │
+│  OPENAI:                                                                    │
+│  ├─ Tier 1: 500 RPM → set the internal limit to 200 RPM                     │
+│  ├─ Tier 2: 5000 RPM → set the internal limit to 2000 RPM                   │
+│  ├─ Tier 3+: 5000+ RPM → set according to need                              │
+│  └─ ALWAYS leave headroom (set 40-50% of the real limit)                    │
+│                                                                             │
+│  If you see 429 errors:                                                     │
+│  1. Lower *_REQUESTS_PER_MINUTE                                             │
+│  2. Exponential backoff will absorb the spikes                              │
+│  3. Log and adjust for the next batch                                       │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
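+
+The headroom rule in the box above (configure the internal limit at 40-50% of the tier's real limit) is simple arithmetic. A minimal sketch, assuming the margin is expressed as an integer percentage; `internal_rpm` and `min_interval_seconds` are hypothetical helper names, not CXInsights functions:
+
+```python
+def internal_rpm(tier_limit_rpm: int, margin_percent: int = 40) -> int:
+    """Internal requests-per-minute throttle as a percentage of the tier limit."""
+    if not 0 < margin_percent <= 100:
+        raise ValueError("margin_percent must be between 1 and 100")
+    # Integer arithmetic avoids floating-point surprises.
+    return tier_limit_rpm * margin_percent // 100
+
+
+def min_interval_seconds(rpm: int) -> float:
+    """Minimum spacing between requests that keeps the rate at or below rpm."""
+    return 60.0 / rpm
+
+
+if __name__ == "__main__":
+    for limit in (500, 5000):
+        rpm = internal_rpm(limit)
+        print(f"tier limit {limit} RPM -> LLM_REQUESTS_PER_MINUTE={rpm}, "
+              f"one request every {min_interval_seconds(rpm):.2f}s")
+```
+
+With the default 40% margin, this reproduces the 200 RPM and 2000 RPM values shown for the 500 RPM and 5000 RPM limits.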
+
+---
+
+## Running Batch Jobs
+
+### Execution model: long-running batch jobs
+
+CXInsights runs **long-lived processes** (6-24+ hours). Use tmux or systemd so the job keeps running after you disconnect.
+
+### Option A: tmux (recommended for manual operation)
+
+```bash
+# Create a tmux session
+tmux new-session -s cxinsights
+
+# Inside tmux, run the pipeline
+source .venv/bin/activate
+python -m cxinsights.pipeline.cli run \
+    --input ./data/raw/audio/batch_2024_01 \
+    --batch-id batch_2024_01
+
+# Detach from tmux: Ctrl+B, then D
+# Re-attach: tmux attach -t cxinsights
+
+# Watch logs in another tmux window
+# Ctrl+B, then C (new window)
+tail -f data/logs/pipeline_*.log
+```
+
+### Option B: systemd (recommended for scheduled execution)
+
+```ini
+# /etc/systemd/system/cxinsights-batch.service
+[Unit]
+Description=CXInsights Batch Processing
+After=network.target
+
+[Service]
+Type=simple
+User=cxinsights
+WorkingDirectory=/opt/cxinsights
+Environment="PATH=/opt/cxinsights/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/bin"
+ExecStart=/opt/cxinsights/.venv/bin/python -m cxinsights.pipeline.cli run \
+    --input /opt/cxinsights/data/raw/audio/current_batch \
+    --batch-id current_batch
+Restart=no
+StandardOutput=append:/opt/cxinsights/data/logs/systemd.log
+StandardError=append:/opt/cxinsights/data/logs/systemd.log
+
+[Install]
+WantedBy=multi-user.target
+```
+
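+```bash
+# Reload unit files and start the job
+sudo systemctl daemon-reload
+sudo systemctl start cxinsights-batch
+
+# Check status
+sudo systemctl status cxinsights-batch
+# journalctl shows unit events; the job's own output is appended to
+# /opt/cxinsights/data/logs/systemd.log (see StandardOutput in the unit file)
+journalctl -u cxinsights-batch -f
+```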
+
+### Basic command
+
+```bash
+python -m cxinsights.pipeline.cli run \
+    --input ./data/raw/audio/batch_2024_01 \
+    --batch-id batch_2024_01
+```
+
+### Available options
+
+```bash
+python -m cxinsights.pipeline.cli run --help
+
+# Options:
+#   --input PATH          Folder containing audio files [required]
+#   --output PATH         Output folder [default: ./data]
+#   --batch-id TEXT       Batch identifier [required]
+#   --config PATH         Configuration file [default: ./config/settings.yaml]
+#   --stages TEXT         Stages to run (comma-separated) [default: all]
+#   --skip-transcription  Skip transcription (use existing transcripts)
+#   --skip-inference      Skip inference (use existing results)
+#   --dry-run             Show what would be done without executing
+#   --verbose             Detailed logging
+```
+
+### Running by stage (useful for debugging)
+
+```bash
+# Transcription only
+python -m cxinsights.pipeline.cli run \
+    --input ./data/raw/audio/batch_01 \
+    --batch-id batch_01 \
+    --stages transcription
+
+# Features only (requires transcripts)
+python -m cxinsights.pipeline.cli run \
+    --batch-id batch_01 \
+    --stages features
+
+# Inference only (requires transcripts + features)
+python -m cxinsights.pipeline.cli run \
+    --batch-id batch_01 \
+    --stages inference
+
+# Aggregation and reports (requires labels)
+python -m cxinsights.pipeline.cli run \
+    --batch-id batch_01 \
+    --stages aggregation,visualization
+```
+
+### Resume from checkpoint
+
+```bash
+# If the pipeline failed or was interrupted
+python -m cxinsights.pipeline.cli resume --batch-id batch_01
+
+# The system automatically detects:
+# - Completed transcriptions
+# - Extracted features
+# - Labels already generated
+# and continues from where it left off
+```
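+
+The idea behind resuming can be illustrated as a set difference between the calls in a batch and the outputs already on disk. This is only a sketch: the one-JSON-file-per-call layout, the `STAGE_DIRS` mapping, and `pending_calls` are assumptions made for illustration, not the actual checkpoint format used by CXInsights.
+
+```python
+from pathlib import Path
+
+# Hypothetical mapping of stage -> output folder, loosely based on the
+# directories created in step 6. Assumes one <call_id>.json file per call.
+STAGE_DIRS = {
+    "transcription": "transcripts/raw",
+    "features": "features",
+    "inference": "processed",
+}
+
+
+def pending_calls(data_dir: Path, call_ids: list[str]) -> dict[str, list[str]]:
+    """For each stage, list the call IDs that have no output file yet."""
+    pending: dict[str, list[str]] = {}
+    for stage, subdir in STAGE_DIRS.items():
+        done = {path.stem for path in (data_dir / subdir).glob("*.json")}
+        pending[stage] = [cid for cid in call_ids if cid not in done]
+    return pending
+
+
+if __name__ == "__main__":
+    for stage, calls in pending_calls(Path("data"), ["call_1", "call_2"]).items():
+        print(f"{stage}: {len(calls)} call(s) pending")
+```
+
+A resume step built this way only reprocesses the calls returned for each stage, which is what makes an interrupted batch cheap to continue.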
+
+### Cost estimate before running
+
+```bash
+python -m cxinsights.pipeline.cli estimate --input ./data/raw/audio/batch_01
+
+# Output:
+# ┌─────────────────────────────────────────────────┐
+# │           COST ESTIMATION (AHT=7min)            │
+# ├─────────────────────────────────────────────────┤
+# │ Files found:           5,234                    │
+# │ Total duration:        ~611 hours               │
+# │ Avg duration/call:     7.0 min                  │
+# ├─────────────────────────────────────────────────┤
+# │ Transcription (STT):   $540 - $600              │
+# │ Inference (LLM):       $2.50 - $3.50            │
+# │ TOTAL ESTIMATED:       $543 - $604              │
+# └─────────────────────────────────────────────────┘
+# Proceed? [y/N]:
+```
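+
+The arithmetic behind an estimate like the one above is straightforward: total audio hours times an STT rate, plus a per-call LLM cost. The sketch below is not the implementation of the `estimate` command; `estimate_cost` is a hypothetical function, and the rates passed to it are placeholders chosen for illustration, not published prices.
+
+```python
+def estimate_cost(
+    num_files: int,
+    avg_minutes: float,
+    stt_rate_per_hour: float,
+    llm_cost_per_call: float,
+) -> dict[str, float]:
+    """Estimate batch cost from call count, average duration, and unit rates."""
+    hours = num_files * avg_minutes / 60
+    stt = hours * stt_rate_per_hour
+    llm = num_files * llm_cost_per_call
+    return {"hours": hours, "stt": stt, "llm": llm, "total": stt + llm}
+
+
+if __name__ == "__main__":
+    # Placeholder rates (assumptions): replace with the rates of your own plan.
+    est = estimate_cost(5234, 7.0, stt_rate_per_hour=0.90, llm_cost_per_call=0.0005)
+    print(f"Total duration: ~{est['hours']:.0f} hours")
+    print(f"STT: ${est['stt']:.2f}  LLM: ${est['llm']:.2f}  Total: ${est['total']:.2f}")
+```
+
+For 5,234 files at 7 minutes each, the duration works out to roughly 611 hours, matching the sample output. Transcription dominates the total because it is billed on audio duration.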
+
+---
+
+## Log and Retention Policy
+
+### Log structure
+
+```
+data/logs/
+├── pipeline_2024_01_15_103000.log    # Main batch log
+├── pipeline_2024_01_15_103000.err    # Errors, kept separately
+├── transcription_2024_01_15.log      # STT detail
+├── inference_2024_01_15.log          # LLM detail
+└── systemd.log                       # Only if you use systemd
+```
+
+### Retention policy
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    RETENTION POLICY                                         │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  LOGS:                                                                      │
+│  ├─ Pipeline logs: 30 days                                                  │
+│  ├─ Error logs: 90 days                                                     │
+│  └─ Rotation: daily, gzip compression after 7 days                          │
+│                                                                             │
+│  DATA:                                                                      │
+│  ├─ Raw audio: delete after successful processing (or keep 30 days)         │
+│  ├─ Raw transcripts: delete after 30 days                                   │
+│  ├─ Compressed transcripts: delete after LLM processing                     │
+│  ├─ Features: keep while labels exist                                       │
+│  ├─ Labels (processed): keep indefinitely (no PII)                          │
+│  ├─ Outputs (stats, RCA): keep indefinitely                                 │
+│  └─ Checkpoints: delete after the batch completes                           │
+│                                                                             │
+│  IMPORTANT: Logs NEVER contain full transcripts.                            │
+│             Only: call_id, timestamps, errors, metrics                      │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### logrotate configuration (Linux)
+
+```bash
+# /etc/logrotate.d/cxinsights
+/opt/cxinsights/data/logs/*.log {
+    daily
+    rotate 30
+    compress
+    delaycompress
+    missingok
+    notifempty
+    create 644 cxinsights cxinsights
+}
+```
+
+### Manual cleanup script
+
+```bash
+#!/bin/bash
+# scripts/cleanup_old_data.sh
+# Run periodically (weekly cron)
+
+DATA_DIR="/opt/cxinsights/data"
+RETENTION_DAYS=30
+
+echo "Cleaning data older than $RETENTION_DAYS days..."
+
+# Old logs
+find "$DATA_DIR/logs" -name "*.log" -mtime +"$RETENTION_DAYS" -delete
+find "$DATA_DIR/logs" -name "*.gz" -mtime +90 -delete
+
+# Old raw transcripts
+find "$DATA_DIR/transcripts/raw" -name "*.json" -mtime +"$RETENTION_DAYS" -delete
+
+# Checkpoints of completed batches (manual review recommended)
+echo "Review and delete completed checkpoints manually:"
+ls -la "$DATA_DIR/.checkpoints/"
+
+echo "Cleanup complete."
+```
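+
+Before letting a cleanup job delete anything, it can help to list what would be removed. The following dry-run sketch uses only the Python standard library; `files_older_than` is a hypothetical helper, and its cutoff is a plain "modified more than N days ago" check, which only approximates the day-rounding behavior of `find -mtime`.
+
+```python
+import time
+from pathlib import Path
+
+
+def files_older_than(
+    directory: Path, pattern: str, days: int, now: float | None = None
+) -> list[Path]:
+    """Return files matching pattern whose mtime is more than `days` days old."""
+    reference = time.time() if now is None else now
+    cutoff = reference - days * 86400  # seconds per day
+    return sorted(
+        path
+        for path in directory.glob(pattern)
+        if path.is_file() and path.stat().st_mtime < cutoff
+    )
+
+
+if __name__ == "__main__":
+    logs_dir = Path("data/logs")
+    if logs_dir.is_dir():
+        # Dry run: print candidates, delete nothing.
+        for path in files_older_than(logs_dir, "*.log", days=30):
+            print(f"would delete: {path}")
+```
+
+Nothing is deleted here; the output is meant to be reviewed before running the shell script above.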
+
+---
+
+## Dashboard (Visualization)
+
+```bash
+# Launch the dashboard
+streamlit run src/visualization/dashboard.py -- --batch-id batch_2024_01
+
+# Access at: http://localhost:8501
+# Or, on a remote server: http://servidor:8501
+```
+
+### With authentication (nginx proxy)
+
+See the "Streamlit - Deploy" section of TECH_STACK.md for configuring nginx with basic auth.
+
+---
+
+## Output Structure
+
+After running the pipeline:
+
+```
+data/outputs/batch_2024_01/
+├── aggregated_stats.json           # Consolidated statistics
+├── call_matrix.csv                 # All calls with labels
+├── rca_lost_sales.json             # RCA tree for lost sales
+├── rca_poor_cx.json                # RCA tree for poor CX
+├── emergent_drivers_review.json    # OTHER_EMERGENT drivers for review
+├── validation_report.json          # Quality gate result
+├── executive_summary.pdf           # Executive report
+├── full_analysis.xlsx              # Excel with drill-down
+└── figures/
+    ├── rca_tree_lost_sales.png
+    ├── rca_tree_poor_cx.png
+    └── ...
+```
+
+---
+
+## Deployment Script (deploy.sh)
+
+Script for the initial setup of the persistent environment.
+
+```bash
+#!/bin/bash
+# deploy.sh - Initial setup of the persistent environment
+# Run ONCE when installing on a new server
+
+set -e
+
+INSTALL_DIR="${INSTALL_DIR:-/opt/cxinsights}"
+PYTHON_VERSION="python3.11"
+
+echo "======================================"
+echo "CXInsights - Initial Deployment"
+echo "======================================"
+echo "Install directory: $INSTALL_DIR"
+echo ""
+
+# 1. Verify Python
+if ! command -v $PYTHON_VERSION &> /dev/null; then
+    echo "ERROR: $PYTHON_VERSION not found"
+    echo "Install with: sudo apt install python3.11 python3.11-venv"
+    exit 1
+fi
+echo "✓ Python: $($PYTHON_VERSION --version)"
+
+# 2. Verify that we are in the correct directory
+if [ ! -f "pyproject.toml" ]; then
+    echo "ERROR: pyproject.toml not found. Run from repository root."
+    exit 1
+fi
+echo "✓ Repository structure verified"
+
+# 3. Create the virtual environment (if it does not exist)
+if [ ! -d ".venv" ]; then
+    echo "Creating virtual environment..."
+    $PYTHON_VERSION -m venv .venv
+fi
+source .venv/bin/activate
+echo "✓ Virtual environment: .venv"
+
+# 4. Install dependencies
+echo "Installing dependencies..."
+pip install -q --upgrade pip
+pip install -q -e .
+echo "✓ Dependencies installed"
+
+# 5. Configure .env (if it does not exist)
+if [ ! -f ".env" ]; then
+    if [ -f ".env.example" ]; then
+        cp .env.example .env
+        echo "⚠ Created .env from template - CONFIGURE API KEYS"
+    else
+        echo "ERROR: .env.example not found"
+        exit 1
+    fi
+else
+    echo "✓ .env exists"
+fi
+
+# 6. Create the persistent data structure (idempotent)
+echo "Creating data directory structure..."
+mkdir -p data/raw/audio
+mkdir -p data/raw/metadata
+mkdir -p data/transcripts/raw
+mkdir -p data/transcripts/compressed
+mkdir -p data/features
+mkdir -p data/processed
+mkdir -p data/outputs
+mkdir -p data/logs
+mkdir -p data/.checkpoints
+
+# Create .gitkeep files to preserve the structure in git
+touch data/raw/audio/.gitkeep
+touch data/raw/metadata/.gitkeep
+touch data/transcripts/raw/.gitkeep
+touch data/transcripts/compressed/.gitkeep
+touch data/features/.gitkeep
+touch data/processed/.gitkeep
+touch data/outputs/.gitkeep
+touch data/logs/.gitkeep
+
+echo "✓ Data directories created"
+
+# 7. Check API keys in .env
+source .env
+if [ -z "$ASSEMBLYAI_API_KEY" ] || [ "$ASSEMBLYAI_API_KEY" = "your_assemblyai_key_here" ]; then
+    echo ""
+    echo "⚠ WARNING: ASSEMBLYAI_API_KEY not configured in .env"
+fi
+if [ -z "$OPENAI_API_KEY" ] || [ "$OPENAI_API_KEY" = "sk-your_openai_key_here" ]; then
+    echo "⚠ WARNING: OPENAI_API_KEY not configured in .env"
+fi
+
+# 8. Verify the installation
+echo ""
+echo "Verifying installation..."
+# Test the command directly: under `set -e`, a separate `$?` check
+# would never be reached when the command fails.
+if python -m cxinsights.pipeline.cli --help > /dev/null 2>&1; then
+    echo "✓ CLI verification passed"
+else
+    echo "ERROR: CLI verification failed"
+    exit 1
+fi
+
+echo ""
+echo "======================================"
+echo "Deployment complete!"
+echo "======================================"
+echo ""
+echo "Next steps:"
+echo "  1. Configure API keys in .env"
+echo "  2. Copy audio files to data/raw/audio/your_batch/"
+echo "  3. Start tmux session: tmux new -s cxinsights"
+echo "  4. Activate venv: source .venv/bin/activate"
+echo "  5. Run pipeline:"
+echo "     python -m cxinsights.pipeline.cli run \\"
+echo "         --input ./data/raw/audio/your_batch \\"
+echo "         --batch-id your_batch"
+echo ""
+```
+
+```bash
+# Usage:
+chmod +x deploy.sh
+./deploy.sh
+```
+
+---
+
+## Docker (Optional)
+
+Docker is an option for **portability**, not the primary deployment path.
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    DOCKER - DISCLAIMER                                      │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  Docker is OPTIONAL and is provided for:                                    │
+│  ├─ Environments where Python cannot be installed directly                  │
+│  ├─ Exact reproducibility of the environment                                │
+│  └─ Integration with existing CI/CD systems                                 │
+│                                                                             │
+│  Docker is NOT needed for:                                                  │
+│  ├─ Normal execution on a dedicated server                                  │
+│  ├─ Getting better performance                                              │
+│  └─ Scaling horizontally (does not apply to this workload)                  │
+│                                                                             │
+│  The standard deployment (venv + tmux/systemd) is preferred.                │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Dockerfile
+
+```dockerfile
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# System dependencies
+RUN apt-get update && \
+    apt-get install -y ffmpeg && \
+    rm -rf /var/lib/apt/lists/*
+
+# Copy the code
+COPY pyproject.toml .
+COPY src/ src/
+COPY config/ config/
+
+# Install Python dependencies
+RUN pip install --no-cache-dir -e .
+
+# Volume for persistent data
+VOLUME ["/app/data"]
+
+ENTRYPOINT ["python", "-m", "cxinsights.pipeline.cli"]
+```
+
+### Usage
+
+```bash
+# Build
+docker build -t cxinsights:latest .
+
+# Run (mount the data volume)
+docker run -it \
+    -v /path/to/data:/app/data \
+    --env-file .env \
+    cxinsights:latest run \
+    --input /app/data/raw/audio/batch_01 \
+    --batch-id batch_01
+```
+
+---
+
+## Cloud VM (Secondary Option)
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    CLOUD VM - DISCLAIMER                                    │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  Use a Cloud VM (AWS EC2, GCP Compute, Azure VM) when:                      │
+│  ├─ You have no physical server available                                   │
+│  ├─ You need remote access from multiple locations                          │
+│  └─ You want to delegate hardware maintenance                               │
+│                                                                             │
+│  The architecture is IDENTICAL to the dedicated server:                     │
+│  ├─ Same static sizing (no auto-scaling)                                    │
+│  ├─ Same execution model (long-running batch)                               │
+│  ├─ Same manual throttling configuration                                    │
+│  └─ Only the server's location changes                                      │
+│                                                                             │
+│  ADDITIONAL COST: $30-100/month for the VM (depending on specs)             │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Setup on a Cloud VM
+
+```bash
+# 1. Create the VM (AWS example)
+# - Ubuntu 22.04 LTS
+# - t3.xlarge (4 vCPU, 16 GB RAM) for 20K calls
+# - 200 GB gp3 SSD
+# - Security group: SSH (22), optional HTTP (8501 for the dashboard)
+
+# 2. Connect
+ssh -i key.pem ubuntu@vm-ip
+
+# 3. Follow the "Standard Deployment" steps above
+# (identical to the dedicated server)
+```
+
+---
+
+## Troubleshooting
+
+### Error: Invalid API key
+
+```
+Error: AssemblyAI authentication failed
+```
+
+**Solution**: Check `ASSEMBLYAI_API_KEY` in `.env`
+
+### Error: Rate limit exceeded (429)
+
+```
+Error: OpenAI rate limit exceeded
+```
+
+**Solution**:
+1. Lower `LLM_REQUESTS_PER_MINUTE` in `.env`
+2. Automatic backoff will absorb temporary spikes
+3. Check your tier in the OpenAI dashboard
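+
+To see what the backoff settings from `.env` imply, the retry delays can be computed directly. This sketch assumes the common schedule `delay = min(base * 2**attempt, max)`; the exact formula used by the pipeline is not documented in this guide, and `backoff_delays` is a hypothetical name.
+
+```python
+def backoff_delays(
+    base: float = 2.0,       # LLM_BACKOFF_BASE
+    max_delay: float = 60.0,  # LLM_BACKOFF_MAX
+    max_retries: int = 5,     # LLM_MAX_RETRIES
+) -> list[float]:
+    """Delay in seconds before each retry: doubles every attempt, capped at max_delay."""
+    return [min(base * 2 ** attempt, max_delay) for attempt in range(max_retries)]
+
+
+if __name__ == "__main__":
+    delays = backoff_delays()
+    print(delays)       # [2.0, 4.0, 8.0, 16.0, 32.0]
+    print(sum(delays))  # 62.0 seconds of waiting before giving up
+```
+
+Under this assumption, the default settings tolerate about a minute of sustained 429 responses per request. If errors persist beyond that, lowering `LLM_REQUESTS_PER_MINUTE` is the more effective fix.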
+
+### Error: Insufficient memory
+
+```
+MemoryError: Unable to allocate array
+```
+
+**Solution**:
+- Process in smaller batches
+- Increase the server's RAM
+- Use `--stages` to run the pipeline in parts
+
+### Error: Transcription failed
+
+```
+Error: Transcription failed for call_xxx.mp3
+```
+
+**Solution**:
+- Check the file: `ffprobe call_xxx.mp3`
+- Check that it does not exceed 5 hours (AssemblyAI limit)
+- The pipeline continues with the remaining calls
+
+### View detailed logs
+
+```bash
+# Main pipeline log
+tail -f data/logs/pipeline_*.log
+
+# Verbose mode
+python -m cxinsights.pipeline.cli run ... --verbose
+
+# If you use systemd
+journalctl -u cxinsights-batch -f
+```
+
+---
+
+## Pre-Run Checklist
+
+```
+SERVER:
+[ ] Python 3.11+ installed
+[ ] tmux installed
+[ ] Enough disk space (see Capacity Planning)
+[ ] Stable network connectivity
+
+APPLICATION:
+[ ] Repository cloned
+[ ] Virtual environment created and activated
+[ ] Dependencies installed (pip install -e .)
+[ ] .env configured with API keys
+[ ] Throttling configured for your tier
+
+DATA:
+[ ] Audio files in data/raw/audio/batch_id/
+[ ] Cost estimate reviewed (estimate command)
+[ ] Directory structure created
+
+EXECUTION:
+[ ] tmux session started (or systemd configured)
+[ ] Logs can be monitored
+```
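+
+Several checklist items can be verified automatically. The sketch below uses only the Python standard library (`shutil.which`, `shutil.disk_usage`, `os.environ`); `preflight` is a hypothetical helper, not a CXInsights command, and the example reads variables from the process environment, so `.env` must already be loaded into the shell.
+
+```python
+import os
+import shutil
+
+
+def preflight(
+    required_tools: list[str],
+    required_env: list[str],
+    min_free_gb: float,
+    path: str = ".",
+) -> list[str]:
+    """Return a list of problems found; an empty list means all checks passed."""
+    problems = []
+    for tool in required_tools:
+        if shutil.which(tool) is None:  # tool not found on PATH
+            problems.append(f"missing tool: {tool}")
+    for var in required_env:
+        if not os.environ.get(var):  # unset or empty
+            problems.append(f"unset variable: {var}")
+    free_gb = shutil.disk_usage(path).free / 1e9
+    if free_gb < min_free_gb:
+        problems.append(f"low disk space: {free_gb:.1f} GB free, need {min_free_gb} GB")
+    return problems
+
+
+if __name__ == "__main__":
+    issues = preflight(
+        required_tools=["tmux", "ffmpeg"],
+        required_env=["ASSEMBLYAI_API_KEY", "OPENAI_API_KEY"],
+        min_free_gb=50,
+    )
+    print("\n".join(issues) if issues else "All pre-run checks passed")
+```
+
+The 50 GB threshold in the example corresponds to the 5K-call recommendation from Capacity Planning; adjust it to your batch size.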
+
+---
+
+## Makefile (Useful commands)
+
+```makefile
+.PHONY: install install-pii dev test test-cov lint format run estimate resume dashboard status logs clean-logs clean-checkpoints
+
+# Installation
+install:
+	pip install -e .
+
+install-pii:
+	pip install -e ".[pii]"
+
+dev:
+	pip install -e ".[dev]"
+
+# Testing
+test:
+	pytest tests/ -v
+
+test-cov:
+	pytest tests/ --cov=src --cov-report=html
+
+# Linting
+lint:
+	ruff check src/
+	mypy src/
+
+format:
+	ruff format src/
+
+# Execution
+run:
+	python -m cxinsights.pipeline.cli run --input $(INPUT) --batch-id $(BATCH)
+
+estimate:
+	python -m cxinsights.pipeline.cli estimate --input $(INPUT)
+
+resume:
+	python -m cxinsights.pipeline.cli resume --batch-id $(BATCH)
+
+dashboard:
+	streamlit run src/visualization/dashboard.py -- --batch-id $(BATCH)
+
+# Monitoring
+status:
+	@echo "=== Pipeline Status ==="
+	@ls -la data/.checkpoints/ 2>/dev/null || echo "No active checkpoints"
+	@echo ""
+	@echo "=== Recent Logs ==="
+	@ls -lt data/logs/*.log 2>/dev/null | head -5 || echo "No logs found"
+
+logs:
+	tail -f data/logs/pipeline_*.log
+
+# Cleanup (CAREFUL: do not delete production data)
+clean-logs:
+	find data/logs -name "*.log" -mtime +30 -delete
+	find data/logs -name "*.gz" -mtime +90 -delete
+
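+# Uses POSIX printf/read because make runs recipes with /bin/sh, where `read -p` is not portable.
+# Answering anything other than "y" leaves the checkpoints untouched without failing the target.
+clean-checkpoints:
+	@echo "Review before deleting:"
+	@ls -la data/.checkpoints/
+	@printf "Delete all checkpoints? [y/N] "; read confirm; if [ "$$confirm" = "y" ]; then rm -rf data/.checkpoints/*; fi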
+```
+
+Usage:
+
+```bash
+make install
+make run INPUT=./data/raw/audio/batch_01 BATCH=batch_01
+make logs
+make status
+make dashboard BATCH=batch_01
+```