feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -0,0 +1,839 @@
+# CXInsights - System Architecture
+
+## Product Vision
+
+CXInsights transforms 5,000-20,000 contact center calls into **executive RCA Trees** that identify the root causes of:
+- **Lost Sales**: Missed sales opportunities
+- **Poor CX**: Poor customer experiences
+
+---
+
+## Critical Design Principles
+
+### 1. Strict Separation: Observed vs Inferred
+
+**Every data point must be clearly classified as FACT or INFERENCE.**
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                    OBSERVED vs INFERRED                                     │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  OBSERVED (measurable facts)         INFERRED (model opinion)              │
+│  ─────────────────────────           ──────────────────────────────        │
+│  ✓ Call duration                     ✗ Customer sentiment                  │
+│  ✓ Number of transfers               ✗ Reason for the lost sale            │
+│  ✓ Hold time (measured)              ✗ Agent quality                       │
+│  ✓ Detected silences (>N s)          ✗ Intent classification               │
+│  ✓ Transcribed text                  ✗ Call summary                        │
+│  ✓ Talk share per speaker (%)        ✗ Outcome (sale/no_sale/resolved)     │
+│  ✓ Event timestamps                  ✗ RCA drivers                         │
+│                                                                             │
+│  Rule: If the LLM generates it → it is INFERRED                            │
+│        If it comes from the audio/STT → it is OBSERVED                     │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+**Impact**: An RCA that is defensible to stakeholders, with a clear audit trail and facts kept separate from opinion.
+
+### 2. Mandatory Evidence per Driver
+
+**Hard rule: no `evidence_spans` → the driver DOES NOT EXIST**
+
+```json
+{
+  "rca_code": "WAIT_HOLD_LONG",
+  "confidence": 0.77,
+  "evidence_spans": [
+    {"start": "02:14", "end": "03:52", "text": "[silence - hold]", "source": "observed"}
+  ]
+}
+```
+
+A driver without timestamped evidence is rejected by validation.
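
The evidence rule can be sketched as a small filter. This is a minimal illustration, not the project's validator: the helper names `has_valid_evidence` and `drop_unsupported_drivers` are hypothetical, and it assumes that a span counts as evidence only when both `start` and `end` are present.

```python
from typing import Any


def has_valid_evidence(driver: dict[str, Any]) -> bool:
    """Return True only if the driver has at least one timestamped span."""
    spans = driver.get("evidence_spans") or []
    # Assumption: a usable span needs both a start and an end timestamp.
    return any(span.get("start") and span.get("end") for span in spans)


def drop_unsupported_drivers(drivers: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep evidence-backed drivers; the rest are treated as nonexistent."""
    return [d for d in drivers if has_valid_evidence(d)]


drivers = [
    {
        "rca_code": "WAIT_HOLD_LONG",
        "confidence": 0.77,
        "evidence_spans": [
            {"start": "02:14", "end": "03:52", "text": "[silence - hold]", "source": "observed"}
        ],
    },
    # No spans: this driver must not survive the filter.
    {"rca_code": "PRICING_TOO_EXPENSIVE", "confidence": 0.90, "evidence_spans": []},
]

kept = drop_unsupported_drivers(drivers)
print([d["rca_code"] for d in kept])
```

Filtering before aggregation means a high `confidence` value alone can never put a driver into the RCA tree.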
+
+### 3. Prompt + Schema Versioning
+
+**Every output includes version metadata for reproducibility.**
+
+```json
+{
+  "_meta": {
+    "schema_version": "1.0.0",
+    "prompt_version": "call_analysis_v1.2",
+    "model": "gpt-4o-mini",
+    "model_version": "2024-07-18",
+    "processed_at": "2024-01-15T10:30:00Z"
+  }
+}
+```
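
One possible way to attach this metadata is a single helper that every module calls before writing its output. The sketch below is an assumption about how that could look; `stamp_output` and `same_run_config` are hypothetical names, and the field names simply mirror the `_meta` example above.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def stamp_output(
    payload: dict[str, Any],
    *,
    schema_version: str,
    prompt_version: str,
    model: str,
    model_version: str,
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Return a copy of payload with a _meta block placed first."""
    # `now` is expected to be timezone-aware; it is normalized to UTC.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    meta = {
        "schema_version": schema_version,
        "prompt_version": prompt_version,
        "model": model,
        "model_version": model_version,
        "processed_at": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return {"_meta": meta, **payload}


def same_run_config(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Two outputs are comparable only if every version field matches."""
    keys = ("schema_version", "prompt_version", "model", "model_version")
    return all(a["_meta"][k] == b["_meta"][k] for k in keys)


fixed = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
stamped = stamp_output(
    {"call_id": "c001"},
    schema_version="1.0.0",
    prompt_version="call_analysis_v1.2",
    model="gpt-4o-mini",
    model_version="2024-07-18",
    now=fixed,
)
print(stamped["_meta"]["processed_at"])
```

Passing `now` explicitly keeps the helper deterministic in tests, while production callers can omit it.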
+
+### 4. Closed RCA Taxonomy + Emergent Channel
+
+**Only codes from the enum. The single controlled exception: `OTHER_EMERGENT`**
+
+```json
+{
+  "rca_code": "OTHER_EMERGENT",
+  "proposed_label": "agent_rushed_due_to_queue_pressure",
+  "evidence_spans": [...]
+}
+```
+
+`OTHER_EMERGENT` entries are reviewed manually and promoted to the official taxonomy in the next version.
+
+### 5. Journey Events as Structure
+
+**No free text. Typed objects with a timestamp.**
+
+```json
+{
+  "journey_events": [
+    {"type": "CALL_START", "t": "00:00"},
+    {"type": "GREETING", "t": "00:03"},
+    {"type": "TRANSFER", "t": "01:42"},
+    {"type": "HOLD_START", "t": "02:10"},
+    {"type": "HOLD_END", "t": "03:40"},
+    {"type": "NEGATIVE_SENTIMENT", "t": "04:05", "source": "inferred"},
+    {"type": "RESOLUTION_ATTEMPT", "t": "05:20"},
+    {"type": "CALL_END", "t": "06:15"}
+  ]
+}
+```
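
To illustrate what "typed objects with a timestamp" buys over free text, here is a minimal sketch. The `JourneyEvent` dataclass and `is_chronological` helper are hypothetical, and the sketch assumes `t` is always an `MM:SS` offset from the start of the call, as in the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JourneyEvent:
    type: str
    t: str                        # "MM:SS" offset from call start (assumed format)
    source: Optional[str] = None  # "observed" | "inferred"; None when omitted

    @property
    def seconds(self) -> int:
        """Convert the MM:SS offset into seconds."""
        minutes, secs = self.t.split(":")
        return int(minutes) * 60 + int(secs)


def is_chronological(events: list[JourneyEvent]) -> bool:
    """True when the events are already sorted by timestamp."""
    offsets = [e.seconds for e in events]
    return offsets == sorted(offsets)


raw = [
    {"type": "CALL_START", "t": "00:00"},
    {"type": "HOLD_START", "t": "02:10"},
    {"type": "HOLD_END", "t": "03:40"},
    {"type": "NEGATIVE_SENTIMENT", "t": "04:05", "source": "inferred"},
    {"type": "CALL_END", "t": "06:15"},
]
events = [JourneyEvent(**item) for item in raw]

# Structured timestamps make derived metrics a subtraction, not a text search.
hold_seconds = events[2].seconds - events[1].seconds
print(is_chronological(events), hold_seconds)  # True 90
```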
+
+### 6. STT Adapter (No Lock-in)
+
+**Abstract interface. The provider is interchangeable.**
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                         TRANSCRIBER INTERFACE                               │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  Interface: Transcriber                                                     │
+│  ├─ transcribe(audio_path) → TranscriptContract                            │
+│  └─ transcribe_batch(paths) → List[TranscriptContract]                     │
+│                                                                             │
+│  Implementations:                                                           │
+│  ├─ AssemblyAITranscriber (default)                                        │
+│  ├─ WhisperTranscriber (local/offline)                                     │
+│  ├─ GoogleSTTTranscriber (alternative)                                     │
+│  └─ AWSTranscribeTranscriber (alternative)                                 │
+│                                                                             │
+│  TranscriptContract (normalized output):                                   │
+│  ├─ call_id: str                                                           │
+│  ├─ utterances: List[Utterance]                                            │
+│  ├─ observed_events: List[ObservedEvent]                                   │
+│  └─ metadata: TranscriptMetadata                                           │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
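
The interface in the diagram could be expressed in Python roughly as follows. This is a sketch under stated assumptions: the field types of `TranscriptContract` are simplified to plain lists and dicts, the sequential default for `transcribe_batch` is an illustrative choice, and `FakeTranscriber` is a hypothetical stand-in that calls no real provider SDK.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TranscriptContract:
    """Normalized output shared by every provider (types simplified here)."""
    call_id: str
    utterances: list = field(default_factory=list)
    observed_events: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class Transcriber(ABC):
    @abstractmethod
    def transcribe(self, audio_path: Path) -> TranscriptContract:
        """Transcribe one audio file into the normalized contract."""

    def transcribe_batch(self, paths: list[Path]) -> list[TranscriptContract]:
        # Sequential default; a real adapter could override this
        # with parallel uploads.
        return [self.transcribe(p) for p in paths]


class FakeTranscriber(Transcriber):
    """Hypothetical adapter used only to show that providers are swappable."""

    def transcribe(self, audio_path: Path) -> TranscriptContract:
        return TranscriptContract(
            call_id=audio_path.stem,
            metadata={"transcriber": "fake"},
        )


def run_module_1(transcriber: Transcriber, paths: list[Path]) -> list[TranscriptContract]:
    """Pipeline code depends on the interface, never on a concrete provider."""
    return transcriber.transcribe_batch(paths)


results = run_module_1(FakeTranscriber(), [Path("c001.mp3"), Path("c002.wav")])
print([r.call_id for r in results])  # ['c001', 'c002']
```

Because downstream modules only see `TranscriptContract`, switching providers would mean adding one subclass rather than touching feature extraction or inference.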
+
+---
+
+## End-to-End Flow Diagram
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              CXINSIGHTS PIPELINE                                │
+└─────────────────────────────────────────────────────────────────────────────────┘
+
+INPUT                           PROCESSING                              OUTPUT
+─────                           ──────────                              ──────
+
+┌──────────────┐
+│  5K-20K      │
+│  Audio Files │
+│  (.mp3/.wav) │
+└──────┬───────┘
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 1: BATCH TRANSCRIPTION (via Transcriber Interface)                   ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Transcriber Adapter (pluggable: AssemblyAI, Whisper, Google, AWS)     │  ║
+║  │  ├─ Parallel uploads (configurable concurrency)                        │  ║
+║  │  ├─ Spanish language model                                             │  ║
+║  │  ├─ Speaker diarization (Agent vs Customer)                            │  ║
+║  │  └─ Output: TranscriptContract (normalized)                            │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+║         │                                                                     ║
+║         ▼                                                                     ║
+║  📁 data/transcripts/{call_id}.json (TranscriptContract)                     ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 2: FEATURE EXTRACTION (OBSERVED ONLY)                                ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Extracts ONLY measurable facts from the transcript:                   │  ║
+║  │  ├─ Total duration                                                     │  ║
+║  │  ├─ % agent vs customer talk (ratio)                                   │  ║
+║  │  ├─ Silences > 5s (timestamp + duration)                               │  ║
+║  │  ├─ Detected interruptions                                             │  ║
+║  │  ├─ Transfers (if detectable from audio/metadata)                      │  ║
+║  │  └─ Literal keywords (no interpretation)                               │  ║
+║  │                                                                         │  ║
+║  │  Output: observed_features (100% verifiable)                           │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+║         │                                                                     ║
+║         ▼                                                                     ║
+║  📁 data/transcripts/{call_id}_features.json                                 ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 3: PER-CALL INFERENCE (MAP) - Observed/Inferred Separation          ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  LLM Analysis (GPT-4o-mini / Claude 3.5 Sonnet)                        │  ║
+║  │                                                                         │  ║
+║  │  Input to the LLM:                                                     │  ║
+║  │  ├─ Compressed transcript                                              │  ║
+║  │  ├─ observed_features (factual context)                                │  ║
+║  │  └─ RCA taxonomy (closed enum)                                         │  ║
+║  │                                                                         │  ║
+║  │  Structured output:                                                    │  ║
+║  │  ├─ OBSERVED (pass-through, not inferred):                             │  ║
+║  │  │   └─ observed_outcome (if explicit in audio: "venta cerrada")      │  ║
+║  │  │                                                                      │  ║
+║  │  ├─ INFERRED (confidence + evidence required):                         │  ║
+║  │  │   ├─ intent: {code, confidence, evidence_spans[]}                   │  ║
+║  │  │   ├─ outcome: {code, confidence, evidence_spans[]}                  │  ║
+║  │  │   ├─ sentiment: {score, confidence, evidence_spans[]}               │  ║
+║  │  │   ├─ lost_sale_driver: {rca_code, confidence, evidence_spans[]}    │  ║
+║  │  │   ├─ poor_cx_driver: {rca_code, confidence, evidence_spans[]}      │  ║
+║  │  │   └─ agent_quality: {scores{}, confidence, evidence_spans[]}       │  ║
+║  │  │                                                                      │  ║
+║  │  └─ JOURNEY_EVENTS (structured timeline):                              │  ║
+║  │      └─ events[]: {type, t, source: observed|inferred}                │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+║         │                                                                     ║
+║         ▼                                                                     ║
+║  📁 data/processed/{call_id}_analysis.json                                   ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 4: VALIDATION & QUALITY GATE                                         ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Strict validation before aggregating:                                 │  ║
+║  │  ├─ Does every driver have evidence_spans? → If not, REJECT driver    │  ║
+║  │  ├─ Is rca_code in the taxonomy? → If not, mark OTHER_EMERGENT        │  ║
+║  │  ├─ Confidence > threshold? → If not, mark low_confidence             │  ║
+║  │  ├─ Schema version match? → If not, ERROR                             │  ║
+║  │  └─ Do journey events have valid timestamps?                          │  ║
+║  │                                                                         │  ║
+║  │  Output: validated_analysis.json + validation_report.json             │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 5: AGGREGATION (REDUCE)                                              ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Statistical consolidation (validated data only):                      │  ║
+║  │  ├─ Count by rca_code (closed taxonomy)                               │  ║
+║  │  ├─ Distributions with confidence_weighted                            │  ║
+║  │  ├─ Split: high_confidence vs low_confidence                          │  ║
+║  │  ├─ OTHER_EMERGENT list for manual review                             │  ║
+║  │  ├─ Cross-tabs (intent × outcome × driver)                            │  ║
+║  │  └─ Correlations observed_features ↔ inferred_outcomes                │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+║         │                                                                     ║
+║         ▼                                                                     ║
+║  📁 data/outputs/aggregated_stats.json                                       ║
+║  📁 data/outputs/emergent_drivers_review.json                                ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 6: RCA TREE GENERATION                                               ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Tree construction (deterministic, no LLM):                            │  ║
+║  │                                                                         │  ║
+║  │  🔴 LOST SALES RCA TREE                                                │  ║
+║  │  └─ Lost Sales (N=1,250, 25%)                                          │  ║
+║  │     ├─ PRICING (45%, avg_conf=0.82)                                   │  ║
+║  │     │  ├─ TOO_EXPENSIVE (30%, n=375)                                  │  ║
+║  │     │  │  └─ evidence_samples: ["...", "..."]                         │  ║
+║  │     │  └─ COMPETITOR_CHEAPER (15%, n=187)                             │  ║
+║  │     │     └─ evidence_samples: ["...", "..."]                         │  ║
+║  │     └─ ...                                                             │  ║
+║  │                                                                         │  ║
+║  │  Each node includes:                                                   │  ║
+║  │  ├─ rca_code (from the enum)                                          │  ║
+║  │  ├─ count, pct                                                        │  ║
+║  │  ├─ avg_confidence                                                    │  ║
+║  │  ├─ evidence_samples[] (representative verbatims)                     │  ║
+║  │  └─ call_ids[] (for drill-down)                                       │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+║         │                                                                     ║
+║         ▼                                                                     ║
+║  📁 data/outputs/rca_lost_sales.json                                         ║
+║  📁 data/outputs/rca_poor_cx.json                                            ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+       │
+       ▼
+╔══════════════════════════════════════════════════════════════════════════════╗
+║  MODULE 7: EXECUTIVE REPORTING                                               ║
+║  ┌────────────────────────────────────────────────────────────────────────┐  ║
+║  │  Output formats:                                                       │  ║
+║  │  ├─ 📊 Streamlit Dashboard (with observed/inferred filter)            │  ║
+║  │  ├─ 📑 PDF Executive Summary (includes confidence disclaimers)        │  ║
+║  │  ├─ 📈 Excel with drill-down (link to evidence_spans)                 │  ║
+║  │  └─ 🖼️ PNG of RCA trees (with confidence legend)                     │  ║
+║  └────────────────────────────────────────────────────────────────────────┘  ║
+╚══════════════════════════════════════════════════════════════════════════════╝
+```
+
+---
+
+## Data Model (Updated)
+
+### TranscriptContract (Module 1 output)
+
+```json
+{
+  "_meta": {
+    "schema_version": "1.0.0",
+    "transcriber": "assemblyai",
+    "transcriber_version": "2024-07",
+    "processed_at": "2024-01-15T10:30:00Z"
+  },
+  "call_id": "c001",
+  "observed": {
+    "duration_seconds": 367,
+    "language_detected": "es",
+    "speakers": [
+      {"id": "A", "label": "agent", "talk_time_pct": 0.45},
+      {"id": "B", "label": "customer", "talk_time_pct": 0.55}
+    ],
+    "utterances": [
+      {
+        "speaker": "A",
+        "text": "Buenos días, gracias por llamar a Movistar...",
+        "start_ms": 0,
+        "end_ms": 3500
+      }
+    ],
+    "detected_events": [
+      {"type": "SILENCE", "start_ms": 72000, "end_ms": 80000, "duration_ms": 8000},
+      {"type": "CROSSTALK", "start_ms": 45000, "end_ms": 46500}
+    ]
+  }
+}
+```
+
+### CallAnalysis (Module 3 output) - WITH OBSERVED/INFERRED SEPARATION
+
+```json
+{
+  "_meta": {
+    "schema_version": "1.0.0",
+    "prompt_version": "call_analysis_v1.2",
+    "model": "gpt-4o-mini",
+    "model_version": "2024-07-18",
+    "processed_at": "2024-01-15T10:35:00Z"
+  },
+  "call_id": "c001",
+
+  "observed": {
+    "duration_seconds": 367,
+    "agent_talk_pct": 0.45,
+    "customer_talk_pct": 0.55,
+    "silence_total_seconds": 38,
+    "silence_events": [
+      {"start": "01:12", "end": "01:20", "duration_s": 8}
+    ],
+    "transfer_count": 0,
+    "hold_events": [
+      {"start": "02:14", "end": "03:52", "duration_s": 98}
+    ],
+    "explicit_outcome": null
+  },
+
+  "inferred": {
+    "intent": {
+      "code": "SALES_INQUIRY",
+      "confidence": 0.91,
+      "evidence_spans": [
+        {"start": "00:15", "end": "00:28", "text": "Quería información sobre la fibra de 600 megas"}
+      ]
+    },
+
+    "outcome": {
+      "code": "SALE_LOST",
+      "confidence": 0.85,
+      "evidence_spans": [
+        {"start": "05:40", "end": "05:52", "text": "Lo voy a pensar y ya les llamo yo"}
+      ]
+    },
+
+    "sentiment": {
+      "overall_score": -0.3,
+      "evolution": [
+        {"segment": "start", "score": 0.2},
+        {"segment": "middle", "score": -0.1},
+        {"segment": "end", "score": -0.6}
+      ],
+      "confidence": 0.78,
+      "evidence_spans": [
+        {"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
+      ]
+    },
+
+    "lost_sale_driver": {
+      "rca_code": "PRICING_TOO_EXPENSIVE",
+      "confidence": 0.83,
+      "evidence_spans": [
+        {"start": "03:55", "end": "04:08", "text": "59 euros al mes es mucho dinero"},
+        {"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
+      ],
+      "secondary_driver": {
+        "rca_code": "PRICING_COMPETITOR_CHEAPER",
+        "confidence": 0.71,
+        "evidence_spans": [
+          {"start": "04:30", "end": "04:45", "text": "En Vodafone me lo dejan por 45"}
+        ]
+      }
+    },
+
+    "poor_cx_driver": {
+      "rca_code": "WAIT_HOLD_LONG",
+      "confidence": 0.77,
+      "evidence_spans": [
+        {"start": "02:14", "end": "03:52", "text": "[hold - 98 seconds]", "source": "observed"}
+      ]
+    },
+
+    "agent_quality": {
+      "overall_score": 6,
+      "dimensions": {
+        "empathy": 7,
+        "product_knowledge": 8,
+        "objection_handling": 4,
+        "closing_skills": 5
+      },
+      "confidence": 0.72,
+      "evidence_spans": [
+        {"start": "04:50", "end": "05:10", "text": "Bueno, es el precio que tenemos...", "dimension": "objection_handling"}
+      ]
+    },
+
+    "summary": "Customer interested in 600 Mb fiber drops out over price (€59), comparing with Vodafone (€45). Long hold of 98 s. Agent did not counter the price objection."
+  },
+
+  "journey_events": [
+    {"type": "CALL_START", "t": "00:00", "source": "observed"},
+    {"type": "GREETING", "t": "00:03", "source": "observed"},
+    {"type": "INTENT_STATED", "t": "00:15", "source": "inferred"},
+    {"type": "HOLD_START", "t": "02:14", "source": "observed"},
+    {"type": "HOLD_END", "t": "03:52", "source": "observed"},
+    {"type": "PRICE_OBJECTION", "t": "03:55", "source": "inferred"},
+    {"type": "NEGATIVE_SENTIMENT_PEAK", "t": "04:10", "source": "inferred"},
+    {"type": "COMPETITOR_MENTION", "t": "04:30", "source": "inferred"},
+    {"type": "SOFT_DECLINE", "t": "05:40", "source": "inferred"},
+    {"type": "CALL_END", "t": "06:07", "source": "observed"}
+  ]
+}
+```
+
+### RCA Tree Node (Module 6 output)
+
+```json
+{
+  "_meta": {
+    "schema_version": "1.0.0",
+    "generated_at": "2024-01-15T11:00:00Z",
+    "taxonomy_version": "rca_taxonomy_v1.0",
+    "total_calls_analyzed": 5000,
+    "confidence_threshold_used": 0.70
+  },
+  "tree_type": "lost_sales",
+  "total_affected": {
+    "count": 1250,
+    "pct_of_total": 25.0
+  },
+  "root": {
+    "label": "Lost Sales",
+    "children": [
+      {
+        "rca_code": "PRICING",
+        "label": "Pricing Issues",
+        "count": 562,
+        "pct_of_parent": 45.0,
+        "avg_confidence": 0.82,
+        "children": [
+          {
+            "rca_code": "PRICING_TOO_EXPENSIVE",
+            "label": "Too Expensive",
+            "count": 375,
+            "pct_of_parent": 66.7,
+            "avg_confidence": 0.84,
+            "evidence_samples": [
+              {"call_id": "c001", "text": "59 euros al mes es mucho dinero", "t": "03:55"},
+              {"call_id": "c042", "text": "No puedo pagar tanto", "t": "02:30"}
+            ],
+            "call_ids": ["c001", "c042", "c078", "..."]
+          },
+          {
+            "rca_code": "PRICING_COMPETITOR_CHEAPER",
+            "label": "Competitor Cheaper",
+            "count": 187,
+            "pct_of_parent": 33.3,
+            "avg_confidence": 0.79,
+            "evidence_samples": [
+              {"call_id": "c001", "text": "En Vodafone me lo dejan por 45", "t": "04:30"}
+            ],
+            "call_ids": ["c001", "c015", "..."]
+          }
+        ]
+      }
+    ]
+  },
+  "other_emergent": [
+    {
+      "proposed_label": "agent_rushed_due_to_queue_pressure",
+      "count": 23,
+      "evidence_samples": [
+        {"call_id": "c234", "text": "Perdona que voy con prisa que hay cola", "t": "01:15"}
+      ],
+      "recommendation": "Consider adding to taxonomy v1.1"
+    }
+  ]
+}
+```
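
The counts and percentages in a tree node follow from plain arithmetic, which is what makes Module 6 deterministic. The sketch below reproduces the `PRICING` figures from the example above; `pct` and `build_cluster_node` are hypothetical helpers, and it assumes a cluster's count is the sum of its children's counts and that percentages are rounded to one decimal.

```python
def pct(count: int, parent_count: int) -> float:
    """Percentage of the parent, rounded to one decimal place."""
    return round(100 * count / parent_count, 1) if parent_count else 0.0


def build_cluster_node(cluster: str, child_counts: dict[str, int], parent_total: int) -> dict:
    """Build one cluster node from per-code call counts (no LLM involved)."""
    total = sum(child_counts.values())
    # Largest child first; ties broken alphabetically for stable output.
    ordered = sorted(child_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "rca_code": cluster,
        "count": total,
        "pct_of_parent": pct(total, parent_total),
        "children": [
            {"rca_code": code, "count": n, "pct_of_parent": pct(n, total)}
            for code, n in ordered
        ],
    }


node = build_cluster_node(
    "PRICING",
    {"PRICING_TOO_EXPENSIVE": 375, "PRICING_COMPETITOR_CHEAPER": 187},
    parent_total=1250,  # total lost sales in the example
)
print(node["count"], node["pct_of_parent"])
print([(c["rca_code"], c["pct_of_parent"]) for c in node["children"]])
```

Running the same inputs always yields the same tree, so any change in the report can be traced back to a change in the validated per-call data.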
+
+---
+
+## RCA Taxonomy (config/rca_taxonomy.yaml)
+
+```yaml
+# config/rca_taxonomy.yaml
+# Version: 1.0.0
+# Last updated: 2024-01-15
+
+_meta:
+  version: "1.0.0"
+  author: "CXInsights Team"
+  description: "Closed taxonomy for RCA classification. Only these codes are valid."
+
+# ============================================================================
+# INTENTS (Reason for the call)
+# ============================================================================
+intents:
+  - SALES_INQUIRY           # Sales inquiry
+  - SALES_UPGRADE           # Product upgrade
+  - SUPPORT_TECHNICAL       # Technical support
+  - SUPPORT_BILLING         # Billing inquiry
+  - COMPLAINT               # Complaint/claim
+  - CANCELLATION            # Cancellation request
+  - GENERAL_INQUIRY         # General inquiry
+  - OTHER_EMERGENT          # Captures new intents
+
+# ============================================================================
+# OUTCOMES (Result of the call)
+# ============================================================================
+outcomes:
+  - SALE_COMPLETED          # Sale closed
+  - SALE_LOST               # Sale lost
+  - ISSUE_RESOLVED          # Issue resolved
+  - ISSUE_UNRESOLVED        # Issue unresolved
+  - ESCALATED               # Escalated to supervisor/other dept
+  - CALLBACK_SCHEDULED      # Callback scheduled
+  - OTHER_EMERGENT
+
+# ============================================================================
+# LOST SALE DRIVERS (Why the sale was lost)
+# ============================================================================
+lost_sale_drivers:
+
+  # Pricing cluster
+  PRICING:
+    - PRICING_TOO_EXPENSIVE         # "It's too expensive"
+    - PRICING_COMPETITOR_CHEAPER    # "X offers it to me cheaper"
+    - PRICING_NO_DISCOUNT           # No discount was offered
+    - PRICING_PAYMENT_TERMS         # Unacceptable payment terms
+
+  # Product fit cluster
+  PRODUCT_FIT:
+    - PRODUCT_FEATURE_MISSING       # Required feature missing
+    - PRODUCT_WRONG_OFFERED         # Wrong product offered
+    - PRODUCT_COVERAGE_AREA         # No coverage in the customer's area
+    - PRODUCT_TECH_REQUIREMENTS     # Does not meet technical requirements
+
+  # Process cluster
+  PROCESS:
+    - PROCESS_TOO_COMPLEX           # Process too complicated
+    - PROCESS_DOCUMENTATION         # Requires too much documentation
+    - PROCESS_ACTIVATION_TIME       # Long activation time
+    - PROCESS_CONTRACT_TERMS        # Unacceptable contract terms
+
+  # Agent cluster
+  AGENT:
+    - AGENT_COULDNT_CLOSE           # Did not close the sale
+    - AGENT_POOR_OBJECTION          # Poor objection handling
+    - AGENT_LACK_URGENCY            # Did not create urgency
+    - AGENT_MISSED_UPSELL           # Missed upsell opportunity
+
+  # Timing cluster
+  TIMING:
+    - TIMING_NOT_READY              # Customer not ready
+    - TIMING_COMPARING              # Comparing options
+    - TIMING_BUDGET_PENDING         # Budget pending
+
+  # Catch-all
+  OTHER_EMERGENT: []
+
+# ============================================================================
+# POOR CX DRIVERS (Why the experience was poor)
+# ============================================================================
+poor_cx_drivers:
+
+  # Wait time cluster
+  WAIT_TIME:
+    - WAIT_INITIAL_LONG             # Long initial wait (>2min)
+    - WAIT_HOLD_LONG                # Long hold during the call (>1min)
+    - WAIT_CALLBACK_NEVER           # Promised callback never came
+
+  # Resolution cluster
+  RESOLUTION:
+    - RESOLUTION_NOT_ACHIEVED       # Issue not resolved
+    - RESOLUTION_NEEDED_ESCALATION  # Needed escalation
+    - RESOLUTION_CALLBACK_BROKEN    # Promised callback not honored
+    - RESOLUTION_INCORRECT          # Incorrect resolution
+
+  # Agent behavior cluster
+  AGENT_BEHAVIOR:
+    - AGENT_LACK_EMPATHY            # Lack of empathy
+    - AGENT_RUDE                    # Rude/dismissive
+    - AGENT_RUSHED                  # Rushed
+    - AGENT_NOT_LISTENING           # Not listening
+
+  # Information cluster
+  INFORMATION:
+    - INFO_WRONG_GIVEN              # Incorrect information
+    - INFO_INCONSISTENT             # Inconsistent information
+    - INFO_COULDNT_ANSWER           # Could not answer
+
+  # Process/System cluster
+  PROCESS_SYSTEM:
+    - SYSTEM_DOWN                   # System down
+    - POLICY_LIMITATION             # Policy limitation
+    - TOO_MANY_TRANSFERS            # Too many transfers
+    - AUTH_ISSUES                   # Authentication issues
+
+  # Catch-all
+  OTHER_EMERGENT: []
+
+# ============================================================================
+# JOURNEY EVENT TYPES (Timeline events)
+# ============================================================================
+journey_event_types:
+  # Observed (come from the audio/STT)
+  observed:
+    - CALL_START
+    - CALL_END
+    - GREETING
+    - SILENCE                       # >5 seconds
+    - HOLD_START
+    - HOLD_END
+    - TRANSFER
+    - CROSSTALK                     # Both speaking at once
+
+  # Inferred (come from the LLM)
+  inferred:
+    - INTENT_STATED
+    - PRICE_OBJECTION
+    - COMPETITOR_MENTION
+    - NEGATIVE_SENTIMENT_PEAK
+    - POSITIVE_SENTIMENT_PEAK
+    - RESOLUTION_ATTEMPT
+    - SOFT_DECLINE
+    - HARD_DECLINE
+    - COMMITMENT
+    - ESCALATION_REQUEST
+```
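
A closed taxonomy is easiest to enforce once the clustered YAML is flattened into a set of valid codes. The sketch below works on an inline excerpt so that it stays self-contained; in practice the dict would presumably come from loading `config/rca_taxonomy.yaml` (for example with PyYAML's `yaml.safe_load`). The helpers `flatten_driver_codes` and `normalize_code` are hypothetical names.

```python
def flatten_driver_codes(section: dict[str, list[str]]) -> set[str]:
    """Collect every leaf code of a clustered driver section."""
    codes: set[str] = set()
    for cluster, members in section.items():
        if cluster == "OTHER_EMERGENT":
            # The catch-all has no children; the cluster name is the code.
            codes.add(cluster)
        else:
            codes.update(members)
    return codes


def normalize_code(code: str, valid_codes: set[str]) -> str:
    """Map anything outside the enum to the controlled exception."""
    return code if code in valid_codes else "OTHER_EMERGENT"


# Excerpt of the lost_sale_drivers section, copied from the YAML above.
lost_sale_drivers = {
    "PRICING": [
        "PRICING_TOO_EXPENSIVE",
        "PRICING_COMPETITOR_CHEAPER",
        "PRICING_NO_DISCOUNT",
        "PRICING_PAYMENT_TERMS",
    ],
    "TIMING": ["TIMING_NOT_READY", "TIMING_COMPARING", "TIMING_BUDGET_PENDING"],
    "OTHER_EMERGENT": [],
}

valid = flatten_driver_codes(lost_sale_drivers)
print(normalize_code("PRICING_TOO_EXPENSIVE", valid))
print(normalize_code("COMPETITOR_CHEAPER", valid))
```

Note that an unprefixed code such as `COMPETITOR_CHEAPER` falls through to `OTHER_EMERGENT`, which is why examples elsewhere should use the full enum value.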
+
+---
+
+## Component Diagram (Updated)
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                           CXINSIGHTS COMPONENTS                             │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐   │
+│  │                    TRANSCRIBER INTERFACE (Adapter Pattern)          │   │
+│  │  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │   │
+│  │  │ AssemblyAI   │ │   Whisper    │ │  Google STT  │ │    AWS     │ │   │
+│  │  │ Transcriber  │ │ Transcriber  │ │ Transcriber  │ │ Transcribe │ │   │
+│  │  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └─────┬──────┘ │   │
+│  │         └────────────────┴────────────────┴───────────────┘        │   │
+│  │                              ▼                                      │   │
+│  │                    TranscriptContract (normalized output)           │   │
+│  └─────────────────────────────────────────────────────────────────────┘   │
+│                                 │                                           │
+│                                 ▼                                           │
+│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐        │
+│  │    Feature      │    │    Inference    │    │   Validation    │        │
+│  │   Extractor     │───▶│     Service     │───▶│     Gate        │        │
+│  │ (observed only) │    │ (observed/infer)│    │ (evidence check)│        │
+│  └─────────────────┘    └─────────────────┘    └─────────────────┘        │
+│                                                         │                  │
+│                                                         ▼                  │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                         AGGREGATION LAYER                            │  │
+│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │  │
+│  │  │ Stats Engine │  │  RCA Builder │  │   Emergent   │              │  │
+│  │  │ (by rca_code)│  │(deterministic│  │   Collector  │              │  │
+│  │  └──────────────┘  └──────────────┘  └──────────────┘              │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                 │                                           │
+│                                 ▼                                           │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                      VISUALIZATION LAYER                             │  │
+│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐    │  │
+│  │  │ Dashboard  │  │    PDF     │  │   Excel    │  │    PNG     │    │  │
+│  │  │(obs/infer) │  │ (disclaim) │  │(drill-down)│  │  (legend)  │    │  │
+│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘    │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                                                             │
+│  ┌─────────────────────────────────────────────────────────────────────┐  │
+│  │                         CONFIG LAYER                                 │  │
+│  │  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐        │  │
+│  │  │ rca_taxonomy   │  │ prompts/ +     │  │   settings     │        │  │
+│  │  │ v1.0 (enum)    │  │ VERSION FILE   │  │    (.env)      │        │  │
+│  │  └────────────────┘  └────────────────┘  └────────────────┘        │  │
+│  └─────────────────────────────────────────────────────────────────────┘  │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Validation Rules (Quality Gate)
+
+```python
+# Validation pseudocode
+def validate_call_analysis(analysis: CallAnalysis) -> ValidationResult:
+    errors = []
+    warnings = []
+
+    for driver in [analysis.inferred.lost_sale_driver, analysis.inferred.poor_cx_driver]:
+        if not driver:
+            continue  # Rules 1-3 apply only to drivers that are present
+        # RULE 1: Every driver must have evidence_spans
+        if not driver.evidence_spans:
+            errors.append(f"Driver {driver.rca_code} has no evidence_spans → REJECTED")
+
+        # RULE 2: rca_code must be in the taxonomy
+        if driver.rca_code not in TAXONOMY:
+            if driver.rca_code != "OTHER_EMERGENT":
+                errors.append(f"rca_code {driver.rca_code} is not in the taxonomy")
+            elif not driver.proposed_label:
+                errors.append("OTHER_EMERGENT requires proposed_label")
+
+        # RULE 3: Minimum confidence
+        if driver.confidence < CONFIDENCE_THRESHOLD:
+            warnings.append(f"Driver {driver.rca_code} with low confidence: {driver.confidence}")
+
+    # RULE 4: Schema version must match
+    if analysis._meta.schema_version != EXPECTED_SCHEMA_VERSION:
+        errors.append(f"Schema mismatch: {analysis._meta.schema_version}")
+
+    # RULE 5: Journey events must have valid timestamps
+    for event in analysis.journey_events:
+        if not is_valid_timestamp(event.t):
+            errors.append(f"Invalid timestamp in event: {event}")
+
+    return ValidationResult(
+        valid=len(errors) == 0,
+        errors=errors,
+        warnings=warnings
+    )
+```
+
+---
+
+## Prompt Versioning
+
+```
+config/prompts/
+├── versions.yaml                    # Version registry
+├── call_analysis/
+│   ├── v1.0/
+│   │   ├── system.txt
+│   │   ├── user.txt
+│   │   └── schema.json              # Expected JSON Schema
+│   ├── v1.1/
+│   │   ├── system.txt
+│   │   ├── user.txt
+│   │   └── schema.json
+│   └── v1.2/                        # Current
+│       ├── system.txt
+│       ├── user.txt
+│       └── schema.json
+└── rca_synthesis/
+    └── v1.0/
+        ├── system.txt
+        └── user.txt
+```
+
+```yaml
+# config/prompts/versions.yaml
+current:
+  call_analysis: "v1.2"
+  rca_synthesis: "v1.0"
+
+history:
+  call_analysis:
+    v1.0: "2024-01-01"
+    v1.1: "2024-01-10"  # Added secondary_driver support
+    v1.2: "2024-01-15"  # Added journey_events structure
+```
+
+---
+
+## Estimates
+
+### Total Time (5,000 calls, ~4 min average)
+
+| Stage | Estimated Time |
+|-------|----------------|
+| Transcription | 3-4 hours |
+| Feature Extraction | 15 min |
+| Inference | 2-3 hours |
+| Validation | 10 min |
+| Aggregation | 10 min |
+| RCA Tree Build | 5 min |
+| Reporting | 5 min |
+| **Total** | **6-8 hours** |
+
+### Costs (see TECH_STACK.md for details)
+
+| Volume | Transcription | Inference | Total |
+|---------|---------------|-----------|-------|
+| 5,000 calls | ~$300 | ~$15 | ~$315 |
+| 20,000 calls | ~$1,200 | ~$60 | ~$1,260 |
+
+---
+
+## Implementation Status (2026-01-19)
+
+| Module | Status | Location |
+|--------|--------|----------|
+| Transcription | ✅ Done | `src/transcription/` |
+| Feature Extraction | ✅ Done | `src/features/` |
+| Compression | ✅ Done | `src/compression/` |
+| Inference | ✅ Done | `src/inference/` |
+| Validation | ✅ Done | Built into models |
+| Aggregation | ✅ Done | `src/aggregation/` |
+| RCA Trees | ✅ Done | `src/aggregation/rca_tree.py` |
+| Pipeline | ✅ Done | `src/pipeline/` |
+| Exports | ✅ Done | `src/exports/` |
+| CLI | ✅ Done | `cli.py` |
+
+**Last updated**: 2026-01-19 | **Version**: 1.0.0