feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga, i.e. first call/repeat call × without/with churn risk)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga, i.e. no churn risk / at churn risk)
- Agent: Talento Para Replicar / Oportunidades de Mejora (talent to replicate / improvement opportunities)
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions
--- a/docs/QUICK_START.md
+++ b/docs/QUICK_START.md
@@ -0,0 +1,229 @@
+# QUICK_START.md
+
+> So that Claude Code (or any developer) can get started quickly
+
+---
+
+## Understanding the project (5 min)
+
+### Step 1: Read PROJECT_CONTEXT.md (2 min)
+```
+docs/PROJECT_CONTEXT.md
+```
+Contains: what the project is, current status, stack, structure, prohibitions.
+
+### Step 2: Read ARCHITECTURE.md (2 min)
+```
+docs/ARCHITECTURE.md
+```
+Contains: pipeline diagram, modules, data flow.
+
+### Step 3: Skim the structure (1 min)
+```
+src/
+├── transcription/     # Audio → JSON transcripts
+├── features/          # Deterministic events
+├── compression/       # Token reduction
+├── inference/         # LLM → RCA labels
+├── aggregation/       # Stats + RCA trees
+├── pipeline/          # Orchestration
+├── exports/           # JSON/Excel/PDF
+└── models/            # CallAnalysis central
+```
+
+---
+
+## Running the pipeline
+
+### Installation
+```bash
+# Create a virtualenv
+python -m venv venv
+
+# Activate it (run only the line for your OS)
+venv\Scripts\activate  # Windows
+source venv/bin/activate  # Linux/Mac
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Configure environment variables
+cp .env.example .env
+# Edit .env with your API keys
+```
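The `.env` file holds the API keys the pipeline needs. As an illustration only, here is a minimal sketch of how a key could be read, assuming plain `KEY=value` lines and that a real environment variable should win over the file. The function names are hypothetical, and the project may well load `.env` with a library such as python-dotenv instead; check `.env.example` for the actual variable names.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes: KEY="value" -> value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(name: str, env_path: Path = Path(".env")) -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    if name in os.environ:
        return os.environ[name]
    if env_path.exists():
        values = load_env_file(env_path)
        if name in values:
            return values[name]
    raise KeyError(f"{name} is not set; edit .env (see .env.example)")
```

Failing loudly with `KeyError` on a missing key surfaces a misconfigured `.env` at startup rather than as an authentication error halfway through a batch.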
+
+### Run the pipeline
+```bash
+# With audio files
+python cli.py run my_batch -i data/audio -o data/output
+
+# Check the status
+python cli.py status my_batch
+
+# With options
+python cli.py run my_batch --model gpt-4o --formats json,excel,pdf
+```
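`--formats` takes a comma-separated list. A minimal sketch of how such a value could be parsed and validated, assuming the supported set is exactly the three formats in the example above (`json`, `excel`, `pdf`); `parse_formats` is a hypothetical name, not necessarily what `cli.py` does internally:

```python
# Assumption: the supported formats are the ones used in the example command
SUPPORTED_FORMATS = ("json", "excel", "pdf")


def parse_formats(value: str) -> list[str]:
    """Turn 'json, Excel,pdf' into ['json', 'excel', 'pdf'], rejecting unknown formats."""
    formats: list[str] = []
    for item in value.split(","):
        fmt = item.strip().lower()
        if not fmt:
            continue  # tolerate stray or trailing commas
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format {fmt!r}; choose from {SUPPORTED_FORMATS}")
        if fmt not in formats:
            formats.append(fmt)  # keep first-seen order, drop duplicates
    return formats
```

Validating up front means a typo such as `exel` fails immediately instead of after the expensive transcription and inference stages have run.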
+
+---
+
+## Implementing a feature
+
+### Step 1: Identify the module
+| If you want to... | Edit... |
+|---------------|----------|
+| Change transcription | `src/transcription/` |
+| Detect new events | `src/features/event_detector.py` |
+| Modify compression | `src/compression/compressor.py` |
+| Change the LLM prompt | `src/inference/prompts.py` |
+| Adjust severity | `src/aggregation/severity.py` |
+| Add a new export | `src/exports/` |
+
+### Step 2: Read the schema in DATA_CONTRACTS.md
+```
+docs/DATA_CONTRACTS.md
+```
+
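+### Step 3: Implement following the existing pattern
+```python
+# Example: add a new event
+# src/features/event_detector.py
+from enum import Enum
+
+
+class EventType(str, Enum):
+    # ... existing members ...
+    NEW_EVENT = "new_event"  # Add it here
+
+
+# Detection method on the existing detector class in this file (hence `self`).
+# Transcript and Event are the project's own models (see DATA_CONTRACTS.md).
+def _detect_new_event(self, transcript: "Transcript") -> list["Event"]:
+    # Implement detection; return [] when the event is not present
+    return []
+```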
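A slightly fuller, self-contained sketch of a deterministic detector, in the spirit of the `features/` module (rule-based, no LLM, same input always gives the same output). The `Turn`, `Transcript`, and `Event` dataclasses, the `HOLD` event type, and the trigger phrases are hypothetical stand-ins; the real schemas are defined in `docs/DATA_CONTRACTS.md`.

```python
from dataclasses import dataclass, field
from enum import Enum


class EventType(str, Enum):
    HOLD = "hold"  # hypothetical event type for this sketch


@dataclass
class Turn:
    speaker: str
    text: str
    start: float  # seconds from the start of the call


@dataclass
class Transcript:
    call_id: str
    turns: list[Turn] = field(default_factory=list)


@dataclass
class Event:
    type: EventType
    start: float
    evidence: str  # the exact text that triggered the event


# Hypothetical trigger phrases, matched case-insensitively
HOLD_PHRASES = ("un momento", "please hold")


def detect_hold_events(transcript: Transcript) -> list[Event]:
    """Flag agent turns that contain a hold phrase."""
    events = []
    for turn in transcript.turns:
        text = turn.text.lower()
        if turn.speaker == "agent" and any(p in text for p in HOLD_PHRASES):
            events.append(Event(EventType.HOLD, turn.start, turn.text))
    return events
```

Keeping the triggering text in `evidence` makes each event traceable back to the transcript, which is what later stages need when they justify a driver.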
+
+### Step 4: Write tests
+```bash
+# Create the test file: tests/unit/test_<module>.py
+
+# Run it
+pytest tests/unit/test_<module>.py -v
+```
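A minimal sketch of what such a test file could contain. `compression_ratio` is a hypothetical helper defined inline so the file runs on its own; in a real test you would import the function under test from `src/`. pytest collects every function whose name starts with `test_`.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Hypothetical helper: fraction of tokens removed by compression."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compressed_tokens / original_tokens


def test_compression_ratio_basic():
    assert compression_ratio(1000, 250) == 0.75


def test_compression_ratio_no_reduction():
    assert compression_ratio(400, 400) == 0.0


def test_compression_ratio_rejects_empty_input():
    try:
        compression_ratio(0, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for zero input tokens")
```

`pytest.raises` is the more usual way to assert an exception; plain `try`/`except` keeps this sketch free of imports.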
+
+### Step 5: Update the documentation
+- `CHANGELOG.md` - Log the change
+- `DATA_CONTRACTS.md` - If you change a schema
+- `TECHNICAL_DECISIONS.md` - If you make a technical decision
+
+---
+
+## Debugging
+
+### Step 1: Check TROUBLESHOOTING.md
+```
+docs/TROUBLESHOOTING.md
+```
+
+### Step 2: Run a module in isolation
+```python
+# Test transcription on its own
+from pathlib import Path
+
+from src.transcription import AssemblyAITranscriber
+
+transcriber = AssemblyAITranscriber(api_key="...")
+result = transcriber.transcribe(Path("test.mp3"))
+print(result)
+```
+
+### Step 3: Verbose logs
+```bash
+python cli.py run test_batch -v  # Verbose mode
+```
+
+### Step 4: If you solve something new
+Add it to `docs/TROUBLESHOOTING.md`
+
+---
+
+## Validating changes
+
+### Step 1: Tests
+```bash
+pytest tests/ -v
+```
+
+### Step 2: Validation notebooks
+```
+notebooks/01_transcription_validation.ipynb
+notebooks/02_inference_validation.ipynb
+notebooks/03_compression_validation.ipynb
+notebooks/04_aggregation_validation.ipynb
+notebooks/05_full_pipeline_test.ipynb
+```
+
+### Step 3: Update BENCHMARKS.md
+If the change affects performance or cost:
+```
+docs/BENCHMARKS.md
+```
+
+---
+
+## Critical files (do NOT modify without review)
+
+| File | Why |
+|---------|---------|
+| `config/rca_taxonomy.yaml` | Defines all drivers |
+| `src/models/call_analysis.py` | Central contract |
+| `src/inference/prompts.py` | The prompt affects output quality |
+| `src/aggregation/severity.py` | Prioritization formula |
+
+---
+
+## Useful commands
+
+```bash
+# View the project structure
+tree -L 2 src/
+
+# Search the code
+grep -r "RCALabel" src/
+
+# Run the tests for one module
+pytest tests/unit/test_inference.py -v
+
+# Coverage (requires the pytest-cov plugin)
+pytest --cov=src tests/
+
+# Type checking (if mypy is installed)
+mypy src/
+```
+
+---
+
+## Key principles (always keep in mind)
+
+1. **OBSERVED vs INFERRED** - Every data point is classified as one or the other
+2. **Mandatory evidence** - No evidence = driver rejected
+3. **Closed taxonomy** - Only codes from the enum
+4. **Traceability** - Versions in every output
+5. **No over-engineering** - Only what is asked for
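Principles 1 to 3 lend themselves to a mechanical check. A minimal sketch, assuming each driver label carries a code, an OBSERVED/INFERRED source marker, and a list of evidence strings; `DriverLabel`, `validate_drivers`, and the driver codes are hypothetical (the real taxonomy lives in `config/rca_taxonomy.yaml` and the real model in `src/models/call_analysis.py`).

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(str, Enum):
    OBSERVED = "observed"  # read directly from the transcript or call metadata
    INFERRED = "inferred"  # produced by the LLM


# Hypothetical closed taxonomy; real codes come from config/rca_taxonomy.yaml
ALLOWED_DRIVERS = frozenset({"BILLING_ERROR", "LONG_HOLD", "UNRESOLVED_ISSUE"})


@dataclass
class DriverLabel:
    code: str
    source: Source  # principle 1: every label says how it was obtained
    evidence: list[str] = field(default_factory=list)


def validate_drivers(labels: list[DriverLabel]) -> tuple[list[DriverLabel], list[str]]:
    """Split labels into accepted ones and human-readable rejection reasons."""
    accepted: list[DriverLabel] = []
    rejected: list[str] = []
    for label in labels:
        if label.code not in ALLOWED_DRIVERS:  # principle 3: closed taxonomy
            rejected.append(f"{label.code}: not in taxonomy")
        elif not any(e.strip() for e in label.evidence):  # principle 2: evidence required
            rejected.append(f"{label.code}: no evidence")
        else:
            accepted.append(label)
    return accepted, rejected
```

Returning the rejection reasons instead of silently dropping labels keeps the filtering itself traceable.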
+
+---
+
+## Frequently asked questions
+
+### How do I add a new RCA driver?
+1. Edit `config/rca_taxonomy.yaml`
+2. Update `src/inference/prompts.py`
+3. Run the tests
+4. Document the change in CHANGELOG.md
+
+### How do I change the LLM?
+1. Pass `--model <model>` to `python cli.py run`
+2. Or configure it in `src/inference/analyzer.py`
+
+### How do I process more than 20k calls?
+1. Split the calls into batches
+2. Use the automatic resume
+3. Consider DuckDB for aggregation
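The batching and resume advice can be illustrated with a small sketch: split the call IDs into fixed-size batches and skip every call already recorded as finished in a checkpoint file. The checkpoint format (one call ID per line) and all function names are hypothetical; this is not the pipeline's actual resume mechanism.

```python
from collections.abc import Iterable, Iterator
from pathlib import Path


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_done(checkpoint: Path) -> set[str]:
    """Read the IDs of calls that already finished (one per line)."""
    if not checkpoint.exists():
        return set()
    lines = checkpoint.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def pending_batches(call_ids: list[str], checkpoint: Path, size: int) -> list[list[str]]:
    """Drop finished calls, then batch whatever is left."""
    done = load_done(checkpoint)
    remaining = [cid for cid in call_ids if cid not in done]
    return list(chunked(remaining, size))


def mark_done(checkpoint: Path, call_ids: Iterable[str]) -> None:
    """Append finished IDs so a rerun can resume where it stopped."""
    with checkpoint.open("a", encoding="utf-8") as fh:
        for cid in call_ids:
            fh.write(cid + "\n")
```

Appending IDs only after a batch completes means an interrupted run redoes at most the batch that was in flight.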
+
+### Where can I find the costs?
+`docs/BENCHMARKS.md` (pending real data)
+
+---
+
+**Last updated**: 2026-01-19