feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions
--- a/docs/PROJECT_STRUCTURE.md
+++ b/docs/PROJECT_STRUCTURE.md
@@ -0,0 +1,574 @@
+# CXInsights - Project Structure
+
+## Full Directory Tree
+
+```
+cxinsights/
+│
+├── 📁 data/                          # Data (git-ignored except .gitkeep)
+│   ├── raw/                          # Original input
+│   │   ├── audio/                    # Audio files (.mp3, .wav)
+│   │   │   └── batch_2024_01/
+│   │   │       ├── call_001.mp3
+│   │   │       └── ...
+│   │   └── metadata/                 # CSV with optional metadata
+│   │       └── calls_metadata.csv
+│   │
+│   ├── transcripts/                  # STT output
+│   │   └── batch_2024_01/
+│   │       ├── raw/                  # Original STT transcripts
+│   │       │   └── call_001.json
+│   │       └── compressed/           # Reduced transcripts for the LLM
+│   │           └── call_001.json
+│   │
+│   ├── features/                     # Feature extraction output (OBSERVED)
+│   │   └── batch_2024_01/
+│   │       └── call_001_features.json
+│   │
+│   ├── processed/                    # LLM output (labels with INFERRED)
+│   │   └── batch_2024_01/
+│   │       └── call_001_labels.json
+│   │
+│   ├── outputs/                      # Final output
+│   │   └── batch_2024_01/
+│   │       ├── aggregated_stats.json
+│   │       ├── call_matrix.csv
+│   │       ├── rca_lost_sales.json
+│   │       ├── rca_poor_cx.json
+│   │       ├── emergent_drivers_review.json
+│   │       ├── executive_summary.pdf
+│   │       ├── full_analysis.xlsx
+│   │       └── figures/
+│   │           ├── rca_tree_lost_sales.png
+│   │           └── rca_tree_poor_cx.png
+│   │
+│   ├── .checkpoints/                 # Pipeline state for resuming
+│   │   ├── transcription_state.json
+│   │   ├── features_state.json
+│   │   ├── inference_state.json
+│   │   └── pipeline_state.json
+│   │
+│   └── logs/                         # Execution logs
+│       └── pipeline_2024_01_15.log
+│
+├── 📁 src/                           # Source code
+│   ├── __init__.py
+│   │
+│   ├── 📁 transcription/             # Module 1: STT (transcription ONLY)
+│   │   ├── __init__.py
+│   │   ├── base.py                   # Abstract Transcriber interface
+│   │   ├── assemblyai_client.py      # AssemblyAI implementation
+│   │   ├── whisper_client.py         # Whisper implementation (future)
+│   │   ├── batch_processor.py        # Parallel processing
+│   │   ├── compressor.py             # Text reduction for the LLM ONLY
+│   │   └── models.py                 # Pydantic models: TranscriptContract
+│   │
+│   ├── 📁 features/                  # Module 2: OBSERVED extraction
+│   │   ├── __init__.py
+│   │   ├── turn_metrics.py           # talk ratio, interruptions, silence duration
+│   │   ├── event_detector.py         # HOLD, TRANSFER, SILENCE events
+│   │   └── models.py                 # Pydantic models: ObservedFeatures, Event
+│   │
+│   ├── 📁 inference/                 # Module 3: LLM Analysis (INFERRED)
+│   │   ├── __init__.py
+│   │   ├── client.py                 # OpenAI/Anthropic client wrapper
+│   │   ├── prompt_manager.py         # Loads and renders versioned prompts
+│   │   ├── analyzer.py               # Per-call analysis → CallLabels
+│   │   ├── batch_analyzer.py         # Batch processing with rate limiting
+│   │   ├── rca_synthesizer.py        # (optional) Narrative RCA synthesis via LLM
+│   │   └── models.py                 # CallLabels, InferredData, EvidenceSpan
+│   │
+│   ├── 📁 validation/                # Module 4: Quality Gate
+│   │   ├── __init__.py
+│   │   ├── validator.py              # Validation of evidence_spans, taxonomy, etc.
+│   │   ├── schema_checker.py         # schema_version verification
+│   │   └── models.py                 # ValidationResult, ValidationError
+│   │
+│   ├── 📁 aggregation/               # Module 5-6: Stats + RCA (DETERMINISTIC)
+│   │   ├── __init__.py
+│   │   ├── stats_engine.py           # Statistical calculations (pandas + DuckDB)
+│   │   ├── rca_builder.py            # DETERMINISTIC construction of the RCA tree
+│   │   ├── emergent_collector.py     # Collects OTHER_EMERGENT for review
+│   │   ├── correlations.py           # Correlation analysis observed↔inferred
+│   │   └── models.py                 # AggregatedStats, RCATree, RCANode
+│   │
+│   ├── 📁 visualization/             # Module 7: Reports (presentation ONLY)
+│   │   ├── __init__.py
+│   │   ├── dashboard.py              # Streamlit app
+│   │   ├── charts.py                 # Chart generation (plotly/matplotlib)
+│   │   ├── tree_renderer.py          # Renders RCA trees as PNG/SVG
+│   │   ├── pdf_report.py             # Executive PDF generation
+│   │   └── excel_export.py           # Excel export with drill-down
+│   │
+│   ├── 📁 pipeline/                  # Orchestration
+│   │   ├── __init__.py
+│   │   ├── orchestrator.py           # Main pipeline
+│   │   ├── stages.py                 # Stage definitions
+│   │   ├── checkpoint.py             # Checkpoint management
+│   │   └── cli.py                    # Command-line interface
+│   │
+│   └── 📁 utils/                     # Shared utilities
+│       ├── __init__.py
+│       ├── file_io.py                # File reading/writing
+│       ├── logging_config.py         # Logging setup
+│       └── validators.py             # Audio file validation
+│
+├── 📁 config/                        # Configuration
+│   ├── rca_taxonomy.yaml             # Closed driver taxonomy (versioned)
+│   ├── settings.yaml                 # General config (no secrets)
+│   │
+│   └── 📁 prompts/                   # LLM prompt templates (versioned)
+│       ├── versions.yaml             # Registry of active versions
+│       ├── call_analysis/
+│       │   └── v1.2/
+│       │       ├── system.txt
+│       │       ├── user.txt
+│       │       └── schema.json
+│       └── rca_synthesis/
+│           └── v1.0/
+│               ├── system.txt
+│               └── user.txt
+│
+├── 📁 tests/                         # Tests
+│   ├── __init__.py
+│   ├── conftest.py                   # Shared fixtures
+│   │
+│   ├── 📁 fixtures/                  # Test data
+│   │   ├── sample_audio/
+│   │   │   └── test_call.mp3
+│   │   ├── sample_transcripts/
+│   │   │   ├── raw/
+│   │   │   └── compressed/
+│   │   ├── sample_features/
+│   │   └── expected_outputs/
+│   │
+│   ├── 📁 unit/                      # Unit tests
+│   │   ├── test_transcription.py
+│   │   ├── test_features.py
+│   │   ├── test_inference.py
+│   │   ├── test_validation.py
+│   │   ├── test_aggregation.py
+│   │   └── test_visualization.py
+│   │
+│   └── 📁 integration/               # Integration tests
+│       └── test_pipeline.py
+│
+├── 📁 notebooks/                     # Jupyter notebooks for EDA
+│   ├── 01_eda_transcripts.ipynb
+│   ├── 02_feature_exploration.ipynb
+│   ├── 03_prompt_testing.ipynb
+│   ├── 04_aggregation_validation.ipynb
+│   └── 05_visualization_prototypes.ipynb
+│
+├── 📁 scripts/                       # Helper scripts
+│   ├── estimate_costs.py             # Cost estimator to run before executing
+│   ├── validate_audio.py             # Validate audio files
+│   └── sample_calls.py               # Extract a sample for testing
+│
+├── 📁 docs/                          # Documentation
+│   ├── ARCHITECTURE.md
+│   ├── TECH_STACK.md
+│   ├── PROJECT_STRUCTURE.md          # This document
+│   ├── DEPLOYMENT.md
+│   └── PROMPTS.md                    # Prompt documentation
+│
+├── .env.example                      # Environment variable template
+├── .gitignore
+├── pyproject.toml                    # Dependencies and metadata
+├── Makefile                          # Useful commands
+└── README.md                         # Main documentation
+```
+
+---
+
+## Module Responsibilities
+
+### 📁 `src/transcription/`
+
+**Purpose**: Convert audio to text with diarization. **STT ONLY, no analytics.**
+
+| File | Responsibility |
+|---------|-----------------|
+| `base.py` | Abstract `Transcriber` interface. Defines the output contract. |
+| `assemblyai_client.py` | AssemblyAI implementation. Handles auth, upload, polling. |
+| `whisper_client.py` | Local Whisper implementation (future). |
+| `batch_processor.py` | Processes N files in parallel. Manages concurrency. |
+| `compressor.py` | **Text reduction ONLY**: strips filler words, normalizes, shortens for the LLM. **Does NOT extract features.** |
+| `models.py` | `TranscriptContract`, `Utterance`, `Speaker` - Pydantic schemas. |
+
+**Main interfaces**:
+```python
+from abc import ABC, abstractmethod
+from pathlib import Path
+
+class Transcriber(ABC):
+    """Abstract interface - lets the STT provider be swapped without a refactor."""
+    @abstractmethod
+    async def transcribe(self, audio_path: Path) -> TranscriptContract: ...
+    @abstractmethod
+    async def transcribe_batch(self, paths: list[Path]) -> list[TranscriptContract]: ...
+
+class TranscriptCompressor:
+    """ONLY reduces text. Does NOT compute metrics or detect events."""
+    def compress(self, transcript: TranscriptContract) -> CompressedTranscript: ...
+```
+
+**Output**:
+- `data/transcripts/raw/{call_id}.json` → Original STT transcript
+- `data/transcripts/compressed/{call_id}.json` → Reduced text for the LLM
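+
+As a rough illustration of "text reduction only", the sketch below strips a few filler words and collapses whitespace before truncating. The `FILLERS` tuple, the `compress_text` name and the `max_chars` cut-off are assumptions made for this example, not the project's actual `compressor.py`.
+
+```python
+import re
+
+# Hypothetical filler list; a real deployment would tune this per language.
+FILLERS = ("eh", "um", "mmm")
+_FILLER_RE = re.compile(
+    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
+    re.IGNORECASE,
+)
+
+
+def compress_text(text: str, max_chars: int = 2000) -> str:
+    """Remove fillers, collapse whitespace, then cap the length."""
+    cleaned = _FILLER_RE.sub("", text)
+    cleaned = re.sub(r"\s+", " ", cleaned).strip()
+    return cleaned[:max_chars]
+```
+
+Nothing here computes a metric or detects an event, which keeps the function within the module's stated boundary.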
+
+---
+
+### 📁 `src/features/`
+
+**Purpose**: **Deterministic** extraction of metrics and events from transcripts. **100% OBSERVED.**
+
+| File | Responsibility |
+|---------|-----------------|
+| `turn_metrics.py` | Computes: talk_ratio, interruption_count, silence_total_seconds, avg_turn_duration. |
+| `event_detector.py` | Detects observable events: HOLD_START, HOLD_END, TRANSFER, SILENCE, CROSSTALK. |
+| `models.py` | `ObservedFeatures`, `ObservedEvent`, `TurnMetrics`. |
+
+**Main interfaces**:
+```python
+class TurnMetricsExtractor:
+    """Computes turn metrics from utterances."""
+    def extract(self, transcript: TranscriptContract) -> TurnMetrics: ...
+
+class EventDetector:
+    """Detects observable events (silences, holds, transfers)."""
+    def detect(self, transcript: TranscriptContract) -> list[ObservedEvent]: ...
+```
+
+**Output**:
+- `data/features/{call_id}_features.json` → OBSERVED metrics and events
+
+**Note**: This module **does NOT use an LLM**. Everything is a deterministic calculation over the transcript.
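+
+To make "deterministic calculation over the transcript" concrete, here is a minimal sketch that derives talk shares and total silence from timestamps alone. The `Utterance` shape (speaker label plus start/end seconds), the `"agent"`/`"customer"` labels and the `talk_and_silence` helper are assumptions for illustration; the real `TranscriptContract` may expose different fields.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Utterance:
+    """Hypothetical utterance: speaker label plus start/end in seconds."""
+    speaker: str  # assumed to be "agent" or "customer"
+    start: float
+    end: float
+
+
+def talk_and_silence(utterances: list[Utterance], call_duration: float) -> dict[str, float]:
+    """Compute talk shares and total silence from timestamps only."""
+    talk = {"agent": 0.0, "customer": 0.0}
+    for u in utterances:
+        talk[u.speaker] += u.end - u.start
+    total_talk = sum(talk.values())
+
+    # Silence is call time not covered by any utterance; overlapping
+    # spans (crosstalk) are merged so they are not counted twice.
+    covered = 0.0
+    cursor = 0.0
+    for u in sorted(utterances, key=lambda u: u.start):
+        if u.end > cursor:
+            covered += u.end - max(u.start, cursor)
+            cursor = u.end
+
+    return {
+        "agent_talk_pct": talk["agent"] / total_talk if total_talk else 0.0,
+        "customer_talk_pct": talk["customer"] / total_talk if total_talk else 0.0,
+        "silence_total_seconds": call_duration - covered,
+    }
+```
+
+The same input always yields the same output, which is the property that separates OBSERVED features from INFERRED labels.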
+
+---
+
+### 📁 `src/inference/`
+
+**Purpose**: Analyze transcripts with an LLM to extract **INFERRED data**.
+
+| File | Responsibility |
+|---------|-----------------|
+| `client.py` | Wrapper around the OpenAI/Anthropic SDK. Handles retries and rate limiting. |
+| `prompt_manager.py` | Loads versioned templates, renders them with variables, validates the schema. |
+| `analyzer.py` | Analyzes one call → `CallLabels` with observed/inferred separation. |
+| `batch_analyzer.py` | Processes N calls with rate limiting and checkpoints. |
+| `rca_synthesizer.py` | **(Optional)** Narrative synthesis of the RCA tree via LLM. Does NOT build the tree. |
+| `models.py` | `CallLabels`, `InferredData`, `EvidenceSpan`, `JourneyEvent`. |
+
+**Main interfaces**:
+```python
+class CallAnalyzer:
+    """Generates INFERRED labels with mandatory evidence_spans."""
+    async def analyze(self, transcript: CompressedTranscript, features: ObservedFeatures) -> CallLabels: ...
+
+class RCASynthesizer:
+    """(Optional) Generates an executive narrative over an already-built RCA tree."""
+    async def synthesize_narrative(self, rca_tree: RCATree) -> str: ...
+```
+
+**Output**:
+- `data/processed/{call_id}_labels.json` → Labels with observed + inferred
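+
+The versioned prompt layout (`config/prompts/<name>/<version>/system.txt`) suggests a loader along the following lines. This is only a sketch: the `ACTIVE_VERSIONS` dict stands in for whatever `versions.yaml` actually contains, the `render_prompt` helper is hypothetical, and the `$placeholder` syntax of `string.Template` is an assumption about how templates are written.
+
+```python
+from pathlib import Path
+from string import Template
+
+# Hypothetical in-memory form of versions.yaml: prompt name -> active version.
+ACTIVE_VERSIONS = {"call_analysis": "v1.2", "rca_synthesis": "v1.0"}
+
+
+def render_prompt(prompts_dir: Path, name: str, part: str, **variables: str) -> str:
+    """Load <prompts_dir>/<name>/<version>/<part>.txt and fill $placeholders."""
+    version = ACTIVE_VERSIONS[name]
+    path = prompts_dir / name / version / f"{part}.txt"
+    template = Template(path.read_text(encoding="utf-8"))
+    # substitute() raises KeyError on a missing variable, so an
+    # incomplete prompt is never sent to the model.
+    return template.substitute(**variables)
+```
+
+Resolving the version in one place means the `prompt_version` written to `_meta` can come from the same registry that selected the template.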
+
+---
+
+### 📁 `src/validation/`
+
+**Purpose**: Quality gate before aggregation. Rejects invalid data.
+
+| File | Responsibility |
+|---------|-----------------|
+| `validator.py` | Validates: evidence_spans present, rca_code in the taxonomy, confidence above threshold. |
+| `schema_checker.py` | Verifies that schema_version and prompt_version match the expected values. |
+| `models.py` | `ValidationResult`, `ValidationError`. |
+
+**Main interfaces**:
+```python
+class CallLabelsValidator:
+    """Validates CallLabels before aggregation."""
+    def validate(self, labels: CallLabels) -> ValidationResult: ...
+
+    # Rules:
+    # - Driver without evidence_spans → REJECTED
+    # - rca_code not in the taxonomy → flagged as OTHER_EMERGENT or ERROR
+    # - schema_version mismatch → ERROR
+```
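+
+The three rules listed above could be applied roughly as follows. This sketch works on a plain dict shaped like `labels.json`; the `Result` class, the `validate_labels` name, the `rca_code` key on each driver and the two taxonomy codes are assumptions for illustration, and it always downgrades unknown codes to `OTHER_EMERGENT` rather than raising an error.
+
+```python
+from dataclasses import dataclass, field
+
+EXPECTED_SCHEMA_VERSION = "1.0.0"                 # assumed expected value
+TAXONOMY_CODES = {"PRICE_TOO_HIGH", "LONG_HOLD"}  # hypothetical codes
+
+
+@dataclass
+class Result:
+    status: str  # "OK", "REJECTED" or "ERROR"
+    messages: list[str] = field(default_factory=list)
+
+
+def validate_labels(labels: dict) -> Result:
+    """Apply the three rules to a labels dict (drivers are edited in place)."""
+    if labels.get("_meta", {}).get("schema_version") != EXPECTED_SCHEMA_VERSION:
+        return Result("ERROR", ["schema_version mismatch"])
+
+    messages = []
+    inferred = labels.get("inferred", {})
+    for key in ("lost_sale_driver", "poor_cx_driver"):
+        driver = inferred.get(key)
+        if driver is None:
+            continue  # a null driver is allowed
+        if not driver.get("evidence_spans"):
+            return Result("REJECTED", [f"{key} has no evidence_spans"])
+        if driver.get("rca_code") not in TAXONOMY_CODES:
+            messages.append(f"{key}: unknown rca_code, marked OTHER_EMERGENT")
+            driver["rca_code"] = "OTHER_EMERGENT"
+    return Result("OK", messages)
+```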
+
+---
+
+### 📁 `src/aggregation/`
+
+**Purpose**: Consolidate validated labels into statistics and RCA trees. **DETERMINISTIC, does not use an LLM.**
+
+| File | Responsibility |
+|---------|-----------------|
+| `stats_engine.py` | Calculations: distributions, percentiles, cross-tabs. Uses pandas + DuckDB. |
+| `rca_builder.py` | **DETERMINISTIC construction** of the RCA tree from stats and the taxonomy. Does NOT use an LLM. |
+| `emergent_collector.py` | Collects `OTHER_EMERGENT` entries for manual review and possible promotion into the taxonomy. |
+| `correlations.py` | Correlation analysis between observed_features and inferred_outcomes. |
+| `models.py` | `AggregatedStats`, `RCATree`, `RCANode`, `Correlation`. |
+
+**Main interfaces**:
+```python
+class StatsEngine:
+    """Aggregates validated labels into statistics."""
+    def aggregate(self, labels: list[CallLabels]) -> AggregatedStats: ...
+
+class RCABuilder:
+    """Builds the RCA tree DETERMINISTICALLY (counts + taxonomy hierarchy)."""
+    def build_lost_sales_tree(self, stats: AggregatedStats, taxonomy: RCATaxonomy) -> RCATree: ...
+    def build_poor_cx_tree(self, stats: AggregatedStats, taxonomy: RCATaxonomy) -> RCATree: ...
+
+class EmergentCollector:
+    """Collects OTHER_EMERGENT entries for human review."""
+    def collect(self, labels: list[CallLabels]) -> EmergentDriversReport: ...
+```
+
+**Note on RCA**:
+- `rca_builder.py` → **Deterministic**: counts occurrences, groups by taxonomy, computes percentages
+- `inference/rca_synthesizer.py` → **(Optional) LLM**: generates narrative text about the already-built tree
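+
+A minimal sketch of the "count, group by taxonomy, compute percentages" step is shown below. The two-level `TAXONOMY` dict, its codes and the `build_tree` function are hypothetical; the real `RCABuilder` takes `AggregatedStats` and an `RCATaxonomy`, whose shapes are not specified here.
+
+```python
+from collections import Counter
+
+# Hypothetical two-level taxonomy: category -> driver codes.
+TAXONOMY = {
+    "PRICING": ["PRICE_TOO_HIGH", "NO_DISCOUNT"],
+    "SERVICE": ["LONG_HOLD"],
+}
+
+
+def build_tree(driver_codes: list[str], taxonomy: dict[str, list[str]]) -> dict:
+    """Count drivers, group them under their category, add percentages."""
+    counts = Counter(driver_codes)
+    total = sum(counts.values())
+    children = []
+    for category, codes in taxonomy.items():
+        leaves = [
+            {"code": c, "count": counts[c], "pct": round(100 * counts[c] / total, 1)}
+            for c in codes
+            if counts[c]
+        ]
+        cat_count = sum(leaf["count"] for leaf in leaves)
+        if cat_count:
+            children.append({
+                "code": category,
+                "count": cat_count,
+                "pct": round(100 * cat_count / total, 1),
+                # Sort by count, then code, so ties never depend on input order.
+                "children": sorted(leaves, key=lambda n: (-n["count"], n["code"])),
+            })
+    children.sort(key=lambda n: (-n["count"], n["code"]))
+    return {"total": total, "children": children}
+```
+
+Because ordering is fixed by count and then by code, two runs over the same labels produce identical trees regardless of the order in which calls were processed.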
+
+---
+
+### 📁 `src/visualization/`
+
+**Purpose**: Output layer. Generates visual reports. **Does NOT recompute metrics or inferences.**
+
+| File | Responsibility |
+|---------|-----------------|
+| `dashboard.py` | Streamlit app: filters, interactive charts, drill-down. |
+| `charts.py` | Chart-generation functions (plotly/matplotlib). |
+| `tree_renderer.py` | Renders RCA trees as PNG/SVG. |
+| `pdf_report.py` | Executive PDF generation with ReportLab. |
+| `excel_export.py` | Excel export with multiple sheets and formatting. |
+
+**Critical constraint**: This module **ONLY presents pre-computed data**. It contains no analytical logic.
+
+**Main interfaces**:
+```python
+class ReportGenerator:
+    """Generates reports from pre-computed data."""
+    def generate_pdf(self, stats: AggregatedStats, trees: dict[str, RCATree]) -> Path: ...
+    def generate_excel(self, labels: list[CallLabels], stats: AggregatedStats) -> Path: ...
+
+class TreeRenderer:
+    """Renders an RCATree as an image."""
+    def render_png(self, tree: RCATree, output_path: Path) -> None: ...
+```
+
+---
+
+### 📁 `src/pipeline/`
+
+**Purpose**: Orchestrate the full execution flow.
+
+| File | Responsibility |
+|---------|-----------------|
+| `orchestrator.py` | Runs stages in order, handles errors and logging. |
+| `stages.py` | Defines each stage: `transcribe`, `extract_features`, `analyze`, `validate`, `aggregate`, `report`. |
+| `checkpoint.py` | Saves/loads state so a run can be resumed. |
+| `cli.py` | CLI built with argparse/typer. |
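+
+One way to implement resume with a per-stage state file is sketched below. The JSON layout (`{"done": [...]}`) and the `load_done`, `save_done` and `run_stage` helpers are assumptions for illustration; the actual contents of the `*_state.json` checkpoint files are not specified in this document.
+
+```python
+import json
+from pathlib import Path
+
+
+def load_done(state_file: Path) -> set[str]:
+    """Return the call IDs already processed, or an empty set on first run."""
+    if not state_file.exists():
+        return set()
+    return set(json.loads(state_file.read_text(encoding="utf-8"))["done"])
+
+
+def save_done(state_file: Path, done: set[str]) -> None:
+    """Write to a temp file and rename, so a crash cannot leave a partial file."""
+    tmp = state_file.with_suffix(".tmp")
+    tmp.write_text(json.dumps({"done": sorted(done)}), encoding="utf-8")
+    tmp.replace(state_file)
+
+
+def run_stage(call_ids: list[str], state_file: Path, process) -> list[str]:
+    """Process only calls missing from the checkpoint; return what was run."""
+    done = load_done(state_file)
+    ran = []
+    for call_id in call_ids:
+        if call_id in done:
+            continue
+        process(call_id)
+        done.add(call_id)
+        save_done(state_file, done)  # checkpoint after every call
+        ran.append(call_id)
+    return ran
+```
+
+Checkpointing after each call trades some I/O for the guarantee that an interrupted batch never repeats a paid STT or LLM request.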
+
+---
+
+### 📁 `src/utils/`
+
+**Purpose**: Shared helper functions.
+
+| File | Responsibility |
+|---------|-----------------|
+| `file_io.py` | Reading/writing JSON, CSV, audio. Glob patterns. |
+| `logging_config.py` | Structured logging setup (console + file). |
+| `validators.py` | Audio file validation (format, duration). |
+
+---
+
+## Data Model (Output Artifacts)
+
+### Minimum required structure of `labels.json`
+
+Every `{call_id}_labels.json` file **ALWAYS** includes these fields:
+
+```json
+{
+  "_meta": {
+    "schema_version": "1.0.0",      // REQUIRED - schema version
+    "prompt_version": "v1.2",       // REQUIRED - version of the prompt used
+    "model_id": "gpt-4o-mini",      // REQUIRED - LLM model used
+    "processed_at": "2024-01-15T10:35:00Z"
+  },
+  "call_id": "c001",                // REQUIRED
+
+  "observed": {                     // REQUIRED - data from STT/features
+    "duration_seconds": 245,
+    "agent_talk_pct": 0.45,
+    "customer_talk_pct": 0.55,
+    "silence_total_seconds": 38,
+    "hold_events": [...],
+    "transfer_count": 0
+  },
+
+  "inferred": {                     // REQUIRED - data from the LLM
+    "intent": { "code": "...", "confidence": 0.91, "evidence_spans": [...] },
+    "outcome": { "code": "...", "confidence": 0.85, "evidence_spans": [...] },
+    "lost_sale_driver": { ... } | null,
+    "poor_cx_driver": { ... } | null,
+    "sentiment": { ... },
+    "agent_quality": { ... },
+    "summary": "..."
+  },
+
+  "events": [                       // REQUIRED - structured timeline
+    {"type": "CALL_START", "t": "00:00", "source": "observed"},
+    {"type": "HOLD_START", "t": "02:14", "source": "observed"},
+    {"type": "PRICE_OBJECTION", "t": "03:55", "source": "inferred"},
+    ...
+  ]
+}
+```
+
+### About `events[]`
+
+`events[]` is a **structured list of normalized events**, NOT free text.
+
+Each event has:
+- `type`: Enum code (`HOLD_START`, `TRANSFER`, `ESCALATION_REQUEST`, `NEGATIVE_SENTIMENT_PEAK`, etc.)
+- `t`: Timestamp in `MM:SS` or `HH:MM:SS` format
+- `source`: `"observed"` (comes from STT/features) or `"inferred"` (comes from the LLM)
+
+Valid event types are defined in `config/rca_taxonomy.yaml`:
+```yaml
+journey_event_types:
+  observed:
+    - CALL_START
+    - CALL_END
+    - HOLD_START
+    - HOLD_END
+    - TRANSFER
+    - SILENCE
+    - CROSSTALK
+  inferred:
+    - INTENT_STATED
+    - PRICE_OBJECTION
+    - COMPETITOR_MENTION
+    - NEGATIVE_SENTIMENT_PEAK
+    - RESOLUTION_ATTEMPT
+    - SOFT_DECLINE
+    - ESCALATION_REQUEST
+```
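+
+A small sketch of how an event could be checked against this enum and its timestamp normalized to seconds. The `EVENT_TYPES` dict simply mirrors the YAML above; the `to_seconds` and `check_event` helpers, and the rule that a type is only valid under the source it is listed for, are assumptions for illustration.
+
+```python
+EVENT_TYPES = {
+    "observed": {"CALL_START", "CALL_END", "HOLD_START", "HOLD_END",
+                 "TRANSFER", "SILENCE", "CROSSTALK"},
+    "inferred": {"INTENT_STATED", "PRICE_OBJECTION", "COMPETITOR_MENTION",
+                 "NEGATIVE_SENTIMENT_PEAK", "RESOLUTION_ATTEMPT",
+                 "SOFT_DECLINE", "ESCALATION_REQUEST"},
+}
+
+
+def to_seconds(t: str) -> int:
+    """Convert 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
+    parts = [int(p) for p in t.split(":")]
+    if len(parts) not in (2, 3):
+        raise ValueError(f"bad timestamp: {t!r}")
+    seconds = 0
+    for part in parts:
+        seconds = seconds * 60 + part
+    return seconds
+
+
+def check_event(event: dict) -> int:
+    """Validate the type against its source; return the offset in seconds."""
+    allowed = EVENT_TYPES.get(event["source"])
+    if allowed is None or event["type"] not in allowed:
+        raise ValueError(f"invalid event: {event}")
+    return to_seconds(event["t"])
+```
+
+Converting to seconds early makes it straightforward to sort a mixed observed/inferred timeline.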
+
+---
+
+## Data Flow Between Modules
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                              DATA FLOW                                      │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                             │
+│   data/raw/audio/*.mp3                                                      │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │ transcription │ → data/transcripts/raw/*.json                          │
+│   │   (STT only)  │ → data/transcripts/compressed/*.json                   │
+│   └───────────────┘                                                        │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │   features    │ → data/features/*_features.json                        │
+│   │  (OBSERVED)   │   (turn_metrics + detected_events)                     │
+│   └───────────────┘                                                        │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │   inference   │ → data/processed/*_labels.json                         │
+│   │  (INFERRED)   │   (observed + inferred + events)                       │
+│   └───────────────┘                                                        │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │  validation   │ → rejects labels without evidence_spans                │
+│   │ (quality gate)│ → flags low_confidence                                 │
+│   └───────────────┘                                                        │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │  aggregation  │ → data/outputs/aggregated_stats.json                   │
+│   │(DETERMINISTIC)│ → data/outputs/rca_*.json                              │
+│   └───────────────┘ → data/outputs/emergent_drivers_review.json            │
+│           │                                                                 │
+│           ▼                                                                 │
+│   ┌───────────────┐                                                        │
+│   │ visualization │ → data/outputs/executive_summary.pdf                   │
+│   │(PRESENTATION) │ → data/outputs/full_analysis.xlsx                      │
+│   └───────────────┘ → http://localhost:8501 (dashboard)                    │
+│                                                                             │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Separation of Responsibilities (Summary)
+
+| Layer | Module | Type of Logic | Uses LLM |
+|------|--------|----------------|---------|
+| STT | `transcription/` | Audio→text conversion | No |
+| Text | `transcription/compressor.py` | Text reduction | No |
+| Features | `features/` | Deterministic extraction | No |
+| Analysis | `inference/analyzer.py` | Classification + evidence | **Yes** |
+| Narrative | `inference/rca_synthesizer.py` | Text synthesis (optional) | **Yes** |
+| Validation | `validation/` | Quality rules | No |
+| Aggregation | `aggregation/` | Statistics + RCA tree | No |
+| Presentation | `visualization/` | Reports + dashboard | No |
+
+---
+
+## Code Conventions
+
+### Naming
+
+- **Files**: `snake_case.py`
+- **Classes**: `PascalCase`
+- **Functions/methods**: `snake_case`
+- **Constants**: `UPPER_SNAKE_CASE`
+
+### Type hints
+
+Use type hints on all public functions. Use Pydantic for data validation.
+
+### Example module structure
+
+```python
+# src/features/turn_metrics.py
+
+"""Deterministic extraction of turn-based metrics from transcripts."""
+
+from __future__ import annotations
+
+import logging
+from dataclasses import dataclass
+
+from src.transcription.models import TranscriptContract
+
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class TurnMetrics:
+    """Observed metrics extracted from transcript turns."""
+    agent_talk_pct: float
+    customer_talk_pct: float
+    silence_total_seconds: float
+    interruption_count: int
+    avg_turn_duration_seconds: float
+
+
+class TurnMetricsExtractor:
+    """Extracts turn metrics from transcript. 100% deterministic, no LLM."""
+
+    def extract(self, transcript: TranscriptContract) -> TurnMetrics:
+        """Extract turn metrics from transcript utterances."""
+        utterances = transcript.observed.utterances
+        # ... deterministic calculations ...
+        return TurnMetrics(...)
+```