
# MODULE_GUIDES.md

> Implementation guide for each module

---

## Guide: Transcription Module

### Files involved
```
src/transcription/
├── __init__.py
├── base.py           # Interface Transcriber + MockTranscriber
├── assemblyai.py     # AssemblyAITranscriber
└── models.py         # Transcript, SpeakerTurn, TranscriptMetadata
```

### How it works
1. An audio file is passed to the transcriber
2. AssemblyAI processes it with speaker diarization (agent/customer)
3. It returns a `Transcript` with `SpeakerTurn[]` and metadata
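
The flow above can be sketched with a mock provider. This is a minimal sketch, not the project's code: the field names on `SpeakerTurn`, `TranscriptMetadata`, and `Transcript`, as well as the `transcribe` method signature, are assumptions for illustration. The real definitions live in `models.py` and `base.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpeakerTurn:
    speaker: str   # "agent" or "customer" (assumed labels)
    text: str
    start_ms: int  # hypothetical field
    end_ms: int    # hypothetical field


@dataclass
class TranscriptMetadata:
    audio_path: str
    duration_ms: int
    provider: str


@dataclass
class Transcript:
    turns: List[SpeakerTurn] = field(default_factory=list)
    metadata: Optional[TranscriptMetadata] = None


class Transcriber(ABC):
    """Assumed shape of the provider interface."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> Transcript:
        ...


class MockTranscriber(Transcriber):
    """Returns a fixed two-turn transcript without calling any API."""

    def transcribe(self, audio_path: str) -> Transcript:
        turns = [
            SpeakerTurn("agent", "Buenos días, ¿en qué puedo ayudarle?", 0, 2500),
            SpeakerTurn("customer", "Quiero cancelar mi servicio.", 2600, 5000),
        ]
        metadata = TranscriptMetadata(audio_path, turns[-1].end_ms, "mock")
        return Transcript(turns, metadata)


transcript = MockTranscriber().transcribe("call_001.mp3")
```

A new provider would subclass the same interface and fill in `transcribe` with its own API call, so downstream modules only ever see a `Transcript`.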

### How to test
```bash
pytest tests/unit/test_transcription.py -v
```

### How to extend
- To add a new provider: implement the `Transcriber` interface
- To change the output: edit `models.py`

### Troubleshooting
- "API key invalid" → Check `.env` ASSEMBLYAI_API_KEY
- "Audio format not supported" → Convert to MP3/WAV
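
Both problems can be caught before any audio is uploaded. The check below is a minimal sketch: `preflight_check` and `SAFE_EXTENSIONS` are hypothetical names, and the extension list only mirrors the MP3/WAV advice above rather than the provider's full list of supported formats.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Formats named in the troubleshooting notes; not an exhaustive provider list.
SAFE_EXTENSIONS = {".mp3", ".wav"}


def preflight_check(audio_path: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ASSEMBLYAI_API_KEY"):
        problems.append("ASSEMBLYAI_API_KEY is missing or empty; check .env")
    if Path(audio_path).suffix.lower() not in SAFE_EXTENSIONS:
        problems.append(f"{audio_path}: convert to MP3 or WAV first")
    return problems


# Passing an explicit mapping makes the check easy to exercise in tests.
ok = preflight_check("call_001.mp3", {"ASSEMBLYAI_API_KEY": "dummy-key"})
bad = preflight_check("call_001.ogg", {})
```

This only verifies that the key is present, not that the provider accepts it.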

---

## Guide: Feature Extraction Module

### Files involved
```
src/features/
├── __init__.py
├── event_detector.py   # HOLD, TRANSFER, SILENCE detection
└── turn_metrics.py     # Talk ratio, interruptions
```

### How it works
1. A transcript comes in
2. Regex patterns and rules detect events (HOLD, TRANSFER, etc.)
3. Metrics are calculated (talk ratio, speaking time)
4. The transcript is enriched with `detected_events[]`
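
As a rough illustration of these steps, the sketch below runs regex detection and a talk-ratio calculation over a few turns. The cue phrases, the tuple layout of a turn, and the function names are assumptions; the real rules live in `event_detector.py` and `turn_metrics.py`.

```python
import re

# Hypothetical cue phrases for illustration only.
EVENT_PATTERNS = {
    "HOLD_START": re.compile(r"en\s+espera", re.IGNORECASE),
    "TRANSFER": re.compile(r"\btransf(iero|erir|erencia)\b", re.IGNORECASE),
}


def detect_events(turns):
    """turns: list of (speaker, text, start_s, end_s) tuples."""
    events = []
    for index, (speaker, text, _start, _end) in enumerate(turns):
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.search(text):
                events.append({"type": name, "turn_index": index, "speaker": speaker})
    return events


def talk_ratio(turns, speaker="agent"):
    """Share of total speaking time that belongs to one speaker."""
    total = sum(end - start for _, _, start, end in turns)
    spoken = sum(end - start for who, _, start, end in turns if who == speaker)
    return spoken / total if total else 0.0


turns = [
    ("agent", "Un momento, le pongo en espera.", 0.0, 3.0),
    ("customer", "De acuerdo.", 3.0, 4.0),
    ("agent", "Gracias por esperar, le voy a transferir con ventas.", 10.0, 14.0),
]

# The "enriched transcript" is modeled here as a plain dict.
enriched = {
    "turns": turns,
    "detected_events": detect_events(turns),
    "agent_talk_ratio": talk_ratio(turns),
}
```

Here the agent speaks for 7 of the 8 seconds of speech, so `agent_talk_ratio` is 0.875.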

### Supported events
- `HOLD_START` / `HOLD_END`
- `TRANSFER`
- `ESCALATION`
- `SILENCE` (> threshold)
- `INTERRUPTION`
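
The timing-based events in this list (`SILENCE` and `INTERRUPTION`) can be derived from turn timestamps alone. The following is a sketch under assumptions: the `EventType` enum, the 5-second default threshold, and the tuple layout are illustrative and may differ from `event_detector.py`.

```python
from enum import Enum


class EventType(str, Enum):
    HOLD_START = "HOLD_START"
    HOLD_END = "HOLD_END"
    TRANSFER = "TRANSFER"
    ESCALATION = "ESCALATION"
    SILENCE = "SILENCE"
    INTERRUPTION = "INTERRUPTION"


def detect_timing_events(turns, silence_threshold_s=5.0):
    """turns: list of (speaker, start_s, end_s), ordered by start time.

    Returns (event_type, from_s, to_s) tuples.
    """
    events = []
    for prev, curr in zip(turns, turns[1:]):
        gap = curr[1] - prev[2]
        if gap > silence_threshold_s:
            # Nobody spoke for longer than the threshold.
            events.append((EventType.SILENCE, prev[2], curr[1]))
        elif gap < 0 and curr[0] != prev[0]:
            # The other speaker started before the previous turn ended.
            events.append((EventType.INTERRUPTION, curr[1], prev[2]))
    return events


timeline = [("agent", 0.0, 4.0), ("customer", 3.5, 6.0), ("agent", 12.0, 15.0)]
timing_events = detect_timing_events(timeline)
```

In this timeline the customer starts at 3.5 s while the agent is still talking (an interruption), and the 6-second gap before the last turn exceeds the threshold (a silence).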

### How to test
```bash
pytest tests/unit/test_features.py -v
```

---

## Guide: Compression Module

### Files involved
```
src/compression/
├── __init__.py
├── compressor.py     # TranscriptCompressor
└── models.py         # CompressedTranscript, CustomerIntent, etc.
```

### How it works
1. The full transcript comes in
2. Spanish-language regex patterns extract:
   - Customer intents (cancel, inquire)
   - Agent offers (discount, upgrade)
   - Objections (price, competition)
   - Resolutions
3. It produces a `CompressedTranscript` with a reduction of more than 60%

### Spanish patterns
```python
INTENT_PATTERNS = {
    IntentType.CANCEL: [r"quiero\s+cancelar", r"dar\s+de\s+baja"],
    IntentType.INQUIRY: [r"quería\s+saber", r"información\s+sobre"],
}
```
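
To show how patterns like these might be applied, here is a minimal sketch that matches customer turns against them and measures the size reduction. The `IntentType` definition, `extract_intents`, and `reduction` are assumptions for illustration, and the character-based ratio is only one possible way to measure compression.

```python
import re
from enum import Enum


class IntentType(Enum):  # assumed minimal definition
    CANCEL = "cancel"
    INQUIRY = "inquiry"


INTENT_PATTERNS = {
    IntentType.CANCEL: [r"quiero\s+cancelar", r"dar\s+de\s+baja"],
    IntentType.INQUIRY: [r"quería\s+saber", r"información\s+sobre"],
}

# Compile once; IGNORECASE so "Quiero cancelar" also matches.
COMPILED = {
    intent: [re.compile(p, re.IGNORECASE) for p in patterns]
    for intent, patterns in INTENT_PATTERNS.items()
}


def extract_intents(customer_turns):
    """Return (intent, evidence_text) pairs for every matching turn."""
    found = []
    for text in customer_turns:
        for intent, patterns in COMPILED.items():
            if any(p.search(text) for p in patterns):
                found.append((intent, text))
    return found


def reduction(original: str, compressed: str) -> float:
    """Fraction of characters removed, between 0 and 1 for a shorter output."""
    return 1 - len(compressed) / len(original) if original else 0.0


turns = [
    "Hola, buenas tardes, ¿cómo está usted hoy?",
    "Mire, llevo tres meses con problemas y quiero cancelar el contrato.",
    "También quería saber si hay alguna penalización.",
    "Bueno, muchas gracias por su tiempo, hasta luego.",
]
intents = extract_intents(turns)
summary = "; ".join(f"{intent.name}: {text}" for intent, text in intents)
ratio = reduction(" ".join(turns), summary)
```

This toy example keeps whole turns as evidence, so its ratio is modest; how much is saved depends on how aggressively evidence is trimmed.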

### How to test
```bash
pytest tests/unit/test_compression.py -v
```

---

## Guide: Inference Module

### Files involved
```
src/inference/
├── __init__.py
├── analyzer.py       # CallAnalyzer (main class)
├── llm_client.py     # OpenAIClient
└── prompts.py        # Spanish MAP prompt
```

### How it works
1. A `CompressedTranscript` comes in
2. A prompt is built from the taxonomy plus the transcript
3. The LLM generates JSON with:
   - `outcome`
   - `lost_sales_drivers[]` with evidence
   - `poor_cx_drivers[]` with evidence
4. The response is parsed into a `CallAnalysis`
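
The final parsing step can be sketched as below. The three top-level keys come from the list above; everything else (the `Driver` class, its `code` and `confidence` fields, and the sample response) is hypothetical and only illustrates the shape.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Driver:
    code: str        # hypothetical taxonomy code
    evidence: str    # quote from the transcript that supports the driver
    confidence: float = 0.0


@dataclass
class CallAnalysis:
    outcome: str
    lost_sales_drivers: List[Driver] = field(default_factory=list)
    poor_cx_drivers: List[Driver] = field(default_factory=list)


def parse_analysis(raw: str) -> CallAnalysis:
    """Parse the LLM's JSON text; raises on invalid JSON or missing keys."""
    data = json.loads(raw)

    def drivers(key: str) -> List[Driver]:
        return [
            Driver(d["code"], d["evidence"], float(d.get("confidence", 0.0)))
            for d in data.get(key, [])
        ]

    return CallAnalysis(
        outcome=data["outcome"],
        lost_sales_drivers=drivers("lost_sales_drivers"),
        poor_cx_drivers=drivers("poor_cx_drivers"),
    )


# Made-up response text standing in for real LLM output.
raw_response = """
{
  "outcome": "no_sale",
  "lost_sales_drivers": [
    {"code": "PRICE_TOO_HIGH", "evidence": "Es demasiado caro.", "confidence": 0.9}
  ],
  "poor_cx_drivers": []
}
"""
analysis = parse_analysis(raw_response)
```

Requiring `evidence` on every driver (a `KeyError` if it is absent) keeps unsupported claims from slipping into the aggregation stage.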

### Configuration
```python
AnalyzerConfig(
    model="gpt-4o-mini",
    use_compression=True,
    max_concurrent=5,
)
```
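
One plausible reading of `max_concurrent` is a cap on simultaneous LLM requests. The sketch below shows that idea with an `asyncio.Semaphore`; whether `CallAnalyzer` actually uses asyncio is an assumption, and `analyze_batch` and `fake_analyze` are hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AnalyzerConfig:  # assumed to be a plain dataclass with these defaults
    model: str = "gpt-4o-mini"
    use_compression: bool = True
    max_concurrent: int = 5


async def analyze_batch(call_ids, config, analyze_one):
    """Run analyze_one for every call, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(config.max_concurrent)

    async def guarded(call_id):
        async with semaphore:
            return await analyze_one(call_id)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(c) for c in call_ids))


# Track how many fake requests are in flight to confirm the cap holds.
state = {"running": 0, "peak": 0}


async def fake_analyze(call_id):
    state["running"] += 1
    state["peak"] = max(state["peak"], state["running"])
    await asyncio.sleep(0.01)  # stands in for the network round trip
    state["running"] -= 1
    return f"{call_id}:ok"


call_ids = [f"call_{i}" for i in range(12)]
results = asyncio.run(analyze_batch(call_ids, AnalyzerConfig(), fake_analyze))
```

Raising the cap speeds up large batches but makes provider rate-limit errors more likely.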

### How to test
```bash
pytest tests/unit/test_inference.py -v
```

---

## Guide: Aggregation Module

### Files involved
```
src/aggregation/
├── __init__.py
├── statistics.py     # StatisticsCalculator
├── severity.py       # SeverityCalculator
├── rca_tree.py       # RCATreeBuilder
└── models.py         # DriverFrequency, RCATree, etc.
```

### How it works
1. A `List[CallAnalysis]` comes in
2. Statistics: frequencies per driver
3. Severity: weighted score
4. RCA Tree: ordered hierarchical tree
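
The statistics step can be illustrated with a small frequency count. This is a sketch, not the `StatisticsCalculator` implementation: the dict shape of each analysis, the driver codes, and the choice to count a driver once per call are all assumptions.

```python
from collections import Counter


def driver_frequencies(analyses):
    """analyses: list of dicts with a "drivers" list of driver codes."""
    total = len(analyses)
    # Count each driver once per call so repeated mentions do not inflate it.
    counts = Counter(code for a in analyses for code in set(a["drivers"]))
    return [
        {"driver": code, "calls": n, "rate": n / total}
        for code, n in counts.most_common()
    ]


analyses = [
    {"call_id": "c1", "drivers": ["PRICE", "LONG_HOLD"]},
    {"call_id": "c2", "drivers": ["PRICE"]},
    {"call_id": "c3", "drivers": ["LONG_HOLD", "PRICE", "PRICE"]},
    {"call_id": "c4", "drivers": []},
]
frequencies = driver_frequencies(analyses)
```

The result is already sorted by call count, which is the ordering a frequency-ranked tree would start from.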

### Severity formula
```python
severity = (
    base_severity * 0.4 +
    frequency_factor * 0.3 +
    confidence_factor * 0.2 +
    co_occurrence_factor * 0.1
) * 100
```
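
Because the four weights sum to 1.0, the score stays within 0 to 100 as long as each factor is normalized to the range 0 to 1. That normalization is an assumption here, as are the `severity_score` name and the sample values in this worked example.

```python
WEIGHTS = {
    "base_severity": 0.4,
    "frequency_factor": 0.3,
    "confidence_factor": 0.2,
    "co_occurrence_factor": 0.1,
}


def severity_score(base_severity, frequency_factor, confidence_factor, co_occurrence_factor):
    """Weighted sum of four factors in [0, 1], scaled to a 0-100 score."""
    factors = {
        "base_severity": base_severity,
        "frequency_factor": frequency_factor,
        "confidence_factor": confidence_factor,
        "co_occurrence_factor": co_occurrence_factor,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1], got {value}")
    return 100 * sum(WEIGHTS[name] * value for name, value in factors.items())


# 0.8*0.4 + 0.5*0.3 + 0.9*0.2 + 0.2*0.1 = 0.32 + 0.15 + 0.18 + 0.02 = 0.67
score = severity_score(0.8, 0.5, 0.9, 0.2)
print(round(score, 2))  # 67.0
```

With these weights, base severity contributes at most 40 points and co-occurrence at most 10.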

### How to test
```bash
pytest tests/unit/test_aggregation.py -v
```

---

## Guide: Pipeline Module

### Files involved
```
src/pipeline/
├── __init__.py
├── models.py         # PipelineManifest, StageManifest, Config
└── pipeline.py       # CXInsightsPipeline
```

### Stages
1. TRANSCRIPTION
2. FEATURE_EXTRACTION
3. COMPRESSION
4. INFERENCE
5. AGGREGATION
6. EXPORT

### Resume
- A manifest JSON is saved per batch
- `get_resume_stage()` detects where to continue
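
The resume logic can be sketched as "find the first stage that is not marked completed". The manifest schema, the status strings, and this standalone `get_resume_stage` function are assumptions; the real method and the `PipelineManifest` model may look different.

```python
import json
from enum import Enum


class Stage(Enum):
    TRANSCRIPTION = 1
    FEATURE_EXTRACTION = 2
    COMPRESSION = 3
    INFERENCE = 4
    AGGREGATION = 5
    EXPORT = 6


def get_resume_stage(manifest: dict):
    """Return the first stage not marked completed, or None if all are done."""
    statuses = manifest.get("stages", {})
    for stage in Stage:  # Enum iteration follows definition order
        if statuses.get(stage.name, {}).get("status") != "completed":
            return stage
    return None


# Hypothetical manifest for a batch whose compression stage failed.
manifest = json.loads("""
{
  "batch_id": "batch_001",
  "stages": {
    "TRANSCRIPTION": {"status": "completed"},
    "FEATURE_EXTRACTION": {"status": "completed"},
    "COMPRESSION": {"status": "failed"}
  }
}
""")
resume_from = get_resume_stage(manifest)
```

A failed or missing stage is treated the same way: the pipeline restarts from it and reruns everything after it.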

### How to test
```bash
pytest tests/unit/test_pipeline.py -v
```

---

## Guide: Exports Module

### Files involved
```
src/exports/
├── __init__.py
├── json_export.py    # Summary + analyses
├── excel_export.py   # Multi-sheet workbook
└── pdf_export.py     # HTML executive report
```

### Formats
- **JSON**: `summary.json` + `analyses/*.json`
- **Excel**: 5 sheets (Summary, Lost Sales, Poor CX, Details, Patterns)
- **PDF/HTML**: Executive report with metrics
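
The JSON layout above (`summary.json` plus one file per call under `analyses/`) can be produced with the standard library alone. This is a minimal sketch: `export_json`, the `call_id` key used for file names, and the sample payloads are assumptions rather than the contents of `json_export.py`.

```python
import json
import tempfile
from pathlib import Path


def export_json(output_dir, summary, analyses):
    """Write summary.json and analyses/<call_id>.json; return written paths."""
    out = Path(output_dir)
    (out / "analyses").mkdir(parents=True, exist_ok=True)

    summary_path = out / "summary.json"
    # ensure_ascii=False keeps Spanish characters readable in the output.
    summary_path.write_text(
        json.dumps(summary, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    written = [summary_path]
    for analysis in analyses:
        path = out / "analyses" / f"{analysis['call_id']}.json"
        path.write_text(
            json.dumps(analysis, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = export_json(
        tmp,
        {"total_calls": 2},
        [{"call_id": "c1", "outcome": "sale"}, {"call_id": "c2", "outcome": "no_sale"}],
    )
    names = sorted(p.relative_to(tmp).as_posix() for p in paths)
    reloaded = json.loads((Path(tmp) / "summary.json").read_text(encoding="utf-8"))
```

One file per call keeps individual analyses small enough to inspect or diff without loading the whole batch.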

### How to test
```bash
pytest tests/unit/test_pipeline.py::TestExports -v
```

---

**Last updated**: 2026-01-19