feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features: - 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export - Beyond Brand Identity styling (colors #6D84E3, Outfit font) - RCA Sankey diagram (Driver → Outcome → Churn Risk flow) - Correlation heatmaps (driver co-occurrence, driver-outcome) - Outcome Deep Dive (root causes, correlation, duration analysis) - Export functionality (Excel, HTML, JSON) Blueprint Compliance: - FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga) - Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga) - Agent: Talento Para Replicar / Oportunidades de Mejora - Fixed FCR rate calculation (only FIRST_CALL counts as success) Technical: - Streamlit + Plotly for interactive visualizations - Light theme configuration (.streamlit/config.toml) - Fixed Plotly colorbar titlefont deprecation Documentation: - Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md - Added 4 new technical decisions (TD-014 to TD-017) - Created TROUBLESHOOTING.md with 10 common issues Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions
--- a/docs/API_REFERENCE.md
+++ b/docs/API_REFERENCE.md
@@ -0,0 +1,317 @@
+# API_REFERENCE.md
+
+> Documentación de funciones públicas principales
+
+---
+
+## Transcription Module
+
+### `AssemblyAITranscriber`
+
+```python
+from src.transcription import AssemblyAITranscriber
+
+class AssemblyAITranscriber(Transcriber):
+    def __init__(self, api_key: str, language: str = "es"):
+        """
+        Initialize AssemblyAI transcriber.
+
+        Args:
+            api_key: AssemblyAI API key
+            language: Language code (default: "es" for Spanish)
+        """
+
+    async def transcribe(self, audio_path: Path) -> Transcript:
+        """
+        Transcribe a single audio file.
+
+        Args:
+            audio_path: Path to MP3/WAV file
+
+        Returns:
+            Transcript with speaker diarization
+
+        Raises:
+            TranscriptionError: If API fails
+        """
+
+    async def transcribe_batch(
+        self,
+        audio_paths: list[Path],
+        max_concurrent: int = 5
+    ) -> list[Transcript]:
+        """
+        Transcribe multiple audio files in parallel.
+
+        Args:
+            audio_paths: List of paths to audio files
+            max_concurrent: Max parallel requests
+
+        Returns:
+            List of Transcripts
+        """
+```
+
+**Example:**
+```python
+transcriber = AssemblyAITranscriber(api_key=os.getenv("ASSEMBLYAI_API_KEY"))
+transcript = await transcriber.transcribe(Path("call_001.mp3"))
+print(f"Duration: {transcript.metadata.audio_duration_sec}s")
+print(f"Turns: {len(transcript.turns)}")
+```
+
+---
+
+## Inference Module
+
+### `CallAnalyzer`
+
+```python
+from src.inference import CallAnalyzer, AnalyzerConfig
+
+class CallAnalyzer:
+    def __init__(self, config: AnalyzerConfig | None = None):
+        """
+        Initialize call analyzer.
+
+        Args:
+            config: Analyzer configuration (optional)
+        """
+
+    async def analyze(self, transcript: Transcript) -> CallAnalysis:
+        """
+        Analyze a single transcript.
+
+        Args:
+            transcript: Transcript to analyze
+
+        Returns:
+            CallAnalysis with RCA labels and evidence
+        """
+
+    async def analyze_batch(
+        self,
+        transcripts: list[Transcript],
+        batch_id: str,
+        progress_callback: Callable | None = None
+    ) -> list[CallAnalysis]:
+        """
+        Analyze multiple transcripts in parallel.
+
+        Args:
+            transcripts: List of transcripts
+            batch_id: Batch identifier
+            progress_callback: Optional progress callback
+
+        Returns:
+            List of CallAnalysis results
+        """
+```
+
+**Example:**
+```python
+config = AnalyzerConfig(
+    model="gpt-4o-mini",
+    use_compression=True,
+    max_concurrent=5,
+)
+analyzer = CallAnalyzer(config)
+
+analyses = await analyzer.analyze_batch(
+    transcripts=transcripts,
+    batch_id="batch_001",
+    progress_callback=lambda current, total: print(f"{current}/{total}")
+)
+```
+
+---
+
+## Aggregation Module
+
+### `aggregate_batch`
+
+```python
+from src.aggregation import aggregate_batch
+
+def aggregate_batch(
+    batch_id: str,
+    analyses: list[CallAnalysis]
+) -> BatchAggregation:
+    """
+    Aggregate call analyses into statistics and RCA tree.
+
+    Args:
+        batch_id: Batch identifier
+        analyses: List of call analyses
+
+    Returns:
+        BatchAggregation with frequencies, severities, and RCA tree
+    """
+```
+
+**Example:**
+```python
+aggregation = aggregate_batch("batch_001", analyses)
+print(f"Lost sales drivers: {len(aggregation.lost_sales_frequencies)}")
+print(f"Top driver: {aggregation.rca_tree.top_lost_sales_drivers[0]}")
+```
+
+---
+
+## Pipeline Module
+
+### `CXInsightsPipeline`
+
+```python
+from src.pipeline import CXInsightsPipeline, PipelineConfig
+
+class CXInsightsPipeline:
+    def __init__(
+        self,
+        config: PipelineConfig | None = None,
+        progress_callback: Callable | None = None
+    ):
+        """
+        Initialize pipeline.
+
+        Args:
+            config: Pipeline configuration
+            progress_callback: Optional progress callback
+        """
+
+    def run(
+        self,
+        batch_id: str,
+        audio_files: list[Path] | None = None,
+        transcripts: list[Transcript] | None = None,
+        resume: bool = True
+    ) -> BatchAggregation:
+        """
+        Run full pipeline.
+
+        Args:
+            batch_id: Batch identifier
+            audio_files: Optional list of audio files
+            transcripts: Optional pre-loaded transcripts
+            resume: Whether to resume from checkpoint
+
+        Returns:
+            BatchAggregation with full results
+        """
+```
+
+**Example:**
+```python
+config = PipelineConfig(
+    input_dir=Path("data/audio"),
+    output_dir=Path("data/output"),
+    export_formats=["json", "excel", "pdf"],
+)
+pipeline = CXInsightsPipeline(config)
+
+result = pipeline.run(
+    batch_id="batch_001",
+    audio_files=list(Path("data/audio").glob("*.mp3")),
+)
+```
+
+---
+
+## Export Module
+
+### `export_to_json`
+
+```python
+from src.exports import export_to_json
+
+def export_to_json(
+    batch_id: str,
+    aggregation: BatchAggregation,
+    analyses: list[CallAnalysis],
+    output_dir: Path
+) -> Path:
+    """
+    Export results to JSON files.
+
+    Args:
+        batch_id: Batch identifier
+        aggregation: Aggregated results
+        analyses: Individual call analyses
+        output_dir: Output directory
+
+    Returns:
+        Path to summary.json
+    """
+```
+
+### `export_to_excel`
+
+```python
+from src.exports import export_to_excel
+
+def export_to_excel(
+    batch_id: str,
+    aggregation: BatchAggregation,
+    analyses: list[CallAnalysis],
+    output_dir: Path
+) -> Path:
+    """
+    Export results to Excel workbook.
+
+    Creates sheets:
+    - Summary
+    - Lost Sales Drivers
+    - Poor CX Drivers
+    - Call Details
+    - Emergent Patterns
+
+    Returns:
+        Path to .xlsx file
+    """
+```
+
+### `export_to_pdf`
+
+```python
+from src.exports import export_to_pdf
+
+def export_to_pdf(
+    batch_id: str,
+    aggregation: BatchAggregation,
+    output_dir: Path
+) -> Path:
+    """
+    Export executive report to PDF/HTML.
+
+    Falls back to HTML if weasyprint not installed.
+
+    Returns:
+        Path to .pdf or .html file
+    """
+```
+
+---
+
+## Compression Module
+
+### `TranscriptCompressor`
+
+```python
+from src.compression import TranscriptCompressor
+
+class TranscriptCompressor:
+    def compress(self, transcript: Transcript) -> CompressedTranscript:
+        """
+        Compress transcript by extracting key information.
+
+        Args:
+            transcript: Full transcript
+
+        Returns:
+            CompressedTranscript with >60% token reduction
+        """
+```
+
+---
+
+**Última actualización**: 2026-01-19