feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga, i.e. First Call/Repeat Call × Without/With Churn Risk)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga, i.e. not at risk / at risk of churn)
- Agent: Talento Para Replicar (talent to replicate) / Oportunidades de Mejora (improvement opportunities)
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit 75e7b9da3d by sujucu70, 2026-01-19 16:27:30 +01:00
110 changed files with 28247 additions and 0 deletions

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 02 - Inference Engine Validation\n",
"\n",
"**Checkpoint 5 validation notebook**\n",
"\n",
"This notebook validates the inference engine components:\n",
"1. LLMClient with JSON strict mode and retries\n",
"2. PromptManager with versioned templates\n",
"3. CallAnalyzer for single-call analysis\n",
"4. BatchAnalyzer with checkpointing\n",
"\n",
"**Note**: Uses mocked LLM responses to avoid API costs during validation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"\n",
"import json\n",
"from pathlib import Path\n",
"from datetime import datetime\n",
"from unittest.mock import AsyncMock, MagicMock, patch\n",
"\n",
"# Project imports\n",
"from src.inference.client import LLMClient, LLMClientConfig, LLMResponse\n",
"from src.inference.prompt_manager import (\n",
" PromptManager,\n",
" PromptTemplate,\n",
" format_events_for_prompt,\n",
" format_transcript_for_prompt,\n",
" load_taxonomy_for_prompt,\n",
")\n",
"from src.inference.analyzer import CallAnalyzer, AnalyzerConfig\n",
"from src.models.call_analysis import (\n",
" CallAnalysis,\n",
" CallOutcome,\n",
" ProcessingStatus,\n",
" Event,\n",
" EventType,\n",
")\n",
"from src.transcription.models import SpeakerTurn, Transcript, TranscriptMetadata\n",
"\n",
"print(\"Imports successful!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Prompt Manager Validation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize prompt manager\n",
"prompts_dir = Path('../config/prompts')\n",
"manager = PromptManager(prompts_dir)\n",
"\n",
"print(f\"Prompts directory: {prompts_dir}\")\n",
"print(f\"Available prompt types: {manager.list_prompt_types()}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load call analysis prompt\n",
"template = manager.load('call_analysis', 'v1.0')\n",
"\n",
"print(f\"Template name: {template.name}\")\n",
"print(f\"Template version: {template.version}\")\n",
"print(f\"System prompt length: {len(template.system)} chars\")\n",
"print(f\"User prompt length: {len(template.user)} chars\")\n",
"print(f\"Has schema: {template.schema is not None}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test template rendering\n",
"system, user = template.render(\n",
" call_id=\"TEST001\",\n",
" transcript=\"AGENT: Hola, buenos días\\nCUSTOMER: Quiero cancelar\",\n",
" duration_sec=120.5,\n",
" queue=\"ventas\",\n",
" observed_events=\"- HOLD_START at 30.0s\",\n",
" lost_sales_taxonomy=\"- PRICE_TOO_HIGH: Customer mentions price concerns\",\n",
" poor_cx_taxonomy=\"- LONG_HOLD: Extended hold times\",\n",
")\n",
"\n",
"print(\"=== SYSTEM PROMPT (first 500 chars) ===\")\n",
"print(system[:500])\n",
"print(\"\\n=== USER PROMPT (first 500 chars) ===\")\n",
"print(user[:500])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test taxonomy loading\n",
"lost_sales_tax, poor_cx_tax = load_taxonomy_for_prompt(\n",
" Path('../config/rca_taxonomy.yaml')\n",
")\n",
"\n",
"print(\"=== LOST SALES TAXONOMY ===\")\n",
"print(lost_sales_tax[:500] if lost_sales_tax else \"(empty)\")\n",
"print(\"\\n=== POOR CX TAXONOMY ===\")\n",
"print(poor_cx_tax[:500] if poor_cx_tax else \"(empty)\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. LLMClient Validation (Mocked)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test LLMResponse cost estimation\n",
"response = LLMResponse(\n",
" content='{\"outcome\": \"LOST_SALE\"}',\n",
" prompt_tokens=1000,\n",
" completion_tokens=500,\n",
" total_tokens=1500,\n",
" success=True,\n",
" model=\"gpt-4o-mini\",\n",
")\n",
"\n",
"print(f\"Response success: {response.success}\")\n",
"print(f\"Total tokens: {response.total_tokens}\")\n",
"print(f\"Estimated cost: ${response.cost_estimate_usd:.6f}\")\n",
"print(f\"Model: {response.model}\")"
]
},
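{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cost figure above is simple token arithmetic. A back-of-envelope sketch with ASSUMED per-million-token prices (illustrative only; the real rates are whatever `cost_estimate_usd` encodes and depend on provider pricing):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Back-of-envelope check of token-based pricing. Prices below are ASSUMED\n",
"# for illustration; real rates depend on the provider and may change.\n",
"ASSUMED_INPUT_USD_PER_1M = 0.15   # illustrative prompt-token price\n",
"ASSUMED_OUTPUT_USD_PER_1M = 0.60  # illustrative completion-token price\n",
"\n",
"def sketch_cost(prompt_tokens, completion_tokens):\n",
"    return (\n",
"        prompt_tokens * ASSUMED_INPUT_USD_PER_1M / 1_000_000\n",
"        + completion_tokens * ASSUMED_OUTPUT_USD_PER_1M / 1_000_000\n",
"    )\n",
"\n",
"print(f\"Sketch cost for 1000/500 tokens: ${sketch_cost(1000, 500):.6f}\")"
]
},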
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test JSON parsing with mocked client\n",
"with patch.dict('os.environ', {'OPENAI_API_KEY': 'test-key'}):\n",
" client = LLMClient()\n",
" \n",
" # Test various JSON formats\n",
" test_cases = [\n",
" ('{\"key\": \"value\"}', \"Plain JSON\"),\n",
" ('```json\\n{\"key\": \"value\"}\\n```', \"Markdown block\"),\n",
" ('Here is the result: {\"key\": \"value\"} done.', \"Embedded JSON\"),\n",
" ('not json', \"Invalid\"),\n",
" ]\n",
" \n",
" for content, desc in test_cases:\n",
" result = client._parse_json(content)\n",
" print(f\"{desc}: {result}\")"
]
},
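{
"cell_type": "markdown",
"metadata": {},
"source": [
"Retries are listed as an `LLMClient` feature but are not exercised directly above. A standalone sketch of retry-on-invalid-JSON, using a hypothetical `fake_llm` stand-in rather than the real client:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Standalone retry sketch: call until the response parses as JSON or\n",
"# attempts run out. fake_llm is a hypothetical stand-in, NOT LLMClient.\n",
"import json\n",
"\n",
"_responses = iter(['not json', 'still not json', '{\"outcome\": \"LOST_SALE\"}'])\n",
"\n",
"def fake_llm():\n",
"    return next(_responses)\n",
"\n",
"def complete_with_retries(max_attempts=3):\n",
"    for attempt in range(1, max_attempts + 1):\n",
"        raw = fake_llm()\n",
"        try:\n",
"            return json.loads(raw), attempt\n",
"        except json.JSONDecodeError:\n",
"            print(f\"Attempt {attempt}: invalid JSON, retrying\")\n",
"    return None, max_attempts\n",
"\n",
"parsed, attempts = complete_with_retries()\n",
"print(f\"Parsed on attempt {attempts}: {parsed}\")"
]
},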
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Formatting Functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test event formatting\n",
"events = [\n",
" Event(event_type=EventType.HOLD_START, start_time=10.0),\n",
" Event(event_type=EventType.HOLD_END, start_time=45.0),\n",
" Event(event_type=EventType.SILENCE, start_time=60.0, duration_sec=8.5),\n",
" Event(event_type=EventType.TRANSFER, start_time=120.0),\n",
"]\n",
"\n",
"events_text = format_events_for_prompt(events)\n",
"print(\"=== FORMATTED EVENTS ===\")\n",
"print(events_text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test transcript formatting\n",
"turns = [\n",
" SpeakerTurn(speaker=\"agent\", text=\"Hola, buenos días, gracias por llamar.\", start_time=0.0, end_time=2.5),\n",
" SpeakerTurn(speaker=\"customer\", text=\"Hola, quiero información sobre los precios.\", start_time=3.0, end_time=5.0),\n",
" SpeakerTurn(speaker=\"agent\", text=\"Claro, ¿qué producto le interesa?\", start_time=5.5, end_time=7.0),\n",
" SpeakerTurn(speaker=\"customer\", text=\"El plan premium, pero es muy caro.\", start_time=7.5, end_time=10.0),\n",
"]\n",
"\n",
"transcript_text = format_transcript_for_prompt(turns)\n",
"print(\"=== FORMATTED TRANSCRIPT ===\")\n",
"print(transcript_text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test truncation\n",
"long_turns = [\n",
" SpeakerTurn(speaker=\"agent\", text=\"A\" * 3000, start_time=0.0, end_time=30.0),\n",
" SpeakerTurn(speaker=\"customer\", text=\"B\" * 3000, start_time=30.0, end_time=60.0),\n",
" SpeakerTurn(speaker=\"agent\", text=\"C\" * 3000, start_time=60.0, end_time=90.0),\n",
"]\n",
"\n",
"truncated = format_transcript_for_prompt(long_turns, max_chars=5000)\n",
"print(f\"Truncated length: {len(truncated)} chars\")\n",
"print(f\"Contains truncation marker: {'truncated' in truncated}\")"
]
},
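{
"cell_type": "markdown",
"metadata": {},
"source": [
"Truncation to a character budget can be sketched as a running total. The helper below is hypothetical; the real logic lives in `format_transcript_for_prompt`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical char-budget truncation helper (illustrative; the real logic\n",
"# lives in format_transcript_for_prompt).\n",
"def truncate_texts(texts, max_chars):\n",
"    out, used = [], 0\n",
"    for t in texts:\n",
"        if used + len(t) > max_chars:\n",
"            out.append(\"[... truncated ...]\")\n",
"            break\n",
"        out.append(t)\n",
"        used += len(t)\n",
"    return \"\\n\".join(out)\n",
"\n",
"demo = truncate_texts([\"A\" * 3000, \"B\" * 3000, \"C\" * 3000], max_chars=5000)\n",
"print(f\"Sketch output length: {len(demo)} chars\")"
]
},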
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. CallAnalyzer Validation (Mocked LLM)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create test transcript\n",
"test_transcript = Transcript(\n",
" call_id=\"VAL001\",\n",
" turns=[\n",
" SpeakerTurn(speaker=\"agent\", text=\"Hola, buenos días.\", start_time=0.0, end_time=1.5),\n",
" SpeakerTurn(speaker=\"customer\", text=\"Hola, quiero cancelar mi servicio.\", start_time=2.0, end_time=4.0),\n",
" SpeakerTurn(speaker=\"agent\", text=\"¿Puedo preguntar el motivo?\", start_time=4.5, end_time=6.0),\n",
" SpeakerTurn(speaker=\"customer\", text=\"Es demasiado caro para mí.\", start_time=6.5, end_time=8.5),\n",
" SpeakerTurn(speaker=\"agent\", text=\"Entiendo. ¿Le puedo ofrecer un descuento?\", start_time=9.0, end_time=11.0),\n",
" SpeakerTurn(speaker=\"customer\", text=\"No gracias, ya tomé la decisión.\", start_time=11.5, end_time=13.5),\n",
" ],\n",
" metadata=TranscriptMetadata(\n",
" audio_duration_sec=60.0,\n",
" language=\"es\",\n",
" provider=\"assemblyai\",\n",
" ),\n",
")\n",
"\n",
"print(f\"Test transcript: {test_transcript.call_id}\")\n",
"print(f\"Turns: {len(test_transcript.turns)}\")\n",
"print(f\"Duration: {test_transcript.metadata.audio_duration_sec}s\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Mock LLM response for lost sale\n",
"mock_llm_response = {\n",
" \"outcome\": \"LOST_SALE\",\n",
" \"lost_sales_drivers\": [\n",
" {\n",
" \"driver_code\": \"PRICE_TOO_HIGH\",\n",
" \"confidence\": 0.92,\n",
" \"evidence_spans\": [\n",
" {\n",
" \"text\": \"Es demasiado caro para mí\",\n",
" \"start_time\": 6.5,\n",
" \"end_time\": 8.5,\n",
" \"speaker\": \"customer\"\n",
" }\n",
" ],\n",
" \"reasoning\": \"Customer explicitly states the service is too expensive\"\n",
" },\n",
" {\n",
" \"driver_code\": \"RETENTION_ATTEMPT_FAILED\",\n",
" \"confidence\": 0.85,\n",
" \"evidence_spans\": [\n",
" {\n",
" \"text\": \"No gracias, ya tomé la decisión\",\n",
" \"start_time\": 11.5,\n",
" \"end_time\": 13.5,\n",
" \"speaker\": \"customer\"\n",
" }\n",
" ],\n",
" \"reasoning\": \"Customer rejected discount offer indicating firm decision\"\n",
" }\n",
" ],\n",
" \"poor_cx_drivers\": []\n",
"}\n",
"\n",
"print(\"Mock LLM response prepared\")\n",
"print(json.dumps(mock_llm_response, indent=2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test analyzer with mocked LLM\n",
"with patch.dict('os.environ', {'OPENAI_API_KEY': 'test-key'}):\n",
" # Create mock LLM client\n",
" mock_client = MagicMock(spec=LLMClient)\n",
" mock_client.complete.return_value = LLMResponse(\n",
" content=json.dumps(mock_llm_response),\n",
" parsed_json=mock_llm_response,\n",
" prompt_tokens=500,\n",
" completion_tokens=200,\n",
" total_tokens=700,\n",
" success=True,\n",
" model=\"gpt-4o-mini\",\n",
" )\n",
" \n",
" # Create analyzer with mock client\n",
" analyzer = CallAnalyzer(\n",
" llm_client=mock_client,\n",
" config=AnalyzerConfig(\n",
" prompt_version=\"v1.0\",\n",
" min_confidence_threshold=0.3,\n",
" ),\n",
" )\n",
" \n",
" # Analyze\n",
" result = analyzer.analyze(test_transcript, batch_id=\"validation\")\n",
" \n",
" print(f\"Analysis status: {result.status}\")\n",
" print(f\"Outcome: {result.outcome}\")\n",
" print(f\"Lost sales drivers: {len(result.lost_sales_drivers)}\")\n",
" print(f\"Poor CX drivers: {len(result.poor_cx_drivers)}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Validate result structure\n",
"print(\"=== CALL ANALYSIS RESULT ===\")\n",
"print(f\"Call ID: {result.call_id}\")\n",
"print(f\"Batch ID: {result.batch_id}\")\n",
"print(f\"Status: {result.status}\")\n",
"print(f\"Outcome: {result.outcome}\")\n",
"\n",
"print(\"\\n=== OBSERVED FEATURES ===\")\n",
"print(f\"Audio duration: {result.observed.audio_duration_sec}s\")\n",
"print(f\"Events: {len(result.observed.events)}\")\n",
"print(f\"Agent talk ratio: {result.observed.agent_talk_ratio:.2%}\")\n",
"\n",
"print(\"\\n=== LOST SALES DRIVERS ===\")\n",
"for driver in result.lost_sales_drivers:\n",
" print(f\" - {driver.driver_code} (conf: {driver.confidence:.2f})\")\n",
" print(f\" Evidence: \\\"{driver.evidence_spans[0].text}\\\"\")\n",
"\n",
"print(\"\\n=== TRACEABILITY ===\")\n",
"print(f\"Schema version: {result.traceability.schema_version}\")\n",
"print(f\"Prompt version: {result.traceability.prompt_version}\")\n",
"print(f\"Model ID: {result.traceability.model_id}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Validate JSON serialization\n",
"result_dict = result.model_dump()\n",
"result_json = json.dumps(result_dict, indent=2, default=str)\n",
"\n",
"print(f\"Serialized JSON length: {len(result_json)} chars\")\n",
"print(\"\\n=== SAMPLE OUTPUT (first 1500 chars) ===\")\n",
"print(result_json[:1500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Validation of Evidence Requirements"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from src.models.call_analysis import RCALabel, EvidenceSpan\n",
"\n",
"# Test: RCALabel requires evidence\n",
"print(\"Testing evidence requirements...\")\n",
"\n",
"# Valid: with evidence\n",
"try:\n",
" valid_label = RCALabel(\n",
" driver_code=\"PRICE_TOO_HIGH\",\n",
" confidence=0.9,\n",
" evidence_spans=[\n",
" EvidenceSpan(text=\"Es muy caro\", start_time=10.0, end_time=12.0)\n",
" ],\n",
" )\n",
" print(\"✓ Valid label with evidence created successfully\")\n",
"except Exception as e:\n",
" print(f\"✗ Unexpected error: {e}\")\n",
"\n",
"# Invalid: without evidence\n",
"try:\n",
" invalid_label = RCALabel(\n",
" driver_code=\"PRICE_TOO_HIGH\",\n",
" confidence=0.9,\n",
" evidence_spans=[], # Empty!\n",
" )\n",
" print(\"✗ Should have raised error for empty evidence\")\n",
"except ValueError as e:\n",
" print(f\"✓ Correctly rejected: {e}\")"
]
},
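{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the evidence contract tested above is the kind of constraint a Pydantic model can express declaratively. A self-contained sketch (assumes Pydantic v2; NOT the project's actual `RCALabel` definition, which may enforce this differently):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Self-contained Pydantic sketch of the evidence contract (assumes\n",
"# Pydantic v2; NOT the project's actual RCALabel definition).\n",
"from pydantic import BaseModel, Field\n",
"\n",
"class SpanSketch(BaseModel):\n",
"    text: str\n",
"    start_time: float\n",
"    end_time: float\n",
"\n",
"class LabelSketch(BaseModel):\n",
"    driver_code: str\n",
"    confidence: float = Field(ge=0.0, le=1.0)\n",
"    evidence_spans: list[SpanSketch] = Field(min_length=1)\n",
"\n",
"try:\n",
"    LabelSketch(driver_code=\"X\", confidence=0.9, evidence_spans=[])\n",
"except Exception as exc:\n",
"    print(f\"Rejected as expected: {type(exc).__name__}\")"
]
},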
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test: OTHER_EMERGENT requires proposed_label\n",
"print(\"\\nTesting OTHER_EMERGENT requirements...\")\n",
"\n",
"evidence = [EvidenceSpan(text=\"test\", start_time=0, end_time=1)]\n",
"\n",
"# Valid: with proposed_label\n",
"try:\n",
" emergent_valid = RCALabel(\n",
" driver_code=\"OTHER_EMERGENT\",\n",
" confidence=0.7,\n",
" evidence_spans=evidence,\n",
" proposed_label=\"NEW_PATTERN_DISCOVERED\",\n",
" )\n",
" print(f\"✓ OTHER_EMERGENT with proposed_label: {emergent_valid.proposed_label}\")\n",
"except Exception as e:\n",
" print(f\"✗ Unexpected error: {e}\")\n",
"\n",
"# Invalid: without proposed_label\n",
"try:\n",
" emergent_invalid = RCALabel(\n",
" driver_code=\"OTHER_EMERGENT\",\n",
" confidence=0.7,\n",
" evidence_spans=evidence,\n",
" # No proposed_label!\n",
" )\n",
" print(\"✗ Should have raised error for missing proposed_label\")\n",
"except ValueError as e:\n",
" print(f\"✓ Correctly rejected: {e}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test: Confidence bounds\n",
"print(\"\\nTesting confidence bounds...\")\n",
"\n",
"evidence = [EvidenceSpan(text=\"test\", start_time=0, end_time=1)]\n",
"\n",
"# Valid: confidence in range\n",
"for conf in [0.0, 0.5, 1.0]:\n",
" try:\n",
" label = RCALabel(\n",
" driver_code=\"TEST\",\n",
" confidence=conf,\n",
" evidence_spans=evidence,\n",
" )\n",
" print(f\"✓ Confidence {conf} accepted\")\n",
" except Exception as e:\n",
" print(f\"✗ Confidence {conf} rejected: {e}\")\n",
"\n",
"# Invalid: out of range\n",
"for conf in [-0.1, 1.5]:\n",
" try:\n",
" label = RCALabel(\n",
" driver_code=\"TEST\",\n",
" confidence=conf,\n",
" evidence_spans=evidence,\n",
" )\n",
" print(f\"✗ Confidence {conf} should have been rejected\")\n",
" except ValueError as e:\n",
" print(f\"✓ Confidence {conf} correctly rejected\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Batch Analyzer Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from src.inference.batch_analyzer import BatchAnalyzer, BatchAnalyzerConfig, BatchCheckpoint\n",
"\n",
"# Test checkpoint serialization\n",
"checkpoint = BatchCheckpoint(\n",
" batch_id=\"test_batch_001\",\n",
" total_calls=100,\n",
" processed_call_ids=[\"CALL001\", \"CALL002\", \"CALL003\"],\n",
" failed_call_ids={\"CALL004\": \"LLM timeout\"},\n",
" success_count=3,\n",
" partial_count=0,\n",
" failed_count=1,\n",
")\n",
"\n",
"print(\"=== CHECKPOINT ===\")\n",
"print(f\"Batch ID: {checkpoint.batch_id}\")\n",
"print(f\"Total: {checkpoint.total_calls}\")\n",
"print(f\"Processed: {len(checkpoint.processed_call_ids)}\")\n",
"print(f\"Failed: {len(checkpoint.failed_call_ids)}\")\n",
"\n",
"# Test round-trip\n",
"checkpoint_dict = checkpoint.to_dict()\n",
"restored = BatchCheckpoint.from_dict(checkpoint_dict)\n",
"\n",
"print(f\"\\nRound-trip successful: {restored.batch_id == checkpoint.batch_id}\")"
]
},
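{
"cell_type": "markdown",
"metadata": {},
"source": [
"Resuming from a checkpoint reduces to filtering already-handled IDs out of the work list. A stdlib sketch of that step (the real resume logic lives inside `BatchAnalyzer`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of checkpoint-based resume: skip IDs already recorded. Mirrors\n",
"# the filtering BatchAnalyzer is expected to perform on resume.\n",
"all_call_ids = [f\"CALL{n:03d}\" for n in range(1, 7)]\n",
"processed = set(checkpoint.processed_call_ids)\n",
"failed = set(checkpoint.failed_call_ids)  # retry or skip, per policy\n",
"\n",
"remaining = [cid for cid in all_call_ids if cid not in processed | failed]\n",
"print(f\"Remaining after resume: {remaining}\")"
]
},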
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test batch config\n",
"config = BatchAnalyzerConfig(\n",
" batch_size=10,\n",
" max_concurrent=5,\n",
" requests_per_minute=200,\n",
" save_interval=10,\n",
")\n",
"\n",
"print(\"=== BATCH CONFIG ===\")\n",
"print(f\"Batch size: {config.batch_size}\")\n",
"print(f\"Max concurrent: {config.max_concurrent}\")\n",
"print(f\"Requests/minute: {config.requests_per_minute}\")\n",
"print(f\"Save interval: {config.save_interval}\")\n",
"print(f\"Checkpoint dir: {config.checkpoint_dir}\")\n",
"print(f\"Output dir: {config.output_dir}\")"
]
},
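{
"cell_type": "markdown",
"metadata": {},
"source": [
"A common reading of these knobs: `requests_per_minute` sets a minimum delay between request starts, and `max_concurrent` caps in-flight requests via a semaphore. A small sketch of that interpretation (not the actual `BatchAnalyzer` internals):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of how these knobs are commonly interpreted (illustrative; the\n",
"# actual pacing lives inside BatchAnalyzer).\n",
"min_delay_sec = 60.0 / config.requests_per_minute  # gap between request starts\n",
"print(f\"Minimum delay between requests: {min_delay_sec:.3f}s\")\n",
"print(f\"At most {config.max_concurrent} requests in flight (semaphore cap)\")\n",
"print(f\"Implied ceiling: {config.requests_per_minute} requests per minute\")"
]
},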
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Summary\n",
"\n",
"### Components Validated:\n",
"\n",
"1. **PromptManager** ✓\n",
" - Loads versioned prompts from config/prompts/\n",
" - Template rendering with safe_substitute\n",
" - Taxonomy loading for RCA drivers\n",
"\n",
"2. **LLMClient** ✓\n",
" - Cost estimation based on tokens\n",
" - JSON parsing (plain, markdown blocks, embedded)\n",
" - Usage statistics tracking\n",
"\n",
"3. **CallAnalyzer** ✓\n",
" - Combines observed features + LLM inference\n",
" - Produces CallAnalysis with full traceability\n",
" - Evidence validation enforced\n",
"\n",
"4. **BatchAnalyzer** ✓\n",
" - Checkpoint serialization/restoration\n",
" - Configurable concurrency and rate limiting\n",
" - Incremental saving support\n",
"\n",
"5. **Data Contracts** ✓\n",
" - Evidence required for all RCA labels\n",
" - Confidence bounds enforced (0-1)\n",
" - OTHER_EMERGENT requires proposed_label\n",
"\n",
"### Ready for:\n",
"- Integration with real OpenAI API\n",
"- Batch processing of transcripts\n",
"- Checkpoint/resume for long-running jobs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"=\"*50)\n",
"print(\"CHECKPOINT 5 - INFERENCE ENGINE VALIDATION COMPLETE\")\n",
"print(\"=\"*50)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}