# CXInsights - System Architecture
## Product Vision
CXInsights turns 5,000-20,000 contact center calls into **executive RCA Trees** that identify the root causes of:
- **Lost Sales**: Missed sales opportunities
- **Poor CX**: Deficient customer experiences
---
## Critical Design Principles
### 1. Strict Separation: Observed vs Inferred
**Every data point must be clearly classified as FACT or INFERENCE.**
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            OBSERVED vs INFERRED                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  OBSERVED (measurable facts)          INFERRED (model opinion)              │
│  ───────────────────────────          ────────────────────────              │
│  ✓ Call duration                      ✗ Customer sentiment                  │
│  ✓ Number of transfers                ✗ Lost-sale reason                    │
│  ✓ Hold time (measured)               ✗ Agent quality                       │
│  ✓ Detected silences (>N s)           ✗ Intent classification               │
│  ✓ Transcribed text                   ✗ Call summary                        │
│  ✓ Talk share per speaker (%)         ✗ Outcome (sale/no_sale/resolved)     │
│  ✓ Event timestamps                   ✗ RCA drivers                         │
│                                                                             │
│  Rule: if the LLM generates it → it is INFERRED                             │
│        if it comes from audio/STT → it is OBSERVED                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
**Impact**: An RCA that can be defended before stakeholders, with a clear audit trail and facts kept separate from opinion.
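The observed/inferred rule can be made explicit in code by tagging every value with its provenance. The following is a minimal sketch, not the project's implementation: `DataPoint` and `classify` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Literal

Source = Literal["observed", "inferred"]


@dataclass(frozen=True)
class DataPoint:
    """One value plus its provenance (hypothetical helper type)."""
    name: str
    value: object
    source: Source


def classify(name: str, value: object, produced_by_llm: bool) -> DataPoint:
    # Rule from the box above: LLM-generated -> inferred; audio/STT -> observed.
    source: Source = "inferred" if produced_by_llm else "observed"
    return DataPoint(name, value, source)


points = [
    classify("hold_seconds", 98, produced_by_llm=False),
    classify("transfer_count", 0, produced_by_llm=False),
    classify("customer_sentiment", -0.3, produced_by_llm=True),
]

# A report can then show facts only, or facts plus model opinion.
facts_only = [p.name for p in points if p.source == "observed"]
```

Carrying the tag on the value itself, rather than in a separate lookup, keeps the distinction intact through aggregation and export.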
### 2. Mandatory Evidence per Driver
**Hard rule: no `evidence_spans` → the driver DOES NOT EXIST**
```json
{
  "rca_code": "WAIT_HOLD_LONG",
  "confidence": 0.77,
  "evidence_spans": [
    {"start": "02:14", "end": "03:52", "text": "[silence - hold]", "source": "observed"}
  ]
}
```
A driver without timestamped evidence is rejected by validation.
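As a rough illustration of the hard rule, the sketch below splits drivers into accepted and rejected. The function name `split_by_evidence` and the exact check (a span needs both `start` and `end`) are assumptions; the source only states that evidence must be timestamped.

```python
def split_by_evidence(drivers: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (accepted, rejected) drivers.

    Assumption: a span counts as evidence only if it has both a start and an
    end timestamp; the source does not spell out the exact check.
    """
    accepted, rejected = [], []
    for driver in drivers:
        spans = driver.get("evidence_spans") or []
        has_evidence = any("start" in s and "end" in s for s in spans)
        (accepted if has_evidence else rejected).append(driver)
    return accepted, rejected


drivers = [
    {"rca_code": "WAIT_HOLD_LONG", "confidence": 0.77,
     "evidence_spans": [{"start": "02:14", "end": "03:52", "text": "[silence - hold]"}]},
    # High confidence does not help: without spans the driver is dropped.
    {"rca_code": "AGENT_RUSHED", "confidence": 0.90, "evidence_spans": []},
]
accepted, rejected = split_by_evidence(drivers)
```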
### 3. Prompt and Schema Versioning
**Every output includes version metadata for reproducibility.**
```json
{
  "_meta": {
    "schema_version": "1.0.0",
    "prompt_version": "call_analysis_v1.2",
    "model": "gpt-4o-mini",
    "model_version": "2024-07-18",
    "processed_at": "2024-01-15T10:30:00Z"
  }
}
```
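One way to guarantee that every output carries this block is to build it in a single helper. This is a sketch under assumptions: `build_meta` and `with_meta` are hypothetical names, and only the field names come from the example above.

```python
from datetime import datetime, timezone
from typing import Optional


def build_meta(schema_version: str, prompt_version: str, model: str,
               model_version: str, now: Optional[datetime] = None) -> dict:
    """Build the `_meta` block; `now` is injectable so tests are repeatable."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "schema_version": schema_version,
        "prompt_version": prompt_version,
        "model": model,
        "model_version": model_version,
        "processed_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def with_meta(payload: dict, meta: dict) -> dict:
    # Put `_meta` first so the version info leads the serialized output.
    return {"_meta": meta, **payload}


meta = build_meta("1.0.0", "call_analysis_v1.2", "gpt-4o-mini", "2024-07-18",
                  now=datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc))
record = with_meta({"call_id": "c001"}, meta)
```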
### 4. Closed RCA Taxonomy + Emergent Channel
**Only codes from the enum. The single controlled exception: `OTHER_EMERGENT`**
```json
{
  "rca_code": "OTHER_EMERGENT",
  "proposed_label": "agent_rushed_due_to_queue_pressure",
  "evidence_spans": [...]
}
```
`OTHER_EMERGENT` entries are reviewed manually and promoted to the official taxonomy in the next version.
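The manual review could start from a frequency count of proposed labels. The sketch below assumes drivers are plain dicts; `emergent_review_queue` is a hypothetical helper, and `promo_expired` is a made-up label used only to show the ordering.

```python
from collections import Counter


def emergent_review_queue(drivers: list[dict]) -> list[tuple[str, int]]:
    """Count proposed labels among OTHER_EMERGENT drivers, most frequent first."""
    labels = Counter(
        d["proposed_label"]
        for d in drivers
        if d.get("rca_code") == "OTHER_EMERGENT" and d.get("proposed_label")
    )
    return labels.most_common()


drivers = [
    {"rca_code": "OTHER_EMERGENT", "proposed_label": "agent_rushed_due_to_queue_pressure"},
    {"rca_code": "PRICING_TOO_EXPENSIVE"},
    {"rca_code": "OTHER_EMERGENT", "proposed_label": "agent_rushed_due_to_queue_pressure"},
    {"rca_code": "OTHER_EMERGENT", "proposed_label": "promo_expired"},  # made-up label
]
queue = emergent_review_queue(drivers)
```

Labels that recur across many calls are the natural candidates for promotion in the next taxonomy version.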
### 5. Journey Events as Structure
**No free text. Typed objects with timestamps.**
```json
{
  "journey_events": [
    {"type": "CALL_START", "t": "00:00"},
    {"type": "GREETING", "t": "00:03"},
    {"type": "TRANSFER", "t": "01:42"},
    {"type": "HOLD_START", "t": "02:10"},
    {"type": "HOLD_END", "t": "03:40"},
    {"type": "NEGATIVE_SENTIMENT_PEAK", "t": "04:05", "source": "inferred"},
    {"type": "RESOLUTION_ATTEMPT", "t": "05:20"},
    {"type": "CALL_END", "t": "06:15"}
  ]
}
```
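Typed events make timeline questions computable. The following sketch assumes `MM:SS` timestamps as in the example; `JourneyEvent`, `is_chronological`, and `hold_durations` are hypothetical names, not part of the documented API.

```python
from dataclasses import dataclass


def to_seconds(t: str) -> int:
    """Convert an "MM:SS" timestamp to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


@dataclass(frozen=True)
class JourneyEvent:
    type: str
    t: str
    source: str = "observed"

    @property
    def seconds(self) -> int:
        return to_seconds(self.t)


def is_chronological(events: list) -> bool:
    secs = [e.seconds for e in events]
    return secs == sorted(secs)


def hold_durations(events: list) -> list:
    """Pair each HOLD_START with the next HOLD_END and return durations in seconds."""
    durations, start = [], None
    for e in events:
        if e.type == "HOLD_START":
            start = e.seconds
        elif e.type == "HOLD_END" and start is not None:
            durations.append(e.seconds - start)
            start = None
    return durations


events = [
    JourneyEvent("CALL_START", "00:00"),
    JourneyEvent("HOLD_START", "02:10"),
    JourneyEvent("HOLD_END", "03:40"),
    JourneyEvent("NEGATIVE_SENTIMENT_PEAK", "04:05", source="inferred"),
    JourneyEvent("CALL_END", "06:15"),
]
```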
### 6. STT Adapter (No Lock-in)
**Abstract interface. The provider is interchangeable.**
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSCRIBER INTERFACE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Interface: Transcriber │
│ ├─ transcribe(audio_path) → TranscriptContract │
│ └─ transcribe_batch(paths) → List[TranscriptContract] │
│ │
│ Implementations: │
│ ├─ AssemblyAITranscriber (default) │
│ ├─ WhisperTranscriber (local/offline) │
│ ├─ GoogleSTTTranscriber (alternative) │
│ └─ AWSTranscribeTranscriber (alternative) │
│ │
│ TranscriptContract (normalized output): │
│ ├─ call_id: str │
│ ├─ utterances: List[Utterance] │
│ ├─ observed_events: List[ObservedEvent] │
│ └─ metadata: TranscriptMetadata │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
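In Python, the interface above maps naturally onto an abstract base class. This is a sketch under assumptions: the field types are simplified to plain dicts, the default `transcribe_batch` is sequential, and `FakeTranscriber` is a hypothetical stand-in that calls no provider.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Utterance:
    speaker: str
    text: str
    start_ms: int
    end_ms: int


@dataclass
class TranscriptContract:
    call_id: str
    utterances: list = field(default_factory=list)
    observed_events: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class Transcriber(ABC):
    @abstractmethod
    def transcribe(self, audio_path: str) -> TranscriptContract:
        """Transcribe one file into the normalized contract."""

    def transcribe_batch(self, paths: list) -> list:
        # Sequential default; a real adapter may override this with parallel uploads.
        return [self.transcribe(p) for p in paths]


class FakeTranscriber(Transcriber):
    """Hypothetical stand-in for tests; it never calls an STT provider."""

    def transcribe(self, audio_path: str) -> TranscriptContract:
        return TranscriptContract(
            call_id=Path(audio_path).stem,
            utterances=[Utterance("A", "Buenos días", 0, 1500)],
            metadata={"transcriber": "fake"},
        )


results = FakeTranscriber().transcribe_batch(["c001.mp3", "c002.wav"])
```

Because downstream modules depend only on `TranscriptContract`, swapping providers means adding one subclass and changing nothing else.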
---
## End-to-End Flow Diagram
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CXINSIGHTS PIPELINE │
└─────────────────────────────────────────────────────────────────────────────────┘
INPUT PROCESSING OUTPUT
───── ────────── ──────
┌──────────────┐
│ 5K-20K │
│ Audio Files │
│ (.mp3/.wav) │
└──────┬───────┘
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 1: BATCH TRANSCRIPTION (via Transcriber Interface) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Transcriber Adapter (pluggable: AssemblyAI, Whisper, Google, AWS) │ ║
║ │ ├─ Parallel uploads (configurable concurrency) │ ║
║ │ ├─ Spanish language model │ ║
║ │ ├─ Speaker diarization (Agent vs Customer) │ ║
║ │ └─ Output: TranscriptContract (normalized) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/transcripts/{call_id}.json (TranscriptContract) ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 2: FEATURE EXTRACTION (OBSERVED ONLY) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Extracts ONLY measurable facts from the transcript: │ ║
║ │ ├─ Total duration │ ║
║ │ ├─ Agent vs customer talk share (ratio) │ ║
║ │ ├─ Silences > 5s (timestamp + duration) │ ║
║ │ ├─ Detected interruptions │ ║
║ │ ├─ Transfers (if detectable from audio/metadata) │ ║
║ │ └─ Literal keywords (no interpretation) │ ║
║ │ │ ║
║ │ Output: observed_features (100% verifiable) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/transcripts/{call_id}_features.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 3: PER-CALL INFERENCE (MAP) - Observed/Inferred Separation ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ LLM Analysis (GPT-4o-mini / Claude 3.5 Sonnet) │ ║
║ │ │ ║
║ │ LLM input: │ ║
║ │ ├─ Compressed transcript │ ║
║ │ ├─ observed_features (factual context) │ ║
║ │ └─ RCA taxonomy (closed enum) │ ║
║ │ │ ║
║ │ Structured output: │ ║
║ │ ├─ OBSERVED (pass-through, not inferred): │ ║
║ │ │ └─ observed_outcome (if explicit in the audio: "sale closed") │ ║
║ │ │ │ ║
║ │ ├─ INFERRED (with confidence + mandatory evidence): │ ║
║ │ │ ├─ intent: {code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ outcome: {code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ sentiment: {score, confidence, evidence_spans[]} │ ║
║ │ │ ├─ lost_sale_driver: {rca_code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ poor_cx_driver: {rca_code, confidence, evidence_spans[]} │ ║
║ │ │ └─ agent_quality: {scores{}, confidence, evidence_spans[]} │ ║
║ │ │ │ ║
║ │ └─ JOURNEY_EVENTS (structured timeline): │ ║
║ │ └─ events[]: {type, t, source: observed|inferred} │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/processed/{call_id}_analysis.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 4: VALIDATION & QUALITY GATE ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Strict validation before aggregating: │ ║
║ │ ├─ Does every driver have evidence_spans? → If not, REJECT the driver │ ║
║ │ ├─ Is rca_code in the taxonomy? → If not, flag as OTHER_EMERGENT │ ║
║ │ ├─ Confidence > threshold? → If not, flag as low_confidence │ ║
║ │ ├─ Schema version match? → If not, ERROR │ ║
║ │ └─ Do journey events have valid timestamps? │ ║
║ │ │ ║
║ │ Output: validated_analysis.json + validation_report.json │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 5: AGGREGATION (REDUCE) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Statistical consolidation (validated data only): │ ║
║ │ ├─ Count by rca_code (closed taxonomy) │ ║
║ │ ├─ Distributions with confidence_weighted │ ║
║ │ ├─ Split: high_confidence vs low_confidence │ ║
║ │ ├─ List of OTHER_EMERGENT for manual review │ ║
║ │ ├─ Cross-tabs (intent × outcome × driver) │ ║
║ │ └─ Correlations observed_features ↔ inferred_outcomes │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/outputs/aggregated_stats.json ║
║ 📁 data/outputs/emergent_drivers_review.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 6: RCA TREE GENERATION ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Tree construction (deterministic, no LLM): │ ║
║ │ │ ║
║ │ 🔴 LOST SALES RCA TREE │ ║
║ │ └─ Lost Sales (N=1,250, 25%) │ ║
║ │ ├─ PRICING (45%, avg_conf=0.82) │ ║
║ │ │ ├─ TOO_EXPENSIVE (30%, n=375) │ ║
║ │ │ │ └─ evidence_samples: ["...", "..."] │ ║
║ │ │ └─ COMPETITOR_CHEAPER (15%, n=187) │ ║
║ │ │ └─ evidence_samples: ["...", "..."] │ ║
║ │ └─ ... │ ║
║ │ │ ║
║ │ Each node includes: │ ║
║ │ ├─ rca_code (from the enum) │ ║
║ │ ├─ count, pct │ ║
║ │ ├─ avg_confidence │ ║
║ │ ├─ evidence_samples[] (representative verbatims) │ ║
║ │ └─ call_ids[] (for drill-down) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/outputs/rca_lost_sales.json ║
║ 📁 data/outputs/rca_poor_cx.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 7: EXECUTIVE REPORTING ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Output formats: │ ║
║ │ ├─ 📊 Streamlit Dashboard (with observed/inferred filter) │ ║
║ │ ├─ 📑 PDF Executive Summary (includes confidence disclaimers) │ ║
║ │ ├─ 📈 Excel with drill-down (links to evidence_spans) │ ║
║ │ └─ 🖼️ PNG of RCA trees (with confidence legend) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
╚══════════════════════════════════════════════════════════════════════════════╝
```
---
## Data Model (Updated)
### TranscriptContract (Module 1 output)
```json
{
"_meta": {
"schema_version": "1.0.0",
"transcriber": "assemblyai",
"transcriber_version": "2024-07",
"processed_at": "2024-01-15T10:30:00Z"
},
"call_id": "c001",
"observed": {
"duration_seconds": 367,
"language_detected": "es",
"speakers": [
{"id": "A", "label": "agent", "talk_time_pct": 0.45},
{"id": "B", "label": "customer", "talk_time_pct": 0.55}
],
"utterances": [
{
"speaker": "A",
"text": "Buenos días, gracias por llamar a Movistar...",
"start_ms": 0,
"end_ms": 3500
}
],
"detected_events": [
{"type": "SILENCE", "start_ms": 72000, "end_ms": 80000, "duration_ms": 8000},
{"type": "CROSSTALK", "start_ms": 45000, "end_ms": 46500}
]
}
}
```
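The observed features of Module 2 can be derived from the `utterances` array alone. The sketch below is an assumption about how that could work, not the project's extractor: it computes talk share per speaker and treats any gap longer than 5 seconds between utterances as a silence, whereas the contract above receives `detected_events` from the STT provider.

```python
def talk_time_pct(utterances: list) -> dict:
    """Share of total spoken time per speaker id, rounded to 2 decimals."""
    totals = {}
    for u in utterances:
        totals[u["speaker"]] = totals.get(u["speaker"], 0) + (u["end_ms"] - u["start_ms"])
    spoken = sum(totals.values())
    return {s: round(ms / spoken, 2) for s, ms in totals.items()} if spoken else {}


def silences(utterances: list, min_gap_ms: int = 5000) -> list:
    """Emit a SILENCE event for every gap longer than min_gap_ms."""
    events, latest_end = [], None
    for u in sorted(utterances, key=lambda u: u["start_ms"]):
        if latest_end is not None and u["start_ms"] - latest_end > min_gap_ms:
            events.append({
                "type": "SILENCE",
                "start_ms": latest_end,
                "end_ms": u["start_ms"],
                "duration_ms": u["start_ms"] - latest_end,
            })
        # Track the furthest end seen so overlapping speech is not misread as a gap.
        latest_end = u["end_ms"] if latest_end is None else max(latest_end, u["end_ms"])
    return events


utterances = [
    {"speaker": "A", "text": "...", "start_ms": 0, "end_ms": 4000},
    {"speaker": "B", "text": "...", "start_ms": 4000, "end_ms": 10000},
    {"speaker": "A", "text": "...", "start_ms": 18000, "end_ms": 20000},
]
shares = talk_time_pct(utterances)
gaps = silences(utterances)
```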
### CallAnalysis (Module 3 output) - WITH OBSERVED/INFERRED SEPARATION
```json
{
"_meta": {
"schema_version": "1.0.0",
"prompt_version": "call_analysis_v1.2",
"model": "gpt-4o-mini",
"model_version": "2024-07-18",
"processed_at": "2024-01-15T10:35:00Z"
},
"call_id": "c001",
"observed": {
"duration_seconds": 367,
"agent_talk_pct": 0.45,
"customer_talk_pct": 0.55,
"silence_total_seconds": 38,
"silence_events": [
{"start": "01:12", "end": "01:20", "duration_s": 8}
],
"transfer_count": 0,
"hold_events": [
{"start": "02:14", "end": "03:52", "duration_s": 98}
],
"explicit_outcome": null
},
"inferred": {
"intent": {
"code": "SALES_INQUIRY",
"confidence": 0.91,
"evidence_spans": [
{"start": "00:15", "end": "00:28", "text": "Quería información sobre la fibra de 600 megas"}
]
},
"outcome": {
"code": "SALE_LOST",
"confidence": 0.85,
"evidence_spans": [
{"start": "05:40", "end": "05:52", "text": "Lo voy a pensar y ya les llamo yo"}
]
},
"sentiment": {
"overall_score": -0.3,
"evolution": [
{"segment": "start", "score": 0.2},
{"segment": "middle", "score": -0.1},
{"segment": "end", "score": -0.6}
],
"confidence": 0.78,
"evidence_spans": [
{"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
]
},
"lost_sale_driver": {
"rca_code": "PRICING_TOO_EXPENSIVE",
"confidence": 0.83,
"evidence_spans": [
{"start": "03:55", "end": "04:08", "text": "59 euros al mes es mucho dinero"},
{"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
],
"secondary_driver": {
"rca_code": "PRICING_COMPETITOR_CHEAPER",
"confidence": 0.71,
"evidence_spans": [
{"start": "04:30", "end": "04:45", "text": "En Vodafone me lo dejan por 45"}
]
}
},
"poor_cx_driver": {
"rca_code": "WAIT_HOLD_LONG",
"confidence": 0.77,
"evidence_spans": [
{"start": "02:14", "end": "03:52", "text": "[hold - 98 segundos]", "source": "observed"}
]
},
"agent_quality": {
"overall_score": 6,
"dimensions": {
"empathy": 7,
"product_knowledge": 8,
"objection_handling": 4,
"closing_skills": 5
},
"confidence": 0.72,
"evidence_spans": [
{"start": "04:50", "end": "05:10", "text": "Bueno, es el precio que tenemos...", "dimension": "objection_handling"}
]
},
"summary": "Cliente interesado en fibra 600Mb abandona por precio (59€) comparando con Vodafone (45€). Hold largo de 98s. Agente no rebatió objeción de precio."
},
"journey_events": [
{"type": "CALL_START", "t": "00:00", "source": "observed"},
{"type": "GREETING", "t": "00:03", "source": "observed"},
{"type": "INTENT_STATED", "t": "00:15", "source": "inferred"},
{"type": "HOLD_START", "t": "02:14", "source": "observed"},
{"type": "HOLD_END", "t": "03:52", "source": "observed"},
{"type": "PRICE_OBJECTION", "t": "03:55", "source": "inferred"},
{"type": "NEGATIVE_SENTIMENT_PEAK", "t": "04:10", "source": "inferred"},
{"type": "COMPETITOR_MENTION", "t": "04:30", "source": "inferred"},
{"type": "SOFT_DECLINE", "t": "05:40", "source": "inferred"},
{"type": "CALL_END", "t": "06:07", "source": "observed"}
]
}
```
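Because the `observed` block is meant to be fully verifiable, its internal arithmetic can be checked mechanically. The sketch below (with the hypothetical helper `duration_mismatches`) confirms that each event's `duration_s` equals `end - start`, using the hold and silence events from the example above and assuming `MM:SS` timestamps.

```python
def to_seconds(t: str) -> int:
    """Convert an "MM:SS" timestamp to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


def duration_mismatches(events: list) -> list:
    """Return the events whose duration_s disagrees with end - start."""
    return [
        e for e in events
        if to_seconds(e["end"]) - to_seconds(e["start"]) != e["duration_s"]
    ]


observed_events = [
    {"start": "01:12", "end": "01:20", "duration_s": 8},   # silence event
    {"start": "02:14", "end": "03:52", "duration_s": 98},  # hold event
]
bad = duration_mismatches(observed_events)
```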
### RCA Tree Node (Module 6 output)
```json
{
"_meta": {
"schema_version": "1.0.0",
"generated_at": "2024-01-15T11:00:00Z",
"taxonomy_version": "rca_taxonomy_v1.0",
"total_calls_analyzed": 5000,
"confidence_threshold_used": 0.70
},
"tree_type": "lost_sales",
"total_affected": {
"count": 1250,
"pct_of_total": 25.0
},
"root": {
"label": "Lost Sales",
"children": [
{
"rca_code": "PRICING",
"label": "Pricing Issues",
"count": 562,
"pct_of_parent": 45.0,
"avg_confidence": 0.82,
"children": [
{
"rca_code": "PRICING_TOO_EXPENSIVE",
"label": "Too Expensive",
"count": 375,
"pct_of_parent": 66.7,
"avg_confidence": 0.84,
"evidence_samples": [
{"call_id": "c001", "text": "59 euros al mes es mucho dinero", "t": "03:55"},
{"call_id": "c042", "text": "No puedo pagar tanto", "t": "02:30"}
],
"call_ids": ["c001", "c042", "c078", "..."]
},
{
"rca_code": "PRICING_COMPETITOR_CHEAPER",
"label": "Competitor Cheaper",
"count": 187,
"pct_of_parent": 33.3,
"avg_confidence": 0.79,
"evidence_samples": [
{"call_id": "c001", "text": "En Vodafone me lo dejan por 45", "t": "04:30"}
],
"call_ids": ["c001", "c015", "..."]
}
]
}
]
},
"other_emergent": [
{
"proposed_label": "agent_rushed_due_to_queue_pressure",
"count": 23,
"evidence_samples": [
{"call_id": "c234", "text": "Perdona que voy con prisa que hay cola", "t": "01:15"}
],
"recommendation": "Consider adding to taxonomy v1.1"
}
]
}
```
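Since tree construction is deterministic, child nodes can be computed by plain grouping. This is a minimal sketch assuming each validated driver is a dict with `call_id`, `rca_code`, and `confidence`; `build_children` is a hypothetical name, and the rounding (one decimal for percentages, two for confidence) is inferred from the example values.

```python
from collections import defaultdict


def build_children(drivers: list, parent_count: int) -> list:
    """Group drivers by rca_code and compute the per-node statistics."""
    groups = defaultdict(list)
    for d in drivers:
        groups[d["rca_code"]].append(d)
    nodes = [
        {
            "rca_code": code,
            "count": len(items),
            "pct_of_parent": round(100 * len(items) / parent_count, 1),
            "avg_confidence": round(sum(i["confidence"] for i in items) / len(items), 2),
            "call_ids": sorted(i["call_id"] for i in items),
        }
        for code, items in groups.items()
    ]
    # Largest node first; ties broken by code so the output is reproducible.
    return sorted(nodes, key=lambda n: (-n["count"], n["rca_code"]))


drivers = [
    {"call_id": "c001", "rca_code": "PRICING_TOO_EXPENSIVE", "confidence": 0.8},
    {"call_id": "c042", "rca_code": "PRICING_TOO_EXPENSIVE", "confidence": 0.9},
    {"call_id": "c015", "rca_code": "PRICING_COMPETITOR_CHEAPER", "confidence": 0.7},
]
children = build_children(drivers, parent_count=len(drivers))
```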
---
## RCA Taxonomy (config/rca_taxonomy.yaml)
```yaml
# config/rca_taxonomy.yaml
# Version: 1.0.0
# Last updated: 2024-01-15
_meta:
  version: "1.0.0"
  author: "CXInsights Team"
  description: "Closed taxonomy for RCA classification. Only these codes are valid."
# ============================================================================
# INTENTS (Reason for the call)
# ============================================================================
intents:
  - SALES_INQUIRY       # Sales inquiry
  - SALES_UPGRADE       # Product upgrade
  - SUPPORT_TECHNICAL   # Technical support
  - SUPPORT_BILLING     # Billing inquiry
  - COMPLAINT           # Complaint
  - CANCELLATION        # Cancellation request
  - GENERAL_INQUIRY     # General inquiry
  - OTHER_EMERGENT      # Captures new intents
# ============================================================================
# OUTCOMES (Result of the call)
# ============================================================================
outcomes:
  - SALE_COMPLETED      # Sale closed
  - SALE_LOST           # Sale lost
  - ISSUE_RESOLVED      # Issue resolved
  - ISSUE_UNRESOLVED    # Issue not resolved
  - ESCALATED           # Escalated to supervisor/other department
  - CALLBACK_SCHEDULED  # Callback scheduled
  - OTHER_EMERGENT
# ============================================================================
# LOST SALE DRIVERS (Why the sale was lost)
# ============================================================================
lost_sale_drivers:
  # Pricing cluster
  PRICING:
    - PRICING_TOO_EXPENSIVE       # "It is too expensive"
    - PRICING_COMPETITOR_CHEAPER  # "X offers it to me cheaper"
    - PRICING_NO_DISCOUNT         # No discount was offered
    - PRICING_PAYMENT_TERMS       # Unacceptable payment terms
  # Product fit cluster
  PRODUCT_FIT:
    - PRODUCT_FEATURE_MISSING     # Required feature is missing
    - PRODUCT_WRONG_OFFERED       # Wrong product was offered
    - PRODUCT_COVERAGE_AREA       # No coverage in the customer's area
    - PRODUCT_TECH_REQUIREMENTS   # Does not meet technical requirements
  # Process cluster
  PROCESS:
    - PROCESS_TOO_COMPLEX         # Process too complicated
    - PROCESS_DOCUMENTATION       # Requires a lot of documentation
    - PROCESS_ACTIVATION_TIME     # Long activation time
    - PROCESS_CONTRACT_TERMS      # Unacceptable contract terms
  # Agent cluster
  AGENT:
    - AGENT_COULDNT_CLOSE         # Did not close the sale
    - AGENT_POOR_OBJECTION        # Poor objection handling
    - AGENT_LACK_URGENCY          # Did not create urgency
    - AGENT_MISSED_UPSELL         # Missed an upsell opportunity
  # Timing cluster
  TIMING:
    - TIMING_NOT_READY            # Customer is not ready
    - TIMING_COMPARING            # Comparing options
    - TIMING_BUDGET_PENDING       # Budget pending
  # Catch-all
  OTHER_EMERGENT: []
# ============================================================================
# POOR CX DRIVERS (Why the experience was poor)
# ============================================================================
poor_cx_drivers:
  # Wait time cluster
  WAIT_TIME:
    - WAIT_INITIAL_LONG             # Long initial wait (>2min)
    - WAIT_HOLD_LONG                # Long hold during the call (>1min)
    - WAIT_CALLBACK_NEVER           # Promised callback never arrived
  # Resolution cluster
  RESOLUTION:
    - RESOLUTION_NOT_ACHIEVED       # Issue not resolved
    - RESOLUTION_NEEDED_ESCALATION  # Needed escalation
    - RESOLUTION_CALLBACK_BROKEN    # Promised callback not honored
    - RESOLUTION_INCORRECT          # Incorrect resolution
  # Agent behavior cluster
  AGENT_BEHAVIOR:
    - AGENT_LACK_EMPATHY            # Lack of empathy
    - AGENT_RUDE                    # Rude/dismissive
    - AGENT_RUSHED                  # Rushed
    - AGENT_NOT_LISTENING           # Was not listening
  # Information cluster
  INFORMATION:
    - INFO_WRONG_GIVEN              # Incorrect information
    - INFO_INCONSISTENT             # Inconsistent information
    - INFO_COULDNT_ANSWER           # Could not answer
  # Process/System cluster
  PROCESS_SYSTEM:
    - SYSTEM_DOWN                   # System down
    - POLICY_LIMITATION             # Policy limitation
    - TOO_MANY_TRANSFERS            # Too many transfers
    - AUTH_ISSUES                   # Authentication problems
  # Catch-all
  OTHER_EMERGENT: []
# ============================================================================
# JOURNEY EVENT TYPES (Timeline events)
# ============================================================================
journey_event_types:
  # Observed (come from the audio/STT)
  observed:
    - CALL_START
    - CALL_END
    - GREETING
    - SILENCE      # >5 seconds
    - HOLD_START
    - HOLD_END
    - TRANSFER
    - CROSSTALK    # Both speak at once
  # Inferred (come from the LLM)
  inferred:
    - INTENT_STATED
    - PRICE_OBJECTION
    - COMPETITOR_MENTION
    - NEGATIVE_SENTIMENT_PEAK
    - POSITIVE_SENTIMENT_PEAK
    - RESOLUTION_ATTEMPT
    - SOFT_DECLINE
    - HARD_DECLINE
    - COMMITMENT
    - ESCALATION_REQUEST
```
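Validation needs the clustered driver sections flattened into a set of valid codes. The sketch below assumes the YAML has already been parsed into a dict (for instance with a YAML parser's safe loader) and uses only a subset of `lost_sale_drivers`; `valid_codes` and `cluster_of` are hypothetical helpers.

```python
from typing import Optional

# Subset of `lost_sale_drivers`, shaped as a YAML parser would return it.
LOST_SALE_DRIVERS = {
    "PRICING": ["PRICING_TOO_EXPENSIVE", "PRICING_COMPETITOR_CHEAPER"],
    "TIMING": ["TIMING_NOT_READY", "TIMING_COMPARING"],
    "OTHER_EMERGENT": [],
}


def valid_codes(section: dict) -> set:
    """Flatten cluster -> leaves into the set of codes a driver may use."""
    codes = set()
    for cluster, leaves in section.items():
        if leaves:
            codes.update(leaves)
        else:
            codes.add(cluster)  # the catch-all has no leaves; its key is the code
    return codes


def cluster_of(code: str, section: dict) -> Optional[str]:
    """Return the parent cluster of a leaf code, or None if it is unknown."""
    for cluster, leaves in section.items():
        if code in leaves:
            return cluster
    return None


codes = valid_codes(LOST_SALE_DRIVERS)
```

Cluster names such as `PRICING` are deliberately excluded from the valid set here, on the assumption that the LLM must pick a leaf code and the cluster is only used to build the tree.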
---
## Component Diagram (Updated)
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ CXINSIGHTS COMPONENTS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TRANSCRIBER INTERFACE (Adapter Pattern) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │ AssemblyAI │ │ Whisper │ │ Google STT │ │ AWS │ │ │
│ │ │ Transcriber │ │ Transcriber │ │ Transcriber │ │ Transcribe │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └─────┬──────┘ │ │
│ │ └────────────────┴────────────────┴───────────────┘ │ │
│ │ ▼ │ │
│ │ TranscriptContract (normalized output) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Feature │ │ Inference │ │ Validation │ │
│ │ Extractor │───▶│ Service │───▶│ Gate │ │
│ │ (observed only) │ │ (observed/infer)│ │ (evidence check)│ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGGREGATION LAYER │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Stats Engine │ │ RCA Builder │ │ Emergent │ │ │
│ │ │ (by rca_code)│ │(deterministic│ │ Collector │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VISUALIZATION LAYER │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Dashboard │ │ PDF │ │ Excel │ │ PNG │ │ │
│ │ │(obs/infer) │ │ (disclaim) │ │(drill-down)│ │ (legend) │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONFIG LAYER │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ │
│ │ │ rca_taxonomy │ │ prompts/ + │ │ settings │ │ │
│ │ │ v1.0 (enum) │ │ VERSION FILE │ │ (.env) │ │ │
│ │ └────────────────┘ └────────────────┘ └────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## Validation Rules (Quality Gate)
```python
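# Validation sketch. CallAnalysis is the project's typed model; the constants
# below are placeholder values (assumptions), not the project's real config.
import re
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.70        # same value as confidence_threshold_used
EXPECTED_SCHEMA_VERSION = "1.0.0"
TAXONOMY: set[str] = set()         # leaf codes loaded from config/rca_taxonomy.yaml
_MM_SS = re.compile(r"^\d{2,}:[0-5]\d$")   # assumes "MM:SS" timestamps


@dataclass
class ValidationResult:
    valid: bool
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


def is_valid_timestamp(t: str) -> bool:
    return isinstance(t, str) and bool(_MM_SS.match(t))


def validate_call_analysis(analysis: "CallAnalysis") -> ValidationResult:
    errors: list[str] = []
    warnings: list[str] = []
    drivers = [analysis.inferred.lost_sale_driver, analysis.inferred.poor_cx_driver]
    for driver in drivers:
        if driver is None:          # a call may have no driver of a given type
            continue
        # RULE 1: every driver must have evidence_spans
        if not driver.evidence_spans:
            errors.append(f"Driver {driver.rca_code} has no evidence_spans → REJECTED")
        # RULE 2: rca_code must be in the taxonomy; OTHER_EMERGENT needs a label
        if driver.rca_code == "OTHER_EMERGENT":
            if not getattr(driver, "proposed_label", None):
                errors.append("OTHER_EMERGENT requires proposed_label")
        elif driver.rca_code not in TAXONOMY:
            errors.append(f"rca_code {driver.rca_code} is not in the taxonomy")
        # RULE 3: minimum confidence (a warning, not a rejection)
        if driver.confidence < CONFIDENCE_THRESHOLD:
            warnings.append(f"Driver {driver.rca_code} has low confidence: {driver.confidence}")
    # RULE 4: schema version must match
    if analysis._meta.schema_version != EXPECTED_SCHEMA_VERSION:
        errors.append(f"Schema mismatch: {analysis._meta.schema_version}")
    # RULE 5: journey events must have valid timestamps
    for event in analysis.journey_events:
        if not is_valid_timestamp(event.t):
            errors.append(f"Invalid timestamp in event: {event}")
    return ValidationResult(valid=not errors, errors=errors, warnings=warnings)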
```
---
## Prompt Versioning
```
config/prompts/
├── versions.yaml              # Version registry
├── call_analysis/
│   ├── v1.0/
│   │   ├── system.txt
│   │   ├── user.txt
│   │   └── schema.json        # Expected JSON Schema
│   ├── v1.1/
│   │   ├── system.txt
│   │   ├── user.txt
│   │   └── schema.json
│   └── v1.2/                  # Current
│       ├── system.txt
│       ├── user.txt
│       └── schema.json
└── rca_synthesis/
    └── v1.0/
        ├── system.txt
        └── user.txt
```
```yaml
# config/prompts/versions.yaml
current:
  call_analysis: "v1.2"
  rca_synthesis: "v1.0"
history:
  call_analysis:
    v1.0: "2024-01-01"
    v1.1: "2024-01-10"  # Added secondary_driver support
    v1.2: "2024-01-15"  # Added journey_events structure
```
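Given the registry, the active prompt directory and the `prompt_version` tag written into `_meta` can both be derived from one lookup. This is a sketch that assumes the registry has been parsed into a dict; `prompt_dir` and `prompt_version_tag` are hypothetical helpers, and the tag format is inferred from the `call_analysis_v1.2` value used in the examples.

```python
from pathlib import Path

# The `current` section of versions.yaml, as a parser would return it.
VERSIONS = {"current": {"call_analysis": "v1.2", "rca_synthesis": "v1.0"}}


def prompt_dir(task: str, versions: dict, root: Path = Path("config/prompts")) -> Path:
    """Directory holding system.txt / user.txt for the task's current version."""
    return root / task / versions["current"][task]


def prompt_version_tag(task: str, versions: dict) -> str:
    """Tag stored in `_meta.prompt_version`, e.g. call_analysis_v1.2."""
    return f"{task}_{versions['current'][task]}"


system_prompt_path = prompt_dir("call_analysis", VERSIONS) / "system.txt"
tag = prompt_version_tag("call_analysis", VERSIONS)
```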
---
## Estimates
### Total Time (5,000 calls, ~4 min average)
| Stage | Estimated Time |
|-------|----------------|
| Transcription | 3-4 hours |
| Feature Extraction | 15 min |
| Inference | 2-3 hours |
| Validation | 10 min |
| Aggregation | 10 min |
| RCA Tree Build | 5 min |
| Reporting | 5 min |
| **Total** | **6-8 hours** |
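Summing the stage ranges gives roughly 5h 45m to 7h 45m, which the table rounds to 6-8 hours. The short check below reproduces that arithmetic; it assumes the stages run sequentially, which the table implies but does not state.

```python
# (low, high) estimates per stage, in minutes, taken from the table above.
STAGES_MIN = {
    "Transcription": (180, 240),
    "Feature Extraction": (15, 15),
    "Inference": (120, 180),
    "Validation": (10, 10),
    "Aggregation": (10, 10),
    "RCA Tree Build": (5, 5),
    "Reporting": (5, 5),
}


def fmt(minutes: int) -> str:
    return f"{minutes // 60}h {minutes % 60:02d}m"


low = sum(lo for lo, _ in STAGES_MIN.values())
high = sum(hi for _, hi in STAGES_MIN.values())
summary = f"{fmt(low)} to {fmt(high)}"
```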
### Costs (see TECH_STACK.md for details)
| Volume | Transcription | Inference | Total |
|---------|---------------|-----------|-------|
| 5,000 calls | ~$300 | ~$15 | ~$315 |
| 20,000 calls | ~$1,200 | ~$60 | ~$1,260 |
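Both rows imply the same per-call rates (about $0.06 for transcription and $0.003 for inference), so intermediate volumes can be estimated by linear scaling. The sketch below assumes costs stay linear in call count, which ignores any volume discounts or changes in average call length.

```python
BASE_CALLS = 5_000
BASE_TRANSCRIPTION_USD = 300   # from the 5,000-call row
BASE_INFERENCE_USD = 15        # from the 5,000-call row


def estimate_cost(n_calls: int) -> dict:
    """Scale the 5,000-call estimate linearly to n_calls (USD)."""
    transcription = round(n_calls * BASE_TRANSCRIPTION_USD / BASE_CALLS, 2)
    inference = round(n_calls * BASE_INFERENCE_USD / BASE_CALLS, 2)
    return {
        "transcription": transcription,
        "inference": inference,
        "total": round(transcription + inference, 2),
    }


small = estimate_cost(5_000)
large = estimate_cost(20_000)
```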
---
## Implementation Status (2026-01-19)
| Module | Status | Location |
|--------|--------|----------|
| Transcription | ✅ Done | `src/transcription/` |
| Feature Extraction | ✅ Done | `src/features/` |
| Compression | ✅ Done | `src/compression/` |
| Inference | ✅ Done | `src/inference/` |
| Validation | ✅ Done | Built into models |
| Aggregation | ✅ Done | `src/aggregation/` |
| RCA Trees | ✅ Done | `src/aggregation/rca_tree.py` |
| Pipeline | ✅ Done | `src/pipeline/` |
| Exports | ✅ Done | `src/exports/` |
| CLI | ✅ Done | `cli.py` |
**Last updated**: 2026-01-19 | **Version**: 1.0.0