CXInsights - System Architecture
Product Vision
CXInsights turns 5,000-20,000 contact center calls into executive RCA (root cause analysis) trees that identify the root causes of:
- Lost Sales: missed sales opportunities
- Poor CX: deficient customer experiences
Critical Design Principles
1. Strict Separation: Observed vs Inferred
Every data point must be clearly classified as either a FACT or an INFERENCE.
┌─────────────────────────────────────────────────────────────────────────────┐
│ OBSERVED vs INFERRED │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ OBSERVED (Measurable facts) INFERRED (Model opinion) │
│ ───────────────────────── ────────────────────────────── │
│ ✓ Call duration ✗ Customer sentiment │
│ ✓ Number of transfers ✗ Reason for the lost sale │
│ ✓ Hold time (measured) ✗ Agent quality │
│ ✓ Detected silences (>N sec) ✗ Intent classification │
│ ✓ Transcribed text ✗ Call summary │
│ ✓ Who spoke how much (%) ✗ Outcome (sale/no_sale/resolved) │
│ ✓ Event timestamps ✗ RCA drivers │
│ │
│ Rule: If the LLM generates it → it is INFERRED │
│ If it comes from the audio/STT → it is OBSERVED │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Impact: an RCA that holds up in front of stakeholders, a clear audit trail, and a clean separation between fact and opinion.
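A minimal sketch of the tagging rule, assuming each value arrives with a label for the stage that produced it. The origin names (`"audio"`, `"stt"`, `"llm"`) and the `Source`, `Tagged`, `source_for`, and `tag` helpers are hypothetical. Unknown origins are rejected rather than guessed, so nothing lands in the OBSERVED column by default.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Source(str, Enum):
    OBSERVED = "observed"  # measured from the audio / STT output
    INFERRED = "inferred"  # generated by the LLM


# Hypothetical origin labels; a real pipeline may name its stages differently.
OBSERVED_ORIGINS = {"audio", "stt"}
INFERRED_ORIGINS = {"llm"}


def source_for(origin: str) -> Source:
    """Apply the rule: LLM output is INFERRED, audio/STT output is OBSERVED."""
    key = origin.lower()
    if key in INFERRED_ORIGINS:
        return Source.INFERRED
    if key in OBSERVED_ORIGINS:
        return Source.OBSERVED
    # Fail closed: an unknown origin is never silently treated as a fact.
    raise ValueError(f"Unknown origin {origin!r}: cannot classify as fact or inference")


@dataclass(frozen=True)
class Tagged:
    name: str
    value: Any
    source: Source


def tag(name: str, value: Any, origin: str) -> Tagged:
    return Tagged(name, value, source_for(origin))
```

For example, `tag("duration_seconds", 245, "stt")` is tagged OBSERVED, while `tag("sentiment", -0.3, "llm")` is tagged INFERRED.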
2. Mandatory Evidence per Driver
Hard rule: no evidence_spans → the driver DOES NOT EXIST
{
"rca_code": "WAIT_HOLD_LONG",
"confidence": 0.77,
"evidence_spans": [
{"start": "02:14", "end": "03:52", "text": "[silence - hold]", "source": "observed"}
]
}
A driver without timestamped evidence is rejected by validation.
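The hard rule can be sketched as a partition step. This assumes drivers are dicts shaped like the JSON above, and it treats a span as evidence only if it has both `start` and `end`. The function name `split_by_evidence` is hypothetical.

```python
from typing import Any

Driver = dict[str, Any]


def split_by_evidence(drivers: list[Driver]) -> tuple[list[Driver], list[Driver]]:
    """Partition drivers into (kept, rejected) by the hard evidence rule."""
    kept: list[Driver] = []
    rejected: list[Driver] = []
    for driver in drivers:
        spans = driver.get("evidence_spans") or []
        # A span only counts as evidence if it carries both timestamps.
        timestamped = [s for s in spans if s.get("start") and s.get("end")]
        (kept if timestamped else rejected).append(driver)
    return kept, rejected
```

Keeping the rejected list, instead of discarding it, leaves something to report in a validation summary.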
3. Prompt + Schema Versioning
Every output includes version metadata for reproducibility.
{
"_meta": {
"schema_version": "1.0.0",
"prompt_version": "call_analysis_v1.2",
"model": "gpt-4o-mini",
"model_version": "2024-07-18",
"processed_at": "2024-01-15T10:30:00Z"
}
}
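A sketch of how the `_meta` block could be stamped. It assumes a single `SCHEMA_VERSION` constant and UTC timestamps with second precision, matching the `"2024-01-15T10:30:00Z"` format above. `build_meta` and `same_configuration` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import Optional

SCHEMA_VERSION = "1.0.0"  # assumption: one constant shared by every writer


def build_meta(prompt_version: str, model: str, model_version: str,
               now: Optional[datetime] = None) -> dict:
    """Return the _meta block stamped onto every output file."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "schema_version": SCHEMA_VERSION,
        "prompt_version": prompt_version,
        "model": model,
        "model_version": model_version,
        "processed_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC, second precision
    }


def same_configuration(a: dict, b: dict) -> bool:
    """Two runs are comparable when everything except processed_at matches."""
    keys = ("schema_version", "prompt_version", "model", "model_version")
    return all(a.get(k) == b.get(k) for k in keys)
```

`same_configuration` captures the reproducibility idea. Two outputs can only be compared directly if schema, prompt, and model versions all agree.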
4. Closed RCA Taxonomy + Emergent Channel
Only codes from the enum are valid. The single controlled exception is OTHER_EMERGENT:
{
"rca_code": "OTHER_EMERGENT",
"proposed_label": "agent_rushed_due_to_queue_pressure",
"evidence_spans": [...]
}
OTHER_EMERGENT entries are reviewed manually and promoted to the official taxonomy in the next version.
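The manual review gets easier if emergent drivers are grouped first. A minimal sketch, assuming drivers are dicts as above; entries without a `proposed_label` are skipped here because validation rejects them anyway. The function name is hypothetical.

```python
from collections import Counter


def emergent_review_queue(drivers: list[dict]) -> list[dict]:
    """Group OTHER_EMERGENT drivers by proposed_label, most frequent first."""
    counts = Counter(
        d["proposed_label"]
        for d in drivers
        if d.get("rca_code") == "OTHER_EMERGENT" and d.get("proposed_label")
    )
    # most_common() sorts by count; ties keep first-seen order.
    return [{"proposed_label": label, "count": n} for label, n in counts.most_common()]
```

Sorting by frequency puts the labels most likely to deserve promotion to the next taxonomy version at the top of the queue.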
5. Journey Events as Structure
No free text: typed objects with timestamps.
{
"journey_events": [
{"type": "CALL_START", "t": "00:00"},
{"type": "GREETING", "t": "00:03"},
{"type": "TRANSFER", "t": "01:42"},
{"type": "HOLD_START", "t": "02:10"},
{"type": "HOLD_END", "t": "03:40"},
{"type": "NEGATIVE_SENTIMENT", "t": "04:05", "source": "inferred"},
{"type": "RESOLUTION_ATTEMPT", "t": "05:20"},
{"type": "CALL_END", "t": "06:15"}
]
}
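One way to enforce typed events, sketched in Python. It assumes timestamps are `MM:SS` offsets as in the example. The `JourneyEvent` class and the fallback that fills a missing `source` from the `journey_event_types` lists in `config/rca_taxonomy.yaml` are both assumptions. An explicit `source` always wins, and an unknown type with no `source` is rejected.

```python
from dataclasses import dataclass

# Event types copied from journey_event_types in config/rca_taxonomy.yaml.
OBSERVED_TYPES = {"CALL_START", "CALL_END", "GREETING", "SILENCE", "HOLD_START",
                  "HOLD_END", "TRANSFER", "CROSSTALK"}
INFERRED_TYPES = {"INTENT_STATED", "PRICE_OBJECTION", "COMPETITOR_MENTION",
                  "NEGATIVE_SENTIMENT_PEAK", "POSITIVE_SENTIMENT_PEAK",
                  "RESOLUTION_ATTEMPT", "SOFT_DECLINE", "HARD_DECLINE",
                  "COMMITMENT", "ESCALATION_REQUEST"}


@dataclass(frozen=True)
class JourneyEvent:
    type: str
    t: str       # "MM:SS" offset from call start
    source: str  # "observed" | "inferred"

    @property
    def seconds(self) -> int:
        minutes, secs = self.t.split(":")
        return int(minutes) * 60 + int(secs)


def parse_event(raw: dict) -> JourneyEvent:
    source = raw.get("source")
    if source is None:  # fall back to the taxonomy when source is omitted
        if raw["type"] in OBSERVED_TYPES:
            source = "observed"
        elif raw["type"] in INFERRED_TYPES:
            source = "inferred"
        else:
            raise ValueError(f"Unknown event type without source: {raw['type']}")
    if source not in ("observed", "inferred"):
        raise ValueError(f"Invalid source: {source}")
    return JourneyEvent(raw["type"], raw["t"], source)


def parse_timeline(raw_events: list[dict]) -> list[JourneyEvent]:
    """Parse and sort events chronologically."""
    return sorted((parse_event(e) for e in raw_events), key=lambda e: e.seconds)
```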
6. STT Adapter (No Lock-in)
An abstract interface keeps the STT provider interchangeable.
┌─────────────────────────────────────────────────────────────────────────────┐
│ TRANSCRIBER INTERFACE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Interface: Transcriber │
│ ├─ transcribe(audio_path) → TranscriptContract │
│ └─ transcribe_batch(paths) → List[TranscriptContract] │
│ │
│ Implementations: │
│ ├─ AssemblyAITranscriber (default) │
│ ├─ WhisperTranscriber (local/offline) │
│ ├─ GoogleSTTTranscriber (alternative) │
│ └─ AWSTranscribeTranscriber (alternative) │
│ │
│ TranscriptContract (output normalizado): │
│ ├─ call_id: str │
│ ├─ utterances: List[Utterance] │
│ ├─ observed_events: List[ObservedEvent] │
│ └─ metadata: TranscriptMetadata │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
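A minimal Python rendering of the interface using `abc.ABC`. The field types are simplified assumptions: plain dicts stand in for `ObservedEvent` and `TranscriptMetadata`. `FakeTranscriber` is a hypothetical test double, not one of the listed providers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Utterance:
    speaker: str
    text: str
    start_ms: int
    end_ms: int


@dataclass
class TranscriptContract:
    call_id: str
    utterances: list[Utterance]
    observed_events: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class Transcriber(ABC):
    @abstractmethod
    def transcribe(self, audio_path: str) -> TranscriptContract:
        """Transcribe one file into the normalized contract."""

    def transcribe_batch(self, paths: list[str]) -> list[TranscriptContract]:
        # Sequential default; adapters with batch/async APIs can override it.
        return [self.transcribe(p) for p in paths]


class FakeTranscriber(Transcriber):
    """Hypothetical stand-in provider: one fixed utterance per file."""

    def transcribe(self, audio_path: str) -> TranscriptContract:
        return TranscriptContract(
            call_id=Path(audio_path).stem,  # "data/audio/c001.wav" -> "c001"
            utterances=[Utterance("A", "Buenos días", 0, 1500)],
            metadata={"transcriber": "fake"},
        )
```

Because `transcribe_batch` has a sequential default, an adapter only has to implement `transcribe`. Adapters backed by a batch or async API can override it to get the parallel uploads described for Module 1.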
End-to-End Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CXINSIGHTS PIPELINE │
└─────────────────────────────────────────────────────────────────────────────────┘
INPUT PROCESSING OUTPUT
───── ────────── ──────
┌──────────────┐
│ 5K-20K │
│ Audio Files │
│ (.mp3/.wav) │
└──────┬───────┘
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 1: BATCH TRANSCRIPTION (via Transcriber Interface) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Transcriber Adapter (pluggable: AssemblyAI, Whisper, Google, AWS) │ ║
║ │ ├─ Parallel uploads (configurable concurrency) │ ║
║ │ ├─ Spanish language model │ ║
║ │ ├─ Speaker diarization (Agent vs Customer) │ ║
║ │ └─ Output: TranscriptContract (normalized) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/transcripts/{call_id}.json (TranscriptContract) ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 2: FEATURE EXTRACTION (OBSERVED ONLY) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Extracts ONLY measurable facts from the transcript: │ ║
║ │ ├─ Total duration │ ║
║ │ ├─ Agent vs customer talk share (ratio) │ ║
║ │ ├─ Silences > 5s (timestamp + duration) │ ║
║ │ ├─ Detected interruptions │ ║
║ │ ├─ Transfers (if detectable from audio/metadata) │ ║
║ │ └─ Literal keywords (no interpretation) │ ║
║ │ │ ║
║ │ Output: observed_features (100% verifiable) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/transcripts/{call_id}_features.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 3: PER-CALL INFERENCE (MAP) - Observed/Inferred Separation ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ LLM Analysis (GPT-4o-mini / Claude 3.5 Sonnet) │ ║
║ │ │ ║
║ │ LLM input: │ ║
║ │ ├─ Compressed transcript │ ║
║ │ ├─ observed_features (factual context) │ ║
║ │ └─ RCA taxonomy (closed enum) │ ║
║ │ │ ║
║ │ Structured output: │ ║
║ │ ├─ OBSERVED (pass-through, not inferred): │ ║
║ │ │ └─ observed_outcome (if explicit in the audio: "venta cerrada") │ ║
║ │ │ │ ║
║ │ ├─ INFERRED (with confidence + mandatory evidence): │ ║
║ │ │ ├─ intent: {code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ outcome: {code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ sentiment: {score, confidence, evidence_spans[]} │ ║
║ │ │ ├─ lost_sale_driver: {rca_code, confidence, evidence_spans[]} │ ║
║ │ │ ├─ poor_cx_driver: {rca_code, confidence, evidence_spans[]} │ ║
║ │ │ └─ agent_quality: {scores{}, confidence, evidence_spans[]} │ ║
║ │ │ │ ║
║ │ └─ JOURNEY_EVENTS (structured timeline): │ ║
║ │ └─ events[]: {type, t, source: observed|inferred} │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/processed/{call_id}_analysis.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 4: VALIDATION & QUALITY GATE ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Strict validation before aggregation: │ ║
║ │ ├─ Does every driver have evidence_spans? → If not, REJECT the driver │ ║
║ │ ├─ Is rca_code in the taxonomy? → If not, mark as OTHER_EMERGENT │ ║
║ │ ├─ Is confidence > threshold? → If not, flag as low_confidence │ ║
║ │ ├─ Does the schema version match? → If not, ERROR │ ║
║ │ └─ Do journey events have valid timestamps? │ ║
║ │ │ ║
║ │ Output: validated_analysis.json + validation_report.json │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 5: AGGREGATION (REDUCE) ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Statistical consolidation (validated data only): │ ║
║ │ ├─ Count per rca_code (closed taxonomy) │ ║
║ │ ├─ Confidence-weighted distributions │ ║
║ │ ├─ Split: high_confidence vs low_confidence │ ║
║ │ ├─ List of OTHER_EMERGENT drivers for manual review │ ║
║ │ ├─ Cross-tabs (intent × outcome × driver) │ ║
║ │ └─ Correlations observed_features ↔ inferred_outcomes │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/outputs/aggregated_stats.json ║
║ 📁 data/outputs/emergent_drivers_review.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 6: RCA TREE GENERATION ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Tree construction (deterministic, no LLM): │ ║
║ │ │ ║
║ │ 🔴 LOST SALES RCA TREE │ ║
║ │ └─ Lost Sales (N=1,250, 25%) │ ║
║ │ ├─ PRICING (45%, avg_conf=0.82) │ ║
║ │ │ ├─ TOO_EXPENSIVE (30%, n=375) │ ║
║ │ │ │ └─ evidence_samples: ["...", "..."] │ ║
║ │ │ └─ COMPETITOR_CHEAPER (15%, n=187) │ ║
║ │ │ └─ evidence_samples: ["...", "..."] │ ║
║ │ └─ ... │ ║
║ │ │ ║
║ │ Each node includes: │ ║
║ │ ├─ rca_code (from the enum) │ ║
║ │ ├─ count, pct │ ║
║ │ ├─ avg_confidence │ ║
║ │ ├─ evidence_samples[] (representative verbatims) │ ║
║ │ └─ call_ids[] (para drill-down) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ 📁 data/outputs/rca_lost_sales.json ║
║ 📁 data/outputs/rca_poor_cx.json ║
╚══════════════════════════════════════════════════════════════════════════════╝
│
▼
╔══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 7: EXECUTIVE REPORTING ║
║ ┌────────────────────────────────────────────────────────────────────────┐ ║
║ │ Output formats: │ ║
║ │ ├─ 📊 Streamlit Dashboard (with observed/inferred filter) │ ║
║ │ ├─ 📑 PDF Executive Summary (includes confidence disclaimers) │ ║
║ │ ├─ 📈 Excel with drill-down (links to evidence_spans) │ ║
║ │ └─ 🖼️ PNG RCA trees (with confidence legend) │ ║
║ └────────────────────────────────────────────────────────────────────────┘ ║
╚══════════════════════════════════════════════════════════════════════════════╝
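Module 5 lists its statistics without defining them. Below is a minimal sketch of the per-code counting. It assumes validated drivers arrive as dicts with `rca_code` and `confidence`, and it defaults to the 0.70 threshold reported as `confidence_threshold_used` in the RCA tree output. The function name and the `weighted_count`/`weighted_share` fields are hypothetical. "Confidence-weighted" here means each call contributes its confidence instead of 1.

```python
from collections import defaultdict


def aggregate_drivers(drivers: list[dict], threshold: float = 0.70) -> dict[str, dict]:
    """Count validated drivers per rca_code, with confidence weighting."""
    rows: dict[str, dict] = defaultdict(
        lambda: {"count": 0, "weighted_count": 0.0, "high_confidence": 0, "low_confidence": 0}
    )
    for d in drivers:
        row = rows[d["rca_code"]]
        row["count"] += 1
        row["weighted_count"] += d["confidence"]  # each call contributes its confidence
        # Below the threshold counts as low confidence, as in the validation rules.
        bucket = "high_confidence" if d["confidence"] >= threshold else "low_confidence"
        row[bucket] += 1

    total_weight = sum(r["weighted_count"] for r in rows.values()) or 1.0
    for row in rows.values():
        row["weighted_share"] = round(row["weighted_count"] / total_weight, 4)
    return dict(rows)
```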
Data Model (Updated)
TranscriptContract (Module 1 output)
{
"_meta": {
"schema_version": "1.0.0",
"transcriber": "assemblyai",
"transcriber_version": "2024-07",
"processed_at": "2024-01-15T10:30:00Z"
},
"call_id": "c001",
"observed": {
"duration_seconds": 245,
"language_detected": "es",
"speakers": [
{"id": "A", "label": "agent", "talk_time_pct": 0.45},
{"id": "B", "label": "customer", "talk_time_pct": 0.55}
],
"utterances": [
{
"speaker": "A",
"text": "Buenos días, gracias por llamar a Movistar...",
"start_ms": 0,
"end_ms": 3500
}
],
"detected_events": [
{"type": "SILENCE", "start_ms": 72000, "end_ms": 80000, "duration_ms": 8000},
{"type": "CROSSTALK", "start_ms": 45000, "end_ms": 46500}
]
}
}
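The observed features can be derived from the TranscriptContract utterances alone. This sketch computes talk share and silences longer than 5 s, as in Module 2. It assumes talk share is measured against total speech time, so the shares sum to about 1 as in the 0.45/0.55 example. Speakers stay as diarization IDs ("A", "B"); mapping them to agent/customer would use the `speakers` list. The function name is hypothetical.

```python
def extract_observed_features(utterances: list[dict], min_silence_ms: int = 5000) -> dict:
    """Derive talk share and silences from diarized utterances (observed only)."""
    talk_ms: dict[str, int] = {}
    for u in utterances:
        talk_ms[u["speaker"]] = talk_ms.get(u["speaker"], 0) + (u["end_ms"] - u["start_ms"])
    total_talk = sum(talk_ms.values()) or 1

    silences = []
    ordered = sorted(utterances, key=lambda u: u["start_ms"])
    last_end = ordered[0]["end_ms"] if ordered else 0
    for u in ordered[1:]:
        gap = u["start_ms"] - last_end
        if gap > min_silence_ms:  # only gaps strictly longer than the threshold
            silences.append({"start_ms": last_end, "end_ms": u["start_ms"], "duration_ms": gap})
        last_end = max(last_end, u["end_ms"])  # overlapping speech does not reset the gap
    return {
        "talk_pct": {s: round(ms / total_talk, 2) for s, ms in talk_ms.items()},
        "silence_events": silences,
    }
```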
CallAnalysis (Module 3 output) - WITH OBSERVED/INFERRED SEPARATION
{
"_meta": {
"schema_version": "1.0.0",
"prompt_version": "call_analysis_v1.2",
"model": "gpt-4o-mini",
"model_version": "2024-07-18",
"processed_at": "2024-01-15T10:35:00Z"
},
"call_id": "c001",
"observed": {
"duration_seconds": 245,
"agent_talk_pct": 0.45,
"customer_talk_pct": 0.55,
"silence_total_seconds": 38,
"silence_events": [
{"start": "01:12", "end": "01:20", "duration_s": 8}
],
"transfer_count": 0,
"hold_events": [
{"start": "02:14", "end": "03:52", "duration_s": 98}
],
"explicit_outcome": null
},
"inferred": {
"intent": {
"code": "SALES_INQUIRY",
"confidence": 0.91,
"evidence_spans": [
{"start": "00:15", "end": "00:28", "text": "Quería información sobre la fibra de 600 megas"}
]
},
"outcome": {
"code": "SALE_LOST",
"confidence": 0.85,
"evidence_spans": [
{"start": "05:40", "end": "05:52", "text": "Lo voy a pensar y ya les llamo yo"}
]
},
"sentiment": {
"overall_score": -0.3,
"evolution": [
{"segment": "start", "score": 0.2},
{"segment": "middle", "score": -0.1},
{"segment": "end", "score": -0.6}
],
"confidence": 0.78,
"evidence_spans": [
{"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
]
},
"lost_sale_driver": {
"rca_code": "PRICING_TOO_EXPENSIVE",
"confidence": 0.83,
"evidence_spans": [
{"start": "03:55", "end": "04:08", "text": "59 euros al mes es mucho dinero"},
{"start": "04:10", "end": "04:25", "text": "Es que me parece carísimo, la verdad"}
],
"secondary_driver": {
"rca_code": "PRICING_COMPETITOR_CHEAPER",
"confidence": 0.71,
"evidence_spans": [
{"start": "04:30", "end": "04:45", "text": "En Vodafone me lo dejan por 45"}
]
}
},
"poor_cx_driver": {
"rca_code": "WAIT_HOLD_LONG",
"confidence": 0.77,
"evidence_spans": [
{"start": "02:14", "end": "03:52", "text": "[hold - 98 segundos]", "source": "observed"}
]
},
"agent_quality": {
"overall_score": 6,
"dimensions": {
"empathy": 7,
"product_knowledge": 8,
"objection_handling": 4,
"closing_skills": 5
},
"confidence": 0.72,
"evidence_spans": [
{"start": "04:50", "end": "05:10", "text": "Bueno, es el precio que tenemos...", "dimension": "objection_handling"}
]
},
"summary": "Cliente interesado en fibra 600Mb abandona por precio (59€) comparando con Vodafone (45€). Hold largo de 98s. Agente no rebatió objeción de precio."
},
"journey_events": [
{"type": "CALL_START", "t": "00:00", "source": "observed"},
{"type": "GREETING", "t": "00:03", "source": "observed"},
{"type": "INTENT_STATED", "t": "00:15", "source": "inferred"},
{"type": "HOLD_START", "t": "02:14", "source": "observed"},
{"type": "HOLD_END", "t": "03:52", "source": "observed"},
{"type": "PRICE_OBJECTION", "t": "03:55", "source": "inferred"},
{"type": "NEGATIVE_SENTIMENT_PEAK", "t": "04:10", "source": "inferred"},
{"type": "COMPETITOR_MENTION", "t": "04:30", "source": "inferred"},
{"type": "SOFT_DECLINE", "t": "05:40", "source": "inferred"},
{"type": "CALL_END", "t": "06:07", "source": "observed"}
]
}
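The two models use different time formats: TranscriptContract uses milliseconds, while CallAnalysis evidence spans use `"MM:SS"` strings. Drill-down from an evidence span back to the transcript therefore needs a conversion. A minimal sketch with hypothetical helper names, assuming utterances are dicts with `start_ms`/`end_ms`:

```python
def mmss_to_ms(t: str) -> int:
    """Convert an "MM:SS" offset into milliseconds."""
    minutes, seconds = t.split(":")
    return (int(minutes) * 60 + int(seconds)) * 1000


def utterances_in_span(utterances: list[dict], span: dict) -> list[dict]:
    """Return utterances that overlap an evidence span given as MM:SS strings."""
    start, end = mmss_to_ms(span["start"]), mmss_to_ms(span["end"])
    # Half-open overlap test: utterances that only touch the span's edges are excluded.
    return [u for u in utterances if u["start_ms"] < end and u["end_ms"] > start]
```

The same lookup could also check that a quoted `text` actually appears in the overlapping utterances.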
RCA Tree Node (Module 6 output)
{
"_meta": {
"schema_version": "1.0.0",
"generated_at": "2024-01-15T11:00:00Z",
"taxonomy_version": "rca_taxonomy_v1.0",
"total_calls_analyzed": 5000,
"confidence_threshold_used": 0.70
},
"tree_type": "lost_sales",
"total_affected": {
"count": 1250,
"pct_of_total": 25.0
},
"root": {
"label": "Lost Sales",
"children": [
{
"rca_code": "PRICING",
"label": "Pricing Issues",
"count": 562,
"pct_of_parent": 45.0,
"avg_confidence": 0.82,
"children": [
{
"rca_code": "PRICING_TOO_EXPENSIVE",
"label": "Too Expensive",
"count": 375,
"pct_of_parent": 66.7,
"avg_confidence": 0.84,
"evidence_samples": [
{"call_id": "c001", "text": "59 euros al mes es mucho dinero", "t": "03:55"},
{"call_id": "c042", "text": "No puedo pagar tanto", "t": "02:30"}
],
"call_ids": ["c001", "c042", "c078", "..."]
},
{
"rca_code": "PRICING_COMPETITOR_CHEAPER",
"label": "Competitor Cheaper",
"count": 187,
"pct_of_parent": 33.3,
"avg_confidence": 0.79,
"evidence_samples": [
{"call_id": "c001", "text": "En Vodafone me lo dejan por 45", "t": "04:30"}
],
"call_ids": ["c001", "c015", "..."]
}
]
}
]
},
"other_emergent": [
{
"proposed_label": "agent_rushed_due_to_queue_pressure",
"count": 23,
"evidence_samples": [
{"call_id": "c234", "text": "Perdona que voy con prisa que hay cola", "t": "01:15"}
],
"recommendation": "Considerar añadir a taxonomía v1.1"
}
]
}
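Tree building is deterministic, so it reduces to counting. This sketch groups leaf codes under their cluster and computes `pct_of_parent` the way the example does (375 / 562 → 66.7). It assumes a leaf-to-cluster mapping derived from the taxonomy. Fields such as `avg_confidence`, `evidence_samples`, and `call_ids` are left out, and `build_rca_tree` is a hypothetical name.

```python
from collections import Counter


def build_rca_tree(label: str, driver_codes: list[str], cluster_of: dict[str, str]) -> dict:
    """Group leaf rca_codes under their cluster and compute pct_of_parent."""
    leaf_counts = Counter(driver_codes)
    cluster_counts: Counter = Counter()
    for code, n in leaf_counts.items():
        cluster_counts[cluster_of[code]] += n  # codes were validated upstream
    total = sum(cluster_counts.values())

    children = []
    for cluster, c_count in cluster_counts.most_common():
        leaves = [
            {"rca_code": code, "count": n, "pct_of_parent": round(100 * n / c_count, 1)}
            for code, n in leaf_counts.most_common()
            if cluster_of[code] == cluster
        ]
        children.append({"rca_code": cluster, "count": c_count,
                         "pct_of_parent": round(100 * c_count / total, 1),
                         "children": leaves})
    return {"label": label, "count": total, "children": children}
```

Because the output depends only on the input counts (ties keep first-seen order), rerunning it on the same validated data yields the same tree. No LLM is involved.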
RCA Taxonomy (config/rca_taxonomy.yaml)
# config/rca_taxonomy.yaml
# Version: 1.0.0
# Last updated: 2024-01-15
_meta:
  version: "1.0.0"
  author: "CXInsights Team"
  description: "Closed taxonomy for RCA classification. Only these codes are valid."
# ============================================================================
# INTENTS (reason for the call)
# ============================================================================
intents:
  - SALES_INQUIRY       # Sales inquiry
  - SALES_UPGRADE       # Product upgrade
  - SUPPORT_TECHNICAL   # Technical support
  - SUPPORT_BILLING     # Billing inquiry
  - COMPLAINT           # Complaint/claim
  - CANCELLATION        # Cancellation request
  - GENERAL_INQUIRY     # General inquiry
  - OTHER_EMERGENT      # Captures new intents
# ============================================================================
# OUTCOMES (result of the call)
# ============================================================================
outcomes:
  - SALE_COMPLETED      # Sale closed
  - SALE_LOST           # Sale lost
  - ISSUE_RESOLVED      # Issue resolved
  - ISSUE_UNRESOLVED    # Issue not resolved
  - ESCALATED           # Escalated to a supervisor/another department
  - CALLBACK_SCHEDULED  # Callback scheduled
  - OTHER_EMERGENT
# ============================================================================
# LOST SALE DRIVERS (why the sale was lost)
# ============================================================================
lost_sale_drivers:
  # Pricing cluster
  PRICING:
    - PRICING_TOO_EXPENSIVE         # "Es muy caro" (it's too expensive)
    - PRICING_COMPETITOR_CHEAPER    # "En X me lo dan más barato" (X offers it cheaper)
    - PRICING_NO_DISCOUNT           # No discount was offered
    - PRICING_PAYMENT_TERMS         # Unacceptable payment terms
  # Product fit cluster
  PRODUCT_FIT:
    - PRODUCT_FEATURE_MISSING       # Required feature is missing
    - PRODUCT_WRONG_OFFERED         # Wrong product was offered
    - PRODUCT_COVERAGE_AREA         # No coverage in the customer's area
    - PRODUCT_TECH_REQUIREMENTS     # Does not meet technical requirements
  # Process cluster
  PROCESS:
    - PROCESS_TOO_COMPLEX           # Process too complicated
    - PROCESS_DOCUMENTATION         # Requires too much documentation
    - PROCESS_ACTIVATION_TIME       # Long activation time
    - PROCESS_CONTRACT_TERMS        # Unacceptable contract terms
  # Agent cluster
  AGENT:
    - AGENT_COULDNT_CLOSE           # Did not close the sale
    - AGENT_POOR_OBJECTION          # Poor objection handling
    - AGENT_LACK_URGENCY            # Did not create urgency
    - AGENT_MISSED_UPSELL           # Missed an upsell opportunity
  # Timing cluster
  TIMING:
    - TIMING_NOT_READY              # Customer not ready
    - TIMING_COMPARING              # Comparing options
    - TIMING_BUDGET_PENDING         # Budget pending
  # Catch-all
  OTHER_EMERGENT: []
# ============================================================================
# POOR CX DRIVERS (why the experience was poor)
# ============================================================================
poor_cx_drivers:
  # Wait time cluster
  WAIT_TIME:
    - WAIT_INITIAL_LONG             # Long initial wait (>2 min)
    - WAIT_HOLD_LONG                # Long hold during the call (>1 min)
    - WAIT_CALLBACK_NEVER           # Promised callback never came
  # Resolution cluster
  RESOLUTION:
    - RESOLUTION_NOT_ACHIEVED       # Issue not resolved
    - RESOLUTION_NEEDED_ESCALATION  # Needed escalation
    - RESOLUTION_CALLBACK_BROKEN    # Promised callback not honored
    - RESOLUTION_INCORRECT          # Incorrect resolution
  # Agent behavior cluster
  AGENT_BEHAVIOR:
    - AGENT_LACK_EMPATHY            # Lack of empathy
    - AGENT_RUDE                    # Rude/dismissive
    - AGENT_RUSHED                  # Rushed
    - AGENT_NOT_LISTENING           # Not listening
  # Information cluster
  INFORMATION:
    - INFO_WRONG_GIVEN              # Incorrect information given
    - INFO_INCONSISTENT             # Inconsistent information
    - INFO_COULDNT_ANSWER           # Could not answer
  # Process/System cluster
  PROCESS_SYSTEM:
    - SYSTEM_DOWN                   # System down
    - POLICY_LIMITATION             # Policy limitation
    - TOO_MANY_TRANSFERS            # Too many transfers
    - AUTH_ISSUES                   # Authentication problems
  # Catch-all
  OTHER_EMERGENT: []
# ============================================================================
# JOURNEY EVENT TYPES (timeline events)
# ============================================================================
journey_event_types:
  # Observed (come from the audio/STT)
  observed:
    - CALL_START
    - CALL_END
    - GREETING
    - SILENCE        # >5 seconds
    - HOLD_START
    - HOLD_END
    - TRANSFER
    - CROSSTALK      # Both parties speak at once
  # Inferred (come from the LLM)
  inferred:
    - INTENT_STATED
    - PRICE_OBJECTION
    - COMPETITOR_MENTION
    - NEGATIVE_SENTIMENT_PEAK
    - POSITIVE_SENTIMENT_PEAK
    - RESOLUTION_ATTEMPT
    - SOFT_DECLINE
    - HARD_DECLINE
    - COMMITMENT
    - ESCALATION_REQUEST
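Validation and tree building both need the taxonomy in two shapes: flat sections as sets, and clustered sections as a leaf-to-cluster map. A minimal sketch, assuming the file has already been parsed into a nested dict (for example with PyYAML's `yaml.safe_load`). The helper names are hypothetical, and the duplicate check is an added safeguard so that each code maps to exactly one cluster.

```python
def flat_codes(tax: dict, section: str) -> set[str]:
    """Valid codes of a flat section such as intents or outcomes."""
    return set(tax[section])


def index_taxonomy(tax: dict, section: str) -> dict[str, str]:
    """Map every leaf code of a clustered section to its cluster name."""
    cluster_of: dict[str, str] = {}
    for cluster, codes in tax[section].items():
        for code in codes or []:  # OTHER_EMERGENT: [] contributes no leaves
            if code in cluster_of:
                raise ValueError(f"Duplicate code {code} in {cluster} and {cluster_of[code]}")
            cluster_of[code] = cluster
    return cluster_of
```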
Component Diagram (Updated)
┌─────────────────────────────────────────────────────────────────────────────┐
│ CXINSIGHTS COMPONENTS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ TRANSCRIBER INTERFACE (Adapter Pattern) │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ │
│ │ │ AssemblyAI │ │ Whisper │ │ Google STT │ │ AWS │ │ │
│ │ │ Transcriber │ │ Transcriber │ │ Transcriber │ │ Transcribe │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └─────┬──────┘ │ │
│ │ └────────────────┴────────────────┴───────────────┘ │ │
│ │ ▼ │ │
│ │ TranscriptContract (normalized output) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Feature │ │ Inference │ │ Validation │ │
│ │ Extractor │───▶│ Service │───▶│ Gate │ │
│ │ (observed only) │ │ (observed/infer)│ │ (evidence check)│ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGGREGATION LAYER │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Stats Engine │ │ RCA Builder │ │ Emergent │ │ │
│ │ │ (by rca_code)│ │(deterministic│ │ Collector │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VISUALIZATION LAYER │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Dashboard │ │ PDF │ │ Excel │ │ PNG │ │ │
│ │ │(obs/infer) │ │ (disclaim) │ │(drill-down)│ │ (legend) │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ CONFIG LAYER │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │ │
│ │ │ rca_taxonomy │ │ prompts/ + │ │ settings │ │ │
│ │ │ v1.0 (enum) │ │ VERSION FILE │ │ (.env) │ │ │
│ │ └────────────────┘ └────────────────┘ └────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Validation Rules (Quality Gate)
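# Validation sketch. The shipped checks are built into the models (see
# Implementation Status); here a CallAnalysis is a dict parsed from its JSON.
import re
from dataclasses import dataclass, field

# Placeholder values: the real ones come from config/rca_taxonomy.yaml and settings.
TAXONOMY = {"PRICING_TOO_EXPENSIVE", "PRICING_COMPETITOR_CHEAPER", "WAIT_HOLD_LONG"}
CONFIDENCE_THRESHOLD = 0.70
EXPECTED_SCHEMA_VERSION = "1.0.0"
TIMESTAMP_RE = re.compile(r"^\d{2,}:[0-5]\d$")  # "MM:SS"


def is_valid_timestamp(t) -> bool:
    return isinstance(t, str) and TIMESTAMP_RE.match(t) is not None


@dataclass
class ValidationResult:
    valid: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)


def validate_call_analysis(analysis: dict) -> ValidationResult:
    errors, warnings = [], []
    inferred = analysis.get("inferred", {})

    for key in ("lost_sale_driver", "poor_cx_driver"):
        driver = inferred.get(key)
        if not driver:
            continue  # no driver of this kind: nothing to check
        code = driver.get("rca_code")
        # RULE 1: every driver must have evidence_spans
        if not driver.get("evidence_spans"):
            errors.append(f"Driver {code} has no evidence_spans → REJECTED")
        # RULE 2: rca_code must be in the taxonomy (OTHER_EMERGENT needs a label)
        if code == "OTHER_EMERGENT":
            if not driver.get("proposed_label"):
                errors.append("OTHER_EMERGENT requires proposed_label")
        elif code not in TAXONOMY:
            errors.append(f"rca_code {code} is not in the taxonomy")
        # RULE 3: minimum confidence (a warning, not a rejection)
        confidence = driver.get("confidence", 0.0)
        if confidence < CONFIDENCE_THRESHOLD:
            warnings.append(f"Driver {code} has low confidence: {confidence}")

    # RULE 4: schema version must match
    version = analysis.get("_meta", {}).get("schema_version")
    if version != EXPECTED_SCHEMA_VERSION:
        errors.append(f"Schema mismatch: {version}")

    # RULE 5: journey events must have valid timestamps
    for event in analysis.get("journey_events", []):
        if not is_valid_timestamp(event.get("t")):
            errors.append(f"Invalid timestamp in event: {event}")

    return ValidationResult(valid=not errors, errors=errors, warnings=warnings)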
Prompt Versioning
config/prompts/
├── versions.yaml # Version registry
├── call_analysis/
│ ├── v1.0/
│ │ ├── system.txt
│ │ ├── user.txt
│ │ └── schema.json # Expected JSON Schema
│ ├── v1.1/
│ │ ├── system.txt
│ │ ├── user.txt
│ │ └── schema.json
│ └── v1.2/ # Current
│ ├── system.txt
│ ├── user.txt
│ └── schema.json
└── rca_synthesis/
└── v1.0/
├── system.txt
└── user.txt
# config/prompts/versions.yaml
current:
  call_analysis: "v1.2"
  rca_synthesis: "v1.0"
history:
  call_analysis:
    v1.0: "2024-01-01"
    v1.1: "2024-01-10"  # Added secondary_driver support
    v1.2: "2024-01-15"  # Added journey_events structure
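A sketch of resolving the current prompt from the registry, assuming `versions.yaml` has been parsed into a dict and the directory layout shown above. The tag format `f"{task}_{version}"` is an assumption chosen to match `"prompt_version": "call_analysis_v1.2"` in `_meta`. Both function names are hypothetical.

```python
from pathlib import Path


def resolve_prompt(versions: dict, task: str,
                   root: Path = Path("config/prompts")) -> tuple[Path, str]:
    """Return (prompt directory, prompt_version tag) for the current version."""
    version = versions["current"][task]  # e.g. "v1.2"
    return root / task / version, f"{task}_{version}"


def load_prompt(prompt_dir: Path) -> dict[str, str]:
    """Read system.txt and user.txt from a versioned prompt directory."""
    return {
        name: (prompt_dir / f"{name}.txt").read_text(encoding="utf-8")
        for name in ("system", "user")
    }
```

Resolving only through `current` means a prompt change is a one-line edit in `versions.yaml`, and the same string ends up in every output's `_meta`.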
Estimates
Total Time (5,000 calls, ~4 min average)
| Stage | Estimated Time |
|---|---|
| Transcription | 3-4 hours |
| Feature Extraction | 15 min |
| Inference | 2-3 hours |
| Validation | 10 min |
| Aggregation | 10 min |
| RCA Tree Build | 5 min |
| Reporting | 5 min |
| Total | 6-8 hours |
Costs (see TECH_STACK.md for details)
| Volumen | Transcription | Inference | Total |
|---|---|---|---|
| 5,000 calls | ~$300 | ~$15 | ~$315 |
| 20,000 calls | ~$1,200 | ~$60 | ~$1,260 |
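The cost table scales linearly with volume. Dividing the 5,000-call row gives roughly $0.06 per call for transcription and $0.003 per call for inference, or about $0.015 per audio minute at ~4 minutes per call. The sketch below just reapplies those implied rates; it is not a pricing model taken from TECH_STACK.md.

```python
# Rates implied by the 5,000-call row of the table (not quoted provider prices).
TRANSCRIPTION_PER_CALL = 300 / 5_000  # ≈ $0.06 per ~4-minute call
INFERENCE_PER_CALL = 15 / 5_000       # ≈ $0.003 per call


def estimate_cost(n_calls: int) -> dict[str, float]:
    """Linear cost estimate in USD, rounded to cents."""
    transcription = n_calls * TRANSCRIPTION_PER_CALL
    inference = n_calls * INFERENCE_PER_CALL
    return {
        "transcription": round(transcription, 2),
        "inference": round(inference, 2),
        "total": round(transcription + inference, 2),
    }
```

Transcription accounts for about 95% of the total, so audio minutes, not LLM tokens, are the main cost lever.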
Implementation Status (2026-01-19)
| Module | Status | Location |
|---|---|---|
| Transcription | ✅ Done | src/transcription/ |
| Feature Extraction | ✅ Done | src/features/ |
| Compression | ✅ Done | src/compression/ |
| Inference | ✅ Done | src/inference/ |
| Validation | ✅ Done | Built into models |
| Aggregation | ✅ Done | src/aggregation/ |
| RCA Trees | ✅ Done | src/aggregation/rca_tree.py |
| Pipeline | ✅ Done | src/pipeline/ |
| Exports | ✅ Done | src/exports/ |
| CLI | ✅ Done | cli.py |
Last updated: 2026-01-19 | Version: 1.0.0