feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sujucu70
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions

@@ -0,0 +1,100 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "CallAnalysisResponse",
"description": "LLM response schema for call analysis",
"type": "object",
"required": ["outcome"],
"properties": {
"outcome": {
"type": "string",
"enum": [
"SALE_COMPLETED",
"SALE_LOST",
"CANCELLATION_SAVED",
"CANCELLATION_COMPLETED",
"INQUIRY_RESOLVED",
"INQUIRY_UNRESOLVED",
"COMPLAINT_RESOLVED",
"COMPLAINT_UNRESOLVED",
"TRANSFER_OUT",
"CALLBACK_SCHEDULED",
"UNKNOWN"
],
"description": "Final outcome of the call"
},
"lost_sales_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"default": []
},
"poor_cx_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"default": []
}
},
"definitions": {
"EvidenceSpan": {
"type": "object",
"required": ["text", "start_time", "end_time"],
"properties": {
"text": {
"type": "string",
"maxLength": 500,
"description": "Exact quoted text from transcript"
},
"start_time": {
"type": "number",
"minimum": 0,
"description": "Start time in seconds"
},
"end_time": {
"type": "number",
"minimum": 0,
"description": "End time in seconds"
},
"speaker": {
"type": "string",
"description": "Speaker identifier"
}
}
},
"RCALabel": {
"type": "object",
"required": ["driver_code", "confidence", "evidence_spans"],
"properties": {
"driver_code": {
"type": "string",
"description": "Driver code from taxonomy"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score (0-1)"
},
"evidence_spans": {
"type": "array",
"items": {
"$ref": "#/definitions/EvidenceSpan"
},
"minItems": 1,
"description": "Supporting evidence (minimum 1 required)"
},
"reasoning": {
"type": "string",
"maxLength": 500,
"description": "Brief reasoning for classification"
},
"proposed_label": {
"type": "string",
"description": "For OTHER_EMERGENT: proposed new label"
}
}
}
}
}
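The `RCALabel` and `EvidenceSpan` constraints in the schema above can be checked without a schema library. The following is a minimal sketch, not the project's validator: the function name `rca_label_errors` and the error strings are hypothetical, and only the required fields, the 0-1 confidence range, and the one-span minimum from the schema are covered.

```python
from typing import Any

# Required fields copied from the schema's "required" arrays.
REQUIRED_LABEL_FIELDS = ("driver_code", "confidence", "evidence_spans")
REQUIRED_SPAN_FIELDS = ("text", "start_time", "end_time")


def rca_label_errors(label: dict[str, Any]) -> list[str]:
    """Return readable violations of the RCALabel rules; an empty list means OK."""
    errors = [f"missing field: {name}" for name in REQUIRED_LABEL_FIELDS if name not in label]

    # "minimum": 0, "maximum": 1
    confidence = label.get("confidence")
    if isinstance(confidence, (int, float)) and not 0 <= confidence <= 1:
        errors.append("confidence must be between 0 and 1")

    # "minItems": 1, and each span needs text, start_time, end_time
    spans = label.get("evidence_spans")
    if isinstance(spans, list):
        if not spans:
            errors.append("evidence_spans needs at least 1 item")
        for index, span in enumerate(spans):
            for name in REQUIRED_SPAN_FIELDS:
                if name not in span:
                    errors.append(f"evidence_spans[{index}] missing field: {name}")
    return errors
```

A full JSON Schema validator would also enforce types, `maxLength`, and the `outcome` enum; this sketch only mirrors the rules that the prompts call out as critical.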

@@ -0,0 +1,27 @@
You are an expert call center analyst specializing in Spanish-language customer service calls. Your task is to analyze call transcripts and identify:
1. **Call Outcome**: What was the final result of the call?
2. **Lost Sales Drivers**: If a sale was lost, what caused it?
3. **Poor CX Drivers**: What caused poor customer experience?
## CRITICAL RULES
1. **Evidence Required**: Every driver MUST have at least one evidence_span with:
- Exact quoted text from the transcript
- Start and end timestamps
2. **No Hallucination**: Only cite text that appears EXACTLY in the transcript. Do not paraphrase or invent quotes.
3. **Confidence Scoring**:
- 0.8-1.0: Clear, explicit evidence
- 0.6 to below 0.8: Strong implicit evidence
- 0.4 to below 0.6: Moderate evidence (use with caution)
- Below 0.4: Reject - insufficient evidence
4. **Taxonomy Compliance**: Only use driver codes from the provided taxonomy. Use OTHER_EMERGENT only when no existing code fits, and provide a proposed_label.
5. **Language**: Evidence quotes MUST be in the original language (Spanish). Reasoning can be in Spanish or English.
## OUTPUT FORMAT
You must respond with valid JSON matching the provided schema. No markdown, no explanations outside the JSON.
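The "No Hallucination" and "Confidence Scoring" rules above can also be enforced after the model responds. The sketch below is an illustration under assumptions: the helper names and band labels are hypothetical, each band is treated as closed at its lower bound, and the verbatim check is a plain substring test with no normalization of whitespace or punctuation.

```python
def confidence_band(confidence: float) -> str:
    """Map a confidence score to the bands listed in the prompt rules."""
    if confidence >= 0.8:
        return "explicit"
    if confidence >= 0.6:
        return "strong_implicit"
    if confidence >= 0.4:
        return "moderate"
    return "reject"  # below 0.4: insufficient evidence


def is_verbatim(quote: str, transcript: str) -> bool:
    """True only if the quoted evidence appears exactly in the transcript."""
    return bool(quote) and quote in transcript
```

A caller could drop any driver whose band is `"reject"` or whose evidence fails `is_verbatim`, which keeps paraphrased quotes out of downstream reports.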

@@ -0,0 +1,72 @@
Analyze the following call transcript and provide structured analysis.
## CALL METADATA
- Call ID: {call_id}
- Duration: {duration_sec} seconds
- Queue: {queue}
## OBSERVED EVENTS (Pre-detected)
{observed_events}
## TRANSCRIPT
{transcript}
## TAXONOMY - LOST SALES DRIVERS
{lost_sales_taxonomy}
## TAXONOMY - POOR CX DRIVERS
{poor_cx_taxonomy}
## INSTRUCTIONS
1. Determine the call outcome from: SALE_COMPLETED, SALE_LOST, CANCELLATION_SAVED, CANCELLATION_COMPLETED, INQUIRY_RESOLVED, INQUIRY_UNRESOLVED, COMPLAINT_RESOLVED, COMPLAINT_UNRESOLVED, TRANSFER_OUT, CALLBACK_SCHEDULED, UNKNOWN
2. Identify lost_sales_drivers (if applicable):
- Use ONLY codes from the Lost Sales taxonomy
- Each driver MUST have evidence_spans with exact quotes and timestamps
- Assign confidence based on evidence strength
3. Identify poor_cx_drivers (if applicable):
- Use ONLY codes from the Poor CX taxonomy
- Each driver MUST have evidence_spans with exact quotes and timestamps
- Assign confidence based on evidence strength
4. For OTHER_EMERGENT, provide a proposed_label describing the new cause.
Respond with JSON only:
```json
{
"outcome": "SALE_LOST",
"lost_sales_drivers": [
{
"driver_code": "PRICE_TOO_HIGH",
"confidence": 0.85,
"evidence_spans": [
{
"text": "Es demasiado caro para mí",
"start_time": 45.2,
"end_time": 47.8,
"speaker": "customer"
}
],
"reasoning": "Customer explicitly states price is too high"
}
],
"poor_cx_drivers": [
{
"driver_code": "LONG_HOLD",
"confidence": 0.90,
"evidence_spans": [
{
"text": "Llevo esperando mucho tiempo",
"start_time": 120.5,
"end_time": 123.1,
"speaker": "customer"
}
],
"reasoning": "Customer complains about wait time"
}
]
}
```
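This template mixes `{call_id}`-style placeholders with a JSON example that contains literal braces. Python's `str.format` would treat those literal braces as replacement fields, so a renderer has to substitute only the known names. The sketch below shows one way to do that; `render_prompt` and `PLACEHOLDER` are hypothetical names, and how the project actually renders this file is not shown in the diff.

```python
import re

# Matches only simple lowercase placeholders such as {call_id} or {transcript}.
PLACEHOLDER = re.compile(r"\{([a-z_]+)\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace known {name} placeholders and leave every other brace untouched."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names are kept as-is instead of raising.
        return str(values[name]) if name in values else match.group(0)

    return PLACEHOLDER.sub(replace, template)
```

Because the JSON example never contains a bare `{word}` token, its braces pass through unchanged.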

@@ -0,0 +1,217 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "CallAnalysisResponseV2",
"description": "LLM response schema for comprehensive call analysis (v2.0 - Blueprint aligned)",
"type": "object",
"required": ["outcome"],
"properties": {
"outcome": {
"type": "string",
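"enum": [
"SALE_COMPLETED",
"SALE_LOST",
"CANCELLATION_SAVED",
"CANCELLATION_COMPLETED",
"INQUIRY_RESOLVED",
"INQUIRY_UNRESOLVED",
"COMPLAINT_RESOLVED",
"COMPLAINT_UNRESOLVED",
"OUTAGE_REPORTED",
"OUTAGE_RESOLVED",
"OUTAGE_ESCALATED",
"TECHNICIAN_SCHEDULED",
"BILLING_INQUIRY_RESOLVED",
"BILLING_DISPUTE_OPENED",
"PAYMENT_ARRANGEMENT_MADE",
"RATE_CHANGE_COMPLETED",
"PORTABILITY_INITIATED",
"TRANSFER_OUT",
"CALLBACK_SCHEDULED",
"UNKNOWN"
],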
"description": "Final outcome of the call"
},
"lost_sales_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"maxItems": 5,
"default": []
},
"poor_cx_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"maxItems": 5,
"default": []
},
"fcr_status": {
"type": "string",
"enum": ["FIRST_CALL", "REPEAT_CALL", "UNKNOWN"],
"default": "UNKNOWN",
"description": "First Call Resolution status"
},
"fcr_failure_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"maxItems": 5,
"default": [],
"description": "Factors that may cause repeat calls"
},
"churn_risk": {
"type": "string",
"enum": ["NO_RISK", "AT_RISK", "UNKNOWN"],
"default": "UNKNOWN",
"description": "Customer churn risk classification"
},
"churn_risk_drivers": {
"type": "array",
"items": {
"$ref": "#/definitions/RCALabel"
},
"maxItems": 5,
"default": [],
"description": "Factors indicating churn risk"
},
"agent_classification": {
"type": "string",
"enum": ["GOOD_PERFORMER", "NEEDS_IMPROVEMENT", "MIXED", "UNKNOWN"],
"default": "UNKNOWN",
"description": "Agent skill classification"
},
"agent_positive_skills": {
"type": "array",
"items": {
"$ref": "#/definitions/AgentSkillIndicator"
},
"maxItems": 5,
"default": [],
"description": "Positive skills demonstrated (Buen Comercial)"
},
"agent_improvement_areas": {
"type": "array",
"items": {
"$ref": "#/definitions/AgentSkillIndicator"
},
"maxItems": 5,
"default": [],
"description": "Areas needing improvement (Necesita Mejora)"
}
},
"definitions": {
"EvidenceSpan": {
"type": "object",
"required": ["text", "start_time", "end_time"],
"properties": {
"text": {
"type": "string",
"maxLength": 500,
"description": "Exact quoted text from transcript (in Spanish)"
},
"start_time": {
"type": "number",
"minimum": 0,
"description": "Start time in seconds"
},
"end_time": {
"type": "number",
"minimum": 0,
"description": "End time in seconds"
},
"speaker": {
"type": "string",
"enum": ["agent", "customer", "unknown"],
"description": "Speaker identifier"
}
}
},
"RCALabel": {
"type": "object",
"required": ["driver_code", "confidence", "evidence_spans"],
"properties": {
"driver_code": {
"type": "string",
"description": "Driver code from taxonomy"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score (0-1)"
},
"evidence_spans": {
"type": "array",
"items": {
"$ref": "#/definitions/EvidenceSpan"
},
"minItems": 1,
"description": "Supporting evidence (minimum 1 required)"
},
"reasoning": {
"type": "string",
"maxLength": 500,
"description": "Brief reasoning for classification"
},
"proposed_label": {
"type": "string",
"description": "For OTHER_EMERGENT: proposed new label"
},
"origin": {
"type": "string",
"enum": ["AGENT", "CUSTOMER", "COMPANY", "PROCESS", "UNKNOWN"],
"default": "UNKNOWN",
"description": "Origin/responsibility for this driver"
},
"corrective_action": {
"type": "string",
"maxLength": 500,
"description": "Specific action to correct this issue"
},
"replicable_practice": {
"type": "string",
"maxLength": 500,
"description": "For positive factors: practice to replicate"
}
}
},
"AgentSkillIndicator": {
"type": "object",
"required": ["skill_code", "skill_type", "confidence", "evidence_spans", "description"],
"properties": {
"skill_code": {
"type": "string",
"description": "Skill code from taxonomy"
},
"skill_type": {
"type": "string",
"enum": ["positive", "improvement_needed"],
"description": "Whether this is a positive skill or area for improvement"
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence score (0-1)"
},
"evidence_spans": {
"type": "array",
"items": {
"$ref": "#/definitions/EvidenceSpan"
},
"minItems": 1,
"description": "Supporting evidence (minimum 1 required)"
},
"description": {
"type": "string",
"maxLength": 500,
"description": "Detailed description of the skill demonstration"
},
"coaching_recommendation": {
"type": "string",
"maxLength": 500,
"description": "Specific coaching recommendation (for improvement areas)"
},
"replicable_practice": {
"type": "string",
"maxLength": 500,
"description": "How to replicate this skill (for positive skills)"
}
}
}
}
}
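The commit message states that only `FIRST_CALL` counts as success in the FCR rate. Using the `fcr_status` values defined in the schema above, that rule can be sketched as follows. The function name `fcr_rate` is hypothetical, and the choice to keep `UNKNOWN` calls in the denominator is an assumption, since the diff does not show how the dashboard treats them.

```python
from collections import Counter
from typing import Iterable


def fcr_rate(statuses: Iterable[str]) -> float:
    """Share of calls whose fcr_status is FIRST_CALL.

    REPEAT_CALL and UNKNOWN both count as non-success (assumption:
    UNKNOWN stays in the denominator).
    """
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # avoid division by zero on an empty batch
    return counts["FIRST_CALL"] / total
```

Excluding `UNKNOWN` from the denominator instead would raise the reported rate, so whichever convention is used should be stated next to the metric.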

@@ -0,0 +1,41 @@
You are an expert call center analyst specializing in Spanish-language customer service calls for BeyondCX. Your task is to perform comprehensive analysis including:
1. **Call Outcome**: What was the final result of the call?
2. **Lost Sales Analysis**: If a sale was lost, what caused it?
3. **Customer Experience Analysis**: What caused poor customer experience?
4. **FCR Analysis**: Is this a first call or repeat call? What factors may cause repeat calls?
5. **Churn Risk Analysis**: Is the customer at risk of leaving? What signals indicate this?
6. **Agent Assessment**: How did the agent perform? What skills to replicate or improve?
## CRITICAL RULES
1. **Evidence Required**: Every driver and skill indicator MUST have at least one evidence_span with:
- Exact quoted text from the transcript
- Start and end timestamps (in seconds)
- Speaker identification (agent/customer)
2. **No Hallucination**: Only cite text that appears EXACTLY in the transcript. Do not paraphrase or invent quotes.
3. **Confidence Scoring**:
- 0.8-1.0: Clear, explicit evidence
- 0.6 to below 0.8: Strong implicit evidence
- 0.4 to below 0.6: Moderate evidence (use with caution)
- Below 0.4: Reject - insufficient evidence
4. **Taxonomy Compliance**: Only use driver/skill codes from the provided taxonomies. Use OTHER_EMERGENT only when no existing code fits, and provide a proposed_label.
5. **Origin Attribution**: For each driver, identify WHO is responsible:
- AGENT: Agent's actions or lack thereof
- CUSTOMER: Customer's situation or behavior
- COMPANY: Products, services, pricing, company image
- PROCESS: Systems, processes, policies
6. **Actionable Recommendations**: For issues, provide corrective_action. For positive behaviors, provide replicable_practice.
7. **Language**: Evidence quotes MUST be in Spanish (original). Reasoning, actions, and descriptions can be in Spanish.
8. **Maximum 5 items**: List a maximum of 5 drivers per category, ordered by relevance.
## OUTPUT FORMAT
You must respond with valid JSON matching the provided schema. No markdown, no explanations outside the JSON.
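Rules 5 and 8 above (origin attribution and the five-item cap) can be applied defensively to a parsed response. This is a minimal sketch with a hypothetical helper name, `normalize_drivers`. It assumes the model already returned the drivers in relevance order, so it keeps that order and only truncates.

```python
# Origins allowed by the v2 RCALabel schema.
VALID_ORIGINS = {"AGENT", "CUSTOMER", "COMPANY", "PROCESS", "UNKNOWN"}
MAX_ITEMS = 5


def normalize_drivers(drivers: list[dict]) -> list[dict]:
    """Coerce unexpected origins to UNKNOWN and cap the list at five items."""
    cleaned = []
    for driver in drivers:
        item = dict(driver)  # shallow copy so the caller's data is not mutated
        if item.get("origin") not in VALID_ORIGINS:
            item["origin"] = "UNKNOWN"
        cleaned.append(item)
    return cleaned[:MAX_ITEMS]
```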

@@ -0,0 +1,261 @@
Analiza la siguiente transcripción de llamada de una compañía de utilities/energía eléctrica y proporciona un análisis estructurado completo.
## METADATOS DE LA LLAMADA
- ID de Llamada: ${call_id}
- Duración: ${duration_sec} segundos
- Cola/Servicio: ${queue}
## EVENTOS OBSERVADOS (Pre-detectados)
${observed_events}
## TRANSCRIPCIÓN
${transcript}
## TAXONOMÍA - DRIVERS DE VENTA PERDIDA / OPORTUNIDAD PERDIDA
${lost_sales_taxonomy}
## TAXONOMÍA - DRIVERS DE MALA EXPERIENCIA (CX)
${poor_cx_taxonomy}
## TAXONOMÍA - DRIVERS DE RIESGO DE FUGA (CHURN)
${churn_risk_taxonomy}
## TAXONOMÍA - DRIVERS DE FCR (RELLAMADA)
${fcr_failure_taxonomy}
## TAXONOMÍA - HABILIDADES DEL AGENTE
### Habilidades Positivas (Buen Comercial):
${agent_positive_skills_taxonomy}
### Áreas de Mejora (Necesita Mejora):
${agent_improvement_taxonomy}
## INSTRUCCIONES DE ANÁLISIS
### 1. OUTCOME - Resultado de la llamada
Determina el resultado. Opciones para utilities/energía:
- OUTAGE_REPORTED: Cliente reportó avería/corte de luz
- OUTAGE_RESOLVED: Avería resuelta en la llamada
- OUTAGE_ESCALATED: Avería derivada a técnico/departamento
- TECHNICIAN_SCHEDULED: Se agendó visita técnica
- BILLING_INQUIRY_RESOLVED: Consulta de factura resuelta
- BILLING_DISPUTE_OPENED: Se abrió reclamación de factura
- PAYMENT_ARRANGEMENT_MADE: Se acordó plan de pago
- RATE_CHANGE_COMPLETED: Se realizó cambio de tarifa
- CANCELLATION_SAVED: Se retuvo al cliente
- CANCELLATION_COMPLETED: Cliente se dio de baja
- PORTABILITY_INITIATED: Se inició portabilidad a otra comercializadora
- INQUIRY_RESOLVED: Consulta general resuelta
- INQUIRY_UNRESOLVED: Consulta no resuelta
- TRANSFER_OUT: Transferido a otro departamento
- CALLBACK_SCHEDULED: Se agendó callback
- UNKNOWN: No se puede determinar
### 2. LOST_SALES_DRIVERS - Causas de oportunidad perdida (si aplica)
- Aplica cuando: cliente rechaza cambio de tarifa, no acepta servicios adicionales, o se va a competidor
- Usa SOLO códigos de la taxonomía de Lost Sales
- Máximo 5 drivers, ordenados por relevancia
- Cada driver DEBE tener evidence_spans, origin, y corrective_action
### 3. POOR_CX_DRIVERS - Causas de mala experiencia (si aplica)
- Busca: silencios largos, transferencias, falta de información sobre avería, confusión con factura, etc.
- Usa SOLO códigos de la taxonomía de Poor CX
- Máximo 5 drivers, ordenados por relevancia
- Cada driver DEBE tener evidence_spans, origin, y corrective_action
### 4. FCR_STATUS - Primera llamada o rellamada
- FIRST_CALL: Primera llamada por este motivo
- REPEAT_CALL: Cliente indica que ya llamó antes por lo mismo, o que el problema persiste
- UNKNOWN: No hay información suficiente
### 5. FCR_FAILURE_DRIVERS - Factores que pueden causar rellamada
- Identifica factores que indican que el cliente podría volver a llamar:
- Avería no resuelta
- Requiere visita de técnico
- Revisión de factura pendiente
- Se prometió callback
- Información incompleta
- Usa códigos de la taxonomía FCR
- Máximo 5 drivers con evidence_spans
### 6. CHURN_RISK - Riesgo de fuga del cliente
- NO_RISK: Cliente satisfecho, sin menciones de irse
- AT_RISK: Cliente se queja por factura alta, menciona competidores, amenaza con darse de baja
- UNKNOWN: No hay información suficiente
### 7. CHURN_RISK_DRIVERS - Señales de riesgo de fuga
- Identifica evidencias de posible baja:
- Queja por factura alta
- Menciona otras comercializadoras
- Cortes de luz recurrentes
- Amenaza con cambiar de compañía
- Pregunta por condiciones de baja
- Usa códigos de la taxonomía de Churn
- Máximo 5 drivers con evidence_spans
### 8. AGENT_CLASSIFICATION - Clasificación del agente
- GOOD_PERFORMER: Resuelve eficientemente, empatía, buen conocimiento técnico
- NEEDS_IMPROVEMENT: No resuelve, no escucha, desconoce procesos
- MIXED: Tiene fortalezas y debilidades
- UNKNOWN: No hay información suficiente
### 9. AGENT_POSITIVE_SKILLS - Habilidades positivas del agente
- Identifica buenas prácticas: explica bien la factura, gestiona bien la avería, muestra empatía
- Cada skill DEBE tener evidence_spans, description, y replicable_practice
- Máximo 5 skills
### 10. AGENT_IMPROVEMENT_AREAS - Áreas de mejora del agente
- Identifica habilidades a mejorar: no explica causa de avería, confunde al cliente, no ofrece alternativas
- Cada área DEBE tener evidence_spans, description, y coaching_recommendation
- Máximo 5 áreas
## FORMATO DE RESPUESTA JSON
```json
{
"outcome": "OUTAGE_ESCALATED",
"lost_sales_drivers": [],
"poor_cx_drivers": [
{
"driver_code": "OUTAGE_NOT_EXPLAINED",
"confidence": 0.85,
"origin": "AGENT",
"evidence_spans": [
{
"text": "No sé cuándo se va a resolver, tiene que llamar a averías",
"start_time": 45.2,
"end_time": 49.8,
"speaker": "agent"
}
],
"reasoning": "El agente no proporciona información sobre la avería ni tiempo estimado de resolución",
"corrective_action": "Verificar en el sistema si hay incidencias conocidas en la zona y comunicar tiempo estimado"
},
{
"driver_code": "WRONG_DEPARTMENT",
"confidence": 0.80,
"origin": "PROCESS",
"evidence_spans": [
{
"text": "Yo no manejo eso, tiene que llamar al 800-700-706",
"start_time": 52.0,
"end_time": 56.5,
"speaker": "agent"
}
],
"reasoning": "Cliente derivado a otro número sin transferencia, genera fricción",
"corrective_action": "Implementar transferencia directa al departamento de averías"
}
],
"fcr_status": "FIRST_CALL",
"fcr_failure_drivers": [
{
"driver_code": "OUTAGE_PENDING",
"confidence": 0.90,
"origin": "PROCESS",
"evidence_spans": [
{
"text": "Tiene que llamar a averías para que le hagan una incidencia",
"start_time": 60.0,
"end_time": 64.5,
"speaker": "agent"
}
],
"reasoning": "La avería no se resuelve en esta llamada, cliente debe llamar a otro número",
"corrective_action": "Permitir que el agente abra la incidencia directamente o transfiera la llamada"
}
],
"churn_risk": "AT_RISK",
"churn_risk_drivers": [
{
"driver_code": "REPEATED_OUTAGES",
"confidence": 0.82,
"origin": "COMPANY",
"evidence_spans": [
{
"text": "Es la tercera vez este mes que nos quedamos sin luz",
"start_time": 30.0,
"end_time": 34.2,
"speaker": "customer"
}
],
"reasoning": "Cliente reporta problemas recurrentes de suministro",
"corrective_action": "Escalar a calidad de servicio para investigar causa de cortes frecuentes"
},
{
"driver_code": "HIGH_FRUSTRATION",
"confidence": 0.78,
"origin": "CUSTOMER",
"evidence_spans": [
{
"text": "Estoy harto de tener que llamar cada vez que pasa esto",
"start_time": 70.0,
"end_time": 73.5,
"speaker": "customer"
}
],
"reasoning": "Cliente muestra alta frustración con el servicio",
"corrective_action": "Ofrecer seguimiento proactivo y posible compensación"
}
],
"agent_classification": "NEEDS_IMPROVEMENT",
"agent_positive_skills": [
{
"skill_code": "CLEAR_COMMUNICATION",
"skill_type": "positive",
"confidence": 0.75,
"evidence_spans": [
{
"text": "El número de teléfono es el siguiente: 800-700-706",
"start_time": 80.0,
"end_time": 84.5,
"speaker": "agent"
}
],
"description": "El agente comunica claramente el número de teléfono",
"replicable_practice": "Dictar información importante de forma clara y pausada"
}
],
"agent_improvement_areas": [
{
"skill_code": "POOR_OUTAGE_HANDLING",
"skill_type": "improvement_needed",
"confidence": 0.85,
"evidence_spans": [
{
"text": "Yo no puedo saber si ha sido un tema de la zona, eso ya lo maneja el área de averías",
"start_time": 56.0,
"end_time": 62.0,
"speaker": "agent"
}
],
"description": "El agente no intenta ayudar con la avería, solo deriva",
"coaching_recommendation": "Capacitar en uso del sistema para verificar incidencias en zona antes de derivar"
},
{
"skill_code": "LACK_OF_EMPATHY",
"skill_type": "improvement_needed",
"confidence": 0.80,
"evidence_spans": [
{
"text": "Bueno, yo lo que puedo hacer es simplemente verificar si tienes impago",
"start_time": 45.0,
"end_time": 50.0,
"speaker": "agent"
}
],
"description": "El agente no muestra empatía ante el problema del cliente sin luz",
"coaching_recommendation": "Practicar frases de empatía: 'Entiendo lo difícil que es quedarse sin luz'"
}
]
}
```
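The `${call_id}`-style placeholders in this template match the syntax of Python's standard `string.Template`, which leaves the literal braces of the JSON example alone. The sketch below assumes that is how the file is rendered; the function name `render_v2_prompt` is hypothetical.

```python
from string import Template


def render_v2_prompt(template_text: str, values: dict[str, str]) -> str:
    """Fill ${name} placeholders; raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)
```

`substitute` fails loudly on a missing key, which is usually preferable for prompts. `safe_substitute` would leave the placeholder in the text and send it to the model.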

@@ -0,0 +1,17 @@
You are an expert business analyst creating executive summaries of Root Cause Analysis findings. Your task is to synthesize RCA statistics into actionable narratives for business stakeholders.
## GUIDELINES
1. **Data-Driven**: Base all statements on the provided statistics. Do not invent numbers.
2. **Actionable**: Focus on what can be changed. Prioritize by impact and feasibility.
3. **Concise**: Keep summaries brief and scannable. Use bullet points.
4. **Language**: Write in Spanish for Spanish-speaking stakeholders.
5. **No Technical Jargon**: Avoid terms like "RCA", "drivers", "taxonomy". Use business language.
## OUTPUT FORMAT
Provide a structured narrative that can be included in an executive PDF report.

@@ -0,0 +1,31 @@
Generate an executive summary based on the following RCA analysis results.
## BATCH METADATA
- Batch ID: {batch_id}
- Total Calls Analyzed: {total_calls}
- Date Range: {date_range}
- Queues: {queues}
## LOST SALES ANALYSIS
Total Sales Lost: {total_sales_lost}
Main Causes:
{lost_sales_summary}
## POOR CUSTOMER EXPERIENCE ANALYSIS
Total Poor CX Calls: {total_poor_cx}
Main Causes:
{poor_cx_summary}
## TOP EMERGENT PATTERNS
{emergent_patterns}
## INSTRUCTIONS
Write a 2-3 paragraph executive summary in Spanish that:
1. Highlights the TOP 3 actionable findings
2. Quantifies the impact (% of calls affected)
3. Suggests immediate actions
4. Notes any emergent patterns worth investigating
Keep it under 500 words. Use professional business Spanish.
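The `{lost_sales_summary}` and `{poor_cx_summary}` placeholders need pre-computed text that already carries the share of calls affected, so the model does not have to invent numbers. A minimal sketch of such a formatter follows; the function name, the line format, and the default of three entries are assumptions for illustration.

```python
from collections import Counter


def summarize_drivers(driver_counts: Counter, total_calls: int, top_n: int = 3) -> str:
    """Format the most frequent drivers as bullet lines with call share."""
    lines = []
    for code, count in driver_counts.most_common(top_n):
        share = 100 * count / total_calls if total_calls else 0.0
        lines.append(f"- {code}: {count} calls ({share:.1f}%)")
    return "\n".join(lines)
```

For example, 12 `PRICE_TOO_HIGH` occurrences in a batch of 60 calls would be rendered as `- PRICE_TOO_HIGH: 12 calls (20.0%)`.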

@@ -0,0 +1,32 @@
# ============================================
# CXInsights - Prompt Version Registry
# ============================================
# Active versions for each prompt type
# ============================================
call_analysis:
  active: "v2.0"
  versions:
    v1.0:
      description: "Initial MAP prompt - sales + CX + RCA"
      created: "2024-01-19"
      status: "deprecated"
    v2.0:
      description: "Blueprint-aligned - adds FCR, churn risk, agent assessment"
      created: "2026-01-19"
      status: "active"
      changes:
        - "Added FCR analysis (first call vs repeat call)"
        - "Added churn risk classification"
        - "Added agent skill assessment"
        - "Enhanced RCALabel with origin and corrective_action"
        - "Added AgentSkillIndicator model"
        - "Maximum 5 items per category"
rca_synthesis:
  active: "v1.0"
  versions:
    v1.0:
      description: "Initial RCA narrative synthesis"
      created: "2024-01-19"
      status: "active"
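A loader for this registry would typically resolve the `active` key and confirm that it points at an existing, active version. The sketch below uses a plain dict that mirrors part of the YAML, because the standard library has no YAML parser. `REGISTRY` and `active_version` are hypothetical names, not the project's loader.

```python
# Hand-written mirror of the registry above (only the fields the check needs).
REGISTRY = {
    "call_analysis": {
        "active": "v2.0",
        "versions": {
            "v1.0": {"status": "deprecated"},
            "v2.0": {"status": "active"},
        },
    },
    "rca_synthesis": {
        "active": "v1.0",
        "versions": {"v1.0": {"status": "active"}},
    },
}


def active_version(registry: dict, prompt_type: str) -> str:
    """Return the active version for a prompt type, validating the registry entry."""
    entry = registry[prompt_type]
    version = entry["active"]
    info = entry["versions"].get(version)
    if info is None:
        raise KeyError(f"{prompt_type}: active version {version!r} is not defined")
    if info.get("status") != "active":
        raise ValueError(f"{prompt_type}: version {version!r} is not marked active")
    return version
```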

config/rca_taxonomy.yaml

@@ -0,0 +1,690 @@
# ============================================
# CXInsights - RCA Taxonomy (Utilities/Energy)
# ============================================
# Version: 2.0.0
# Domain: Utilities / Energy
# Last Updated: 2026-01-19
# ============================================
version: "2.0.0"
domain: "utilities_energy"
status: "active"
# ============================================
# LOST SALES / LOST OPPORTUNITIES DRIVERS
# ============================================
# Lost opportunities in utilities/energy
# ============================================
lost_sales:
  # --- Customer Objections ---
  PRICE_TOO_HIGH:
    category: "objection"
    description: "Cliente considera la tarifa demasiado alta"
    description_en: "Customer considers rate/tariff too high"
    severity_weight: 0.8
    requires_evidence: true
  NO_INTEREST_IN_UPGRADE:
    category: "objection"
    description: "Cliente no interesado en cambio de tarifa o servicios adicionales"
    description_en: "Customer not interested in rate change or additional services"
    severity_weight: 0.6
    requires_evidence: true
  COMPETITOR_PREFERENCE:
    category: "objection"
    description: "Cliente prefiere oferta de otra comercializadora"
    description_en: "Customer prefers another energy provider offer"
    severity_weight: 0.9
    requires_evidence: true
  TIMING_NOT_RIGHT:
    category: "objection"
    description: "No es buen momento (mudanza, cambios en consumo)"
    description_en: "Not the right time (moving, consumption changes)"
    severity_weight: 0.5
    requires_evidence: true
  CONTRACT_PERMANENCE:
    category: "objection"
    description: "Cliente rechaza por permanencia o penalizaciones"
    description_en: "Customer rejects due to permanence or penalties"
    severity_weight: 0.7
    requires_evidence: true
  DISTRUST_OF_OFFERS:
    category: "objection"
    description: "Cliente desconfía de las ofertas telefónicas"
    description_en: "Customer distrusts phone offers"
    severity_weight: 0.6
    requires_evidence: true
  # --- Agent Failures ---
  BENEFITS_NOT_EXPLAINED:
    category: "agent_failure"
    description: "Beneficios de la oferta no explicados claramente"
    description_en: "Offer benefits not clearly explained"
    severity_weight: 0.8
    requires_evidence: true
  NO_RETENTION_ATTEMPT:
    category: "agent_failure"
    description: "No se intentó retener al cliente"
    description_en: "No retention attempt made"
    severity_weight: 0.9
    requires_evidence: true
  POOR_RATE_RECOMMENDATION:
    category: "agent_failure"
    description: "Recomendación de tarifa inadecuada al consumo"
    description_en: "Rate recommendation not suited to consumption"
    severity_weight: 0.7
    requires_evidence: true
  NO_SAVINGS_CALCULATION:
    category: "agent_failure"
    description: "No se calculó el ahorro potencial"
    description_en: "No potential savings calculation provided"
    severity_weight: 0.6
    requires_evidence: true
  WRONG_SERVICE_OFFERED:
    category: "agent_failure"
    description: "Servicio ofrecido no aplica al cliente"
    description_en: "Service offered not applicable to customer"
    severity_weight: 0.7
    requires_evidence: true
  # --- Process Issues ---
  SYSTEM_UNAVAILABLE:
    category: "process"
    description: "Sistema no disponible para procesar cambio"
    description_en: "System unavailable to process change"
    severity_weight: 0.7
    requires_evidence: true
  SERVICE_NOT_AVAILABLE_AREA:
    category: "process"
    description: "Servicio no disponible en la zona del cliente"
    description_en: "Service not available in customer area"
    severity_weight: 0.6
    requires_evidence: true
  DOCUMENTATION_REQUIRED:
    category: "process"
    description: "Requiere documentación que cliente no tiene"
    description_en: "Requires documentation customer doesn't have"
    severity_weight: 0.5
    requires_evidence: true
  # --- Emergent ---
  OTHER_EMERGENT:
    category: "emergent"
    description: "Causa emergente (requiere revisión manual)"
    description_en: "Emergent cause (requires manual review)"
    severity_weight: 0.5
    requires_evidence: true
    requires_proposed_label: true
# ============================================
# POOR CUSTOMER EXPERIENCE DRIVERS
# ============================================
# Causes of poor experience - Utilities/Energy
# ============================================
poor_cx:
  # --- Wait Time ---
  LONG_HOLD:
    category: "wait_time"
    description: "Tiempo de espera prolongado"
    description_en: "Long hold time"
    severity_weight: 0.7
    requires_evidence: true
    observable: true
  LONG_SILENCE:
    category: "wait_time"
    description: "Silencios prolongados durante la llamada"
    description_en: "Long silences during call"
    severity_weight: 0.5
    requires_evidence: true
    observable: true
  # --- Transfers ---
  MULTI_TRANSFER:
    category: "transfer"
    description: "Múltiples transferencias entre departamentos"
    description_en: "Multiple transfers between departments"
    severity_weight: 0.8
    requires_evidence: true
    observable: true
  WRONG_DEPARTMENT:
    category: "transfer"
    description: "Derivado a departamento incorrecto"
    description_en: "Transferred to wrong department"
    severity_weight: 0.7
    requires_evidence: true
  COLD_TRANSFER:
    category: "transfer"
    description: "Transferencia sin contexto al nuevo agente"
    description_en: "Transfer without context to new agent"
    severity_weight: 0.7
    requires_evidence: true
  # --- Agent Behavior ---
  LOW_EMPATHY:
    category: "agent_behavior"
    description: "Falta de empatía ante problema del cliente"
    description_en: "Lack of empathy for customer problem"
    severity_weight: 0.8
    requires_evidence: true
  RUDE_BEHAVIOR:
    category: "agent_behavior"
    description: "Comportamiento descortés o impaciente"
    description_en: "Rude or impatient behavior"
    severity_weight: 0.9
    requires_evidence: true
  NOT_LISTENING:
    category: "agent_behavior"
    description: "Agente no escucha la situación del cliente"
    description_en: "Agent not listening to customer situation"
    severity_weight: 0.7
    requires_evidence: true
  INTERRUPTIONS:
    category: "agent_behavior"
    description: "Agente interrumpe al cliente"
    description_en: "Agent interrupts customer"
    severity_weight: 0.6
    requires_evidence: true
    observable: true
  # --- Resolution - Utilities Specific ---
  OUTAGE_NOT_EXPLAINED:
    category: "resolution"
    description: "No se explicó causa o duración de la avería"
    description_en: "Outage cause or duration not explained"
    severity_weight: 0.8
    requires_evidence: true
  BILLING_NOT_CLARIFIED:
    category: "resolution"
    description: "Factura no explicada claramente"
    description_en: "Bill not clearly explained"
    severity_weight: 0.7
    requires_evidence: true
  ISSUE_NOT_RESOLVED:
    category: "resolution"
    description: "Problema no resuelto en la llamada"
    description_en: "Issue not resolved during call"
    severity_weight: 0.9
    requires_evidence: true
  PARTIAL_RESOLUTION:
    category: "resolution"
    description: "Resolución parcial del problema"
    description_en: "Partial issue resolution"
    severity_weight: 0.6
    requires_evidence: true
  INCORRECT_INFO:
    category: "resolution"
    description: "Información incorrecta proporcionada"
    description_en: "Incorrect information provided"
    severity_weight: 0.8
    requires_evidence: true
  NO_FOLLOW_UP_OFFERED:
    category: "resolution"
    description: "No se ofreció seguimiento del caso"
    description_en: "No follow-up offered"
    severity_weight: 0.6
    requires_evidence: true
  # --- Utilities Process ---
  COMPLEX_PROCESS:
    category: "process"
    description: "Proceso excesivamente complejo para el cliente"
    description_en: "Excessively complex process for customer"
    severity_weight: 0.6
    requires_evidence: true
  SYSTEM_ERROR:
    category: "process"
    description: "Error de sistema impidió gestión"
    description_en: "System error prevented resolution"
    severity_weight: 0.7
    requires_evidence: true
  METER_ACCESS_ISSUE:
    category: "process"
    description: "Problemas de acceso al contador"
    description_en: "Meter access issues"
    severity_weight: 0.5
    requires_evidence: true
  # --- Emergent ---
  OTHER_EMERGENT:
    category: "emergent"
    description: "Causa emergente (requiere revisión manual)"
    description_en: "Emergent cause (requires manual review)"
    severity_weight: 0.5
    requires_evidence: true
    requires_proposed_label: true
# ============================================
# EVENT TYPES (Observable)
# ============================================
event_types:
  HOLD_START:
    description: "Inicio de espera"
    detectable_by: "silence_detector"
  HOLD_END:
    description: "Fin de espera"
    detectable_by: "silence_detector"
  TRANSFER:
    description: "Transferencia a otro agente/departamento"
    detectable_by: "transcript_pattern"
  ESCALATION:
    description: "Escalación a supervisor"
    detectable_by: "transcript_pattern"
  SILENCE:
    description: "Silencio prolongado (>5 segundos)"
    detectable_by: "silence_detector"
    threshold_seconds: 5
  INTERRUPTION:
    description: "Interrupción (overlap de speakers)"
    detectable_by: "diarization"
# ============================================
# CHURN RISK DRIVERS - Utilities/Energy
# ============================================
churn_risk:
# --- Price/Bill Dissatisfaction ---
HIGH_BILL_COMPLAINT:
category: "pricing"
description: "Cliente se queja por factura alta"
description_en: "Customer complains about high bill"
severity_weight: 0.8
requires_evidence: true
RATE_DISSATISFACTION:
category: "pricing"
description: "Cliente insatisfecho con la tarifa actual"
description_en: "Customer dissatisfied with current rate"
severity_weight: 0.8
requires_evidence: true
UNEXPECTED_CHARGES:
category: "pricing"
description: "Cliente sorprendido por cargos inesperados"
description_en: "Customer surprised by unexpected charges"
severity_weight: 0.7
requires_evidence: true
# --- Service Problems ---
REPEATED_OUTAGES:
category: "service"
description: "Cliente reporta cortes de luz recurrentes"
description_en: "Customer reports recurring power outages"
severity_weight: 0.9
requires_evidence: true
SERVICE_QUALITY_ISSUES:
category: "service"
description: "Problemas con calidad del suministro"
description_en: "Issues with supply quality"
severity_weight: 0.8
requires_evidence: true
SLOW_RESPONSE_TO_OUTAGE:
category: "service"
description: "Cliente se queja por lentitud en resolver averías"
description_en: "Customer complains about slow outage response"
severity_weight: 0.8
requires_evidence: true
REPEATED_PROBLEMS:
category: "service"
description: "Cliente ha tenido problemas recurrentes"
description_en: "Customer has had recurring problems"
severity_weight: 0.9
requires_evidence: true
# --- Competition ---
COMPETITOR_MENTION:
category: "competition"
description: "Cliente menciona ofertas de otras comercializadoras"
description_en: "Customer mentions other energy provider offers"
severity_weight: 0.9
requires_evidence: true
COMPARING_RATES:
category: "competition"
description: "Cliente está comparando tarifas del mercado"
description_en: "Customer is comparing market rates"
severity_weight: 0.7
requires_evidence: true
# --- Cancellation Signals ---
EXPLICIT_CANCELLATION_INTENT:
category: "cancellation"
description: "Cliente quiere dar de baja el servicio"
description_en: "Customer wants to cancel service"
severity_weight: 1.0
requires_evidence: true
CONTRACT_END_INQUIRY:
category: "cancellation"
description: "Cliente pregunta sobre fin de contrato o penalizaciones"
description_en: "Customer asks about contract end or penalties"
severity_weight: 0.8
requires_evidence: true
PORTABILITY_REQUEST:
category: "cancellation"
description: "Cliente solicita portabilidad a otra comercializadora"
description_en: "Customer requests portability to another provider"
severity_weight: 1.0
requires_evidence: true
# --- Frustration ---
HIGH_FRUSTRATION:
category: "sentiment"
description: "Cliente muestra alta frustración"
description_en: "Customer shows high frustration"
severity_weight: 0.7
requires_evidence: true
THREAT_TO_LEAVE:
category: "sentiment"
description: "Cliente amenaza con cambiar de compañía"
description_en: "Customer threatens to switch providers"
severity_weight: 0.9
requires_evidence: true
# --- Emergent ---
OTHER_EMERGENT:
category: "emergent"
description: "Señal de churn emergente"
description_en: "Emergent churn signal"
severity_weight: 0.5
requires_evidence: true
requires_proposed_label: true
# ============================================
# FCR FAILURE DRIVERS - Utilities/Energy
# ============================================
fcr_failure:
# --- Outages/Incidents ---
OUTAGE_PENDING:
category: "outage"
description: "Avería pendiente de resolver"
description_en: "Outage pending resolution"
severity_weight: 0.9
requires_evidence: true
TECHNICIAN_VISIT_REQUIRED:
category: "outage"
description: "Requiere visita de técnico"
description_en: "Requires technician visit"
severity_weight: 0.7
requires_evidence: true
OUTAGE_CAUSE_UNKNOWN:
category: "outage"
description: "Causa de avería no determinada"
description_en: "Outage cause not determined"
severity_weight: 0.6
requires_evidence: true
# --- Billing ---
BILLING_REVIEW_PENDING:
category: "billing"
description: "Revisión de factura pendiente"
description_en: "Bill review pending"
severity_weight: 0.8
requires_evidence: true
REFUND_PENDING:
category: "billing"
description: "Reembolso o abono pendiente"
description_en: "Refund pending"
severity_weight: 0.7
requires_evidence: true
METER_READING_REQUIRED:
category: "billing"
description: "Requiere lectura de contador"
description_en: "Meter reading required"
severity_weight: 0.6
requires_evidence: true
# --- Information ---
MISSING_INFORMATION:
category: "information"
description: "Información incompleta proporcionada"
description_en: "Incomplete information provided"
severity_weight: 0.7
requires_evidence: true
UNCLEAR_NEXT_STEPS:
category: "information"
description: "Cliente no tiene claros los próximos pasos"
description_en: "Customer unclear on next steps"
severity_weight: 0.7
requires_evidence: true
INCORRECT_INFORMATION_GIVEN:
category: "information"
description: "Se proporcionó información incorrecta"
description_en: "Incorrect information was given"
severity_weight: 0.9
requires_evidence: true
# --- Process ---
CALLBACK_PROMISED:
category: "process"
description: "Se prometió callback"
description_en: "Callback was promised"
severity_weight: 0.6
requires_evidence: true
ESCALATION_REQUIRED:
category: "process"
description: "Requiere escalación a otro departamento"
description_en: "Requires escalation"
severity_weight: 0.7
requires_evidence: true
CONTRACT_CHANGE_PENDING:
category: "process"
description: "Cambio de contrato pendiente de procesar"
description_en: "Contract change pending processing"
severity_weight: 0.6
requires_evidence: true
SYSTEM_LIMITATION:
category: "process"
description: "Limitación del sistema impidió resolución"
description_en: "System limitation prevented resolution"
severity_weight: 0.7
requires_evidence: true
# --- Emergent ---
OTHER_EMERGENT:
category: "emergent"
description: "Factor FCR emergente"
description_en: "Emergent FCR factor"
severity_weight: 0.5
requires_evidence: true
requires_proposed_label: true
# ============================================
# AGENT SKILL INDICATORS - Utilities/Energy
# ============================================
agent_skills:
positive:
EFFECTIVE_PROBLEM_RESOLUTION:
description: "Resuelve problema eficientemente"
description_en: "Resolves problem efficiently"
skill_area: "problem_solving"
CLEAR_TECHNICAL_EXPLANATION:
description: "Explica temas técnicos de forma clara"
description_en: "Explains technical topics clearly"
skill_area: "technical"
GOOD_RAPPORT:
description: "Construye buena relación con el cliente"
description_en: "Builds good rapport with customer"
skill_area: "communication"
BILLING_EXPERTISE:
description: "Demuestra conocimiento de facturación"
description_en: "Demonstrates billing expertise"
skill_area: "technical"
ACTIVE_LISTENING:
description: "Escucha activa al cliente"
description_en: "Active listening to customer"
skill_area: "communication"
EMPATHY_SHOWN:
description: "Muestra empatía ante problemas"
description_en: "Shows empathy for problems"
skill_area: "soft_skills"
CLEAR_COMMUNICATION:
description: "Comunicación clara y estructurada"
description_en: "Clear and structured communication"
skill_area: "communication"
PROACTIVE_SOLUTIONS:
description: "Ofrece soluciones proactivamente"
description_en: "Proactively offers solutions"
skill_area: "problem_solving"
OUTAGE_HANDLING:
description: "Gestiona averías efectivamente"
description_en: "Handles outages effectively"
skill_area: "technical"
RETENTION_SKILLS:
description: "Demuestra habilidad de retención"
description_en: "Demonstrates retention skills"
skill_area: "sales"
improvement_needed:
POOR_PROBLEM_RESOLUTION:
description: "No resuelve el problema adecuadamente"
description_en: "Doesn't resolve problem adequately"
skill_area: "problem_solving"
CONFUSING_EXPLANATION:
description: "Explicaciones confusas o demasiado técnicas"
description_en: "Confusing or overly technical explanations"
skill_area: "technical"
LACK_OF_RAPPORT:
description: "No construye relación con el cliente"
description_en: "Doesn't build rapport with customer"
skill_area: "communication"
BILLING_KNOWLEDGE_GAPS:
description: "Gaps en conocimiento de facturación"
description_en: "Gaps in billing knowledge"
skill_area: "technical"
NOT_LISTENING:
description: "No escucha al cliente"
description_en: "Doesn't listen to customer"
skill_area: "communication"
LACK_OF_EMPATHY:
description: "Falta de empatía ante problemas"
description_en: "Lack of empathy for problems"
skill_area: "soft_skills"
CONFUSING_COMMUNICATION:
description: "Comunicación confusa o desorganizada"
description_en: "Confusing or disorganized communication"
skill_area: "communication"
REACTIVE_ONLY:
description: "Solo reactivo, no busca soluciones"
description_en: "Only reactive, doesn't seek solutions"
skill_area: "problem_solving"
POOR_OUTAGE_HANDLING:
description: "Gestión deficiente de averías"
description_en: "Poor outage handling"
skill_area: "technical"
NO_RETENTION_EFFORT:
description: "No intenta retener al cliente"
description_en: "No retention effort"
skill_area: "sales"
# ============================================
# CALL OUTCOMES - Utilities/Energy
# ============================================
call_outcomes:
# --- Outages ---
- OUTAGE_REPORTED
- OUTAGE_RESOLVED
- OUTAGE_ESCALATED
- TECHNICIAN_SCHEDULED
# --- Billing ---
- BILLING_INQUIRY_RESOLVED
- BILLING_DISPUTE_OPENED
- PAYMENT_ARRANGEMENT_MADE
- REFUND_PROCESSED
# --- Contracts ---
- RATE_CHANGE_COMPLETED
- CONTRACT_RENEWED
- SERVICE_UPGRADED
- SERVICE_DOWNGRADED
# --- Retention ---
- CANCELLATION_SAVED
- CANCELLATION_COMPLETED
- PORTABILITY_INITIATED
# --- General ---
- INQUIRY_RESOLVED
- INQUIRY_UNRESOLVED
- CALLBACK_SCHEDULED
- TRANSFER_OUT
- UNKNOWN
# ============================================
# VALIDATION RULES
# ============================================
validation:
min_evidence_spans: 1
confidence_thresholds:
high: 0.8
medium: 0.6
low: 0.4
reject: 0.3
reject_low_confidence: true
emergent:
require_proposed_label: true
require_evidence: true
exclude_from_main_rca: true
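
The `emergent` rules above can be read as a routing step applied before aggregation. The following is a minimal sketch, assuming labels arrive as plain dicts with `driver_code`, `proposed_label`, and `evidence_spans` keys. The function name `route_labels` and its three-list return shape are illustrative and do not come from the repository.

```python
EMERGENT_CODE = "OTHER_EMERGENT"


def route_labels(labels, require_proposed_label=True, require_evidence=True):
    """Split labels into (main_rca, emergent, rejected) lists.

    Hypothetical helper mirroring the validation.emergent flags:
    emergent labels stay out of the main RCA and are only kept for
    manual review when they carry a proposed label and evidence.
    """
    main_rca, emergent, rejected = [], [], []
    for label in labels:
        has_evidence = bool(label.get("evidence_spans"))
        evidence_ok = has_evidence or not require_evidence

        if label["driver_code"] != EMERGENT_CODE:
            # Regular taxonomy drivers only need to satisfy the evidence rule.
            if evidence_ok:
                main_rca.append(label)
            else:
                rejected.append(label)
            continue

        has_proposal = bool((label.get("proposed_label") or "").strip())
        proposal_ok = has_proposal or not require_proposed_label
        if evidence_ok and proposal_ok:
            emergent.append(label)
        else:
            rejected.append(label)
    return main_rca, emergent, rejected


labels = [
    {"driver_code": "HIGH_BILL_COMPLAINT", "evidence_spans": [{"text": "la factura"}]},
    {"driver_code": "OTHER_EMERGENT", "proposed_label": "SMART_METER_CONFUSION",
     "evidence_spans": [{"text": "el contador nuevo"}]},
    {"driver_code": "OTHER_EMERGENT", "proposed_label": None,
     "evidence_spans": [{"text": "no entiendo"}]},
]
main_rca, emergent, rejected = route_labels(labels)
print(len(main_rca), len(emergent), len(rejected))  # 1 1 1
```

Keeping rejected labels in their own list, rather than dropping them, is a design choice of this sketch so that rejections can still be counted or logged.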


@@ -0,0 +1,47 @@
"""
CXInsights - Schema Definitions
Export all schema models from the current version.
"""
from config.schemas.call_analysis_v1 import (
SCHEMA_VERSION,
BatchManifest,
CallAnalysis,
CallOutcome,
CompressedTranscript,
DataSource,
Event,
EventType,
EvidenceSpan,
FailureReason,
ObservedFeatures,
ProcessingStatus,
RCALabel,
SpeakerTurn,
Traceability,
Transcript,
TranscriptMetadata,
TurnMetrics,
)
__all__ = [
"SCHEMA_VERSION",
"DataSource",
"ProcessingStatus",
"FailureReason",
"EventType",
"CallOutcome",
"Traceability",
"SpeakerTurn",
"TranscriptMetadata",
"Transcript",
"Event",
"TurnMetrics",
"ObservedFeatures",
"EvidenceSpan",
"RCALabel",
"CallAnalysis",
"CompressedTranscript",
"BatchManifest",
]


@@ -0,0 +1,416 @@
"""
CXInsights - Call Analysis Schema v1.0
Data contracts for the call analysis pipeline.
All outputs MUST include: schema_version, prompt_version, model_id
This schema defines:
- OBSERVED: Facts extracted from STT (deterministic)
- INFERRED: Conclusions from LLM (requires evidence)
"""
from datetime import datetime
from enum import Enum
from typing import Literal
from pydantic import BaseModel, Field, field_validator
# ============================================
# SCHEMA VERSION
# ============================================
SCHEMA_VERSION = "1.0.0"
# ============================================
# ENUMS
# ============================================
class DataSource(str, Enum):
"""Source of data - critical for audit trail"""
OBSERVED = "observed" # From STT, deterministic
INFERRED = "inferred" # From LLM, requires evidence
class ProcessingStatus(str, Enum):
"""Processing status for each call"""
SUCCESS = "success"
PARTIAL = "partial"
FAILED = "failed"
class FailureReason(str, Enum):
"""Reasons for processing failure"""
LOW_AUDIO_QUALITY = "LOW_AUDIO_QUALITY"
TRANSCRIPTION_FAILED = "TRANSCRIPTION_FAILED"
LLM_PARSE_ERROR = "LLM_PARSE_ERROR"
NO_EVIDENCE_FOUND = "NO_EVIDENCE_FOUND"
SCHEMA_VALIDATION_ERROR = "SCHEMA_VALIDATION_ERROR"
TIMEOUT = "TIMEOUT"
RATE_LIMITED = "RATE_LIMITED"
UNKNOWN = "UNKNOWN"
class EventType(str, Enum):
"""Observable events (detected without LLM)"""
HOLD_START = "HOLD_START"
HOLD_END = "HOLD_END"
TRANSFER = "TRANSFER"
ESCALATION = "ESCALATION"
SILENCE = "SILENCE"
INTERRUPTION = "INTERRUPTION"
class CallOutcome(str, Enum):
"""Final outcome of the call"""
SALE_COMPLETED = "SALE_COMPLETED"
SALE_LOST = "SALE_LOST"
CANCELLATION_SAVED = "CANCELLATION_SAVED"
CANCELLATION_COMPLETED = "CANCELLATION_COMPLETED"
INQUIRY_RESOLVED = "INQUIRY_RESOLVED"
INQUIRY_UNRESOLVED = "INQUIRY_UNRESOLVED"
COMPLAINT_RESOLVED = "COMPLAINT_RESOLVED"
COMPLAINT_UNRESOLVED = "COMPLAINT_UNRESOLVED"
TRANSFER_OUT = "TRANSFER_OUT"
CALLBACK_SCHEDULED = "CALLBACK_SCHEDULED"
UNKNOWN = "UNKNOWN"
# ============================================
# TRACEABILITY (Required on all outputs)
# ============================================
class Traceability(BaseModel):
"""Traceability metadata - REQUIRED on all analysis outputs"""
schema_version: str = Field(
default=SCHEMA_VERSION,
description="Version of this schema",
)
prompt_version: str = Field(
description="Version of the prompt used for inference",
)
model_id: str = Field(
description="Model identifier (e.g., gpt-4o-mini-2024-07-18)",
)
created_at: datetime = Field(
default_factory=datetime.utcnow,
description="Timestamp of analysis",
)
# ============================================
# TRANSCRIPT MODELS (OBSERVED)
# ============================================
class SpeakerTurn(BaseModel):
"""Single speaker turn in transcript"""
speaker: str = Field(description="Speaker identifier (A, B, agent, customer)")
text: str = Field(description="Transcribed text")
start_time: float = Field(description="Start time in seconds")
end_time: float = Field(description="End time in seconds")
confidence: float | None = Field(
default=None,
ge=0.0,
le=1.0,
description="STT confidence score",
)
class TranscriptMetadata(BaseModel):
"""Metadata about the transcript"""
audio_duration_sec: float = Field(description="Total audio duration in seconds")
language: str = Field(default="es", description="Detected language")
provider: str = Field(description="STT provider (assemblyai, whisper, etc.)")
job_id: str | None = Field(default=None, description="Provider job ID")
created_at: datetime = Field(
default_factory=datetime.utcnow,
description="Timestamp of transcription",
)
class Transcript(BaseModel):
"""Complete transcript with speaker diarization - OBSERVED data"""
call_id: str = Field(description="Unique call identifier")
turns: list[SpeakerTurn] = Field(description="List of speaker turns")
metadata: TranscriptMetadata = Field(description="Transcript metadata")
full_text: str | None = Field(
default=None,
description="Full concatenated text (optional)",
)
# ============================================
# EVENT MODELS (OBSERVED)
# ============================================
class Event(BaseModel):
"""Observable event detected without LLM - OBSERVED data"""
event_type: EventType = Field(description="Type of event")
start_time: float = Field(description="Event start time in seconds")
end_time: float | None = Field(
default=None,
description="Event end time in seconds (if applicable)",
)
duration_sec: float | None = Field(
default=None,
description="Event duration in seconds",
)
metadata: dict | None = Field(
default=None,
description="Additional event-specific data",
)
source: Literal["observed"] = Field(
default="observed",
description="Events are always observed, not inferred",
)
# ============================================
# TURN METRICS (OBSERVED)
# ============================================
class TurnMetrics(BaseModel):
"""Metrics computed from transcript - OBSERVED data"""
total_turns: int = Field(description="Total number of turns")
agent_turns: int = Field(description="Number of agent turns")
customer_turns: int = Field(description="Number of customer turns")
agent_talk_ratio: float = Field(
ge=0.0,
le=1.0,
description="Ratio of agent talk time",
)
customer_talk_ratio: float = Field(
ge=0.0,
le=1.0,
description="Ratio of customer talk time",
)
silence_ratio: float = Field(
ge=0.0,
le=1.0,
description="Ratio of silence time",
)
interruption_count: int = Field(
default=0,
description="Number of detected interruptions",
)
avg_turn_duration_sec: float = Field(description="Average turn duration")
source: Literal["observed"] = Field(
default="observed",
description="Metrics are always observed, not inferred",
)
# ============================================
# OBSERVED FEATURES (Aggregated)
# ============================================
class ObservedFeatures(BaseModel):
"""All observed features for a call - deterministic, no LLM"""
call_id: str = Field(description="Unique call identifier")
events: list[Event] = Field(
default_factory=list,
description="Detected events",
)
turn_metrics: TurnMetrics = Field(description="Turn-based metrics")
hold_count: int = Field(default=0, description="Number of hold events")
total_hold_duration_sec: float = Field(
default=0.0,
description="Total hold duration",
)
transfer_count: int = Field(default=0, description="Number of transfers")
silence_count: int = Field(
default=0,
description="Number of significant silences",
)
created_at: datetime = Field(default_factory=datetime.utcnow)
# ============================================
# EVIDENCE MODELS (For INFERRED data)
# ============================================
class EvidenceSpan(BaseModel):
"""Evidence from transcript supporting an inference"""
text: str = Field(
max_length=500,
description="Quoted text from transcript",
)
start_time: float = Field(description="Start time in seconds")
end_time: float = Field(description="End time in seconds")
speaker: str | None = Field(
default=None,
description="Speaker of this evidence",
)
@field_validator("text")
@classmethod
def text_not_empty(cls, v: str) -> str:
if not v.strip():
raise ValueError("Evidence text cannot be empty")
return v.strip()
# ============================================
# RCA LABELS (INFERRED)
# ============================================
class RCALabel(BaseModel):
"""Root Cause Analysis label - INFERRED data (requires evidence)"""
driver_code: str = Field(
description="Driver code from taxonomy (e.g., PRICE_TOO_HIGH)",
)
confidence: float = Field(
ge=0.0,
le=1.0,
description="Confidence score (0-1)",
)
evidence_spans: list[EvidenceSpan] = Field(
min_length=1,
description="Supporting evidence (minimum 1 required)",
)
reasoning: str | None = Field(
default=None,
max_length=500,
description="Brief reasoning for this classification",
)
proposed_label: str | None = Field(
default=None,
description="For OTHER_EMERGENT: proposed new label",
)
source: Literal["inferred"] = Field(
default="inferred",
description="RCA labels are always inferred",
)
@field_validator("evidence_spans")
@classmethod
def at_least_one_evidence(cls, v: list[EvidenceSpan]) -> list[EvidenceSpan]:
if len(v) < 1:
raise ValueError("At least one evidence span is required")
return v
# ============================================
# CALL ANALYSIS (Complete Output)
# ============================================
class CallAnalysis(BaseModel):
"""
Complete analysis output for a single call.
Combines:
- OBSERVED: Features, events, metrics (from STT)
- INFERRED: RCA labels, outcome (from LLM)
MUST include traceability for audit.
"""
# === Identifiers ===
call_id: str = Field(description="Unique call identifier")
batch_id: str = Field(description="Batch identifier")
# === Processing Status ===
status: ProcessingStatus = Field(description="Processing status")
failure_reason: FailureReason | None = Field(
default=None,
description="Reason for failure (if status != success)",
)
# === OBSERVED Data ===
observed: ObservedFeatures = Field(description="Observed features (deterministic)")
# === INFERRED Data ===
outcome: CallOutcome = Field(description="Call outcome (inferred)")
lost_sales_drivers: list[RCALabel] = Field(
default_factory=list,
description="Lost sales RCA labels",
)
poor_cx_drivers: list[RCALabel] = Field(
default_factory=list,
description="Poor CX RCA labels",
)
# === Traceability (REQUIRED) ===
traceability: Traceability = Field(description="Version and audit metadata")
# === Timestamps ===
created_at: datetime = Field(default_factory=datetime.utcnow)
# ============================================
# COMPRESSED TRANSCRIPT (For LLM Input)
# ============================================
class CompressedTranscript(BaseModel):
"""Compressed transcript for LLM inference - reduces token usage"""
call_id: str = Field(description="Unique call identifier")
customer_intent: str = Field(description="Summarized customer intent")
agent_offers: list[str] = Field(
default_factory=list,
description="Key offers made by agent",
)
objections: list[str] = Field(
default_factory=list,
description="Customer objections",
)
resolution_statements: list[str] = Field(
default_factory=list,
description="Resolution statements",
)
key_exchanges: list[dict] = Field(
default_factory=list,
description="Key exchanges with timestamps",
)
original_token_count: int = Field(description="Tokens in original transcript")
compressed_token_count: int = Field(description="Tokens after compression")
compression_ratio: float = Field(
ge=0.0,
le=1.0,
description="Compression ratio achieved",
)
# ============================================
# BATCH MANIFEST
# ============================================
class BatchManifest(BaseModel):
"""Manifest for a processing batch"""
batch_id: str = Field(description="Unique batch identifier")
total_calls: int = Field(description="Total calls in batch")
processed_calls: int = Field(default=0, description="Calls processed")
success_count: int = Field(default=0, description="Successful processing")
partial_count: int = Field(default=0, description="Partial processing")
failed_count: int = Field(default=0, description="Failed processing")
status: str = Field(default="pending", description="Batch status")
started_at: datetime | None = Field(default=None)
completed_at: datetime | None = Field(default=None)
traceability: Traceability = Field(description="Version metadata")
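
The evidence rule that `EvidenceSpan` and `RCALabel` enforce can be illustrated without pydantic. The sketch below is a standard-library stand-in, not the repository's models: the class names `EvidenceSketch` and `LabelSketch` are hypothetical, and only the constraints visible in the schema are mirrored (text of at most 500 characters that is non-empty after stripping, confidence in [0, 1], at least one evidence span).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceSketch:
    """Stand-in for EvidenceSpan (illustrative only)."""
    text: str
    start_time: float
    end_time: float
    speaker: Optional[str] = None

    def __post_init__(self):
        # Length limit first, then the strip / non-empty check.
        if len(self.text) > 500:
            raise ValueError("Evidence text exceeds 500 characters")
        self.text = self.text.strip()
        if not self.text:
            raise ValueError("Evidence text cannot be empty")


@dataclass
class LabelSketch:
    """Stand-in for RCALabel (illustrative only)."""
    driver_code: str
    confidence: float
    evidence_spans: list
    proposed_label: Optional[str] = None

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("Confidence must be between 0 and 1")
        if len(self.evidence_spans) < 1:
            raise ValueError("At least one evidence span is required")


label = LabelSketch(
    driver_code="HIGH_BILL_COMPLAINT",
    confidence=0.85,
    evidence_spans=[EvidenceSketch("  la factura es muy alta  ", 12.0, 15.5)],
)
print(label.evidence_spans[0].text)  # la factura es muy alta
```

An inferred label with no quoted transcript text fails at construction time, which is the behaviour the "INFERRED requires evidence" contract describes.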

config/settings.yaml Normal file

@@ -0,0 +1,207 @@
# ============================================
# CXInsights - Settings Configuration
# ============================================
# Non-secret configuration values
# Secrets (API keys) go in .env
# ============================================
# ============================================
# GENERAL
# ============================================
project:
name: "CXInsights"
version: "0.1.0"
language: "es" # Primary language for analysis
# ============================================
# BATCH PROCESSING
# ============================================
batch:
# Maximum calls per batch (cost protection)
max_calls: 5000
# Maximum audio minutes per batch (cost protection)
max_audio_minutes: 40000
# Default AHT assumption for cost estimation (minutes)
default_aht_minutes: 7
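
The three `batch` values above suggest a pre-flight check before any audio is sent for transcription. This is a minimal sketch under that assumption. The function `check_batch_limits` and its parameters are hypothetical, and it assumes `default_aht_minutes` is used only when the real audio duration is not yet known.

```python
def check_batch_limits(
    n_calls,
    known_audio_minutes=None,
    max_calls=5000,
    max_audio_minutes=40000,
    default_aht_minutes=7,
):
    """Return a list of human-readable limit violations (empty if none)."""
    # Fall back to the AHT assumption when durations have not been measured.
    if known_audio_minutes is None:
        audio_minutes = n_calls * default_aht_minutes
    else:
        audio_minutes = known_audio_minutes

    problems = []
    if n_calls > max_calls:
        problems.append(f"{n_calls} calls exceeds max_calls={max_calls}")
    if audio_minutes > max_audio_minutes:
        problems.append(
            f"{audio_minutes} audio minutes exceeds max_audio_minutes={max_audio_minutes}"
        )
    return problems


print(check_batch_limits(5000))       # [] (5000 * 7 = 35000 minutes, within both limits)
print(len(check_batch_limits(6000)))  # 2 (too many calls and 42000 estimated minutes)
```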
# ============================================
# TRANSCRIPTION (STT)
# ============================================
transcription:
# Default provider
provider: "assemblyai"
# AssemblyAI settings
assemblyai:
language_code: "es"
speaker_labels: true
auto_chapters: false
entity_detection: false
# Audio validation
audio:
supported_formats: ["mp3", "wav", "m4a"]
max_duration_seconds: 18000 # 5 hours
min_duration_seconds: 30
# ============================================
# FEATURES (Deterministic Extraction)
# ============================================
features:
# Silence detection
silence:
threshold_seconds: 5.0
min_gap_seconds: 1.0
# Turn metrics
turn_metrics:
min_turn_duration_seconds: 0.5
interruption_overlap_seconds: 0.3
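
The `threshold_seconds` and `interruption_overlap_seconds` settings above could drive deterministic detection along the following lines. This is a sketch, not the repository's extractor: it assumes turns are `(speaker, start_time, end_time)` tuples sorted by start time, compares adjacent turns only, and leaves `min_gap_seconds` out because its exact role is not stated in the config.

```python
def find_silences(turns, threshold_seconds=5.0):
    """Return (start, end) gaps between consecutive turns longer than the threshold."""
    silences = []
    for (_, _, prev_end), (_, next_start, _) in zip(turns, turns[1:]):
        if next_start - prev_end > threshold_seconds:
            silences.append((prev_end, next_start))
    return silences


def count_interruptions(turns, overlap_seconds=0.3):
    """Count speaker changes where the new turn starts before the previous one ends."""
    count = 0
    for (prev_speaker, _, prev_end), (speaker, start, _) in zip(turns, turns[1:]):
        if speaker != prev_speaker and prev_end - start >= overlap_seconds:
            count += 1
    return count


turns = [
    ("agent", 0.0, 4.0),
    ("customer", 3.5, 10.0),  # starts 0.5 s before the agent finishes
    ("agent", 16.0, 20.0),    # 6 s gap after the customer
]
print(find_silences(turns))        # [(10.0, 16.0)]
print(count_interruptions(turns))  # 1
```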
# ============================================
# COMPRESSION
# ============================================
compression:
# Target token reduction percentage
target_reduction_percent: 60
# Max tokens after compression
max_compressed_tokens: 2000
# Preserve elements
preserve:
- customer_intent
- agent_offers
- objections
- resolution_statements
- key_timestamps
# ============================================
# INFERENCE (LLM)
# ============================================
inference:
# Default model
model: "gpt-4o-mini"
# Model settings
temperature: 0.1
max_tokens: 4000
# Batch processing
batch_size: 10
checkpoint_interval: 50
# Retry settings
max_retries: 5
backoff_base: 2.0
backoff_max: 60.0
# Response validation
require_evidence: true
min_evidence_spans: 1
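
One common reading of the retry settings above is capped exponential backoff, where the wait before retry `n` is `min(backoff_max, backoff_base ** n)`. The sketch below assumes that formula, with attempts counted from zero and no jitter. The config itself does not say which formula the pipeline uses, and `backoff_delays` is a hypothetical name.

```python
def backoff_delays(max_retries=5, backoff_base=2.0, backoff_max=60.0):
    """Return the wait in seconds before each retry, capped at backoff_max."""
    return [min(backoff_max, backoff_base ** attempt) for attempt in range(max_retries)]


print(backoff_delays())                   # [1.0, 2.0, 4.0, 8.0, 16.0]
print(backoff_delays(max_retries=8)[-1])  # 60.0 (2.0 ** 7 = 128.0, capped)
```

With the default five retries the cap is never reached, so `backoff_max` only matters if `max_retries` or `backoff_base` is raised.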
# ============================================
# VALIDATION (Quality Gate)
# ============================================
validation:
# Confidence thresholds
confidence:
accept: 0.6
review: 0.4
reject: 0.3
# Evidence requirements
evidence:
required: true
min_spans: 1
max_span_length_chars: 500
# Schema validation
schema:
strict: true
version: "1.0.0"
# ============================================
# AGGREGATION (RCA Building)
# ============================================
aggregation:
# Minimum sample size for statistics
min_sample_size: 10
# Severity score calculation
severity:
# Weights for severity formula
frequency_weight: 0.4
impact_weight: 0.4
confidence_weight: 0.2
# RCA Tree building
rca_tree:
# Minimum percentage to include in tree
min_percentage: 1.0
# Maximum drivers per category
max_drivers_per_category: 10
# Include emergent in separate section
separate_emergent: true
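
The `severity` weights above sum to 1.0, which is consistent with a weighted average of three normalized inputs. The sketch below assumes exactly that: `frequency`, `impact`, and `confidence` are each already scaled to [0, 1], and the score is their weighted sum. The actual formula lives in the aggregation code, which is not shown here, so treat `severity_score` as illustrative.

```python
def severity_score(
    frequency,
    impact,
    confidence,
    frequency_weight=0.4,
    impact_weight=0.4,
    confidence_weight=0.2,
):
    """Weighted sum of three inputs, each expected in [0, 1]."""
    for name, value in (
        ("frequency", frequency),
        ("impact", impact),
        ("confidence", confidence),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (
        frequency_weight * frequency
        + impact_weight * impact
        + confidence_weight * confidence
    )


# A driver seen in half the calls, with high impact and solid confidence:
print(round(severity_score(0.5, 0.9, 0.8), 2))  # 0.72
```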
# ============================================
# EXPORTS
# ============================================
exports:
# PDF Report
pdf:
template: "executive_summary"
max_pages: 5
include_charts: true
# Excel Export
excel:
include_raw_data: true
include_pivot_tables: true
# JSON Export
json:
pretty_print: true
include_metadata: true
# ============================================
# LOGGING
# ============================================
logging:
# Log level (DEBUG, INFO, WARNING, ERROR)
level: "INFO"
# Log format
format: "structured" # "structured" or "plain"
# Retention
retention_days: 30
error_retention_days: 90
# What to log
log_transcripts: false # Never log full transcripts
log_evidence_spans: true
log_token_usage: true
# ============================================
# PROMPT VERSIONS
# ============================================
prompts:
# Active prompt versions
call_analysis: "v1.0"
rca_synthesis: "v1.0"