feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
sujucu70
2026-01-19 16:27:30 +01:00
commit 75e7b9da3d
110 changed files with 28247 additions and 0 deletions

docs/DATA_CONTRACTS.md Normal file

@@ -0,0 +1,289 @@
# DATA_CONTRACTS.md
> Schemas for all data that flows through the system
---
## Golden rule
> If you change a schema, update this doc FIRST, then implement the code.
---
## Schema: Transcript
**File**: `src/transcription/models.py`
```python
@dataclass
class SpeakerTurn:
    speaker: Literal["agent", "customer"]
    text: str
    start_time: float  # seconds
    end_time: float  # seconds
    confidence: float = 1.0

@dataclass
class TranscriptMetadata:
    audio_duration_sec: float
    language: str = "es"
    provider: str = "assemblyai"
    job_id: str | None = None
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class Transcript:
    call_id: str
    turns: list[SpeakerTurn]
    metadata: TranscriptMetadata
    detected_events: list[Event] = field(default_factory=list)
```
---
## Schema: Event
**File**: `src/models/call_analysis.py`
```python
class EventType(str, Enum):
    HOLD_START = "hold_start"
    HOLD_END = "hold_end"
    TRANSFER = "transfer"
    ESCALATION = "escalation"
    SILENCE = "silence"
    INTERRUPTION = "interruption"

@dataclass
class Event:
    event_type: EventType
    timestamp: float  # seconds from call start
    duration_sec: float | None = None
    metadata: dict = field(default_factory=dict)
```
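One way paired `HOLD_START` / `HOLD_END` events could be turned into a total hold duration (the kind of value stored in `hold_time_total_sec`) is sketched here. The pairing logic and the `total_hold_time` helper are assumptions made for illustration; the source does not say how hold time is actually derived. The enum is trimmed to the two members the sketch needs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class EventType(str, Enum):
    HOLD_START = "hold_start"
    HOLD_END = "hold_end"


@dataclass
class Event:
    event_type: EventType
    timestamp: float  # seconds from call start
    duration_sec: float | None = None
    metadata: dict = field(default_factory=dict)


def total_hold_time(events: list[Event]) -> float:
    """Sum the gaps between each HOLD_START and the next HOLD_END (hypothetical helper)."""
    total = 0.0
    hold_started_at: float | None = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.event_type == EventType.HOLD_START:
            hold_started_at = event.timestamp
        elif event.event_type == EventType.HOLD_END and hold_started_at is not None:
            total += event.timestamp - hold_started_at
            hold_started_at = None  # an unmatched HOLD_END is ignored
    return total
```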
---
## Schema: CompressedTranscript
**File**: `src/compression/models.py`
```python
@dataclass
class CustomerIntent:
    intent_type: IntentType  # CANCEL, INQUIRY, COMPLAINT, etc.
    text: str
    timestamp: float
    confidence: float = 0.8

@dataclass
class AgentOffer:
    offer_type: OfferType  # DISCOUNT, UPGRADE, RETENTION, etc.
    text: str
    timestamp: float

@dataclass
class CustomerObjection:
    objection_type: ObjectionType  # PRICE, SERVICE, COMPETITOR, etc.
    text: str
    timestamp: float

@dataclass
class CompressedTranscript:
    call_id: str
    customer_intents: list[CustomerIntent]
    agent_offers: list[AgentOffer]
    objections: list[CustomerObjection]
    resolutions: list[ResolutionStatement]
    key_moments: list[KeyMoment]
    compression_ratio: float = 0.0  # tokens_after / tokens_before
```
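The `compression_ratio` comment defines the value as `tokens_after / tokens_before`. A minimal sketch of that ratio follows. It assumes whitespace-separated words as a rough stand-in for tokens, since the source does not name a tokenizer, and the `compression_ratio` function itself is a hypothetical helper.

```python
def compression_ratio(original_text: str, compressed_text: str) -> float:
    """tokens_after / tokens_before, using whitespace tokens as a rough proxy."""
    tokens_before = len(original_text.split())
    tokens_after = len(compressed_text.split())
    if tokens_before == 0:
        # Nothing to compress: fall back to the dataclass default of 0.0.
        return 0.0
    return tokens_after / tokens_before
```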
---
## Schema: CallAnalysis
**File**: `src/models/call_analysis.py`
```python
@dataclass
class EvidenceSpan:
    text: str
    start_time: float | None = None
    end_time: float | None = None

@dataclass
class RCALabel:
    driver_code: str  # From rca_taxonomy.yaml
    confidence: float  # 0.0-1.0
    evidence_spans: list[EvidenceSpan]  # Min 1 required!
    reasoning: str | None = None

@dataclass
class ObservedFeatures:
    audio_duration_sec: float
    agent_talk_ratio: float | None = None
    customer_talk_ratio: float | None = None
    hold_time_total_sec: float | None = None
    transfer_count: int = 0
    silence_count: int = 0

@dataclass
class Traceability:
    schema_version: str
    prompt_version: str
    model_id: str
    processed_at: datetime = field(default_factory=datetime.now)

class CallOutcome(str, Enum):
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"
    INQUIRY_RESOLVED = "inquiry_resolved"
    INQUIRY_UNRESOLVED = "inquiry_unresolved"
    COMPLAINT_RESOLVED = "complaint_resolved"
    COMPLAINT_UNRESOLVED = "complaint_unresolved"

class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"

@dataclass
class CallAnalysis:
    call_id: str
    batch_id: str
    status: ProcessingStatus
    observed: ObservedFeatures
    outcome: CallOutcome | None = None
    lost_sales_drivers: list[RCALabel] = field(default_factory=list)
    poor_cx_drivers: list[RCALabel] = field(default_factory=list)
    traceability: Traceability | None = None
    error_message: str | None = None
```
---
## Schema: BatchAggregation
**File**: `src/aggregation/models.py`
```python
@dataclass
class DriverFrequency:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    total_occurrences: int
    calls_affected: int
    total_calls_in_batch: int
    occurrence_rate: float  # occurrences / total_calls
    call_rate: float  # calls_affected / total_calls
    avg_confidence: float
    min_confidence: float
    max_confidence: float

class ImpactLevel(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class DriverSeverity:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    base_severity: float
    frequency_factor: float
    confidence_factor: float
    co_occurrence_factor: float
    severity_score: float  # 0-100
    impact_level: ImpactLevel

@dataclass
class RCATree:
    batch_id: str
    total_calls: int
    calls_with_lost_sales: int
    calls_with_poor_cx: int
    calls_with_both: int
    top_lost_sales_drivers: list[str]
    top_poor_cx_drivers: list[str]
    nodes: list[RCANode] = field(default_factory=list)

@dataclass
class BatchAggregation:
    batch_id: str
    total_calls_processed: int
    successful_analyses: int
    failed_analyses: int
    lost_sales_frequencies: list[DriverFrequency]
    poor_cx_frequencies: list[DriverFrequency]
    lost_sales_severities: list[DriverSeverity]
    poor_cx_severities: list[DriverSeverity]
    rca_tree: RCATree | None = None
    emergent_patterns: list[dict] = field(default_factory=list)
```
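The two rate fields on `DriverFrequency` differ only in what they count: `occurrence_rate` counts every occurrence of a driver, while `call_rate` counts each affected call once. The sketch below illustrates that difference. The `driver_rates` helper and its input shape (one list of driver codes per call) are assumptions for illustration, and it takes `total_calls` to be the number of calls in the batch. For example, with four calls where a driver appears twice in one call and once in another, its `occurrence_rate` is 3/4 and its `call_rate` is 2/4.

```python
from collections import Counter


def driver_rates(calls: list[list[str]]) -> dict[str, tuple[float, float]]:
    """Map driver_code -> (occurrence_rate, call_rate) for one batch.

    `calls` holds, per call, the driver codes found in that call;
    a code may repeat within a single call.
    """
    total_calls = len(calls)
    if total_calls == 0:
        return {}
    # Every occurrence counts toward occurrence_rate.
    occurrences = Counter(code for call in calls for code in call)
    # Each call counts at most once per code toward call_rate.
    calls_affected = Counter(code for call in calls for code in set(call))
    return {
        code: (occurrences[code] / total_calls, calls_affected[code] / total_calls)
        for code in occurrences
    }
```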
---
## Schema: PipelineManifest
**File**: `src/pipeline/models.py`
```python
class PipelineStage(str, Enum):
    TRANSCRIPTION = "transcription"
    FEATURE_EXTRACTION = "feature_extraction"
    COMPRESSION = "compression"
    INFERENCE = "inference"
    AGGREGATION = "aggregation"
    EXPORT = "export"

class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class StageManifest:
    stage: PipelineStage
    status: StageStatus = StageStatus.PENDING
    started_at: datetime | None = None
    completed_at: datetime | None = None
    total_items: int = 0
    processed_items: int = 0
    failed_items: int = 0
    errors: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

@dataclass
class PipelineManifest:
    batch_id: str
    created_at: datetime = field(default_factory=datetime.now)
    status: StageStatus = StageStatus.PENDING
    current_stage: PipelineStage | None = None
    total_audio_files: int = 0
    stages: dict[PipelineStage, StageManifest] = field(default_factory=dict)
```
---
## Validation Rules
### RCALabel
- `evidence_spans` MUST have at least 1 element
- `driver_code` MUST be in rca_taxonomy.yaml OR be "OTHER_EMERGENT"
- `confidence` MUST be between 0.0 and 1.0
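
The three `RCALabel` rules could be checked along these lines. This is a minimal sketch, not the project's validator: `validate_rca_label` and the `known_codes` argument (standing in for the codes loaded from `rca_taxonomy.yaml`) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EvidenceSpan:
    text: str
    start_time: float | None = None
    end_time: float | None = None


@dataclass
class RCALabel:
    driver_code: str
    confidence: float
    evidence_spans: list[EvidenceSpan]
    reasoning: str | None = None


def validate_rca_label(label: RCALabel, known_codes: set[str]) -> list[str]:
    """Return the list of rule violations; an empty list means the label is valid."""
    errors: list[str] = []
    if len(label.evidence_spans) < 1:
        errors.append("evidence_spans must have at least 1 element")
    if label.driver_code not in known_codes and label.driver_code != "OTHER_EMERGENT":
        errors.append(f"unknown driver_code: {label.driver_code}")
    if not 0.0 <= label.confidence <= 1.0:
        errors.append("confidence must be between 0.0 and 1.0")
    return errors
```

Returning a list rather than raising on the first failure lets a caller report every violation for a label at once.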
### CallAnalysis
- `traceability` MUST be present
- If `status == SUCCESS`, `outcome` MUST be present
- If `outcome == SALE_LOST`, `lost_sales_drivers` SHOULD have entries
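
The `CallAnalysis` rules mix MUST and SHOULD, so a checker may want to separate hard errors from warnings. The sketch below does that under stated assumptions: `check_call_analysis` is a hypothetical helper, and the dataclass and enums are trimmed to the fields and members the rules mention. Because `traceability` defaults to `None` in the schema, the MUST rule is enforced here at validation time rather than by the type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"


class CallOutcome(str, Enum):
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"


@dataclass
class CallAnalysis:
    # Trimmed for the sketch; drivers and traceability are typed loosely.
    call_id: str
    status: ProcessingStatus
    outcome: CallOutcome | None = None
    lost_sales_drivers: list = field(default_factory=list)
    traceability: object | None = None


def check_call_analysis(analysis: CallAnalysis) -> tuple[list[str], list[str]]:
    """Return (errors, warnings): MUST rules become errors, SHOULD rules warnings."""
    errors: list[str] = []
    warnings: list[str] = []
    if analysis.traceability is None:
        errors.append("traceability must be present")
    if analysis.status == ProcessingStatus.SUCCESS and analysis.outcome is None:
        errors.append("outcome must be present when status is SUCCESS")
    if analysis.outcome == CallOutcome.SALE_LOST and not analysis.lost_sales_drivers:
        warnings.append("SALE_LOST without lost_sales_drivers")
    return errors, warnings
```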
### BatchAggregation
- `total_calls_processed` MUST equal `successful_analyses` + `failed_analyses`
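
The counter invariant can be enforced when the object is built. This is a sketch under assumptions: `BatchCounts` is a hypothetical stand-in holding only the three counters of `BatchAggregation`, and raising `ValueError` from `__post_init__` is one possible design, not something the source prescribes.

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    # Hypothetical stand-in for the three counters on BatchAggregation.
    total_calls_processed: int
    successful_analyses: int
    failed_analyses: int

    def __post_init__(self) -> None:
        expected = self.successful_analyses + self.failed_analyses
        if self.total_calls_processed != expected:
            raise ValueError(
                f"total_calls_processed={self.total_calls_processed} "
                f"but successful + failed = {expected}"
            )
```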
---
**Last updated**: 2026-01-19