# DATA_CONTRACTS.md

> Schemas for all data that flows through the system

---

## Golden rule

> If you change a schema, update this doc FIRST, then implement the code.

---

## Schema: Transcript

**File**: `src/transcription/models.py`

```python
@dataclass
class SpeakerTurn:
    speaker: Literal["agent", "customer"]
    text: str
    start_time: float  # seconds
    end_time: float  # seconds
    confidence: float = 1.0

@dataclass
class TranscriptMetadata:
    audio_duration_sec: float
    language: str = "es"
    provider: str = "assemblyai"
    job_id: str | None = None
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class Transcript:
    call_id: str
    turns: list[SpeakerTurn]
    metadata: TranscriptMetadata
    detected_events: list[Event] = field(default_factory=list)
```
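
As an illustration of how `SpeakerTurn` timestamps can be consumed, the sketch below sums speaking time per speaker. The helper `talk_time_by_speaker` and the sample turns are hypothetical and not part of `src/transcription/models.py`; the dataclass is copied here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class SpeakerTurn:
    speaker: Literal["agent", "customer"]
    text: str
    start_time: float  # seconds
    end_time: float  # seconds
    confidence: float = 1.0


def talk_time_by_speaker(turns: list[SpeakerTurn]) -> dict[str, float]:
    """Hypothetical helper: total seconds spoken by each speaker."""
    totals = {"agent": 0.0, "customer": 0.0}
    for turn in turns:
        # Guard against turns whose end precedes their start.
        totals[turn.speaker] += max(0.0, turn.end_time - turn.start_time)
    return totals


# Sample data invented for this example.
turns = [
    SpeakerTurn("agent", "Buenos días, ¿en qué puedo ayudarle?", 0.0, 3.0),
    SpeakerTurn("customer", "Quiero revisar mi factura.", 3.5, 6.0),
    SpeakerTurn("agent", "Claro, un momento.", 6.0, 7.5),
]
totals = talk_time_by_speaker(turns)
```

Dividing each total by `audio_duration_sec` would be one plausible way to obtain talk ratios, though this document does not state how they are actually derived.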

---

## Schema: Event

**File**: `src/models/call_analysis.py`

```python
class EventType(str, Enum):
    HOLD_START = "hold_start"
    HOLD_END = "hold_end"
    TRANSFER = "transfer"
    ESCALATION = "escalation"
    SILENCE = "silence"
    INTERRUPTION = "interruption"

@dataclass
class Event:
    event_type: EventType
    timestamp: float  # seconds from call start
    duration_sec: float | None = None
    metadata: dict = field(default_factory=dict)
```

---

## Schema: CompressedTranscript

**File**: `src/compression/models.py`

```python
@dataclass
class CustomerIntent:
    intent_type: IntentType  # CANCEL, INQUIRY, COMPLAINT, etc.
    text: str
    timestamp: float
    confidence: float = 0.8

@dataclass
class AgentOffer:
    offer_type: OfferType  # DISCOUNT, UPGRADE, RETENTION, etc.
    text: str
    timestamp: float

@dataclass
class CustomerObjection:
    objection_type: ObjectionType  # PRICE, SERVICE, COMPETITOR, etc.
    text: str
    timestamp: float

@dataclass
class CompressedTranscript:
    call_id: str
    customer_intents: list[CustomerIntent]
    agent_offers: list[AgentOffer]
    objections: list[CustomerObjection]
    resolutions: list[ResolutionStatement]
    key_moments: list[KeyMoment]
    compression_ratio: float = 0.0  # tokens_after / tokens_before
```
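
The comment on `compression_ratio` defines it as `tokens_after / tokens_before`. A minimal sketch of that arithmetic follows. The function names, the whitespace tokenizer, and the zero guard are assumptions made for illustration; the real pipeline may count model tokens instead.

```python
def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: counts whitespace-separated words."""
    return len(text.split())


def compute_compression_ratio(tokens_before: int, tokens_after: int) -> float:
    """tokens_after / tokens_before, or 0.0 when there was nothing to compress."""
    if tokens_before <= 0:
        return 0.0
    return tokens_after / tokens_before


# Sample text invented for this example.
original = "hola buenos días quería cancelar mi contrato porque el precio subió mucho"
compressed = "cancelar contrato precio subió"
ratio = compute_compression_ratio(count_tokens(original), count_tokens(compressed))
```

With this definition a lower ratio means stronger compression, and the default of `0.0` in the schema is indistinguishable from "not computed".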

---

## Schema: CallAnalysis

**File**: `src/models/call_analysis.py`

```python
@dataclass
class EvidenceSpan:
    text: str
    start_time: float | None = None
    end_time: float | None = None

@dataclass
class RCALabel:
    driver_code: str  # From rca_taxonomy.yaml
    confidence: float  # 0.0-1.0
    evidence_spans: list[EvidenceSpan]  # Min 1 required!
    reasoning: str | None = None

@dataclass
class ObservedFeatures:
    audio_duration_sec: float
    agent_talk_ratio: float | None = None
    customer_talk_ratio: float | None = None
    hold_time_total_sec: float | None = None
    transfer_count: int = 0
    silence_count: int = 0

@dataclass
class Traceability:
    schema_version: str
    prompt_version: str
    model_id: str
    processed_at: datetime = field(default_factory=datetime.now)

class CallOutcome(str, Enum):
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"
    INQUIRY_RESOLVED = "inquiry_resolved"
    INQUIRY_UNRESOLVED = "inquiry_unresolved"
    COMPLAINT_RESOLVED = "complaint_resolved"
    COMPLAINT_UNRESOLVED = "complaint_unresolved"

class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"

@dataclass
class CallAnalysis:
    call_id: str
    batch_id: str
    status: ProcessingStatus
    observed: ObservedFeatures
    outcome: CallOutcome | None = None
    lost_sales_drivers: list[RCALabel] = field(default_factory=list)
    poor_cx_drivers: list[RCALabel] = field(default_factory=list)
    traceability: Traceability | None = None
    error_message: str | None = None
```

---

## Schema: BatchAggregation

**File**: `src/aggregation/models.py`

```python
@dataclass
class DriverFrequency:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    total_occurrences: int
    calls_affected: int
    total_calls_in_batch: int
    occurrence_rate: float  # occurrences / total_calls
    call_rate: float  # calls_affected / total_calls
    avg_confidence: float
    min_confidence: float
    max_confidence: float

class ImpactLevel(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class DriverSeverity:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    base_severity: float
    frequency_factor: float
    confidence_factor: float
    co_occurrence_factor: float
    severity_score: float  # 0-100
    impact_level: ImpactLevel

@dataclass
class RCATree:
    batch_id: str
    total_calls: int
    calls_with_lost_sales: int
    calls_with_poor_cx: int
    calls_with_both: int
    top_lost_sales_drivers: list[str]
    top_poor_cx_drivers: list[str]
    nodes: list[RCANode] = field(default_factory=list)

@dataclass
class BatchAggregation:
    batch_id: str
    total_calls_processed: int
    successful_analyses: int
    failed_analyses: int
    lost_sales_frequencies: list[DriverFrequency]
    poor_cx_frequencies: list[DriverFrequency]
    lost_sales_severities: list[DriverSeverity]
    poor_cx_severities: list[DriverSeverity]
    rca_tree: RCATree | None = None
    emergent_patterns: list[dict] = field(default_factory=list)
```
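
The two rate fields in `DriverFrequency` differ in what they count: `occurrence_rate` counts every label, while `call_rate` counts each call at most once. The sketch below fills the fields from per-call `(driver_code, confidence)` pairs. The function `compute_frequencies`, its input shape, and the driver codes are hypothetical; the actual aggregation code may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal


@dataclass
class DriverFrequency:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    total_occurrences: int
    calls_affected: int
    total_calls_in_batch: int
    occurrence_rate: float  # occurrences / total_calls
    call_rate: float  # calls_affected / total_calls
    avg_confidence: float
    min_confidence: float
    max_confidence: float


def compute_frequencies(
    calls: list[list[tuple[str, float]]],
    category: Literal["lost_sales", "poor_cx"],
) -> list[DriverFrequency]:
    """Hypothetical helper: one DriverFrequency per driver code in the batch."""
    total_calls = len(calls)
    if total_calls == 0:
        return []
    confidences = defaultdict(list)
    calls_affected = defaultdict(int)
    for labels in calls:
        for code, confidence in labels:
            confidences[code].append(confidence)
        # A call counts once per driver, however often the driver repeats.
        for code in {code for code, _ in labels}:
            calls_affected[code] += 1
    result = [
        DriverFrequency(
            driver_code=code,
            category=category,
            total_occurrences=len(confs),
            calls_affected=calls_affected[code],
            total_calls_in_batch=total_calls,
            occurrence_rate=len(confs) / total_calls,
            call_rate=calls_affected[code] / total_calls,
            avg_confidence=sum(confs) / len(confs),
            min_confidence=min(confs),
            max_confidence=max(confs),
        )
        for code, confs in confidences.items()
    ]
    result.sort(key=lambda f: f.total_occurrences, reverse=True)
    return result


# Four calls; driver codes are invented for this example.
calls = [
    [("PRICE_TOO_HIGH", 0.9), ("PRICE_TOO_HIGH", 0.7)],
    [("PRICE_TOO_HIGH", 0.8), ("NO_FOLLOW_UP", 0.6)],
    [],
    [],
]
frequencies = compute_frequencies(calls, "lost_sales")
```

Because a driver can be labeled more than once in the same call, `occurrence_rate` can exceed 1.0 under this definition, whereas `call_rate` cannot.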

---

## Schema: PipelineManifest

**File**: `src/pipeline/models.py`

```python
class PipelineStage(str, Enum):
    TRANSCRIPTION = "transcription"
    FEATURE_EXTRACTION = "feature_extraction"
    COMPRESSION = "compression"
    INFERENCE = "inference"
    AGGREGATION = "aggregation"
    EXPORT = "export"

class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class StageManifest:
    stage: PipelineStage
    status: StageStatus = StageStatus.PENDING
    started_at: datetime | None = None
    completed_at: datetime | None = None
    total_items: int = 0
    processed_items: int = 0
    failed_items: int = 0
    errors: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

@dataclass
class PipelineManifest:
    batch_id: str
    created_at: datetime = field(default_factory=datetime.now)
    status: StageStatus = StageStatus.PENDING
    current_stage: PipelineStage | None = None
    total_audio_files: int = 0
    stages: dict[PipelineStage, StageManifest] = field(default_factory=dict)
```

---

## Validation Rules

### RCALabel

- `evidence_spans` MUST have at least 1 element
- `driver_code` MUST be in rca_taxonomy.yaml OR be "OTHER_EMERGENT"
- `confidence` MUST be between 0.0 and 1.0

### CallAnalysis

- `traceability` MUST be present
- If `status == SUCCESS`, `outcome` MUST be present
- If `outcome == SALE_LOST`, `lost_sales_drivers` SHOULD have entries

### BatchAggregation

- `total_calls_processed` == `successful_analyses` + `failed_analyses`
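
One way to enforce this invariant is at construction time, via the dataclass `__post_init__` hook. The sketch below uses a reduced, hypothetical `BatchCounts` class that holds only the three counters; whether the real `BatchAggregation` validates itself this way is not stated in this document.

```python
from dataclasses import dataclass


@dataclass
class BatchCounts:
    """Reduced stand-in for BatchAggregation (illustration only)."""
    total_calls_processed: int
    successful_analyses: int
    failed_analyses: int

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ assigns the fields.
        expected = self.successful_analyses + self.failed_analyses
        if self.total_calls_processed != expected:
            raise ValueError(
                f"total_calls_processed={self.total_calls_processed} "
                f"but successful + failed = {expected}"
            )


consistent = BatchCounts(100, 97, 3)

try:
    BatchCounts(100, 90, 3)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```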

---

**Last updated**: 2026-01-19