feat: Add Streamlit dashboard with Blueprint compliance (v2.1.0)

Dashboard Features:
- 8 navigation sections: Overview, Outcomes, Poor CX, FCR, Churn, Agent, Call Explorer, Export
- Beyond Brand Identity styling (colors #6D84E3, Outfit font)
- RCA Sankey diagram (Driver → Outcome → Churn Risk flow)
- Correlation heatmaps (driver co-occurrence, driver-outcome)
- Outcome Deep Dive (root causes, correlation, duration analysis)
- Export functionality (Excel, HTML, JSON)

Blueprint Compliance:
- FCR: 4 categories (Primera Llamada/Rellamada × Sin/Con Riesgo de Fuga)
- Churn: Binary view (Sin Riesgo de Fuga / En Riesgo de Fuga)
- Agent: Talento Para Replicar / Oportunidades de Mejora
- Fixed FCR rate calculation (only FIRST_CALL counts as success)

Technical:
- Streamlit + Plotly for interactive visualizations
- Light theme configuration (.streamlit/config.toml)
- Fixed Plotly colorbar titlefont deprecation

Documentation:
- Updated PROJECT_CONTEXT.md, TODO.md, CHANGELOG.md
- Added 4 new technical decisions (TD-014 to TD-017)
- Created TROUBLESHOOTING.md with 10 common issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs/DATA_CONTRACTS.md
@@ -0,0 +1,289 @@
# DATA_CONTRACTS.md

> Schemas for all data that flows through the system

---

## Golden rule

> If you change a schema, update this doc FIRST, then implement the code.

---

## Schema: Transcript

**File**: `src/transcription/models.py`

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Literal

# Event is defined in src/models/call_analysis.py (see "Schema: Event").

@dataclass
class SpeakerTurn:
    speaker: Literal["agent", "customer"]
    text: str
    start_time: float  # seconds
    end_time: float  # seconds
    confidence: float = 1.0

@dataclass
class TranscriptMetadata:
    audio_duration_sec: float
    language: str = "es"
    provider: str = "assemblyai"
    job_id: str | None = None
    created_at: datetime = field(default_factory=datetime.now)

@dataclass
class Transcript:
    call_id: str
    turns: list[SpeakerTurn]
    metadata: TranscriptMetadata
    detected_events: list[Event] = field(default_factory=list)
```
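
As an illustration of how `SpeakerTurn` timings can be consumed, the sketch below derives a per-speaker talk ratio from a list of turns. `talk_ratio` is a hypothetical helper, and defining the ratio as one speaker's spoken time over total spoken time is an assumption; the project may compute `agent_talk_ratio` differently.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class SpeakerTurn:
    speaker: Literal["agent", "customer"]
    text: str
    start_time: float  # seconds
    end_time: float  # seconds
    confidence: float = 1.0


def talk_ratio(turns: list[SpeakerTurn], speaker: str) -> float:
    """Hypothetical helper: share of total spoken time held by `speaker`."""
    total = sum(t.end_time - t.start_time for t in turns)
    if total <= 0:
        return 0.0  # no speech at all: avoid dividing by zero
    spoken = sum(t.end_time - t.start_time for t in turns if t.speaker == speaker)
    return spoken / total


# Illustrative turns: the agent speaks for 4 s out of 10 s of speech.
turns = [
    SpeakerTurn("agent", "Good morning, how can I help?", 0.0, 3.0),
    SpeakerTurn("customer", "I would like to review my invoice.", 3.0, 9.0),
    SpeakerTurn("agent", "Sure, one moment.", 9.0, 10.0),
]
agent_ratio = talk_ratio(turns, "agent")
```

Under this definition the agent and customer ratios sum to 1 whenever at least one turn has a positive duration.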

---

## Schema: Event

**File**: `src/models/call_analysis.py`

```python
from dataclasses import dataclass, field
from enum import Enum

class EventType(str, Enum):
    HOLD_START = "hold_start"
    HOLD_END = "hold_end"
    TRANSFER = "transfer"
    ESCALATION = "escalation"
    SILENCE = "silence"
    INTERRUPTION = "interruption"

@dataclass
class Event:
    event_type: EventType
    timestamp: float  # seconds from call start
    duration_sec: float | None = None
    metadata: dict = field(default_factory=dict)
```
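
One way these events might feed `hold_time_total_sec` is by pairing each `HOLD_START` with the next `HOLD_END`. The sketch below assumes exactly that pairing rule and ignores a hold that never ends; `total_hold_time` is a hypothetical helper, and `EventType` is trimmed to the members the example uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EventType(str, Enum):  # trimmed copy for the example
    HOLD_START = "hold_start"
    HOLD_END = "hold_end"
    TRANSFER = "transfer"


@dataclass
class Event:
    event_type: EventType
    timestamp: float  # seconds from call start
    duration_sec: Optional[float] = None
    metadata: dict = field(default_factory=dict)


def total_hold_time(events: list[Event]) -> float:
    """Hypothetical helper: sum the gaps between paired HOLD_START/HOLD_END."""
    total = 0.0
    hold_started_at = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.event_type is EventType.HOLD_START:
            hold_started_at = event.timestamp
        elif event.event_type is EventType.HOLD_END and hold_started_at is not None:
            total += event.timestamp - hold_started_at
            hold_started_at = None
    return total  # an unmatched HOLD_START contributes nothing


events = [
    Event(EventType.HOLD_START, 10.0),
    Event(EventType.HOLD_END, 40.0),
    Event(EventType.TRANSFER, 50.0),
    Event(EventType.HOLD_START, 60.0),
    Event(EventType.HOLD_END, 75.0),
]
hold_seconds = total_hold_time(events)
```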

---

## Schema: CompressedTranscript

**File**: `src/compression/models.py`

```python
from dataclasses import dataclass

# IntentType, OfferType, ObjectionType, ResolutionStatement and KeyMoment
# are not shown in this document.

@dataclass
class CustomerIntent:
    intent_type: IntentType  # CANCEL, INQUIRY, COMPLAINT, etc.
    text: str
    timestamp: float
    confidence: float = 0.8

@dataclass
class AgentOffer:
    offer_type: OfferType  # DISCOUNT, UPGRADE, RETENTION, etc.
    text: str
    timestamp: float

@dataclass
class CustomerObjection:
    objection_type: ObjectionType  # PRICE, SERVICE, COMPETITOR, etc.
    text: str
    timestamp: float

@dataclass
class CompressedTranscript:
    call_id: str
    customer_intents: list[CustomerIntent]
    agent_offers: list[AgentOffer]
    objections: list[CustomerObjection]
    resolutions: list[ResolutionStatement]
    key_moments: list[KeyMoment]
    compression_ratio: float = 0.0  # tokens_after / tokens_before
```
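
The `compression_ratio` comment above gives the formula `tokens_after / tokens_before`. A minimal sketch of that arithmetic follows; counting tokens by whitespace splitting is an assumption made only for the example (the real pipeline presumably uses a model tokenizer), and `compression_ratio` as a function name is hypothetical.

```python
def compression_ratio(original_text: str, compressed_text: str) -> float:
    """Hypothetical helper: tokens_after / tokens_before, whitespace tokens."""
    tokens_before = len(original_text.split())
    tokens_after = len(compressed_text.split())
    if tokens_before == 0:
        return 0.0  # matches the schema default when there is nothing to compress
    return tokens_after / tokens_before


# 8 whitespace tokens compressed down to 2.
ratio = compression_ratio(
    "customer says they want to cancel the contract",
    "cancel contract",
)
```

With this definition a lower value means stronger compression, and a value of 1.0 means no reduction.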

---

## Schema: CallAnalysis

**File**: `src/models/call_analysis.py`

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

@dataclass
class EvidenceSpan:
    text: str
    start_time: float | None = None
    end_time: float | None = None

@dataclass
class RCALabel:
    driver_code: str  # From rca_taxonomy.yaml
    confidence: float  # 0.0-1.0
    evidence_spans: list[EvidenceSpan]  # Min 1 required!
    reasoning: str | None = None

@dataclass
class ObservedFeatures:
    audio_duration_sec: float
    agent_talk_ratio: float | None = None
    customer_talk_ratio: float | None = None
    hold_time_total_sec: float | None = None
    transfer_count: int = 0
    silence_count: int = 0

@dataclass
class Traceability:
    schema_version: str
    prompt_version: str
    model_id: str
    processed_at: datetime = field(default_factory=datetime.now)

class CallOutcome(str, Enum):
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"
    INQUIRY_RESOLVED = "inquiry_resolved"
    INQUIRY_UNRESOLVED = "inquiry_unresolved"
    COMPLAINT_RESOLVED = "complaint_resolved"
    COMPLAINT_UNRESOLVED = "complaint_unresolved"

class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"

@dataclass
class CallAnalysis:
    call_id: str
    batch_id: str
    status: ProcessingStatus
    observed: ObservedFeatures
    outcome: CallOutcome | None = None
    lost_sales_drivers: list[RCALabel] = field(default_factory=list)
    poor_cx_drivers: list[RCALabel] = field(default_factory=list)
    traceability: Traceability | None = None
    error_message: str | None = None
```
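
Because `CallOutcome` and `ProcessingStatus` mix in `str`, their members behave as plain strings when serialized. The sketch below shows a JSON round trip with the standard library; the record layout and the `call-001` identifier are illustrative, not the project's export format.

```python
import json
from enum import Enum


class CallOutcome(str, Enum):  # trimmed copy for the example
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"


class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"


# Illustrative record; the real CallAnalysis has more fields.
record = {
    "call_id": "call-001",
    "status": ProcessingStatus.SUCCESS,
    "outcome": CallOutcome.SALE_LOST,
}

payload = json.dumps(record)  # str-based members are written as their values
loaded = json.loads(payload)  # plain strings come back

# Looking a member up by value restores the enum.
restored_outcome = CallOutcome(loaded["outcome"])
```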

---

## Schema: BatchAggregation

**File**: `src/aggregation/models.py`

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Literal

# RCANode is not shown in this document.

@dataclass
class DriverFrequency:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    total_occurrences: int
    calls_affected: int
    total_calls_in_batch: int
    occurrence_rate: float  # occurrences / total_calls
    call_rate: float  # calls_affected / total_calls
    avg_confidence: float
    min_confidence: float
    max_confidence: float

class ImpactLevel(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

@dataclass
class DriverSeverity:
    driver_code: str
    category: Literal["lost_sales", "poor_cx"]
    base_severity: float
    frequency_factor: float
    confidence_factor: float
    co_occurrence_factor: float
    severity_score: float  # 0-100
    impact_level: ImpactLevel

@dataclass
class RCATree:
    batch_id: str
    total_calls: int
    calls_with_lost_sales: int
    calls_with_poor_cx: int
    calls_with_both: int
    top_lost_sales_drivers: list[str]
    top_poor_cx_drivers: list[str]
    nodes: list[RCANode] = field(default_factory=list)

@dataclass
class BatchAggregation:
    batch_id: str
    total_calls_processed: int
    successful_analyses: int
    failed_analyses: int
    lost_sales_frequencies: list[DriverFrequency]
    poor_cx_frequencies: list[DriverFrequency]
    lost_sales_severities: list[DriverSeverity]
    poor_cx_severities: list[DriverSeverity]
    rca_tree: RCATree | None = None
    emergent_patterns: list[dict] = field(default_factory=list)
```
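
The two rates on `DriverFrequency` differ only in the numerator: `occurrence_rate` counts every label, `call_rate` counts each call once. The sketch below works through that difference on a toy batch. `driver_rates` is a hypothetical helper, the driver codes are invented for the example, and each call is represented simply as a list of driver codes.

```python
def driver_rates(calls: list[list[str]], driver_code: str) -> dict:
    """Hypothetical helper: frequency figures for one driver across a batch."""
    total_calls = len(calls)
    # Every label counts, so a call tagged twice adds two occurrences.
    total_occurrences = sum(drivers.count(driver_code) for drivers in calls)
    # Each call counts at most once, however many times it was tagged.
    calls_affected = sum(1 for drivers in calls if driver_code in drivers)
    if total_calls == 0:
        occurrence_rate = call_rate = 0.0
    else:
        occurrence_rate = total_occurrences / total_calls
        call_rate = calls_affected / total_calls
    return {
        "total_occurrences": total_occurrences,
        "calls_affected": calls_affected,
        "total_calls_in_batch": total_calls,
        "occurrence_rate": occurrence_rate,
        "call_rate": call_rate,
    }


# Invented driver codes; four calls, one of them tagged twice.
batch = [
    ["PRICE_TOO_HIGH", "PRICE_TOO_HIGH"],
    ["PRICE_TOO_HIGH"],
    [],
    ["LONG_HOLD"],
]
rates = driver_rates(batch, "PRICE_TOO_HIGH")
```

Note that `occurrence_rate` can exceed 1.0 when calls carry the same driver several times, while `call_rate` stays within 0.0 to 1.0.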

---

## Schema: PipelineManifest

**File**: `src/pipeline/models.py`

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class PipelineStage(str, Enum):
    TRANSCRIPTION = "transcription"
    FEATURE_EXTRACTION = "feature_extraction"
    COMPRESSION = "compression"
    INFERENCE = "inference"
    AGGREGATION = "aggregation"
    EXPORT = "export"

class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"

@dataclass
class StageManifest:
    stage: PipelineStage
    status: StageStatus = StageStatus.PENDING
    started_at: datetime | None = None
    completed_at: datetime | None = None
    total_items: int = 0
    processed_items: int = 0
    failed_items: int = 0
    errors: list[dict] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

@dataclass
class PipelineManifest:
    batch_id: str
    created_at: datetime = field(default_factory=datetime.now)
    status: StageStatus = StageStatus.PENDING
    current_stage: PipelineStage | None = None
    total_audio_files: int = 0
    stages: dict[PipelineStage, StageManifest] = field(default_factory=dict)
```
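
A manifest keyed by `PipelineStage` makes it straightforward to find where a batch should resume. The sketch below assumes stages run in the order the enum declares them and that a stage with no manifest entry has not started; `next_pending_stage` is a hypothetical helper, and `StageManifest` is trimmed to the two fields the example reads.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PipelineStage(str, Enum):
    TRANSCRIPTION = "transcription"
    FEATURE_EXTRACTION = "feature_extraction"
    COMPRESSION = "compression"
    INFERENCE = "inference"
    AGGREGATION = "aggregation"
    EXPORT = "export"


class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class StageManifest:  # trimmed to the fields this sketch needs
    stage: PipelineStage
    status: StageStatus = StageStatus.PENDING


def next_pending_stage(stages: dict) -> Optional[PipelineStage]:
    """Hypothetical helper: first stage, in declaration order, not yet started."""
    for stage in PipelineStage:  # Enum iteration follows definition order
        manifest = stages.get(stage)
        if manifest is None or manifest.status is StageStatus.PENDING:
            return stage
    return None  # every stage has been started or finished


stages = {
    PipelineStage.TRANSCRIPTION: StageManifest(
        PipelineStage.TRANSCRIPTION, StageStatus.COMPLETED
    ),
    PipelineStage.FEATURE_EXTRACTION: StageManifest(
        PipelineStage.FEATURE_EXTRACTION, StageStatus.SKIPPED
    ),
}
upcoming = next_pending_stage(stages)
```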

---

## Validation Rules

### RCALabel

- `evidence_spans` MUST have at least 1 element
- `driver_code` MUST be in `rca_taxonomy.yaml` OR be `"OTHER_EMERGENT"`
- `confidence` MUST be between 0.0 and 1.0
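
The three `RCALabel` rules above can be sketched as a single check that returns a list of violations. `validate_rca_label` is a hypothetical helper, `known_codes` stands in for the codes loaded from `rca_taxonomy.yaml`, and the driver codes in the example are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceSpan:
    text: str
    start_time: Optional[float] = None
    end_time: Optional[float] = None


@dataclass
class RCALabel:
    driver_code: str
    confidence: float
    evidence_spans: list[EvidenceSpan]
    reasoning: Optional[str] = None


def validate_rca_label(label: RCALabel, known_codes: set[str]) -> list[str]:
    """Hypothetical helper: return one message per violated rule."""
    errors = []
    if len(label.evidence_spans) < 1:
        errors.append("evidence_spans must have at least 1 element")
    if label.driver_code not in known_codes and label.driver_code != "OTHER_EMERGENT":
        errors.append(f"unknown driver_code: {label.driver_code}")
    if not 0.0 <= label.confidence <= 1.0:
        errors.append("confidence must be between 0.0 and 1.0")
    return errors


known_codes = {"PRICE_TOO_HIGH", "LONG_HOLD"}  # invented taxonomy entries

valid = RCALabel("PRICE_TOO_HIGH", 0.9, [EvidenceSpan("it is too expensive")])
invalid = RCALabel("NOT_IN_TAXONOMY", 1.4, [])

valid_errors = validate_rca_label(valid, known_codes)
invalid_errors = validate_rca_label(invalid, known_codes)
```

Returning every violation at once, rather than raising on the first, is a design choice of this sketch; it makes batch-level error reporting easier.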

### CallAnalysis

- `traceability` MUST be present
- If `status == SUCCESS`, `outcome` MUST be present
- If `outcome == SALE_LOST`, `lost_sales_drivers` SHOULD have entries
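
The `CallAnalysis` rules mix MUST and SHOULD, so a checker can reasonably separate hard errors from warnings. The sketch below does that under the assumption that SHOULD violations are reported but do not fail validation; `validate_call_analysis` is a hypothetical helper and the dataclass is trimmed to the fields the rules mention.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class CallOutcome(str, Enum):  # trimmed copy for the example
    SALE_COMPLETED = "sale_completed"
    SALE_LOST = "sale_lost"


class ProcessingStatus(str, Enum):
    SUCCESS = "success"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass
class CallAnalysis:  # trimmed to the fields the rules mention
    call_id: str
    status: ProcessingStatus
    outcome: Optional[CallOutcome] = None
    lost_sales_drivers: list = field(default_factory=list)
    traceability: Optional[object] = None


def validate_call_analysis(analysis: CallAnalysis) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (errors for MUST rules, warnings for SHOULD)."""
    errors, warnings = [], []
    if analysis.traceability is None:
        errors.append("traceability must be present")
    if analysis.status is ProcessingStatus.SUCCESS and analysis.outcome is None:
        errors.append("outcome must be present when status is SUCCESS")
    if analysis.outcome is CallOutcome.SALE_LOST and not analysis.lost_sales_drivers:
        warnings.append("SALE_LOST without lost_sales_drivers")
    return errors, warnings


# object() is a placeholder for a real Traceability instance.
lost_sale = CallAnalysis(
    "call-001",
    ProcessingStatus.SUCCESS,
    outcome=CallOutcome.SALE_LOST,
    traceability=object(),
)
lost_sale_errors, lost_sale_warnings = validate_call_analysis(lost_sale)
```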

### BatchAggregation

- `total_calls_processed` == `successful_analyses` + `failed_analyses`
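
This invariant is a one-line comparison; the sketch below wraps it in a hypothetical `batch_totals_consistent` function so it can be asserted wherever a `BatchAggregation` is built. The counts in the example are invented.

```python
def batch_totals_consistent(
    total_calls_processed: int,
    successful_analyses: int,
    failed_analyses: int,
) -> bool:
    """Hypothetical helper: True when the batch counters add up."""
    return total_calls_processed == successful_analyses + failed_analyses


# Invented counts: 100 calls, 97 analysed successfully, 3 failed.
consistent = batch_totals_consistent(100, 97, 3)
inconsistent = batch_totals_consistent(100, 97, 2)
```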

---

**Last updated**: 2026-01-19