flowchart TD
A(["RSS Feeds: BBC · NYT · Al Jazeera"]) -->|feedparser| B(["Raw Articles"])
B -->|TF-IDF vectorization| C(["๐ข Feature Vectors"])
C -->|DBSCAN clustering| D{{"Cluster? (similarity > 0.3)"}}
D -- "Multi-source" --> E(["LLM Extraction: Llama 3.2"])
D -. "Single article" .-> F(["Standalone Article"])
E --> G(["Verified Facts + Classification"])
G -->|geography · category| H[("SQLite DB")]
F -.-> H
H -->|parent / child links| I(["DAG Builder"])
I --> J(["Knowledge Graph: vis-network"])
H --> K(["Dashboard: Filterable UI"])
style A fill:#312e81,stroke:#818cf8,stroke-width:2px,color:#e0e7ff
style B fill:#1e1b4b,stroke:#6366f1,stroke-width:2px,color:#c7d2fe
style C fill:#1e1b4b,stroke:#6366f1,stroke-width:2px,color:#c7d2fe
style D fill:#3b0764,stroke:#c084fc,stroke-width:2px,color:#f3e8ff
style E fill:#312e81,stroke:#818cf8,stroke-width:2px,color:#e0e7ff
style F fill:#1e293b,stroke:#475569,stroke-width:1px,color:#94a3b8
style G fill:#064e3b,stroke:#34d399,stroke-width:2px,color:#d1fae5
style H fill:#1e1b4b,stroke:#a78bfa,stroke-width:2px,color:#ddd6fe
style I fill:#312e81,stroke:#818cf8,stroke-width:2px,color:#e0e7ff
style J fill:#1e1b4b,stroke:#6366f1,stroke-width:2px,color:#c7d2fe
style K fill:#1e1b4b,stroke:#6366f1,stroke-width:2px,color:#c7d2fe
linkStyle default stroke:#818cf8,stroke-width:2px
linkStyle 4 stroke:#475569,stroke-width:1px,stroke-dasharray:6
linkStyle 7 stroke:#475569,stroke-width:1px,stroke-dasharray:6
1. RSS Ingestion
Fetches live feeds from global sources using feedparser. Collects titles, descriptions,
and publication dates.
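
The ingestion step could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names and the `feeds` mapping (source name to feed URL) are hypothetical, and the feed URLs themselves are not given here. It relies on feedparser exposing an RSS `<description>` as `summary` and the raw date string as `published`.

```python
from typing import Any, Mapping


def normalize_entry(entry: Mapping[str, Any], source: str) -> dict[str, str]:
    """Keep only the fields the pipeline collects: title, description, date."""
    return {
        "source": source,
        "title": str(entry.get("title", "")).strip(),
        # feedparser exposes an RSS <description> element as `summary`.
        "description": str(entry.get("summary", "")).strip(),
        # Raw date string as published by the feed; empty if the feed omits it.
        "published": str(entry.get("published", "")),
    }


def fetch_articles(feeds: Mapping[str, str]) -> list[dict[str, str]]:
    """Fetch every feed in `feeds` (hypothetical mapping: source name -> URL)."""
    import feedparser  # third-party; imported lazily so normalize_entry works alone

    articles: list[dict[str, str]] = []
    for source, url in feeds.items():
        parsed = feedparser.parse(url)
        articles.extend(normalize_entry(e, source) for e in parsed.entries)
    return articles
```

Keeping the normalization separate from the network call makes it easy to check against hand-written entries without fetching anything.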
2. Semantic Clustering
Vectorizes text with TF-IDF and groups similar articles using DBSCAN
(cosine similarity > 0.3).
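
One way to express this step with scikit-learn is sketched below, under several assumptions. DBSCAN takes a distance cap (`eps`) rather than a similarity floor, so a similarity threshold of 0.3 is translated to a cosine distance of 0.7. That translation, `min_samples=2`, the English stop-word list, and all function names are assumptions for illustration, not settings confirmed by the project.

```python
from collections import defaultdict


def eps_from_similarity(min_similarity: float) -> float:
    """Cosine distance = 1 - cosine similarity, so a similarity floor is a distance cap."""
    return 1.0 - min_similarity


def split_clusters(labels):
    """Split DBSCAN labels into {cluster_id: [article indices]} and standalone indices."""
    clusters = defaultdict(list)
    standalone = []
    for index, label in enumerate(labels):
        if label == -1:  # DBSCAN labels unclustered points (noise) as -1
            standalone.append(index)
        else:
            clusters[int(label)].append(index)
    return dict(clusters), standalone


def cluster_articles(texts, min_similarity=0.3):
    """Vectorize `texts` with TF-IDF and group them with DBSCAN on cosine distance."""
    # Third-party imports are kept local so the helpers above run without scikit-learn.
    from sklearn.cluster import DBSCAN
    from sklearn.feature_extraction.text import TfidfVectorizer

    vectors = TfidfVectorizer(stop_words="english").fit_transform(texts)
    labels = DBSCAN(
        eps=eps_from_similarity(min_similarity),
        min_samples=2,  # assumption: two similar articles are enough to form a cluster
        metric="cosine",
    ).fit_predict(vectors)
    return split_clusters(labels)
```

In this sketch the noise label does the routing shown in the diagram: clustered indices go on to LLM extraction, while indices labelled `-1` become standalone articles. Note that DBSCAN chains neighbours, so two articles in the same cluster are not guaranteed to be within the threshold of each other directly.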
3. LLM Extraction
Passes clusters to Llama-3.2 to synthesize titles, extract verified facts, and classify
by geography & category.
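
The extraction step might look something like the sketch below. How the project actually calls Llama 3.2 is not stated, so the model is passed in as a plain callable (`llm`, hypothetical) that maps a prompt string to a reply string. The prompt wording and the JSON keys are likewise assumptions chosen to mirror the outputs listed above.

```python
import json
from typing import Callable


def build_prompt(articles) -> str:
    """Format one cluster of articles into a single extraction prompt."""
    lines = [f"- [{a['source']}] {a['title']}: {a['description']}" for a in articles]
    return (
        "These articles describe the same event.\n"
        + "\n".join(lines)
        + "\n\nReply with JSON only, using the keys "
        '"title", "facts" (list of strings), "geography" and "category".'
    )


def extract_event(articles, llm: Callable[[str], str]) -> dict:
    """Ask the model for a synthesized title, facts, and classification."""
    raw = llm(build_prompt(articles))
    # Models often wrap JSON in extra prose; keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model reply")
    data = json.loads(raw[start : end + 1])
    return {
        "title": str(data.get("title", "")),
        "facts": [str(fact) for fact in data.get("facts", [])],
        "geography": str(data.get("geography", "")),
        "category": str(data.get("category", "")),
    }
```

Injecting the model as a callable keeps the parsing logic testable with a stub and leaves the choice of runtime open.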
4. DAG Generation
Links events chronologically (parent → child) and stores them in SQLite. The Knowledge Graph renders
this as a directed, hierarchical timeline.
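
A minimal sketch of the linking step is shown below, using Python's built-in `sqlite3`. The schema, the table and column names, and the linking rule (each event's parent is the most recent earlier event in the same category) are all assumptions for illustration. The description above only says that events are linked chronologically and stored in SQLite.

```python
import sqlite3

# Hypothetical schema: one table for events, one for parent -> child edges.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    category TEXT NOT NULL,
    published TEXT NOT NULL  -- ISO 8601, so string order matches time order
);
CREATE TABLE IF NOT EXISTS links (
    parent_id INTEGER NOT NULL REFERENCES events(id),
    child_id INTEGER NOT NULL REFERENCES events(id),
    PRIMARY KEY (parent_id, child_id)
);
"""


def build_links(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Rebuild parent -> child edges and return them ordered by child id."""
    conn.execute("DELETE FROM links")
    rows = conn.execute(
        "SELECT id, category FROM events ORDER BY published, id"
    ).fetchall()
    latest: dict[str, int] = {}  # category -> id of the most recent event so far
    for event_id, category in rows:
        parent_id = latest.get(category)
        if parent_id is not None:
            conn.execute(
                "INSERT INTO links (parent_id, child_id) VALUES (?, ?)",
                (parent_id, event_id),
            )
        latest[category] = event_id
    conn.commit()
    return conn.execute(
        "SELECT parent_id, child_id FROM links ORDER BY child_id"
    ).fetchall()
```

Because every edge points from an earlier event to a later one, the result cannot contain a cycle, which is what lets a graph renderer lay it out as a top-down timeline.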