Methodology
How we get closer to the truth — and where we fall short.
Source selection
We pull from 65 curated RSS feeds spanning wire services (AP, Reuters, AFP), US mainstream outlets across the political spectrum, international sources from 15+ countries, and independent/nonprofit journalism. We supplement these with 24 Google News topic feeds that surface articles from hundreds of additional outlets.
Source selection criteria: editorial standards, factual reporting track record, and spectrum coverage. We intentionally include outlets we disagree with — a truth engine that only reads sources it likes isn't seeking truth.
How bias ratings work
Each source is rated using Media Bias/Fact Check (MBFC) classifications. MBFC is among the most comprehensive independent media bias databases, rating 9,000+ outlets on two axes: bias (far-left to far-right) and factual reporting (very high to very low).
We maintain a local database of 100+ source ratings for fast lookup. Articles from Google News feeds are automatically tagged by matching the source name against our MBFC database. Sources not found default to "center" — this is a known limitation.
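As a rough illustration of the lookup described above, here is a minimal Python sketch. The outlet names, the `Rating` type, the `normalize` helper, and the sample entries are all hypothetical placeholders, not our actual table or real MBFC ratings. The sketch shows one way a name match with a "center" default could work, and it marks defaulted results so they can be told apart from real ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    bias: str           # e.g. "left-center", "center", "right"
    factual: str        # e.g. "very high", "high", "mixed"
    known: bool = True  # False when the rating is only the fallback default


# Hypothetical placeholder entries, not real MBFC ratings.
RATINGS = {
    "example wire": Rating("center", "very high"),
    "example daily": Rating("left-center", "high"),
}

# Assumption: unknown sources fall back to "center", as described above.
DEFAULT_RATING = Rating("center", "unknown", known=False)


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so minor name variants still match."""
    return " ".join(name.lower().split())


def lookup_rating(source_name: str) -> Rating:
    """Return the stored rating, or the 'center' default if the source is unknown."""
    return RATINGS.get(normalize(source_name), DEFAULT_RATING)
```

Keeping a `known` flag on the result is one possible design choice: it lets downstream code count how many articles received the default rather than a real rating.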
How clustering works
We generate vector embeddings for each article title using OpenAI's text-embedding-3-small model, then group articles with cosine similarity above 0.55 into clusters.
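The grouping step can be sketched in a few lines of Python. This is an illustration under assumptions, not our production code: it takes embedding vectors as plain lists of floats (the call to the embedding model is omitted), and it assumes single-link grouping, where any pair above the threshold lands in the same cluster. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(
    vectors: Sequence[Sequence[float]], threshold: float = 0.55
) -> List[List[int]]:
    """Group article indices whose embeddings exceed the similarity threshold.

    Assumption: single-link grouping via union-find, so clusters are the
    connected components of the "similarity > threshold" graph.
    """
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) > threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

The pairwise loop is quadratic in the number of articles, which is fine for a sketch but would need an index or batching at larger scale.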
If embedding generation fails, we fall back to TF-IDF (term frequency-inverse document frequency) matching on titles with a 0.35 similarity threshold. This is less accurate, but it keeps the pipeline running when the embedding service is unavailable.
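For readers unfamiliar with TF-IDF, the fallback can be approximated as follows. This is a sketch under assumptions: the tokenizer, the smoothed IDF formula, and the function names are illustrative choices, and the real pipeline may weight terms differently. It scores each pair of titles by the cosine similarity of their TF-IDF vectors and keeps the pairs above the threshold.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(title: str) -> List[str]:
    """Assumed tokenizer: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", title.lower())


def tfidf_vectors(titles: Sequence[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    # Document frequency: number of titles that contain each term.
    df = Counter(term for doc in docs for term in doc)
    # Assumed smoothed IDF variant; rarer terms get larger weights.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def sparse_cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(titles: Sequence[str], threshold: float = 0.35) -> List[Tuple[int, int]]:
    """Return index pairs of titles whose TF-IDF similarity exceeds the threshold."""
    vecs = tfidf_vectors(titles)
    return [
        (i, j)
        for i in range(len(vecs))
        for j in range(i + 1, len(vecs))
        if sparse_cosine(vecs[i], vecs[j]) > threshold
    ]
```

Because this method only sees shared words, two headlines that describe the same event in different vocabulary will score low, which is the accuracy loss mentioned above.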
How synthesis works
For the top multi-source stories, we run a 4-agent adversarial debate:
- Emphasizes humanitarian impact, institutional accountability, and systemic factors.
- Emphasizes national security, fiscal responsibility, and traditional frameworks.
- Emphasizes individual liberty, government overreach, and market dynamics.
- Challenges all three: finds groupthink, missing angles, and logical gaps in every perspective.
A synthesis agent then produces the final article: consensus facts first, disputed claims clearly marked, blindspots noted, every factual assertion cited to its source.
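The flow of the debate can be sketched as a small orchestration function. Everything here is an assumption for illustration: the `Agent` type, the `run_debate` name, the prompt layout, and the agent labels are hypothetical, and in practice each agent would wrap a language-model call rather than a plain function. The sketch shows the ordering only: three perspectives first, then a challenger who sees all three, then a synthesizer who sees everything.

```python
from typing import Callable, Dict

# Hypothetical agent interface: takes a prompt string, returns the agent's text.
Agent = Callable[[str], str]


def run_debate(
    story: str,
    perspectives: Dict[str, Agent],
    challenger: Agent,
    synthesizer: Agent,
) -> str:
    """Run one debate round and return the synthesizer's final text."""
    # Step 1: each perspective agent analyzes the story independently.
    positions = {name: agent(story) for name, agent in perspectives.items()}

    # Step 2: the challenger sees every position and critiques them together.
    transcript = "\n".join(f"[{name}] {text}" for name, text in positions.items())
    critique = challenger(f"{story}\n{transcript}")

    # Step 3: the synthesizer sees the story, all positions, and the critique.
    return synthesizer(f"{story}\n{transcript}\n[challenger] {critique}")
```

Passing agents in as plain callables keeps the orchestration testable with stub functions, independent of whichever model backs each role.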
How confidence scores work
Limitations and known biases
We believe transparency about limitations is more honest than pretending they don't exist.
- RSS-only ingestion means we miss stories that only appear on social media or behind paywalls.
- Google News topic feeds return disproportionately center/left-center articles, inflating those categories.
- Sources not in our MBFC database default to "center", so unknown outlets get the benefit of the doubt.
- LLM synthesis can introduce subtle framing biases even with adversarial agents. The citations let you check.
- We synthesize 2-3 stories per run, not all 200+ clusters. Most stories pass through unsynthesized.
- English-language sources only. Global events are filtered through anglophone media.
- The pipeline runs on AI models (xAI Grok, OpenAI embeddings) that have their own training biases.