We Were Always Hallucinating
OpenAI now officially admits that AI hallucinations are mathematically inevitable — not a bug to fix, not an engineering failure. Stanford's 2026 AI Index tracked 26 leading LLMs and found hallucination rates ranging from 22% to 94%. But the real reveal is this: the same theorem that made it inevitable was published in 1931, before computers existed. Kurt Gödel proved that any system powerful enough to be useful will produce outputs it cannot verify. The math has always known.
In this episode, LastAir is joined by Brute, Forge, Hex, Axiom, and Null to discuss: We Were Always Hallucinating.
What We Cover
- Show Open (00:20)
- The Flower Problem (02:31)
- The Hallucination Theorem (05:31)
- The Consistency Problem (11:17)
- The Landing (16:16)
- The Closing (17:41)
- The Unraveling (19:59)
Key Numbers
- 22%–94%: Range of hallucination rates across 26 frontier LLMs under sycophancy-inducing prompts (Stanford AI Index 2026, AA-Omniscience benchmark). Best: Grok 4.20 Beta 0305 (22%). Worst: gpt-oss-20B (94%).
- 58%–88%: Hallucination rates of general-purpose LLMs on legal citation tasks. GPT-4: 58%, Llama 2: 88%. (n > 800,000 questions on verified federal court cases)
- 17%–43%: Hallucination rates on verified legal questions for RAG-based legal tools, with GPT-4 as the general-purpose baseline. Lexis+ AI: 17%, Westlaw AI: 33%, GPT-4: 43%.
- 1.0%–75.3%: Abstention rates on SimpleQA across frontier models. GPT-4o: 1%, o1-preview: 9.2%, o1-mini: 28.5%, Claude-3-Haiku: 75.3%. Models trained to abstain more do so without necessarily improving accuracy — abstention is a trained behavior, not a capability signal.
- $145,000: Total AI hallucination legal sanctions in Q1 2026 across U.S. courts — highest quarterly total on record.
- ≥ 2×: The formal lower bound from Kalai et al. (2025) — generative error rate is at least twice the classification error rate on the same domain. This is a mathematical floor, not an empirical estimate.
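To make the last figure concrete, here is a minimal sketch that reads the "≥ 2×" bound literally: given a classification error rate, it returns the implied floor on the generative error rate. The function name `generative_error_floor`, the `factor` parameter, and the clamp to 100% are illustrative assumptions, and the sample rates are made up for the arithmetic. The exact statement in Kalai et al. (2025) may carry additional terms that this simplification leaves out.

```python
def generative_error_floor(classification_error: float, factor: float = 2.0) -> float:
    """Floor on the generative error rate implied by a classification error rate.

    Simplified reading of the ">= 2x" figure in the list above. The name,
    the `factor` parameter, and the clamp to 1.0 are assumptions made for
    illustration, not the paper's exact formulation.
    """
    if not 0.0 <= classification_error <= 1.0:
        raise ValueError("classification_error must be in [0, 1]")
    # A rate cannot exceed 100%, so clamp the doubled value.
    return min(1.0, factor * classification_error)


if __name__ == "__main__":
    # Hypothetical classification error rates, chosen only to show the arithmetic.
    for cls_err in (0.05, 0.10, 0.25):
        floor = generative_error_floor(cls_err)
        print(f"classification error {cls_err:.0%} -> generative error at least {floor:.0%}")
```

Under this reading, a model that misclassifies 10% of statements as valid or invalid cannot get its generative error rate below 20% on the same domain, however it is tuned.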
Sources & Transcript
Full source list, transcript, and chapters at sharedhallucination.com
All voices in Shared Hallucination are AI-generated using ElevenLabs voice synthesis. Produced through a 14-stage editorial pipeline with human creative direction, research, and fact-checking.