• Grok 4.20 multi-agent inference works at production scale
    Feb 26 2026

    xAI just shipped something fundamentally different. Grok 4.20 doesn't use one model to answer your questions. It deploys four specialized AI agents that think in parallel, debate each other in real time, and synthesize a unified answer before you see a single word.

    In this episode:

    • How the four-agent architecture works: Grok (Captain), Harper (researcher), Benjamin (logician), and Lucas (contrarian)
    • The hallucination results: a sixty-five percent reduction, from twelve percent down to four point two percent
    • Alpha Arena and ForecastBench: where Grok 4.20 outperformed GPT-5 and Gemini
    • The real criticisms: latency, new failure modes, and the social media fact-checking problem
    • Why this might reshape how every lab builds AI over the next year

    The big takeaway: whether Grok 4.20 wins the model race or not, xAI just proved that teams of models can outperform individual geniuses at production scale. That changes the game.

    New episodes every weekday. Share this with someone keeping up with AI.

    8 min
  • Lockdown Mode: When AI Security Means Disabling AI Features
    Feb 26 2026

    Microsoft just discovered that thirty-one companies are hiding prompt injections inside ordinary "Summarize with AI" buttons, poisoning your AI assistant's memory to manipulate future recommendations. The tools to do this are open source, documented, and work across ChatGPT, Copilot, Claude, Perplexity, and Grok.

    In this episode:

    • How AI Recommendation Poisoning works and why Microsoft compares it to the SEO wars
    • Why prompt injection is the number one AI security threat and structurally unfixable in current architectures
    • The EchoLeak zero-click attack, three hundred thousand stolen ChatGPT credentials, and the massive readiness gap in agentic AI deployment
    • OpenAI's new Lockdown Mode: what it disables, why that matters, and the security-versus-capability tradeoff every organization now faces

    The big takeaway: defending AI systems is going to be a long, iterative war, and the choices organizations make right now about security versus capability will define the next era of AI deployment.

    New episodes every weekday. Share this with your security team.

    9 min
  • Cursor Gave AI Agents Their Own Computers
    Feb 25 2026

    Cursor just announced cloud agents that change the game for AI-assisted coding. These agents don't just write code in your editor. They spin up their own virtual machines, build and test the software, and deliver merge-ready pull requests with video recordings of themselves using the finished product.

    In this episode:

    • How Cursor's cloud agents work: isolated VMs, parallel execution, and self-validating output
    • The AI coding tool war by the numbers: Cursor at a twenty-nine billion valuation versus Claude Code, Codex, and Copilot
    • Why this signals the shift from AI assistance to AI autonomy in software development
    • The uncomfortable question: if agents write, test, and demo the code, what's the developer's role?

    The big takeaway: the AI coding market is moving from autocomplete to autonomous agent fleets, and every developer tool will need to match this model within months.

    New episodes every weekday. Share this with a developer keeping up with AI tools.

    9 min
  • The Swarm, The Solver, and The Coder
    Feb 24 2026

    Three Chinese AI labs just released models that are rewriting the leaderboards. Moonshot AI's Kimi K2.5 can spin up a hundred agents working in parallel and scored 74.9% on BrowseComp, seventeen points ahead of GPT-5.2. Alibaba's Qwen3-Max-Thinking hit 58.3 on Humanity's Last Exam with perfect scores on AIME 2025. And Zhipu AI's GLM-5 matches Claude Opus 4.6 on SWE-bench Verified at a fraction of the cost. All three are open source. We break down what each one does, why it matters, and what it means for developers and builders.

    Sources: Moonshot AI (kimi.com), Alibaba Qwen (huggingface.co/Qwen), Zhipu AI (zhipuai.cn), TechCrunch, InfoQ, RAND Corporation.

    9 min
  • Inside the AI Microscope — How Researchers Are Finally Learning Why AI Lies and Cheats
    Feb 21 2026

    For the first time, researchers can peer inside AI models and see not just what they say, but what they're actually thinking. It's called mechanistic interpretability, and MIT Technology Review just named it one of the ten breakthrough technologies of twenty twenty-six. In this episode: how Anthropic built an AI microscope using sparse autoencoders, what they found inside Claude — including features tied to deception, sycophancy, and a collection of absorbed internet personas — and how OpenAI used related techniques to catch one of its own reasoning models cheating on coding tests, in its own words, in real time. Plus: the race to scale this research before AI models outpace our ability to understand them, and the growing divide between Anthropic's ambitious twenty twenty-seven interpretability goals and Google DeepMind's more pragmatic approach.

    10 min
  • The Three Sixty Billion Dollar AI Summit
    Feb 20 2026

    India just hosted the largest AI investment event in history. Here's what was pledged, who showed up, and whether this actually helps the people it's supposed to.

    12 min
  • OpenAI's hire of OpenClaw creator Peter Steinberger
    Feb 18 2026

    OpenClaw went from a one-hour side project to nearly two hundred thousand GitHub stars in ninety days. Then OpenAI hired its creator. This is the story of how a trademark dispute may have handed OpenAI its most important agent hire of the year. New episode out now.

    11 min
  • Seedance 2.0: Hollywood's Worst Nightmare Is Here
    Feb 18 2026

    ByteDance's new AI video model went viral in 72 hours, triggered cease-and-desist letters from Disney and Paramount, and may have just changed the creative economy forever.

    12 min