Inside the AI Microscope — How Researchers Are Finally Learning Why AI Lies and Cheats
About this title
For the first time, researchers can peer inside AI models and see not just what they say, but what they're actually thinking. It's called mechanistic interpretability, and MIT Technology Review just named it one of the ten breakthrough technologies of 2026. In this episode: how Anthropic built an AI microscope using sparse autoencoders, what they found inside Claude — including features tied to deception, sycophancy, and a collection of absorbed internet personas — and how OpenAI used related techniques to catch one of its own reasoning models cheating on coding tests, in its own words, in real time. Plus: the race to scale this research before AI models outpace our ability to understand them, and the growing divide between Anthropic's ambitious 2027 interpretability goals and Google DeepMind's more pragmatic approach.
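The "AI microscope" mentioned above is built on sparse autoencoders, which decompose a model's dense internal activations into a larger dictionary of sparsely firing, more interpretable features. The following is a minimal sketch of that forward pass only, with toy random weights and illustrative sizes; a real sparse autoencoder (as in Anthropic's published work) is trained to minimize reconstruction error plus a sparsity penalty, and nothing here reflects their actual code or dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_dict = 8, 32          # toy sizes: activation width, overcomplete dictionary
x = rng.normal(size=d_model)     # stand-in for one model activation vector

# Random weights purely for illustration; a trained SAE learns these.
W_enc = rng.normal(scale=0.5, size=(d_dict, d_model))
b_enc = -0.5 * np.ones(d_dict)   # negative bias pushes most features to zero
W_dec = rng.normal(scale=0.5, size=(d_model, d_dict))

# Encode: ReLU keeps only a sparse subset of features active.
f = np.maximum(0.0, W_enc @ x + b_enc)
# Decode: reconstruct the activation from the active features.
x_hat = W_dec @ f

# Training would minimize reconstruction error plus an L1 sparsity term.
loss = np.sum((x - x_hat) ** 2) + 0.1 * np.sum(np.abs(f))
print(f"active features: {int((f > 0).sum())}/{d_dict}")
```

Each active entry of `f` is a candidate interpretable feature; researchers then inspect which inputs activate it, which is how features tied to things like deception or sycophancy can be identified.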
