This researcher has a new way to measure AI performance. It's BS, literally.
March 25, 2026 · Original source
BullshitBench, created by Peter Gostev, evaluates AI models' ability to detect nonsense. One AI company did way better than everyone else.
Peter Gostev, AI capability lead at Arena.
<ul><li>Peter Gostev's BullshitBench asks AI models nonsensical questions to test whether they can detect BS.</li><li>Google Gemini 3.0 struggles with B… [+5440 chars]