This researcher has a new way to measure AI performance. It's BS, literally.

March 25, 2026 · Original source

BullshitBench, created by Peter Gostev, evaluates AI models' ability to detect nonsense. One AI company did way better than everyone else.

Peter Gostev, AI capability lead at Arena

- Peter Gostev's BullshitBench tests AI models with nonsensical questions to measure how well they detect BS.
- Google Gemini 3.0 struggles with B…