The 60-Point Gap: Why We're Measuring the Wrong Customer
A Nature study shows that LLMs achieve 94.9% accuracy on benchmarks, but only 34.5% when laypeople use them on physician-created scenarios, a gap of roughly 60 percentage points. That gap reveals something deeper: we measure the model in isolation, but we deploy it to a human in distress. The system fails at the intersection.