Gate News, April 27 — Logan Kilpatrick, senior product manager at Google DeepMind and product lead for Google AI Studio, stated on X that every company building AI-based products should establish its own custom benchmarks to measure AI model performance. He described this as a way to make model improvements "disproportionately benefit your company" and urged founders and business leaders to "start tomorrow."
Most companies currently rely on public leaderboards to select AI models, but these measure general capabilities that often misalign with specific business scenarios. Kilpatrick cited the example of a contract review company most concerned with clause extraction accuracy—a capability absent from public benchmarks, making it impossible to assess model performance on that task. Custom benchmarks offer two key advantages: first, they enable companies to evaluate each model update against their own business tasks and select the model that performs best in their actual use case rather than the highest-ranked model overall; second, they allow companies to share these test sets with model providers, driving continuous optimization in areas that matter to their business.
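The idea can be sketched concretely: a custom benchmark is just a company-specific test set plus a scoring function, run against each candidate model. The sketch below is a minimal, hypothetical illustration of the clause-extraction example; the test cases, scoring rule (simple recall over clause labels), and the stub "models" are all invented for illustration and stand in for real model API calls.

```python
# Hypothetical custom-benchmark harness for a contract-review task.
# All names, data, and the stub models are illustrative assumptions.

def clause_extraction_score(predicted, expected):
    """Fraction of expected clauses the model recovered (simple recall)."""
    if not expected:
        return 1.0
    return len(set(predicted) & set(expected)) / len(expected)

# Custom test set: contract snippets paired with the clause labels a
# human reviewer expects to be extracted.
BENCHMARK = [
    {"text": "Either party may terminate with 30 days notice. "
             "Fees are non-refundable.",
     "expected": {"termination", "non-refund"}},
    {"text": "This agreement is governed by the laws of Delaware.",
     "expected": {"governing-law"}},
]

def run_benchmark(model_fn):
    """Average score of one model across the whole custom test set."""
    scores = [clause_extraction_score(model_fn(case["text"]), case["expected"])
              for case in BENCHMARK]
    return sum(scores) / len(scores)

# Stub "models" simulating two candidate model versions; in practice each
# would be a call to a real model provider's API.
def model_a(text):
    return {"termination", "governing-law"}

def model_b(text):
    return {"termination", "non-refund", "governing-law"}

results = {name: run_benchmark(fn)
           for name, fn in [("model_a", model_a), ("model_b", model_b)]}
best = max(results, key=results.get)  # pick the model that wins on YOUR task
```

Re-running this harness on every model release answers the question the public leaderboards cannot: which model is best at this company's actual task.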
Kilpatrick noted that companies like Zapier and Sierra are already implementing this approach, stating that “there is a lot of alpha that can be created here.”
Related News
MediaTek wins a major Google order for the 8th-generation TPU! The "fermenting" ASIC trend lifts three beneficiary concept stocks
JPMorgan: Tokenization will transform the fund industry, but "good use cases" are years away
An AI agent can already autonomously reproduce complex academic papers: Mollick says the errors stem more from the original human-written texts than from the AI