March 13, 2025 9:00 AM
Credit: VentureBeat made with Midjourney
Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
Patronus AI announced today the launch of what it calls the industry’s first multimodal large language model-as-a-judge (MLLM-as-a-Judge), a tool designed to evaluate AI systems that interpret images and produce text.
The new evaluation technology aims to help developers detect and mitigate hallucinations and reliability issues in multimodal AI applications. E-commerce giant Etsy has already implemented the technology to verify caption accuracy for product images across its marketplace of handmade and vintage goods.
“Super excited to announce that Etsy is one of our ship customers,” said Anand Kannappan, cofounder of Patronus AI, in an exclusive interview with VentureBeat. “They have hundreds of millions of items in their online marketplace for handmade and vintage products that people are creating around the world. One of the things that their AI team wanted to be able to leverage generative AI for was the ability to auto-generate image captions and to make sure that as they scale across their entire global user base, that the captions that are generated are ultimately correct.”
Why Google’s Gemini powers the new AI judge rather than OpenAI
Patronus built its first MLLM-as-a-Judge, called Judge-Image, on Google’s Gemini model after extensive research comparing it with alternatives like OpenAI’s GPT-4V.
“We tended to see that there was a slighter p...