New Hallucination Index Identifies Best LLMs for Most Popular AI Use Cases
SAN FRANCISCO, Nov. 15, 2023 (GLOBE NEWSWIRE) -- Galileo, a leading machine learning (ML) company for unstructured data, today released a Hallucination Index developed by its research arm, Galileo Labs, to help users of today’s leading LLMs determine which model is least likely to hallucinate for their intended application. The findings can be viewed here: https://www.rungalileo.io/hallucinationindex
“2023 has been the year of LLMs. While everyone from individual developers to Fortune 50 enterprises has been learning how to wrangle this novel new technology, two things are clear: first, LLMs are not one size fits all and second, hallucinations remain one of the greatest hurdles to LLM adoption,” said Atindriyo Sanyal, Galileo’s co-founder and CTO. “To help builders identify which LLMs to use for their applications, Galileo Labs created a ranking of the most popular LLMs based on their propensity to hallucinate using our proprietary hallucination evaluation metrics, Correctness and Context Adherence. We hope this effort sheds light on LLMs and helps teams pick the perfect LLM for their use case.”
While businesses of all sizes are building LLM-based applications, these efforts are being hindered by hallucinations that pose significant challenges in generating accurate and reliable responses. With hallucinations, AI generates information that appears realistic at first glance yet is ultimately incorrect or disconnected from the context.
To help teams get a handle on hallucinations and identify the LLM that best suits their needs, Galileo Labs developed a Hallucination Index that takes 11 LLMs from OpenAI (GPT-4-0613, GPT-3.5-turbo-1106, GPT-3.5-turbo-0613 and GPT-3.5-turbo-instruct), Meta (Llama-2-70b, Llama-2-13b and Llama-2-7b), TII UAE (Falcon-40b-instruct), Mosaic ML (MPT-7b-instruct), Mistral.ai (Mistral-7b-instruct) and Hugging Face (Zephyr-7b-beta) and evaluates each LLM’s likelihood to hallucinate in common generative AI task types.
Key insights include:
Supporting these analyses are Galileo’s proprietary evaluation metrics, Correctness and Context Adherence. These metrics are powered by ChainPoll, a hallucination detection methodology developed by Galileo Labs. During the creation of the index, Galileo’s evaluation metrics were shown to detect hallucinations with 87% accuracy, giving teams a reliable way to automatically detect hallucination risk and saving the time and cost typically spent on manual evaluation.
By helping teams catch errors of stale knowledge, wrong knowledge, logical fallacies and mathematical errors, Galileo hopes to help organizations find the perfect LLM for their use case, move from sandbox to production and more quickly deploy reliable and trustworthy AI.
Legal Disclaimer: The findings and rankings presented in Galileo’s Hallucination Index are based on Galileo’s proprietary evaluation metrics, namely “Correctness” and “Context Adherence.” These metrics have been developed by Galileo to assess the performance of various Large Language Models (LLMs). It’s important to note that these rankings could differ when evaluated against other metrics or methodologies.
This study is not endorsed by, directly affiliated with, maintained, authorized or sponsored by any of the LLM providers mentioned in this index, including but not limited to OpenAI, Meta, Mosaic ML, Hugging Face or their subsidiaries or affiliates. All product and company names are the registered trademarks of their original owners. The use of any trade name or trademark is for identification and reference purposes only and does not imply any association with the trademark holder or their product brand.
Media and Analyst Contact: