Getting to a Great Answer – Correcting AI hallucinations before they reach the user

February 28, 2024

There is growing acknowledgement, particularly in large enterprises, that LLMs alone won’t solve business problems. Large, generic LLMs on their own often fail to produce great answers for a business user: answers that are precise, reliable, and domain-specific.

Innovative techniques are clearly needed to ensure the reliability and accuracy of LLM outputs – credibility being the underlying driver of widespread adoption in the enterprise.

This need becomes even more pronounced when dealing with structured data, where the data in systems of record lacks ambiguity, yet LLMs still have a high likelihood of producing erroneous results.

What are the interventions and quality controls that are needed? Enter hila Enterprise, our platform of tools and techniques designed to improve the accuracy of LLM responses before they ever reach the business user. Integrated into our Conversational Finance application, a GenAI application purpose-built for finance users, hila ensures that users querying directly from systems of record (“What were operating expenses for the last year, broken down by region? Provide the data as a chart.”) get back responses sourced directly from the system, free of hallucinations.

hila Enterprise uses fine-tuned models, post-processing of hallucinations, and enhancements to the retrieval process. These methods are holistic — they work on our own models or on public models — and they’re comprehensive. hila also highlights potential hallucinations, removes them directly from the original answers, or improves the answer based on the appropriate sources.

More specifically, hila Enterprise utilizes:

1.  Entailment-based detection — A proprietary Vianai technique: we determine whether each sentence in the LLM-generated answer is entailed by the most similar sentences in the context, giving hila Enterprise a measure of hallucination. The platform applies a thorough process that extends beyond pure entailment, leveraging additional techniques that capture sentence variability and nuances in text. We can then provide a hallucination score that accounts for differences in sentence structure between an LLM-generated answer and the sources the LLM used to generate that answer. (A minimal sketch follows this list.)

2.  Iterative refinement technique — An LLM determines whether or not an LLM-generated answer contains a hallucination with regard to a provided context. This is done multiple times for each piece of context, and the number of “yes” answers is divided by the total number of times the LLM was prompted, yielding a hallucination score for that context. The hila Enterprise platform averages the hallucination scores across all pieces of context and applies additional steps that refine the final answer based on detected hallucinations. This technique not only produces a metric for hallucination (the hallucination rate), but also provides an enhanced answer that overcomes hallucinations. (See the second sketch below.)

3.  Multistep verification technique — Each LLM-generated answer is sent to an LLM to generate follow-up questions. The hila Enterprise platform uses these follow-up questions to retrieve additional context from the retrieval system, then uses that context to answer the follow-ups. The follow-up responses are then used, via an LLM, to modify the original answer, yielding a more refined result. (See the third sketch below.)

4.  MARAG — A proprietary Vianai technique inspired by the way humans retrieve information when answering questions during long research processes. It is a complex retrieval process that leverages language models to enhance retrieval, enabling us not only to find the best possible sources for answering a question, but also to determine which sources not to consider when prompting an LLM. Although the process is computationally expensive and slow, it produces far greater factual correctness in the final answer, which is generated by an LLM using the best possible context retrieved from the hila Enterprise retrieval system. The technique is therefore most useful for report generation, where longer wait times are expected. (See the fourth sketch below.)

5.  Fine-tuned models — We have brought in key methodologies to improve models, fine-tuning by task and by function to improve various aspects of factuality and retrieval. These models now serve in txt2sql, translation, and anti-hallucination. Our fine-tuned txt2sql model, for example, took execution accuracy on an SAP HANA database from 0% (with mostly empty responses) to 96.42%; our additional services took it to 100%. (See the final sketch below.)
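
To make these techniques more concrete, here are minimal sketches of each. First, entailment-based detection: score each answer sentence against its most similar context sentences with an off-the-shelf NLI model. This illustrates the general idea only, not Vianai’s proprietary implementation; the model choice, the naive sentence splitter, and the lexical-similarity shortlist are all assumptions.

```python
# Illustrative sketch of entailment-based hallucination scoring. Not Vianai's
# proprietary method: the NLI model, sentence splitter, and similarity
# heuristic are all stand-ins.
import re

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def sentences(text: str) -> list[str]:
    # Naive splitter; a production system would use something sturdier.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def entail_prob(premise: str, hypothesis: str) -> float:
    # P(premise entails hypothesis) under the NLI model.
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli(**inputs).logits.softmax(dim=-1)[0]
    return probs[nli.config.label2id["ENTAILMENT"]].item()

def overlap(a: str, b: str) -> float:
    # Cheap lexical similarity, used only to shortlist context sentences.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def hallucination_score(answer: str, context: str, top_k: int = 3) -> float:
    # 1 - mean entailment over answer sentences; higher means more hallucinated.
    ctx = sentences(context)
    per_sentence = []
    for sent in sentences(answer):
        nearest = sorted(ctx, key=lambda c: overlap(sent, c), reverse=True)[:top_k]
        per_sentence.append(max((entail_prob(p, sent) for p in nearest), default=0.0))
    if not per_sentence:
        return 0.0
    return 1.0 - sum(per_sentence) / len(per_sentence)
```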
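
Second, the iterative-refinement score: ask an LLM several times, per context chunk, whether the answer is hallucinated; divide the “yes” votes by the number of asks; then average across chunks. The sketch assumes an OpenAI-style chat API; the model name and prompt wording are placeholders, and the follow-on answer-rewriting step is omitted.

```python
# Sketch of the iterative-refinement hallucination rate. The prompt wording and
# model name are placeholders, not Vianai's actual configuration.
from openai import OpenAI

client = OpenAI()

def votes_hallucinated(answer: str, context: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        temperature=1.0,      # sampling noise is what makes repeated asks informative
        messages=[{
            "role": "user",
            "content": (
                f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
                "Does the answer make any claim not supported by the context? "
                "Reply with exactly 'yes' or 'no'."
            ),
        }],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

def hallucination_rate(answer: str, contexts: list[str], n_trials: int = 5) -> float:
    # Per chunk: (number of 'yes' votes) / n_trials; the final rate averages chunks.
    per_chunk = [
        sum(votes_hallucinated(answer, ctx) for _ in range(n_trials)) / n_trials
        for ctx in contexts
    ]
    return sum(per_chunk) / len(per_chunk)
```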
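
Third, multistep verification. This sketch reuses the `client` from the previous one; `retrieve` is a hypothetical stand-in for the hila Enterprise retrieval system, and the prompts and three-question budget are assumptions.

```python
# Sketch of multistep verification, reusing `client` from the previous sketch.
# `retrieve` is a hypothetical hook into the retrieval system; the prompts and
# the three-question budget are assumptions.
def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def verify_and_refine(question: str, answer: str, retrieve) -> str:
    # 1. Ask an LLM for follow-up questions that would check the answer's claims.
    followups = llm(
        f"Question: {question}\nAnswer: {answer}\n\n"
        "List three short follow-up questions that would verify the answer's "
        "claims, one per line."
    ).splitlines()
    # 2. Retrieve fresh context for each follow-up and answer it from that context.
    findings = []
    for fq in (q.strip() for q in followups if q.strip()):
        ctx = retrieve(fq)
        brief = llm(f"Using only this context, answer briefly.\n\nContext:\n{ctx}\n\nQ: {fq}")
        findings.append(f"{fq} -> {brief}")
    # 3. Revise the original answer so it is consistent with what was verified.
    return llm(
        "Revise the answer below so it is consistent with the verified findings.\n\n"
        f"Answer:\n{answer}\n\nFindings:\n" + "\n".join(findings)
    )
```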
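
Fourth, MARAG. Its internals are proprietary and not described here, so this sketch only illustrates the one behavior the description names: using a language model to decide which retrieved sources to keep and which to exclude before prompting. It reuses the `llm` helper above.

```python
# MARAG's internals are proprietary; this only illustrates the described
# behavior of using a language model to keep or exclude sources before
# prompting. Reuses the `llm` helper above.
def select_sources(question: str, candidates: list[str], keep: int = 5) -> list[str]:
    scored = []
    for doc in candidates:
        verdict = llm(
            f"Question: {question}\n\nSource:\n{doc[:2000]}\n\n"
            "Rate from 0 to 10 how useful this source is for answering the "
            "question. Reply with just the number."
        )
        try:
            scored.append((float(verdict.strip()), doc))
        except ValueError:
            scored.append((0.0, doc))  # unparseable rating: treat as excluded
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the best few; everything else is deliberately left out of the prompt.
    return [doc for score, doc in scored[:keep] if score > 0]
```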
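
Finally, on the txt2sql numbers: execution accuracy is conventionally measured by running the generated SQL alongside a reference query and comparing their results, which is why empty or invalid generations score 0%. Here is a minimal sketch of that metric, with sqlite3 standing in for SAP HANA (an assumption about the evaluation setup):

```python
# Sketch of an execution-accuracy metric for txt2sql evaluation. sqlite3 stands
# in for SAP HANA; only the scoring idea is illustrated.
import sqlite3

def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    try:
        pred = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # invalid or empty SQL counts as a miss
    gold = conn.execute(gold_sql).fetchall()
    # Order-insensitive comparison of result sets (repr-keyed to tolerate NULLs).
    return sorted(pred, key=repr) == sorted(gold, key=repr)

def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    # `pairs` holds (predicted_sql, gold_sql) tuples for each test question.
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)
```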

Taken together, these techniques have brought hallucinations from 66 percent of a sentence down to zero. And rather than leaving empty responses or simply deleting offending sentences, they improve the sentence itself, providing more robust answers drawn from the original sources.

Fundamentally, hila Enterprise is a helpful assistant, not a creative agent. That distinction has pushed us to advance the state of the art, and it propels us to keep folding in cutting-edge anti-hallucination methods that maintain privacy and aid users in performing their work tasks.

Interested in learning more about how you can bring non-hallucinating GenAI into your enterprise? Get in touch here.