How to finally have working GenAI in the enterprise

September 8, 2025

hila adds all the critical enterprise context and guardrails on LLMs to ensure accuracy and trust

The buzz around generative AI in the consumer space is undeniable. Large Language Models have become a fixture in the workflows of millions of knowledge workers, offering instant answers and flexible interfaces.  

Yet a recent MIT study found that 95 percent of enterprise AI projects never make it past the prototype stage. Faced with failing enterprise generative AI, many employees still turn to tools like ChatGPT for work.

hila provides an answer by enhancing LLMs with the critical enterprise context and guardrails they lack, and by continuing to learn from that context as the data within the enterprise changes. While public LLMs thrive in personal and creative contexts, they fail when asked to solve the high-stakes challenges inside enterprises. In our own testing, a cutting-edge model failed to provide the correct answer nearly 90% of the time when applied to business data in the enterprise. Enterprise data is usually distributed across multiple applications and can be dirty, but most importantly it carries deep semantics specific to the enterprise. This isn’t a limitation of the model as a language generator, since the capabilities of these models are incredible. Rather, it’s a limitation of the model’s ability to understand and operate within the unique context of a complex business.

Our Methodology for Evaluating the Models

To ensure representative data for our evaluation, we curated a balanced test dataset composed of questions from two critical sources: an internal evaluation suite that focuses on common financial analytics questions, and real-world queries submitted by our enterprise customers on our system.

We provided the popular model and hila with the same list of questions.

Examples included:

  1. Show Amount and CAGR for Divisions from FY22 to FY24
  2. Show Actual Amount by Expense Item in Calendar Month 2024-07 compared to 2023-07
  3. Show Amount by Indirect Cost Posting Category for Fiscal Year FY24
  4. Compare Actual Amount and Budget Amount by Indirect Cost Posting Category for last 4 Fiscal Months
  5. Compare Monthly Amount for Headquarter Code value B5 between fiscal year FY24 and FY23
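
To make the first question concrete, here is a minimal Python sketch of what "Amount and CAGR for Divisions from FY22 to FY24" has to compute. The `ledger` table, its columns, and every figure in it are hypothetical and are not taken from the test database. The only fixed part is the standard CAGR formula, (end / start)^(1 / years) - 1, with two annual periods between FY22 and FY24.

```python
import sqlite3


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate over `periods` years."""
    if start <= 0 or periods <= 0:
        raise ValueError("start and periods must be positive")
    return (end / start) ** (1 / periods) - 1


# Hypothetical schema and figures, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (division TEXT, fiscal_year TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        ("North", "FY22", 100.0), ("North", "FY23", 110.0), ("North", "FY24", 121.0),
        ("South", "FY22", 200.0), ("South", "FY23", 190.0), ("South", "FY24", 162.0),
    ],
)

# Pivot the start and end years into one row per division.
rows = conn.execute(
    """
    SELECT division,
           SUM(CASE WHEN fiscal_year = 'FY22' THEN amount END) AS fy22,
           SUM(CASE WHEN fiscal_year = 'FY24' THEN amount END) AS fy24
    FROM ledger
    GROUP BY division
    ORDER BY division
    """
).fetchall()

# FY22 -> FY24 spans two annual periods.
report = {division: round(cagr(fy22, fy24, 2), 4) for division, fy22, fy24 in rows}
print(report)
```

Even in this toy version, the answer depends on details the question never spells out: which column holds the amount, how fiscal years are encoded, and how many periods lie between the two endpoints.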

The task: generate the correct SQL and return the correct result. All outputs were cross-verified against the database.
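
One way to picture this kind of cross-verification is the sketch below. It is an assumption about how such a check could be written, not a description of the actual evaluation harness: the `results_match` helper, the `costs` table, and the sample queries are all hypothetical. A generated query passes only if it runs and returns the same rows as a trusted reference query, regardless of row order.

```python
import sqlite3
from collections import Counter


def results_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """True when both queries return the same rows, ignoring row order."""
    expected = conn.execute(reference_sql).fetchall()
    try:
        actual = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # SQL that does not run counts as a wrong answer
    # Counter compares rows as multisets, so duplicates still matter.
    return Counter(actual) == Counter(expected)


# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE costs (category TEXT, fiscal_year TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO costs VALUES (?, ?, ?)",
    [
        ("Facilities", "FY24", 40.0), ("Facilities", "FY24", 10.0),
        ("IT", "FY24", 25.0), ("IT", "FY23", 99.0),
    ],
)

reference = (
    "SELECT category, SUM(amount) FROM costs "
    "WHERE fiscal_year = 'FY24' GROUP BY category"
)
good = reference + " ORDER BY category DESC"  # same rows, different order
bad = "SELECT category, SUM(amount) FROM costs GROUP BY category"  # forgets the year filter
broken = "SELECT category, SUM(amt) FROM costs GROUP BY category"  # no such column

for label, sql in [("good", good), ("bad", bad), ("broken", broken)]:
    print(label, results_match(conn, sql, reference))
```

The comparison here is exact. A real check would likely need a tolerance for floating-point amounts and a decision about whether column order matters.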

The results were decisive. The cutting-edge public model failed in nearly 90% of cases—including all of the examples above even though it was provided details about the database schema—while hila provided the correct answers every single time.

hila's response to our first question. It provides both a table and graph with the correct data.

Why Does hila Work Where Out-of-the-Box Models Fail?

The model we tested against is a breakthrough in foundational modeling, but it was not trained on enterprise data contexts. hila, on the other hand, is built precisely for that challenge. It succeeds because it brings multiple layers of knowledge and agents to the table:

  • Knowledge and semantic models: hila builds a map of what the enterprise data contains and where it lives, automatically configured during onboarding.
  • Domain knowledge: hila comes with fine-tuned models specific to enterprise functions, such as profitability in finance or demand planning in supply chain.
  • Company-specific knowledge: hila allows business users to embed their own terminology, formulas, and hierarchies using natural language, ensuring the system reflects how the business actually works. It continues to learn as this knowledge evolves.
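
As a generic illustration of why company-specific terminology matters, consider the toy glossary below. It is not hila's implementation. The `Term` class, the `GLOSSARY` entries, and the column names (`amount`, `scenario`, `hq_code`) are assumptions made up for this sketch. The idea is simply that a business phrase such as "Actual Amount" has to resolve to a specific column and filter before any SQL can be correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    column: str     # physical column the business term refers to
    condition: str  # extra filter implied by the term, "" if none


# Hypothetical glossary: business vocabulary mapped to schema details.
GLOSSARY = {
    "actual amount": Term("amount", "scenario = 'ACTUAL'"),
    "budget amount": Term("amount", "scenario = 'BUDGET'"),
    "headquarter code": Term("hq_code", ""),
}


def resolve_terms(question: str, glossary: dict[str, Term]) -> dict[str, Term]:
    """Return the glossary entries whose term appears in the question."""
    text = question.lower()
    return {name: term for name, term in glossary.items() if name in text}


hits = resolve_terms(
    "Compare Actual Amount and Budget Amount by Indirect Cost Posting Category "
    "for last 4 Fiscal Months",
    GLOSSARY,
)
for name, term in hits.items():
    print(f"{name!r} -> column {term.column!r}, filter {term.condition!r}")
```

A plain substring lookup like this is far too simple for production use, but it shows the kind of context a model cannot infer from a schema alone: both terms point at the same `amount` column and differ only in the filter they imply.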

This is why hila has succeeded where nearly 95% of GenAI projects fail to make it out of the POC phase (according to MIT). It isn’t “just another chatbot”—it’s enterprise AI agents, engineered for accuracy, persistent memory, and trust.

How We Succeed

The core barrier to scaling is not infrastructure, regulation, or talent—it is learning.

Most GenAI systems don’t retain feedback, adapt to company context, or improve with use. They forget user preferences, misinterpret local terminology, and repeat mistakes.

  • How hila works: hila is built with feedback loops, semantic understanding, and contextual persistence. It remembers, adapts, and gets better over time—turning every interaction into a step toward greater accuracy.

Enterprise buyers care about outcomes, not benchmarks.

Speed, token counts, and benchmark scores mean little if a tool can’t help shorten a monthly close, detect margin leakage, or forecast demand more accurately. Generic LLMs rarely deliver these outcomes without long, fragile customization projects.

  • How hila works: hila comes pre-loaded with domain-specific intelligence and proven business outcomes. For example, finance teams benefit from built-in profitability logic and supply chain teams from supply and demand logic, and every team can add custom KPIs or formulas in natural language. Outcomes are delivered immediately, not years later. hila also includes tools that let it learn continuously, not through more code but through ongoing training feedback from business and domain users.

The Bottom Line

Today’s corporate world often feels like an endless loop of late-night spreadsheet marathons, frantic emails, and last-minute fire drills just to answer basic data questions. hila offers an alternative—a calmer, clearer way. No more chasing down data or juggling pivot tables in a panic.

With hila, anyone in your organization can ask questions in plain language and get instant answers, governed by your permissions. Less stress. Fewer fire drills. More focus.

Ready to stop the scramble?

Book a live demo. Bring 3 real questions from your backlog—we’ll run them in our demo environment, show the SQL, and return the numbers you can trust. Walk out with a simple next-step plan for your team.

Experience a more relaxed, intuitive way to work with your data.

Book your demo and find your analytics zen today.