If you thought training AI models was hard, try building enterprise apps with them

Interview Even though billions of dollars are spent every year training large language models, there is still a significant gap between building a model and integrating it into a useful application.

In theory, retrieval-augmented generation (RAG) and fine-tuning are well-understood ways to expand the knowledge and capabilities of pre-trained AI models, such as Meta's Llama, Google's Gemma, or Microsoft's Phi. In practice, Aleph Alpha CEO Jonas Andrulis told El Reg, things are not always so simple.

As we've previously discussed, fine-tuning is effective at changing a model's style or behaviour, but it's not a good way to teach it new information.

Another concept we've studied in depth is RAG, in which the LLM acts like a librarian, retrieving data from an external archive. The advantage of this approach is that the information in the database can easily be updated and changed without retraining or fine-tuning the model, and the results can be cited after the fact and checked for accuracy, Andrulis said.
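The retrieve-then-cite pattern described above can be sketched in a few lines. This is a toy illustration, not Aleph Alpha's implementation: `embed` here is a bag-of-words stand-in for the dense vector embeddings a real system would use, and the final generation step (handing the prompt to an LLM) is omitted.

```python
import re

def embed(text: str) -> set[str]:
    # Toy "embedding": a set of lowercase words. Real RAG systems use
    # dense vectors from an embedding model and a vector database.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, archive: list[str], k: int = 2) -> list[str]:
    # Rank archive documents by word overlap with the query.
    q = embed(query)
    return sorted(archive, key=lambda doc: len(q & embed(doc)), reverse=True)[:k]

def build_prompt(query: str, archive: list[str]) -> str:
    # Number the retrieved passages so the model's answer can cite them
    # and be checked against its sources after the fact.
    hits = retrieve(query, archive)
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(hits, 1))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

archive = [
    "Invoices over 10,000 euros must be approved by two managers.",
    "Travel expenses are reimbursed within 30 days.",
    "The cafeteria opens at 8am on weekdays.",
]
print(build_prompt("Who approves large invoices?", archive))
```

Because the archive lives outside the model, updating company policy means editing a database row rather than retraining anything, which is the advantage Andrulis describes.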

RAG has many benefits, but it depends on key processes, procedures, and other institutional know-how being documented in a way the model can make sense of. Andrulis says that often isn't the case.

Even where such documentation exists, it won't do enterprises much good if it depends on out-of-distribution data, that is, data that looks nothing like what the model was trained on. If a model was trained only on English datasets, it will struggle to understand documentation written in German, especially if the document contains scientific formulas. In many cases it won't be able to interpret the document at all.

Andrulis says that achieving a meaningful outcome often requires combining RAG with fine-tuning.

Bridging The Gap

Aleph Alpha aims to carve out a niche as a European DeepMind by tackling the kinds of problems that prevent enterprises and nations from building sovereign AI.

Sovereign AI refers to models trained or fine-tuned on a nation's own datasets, on hardware built or deployed within its borders. "We try to add innovation where we feel it's necessary, but also to leverage open source and state of the art where it's possible," Andrulis said.

There’s no need to build another Llama, or DeepSeek, because they already exist

While this can sometimes mean training models, such as Aleph's Pharia-1-LLM, Andrulis emphasizes that the company isn't trying to create the next Llama or DeepSeek. "We don't have to build another Llama or DeepSeek because they're already out there," he said.

Instead, Aleph's focus is on building frameworks that make these technologies easier and more efficient to adopt. The Heidelberg-based AI startup has developed a tokenizer-free training architecture, "T-Free", that aims to fine-tune models more efficiently. According to Aleph, traditional tokenizer-based methods often require large amounts of out-of-distribution data to fine-tune models, which is not only computationally expensive but also assumes enough such data exists in the first place.

According to the startup, its T-Free architecture avoids this problem entirely by ditching the tokenizer. In early testing, fine-tuning its previously announced Pharia LLM for Finnish, Aleph claims a 70 percent reduction in training cost and carbon footprint compared with tokenizer-based approaches.
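A toy illustration of the underlying problem (this is not Aleph's T-Free implementation, whose actual design is different): a tokenizer vocabulary built from one language shatters unseen words from another language into many tiny pieces, while a character-level view such as trigrams is defined for any string, so no word is ever out of vocabulary.

```python
def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    # Greedy longest-match tokenization against a fixed vocabulary,
    # falling back to single characters for unknown spans.
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

def trigrams(word: str) -> list[str]:
    # Character trigrams of the padded word: defined for any input,
    # so unseen languages don't explode the sequence length.
    padded = f"_{word}_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

english_vocab = {"the", "ing", "tion", "er", "re", "en"}  # hypothetical toy vocab
word = "dampfschifffahrt"  # German: steamship travel; unseen by the vocab

print(greedy_tokenize(word, english_vocab))  # fragments into single characters
print(trigrams(word))                        # covers the word uniformly
```

The fragmentation shown by the first call is what makes fine-tuning on out-of-distribution languages expensive with a fixed tokenizer: sequences get longer and the embedding table has no good entries for the new text.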

Aleph has also developed tools to overcome gaps in documented information, which could otherwise lead the AI to draw inaccurate or unhelpful conclusions.

For example, if there are two contracts relevant to a question of compliance and they contradict each other, “the system can basically approach the human saying, I found a discrepancy … can you please give me feedback on whether that is an actual conflict,” Andrulis explained.

The data gathered by this framework, which Aleph calls Pharia Catch, can be fed into the application's database or used to fine-tune better models. Andrulis says tools like this have helped the company win partners such as PwC, Deloitte, Capgemini, and Supra, who work with customers to implement its technology.

What about hardware?
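The escalation pattern Andrulis describes can be sketched as follows. This is a hypothetical toy, not Pharia Catch's actual design: the conflict check here is a crude negation comparison, where a real system would use an LLM judge, and `ask_human` stands in for whatever review UI collects the feedback.

```python
def find_discrepancies(sources: dict[str, str]) -> list[tuple[str, str]]:
    # Toy conflict check: two sources conflict if they are identical
    # except for an inserted "not ". A real system would use a model.
    pairs = []
    names = list(sources)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if (sources[a].replace("not ", "") == sources[b].replace("not ", "")
                    and sources[a] != sources[b]):
                pairs.append((a, b))
    return pairs

def resolve(sources: dict[str, str], ask_human) -> dict:
    # Escalate each discrepancy to a person; the collected feedback can
    # then update the database or become fine-tuning data.
    feedback = {}
    for a, b in find_discrepancies(sources):
        question = (f"I found a discrepancy between {a} and {b}. "
                    "Can you give me feedback on whether that is an actual conflict?")
        feedback[(a, b)] = ask_human(question)
    return feedback

contracts = {
    "contract_a": "The supplier is not liable for delays.",
    "contract_b": "The supplier is liable for delays.",
}
print(resolve(contracts, ask_human=lambda q: "yes"))
```

The key design point is that the human's answer is captured as structured data rather than discarded, which is what lets it flow back into the application's database or a fine-tuning set.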

Data and software aren't the only challenges facing sovereign AI adopters. Hardware is also a factor.

Some enterprises and countries may require their hardware to be developed domestically, while others may dictate where workloads can be run.

This means Andrulis' team has to be able to work with a wide range of hardware. The least surprising is AMD: Aleph Alpha announced last month that it would partner with the infrastructure vendor around its MI300-series accelerators. Andrulis also highlighted Aleph Alpha's collaborations with Britain's Graphcore, which was acquired by Japanese mega-conglomerate SoftBank last year, and with Cerebras, whose CS-3 accelerators are being used to train AI for the German armed forces.

"We will never become a cloud provider," he said.

It's going to be more challenging

Andrulis predicts that building AI applications will only become more difficult as the industry moves from chatbots to agentic AI systems capable of more complex problem-solving.

Agentic AI has been a hot topic over the past year, with model builders, software developers, and hardware vendors all promising systems that can complete multi-step processes asynchronously. Early examples include OpenAI's Operator and Anthropic's Computer Use API.

"What we did last year was, in most cases, pretty straightforward stuff. Easy things like summarization of documents or a writing assistant," he said. "Now, it's getting a little more exciting with things that, at first glance, don't even look like genAI problems where the UX is not a chat bot." ®
