Image generated by VentureBeat using FLUX-pro 1.1
Enterprise AI can only be as good as the data available to a model.
In years past, enterprises relied heavily on structured data. With the rapid adoption of generative AI, they are increasingly looking to consume large amounts of unstructured information. Unstructured data lacks a fixed format and can take any form, which poses a challenge for enterprises because its quality is often unknown. Data-quality problems include inaccuracy, knowledge gaps and duplication, among other issues that affect the utility of data.
Data-quality tools, long used for structured data, are now being extended to unstructured data for enterprise AI. One such vendor is Anomalo, which over the past few years has developed its data-quality platform primarily for structured data. Today, the company announced that it has expanded its platform to better support unstructured data monitoring. Elliot Shmukler, co-founder and CEO of Anomalo, believes that his company’s technology can have a significant impact on organizations.
In an exclusive interview with VentureBeat, Shmukler said: “We believe we can accelerate at minimum 30% of gen AI deployments by eliminating data-quality issues.”
Shmukler noted that some enterprises abandon AI projects after the proof-of-concept stage. He attributed the problem to poor data quality, data gaps and enterprise data that is not ready for gen AI.
According to Shmukler, the company believes that using Anomalo’s unstructured monitoring can accelerate typical enterprise gen AI projects by up to a year. This is due to Anomalo’s ability to quickly understand, profile and curate the data on which these projects rely.
Alongside its product update, Anomalo also announced a $10 million extension of its Series B funding, which was first announced on January 23. The round now totals $82 million.
Why does data quality matter for enterprise AI?
Unstructured content poses unique challenges to AI applications.
Shmukler said that unstructured data could contain anything. “It could contain personally identifiable information such as emails, names, Social Security numbers, or proprietary secret information.”
Anomalo’s platform addresses these challenges by adding structured metadata to unstructured documents. This allows organizations to better understand and control their data before it is sent to AI models.
Anomalo provides the following key features to improve unstructured data:
Custom issue definition: Allows users to define their own issues to detect in document collections, beyond predefined issues such as personally identifiable information (PII) or abusive content.
Support for private cloud models: Enables enterprises to use large language models (LLMs) deployed in their own cloud provider environments, giving them more control over and comfort with their data.
Metadata tagging: Adds structured metadata, such as information on detected issues, to unstructured documents to allow better curation and filtering for gen AI applications.
A new redaction feature will also allow the software to remove sensitive information from documents.
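As a rough illustration of how issue detection, metadata tagging and redaction can fit together, the sketch below tags a document with the issues found in it. It is not Anomalo’s implementation or API: the regex patterns and the names `ISSUE_PATTERNS`, `TaggedDocument`, `tag_document` and `redact` are hypothetical, and real PII detection is considerably more involved than two regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical issue definitions (name -> pattern). Purely illustrative;
# these are NOT Anomalo's detectors and will miss many real-world cases.
ISSUE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class TaggedDocument:
    """An unstructured document plus structured metadata about it."""
    text: str
    metadata: dict = field(default_factory=dict)


def tag_document(text, patterns=ISSUE_PATTERNS):
    """Attach a sorted list of detected issue names as metadata."""
    issues = sorted(name for name, pat in patterns.items() if pat.search(text))
    return TaggedDocument(text, {"issues": issues, "has_issues": bool(issues)})


def redact(text, patterns=ISSUE_PATTERNS):
    """Replace every match of every pattern with a labeled placeholder."""
    for name, pat in patterns.items():
        text = pat.sub(f"[REDACTED:{name}]", text)
    return text


# A custom issue is just one more entry in the pattern table.
custom = {**ISSUE_PATTERNS, "codename": re.compile(r"Project Falcon")}
doc = tag_document("Project Falcon notes, contact jane@example.com", custom)
print(doc.metadata["issues"])
```

Keeping the issue definitions in a plain mapping means a user-defined issue needs no code changes, only a new entry, which mirrors the idea of custom issue definitions described above.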
Anomalo is not alone in the unstructured data-quality market
Several data-quality vendors, including Monte Carlo Data and Qlik, also offer unstructured data technology. Shmukler sees several ways his company can differentiate itself.
According to him, some vendors approach unstructured data by integrating with and monitoring the vector databases that hold the data powering a retrieval-augmented generation (RAG) workflow. Shmukler explained that this approach requires a pipeline to be already set up to send data into the vector database. He added that it also restricts applications to traditional RAG rather than newer methods such as large-context models, which may not require a vector database.
Shmukler explained that Anomalo is different because it analyzes raw collections of unstructured data before any pipelines are set up to ingest them. This allows for a broader exploration of the available data before committing to a pipeline. It also opens up other possible approaches to using the data beyond traditional RAG methods.
How Anomalo’s monitoring fits in enterprise AI deployments
Anomalo’s platform can accelerate different aspects of enterprise AI implementations.
Shmukler stated that teams can integrate data-quality monitoring into the data-preparation phase, before sending data to a vector database or model. Anomalo adds metadata on top of unstructured data, and enterprises can use that structured metadata to ensure that the data used for training or fine-tuning gen AI models is high quality and free of issues.
Anomalo’s data-quality monitoring is also compatible with the data pipelines that feed RAG. In the RAG use case, unstructured data is ingested into vector databases for retrieval. The metadata can be used to filter and rank data for RAG to ensure the quality of the information used to produce outputs.
Compliance and risk mitigation are two other areas where Shmukler believes data-quality monitoring has a significant impact. Anomalo’s data tagging helps enterprises keep gen AI applications from exposing sensitive information or violating compliance requirements.
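The filter-and-rank step can be pictured with a minimal sketch like the one below, which runs before documents are embedded and written to a vector database. The document shape (a dict with `text` and `metadata`), the `issues` and `quality_score` fields, and the function names are assumptions made for illustration, not part of any Anomalo or vector database API.

```python
# Hypothetical set of issue tags that should keep a document out of RAG.
BLOCKING_ISSUES = {"ssn", "email"}


def filter_for_ingestion(docs, blocking=BLOCKING_ISSUES):
    """Keep only documents whose metadata lists no blocking issue."""
    return [
        d for d in docs
        if not blocking & set(d["metadata"].get("issues", []))
    ]


def rank_by_quality(docs):
    """Order documents from highest to lowest assumed quality score."""
    return sorted(
        docs,
        key=lambda d: d["metadata"].get("quality_score", 0.0),
        reverse=True,
    )


docs = [
    {"text": "a", "metadata": {"issues": [], "quality_score": 0.4}},
    {"text": "b", "metadata": {"issues": ["ssn"], "quality_score": 0.9}},
    {"text": "c", "metadata": {"issues": [], "quality_score": 0.8}},
]

# Document "b" is dropped for containing an SSN; the rest are ranked.
ready = rank_by_quality(filter_for_ingestion(docs))
print([d["text"] for d in ready])
```

Because the filter reads only metadata, the same tags could equally gate a training set or a large-context prompt rather than a vector database.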
Shmukler said that enterprises are concerned about LLMs revealing sensitive information by answering with data they shouldn’t. “Another big part of this is being able to sleep better at night while building your gen AI apps, knowing that it’s less likely that sensitive data or data that you do not want the LLM to know about will actually reach the LLM.”