Meta, the parent company of Facebook, Instagram, WhatsApp, Threads and other services, runs one of the largest recommendation systems in the industry.
Two recently published papers by its researchers show how generative models can be used to better understand and respond to user intent.
Treating recommendation as a generative problem opens up new, more efficient ways to tackle it. The approach is useful for any application that needs to retrieve documents, products or other kinds of objects.
Dense vs generative retrieval
The standard approach to building a recommendation system is to compute, store and retrieve dense representations. To recommend items to users, for example, an application must train a model that can compute embeddings of the users’ requests as well as embeddings of a large number of items.
At inference time, the recommendation system tries to understand the user’s intent by finding the items whose embeddings are most similar to the user’s embedding. This approach requires more storage and computation as the number of items grows, because every item embedding must be stored and every recommendation involves comparing the user embedding against the entire item store.
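The scan over the item store described above can be sketched in a few lines of Python. This is a minimal illustration, not Meta's implementation: the item names, the three-dimensional embeddings and the `dense_retrieve` helper are all hypothetical, and cosine similarity is assumed as the matching function.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dense_retrieve(user_embedding, item_embeddings, k=2):
    """Score every stored item against the user embedding and keep the top k."""
    scored = [
        (cosine(user_embedding, embedding), item_id)
        for item_id, embedding in item_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Hypothetical item store: one embedding must be kept per catalog item.
ITEM_EMBEDDINGS = {
    "running_shoes": [0.9, 0.1, 0.0],
    "trail_shoes": [0.8, 0.2, 0.1],
    "coffee_maker": [0.0, 0.1, 0.9],
}

top = dense_retrieve([1.0, 0.0, 0.0], ITEM_EMBEDDINGS, k=2)
print(top)
```

Both the dictionary and the scoring loop grow with the catalog, which is the cost the paragraph above describes.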
Generative retrieval is a more recent approach. Instead of searching a database, it tries to understand user intent and make recommendations by predicting the next item in the sequence of items a user has interacted with.
Here’s how it works:
The key to making generative retrieval work is to compute “semantic IDs” (SIDs) that capture contextual information about each item. Generative retrieval systems such as TIGER work in two phases. First, an encoder model is trained to create a unique embedding value for each item based on its description and properties. These embedding values become the SIDs and are stored with the item.
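One way to turn an item embedding into a compact ID is residual quantization, where each level of a codebook encodes what the previous level left over. The sketch below assumes that scheme for illustration. The codebooks are hand-written and hypothetical (in a real system they would be learned), and `semantic_id` is not a function from any published codebase.

```python
def nearest(codebook, vector):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)),
    )

def semantic_id(embedding, codebooks):
    """Quantize an embedding level by level; the tuple of chosen indices is the ID."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        index = nearest(codebook, residual)
        codes.append(index)
        # Pass on only what this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tuple(codes)

# Hypothetical two-level codebooks: coarse first, finer second.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1]],
]

sid_a = semantic_id([1.1, 0.0], CODEBOOKS)
sid_b = semantic_id([0.9, 0.0], CODEBOOKS)
sid_c = semantic_id([0.0, 1.1], CODEBOOKS)
print(sid_a, sid_b, sid_c)
```

In this toy setup, the two nearby embeddings share their first code and differ only in the second, so similar items end up with IDs that share a prefix.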
In the second phase, a transformer model is trained to predict the next SID in an input sequence. The list of input SIDs represents the user’s interactions with past items, and the model’s prediction is the SID of the item to recommend. Because generative retrieval reduces the need to store and search across individual item embeddings, its inference and storage costs remain constant as the list of items grows. It also improves the model’s ability to capture deeper semantic relationships within the data, and it brings other benefits of generative models, such as adjusting the temperature to control the diversity of recommendations.
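The temperature control mentioned above is ordinary temperature-scaled sampling over the model's scores for the next SID. The sketch below assumes the transformer has already produced those scores; the SID names and logit values are made up for illustration.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_sid(logits_by_sid, temperature=1.0, rng=random):
    """Sample one SID according to the temperature-scaled probabilities."""
    sids = list(logits_by_sid)
    probs = softmax_with_temperature([logits_by_sid[s] for s in sids], temperature)
    return rng.choices(sids, weights=probs, k=1)[0]

# Hypothetical next-SID scores from a trained sequence model.
NEXT_SID_LOGITS = {"sid_17": 3.0, "sid_42": 2.0, "sid_08": 0.5}

sharp = softmax_with_temperature(list(NEXT_SID_LOGITS.values()), temperature=0.5)
flat = softmax_with_temperature(list(NEXT_SID_LOGITS.values()), temperature=2.0)
print(sharp, flat)
print(sample_next_sid(NEXT_SID_LOGITS, temperature=2.0))
```

A low temperature concentrates probability on the top-scoring SID, while a high temperature spreads it out and yields more varied recommendations.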
Advanced generative retrieval
Despite its lower storage and inference costs, generative retrieval has some limitations. It tends to overfit to the items it has seen during training, so it struggles with items that were added to the catalog after the model was trained. In recommendation systems, this is known as the “cold start problem,” which refers to items and users that are brand new and have no interaction history.
To address these shortcomings, Meta developed a hybrid system called LIGER, which combines the computational efficiency of generative retrieval with the robust embedding and ranking capabilities of dense retrieval.
During training, LIGER uses both a similarity score and a next-token objective to improve the model’s recommendations. During inference, LIGER selects a few candidates with the generative mechanism and supplements them with a few cold-start items. The combined set is then ranked using embeddings, as in dense retrieval.
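The inference flow described above (generate a few candidates, add cold-start items, rank by embedding) can be sketched as follows. This is a simplified reading of the description, not LIGER's actual code: it assumes the model also outputs a predicted embedding to rank against, and every name and number here is hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_recommend(generated_ids, cold_start_ids, item_embeddings,
                     predicted_embedding, k=3):
    """Merge generated and cold-start candidates, then rank the small set densely."""
    # dict.fromkeys removes duplicates while preserving order.
    candidates = list(dict.fromkeys(list(generated_ids) + list(cold_start_ids)))
    ranked = sorted(
        candidates,
        key=lambda item_id: cosine(predicted_embedding, item_embeddings[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; "new_trail_runners" was added after training.
ITEM_EMBEDDINGS = {
    "hiking_boots": [0.9, 0.1],
    "tent": [0.7, 0.3],
    "blender": [0.1, 0.9],
    "new_trail_runners": [0.95, 0.05],
}

recommendations = hybrid_recommend(
    generated_ids=["hiking_boots", "tent", "blender"],
    cold_start_ids=["new_trail_runners"],
    item_embeddings=ITEM_EMBEDDINGS,
    predicted_embedding=[1.0, 0.0],
)
print(recommendations)
```

Only the short candidate list is scored, not the whole catalog, and the new item can still surface because ranking depends on its embedding rather than on whether the generator saw it in training.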
The researchers note that “the fusion of dense and generative retrieval methods holds tremendous potential for advancing recommendation systems,” and as the models evolve “they will become increasingly practical for real-world applications, enabling more personalized and responsive user experiences.”
In a separate paper, the researchers introduce a multimodal generative retrieval method called Multimodal Preference Discerner (Mender), which enables generative models to pick up implicit preferences from users’ interactions with different items. Mender builds on SID-based generative retrieval and adds a few components that enrich recommendations with user preferences.
Mender uses a large language model (LLM) to translate user interactions into specific preferences. For example, if the user has praised or complained about a specific item in a review, the model will summarize it into a preference about that product category.
The main recommender model is trained to be conditioned both on the sequence of user interactions and the user preferences when predicting the next semantic ID in the input sequence. This gives the recommender model the ability to generalize and perform in-context learning and to adapt to user preferences without being explicitly trained on them.
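To make the idea of conditioning on both signals concrete, the sketch below packs textual preferences and a history of SIDs into one input sequence. The marker tokens (`<pref>`, `<history>`, `<next>`), the `RecommenderInput` class and the flattening into a single string are all hypothetical choices made for illustration; the paper's actual encoding of preferences may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RecommenderInput:
    preferences: List[str]                   # LLM-written summaries of user preferences
    interaction_sids: List[Tuple[int, ...]]  # semantic IDs of past items, oldest first

def build_prompt(example: RecommenderInput) -> str:
    """Serialize preferences and interaction history into one conditioning sequence."""
    pref_part = " ".join(f"<pref> {p}" for p in example.preferences)
    sid_part = " ".join(
        "<sid_" + "_".join(str(code) for code in sid) + ">"
        for sid in example.interaction_sids
    )
    return f"{pref_part} <history> {sid_part} <next>"

example = RecommenderInput(
    preferences=["prefers lightweight gear"],
    interaction_sids=[(0, 1), (1, 2)],
)
prompt = build_prompt(example)
print(prompt)
```

Because the preference is plain text in the input rather than something baked into the weights, swapping in a different preference at inference time changes what the model is conditioned on without retraining.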
“Our contributions pave the way for a new class of generative retrieval models that unlock the ability to utilize organic data for steering recommendation via textual user preferences,” the researchers write.
Implications for enterprise applications
The efficiency of generative retrieval systems can have important implications for enterprise applications. It translates into immediate practical benefits, including lower infrastructure costs and faster inference. Because storage and inference costs stay constant regardless of catalog size, the technology is especially valuable for growing businesses.
The benefits extend across industries, from ecommerce to enterprise search. More applications and frameworks are expected to emerge as generative retrieval matures.