Context Tuning for Retrieval Augmented Generation

Paper · arXiv 2312.05708 · Published December 9, 2023
RAG · Context Engineering

“Large language models (LLMs) have the remarkable ability to solve new tasks with just a few examples, but they need access to the right tools. Retrieval Augmented Generation (RAG) addresses this problem by retrieving a list of relevant tools for a given task. However, RAG’s tool retrieval step requires all the required information to be explicitly present in the query. This is a limitation, as semantic search, the widely adopted tool retrieval method, can fail when the query is incomplete or lacks context. To address this limitation, we propose Context Tuning for RAG, which employs a smart context retrieval system to fetch relevant information that improves both tool retrieval and plan generation. Our lightweight context retrieval model uses numerical, categorical, and habitual usage signals to retrieve and rank context items.”
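To make the retrieve-and-rank step concrete, here is a minimal sketch of scoring context items with a weighted blend of semantic similarity and the meta-signals the abstract names (numerical, categorical, habitual usage). The feature names and weights are hypothetical illustrations, not the paper's actual lightweight model:

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    recency_days: float   # numerical signal: days since last access (hypothetical)
    app_match: float      # categorical signal: 1.0 if the item's app matches the query's
    usage_freq: float     # habitual-usage signal: how often the user touches this item

def score_context_item(item: ContextItem, semantic_sim: float) -> float:
    """Hypothetical linear scorer blending semantic similarity with meta-signals.
    Weights are illustrative, not from the paper."""
    return (0.5 * semantic_sim
            + 0.2 * item.usage_freq
            + 0.2 * item.app_match
            + 0.1 * (1.0 / (1.0 + item.recency_days)))  # fresher items score higher

def rank_context(items, sims, k=5):
    """Return the top-k context items under the combined score."""
    scored = sorted(zip(items, sims),
                    key=lambda pair: score_context_item(*pair), reverse=True)
    return [item for item, _ in scored[:k]]
```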

We empirically show that Chain of Thought (CoT) augmentation improves context retrieval when no fine-tuning is applied, whereas fine-tuning the retrieval model removes the need for CoT augmentation.
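Concretely, CoT augmentation here means expanding an ambiguous query with an LLM-generated reasoning step before retrieval. A minimal sketch, where `generate_cot` is a placeholder for a real LLM call and `retriever.search` is a hypothetical interface:

```python
def generate_cot(query: str) -> str:
    """Placeholder for an LLM call (e.g. GPT-4) that spells out what the
    query implicitly needs; a fixed template keeps the sketch self-contained."""
    return (f"Reasoning: to handle '{query}', the assistant likely needs "
            "related contacts, upcoming events, or recent notes.")

def cot_augmented_retrieve(query: str, retriever, k: int = 5):
    """Append the chain-of-thought expansion so the retriever sees the
    query's implicit context made explicit.
    `retriever` is any object exposing a hypothetical search(text, top_k) method."""
    augmented = f"{query}\n{generate_cot(query)}"
    return retriever.search(augmented, top_k=k)
```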

While re-ranking is preferred, employing an off-the-shelf pretrained retriever, particularly a text-based one, is sub-optimal because ambiguous queries carry too little explicit information. Our work demonstrates the inadequacy of text-based retrievers for context retrieval and the necessity of more advanced retrieval models.

While CoT augmentation improves upon baseline methods such as vanilla semantic search, it increases the input length to the LLM, which has a limited context window. Additionally, studies have demonstrated that the placement of relevant information within the prompt affects LLM generation (Liu et al., 2023). It is therefore preferable to avoid lengthening the input sequence if the same or better results can be achieved without query augmentation.

Fine-tuning semantic search obviates the need for query augmentation while achieving comparable performance.
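As a rough sketch of what fine-tuning a semantic retriever on (implicit query, gold context) pairs can look like, here is a bi-encoder trained with in-batch negatives via sentence-transformers. The checkpoint, example pairs, and hyperparameters are assumptions, not the paper's own setup:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical (implicit query, gold context item) training pairs.
train_examples = [
    InputExample(texts=["Set up the usual Monday sync",
                        "calendar: weekly team sync, Mondays 10am"]),
    InputExample(texts=["Text mom I'll be late",
                        "contacts: Mom, mobile +1-555-0100"]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any bi-encoder checkpoint
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
# In-batch negatives: other items in the batch serve as negative context.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```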

Our study employed a data generation methodology using synthetic application data, aimed at simulating real-world scenarios for a digital assistant. The data encompasses 7 commonly used applications: mail, calendar, google, music, reminders, notes, and phone call. We generated this data using GPT-4, ensuring diversity in the dataset to reflect a wide range of user personalities. The synthetic dataset contained a diverse range of context items spanning various applications. A total of 791 distinct personas were synthesized, yielding 4,338 unique implicit queries for training and 936 implicit queries for evaluation.
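A plausible shape for this kind of persona-driven generation, sketched with the OpenAI client; the prompt wording is ours, not the paper's:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt, not the paper's actual template.
PROMPT = """You are generating evaluation data for a digital assistant.
Persona: {persona}
Write one *implicit* user query (omit details the assistant should infer
from context) touching one of: mail, calendar, google, music, reminders,
notes, phone call. Then list the context items needed to resolve it."""

def generate_example(persona: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": PROMPT.format(persona=persona)}],
    )
    return resp.choices[0].message.content
```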

We developed a toolbox containing APIs for each of the applications we considered. This toolbox was created using in-context learning with GPT-4 and contained a total of 59 APIs distributed across the applications.
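The paper does not publish its API schema; a hypothetical toolbox entry might look like this:

```python
# Hypothetical toolbox entries; field names and APIs are illustrative only.
TOOLBOX = [
    {
        "app": "calendar",
        "api": "create_event",
        "description": "Create a calendar event.",
        "parameters": {
            "title": "str",
            "start_time": "ISO-8601 datetime",
            "attendees": "list[str]",
        },
    },
    {
        "app": "phone call",
        "api": "place_call",
        "description": "Call a contact.",
        "parameters": {"contact_name": "str"},
    },
]
```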

BM25: For text-based search, we use an improved version of BM25, called BM25T (Trotman et al., 2014).
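For illustration, plain BM25 retrieval over context items with the `rank_bm25` package (which implements the Okapi variant rather than BM25T; the corpus here is made up):

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = [
    "weekly team sync on monday mornings",
    "mom's birthday reminder in october",
    "grocery list note: milk, eggs, coffee",
]
tokenized = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "remind me about mom".split()
print(bm25.get_scores(query))              # raw BM25 score per document
print(bm25.get_top_n(query, corpus, n=1))  # best-matching context item
```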

3.2 Context Tuning

To compare various context retrieval methods, we employ both text-based and vector-based retrieval baselines. We simulate different context stores by structuring context data per persona and train models to perform federated search. We use query and persona meta-signals, such as frequency, usage history, and correlation with geo-temporal features, to perform retrieval. We evaluate context retrieval using the Recall@K and Normalized Discounted Cumulative Gain (NDCG@K) metrics.
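Both metrics are standard; a self-contained implementation with binary relevance:

```python
import math

def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(retrieved[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2)
                for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0

# Example: 2 relevant items, one retrieved at rank 1, the other missed.
print(recall_at_k(["a", "x", "y"], {"a", "b"}, k=3))  # 0.5
print(ndcg_at_k(["a", "x", "y"], {"a", "b"}, k=3))    # ~0.61
```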

3.4 Planner

The planner’s objective is to select the most appropriate tool from the retrieved tool list and generate a well-formed plan. A plan comprises an API call constructed using the chosen tool and parameters extracted from the query and retrieved context. We fine-tune OpenLLaMA-v2-7B (Touvron et al., 2023) for plan generation. To assess the planner’s performance, we employ the Abstract Syntax Tree (AST) matching strategy to compute plan accuracy. A hallucination is defined as a plan generated using an imaginary tool.
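One way to realize AST-based plan matching and the hallucination check, assuming plans are rendered as Python-style API calls (the paper's exact plan format may differ):

```python
import ast

def _call_signature(plan: str):
    """Parse a plan like "create_event(title='sync')" into
    (tool_name, {arg: value}) for structural comparison."""
    node = ast.parse(plan, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("not a simple API call")
    args = {kw.arg: ast.dump(kw.value) for kw in node.keywords}
    return node.func.id, args

def plans_match(predicted: str, reference: str) -> bool:
    """AST match: same tool and same keyword arguments,
    ignoring formatting and argument order."""
    try:
        return _call_signature(predicted) == _call_signature(reference)
    except (SyntaxError, ValueError):
        return False

def is_hallucination(predicted: str, toolbox: set[str]) -> bool:
    """A hallucination: the plan calls a tool that is not in the toolbox."""
    try:
        tool, _ = _call_signature(predicted)
        return tool not in toolbox
    except (SyntaxError, ValueError):
        return True

print(plans_match("create_event(title='sync', day='Mon')",
                  "create_event(day='Mon', title='sync')"))        # True
print(is_hallucination("send_owl(to='Harry')", {"create_event"}))  # True
```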

Note: according to the Lex Fridman podcast interview, Perplexity also uses BM25 for retrieval.

Learning to Retrieve In-Context Examples for Large Language Models https://arxiv.org/abs/2307.07164

“However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples. In this paper, we propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. Our framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder based dense retriever.”
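A minimal sketch of that distillation step: push the bi-encoder's similarity distribution over candidate examples toward the reward model's score distribution via KL divergence. The shapes, temperature, and toy inputs are illustrative assumptions, not the paper's training code:

```python
import torch
import torch.nn.functional as F

def distill_loss(query_emb, cand_embs, reward_scores, tau=1.0):
    """KL-divergence distillation: align the bi-encoder's (student's)
    similarity distribution over candidates with the reward model's
    (teacher's) distribution.
    Shapes: query_emb (d,), cand_embs (n, d), reward_scores (n,)."""
    student_logits = cand_embs @ query_emb / tau      # dot-product similarities
    teacher_probs = F.softmax(reward_scores / tau, dim=0)
    student_logp = F.log_softmax(student_logits, dim=0)
    return F.kl_div(student_logp, teacher_probs, reduction="sum")

# Toy example with random embeddings and made-up reward scores.
q = torch.randn(16)
cands = torch.randn(4, 16)
rewards = torch.tensor([2.0, 0.5, -1.0, 0.0])
print(distill_loss(q, cands, rewards))
```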

“One area of research is focused on understanding the underlying mechanism and principles of in-context learning. For instance, Xie et al. view in-context learning as implicit Bayesian inference, while Dai et al. interpret it as meta optimization.”