Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Paper · arXiv 2402.09642 · Published February 15, 2024

This work aims to build a text embedder that can capture the characteristics of a text specified by user instructions. Despite the tremendous potential of such user-oriented embeddings, none of the previous approaches provides a concrete solution. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics share similar answers to the instruction and thus yield more similar embeddings. Specifically, we propose INBEDDER, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. INBEDDER demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied both to large language models (LLMs) (e.g., llama-2-7b) and to smaller encoder-based LMs (e.g., roberta-large).

We offer a novel viewpoint that treats the instruction as a question about the input text and encodes the expected answers. Specifically, using the instructed input as a prompt to a generative language model, we argue that the generated answers can be used natively to model semantic similarity under different instructions. For instance, given the sentences “I love cats” and “I love dogs”, the instruction “Do they love animals?” leads to a uniform response of “Yes/Certainly/...”; conversely, distinct answers are generated in response to “What animals do they love?”. Therefore, we believe that the expectation of the answer representations given the prompt can serve as an instruction-following embedding.
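The cats/dogs intuition above can be sketched in code. The snippet below is a toy illustration, not the paper's implementation: `toy_answer` is a hypothetical stand-in for the fine-tuned generative LM, and `token_vector` substitutes deterministic hash-based vectors for real hidden states. It only shows the core mechanism: embed a text as the mean-pooled representation of the answer it produces under an instruction, so that texts with identical answers get identical embeddings while texts with distinct answers diverge.

```python
# Toy sketch of embed-via-answering. All model components here are
# hypothetical stand-ins: a real system would prompt a generative LM
# with the instruction plus the input text and pool its hidden states.
import hashlib
import math


def token_vector(token, dim=8):
    # Deterministic toy vector for a token (stand-in for LM hidden states).
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]


def answer_embedding(answer, dim=8):
    # Mean-pool the token vectors of the generated answer.
    vecs = [token_vector(t, dim) for t in answer.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def toy_answer(text, instruction):
    # Hypothetical answer generator; in the paper this role is played by
    # a language model fine-tuned on abstractive question answering.
    if instruction == "Do they love animals?":
        return "Yes"
    if instruction == "What animals do they love?":
        return text.rsplit(" ", 1)[-1]  # "cats" / "dogs"
    return text


# Same answer under the yes/no instruction -> identical embeddings.
same = cosine(
    answer_embedding(toy_answer("I love cats", "Do they love animals?")),
    answer_embedding(toy_answer("I love dogs", "Do they love animals?")),
)
# Distinct answers under the "what animals" instruction -> lower similarity.
diff = cosine(
    answer_embedding(toy_answer("I love cats", "What animals do they love?")),
    answer_embedding(toy_answer("I love dogs", "What animals do they love?")),
)
```

Under the yes/no instruction both sentences map to the answer "Yes", so `same` is exactly 1.0; under the "what animals" instruction the answers "cats" and "dogs" produce different vectors, so `diff` falls below 1.0, mirroring how the instruction controls which semantics the embedding captures.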