Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation

Paper · arXiv 2412.15375 · Published December 19, 2024

Extracting metaphors and analogies from free text requires high-level reasoning abilities such as abstraction and language understanding. Our study focuses on the extraction of the concepts that form metaphoric analogies in literary texts. To this end, we construct a novel dataset in this domain with the help of domain experts. We compare the out-of-the-box ability of recent large language models (LLMs) to structure metaphoric mappings from fragments of texts containing proportional analogies. The models are further evaluated on the generation of implicit elements of the analogy, which are indirectly suggested in the texts and inferred by human readers. The competitive results obtained by LLMs in our experiments are encouraging and open up new avenues such as automatically extracting analogies and metaphors from text instead of investing resources in domain experts to manually label data.

According to Hofstadter and Sander (2013), analogy is the fuel and fire of thinking because humans dynamically build concepts by analogy. Although analogy is a core mechanism of the mind, analogies have proven difficult to extract automatically from free text, because they can involve implicit concepts and can link dissimilar concepts. In some cases, in particular when analogies pair very different concepts, they can also be metaphoric (Bowdle and Gentner, 2005). For example, a head and an apple can be mapped in the sentence "My head is an apple without a core" (Sternberg et al., 1993), forming an analogy that is also a metaphor. The recent progress of transformer-based large language models (LLMs) opens a path towards a finer-grained semantic handling of metaphoric analogies in Natural Language Processing.

Analogical thinking is a process of generalization and abstraction. Large pretrained language models have some analogical abilities: they can be prompted successfully for analogical reasoning (Yasunaga et al., 2024) and can perform zero-shot analogical reasoning in visual tasks after these are converted into language (Hu et al., 2023). Do they also recognize complex metaphoric analogies in texts?

To our knowledge, there is currently no evaluation of the ability of LLMs to extract mappings involving more than a single pair of concepts from text.

Analogies are parallels or mappings across concepts. Proportional analogies, which are the focus of this study, associate two pairs of concepts, as in "Answer is to riddle what key is to lock".

We tackle the question of metaphorical mapping identification at the lexical level.

For example, the poem verse "Memory, a jar of flies. Spin off the lid." (Seibles, 1955) maps memory to a jar, and flies to an implicit concept that could very well be <-recollections>.

Given a short text containing a metaphor, the task we introduce is to extract a pair of noun phrases (NPs) belonging to the source domain of the metaphor, i.e. expressions that are used metaphorically (jar and flies in our example), and another pair of NPs belonging to the target domain, i.e. the topic being discussed (memory and <-recollections>), such that flies are to the jar what recollections are to memory. We view this task as a step towards the extraction and structuring of more complex metaphoric mappings from free texts.
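To make the target of the extraction concrete, the following minimal sketch shows one possible way to represent the output of the task for the jar-of-flies example. It is an illustration under stated assumptions, not the format used in the dataset: the class names `Concept` and `MetaphoricAnalogy`, the `implicit` flag, and the method names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Concept:
    """A noun phrase taking part in the analogy (hypothetical structure)."""
    text: str
    implicit: bool = False  # True when the concept is inferred, not stated in the text


@dataclass(frozen=True)
class MetaphoricAnalogy:
    """Two pairs of concepts: one from the source domain, one from the target domain."""
    source: Tuple[Concept, Concept]  # NPs used metaphorically, e.g. (jar, flies)
    target: Tuple[Concept, Concept]  # NPs for the topic discussed, e.g. (memory, recollections)

    def as_proportion(self) -> str:
        """Render the mapping in the 'B is to A what D is to C' form."""
        s1, s2 = self.source
        t1, t2 = self.target
        return f"{s2.text} is to {s1.text} what {t2.text} is to {t1.text}"

    def implicit_concepts(self) -> List[str]:
        """List the concepts that a reader (or a model) has to infer."""
        return [c.text for c in (*self.source, *self.target) if c.implicit]


# Assumed encoding of the example verse "Memory, a jar of flies."
example = MetaphoricAnalogy(
    source=(Concept("jar"), Concept("flies")),
    target=(Concept("memory"), Concept("recollections", implicit=True)),
)
```

In this sketch, `example.as_proportion()` returns the proportional reading of the verse, and `example.implicit_concepts()` isolates the element that is only suggested by the text, which is the part the models are asked to generate.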

Elucidating freely expressed complex analogies also has the potential to improve NLP downstream tasks such as Question Answering, Natural Language Inference or Machine Translation (Li et al., 2024).