Domain Specialization in LLMs
Related topics:
- A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning: we introduce a hybrid RAG system enhanced through a comprehensive suite of optimizations that significantly improve retrieval quality, augment reasoning capabilities, and refine numerical computation …
- AI-Powered (Finance) Scholarship: This paper describes a process for automatically generating academic finance papers using large language models (LLMs). It demonstrates the process’ efficacy by producing hundreds of complete papers o…
- AI-Researcher: Autonomous Scientific Innovation: The powerful reasoning capabilities of Large Language Models (LLMs) in mathematics and coding, combined with their ability to automate complex tasks through agentic frameworks, present unprecedented o…
- AInsight: Augmenting Expert Decision-Making with On-the-Fly Insights Grounded in Historical Data: In decision-making conversations, experts must navigate complex choices and make on-the-spot decisions while engaged in conversation. Although extensive historical data often exists, the real-time nat…
- ALIGN: Prompt-based Attribute Alignment for Reliable, Responsible, and Personalized LLM-based Decision-Making: Large language models (LLMs) are increasingly being used as decision aids. However, users have diverse values and preferences that can affect their decision-making, which requires novel methods for LL…
- Affordable AI Assistants with Knowledge Graph of Thoughts: Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significa…
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models: Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation—modifying inputs with instructions, strategies, or evidence, rather than we…
- Ask an Expert: Leveraging Language Models to Improve Strategic Reasoning in Goal-Oriented Dialogue Models: Existing dialogue models may encounter scenarios which are not well-represented in the training data, and as a result generate responses that are unnatural, inappropriate, or unhelpful. We propose th…
- Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models: We present Attentive Reasoning Queries (ARQs), a novel structured reasoning approach that significantly improves instruction-following in Large Language Models through domain-specialized reasoning blu…
- Automatic Extraction of Metaphoric Analogies from Literary Texts: Task Formulation, Dataset Construction, and Evaluation: Extracting metaphors and analogies from free text requires high-level reasoning abilities such as abstraction and language understanding. Our study focuses on the extraction of the concepts that form …
- Beyond "Not Novel Enough": Enriching Scholarly Critique with LLM-Assisted Feedback: Novelty assessment is a central yet understudied aspect of peer review, particularly in high-volume fields like NLP where reviewer capacity is increasingly strained. We present a structured approach fo…
- Bottom-up Domain-specific Superintelligence: A Reliable Knowledge Graph is What We Need: Language models traditionally utilized for cross-domain generalization in natural language understanding and generation have recently demonstrated task-specific reasoning through inference-time scalin…
- Building Decision Making Models Through Language Model Regime: LLMs demonstrate remarkable success in generalizing across varied language tasks, inspiring a new strategy for training decision making models. Our approach, referred to as "Learning then Using" (LTU)…
- CEO: Corpus-based Open-Domain Event Ontology Induction: This paper presents CEO, a novel Corpus-based Event Ontology induction model to relax the restriction imposed by pre-defined event ontologies. Without direct supervision, CEO leverages distant supervi…
- Can Large Language Models Capture Human Annotator Disagreements? Human annotation variation (i.e., annotation disagreements) is common in NLP and often reflects important information such as task subjectivity and sample ambiguity. While Large Language Models (LLMs)…
- Can Large Language Models Develop Strategic Reasoning? Post-training Insights from Learning Chess: While reinforcement learning (RL) for large language models (LLMs) has shown promise in mathematical reasoning, strategic reasoning for LLMs using RL remains largely unexplored. We investigate whether…
- Can Theoretical Physics Research Benefit from Language Agents? Yet their application in theoretical physics research is not yet mature. This position paper argues that LLM agents can potentially help accelerate theoretical, computational, and applied physics when…
- Chamain: Harmonizing Character Persona Integrity with Domain-Adaptive Knowledge in Dialogue Generation: Recent advances in large language models (LLMs) have shown their capacity for generating natural dialogues, leveraging extensive pre-trained knowledge. However, the seamless integration of domain-spec…
- Characterizing Deep Research: A Benchmark and Formal Definition: Information tasks such as writing surveys or analytical reports require complex search and reasoning, and have recently been grouped under the umbrella of deep research — a term also adopted by recent…
- Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems: we conducted a user study comparing an LLM-based CA to an intent-based system regarding interaction efficiency, user experience, workload, and usability. This revealed that LLM-based CAs exhibited bet…
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction: Keyphrase extraction is the task of identifying a set of keyphrases present in a document that captures its most salient topics. Scientific domain-specific pre-training has led to achieving state-of-t…
- ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning: Narrative comprehension on long stories and novels has been a challenging domain attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given th…
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains: However, its extension to broader, less structured domains remains unexplored. In this work, we investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicin…
- Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory: While large language models (LLMs) leverage both knowledge and reasoning during inference, the capacity to distinguish between them plays a pivotal role in model analysis, interpretability, and develo…
- Dense Retrieval Adaptation using Target Domain Description: This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target doc…
- DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications: The scarcity of domain-specific dialogue datasets limits the development of dialogue systems across applications. Existing research is constrained by general or niche datasets that lack sufficient sca…
- Do LLMs Truly Understand When a Precedent Is Overruled? Large language models (LLMs) with extended context windows show promise for complex legal reasoning tasks, yet their ability to understand long legal documents remains insufficiently evaluated. Develo…
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey: Specifically, domain specialization of Large Language Models (LLMs) is defined as the process of customizing general-purpose LLMs according to specific domain contextual data, augmented by domain-spec…
- Domain-specific Question Answering with Hybrid Search: With the increasing adoption of Large Language Models (LLMs) in enterprise settings, ensuring accurate and reliable question-answering systems remains a critical challenge. Building upon our previous …
- DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation: Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often und…
- Educating LLMs like Human Students: Structure-aware Injection of Domain Knowledge: This paper presents a pioneering methodology, termed StructTuning, to efficiently transform foundation Large Language Models (LLMs) into domain specialists. It significantly minimizes the training cor…
- Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation: Large language models (LLMs) often exhibit limited performance on domain-specific tasks due to the natural disproportionate representation of specialized information in their training data and the sta…
- Empowering Domain-Specific Language Models with Graph-Oriented Databases: A Paradigm Shift in Performance and Model Maintenance: Modern graph-oriented databases (GODBs) have emerged as a solution for highly-connected data and link-oriented queries and algorithms [2]. In fact, they have been a valuable solution in the software industry for decades. The implem…
- Evaluating Large Language Models in Exercises of UML Class Diagram Modeling: The goal of this work is to evaluate the capability of LLM agents to correctly generate UML class diagrams in activities of Requirements Modeling in the field of Software Engineering. Our aim is to ev…
- Exploring LLMs Applications in Law: A Literature Review on Current Legal NLP Approaches: The integration of Natural Language Processing (NLP) and AI into legal tasks is a natural progression, given the linguistic nature of law. This combination allows for more efficient and accurate analy…
- FinCoT: Grounding Chain-of-Thought in Expert Financial Reasoning: This paper presents FinCoT, a structured chain-of-thought (CoT) prompting approach that incorporates insights from domain-specific expert financial reasoning to guide the reasoning traces of large la…
- FormulaOne: Measuring the Depth of Algorithmic Reasoning Beyond Competitive Programming: Frontier AI models demonstrate formidable breadth of knowledge. But how close are they to true human — or superhuman — expertise? Genuine experts can tackle the hardest problems and push the boundarie…
- Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps: There is a nascent area where scholars are approaching thematic analysis (TA) using LLMs, following the six phases developed by BRAUN and CLARKE (2006). TA is a qualitative method of analysis where t…
- GRASP: Municipal Budget AI Chatbots for Enhancing Civic Engagement: There are a growing number of AI applications, but none tailored specifically to help residents answer their questions about the municipal budget, a topic most are interested in but few have a so…
- Harnessing Business and Media Insights with Large Language Models: Business-Centric Question Answering: FALM leverages diverse data sources, including news articles, video interviews, ranking lists, financial metrics, and business leader profiles, to answer complex…
- How well can large language models explain business processes? One such system’s functionality is Situation-Aware eXplainability (SAX), which relates to generating causally sound and yet human-interpretable explanations that take into account the process context …
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey: To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge. In this survey, we provide a comprehensive overview of these methods, which we cate…
- Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains: However, the quality and transparency of their internal reasoning processes remain underexplored. This work moves beyond the final-answer accuracy and investigates step-by-step reasoning in the medica…
- LESS: Selecting Influential Data for Targeted Instruction Tuning: Instruction tuning has unlocked powerful capabilities in large language models (LLMs), using combined datasets to develop general-purpose chatbots. However, real-world applications often require a spe…
- LLM Augmentations to support Analytical Reasoning over Multiple Documents: Our key contributions are: 1) We conduct the first investigation of the feasibility of using LLMs in intelligence analysis where both evidence-based reasoning and analytical creativity are of utmost …
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency: Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life…
- LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems: Recent progress in Large Reasoning Models (LRMs) has significantly enhanced the reasoning abilities of Large Language Models (LLMs), empowering them to tackle increasingly complex tasks through reflec…
- LSR: Reinforcement Learning with Supervised Reward Outperforms SFT in Instruction Following: After the pretraining stage of LLMs, techniques such as SFT, RLHF, RLVR, and RFT are applied to enhance instruction-following ability, mitigate undesired responses, improve reasoning capability and en…
- Large Language Model-based Data Science Agent: A Survey: The rapid advancement of Large Language Models (LLMs) has driven novel applications across diverse domains, with LLM-based agents emerging as a crucial area of exploration. This survey presents a comp…
- Large Language Models as Planning Domain Generators: Developing domain models is one of the few remaining places that require manual human labor in AI planning. Thus, in order to make planning more accessible, it is desirable to automate the process of …
- Large Language Models can accomplish Business Process Management Tasks: In this paper, we illustrate how LLMs can be utilized for three BPM tasks that require textual documents as input. For all tasks, we follow the same approach, illustrated in Fig. 1. We start by assem…
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning: There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several fact…
- Mastering Diverse Domains through World Models: Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement learning algor…
- Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications: We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies (e.g., supervised fine-tuning, reinforcement learning) and test-time mechanisms (e.g., prompt engin…
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence: We propose MODEL SWARMS, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, MODEL SWARMS starts with a pool of LLM…
- Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance: Supervised fine-tuning (SFT) is a pivotal approach to adapting large language models (LLMs) for downstream tasks; however, performance often suffers from the “seesaw phenomenon”, where indiscriminate …
- On The Persona-based Summarization of Domain-Specific Documents: In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming and storing information necessitates the generation of summaries from large information repositories. H…
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning: Despite their impressive performance, recent studies have highlighted the potential for significant enhancements in LLMs’ task-specific performance through fine-tuning strategies like Reinforcement Lea…
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs). Existing…
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning: OpenAI’s recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation models and offers a new paradigm for fine-tuning beyond simple pattern imitation. This te…
- PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data…
- PolyResponse: A Rank-based Approach to Task-Oriented Dialogue with Application in Restaurant Search and Booking: We present PolyResponse, a conversational search engine that supports task-oriented dialogue. It is a retrieval-based approach that bypasses the complex multi-component design of traditional task-orie…
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution: Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strateg…
- RAG-Gym: Systematic Optimization of Language Agents for Retrieval-Augmented Generation: Retrieval-augmented generation (RAG) has shown great promise for knowledge-intensive tasks and recently advanced with agentic RAG, where language agents engage in multi-round interactions with externa…
- Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains: Traditional Retrieval-Augmented Generation (RAG) pipelines rely on similarity-based retrieval and re-ranking, which depend on heuristics such as top-k, and lack explainability, interpretability, and r…
- Reranking-based Generation for Unbiased Perspective Summarization: Generating unbiased summaries in real-world settings such as political perspective summarization remains a crucial application of Large Language Models (LLMs). Yet, existing evaluation frameworks rely…
- Rethinking STS and NLI in Large Language Models: Recent years have seen the rise of large language models (LLMs), where practitioners use task-specific prompts; this was shown to be effective for a variety of tasks. However, when applied to semanti…
- Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains: Extending Reinforcement Learning with Verifiable Rewards (RLVR) to real-world tasks often requires balancing objective and subjective evaluation criteria. However, many such tasks lack a single, unamb…
- Rule2Text: Natural Language Explanation of Logical Rules in Knowledge Graphs: Knowledge graphs (KGs) often contain sufficient information to support the inference of new facts. Identifying logical rules not only improves the completeness of a knowledge graph but also enables th…
- SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval: To address these issues, in this paper, we propose SAILER, a new Structure-Aware pre-traIned language model for LEgal case Retrieval. It is highlighted in the following three aspects: (1) SAILER fully…
- SParC: Cross-Domain Semantic Parsing in Context: The most prominent context-dependent text-to-SQL benchmark is ATIS, which is set in the flight-booking domain and contains only one database (Hemphill et al., 1990; Dahl et al., 1994). In a real-worl…
- Scaling Expert Language Models with Unsupervised Domain Discovery: Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introdu…
- Search-o1: Agentic Search-Enhanced Large Reasoning Models: Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes o…
- TaskLAMA: Probing the Complex Task Understanding of Language Models: Structured Complex Task Decomposition (SCTD) is the problem of breaking down a complex real-world task (such as planning a wedding) into a directed acyclic graph over individual steps that contribute…
- The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs: The “LLM-as-a-judge” paradigm employs Large Language Models (LLMs) as annotators and evaluators in tasks traditionally performed by humans. LLM annotations are widely used, not only in NLP research bu…
- The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think: Long chain-of-thought (CoT) is an essential ingredient in effective usage of modern large language models, but our understanding of the reasoning strategies underlying these capabilities remains limit…
- The Incomplete Bridge: How AI Research (Mis)Engages with Psychology: Social sciences have accumulated a rich body of theories and methodologies for investigating the human mind and behaviors, while offering valuable insights into the design and understanding of Artific…
- Towards Understanding Counseling Conversations: Domain Knowledge and Large Language Models: This paper proposes a systematic approach to examine the efficacy of domain knowledge and large language models (LLMs) in better representing conversations between a crisis counselor and a help seeker…
- Transcendence: Generative Models Can Outperform The Experts That Train Them: Generative models (GMs) are typically trained to mimic human behavior. These humans may be skilled in their various human objectives: answering a question, creating art, singing a song. The model has …
- Tuning Language Models by Proxy: We introduce proxy-tuning, a lightweight decoding-time algorithm that operates on top of black-box LMs to achieve the same end as direct tuning, but by accessing only its predictions over the output v…
- Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study: Large language models (LLMs) have shown impressive capabilities across tasks such as mathematics, coding, and reasoning, yet their learning ability, which is crucial for adapting to dynamic environmen…
- Using LLMs to Discover Legal Factors: Recently, large language models (LLMs) have been applied automatically to annotate legal case texts from particular legal domains in terms of factors from pre-existing factor lists. In this paper, we …
- Virtual Assistance in Any Context: Several domain-specific assistants in the form of chatbots have conquered many commercial and private areas. However, there is still a limited level of systematic knowledge of the distinctive…
- Virtuous Machines: Towards Artificial General Science: Artificial intelligence systems are transforming scientific discovery by accelerating specific research tasks, from protein structure prediction to materials design, yet remain confined to narrow doma…
- What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity: AI research agents offer the promise to accelerate scientific progress by automating the design, implementation, and training of machine learning models. However, the field is still in its infancy, an…
- You Don't Need Pre-built Graphs for RAG: Retrieval Augmented Generation with Adaptive Reasoning Structures: Large language models (LLMs) often suffer from hallucination, generating factually incorrect statements when handling questions beyond their knowledge and perception. Retrieval-augmented generation (R…
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning: Our results reveal a significant decline in accuracy as problem complexity grows—a phenomenon we term the “curse of complexity.” This limitation persists even with larger models and increased inferenc…
- LM2: A Simple Society of Language Models Solves Complex Reasoning: Despite demonstrating emergent reasoning abilities, Large Language Models (LLMs) often lose track of complex, multi-step reasoning. Existing studies show that providing guidance via decomposing the or…