From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. The framework also entails a comprehensive agent construction pipeline, with phases including Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues.
However, a significant challenge in the realm of LLMs lies in their integration into conversational agents (Weng, 2023; Wang et al., 2023a; Sumers et al., 2023; Xi et al., 2023). While these models exhibit high levels of performance in isolated tasks, building an agent that can sustain coherent, context-aware, and purpose-driven conversations remains an intricate endeavor. The need for a more sophisticated framework that leverages the strengths of LLMs while addressing their limitations in conversational settings has become increasingly apparent.
The memory module in the RAISE framework stores information perceived from the environment and uses it to facilitate the agent’s future actions. The memory includes the following components:
System Prompt Includes profiles (detailing role identity, objectives, and behaviors), task instructions, tool descriptions, and few-shot learning elements for optimizing model performance. Flexibly designed, the system prompt can either remain static or adjust dynamically to accommodate different stages of a dialogue and differing query types.
Context Includes conversation history and task trajectory. Conversation history records all query-response pairs within the dialogue, providing a complete context for more accurate agent perception. Task trajectory documents the decision-making trajectory, including plan designation, tool selection, and execution, guiding the agent’s future planning.
Scratchpad Logs background information, knowledge generated by reasoning, and observations from previous tool usage; this record is essential for efficiency in multi-turn interactions.
Examples Comprises query-response pairs used to recall relevant examples that fill knowledge gaps in the model and tools, and to customize the agent’s behavior and expression.
These four components collectively form the working memory of RAISE, with conversation history and scratchpad being dialogue-level, while examples and task trajectory are turn-level.
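The four-component working memory above can be sketched as a simple data structure. This is an illustrative reading of the paper's description, not an implementation it prescribes; all class and field names here are assumptions, and the dialogue-level vs. turn-level distinction is modeled by resetting only the turn-level fields on each new query.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of RAISE's working memory; the paper does not
# specify concrete data structures, so these names are illustrative.
@dataclass
class WorkingMemory:
    system_prompt: str = ""                                    # static or dynamic
    conversation_history: list = field(default_factory=list)   # dialogue-level
    scratchpad: dict = field(default_factory=dict)             # dialogue-level
    examples: list = field(default_factory=list)               # turn-level
    task_trajectory: list = field(default_factory=list)        # turn-level

    def new_turn(self):
        """Reset only the turn-level components at each new user query."""
        self.examples.clear()
        self.task_trajectory.clear()

memory = WorkingMemory(system_prompt="You are a sales assistant.")
memory.conversation_history.append(("user", "Is this phone in stock?"))
memory.task_trajectory.append({"action": "check_inventory"})
memory.new_turn()  # trajectory and examples are cleared; history survives
```

After `new_turn()`, the conversation history and scratchpad persist across turns while the retrieved examples and task trajectory are rebuilt per query, matching the dialogue-level/turn-level split described above.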
The tool module enriches LLMs after pretraining and Supervised Fine-Tuning (SFT) by integrating external knowledge sources and resources. This module incorporates a diverse array of tools, including but not limited to databases for data retrieval, APIs for system interactions, sophisticated recommendation systems, and collaborative frameworks involving other LLMs or agents. The description file for a tool typically needs to include the tool’s name, its function, essential parameters, optional parameters, and may also include some usage examples. This descriptive file aids agents in better planning, tool selection, parameter generation for tools, and execution of those tools.
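A tool description file of the kind described above might look as follows. The field names and the `render_tool_description` helper are assumptions chosen to mirror the listed contents (name, function, required and optional parameters, usage examples); the paper does not fix a concrete schema.

```python
# Illustrative tool description; the field names are assumptions,
# not a format prescribed by the RAISE paper.
inventory_tool = {
    "name": "check_inventory",
    "function": "Look up the stock level for a product in the inventory database.",
    "required_parameters": {"product_id": "string, unique product identifier"},
    "optional_parameters": {"warehouse": "string, defaults to the nearest warehouse"},
    "examples": [
        {"input": {"product_id": "P1234"}, "output": {"in_stock": 12}},
    ],
}

def render_tool_description(tool: dict) -> str:
    """Flatten a tool description into prompt text the agent can read."""
    lines = [f"Tool: {tool['name']}", f"Function: {tool['function']}"]
    lines += [f"Required: {k} - {v}" for k, v in tool["required_parameters"].items()]
    lines += [f"Optional: {k} - {v}" for k, v in tool["optional_parameters"].items()]
    return "\n".join(lines)

desc = render_tool_description(inventory_tool)
```

Rendering the description into the prompt is what lets the agent plan, select the tool, and generate its parameters, as the section notes.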
The controller module connects the aforementioned modules through preset trigger conditions. Upon receiving a new query, the agent executes a loop of perception, planning, tool selection, and tool execution. The specific process is as follows.
Memory Update At the beginning of a conversation, the Scratchpad records the context of the dialogue, including the user and agent roles, the date, the time, etc.
During the conversation, each time a user query is received, the system will: (1) add the user’s query to the Conversation History; (2) recall the top-k relevant examples from the Example Pool for the current task, based on the historical and current queries, using vector retrieval; (3) update the current entity information in the Scratchpad if the user’s query contains a product link; (4) update the agent’s trajectory in the Task Memory and the results of tool usage in the Scratchpad during task execution; and (5) after task completion, add the agent’s final output to the Conversation History.
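The per-query memory-update steps can be sketched as follows. The `retrieve` helper is a toy word-overlap stand-in for the vector retrieval the paper mentions but does not specify, and all names are illustrative; steps (4) and (5) occur later in the turn, during and after task execution.

```python
# Minimal sketch of the memory-update steps, under assumed data shapes.
def retrieve(pool, query, k):
    """Toy stand-in for vector retrieval: rank examples by word overlap."""
    words = set(query.split())
    return sorted(pool, key=lambda ex: -len(set(ex.split()) & words))[:k]

def on_user_query(memory, query, example_pool, top_k=2):
    memory["conversation_history"].append(("user", query))     # step (1)
    memory["examples"] = retrieve(example_pool, query, top_k)  # step (2)
    if "http" in query:                                        # step (3)
        memory["scratchpad"]["current_entity"] = query
    # Steps (4) and (5) happen during execution: the trajectory and tool
    # results are written to Task Memory / Scratchpad as tools run, and
    # the final answer is appended to the Conversation History at the end.
    return memory

mem = {"conversation_history": [], "examples": [], "scratchpad": {}}
pool = ["is this phone waterproof", "what colours are available", "ship to Canada"]
on_user_query(mem, "is this phone in stock", pool)
```

With this toy retriever, the waterproof-phone example ranks first because it shares the most words with the query; a real deployment would use embedding similarity instead.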
Task Planning After collecting the above information, it is combined into a complete task inference prompt according to the designed template, as illustrated in Figure 2. An example of the complete prompt is available in Table 5 of the Appendix. The LLM utilizes the information within the prompt for perception and planning, subsequently outputting actions in accordance with the format outlined in the prompt. If an action involves invoking a tool, it should specify the tool’s name and input parameters.
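The prompt assembly and action parsing described above can be sketched like this. The template text and the JSON action format are assumptions for illustration; the actual template appears in Figure 2 and Table 5 of the paper.

```python
import json

# Hypothetical template combining the working-memory components; the real
# template is the one shown in Figure 2 of the paper.
PROMPT_TEMPLATE = """{system_prompt}

Examples:
{examples}

Conversation history:
{history}

Scratchpad:
{scratchpad}

Trajectory so far:
{trajectory}

Respond with a JSON action, either a tool call or a final answer."""

def build_prompt(memory):
    return PROMPT_TEMPLATE.format(
        system_prompt=memory["system_prompt"],
        examples="\n".join(memory["examples"]),
        history="\n".join(f"{role}: {text}" for role, text in memory["conversation_history"]),
        scratchpad=json.dumps(memory["scratchpad"]),
        trajectory=json.dumps(memory["task_trajectory"]),
    )

def parse_action(llm_output: str) -> dict:
    """Parse the planned action; a tool call must name the tool and its inputs."""
    action = json.loads(llm_output)
    if "tool" in action and "parameters" not in action:
        raise ValueError("tool calls must include input parameters")
    return action

mem = {
    "system_prompt": "You are a sales assistant.",
    "examples": ["Q: in stock? A: yes"],
    "conversation_history": [("user", "Is this phone in stock?")],
    "scratchpad": {},
    "task_trajectory": [],
}
prompt = build_prompt(mem)
action = parse_action('{"tool": "check_inventory", "parameters": {"product_id": "P1234"}}')
```

The parse step enforces the requirement stated above: whenever an action invokes a tool, it must specify both the tool's name and its input parameters.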
Tool Execution This phase involves executing the tool selected in the previous step. The command for tool execution may either be output directly by the agent or correspond to a manually crafted function specific to each tool. The output of the execution is formatted as predetermined.
Summary Synthesizing all the information gathered from the environment, the agent decides whether it can respond to the user’s query. Termination criteria might include having gathered sufficient information, exceeding a preset number of loops, or encountering a system error. Upon meeting any of these conditions, the agent can proceed to summarize its findings and provide a response.
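The full controller loop with these three termination criteria can be sketched as follows. The `llm` and tool callables are stand-ins (here a scripted fake), and the dict-based memory is an assumed shape; the loop cap, final-answer check, and error handling map onto the criteria listed above.

```python
# Sketch of the perceive-plan-execute loop with the termination criteria
# described above; `llm` and the tool functions are illustrative stand-ins.
def run_turn(llm, tools, memory, max_loops=5):
    for _ in range(max_loops):                       # criterion: loop budget
        action = llm(memory)                         # perception + planning
        if "answer" in action:                       # criterion: enough information
            memory["conversation_history"].append(("agent", action["answer"]))
            return action["answer"]
        try:
            result = tools[action["tool"]](**action["parameters"])
        except Exception as err:                     # criterion: system error
            return f"Sorry, something went wrong: {err}"
        memory["task_trajectory"].append(action)       # record the trajectory
        memory["scratchpad"][action["tool"]] = result  # log the observation
    return "I could not finish within the allotted steps."

# Scripted fake LLM: one tool call, then a final answer.
calls = iter([
    {"tool": "check_inventory", "parameters": {"product_id": "P1234"}},
    {"answer": "Yes, 12 units are in stock."},
])
fake_llm = lambda memory: next(calls)
tools = {"check_inventory": lambda product_id: {"in_stock": 12}}
mem = {"conversation_history": [], "task_trajectory": [], "scratchpad": {}}
reply = run_turn(fake_llm, tools, mem)
```

Note how the observation lands in the Scratchpad and the action in the task trajectory before the next planning step, so the second LLM call sees the tool's result, which is what lets it decide it has gathered enough to answer.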
However, the study has limitations, including potential hallucination issues and challenges in handling complex logic problems, necessitating further research. Despite these limitations, RAISE presents a promising advancement in adaptable, context-aware conversational agents, offering a foundation for future developments in artificial intelligence.