MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

Paper · arXiv 2503.16905 · Published March 21, 2025

Solving this problem requires not only understanding the combined visual and textual information but also applying the lever balance principle to compare the effects on the two sides. One intuitive solution is to delegate the entire problem-solving process to a single multimodal large language model (MLLM). Although current MLLMs possess the basic abilities involved (e.g., diagram parsing and theorem retrieval), they are not well optimized to combine these skills in complex scenarios. This motivates the central research question: how can off-the-shelf MLLMs be leveraged, and their capabilities elicited, to address challenging MSPs?