Interesting Scientific Idea Generation Using Knowledge Graphs and LLMs: Evaluations with 100 Research Group Leaders
But how compelling are these AI-generated ideas, and how can we improve their quality? Here, we introduce SciMuse, which uses 58 million research papers and a large language model to generate research ideas. We conduct a large-scale evaluation in which more than 100 research group leaders – from the natural sciences to the humanities – ranked more than 4,400 personalized ideas according to their level of interest.
Analyzing relationships between research topics across the vast scientific literature can reliably predict future research directions [6–10], forecast the potential impact of emerging work [11, 12], and identify unconventional avenues for discovery [13]. With the advent of powerful large language models (LLMs), it is now possible to leverage knowledge from millions of scientific papers to generate concrete research ideas [14–16].
SciMuse formulates comprehensive research suggestions.
suggestions were evaluated by more than 100 research group leaders from the Max Planck Society, spanning the natural sciences and technology (e.g., from the Institutes for Biogeochemistry, Astrophysics, Quantum Optics, and Intelligent Systems) as well as the social sciences and humanities (e.g., from the Institutes for Geoanthropology, Demographic Research, and Human Development). These experienced researchers ranked the interest level of more than 4,400 research ideas generated by SciMuse. This large dataset not only allows us to identify connections between properties of ideas and their interest level, but also enables us to accurately predict the interest level of new ideas with two fundamentally different methods: (1) training supervised neural networks and (2) using LLMs for zero-shot prediction without access to human evaluations, which will be important when expensive human-expert data is unavailable.
While we could directly use publicly available large language models such as GPT-4 [17], Gemini [26], or Claude [27] to suggest new research ideas and collaborations, our control over the generated ideas would be limited to the structure of the prompt. Therefore, we decided to build a large knowledge graph from the scientific literature to identify the personalized research interests of scientists.
The knowledge graph, depicted in Fig. 1(a), consists of vertices representing scientific concepts; an edge is drawn between two concepts when they jointly appear in the title or abstract of a scientific paper.
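This construction can be illustrated with a minimal sketch. The toy papers, concept list, and naive substring matching below are illustrative stand-ins (the actual graph is built from 58 million papers and 123,128 curated concepts, with a more careful concept-matching pipeline); the sketch only shows how co-occurrence in titles and abstracts yields weighted edges.

```python
from itertools import combinations
from collections import defaultdict

def build_knowledge_graph(papers, concepts):
    """Build a concept co-occurrence graph from paper titles and abstracts.

    papers:   iterable of (title, abstract) string pairs (toy stand-ins here)
    concepts: curated concept list -- these are the vertices of the graph
    Returns {(concept_a, concept_b): co-occurrence count} -- the weighted edges.
    """
    edges = defaultdict(int)
    for title, abstract in papers:
        text = f"{title} {abstract}".lower()
        # Concepts mentioned in this paper (naive substring matching).
        present = sorted(c for c in concepts if c in text)
        # Draw (or strengthen) an edge for every co-occurring concept pair.
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

# Toy example: two hypothetical papers, four concepts.
papers = [
    ("Quantum optics with photonic crystals",
     "We study quantum optics experiments in photonic crystal cavities."),
    ("Machine learning for quantum optics",
     "Neural networks optimize quantum optics setups."),
]
concepts = ["quantum optics", "photonic crystal", "machine learning", "neural network"]
edges = build_knowledge_graph(papers, concepts)
```

Each edge weight counts how many papers mention both concepts together; per-edge citation information (as used in the actual graph) could be attached by accumulating each paper's citation count on its edges in the same loop.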
The Rapid Automatic Keyword Extraction (RAKE) algorithm, based on statistical text analysis, is used to extract candidate concepts [22]. These candidates are further refined using GPT, Wikipedia, and human annotators, resulting in 123,128 concepts in the natural and social sciences. We then use more than 58 million scientific papers from the open-source database OpenAlex [23] to create edges. These edges contain information about the co-occurrence of concepts in scientific papers (in titles and abstracts) and their subsequent citation rates. This knowledge graph representation was recently introduced in [12] to predict the impact of future research topics.
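The statistical idea behind RAKE can be sketched as follows. This is a simplified illustration, not the implementation used here: the tiny stopword list is a toy stand-in for RAKE's full stoplist, and the actual pipeline further refines the candidates with GPT, Wikipedia, and human annotators.

```python
import re
from collections import defaultdict

# Toy stopword list (RAKE normally uses a much larger stoplist).
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "is",
             "are", "we", "with", "to", "using", "based", "this", "that"}

def rake_candidates(text):
    """Minimal RAKE-style candidate extraction and scoring.

    1. Split the text into candidate phrases at stopwords and punctuation.
    2. Score each word by degree(word) / frequency(word).
    3. A phrase's score is the sum of its word scores.
    Returns candidate phrases sorted by descending score.
    """
    words = re.split(r"[^a-zA-Z]+", text.lower())
    phrases, current = [], []
    for w in words:
        if not w or w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs overall
    degree = defaultdict(int)  # co-occurrence degree within candidate phrases
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)  # w co-occurs with every word in the phrase

    score = {w: degree[w] / freq[w] for w in freq}
    return sorted(
        (" ".join(p) for p in phrases),
        key=lambda p: sum(score[w] for w in p.split()),
        reverse=True,
    )
```

Because the score favors words that appear in long, distinctive phrases, multi-word technical terms rank above isolated common words, which is why RAKE works well as a first-pass concept extractor before the GPT/Wikipedia/human refinement step.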