AI & Human Co-Improvement for Safer Co-Superintelligence
Self-improvement is a goal currently exciting the field of AI, but it is fraught with danger and may take time to fully achieve. We advocate that a more achievable and better goal for humanity is to maximize co-improvement: collaboration between human researchers and AIs to achieve co-superintelligence. That is, we should specifically target improving AI systems’ ability to work with human researchers to conduct AI research together, from ideation to experimentation, in order both to accelerate AI research and to endow AIs and humans alike with safer superintelligence through their symbiosis. Keeping human research improvement in the loop will get us there both faster and more safely.
It seems clear by now that we are marching towards ever more intelligent AI systems that will, in the long run, surpass humans on all task metrics, and by a large margin. Fully realized self-improvement is clearly an end-game marker. However, endowing AIs with this autonomous ability without appropriate guidance built into the system is fraught with danger for humankind, from misuse through to misalignment.
What can we gain? Progress in AI has come from a combination of training data and method changes, from architectures through to training objectives, often with these advances working in tandem and leading to notable paradigm shifts. For example, the creation of ImageNet and the introduction of AlexNet [33, 34], the curation of web data and the scaling of transformers [35, 36, 37], the labeling of instruction-following data and the building of RLHF training [38, 39, 40], or the collection of verifiable reasoning tasks and the use of RLVR for training chain-of-thought [41, 42, 43, 25]. In each case it took human researchers significant effort, with many smaller intermediate results as well as wrong directions and dead ends, to find these wins. Any improvement in our ability to do research will speed up this process. Hence, co-research with strong AI systems built to collaborate with us should accelerate the discovery of the new paradigm shifts that are currently missing.
Overall, we expect co-improvement to provide: (i) faster progress toward important paradigm shifts; (ii) more transparency and steerability than direct self-improvement in making this progress; and (iii) more focus on human-centered, safe AI. For example, we may be able to develop systems that are superhuman at ML theory, enabling provably safe AI. In contrast, an entirely autonomous AI self-improvement system can suffer from goal misspecification (e.g., what it means to "solve AI" may not take human needs into account).
How do we do it? To build AI that can collaborate with us on research, we should put some of our focus on building AI that possesses these skills. That means measuring the research collaboration skills of AI with new benchmarks, and constructing training data and methods that improve on those benchmarks, much as we do when building other skills. These skills should cover all the major AI research activities that comprise the end-to-end research pipeline; we define some major ones in Table 1. They include collaborating with us to identify research problems, create training data and benchmarks, innovate on methods, design and execute experiments, and conduct evaluation and error analysis that is then fed back to refine the whole process.
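To make the idea of benchmarking research collaboration skills concrete, the following is a minimal, hypothetical sketch of how such a harness could be structured; it is not from this work. The stage names mirror the pipeline described above, while the task format, the keyword-based scorer, and all identifiers (CollaborationTask, score_response, evaluate) are illustrative assumptions; a real benchmark would rely on human raters or a trained judge model rather than keyword matching.

```python
# Hypothetical sketch of a research-collaboration benchmark harness.
# Stage names follow the pipeline described in the text; everything else
# (task format, rubric scoring, identifiers) is illustrative only.
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class Stage(Enum):
    PROBLEM_IDENTIFICATION = "identify research problems"
    DATA_AND_BENCHMARKS = "create training data and benchmarks"
    METHOD_INNOVATION = "innovate methods"
    EXPERIMENT_DESIGN = "design and execute experiments"
    EVALUATION_AND_ERROR_ANALYSIS = "evaluation and error analysis"

@dataclass
class CollaborationTask:
    stage: Stage
    prompt: str                  # context shared by the human researcher
    reference_rubric: list[str]  # criteria a rater would check against

def score_response(response: str, rubric: list[str]) -> float:
    """Toy scorer: fraction of rubric criteria the response mentions.
    A real benchmark would use human raters or a trained judge model."""
    hits = sum(1 for criterion in rubric if criterion.lower() in response.lower())
    return hits / len(rubric) if rubric else 0.0

def evaluate(agent: Callable[[CollaborationTask], str],
             tasks: list[CollaborationTask]) -> dict[Stage, float]:
    """Average score per pipeline stage, so weaknesses in specific
    research skills show up separately rather than being averaged away."""
    per_stage: dict[Stage, list[float]] = {}
    for task in tasks:
        per_stage.setdefault(task.stage, []).append(
            score_response(agent(task), task.reference_rubric))
    return {s: sum(v) / len(v) for s, v in per_stage.items()}

if __name__ == "__main__":
    tasks = [CollaborationTask(
        stage=Stage.EXPERIMENT_DESIGN,
        prompt="We suspect our reward model overfits; propose an ablation.",
        reference_rubric=["baseline", "held-out", "ablation"],
    )]
    # A trivial stand-in agent; a real study would plug in the AI system under test.
    print(evaluate(lambda t: "Run an ablation against a baseline on held-out data.",
                   tasks))
```

Reporting scores per pipeline stage, rather than as a single aggregate, is one way to keep the resulting training signal aligned with the full range of collaborative research activities rather than only the easiest-to-measure ones.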