Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling

Paper · arXiv 2410.17950 · Published October 23, 2024

2 Limitations of Traditional Function Calling

Traditional AI systems have treated function calling as a monolithic task: the model is given a task and the relevant function schemas, and must output a complete function call in a single step.
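
As a concrete sketch of this monolithic pattern (using the OpenAI Python SDK's chat-completions tools interface; the calendar schema and user request below are illustrative, not taken from the paper):

```python
# Monolithic function calling: the full schema and the task share one prompt,
# and the model must emit the complete call (name + JSON arguments) in one step.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

tools = [{
    "type": "function",
    "function": {
        "name": "create_calendar_event",  # illustrative schema
        "description": "Create a calendar event for the user.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start_time": {"type": "string", "description": "ISO-8601 datetime"},
                "duration_minutes": {"type": "integer"},
            },
            "required": ["title", "start_time"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Book a 30 minute sync with Priya tomorrow at 10am."}],
    tools=tools,
    tool_choice="auto",
)

# Assumes the model chose to call a tool; message.tool_calls is None otherwise.
call = response.choices[0].message.tool_calls[0]
print(call.function.name)                   # e.g. "create_calendar_event"
print(json.loads(call.function.arguments))  # arguments arrive as a JSON string
```

This one-shot approach suffers from several key disadvantages: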

• Inefficient Function Retrieval: Retrieving the appropriate functions often relies on vector similarity, a heuristic approach known to suffer from problems with accuracy, scalability, and domain specificity, as discussed in [4] (see the retrieval sketch after this list).

• Excessive Token Lengths: Function schemas can be lengthy, leading to large prompt sizes. This increases deployment cost and latency and can reduce accuracy on reasoning tasks; for example, [5] shows that the reasoning abilities of LLMs degrade sharply as the active context length grows (a rough token-count sketch also follows the list).

• High Output Sensitivity: LLMs are trained on free-flowing text and struggle with the rigid requirements of function calling, where exact variable names, JSON structures, and argument values are crucial (see the schema-validation sketch below).
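
To make the retrieval issue concrete, here is a minimal sketch of vector-similarity function retrieval; the toy bag-of-words embedding stands in for a real dense embedding model, and the function registry is invented for illustration:

```python
# Toy vector-similarity retrieval over function descriptions. Real systems embed the
# descriptions with a dense embedding model; a bag-of-words vector is used here only
# so the sketch runs on its own.
import numpy as np

FUNCTIONS = {  # hypothetical function registry
    "create_calendar_event": "Create a calendar event with a title and start time.",
    "send_email": "Send an email to a recipient with a subject and body.",
    "search_contacts": "Search the user's contacts by name or email address.",
}

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Bag-of-words embedding over a fixed vocabulary (placeholder for a real model)."""
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def retrieve(query: str, k: int = 1) -> list[str]:
    vocab = sorted({w for desc in FUNCTIONS.values() for w in desc.lower().split()})
    q = embed(query, vocab)
    scores = {}
    for name, desc in FUNCTIONS.items():
        d = embed(desc, vocab)
        denom = np.linalg.norm(q) * np.linalg.norm(d)
        scores[name] = float(q @ d / denom) if denom else 0.0  # cosine similarity
    return sorted(scores, key=scores.get, reverse=True)[:k]

# "schedule a meeting" shares no content words with the calendar description, so the
# ranking is decided by incidental overlap ("a", "with") -- the kind of brittleness
# behind the accuracy and domain-specificity concerns above.
print(retrieve("schedule a meeting with Priya tomorrow", k=2))
```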
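
The prompt-size overhead is straightforward to estimate. A rough sketch using a recent version of tiktoken (the two schemas and the model name are placeholders; production tool sets typically carry many more, and much larger, schemas, and the provider's exact serialization may differ slightly):

```python
# Rough estimate of how many tokens a set of function schemas adds to every request.
import json
import tiktoken

schemas = [  # placeholder schemas
    {"name": "create_calendar_event", "description": "Create a calendar event.",
     "parameters": {"type": "object", "properties": {"title": {"type": "string"},
                                                     "start_time": {"type": "string"}}}},
    {"name": "send_email", "description": "Send an email.",
     "parameters": {"type": "object", "properties": {"to": {"type": "string"},
                                                     "body": {"type": "string"}}}},
]

enc = tiktoken.encoding_for_model("gpt-4o")
overhead = sum(len(enc.encode(json.dumps(s))) for s in schemas)
print(f"~{overhead} schema tokens prepended to every single request")
```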
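
Finally, the output-sensitivity point: rigid formats are unforgiving, as a small validation sketch with the jsonschema library shows (the schema mirrors the calendar example above and the model output is invented):

```python
# A single unexpected key or mistyped value invalidates an otherwise sensible call.
from jsonschema import ValidationError, validate

parameters_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "start_time": {"type": "string"},
        "duration_minutes": {"type": "integer"},
    },
    "required": ["title", "start_time"],
    "additionalProperties": False,
}

# Plausible model output: the intent is right, but "duration" is not a declared key
# and the value was emitted as a string rather than an integer.
model_arguments = {"title": "Sync with Priya", "start_time": "2024-10-24T10:00:00",
                   "duration": "30"}

try:
    validate(instance=model_arguments, schema=parameters_schema)
except ValidationError as err:
    print(f"Rejected: {err.message}")
```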

These limitations mean that even the best closed-source LLMs (e.g., GPT-4o, Claude 3 Opus) have yet to solve the function calling problem effectively.