How AI Impacts Skill Formation

Paper · arXiv 2601.20245 · Published January 28, 2026

AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of the skills required to effectively supervise AI remains unclear. Novice workers who rely heavily on AI to complete unfamiliar tasks may compromise their own skill acquisition in the process. We conduct randomized experiments to study how developers gain mastery of a new asynchronous programming library with and without the assistance of AI. We find that AI use impairs conceptual understanding, code reading, and debugging abilities, without delivering significant efficiency gains on average. Participants who fully delegated coding tasks showed some productivity improvements, but at the cost of learning the library. We identify six distinct AI interaction patterns, three of which involve cognitive engagement and preserve learning outcomes even when participants receive AI assistance. Our findings suggest that AI-enhanced productivity is not a shortcut to competence, and that AI assistance should be adopted carefully into workflows to preserve skill formation, particularly in safety-critical domains.

Productivity Gains

Many studies have found productivity improvements from these AI assistants. For example, Brynjolfsson et al. found that AI-based conversational assistants increased the number of issues call center workers resolved by 15% on average. Dell’Acqua et al. report similar results: consultants completed 12.2% more tasks on average with the help of AI than without it. While the skill-based effects differ across studies, a consistent pattern emerges in call center work, consulting, legal question-answering, and writing: less experienced and lower-skilled workers tend to benefit most [Brynjolfsson et al., 2025, Dell’Acqua et al., 2023, Choi and Schwarcz, 2023, Noy and Zhang, 2023]. One exception: when GPT-4 was given to Kenyan small business owners, AI business advice improved business outcomes for high performers (by revenue) while worsening them for lower performers [Otis et al., 2024].

For software engineering in particular, Peng et al. found that crowd-sourced software developers using GitHub Copilot completed a task 55.8% faster than the control group, and that novice programmers benefited more from AI coding assistance. A follow-up study of developers at major software companies found that AI-generated code completions provide a 26.8% boost in productivity as measured by pull requests, commits, and software product builds [Cui et al., 2024]. This study also found that less experienced coders saw greater productivity boosts. While studies find that junior or less experienced developers gain the most productivity from AI, these very same workers should be rapidly developing new skills in the workplace, and the effect of these tools on their skill formation remains unknown. Will the skill development of novice workers suffer precisely because they are still learning their trade? We are motivated by whether this productivity comes for free or at a cost.

Cognitive Offloading

Recent work has raised concerns that AI assistance may deplete skills. For example, medical professionals trained with AI assistance might not develop the keen visual skills needed to identify certain conditions [Macnamara et al., 2024]. In surveys of knowledge workers, frequent use of AI has been associated with worse critical thinking abilities and increased cognitive offloading [Gerlich, 2025]. Furthermore, knowledge workers reported lower cognitive effort and confidence when using generative AI tools [Lee et al., 2025]. However, these surveys are observational and may not capture the causal effects of AI usage.

Skill Retention

An adjacent line of inquiry asks how well humans retain knowledge and skills after AI assistance. Wu et al. find that even when generative AI improved immediate performance on content creation tasks (e.g., writing a Facebook post, writing a performance review, drafting a welcome email), the performance increase did not persist in subsequent tasks humans performed independently. For data science tasks, Wiles et al. described the impact of AI on non-technical consultants as an “exoskeleton”: the enhanced technical abilities enabled by AI did not persist once workers no longer had access to it. Our work asks the natural follow-up question: can AI tool usage worsen learning outcomes for technical professionals acquiring skills on the job?

Overreliance

Although much of the economics literature on AI-enhanced productivity implicitly assumes that AI generations are trustworthy, generative AI can in reality produce incorrect [Longwell et al., 2024] or hallucinated content [Maleki et al., 2024]. When fallible models are nonetheless deployed to assist humans, human decisions that follow erroneous model outputs are referred to as “overreliance” [Buçinca et al., 2021, Vasconcelos et al., 2023, Klingbeil et al., 2024]. Although methods have been proposed to reduce overreliance, these focus mainly on decision-time information such as explanations [Vasconcelos et al., 2023, Reingold et al., 2024] or debate [Kenton et al., 2024].

Analyzing these common patterns among participants supplements our quantitative observations of skill formation and task completion in this new library. Specifically, the following axes show differences between participants and across conditions:

• AI Interaction Time: The lack of a significant speed-up in the AI condition can be explained in part by how some participants used AI. Several participants spent substantial time interacting with the AI assistant, up to 11 minutes in total composing queries (Figure 12).

• Query Types: Participants varied in their query mix: conceptual questions only, code generation only, or a mixture of conceptual, debugging, and code generation queries. Participants who focused on asking the AI assistant debugging questions or on confirming their answers spent more time on the task (Figure 18).

• Encountering Errors: Participants in the control group (no AI) encountered more errors, both syntax errors and Trio-specific errors (Figure 14; a minimal Trio sketch follows this list for context). Encountering and independently resolving more errors likely improved the formation of Trio skills.

• Active Time: Using AI decreased the amount of active coding time. Time spent coding shifted to time spent interacting with AI and understanding AI generations (Figure 16).
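
For context, the new library studied is Trio, a Python library for structured asynchronous concurrency. As a minimal illustrative sketch (not one of the study's tasks), the snippet below shows the core idioms, nurseries and checkpoints, whose misuse produces the library-specific "Trio errors" counted above, as opposed to plain Python syntax errors:

```python
import trio

async def fetch(label: str, delay: float) -> None:
    # trio.sleep is a checkpoint: it yields control to Trio's scheduler.
    await trio.sleep(delay)
    print(f"{label} finished after {delay}s")

async def main() -> None:
    # Structured concurrency: the nursery block does not exit until every
    # task started inside it has finished (or been cancelled).
    async with trio.open_nursery() as nursery:
        nursery.start_soon(fetch, "a", 0.1)
        nursery.start_soon(fetch, "b", 0.2)

trio.run(main)
```

Misusing these idioms, for example passing an already-called coroutine to `nursery.start_soon`, raises a runtime `TypeError` from Trio rather than a syntax error; this is the kind of library-specific error the control group had to resolve independently.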

Using these axes, we develop a typology of six AI interaction patterns based on query types, number of queries, queries per task, and active time. These six patterns yield different outcomes for both completion time and skill formation (i.e., quiz score); Figure 11 summarizes each pattern and its average task outcomes. The patterns divide into two categories, low- and high-scoring; the high-scoring patterns generally involve more cognitive effort and less reliance on AI. (A toy sketch of how such a typology might be operationalized follows the pattern descriptions below.)

Although each behavior pattern cluster is small, the difference between low-scoring and high-scoring clusters is stark.

Low-Scoring Interaction Patterns

Low-scoring patterns generally involved heavy reliance on AI, either for code generation or for debugging. The average quiz scores in these groups are below 40%. Participants exhibiting these interaction patterns showed less independent thinking and more cognitive offloading [Lee et al., 2025].

• AI Delegation (n=4): Participants in this group wholly relied on AI to write code and complete the task. This group completed the task the fastest and encountered few or no errors in the process.

• Progressive AI Reliance (n=4): Participants in this group started by asking one or two questions and eventually delegated all code writing to the AI assistant. This group scored poorly on the quiz, largely because they did not master any of the concepts from the second task.

• Iterative AI Debugging (n=4): Participants in this group relied on AI to debug or verify their code. This group made more queries to the AI assistant, but used the assistant to solve problems rather than to clarify their own understanding. As a result, they scored poorly on the quiz and were relatively slow at completing the two tasks.

High-Scoring Interaction Patterns

High-scoring interaction patterns were clusters of behaviors with an average quiz score of 65% or higher. Participants in these clusters used AI for code generation, conceptual queries, or a combination of the two.

• Generation-Then-Comprehension (n=2): Participants in this group first generated code and copied or pasted it into their work. They then asked the AI assistant follow-up questions to improve their understanding. These participants were not particularly fast when using AI, but demonstrated a high level of understanding on the quiz. Importantly, this approach looks nearly identical to AI Delegation, except that these participants additionally used AI to check their own understanding.

• Hybrid Code-Explanation (n=3): Participants in this group composed hybrid queries, asking for code generation along with explanations of the generated code. Reading and understanding the explanations they requested took additional time.

• Conceptual Inquiry (n=7): Participants in this group asked only conceptual questions and relied on their improved understanding to complete the task. Although this group encountered many errors, they resolved them independently. On average, this pattern was the fastest among the high-scoring patterns and second fastest overall, after AI Delegation.
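
As a toy illustration only (the feature schema and thresholds below are invented, not the paper's criteria), the typology above could be operationalized roughly as a rule over per-participant features along the four axes:

```python
from dataclasses import dataclass

@dataclass
class InteractionFeatures:
    """Hypothetical per-participant features along the axes above."""
    n_queries: int
    frac_conceptual: float   # share of queries asking conceptual questions
    frac_codegen: float      # share of queries asking for code generation
    frac_debugging: float    # share of queries asking to debug/verify code
    active_minutes: float    # time spent actively writing code

def label_pattern(f: InteractionFeatures) -> str:
    """Assign one of the six interaction patterns. All thresholds are
    invented for illustration; the real typology also used query ordering
    (e.g., generate first, then ask follow-ups), which this flat feature
    schema cannot capture."""
    if f.frac_conceptual == 1.0:
        return "Conceptual Inquiry"              # high-scoring
    if f.frac_debugging >= 0.5:
        return "Iterative AI Debugging"          # low-scoring
    if f.frac_codegen == 1.0:
        # Pure delegation with little independent coding time.
        return "AI Delegation" if f.active_minutes < 5 else "Progressive AI Reliance"
    if f.frac_codegen > 0 and f.frac_conceptual > 0:
        # Mixed use: explanations requested alongside or after generation.
        return "Hybrid Code-Explanation" if f.n_queries <= 5 else "Generation-Then-Comprehension"
    return "Progressive AI Reliance"             # low-scoring fallback
```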

Our main finding is that using AI to complete tasks that require a new skill (i.e., knowledge of a new Python library) reduces skill formation. In a randomized controlled trial, participants were assigned to the treatment condition (an AI assistant, web search, and instructions) or the control condition (web search and instructions alone). The erosion of conceptual understanding, code reading, and debugging skills that we measured among participants using AI assistance suggests that workers acquiring new skills should be mindful of their reliance on AI during the learning process. Among participants who used AI, we find a stark divide in skill formation outcomes between high-scoring interaction patterns (65%-86% quiz scores) and low-scoring interaction patterns (24%-39% quiz scores). The high scorers either asked the AI only conceptual questions rather than requesting code generation, or asked for explanations to accompany generated code; these usage patterns demonstrate a high level of cognitive engagement.

Contrary to our initial hypothesis, we did not observe a significant boost in task completion in our main study. While using AI improved the average completion time, the improvement was not statistically significant, despite the AI assistant being able to generate the complete code solution when prompted. Our qualitative analysis reveals that this is largely due to heterogeneity in how participants chose to use AI during the task. One group of participants relied on AI to generate all the code and never asked conceptual questions or for explanations; this group finished much faster than the control group (19.5 minutes vs. 23 minutes), but accounted for only around 20% of the participants in the treatment group. Other participants in the AI group asked a large number of queries (e.g., 15), spent a long time composing them (e.g., 10 minutes), or asked for follow-up explanations, raising the average task completion time. These contrasting usage patterns suggest that accomplishing a task requiring new knowledge or skills does not necessarily yield the same productivity gains as tasks that require only existing knowledge.

Together, our results suggest that the aggressive incorporation of AI into the workplace can harm the professional development of workers who do not remain cognitively engaged. Given time constraints and organizational pressures, junior developers and other professionals may rely on AI to complete tasks as fast as possible at the cost of real skill development. Furthermore, we found that the biggest difference in test scores was on the debugging questions. This suggests that as companies transition to more AI-written code with human supervision, humans may not possess the skills needed to validate and debug AI-written code if their own skill formation was inhibited by using AI in the first place.
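
To make the debugging concern concrete, here is a hypothetical quiz-style bug (not taken from the study's materials): the code below is syntactically valid and looks plausible as AI output, yet spotting why it fails requires exactly the library understanding that the low-scoring interaction patterns failed to build.

```python
import trio

async def download(url: str) -> None:
    await trio.sleep(1)  # stand-in for real network I/O
    print(f"downloaded {url}")

async def main() -> None:
    async with trio.open_nursery() as nursery:
        # BUG: start_soon expects the async function and its arguments
        # separately. Calling download(...) here creates a coroutine
        # object, and Trio rejects it with a TypeError at runtime.
        nursery.start_soon(download("https://example.com"))
        # Fix: nursery.start_soon(download, "https://example.com")

trio.run(main)
```

A developer who delegated all code writing to AI may never have encountered this error class, whereas control-group participants, who hit and resolved more Trio errors on their own, would recognize it immediately.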