Exploring Student-AI Interactions in Vibe Coding

Paper · arXiv 2507.22614 · Published July 30, 2025

Findings. For both groups, the majority of student interactions with Replit were to test or debug the prototype, and students only rarely viewed the code. Prompts by advanced software engineering students were far more likely than those by introductory programming students to include relevant app-feature and codebase context.

2.1.3 Vibe Coding vs. Other AI Programming Workflows. While definitions of AI-assisted programming workflows are in flux, we contrast vibe coding against two other workflows in order to position it further. First, we distinguish vibe coding from first-generation GenAI programming workflows of 2022 and 2023, in which programmers would prompt for each function and the AI completed the code [32]. (Chat interfaces had not yet been integrated into the GenAI tools!) Vibe coding, as we have described, is much further abstracted from the code, allowing programmers to delegate significantly larger tasks to the AI. That said, it is not entirely hands-off [32]. Second, we distinguish vibe coding from “agentic coding” [31], in which the intention is to be hands-off and the human is not in the loop: “agentic coding enables autonomous software development through goal-driven agents capable of planning, executing, testing, and iterating tasks with minimal human intervention” [31]. At least, this is how agentic coding is defined for experienced developers. We wonder to what extent our comparatively less experienced students’ “vibe coding” looks like agentic coding.
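To make the contrast concrete, the sketch below juxtaposes the two workflows on a hypothetical expense-tracking task. The function name, prompt wording, and completed body are illustrative assumptions, not examples drawn from the study or from [32].

```python
# First-generation GenAI workflow (circa 2022-2023): the programmer prompts
# for one function at a time, typically via a comment or docstring, and the
# AI completes the body. The programmer stays close to the code.
#
# Programmer-written prompt (hypothetical):
#   "Write a function that sums expenses for a given budget category."
def total_for_category(expenses, category):
    """Return the sum of expense amounts belonging to `category`."""
    # AI-completed body (illustrative):
    return sum(e["amount"] for e in expenses if e["category"] == category)

# Example usage:
#   total_for_category([{"category": "food", "amount": 12.5}], "food")  -> 12.5

# Vibe-coding workflow: the programmer delegates a much larger task in natural
# language and evaluates the result through the running prototype rather than
# the code. A single prompt might read (illustrative wording):
#
#   "Build a budgeting web app where I can create budget categories, add
#    expenses to them, and see a spending graph per category."
```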

Definitions of the interaction labels used in our codebook:

Interacting with Prototype
- Reloading the tab displaying the prototype
- Interacting with the prototype using typical inputs or actions that reflect normal usage; repeated clicks or filling out a form count as a single case
- Interacting with the prototype using unusual, invalid, or boundary inputs, such as negative numbers and empty fields

Writing a Prompt
- Prompting to address an error, bug, or system failure
- Prompting to request any changes to a core feature (e.g., adding, checking, deleting, or editing budgets, budget categories, or expenses) that does not have an error or bug
- Prompting to request any changes to a non-core feature (e.g., UI elements, spending graphs) that does not have an error or bug
- Prompting to ask the AI to explain, define, or elaborate on a concept, term, feature, or technical details
- Prompting for open-ended ideas or approaches without a clearly defined solution
- Prompting actions not covered by other labels, such as simulating a test case or responding to the AI’s questions

Managing Replit Workflow
- Accepting a code modification proposed by the Replit Assistant using the built-in apply-suggestion feature
- Interrupting the AI’s response before completion by clicking a pause or stop button in the Replit interface
- Resuming a paused AI response by clicking a continue or resume button in the Replit interface
- Reverting the code to a previous version by selecting a checkpoint created during an earlier AI response
- Loading a preview of code from a previous checkpoint without reverting to it
- Approving zero or more optional features from the Replit Agent’s initial project plan

Engaging with Code/log
- Spending significant time analyzing code, console logs, or other development-related outputs
- Modifying code, console logs, or other development-related outputs

4.1 RQ1: Student-AI Interactions in Vibe Coding

Overview. The overall interactions with AI tools across the 19 participants are shown in Figure 1. Across all students, the most prevalent label was Interacting with Prototype (63.61% of all labels, n = 1164), followed by Writing a Prompt (20.60%, n = 377), Managing Replit Workflow (8.42%, n = 154), and Engaging with Code/log (7.38%, n = 135). However, differences emerged across cohorts: students in CS1 showed markedly higher proportions of Writing a Prompt labels compared to SWE students, whereas Engaging with Code/log interactions were more common among SWE students.

Restarters. While prompting was a common behavior across participants, a subset of students (4 out of 19) exhibited a more dramatic interaction pattern: they restarted the entire project using the Replit Agent mid-task. We term these students restarters (S1, S2, S14, S17). Three of the four restarters restarted primarily to simplify their interaction with the AI, citing overwhelming or ambiguous behavior from earlier prompts. For instance, one student reflected that they “asked the Replit to do way too many things,” making it difficult to identify specific issues. Another opted to “break it down one task at a time” after experiencing repeated failures. These behaviors suggest that some students use restart strategies not out of failure alone, but as a form of iterative refinement and task decomposition. The fourth restarter (S2) retained their original prompt but added a sentence explicitly requesting the Flask framework over React due to familiarity and perceived simplicity; they considered their restart a means of architectural realignment rather than functional simplification. In all cases, the restart decision reflected metacognitive awareness about the limitations of debugging through prompting: students had attempted to fix issues in the original prototype but found the bugs unresolvable via continued AI interaction, leading them to start fresh instead.

Table 3: Frequency of labels for Interacting with Prototype actions.

Label               Count   Percent
Test Common Case     1067   91.67%
Refresh Prototype      71    6.10%
Test Edge Case         26    2.23%
Total                1164   100%

Interacting with Prototype. First, we examine how students interact with the prototypes generated by the AI tools (Table 3). The overwhelming majority of these interactions (91.67%) involved testing common use cases, while only 6.10% involved refreshing the prototype and a mere 2.23% involved edge case testing. Transcript data suggests that prototype refreshes were typically prompted by technical limitations (e.g., Replit failing to maintain state across pages) rather than as part of a deliberate debugging strategy. Notably, no student wrote or executed unit tests during the study, indicating that their approach to testing was exclusively centered on feature-level behaviors visible through the UI. The absence of structured test practices, combined with the minimal edge case coverage, suggests that many students were unable to progress beyond basic functionality, frequently encountering bugs that prevented more in-depth testing. These patterns highlight the inherently iterative and sometimes unstable nature of vibe coding, where students often remain occupied with basic interactions and repeated troubleshooting, rather than advancing toward comprehensive feature validation.
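For illustration, the structured testing that no participant attempted might look like the minimal sketch below. The function total_for_category and the chosen edge cases (an empty expense list, a negative amount) are hypothetical stand-ins for the budgeting app’s core logic, not code from the study.

```python
import unittest

# Hypothetical core function of the budgeting prototype (assumed, not from the study).
def total_for_category(expenses, category):
    """Return the sum of expense amounts belonging to `category`."""
    return sum(e["amount"] for e in expenses if e["category"] == category)

class TestBudgetTotals(unittest.TestCase):
    def test_common_case(self):
        # Typical usage, analogous to the "Test Common Case" label.
        expenses = [{"category": "food", "amount": 12.50},
                    {"category": "rent", "amount": 800.00}]
        self.assertEqual(total_for_category(expenses, "food"), 12.50)

    def test_edge_cases(self):
        # Boundary inputs, analogous to the "Test Edge Case" label:
        # an empty expense list and a negative amount.
        self.assertEqual(total_for_category([], "food"), 0)
        expenses = [{"category": "food", "amount": -5.00}]
        self.assertEqual(total_for_category(expenses, "food"), -5.00)

if __name__ == "__main__":
    unittest.main()
```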

Writing a Prompt. To better understand students’ goals when prompting, we analyzed the labels of all prompting behaviors for all 19 students (Table 4). The majority of prompts (61.01%) were used for debugging AI-generated code, followed by modifications to non-core features (16.71%) and core features (14.06%). Prompts related to brainstorming, clarification questions, or miscellaneous tasks were relatively rare (<5% each). These data indicate that students primarily engaged with AI tools to troubleshoot and refine partial implementations, rather than to build functionality from scratch. One student described their strategy as “...finding [bugs] myself, realizing what the problem was, and then putting it back into the Agent to solve it in like 2 seconds...”, reflecting how students often used AI to efficiently resolve implementation issues they had already identified.

In terms of which AI tools students used for prompting (Table 5), the Replit Assistant accounted for a slight majority of prompting interactions (50.94%), closely followed by the Replit Agent (46.24%). ChatGPT was used in only 2.82% of prompt instances, likely reflecting its auxiliary role in the workflow. These numbers indicate that while multiple AI tools were available, most prompting occurred within the embedded Replit interfaces, suggesting that the accessibility and immediacy of tooling may play a significant role in shaping student behavior. As one student reflected, “I didn’t really understand the real difference between Agent and Assistant... it kind of felt like they were the same thing,” emphasizing how conceptual ambiguity may have contributed to the balanced usage. Another student explained, “I’ll kind of use ChatGPT and Copilot for assistance on making small fixes,” illustrating ChatGPT’s more occasional, supporting role relative to the Replit-native tools.

Engaging with Code/log. Among the rare cases where students engaged with the code and logs generated by Replit, an overwhelming 90.37% of interactions involved reading and interpreting code (n = 122), while the remaining actions were direct edits (9.63%, n = 13). This strong preference for interpretation over modification suggests that students were often hesitant to alter AI-generated code, likely due to limited familiarity with the underlying implementation. The vibe coding workflow may exacerbate this hesitation by distancing students from the logic and structure of AI-generated codebases. As one student explained, “Because so much of it was just done by the LLM, I had a lesser understanding of the codebase — rather than what I would do on my own, where I know what each line does.” We further expand on and compare Engaging with Code/log behaviors in Section 4.2.