Visual and GUI Agents
- AutoGLM: Autonomous Foundation Agents for GUIs. While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence.…
- Exploring Student-AI Interactions in Vibe Coding. Findings. For both groups, the majority of student interactions with Replit were to test or debug the prototype and only rarely did students visit code. Prompts by advanced software engineering studen…
- OmniParser for Pure Vision Based GUI Agent. The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. However, we argue that the power of multimodal models like GPT-4V as a g…
- ShowUI: One Vision-Language-Action Model for GUI Visual Agent. Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity. While most agents are language-based, relying on closed-source API with text-ric…