GUI stands for graphical user interface. It is the visual layer through which a user interacts with a machine. Users manipulate on-screen elements with a mouse, a stylus, or even a finger, and actions are usually performed by directly manipulating those graphical elements.
AgentCPM-GUI is an open-source, on-device LLM agent model jointly developed by THUNLP, Renmin University of China, and ModelBest. Built on MiniCPM-V with 8 billion parameters, it accepts smartphone screenshots as input and autonomously executes user-specified tasks. Key features include high-quality GUI grounding: pre-training on a large-scale bilingual Android dataset significantly boosts ...
DarwinKit (formerly MacDriver) lets you work with Apple frameworks and build native Mac applications using Go. dlgs is a cross-platform library for displaying dialogs and input boxes. gamen is a cross-platform GUI window creation and management library in Go. gform is an easy-to-use Windows GUI ...
The attention-based action head not only enables GUI-Actor to perform coordinate-free GUI grounding that more closely aligns with human behavior, but can also generate multiple candidate regions in a single forward pass, offering flexibility for downstream modules such as search strategies.
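The idea of grounding through attention rather than through generated coordinates can be illustrated with a toy sketch. This is not GUI-Actor's implementation: the function names, the 2x3 patch grid, and the two-dimensional embeddings below are all hypothetical, and the attention is plain scaled dot-product attention of a single action query over screenshot patch embeddings. The sketch only shows how one pass over the patches yields a ranked set of candidate regions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_patches(action_query, patch_embeddings):
    """Scaled dot-product attention of one query over all patch embeddings.

    Returns one weight per patch; the weights sum to 1.
    """
    d = len(action_query)
    scores = [
        sum(q * p for q, p in zip(action_query, patch)) / math.sqrt(d)
        for patch in patch_embeddings
    ]
    return softmax(scores)


def top_k_regions(weights, grid_width, k=3):
    """Return the k highest-weighted patches as (row, col, weight) tuples."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    return [(i // grid_width, i % grid_width, weights[i]) for i in ranked]


# Hypothetical toy input: a 2x3 grid of patches in row-major order.
patches = [
    [0.0, 0.1], [0.9, 0.8], [0.2, 0.0],
    [0.1, 0.3], [1.0, 1.0], [0.0, 0.0],
]
query = [1.0, 1.0]  # hypothetical embedding of the action query

weights = attend_over_patches(query, patches)
candidates = top_k_regions(weights, grid_width=3, k=2)

for row, col, weight in candidates:
    print(f"patch (row={row}, col={col}) weight={weight:.3f}")
```

Because the output is a distribution over patches rather than a single coordinate pair, a downstream module could re-rank or verify the returned candidates instead of committing to the first one.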