Desktop Control
A local CLI that lets AI agents observe and control your computer through the screen, mouse, and keyboard. Bring your own AI: any model works, even one without vision.
Works with any app — no integrations required. Privacy-first, free and open source, written in Rust.
Demo: find a note containing a shopping list, then create reminders for the furniture store trip.
What is this
- DesktopCtl works with any desktop app - no APIs required
- Exposes UI as structured tokens for agents
- Deterministic CLI primitives for click, type, and wait
- Local GPU-accelerated text recognition and vision
- Bring your own AI: any agent that can run CLI commands
CLI interface
$ desktopctl app open Notes --json
$ desktopctl keyboard press cmd+f --active-window
$ desktopctl keyboard type "Shopping list"
$ desktopctl screen tokenize --active-window
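The commands above can also be driven from a script. The following is a minimal Python sketch, not part of DesktopCtl itself: it builds the same four invocations and runs them in order. The helper names (`desktopctl_argv`, `search_notes_commands`, `run_all`) and the injectable `runner` parameter are hypothetical, introduced only for illustration. The sketch also assumes that `desktopctl` is on the PATH and that it exits with a non-zero status on failure, and it makes no assumption about the output format.

```python
import subprocess


def desktopctl_argv(*args):
    """Build the argument list for one desktopctl invocation (hypothetical helper)."""
    return ["desktopctl", *args]


def search_notes_commands(query):
    """Return the four commands from the example above, with the typed text as a parameter."""
    return [
        desktopctl_argv("app", "open", "Notes", "--json"),
        desktopctl_argv("keyboard", "press", "cmd+f", "--active-window"),
        desktopctl_argv("keyboard", "type", query),
        desktopctl_argv("screen", "tokenize", "--active-window"),
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in order and return the raw stdout of the last one.

    check=True assumes desktopctl signals failure through a non-zero exit
    status (an assumption, not documented behaviour). The output is returned
    as plain text because its format is not specified here.
    """
    last_output = ""
    for argv in commands:
        result = runner(argv, capture_output=True, text=True, check=True)
        last_output = result.stdout
    return last_output
```

Calling `run_all(search_notes_commands("Shopping list"))` would run the sequence and return whatever the final `screen tokenize` call prints. Passing arguments as a list avoids shell quoting issues with text such as "Shopping list".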
How it works
Step 01
Fast local perception loop
GPU-accelerated text recognition and computer vision capture the UI and extract structured content.
Step 02
Slow decision loop
The agent reads structured content, runs the model, and decides what to do next.
Step 03
Deterministic execution via CLI
DesktopCtl exposes a CLI and carries out each chosen action on the UI using the mouse and keyboard.
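The three steps above can be sketched as a simple agent loop. This is an illustrative Python outline under stated assumptions, not DesktopCtl's own implementation: `perceive`, `act`, `agent_loop`, the `decide` callback, and the `max_steps` limit are all hypothetical names. Only the `screen tokenize --active-window` command is taken from the CLI example. The `decide` callback stands in for whatever model the agent uses and returns either the arguments of the next `desktopctl` command or `None` when the task is done.

```python
import subprocess
from typing import Callable, List, Optional

Command = List[str]


def perceive(runner=subprocess.run) -> str:
    """Step 01: capture the active window as structured content (raw text, format not assumed)."""
    result = runner(
        ["desktopctl", "screen", "tokenize", "--active-window"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def act(command: Command, runner=subprocess.run) -> None:
    """Step 03: execute one chosen action through the CLI."""
    runner(["desktopctl", *command], capture_output=True, text=True, check=True)


def agent_loop(
    decide: Callable[[str], Optional[Command]],
    max_steps: int = 10,
    runner=subprocess.run,
) -> int:
    """Alternate perception, decision, and execution; return the number of actions taken."""
    steps = 0
    while steps < max_steps:
        content = perceive(runner)   # Step 01: fast local perception
        command = decide(content)    # Step 02: slow decision by the model
        if command is None:          # the agent considers the task finished
            break
        act(command, runner)         # Step 03: deterministic execution
        steps += 1
    return steps
```

A `decide` function might return `["keyboard", "press", "cmd+f", "--active-window"]` to open search, and `None` once the expected text appears. The `max_steps` cap is a design choice in this sketch so that a confused model cannot drive the desktop indefinitely.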
Stay in control
- Computer vision and text recognition run locally on your machine
- Screenshots stay local by default — not shared with AI agents
- Visual indicator when the agent accesses screen, mouse, or keyboard
- OS-level permissions are granted to DesktopCtl only