Desktop Control

A local CLI that lets AI agents observe and control your computer through the screen, mouse, and keyboard. Bring your own AI: any model works, even one without vision.

Works with any app — no integrations required. Privacy-first, free and open source, written in Rust.

Demo: find a note with a shopping list and create reminders for the furniture store trip.

What is this

DesktopCtl works with any desktop app, no APIs required
Exposes the UI as structured tokens for agents
Deterministic CLI primitives for click, type, and wait
Local GPU-accelerated text recognition and vision
Bring your own AI: any agent that can run CLI commands

Command-line interface

$ desktopctl app open Notes --json
$ desktopctl keyboard press cmd+f --active-window
$ desktopctl keyboard type "Shopping list"
$ desktopctl screen tokenize --active-window
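
An agent would typically issue these commands from code rather than from an interactive shell. The following is a minimal sketch of that in Python, not part of DesktopCtl itself. It assumes `desktopctl` is on the `PATH` and exits with a non-zero status on failure; the names `DEMO_COMMANDS`, `run_cli`, and `run_sequence` are hypothetical. The output format is not described on this page, so the sketch returns stdout as raw text.

```python
import subprocess
from typing import Callable, List, Sequence

# The four invocations shown above, written as argument lists so that
# "Shopping list" is passed as a single argument without shell quoting.
DEMO_COMMANDS: List[List[str]] = [
    ["desktopctl", "app", "open", "Notes", "--json"],
    ["desktopctl", "keyboard", "press", "cmd+f", "--active-window"],
    ["desktopctl", "keyboard", "type", "Shopping list"],
    ["desktopctl", "screen", "tokenize", "--active-window"],
]


def run_cli(argv: Sequence[str]) -> str:
    """Run one command and return its stdout as text.

    Assumption: a failed command exits non-zero, in which case
    subprocess raises CalledProcessError (check=True).
    """
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def run_sequence(
    commands: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], str] = run_cli,
) -> List[str]:
    """Run commands in order and collect their outputs.

    An exception from the runner stops the sequence at the failing step.
    """
    return [runner(argv) for argv in commands]
```

Calling `run_sequence(DEMO_COMMANDS)` would replay the demo steps in order. The `runner` parameter exists so the sequence can be dry-run or logged without touching the desktop.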

How it works

Step 01

Fast local perception loop

GPU-accelerated text recognition and computer vision capture the UI and extract structured content.

Step 02

Slow decision loop

The agent reads structured content, runs the model, and decides what to do next.

Step 03

Deterministic execution via CLI

DesktopCtl exposes a CLI and interacts with the UI through mouse and keyboard input.
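
The three steps above can be read as one loop on the agent side: perceive, decide, execute. The sketch below is one possible shape for that loop, under stated assumptions, and is not DesktopCtl's own implementation. `agent_loop`, `decide`, and `run_cli` are hypothetical names; the only real command used for perception is `desktopctl screen tokenize --active-window` from the examples on this page, and the tokenized output is treated as opaque text because its format is not specified here.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

Runner = Callable[[Sequence[str]], str]
# Hypothetical policy: receives the tokenized screen text and returns the
# desktopctl arguments for the next action, or None when the task is done.
Decide = Callable[[str], Optional[List[str]]]


def run_cli(argv: Sequence[str]) -> str:
    """Run one command and return stdout (assumes non-zero exit on failure)."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def agent_loop(decide: Decide, runner: Runner = run_cli, max_steps: int = 10) -> int:
    """Alternate perception, decision, and execution.

    Returns the number of actions executed. Stops when decide() returns
    None or after max_steps actions, whichever comes first.
    """
    for step in range(max_steps):
        # Step 01: fast local perception, done by the CLI.
        screen = runner(["desktopctl", "screen", "tokenize", "--active-window"])
        # Step 02: slow decision, done by the model behind decide().
        action = decide(screen)
        if action is None:
            return step
        # Step 03: deterministic execution through the CLI.
        runner(["desktopctl", *action])
    return max_steps
```

A `decide` function might wrap a language model call and return something like `["keyboard", "type", "Shopping list"]`. The `max_steps` cap is a design choice in this sketch so that a confused model cannot drive the desktop indefinitely.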

Stay in control

Computer vision and text recognition run locally on your machine
Screenshots stay local by default — not shared with AI agents
Visual indicator when the agent accesses screen, mouse, or keyboard
OS-level permissions are granted to DesktopCtl only