Desktop Control

A local CLI that lets AI agents observe and control your computer through the screen, mouse, and keyboard. Bring your own AI: any model works, even one without vision.

Works with any app — no integrations required. Privacy-first, free and open source, written in Rust.


Demo: find a note containing a shopping list, then create reminders for the furniture store trip.

What is this

  • DesktopCtl works with any desktop app, no APIs required
  • Exposes the UI as structured tokens for agents
  • Deterministic CLI primitives for click, type, and wait
  • Local GPU-accelerated text recognition and vision
  • Bring your own AI: any agent that can use a CLI

CLI

$ desktopctl app open Notes --json
$ desktopctl keyboard press cmd+f --active-window
$ desktopctl keyboard type "Shopping list"
$ desktopctl screen tokenize --active-window

How it works

Step 01

Fast local perception loop

GPU-accelerated text recognition and computer vision capture the UI and extract structured content.

Step 02

Slow decision loop

The agent reads structured content, runs the model, and decides what to do next.

Step 03

Deterministic execution via CLI

DesktopCtl exposes a CLI and interacts with the UI through the mouse and keyboard.
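
The three steps can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not DesktopCtl's own agent: `agent_loop` and `decide_next` are hypothetical names. `decide_next` stands in for whatever model or agent you bring. It is assumed to read the tokenized UI on stdin and print the next `desktopctl` arguments, one per line, or the single word `done` when the task is finished.

```shell
#!/bin/sh
# Hypothetical driver loop (not part of DesktopCtl).
# Usage: agent_loop [max_steps]
# Returns 0 when decide_next prints "done", 1 on any failure,
# and 2 if the step budget runs out first.
agent_loop() {
    max_steps=${1:-10}
    step=0
    while [ "$step" -lt "$max_steps" ]; do
        # Step 01: perception. Capture the active window as structured tokens.
        tokens=$(desktopctl screen tokenize --active-window) || return 1

        # Step 02: decision. decide_next is the caller-supplied hook.
        action=$(printf '%s\n' "$tokens" | decide_next) || return 1
        [ -n "$action" ] || return 1
        if [ "$action" = "done" ]; then
            return 0
        fi

        # Step 03: execution. Rebuild the argument list one line at a time
        # so that arguments containing spaces stay intact.
        set --
        while IFS= read -r arg; do
            set -- "$@" "$arg"
        done <<EOF
$action
EOF
        desktopctl "$@" || return 1
        step=$((step + 1))
    done
    return 2
}
```

Passing one argument per line keeps text such as `Shopping list` as a single argument without any shell re-parsing of the agent's output. The step budget is a simple guard against an agent that never reports completion.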

Stay in control

  • Computer vision and text recognition run locally on your machine
  • Screenshots stay local by default — not shared with AI agents
  • Visual indicator when the agent accesses screen, mouse, or keyboard
  • OS-level permissions are granted to DesktopCtl only