Desktop automation
that acts like a human

Orion captures the screen, asks an LLM what to do, and drives the real mouse and keyboard. Then it records the task as a semantic action graph and replays it deterministically — with drift detection when the UI shifts underneath it. Every action is gated by an auditable whitelist.


0 external dependencies · Go stdlib only · Windows · macOS · Linux · empty go.sum

An unscripted run: the task is typed in, Orion reads the screen, clicks the calculator and writes the answer to a file. Every action is whitelisted and audited as it happens.

Record & replay

Record once. Replay deterministically.
Notice when it drifts.

The core of Orion is a semantic action graph — a recording that matches what the screen means, not where its pixels were. It survives moved windows, restyled themes, and different resolutions.

Each step stores a semantic target (role, label, context) plus a perceptual hash of the screen — not a brittle (x, y).
Matching live state replays deterministically; known detours take a recorded branch.
The LLM is consulted only at genuine branch points, constrained to the recording's scope.
Drift detection flags when the UI has moved on, instead of silently clicking the wrong thing.
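A minimal sketch of how a replayer could check one recorded step against the live screen, assuming a 64-bit difference hash (dHash) compared by Hamming distance. The page does not say which perceptual hash Orion uses; `Target`, `Step`, `matchStep`, and the 10-bit drift threshold are illustrative, and downscaling the screenshot to a 9×8 grayscale thumbnail is left out.

```go
package main

import (
	"fmt"
	"math/bits"
	"strings"
)

// Target is a hypothetical semantic description of a UI element.
type Target struct {
	Role, Label, Context string
}

// Step pairs a semantic target with the perceptual hash recorded before it ran.
type Step struct {
	Target Target
	Hash   uint64
}

// dHash computes a 64-bit difference hash from a 9-wide, 8-tall grayscale
// thumbnail: each bit records whether a pixel is brighter than its right neighbour.
func dHash(gray [8][9]uint8) uint64 {
	var h uint64
	for y := 0; y < 8; y++ {
		for x := 0; x < 8; x++ {
			h <<= 1
			if gray[y][x] > gray[y][x+1] {
				h |= 1
			}
		}
	}
	return h
}

// hamming counts the bits that differ between two hashes.
func hamming(a, b uint64) int { return bits.OnesCount64(a ^ b) }

// matchStep matches by meaning (role plus case-insensitive label; Context is
// deliberately ignored so a moved window still matches) and reports drift
// when the screen hash differs by more than maxDist bits.
func matchStep(s Step, live Target, liveHash uint64, maxDist int) (matched, drift bool) {
	matched = s.Target.Role == live.Role && strings.EqualFold(s.Target.Label, live.Label)
	drift = hamming(s.Hash, liveHash) > maxDist
	return matched, drift
}

func main() {
	var thumb [8][9]uint8
	for y := range thumb {
		for x := range thumb[y] {
			thumb[y][x] = uint8(255 - 20*x) // left-to-right gradient
		}
	}
	rec := Step{Target: Target{"button", "Export", "top-right"}, Hash: dHash(thumb)}
	live := Target{"button", "export", "top-left"} // same meaning, new position
	m, d := matchStep(rec, live, dHash(thumb), 10)
	fmt.Println("matched:", m, "drift:", d)
}
```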


// One recorded step — matched by meaning, not pixels
{
  "action": { "type": "mouse_click", "reason": "Export report" },
  "target":  { "role": "button", "label": "Export", "context": "top-right" },
  "screen_state_before": { "perceptual_hash": "c3e1a0…" }
}

# Replay anywhere — parameters fill the blanks
$ orion replay f3a1 --param filename=june-2026.csv
Replay finished: 3 steps, drift=false, 0 warnings
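How `--param` values reach a recording is not spelled out here. A minimal sketch, assuming recorded text carries `{{name}}` placeholders; the placeholder syntax and the `parseParams` and `fillParams` helpers are hypothetical. Unknown names fail loudly so a replay never types a literal placeholder into the UI.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches {{name}}; the syntax is an assumption for illustration.
var placeholder = regexp.MustCompile(`\{\{(\w+)\}\}`)

// parseParams turns repeated "key=value" flags into a map. Only the first '='
// splits, so values may themselves contain '='.
func parseParams(flags []string) (map[string]string, error) {
	out := make(map[string]string, len(flags))
	for _, f := range flags {
		k, v, ok := strings.Cut(f, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("bad --param %q: want key=value", f)
		}
		out[k] = v
	}
	return out, nil
}

// fillParams substitutes every {{name}} in s and errors on any missing name.
func fillParams(s string, params map[string]string) (string, error) {
	var missing []string
	res := placeholder.ReplaceAllStringFunc(s, func(m string) string {
		name := placeholder.FindStringSubmatch(m)[1]
		v, ok := params[name]
		if !ok {
			missing = append(missing, name)
			return m
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("missing params: %s", strings.Join(missing, ", "))
	}
	return res, nil
}

func main() {
	p, err := parseParams([]string{"filename=june-2026.csv"})
	if err != nil {
		panic(err)
	}
	out, err := fillParams("Save as {{filename}}", p)
	fmt.Println(out, err)
}
```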

Capabilities

Everything an operator needs

Built from twelve focused subsystems — each using only the Go standard library and OS primitives.

Human-like input

Mouse moves along randomized Bézier curves with jitter and overshoot; keystrokes carry natural 40–220 ms delays.

Whitelist & audit

Every action clears a glob-rule whitelist and is written to an append-only NDJSON log before it ever executes.

Record & replay

Capture tasks as a semantic action graph and replay them with perceptual-hash state matching and drift detection.

Multi-provider LLM

Anthropic, OpenAI, and GitHub Copilot — all with vision, reasoning controls, and structured action output.
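Structured output means the model's reply is parsed into a fixed shape before anything runs. A minimal sketch, assuming a JSON object with the `type` and `reason` fields seen in the recorded step earlier plus a hypothetical `text` field; the accepted type list is an illustrative subset of names that appear on this page, not Orion's full action set.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Action follows the {"type", "reason"} shape of a recorded step; Text is a
// hypothetical parameter for typing actions.
type Action struct {
	Type   string `json:"type"`
	Reason string `json:"reason"`
	Text   string `json:"text,omitempty"`
}

// knownTypes is an illustrative subset of action names mentioned on this page.
var knownTypes = map[string]bool{
	"mouse_click": true, "type_string": true, "key_press": true,
	"shell_command": true, "write_file": true, "screenshot": true,
}

// parseAction rejects unknown fields, unknown action types, and typing
// actions with nothing to type, so free-form model output never reaches input.
func parseAction(raw string) (Action, error) {
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	var a Action
	if err := dec.Decode(&a); err != nil {
		return Action{}, fmt.Errorf("malformed action: %w", err)
	}
	if !knownTypes[a.Type] {
		return Action{}, fmt.Errorf("unknown action type %q", a.Type)
	}
	if a.Type == "type_string" && a.Text == "" {
		return Action{}, fmt.Errorf("type_string needs text")
	}
	return a, nil
}

func main() {
	a, err := parseAction(`{"type":"mouse_click","reason":"Export report"}`)
	fmt.Println(a.Type, a.Reason, err)
}
```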

MCP server

Expose the desktop as Model Context Protocol tools over stdio, so any MCP client can drive it — pure stdlib JSON-RPC.

Local web UI

A dark, precision-tooling control room embedded in the binary via go:embed. Vanilla JS, no CDN, no frameworks.

Zero dependencies

No go get, no npm, no Docker. GUI primitives are OS syscalls; go.sum stays empty and builds are reproducible.

Cross-platform

One codebase compiles to Windows (syscall), macOS (CoreGraphics), and Linux (X11/XTest, Wayland fallback).

World model

Progressive disclosure: every turn carries a screenshot plus a pruned accessibility tree, while clipboard, file, process, and hardware tools are called on demand and picked by cost.
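The page does not define "pruned". A minimal sketch of one plausible rule, assuming hypothetical `Node` fields and an illustrative set of interactive roles: keep visible interactive elements and the named containers above them, drop invisible subtrees and static leaves, and hoist children out of anonymous wrappers.

```go
package main

import "fmt"

// Node is a hypothetical accessibility-tree node.
type Node struct {
	Role, Name string
	Visible    bool
	Children   []*Node
}

// interactive is an illustrative set of roles worth showing the model.
var interactive = map[string]bool{
	"button": true, "link": true, "textbox": true,
	"checkbox": true, "menuitem": true, "tab": true,
}

// prune returns a new forest: invisible subtrees are dropped, interactive
// nodes are kept, named containers are kept only if something survives below
// them, and anonymous containers are replaced by their surviving children.
func prune(n *Node) []*Node {
	if n == nil || !n.Visible {
		return nil
	}
	var kids []*Node
	for _, c := range n.Children {
		kids = append(kids, prune(c)...)
	}
	switch {
	case interactive[n.Role]:
		return []*Node{{Role: n.Role, Name: n.Name, Visible: true, Children: kids}}
	case len(kids) == 0:
		return nil
	case n.Name == "":
		return kids
	default:
		return []*Node{{Role: n.Role, Name: n.Name, Visible: true, Children: kids}}
	}
}

// count returns the number of nodes in a forest.
func count(ns []*Node) int {
	total := 0
	for _, n := range ns {
		total += 1 + count(n.Children)
	}
	return total
}

func main() {
	tree := &Node{Role: "window", Name: "Report", Visible: true, Children: []*Node{
		{Role: "group", Visible: true, Children: []*Node{
			{Role: "button", Name: "Export", Visible: true},
			{Role: "text", Name: "Total", Visible: true},
		}},
	}}
	fmt.Println("nodes before:", count([]*Node{tree}), "after:", count(prune(tree)))
}
```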

The loop

Screenshot → decision → action

Each iteration mirrors how a person works at a computer — observe, decide, act, verify.

Capture

Take a screenshot of the live screen.

Reason

The LLM reads the screen and returns a structured action.

Clear

The whitelist approves, denies, or pauses for the user.

Act

Human-like mouse and keyboard input executes it.

Audit

Screenshot, decision, and result are logged.

Security

Designed to resist prompt injection

The agent reads untrusted pixels and can drive real input and shell commands — so containment is built into every layer, not bolted on.

Default-deny whitelist

Every action — replayed, agent-driven, or via MCP — clears an allow / deny / ask glob-rule engine before it executes.
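A minimal sketch of a default-deny allow / deny / ask engine built on the standard library's `path.Match` globs. The `Rule` shape and the precedence (deny beats ask beats allow) are assumptions for illustration, not Orion's documented semantics.

```go
package main

import (
	"fmt"
	"path"
)

type Verdict string

const (
	Allow Verdict = "allow"
	Deny  Verdict = "deny"
	Ask   Verdict = "ask"
)

// Rule is a hypothetical rule: glob patterns for the action type and its argument.
type Rule struct {
	Action, Arg string
	Verdict     Verdict
}

// Decide checks deny rules first, then ask, then allow. Anything unmatched,
// or any malformed pattern, is denied (fail closed).
func Decide(rules []Rule, action, arg string) Verdict {
	for _, v := range []Verdict{Deny, Ask, Allow} {
		for _, r := range rules {
			if r.Verdict != v {
				continue
			}
			okA, errA := path.Match(r.Action, action)
			okB, errB := path.Match(r.Arg, arg)
			if errA != nil || errB != nil {
				return Deny
			}
			if okA && okB {
				return v
			}
		}
	}
	return Deny
}

func main() {
	rules := []Rule{
		{"write_file", "/home/*/Desktop/*.pdf", Allow},
		{"shell_command", "*", Ask},
	}
	fmt.Println(Decide(rules, "write_file", "/home/ana/Desktop/invoice.pdf"))
	fmt.Println(Decide(rules, "key_press", "ctrl+c"))
}
```

One caveat of `path.Match`: `*` never crosses a `/`, which suits file paths but not shell arguments that contain paths. In this sketch such input simply falls through to deny.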

Injection-contained replay

During replay the model only sees the screen at genuine branch points, scoped to the recording — on-screen text can't hijack the run.

No shell interpolation

Shell actions run as explicit argv, never re-parsed by a shell — removing the classic command-injection surface.
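In Go this is the default behaviour of `os/exec`: `exec.Command` starts the program directly and passes each argument as-is, without invoking a shell. The sketch below uses a hypothetical `buildCommand` helper to show that a string containing `;` stays a single argument.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// buildCommand turns an explicit argv into an exec.Cmd. No shell is involved,
// so metacharacters such as ; | $( ) arrive at the program as literal bytes.
// An empty argv is rejected rather than guessing a program to run.
func buildCommand(argv []string) (*exec.Cmd, error) {
	if len(argv) == 0 || argv[0] == "" {
		return nil, errors.New("empty argv")
	}
	return exec.Command(argv[0], argv[1:]...), nil
}

func main() {
	// "a.txt; rm -rf ~" is ONE argument to ls, not a second command.
	cmd, err := buildCommand([]string{"ls", "-l", "a.txt; rm -rf ~"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", cmd.Args)
	// The pattern to avoid is exec.Command("sh", "-c", userString),
	// which hands the whole string back to a shell parser.
}
```

On Windows the argument list is still joined into one command line that the target program parses itself, so that platform needs its own quoting care.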

Loopback · audited · encrypted

The server binds 127.0.0.1; actions stream to an append-only redacted NDJSON audit; secrets are AES-256-GCM encrypted at rest.

Model Context Protocol

Drive your desktop from any MCP client

Exposes screenshot, mouse_*, type_string, key_press, shell_command and more as MCP tools.
Pure standard-library JSON-RPC 2.0 over stdio — no SDK, no dependencies.
Every tool call still passes the same whitelist and audit engine.
Works with Claude Desktop, IDEs, and custom agents.

// claude_desktop_config.json
{
  "mcpServers": {
    "orion": {
      "command": "orion",
      "args": ["mcp"]
    }
  }
}

Quickstart

Build and run in seconds

Requires Go 1.22+. No other toolchain, runtime, or package manager.

# Download a prebuilt binary from the releases page,
# or build from source (Go 1.22+, zero dependencies):
git clone https://github.com/DebajyotiSaikia/computer-use orion
cd orion
go build -trimpath -ldflags="-s -w" -o orion .

# Authenticate a provider
./orion auth anthropic

# Start the agent + local web UI
./orion start

# Or expose your desktop to any MCP client
./orion mcp

Driving it from the command line

Every option the web UI exposes is also a flag, so a script, a CI job, or another agent can drive Orion with no interface at all. Progress streams out as JSON and the exit code says what happened.

# Run a task headlessly and stop when it is done
orion run --prompt "Open the invoice in Preview and save it as PDF to ~/Desktop" \
          --model claude-sonnet-4-6 \
          --allow-action write_file \
          --timeout 10m

# Machine-readable: one JSON object per event, then a result
orion run --prompt-file task.txt --output-format stream-json
#  {"type":"action_executed","action":{"type":"mouse_click",...},"ts":"..."}
#  {"type":"result","status":"completed","actions":12,"exit_code":0,...}

# Read the prompt from stdin, and branch on the outcome
echo "Summarise what is on my screen" | orion run --prompt - --output-format json
#  exit 0 completed · 3 task_fail · 4 timed out · 5 cancelled

# Discover models, review what the agent asked to remember
orion models --format json
orion memory list

Full flag reference: docs/cli.md, or run orion help.
