Give AI
hands.

One protocol to see, decide, and act on any interface. macOS apps, web pages, iOS simulators — all through snapshot → think → act.

macOS
Web
iOS
Android
agent-control — observe → act → observe
# Install
$ npm install -g agent-control
$ agent-control doctor
✅ Node.js >= 18   ✅ Playwright   ✅ Chromium   All checks passed.

# See the screen
$ agent-control -p web snapshot
12 interactive elements
@e8 text "Name"   @e10 email "Email"   @e18 submit "Create Account"

# Act
$ agent-control -p web fill @e8 "Alice"
✓ { ok: true }

The Loop

No pre-scripted steps. The AI observes the current state, decides what to do, acts, then observes again. Like a human would.

👁
Observe
Screenshot + element tree with @ref identifiers
→
🧠
Decide
LLM sees the UI, picks the next action
→
🤚
Act
Click, type, scroll — through the unified protocol
→
🔄
Repeat
Until the goal is reached
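The loop above can be sketched in a few lines of Python. This is purely illustrative: `run_agent`, `snapshot`, `decide`, and `act` are hypothetical stand-ins for the real observe/decide/act machinery, not part of the agent-control API.

```python
# Illustrative sketch of the observe -> decide -> act loop.
# snapshot(), decide(), and act() are hypothetical stand-ins,
# not the agent-control API.

def run_agent(goal, snapshot, decide, act, max_steps=20):
    """Loop until decide() signals the goal is reached."""
    for _ in range(max_steps):
        state = snapshot()             # observe: elements with @ref ids
        action = decide(goal, state)   # the LLM picks the next action
        if action is None:             # goal reached
            return True
        act(action)                    # click, fill, scroll, ...
    return False                       # gave up after max_steps

# Toy drivers: a one-field form that must be filled.
form = {"@e8": ""}
def snapshot(): return dict(form)
def decide(goal, state):
    return ("fill", "@e8", "Alice") if state["@e8"] == "" else None
def act(a): form[a[1]] = a[2]
```

No step is pre-scripted: `decide` looks at the freshest snapshot each iteration, so the loop adapts if the UI changes underneath it.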

Two Ways to Use

Let the AI figure it out, or define every step. Pick the right tool for the job.

Auto Mode

Give a goal, AI decides how

Natural-language objective → the LLM loops snapshot → think → act until done. Best for exploratory tasks, testing new apps, and one-off automation.

$ agent-control auto -p web \
  --goal "Sign up with name Alice and email alice@test.com" \
  --url https://example.com/signup

# AI observes the form, fills fields, clicks submit
# No scripting needed
Flow DSL

Define steps, run deterministically

JSON-declared action sequences with verify/retry. Best for regression tests, CI pipelines, repeatable workflows.

// signup-flow.json
{ "platform": "web",
  "steps": [
    { "action": "fill", "find": ["Name"], "value": "Alice" },
    { "action": "click", "find": ["Create Account"] },
    { "action": "verify", "contains": "Welcome" }
  ] }
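A minimal runner for such a flow might look like the sketch below. The real Flow DSL engine and its retry semantics may differ; `FakeDriver` and the `driver` interface here are invented for illustration.

```python
import json

def run_flow(flow_json, driver, retries=2):
    """Execute Flow DSL steps in order; retry a failed step before giving up."""
    flow = json.loads(flow_json)
    for step in flow["steps"]:
        for attempt in range(retries + 1):
            try:
                if step["action"] == "fill":
                    driver.fill(step["find"][0], step["value"])
                elif step["action"] == "click":
                    driver.click(step["find"][0])
                elif step["action"] == "verify":
                    assert step["contains"] in driver.page_text()
                break                      # step succeeded
            except Exception:
                if attempt == retries:
                    return False           # step failed after all retries
    return True

class FakeDriver:
    """Stand-in driver so the sketch is self-contained."""
    def __init__(self): self.text = ""
    def fill(self, label, value): self.text += f"{label}={value} "
    def click(self, label): self.text += "Welcome"   # pretend submit worked
    def page_text(self): return self.text
```

The key property is determinism: the same JSON always produces the same action sequence, which is what makes the DSL suitable for regression tests and CI.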

Four Drivers, One Protocol

macOS

Accessibility API

Native Swift CLI that reads the AX tree and acts via AXPress, CGEvent, or coordinate fallback. Drives native macOS apps, Electron apps, and menu bar apps alike. Use --app to target by name.

Swift · ApplicationServices · CGEvent
Web

Playwright

Headless Chromium with chain commands. Open a URL, snapshot the DOM, fill forms, click buttons — all in one pipeline.

Node.js · Playwright · Chromium
iOS

idb

Uses Facebook's idb to describe UI elements and tap by coordinates. Auto-detects booted Simulator. Future: real device support via USB.

idb · xcrun simctl · Simulator
Android Experimental

adb + uiautomator

Dumps the UI hierarchy via uiautomator, taps by coordinates, swipes, launches apps. Works with emulators and physical devices over adb.

adb · uiautomator · Android SDK

Unified Actions

Action        Syntax                           macOS   Web   iOS
observe       observe [--ss] [--tree]            ✓      ✓     ✓
click         click @ref                         ✓      ✓     ✓
longpress     longpress @ref [--duration=ms]     ✓      ✓     ✓
dblclick      dblclick @ref                      ✓      ✓     —
rightclick    rightclick @ref                    ✓      ✓     —
fill          fill @ref "text"                   ✓      ✓     ✓
press         press <key>                        ✓      ✓     —
scroll        scroll <up|down>                   ✓      ✓     ✓
drag          drag @ref1 @ref2                   ✓      ✓     —
screenshot    screenshot [path]                  ✓      ✓     ✓
open          open <url>                         —      ✓     —
wait          wait --idle | @ref | --url         —      ✓     —
find          find <text>                        —      ✓     —
eval          eval <js>                          —      ✓     —
back/forward  back | forward                     —      ✓     —
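One way to picture "four drivers, one protocol" is a single action vocabulary dispatched to per-platform backends. The sketch below is a conceptual illustration only — class names, return shapes, and the registry are invented here, not agent-control's actual implementation.

```python
# Conceptual sketch: one protocol, many drivers (names are illustrative).

class Driver:
    """Common interface every platform backend implements."""
    def click(self, ref): raise NotImplementedError
    def fill(self, ref, text): raise NotImplementedError

class WebDriver(Driver):       # would wrap Playwright
    def click(self, ref): return {"ok": True, "via": "playwright"}
    def fill(self, ref, text): return {"ok": True, "via": "playwright"}

class MacDriver(Driver):       # would wrap the Swift AX CLI
    def click(self, ref): return {"ok": True, "via": "ax"}
    def fill(self, ref, text): return {"ok": True, "via": "ax"}

DRIVERS = {"web": WebDriver, "macos": MacDriver}

def dispatch(platform, action, *args):
    """Route a unified action (click, fill, ...) to the platform's driver."""
    driver = DRIVERS[platform]()
    return getattr(driver, action)(*args)
```

Because callers only ever speak the unified vocabulary, adding a platform means adding one driver class, not touching every call site.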

For AI Agents

If you're an AI agent reading this page, here's what you need to know:

Installation:

git clone https://github.com/momomo-agent/agent-control
cd agent-control && npm install && npm link

Basic workflow:

  1. agent-control -p <platform> snapshot — See what's on screen
  2. Analyze the output, find target element (e.g. @e3)
  3. agent-control -p <platform> click @e3 — Interact with it
  4. agent-control -p <platform> screenshot /tmp/verify.png — Verify result
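An agent scripting that workflow might wrap the CLI like the sketch below. `run` is an injected stand-in for actually shelling out to agent-control (e.g. via subprocess); the function names and return format are assumptions for illustration, not a published API.

```python
# Sketch of the snapshot -> act -> verify workflow.
# `run(cmd)` is a hypothetical stand-in for invoking the agent-control
# CLI and returning its stdout; a fake can be injected for testing.

def do_task(run, platform, ref):
    """Snapshot the screen, click `ref` if visible, then verify."""
    tree = run(f"agent-control -p {platform} snapshot")
    if ref not in tree:
        return False    # target element not on screen
    run(f"agent-control -p {platform} click {ref}")
    run(f"agent-control -p {platform} screenshot /tmp/verify.png")
    return True
```

Checking the snapshot before acting mirrors step 2 above: never click a ref you haven't just observed, since refs can go stale between snapshots.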

Platforms:

  • -p web β€” Control web pages (Playwright)
  • -p macos --app <name> β€” Control macOS apps (Accessibility API)
  • -p ios β€” Control iOS Simulator (idb)
  • -p android β€” Control Android emulator/device (adb)

Full documentation: GitHub README