Airpoint: An AI-Driven Computer-Use Tool for macOS - Openclaw Skills

Author: Internet

2026-03-26


What is Airpoint?

Airpoint is a computer-use agent built specifically for the macOS ecosystem that lets users interact with their computer in plain language. By combining accessibility-tree data, live screenshots, and a visual locator, it bridges the gap between human intent and low-level UI execution. In effect, this skill turns your Mac into an autonomous workstation that can navigate Safari, change system settings, and extract data from any visible window.

As a featured component of Openclaw Skills, Airpoint uses advanced reasoning models such as GPT-4o and Gemini to perceive the screen and execute multi-step plans. Whether you want to automate repetitive administrative tasks or perform cross-application data entry, Airpoint provides a robust CLI-driven framework that turns natural-language instructions into precise system actions.

Download: https://github.com/openclaw/skills/tree/main/skills/marioandf/airpoint

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install airpoint

2. Manual Installation

Copy the skill folder to one of the following locations:

  • Global mode: ~/.openclaw/skills/
  • Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-Based Installation

Copy this prompt into OpenClaw and it will install automatically.

Please install airpoint for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Airpoint Use Cases

  • Automate web-based research and data extraction in Safari or Chrome.
  • Change system settings such as Dark Mode, volume, or display configuration without manual clicking.
  • Summarize content from active application windows such as Mail, Slack, or Word documents.
  • Drive productivity tools and media players hands-free via CLI commands.
  • Verify UI state and visual results after running automated deployment scripts.

How Airpoint Works

  1. The user starts a task by providing a natural-language prompt via the airpoint ask command.
  2. The agent captures the current state of the display using the macOS accessibility tree and screenshots.
  3. A visual locator, typically powered by Gemini, identifies the exact UI target coordinates for interaction.
  4. The core reasoning model generates a step-by-step plan involving mouse movement, clicks, typing, and scrolling.
  5. The agent executes these actions autonomously while monitoring the screen to verify that each stage succeeds.
  6. On completion, the tool returns a text summary and a screenshot of the final state to the user.
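The perceive-plan-act-verify loop above runs entirely inside the agent; from the shell, the whole cycle collapses into a single blocking airpoint ask call. For flaky multi-step tasks, a small retry wrapper can re-issue the task. This is a hypothetical convenience helper, not part of the Airpoint CLI:

```shell
#!/bin/sh
# retry_until: run a command up to N times until it exits successfully.
# Hypothetical helper; `airpoint ask` itself blocks until the agent
# finishes or hits its own 5-minute timeout.
retry_until() {
  attempts=$1
  shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (requires the airpoint CLI on PATH):
# retry_until 2 airpoint ask "open Safari and go to github.com"
```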

Airpoint Configuration Guide

To integrate this tool into your workflow with Openclaw Skills, follow these configuration steps:

  1. Download the Airpoint macOS app from airpoint.app.
  2. Install the CLI tool from within the app by navigating to Settings → Plugins → Install CLI.
  3. In Settings → Assistant, provide an OpenAI API key for reasoning and a Gemini API key for visual grounding.
  4. Grant the required Accessibility and Screen Recording permissions in macOS Privacy & Security settings.
  5. Run the following commands in a terminal to verify the installation:
airpoint status
airpoint ask "open Safari and search for Openclaw Skills"

Airpoint Data Architecture & Taxonomy

Airpoint organizes its runtime data and metadata locally to ensure privacy and performance. The architecture includes:

| Component | Description | Data Format |
| --- | --- | --- |
| Screenshots | Sequential captures of agent progress | PNG files |
| Session logs | Chronological record of actions and observations | Text/JSON |
| Configuration | User-defined sensitivity and API preferences | Local plist/database |
| System metrics | Real-time monitoring of CPU, RAM, and thermal state (via airpoint vitals) | JSON |

All session-specific data is stored in the Application Support directory for easy retrieval and auditing.
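Because session data is file-based, it can be inspected directly from the shell. A minimal sketch that finds the newest session directory, assuming the sessions/ layout shown in the CLI example output later in this document (~/Library/Application Support/com.medhuelabs.airpoint/sessions/):

```shell
# Print the most recently modified session directory under a sessions root.
# The path layout is inferred from the CLI's example output; adjust if
# your install stores sessions elsewhere.
latest_session() {
  root=$1
  # Sort subdirectories newest-first and keep the first entry.
  ls -1dt "$root"/*/ 2>/dev/null | head -n 1
}

# Example:
# latest_session "$HOME/Library/Application Support/com.medhuelabs.airpoint/sessions"
```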

name: airpoint
description: Control a Mac through natural language — open apps, click buttons, read the screen, type text, manage windows, and automate multi-step tasks via Airpoint's AI computer-use agent.
metadata: {"openclaw": {"emoji": "???", "homepage": "https://airpoint.app", "requires": {"bins": ["airpoint"]}, "os": ["darwin"]}}

Airpoint — AI Computer Use for macOS

Airpoint gives you an AI agent that can see and control a Mac — open apps, click UI elements, read on-screen text, type, scroll, drag, and manage windows. You give it a natural-language instruction and it carries out the task autonomously by perceiving the screen (accessibility tree + screenshots + visual locator), planning actions, executing them, and verifying the result.

Everything runs through the airpoint CLI.

Requirements

  • macOS (Apple Silicon or Intel)
  • Airpoint app — must be running. Download from airpoint.app.
  • Airpoint CLI — the airpoint command must be on PATH. Install it from the Airpoint app: Settings → Plugins → Install CLI.

Setup

Before using Airpoint's AI agent, the user must configure it in the Airpoint app (Settings → Assistant):

  1. AI model API key (required). Set an API key for the chosen provider:
    • OpenAI (recommended): model gpt-5.1 with reasoning effort low gives the best balance of cost, speed, and quality.
    • Anthropic and Google Gemini are also supported.
  2. Gemini API key (recommended). Even when using OpenAI or Anthropic as the primary model, a Google Gemini API key enables the visual locator — a secondary model (gemini-3-flash-preview) that finds UI targets on screen by analyzing screenshots. Without it, the agent relies on the accessibility tree only.
  3. macOS permissions. The app prompts on first launch, but verify these are granted in System Settings → Privacy & Security:
    • Accessibility — required for mouse/keyboard control.
    • Screen Recording — required for screenshots and screen perception.
    • Camera is only needed for hand tracking (not for the AI agent).
  4. Custom instructions (optional). In Settings → Assistant, add custom instructions to tailor the agent's behavior (e.g., preferred language, apps to avoid, workflows to follow).

If the user reports that airpoint ask fails or the agent can't see the screen, ask them to verify steps 1–3 above.
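A quick preflight check from the shell can catch the most common failure (the CLI not being on PATH) before any task is sent. A sketch, with the binary name parameterized purely so the check itself can be exercised without Airpoint installed:

```shell
# Verify the airpoint CLI is reachable before sending tasks.
# The app must also be running with API keys and permissions configured
# (steps 1-3 above); this only catches a missing CLI install.
check_airpoint_cli() {
  bin=${1:-airpoint}
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "error: '$bin' not found on PATH (install via Settings → Plugins → Install CLI)" >&2
    return 1
  fi
}

# Usage:
# check_airpoint_cli && airpoint status
```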

How to use

  1. Run airpoint ask "<task>" to send a task to the on-device agent.
  2. The command blocks until the agent finishes (up to 5 minutes) and returns:
    • A text summary of what the agent did and the result.
    • One or more screenshot file paths showing the screen state after the task.
  3. Read the text output to confirm whether the task succeeded.
  4. If screenshots were returned, show the last screenshot to the user as visual confirmation of the result.
  5. If something went wrong or the task is stuck, run airpoint stop to cancel.

Example flow:

> airpoint ask "open Safari and search for 'OpenClaw'"
Opened Safari, typed 'OpenClaw' into the address bar, and pressed Enter.
The search results page is now displayed.

1 screenshot(s) saved to session abc123
  └ screenshots/step_3.png (/Users/you/Library/Application Support/com.medhuelabs.airpoint/sessions/abc123/screenshots/step_3.png)

After receiving this, show the screenshot to the user so they can see what happened.
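The screenshot paths can be pulled out of the ask output mechanically. A sketch based on the output format in the example above, where each screenshot line ends with an absolute path in parentheses; if your CLI version formats output differently, adjust the pattern:

```shell
# Extract absolute screenshot paths from `airpoint ask` output on stdin,
# one per line. Matches the "(/...png)" suffix shown in the example above.
extract_screenshots() {
  sed -n 's/.*(\(\/[^)]*\.png\)).*/\1/p'
}

# Usage (requires the airpoint CLI):
# airpoint ask "open Safari" | extract_screenshots | tail -n 1
```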

Commands

Ask the AI agent to do something (primary command)

This is the most important command. It sends a natural-language task to Airpoint's built-in computer-use agent, which can see the screen, move the mouse, click, type, scroll, open apps via Spotlight, manage windows, and verify its own actions.

# Synchronous — waits for the agent to finish (up to 5 min) and returns output
airpoint ask "open Safari and go to github.com"
airpoint ask "what's on my screen right now?"
airpoint ask "find the Slack notification and read it"
airpoint ask "open System Settings and enable Dark Mode"
airpoint ask "open Mail, find the latest email from John, and summarize it"

# Fire-and-forget — returns immediately
airpoint ask "open Spotify and play my liked songs" --no-wait

# Show the assistant panel on screen while running
airpoint ask "open System Settings and enable Dark Mode" --show-panel

Stop a running task

airpoint stop

Cancels the currently running assistant task. Use this if a task is stuck or taking too long.

Capture a screenshot

airpoint see

Returns a screenshot of the current display. Useful for verifying state before or after issuing an ask command.

Check status

airpoint status
airpoint status --json

Returns app version and current state (tracking active, etc.).

Hand tracking (secondary)

Airpoint also supports hands-free cursor control via camera-based hand tracking. These commands start/stop that feature:

airpoint tracking on
airpoint tracking off
airpoint tracking        # show current state

Read or change settings

airpoint settings list             # all current settings
airpoint settings list --json      # machine-readable
airpoint settings get cursor.sensitivity
airpoint settings set cursor.sensitivity 1.5

Common settings: cursor.sensitivity (default 1.0), cursor.acceleration (default true), scroll.sensitivity (default 1.0), scroll.inertia (default true).
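For temporary tweaks, a setting can be changed for the duration of one command and then restored. A sketch that assumes `airpoint settings get` prints the bare value (verify this on your install before relying on it):

```shell
# Run a command with a setting temporarily overridden, then restore it.
# Assumes `airpoint settings get <key>` prints just the value.
with_setting() {
  key=$1
  value=$2
  shift 2
  old=$(airpoint settings get "$key") || return 1
  airpoint settings set "$key" "$value"
  "$@"
  status=$?
  airpoint settings set "$key" "$old"
  return "$status"
}

# Example:
# with_setting cursor.sensitivity 1.5 airpoint tracking on
```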

System vitals

airpoint vitals          # CPU, RAM, temperature
airpoint vitals --json

Launch the app

airpoint open            # opens/focuses the Airpoint macOS app

Tips

  • Use airpoint ask for almost everything. The agent can read the screen, interact with any app, and chain multi-step workflows autonomously.
  • Always use --json when you need to parse output programmatically.
  • The agent can answer questions about what's on screen ("what app is in the foreground?", "read the error message in this dialog").
  • Airpoint is a notarized, code-signed macOS app. Download it from airpoint.app.