Mac Control: macOS UI Automation for AI Agents - Openclaw Skills

Author: Internet

2026-03-29

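What is Mac Control?

The Mac Control skill gives AI agents an interface for interacting with the macOS desktop environment. Using cliclick and AppleScript, it performs precise mouse movements, clicks, and keyboard input. It is designed to bridge the gap between headless code execution and graphical user interface interaction, allowing Openclaw Skills to carry out tasks in native applications and web browsers.

The skill handles complex desktop scenarios, including coordinate scaling for high-resolution Retina displays and window management. Whether you are automating repetitive tasks in Chrome or interacting with system dialogs, Mac Control uses a screenshot-based verification workflow to confirm that each action took effect. It is a key component for Openclaw Skills deployments that need deep operating-system integration.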

Download: https://github.com/openclaw/skills/tree/main/skills/easonc13/mac-control

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install mac-control

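2. Manual installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-based installation

Copy this prompt into OpenClaw to install the skill automatically.

Please help me install mac-control using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).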

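Mac Control Use Cases

  • Automating interactions with Chrome extensions and browser toolbars.
  • Dismissing system-level dialogs or security prompts that block automation workflows.
  • GUI testing of native macOS applications.
  • Handling complex multi-window workflows that CLI tools cannot cover.
  • Navigating protected login pages such as Google OAuth with keyboard-based fallbacks when mouse events are blocked.

How Mac Control Works

  1. Capture: the skill uses the built-in screencapture tool to take a silent snapshot of the current display state.
  2. Analyze: the agent inspects the screenshot to identify UI elements and their pixel coordinates.
  3. Calibrate: on a Retina display, the skill applies 2:1 coordinate scaling (screenshot pixels to click coordinates) so that clicks land accurately.
  4. Execute: it invokes cliclick or osascript to perform mouse actions (click, drag, double-click) or keyboard input.
  5. Verify: a follow-up screenshot confirms that the UI state changed as expected, closing the Openclaw Skills feedback loop.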

Mac Control Configuration Guide

Install the required dependencies with Homebrew and grant the Accessibility permission your environment needs:

brew install cliclick imagemagick
# Grant Accessibility permission in System Settings:
# Privacy & Security -> Accessibility -> add your terminal or /opt/homebrew/bin/node

Verify your display scaling with the provided calibration script:

./scripts/calibrate-cursor.sh
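
Before relying on the skill, it can help to confirm that the command-line tools it uses are on PATH. The following is a minimal sketch, not part of the skill itself; the function name missing_tools is made up for illustration. It cannot check the Accessibility permission, which still has to be granted in System Settings.

```shell
# Print the name of every listed tool that is not on PATH (one per line).
# "missing_tools" is a hypothetical helper, not shipped with the skill.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The tools named in this document. "magick" is the ImageMagick 7 command.
missing="$(missing_tools cliclick magick screencapture osascript)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```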

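Mac Control Data Architecture and Taxonomy

The skill manages UI data through visual snapshots and system queries so that Openclaw Skills stay aware of the current screen context:

Data element | Storage/format | Purpose
UI snapshot | /tmp/screenshot.png | Temporary image file used for visual analysis and coordinate mapping.
Window bounds | AppleScript list | (left, top, right, bottom) coordinates of the active application window.
Cursor position | X, Y coordinates | Current system mouse position, read in real time via cliclick.
Process metadata | List of strings | Identifiers of running non-background processes, used to target a specific app.
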
name: mac-control
description: Control Mac via mouse/keyboard automation using cliclick and AppleScript. Use for clicking UI elements, taking screenshots, getting window bounds, handling coordinate scaling on Retina displays, and automating UI interactions like clicking Chrome extension icons, dismissing dialogs, or toolbar buttons.

Mac Control

Automate Mac UI interactions using cliclick (mouse/keyboard) and system tools.

Tools

  • cliclick: /opt/homebrew/bin/cliclick - mouse/keyboard control
  • screencapture: Built-in screenshot tool
  • magick: ImageMagick for image analysis
  • osascript: AppleScript for window info

Coordinate System (Eason's Mac Mini)

Current setup: 1920x1080 display, 1:1 scaling (no conversion needed!)

  • Screenshot coords = cliclick coords
  • If screenshot shows element at (800, 500), click at (800, 500)

For Retina Displays (2x)

If screenshot is 2x the logical resolution:

# Convert: cliclick_coords = screenshot_coords / 2
cliclick c:$((screenshot_x / 2)),$((screenshot_y / 2))
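
The conversion above can be wrapped in two small helpers. This is a sketch under assumptions: the function names scale_factor and to_click_coords are illustrative, and the scale is assumed to be a whole number (1 or 2). On a real machine the screenshot width could be read with something like sips -g pixelWidth, and the logical width from the display settings; here the numbers are passed in directly.

```shell
# scale_factor SCREENSHOT_WIDTH LOGICAL_WIDTH -> integer ratio (1 or 2 in practice)
# Hypothetical helper; assumes the screenshot width is a whole multiple
# of the logical width.
scale_factor() {
  echo $(( $1 / $2 ))
}

# to_click_coords X Y SCALE -> "X,Y" in cliclick's coordinate space
to_click_coords() {
  echo "$(( $1 / $3 )),$(( $2 / $3 ))"
}

scale="$(scale_factor 3840 1920)"      # a 3840 px wide screenshot of a 1920 pt desktop
to_click_coords 1600 1000 "$scale"     # prints 800,500
```

The result can be passed straight to cliclick, for example cliclick "c:$(to_click_coords 1600 1000 2)".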

Calibration Script

Run to verify your scale factor:

/Users/eason/clawd/scripts/calibrate-cursor.sh

cliclick Commands

# Click at coordinates
/opt/homebrew/bin/cliclick c:500,300

# Move mouse (no click) - Note: may not visually update cursor
/opt/homebrew/bin/cliclick m:500,300

# Double-click
/opt/homebrew/bin/cliclick dc:500,300

# Right-click
/opt/homebrew/bin/cliclick rc:500,300

# Click and drag
/opt/homebrew/bin/cliclick dd:100,100 du:200,200

# Type text
/opt/homebrew/bin/cliclick t:"hello world"

# Press key (Return, Escape, Tab, etc.)
/opt/homebrew/bin/cliclick kp:return
/opt/homebrew/bin/cliclick kp:escape

# Key with modifier (cmd+w to close window)
/opt/homebrew/bin/cliclick kd:cmd t:w ku:cmd

# Get current mouse position
/opt/homebrew/bin/cliclick p

# Wait 100 ms after each event (-w); for a pause before an action, use the w:<ms> command
/opt/homebrew/bin/cliclick -w 100 c:500,300
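
The modifier pattern (kd: to hold, t: to type, ku: to release) repeats for every shortcut, so a tiny builder can reduce typos. A sketch only; key_chord is a hypothetical name, and it only prints the arguments so the result can be inspected before it is handed to cliclick.

```shell
# key_chord MODIFIERS KEY -> cliclick arguments for a modifier+key press.
# MODIFIERS is a comma-separated list such as "cmd" or "cmd,shift".
# Hypothetical helper: it builds the argument string, it does not press keys.
key_chord() {
  echo "kd:$1 t:$2 ku:$1"
}

key_chord cmd w          # prints: kd:cmd t:w ku:cmd

# Real use (word splitting of the output is intentional):
#   /opt/homebrew/bin/cliclick $(key_chord cmd,shift t)
```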

Screenshots

# Full screen (silent)
/usr/sbin/screencapture -x /tmp/screenshot.png

# With cursor (may not work for custom cursor colors)
/usr/sbin/screencapture -C -x /tmp/screenshot.png

# Interactive region selection
screencapture -i region.png

# Delayed capture
screencapture -T 3 -x delayed.png  # 3 second delay

Workflow: Screenshot → Analyze → Click

Best practice for reliable clicking:

  1. Take screenshot

    /usr/sbin/screencapture -x /tmp/screen.png
    
  2. View screenshot (Read tool) to find target coordinates

  3. Click at those coordinates (1:1 on 1920x1080)

    /opt/homebrew/bin/cliclick c:X,Y
    
  4. Verify by taking another screenshot

Example: Click a button

# 1. Screenshot
/usr/sbin/screencapture -x /tmp/before.png

# 2. View image, find button at (850, 450)
# (Use Read tool on /tmp/before.png)

# 3. Click
/opt/homebrew/bin/cliclick c:850,450

# 4. Verify
/usr/sbin/screencapture -x /tmp/after.png
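
The screenshot, click, screenshot sequence can be bundled into one function. The sketch below is an illustration, not the skill's own script: SCREENCAPTURE, CLICLICK, DRY_RUN, run and click_verified are all assumed names. With DRY_RUN=1 it only prints the commands, which makes it safe to try on any machine.

```shell
# Sketch only. SCREENCAPTURE, CLICLICK and DRY_RUN are illustrative variable names.
SCREENCAPTURE="${SCREENCAPTURE:-/usr/sbin/screencapture}"
CLICLICK="${CLICLICK:-/opt/homebrew/bin/cliclick}"

# run CMD ARGS...: execute the command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# click_verified X Y [SETTLE_SECONDS]
# Takes a "before" screenshot, clicks, waits for the UI to settle,
# then takes an "after" screenshot for the agent to inspect.
click_verified() {
  run "$SCREENCAPTURE" -x /tmp/before.png || return 1
  run "$CLICLICK" "c:$1,$2" || return 1
  run sleep "${3:-0.5}"
  run "$SCREENCAPTURE" -x /tmp/after.png
}

# Dry run in a subshell so DRY_RUN does not leak into the calling shell.
(DRY_RUN=1; click_verified 850 450)
```

The function only produces the two images. Deciding whether the click had the intended effect still means viewing /tmp/after.png, since a byte-level difference between the files shows that something changed, not that the right thing changed.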

Window Bounds

# Get Chrome window bounds
osascript -e 'tell application "Google Chrome" to get bounds of front window'
# Returns: 0, 38, 1920, 1080  (left, top, right, bottom)
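
One use for the bounds is a sanity check that a target point actually falls inside the window before clicking. A minimal sketch, assuming the "left, top, right, bottom" output format shown above; point_in_bounds is a hypothetical helper name.

```shell
# point_in_bounds X Y "LEFT, TOP, RIGHT, BOTTOM"
# Succeeds (exit 0) when the point lies inside the window rectangle.
# The body runs in a subshell ( ... ) so its variables stay local.
point_in_bounds() (
  x="$1"; y="$2"
  # Strip the commas so the four numbers split on whitespace.
  set -- $(echo "$3" | tr -d ',')
  [ "$x" -ge "$1" ] && [ "$x" -lt "$3" ] && [ "$y" -ge "$2" ] && [ "$y" -lt "$4" ]
)

if point_in_bounds 850 450 "0, 38, 1920, 1080"; then
  echo "inside"
fi
```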

Common Patterns

Chrome Extension Icon (Browser Relay)

Use AppleScript to find exact button position:

# Find Clawdbot extension button position
osascript -e '
tell application "System Events"
    tell process "Google Chrome"
        set toolbarGroup to group 2 of group 3 of toolbar 1 of group 1 of group 1 of group 1 of group 1 of group 1 of window 1
        set allButtons to every pop up button of toolbarGroup
        repeat with btn in allButtons
            if description of btn contains "Clawdbot" then
                return position of btn & size of btn
            end if
        end repeat
    end tell
end tell
'
# Output: 1755, 71, 34, 34 (x, y, width, height)

# Click center of button
# center_x = x + width/2 = 1755 + 17 = 1772
# center_y = y + height/2 = 71 + 17 = 88
/opt/homebrew/bin/cliclick c:1772,88
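
The center arithmetic in the comments above can be done by a helper that takes the "x, y, width, height" string as printed by osascript. A sketch; rect_center is an illustrative name and it uses integer division, so odd sizes round down.

```shell
# rect_center "X, Y, WIDTH, HEIGHT" -> "CX,CY" (integer division)
# Hypothetical helper; the body runs in a subshell so "set --" stays local.
rect_center() (
  set -- $(echo "$1" | tr -d ',')
  echo "$(( $1 + $3 / 2 )),$(( $2 + $4 / 2 ))"
)

rect_center "1755, 71, 34, 34"   # prints 1772,88
```

Combined with cliclick this becomes /opt/homebrew/bin/cliclick "c:$(rect_center "1755, 71, 34, 34")".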

Clicking by Color Detection

If you need to find a specific colored element:

# Find red (#FF0000) pixels in screenshot
magick /tmp/screen.png txt:- | grep "#FF0000" | head -5

# Calculate center of colored region
magick /tmp/screen.png txt:- | grep "#FF0000" | awk -F'[,:]' '
  BEGIN{sx=0;sy=0;c=0}
  {sx+=$1;sy+=$2;c++}
  END{if (c > 0) printf "Center: (%d, %d)\n", sx/c, sy/c}'

Dialog Button Click

  1. Screenshot the dialog
  2. Find button coordinates visually
  3. Click (no scaling on 1920x1080)

# Example: Click "OK" button at (960, 540)
/opt/homebrew/bin/cliclick c:960,540

Type in Text Field

# Click to focus, then type
/opt/homebrew/bin/cliclick c:500,300
sleep 0.2
/opt/homebrew/bin/cliclick t:"Hello world"
/opt/homebrew/bin/cliclick kp:return

Helper Scripts

Located in /Users/eason/clawd/scripts/:

  • calibrate-cursor.sh - Calibrate coordinate scaling
  • click-at-visual.sh - Click at screenshot coordinates
  • get-cursor-pos.sh - Get current cursor position
  • attach-browser-relay.sh - Auto-click Browser Relay extension

Keyboard Navigation (When Clicks Fail)

Google OAuth and other protected pages may ignore synthetic mouse clicks. Use keyboard navigation instead:

# Tab to navigate between elements
osascript -e 'tell application "System Events" to keystroke tab'

# Shift+Tab to go backwards
osascript -e 'tell application "System Events" to key code 48 using shift down'

# Enter to activate focused element
osascript -e 'tell application "System Events" to keystroke return'

# Full workflow: Tab 3 times then Enter
osascript -e '
tell application "System Events"
    keystroke tab
    delay 0.15
    keystroke tab
    delay 0.15
    keystroke tab
    delay 0.15
    keystroke return
end tell
'
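
When the number of Tab presses varies per page, the AppleScript can be generated instead of written out by hand. A sketch under assumptions: tab_then_enter_script is a hypothetical helper that only prints the script, using AppleScript's repeat N times loop and the same 0.15 s delay as above.

```shell
# tab_then_enter_script N -> AppleScript that presses Tab N times, then Return.
# Hypothetical helper: it prints the script and sends no keystrokes itself.
tab_then_enter_script() {
  cat <<EOF
tell application "System Events"
    repeat $1 times
        keystroke tab
        delay 0.15
    end repeat
    keystroke return
end tell
EOF
}

tab_then_enter_script 3

# Real use: pipe the generated script to osascript, which reads from stdin
# when no file or -e option is given:
#   tab_then_enter_script 3 | osascript
```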

When to use keyboard instead of mouse:

  • Google OAuth / login pages (anti-automation protection)
  • Popup dialogs with focus trapping
  • When mouse clicks consistently fail after verification

Chrome Browser Relay & Multiple Windows

Problem: Browser Relay may list tabs from multiple Chrome windows, causing snapshot to fail on the desired tab.

Solution:

  1. Close extra Chrome windows before automation
  2. Or ensure only the target window has relay attached

Check tabs visible to relay:

# In agent code
browser action=tabs profile=chrome

If the target tab is missing from the list, the relay is attached to the wrong window.

Verify single window:

osascript -e 'tell application "Google Chrome" to return count of windows'

Verify-Before-Click Workflow

Critical: Always verify coordinates BEFORE clicking important buttons.

# 1. Take screenshot
osascript -e 'do shell script "/usr/sbin/screencapture -x /tmp/before.png"'

# 2. View screenshot (Read tool), note target position

# 3. Move mouse to verify position (optional)
python3 -c "import pyautogui; pyautogui.moveTo(X, Y)"
osascript -e 'do shell script "/usr/sbin/screencapture -C -x /tmp/verify.png"'

# 4. Check cursor is on target, THEN click
/opt/homebrew/bin/cliclick c:X,Y

# 5. Take screenshot to confirm action worked
osascript -e 'do shell script "/usr/sbin/screencapture -x /tmp/after.png"'

Troubleshooting

Click lands wrong: Verify scale factor with calibration script

cliclick m: doesn't move cursor visually: Use c: (click) instead, or check with cliclick p to confirm position changed

Permission denied: System Settings → Privacy & Security → Accessibility → Add /opt/homebrew/bin/node

Window not found: Check exact app name:

osascript -e 'tell application "System Events" to get name of every process whose background only is false'

Clicks ignored on OAuth/protected pages: These pages block synthetic events. Use keyboard navigation (Tab + Enter) instead.

pyautogui vs cliclick coordinates differ: Stick with cliclick for consistency. pyautogui may have different coordinate mapping.

Quartz CGEvent clicks don't work: Some pages (Google OAuth) block low-level mouse events too. Keyboard is the only reliable method.
