Mac Control: macOS UI Automation for AI Agents - Openclaw Skills
Author: Internet
2026-03-29
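What is Mac Control?
The Mac Control skill gives AI agents a powerful interface for interacting with the macOS desktop environment. Using established tools such as cliclick and AppleScript, it performs precise mouse movements, clicks, and keyboard input. It is designed to bridge the gap between headless code execution and graphical user interface interaction, allowing Openclaw Skills to carry out tasks in native applications and web browsers.
The skill handles complex desktop scenarios, including coordinate scaling on high-resolution Retina displays and window management. Whether you are automating repetitive tasks in Chrome or interacting with system dialogs, Mac Control relies on a screenshot-based verification workflow for reliable execution. It is a key component of any Openclaw Skills deployment that needs deep operating system integration.
Download: https://github.com/openclaw/skills/tree/main/skills/easonc13/mac-control
Installation and download
1. ClawHub CLI
The fastest way to install the skill directly from the source.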
npx clawhub@latest install mac-control
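2. Manual installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt installation
Copy this prompt into OpenClaw to install the skill automatically.
Please use Clawhub to install mac-control for me. If Clawhub is not installed yet, install it first (npm i -g clawhub).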
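Mac Control use cases
- Automating interaction with Chrome extensions and browser toolbars.
- Dismissing system-level dialogs or security prompts that block an automation workflow.
- GUI testing of native macOS applications.
- Handling complex multi-window workflows that CLI tools cannot cover.
- Navigating protected login pages such as Google OAuth with a keyboard-based fallback when mouse events are blocked.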
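- Capture: the skill uses the built-in screencapture tool to take a silent snapshot of the current display state.
- Analyze: the agent analyzes the screenshot to identify UI elements and their pixel coordinates.
- Calibrate: on a Retina display, the skill applies 2:1 coordinate scaling to keep clicks accurate.
- Execute: it invokes cliclick or osascript to perform mouse actions (click, drag, double-click) or keyboard input.
- Verify: a follow-up screenshot confirms that the UI state changed as expected, closing the Openclaw Skills feedback loop.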
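Mac Control configuration guide
Install the required dependencies via Homebrew and grant the accessibility permissions your environment needs:
brew install cliclick imagemagick
# Grant accessibility permission in System Settings
# Privacy & Security -> Accessibility -> add your terminal or /opt/homebrew/bin/node
Verify your display scaling with the provided calibration script:
./scripts/calibrate-cursor.sh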
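Mac Control data architecture and taxonomy
The skill manages UI data through visual snapshots and system queries so that Openclaw Skills stay aware of the current desktop context:
| Data element | Storage/format | Purpose |
|---|---|---|
| UI snapshot | /tmp/screenshot.png | Temporary image file used for visual analysis and coordinate mapping. |
| Window bounds | AppleScript array | The (left, top, right, bottom) coordinates of the active application window. |
| Cursor position | X, Y coordinates | Real-time tracking of the system mouse position via cliclick. |
| Process metadata | List of strings | Identifiers of running non-background processes, used to target specific applications. |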
name: mac-control
description: Control Mac via mouse/keyboard automation using cliclick and AppleScript. Use for clicking UI elements, taking screenshots, getting window bounds, handling coordinate scaling on Retina displays, and automating UI interactions like clicking Chrome extension icons, dismissing dialogs, or toolbar buttons.
Mac Control
Automate Mac UI interactions using cliclick (mouse/keyboard) and system tools.
Tools
- cliclick: /opt/homebrew/bin/cliclick - mouse/keyboard control
- screencapture: Built-in screenshot tool
- magick: ImageMagick for image analysis
- osascript: AppleScript for window info
Coordinate System (Eason's Mac Mini)
Current setup: 1920x1080 display, 1:1 scaling (no conversion needed!)
- Screenshot coords = cliclick coords
- If screenshot shows element at (800, 500), click at (800, 500)
For Retina Displays (2x)
If screenshot is 2x the logical resolution:
# Convert: cliclick_coords = screenshot_coords / 2
cliclick c:$((screenshot_x / 2)),$((screenshot_y / 2))
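The conversion above can be wrapped in two small helpers. This is a minimal sketch, not part of the skill: `scale_factor` and `to_click_coords` are hypothetical names, and it assumes the scale is a whole number (1 or 2) obtained by comparing the screenshot's pixel width with the display's logical width.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with the skill).

# scale_factor SCREENSHOT_WIDTH LOGICAL_WIDTH -> integer scale (assumed 1 or 2)
scale_factor() {
  echo $(( $1 / $2 ))
}

# to_click_coords SCREENSHOT_X SCREENSHOT_Y SCALE -> "X,Y" in cliclick's format
to_click_coords() {
  echo "$(( $1 / $3 )),$(( $2 / $3 ))"
}

# Example: a 3840-pixel-wide screenshot of a 1920-point-wide display
scale=$(scale_factor 3840 1920)
to_click_coords 1600 1000 "$scale"   # prints 800,500
```

The result can be passed straight to a click, for example `/opt/homebrew/bin/cliclick "c:$(to_click_coords 1600 1000 2)"`. On the 1:1 setup described earlier the scale is 1 and the coordinates pass through unchanged.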
Calibration Script
Run to verify your scale factor:
/Users/eason/clawd/scripts/calibrate-cursor.sh
cliclick Commands
# Click at coordinates
/opt/homebrew/bin/cliclick c:500,300
# Move mouse (no click) - Note: may not visually update cursor
/opt/homebrew/bin/cliclick m:500,300
# Double-click
/opt/homebrew/bin/cliclick dc:500,300
# Right-click
/opt/homebrew/bin/cliclick rc:500,300
# Click and drag
/opt/homebrew/bin/cliclick dd:100,100 du:200,200
# Type text
/opt/homebrew/bin/cliclick t:"hello world"
# Press key (Return, Escape, Tab, etc.); cliclick names the Escape key "esc"
/opt/homebrew/bin/cliclick kp:return
/opt/homebrew/bin/cliclick kp:esc
# Key with modifier (cmd+w to close window)
/opt/homebrew/bin/cliclick kd:cmd t:w ku:cmd
# Get current mouse position
/opt/homebrew/bin/cliclick p
# Wait 100 ms after each event
/opt/homebrew/bin/cliclick -w 100 c:500,300
Screenshots
# Full screen (silent)
/usr/sbin/screencapture -x /tmp/screenshot.png
# With cursor (may not work for custom cursor colors)
/usr/sbin/screencapture -C -x /tmp/screenshot.png
# Interactive region selection
screencapture -i region.png
# Delayed capture
screencapture -T 3 -x delayed.png # 3 second delay
Workflow: Screenshot → Analyze → Click
Best practice for reliable clicking:
- Take screenshot
  /usr/sbin/screencapture -x /tmp/screen.png
- View screenshot (Read tool) to find target coordinates
- Click at those coordinates (1:1 on 1920x1080)
  /opt/homebrew/bin/cliclick c:X,Y
- Verify by taking another screenshot
Example: Click a button
# 1. Screenshot
/usr/sbin/screencapture -x /tmp/before.png
# 2. View image, find button at (850, 450)
# (Use Read tool on /tmp/before.png)
# 3. Click
/opt/homebrew/bin/cliclick c:850,450
# 4. Verify
/usr/sbin/screencapture -x /tmp/after.png
Window Bounds
# Get Chrome window bounds
osascript -e 'tell application "Google Chrome" to get bounds of front window'
# Returns: 0, 38, 1920, 1080 (left, top, right, bottom)
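Because the bounds come back as edges (left, top, right, bottom) rather than as position plus size, a little arithmetic is needed before they are useful for clicking. The sketch below is illustrative only: `bounds_to_center` is a hypothetical helper that assumes the comma-separated format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the skill).
# bounds_to_center "LEFT, TOP, RIGHT, BOTTOM" -> "CENTER_X,CENTER_Y WIDTHxHEIGHT"
bounds_to_center() {
  printf '%s\n' "$1" | awk -F', *' '{
    w = $3 - $1; h = $4 - $2
    printf "%d,%d %dx%d\n", $1 + int(w / 2), $2 + int(h / 2), w, h
  }'
}

# Example with the bounds returned above
bounds_to_center "0, 38, 1920, 1080"   # prints 960,559 1920x1042
```

The window size is also a quick sanity check that a target coordinate actually falls inside the window before clicking.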
Common Patterns
Chrome Extension Icon (Browser Relay)
Use AppleScript to find exact button position:
# Find Clawdbot extension button position
osascript -e '
tell application "System Events"
tell process "Google Chrome"
set toolbarGroup to group 2 of group 3 of toolbar 1 of group 1 of group 1 of group 1 of group 1 of group 1 of window 1
set allButtons to every pop up button of toolbarGroup
repeat with btn in allButtons
if description of btn contains "Clawdbot" then
return position of btn & size of btn
end if
end repeat
end tell
end tell
'
# Output: 1755, 71, 34, 34 (x, y, width, height)
# Click center of button
# center_x = x + width/2 = 1755 + 17 = 1772
# center_y = y + height/2 = 71 + 17 = 88
/opt/homebrew/bin/cliclick c:1772,88
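The center calculation in the comments above can be scripted so it does not have to be done by hand. A minimal sketch, assuming the AppleScript output keeps the "x, y, width, height" format shown; `button_center` is a hypothetical name, not one of the skill's helper scripts.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the skill).
# button_center "X, Y, WIDTH, HEIGHT" -> "CENTER_X,CENTER_Y" in cliclick's format
button_center() {
  printf '%s\n' "$1" | awk -F', *' '{
    printf "%d,%d\n", $1 + int($3 / 2), $2 + int($4 / 2)
  }'
}

# Example with the output shown above
button_center "1755, 71, 34, 34"   # prints 1772,88
```

With the geometry captured in a variable, the click becomes `/opt/homebrew/bin/cliclick "c:$(button_center "$geometry")"`.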
Clicking by Color Detection
If you need to find a specific colored element:
# Find red (#FF0000) pixels in screenshot
magick /tmp/screen.png txt:- | grep "#FF0000" | head -5
# Calculate center of colored region
magick /tmp/screen.png txt:- | grep "#FF0000" | awk -F'[,:]' '
BEGIN{sx=0;sy=0;c=0}
{sx+=$1;sy+=$2;c++}
END{if (c > 0) printf "Center: (%d, %d)\n", sx/c, sy/c; else print "No matching pixels"}'
Dialog Button Click
- Screenshot the dialog
- Find button coordinates visually
- Click (no scaling on 1920x1080)
# Example: Click "OK" button at (960, 540)
/opt/homebrew/bin/cliclick c:960,540
Type in Text Field
# Click to focus, then type
/opt/homebrew/bin/cliclick c:500,300
sleep 0.2
/opt/homebrew/bin/cliclick t:"Hello world"
/opt/homebrew/bin/cliclick kp:return
Helper Scripts
Located in /Users/eason/clawd/scripts/:
- calibrate-cursor.sh - Calibrate coordinate scaling
- click-at-visual.sh - Click at screenshot coordinates
- get-cursor-pos.sh - Get current cursor position
- attach-browser-relay.sh - Auto-click Browser Relay extension
Keyboard Navigation (When Clicks Fail)
Google OAuth and protected pages block synthetic mouse clicks! Use keyboard navigation:
# Tab to navigate between elements
osascript -e 'tell application "System Events" to keystroke tab'
# Shift+Tab to go backwards
osascript -e 'tell application "System Events" to key code 48 using shift down'
# Enter to activate focused element
osascript -e 'tell application "System Events" to keystroke return'
# Full workflow: Tab 3 times then Enter
osascript -e '
tell application "System Events"
keystroke tab
delay 0.15
keystroke tab
delay 0.15
keystroke tab
delay 0.15
keystroke return
end tell
'
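When the number of Tab presses varies from page to page, the AppleScript can be generated instead of written out by hand. The following is a sketch under that assumption: `tab_script` is a hypothetical helper, and the 0.15 second default delay is simply the value used in the workflow above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the skill).
# tab_script COUNT [DELAY] -> AppleScript text: press Tab COUNT times, then Return
tab_script() {
  count=$1
  delay=${2:-0.15}
  echo 'tell application "System Events"'
  i=0
  while [ "$i" -lt "$count" ]; do
    echo '  keystroke tab'
    echo "  delay $delay"
    i=$((i + 1))
  done
  echo '  keystroke return'
  echo 'end tell'
}

# Print the generated script for inspection; nothing is executed here
tab_script 3
```

To run it, pass the generated text to osascript: `osascript -e "$(tab_script 3)"`.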
When to use keyboard instead of mouse:
- Google OAuth / login pages (anti-automation protection)
- Popup dialogs with focus trapping
- When mouse clicks consistently fail after verification
Chrome Browser Relay & Multiple Windows
Problem: Browser Relay may list tabs from multiple Chrome windows, causing snapshot to fail on the desired tab.
Solution:
- Close extra Chrome windows before automation
- Or ensure only the target window has relay attached
Check tabs visible to relay:
# In agent code
browser action=tabs profile=chrome
If target tab missing from list → wrong window attached.
Verify single window:
osascript -e 'tell application "Google Chrome" to return count of windows'
Verify-Before-Click Workflow
Critical: Always verify coordinates BEFORE clicking important buttons.
# 1. Take screenshot
osascript -e 'do shell script "/usr/sbin/screencapture -x /tmp/before.png"'
# 2. View screenshot (Read tool), note target position
# 3. Move mouse to verify position (optional)
python3 -c "import pyautogui; pyautogui.moveTo(X, Y)"
osascript -e 'do shell script "/usr/sbin/screencapture -C -x /tmp/verify.png"'
# 4. Check cursor is on target, THEN click
/opt/homebrew/bin/cliclick c:X,Y
# 5. Take screenshot to confirm action worked
osascript -e 'do shell script "/usr/sbin/screencapture -x /tmp/after.png"'
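Step 4 can also be checked numerically rather than by eye, by comparing the reported cursor position with the target. This is a rough sketch: `cursor_near` is a hypothetical helper, the 3-pixel default tolerance is an arbitrary choice, and it assumes `cliclick p` reports the position as "X,Y", optionally preceded by a text label.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the skill).
# cursor_near POSITION TARGET_X TARGET_Y [TOLERANCE]
# Succeeds (exit status 0) when POSITION is within TOLERANCE pixels of the target.
cursor_near() {
  pos=${1##* }          # drop any leading label, keep the trailing "X,Y"
  pos_x=${pos%%,*}
  pos_y=${pos##*,}
  tol=${4:-3}
  dx=$(( pos_x - $2 ))
  dy=$(( pos_y - $3 ))
  if [ "$dx" -lt 0 ]; then dx=$(( -dx )); fi
  if [ "$dy" -lt 0 ]; then dy=$(( -dy )); fi
  [ "$dx" -le "$tol" ] && [ "$dy" -le "$tol" ]
}

# Example with a literal position string standing in for the cliclick output
if cursor_near "851,449" 850 450; then
  echo "on target"
else
  echo "off target"
fi
```

In a live session the first argument would come from the tool itself, for example `cursor_near "$(/opt/homebrew/bin/cliclick p)" 850 450`, and the click is issued only when the check succeeds.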
Troubleshooting
Click lands wrong: Verify scale factor with calibration script
cliclick m: doesn't move cursor visually: Use c: (click) instead, or check with cliclick p to confirm position changed
Permission denied: System Settings → Privacy & Security → Accessibility → Add /opt/homebrew/bin/node
Window not found: Check exact app name:
osascript -e 'tell application "System Events" to get name of every process whose background only is false'
Clicks ignored on OAuth/protected pages: These pages block synthetic events. Use keyboard navigation (Tab + Enter) instead.
pyautogui vs cliclick coordinates differ: Stick with cliclick for consistency. pyautogui may have different coordinate mapping.
Quartz CGEvent clicks don't work: Some pages (Google OAuth) block low-level mouse events too. Keyboard is the only reliable method.