Browser CDP: Real Chrome Automation for AI Agents - Openclaw Skills

Author: Internet

2026-04-11

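What is Browser CDP Automation?

The browser-cdp skill is an automation layer for Openclaw Skills that bridges AI agents and a local Google Chrome instance. Using the Chrome DevTools Protocol (CDP), it lets an agent operate in a real browser environment, carrying over your existing login sessions, cookies, and full JavaScript execution. It is designed for web tasks that standard HTTP fetchers cannot handle, such as interacting with highly dynamic single-page applications or getting past complex security challenges.

This implementation provides a lightweight CDP Proxy that exposes a REST API, so an agent can perform interactions such as clicking, scrolling, and form filling. For developers in the Openclaw Skills ecosystem who are building robust workflows that need realistic, human-like web navigation and data scraping, it is an essential component.

Download: https://github.com/openclaw/skills/tree/main/skills/0xcjl/browser-cdp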

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install browser-cdp

2. Manual installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in
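
The priority rule above amounts to "first match wins" over an ordered list of directories. The following is a minimal sketch of that idea, not OpenClaw's actual resolver: the function name `resolve_skill` is hypothetical, and the caller supplies the directories in priority order.

```shell
# resolve_skill NAME DIR...
# Prints the first DIR/NAME that exists; DIRs are given highest priority first.
# (Hypothetical helper for illustration; not part of OpenClaw.)
resolve_skill() {
  resolve_name=$1
  shift
  for resolve_dir in "$@"; do
    if [ -d "$resolve_dir/$resolve_name" ]; then
      printf '%s\n' "$resolve_dir/$resolve_name"
      return 0
    fi
  done
  return 1
}
```

For example, `resolve_skill browser-cdp ./skills "$HOME/.openclaw/skills"` would prefer a workspace copy over the global one, assuming the workspace folder is `./skills`.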

3. Prompt installation

Copy this prompt into OpenClaw to install automatically.

Please use Clawhub to install browser-cdp for me. If Clawhub is not installed yet, install it first (npm i -g clawhub).

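Browser CDP Automation use cases

  • Get past anti-bot protection on search result pages such as Google, Bing, and YouTube search.
  • Access and extract content from login-protected platforms such as Twitter/X, LinkedIn, or private internal dashboards.
  • Automate complex multi-step web workflows, including file uploads, drag-and-drop, and form submission.
  • Capture high-quality screenshots of dynamically rendered pages for visual audits or data verification.
  • Scrape content from JavaScript-heavy sites that static HTML parsers cannot render correctly.

How Browser CDP Automation works

  1. The user starts a Chrome instance with a specific remote debugging port and a dedicated profile directory.
  2. The Openclaw Skills agent starts the local CDP Proxy server (cdp-proxy.mjs), which opens a WebSocket connection to Chrome.
  3. When a task needs browser interaction, the agent sends structured HTTP requests to the proxy's REST endpoints.
  4. The proxy translates these requests into low-level CDP commands that the browser executes in real time.
  5. The browser returns the requested data (such as the rendered DOM, page source, or screenshots) to the agent for analysis.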

Browser CDP Automation configuration guide

To integrate this skill into your Openclaw Skills setup, first start Chrome with the debugging port enabled:

# macOS - Start Chrome with a dedicated debugging profile
pkill -9 "Google Chrome"; sleep 2
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug-profile \
  --no-first-run &

Once Chrome is running, start the CDP Proxy server:

# Start the proxy and verify connection
node ~/.openclaw/skills/browser-cdp/scripts/cdp-proxy.mjs &
sleep 3
curl -s http://localhost:3456/health
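
The fixed `sleep 3` above is only a guess at how long the proxy takes to start. A polling loop is one alternative; the sketch below is a generic helper (the name `retry` and its argument order are assumptions, not part of the skill) that reruns any command until it succeeds or the attempts run out.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs are used up.
# (Hypothetical helper for illustration; not shipped with browser-cdp.)
retry() {
  retry_left=$1
  retry_delay=$2
  shift 2
  while [ "$retry_left" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    retry_left=$((retry_left - 1))
    # Only pause when another attempt will follow.
    if [ "$retry_left" -gt 0 ]; then
      sleep "$retry_delay"
    fi
  done
  return 1
}
```

With it, the health check could be written as `retry 10 1 curl -sf http://localhost:3456/health`, where curl's `-f` makes HTTP error responses count as failures.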

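Browser CDP Automation data architecture and taxonomy

The skill organizes its operational data and learned behaviors into a structured directory layout, which keeps behavior consistent across different Openclaw Skills tasks.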

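| Path | Description |
| --- | --- |
| ~/.openclaw/skills/browser-cdp/references/site-patterns/ | Domain-specific interaction rules and known selectors for better automation. |
| ~/.openclaw/skills/browser-cdp/references/usage-log.md | Detailed log of browser activity, success rates, and trigger events. |
| /tmp/chrome-debug-profile | Local directory that persists browser sessions, cookies, and cache. |
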
name: browser-cdp
description: >
  Real Chrome browser automation via CDP Proxy — access pages with full user login state,
  bypass anti-bot detection, perform interactive operations (click/fill/scroll), extract
  dynamic JavaScript-rendered content, take screenshots.
  Triggers (satisfy ANY one):
  - Target URL is a search results page (Bing/Google/YouTube search)
  - Static fetch (agent-reach/WebFetch) is blocked by anti-bot (captcha/intercept/empty)
  - Need to read logged-in user's private content
  - YouTube, Twitter/X, Xiaohongshu, WeChat public accounts, etc.
  - Task involves "click", "fill form", "scroll", "drag"
  - Need screenshot or dynamic-rendered page capture
metadata:
  author: adapted from eze-is/web-access (MIT licensed)
  version: "1.0.0"

What is browser-cdp?

browser-cdp connects directly to your local Chrome via Chrome DevTools Protocol (CDP), giving the AI agent:

  • Full login state — your cookies and sessions are carried through
  • Anti-bot bypass — pages that block static fetchers (search results, video platforms)
  • Interactive operations — click, fill forms, scroll, drag, file upload
  • Dynamic content extraction — read JavaScript-rendered DOM
  • Screenshots — capture any page at any point

Architecture

Chrome (remote-debugging-port=9222)
    ↓ CDP WebSocket
CDP Proxy (cdp-proxy.mjs) — HTTP API on localhost:3456
    ↓ HTTP REST
OpenClaw AI Agent

Setup

1. Start Chrome with debugging port

# macOS — must use full binary path (not `open -a`)
pkill -9 "Google Chrome"; sleep 2
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-debug-profile \
  --no-first-run &

Verify:

curl -s http://127.0.0.1:9222/json/version
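
Chrome's `/json/version` response is a JSON object that includes a `webSocketDebuggerUrl` field, which is the endpoint a CDP client connects to. As a rough check that the debugging port is really serving CDP, that field can be pulled out without extra tooling. The helper name `ws_url_from_version` is hypothetical, and the sed pattern is a sketch that assumes the field appears once; a JSON parser such as jq would be more robust.

```shell
# ws_url_from_version JSON
# Prints the webSocketDebuggerUrl value from a /json/version response,
# or nothing if the field is absent.
# (Hypothetical helper for illustration.)
ws_url_from_version() {
  printf '%s' "$1" | tr -d '\n' |
    sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: `ws_url_from_version "$(curl -s http://127.0.0.1:9222/json/version)"`. Empty output suggests Chrome was not started with the debugging port.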

2. Start CDP Proxy

node ~/.openclaw/skills/browser-cdp/scripts/cdp-proxy.mjs &
sleep 3
curl -s http://localhost:3456/health
# {"status":"ok","connected":true,"sessions":0,"chromePort":9222}
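
A script can go one step further than printing the health response and check that the proxy reports a live Chrome connection. The sketch below matches the `"connected":true` pair shown in the sample output above; the function name `proxy_connected` is hypothetical, and plain text matching is used only to avoid a jq dependency.

```shell
# proxy_connected JSON
# Succeeds only if the /health response reports "connected":true.
# (Hypothetical helper for illustration.)
proxy_connected() {
  printf '%s' "$1" | tr -d ' \n' | grep -q '"connected":true'
}
```

Usage: `proxy_connected "$(curl -s http://localhost:3456/health)" || echo "proxy is not connected to Chrome" >&2`.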

API Reference

# List all tabs
curl -s http://localhost:3456/targets

# Open URL in new tab
curl -s "http://localhost:3456/new?url=https://example.com"

# Execute JavaScript
curl -s -X POST "http://localhost:3456/eval?target=TARGET_ID" \
  -d 'document.title'

# JS click (fast, preferred)
curl -s -X POST "http://localhost:3456/click?target=TARGET_ID" \
  -d 'button.submit'

# Real mouse click
curl -s -X POST "http://localhost:3456/clickAt?target=TARGET_ID" \
  -d '.upload-btn'
  -d '.upload-btn'

# Screenshot
curl -s "http://localhost:3456/screenshot?target=TARGET_ID&file=/tmp/shot.png"

# Scroll (lazy loading)
curl -s "http://localhost:3456/scroll?target=TARGET_ID&direction=bottom"

# Navigate
curl -s "http://localhost:3456/navigate?target=TARGET_ID&url=https://..."

# Close tab
curl -s "http://localhost:3456/close?target=TARGET_ID"
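
The calls above can be chained into a small workflow. The sketch below uses only the endpoints listed here; the `CDP_PROXY` variable and the function names `target_url` and `capture_and_close` are hypothetical, and the target ID is passed in by the caller because the response format of `/new` and `/targets` is not documented in this section.

```shell
# Base URL of the proxy; override by exporting CDP_PROXY (hypothetical variable).
CDP_PROXY="${CDP_PROXY:-http://localhost:3456}"

# target_url ENDPOINT TARGET_ID
# Prints the proxy URL for an endpoint bound to one tab.
target_url() {
  printf '%s/%s?target=%s' "$CDP_PROXY" "$1" "$2"
}

# capture_and_close TARGET_ID FILE
# Reads the page title, saves a screenshot to FILE, then closes the tab.
# Stops at the first request that fails.
capture_and_close() {
  capture_target=$1
  capture_file=$2
  curl -sf -X POST "$(target_url eval "$capture_target")" -d 'document.title' &&
    curl -sf "$(target_url screenshot "$capture_target")&file=$capture_file" &&
    curl -sf "$(target_url close "$capture_target")"
}
```

Usage, with a target ID taken from the `/targets` listing: `capture_and_close TARGET_ID /tmp/shot.png`.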

Tool Selection: Three-Layer Strategy

| Scenario | Use | Reason |
| --- | --- | --- |
| Public pages (GitHub, Wikipedia, blogs) | agent-reach | Fast, low token, structured |
| Search results (Bing/Google/YouTube) | browser-cdp | agent-reach blocked |
| Login-gated content | browser-cdp | No cookies in agent-reach |
| JS-rendered pages | browser-cdp | Reads rendered DOM |
| Simple automation, isolated screenshots | agent-browser | No Chrome setup |
| Large-scale parallel scraping | agent-reach + parallel | browser-cdp gets rate-limited |

Decision flow:

Public content → agent-reach (fast, cheap)
Search results / blocked → browser-cdp
Still fails → agent-reach fallback + record in site-patterns
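
The selection table and decision flow can be written as a lookup. The sketch below covers the first choice of tool only, not the fallback step; the function name `choose_tool` and the scenario labels (`public`, `search`, and so on) are hypothetical shorthand for the rows of the table.

```shell
# choose_tool SCENARIO
# Prints the tool the table suggests for a scenario label.
# Returns non-zero for labels it does not know.
# (Hypothetical helper and labels for illustration.)
choose_tool() {
  case "$1" in
    public)                           echo "agent-reach" ;;
    search|blocked|login|js-rendered) echo "browser-cdp" ;;
    simple-automation)                echo "agent-browser" ;;
    bulk)                             echo "agent-reach + parallel" ;;
    *)                                return 1 ;;
  esac
}
```

Usage: `tool=$(choose_tool search) || echo "unknown scenario" >&2`.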

Known Limitations

  • Chrome must use a separate profile (/tmp/chrome-debug-profile)
  • Same-site parallel tabs may get rate-limited
  • Node.js 22+ required (native WebSocket)
  • macOS: use full binary path to start Chrome, not open -a
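
The Node.js requirement can be checked before the proxy is started. The sketch below parses the major version out of a string in the form printed by `node --version` (for example `v22.3.0`); the function name `node_ok` is hypothetical.

```shell
# node_ok VERSION
# Succeeds if VERSION (e.g. "v22.3.0") has a major version of 22 or higher.
# Fails for empty or non-numeric input.
# (Hypothetical helper for illustration.)
node_ok() {
  node_major=${1#v}
  node_major=${node_major%%.*}
  [ "$node_major" -ge 22 ] 2>/dev/null
}
```

Usage: `node_ok "$(node --version)" || echo "Node.js 22+ required" >&2`.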

Site Patterns & Usage Log

~/.openclaw/skills/browser-cdp/references/site-patterns/   # per-domain experience
~/.openclaw/skills/browser-cdp/references/usage-log.md    # per-use tracking

Origin

Adapted from eze-is/web-access (MIT) for OpenClaw. A bug in the original (require() used in an ES module) is fixed in this version.
