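Playwright Browser Automation: Advanced Web Scraping - Openclaw Skills
Author: Internet
2026-03-26
What Is Playwright Browser Automation?
Playwright Browser Automation is a skill that gives AI agents direct, reliable control over a web browser. Unlike approaches that go through an intermediate protocol layer, the skill calls the Playwright API directly, which is intended to improve stability and performance. It lets agents navigate complex web applications in Chromium, Firefox, and WebKit, interact with dynamic elements, and extract data with high fidelity.
By adding this Openclaw Skill to your workflow, you can use user-facing locators instead of brittle selectors, so automation scripts stay robust when a site's markup changes. Whether you are building an automated test suite, scraping data from single-page applications (SPAs), or generating visual assets, the skill provides a solid foundation for modern web automation.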
Download: https://github.com/openclaw/skills/tree/main/skills/manikantasai1987/manikantasai-playwright-automation
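Installation and Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.
npx clawhub@latest install manikantasai-playwright-automation
2. Manual Installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt-Based Installation
Copy this prompt into OpenClaw to install the skill automatically.
Please install manikantasai-playwright-automation using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).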
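Playwright Browser Automation Use Cases
- Automate complex data extraction from JavaScript-heavy websites and single-page applications (SPAs).
- Generate PDF reports and high-resolution screenshots for visual regression or audits.
- Record browser session videos to document bug reproductions or user flows.
- Handle multi-step form submissions and authentication sequences, including cookie management.
- Test web application behavior under various mobile emulation and network conditions.
- The AI agent initializes a browser instance through the Playwright API, with support for Chromium, Firefox, or WebKit.
- A new browser context is created to provide an isolated environment and prevent session leakage between automation tasks.
- The skill uses resilient locators and auto-waiting to interact with the DOM, so an element is actionable before a command runs.
- Complex logic such as network interception or file handling is applied as the task requires.
- When finished, the skill captures the requested data or media and closes the browser context to free system resources.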
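The lifecycle in the last five items above (launch a browser, open an isolated context, act on the page, then close the context) can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the skill's actual implementation: `withIsolatedPage` is a hypothetical name, and the browser type is passed in so that `chromium`, `firefox`, or `webkit` from the `playwright` package can all be used.

```javascript
// Hypothetical helper (not part of Playwright or the skill): runs `task`
// against a fresh page in its own context and always cleans up afterwards.
async function withIsolatedPage(browserType, task, launchOptions = { headless: true }) {
  const browser = await browserType.launch(launchOptions);
  try {
    // A new context has its own cookies and storage, so separate tasks
    // cannot leak session state into one another.
    const context = await browser.newContext();
    try {
      const page = await context.newPage();
      return await task(page);
    } finally {
      await context.close();
    }
  } finally {
    await browser.close();
  }
}

// Example usage (assumes the `playwright` package is resolvable):
//   const { chromium } = require('playwright');
//   withIsolatedPage(chromium, async (page) => {
//     await page.goto('https://example.com');
//     return page.title();
//   }).then(console.log);
```

The nested `try`/`finally` blocks mean the context and browser are closed even when the task throws, which matches the "free system resources" step.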
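Playwright Browser Automation Configuration Guide
To get started with this skill, install the required package and browser binaries:
npm install -g playwright
npx playwright install chromium
# Optional browsers
npx playwright install firefox webkit
On Linux, make sure the system dependencies are installed:
sudo npx playwright install-deps chromium
Playwright Browser Automation Data Schema and Taxonomy
The skill produces several output formats and metadata files:
| Data type | Format | Description |
|---|---|---|
| Screenshot | .png | Image of a specific element or the full page. |
| Document export | .pdf | Print-ready PDF file (Chromium only). |
| Session recording | .webm | Video file capturing the whole automation sequence. |
| Authentication state | .json | Stored cookies and local storage for session persistence. |
| Trace log | .zip | Detailed execution trace for debugging with Playwright Trace Viewer. |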
name: manikantasai-playwright-automation
description: Browser automation using Playwright API directly. Navigate websites, interact with elements, extract data, take screenshots, generate PDFs, record videos, and automate complex workflows. More reliable than MCP approach.
metadata: {"openclaw":{"emoji":"??","os":["linux","darwin","win32"],"requires":{"bins":["node","npx"]},"install":[{"id":"npm-playwright","kind":"npm","package":"playwright","bins":["playwright"],"label":"Install Playwright"}]}}
Playwright Browser Automation
Direct Playwright API for reliable browser automation without MCP complexity.
Installation
# Install Playwright
npm install -g playwright
# Install browsers (one-time, ~100MB each)
npx playwright install chromium
# Optional:
npx playwright install firefox
npx playwright install webkit
# For system dependencies on Ubuntu/Debian:
sudo npx playwright install-deps chromium
Quick Start
const { chromium } = require('playwright');
(async () => {
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
Best Practices
1. Use Locators (Auto-waiting)
// GOOD: Role-, label-, and placeholder-based locators auto-wait and retry
await page.getByRole('button', { name: 'Submit' }).click();
await page.getByLabel('Username').fill('user');
await page.getByPlaceholder('Search').fill('query');
// DISCOURAGED: page.click() also auto-waits, but the raw selector is tied to markup details
await page.click('#submit');
2. Prefer User-Facing Attributes
// GOOD: Resilient to DOM changes (a locator does nothing until an action or waitFor() is called on it)
await page.getByRole('heading', { name: 'Welcome' }).waitFor();
await page.getByText('Sign in').click();
await page.getByTestId('login-button').click();
// BAD: Brittle CSS selectors
await page.click('.btn-primary > div:nth-child(2)');
3. Handle Dynamic Content
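// Wait for network idle (Playwright's docs discourage relying on 'networkidle'; prefer waiting for a specific element)
await page.goto('https://spa-app.com', { waitUntil: 'networkidle' });
// Wait for a specific element
await page.waitForSelector('.results-loaded');
// Wait for a custom condition evaluated in the page
await page.waitForFunction(() => document.querySelectorAll('.item').length > 0);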
4. Use Contexts for Isolation
// Each context = isolated session (cookies, storage)
const context = await browser.newContext();
const page = await context.newPage();
// Multiple pages in one context
const page2 = await context.newPage();
5. Network Interception
// Mock API responses
await page.route('**/api/users', route =>
  route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({ users: [] })
  })
);
// Block resources
await page.route('**/*.{png,jpg,css}', route => route.abort());
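A glob such as `**/*.{png,jpg,css}` matches on the URL only, so images served from extensionless URLs slip through. As an alternative sketch, the handler below decides by the request's resource type, using `route.request().resourceType()`. The helper name `blockResourceTypes` and its default list are assumptions chosen for illustration.

```javascript
// Hypothetical helper: builds a route handler that aborts requests whose
// resource type is in `blocked` and lets every other request continue.
function blockResourceTypes(blocked = ['image', 'font', 'media']) {
  const blockedSet = new Set(blocked);
  return async (route) => {
    if (blockedSet.has(route.request().resourceType())) {
      await route.abort();
    } else {
      await route.continue();
    }
  };
}

// Example usage on an existing Playwright page:
//   await page.route('**/*', blockResourceTypes(['image', 'stylesheet']));
```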
Common Patterns
Form Automation
// Fill form
await page.goto('https://example.com/login');
await page.getByLabel('Username').fill('myuser');
await page.getByLabel('Password').fill('mypass');
await page.getByRole('button', { name: 'Sign in' }).click();
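// Wait for navigation/result
// '**/dashboard' is a glob; a bare '/dashboard' only matches when the context sets baseURL
await page.waitForURL('**/dashboard');
// expect() comes from @playwright/test; with the plain library, wait on the locator instead
await page.getByText('Welcome').waitFor({ state: 'visible' });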
Data Extraction
// Extract table data
const rows = await page.$$eval('table tr', rows =>
  rows.map(row => ({
    name: row.querySelector('td:nth-child(1)')?.textContent,
    price: row.querySelector('td:nth-child(2)')?.textContent
  }))
);
// Extract with JavaScript evaluation
const data = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('.product')).map(p => ({
    title: p.querySelector('.title')?.textContent,
    price: p.querySelector('.price')?.textContent
  }));
});
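Values read through `textContent` usually carry surrounding whitespace, and a header row built from `<th>` cells yields `undefined` fields with the `td` selectors above. The following is a minimal post-processing sketch for rows shaped like `{ name, price }`. The helper names `parsePrice` and `cleanRows` are hypothetical, and the parser assumes prices use `.` as the decimal separator and `,` only as a thousands separator.

```javascript
// Hypothetical helper: turns text such as " $1,299.50 " into a number,
// or null when no number can be read from it.
function parsePrice(text) {
  if (typeof text !== 'string') return null;
  // Keep digits, dots, and minus signs; drop currency symbols and commas.
  const numeric = text.replace(/[^0-9.-]/g, '');
  const value = Number.parseFloat(numeric);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: drops rows without a usable name (for example
// header rows) and normalizes the remaining fields.
function cleanRows(rows) {
  return rows
    .filter((row) => typeof row.name === 'string' && row.name.trim() !== '')
    .map((row) => ({ name: row.name.trim(), price: parsePrice(row.price) }));
}

// Example:
//   cleanRows([{ name: ' Widget ', price: ' $1,299.50 ' }])
//   returns [{ name: 'Widget', price: 1299.5 }]
```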
Screenshots & PDFs
// Full page screenshot
await page.screenshot({ path: 'full.png', fullPage: true });
// Element screenshot
await page.locator('.chart').screenshot({ path: 'chart.png' });
// PDF (Chromium only)
await page.pdf({
  path: 'page.pdf',
  format: 'A4',
  printBackground: true
});
Video Recording
const context = await browser.newContext({
  recordVideo: {
    dir: './videos/',
    size: { width: 1920, height: 1080 }
  }
});
const page = await context.newPage();
// ... do stuff ...
await context.close(); // Video saved automatically
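Because the recording is saved into `dir` under a generated file name, a script often needs to find out where it ended up. The sketch below uses `page.video()` and `video.path()` for that. The helper name `closeAndGetVideoPath` is hypothetical, and the helper assumes the page belongs to the context being closed.

```javascript
// Hypothetical helper: closes the context, which finalizes the recording,
// and resolves to the video file path, or null if nothing was recorded.
async function closeAndGetVideoPath(context, page) {
  // page.video() is null unless recordVideo was set on the context.
  const video = page.video();
  await context.close();
  return video ? video.path() : null;
}

// Example usage:
//   const savedTo = await closeAndGetVideoPath(context, page);
//   console.log('Video written to', savedTo);
```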
Mobile Emulation
const context = await browser.newContext({
  viewport: { width: 375, height: 667 },
  userAgent: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0)',
  isMobile: true,
  hasTouch: true
});
Authentication
// Method 1: HTTP Basic Auth
const context = await browser.newContext({
  httpCredentials: { username: 'user', password: 'pass' }
});
// Method 2: Cookies
await context.addCookies([
  { name: 'session', value: 'abc123', domain: '.example.com', path: '/' }
]);
// Method 3: Local Storage
await page.evaluate(() => {
  localStorage.setItem('token', 'xyz');
});
// Method 4: Reuse auth state
await context.storageState({ path: 'auth.json' });
// Later: await browser.newContext({ storageState: 'auth.json' });
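Method 4 splits into a "save" step and a "later" step. One way to combine them, sketched here under stated assumptions, is a helper that loads the saved state when the file exists and otherwise runs the login once and saves the result. `contextWithAuth` and its `login` callback are hypothetical names. The callback stands for whatever form steps the target site needs.

```javascript
const fs = require('fs');

// Hypothetical helper: reuses a saved storage state when `statePath`
// exists; otherwise runs `login(page)` once and saves the state for next time.
async function contextWithAuth(browser, statePath, login) {
  if (fs.existsSync(statePath)) {
    return browser.newContext({ storageState: statePath });
  }
  const context = await browser.newContext();
  const page = await context.newPage();
  await login(page); // caller-supplied login steps
  await context.storageState({ path: statePath });
  await page.close();
  return context;
}

// Example usage:
//   const context = await contextWithAuth(browser, 'auth.json', async (page) => {
//     await page.goto('https://example.com/login');
//     await page.getByLabel('Username').fill('myuser');
//     await page.getByLabel('Password').fill('mypass');
//     await page.getByRole('button', { name: 'Sign in' }).click();
//   });
```

The state file contains live session cookies, so it is best kept out of version control.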
Advanced Features
File Upload/Download
// Upload
await page.setInputFiles('input[type="file"]', '/path/to/file.pdf');
// Download
const [download] = await Promise.all([
  page.waitForEvent('download'),
  page.click('a[download]')
]);
await download.saveAs('/path/to/save/' + download.suggestedFilename());
Dialogs Handling
page.on('dialog', async dialog => {
  // Once a dialog listener is registered, every dialog must be accepted or dismissed, or the page stalls
  if (dialog.type() === 'prompt') {
    await dialog.accept('My answer');
  } else {
    await dialog.accept(); // alert, confirm, beforeunload
  }
});
Frames & Shadow DOM
// Frame by name (page.frame() returns null if no frame matches)
const frame = page.frame('frame-name');
await frame.click('button');
// Frame by locator
const frameLocator = page.frameLocator('iframe').first();
await frameLocator.getByRole('button').click();
// Shadow DOM
await page.locator('my-component').locator('button').click();
Tracing (Debug)
await context.tracing.start({ screenshots: true, snapshots: true });
// ... run tests ...
await context.tracing.stop({ path: 'trace.zip' });
// View at https://trace.playwright.dev
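A trace is most useful when a run fails, so the stop call should execute even if the traced steps throw. A minimal sketch of that pattern follows. `withTrace` is a hypothetical helper name, and it takes an already created Playwright browser context.

```javascript
// Hypothetical helper: records a trace around `fn` and writes it to
// `tracePath` whether `fn` succeeds or throws.
async function withTrace(context, tracePath, fn) {
  await context.tracing.start({ screenshots: true, snapshots: true });
  try {
    return await fn();
  } finally {
    await context.tracing.stop({ path: tracePath });
  }
}

// Example usage:
//   await withTrace(context, 'trace.zip', async () => {
//     const page = await context.newPage();
//     await page.goto('https://example.com');
//   });
```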
Configuration Options
const browser = await chromium.launch({
  headless: true, // Run without UI
  slowMo: 50, // Slow each operation down by 50 ms (for debugging)
  devtools: false, // Set to true to open DevTools (Chromium only)
  args: ['--no-sandbox', '--disable-setuid-sandbox'] // Docker/Ubuntu; disables the Chromium sandbox
});
const context = await browser.newContext({
  viewport: { width: 1920, height: 1080 },
  locale: 'ru-RU',
  timezoneId: 'Europe/Moscow',
  geolocation: { latitude: 55.7558, longitude: 37.6173 },
  permissions: ['geolocation'],
  userAgent: 'Custom Agent',
  bypassCSP: true, // Bypass Content Security Policy
});
Error Handling
// Fail with an explicit per-action timeout (the click itself retries until the timeout expires)
try {
  await page.getByRole('button', { name: 'Load' }).click({ timeout: 10000 });
} catch (e) {
  console.log('Button not found or not clickable within 10 s');
}
// Check if element exists
const hasButton = await page.getByRole('button').count() > 0;
// Wait with custom condition
await page.waitForFunction(() =>
  document.querySelectorAll('.loaded').length >= 10
);
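The `try`/`catch` above makes a single attempt bounded by a timeout. When a whole step should be re-run after a failure (for example, reloading and clicking again), a small retry wrapper is one option. The sketch below is a generic helper, not a Playwright API. The name `retry` and the defaults of 3 attempts and a 500 ms base delay are assumptions.

```javascript
// Hypothetical helper: runs an async action up to `attempts` times
// (assumed to be at least 1), doubling the wait after each failure.
async function retry(action, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await action(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts - 1) {
        // Exponential backoff: delayMs, 2 * delayMs, 4 * delayMs, ...
        await new Promise((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}

// Example usage:
//   await retry(() =>
//     page.getByRole('button', { name: 'Load' }).click({ timeout: 10000 })
//   );
```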
Sudoers Setup
To let the Playwright dependency and browser installation commands run under sudo without a password prompt:
# /etc/sudoers.d/playwright
username ALL=(root) NOPASSWD: /usr/bin/npx playwright install-deps *
username ALL=(root) NOPASSWD: /usr/bin/npx playwright install *
