Markdown Browser: Web Content Processing and Normalization


Author: Internet

2026-03-31

AI Tutorials

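What is Markdown Browser?

Markdown Browser is a technical orchestration skill that acts as a post-processing engine for web data. It is not a standalone fetcher. Instead, it complements the official fetch tool by providing a MECE (mutually exclusive, collectively exhaustive) architecture for processing web content. Every piece of retrieved data passes through strict policy, privacy, and normalization layers before it reaches the AI agent.

Integrating this tool into your Openclaw Skills workflow gives you a stable output schema and handles complex tasks such as HTML-to-Markdown conversion via Turndown, redaction of sensitive path segments in URLs, and automated policy decisions based on the Content-Signal response header. This makes it a valuable asset for developers building robust, privacy-aware AI applications.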

Download: https://github.com/openclaw/skills/tree/main/skills/johnortegahyc/markdown-browser-skills-openclaw

Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install markdown-browser-skills-openclaw

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

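3. Prompt-based Installation

Paste this prompt into OpenClaw to install the skill automatically.

Please use Clawhub to install markdown-browser-skills-openclaw. If Clawhub is not installed yet, install it first (npm i -g clawhub).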

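Markdown Browser Use Cases

Normalize heterogeneous web content types into a uniform Markdown format for AI consumption.
Redact sensitive URL query parameters and fragments to prevent data leaks in logs.
Enforce automated content policies driven by server-side signals, producing actions such as allow_input or block_input.
Prepare web data for downstream agent logic using a predictable, structured schema.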

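How Markdown Browser Works

The official web_fetch tool is called to retrieve the raw page data and response headers.
The raw JSON result of the fetch is passed into the Markdown Browser wrapper.
The policy layer analyzes response headers such as Content-Signal to compute the appropriate policy_action.
The privacy layer redacts sensitive path and query values in the URL while preserving the overall URL structure.
The normalization layer detects the content type and, when needed, converts HTML to Markdown as a fallback.
The skill outputs a refined JSON object containing the processed content, a token estimate, and compliance metadata.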
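The steps above can be sketched as one function that composes the layers in order. This is a hypothetical illustration, not the code in browser.js: the web_fetch payload field names it reads (url, status, contentType, text) are assumptions, and each layer is injected as a function (computePolicyAction, redactUrl, normalize, parseTokenEstimate are all hypothetical names) so the fetch-free, layer-by-layer order stays visible.

```javascript
// Hypothetical sketch of the wrapper's execution order. The field names read
// from the web_fetch payload (url, status, contentType, text) are assumptions.
function processWebFetchResult(input, layers) {
  const {
    web_fetch_result,
    content_signal_header = null,
    markdown_tokens_header = null,
  } = input;
  // Fetch layer is out of scope: the payload must come from the official web_fetch.
  if (!web_fetch_result || typeof web_fetch_result !== "object") {
    throw new TypeError("web_fetch_result is required");
  }

  // Policy layer: decide whether the content may be used as AI input.
  const policy_action = layers.computePolicyAction(content_signal_header);

  // Privacy layer: redact the source URL before it reaches logs or the agent.
  const source_url = layers.redactUrl(web_fetch_result.url);

  // Normalization layer: choose markdown, html-fallback, or text.
  const { content, format, fallback_used } = layers.normalize(
    web_fetch_result.contentType,
    web_fetch_result.text
  );

  // Assemble the stable output object.
  return {
    content,
    format,
    token_estimate: layers.parseTokenEstimate(markdown_tokens_header),
    content_signal: content_signal_header,
    policy_action,
    source_url,
    status_code: web_fetch_result.status ?? null,
    fallback_used,
  };
}
```

Injecting the layers keeps each one independently replaceable and testable, which matches the MECE split: no layer performs a network fetch, and each contributes distinct fields to the result.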

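Markdown Browser Setup Guide

To start using this skill in your Openclaw Skills environment, first install the runtime dependencies from inside the skill directory: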

npm install --omit=dev

Then run the wrapper via the CLI, supplying the saved web_fetch result and, optionally, response header values:

node browser.js \
  --input /tmp/web_fetch.json \
  --content-signal "ai-input=yes, search=yes" \
  --markdown-tokens "1200"

Markdown Browser Data Schema and Taxonomy

The skill returns a structured object with the following fields:

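Field	Description
content	Final processed text or Markdown content
format	Format type: markdown, html-fallback, or text
token_estimate	Numeric token estimate taken from the x-markdown-tokens response header (null if unavailable)
content_signal	Raw Content-Signal response header value
policy_action	Computed action: allow_input, block_input, or needs_review
source_url	Source URL, redacted for privacy
status_code	HTTP status code from the fetch result
fallback_used	Boolean indicating whether HTML-to-Markdown conversion was applied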
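To check that a result matches the table above, a small validator can be run on the wrapper's output. This sketch is hypothetical: the enum values come from the table, but accepting null for token_estimate, content_signal, source_url, and status_code, and requiring fallback_used to be true exactly when format is html-fallback, are assumptions about the schema.

```javascript
// Allowed enum values, taken from the field table.
const FORMATS = new Set(["markdown", "html-fallback", "text"]);
const POLICY_ACTIONS = new Set(["allow_input", "block_input", "needs_review"]);

// Returns a list of human-readable problems; an empty list means the object
// matches this (assumed) reading of the schema.
function validateWrapperOutput(out) {
  if (typeof out !== "object" || out === null) return ["output is not an object"];
  const problems = [];
  if (typeof out.content !== "string") problems.push("content must be a string");
  if (!FORMATS.has(out.format)) problems.push(`unknown format: ${out.format}`);
  if (out.token_estimate !== null && !Number.isFinite(out.token_estimate)) {
    problems.push("token_estimate must be a finite number or null");
  }
  if (out.content_signal !== null && typeof out.content_signal !== "string") {
    problems.push("content_signal must be a string or null");
  }
  if (!POLICY_ACTIONS.has(out.policy_action)) {
    problems.push(`unknown policy_action: ${out.policy_action}`);
  }
  if (out.source_url !== null && typeof out.source_url !== "string") {
    problems.push("source_url must be a string or null");
  }
  if (out.status_code !== null && !Number.isInteger(out.status_code)) {
    problems.push("status_code must be an integer or null");
  }
  if (typeof out.fallback_used !== "boolean") {
    problems.push("fallback_used must be a boolean");
  } else if (out.fallback_used !== (out.format === "html-fallback")) {
    // Assumed invariant: conversion happened if and only if format is html-fallback.
    problems.push("fallback_used disagrees with format");
  }
  return problems;
}
```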

name: markdown-browser
description: "Wrapper skill for OpenClaw web_fetch results. Use when you need MECE post-processing on fetched pages: policy decision from Content-Signal, privacy redaction, optional markdown normalization fallback, and stable output schema without re-implementing network fetch."

Markdown Browser Skills

This skill is an orchestration layer, not a replacement fetcher. It always keeps the official web_fetch as the source of truth for fetching.

MECE Architecture

Fetch layer (official, exclusive)

Use OpenClaw web_fetch to retrieve the page.
Do not call direct HTTP fetch inside this skill for normal operation.

Policy layer (this skill)

Parse Content-Signal and compute policy_action.
The current action logic focuses on ai-input semantics: allow_input, block_input, needs_review.
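One way to implement the policy layer is sketched below, assuming Content-Signal uses the comma-separated key=value form shown in the CLI examples (for instance "ai-input=yes, search=yes, ai-train=no"). The mapping from ai-input values to actions (yes to allow_input, no to block_input, anything else to needs_review) is an assumption, not the documented rule, and both function names are hypothetical.

```javascript
// Parse "ai-input=yes, search=yes, ai-train=no" into a Map of signal -> value.
// Keys and values are trimmed and lower-cased; malformed entries are skipped.
function parseContentSignal(header) {
  const signals = new Map();
  if (typeof header !== "string") return signals;
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    const key = part.slice(0, eq).trim().toLowerCase();
    const value = part.slice(eq + 1).trim().toLowerCase();
    if (key) signals.set(key, value);
  }
  return signals;
}

// Assumed mapping: ai-input=yes -> allow_input, ai-input=no -> block_input,
// anything else (no header, no ai-input entry, unexpected value) -> needs_review.
function computePolicyAction(header) {
  const aiInput = parseContentSignal(header).get("ai-input");
  if (aiInput === "yes") return "allow_input";
  if (aiInput === "no") return "block_input";
  return "needs_review";
}
```

Defaulting to needs_review rather than allow_input means a missing or malformed header never silently grants permission.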

Privacy layer (this skill)

Redact path/fragment/query values in output URL fields.
Keep URL shape useful for debugging without leaking sensitive values.
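The redaction rules are not spelled out here, so the following is only a sketch under stated assumptions. It uses the WHATWG URL class built into Node.js, replaces every query value and any fragment with a placeholder while keeping query keys, and treats a path segment as sensitive if it contains a digit or is longer than 24 characters. The heuristic, the REDACTED placeholder, and the name redactUrl are hypothetical.

```javascript
const PLACEHOLDER = "REDACTED";

// Hypothetical heuristic: a path segment containing a digit or longer than
// 24 characters is treated as an ID, token, or hash and is replaced.
function redactUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable absolute URL; do not echo it back
  }
  // Path: keep the number of segments so the URL shape stays recognizable.
  url.pathname = url.pathname
    .split("/")
    .map((seg) => (/\d/.test(seg) || seg.length > 24 ? PLACEHOLDER : seg))
    .join("/");
  // Query: keep every key (including repeats) but hide its value.
  const redacted = new URLSearchParams();
  for (const [key] of url.searchParams) redacted.append(key, PLACEHOLDER);
  url.search = redacted.toString();
  // Fragment: replace entirely if present.
  if (url.hash) url.hash = PLACEHOLDER;
  return url.toString();
}
```

For example, https://example.com/users/12345/profile?token=abc&lang=en#section becomes https://example.com/users/REDACTED/profile?token=REDACTED&lang=REDACTED#REDACTED, which still shows which endpoint and parameters were involved.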

Normalization layer (this skill)

If contentType=text/markdown, keep content as-is.
If contentType=text/html, convert it to Markdown with Turndown as a fallback enhancement.
For other content types, pass the text through unchanged.
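The three rules above translate into a small dispatch on the media type. In this hypothetical sketch the HTML converter is passed in as a function so the example has no dependencies; with the turndown package it could be (html) => new TurndownService().turndown(html). Ignoring parameters such as charset, and falling back to plain text when no converter is supplied, are assumptions.

```javascript
// Hypothetical normalization layer. htmlToMarkdown is injected; in the real
// skill this role is played by Turndown.
function normalizeContent(contentType, body, htmlToMarkdown) {
  // Compare only the media type, ignoring parameters like "; charset=utf-8".
  const mediaType = String(contentType || "").split(";")[0].trim().toLowerCase();
  const text = typeof body === "string" ? body : "";

  if (mediaType === "text/markdown") {
    // Already Markdown: keep content as-is.
    return { content: text, format: "markdown", fallback_used: false };
  }
  if (mediaType === "text/html" && typeof htmlToMarkdown === "function") {
    // HTML: convert as a fallback enhancement.
    return { content: htmlToMarkdown(text), format: "html-fallback", fallback_used: true };
  }
  // Everything else (or HTML with no converter available): pass text through.
  return { content: text, format: "text", fallback_used: false };
}
```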

Execution Order

Call the official web_fetch.
Pass the resulting JSON into this wrapper.
Optionally pass Content-Signal and x-markdown-tokens header values if available.
Use the returned normalized object for downstream agent logic.

Wrapper Tool

process_web_fetch_result({ web_fetch_result, content_signal_header, markdown_tokens_header })

Input:

web_fetch_result (required): JSON payload returned by OpenClaw web_fetch.
content_signal_header (optional): raw Content-Signal header string.
markdown_tokens_header (optional): raw x-markdown-tokens header value.
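The markdown_tokens_header input arrives as a raw string, while token_estimate is number | null. A minimal sketch of that conversion follows, assuming anything other than a plain non-negative integer should yield null; parseTokenEstimate is a hypothetical name.

```javascript
// Convert a raw x-markdown-tokens header value into token_estimate (number | null).
function parseTokenEstimate(header) {
  if (header === null || header === undefined) return null;
  const trimmed = String(header).trim();
  // Accept only a plain non-negative integer such as "1820".
  if (!/^\d+$/.test(trimmed)) return null;
  const n = Number(trimmed);
  // Reject values too large to represent exactly.
  return Number.isSafeInteger(n) ? n : null;
}
```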

Output:

content
format (markdown | html-fallback | text)
token_estimate (number | null)
content_signal
policy_action
source_url (redacted)
status_code
fallback_used

CLI Usage

# Install runtime dependency once inside the skill directory
npm install --omit=dev

# 1) Obtain a web_fetch payload first (from OpenClaw runtime)
# 2) Save it as /tmp/web_fetch.json
# 3) Run wrapper post-processing
node browser.js \
  --input /tmp/web_fetch.json \
  --content-signal "ai-input=yes, search=yes, ai-train=no" \
  --markdown-tokens "1820"
