Jina AI Skill: Web Reading, Search, and DeepSearch - Openclaw Skills

Author: Internet

2026-03-24

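What Is the Jina AI Reader and Search Skill?

The Jina AI skill for Openclaw Skills gives AI agents an interface to web content. Using Jina's APIs, it lets an agent skip raw HTML and retrieve clean, Markdown-formatted text from a URL, including JavaScript-heavy sites and PDF documents. It acts as a bridge between raw web pages and large language models, delivering content in an easily readable form.

Beyond page reading, the skill includes web search and a multi-step research agent. An agent can run web searches that return LLM-optimized results, or use DeepSearch, which combines search, reading, and reasoning to answer complex queries that require synthesizing several sources.

Download: https://github.com/openclaw/skills/tree/main/skills/adhishthite/jina-ai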

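Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install jina-ai

2. Manual Installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/ Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-Based Installation

Copy this prompt into OpenClaw to install the skill automatically.

Please install jina-ai for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).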

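Jina AI Reader and Search: Use Cases

  • Convert documentation pages or blog posts to clean Markdown for RAG or context injection.
  • Run web searches that return full page content for the agent to analyze.
  • Use the DeepSearch agent for in-depth, multi-source research on technical topics or current events.
  • Extract specific data from a web page by targeting custom CSS selectors.
  • Parse remote PDF files to text directly inside Openclaw Skills for document processing.

Jina AI Reader and Search: How It Works
  1. The agent or user calls one of the helper scripts (reader, search, or deepsearch) with a target URL or query.
  2. The skill attaches JINA_API_KEY to the request headers to authenticate with Jina AI.
  3. For URL reading, the request goes to the Reader API, which renders the page and strips non-core elements such as ads and navigation.
  4. For search, the query goes through the Search API, which finds the most relevant web results and parses them to Markdown.
  5. For complex tasks, DeepSearch runs a multi-step reasoning chain that gathers and synthesizes data from across the web.
  6. The resulting clean text or structured JSON is returned to the environment for the agent to use in its workflow.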
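
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual scripts: `build_request` is a hypothetical helper that only constructs the request (method, URL, headers, body) and sends nothing. The endpoint URLs and the `jina-deepsearch-v1` model name are taken from the skill documentation further down this page.

```python
import json
from urllib.parse import quote_plus

READER_BASE = "https://r.jina.ai/"
SEARCH_BASE = "https://s.jina.ai/"
DEEPSEARCH_URL = "https://deepsearch.jina.ai/v1/chat/completions"


def build_request(mode, target, api_key):
    """Return (method, url, headers, body) for one of the three modes.

    Hypothetical helper: mirrors steps 1-5 but performs no network I/O.
    """
    headers = {"Authorization": f"Bearer {api_key}"}  # step 2: authenticate
    if mode == "reader":  # step 3: target is the URL to read
        return "GET", READER_BASE + target, headers, None
    if mode == "search":  # step 4: target is a search query
        return "GET", SEARCH_BASE + quote_plus(target), headers, None
    if mode == "deepsearch":  # step 5: target is a research question
        headers["Content-Type"] = "application/json"
        body = json.dumps({
            "model": "jina-deepsearch-v1",
            "messages": [{"role": "user", "content": target}],
            "stream": False,
        })
        return "POST", DEEPSEARCH_URL, headers, body
    raise ValueError(f"unknown mode: {mode!r}")
```

The returned tuple can be handed to any HTTP client; keeping construction separate from sending makes the routing logic easy to check offline.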

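Jina AI Reader and Search: Configuration Guide

To integrate this skill into your Openclaw Skills environment, follow these steps:

# 1. Get your API key from the Jina AI dashboard (https://jina.ai/)
# 2. Set the environment variable
export JINA_API_KEY="your_jina_api_key_here"

# 3. Test the reader script
./scripts/jina-reader.sh https://example.com

# 4. Test the search script
./scripts/jina-search.sh "latest news about Openclaw Skills"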

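Jina AI Reader and Search: Data Schema and Taxonomy

The skill organizes web data into structured formats suited to consumption by AI. The table below describes the main outputs:

| Output format | Description |
| --- | --- |
| Text/Markdown | Default output; a clean, condensed version of the page content. |
| JSON object | Includes metadata such as title, url, content, and timestamp. |
| DeepSearch response | An OpenAI-compatible chat completion object containing the synthesized research result. |
| Screenshot | A visual capture of the page, returned when the X-Respond-With: screenshot header is used. |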
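
As an illustration of consuming the JSON output, the sketch below pulls the four documented fields out of a response body. The exact envelope is an assumption: the helper (a hypothetical name, `extract_fields`) accepts the fields either at the top level or nested under a `data` key, and returns `None` for anything missing.

```python
import json

FIELDS = ("title", "url", "content", "timestamp")


def extract_fields(raw):
    """Pull the documented metadata fields out of a Reader JSON response.

    Assumption: the fields sit at the top level or under a "data" key.
    Missing fields come back as None instead of raising.
    """
    doc = json.loads(raw)
    payload = doc.get("data", doc) if isinstance(doc, dict) else {}
    if not isinstance(payload, dict):
        payload = {}
    return {name: payload.get(name) for name in FIELDS}
```
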
name: jina
description: Web reading and searching via Jina AI APIs. Fetch clean markdown from URLs (r.jina.ai), web search (s.jina.ai), or deep multi-step research (DeepSearch).
homepage: "https://github.com/adhishthite/jina-ai-skill"
metadata:
  {
    "clawdbot":
      {
        "emoji": "??",
        "requires": { "env": ["JINA_API_KEY"] },
        "primaryEnv": "JINA_API_KEY",
        "files": ["scripts/*"],
      },
  }

Jina AI — Reader, Search & DeepSearch

Web reading and search powered by Jina AI. Requires JINA_API_KEY environment variable.

Trust & Privacy: By using this skill, URLs and queries are transmitted to Jina AI (jina.ai). Only install if you trust Jina with your data.

Model Invocation: This skill may be invoked autonomously by the model without explicit user trigger (standard for integration skills). If you prefer manual-only invocation, disable model invocation in your OpenClaw skill settings.

Get your API key: https://jina.ai/ → Dashboard → API Keys

External Endpoints

This skill makes HTTP requests to the following external endpoints only:

| Endpoint | URL Pattern | Purpose |
| --- | --- | --- |
| Reader API | https://r.jina.ai/{url} | Sends URL content request to Jina for conversion to markdown |
| Search API | https://s.jina.ai/{query} | Sends search query to Jina for web search results |
| DeepSearch API | https://deepsearch.jina.ai/v1/chat/completions | Sends research question to Jina for multi-step research |

No other external network calls are made by this skill.

Security & Privacy

  • Authentication: Only your JINA_API_KEY is transmitted to Jina's servers (via Authorization header)
  • Data sent: URLs and search queries you provide are sent to Jina's servers for processing
  • Local files: No local files are read or transmitted by this skill
  • Local storage: No data is stored locally beyond stdout output
  • Environment access: Scripts only access the JINA_API_KEY environment variable; no other env vars are read
  • Cookies: Cookies are not forwarded by default; the X-Set-Cookie header is available for authenticated content but is opt-in only

Endpoints

| Endpoint | Base URL | Purpose |
| --- | --- | --- |
| Reader | https://r.jina.ai/{url} | Convert any URL → clean markdown |
| Search | https://s.jina.ai/{query} | Web search with LLM-friendly results |
| DeepSearch | https://deepsearch.jina.ai/v1/chat/completions | Multi-step research agent |

All endpoints accept Authorization: Bearer $JINA_API_KEY.


Reader API (r.jina.ai)

Fetches any URL and returns clean, LLM-friendly content. Works with web pages, PDFs, and JS-heavy sites.

Basic Usage

# Plain text output
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"

# JSON output (includes url, title, content, timestamp)
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"

Or use the helper script: scripts/jina-reader.sh <url> [--json]

Parameters (via headers or query params)

Content Control

| Header | Query Param | Values | Default | Description |
| --- | --- | --- | --- | --- |
| X-Respond-With | respondWith | content, markdown, html, text, screenshot, pageshot, vlm, readerlm-v2 | content | Output format |
| X-Retain-Images | retainImages | none, all, alt, all_p, alt_p | all | Image handling |
| X-Retain-Links | retainLinks | none, all, text, gpt-oss | all | Link handling |
| X-With-Generated-Alt | withGeneratedAlt | true/false | false | Auto-caption images |
| X-With-Links-Summary | withLinksSummary | true | - | Append links section |
| X-With-Images-Summary | withImagesSummary | true/false | false | Append images section |
| X-Token-Budget | tokenBudget | number | - | Max tokens for response |

CSS Selectors

| Header | Query Param | Description |
| --- | --- | --- |
| X-Target-Selector | targetSelector | Only extract matching elements |
| X-Wait-For-Selector | waitForSelector | Wait for elements before extracting |
| X-Remove-Selector | removeSelector | Remove elements before extraction |

Browser & Network

| Header | Query Param | Description |
| --- | --- | --- |
| X-Timeout | timeout | Page load timeout (1-180s) |
| X-Respond-Timing | respondTiming | When page is "ready" (html, network-idle, etc.) |
| X-No-Cache | noCache | Bypass cached content |
| X-Proxy | proxy | Country code or auto for proxy |
| X-Set-Cookie | setCookies | Forward cookies for authenticated content |
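
To show how these parameters translate into a request, here is a small sketch that turns keyword options into the headers listed above. `reader_headers` is a hypothetical helper, not part of the skill. It covers only a subset of the headers, enforces the documented 1-180 second timeout range, and assumes `true` is an accepted value for X-No-Cache, which this page does not state.

```python
def reader_headers(api_key, *, respond_with=None, target_selector=None,
                   remove_selector=None, wait_for_selector=None,
                   timeout=None, no_cache=False):
    """Build Reader request headers from optional settings (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    optional = {
        "X-Respond-With": respond_with,
        "X-Target-Selector": target_selector,
        "X-Remove-Selector": remove_selector,
        "X-Wait-For-Selector": wait_for_selector,
    }
    # Only send headers the caller actually set, so API defaults still apply.
    headers.update({k: v for k, v in optional.items() if v is not None})
    if timeout is not None:
        if not 1 <= timeout <= 180:
            raise ValueError("timeout must be between 1 and 180 seconds")
        headers["X-Timeout"] = str(timeout)
    if no_cache:
        headers["X-No-Cache"] = "true"  # assumed value; not given in the table
    return headers
```

For example, `reader_headers(key, remove_selector="nav, footer", timeout=30)` produces the Authorization header plus X-Remove-Selector and X-Timeout, and nothing else.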

Common Patterns

# Extract main content, remove navigation elements
curl -s "https://r.jina.ai/https://example.com/article" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Retain-Images: none" \
  -H "X-Remove-Selector: nav, footer, .sidebar, .ads" \
  -H "Accept: text/plain"

# Extract specific section
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Target-Selector: article.main-content"

# Parse a PDF
curl -s "https://r.jina.ai/https://example.com/paper.pdf" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"

# Wait for dynamic content
curl -s "https://r.jina.ai/https://spa-app.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Wait-For-Selector: .loaded-content" \
  -H "X-Respond-Timing: network-idle"

Search API (s.jina.ai)

Web search returning LLM-friendly results with full page content.

Basic Usage

# Plain text
curl -s "https://s.jina.ai/your+search+query" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"

# JSON
curl -s "https://s.jina.ai/your+search+query" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"

Or use the helper script: scripts/jina-search.sh "<query>" [--json]

Search Parameters

| Param | Values | Description |
| --- | --- | --- |
| site | domain | Limit to specific site |
| type | web, images, news | Search type |
| num / count | 0-20 | Number of results |
| gl | country code | Geo-location (e.g. us, in) |
| filetype | extension | Filter by file type |
| intitle | string | Must appear in title |

All Reader parameters also work on search results.
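
A search URL in the form used by the examples in this section can be assembled with the standard library. The sketch below is illustrative only: `search_url` is a hypothetical helper, it accepts just the parameters from the table above (using `num` rather than `count`), and it builds the URL without sending a request.

```python
from urllib.parse import quote_plus, urlencode

ALLOWED_PARAMS = ("site", "type", "num", "gl", "filetype", "intitle")


def search_url(query, **params):
    """Build an s.jina.ai URL from a query and documented search parameters."""
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    if "num" in params and not 0 <= params["num"] <= 20:
        raise ValueError("num must be between 0 and 20")
    # quote_plus turns spaces into '+', matching the path style used here.
    url = "https://s.jina.ai/" + quote_plus(query)
    if params:
        url += "?" + urlencode(params)
    return url
```

Rejecting unknown keys up front catches typos such as `sites=` before they turn into a silently ignored query parameter.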

Common Patterns

# Site-scoped search
curl -s "https://s.jina.ai/OpenAI+GPT-5?site=reddit.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"

# News search
curl -s "https://s.jina.ai/latest+AI+news?type=news&num=5" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"

# Search for PDFs
curl -s "https://s.jina.ai/machine+learning+survey?filetype=pdf&num=5" \
  -H "Authorization: Bearer $JINA_API_KEY"

DeepSearch

Multi-step research agent that combines search + reading + reasoning. OpenAI-compatible chat completions API.

curl -s "https://deepsearch.jina.ai/v1/chat/completions" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jina-deepsearch-v1",
    "messages": [{"role": "user", "content": "Your research question here"}],
    "stream": false
  }'

Or use the helper script: scripts/jina-deepsearch.sh "<question>"

Use for complex research requiring multiple sources and reasoning chains.
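
Because the response follows the OpenAI chat completions shape, the answer text sits at `choices[0].message.content` when `stream` is false. A minimal sketch for reading it from an already-decoded response follows; `answer_text` is a hypothetical helper, and any DeepSearch-specific fields beyond the standard shape are deliberately ignored.

```python
def answer_text(response):
    """Return the assistant's answer from a non-streaming chat completion dict.

    Returns an empty string when the response has no choices or no content.
    """
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    return message.get("content") or ""
```
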


Helper Scripts

| Script | Purpose |
| --- | --- |
| scripts/jina-reader.sh | Read any URL as markdown |
| scripts/jina-search.sh | Web search |
| scripts/jina-deepsearch.sh | Deep multi-step research |
| scripts/jina-reader.py | Python reader (no deps beyond stdlib) |

Rate Limits

  • Free (no key): 20 RPM
  • With API key: Higher limits, token-based pricing
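
One way to stay under a requests-per-minute cap on the client side is a sliding-window limiter. The sketch below is an illustration, not part of the skill: `RpmLimiter` is a hypothetical class, its defaults mirror the 20 RPM keyless figure above, and the clock is injectable so the logic can be exercised without real waiting.

```python
import time
from collections import deque


class RpmLimiter:
    """Sliding-window limiter: at most max_calls within any `period` seconds."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def record(self):
        """Note that a call was just made."""
        self._calls.append(self.clock())
```

A caller would typically `time.sleep(limiter.wait_time())`, send the request, then call `limiter.record()`.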

API Docs

  • Reader: https://jina.ai/reader
  • Search: https://s.jina.ai/docs
  • OpenAPI specs: https://r.jina.ai/openapi.json | https://s.jina.ai/openapi.json

When to Use

| Need | Use |
| --- | --- |
| Fetch a URL as markdown | Reader (better than web_fetch for JS-heavy sites) |
| Web search | Search (LLM-friendly results) |
| Complex multi-source research | DeepSearch |
| Parse a PDF from URL | Reader (pass the PDF URL directly) |
| Screenshot a page | Reader with X-Respond-With: screenshot |
| Extract structured data | Reader with jsonSchema param |