Jina AI Skill: Web Reading, Search, and DeepSearch - Openclaw Skills
Author: Internet
2026-03-24
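What Is Jina AI Reader and Search?
The Jina AI skill for Openclaw Skills gives AI agents a robust interface for working with web content. Using Jina's dedicated APIs, it lets an agent bypass complex HTML structure and retrieve clean, Markdown-formatted text from any URL, including JavaScript-heavy sites and PDF documents. It acts as a bridge between the raw web and large language models, delivering data in a highly readable format.
Beyond simple page reading, this Openclaw Skills integration also includes advanced search and a multi-step research agent. With these tools an agent can run broad web searches that return LLM-optimized results, or use DeepSearch, which combines search, reading, and reasoning to answer complex queries that require synthesizing multiple sources.
Download: https://github.com/openclaw/skills/tree/main/skills/adhishthite/jina-ai
Installation and Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.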
npx clawhub@latest install jina-ai
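2. Manual installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt-based installation
Copy this prompt into OpenClaw to install the skill automatically.
Please use Clawhub to install jina-ai for me. If Clawhub is not installed yet, install it first (npm i -g clawhub).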
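Jina AI Reader and Search Use Cases
- Convert documentation pages or blog posts into clean Markdown for RAG or context injection.
- Run web searches that return full page content for the agent to analyze.
- Use the DeepSearch agent for in-depth, multi-source research on technical topics or current events.
- Extract specific data from web pages by targeting custom CSS selectors.
- Parse remote PDF files into text directly within Openclaw Skills for document processing.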
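How Jina AI Reader and Search Works
- The agent or user invokes one of the helper scripts (reader, search, or deepsearch) with a target URL or query.
- The skill attaches JINA_API_KEY to the request headers to authenticate with Jina AI services.
- For URL reading, the request goes to the Reader API, which renders the page and strips non-core elements such as ads and navigation.
- For search, the query is handled by the Search API, which finds the most relevant web results and parses them into Markdown.
- For complex tasks, DeepSearch runs a multi-step reasoning chain that gathers and synthesizes data from across the web.
- The resulting clean text or structured JSON is returned to the environment for the agent to use in its workflow.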
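The request flow described above can be sketched in Python as a small dispatcher. This is an illustrative sketch, not the skill's actual scripts: the function name `build_request` is hypothetical, and percent-encoding the search query into the path is an assumption. The endpoints, the Bearer header, and the DeepSearch body come from this page. The function only builds the request; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

READER_BASE = "https://r.jina.ai/"
SEARCH_BASE = "https://s.jina.ai/"
DEEPSEARCH_URL = "https://deepsearch.jina.ai/v1/chat/completions"


def build_request(kind: str, target: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request for one helper (hypothetical name)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if kind == "reader":
        # The target URL is appended to the Reader base as-is.
        return urllib.request.Request(READER_BASE + target, headers=headers)
    if kind == "search":
        # Assumption: the query is percent-encoded into the URL path.
        return urllib.request.Request(
            SEARCH_BASE + urllib.parse.quote(target), headers=headers
        )
    if kind == "deepsearch":
        body = json.dumps({
            "model": "jina-deepsearch-v1",
            "messages": [{"role": "user", "content": target}],
            "stream": False,
        }).encode("utf-8")
        headers["Content-Type"] = "application/json"
        return urllib.request.Request(
            DEEPSEARCH_URL, data=body, headers=headers, method="POST"
        )
    raise ValueError(f"unknown helper: {kind}")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call; keeping construction separate makes the routing easy to inspect without network access.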
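Jina AI Reader and Search Configuration Guide
To integrate this skill into your Openclaw Skills environment, follow these steps:
# 1. Get your API key from the Jina AI dashboard (https://jina.ai/)
# 2. Set the environment variable
export JINA_API_KEY="your_jina_api_key_here"
# 3. Test the reader script
./scripts/jina-reader.sh https://example.com
# 4. Test the search script
./scripts/jina-search.sh "latest news about Openclaw Skills"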
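Jina AI Reader and Search Data Schema and Taxonomy
The skill organizes web data into structured formats optimized for consumption by AI. The table below describes the main outputs:
| Output format | Description |
|---|---|
| Text/Markdown | Default output; a clean, condensed version of the page content. |
| JSON object | Includes metadata such as title, url, content, and timestamp. |
| DeepSearch response | An OpenAI-compatible chat completion object containing the synthesized research findings. |
| Screenshot | A visual capture of the page, returned when the X-Respond-With: screenshot header is used. |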
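As a rough illustration of consuming the JSON output, the sketch below pulls the four fields named in the table into a small record. The names `ReaderResult` and `parse_reader_json` are hypothetical, and the exact response envelope is an assumption: the code accepts the fields either at the top level or nested under a `data` key, and leaves missing fields as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReaderResult:
    title: Optional[str]
    url: Optional[str]
    content: Optional[str]
    timestamp: Optional[str]


def parse_reader_json(payload: dict) -> ReaderResult:
    """Extract title/url/content/timestamp from a decoded JSON response."""
    # Assumption: the fields may be wrapped in a "data" object.
    inner = payload.get("data")
    record = inner if isinstance(inner, dict) else payload
    return ReaderResult(
        title=record.get("title"),
        url=record.get("url"),
        content=record.get("content"),
        timestamp=record.get("timestamp"),
    )
```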
name: jina
description: Web reading and searching via Jina AI APIs. Fetch clean markdown from URLs (r.jina.ai), web search (s.jina.ai), or deep multi-step research (DeepSearch).
homepage: "https://github.com/adhishthite/jina-ai-skill"
metadata:
{
"clawdbot":
{
"emoji": "??",
"requires": { "env": ["JINA_API_KEY"] },
"primaryEnv": "JINA_API_KEY",
"files": ["scripts/*"],
},
}
Jina AI — Reader, Search & DeepSearch
Web reading and search powered by Jina AI. Requires JINA_API_KEY environment variable.
Trust & Privacy: By using this skill, URLs and queries are transmitted to Jina AI (jina.ai). Only install if you trust Jina with your data.
Model Invocation: This skill may be invoked autonomously by the model without explicit user trigger (standard for integration skills). If you prefer manual-only invocation, disable model invocation in your OpenClaw skill settings.
Get your API key: https://jina.ai/ → Dashboard → API Keys
External Endpoints
This skill makes HTTP requests to the following external endpoints only:
| Endpoint | URL Pattern | Purpose |
|---|---|---|
| Reader API | https://r.jina.ai/{url} | Sends URL content request to Jina for conversion to markdown |
| Search API | https://s.jina.ai/{query} | Sends search query to Jina for web search results |
| DeepSearch API | https://deepsearch.jina.ai/v1/chat/completions | Sends research question to Jina for multi-step research |
No other external network calls are made by this skill.
Security & Privacy
- Authentication: Only your JINA_API_KEY is transmitted to Jina's servers (via the Authorization header)
- Data sent: URLs and search queries you provide are sent to Jina's servers for processing
- Local files: No local files are read or transmitted by this skill
- Local storage: No data is stored locally beyond stdout output
- Environment access: Scripts only access the JINA_API_KEY environment variable; no other env vars are read
- Cookies: Cookies are not forwarded by default; the X-Set-Cookie header is available for authenticated content but is opt-in only
Endpoints
| Endpoint | Base URL | Purpose |
|---|---|---|
| Reader | https://r.jina.ai/{url} | Convert any URL → clean markdown |
| Search | https://s.jina.ai/{query} | Web search with LLM-friendly results |
| DeepSearch | https://deepsearch.jina.ai/v1/chat/completions | Multi-step research agent |
All endpoints accept Authorization: Bearer $JINA_API_KEY.
Reader API (r.jina.ai)
Fetches any URL and returns clean, LLM-friendly content. Works with web pages, PDFs, and JS-heavy sites.
Basic Usage
# Plain text output
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"
# JSON output (includes url, title, content, timestamp)
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"
Or use the helper script: scripts/jina-reader.sh
Parameters (via headers or query params)
Content Control
| Header | Query Param | Values | Default | Description |
|---|---|---|---|---|
| X-Respond-With | respondWith | content, markdown, html, text, screenshot, pageshot, vlm, readerlm-v2 | content | Output format |
| X-Retain-Images | retainImages | none, all, alt, all_p, alt_p | all | Image handling |
| X-Retain-Links | retainLinks | none, all, text, gpt-oss | all | Link handling |
| X-With-Generated-Alt | withGeneratedAlt | true/false | false | Auto-caption images |
| X-With-Links-Summary | withLinksSummary | true | - | Append links section |
| X-With-Images-Summary | withImagesSummary | true/false | false | Append images section |
| X-Token-Budget | tokenBudget | number | - | Max tokens for response |
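One way to avoid typos in these headers is to validate values against the table before sending. The sketch below is a minimal illustration; the function name `content_headers` and its keyword arguments are hypothetical, while the header names and allowed values are copied from the table above.

```python
from typing import Optional

# Allowed values, as listed in the Content Control table.
RESPOND_WITH = {"content", "markdown", "html", "text",
                "screenshot", "pageshot", "vlm", "readerlm-v2"}
RETAIN_IMAGES = {"none", "all", "alt", "all_p", "alt_p"}
RETAIN_LINKS = {"none", "all", "text", "gpt-oss"}


def content_headers(respond_with: Optional[str] = None,
                    retain_images: Optional[str] = None,
                    retain_links: Optional[str] = None,
                    token_budget: Optional[int] = None) -> dict:
    """Return only the headers the caller set, rejecting unknown values."""
    headers = {}
    checks = [
        ("X-Respond-With", respond_with, RESPOND_WITH),
        ("X-Retain-Images", retain_images, RETAIN_IMAGES),
        ("X-Retain-Links", retain_links, RETAIN_LINKS),
    ]
    for name, value, allowed in checks:
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"{name}: unsupported value {value!r}")
        headers[name] = value
    if token_budget is not None:
        if token_budget <= 0:
            raise ValueError("X-Token-Budget must be a positive number")
        headers["X-Token-Budget"] = str(token_budget)
    return headers
```

Options left as `None` are omitted, so the API's defaults (content, all, all) apply.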
CSS Selectors
| Header | Query Param | Description |
|---|---|---|
| X-Target-Selector | targetSelector | Only extract matching elements |
| X-Wait-For-Selector | waitForSelector | Wait for elements before extracting |
| X-Remove-Selector | removeSelector | Remove elements before extraction |
Browser & Network
| Header | Query Param | Description |
|---|---|---|
| X-Timeout | timeout | Page load timeout (1-180s) |
| X-Respond-Timing | respondTiming | When page is "ready" (html, network-idle, etc.) |
| X-No-Cache | noCache | Bypass cached content |
| X-Proxy | proxy | Country code or auto for proxy |
| X-Set-Cookie | setCookies | Forward cookies for authenticated content |
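The selector and timeout headers can be assembled the same way. The following sketch assumes several selectors are joined into one comma-separated value, and enforces the 1-180 second timeout range stated in the table. The function name `extraction_headers` and its arguments are hypothetical.

```python
from typing import Iterable, Optional


def extraction_headers(target: Optional[str] = None,
                       remove: Iterable[str] = (),
                       wait_for: Optional[str] = None,
                       timeout: Optional[int] = None) -> dict:
    """Build CSS-selector and timeout headers for a Reader request."""
    headers = {}
    if target:
        headers["X-Target-Selector"] = target
    remove_list = list(remove)
    if remove_list:
        # Assumption: multiple selectors go in one comma-separated value.
        headers["X-Remove-Selector"] = ", ".join(remove_list)
    if wait_for:
        headers["X-Wait-For-Selector"] = wait_for
    if timeout is not None:
        if not 1 <= timeout <= 180:
            raise ValueError("timeout must be between 1 and 180 seconds")
        headers["X-Timeout"] = str(timeout)
    return headers
```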
Common Patterns
# Extract main content, remove navigation elements
curl -s "https://r.jina.ai/https://example.com/article" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Retain-Images: none" \
  -H "X-Remove-Selector: nav, footer, .sidebar, .ads" \
  -H "Accept: text/plain"
# Extract specific section
curl -s "https://r.jina.ai/https://example.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Target-Selector: article.main-content"
# Parse a PDF
curl -s "https://r.jina.ai/https://example.com/paper.pdf" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"
# Wait for dynamic content
curl -s "https://r.jina.ai/https://spa-app.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "X-Wait-For-Selector: .loaded-content" \
  -H "X-Respond-Timing: network-idle"
Search API (s.jina.ai)
Web search returning LLM-friendly results with full page content.
Basic Usage
# Plain text
curl -s "https://s.jina.ai/your+search+query" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"
# JSON
curl -s "https://s.jina.ai/your+search+query" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"
Or use the helper script: scripts/jina-search.sh "
Search Parameters
| Param | Values | Description |
|---|---|---|
| site | domain | Limit to specific site |
| type | web, images, news | Search type |
| num / count | 0-20 | Number of results |
| gl | country code | Geo-location (e.g. us, in) |
| filetype | extension | Filter by file type |
| intitle | string | Must appear in title |
All Reader parameters also work on search results.
Common Patterns
# Site-scoped search
curl -s "https://s.jina.ai/OpenAI+GPT-5?site=reddit.com" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: text/plain"
# News search
curl -s "https://s.jina.ai/latest+AI+news?type=news&num=5" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Accept: application/json"
# Search for PDFs
curl -s "https://s.jina.ai/machine+learning+survey?filetype=pdf&num=5" \
  -H "Authorization: Bearer $JINA_API_KEY"
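Search URLs like the ones in these patterns can be built programmatically instead of by hand. The sketch below uses only the Python standard library. The function name `search_url` is hypothetical, and the validation simply mirrors the Search Parameters table (type in web/images/news, num between 0 and 20).

```python
import urllib.parse

SEARCH_TYPES = {"web", "images", "news"}


def search_url(query: str, **params) -> str:
    """Build an s.jina.ai URL: query in the path, options in the query string."""
    if "type" in params and params["type"] not in SEARCH_TYPES:
        raise ValueError(f"unsupported search type: {params['type']!r}")
    if "num" in params and not 0 <= int(params["num"]) <= 20:
        raise ValueError("num must be between 0 and 20")
    # quote_plus turns spaces into '+', matching the curl examples.
    path = urllib.parse.quote_plus(query)
    qs = urllib.parse.urlencode(
        {k: v for k, v in params.items() if v is not None}
    )
    return f"https://s.jina.ai/{path}" + (f"?{qs}" if qs else "")
```

For example, `search_url("latest AI news", type="news", num=5)` reproduces the news-search URL shown above.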
DeepSearch
Multi-step research agent that combines search + reading + reasoning. OpenAI-compatible chat completions API.
curl -s "https://deepsearch.jina.ai/v1/chat/completions" \
  -H "Authorization: Bearer $JINA_API_KEY" \
  -H "Content-Type: application/json" \
-d '{
"model": "jina-deepsearch-v1",
"messages": [{"role": "user", "content": "Your research question here"}],
"stream": false
}'
Or use the helper script: scripts/jina-deepsearch.sh "
Use for complex research requiring multiple sources and reasoning chains.
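Because the response is described as an OpenAI-compatible chat completion, the answer text would normally sit at `choices[0].message.content`. The sketch below reads it from an already-decoded, non-streaming response. The function name `extract_answer` is hypothetical, and any DeepSearch-specific extra fields are not assumed.

```python
def extract_answer(completion: dict) -> str:
    """Return the assistant text from a non-streaming chat completion."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion has no choices")
    # Standard OpenAI-compatible layout: choices[0].message.content
    return choices[0]["message"]["content"]
```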
Helper Scripts
| Script | Purpose |
|---|---|
| scripts/jina-reader.sh | Read any URL as markdown |
| scripts/jina-search.sh | Web search |
| scripts/jina-deepsearch.sh | Deep multi-step research |
| scripts/jina-reader.py | Python reader (no deps beyond stdlib) |
Rate Limits
- Free (no key): 20 RPM
- With API key: Higher limits, token-based pricing
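A client that wants to stay under a requests-per-minute cap can throttle itself locally. The sketch below is a generic sliding-window limiter, not part of the skill; the class name `SlidingWindowLimiter` is hypothetical, and the default of 20 calls per 60 seconds is taken from the free-tier figure above. The clock is injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls: int = 20, period: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recorded calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Register that a call was just made."""
        self.calls.append(self.clock())
```

A caller would sleep for `wait_time()` seconds before each request and call `record()` right after sending it.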
API Docs
- Reader: https://jina.ai/reader
- Search: https://s.jina.ai/docs
- OpenAPI specs: https://r.jina.ai/openapi.json | https://s.jina.ai/openapi.json
When to Use
| Need | Use |
|---|---|
| Fetch a URL as markdown | Reader — better than web_fetch for JS-heavy sites |
| Web search | Search — LLM-friendly results |
| Complex multi-source research | DeepSearch |
| Parse a PDF from URL | Reader — pass PDF URL directly |
| Screenshot a page | Reader with X-Respond-With: screenshot |
| Extract structured data | Reader with jsonSchema param |
