Video Fetch: YouTube & Bilibili 视频转录 - Openclaw Skills-脚本在线

AI智能体脚本智能办公脚本自动化游戏脚本浏览器自动化脚本服务器脚本

Video Fetch: YouTube & Bilibili 视频转录 - Openclaw Skills

作者：互联网

2026-04-16

AI教程

什么是 Video Fetch — YouTube & Bilibili？

Video Fetch 技能是专为 AI 智能体设计的强大解决方案，用于消化全球最大平台的视频内容。作为 Openclaw Skills 生态系统的一部分，它在视频媒体和文本分析之间提供了无缝桥梁。无论您是处理官方字幕还是需要通过语音转文本生成字幕，该工具都通过优先获取官方转录，再回退到 ElevenLabs、本地 Whisper 或视频描述，确保了高保真度。

该技能特别为需要绕过地理限制和平台特定障碍的开发人员和研究人员进行了优化。通过将 Openclaw Skills 集成到您的工作流程中，您可以利用专业级的准确度和自动翻译功能来处理 Bilibili 的 AI 生成字幕和 YouTube 的海量库。

下载入口:https://github.com/openclaw/skills/tree/main/skills/lambertalpha/you@tube-fetch

安装与下载

1. ClawHub CLI

从源直接安装技能的最快方式。

npx clawhub@latest install you@tube-fetch

2. 手动安装

将技能文件夹复制到以下位置之一

全局模式 ~/.openclaw/skills/ 工作区 /skills/

优先级：工作区 > 本地 > 内置

3. 提示词安装

将此提示词复制到 OpenClaw 即可自动安装。

请帮我使用 Clawhub 安装 you@tube-fetch。如果尚未安装 Clawhub，请先安装（npm i -g clawhub）。

Video Fetch — YouTube & Bilibili 应用场景

快速总结长篇教程或讲座，以便快速检索信息。
转录 Bilibili 的外语内容，用于研究和翻译项目。
使用高精度 STT 从缺乏官方字幕的视频中提取文本。
在 Openclaw Skills 框架内自动化内容研究流程。

Video Fetch — YouTube & Bilibili 工作原理

该技能接受视频 URL 并自动检测平台是 YouTube 还是 Bilibili。
它首先尝试通过平台的原生 API 获取官方字幕或转录。
如果缺失字幕，它会使用 yt-dlp 下载音频流，并将其路由到 ElevenLabs Scribe v2 进行高精度转录。
如果没有 API 密钥，它会触发本地 Whisper 回退，在您的机器上处理音频。
在音频处理受限的情况下，它会提取创作者的视频描述作为最终回退。
生成的文本随后传递给 AI 智能体，以用户首选的语言进行总结或分析。

Video Fetch — YouTube & Bilibili 配置指南

要开始在 Openclaw Skills 集合中使用此技能，请安装必要的依赖项：

# 核心需求
pip install you@tube-transcript-api requests yt-dlp

# YouTube 住宅代理支持
pip install pysocks

# 可选：本地 Whisper 回退
pip install openai-whisper

配置包括设置用于高质量转录的 ELEVENLABS_API_KEY，以及可选提供用于身份验证字幕访问的 Bilibili cookie 文件。

Video Fetch — YouTube & Bilibili 数据架构与分类体系

该技能返回结构化数据，确保与各种 Openclaw Skills 工作流兼容：

数据点	描述
内容	完整的转录文本或语音转文本输出
来源类型	标识数据是来自 API、ElevenLabs、Whisper 还是描述
平台	视频来源（YouTube 或 Bilibili）
元数据	从页面提取的标题、描述和作者信息

name: Video Fetch — YouTube & Bilibili description: Fetch, transcribe, and summarize any YouTube or Bilibili video. 4-level fallback: subtitle API → ElevenLabs STT (cloud, high-accuracy) → local Whisper → description. Supports proxy for geo-blocked regions, Bilibili cookie auth, and auto-translation to user's language. Use when the user shares a video link or asks to summarize/analyze a video.

Video Content Fetcher (YouTube & Bilibili)

Before You Start

This skill needs 1-3 things configured depending on your use case:

What	Why	How
Residential IP proxy	YouTube blocks datacenter IPs. You MUST use a residential broadband proxy (e.g., HK home broadband). VPS/cloud IPs will fail.	`--proxy socks5h://127.0.0.1:1080` (also install `pysocks`: `pip install pysocks`)
ElevenLabs API key	Default STT engine for audio transcription when no subtitles exist. Far more accurate than local Whisper, especially for Chinese.	`export ELEVENLABS_API_KEY="sk_..."` or `--stt-api-key @~/.elevenlabs_key`. Get key: https://elevenlabs.io/app/settings/api-keys
Bilibili cookie	Bilibili AI subtitles require login. Without cookie, falls back to STT.	Export `SESSDATA` + `bili_jct` from browser, save to `~/.bilibili_cookie`, use `--cookie @~/.bilibili_cookie`. See details below.

Minimum setup for YouTube: proxy + ElevenLabs API key (or local Whisper) Minimum setup for Bilibili: ElevenLabs API key (or local Whisper). Cookie optional but recommended.

Quick Start

# YouTube
python3 scripts/you@tube_fetch.py "https://www.you@tube.com/watch?v=VIDEO_ID" --proxy socks5h://127.0.0.1:1080

# Bilibili
python3 scripts/you@tube_fetch.py "https://www.bilibili.com/video/BVxxxxxxxxxx"

Fallback Chain

1. Subtitle/Transcript API  (highest fidelity — full spoken content)
       |  if unavailable
       v
2. ElevenLabs STT  (default, cloud-based, high accuracy with punctuation)
       |  if no API key or fails
       v
3. Local Whisper  (auto-fallback, requires local install)
       |  if unavailable
       v
4. Video description  (lowest fidelity — creator's written summary only)

Always tell the user which source was used. If description-only, explicitly note that it's not a full transcript.

Platform Support

Platform	Subtitle source	STT fallback	Description fallback
YouTube	you@tube-transcript-api	ElevenLabs / Whisper	HTML meta extraction
Bilibili	Player API (cookie required for AI subs)	ElevenLabs / Whisper	View API

Platform is auto-detected from URL. Force with --platform you@tube or --platform bilibili.

Usage

# YouTube with proxy
python3 scripts/you@tube_fetch.py VIDEO_URL --proxy socks5h://127.0.0.1:1080

# Bilibili (cookie required for subtitles — without it, falls back to STT/description)
# IMPORTANT: always use @filepath to avoid exposing credentials in shell history / process list
python3 scripts/you@tube_fetch.py "https://www.bilibili.com/video/BVxxxxxxxxxx" --cookie @~/.bilibili_cookie

# Bilibili short link
python3 scripts/you@tube_fetch.py "https://b23.tv/xxxxxx"

# Specify transcript languages (default: zh-Hans,zh-Hant,zh,en)
python3 scripts/you@tube_fetch.py VIDEO_URL --langs "en,ja" --proxy PROXY

# Force local Whisper instead of ElevenLabs
python3 scripts/you@tube_fetch.py VIDEO_URL --stt whisper --whisper-model medium

# Disable audio transcription entirely
python3 scripts/you@tube_fetch.py VIDEO_URL --stt none

# JSON output
python3 scripts/you@tube_fetch.py VIDEO_URL --json --proxy PROXY

# Save to file
python3 scripts/you@tube_fetch.py VIDEO_URL --output /tmp/transcript.txt --proxy PROXY

Dependencies

Required:

you@tube-transcript-api (pip) — YouTube transcript fetching
requests (pip) — HTTP requests for APIs

Required for STT (audio transcription):

yt-dlp (pip or brew) — audio download from YouTube/Bilibili

Optional (for local Whisper, used as auto-fallback when ElevenLabs is unavailable):

openai-whisper (pip) — local audio transcription
ffmpeg (system) — required by whisper

Optional (for SOCKS5 proxy):

pysocks (pip) — required if using socks5h:// proxies

# Required
pip install you@tube-transcript-api requests

# Required for STT
pip install yt-dlp

# If using SOCKS5 proxy
pip install pysocks

# Optional: local Whisper fallback
pip install openai-whisper
brew install ffmpeg  # or apt install ffmpeg

ElevenLabs STT (Default)

ElevenLabs Scribe v2 is the default speech-to-text engine. It provides significantly better accuracy than local Whisper, especially for Chinese — correct characters, automatic punctuation, and proper formatting (e.g., book title marks).

Setup:

# Option 1: Environment variable (recommended)
export ELEVENLABS_API_KEY="sk_your_key_here"

# Option 2: Key file
echo "sk_your_key_here" > ~/.elevenlabs_key && chmod 600 ~/.elevenlabs_key
# Then use: --stt-api-key @~/.elevenlabs_key

If no API key is available, the skill automatically falls back to local Whisper.

Security: Same rules as Bilibili cookie — use env var or @filepath, never pass the key directly on the command line.

Get your API key at: https://elevenlabs.io/app/settings/api-keys

Proxy Setup

YouTube requires a proxy from geo-blocked regions. Bilibili typically works directly from China.

SOCKS5: socks5h://127.0.0.1:1080
HTTP: http://127.0.0.1:1081

Important: Use residential IP proxies for YouTube. YouTube blocks datacenter/cloud server IPs. Residential broadband IPs (e.g., Hong Kong home broadband) work reliably, while VPS/cloud IPs will likely be blocked. Recommend routing only YouTube domains through the proxy.

Bilibili subtitles are almost entirely AI-generated and require login to access. Without cookies, the subtitle API returns empty results for virtually all videos, and the skill will fall back to STT or description.

How to get your cookie:

Log in to bilibili.com in your browser
Open DevTools (F12) > Application > Cookies
Copy the values of SESSDATA and bili_jct
Save to a file: echo "SESSDATA=xxx;bili_jct=xxx" > ~/.bilibili_cookie && chmod 600 ~/.bilibili_cookie
Use: --cookie @~/.bilibili_cookie

Security warning:

Your Bilibili cookie is a login credential. If leaked, others can act as you (post comments, send messages, etc.)
Always use --cookie @filepath instead of passing the cookie string directly on the command line, because:
- Command-line arguments are visible in ps aux to other users on the same machine
- They get saved to shell history (~/.zsh_history, ~/.bash_history)
- They may appear in Claude Code tool call logs
Set file permissions: chmod 600 ~/.bilibili_cookie
Add to .gitignore if the file is anywhere near a git repo
Cookies expire periodically — refresh from browser when needed

Translation & Language

Respond in the user's conversation language. Do not default to any hardcoded language.

If the user speaks English, summarize in English
If the user speaks Chinese, summarize in Chinese
If the user speaks Japanese, summarize in Japanese
If the transcript language differs from the user's language, translate while summarizing

This requires no extra configuration — Claude naturally adapts to the conversation language.

When Summarizing

Transcript/subtitle source: Summarize freely — this is the complete spoken content
ElevenLabs STT source: High accuracy with punctuation — treat as near-transcript quality
Whisper STT source: Reliable but may have character-level errors in Chinese — note this to the user
Description source: Warn the user that this is the creator's written summary, not the full video content
For long transcripts (>50K chars), break into sections and summarize incrementally

上一篇：Apple Mail：macOS 邮件自动化与获取 - Openclaw Skills 下一篇：Schema.org 网站增强器：自动化 SEO 结构化数据 - Openclaw Skills