Feishu Document Reader: Extract Feishu Document Content - Openclaw Skills
Author: Internet
2026-04-13
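What is Feishu Document Reader?
Feishu Document Reader is a purpose-built tool that bridges Feishu (Lark) content and developer-driven automation. As a core component of the Openclaw Skills ecosystem, it uses the official Feishu Open API to convert complex document structures into clean, actionable JSON. It is especially useful for developers building AI-driven workflows that need to ingest organizational knowledge directly from collaborative cloud documents.
By handling the details of tenant authentication and API rate limiting, the skill lets you focus on the data rather than the protocol. It supports content types ranging from standard docx files to legacy documents and spreadsheets, so your Openclaw Skills implementation can handle different enterprise scenarios flexibly and robustly.
Download: https://github.com/openclaw/skills/tree/main/skills/snowshadow/feishu-doc-reader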
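Installation and Download
1. ClawHub CLI
The fastest way to install the skill directly from source.
npx clawhub@latest install feishu-doc-reader
2. Manual installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt-based installation
Copy this prompt into OpenClaw to install the skill automatically.
Please help me install feishu-doc-reader using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).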
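Feishu Document Reader use cases
- Feed Feishu document content to RAG-based AI agents through Openclaw Skills to keep team knowledge in sync.
- Automatically extract financial or project data from Feishu Sheets for external reporting dashboards.
- Programmatically migrate documents from Feishu to Markdown or other internal knowledge bases.
- Run bulk audits of document metadata and access permissions within a Feishu tenant.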
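How Feishu Document Reader works
- The user configures app credentials (App ID and Secret) in the reference directory to establish a secure connection.
- When triggered, the skill requests a tenant access token from the Feishu Open Platform and caches it to improve performance.
- The skill identifies the document type (docx, sheet, or legacy doc) from the supplied token and selects the matching API endpoint.
- It fetches document blocks recursively (including headings, tables, and images) while preserving the original parent-child hierarchy.
- The raw API response is parsed and formatted into a structured JSON object that other Openclaw Skills or local analysis tools can use immediately.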
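The token-caching step can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's actual code: `TokenCache`, `fetch`, `margin`, and `clock` are hypothetical names, and the fetcher is injected so the cache logic can be exercised without network access. In practice the fetcher would send the app credentials to the Feishu Open Platform and return the token together with its lifetime in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a tenant access token until shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, int]],
        margin: int = 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch      # hypothetical: returns (token, lifetime_seconds)
        self._margin = margin    # refresh this many seconds before expiry
        self._clock = clock      # injectable so tests can control time
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime - self._margin
        return self._token
```

Refreshing slightly early (the `margin`) avoids sending a request with a token that expires while the request is in flight.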
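Feishu Document Reader configuration guide
To start using this skill in your Openclaw Skills environment, follow these steps:
- Create a configuration file containing your credentials at ./reference/feishu_config.json:
{
  "app_id": "your_feishu_app_id_here",
  "app_secret": "your_feishu_app_secret_here"
}
- Grant the required permissions in your Feishu app console: docx:document:readonly, doc:document:readonly, and sheets:spreadsheet:readonly.
- Set execute permission on the included shell scripts:
chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh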
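Before running the scripts, it can help to confirm that the configuration file is well formed. The sketch below is an assumption about how such a check might look, not part of the skill: `load_config` is a hypothetical helper that reads the JSON file described above and rejects missing keys or values still set to the placeholder text.

```python
import json
from pathlib import Path

REQUIRED_KEYS = ("app_id", "app_secret")


def load_config(path: str = "./reference/feishu_config.json") -> dict:
    """Load the credentials file and reject missing or placeholder values."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path} must contain a JSON object")
    for key in REQUIRED_KEYS:
        value = data.get(key, "")
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty {key!r} in {path}")
        # The template values in the guide start with this prefix.
        if value.startswith("your_feishu_"):
            raise ValueError(f"{key!r} still holds the placeholder value")
    return data
```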
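Feishu Document Reader data schema and taxonomy
The skill produces structured output that maps Feishu's complex block ecosystem to a readable format:
| Component | Description |
|---|---|
| Metadata | Document title, owner, creation time, and version information. |
| Content blocks | An array of objects representing headings, text, code, and lists. |
| Tables | Fully parsed row and column data extracted from spreadsheet blocks. |
| Hierarchy | Maps parent_id and children_ids to preserve document flow. |
| Attachments | Metadata and tokens for images and external links. |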
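To illustrate how the hierarchy fields can be used, the sketch below walks a flat list of blocks in document order. It assumes each block is a dict with a `block_id` and a `children_ids` list, as the table suggests; the exact field names in the skill's output may differ, and `walk_blocks` is a hypothetical helper.

```python
from typing import Dict, Iterator, List, Set, Tuple


def walk_blocks(blocks: List[dict], root_id: str) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, block) pairs in document order, starting at root_id."""
    by_id: Dict[str, dict] = {b["block_id"]: b for b in blocks}
    seen: Set[str] = set()
    stack = [(0, root_id)]
    while stack:
        depth, block_id = stack.pop()
        block = by_id.get(block_id)
        # Skip dangling references and guard against accidental cycles.
        if block is None or block_id in seen:
            continue
        seen.add(block_id)
        yield depth, block
        # Push children in reverse so the first child is visited first.
        for child_id in reversed(block.get("children_ids", [])):
            stack.append((depth + 1, child_id))
```

The depth value makes it straightforward to indent nested list items or to reconstruct an outline from the headings.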
name: feishu-doc-reader
description: Read and extract content from Feishu (Lark) documents using the official Feishu Open API
metadata: {"moltbot":{"emoji":"??","requires":{"bins":["python3","curl"]}}}
Feishu Document Reader
This skill enables reading and extracting content from Feishu (Lark) documents using the official Feishu Open API.
Configuration
Set Up the Skill
- Create the configuration file at ./reference/feishu_config.json with your Feishu app credentials:
{
  "app_id": "your_feishu_app_id_here",
  "app_secret": "your_feishu_app_secret_here"
}
- Make sure the scripts are executable:
chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh
Security Note: The configuration file should be kept secure and not committed to version control. Consider using proper file permissions (chmod 600 ./reference/feishu_config.json).
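The permission advice above can be checked programmatically. The following is a small sketch for POSIX systems, with `is_private` as a hypothetical helper name: it reports whether the file is inaccessible to group and other users, which is the case after `chmod 600`.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if neither group nor others have any access to the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A startup check such as `is_private("./reference/feishu_config.json")` could warn the user before any credentials are read.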
Usage
Basic Document Reading
To read a Feishu document, you need the document token (found in the URL: https://example.feishu.cn/docx/DOC_TOKEN).
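As a convenience, the token can be pulled out of a pasted URL instead of being copied by hand. This sketch assumes only the URL shape shown above (`https://host/<kind>/<token>`); `parse_doc_url` is a hypothetical helper and is not part of the skill.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_doc_url(url: str) -> Tuple[str, str]:
    """Return (kind, token) from a URL shaped like https://host/<kind>/<token>."""
    # urlparse drops the query string and fragment from .path for us.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"no document token found in {url!r}")
    return parts[0], parts[1]
```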
Using the shell script (recommended):
# Make sure environment variables are set first
./scripts/read_doc.sh "your_doc_token_here"
# Or specify document type explicitly
./scripts/read_doc.sh "docx_token" "doc"
./scripts/read_doc.sh "sheet_token" "sheet"
Get Detailed Document Blocks (NEW)
For complete document structure with all blocks, use the dedicated blocks script:
# Get full document blocks structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block by ID
./scripts/get_blocks.sh "docx_token" "block_id"
# Get blocks with children
./scripts/get_blocks.sh "docx_token" "" "true"
Using Python directly for blocks:
python scripts/get_feishu_doc_blocks.py --doc-token "your_doc_token_here"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --block-id "block_id"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --include-children
Supported Document Types
- Docx documents (new Feishu docs): Full content extraction with blocks, metadata, and structure
- Doc documents (legacy): Basic metadata and limited content
- Sheets: Full spreadsheet data extraction with sheet navigation
- Slides: Basic metadata (content extraction requires additional permissions)
Features
Enhanced Content Extraction
- Structured output: Clean JSON with document metadata, content blocks, and hierarchy
- Complete blocks access: Full access to all document blocks including text, tables, images, headings, lists, etc.
- Block hierarchy: Proper parent-child relationships between blocks
- Text extraction: Automatic text extraction from complex block structures
- Table support: Proper table parsing with row/column structure
- Image handling: Image URLs and metadata extraction
- Link resolution: Internal and external link extraction
Block Types Supported
- text: Plain text and rich text content
- heading1/2/3: Document headings with proper hierarchy
- bullet/ordered: List items with nesting support
- table: Complete table structures with cells and formatting
- image: Image blocks with tokens and metadata
- quote: Block quotes
- code: Code blocks with language detection
- equation: Mathematical equations
- divider: Horizontal dividers
- page: Page breaks (in multi-page documents)
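One way to consume these block types is to map each one to a line of Markdown. The sketch below is illustrative only: it assumes each block is a dict with hypothetical `type`, `text`, and `language` fields named after the list above, which may not match the skill's real output, and it covers only a subset of the types.

```python
PREFIXES = {
    "heading1": "# ",
    "heading2": "## ",
    "heading3": "### ",
    "bullet": "- ",
    "ordered": "1. ",
    "quote": "> ",
}


def block_to_markdown(block: dict) -> str:
    """Render a single block as Markdown; unknown types fall back to plain text."""
    kind = block.get("type", "text")
    text = block.get("text", "")
    if kind == "divider":
        return "---"
    if kind == "code":
        fence = "`" * 3  # built at runtime to keep this example fence-safe
        return f"{fence}{block.get('language', '')}\n{text}\n{fence}"
    return PREFIXES.get(kind, "") + text
```

Tables, images, and equations need type-specific handling and are deliberately left out of this sketch.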
Error Handling & Diagnostics
- Detailed error messages: Clear explanations for common issues
- Permission validation: Checks required permissions before making requests
- Token validation: Validates document tokens before processing
- Retry logic: Automatic retries for transient network errors
- Rate limiting: Handles API rate limits gracefully
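The retry behaviour described above is commonly implemented as exponential backoff with jitter. The sketch below shows that general technique and is not taken from the skill's source: `with_retries` and `TransientError` are hypothetical names, and the caller is expected to raise `TransientError` for failures worth retrying, such as timeouts or rate-limit responses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Raised by the caller's request function for retryable failures."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from concurrent clients.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real waiting.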
Security Features
- Secure credential storage: Supports both environment variables and secure file storage
- No credential logging: Credentials never appear in logs or output
- Minimal permissions: Uses only required API permissions
- Access token caching: Efficient token reuse to minimize API calls
Command Line Options
Main Document Reader
# Python script options
python scripts/read_feishu_doc.py --help
# Shell script usage
./scripts/read_doc.sh <doc_token> [doc|sheet|slide]
Blocks Reader (NEW)
# Get full document blocks
./scripts/get_blocks.sh <doc_token>
# Get specific block
./scripts/get_blocks.sh <doc_token> <block_id>
# Include children blocks
./scripts/get_blocks.sh <doc_token> "" true
# Python options
python scripts/get_feishu_doc_blocks.py --help
API Permissions Required
Your Feishu app needs the following permissions:
- docx:document:readonly - Read document content
- doc:document:readonly - Read legacy document content
- sheets:spreadsheet:readonly - Read spreadsheet content
Error Handling
Common errors and solutions:
- 403 Forbidden: Check app permissions and document sharing settings
- 404 Not Found: Verify document token is correct and document exists
- Token expired: Access tokens are valid for 2 hours, refresh as needed
- App ID/Secret invalid: Double-check your credentials in Feishu Open Platform
- Insufficient permissions: Ensure your app has the required API permissions
- 99991663: Application doesn't have permission to access the document
- 99991664: Document doesn't exist or has been deleted
- 99991668: Token expired, need to refresh
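The numeric codes above lend themselves to a simple lookup. This sketch assumes the API response is a JSON object carrying a numeric `code` (0 on success) and a `msg` string; `explain_error` and `ERROR_HINTS` are hypothetical names, and only the three codes listed above are included.

```python
ERROR_HINTS = {
    99991663: "Application doesn't have permission to access the document",
    99991664: "Document doesn't exist or has been deleted",
    99991668: "Token expired, need to refresh",
}


def explain_error(payload: dict) -> str:
    """Turn a response payload into a short human-readable status line."""
    code = payload.get("code", 0)
    if code == 0:
        return "ok"
    # Prefer the documented hint; otherwise fall back to the server message.
    hint = ERROR_HINTS.get(code) or payload.get("msg") or "unknown error"
    return f"{code}: {hint}"
```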
Examples
Extract document with full structure
# Read document
./scripts/read_doc.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
Get complete document blocks (NEW)
# Get all blocks with full structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block details
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv" "blk_xxxxxxxxxxxxxx"
Process spreadsheet data
./scripts/read_doc.sh "sheet_XyZ123AbCdEfGhIj" "sheet"
Extract only text content (Python script)
python scripts/read_feishu_doc.py --doc-token "docx_token" --extract-text-only
Security Notes
- Never commit credentials: Keep app secrets out of version control
- Use minimal permissions: Only request permissions your use case requires
- Secure file permissions: Set proper file permissions on secret files (chmod 600)
- Environment isolation: Use separate apps for development and production
- Audit access: Regularly review which documents your app can access
Troubleshooting
Authentication Issues
- Verify your App ID and App Secret in Feishu Open Platform
- Ensure the app has been published with required permissions
- Check that environment variables or config files are properly set
- Test with the test_auth.py script to verify credentials
Document Access Issues
- Ensure the document is shared with your app or in an accessible space
- Verify the document token format (should start with docx_, doc_, or sheet_)
- Check if the document requires additional sharing permissions
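The prefix check mentioned above can be sketched as a small lookup. The prefixes come from the troubleshooting note; the returned type names and the `infer_doc_type` helper are assumptions made for illustration, and tokens in a given tenant may not follow this convention.

```python
PREFIX_TO_TYPE = {
    "docx_": "docx",
    "doc_": "doc",
    "sheet_": "sheet",
}


def infer_doc_type(token: str) -> str:
    """Guess the document type from the token prefix, or raise ValueError."""
    for prefix, doc_type in PREFIX_TO_TYPE.items():
        if token.startswith(prefix):
            return doc_type
    raise ValueError(f"unrecognized token prefix: {token!r}")
```

When the prefix is not recognized, passing the document type explicitly to the script is the safer route.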
Network Issues
- Ensure your server can reach open.feishu.cn
- Check firewall rules if running in restricted environments
- The script includes retry logic for transient network failures
Blocks-Specific Issues
- Empty blocks response: Document might be empty or have no accessible blocks
- Missing block types: Some block types require additional permissions
- Incomplete hierarchy: Use the --include-children flag for a complete block tree
