Feishu Document Reader: Extract Feishu Document Content - Openclaw Skills

Author: Internet

2026-04-13

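What Is Feishu Document Reader?

Feishu Document Reader is a purpose-built tool that bridges Feishu (Lark) content and developer-driven automation. As a core component of the Openclaw Skills ecosystem, it uses the official Feishu Open API to convert complex document structures into clean, actionable JSON. It is especially useful for developers building AI-driven workflows that need to ingest organizational knowledge directly from collaborative cloud documents.

By handling the details of tenant authentication and API rate limiting, the skill lets you focus on the data rather than the protocol. It supports a range of content types, from standard docx files to legacy documents and spreadsheets, so your Openclaw Skills implementation can cope with different enterprise use cases.

Download: https://github.com/openclaw/skills/tree/main/skills/snowshadow/feishu-doc-reader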

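Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install feishu-doc-reader

2. Manual Installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt Installation

Copy this prompt into OpenClaw to install the skill automatically.

Please help me install feishu-doc-reader using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).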

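Feishu Document Reader Use Cases

  • Feed Feishu document content to RAG-based AI agents through Openclaw Skills to keep team knowledge in sync.
  • Automatically extract financial or project data from Feishu Sheets for external reporting dashboards.
  • Programmatically migrate documents from Feishu to Markdown or other internal knowledge bases.
  • Run bulk audits of document metadata and access permissions within a Feishu tenant.

How Feishu Document Reader Works

  1. The user configures app credentials (App ID and Secret) in the reference directory to establish a secure connection.
  2. When triggered, the skill requests a tenant access token from the Feishu Open Platform and caches it to avoid repeated requests.
  3. The skill identifies the document type (docx, sheet, or legacy doc) from the provided token and selects the matching API endpoint.
  4. It recursively fetches document blocks (including headings, tables, and images) while preserving the original parent-child hierarchy.
  5. The raw API response is parsed and formatted into a structured JSON object that other Openclaw Skills or local analysis tools can use immediately.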
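
The token step above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the endpoint and the response fields (code, msg, tenant_access_token, expire) follow the publicly documented Feishu internal-app token API as I understand it, while TokenCache, fetch_tenant_token, and safety_margin are hypothetical names introduced here.

```python
import json
import time
import urllib.request

# Assumed endpoint for self-built (internal) apps; verify against the official docs.
TOKEN_URL = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal"


def fetch_tenant_token(app_id, app_secret):
    """POST the app credentials and return (token, lifetime_in_seconds)."""
    body = json.dumps({"app_id": app_id, "app_secret": app_secret}).encode("utf-8")
    request = urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    if payload.get("code") != 0:
        raise RuntimeError(f"token request failed: {payload.get('msg')}")
    return payload["tenant_access_token"], payload["expire"]


class TokenCache:
    """Reuse a token until shortly before it expires (hypothetical helper)."""

    def __init__(self, fetch, clock=time.monotonic, safety_margin=60):
        self._fetch = fetch            # callable returning (token, lifetime)
        self._clock = clock            # injectable so the cache can be tested
        self._margin = safety_margin   # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime - self._margin
        return self._token
```

In use, the cache would wrap the network call once, for example TokenCache(lambda: fetch_tenant_token(app_id, app_secret)), and every later request would call get() instead of asking for a new token.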

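Feishu Document Reader Configuration Guide

To start using this skill in your Openclaw Skills environment, follow these steps:

  1. Create a configuration file at ./reference/feishu_config.json containing your credentials: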
{
  "app_id": "your_feishu_app_id_here",
  "app_secret": "your_feishu_app_secret_here"
}
  2. Grant the required permissions in your Feishu app console: docx:document:readonly, doc:document:readonly, sheets:spreadsheet:readonly

  3. Set execute permissions on the included shell scripts:

chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh

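Feishu Document Reader Data Architecture and Taxonomy

The skill produces structured output that maps Feishu's complex block model to a readable format:

Component | Description
Metadata | Document title, owner, creation time, and version information.
Content blocks | Array of objects representing headings, text, code, and lists.
Tables | Fully parsed row and column data extracted from spreadsheet blocks.
Hierarchy | Maps parent_id and children_ids to preserve document flow.
Attachments | Metadata and tokens for images and external links.

name: feishu-doc-reader
description: Read and extract content from Feishu (Lark) documents using the official Feishu Open API
metadata: {"moltbot":{"emoji":"??","requires":{"bins":["python3","curl"]}}}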

Feishu Document Reader

This skill enables reading and extracting content from Feishu (Lark) documents using the official Feishu Open API.

Configuration

Set Up the Skill

  1. Create the configuration file at ./reference/feishu_config.json with your Feishu app credentials:
{
  "app_id": "your_feishu_app_id_here",
  "app_secret": "your_feishu_app_secret_here"
}
  2. Make sure the scripts are executable:
chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh

Security Note: The configuration file should be kept secure and not committed to version control. Consider using proper file permissions (chmod 600 ./reference/feishu_config.json).
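
As a rough sketch of what the security note implies, a loader could refuse a configuration file that other users can read. The function name load_config and the strict permission check are assumptions made for illustration; the skill's own scripts may behave differently.

```python
import json
import os
import stat
from pathlib import Path


def load_config(path="./reference/feishu_config.json"):
    """Load credentials, rejecting files readable by group or others (POSIX only)."""
    config_path = Path(path)
    mode = stat.S_IMODE(config_path.stat().st_mode)
    if os.name == "posix" and mode & 0o077:
        raise PermissionError(
            f"{config_path} has mode {oct(mode)}; run chmod 600 on it"
        )
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Both keys come from the configuration example above.
    missing = [key for key in ("app_id", "app_secret") if not config.get(key)]
    if missing:
        raise ValueError(f"missing keys in {config_path}: {', '.join(missing)}")
    return config
```

Failing early on loose permissions or missing keys keeps the later API calls from producing less obvious authentication errors.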

Usage

Basic Document Reading

To read a Feishu document, you need the document token (found in the URL: https://example.feishu.cn/docx/DOC_TOKEN).
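
If you start from a shared link rather than a bare token, the token can be pulled out of the URL path. The sketch below assumes the URL shape shown above (host, then a type segment, then the token); parse_doc_url is a hypothetical helper, not part of the skill.

```python
from urllib.parse import urlparse


def parse_doc_url(url):
    """Return (kind, token) from a URL shaped like https://host/<kind>/<token>."""
    # The query string and fragment are not part of .path, so they are ignored.
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) < 2:
        raise ValueError(f"no document token found in {url!r}")
    kind, token = segments[-2], segments[-1]
    return kind, token


kind, token = parse_doc_url("https://example.feishu.cn/docx/DOC_TOKEN")
```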

Using the shell script (recommended):

# Make sure credentials are configured first (config file or environment variables)
./scripts/read_doc.sh "your_doc_token_here"

# Or specify document type explicitly
./scripts/read_doc.sh "docx_token" "doc"
./scripts/read_doc.sh "sheet_token" "sheet"

Get Detailed Document Blocks (NEW)

For complete document structure with all blocks, use the dedicated blocks script:

# Get full document blocks structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"

# Get specific block by ID
./scripts/get_blocks.sh "docx_token" "block_id"

# Get blocks with children
./scripts/get_blocks.sh "docx_token" "" "true"

Using Python directly for blocks:

python scripts/get_feishu_doc_blocks.py --doc-token "your_doc_token_here"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --block-id "block_id"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --include-children
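
Large documents are usually returned in pages rather than in one response. The generator below is a sketch of how a blocks reader might follow page tokens; the response keys items, has_more, and page_token are an assumption based on my understanding of the Feishu docx blocks API, and fetch_page is a hypothetical callable that performs the actual HTTP request.

```python
def iter_blocks(fetch_page):
    """Yield every block, following page tokens until has_more is false.

    fetch_page(page_token) must return a dict with the keys
    "items", "has_more" and "page_token" (assumed response shape).
    """
    page_token = None  # None requests the first page
    while True:
        page = fetch_page(page_token)
        yield from page.get("items", [])
        if not page.get("has_more"):
            break
        page_token = page.get("page_token")
        if not page_token:
            raise RuntimeError("has_more was set but no page_token was returned")
```

Keeping the HTTP call outside the generator makes it easy to plug in retries or a cached token without touching the paging logic.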

Supported Document Types

  • Docx documents (new Feishu docs): Full content extraction with blocks, metadata, and structure
  • Doc documents (legacy): Basic metadata and limited content
  • Sheets: Full spreadsheet data extraction with sheet navigation
  • Slides: Basic metadata (content extraction requires additional permissions)

Features

Enhanced Content Extraction

  • Structured output: Clean JSON with document metadata, content blocks, and hierarchy
  • Complete blocks access: Full access to all document blocks including text, tables, images, headings, lists, etc.
  • Block hierarchy: Proper parent-child relationships between blocks
  • Text extraction: Automatic text extraction from complex block structures
  • Table support: Proper table parsing with row/column structure
  • Image handling: Image URLs and metadata extraction
  • Link resolution: Internal and external link extraction
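
To make the block hierarchy concrete, here is a small sketch that nests a flat block list and collects its text in reading order. The field names block_id, parent_id, children_ids, and text are assumptions about the output shape chosen for this example, and the input is assumed to contain no cycles.

```python
def build_tree(blocks):
    """Nest a flat block list using parent_id / children_ids."""
    by_id = {block["block_id"]: block for block in blocks}

    def expand(block_id):
        block = by_id[block_id]
        node = {k: v for k, v in block.items() if k != "children_ids"}
        node["children"] = [
            expand(child_id)
            for child_id in block.get("children_ids", [])
            if child_id in by_id  # skip children that were not fetched
        ]
        return node

    # A root is any block whose parent is not part of the fetched set.
    roots = [b["block_id"] for b in blocks if b.get("parent_id") not in by_id]
    return [expand(root_id) for root_id in roots]


def collect_text(node):
    """Return the text of a node and its descendants, depth first."""
    parts = [node["text"]] if node.get("text") else []
    for child in node["children"]:
        parts.extend(collect_text(child))
    return parts
```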

Block Types Supported

  • text: Plain text and rich text content
  • heading1/2/3: Document headings with proper hierarchy
  • bullet/ordered: List items with nesting support
  • table: Complete table structures with cells and formatting
  • image: Image blocks with tokens and metadata
  • quote: Block quotes
  • code: Code blocks with language detection
  • equation: Mathematical equations
  • divider: Horizontal dividers
  • page: Page breaks (in multi-page documents)
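
One way to consume these block types is to render them as Markdown, which matches the migration use case. The sketch below assumes each block is a dict with a type key using the names listed above plus text and optional language keys; that shape is hypothetical and may differ from the skill's real JSON.

```python
def block_to_markdown(block):
    """Render one block as a Markdown string based on its type."""
    kind = block["type"]
    text = block.get("text", "")
    if kind in ("heading1", "heading2", "heading3"):
        return "#" * int(kind[-1]) + " " + text
    if kind == "bullet":
        return "- " + text
    if kind == "ordered":
        return "1. " + text  # Markdown renderers renumber ordered items
    if kind == "quote":
        return "> " + text
    if kind == "code":
        fence = "`" * 3
        return f"{fence}{block.get('language', '')}\n{text}\n{fence}"
    if kind == "divider":
        return "---"
    return text  # plain text and any type not handled above
```

Tables, images, and equations are left out here because their representation depends on details the block list alone does not specify.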

Error Handling & Diagnostics

  • Detailed error messages: Clear explanations for common issues
  • Permission validation: Checks required permissions before making requests
  • Token validation: Validates document tokens before processing
  • Retry logic: Automatic retries for transient network errors
  • Rate limiting: Handles API rate limits gracefully
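
The retry and rate-limit behaviour described above is commonly implemented with exponential backoff. The following is a generic sketch of that technique, not the skill's implementation; with_retries, RateLimited, and the default delays are hypothetical.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response."""


def with_retries(call, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on connection errors and rate limits.

    Waits base_delay * 2**n plus a little jitter between attempts and
    re-raises the last error once the attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return call()
        except (ConnectionError, TimeoutError, RateLimited):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Errors such as 403 or 404 are deliberately not retried, since repeating the request would not change the outcome.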

Security Features

  • Secure credential storage: Supports both environment variables and secure file storage
  • No credential logging: Credentials never appear in logs or output
  • Minimal permissions: Uses only required API permissions
  • Access token caching: Efficient token reuse to minimize API calls
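
A simple way to keep credentials out of logs is to mask known secret fields before printing anything. This is an illustrative sketch; SECRET_KEYS and safe_for_log are hypothetical names, and the key list is an assumption.

```python
# Keys whose values should never be written to logs (assumed list).
SECRET_KEYS = {"app_secret", "tenant_access_token"}


def safe_for_log(data):
    """Return a copy of a dict with secret values replaced by a mask."""
    return {key: ("***" if key in SECRET_KEYS else value) for key, value in data.items()}


print(safe_for_log({"app_id": "cli_example", "app_secret": "s3cr3t"}))
```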

Command Line Options

Main Document Reader

# Python script options
python scripts/read_feishu_doc.py --help

# Shell script usage
./scripts/read_doc.sh DOC_TOKEN [doc|sheet|slide]

Blocks Reader (NEW)

# Get full document blocks
./scripts/get_blocks.sh DOC_TOKEN

# Get specific block
./scripts/get_blocks.sh DOC_TOKEN BLOCK_ID

# Include children blocks
./scripts/get_blocks.sh DOC_TOKEN "" true

# Python options
python scripts/get_feishu_doc_blocks.py --help

API Permissions Required

Your Feishu app needs the following permissions:

  • docx:document:readonly - Read document content
  • doc:document:readonly - Read legacy document content
  • sheets:spreadsheet:readonly - Read spreadsheet content

Error Handling

Common errors and solutions:

  • 403 Forbidden: Check app permissions and document sharing settings
  • 404 Not Found: Verify document token is correct and document exists
  • Token expired: Access tokens are valid for 2 hours; request a new token when the current one expires
  • App ID/Secret invalid: Double-check your credentials in Feishu Open Platform
  • Insufficient permissions: Ensure your app has the required API permissions
  • 99991663: Application doesn't have permission to access the document
  • 99991664: Document doesn't exist or has been deleted
  • 99991668: Token expired, need to refresh
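
The list above can be turned into a small lookup so that scripts print a hint next to the raw code. The meanings are copied from this list rather than from the API reference, and explain_error is a hypothetical helper.

```python
# Hints taken from the error list above; codes not listed fall through.
ERROR_HINTS = {
    403: "Check app permissions and document sharing settings.",
    404: "Verify the document token and that the document exists.",
    99991663: "The application does not have permission to access the document.",
    99991664: "The document does not exist or has been deleted.",
    99991668: "The token has expired; request a new one.",
}


def explain_error(code):
    """Return a one-line, human-readable hint for an HTTP or API error code."""
    hint = ERROR_HINTS.get(code)
    if hint is None:
        return f"[{code}] Unknown error; see the message in the API response."
    return f"[{code}] {hint}"
```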

Examples

Extract document with full structure

# Read document
./scripts/read_doc.sh "docx_AbCdEfGhIjKlMnOpQrStUv"

Get complete document blocks (NEW)

# Get all blocks with full structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"

# Get specific block details
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv" "blk_xxxxxxxxxxxxxx"

Process spreadsheet data

./scripts/read_doc.sh "sheet_XyZ123AbCdEfGhIj" "sheet"

Extract only text content (Python script)

python scripts/read_feishu_doc.py --doc-token "docx_token" --extract-text-only

Security Notes

  • Never commit credentials: Keep app secrets out of version control
  • Use minimal permissions: Only request permissions your use case requires
  • Secure file permissions: Set proper file permissions on secret files (chmod 600)
  • Environment isolation: Use separate apps for development and production
  • Audit access: Regularly review which documents your app can access

Troubleshooting

Authentication Issues

  1. Verify your App ID and App Secret in Feishu Open Platform
  2. Ensure the app has been published with required permissions
  3. Check that environment variables or config files are properly set
  4. Test with the test_auth.py script to verify credentials

Document Access Issues

  1. Ensure the document is shared with your app or in an accessible space
  2. Verify the document token format (should start with docx_, doc_, or sheet_)
  3. Check if the document requires additional sharing permissions
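
The token format check in step 2 can be expressed as a prefix lookup. This sketch relies only on the prefixes named above (docx_, doc_, sheet_); detect_doc_type is a hypothetical helper, and real tokens may not follow this convention.

```python
# Prefixes as listed in the troubleshooting step above.
PREFIX_TO_TYPE = {"docx_": "docx", "doc_": "doc", "sheet_": "sheet"}


def detect_doc_type(token):
    """Map a document token to its type, or raise if the prefix is unknown."""
    for prefix, doc_type in PREFIX_TO_TYPE.items():
        if token.startswith(prefix):
            return doc_type
    expected = ", ".join(PREFIX_TO_TYPE)
    raise ValueError(f"unrecognized token {token!r}; expected a prefix of {expected}")
```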

Network Issues

  1. Ensure your server can reach open.feishu.cn
  2. Check firewall rules if running in restricted environments
  3. The script includes retry logic for transient network failures

Blocks-Specific Issues

  1. Empty blocks response: Document might be empty or have no accessible blocks
  2. Missing block types: Some block types require additional permissions
  3. Incomplete hierarchy: Use --include-children flag for complete block tree
