Qdrant Advanced: Vector Database Operations & Search - Openclaw Skills

Author: Internet

2026-03-30

AI Tutorials

What Are Qdrant Advanced Vector Operations?

Qdrant Advanced is a suite of production-grade scripts designed for Openclaw Skills that streamlines vector database operations. It bridges the gap between raw data and searchable embeddings by providing automated workflows for document ingestion, collection lifecycle management, and advanced semantic search.

The skill lets developers handle the complexities of vector storage, including contextual chunking strategies and seamless collection migration. Whether you are building a RAG pipeline or managing high-dimensional vector data, the toolkit provides the CLI utilities needed to maintain a robust, scalable Qdrant environment.

Download: https://github.com/openclaw/skills/tree/main/skills/yoder-bawt/qdrant-advanced

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from source.

npx clawhub@latest install qdrant-advanced

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-Based Installation

Copy this prompt into OpenClaw to install the skill automatically.

Please install qdrant-advanced for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Qdrant Advanced Vector Operations: Use Cases

  • Implement high-precision semantic search across large document sets.
  • Automatically ingest technical documentation using intelligent chunking strategies.
  • Manage and optimize Qdrant collections programmatically.
  • Create secure backups and snapshots for disaster recovery.
  • Migrate vector data between collections or upgrade embedding models.

Qdrant Advanced Vector Operations: How It Works

  1. Initialize the environment by configuring the Qdrant host and providing the API keys required for Openclaw Skills integration.
  2. Use the management script to create and configure collections with specific vector dimensions and distance metrics.
  3. Ingest documents through dedicated scripts that apply chunking strategies such as paragraph, sentence, or semantic splitting.
  4. Run semantic searches over the ingested data, with support for metadata filtering and score thresholds.
  5. Maintain database health with the optimization, snapshot, and migration tools to ensure long-term performance.

Qdrant Advanced Vector Operations: Configuration Guide

To start using this skill in your Openclaw Skills environment, configure the environment variables and use the provided bash scripts.

# Set environment variables
export QDRANT_HOST="localhost"
export QDRANT_PORT="6333"
export OPENAI_API_KEY="sk-..."

# Create your first collection
bash manage.sh create my_collection 1536 cosine

# Ingest data
bash ingest.sh /path/to/docs.txt my_collection paragraph

Qdrant Advanced Vector Operations: Data Architecture & Taxonomy

This skill organizes data into Qdrant collections with specific point and metadata schemas.

| Component | Description |
|---|---|
| Vector | High-dimensional embedding (e.g., 1536 dimensions for OpenAI models). |
| Payload | Metadata JSON containing the source path, category, and custom tags. |
| Chunk | A text segment split according to the chosen strategy (paragraph, fixed-length, etc.). |
| Snapshot | Compressed collection state used for backup and restore. |

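The schema above can be made concrete with a hypothetical point as it might be upserted into Qdrant; the payload keys here are illustrative, not the scripts' guaranteed field names:

```python
# A hypothetical Qdrant point following the schema above.
# Payload keys (source, category, tags, text) are illustrative.
point = {
    "id": "doc-001",
    "vector": [0.0] * 1536,  # e.g., an OpenAI 1536-dimensional embedding
    "payload": {
        "source": "docs/error-handling.md",
        "category": "docs",
        "tags": ["qdrant", "example"],
        "text": "When handling errors in production...",
    },
}
```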
name: qdrant-advanced
version: 1.0.0
description: "Advanced Qdrant vector database operations for AI agents. Semantic search, contextual document ingestion with chunking, collection management, snapshots, and migration tools. Production-ready scripts for the complete Qdrant lifecycle. Use when: (1) Implementing semantic search across collections, (2) Ingesting documents with intelligent chunking, (3) Managing collections programmatically, (4) Creating backups and migrations."
metadata:
  openclaw:
    requires:
      bins: ["curl", "python3", "bash"]
      env: ["QDRANT_HOST", "QDRANT_PORT", "OPENAI_API_KEY"]
      config: []
    user-invocable: true
  homepage: https://github.com/yoder-bawt
  author: yoder-bawt

Qdrant Advanced

Production-ready Qdrant vector database operations for AI agents. Complete toolkit for semantic search, document ingestion, collection management, backups, and migrations.

Quick Start

# Set environment variables
export QDRANT_HOST="localhost"
export QDRANT_PORT="6333"
export OPENAI_API_KEY="sk-..."

# List collections
bash manage.sh list

# Create a collection
bash manage.sh create my_collection 1536 cosine

# Ingest a document
bash ingest.sh /path/to/document.txt my_collection paragraph

# Search
bash search.sh "my search query" my_collection 5

Scripts Overview

| Script | Purpose | Key Features |
|---|---|---|
| search.sh | Semantic search | Multi-collection, filters, score thresholds |
| ingest.sh | Document ingestion | Contextual chunking, batch upload, progress |
| manage.sh | Collection management | Create, delete, list, info, optimize |
| backup.sh | Snapshots | Full collection snapshots, restore, list |
| migrate.sh | Migrations | Collection-to-collection, embedding model upgrades |

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| QDRANT_HOST | No | localhost | Qdrant server hostname |
| QDRANT_PORT | No | 6333 | Qdrant server port |
| OPENAI_API_KEY | Yes* | - | OpenAI API key for embeddings |
| QDRANT_API_KEY | No | - | Qdrant API key (if auth enabled) |

*Required for ingest and search operations

Detailed Usage

bash search.sh <query> <collection> [limit] [filter_json] [score_threshold]

Examples:

# Basic search
bash search.sh "machine learning tutorials" my_docs 10

# With metadata filter
bash search.sh "deployment guide" my_docs 5 '{"must": [{"key": "category", "match": {"value": "devops"}}]}'

# Score threshold
bash search.sh "error handling" my_docs 10 "" 0.8

Output:

{
  "results": [
    {
      "id": "doc-001",
      "score": 0.92,
      "text": "When handling errors in production...",
      "metadata": {"source": "docs/error-handling.md"}
    }
  ]
}
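For downstream processing, the output shape shown above can be filtered in a few lines; a minimal sketch (assuming exactly this JSON shape) that drops hits below a score threshold:

```python
import json

# Sample output in the shape shown above.
raw = '''{"results": [
  {"id": "doc-001", "score": 0.92,
   "text": "When handling errors in production...",
   "metadata": {"source": "docs/error-handling.md"}}
]}'''

def top_hits(payload: str, min_score: float = 0.8) -> list[dict]:
    """Parse search.sh output and keep results scoring at or above min_score."""
    results = json.loads(payload)["results"]
    return [r for r in results if r["score"] >= min_score]

hits = top_hits(raw, min_score=0.8)
# doc-001 scores 0.92, so it survives the 0.8 threshold
```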

Document Ingestion

bash ingest.sh <file> <collection> [chunk_strategy] [metadata_json]

Chunk Strategies:

| Strategy | Description | Best For |
|---|---|---|
| paragraph | Split by paragraphs (double newline, `\n\n`) | Articles, docs |
| sentence | Split by sentences | Short content |
| fixed | Fixed 1000-char chunks | Code, logs |
| semantic | Semantic boundaries | Long documents |

Examples:

# Ingest with paragraph chunking
bash ingest.sh article.md my_collection paragraph

# With custom metadata
bash ingest.sh api.md my_collection paragraph '{"category": "api", "version": "2.0"}'

# Ingest multiple files
for f in docs/*.md; do
    bash ingest.sh "$f" my_collection paragraph
done

Collection Management

bash manage.sh <command> [args...]

Commands:

| Command | Arguments | Description |
|---|---|---|
| list | - | List all collections |
| create | name dim distance | Create new collection |
| delete | name | Delete collection |
| info | name | Get collection info |
| optimize | name | Optimize collection |

Examples:

bash manage.sh list
bash manage.sh create my_vectors 1536 cosine
bash manage.sh create my_vectors 768 euclid
bash manage.sh info my_vectors
bash manage.sh optimize my_vectors
bash manage.sh delete my_vectors

Backup & Restore

bash backup.sh <command> [args...]

Commands:

| Command | Arguments | Description |
|---|---|---|
| snapshot | collection [snapshot_name] | Create snapshot |
| restore | collection snapshot_name | Restore from snapshot |
| list | collection | List snapshots |
| delete | collection snapshot_name | Delete snapshot |

Examples:

# Create snapshot
bash backup.sh snapshot my_collection
bash backup.sh snapshot my_collection backup_2026_02_10

# List snapshots
bash backup.sh list my_collection

# Restore
bash backup.sh restore my_collection backup_2026_02_10

# Delete old snapshot
bash backup.sh delete my_collection old_backup

Migration

bash migrate.sh <source_collection> <target_collection> [options]

Migration Types:

  1. Copy Collection: Same embedding model, different name
  2. Model Upgrade: Upgrade to new embedding model (re-embeds)
  3. Filter Migration: Migrate subset with filter

Examples:

# Simple copy
bash migrate.sh old_collection new_collection

# With model upgrade (re-embeds all content)
bash migrate.sh old_collection new_collection --upgrade-model

# Filtered migration
bash migrate.sh old_collection new_collection --filter '{"category": "public"}'

# Batch size for large collections
bash migrate.sh old_collection new_collection --batch-size 50
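Under the hood, a migration of this kind amounts to reading the source collection in batches and upserting each batch into the target. A minimal sketch of just the batching step (illustrative, not migrate.sh's actual internals):

```python
from typing import Iterable, Iterator

def batched(points: Iterable[dict], batch_size: int = 50) -> Iterator[list[dict]]:
    """Yield points in fixed-size batches, as a --batch-size option would."""
    batch: list[dict] = []
    for p in points:
        batch.append(p)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# 120 dummy points in batches of 50 -> batch sizes 50, 50, 20
sizes = [len(b) for b in batched(({"id": i} for i in range(120)), 50)]
```

Each yielded batch would then be upserted with a single PUT /collections/{name}/points call, keeping request sizes bounded for large collections.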

Chunking Deep Dive

The ingest script provides intelligent chunking to preserve context:

Paragraph Chunking

  • Splits on double newlines
  • Preserves paragraph structure
  • Adds overlap of 2 sentences between chunks
  • Best for: Articles, documentation, blogs

Sentence Chunking

  • Splits on sentence boundaries
  • Minimal overlap
  • Best for: Short content, tweets, quotes

Fixed Chunking

  • Fixed 1000 character chunks
  • 200 character overlap
  • Best for: Code files, logs, unstructured text

Semantic Chunking

  • Uses paragraph + header detection
  • Preserves document structure
  • Best for: Long documents with headers
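As an illustration of the fixed strategy described above, here is a minimal sketch of 1000-character chunks with a 200-character overlap (not the actual ingest.sh implementation):

```python
def fixed_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share `overlap` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = fixed_chunks(text)
# 2500 chars -> chunk starts at 0, 800, 1600: three chunks,
# each overlapping the previous one by 200 characters
```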

API Reference

All scripts use Qdrant REST API:

GET    /collections              # List collections
PUT    /collections/{name}       # Create collection
DELETE /collections/{name}       # Delete collection
GET    /collections/{name}       # Collection info
POST   /collections/{name}/points/search     # Search
PUT    /collections/{name}/points           # Upsert points
POST   /collections/{name}/snapshots         # Create snapshot
GET    /collections/{name}/snapshots         # List snapshots
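As a reference point, the search endpoint from the listing above can be called without the wrapper scripts. A standard-library Python sketch that only builds the request (the vector is a placeholder; a real call would first embed the query):

```python
import json
import os
import urllib.request

host = os.environ.get("QDRANT_HOST", "localhost")
port = os.environ.get("QDRANT_PORT", "6333")

# Placeholder vector; in practice this comes from the embedding API.
body = json.dumps({
    "vector": [0.0] * 1536,
    "limit": 5,
    "with_payload": True,
}).encode()

req = urllib.request.Request(
    f"http://{host}:{port}/collections/my_collection/points/search",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would execute the search against a live server.
```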

Full docs: https://qdrant.tech/documentation/

Performance Tips

  1. Batch uploads: ingest.sh automatically batches uploads (default 100)
  2. Optimize after bulk insert: bash manage.sh optimize my_collection
  3. Use filters: Narrow search scope with metadata filters
  4. Set score thresholds: Filter low-quality matches
  5. Index metadata: Add payload indexes for faster filtering
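Regarding tip 5, payload indexes are created through Qdrant's index endpoint (PUT /collections/{name}/index, to the best of my knowledge; verify against your server's docs). An illustrative request body:

```python
import json

# Request body for PUT /collections/{name}/index (per Qdrant's payload
# index API; verify against your server version).
index_request = {
    "field_name": "category",   # payload field used in filters
    "field_schema": "keyword",  # exact-match string index
}
payload = json.dumps(index_request)
```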

Troubleshooting

"Connection refused"

  • Check Qdrant is running: curl http://$QDRANT_HOST:$QDRANT_PORT/healthz
  • Verify host/port environment variables

"Collection not found"

  • List collections: bash manage.sh list
  • Check collection name spelling

"No search results"

  • Verify documents were ingested: bash manage.sh info my_collection
  • Check vector dimensions match (e.g., 1536 for text-embedding-3-small)
  • Try lowering score threshold

Embedding errors

  • Verify OPENAI_API_KEY is set
  • Check API key has quota available
  • Verify network access to OpenAI API

Snapshot fails

  • Check disk space available
  • Verify Qdrant has snapshot permissions
  • For large collections, try during low-traffic periods

Requirements

  • Qdrant server v1.0+
  • curl, python3, bash
  • OpenAI API key (for embeddings)
  • Network access to Qdrant and OpenAI

See Also

  • Qdrant Docs: https://qdrant.tech/documentation/
  • OpenAI Embeddings: https://platform.openai.com/docs/guides/embeddings
  • Vector Search Guide: https://qdrant.tech/documentation/concepts/search/