Enhanced Memory: Hybrid Vector and Keyword Search - Openclaw Skills

Author: Internet

2026-03-30


What is Enhanced Memory?

Enhanced Memory is a purpose-built extension that significantly improves the retrieval capabilities of Openclaw Skills. Unlike standard flat vector search, this skill implements a 4-signal hybrid retrieval pipeline that fuses vector similarity, keyword matching, header alignment, and filepath scoring. This architecture lets AI agents locate relevant information with far higher precision, achieving a documented mean reciprocal rank (MRR) of 0.782.

By calling a local embedding model through Ollama, Enhanced Memory keeps data private and on-device while offering advanced features such as temporal routing (prioritizing files based on date references) and pseudo-relevance feedback (dynamically refining search results). It is an ideal fit for developers building sophisticated Openclaw Skills that need robust, context-aware memory management.

Download: https://github.com/openclaw/skills/tree/main/skills/jameseball/enhanced-memory

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install enhanced-memory

2. Manual Installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Precedence: workspace > local > built-in

3. Prompt-Based Installation

Copy this prompt into OpenClaw to install automatically:

Please install enhanced-memory using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Enhanced Memory Use Cases

  • Improve retrieval accuracy for agents managing large document or memory bases.
  • Automatically build a cross-reference knowledge graph between related memory files.
  • Prioritize recent or time-sensitive information with the built-in temporal routing logic.
  • Identify stale or highly important memory items via automatic salience scoring.

How Enhanced Memory Works

  1. The system chunks the Markdown files under the memory directory by header structure.
  2. Chunks are embedded with the nomic-embed-text model through a local Ollama instance.
  3. When a query is issued, the search engine computes scores for four distinct signals: vector similarity, keyword overlap, header matching, and filepath relevance.
  4. If the query contains date-related terms such as "yesterday" or a specific month, the system applies a temporal boost.
  5. If the initial search score falls below a set threshold, the system triggers pseudo-relevance feedback to expand the query and re-run the search for better results.
  6. Results are returned as a ranked list of snippets, headers, and file paths.
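The four-signal fusion in step 3 can be sketched as a weighted sum (the weights mirror the defaults listed later under Configuration; `fuse_scores` is an illustrative name, not the skill's actual code):

```python
def fuse_scores(vector_sim, keyword_overlap, header_match, filepath_match,
                weights=(0.4, 0.25, 0.1, 0.25)):
    """Weighted sum of the four retrieval signals, each assumed in [0, 1]."""
    signals = (vector_sim, keyword_overlap, header_match, filepath_match)
    return sum(w * s for w, s in zip(weights, signals))

# A chunk with strong vector and filepath agreement outranks a keyword-only hit:
strong_semantic = fuse_scores(0.9, 0.2, 0.0, 0.8)   # ≈ 0.61
keyword_heavy = fuse_scores(0.3, 0.9, 0.5, 0.1)     # ≈ 0.42
assert strong_semantic > keyword_heavy
```

Because the weights sum to 1.0, the fused score stays comparable across queries and can be checked against the PRF threshold directly.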

Enhanced Memory Setup Guide

# 1. Install Ollama and pull the required embedding model
ollama pull nomic-embed-text

# 2. Index your memory files from the workspace root
python3 skills/enhanced-memory/scripts/embed_memories.py

# 3. Optional: build the cross-reference knowledge graph
python3 skills/enhanced-memory/scripts/crossref_memories.py build

Enhanced Memory Data Schema & Taxonomy

Component             Type           Description
memory/vectors.json   JSON file      Persistent store of embedding vectors, chunk text, and metadata.
search_memory.py      Python script  Main entry point for the 4-signal hybrid retrieval logic.
memory_salience.py    Python script  Scores the importance and staleness of memory items.
crossref_memories.py  Python script  Generates similarity links between different memory chunks.

name: enhanced-memory
description: Enhanced memory search with hybrid vector+keyword scoring, temporal routing, filepath scoring, adaptive weighting, pseudo-relevance feedback, salience scoring, and knowledge graph cross-references. Replaces the default memory search with a 4-signal fusion retrieval system. Use when searching memories, indexing memory files, building cross-references, or scoring memory salience. Requires Ollama with nomic-embed-text model.

Enhanced Memory

Drop-in enhancement for OpenClaw's memory system. Replaces flat vector search with a 4-signal hybrid retrieval pipeline that achieved 0.782 MRR (vs ~0.45 baseline vector-only).

Setup

# Install Ollama and pull the embedding model
ollama pull nomic-embed-text

# Index your memory files (run from workspace root)
python3 skills/enhanced-memory/scripts/embed_memories.py

# Optional: build cross-reference graph
python3 skills/enhanced-memory/scripts/crossref_memories.py build

Re-run embed_memories.py whenever memory files change significantly.

Scripts

scripts/search_memory.py — Hybrid Search

Hybrid 4-signal retrieval with automatic adaptation:

python3 skills/enhanced-memory/scripts/search_memory.py "query" [top_n]

Signals fused:

  1. Vector similarity (0.4) — cosine similarity via nomic-embed-text embeddings
  2. Keyword matching (0.25) — query term overlap with chunk text
  3. Header matching (0.1) — query terms in section headers
  4. Filepath scoring (0.25) — query terms matching file/directory names

Automatic behaviors:

  • Temporal routing — date references ("yesterday", "Feb 8", "last Monday") get 3x boost on matching files
  • Adaptive weighting — when keyword overlap is low, shifts to 85% vector weight
  • Pseudo-relevance feedback (PRF) — when top score < 0.45, expands query with terms from initial results and re-scores
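The PRF behavior can be illustrated with a minimal sketch (plain frequency-based term selection; the script's actual expansion heuristic is an assumption on my part):

```python
import re
from collections import Counter

def expand_query(query, top_snippets, n_terms=3):
    """Pseudo-relevance feedback: append the most frequent non-query terms
    from the initial top results to the query, which is then re-scored."""
    query_terms = set(query.lower().split())
    words = re.findall(r"[a-z]+", " ".join(top_snippets).lower())
    counts = Counter(w for w in words if w not in query_terms and len(w) > 3)
    extra = [w for w, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(extra)

snippets = ["deploy notes: staging rollout and rollback steps",
            "rollout checklist for the staging cluster"]
print(expand_query("deploy", snippets))  # → "deploy staging rollout notes"
```

Expansion only fires when the top fused score is below PRF_THRESHOLD (0.45), so confident queries skip the second pass.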

The same pipeline, with JSON output compatible with OpenClaw's memory_search tool:

python3 skills/enhanced-memory/scripts/enhanced_memory_search.py --json "query"

Returns {results: [{path, startLine, endLine, score, snippet, header}], ...}.

scripts/embed_memories.py — Indexing

Chunks all .md files in memory/ plus core workspace files (MEMORY.md, AGENTS.md, etc.) by markdown headers and embeds them:

python3 skills/enhanced-memory/scripts/embed_memories.py

Outputs memory/vectors.json. Batches embeddings in groups of 20, truncates chunks to 2000 chars.
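The header-based chunking can be sketched like this (a simplified stand-in for the script's parser, with the per-chunk truncation applied):

```python
import re

def chunk_by_headers(markdown_text, max_chars=2000):
    """Split markdown into (header, body) chunks at each heading line,
    truncating each body to max_chars as the indexer does."""
    chunks, header, body = [], "", []
    for line in markdown_text.splitlines():
        if re.match(r"#{1,6} ", line):
            if body:
                chunks.append((header, "\n".join(body)[:max_chars]))
            header, body = line.lstrip("# ").rstrip(), []
        else:
            body.append(line)
    if body:
        chunks.append((header, "\n".join(body)[:max_chars]))
    return chunks

doc = "# Setup\nInstall Ollama.\n## Indexing\nRun embed_memories.py."
print(chunk_by_headers(doc))
# [('Setup', 'Install Ollama.'), ('Indexing', 'Run embed_memories.py.')]
```

Chunking at header boundaries keeps each embedded unit topically coherent, which is what makes the later header-matching signal meaningful.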

scripts/memory_salience.py — Salience Scoring

Surfaces stale/important memory items for heartbeat self-prompting:

python3 skills/enhanced-memory/scripts/memory_salience.py          # Human-readable prompts
python3 skills/enhanced-memory/scripts/memory_salience.py --json   # Programmatic output
python3 skills/enhanced-memory/scripts/memory_salience.py --top 5  # More items

Scores importance × staleness considering: file type (topic > core > daily), size, access frequency, and query gap correlation.
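As a rough sketch of that importance × staleness idea (the type weights, caps, and divisors below are invented for illustration, not the script's actual constants):

```python
TYPE_WEIGHT = {"topic": 1.0, "core": 0.8, "daily": 0.5}  # topic > core > daily

def salience(file_type, size_kb, accesses, days_since_update):
    """Importance (type weight x size x access frequency, each capped at 1)
    multiplied by staleness (age in days, capped at 30 days)."""
    importance = (TYPE_WEIGHT.get(file_type, 0.5)
                  * min(size_kb / 10, 1.0)
                  * min(accesses / 5, 1.0))
    staleness = min(days_since_update / 30, 1.0)
    return importance * staleness

# A stale, frequently accessed topic file surfaces before a fresh daily note:
assert salience("topic", 8, 5, 45) > salience("daily", 8, 5, 2)
```

Multiplying rather than adding the two factors means an item must be both important and stale to surface, which is the right shape for heartbeat self-prompting.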

scripts/crossref_memories.py — Knowledge Graph

Builds cross-reference links between memory chunks using embedding similarity:

python3 skills/enhanced-memory/scripts/crossref_memories.py build   # Build index
python3 skills/enhanced-memory/scripts/crossref_memories.py show    # Show refs for a file
python3 skills/enhanced-memory/scripts/crossref_memories.py graph   # Graph statistics

Uses a file-representative approach (top 5 chunks per file) to reduce the O(n²) chunk-pair comparison to a manageable number. Threshold: 0.75 cosine similarity.
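The representative trick can be sketched as follows (pure-Python cosine for clarity; function names are illustrative, not the script's):

```python
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def crossref(file_vectors, top_k=5, threshold=0.75):
    """Compare only the first top_k representative vectors per file and
    link file pairs whose best pairwise similarity clears the threshold."""
    reps = {f: vecs[:top_k] for f, vecs in file_vectors.items()}
    links = []
    for (fa, va), (fb, vb) in combinations(reps.items(), 2):
        best = max(cosine(a, b) for a in va for b in vb)
        if best >= threshold:
            links.append((fa, fb, round(best, 3)))
    return links

files = {"deploy.md": [[1.0, 0.0]], "rollout.md": [[0.9, 0.1]], "pets.md": [[0.0, 1.0]]}
print(crossref(files))  # links deploy.md <-> rollout.md only
```

Capping each file at a handful of representatives turns an all-pairs chunk comparison into one that scales with the number of files.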

Configuration

All tunable constants are at the top of each script. Key parameters:

Parameter Default Script Purpose
VECTOR_WEIGHT 0.4 search_memory.py Weight for vector similarity
KEYWORD_WEIGHT 0.25 search_memory.py Weight for keyword overlap
FILEPATH_WEIGHT 0.25 search_memory.py Weight for filepath matching
TEMPORAL_BOOST 3.0 search_memory.py Multiplier for date-matching files
PRF_THRESHOLD 0.45 search_memory.py Score below which PRF activates
SIMILARITY_THRESHOLD 0.75 crossref_memories.py Min similarity for cross-ref links
MODEL nomic-embed-text all Ollama embedding model

To use a different embedding model (e.g., mxbai-embed-large), change MODEL in each script and re-run embed_memories.py.

Integration

To replace the default memory search, point your agent's search tool at these scripts. The scripts expect:

  • memory/ directory relative to workspace root containing .md files
  • memory/vectors.json (created by embed_memories.py)
  • Ollama running locally on port 11434

All scripts use only Python stdlib + Ollama HTTP API. No pip dependencies.
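For orientation, here is the shape of a stdlib-only embedding call against that local endpoint (the request/response fields follow Ollama's /api/embeddings API; the helper names are mine, not the scripts'):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"
MODEL = "nomic-embed-text"

def build_request(text, model=MODEL):
    """Payload for Ollama's embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text):
    """POST the text to the local Ollama server and return the vector.
    Requires Ollama running on port 11434 with the model already pulled."""
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```

Using urllib instead of a client library is what keeps the skill free of pip dependencies.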
