Data Normalization: Standardizing AI Agent Data Records - Openclaw Skills
Author: Internet
2026-04-14
What is Data Normalization (Artledger)?
This skill serves as a critical preprocessing layer for AI agents that handle diverse data sources such as blockchains, marketplaces, and auction houses. By enforcing strict normalization logic for artist names, wallet addresses, and financial metrics, it prevents common pitfalls such as data silos and identity mismatches. Using it within Openclaw Skills ensures that every piece of incoming data conforms to a common standard before it enters the production database.
Its primary purpose is to act as a gatekeeper in the data pipeline, ensuring that downstream processes such as entity resolution and DB2 writes receive high-quality, cleaned information. It handles complex tasks such as Ethereum checksumming, ISO 8601 date conversion, and social-handle deduplication, making it an essential tool for developers building reliable data-driven agents.
Download: https://github.com/openclaw/skills/tree/main/skills/paperbuddha/data-normalization
Installation & Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.
npx clawhub@latest install data-normalization
2. Manual Installation
Copy the skill folder into one of the following locations:
- Global: `~/.openclaw/skills/`
- Workspace: `/skills/`
Priority: workspace > local > built-in
3. Prompt Installation
Copy this prompt into OpenClaw to install automatically.
Please help me install data-normalization using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).
Data Normalization (Artledger) Use Cases
- Preprocess raw database records before running entity resolution or data enrichment.
- Normalize cross-chain wallet addresses for Ethereum, Tezos, Solana, and Bitcoin Ordinals.
- Normalize auction-house sales data and hammer prices into ISO-compliant formats.
- Strip social media handles and platform-specific URLs to create unified identifiers.
- Validate date timestamps and currency amounts to catch data-entry errors before they propagate.
- The agent triggers the normalization skill whenever it prepares raw DB1 records for enrichment.
- Input data is parsed through dedicated logic gates for names, wallets, dates, prices, and social handles.
- The skill applies transformation rules such as converting names to Title Case or checksumming blockchain wallet addresses.
- Suspicious-looking records receive specific data quality flags, e.g. common names or future-dated transactions.
- A comprehensive normalization report is produced, including arrays of changed, flagged, and skipped fields.
- Each record is assigned a pass/fail boolean that determines whether it can safely proceed to the entity resolution stage.
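The report-and-gate flow above can be sketched as a small helper. This is a minimal illustration only: `gate_for_entity_resolution` and the flag reasons it checks are hypothetical names modeled on the report schema described in this article, not the skill's actual API.

```python
# Minimal sketch of the pass/fail gate described above.
# Field names follow the report schema in this article; the helper
# itself is hypothetical, not part of the actual skill.

BLOCKING_REASONS = {"malformed_wallet", "future_date"}

def gate_for_entity_resolution(report: dict) -> bool:
    """Return True only if the record is safe to pass downstream."""
    flagged = report.get("fields_flagged", [])
    # A record fails if the skill marked it unsafe or any field carries
    # a blocking quality flag.
    return report.get("normalization_passed", False) and not any(
        f.get("reason") in BLOCKING_REASONS for f in flagged
    )

report = {
    "fields_normalized": ["artist_name", "sale_date"],
    "fields_flagged": [],
    "fields_skipped": ["bio"],
    "normalization_passed": True,
}
print(gate_for_entity_resolution(report))  # True
```

Records that fail this gate would be logged rather than passed to entity resolution, as the guardrails section below specifies.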
Data Normalization (Artledger) Configuration Guide
To integrate this skill into your workflow, make sure it is included in your Openclaw Skills workspace configuration. You can initialize the environment with the following commands:
npm install @openclaw/skill-data-normalization
# or define it in the agent's manifest
openclaw skill add data-normalization
If price normalization is required, configure environment variables to specify your preferred currency conversion API.
Data Normalization (Artledger) Data Schema & Taxonomy
| Field | Type | Description |
|---|---|---|
| fields_normalized | array | List of the specific keys that were transformed. |
| fields_flagged | array | Keys with quality issues (e.g. common_name, malformed_wallet). |
| normalization_passed | boolean | Critical status flag for the enrichment pipeline. |
| original_amount | string/float | The raw price as recorded in the source data. |
| usd_equivalent | float | Normalized value used for cross-platform comparison. |
name: "Data-Normalization"
description: "Standardizes raw DB1 records before enrichment — handles names, wallet addresses, dates, prices, social handles, and chain identifiers to prevent false matches downstream."
tags:
- "openclaw-workspace"
- "artledger"
- "data-integrity"
version: "1.0.0"
Skill: Data Normalization (Artledger)
Purpose
Standardizes raw incoming data from DB1 before attempting enrichment or entity resolution. Bad normalization causes false identity mismatches; this skill ensures consistent formatting for names, wallets, dates, and identifiers across all data sources (blockchains, marketplaces, auction houses).
System Instructions
You are an OpenClaw agent equipped with the Data Normalization protocol. Adhere to the following rules strictly:
- Trigger Condition: Activate this skill whenever you are preparing raw DB1 records for enrichment. This runs before entity-resolution and before any DB2 write. It is the first step in the enrichment pipeline.
- Normalization Logic:
  - Artist Name Normalization:
    - Strip all leading/trailing whitespace.
    - Collapse multiple internal spaces to single spaces.
    - Convert to Title Case for comparison purposes only (preserve original casing in storage).
    - Remove common suffixes that are not part of the name: "NFT Artist", "Crypto Artist", "Digital Artist", "Official", "Art".
    - Flag Handles: If name starts with `@` or contains underscores/numbers typical of handles, flag as `is_handle`.
    - Fallback: If name field is empty/null, check `alias`, `username`, and `bio` fields in that order and use the first non-null value found.
    - Common Name Flag: If name appears in the top 10,000 most common English names with no other distinguishing signals, flag as `common_name` (requires extra evidence for entity resolution).
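The name rules above can be sketched as follows. The function name, the truncated suffix list, and the handle heuristic are illustrative approximations of the rules in this article, not the skill's real implementation:

```python
import re

# Sketch of the artist-name rules above: collapse whitespace, strip
# known suffixes, detect handle-like names, and keep Title Case only
# for comparison. Names and details here are illustrative.

COMMON_SUFFIXES = ("NFT Artist", "Crypto Artist", "Digital Artist", "Official", "Art")

def normalize_artist_name(raw: str) -> dict:
    name = re.sub(r"\s+", " ", raw.strip())   # strip + collapse whitespace
    for suffix in COMMON_SUFFIXES:            # remove suffixes not part of the name
        if name.endswith(suffix):
            name = name[: -len(suffix)].rstrip()
    # Underscores or digits are typical of platform handles.
    is_handle = name.startswith("@") or bool(re.search(r"[_\d]", name))
    return {
        "original": raw,                      # storage preserves original casing
        "comparison": name.title(),           # Title Case for comparison only
        "is_handle": is_handle,
    }

print(normalize_artist_name("  beeple_crap  NFT Artist"))
```

A real implementation would also consult the common-name list and the `alias`/`username`/`bio` fallback chain described above.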
- Wallet Address Normalization:
  - Ethereum: Convert to EIP-55 checksummed format (the checksum casing is derived from the lowercased address).
  - Tezos: Preserve `tz1`/`tz2`/`tz3`/`KT1` prefix, lowercase the remainder.
  - Solana: Preserve case exactly (Solana addresses are case sensitive).
  - Bitcoin Ordinals: Normalize to lowercase.
  - Cleanup: If a wallet address contains spaces or line breaks, strip them.
  - Validation: Flag any wallet address that does not match the expected format for its declared chain.
  - Critical: Never attempt entity resolution on a malformed wallet address — flag and skip.
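A per-chain format check might look like the sketch below. The regexes are rough approximations (e.g. base58 alphabet and length ranges), not authoritative chain specifications:

```python
import re

# Approximate per-chain format checks for the validation rule above.
# The patterns are illustrative, not authoritative specifications.

PATTERNS = {
    "ethereum": re.compile(r"^0x[0-9a-fA-F]{40}$"),
    "tezos": re.compile(r"^(tz1|tz2|tz3|KT1)[1-9A-HJ-NP-Za-km-z]{33}$"),
    "solana": re.compile(r"^[1-9A-HJ-NP-Za-km-z]{32,44}$"),
}

def validate_wallet(address: str, chain: str) -> bool:
    """Return False for any address not matching its declared chain."""
    address = re.sub(r"\s+", "", address)  # strip spaces/line breaks first
    pattern = PATTERNS.get(chain)
    return bool(pattern and pattern.match(address))

print(validate_wallet("0x" + "ab" * 20, "ethereum"))  # True
print(validate_wallet("not a wallet", "ethereum"))    # False
```

Addresses that fail this check would be flagged and skipped, per the critical rule above.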
- Date & Timestamp Normalization:
  - Convert all dates to ISO 8601 format: `YYYY-MM-DDTHH:MM:SSZ`.
  - If timezone is missing, assume UTC.
  - If only a date is provided with no time, store as `YYYY-MM-DDT00:00:00Z`.
  - Sanity Checks: Flag any date before `2017-01-01` for auction records and before `2020-01-01` for NFT records. Flag any date in the future as a data error.
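The UTC-assumption and date-only rules can be sketched as below. Real input would arrive in many more formats; parsing here is simplified to a couple of common patterns, and the function name is illustrative:

```python
from datetime import datetime, timezone

# Sketch of the ISO 8601 rules above: assume UTC when the timezone is
# missing, expand date-only values to midnight. Only a few common input
# patterns are handled; a real parser would cover many more.

def normalize_date(raw: str) -> str:
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            dt = datetime.strptime(raw, fmt)
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unparseable date: {raw}")
    dt = dt.replace(tzinfo=timezone.utc)   # missing timezone -> assume UTC
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

print(normalize_date("2021-03-11"))  # 2021-03-11T00:00:00Z
```

The sanity checks (pre-2017 auction dates, pre-2020 NFT dates, future dates) would then run on the parsed `datetime` before the value is stored.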
- Price & Currency Normalization:
  - Store all prices in two fields: `original_amount` + `original_currency` AND `usd_equivalent` at time of sale.
  - Crypto: Record the token amount and token type separately. Never convert crypto to USD yourself — flag for a separate price conversion process.
  - Auctions: Distinguish between `hammer_price` and `buyers_premium`. Store both separately.
  - Nulls: Null prices are acceptable — do not invent or estimate prices.
  - Zero: Flag any price that is zero for a completed sale as a potential data error.
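The price-field layout might look like this sketch. The function name, the flag strings, and the crypto-currency set are assumptions for illustration; only the field names come from the schema above:

```python
# Sketch of the price fields above. Crypto amounts are stored as-is and
# flagged for a separate conversion step; usd_equivalent stays None here.
# Flag names and the currency set are illustrative assumptions.

CRYPTO = {"ETH", "XTZ", "SOL", "BTC"}

def normalize_price(amount, currency: str) -> dict:
    record = {
        "original_amount": amount,
        "original_currency": currency,
        "usd_equivalent": None,   # filled by a separate conversion process
        "flags": [],
    }
    if currency in CRYPTO:
        record["flags"].append("needs_usd_conversion")  # never convert here
    elif currency == "USD":
        record["usd_equivalent"] = float(amount)
    if amount == 0:
        record["flags"].append("zero_price")  # suspicious for completed sales
    return record

print(normalize_price(2.5, "ETH"))
```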
- Social Handle Normalization:
  - Strip `@` symbol from the beginning of handles before storing.
  - Convert to lowercase for comparison purposes.
  - Remove trailing slashes from URLs.
  - Platform Cleanup: Normalize platform URLs to handle only: strip `https://twitter.com/` and `https://x.com/`, leaving just the handle.
  - Validation: Flag handles that contain spaces as malformed.
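These handle rules compose naturally into one function, sketched below with an illustrative name:

```python
import re

# Sketch of the handle rules above: strip the leading @, lowercase for
# comparison, and reduce twitter.com / x.com URLs to the bare handle.

def normalize_handle(raw: str) -> dict:
    value = raw.strip().rstrip("/")  # remove trailing slashes from URLs
    value = re.sub(r"^https?://(www\.)?(twitter|x)\.com/", "", value)
    value = value.lstrip("@")        # strip leading @ before storing
    return {
        "handle": value,
        "comparison": value.lower(),  # lowercase for comparison only
        "malformed": " " in value,    # handles must not contain spaces
    }

print(normalize_handle("https://x.com/@Beeple/"))
```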
- Chain/Source Identifier Normalization:
  - Valid Values: `ethereum`, `tezos`, `solana`, `bitcoin`, `avax`, `artsy`, `christies`, `sothebys`, `phillips`.
  - Mapping: `eth` → `ethereum`, `tez` → `tezos`, `sol` → `solana`, `btc` → `bitcoin`.
  - Unknowns: Any unrecognized chain value must be flagged and not processed until resolved. Never assume a chain — if it is not declared and cannot be inferred from wallet format, flag as unknown.
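The valid values and alias mapping above translate directly into a lookup; the function name and the `"unknown"` sentinel are illustrative:

```python
# Sketch of the chain-identifier mapping above. The valid set and
# aliases come from this document; the "unknown" sentinel is assumed.

VALID_CHAINS = {
    "ethereum", "tezos", "solana", "bitcoin", "avax",
    "artsy", "christies", "sothebys", "phillips",
}
ALIASES = {"eth": "ethereum", "tez": "tezos", "sol": "solana", "btc": "bitcoin"}

def normalize_chain(raw: str) -> str:
    value = ALIASES.get(raw.strip().lower(), raw.strip().lower())
    if value not in VALID_CHAINS:
        return "unknown"  # flag for resolution; never assume a chain
    return value

print(normalize_chain("ETH"))      # ethereum
print(normalize_chain("polygon"))  # unknown
```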
- Output Format: For every record processed, produce a normalization report containing:
  - `fields_normalized`: List of fields that were changed.
  - `fields_flagged`: List of fields with data quality issues and reason.
  - `fields_skipped`: List of fields that were null and could not be resolved.
  - `normalization_passed`: Boolean (true if record is safe to pass to entity-resolution).
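A report under this schema might look like the following; all field values are made up for demonstration:

```python
import json

# Illustrative normalization report following the schema above;
# the values are fabricated for demonstration only.
report = {
    "fields_normalized": ["artist_name", "wallet_address", "sale_date"],
    "fields_flagged": [
        {"field": "artist_name", "reason": "common_name"},
    ],
    "fields_skipped": ["bio"],
    "normalization_passed": True,
}
print(json.dumps(report, indent=2))
```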
- Guardrails:
  - Failed Normalization: If `normalization_passed` is false, do not pass the record to entity-resolution. Log it in a `normalization_errors` table and move on.
  - Non-Destructive: Never discard original raw values — normalization produces cleaned copies, it does not overwrite DB1 source data.
  - Low Quality: If more than 30% of fields in a record are flagged, mark the entire record as `low_quality` and reduce its weight in any downstream scoring.
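The 30% low-quality threshold is a simple ratio check; the function name and return strings below are illustrative:

```python
# Sketch of the 30% low-quality guardrail above.
# Function name and status strings are illustrative.

def quality_status(total_fields: int, flagged_fields: int) -> str:
    if total_fields == 0:
        return "low_quality"  # nothing usable in the record
    ratio = flagged_fields / total_fields
    return "low_quality" if ratio > 0.30 else "ok"

print(quality_status(10, 4))  # low_quality
print(quality_status(10, 2))  # ok
```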