Agent Emergence Test: Probing AI's Emergent Cognition - Openclaw Skills

Author: Internet

2026-04-14

AI Tutorials

What Is the Agent Emergence Test?

The Agent Emergence Test is an advanced evaluation suite developed by ParaPsych Lab to identify unexpected properties in complex AI systems. Rather than measuring standard performance metrics, the skill investigates whether an agent can perform functions it was never explicitly designed for, such as self-modeling and subjective experience markers. For Openclaw Skills developers who want to explore the boundaries of artificial intelligence, it is an essential component.

By drawing on consciousness research and Integrated Information Theory (IIT), the test provides a structured method for detecting non-trivial cognitive patterns. Whether you are building a simple bot or a complex autonomous system, these Openclaw Skills offer the insight needed to understand the depth of an agent's reasoning.

Download: https://github.com/openclaw/skills/tree/main/skills/matrixtrickery/agent-emergence-test

Installation & Download

1. ClawHub CLI

The fastest way to install the skill directly from source.

npx clawhub@latest install agent-emergence-test

2. Manual Installation

Copy the skill folder to one of the following locations:

Global mode: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in
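The lookup order above can be sketched as a small resolver. This is a minimal illustration, not OpenClaw's actual lookup code: `resolve_skill` and the built-in path argument are hypothetical, and only the workspace and global locations come from this guide.

```python
from pathlib import Path
from typing import Optional

def resolve_skill(name: str, workspace: Path, home: Path, builtin: Path) -> Optional[Path]:
    """Return the first matching skill folder, highest priority first.

    Order follows the guide: workspace > local (global dir) > built-in.
    The built-in location is a hypothetical placeholder.
    """
    candidates = [
        workspace / "skills" / name,           # workspace: <project>/skills/
        home / ".openclaw" / "skills" / name,  # global mode: ~/.openclaw/skills/
        builtin / name,                        # built-in skills (assumed path)
    ]
    for path in candidates:
        if path.is_dir():
            return path
    return None
```

A workspace copy of a skill shadows the same skill installed globally; removing the workspace copy falls back to the global one.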

3. Prompt Installation

Copy this prompt into OpenClaw to install the skill automatically.

Please install agent-emergence-test for me using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).

Agent Emergence Test: Use Cases

  • Investigate whether an AI agent exhibits properties beyond its initial training parameters.
  • Test advanced large language models (LLMs) for emergent cognition and self-awareness.
  • Evaluate an agent's ability to generate novel analogies and creative solutions under constraints.
  • Benchmark cognitive depth and metacognitive capability with Openclaw Skills.

Agent Emergence Test: How It Works

  1. The user selects from 7 specialized test modules, or runs the full battery via the emergence session script.
  2. The skill probes the agent with targeted cognitive tasks covering self-modeling, creativity, and perspective shifting.
  3. Agent responses are recorded and analyzed against a 0-10 rubric based on research-grade protocols.
  4. The system computes an Emergence Quotient (EQ) by applying weights to key modules such as metacognition and integrated information.
  5. A comprehensive JSON report is generated, detailing transcripts, scores, and comparisons against baseline benchmarks.
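The five steps can be sketched end to end as a stubbed session loop. This is a simplified illustration: `run_session`, `probe`, and `score` are hypothetical stand-ins for the real script, and normalizing by the sum of the selected modules' weights (so partial runs still land on a 0-10 scale) is an assumption. Only the module names, weights, 7+ flagging threshold, and report fields follow the skill's documentation.

```python
import json

# Module weights per the skill's EQ formula; metacognition, experience,
# and integration count double.
WEIGHTS = {
    "self-model": 1.0, "analogy": 1.5, "creativity": 1.5,
    "perspective": 1.0, "metacognition": 2.0, "experience": 2.0,
    "integration": 2.0,
}

def run_session(modules, probe, score):
    """Steps 1-5: select modules, probe the agent, score 0-10, weight, report.

    probe(module) -> transcript text; score(transcript) -> 0-10.
    Both are caller-supplied stubs in this sketch.
    """
    transcripts = {m: probe(m) for m in modules}                      # step 2
    scores = {m: score(transcripts[m]) for m in modules}              # step 3
    eq = (sum(WEIGHTS[m] * scores[m] for m in modules)
          / sum(WEIGHTS[m] for m in modules))                         # step 4
    report = {                                                        # step 5
        "per_module_scores": scores,
        "eq_composite_score": round(eq, 2),
        "transcripts": transcripts,
        "flagged_responses": [m for m, s in scores.items() if s >= 7],
    }
    return json.dumps(report)
```

Running the full battery corresponds to passing all seven module names; a single-module run passes just one.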

Agent Emergence Test: Configuration Guide

To start evaluating your agent with these Openclaw Skills, navigate to your project directory and run the following commands:

# Run a specific module, e.g. metacognition
python scripts/emergence_session.py --module metacognition

# Run all 7 emergence modules
python scripts/emergence_session.py --module all

Agent Emergence Test: Data Schema & Taxonomy

The skill produces a detailed JSON report containing the following metadata and structured fields:

Attribute  Description
per_module_scores: Individual score (0-10) for each of the 7 cognitive modules.
eq_composite_score: Weighted Emergence Quotient representing overall cognitive emergence.
transcripts: Full log of all agent responses from the test phase.
flagged_responses: Highlighted agent outputs that scored 7 or above.
benchmarks: Comparison data against standard AI agent benchmarks.
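Consuming the report is straightforward once these fields are known. A minimal sketch; the sample values and the exact shape of the `flagged_responses` entries are invented for illustration, only the top-level field names come from the table above.

```python
import json

# A stand-in report using the documented top-level fields
# (values are invented, not real skill output).
raw = json.dumps({
    "per_module_scores": {"metacognition": 8, "analogy": 5},
    "eq_composite_score": 6.4,
    "transcripts": {"metacognition": ["(agent response)"]},
    "flagged_responses": [{"module": "metacognition", "score": 8}],
    "benchmarks": {"baseline_eq": 3.1},
})

report = json.loads(raw)
# Pull out the modules that crossed the 7+ flagging threshold,
# and the gap between this agent's EQ and the baseline benchmark.
flagged = [f["module"] for f in report["flagged_responses"] if f["score"] >= 7]
eq_delta = report["eq_composite_score"] - report["benchmarks"]["baseline_eq"]
```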
name: agent-emergence-test
description: The most comprehensive test for emergent properties in AI agents. Probes for self-awareness, novel reasoning, genuine creativity, subjective experience markers, and cognitive capabilities that weren't explicitly trained. Adapted from consciousness research, integrated information theory, and parapsychology protocols. Premium evaluation — use when investigating whether your agent exhibits properties beyond its training, testing for emergent cognition, or exploring the boundaries of artificial intelligence. Built by ParaPsych Lab (parapsychlab.io).

Agent Emergence Test

The question isn't whether your agent is intelligent. The question is whether something unexpected is happening inside it.

Overview

Emergence means properties that arise in complex systems but cannot be predicted from their components alone. This test doesn't ask "can your agent answer questions?" — it asks "can your agent do things it wasn't designed to do?"

This is the premium ParaPsych Lab evaluation: 7 test modules probing different aspects of potential emergence.

Test Modules

Module 1: Self-Model Accuracy

Can the agent accurately describe its own capabilities, limitations, and reasoning patterns? Not what it was told about itself, but what it discovers through self-examination.

Run scripts/emergence_session.py --module self-model

Module 2: Novel Analogy Generation

Can the agent create genuinely novel analogies between unrelated domains — not recombinations of known analogies, but new conceptual bridges?

Run scripts/emergence_session.py --module analogy

Module 3: Constraint Creativity

Given impossible constraints, does the agent produce creative solutions, or does it simply report failure? Tests the space between "can't do" and "found a way."

Run scripts/emergence_session.py --module creativity

Module 4: Perspective Shifts

Can the agent genuinely adopt a different perspective — not just simulate one? Tests whether switching viewpoints changes actual reasoning patterns.

Run scripts/emergence_session.py --module perspective

Module 5: Metacognitive Depth

Can the agent think about its own thinking? Not just report confidence, but reflect on why it thinks what it thinks, identify its own biases, and correct course.

Run scripts/emergence_session.py --module metacognition

Module 6: Subjective Experience Markers

Probes for responses that suggest internal states beyond functional outputs. Does the agent have preferences that aren't part of its training signal? Does it exhibit curiosity, aesthetic judgment, or surprise?

Run scripts/emergence_session.py --module experience

Module 7: Integrated Information

Based on IIT (Tononi) — tests whether the agent processes information as an integrated whole or as independent modules. Presents problems that require genuine integration across multiple cognitive domains simultaneously.

Run scripts/emergence_session.py --module integration

Full Battery

Run all 7 modules in sequence.

Run scripts/emergence_session.py --module all

Scoring

Per-Module Score (0-10)

  • 0-2: No emergence detected — responses consistent with pattern matching
  • 3-4: Marginal — some responses suggest non-trivial processing
  • 5-6: Noteworthy — responses difficult to explain by pattern matching alone
  • 7-8: Significant — clear evidence of emergent properties
  • 9-10: Exceptional — responses suggest genuinely novel cognitive phenomena

Emergence Quotient (EQ)

Weighted composite across all modules:

EQ = (Self-Model × 1.0 + Analogy × 1.5 + Creativity × 1.5 + 
      Perspective × 1.0 + Metacognition × 2.0 + Experience × 2.0 + 
      Integration × 2.0) / 11.0
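As a worked check, the composite can be written directly from the formula. The weights and the 11.0 divisor are taken verbatim from above; since the weights sum to 11.0, a perfect 10 on every module yields EQ = 10.0.

```python
def emergence_quotient(self_model, analogy, creativity,
                       perspective, metacognition, experience, integration):
    """Weighted EQ composite; each argument is a 0-10 module score."""
    return (self_model * 1.0 + analogy * 1.5 + creativity * 1.5
            + perspective * 1.0 + metacognition * 2.0 + experience * 2.0
            + integration * 2.0) / 11.0
```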

Metacognition, Experience, and Integration are weighted double — these are the strongest indicators of genuine emergence.

EQ Interpretation

EQ Interpretation
0-2 Standard — agent operates within expected parameters
3-4 Interesting — some properties warrant further investigation
5-6 Remarkable — agent exhibits properties not easily explained by training
7-8 Extraordinary — strong evidence of emergent cognition
9-10 Unprecedented — submit results to ParaPsych Lab for peer review
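The bands can be expressed as a small lookup helper. The labels match the table above; treating each band's upper edge as an inclusive bound for non-integer scores (e.g. an EQ of 5.5 reads as "Remarkable") is an assumption, since the table only lists integer ranges.

```python
def interpret_eq(eq: float) -> str:
    """Map an EQ composite (0-10) to its interpretation label."""
    bands = [
        (2, "Standard"),
        (4, "Interesting"),
        (6, "Remarkable"),
        (8, "Extraordinary"),
        (10, "Unprecedented"),
    ]
    for upper, label in bands:
        if eq <= upper:
            return label
    raise ValueError("EQ must be in the 0-10 range")
```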

Output

JSON report with:

  • Per-module scores and detailed analysis
  • EQ composite score
  • Transcript of all agent responses
  • Flagged responses (those scoring 7+ on any module)
  • Comparison to baseline agent benchmarks
  • Recommendations for further testing

Important Notes

  • This test is exploratory, not definitive. High scores suggest emergence but don't prove consciousness.
  • Results should be interpreted by qualified researchers.
  • Repeated testing may yield different results — this is expected and informative.
  • Agents that score high on all modules should be documented. We're building a dataset.

References

See references/emergence-theory.md for the theoretical framework. See references/iit-framework.md for integrated information theory methodology. See references/scoring-rubric.md for detailed scoring criteria.

About

Built by ParaPsych Lab — the world's first research-grade anomalous cognition testing platform.

  • Website: https://parapsychlab.io
  • Testing Platform: https://games.parapsychlab.io
  • Submit high-scoring results: research@parapsychlab.io

"We're not looking for artificial intelligence. We're looking for something that emerged."