OpenClaw 自愈:AI 驱动的网关恢复 - Openclaw Skills
作者:互联网
2026-04-17
什么是 OpenClaw 自愈系统?
OpenClaw 自愈系统是一款专业级基础设施工具,旨在确保 AI 网关的持续可用性。它在 Openclaw Skills 框架内运行,针对进程崩溃、配置错误和资源泄漏等常见运行故障提供弹性多层防御。
通过将传统的看门狗坚控与先进的 AI 诊断相结合,此技能可最大限度地减少停机时间并减少人工干预的需求。对于通过 Openclaw Skills 部署复杂 Agent 和服务且需要“一劳永逸”环境的开发者来说,它是一个关键组件。
下载入口:https://github.com/openclaw/skills/tree/main/skills/ramsbaby/openclaw-self-healing
安装与下载
1. ClawHub CLI
从源直接安装技能的最快方式。
npx clawhub@latest install openclaw-self-healing
2. 手动安装
将技能文件夹复制到以下位置之一
全局模式~/.openclaw/skills/
工作区
/skills/
优先级:工作区 > 本地 > 内置
3. 提示词安装
将此提示词复制到 OpenClaw 即可自动安装。
请帮我使用 Clawhub 安装 openclaw-self-healing。如果尚未安装 Clawhub,请先安装(npm i -g clawhub)。
OpenClaw 自愈系统 应用场景
- 在意外崩溃或 HTTP 失败后自动重启 OpenClaw 网关。
- 在中断 Agent 运行前检测并解决过期的 OAuth 令牌。
- 清理导致端口冲突或 PID 不匹配的孤儿僵尸进程。
- 当自动重启失败时,执行 AI 辅助的配置模式故障排查。
- 坚控 macOS 和 Linux 生产环境中的系统健康状况。
- L1 配置坚控脚本坚控配置文件更改,以触发即时验证和重新加载。
- L2 看门狗持续轮询网关健康检查 URL,以检测进程挂起或连接问题。
- 如果故障持续存在,系统将执行指数退避重启策略并清除僵尸进程。
- 在 30 分钟故障窗口后,L3 触发 Claude Code AI 医生执行深度诊断并应用逻辑修复。
- 状态更新和关键升级警报通过 L4 与 Discord 或 T@elegrimm 的集成推送。
OpenClaw 自愈系统 配置指南
要在 Openclaw Skills 环境中部署此恢复系统,请确保已安装 tmux、jq 和 Claude Code CLI。
运行主安装脚本:
bash <(curl -fsSL https://raw.githubusercontent.com/Ramsbaby/openclaw-self-healing/main/install.sh)
或者通过 ClawHub 安装:
npx clawhub@latest install openclaw-self-healing
在 ~/.openclaw/.env 中配置环境变量(例如 Webhook URL)。
OpenClaw 自愈系统 数据架构与分类体系
系统整理其运行数据和恢复日志,以确保透明度和可审计性:
| 组件 | 类型 | 描述 |
|---|---|---|
| 环境配置 | .env | 存储网关 URL、重试限制和通知 Webhook |
| 健康指标 | JSON | 跟踪成功率、MTTR 和趋势故障模式 |
| 推理日志 | Markdown | L3 恢复期间 AI 诊断步骤的详细记录 |
| 进程坚控 | tmux | 活动恢复层的实时仪表板视图 |
name: openclaw-self-healing
version: 3.1.1
description: 4-tier autonomous self-healing and auto-recovery system for OpenClaw Gateway. Monitors gateway health, auto-restarts on crash, detects OAuth token expiry, kills zombie processes, and escalates to Claude Code AI for diagnosis when automated recovery fails. Use when your OpenClaw gateway crashes, stops responding, enters a restart loop, or needs automatic monitoring and recovery. Features watchdog, config validation, exponential backoff, Discord/T@elegrimm alerts. macOS & Linux.
metadata:
{
"openclaw":
{
"requires": { "bins": ["tmux", "claude", "jq"] },
"install":
[
{
"id": "tmux",
"kind": "brew",
"package": "tmux",
"bins": ["tmux"],
"label": "Install tmux (brew)",
},
{
"id": "claude",
"kind": "node",
"package": "@anthropic-ai/claude-code",
"bins": ["claude"],
"label": "Install Claude Code CLI (npm)",
},
{
"id": "jq",
"kind": "brew",
"package": "jq",
"bins": ["jq"],
"label": "Install jq (brew) - for metrics dashboard",
},
],
},
}
OpenClaw Self-Healing System
"The system that heals itself — or calls for help when it can't."
A 4-tier autonomous recovery system for OpenClaw Gateway, featuring AI-powered diagnosis via Claude Code. Tested in production on macOS + Linux.
Architecture
Level 1: config-watch → Config file change detection + instant reload
Level 2: Watchdog v4.4 → OAuth detection, zombie kill, exponential backoff
Level 3: Claude Code Doctor → AI-powered diagnosis & repair (30 min window) ??
Level 4: Discord/T@elegrimm → Human escalation with full context
What's New in v3.1.0
- Complete healing chain fix — config-watch → Watchdog → Emergency Recovery now fully connected
- Installer rewrite — single
install.shcovers macOS (LaunchAgent) + Linux (systemd) - Watchdog v4.4 — OAuth token expiry detection, zombie process auto-kill, Exponential Backoff
- Emergency Recovery v2 — persistent learning repo, reasoning logs, multi-model support (Claude Code + Aider)
- Metrics dashboard — success rate, MTTR, trending analysis via tmux
Quick Setup
bash <(curl -fsSL https://raw.githubusercontent.com/Ramsbaby/openclaw-self-healing/main/install.sh)
Or install via ClawHub:
npx clawhub@latest install openclaw-self-healing
The 4 Tiers in Detail
| Level | Script | Trigger | Action |
|---|---|---|---|
| L1 | config-watch.sh |
Config file change | Validate + reload gateway |
| L2 | gateway-watchdog.sh |
Process down / HTTP fail | Kill zombie → restart → backoff |
| L3 | emergency-recovery-v2.sh |
30min continuous failure | Claude Code PTY diagnosis |
| L4 | emergency-recovery-monitor.sh |
L3 triggered | Discord + T@elegrimm alert |
Configuration
All settings via environment variables in ~/.openclaw/.env:
| Variable | Default | Description |
|---|---|---|
DISCORD_WEBHOOK_URL |
(none) | Discord webhook for L4 alerts |
OPENCLAW_GATEWAY_URL |
http://localhost:18789/ |
Gateway health check URL |
HEALTH_CHECK_MAX_RETRIES |
3 |
Restart attempts before L3 escalation |
EMERGENCY_RECOVERY_TIMEOUT |
1800 |
Claude recovery timeout (30 min) |
Verified Recovery Cases
- OAuth token expiry — Watchdog v4.4 detects 401 in logs, restarts before agent dies
- Zombie process — Preflight detects PID mismatch, SIGKILL + launchctl kickstart
- Config schema error —
openclaw doctor --fixauto-applied on exit_1 pattern - Level 3 triggered — Claude Code diagnosed and fixed broken config in < 15 min
Links
- GitHub: https://github.com/Ramsbaby/openclaw-self-healing
- Changelog: https://github.com/Ramsbaby/openclaw-self-healing/blob/main/CHANGELOG.md
- Linux setup: https://github.com/Ramsbaby/openclaw-self-healing/blob/main/docs/LINUX_SETUP.md
License
MIT — built by @ramsbaby + Jarvis ??
相关推荐
专题
+ 收藏
+ 收藏
+ 收藏
+ 收藏
+ 收藏
+ 收藏
最新数据
相关文章
TikTok 剪辑器:自动生成爆款短视频 - Openclaw Skills
ClawHub 发布:创建并部署 Openclaw 技能 - Openclaw Skills
DuckDuckGo 搜索:为 AI 提供即时网络访问 - Openclaw Skills
SynapseAI Wallet:AI 智能体托管支付 - Openclaw Skills
OSINT 图谱分析器:知识图谱与模式发现 - Openclaw Skills
Agent Relay Digest:精选 AI Agent 社区摘要 - Openclaw Skills
Pinterest API v5:自动化 Pin 和看板管理 - Openclaw Skills
FACEPALM:智能日志分析与聊天故障排除 - Openclaw 技能
clawback: 安全的 Gmail 代理与策略执行 - Openclaw Skills
Didit 身份验证:全球身份证件 OCR - Openclaw Skills
AI精选
