Edge TTS Voice Skill: High-Quality AI Text-to-Speech



Author: Internet

2026-04-06


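What is the Edge TTS Voice Skill?

The Voice Skill (Edge TTS) is a text-to-speech integration for AI agents that need high-quality, natural-sounding speech output. Built on the Microsoft Edge TTS engine, this component of the Openclaw Skills library offers a low-cost, low-latency text-to-speech option for several languages, including Chinese, English, Japanese, and Korean.

The skill stands out for its enterprise-grade security design (voice whitelisting and shell-free process execution) and its native support for real-time streaming playback. Whether you need static audio files for documentation or immediate spoken feedback during agent interactions, it gives fine-grained control over voice selection, rate, volume, and pitch without any complex cloud API setup. Openclaw Skills of this kind help your agent communicate effectively across international contexts.

Download: https://github.com/openclaw/skills/tree/main/skills/zhaov1976/voice-edge-tts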

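Installation and Download

1. ClawHub CLI

The fastest way to install the skill directly from the source.

npx clawhub@latest install voice-edge-tts

2. Manual installation

Copy the skill folder to one of the following locations:

Global: ~/.openclaw/skills/
Workspace: /skills/

Priority: workspace > local > built-in

3. Prompt-based installation

Copy this prompt into OpenClaw to install the skill automatically.

Please install voice-edge-tts using Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).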

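Edge TTS Voice Skill Use Cases

Real-time voice interaction for AI assistants and autonomous coding agents.
Generating high-quality, localized audio files for content narration and documentation.
Audible feedback for long-running CLI tasks or status notifications.
Building multilingual applications that need natural speech synthesis without high API costs.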

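How the Edge TTS Voice Skill Works

The AI agent sends a text string and configuration parameters (voice, rate, pitch) to the skill interface.
The skill validates the input strictly against a security whitelist to prevent command injection.
The Edge TTS engine turns the text into a high-quality audio stream or audio file.
For the stream action, the audio is piped through ffmpeg for real-time playback; for the tts action, the skill returns a local file path for later use.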
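
The first and last steps above can be sketched as a small helper that builds the request an agent sends. This is only an illustrative sketch, not the skill's own code: `buildRequest` is a hypothetical name, and the request shape (`action`, `text`, `options`) is assumed from the usage examples in the README section of this page.

```javascript
// Hypothetical helper: builds the object passed to skill.execute().
// 'stream' plays audio in real time (needs ffmpeg); 'tts' writes a file
// and returns its path.
function buildRequest(text, { stream = true, options = {} } = {}) {
  if (typeof text !== 'string' || text.trim() === '') {
    throw new TypeError('text must be a non-empty string');
  }
  const request = { action: stream ? 'stream' : 'tts', text };
  // Attach options only when the caller set some, so the skill's
  // defaults apply otherwise.
  if (Object.keys(options).length > 0) {
    request.options = { ...options };
  }
  return request;
}

console.log(buildRequest('Hello', { stream: false, options: { rate: '+10%' } }));
```

An agent would then pass the result to the skill, for example `await skill.execute(buildRequest('Build finished'))`.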

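Edge TTS Voice Skill Configuration Guide

To integrate this component from the Openclaw Skills repository, first install the required Python dependency:

pip install edge-tts

Next, install ffmpeg for your platform to enable streaming:

macOS: brew install ffmpeg
Linux: sudo apt install ffmpeg
Windows: Download the latest build from the CodexFFmpeg releases page and add the bin folder to the system PATH.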
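
Before relying on streaming, it can help to confirm that ffmpeg is actually reachable from PATH. The check below is a minimal sketch using Node's `child_process.spawnSync`; `isRunnable` is a hypothetical helper name and is not part of the skill.

```javascript
const { spawnSync } = require('child_process');

// Returns true when `command` can be started from PATH and exits with status 0.
function isRunnable(command, args = ['-version']) {
  const result = spawnSync(command, args, { stdio: 'ignore', shell: false });
  // result.error is set (for example ENOENT) when the executable cannot be found.
  return !result.error && result.status === 0;
}

if (!isRunnable('ffmpeg')) {
  console.warn('ffmpeg was not found on PATH; install it before using the stream action.');
}
```

The same helper can check other executables by passing a different version flag, for example `isRunnable('node', ['--version'])`.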

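Edge TTS Voice Skill Data Schema and Taxonomy

The skill manages audio generation through a structured schema and returns a success status and a media path. Configuration uses the following parameters:

Option	Type	Default	Description
voice	string	zh-CN-XiaoxiaoNeural	ID of the neural voice to use
rate	string	+0%	Speech rate adjustment (-50% to +100%)
volume	string	+0%	Volume adjustment (-50% to +50%)
pitch	string	+0Hz	Pitch adjustment, in Hz

When a file is generated successfully, the skill returns a JSON object whose media key holds the file path prefixed with MEDIA:.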
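
A caller usually wants the bare file path rather than the prefixed string. The helper below is a sketch under that assumption: `mediaPath` is a hypothetical name, and the result shape `{ success, media }` follows the return value shown in the README usage example on this page.

```javascript
// Hypothetical helper: extracts the file path from a result such as
// { success: true, media: 'MEDIA: /path/to/file.mp3' }.
function mediaPath(result) {
  const prefix = 'MEDIA:';
  if (!result || result.success !== true || typeof result.media !== 'string') {
    return null;
  }
  if (!result.media.startsWith(prefix)) {
    return null;
  }
  // trim() tolerates an optional space after the prefix.
  return result.media.slice(prefix.length).trim();
}

console.log(mediaPath({ success: true, media: 'MEDIA: /path/to/file.mp3' }));
```

Returning `null` for a failed or malformed result lets the caller handle the error path explicitly instead of passing a bad path to a player.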

Voice Skill (Edge TTS)

Text-to-speech skill using Microsoft Edge TTS engine with real-time streaming playback support.

Features

Edge TTS Engine - High quality text-to-speech using Microsoft Edge
Streaming Playback - Real-time audio streaming (playback starts while audio is still being generated)
Multiple Voices - Support for Chinese, English, Japanese, Korean voices
Customizable - Adjust rate, volume, and pitch
Secure Implementation - No command injection vulnerabilities

Installation

1. Install Python dependencies

pip install edge-tts

2. Install ffmpeg (required for streaming)

Windows: Download from https://github.com/GyanD/codexffmpeg/releases, extract the archive, and add the bin folder to PATH.

macOS:

brew install ffmpeg

Linux:

sudo apt install ffmpeg

Usage

Streaming Playback (Recommended)

Real-time audio generation and playback:

// Basic usage
await skill.execute({
  action: 'stream',
  text: '你好，我是小九'
});

// With custom voice
await skill.execute({
  action: 'stream',
  text: 'Hello, how are you?',
  options: {
    voice: 'en-US-Standard-D', // must be one of the IDs in the Voice Whitelist
    rate: '+10%',
    volume: '+0%',
    pitch: '+0Hz'
  }
});

Text-to-Speech with File

await skill.execute({
  action: 'tts',
  text: 'Hello, how are you today?',
  options: {
    voice: 'zh-CN-XiaoxiaoNeural'
  }
});
// Returns: { success: true, media: 'MEDIA: /path/to/file.mp3' }

Direct Speak

await skill.execute({
  action: 'speak',
  text: 'Hello!'
});

List Available Voices

await skill.execute({
  action: 'voices'
});

Available Voices

Language	Voice ID
Chinese (Female)	zh-CN-XiaoxiaoNeural
Chinese (Male)	zh-CN-YunxiNeural
Chinese (Male)	zh-CN-YunyangNeural
English (US Female)	en-US-Standard-A
English (US Male)	en-US-Standard-D
English (UK)	en-GB-Standard-A
Japanese	ja-JP-NanamiNeural
Korean	ko-KR-SunHiNeural

Options

Option	Default	Description
voice	zh-CN-XiaoxiaoNeural	Voice ID
rate	+0%	Speech rate (-50% to +100%)
volume	+0%	Volume adjustment (-50% to +50%)
pitch	+0Hz	Pitch adjustment
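
The ranges in the table can be checked before a request is sent. The validator below is a sketch, not the skill's implementation: `isValidOption` and `RULES` are hypothetical names, and the signed-number-plus-unit format is inferred from the default values `+0%` and `+0Hz`.

```javascript
// Hypothetical validator for the option strings listed in the table above.
const RULES = {
  rate:   { pattern: /^([+-])(\d{1,3})%$/,  min: -50, max: 100 },
  volume: { pattern: /^([+-])(\d{1,3})%$/,  min: -50, max: 50 },
  // The table gives no range for pitch, so only the "+NHz" / "-NHz" shape is checked.
  pitch:  { pattern: /^([+-])(\d{1,3})Hz$/, min: -Infinity, max: Infinity },
};

function isValidOption(name, value) {
  if (!Object.prototype.hasOwnProperty.call(RULES, name) || typeof value !== 'string') {
    return false;
  }
  const rule = RULES[name];
  const match = rule.pattern.exec(value);
  if (!match) return false;
  // match[1] is the sign, match[2] the magnitude.
  const amount = Number(match[2]) * (match[1] === '-' ? -1 : 1);
  return amount >= rule.min && amount <= rule.max;
}

console.log(isValidOption('rate', '+10%'), isValidOption('volume', '+60%'));
```

Because the patterns are anchored with `^` and `$`, a value with trailing shell syntax such as `+10%; ls` fails the check outright.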

Security

This skill implements enterprise-grade security best practices:

Security Features

Feature	Implementation
Input Validation	Voice parameter whitelist validation - only allowed voices can be used
No Shell Execution	Uses `spawn()` with array arguments instead of shell command concatenation
Command Injection Prevention	All user inputs are properly validated and escaped
Path Safety	Fixed script path prevents path traversal

Security Details

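const { exec, spawn } = require('child_process');

// UNSAFE - don't use exec with string concatenation: a shell parses userText and userVoice
exec(`py script.py "${userText}" --voice ${userVoice}`);

// SAFE - use spawn with array arguments: each value reaches the script as one argument, with no shell involved
spawn('py', [scriptPath, text, '--voice', voice], { shell: false });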
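
The effect of array arguments can be observed directly. In the sketch below, Node itself (`process.execPath`) stands in for the Python TTS script, which is an assumption made so the example runs without Python installed; the child simply echoes the last argument it received.

```javascript
const { spawnSync } = require('child_process');

// Text that would break out of a quoted shell command if it were concatenated into one.
const hostileText = 'hello"; echo INJECTED; "';

// With shell: false the text is handed to the child as a single argv entry,
// so the quotes and semicolons are never interpreted as shell syntax.
const child = spawnSync(
  process.execPath,
  ['-e', 'process.stdout.write(process.argv[process.argv.length - 1])', hostileText],
  { shell: false, encoding: 'utf8' }
);

// Logs whether the argument arrived byte-for-byte unchanged.
console.log(child.stdout === hostileText);
```

The same text interpolated into an `exec` command string would be parsed by the shell, which is the injection path the skill avoids.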

Voice Whitelist

Only these voices are allowed:

const allowedVoices = [
  'zh-CN-XiaoxiaoNeural', 'zh-CN-YunxiNeural', 'zh-CN-YunyangNeural',
  'zh-CN-YunyouNeural', 'zh-CN-XiaomoNeural',
  'en-US-Standard-C', 'en-US-Standard-D', 'en-US-Wavenet-F',
  'en-GB-Standard-A', 'en-GB-Wavenet-A',
  'ja-JP-NanamiNeural', 'ko-KR-SunHiNeural'
];

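Any voice ID that is not in this list is rejected and replaced with the default voice (zh-CN-XiaoxiaoNeural). This includes en-US-Standard-A, which appears in the Available Voices table but not in the whitelist.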
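
The fallback rule can be sketched in a few lines. This is an illustration of the described behavior rather than the skill's source: `resolveVoice` is a hypothetical name, the list is copied from the whitelist above, and the default is the one given in the Options table.

```javascript
const DEFAULT_VOICE = 'zh-CN-XiaoxiaoNeural';

// Copied from the whitelist above; a Set gives constant-time membership checks.
const ALLOWED_VOICES = new Set([
  'zh-CN-XiaoxiaoNeural', 'zh-CN-YunxiNeural', 'zh-CN-YunyangNeural',
  'zh-CN-YunyouNeural', 'zh-CN-XiaomoNeural',
  'en-US-Standard-C', 'en-US-Standard-D', 'en-US-Wavenet-F',
  'en-GB-Standard-A', 'en-GB-Wavenet-A',
  'ja-JP-NanamiNeural', 'ko-KR-SunHiNeural',
]);

// Hypothetical helper: returns the requested voice if it is whitelisted,
// otherwise falls back to the default voice.
function resolveVoice(requested) {
  return ALLOWED_VOICES.has(requested) ? requested : DEFAULT_VOICE;
}

console.log(resolveVoice('ja-JP-NanamiNeural'));
console.log(resolveVoice('en-US-Standard-A; rm -rf /'));
```

Matching against exact strings means no escaping logic is needed for the voice parameter: anything other than a known ID never reaches the child process.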

Changelog

v1.10 (2026-02-24)

Enterprise-grade security - Full command injection protection
Voice whitelist validation
Replaced exec with spawn for secure process execution
Input sanitization for all parameters

v1.1.0

Add streaming playback support (playback starts while audio is still being generated)
Add ffmpeg dependency
Fix command injection vulnerability
Add voice whitelist validation

v1.0.0

Initial release with basic TTS support
