ElevenLabs Speech: AI Speech Synthesis and Transcription - Openclaw Skills
Author: Internet
2026-04-16
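What is ElevenLabs Speech and Transcription?
The ElevenLabs Speech skill is a comprehensive audio-processing tool for AI agents that need to communicate through, or understand, human speech. It uses the ElevenLabs API to provide two main services: high-quality text-to-speech (TTS) with advanced neural models, and speech-to-text (STT) through the Scribe engine. This integration lets developers in the Openclaw Skills ecosystem build interactive, multilingual agents with expressive speech and accurate audio understanding.
With this skill, an agent can switch between models such as eleven_turbo_v2_5 (optimized for speed) and eleven_multilingual_v2 (for multilingual output). It is particularly useful for workflows that need natural narration or that process voice memos from platforms such as Telegram. The skill reduces complex audio tasks to simple CLI commands and Python methods, making it a foundational component for modern Openclaw Skills implementations.
Download: https://github.com/openclaw/skills/tree/main/skills/jeffpignataro/miranda-elevenlabs-speech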
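Installation and download
1. ClawHub CLI
The fastest way to install the skill directly from source.
npx clawhub@latest install miranda-elevenlabs-speech
2. Manual installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt-based installation
Copy this prompt into OpenClaw to install the skill automatically.
Please help me install miranda-elevenlabs-speech with Clawhub. If Clawhub is not installed yet, install it first (npm i -g clawhub).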
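ElevenLabs Speech and Transcription use cases
- Create automated voice replies for customer-support bots and personal assistants.
- Transcribe users' voice messages so that a large language model (LLM) can process voice-first input.
- Generate localized voice content in 99 languages for international audiences.
- Produce vivid narration for storytelling or news delivery with custom voice stability settings.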
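How ElevenLabs Speech and Transcription works
- The user provides a text string to synthesize or an audio file to transcribe (for example .mp3 or .ogg).
- For synthesis, the skill calls the ElevenLabs TTS API with parameters such as voice_id, stability, and similarity_boost.
- For transcription, the ElevenLabs Scribe engine processes the audio file, optionally using a language hint or speaker identification.
- The system saves the generated audio to the specified path, or returns the transcribed text to the agent workflow.
- The agent can then use the output to send a voice reply or to continue with text-based reasoning.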
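The synthesis step described above can be sketched as a plain HTTP request. This is a minimal illustration, not the skill's actual implementation (the source does not show the internals of elevenlabs_speech.py). It assumes the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header; the helper name `build_tts_request` is hypothetical. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: the public ElevenLabs REST base URL.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, api_key, voice_id="21m00Tcm4TlvDq8ikWAM",
                      model_id="eleven_turbo_v2_5",
                      stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech request.

    Hypothetical helper for illustration; defaults mirror the values
    quoted in this document.
    """
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("Hello world", api_key="sk_...")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) would return audio bytes that can be written to the output path; that step is omitted here because it needs a valid API key.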
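ElevenLabs Speech and Transcription configuration guide
To start using this skill in your Openclaw Skills environment, configure your API credentials:
export ELEVENLABS_API_KEY="your_api_key_here"
You can also store the key in a .env file in the workspace root. Make sure your scripts directory contains elevenlabs_speech.py and elevenlabs_scribe.py, which handle the execution logic.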
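The two configuration options above (environment variable or .env file) could be resolved along these lines. This is a sketch under assumptions: the function name `load_api_key` is hypothetical, the environment variable is assumed to take precedence over the file, and the .env parser only handles simple `NAME=value` lines. The skill's own scripts may load the key differently.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", var_name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, else from a .env file.

    Hypothetical helper: returns None when the key is found in neither place.
    """
    value = os.environ.get(var_name)
    if value:
        return value

    path = Path(env_file)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, val = line.partition("=")
        if name.strip() == var_name:
            # Drop optional surrounding quotes.
            return val.strip().strip('"').strip("'")
    return None
```

A caller would typically fail early with a clear message when `load_api_key()` returns None, rather than letting the first API call fail.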
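ElevenLabs Speech and Transcription data schema and taxonomy
The skill manages audio data and metadata with the following structure:
| Parameter | Description | Valid values |
|---|---|---|
| voice_id | Identifier of a specific AI voice model | String (e.g. the ID of the 'Rachel' or 'Josh' voice) |
| stability | Determines how consistent the emotional delivery is | Float (0.0 to 1.0) |
| similarity_boost | Controls how closely the output matches the original voice | Float (0.0 to 1.0) |
| language_code | ISO language code used to improve transcription accuracy | String (e.g. 'eng', 'spa', 'ara') |
| num_speakers | Used for speaker identification in STT | Integer (1-10) |
Supported audio formats include mp3, mp4, wav, and ogg (optimized for Telegram).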
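The value ranges in the table can be enforced before any API call is made. The following is a small sketch, assuming the ranges given above and the defaults quoted elsewhere in this document (stability 0.5, similarity_boost 0.75); the class `VoiceSettings` and the function `check_num_speakers` are hypothetical names, not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """TTS parameters with range checks (hypothetical helper)."""
    voice_id: str
    stability: float = 0.5
    similarity_boost: float = 0.75

    def __post_init__(self):
        if not self.voice_id:
            raise ValueError("voice_id must be a non-empty string")
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


def check_num_speakers(num_speakers):
    """Validate the STT speaker count against the 1-10 range from the table."""
    if not isinstance(num_speakers, int) or not 1 <= num_speakers <= 10:
        raise ValueError(f"num_speakers must be an integer from 1 to 10, got {num_speakers!r}")
    return num_speakers


settings = VoiceSettings(voice_id="21m00Tcm4TlvDq8ikWAM", stability=0.4)
print(settings)
```

Validating locally gives a clearer error than a rejected request and avoids spending quota on calls that cannot succeed.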
name: elevenlabs-speech
description: Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
ElevenLabs Speech
Complete voice solution — both TTS and STT using one API:
- TTS: Text-to-Speech (high-quality voices)
- STT: Speech-to-Text via Scribe (accurate transcription)
Quick Start
Environment Setup
Set your API key:
export ELEVENLABS_API_KEY="sk_..."
Or create a .env file in the workspace root.
Text-to-Speech (TTS)
Convert text to natural-sounding speech:
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
With custom voice:
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
List Available Voices
python scripts/elevenlabs_speech.py voices
Using in Code
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS: default voice, audio saved to output_path
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices and print each name with its ID
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
Popular Voices
| Voice ID | Name | Description |
|---|---|---|
| 21m00Tcm4TlvDq8ikWAM | Rachel | Natural, versatile (default) |
| AZnzlk1XvdvUeBnXmlld | Domi | Strong, energetic |
| EXAVITQu4vr4xnSDxMaL | Bella | Soft, soothing |
| ErXwobaYiN019PkySvjV | Antoni | Well-rounded |
| MF3mGyEYCl7XYWbV9V6O | Elli | Warm, friendly |
| TxGEqnHWrfWFTfGW9XjX | Josh | Deep, calm |
| VR6AewLTigWG4xSOukaG | Arnold | Authoritative |
Voice Settings
- stability (0-1): Lower = more emotional, Higher = more stable
- similarity_boost (0-1): Higher = closer to original voice
Default: stability=0.5, similarity_boost=0.75
Models
- eleven_turbo_v2_5: Fast, high quality (default)
- eleven_multilingual_v2: Best for non-English
- eleven_monolingual_v1: English only
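The model notes above suggest a simple selection rule. The sketch below encodes only that guidance; the function `pick_model` is hypothetical, and it assumes the caller identifies the language with the same three-letter codes this document uses for transcription (for example 'eng', 'ara').

```python
def pick_model(language_code=None):
    """Choose a model ID from the guidance in this section (hypothetical helper).

    None or 'eng' selects the fast default; any other code selects the
    multilingual model.
    """
    if language_code is None or language_code.lower() == "eng":
        return "eleven_turbo_v2_5"
    return "eleven_multilingual_v2"


print(pick_model())
print(pick_model("ara"))
```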
Integration with Telegram
When the user sends text and wants a voice reply:
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")
# Send via the Telegram message tool with the media path
message(action="send", media="path/to/reply.mp3", as_voice=True)
Pricing
Check https://elevenlabs.io/pricing for current rates. Free tier available!
Speech-to-Text (STT) with ElevenLabs Scribe
Transcribe voice messages using ElevenLabs Scribe:
Transcribe Audio
python scripts/elevenlabs_scribe.py voice_message.ogg
With specific language:
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
With speaker diarization (multiple speakers):
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
Using in Code
from scripts.elevenlabs_scribe import ElevenLabsScribe
client = ElevenLabsScribe(api_key="sk_...")
# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])
# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")
# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
Supported Formats
- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (.ogg)
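The limits listed above can be checked locally before uploading a file for transcription. This is a sketch, assuming the extension list and the 100 MB cap stated in this section, with 1 MB taken as 1,048,576 bytes (the source does not say which definition applies). The function `check_audio_file` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the list above, plus .ogg for voice messages.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg"}
# Assumption: 1 MB = 1,048,576 bytes.
MAX_BYTES = 100 * 1024 * 1024


def check_audio_file(path):
    """Raise if the file is unsupported, missing, or too large; return its size."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {p.suffix or '(none)'}")
    if not p.is_file():
        raise FileNotFoundError(str(p))
    size = p.stat().st_size
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_BYTES}")
    return size
```

Checking the extension rather than the file contents is a deliberate simplification; a renamed file would still pass this check.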
Language Support
Scribe supports 99 languages including:
- Arabic (ara)
- English (eng)
- Spanish (spa)
- French (fra)
- And many more...
Without a language hint, Scribe auto-detects the language.
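Applications often hold two-letter language codes, while the examples in this section use three-letter ones. The sketch below converts between the two for the four languages named here and falls back to auto-detection otherwise. The function `to_language_hint` and its mapping table are illustrative assumptions; whether the skill also accepts two-letter codes directly is not stated in the source.

```python
# Two-letter (ISO 639-1) to three-letter codes for the languages listed above.
ISO_639_1_TO_3 = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_language_hint(code):
    """Return a three-letter hint, or None to let transcription auto-detect."""
    if code is None:
        return None
    code = code.strip().lower()
    if len(code) == 3:
        return code  # already three letters; passed through unchanged
    return ISO_639_1_TO_3.get(code)  # unknown codes fall back to auto-detect


print(to_language_hint("ar"))
```

The result can be passed as `language_code` to `transcribe`, omitting the argument when the helper returns None.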
Complete Workflow Example
User sends voice message → You reply with voice:
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient
# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']
# 2. Process/understand the text
# ... your logic here ...
# 3. Generate response text
response_text = "Your response here"
# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")
# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
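The same workflow can be wrapped in one function whose steps are passed in as callables, which makes it testable without network access. This is a sketch: `voice_reply` is a hypothetical name, and the stand-in lambdas below only imitate the shapes used in the example above (`transcribe` returning a dict with a 'text' key, `text_to_speech` taking the text and an `output_path`).

```python
def voice_reply(audio_path, transcribe, respond, synthesize, reply_path="reply.mp3"):
    """Transcribe, build a response, synthesize it; return the reply path.

    Returns None when the transcription is empty, so the caller can skip
    sending a voice reply.
    """
    user_text = transcribe(audio_path)["text"].strip()
    if not user_text:
        return None
    response_text = respond(user_text)
    synthesize(response_text, output_path=reply_path)
    return reply_path


# Stand-ins for the real clients, for illustration only.
calls = []
path = voice_reply(
    "user_voice.ogg",
    transcribe=lambda p: {"text": " hello "},
    respond=lambda text: f"You said: {text}",
    synthesize=lambda text, output_path: calls.append((text, output_path)),
)
print(path, calls)
```

With the real clients, the call would look like `voice_reply("user_voice.ogg", stt.transcribe, my_logic, tts.text_to_speech)`, where `my_logic` is whatever function produces the response text.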
Pricing
Check https://elevenlabs.io/pricing for current rates:
TTS (Text-to-Speech):
- Free tier: 10,000 characters/month
- Paid plans available
STT (Speech-to-Text) - Scribe:
- Free tier available
- Check website for current pricing
