Voice Note to MIDI: Convert Audio to Quantized MIDI - Openclaw Skills
Author: Internet
2026-04-11
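What is Voice Note to MIDI?
The voice-note-to-midi skill provides an advanced audio-to-MIDI conversion pipeline within the Openclaw Skills framework. It takes a multi-stage approach designed to handle the quirks of human vocal recordings, such as pitch drift and background noise. This includes stem separation to isolate melodic content and Spotify's Basic Pitch model for polyphonic pitch extraction.
By integrating this tool into your music workflow, you can bridge the gap between spontaneous ideas and structured digital compositions. The skill goes beyond simple pitch detection by adding intelligent quantization and optional key-aware correction, aiming to produce MIDI files that are musically usable and ready to import into any digital audio workstation (DAW). It is a good example of how Openclaw Skills can streamline creative technical tasks.
Download: https://github.com/openclaw/skills/tree/main/skills/danbennettuk/voice-note-to-midi
Installation and Download
1. ClawHub CLI
The fastest way to install the skill directly from the source.
npx clawhub@latest install voice-note-to-midi
2. Manual Installation
Copy the skill folder to one of the following locations:
Global: ~/.openclaw/skills/
Workspace: /skills/
Priority: workspace > local > built-in
3. Prompt Installation
Copy this prompt into OpenClaw to install the skill automatically.
Please use Clawhub to install voice-note-to-midi for me. If Clawhub isn't installed yet, install it first (npm i -g clawhub).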
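Voice Note to MIDI Use Cases
- Capture melodic ideas on the go with voice memos for later production.
- Turn hummed melodies into MIDI tracks for songwriters who don't play an instrument.
- Extract lead melody lines from old recordings to build new arrangements.
- Quickly prototype synth lines from a vocal guide during studio sessions.
How Voice Note to MIDI Works
- The pipeline uses HPSS for stem separation, isolating harmonic melodic content from noise and percussive elements.
- It runs the processed audio through Spotify's Basic Pitch machine learning model to extract raw pitches and note onset times.
- It analyzes the detected pitches to determine the musical key and the distribution of notes, including the dominant note.
- It applies intelligent quantization that snaps notes to a timing grid, with optional key-aware pitch correction.
- Final post-processing removes harmonic duplicates (octave pruning) and merges fragmented notes into legato notes for clean MIDI output.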
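As an illustration of the pitch-distribution part of the analysis step, here is a minimal pure-Python sketch. It assumes the detected notes are already available as MIDI note numbers and uses the C4 = 60 naming convention, which is consistent with the sample output further down (G3 = 55, and 55 + 12 = 67). `note_name` and `summarize` are hypothetical helpers, not functions from the skill.

```python
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(midi_pitch: int) -> str:
    """Name a MIDI note number using the C4 = 60 convention."""
    return f"{NOTE_NAMES[midi_pitch % 12]}{midi_pitch // 12 - 1}"


def summarize(pitches: list[int]) -> dict:
    """Hypothetical helper: basic statistics over detected MIDI pitches."""
    counts = Counter(pitches)
    dominant, dominant_count = counts.most_common(1)[0]
    return {
        "total": len(pitches),
        "unique": len(counts),
        "range": (note_name(min(pitches)), note_name(max(pitches))),
        "dominant": note_name(dominant),
        "dominant_share": dominant_count / len(pitches),
    }


# Example: ten detected notes, four of them on G3 (MIDI 55).
stats = summarize([55, 55, 55, 60, 62, 67, 48, 52, 57, 55])
print(stats["dominant"], f"{stats['dominant_share']:.1%}")
```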
Voice Note to MIDI Setup Guide
To start using this Openclaw Skills component, make sure Python 3.11+ and FFmpeg are installed. Use the automated setup script for a quick install:
cd /path/to/voice-note-to-midi
./setup.sh
For manual installation and dependency management, create a virtual environment and install the required libraries with pip:
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
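Voice Note to MIDI Data Architecture
The skill accepts a range of audio inputs and produces structured MIDI data. The table below summarizes how data is handled:
| Data type | Format | Description |
|---|---|---|
| Input | WAV, MP3, M4A, FLAC | High-quality lossless audio is recommended for best results. |
| Output | Standard MIDI (.mid) | A quantized MIDI file compatible with all modern DAWs. |
| Analysis | Metadata | Includes the detected key (e.g., G major), pitch range, and note count. |
The tool writes output files at a fixed reference tempo of 120 BPM while preserving the relative timing of the original performance.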
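Because the output tempo is fixed, converting detection times (in seconds) into MIDI ticks is a straight proportion. The sketch below assumes 960 ticks per quarter note, inferred from the sample output's "Grid: 240 ticks (1/16)" line; the skill's actual resolution may differ, and the function names are illustrative.

```python
TEMPO_BPM = 120  # fixed output tempo stated in the docs
PPQ = 960        # assumed ticks per quarter note (240 ticks = one 1/16 note)


def seconds_to_ticks(seconds: float, bpm: float = TEMPO_BPM, ppq: int = PPQ) -> int:
    """Convert a time in seconds to the nearest MIDI tick at a constant tempo."""
    beats = seconds * bpm / 60.0  # quarter-note beats elapsed
    return round(beats * ppq)


def ticks_to_seconds(ticks: int, bpm: float = TEMPO_BPM, ppq: int = PPQ) -> float:
    """Inverse conversion: MIDI ticks back to seconds at a constant tempo."""
    return ticks / ppq * 60.0 / bpm


# At 120 BPM a quarter note lasts 0.5 s, so 1.0 s into the recording is tick 1920.
print(seconds_to_ticks(1.0), ticks_to_seconds(240))
```

Under the same 960-PPQ assumption, the 60-tick legato gap reported in the sample output works out to 31.25 ms.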
name: voice-note-to-midi
description: Convert voice notes, humming, and melodic audio recordings to quantized MIDI files using ML-based pitch detection and intelligent post-processing
author: Clawd
tags: [audio, midi, music, transcription, machine-learning]
Voice Note to MIDI
Transform your voice memos, humming, and melodic recordings into clean, quantized MIDI files ready for your DAW.
What It Does
This skill provides a complete audio-to-MIDI conversion pipeline that:
- Stem Separation - Uses HPSS (Harmonic-Percussive Source Separation) to isolate melodic content from drums, noise, and background sounds
- ML-Powered Pitch Detection - Leverages Spotify's Basic Pitch model for accurate fundamental frequency extraction
- Key Detection - Automatically detects the musical key of your recording using Krumhansl-Kessler key profiles
- Intelligent Quantization - Snaps notes to a configurable timing grid with optional key-aware pitch correction
- Post-Processing - Applies octave pruning, overlap-based harmonic removal, and legato note merging for clean output
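The key-detection step can be illustrated with the classic Krumhansl-Kessler approach: build a 12-bin pitch-class histogram, correlate it with the major and minor key profiles rotated to each of the 12 tonics, and pick the best-scoring key. This is a minimal pure-Python sketch of that technique, not the skill's own code; it counts notes rather than weighting them by duration, which is a simplification (duration weighting is common in practice).

```python
from math import sqrt

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Krumhansl-Kessler probe-tone profiles, listed from the tonic upward in semitones.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def detect_key(pitches: list[int]) -> str:
    """Return the best-matching key name, e.g. 'G major', for a list of MIDI pitches."""
    histogram = [0.0] * 12
    for p in pitches:
        histogram[p % 12] += 1  # count occurrences of each pitch class

    best_score, best_key = float("-inf"), ""
    for tonic in range(12):
        for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            # Rotate the profile so that pitch class `tonic` receives the tonic weight.
            rotated = [profile[(pc - tonic) % 12] for pc in range(12)]
            score = pearson(histogram, rotated)
            if score > best_score:
                best_score, best_key = score, f"{NOTE_NAMES[tonic]} {mode}"
    return best_key


print(detect_key([55, 57, 59, 60, 62, 64, 66, 67, 55, 62, 59, 67]))
```

Relative keys (G major and E minor, for instance) share a pitch-class set, so short or ambiguous melodies can flip between them; this is one reason key detection benefits from longer recordings.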
Pipeline Architecture
Audio Input (WAV/M4A/MP3)
↓
┌─────────────────────────────────────┐
│ Step 1: Stem Separation (HPSS) │
│ - Isolate harmonic content │
│ - Remove drums/percussion │
│ - Noise gating │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 2: Pitch Detection │
│ - Basic Pitch ML model (Spotify) │
│ - Polyphonic note detection │
│ - Onset/offset estimation │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 3: Analysis │
│ - Pitch class distribution │
│ - Key detection │
│ - Dominant note identification │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ Step 4: Quantization & Cleanup │
│ - Timing grid snap │
│ - Key-aware pitch correction │
│ - Octave pruning (harmonic removal) │
│ - Overlap-based pruning │
│ - Note merging (legato) │
│ - Velocity normalization │
└─────────────────────────────────────┘
↓
MIDI Output (Standard MIDI File)
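Two of the Step 4 cleanup passes can be sketched in a few lines. The octave-pruning threshold (median pitch + 12 semitones) follows the sample output's "above 67 (median+12)" line. The overlap rule below, which drops a note that sounds at the same time as a note a whole number of octaves beneath it, is an assumption about what "overlap-based pruning" means, and the `Note` type is hypothetical.

```python
from statistics import median
from typing import NamedTuple


class Note(NamedTuple):
    start: int     # onset in ticks
    end: int       # offset in ticks
    pitch: int     # MIDI note number
    velocity: int  # MIDI velocity (1-127)


def octave_prune(notes: list[Note]) -> list[Note]:
    """Drop notes more than one octave above the median pitch."""
    if not notes:
        return []
    limit = median(n.pitch for n in notes) + 12
    return [n for n in notes if n.pitch <= limit]


def overlap_prune(notes: list[Note]) -> list[Note]:
    """Assumed rule: drop a note that overlaps in time with a note one or
    more whole octaves below it, treating the upper note as a harmonic."""
    def is_harmonic(n: Note) -> bool:
        return any(
            o.start < n.end and n.start < o.end  # time spans overlap
            and n.pitch > o.pitch
            and (n.pitch - o.pitch) % 12 == 0    # an octave multiple apart
            for o in notes
        )
    return [n for n in notes if not is_harmonic(n)]


notes = [
    Note(0, 480, 55, 90), Note(0, 480, 67, 40),        # G3 plus its octave
    Note(480, 960, 57, 90),
    Note(960, 1440, 59, 80), Note(960, 1440, 79, 50),  # B3 plus a stray G5
]
cleaned = overlap_prune(octave_prune(notes))
print([n.pitch for n in cleaned])
```

Running octave pruning before overlap pruning mirrors the order of the log lines in the sample output.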
Setup
Prerequisites
- Python 3.11+ (Python 3.14+ recommended)
- FFmpeg (for audio format support)
- pip
Installation
Quick Install (Recommended):
cd /path/to/voice-note-to-midi
./setup.sh
This automated script will:
- Check that Python 3.11+ is installed
- Create the ~/melody-pipeline directory
- Set up the virtual environment
- Install all dependencies (basic-pitch, librosa, music21, etc.)
- Download and configure the hum2midi script
- Add melody-pipeline to your PATH
Manual Install:
If you prefer manual setup:
mkdir -p ~/melody-pipeline
cd ~/melody-pipeline
python3 -m venv venv-bp
source venv-bp/bin/activate
pip install basic-pitch librosa soundfile mido music21
chmod +x ~/melody-pipeline/hum2midi
- Add to your PATH (optional):
echo 'export PATH="$HOME/melody-pipeline:$PATH"' >> ~/.bashrc
source ~/.bashrc
Verify Installation
cd ~/melody-pipeline
./hum2midi --help
Usage
Basic Usage
Convert a voice memo to MIDI:
./hum2midi my_humming.wav
This creates my_humming.mid with 16th-note quantization.
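Quantizing to a 16th-note grid amounts to rounding each onset and offset to the nearest multiple of the grid size. A minimal sketch, assuming 240 ticks per 16th note (the value shown in the sample output) and notes given as (start, end, pitch) tuples in ticks. The one-grid-step minimum length is an added assumption so that very short notes don't collapse to zero length.

```python
GRID_TICKS = 240  # one 1/16 note, matching "Grid: 240 ticks (1/16)" in the sample output


def snap(tick: int, grid: int = GRID_TICKS) -> int:
    """Round a non-negative tick position to the nearest grid line (ties round up)."""
    return (tick + grid // 2) // grid * grid


def quantize(notes: list[tuple[int, int, int]], grid: int = GRID_TICKS) -> list[tuple[int, int, int]]:
    """Snap (start, end, pitch) notes to the grid, keeping each at least one step long."""
    out = []
    for start, end, pitch in notes:
        q_start = snap(start, grid)
        q_end = max(snap(end, grid), q_start + grid)
        out.append((q_start, q_end, pitch))
    return out


print(quantize([(100, 200, 60), (130, 600, 62)]))
```

Integer arithmetic in `snap` gives predictable half-up rounding, whereas Python's built-in `round` rounds ties to the nearest even number.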
Specify Output File
./hum2midi input.wav output.mid
Command-Line Options
| Option | Description | Default |
|---|---|---|
| --grid | Quantization grid: 1/4, 1/8, 1/16, 1/32 | 1/16 |
| --min-note | Minimum note duration in milliseconds | 50 |
| --no-quantize | Skip quantization (output raw Basic Pitch MIDI) | disabled |
| --key-aware | Enable key-aware pitch correction | disabled |
| --no-analysis | Skip pitch analysis and key detection | disabled |
Usage Examples
Quantize to eighth notes
./hum2midi melody.wav --grid 1/8
Key-aware quantization (recommended for tonal music)
./hum2midi song.wav --key-aware
Require longer minimum notes
./hum2midi humming.wav --min-note 100
Skip analysis for faster processing
./hum2midi quick.wav --no-analysis
Combine options
./hum2midi recording.wav output.mid --grid 1/8 --key-aware --min-note 80
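What --key-aware does to an out-of-scale note can be sketched as moving it to a neighbouring scale tone of the detected key. This is an assumption about the behavior, not the skill's documented algorithm: only major and natural-minor scales are modeled, and a note sitting between two scale tones is moved down, which is an arbitrary tie-break (a real implementation might prefer the leading tone instead).

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
NATURAL_MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)


def scale_pitch_classes(tonic_pc: int, mode: str) -> set[int]:
    """Pitch classes (0 = C) of a major or natural-minor scale."""
    steps = MAJOR_STEPS if mode == "major" else NATURAL_MINOR_STEPS
    return {(tonic_pc + s) % 12 for s in steps}


def correct_to_key(pitch: int, tonic_pc: int, mode: str) -> int:
    """Move an out-of-scale MIDI pitch to an adjacent scale tone (downward first)."""
    scale = scale_pitch_classes(tonic_pc, mode)
    if pitch % 12 in scale:
        return pitch
    # In these 7-note scales every non-scale note is one semitone from a scale tone.
    for candidate in (pitch - 1, pitch + 1):
        if candidate % 12 in scale:
            return candidate
    return pitch


# G major has tonic pitch class 7: C#4 (61) becomes C4 (60), F4 (65) becomes E4 (64).
print([correct_to_key(p, 7, "major") for p in (55, 61, 65, 67)])
```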
Processing MIDI Input
You can also process existing MIDI files through the quantization pipeline:
./hum2midi input.mid output.mid --grid 1/16 --key-aware
This skips the audio processing steps and goes directly to analysis and quantization.
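One way to picture this branching is to choose the stages from the input file's extension. The sketch below is hypothetical: the stage names and the .mid/.midi check are illustrative and are not taken from the hum2midi script.

```python
from pathlib import Path

MIDI_SUFFIXES = {".mid", ".midi"}
AUDIO_ONLY_STAGES = ["stem_separation", "pitch_detection"]
SHARED_STAGES = ["analysis", "quantization"]


def plan_stages(input_path: str) -> list[str]:
    """Return the stages to run: MIDI input skips the audio-only stages."""
    if Path(input_path).suffix.lower() in MIDI_SUFFIXES:
        return list(SHARED_STAGES)
    return AUDIO_ONLY_STAGES + SHARED_STAGES


print(plan_stages("input.mid"))
print(plan_stages("my_humming.wav"))
```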
Sample Output
═══════════════════════════════════════════════════════════════
hum2midi - Melody-to-MIDI Pipeline (Basic Pitch Edition)
[Key-Aware Mode Enabled]
═══════════════════════════════════════════════════════════════
Input: my_humming.wav
Output: my_humming.mid
→ Step 1: Stem Separation (HPSS)
Isolating melodic content...
Loaded: 5.23s @ 44100Hz
Melody stem extracted → 5.23s
→ Step 2: Audio-to-MIDI Conversion (Basic Pitch)
Running Spotify's Basic Pitch ML model on melody stem...
Raw MIDI generated (Basic Pitch)
→ Step 3: Pitch Analysis & Key Detection
Notes detected: 42 total, 7 unique
Note range: C3 - G4
Pitch classes: C3, E3, G3, A3, C4, D4, G4
Dominant note: G3 (23.8% of notes)
Detected key: G major
→ Step 4: Quantization & Cleanup
Octave pruning: removed 3 harmonic notes above 67 (median+12)
Overlap pruning: removed 2 harmonic notes at overlapping positions
Note merging: merged 5 staccato chunks into legato notes (gap<=60 ticks)
Grid: 240 ticks (1/16)
Notes: 38 notes
Key: G major
Key-aware: 2 notes corrected to scale
Tempo: 120 BPM
Quantized MIDI saved
═══════════════════════════════════════════════════════════════
Done! Output: my_humming.mid
═══════════════════════════════════════════════════════════════
ANALYSIS SUMMARY
─────────────────────────────────────────────────────────────
Detected Notes: C3, E3, G3, A3, C4, D4, G4
Detected Key: G major
Quantization: Key-aware mode (notes snapped to scale)
MIDI Info: 38 notes, 7 unique pitches, 120 BPM
Pitches: C3, E3, G3, A3, C4, D4, G4
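The "Note merging" line in the sample output above can be illustrated with a small merge pass. This sketch assumes that merging joins consecutive notes of the same pitch when the silence between them is at most 60 ticks (the threshold shown in the log) and that notes are (start, end, pitch) tuples in ticks; the same-pitch condition is an assumption.

```python
def merge_legato(notes: list[tuple[int, int, int]], max_gap: int = 60) -> list[tuple[int, int, int]]:
    """Join same-pitch notes separated by at most `max_gap` ticks."""
    merged: list[tuple[int, int, int]] = []
    latest: dict[int, int] = {}  # pitch -> index of that pitch's latest note in `merged`
    for start, end, pitch in sorted(notes):
        idx = latest.get(pitch)
        if idx is not None and start - merged[idx][1] <= max_gap:
            prev_start, prev_end, _ = merged[idx]
            merged[idx] = (prev_start, max(prev_end, end), pitch)  # extend the earlier note
        else:
            latest[pitch] = len(merged)
            merged.append((start, end, pitch))
    return merged


chunks = [(0, 200, 60), (240, 400, 60), (500, 700, 60), (240, 480, 64)]
print(merge_legato(chunks))
```

Tracking the latest note per pitch lets the pass work even when another pitch sounds in between, as the E4 (64) does here.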
Notes & Limitations
Audio Quality Matters
- Clear, loud melody produces the best results
- Background noise can cause false note detection
- Reverb and effects may confuse pitch detection
- Close-mic'd vocals work significantly better than room recordings
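Because background noise is a common source of false notes, Step 1 of the pipeline includes noise gating. Below is a minimal sketch of a frame-based gate, assuming mono samples normalized to [-1.0, 1.0]; the frame size and RMS threshold are hypothetical values, not the skill's settings. A hard gate like this can chop off quiet note tails, which is why practical gates usually add attack and release smoothing.

```python
from math import sqrt


def noise_gate(samples: list[float], frame_size: int = 1024, threshold: float = 0.02) -> list[float]:
    """Silence every frame whose RMS level falls below `threshold` (full scale = 1.0)."""
    gated: list[float] = []
    for i in range(0, len(samples), frame_size):
        frame = samples[i:i + frame_size]
        rms = sqrt(sum(s * s for s in frame) / len(frame))
        gated.extend(frame if rms >= threshold else [0.0] * len(frame))
    return gated


quiet_then_loud = [0.005] * 4 + [0.5, -0.5, 0.5, -0.5]
print(noise_gate(quiet_then_loud, frame_size=4))
```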
Musical Considerations
- Monophonic sources work best (single melody line)
- Polyphonic audio (chords, multiple instruments) will produce messy results
- Vibrato and pitch bends may be quantized to stepped pitches
- Rapid note passages may be missed or merged
Technical Limitations
- Tempo is fixed at 120 BPM in output (time positions are preserved, but tempo may need adjustment in your DAW)
- Note velocities are normalized but may need manual adjustment
- Very short notes (<50ms) may be filtered out by default
- Extreme pitch ranges may cause octave detection issues
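The docs say velocities are normalized without specifying how. One plausible approach, shown here purely as an assumption, is a linear rescale of the detected velocities into a fixed range; the 40-110 range is a hypothetical choice.

```python
def normalize_velocities(velocities: list[int], low: int = 40, high: int = 110) -> list[int]:
    """Linearly rescale MIDI velocities into [low, high] (hypothetical range)."""
    if not velocities:
        return []
    vmin, vmax = min(velocities), max(velocities)
    if vmax == vmin:
        # All notes equally loud: place them in the middle of the target range.
        return [(low + high) // 2] * len(velocities)
    scale = (high - low) / (vmax - vmin)
    return [round(low + (v - vmin) * scale) for v in velocities]


print(normalize_velocities([20, 60, 100]))
```

A linear rescale keeps the relative dynamics of the performance, which is why manual touch-ups in the DAW may still be needed for accents.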
Post-Processing Recommendations
After generating MIDI, you may want to:
- Import into your DAW and adjust tempo to match your original recording
- Quantize further if stricter timing is needed
- Adjust note velocities for dynamics
- Apply swing/groove templates if the rigid grid sounds too mechanical
- Edit individual notes that were misdetected (common with fast runs)
Supported Audio Formats
Input formats supported via FFmpeg:
- WAV, AIFF, FLAC (uncompressed, best quality)
- MP3, M4A, AAC (compressed, acceptable)
- OGG, OPUS (open source formats)
- Most other formats FFmpeg supports
Troubleshooting
No notes detected
- Check that the input file isn't silent or corrupted
- Try lowering the --min-note threshold
- Verify the audio has clear melodic content (not just noise)
Too many notes / messy output
- Make sure octave pruning and overlap pruning are active (they are on by default)
- Use --key-aware to constrain notes to the musical scale
- Check the source audio for background noise
Wrong key detected
- Key detection works best with at least 8-10 measures of music
- Chromatic passages may confuse the detector
- Manually review and adjust in your DAW if needed
Notes in wrong octave
- Basic Pitch sometimes detects harmonics instead of fundamentals
- The pipeline includes pruning, but some may slip through
- Use your DAW's transpose function for simple octave shifts
References
- Basic Pitch - Spotify's polyphonic pitch detection model
- librosa HPSS - Harmonic-Percussive Source Separation
- Krumhansl-Kessler Key Profiles - Key detection algorithm
License
This skill integrates Basic Pitch by Spotify, which is licensed under Apache 2.0. The pipeline script and documentation are provided under MIT license.