Initial commit with large files ignored

admin
2025-12-11 00:12:18 +08:00
parent a266c0a88d
commit 1e44eba871
17 changed files with 179286 additions and 329 deletions


@@ -1,55 +1,54 @@
# 👩‍🍳 A Voice Chef's Guide
Welcome to the VoxCPM kitchen! Follow this recipe to cook up perfect generated speech. Let's begin.
---
## 🥚 Step 1: Prepare Your Base Ingredients (Content)
First, choose how you'd like to input your text:
### 1. Regular Text (Classic Mode)
- Keep "Text Normalization" ON. Type naturally (e.g., "Hello, world! 123"). The system automatically processes numbers, abbreviations, and punctuation using the WeTextProcessing library.
### 2. Phoneme Input (Native Mode)
- Turn "Text Normalization" OFF. Enter phoneme text such as `{HH AH0 L OW1}` (EN) or `{ni3}{hao3}` (ZH) for precise pronunciation control. In this mode, VoxCPM also natively understands other complex, non-normalized text, so try it out!
- **Phoneme Conversion**: For Chinese, phonemes are converted using pinyin. For English, phonemes are converted using CMUDict. Please refer to the relevant documentation for more details.
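
The brace notation above can be assembled programmatically. The following is a minimal sketch, assuming the two examples in this guide generalize: the ARPAbet phonemes of one English word (as listed in CMUDict) share a single pair of braces, while each tone-numbered pinyin syllable gets its own pair. The helper names `wrap_arpabet` and `wrap_pinyin` are hypothetical and are not part of VoxCPM.

```python
def wrap_arpabet(phones):
    """Join one English word's ARPAbet phonemes inside a single pair of braces.

    Assumption: one brace group per word, as in the guide's {HH AH0 L OW1} example.
    """
    return "{" + " ".join(phones) + "}"


def wrap_pinyin(syllables):
    """Give each tone-numbered pinyin syllable its own pair of braces.

    Assumption: one brace group per syllable, as in the guide's {ni3}{hao3} example.
    """
    return "".join("{" + syllable + "}" for syllable in syllables)


if __name__ == "__main__":
    print(wrap_arpabet(["HH", "AH0", "L", "OW1"]))  # {HH AH0 L OW1}
    print(wrap_pinyin(["ni3", "hao3"]))             # {ni3}{hao3}
```

Remember to turn "Text Normalization" OFF before feeding strings like these to the model.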
---
## 🍳 Step 2: Choose Your Flavor Profile (Voice Style)
This is the secret sauce that gives your audio its unique sound.
### 1. Cooking with a Prompt Speech (Following a Famous Recipe)
- A prompt speech provides the desired acoustic characteristics for VoxCPM. The speaker's timbre, speaking style, and even the background sounds and ambiance will be replicated.
- **For a Clean, Denoised Voice:**
  - Enable "Prompt Speech Enhancement". This acts like a noise filter, removing background hiss and rumble to give you a pure, clean voice clone. However, it limits the audio sampling rate to 16 kHz, which caps the achievable cloning quality.
- **For High-Quality Audio Cloning (Up to 44.1 kHz):**
  - Disable "Prompt Speech Enhancement" to preserve all original audio information, including background atmosphere, and to support audio cloning at sampling rates up to 44.1 kHz.
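
The trade-off between the two options can be written down as a small decision helper. This is only a sketch of the rule of thumb described here: the function names and the decision rule are hypothetical, and the only facts taken from this guide are the 16 kHz and 44.1 kHz ceilings.

```python
def prompt_sample_rate_cap(enhancement_enabled: bool) -> int:
    """Return the sampling-rate ceiling (in Hz) stated in this guide."""
    # Enhancement ON caps cloning at 16 kHz; OFF allows up to 44.1 kHz.
    return 16_000 if enhancement_enabled else 44_100


def should_enable_enhancement(prompt_is_noisy: bool, keep_ambiance: bool) -> bool:
    """Hypothetical rule of thumb: denoise only when the noise is unwanted."""
    return prompt_is_noisy and not keep_ambiance


if __name__ == "__main__":
    enable = should_enable_enhancement(prompt_is_noisy=True, keep_ambiance=False)
    print(enable, prompt_sample_rate_cap(enable))
```

A clean studio recording, or a prompt whose background atmosphere you want to keep, would leave enhancement off and retain the higher ceiling.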
### 2. Cooking au Naturel (Letting the Model Improvise)
- If no reference is provided, VoxCPM becomes a creative chef! It infers a fitting speaking style from the text itself, thanks to the text-smartness of its foundation model, MiniCPM-4.
- **Pro Tip**: Challenge VoxCPM with any text, such as poetry, song lyrics, or dramatic monologues. It may deliver some interesting results!
---
## 🧂 Step 3: The Final Seasoning (Fine-Tuning Your Results)
You're ready to serve! But for master chefs who want to tweak the flavor, here are two key spices.
### CFG Value (How Closely to Follow the Recipe)
- **Default**: A great starting point.
- **Voice sounds strained or weird?** Lower this value. It tells the model to be more relaxed and improvisational, which suits expressive prompts.
- **Need maximum clarity and adherence to the text?** Raise it slightly to keep the model on a tighter leash.
- **Short sentences?** Consider increasing the CFG value for better clarity and adherence.
- **Long texts?** Consider lowering the CFG value to improve stability and naturalness over extended passages.
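
The length-based advice above can be turned into a starting-point heuristic. The sketch below is illustrative only: the default of 2.0, the 0.5 step, and the character thresholds are all assumptions chosen for the example, not values stated in this guide, and `suggest_cfg` is a hypothetical helper.

```python
def suggest_cfg(text: str, default: float = 2.0, step: float = 0.5,
                short_limit: int = 30, long_limit: int = 300) -> float:
    """Nudge the CFG value up for short inputs and down for long ones.

    All numeric defaults here are illustrative assumptions.
    """
    length = len(text.strip())
    if length <= short_limit:
        return default + step   # short sentence: favor clarity and adherence
    if length >= long_limit:
        return default - step   # long passage: favor stability and naturalness
    return default              # otherwise keep the starting point


if __name__ == "__main__":
    print(suggest_cfg("Hello, world!"))
    print(suggest_cfg("word " * 100))
```

Treat the result as a first guess and adjust by ear, lowering it further if the voice sounds strained.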
### Inference Timesteps (Simmering Time: Quality vs. Speed)
- **Need a quick snack?** Use a lower number. Perfect for fast drafts and experiments.
- **Cooking a gourmet meal?** Use a higher number. This lets the model "simmer" longer, refining the audio for superior detail and naturalness.
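
One way to keep this choice consistent across a project is a small preset table. The preset names and step counts below are hypothetical placeholders, since this guide does not give specific numbers; only the ordering (fewer steps for drafts, more for final renders) comes from the text.

```python
# Hypothetical presets: the step counts are placeholders, not recommended values.
TIMESTEP_PRESETS = {
    "draft": 5,      # quick snack: fast experiments
    "balanced": 10,  # everyday cooking
    "gourmet": 25,   # final render: more refinement, slower
}


def pick_timesteps(goal: str) -> int:
    """Look up the inference-timestep count for a named quality goal."""
    try:
        return TIMESTEP_PRESETS[goal]
    except KeyError:
        options = ", ".join(sorted(TIMESTEP_PRESETS))
        raise ValueError(f"unknown goal {goal!r}; choose one of: {options}")


if __name__ == "__main__":
    print(pick_timesteps("draft"), pick_timesteps("gourmet"))
```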
---
Happy creating! 🎉 Start with the default settings and tweak from there to suit your project. The kitchen is yours!