mirror of https://github.com/OpenBMB/VoxCPM, synced 2025-12-12 03:48:12 +00:00
add lora finetune webUI; optimize lora save and load logic
114
docs/finetune.md
@@ -19,6 +19,7 @@ LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that:

## Table of Contents

- [Quick Start: WebUI](#quick-start-webui)
- [Data Preparation](#data-preparation)
- [Full Fine-tuning](#full-fine-tuning)
- [LoRA Fine-tuning](#lora-fine-tuning)
@@ -28,6 +29,31 @@ LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that:

---

## Quick Start: WebUI

For users who prefer a graphical interface, we provide `lora_ft_webui.py`, a comprehensive WebUI for training and inference:

### Launch WebUI

```bash
python lora_ft_webui.py
```

Then open `http://localhost:7860` in your browser.

### Features

- **🚀 Training Tab**: Configure and start LoRA training with an intuitive interface
  - Set training parameters (learning rate, batch size, LoRA rank, etc.)
  - Monitor training progress in real-time
  - Resume training from existing checkpoints

- **🎵 Inference Tab**: Generate audio with trained models
  - Automatic base model loading from LoRA checkpoint config
  - Voice cloning with automatic ASR (reference text recognition)
  - Hot-swap between multiple LoRA models
  - Zero-shot TTS without reference audio

## Data Preparation

Training data should be prepared as a JSONL manifest file, with one sample per line:
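
Purely as an illustration of the JSONL layout (the field names `audio_path` and `text` here are hypothetical placeholders, not the confirmed schema; follow the format expected by the training script):

```
{"audio_path": "/data/spk1/utt_0001.wav", "text": "Hello, this is a training sample."}
{"audio_path": "/data/spk1/utt_0002.wav", "text": "Each line is one JSON object."}
```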
@@ -177,6 +203,10 @@ lora:
  # Target modules
  target_modules_lm: ["q_proj", "v_proj", "k_proj", "o_proj"]
  target_modules_dit: ["q_proj", "v_proj", "k_proj", "o_proj"]

  # Distribution options (optional)
  # hf_model_id: "openbmb/VoxCPM1.5"  # HuggingFace ID
  # distribute: true  # If true, save hf_model_id in lora_config.json
```

### LoRA Parameters
@@ -189,6 +219,15 @@ lora:
| `alpha` | Scaling factor, `scaling = alpha / r` (see the sketch below) | Usually `r/2` or `r` |
| `target_modules_*` | Layer names to add LoRA to | attention layers |
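
For intuition, a generic LoRA linear layer applies its low-rank update scaled by `alpha / r`; with `r: 32` and `alpha: 16` the update is scaled by 0.5. A minimal generic sketch (not VoxCPM's actual implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 32, alpha: int = 16):
        super().__init__()
        self.base = base                    # frozen pretrained projection
        self.scaling = alpha / r            # e.g. 16 / 32 = 0.5
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # LoRA starts as a no-op update

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```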

### Distribution Options (Optional)

| Parameter | Description | Default |
|-----------|-------------|---------|
| `hf_model_id` | HuggingFace model ID (e.g., `openbmb/VoxCPM1.5`) | `""` |
| `distribute` | If `true`, save `hf_model_id` as `base_model` in the checkpoint; otherwise save the local `pretrained_path` | `false` |

> **Note**: If `distribute: true`, `hf_model_id` is required.
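
For example, to publish a checkpoint that records the HuggingFace ID instead of a local path, set both options together in the `lora` section of the training config:

```yaml
lora:
  hf_model_id: "openbmb/VoxCPM1.5"
  distribute: true   # lora_config.json will then record the HF ID as base_model
```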

### Training

```bash
@@ -202,16 +241,37 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \

### Checkpoint Structure

LoRA training saves LoRA parameters and configuration:

```
checkpoints/finetune_lora/
└── step_0002000/
    ├── lora_weights.safetensors   # Only lora_A, lora_B parameters
    ├── lora_config.json           # LoRA config + base model path
    ├── optimizer.pth
    └── scheduler.pth
```

The `lora_config.json` contains:
```json
{
  "base_model": "/path/to/VoxCPM1.5/",
  "lora_config": {
    "enable_lm": true,
    "enable_dit": true,
    "r": 32,
    "alpha": 16,
    ...
  }
}
```

The `base_model` field contains:
- Local path (default): when `distribute: false` or not set
- HuggingFace ID: when `distribute: true` (e.g., `"openbmb/VoxCPM1.5"`)

This allows loading LoRA checkpoints without the original training config file.
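
Schematically, the selection behaves like this (an illustrative sketch; `resolve_base_model` is a hypothetical helper, not the actual training-script code):

```python
def resolve_base_model(pretrained_path: str, hf_model_id: str = "", distribute: bool = False) -> str:
    """Pick what gets recorded as base_model in lora_config.json."""
    if distribute:
        if not hf_model_id:
            raise ValueError("distribute: true requires hf_model_id")
        return hf_model_id      # e.g. "openbmb/VoxCPM1.5"
    return pretrained_path      # local path, the default behavior

print(resolve_base_model("/path/to/VoxCPM1.5/"))                                  # local path
print(resolve_base_model("", hf_model_id="openbmb/VoxCPM1.5", distribute=True))   # HF ID
```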

---

## Inference
@@ -240,11 +300,10 @@ python scripts/test_voxcpm_ft_infer.py \

### LoRA Inference

LoRA inference only requires the checkpoint directory (the base model path and LoRA config are read from `lora_config.json`):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello, this is the LoRA fine-tuned result." \
    --output lora_output.wav
```
@@ -254,7 +313,6 @@ With voice cloning:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "This is voice cloning with LoRA." \
    --prompt_audio /path/to/reference.wav \
@@ -262,6 +320,16 @@ python scripts/test_voxcpm_lora_infer.py \
    --output cloned_output.wav
```

Override the base model path (optional):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --base_model /path/to/another/VoxCPM1.5 \
    --text "Use a different base model." \
    --output output.wav
```

---

## LoRA Hot-swapping
@@ -315,20 +383,39 @@ print(f"Loaded {len(loaded)} params, skipped {len(skipped)}")
lora_state = model.get_lora_state_dict()
```
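
These calls suggest a straightforward hot-swap flow; as a rough sketch (the exact `load_lora()` argument is an assumption here, see the method reference below, and `model` is a VoxCPM instance already loaded with a LoRA config):

```python
from safetensors.torch import load_file

# Hypothetical hot-swap: read another adapter's weights and load them into
# the already-initialized LoRA layers, then make sure LoRA is enabled.
new_state = load_file("/path/to/checkpoints/other_lora/step_0001000/lora_weights.safetensors")
loaded, skipped = model.load_lora(new_state)   # assumed to accept a LoRA state dict
print(f"Loaded {len(loaded)} params, skipped {len(skipped)}")
model.set_lora_enabled(True)
```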

### Simplified Usage (Load from lora_config.json)

If your checkpoint contains `lora_config.json` (saved by the training script), you can load everything automatically:

```python
import json
from voxcpm.core import VoxCPM
from voxcpm.model.voxcpm import LoRAConfig

# Load config from checkpoint
lora_ckpt_dir = "/path/to/checkpoints/finetune_lora/step_0002000"
with open(f"{lora_ckpt_dir}/lora_config.json") as f:
    lora_info = json.load(f)

base_model = lora_info["base_model"]
lora_cfg = LoRAConfig(**lora_info["lora_config"])

# Load model with LoRA
model = VoxCPM.from_pretrained(
    hf_model_id=base_model,
    lora_config=lora_cfg,
    lora_weights_path=lora_ckpt_dir,
)
```

Or use the test script directly:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello world"
```

### Method Reference

| Method | Description | torch.compile Compatible |
@@ -354,7 +441,6 @@ model = VoxCPM.from_pretrained(

- Increase `r` (LoRA rank); see the example below
- Adjust `alpha` (try `alpha = r/2` or `alpha = r`)
- Ensure `enable_dit: true` (required for voice cloning)
- Increase training steps
- Add more target modules
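
For example, a higher-capacity configuration applying the first suggestions could look like this (values are illustrative):

```yaml
lora:
  r: 64             # larger rank than the example above (32)
  alpha: 32         # keep alpha around r/2 or r
  enable_dit: true  # required for voice cloning
```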
@@ -366,11 +452,13 @@ model = VoxCPM.from_pretrained(

### 4. LoRA Not Taking Effect at Inference

- Check that `lora_config.json` exists in the checkpoint directory
- Check the `load_lora()` return value: `skipped_keys` should be empty
- Verify `set_lora_enabled(True)` is called

### 5. Checkpoint Loading Errors

- Full fine-tuning: the checkpoint directory should contain `model.safetensors` (or `pytorch_model.bin`), `config.json`, and `audiovae.pth`
- LoRA: the checkpoint directory should contain:
  - `lora_weights.safetensors` (or `lora_weights.ckpt`) - LoRA weights
  - `lora_config.json` - LoRA config and base model path