mirror of https://github.com/OpenBMB/VoxCPM, synced 2025-12-12 03:48:12 +00:00
add lora finetune webUI; optimize lora save and load logic
114
docs/finetune.md
@@ -19,6 +19,7 @@ LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that:

## Table of Contents

- [Quick Start: WebUI](#quick-start-webui)
- [Data Preparation](#data-preparation)
- [Full Fine-tuning](#full-fine-tuning)
- [LoRA Fine-tuning](#lora-fine-tuning)
@@ -28,6 +29,31 @@ LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that:

---

## Quick Start: WebUI

For users who prefer a graphical interface, we provide `lora_ft_webui.py`, a comprehensive WebUI for training and inference:

### Launch WebUI

```bash
python lora_ft_webui.py
```

Then open `http://localhost:7860` in your browser.

### Features

- **🚀 Training Tab**: Configure and start LoRA training with an intuitive interface
  - Set training parameters (learning rate, batch size, LoRA rank, etc.)
  - Monitor training progress in real-time
  - Resume training from existing checkpoints

- **🎵 Inference Tab**: Generate audio with trained models
  - Automatic base model loading from LoRA checkpoint config
  - Voice cloning with automatic ASR (reference text recognition)
  - Hot-swap between multiple LoRA models
  - Zero-shot TTS without reference audio

## Data Preparation

Training data should be prepared as a JSONL manifest file, with one sample per line:
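
Purely as an illustration of the JSONL layout (the field names `audio_path` and `text` here are hypothetical placeholders, not the confirmed schema; follow the format expected by the training script):

```
{"audio_path": "/data/spk1/utt_0001.wav", "text": "Hello, this is a training sample."}
{"audio_path": "/data/spk1/utt_0002.wav", "text": "Each line is one JSON object."}
```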
@@ -177,6 +203,10 @@ lora:
  # Target modules
  target_modules_lm: ["q_proj", "v_proj", "k_proj", "o_proj"]
  target_modules_dit: ["q_proj", "v_proj", "k_proj", "o_proj"]

  # Distribution options (optional)
  # hf_model_id: "openbmb/VoxCPM1.5"  # HuggingFace ID
  # distribute: true  # If true, save hf_model_id in lora_config.json
```

### LoRA Parameters
@@ -189,6 +219,15 @@ lora:
| `alpha` | Scaling factor, `scaling = alpha / r` (see the sketch below) | Usually `r/2` or `r` |
| `target_modules_*` | Layer names to add LoRA to | attention layers |
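
For intuition, a generic LoRA linear layer applies its low-rank update scaled by `alpha / r`; with `r: 32` and `alpha: 16` the update is scaled by 0.5. A minimal generic sketch (not VoxCPM's actual implementation):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = W x + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 32, alpha: int = 16):
        super().__init__()
        self.base = base                    # frozen pretrained projection
        self.scaling = alpha / r            # e.g. 16 / 32 = 0.5
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # LoRA starts as a no-op update

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```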

### Distribution Options (Optional)

| Parameter | Description | Default |
|-----------|-------------|---------|
| `hf_model_id` | HuggingFace model ID (e.g., `openbmb/VoxCPM1.5`) | `""` |
| `distribute` | If `true`, save `hf_model_id` as `base_model` in the checkpoint; otherwise save the local `pretrained_path` | `false` |

> **Note**: If `distribute: true`, `hf_model_id` is required.
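
For example, to publish a checkpoint that records the HuggingFace ID instead of a local path, set both options together in the `lora` section of the training config:

```yaml
lora:
  hf_model_id: "openbmb/VoxCPM1.5"
  distribute: true   # lora_config.json will then record the HF ID as base_model
```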

### Training

```bash
@@ -202,16 +241,37 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \

### Checkpoint Structure

LoRA training saves LoRA parameters and configuration:

```
checkpoints/finetune_lora/
└── step_0002000/
    ├── lora_weights.safetensors   # Only lora_A, lora_B parameters
    ├── lora_config.json           # LoRA config + base model path
    ├── optimizer.pth
    └── scheduler.pth
```

The `lora_config.json` contains:
```json
{
  "base_model": "/path/to/VoxCPM1.5/",
  "lora_config": {
    "enable_lm": true,
    "enable_dit": true,
    "r": 32,
    "alpha": 16,
    ...
  }
}
```

The `base_model` field contains:
- Local path (default): when `distribute: false` or not set
- HuggingFace ID: when `distribute: true` (e.g., `"openbmb/VoxCPM1.5"`)

This allows loading LoRA checkpoints without the original training config file.
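
Schematically, the selection behaves like this (an illustrative sketch; `resolve_base_model` is a hypothetical helper, not the actual training-script code):

```python
def resolve_base_model(pretrained_path: str, hf_model_id: str = "", distribute: bool = False) -> str:
    """Pick what gets recorded as base_model in lora_config.json."""
    if distribute:
        if not hf_model_id:
            raise ValueError("distribute: true requires hf_model_id")
        return hf_model_id      # e.g. "openbmb/VoxCPM1.5"
    return pretrained_path      # local path, the default behavior

print(resolve_base_model("/path/to/VoxCPM1.5/"))                                  # local path
print(resolve_base_model("", hf_model_id="openbmb/VoxCPM1.5", distribute=True))   # HF ID
```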

---

## Inference
@@ -240,11 +300,10 @@ python scripts/test_voxcpm_ft_infer.py \

### LoRA Inference

LoRA inference only requires the checkpoint directory (the base model path and LoRA config are read from `lora_config.json`):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello, this is the LoRA fine-tuned result." \
    --output lora_output.wav
```
@@ -254,7 +313,6 @@ With voice cloning:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "This is voice cloning with LoRA." \
    --prompt_audio /path/to/reference.wav \
@@ -262,6 +320,16 @@ python scripts/test_voxcpm_lora_infer.py \
    --output cloned_output.wav
```

Override the base model path (optional):

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --base_model /path/to/another/VoxCPM1.5 \
    --text "Use a different base model." \
    --output output.wav
```

---

## LoRA Hot-swapping
@@ -315,20 +383,39 @@ print(f"Loaded {len(loaded)} params, skipped {len(skipped)}")
lora_state = model.get_lora_state_dict()
```
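
These calls suggest a straightforward hot-swap flow; as a rough sketch (the exact `load_lora()` argument is an assumption here, see the method reference below, and `model` is a VoxCPM instance already loaded with a LoRA config):

```python
from safetensors.torch import load_file

# Hypothetical hot-swap: read another adapter's weights and load them into
# the already-initialized LoRA layers, then make sure LoRA is enabled.
new_state = load_file("/path/to/checkpoints/other_lora/step_0001000/lora_weights.safetensors")
loaded, skipped = model.load_lora(new_state)   # assumed to accept a LoRA state dict
print(f"Loaded {len(loaded)} params, skipped {len(skipped)}")
model.set_lora_enabled(True)
```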

### Simplified Usage (Load from lora_config.json)

If your checkpoint contains `lora_config.json` (saved by the training script), you can load everything automatically:

```python
import json
from voxcpm.core import VoxCPM
from voxcpm.model.voxcpm import LoRAConfig

# Load config from checkpoint
lora_ckpt_dir = "/path/to/checkpoints/finetune_lora/step_0002000"
with open(f"{lora_ckpt_dir}/lora_config.json") as f:
    lora_info = json.load(f)

base_model = lora_info["base_model"]
lora_cfg = LoRAConfig(**lora_info["lora_config"])

# Load model with LoRA
model = VoxCPM.from_pretrained(
    hf_model_id=base_model,
    lora_config=lora_cfg,
    lora_weights_path=lora_ckpt_dir,
)
```

Or use the test script directly:

```bash
python scripts/test_voxcpm_lora_infer.py \
    --lora_ckpt /path/to/checkpoints/finetune_lora/step_0002000 \
    --text "Hello world"
```

### Method Reference

| Method | Description | torch.compile Compatible |
@@ -354,7 +441,6 @@ model = VoxCPM.from_pretrained(

- Increase `r` (LoRA rank); see the example below
- Adjust `alpha` (try `alpha = r/2` or `alpha = r`)
- Ensure `enable_dit: true` (required for voice cloning)
- Increase training steps
- Add more target modules
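
For example, a higher-capacity configuration applying the first suggestions could look like this (values are illustrative):

```yaml
lora:
  r: 64             # larger rank than the example above (32)
  alpha: 32         # keep alpha around r/2 or r
  enable_dit: true  # required for voice cloning
```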
@@ -366,11 +452,13 @@ model = VoxCPM.from_pretrained(

### 4. LoRA Not Taking Effect at Inference

- Check that `lora_config.json` exists in the checkpoint directory
- Check the `load_lora()` return value: `skipped_keys` should be empty
- Verify `set_lora_enabled(True)` is called

### 5. Checkpoint Loading Errors

- Full fine-tuning: the checkpoint directory should contain `model.safetensors` (or `pytorch_model.bin`), `config.json`, and `audiovae.pth`
- LoRA: the checkpoint directory should contain:
  - `lora_weights.safetensors` (or `lora_weights.ckpt`) - LoRA weights
  - `lora_config.json` - LoRA config and base model path