ComfyUI-FreeVC_wrapper
Support My Work
If you find this project helpful, consider buying me a coffee:
A voice conversion extension node for ComfyUI based on FreeVC, enabling high-quality voice conversion capabilities within the ComfyUI framework.
Features
- Support for multiple FreeVC models:
- Standard models (16kHz): FreeVC, FreeVC-s
- High-quality model (24kHz): FreeVC (24kHz)
- Enhanced voice mimicry capabilities
- Advanced audio pre- and post-processing options
- Stereo and mono audio support
- Automatic audio resampling
- Integrated with ComfyUI's audio processing pipeline
- GPU acceleration support (CUDA)
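Two of the features above, stereo-to-mono downmixing and automatic resampling, can be illustrated with a minimal NumPy sketch (the extension presumably relies on librosa for real resampling; the helper names here are illustrative only, not part of the node's API):

```python
import numpy as np

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Downmix a (channels, samples) array to mono by averaging channels."""
    if audio.ndim == 2:
        return audio.mean(axis=0)
    return audio

def naive_resample(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Linear-interpolation resample; a real pipeline would use librosa.resample."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

stereo = np.random.randn(2, 48000)             # 1 s of stereo audio at 48 kHz
mono = to_mono(stereo)                         # shape (48000,)
mono_16k = naive_resample(mono, 48000, 16000)  # shape (16000,)
```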
Installation
- Install the extension in your ComfyUI's custom_nodes directory:
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
cd ComfyUI-FreeVC_wrapper
- Install required Python packages:
pip install librosa transformers numpy torch noisereduce
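A quick way to confirm the packages installed cleanly is to check that each one resolves (standard library only; package names are taken from the command above):

```python
import importlib.util

# Package names from the pip command above
required = ["librosa", "transformers", "numpy", "torch", "noisereduce"]

# find_spec returns None for any package Python cannot locate
missing = [name for name in required
           if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```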
- Download required checkpoints:
a. Voice Conversion Models: All model checkpoint files (3 models) are available in a single Google Drive folder: Download All Model Checkpoints (Google Drive)
After downloading, extract the file and place the checkpoints folder in the freevc directory:
ComfyUI-FreeVC_wrapper/freevc/
b. Speaker Encoder:
Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:
| Component | Filename | Required For |
|-----------|----------|--------------|
| Speaker Encoder | pretrained_bak_5805000.pt | FreeVC, FreeVC (24kHz), D-FreeVC, and D-FreeVC (24kHz) models |
Direct download link:
Your final directory structure should look like this:
ComfyUI-FreeVC_wrapper/
└── freevc/
    ├── checkpoints/
    │   ├── freevc.pth                     # Standard 16kHz model
    │   ├── freevc-s.pth                   # Source-filtering based model
    │   └── freevc-24.pth                  # High-quality 24kHz model
    └── speaker_encoder/
        └── ckpt/
            └── pretrained_bak_5805000.pt  # Speaker encoder checkpoint
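A small script can confirm the checkpoints landed in the right places (the base path is an assumption; adjust it to where you cloned the extension):

```python
from pathlib import Path

# Assumed clone location -- adjust to your ComfyUI installation
base = Path("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper/freevc")

# The four checkpoint files the README's directory tree expects
expected = [
    base / "checkpoints" / "freevc.pth",
    base / "checkpoints" / "freevc-s.pth",
    base / "checkpoints" / "freevc-24.pth",
    base / "speaker_encoder" / "ckpt" / "pretrained_bak_5805000.pt",
]

for path in expected:
    status = "OK     " if path.is_file() else "MISSING"
    print(f"[{status}] {path}")
```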
Usage
1. In ComfyUI, locate the "FreeVC Voice Converter v2 🎤" node under the "audio/voice conversion" category
2. Connect your inputs:
   - Source audio: the audio you want to convert
   - Reference audio: the target voice style
   - (Optional) Secondary reference: an additional reference for more robust voice matching
   - Model type: choose between standard and diffusion-enhanced models
3. Configure the conversion parameters:
   - Source processing: noise reduction, source neutralization, clarity enhancement
   - Conversion settings: temperature, diffusion parameters (for diffusion models)
   - Post-processing: voice matching strength, presence boost, normalization
4. Connect the output to your desired audio output node
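As a rough mental model, the parameter groups above can be collected into a single settings sketch. The key names below are hypothetical, inferred from the README's descriptions; the node's actual widget names may differ:

```python
# Hypothetical illustration of the three parameter groups the node exposes.
# Values follow the README's recommended ranges where it gives them.
conversion_settings = {
    # Source processing
    "noise_reduction": True,
    "neutralize_source": 0.4,     # 0.3-0.5 recommended
    "clarity_enhancement": 0.5,
    # Conversion settings
    "temperature": 1.0,
    "diffusion_steps": 30,        # diffusion models only
    # Post-processing
    "voice_match_strength": 0.7,  # 0.6-0.8 for stronger character matching
    "presence_boost": 0.4,        # 0.3-0.5 for more "in the room" sound
    "normalize_output": True,
}
```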
Model Selection Guide
- FreeVC: Good for general purpose voice conversion at 16kHz
- FreeVC-s: Better preservation of source speech content, recommended for maintaining clarity
- FreeVC (24kHz): Higher quality output with better audio fidelity
Tips for Better Voice Conversion
- Use longer reference samples: 5-10 seconds of clean speech works best
- Try multiple reference samples: Use the secondary reference input for more robust voice profiles
- Adjust voice mimicry settings:
- Increase voice_match_strength (0.6-0.8) for stronger character matching
- Use neutralize_source (0.3-0.5) to reduce source voice influence
- Add presence_boost (0.3-0.5) for more "in the room" sound
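One common way a secondary reference helps (a general technique, not confirmed from this extension's source) is by averaging the speaker embeddings of several clips into a single, more stable voice profile. A NumPy sketch, assuming a 256-dimensional embedding:

```python
import numpy as np

def combine_references(embeddings: list) -> np.ndarray:
    """Average speaker embeddings from several reference clips into one
    profile; averaging smooths out clip-specific noise and recording quirks."""
    stacked = np.stack(embeddings)          # shape (n_refs, dim)
    mean = stacked.mean(axis=0)
    return mean / np.linalg.norm(mean)      # unit-normalize the profile

# Stand-ins for embeddings extracted from two reference clips
ref_a = np.random.randn(256)
ref_b = np.random.randn(256)
profile = combine_references([ref_a, ref_b])
print(profile.shape)  # (256,)
```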
Known Issues and Troubleshooting
1. File Not Found errors:
   - Ensure all checkpoint files are in the correct directories
   - Verify that file names match exactly (they are case-sensitive)
2. CUDA out of memory:
   - Process shorter audio clips
   - Fall back to the CPU if GPU memory is insufficient
   - Lower the number of diffusion steps for diffusion-based models
3. Audio quality issues:
   - Try a different model; each has strengths for different source/target voices
   - For diffusion models, lower the noise coefficient if the output contains static
   - Increase clarity_enhancement for better intelligibility
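For the out-of-memory case, splitting long clips before conversion is a simple workaround. A NumPy sketch of fixed-length chunking (the 10-second chunk size is an assumption, not a node parameter):

```python
import numpy as np

def split_into_chunks(audio: np.ndarray, sr: int, chunk_seconds: float = 10.0):
    """Yield fixed-length chunks so long clips fit in GPU memory."""
    chunk_len = int(sr * chunk_seconds)
    for start in range(0, len(audio), chunk_len):
        yield audio[start:start + chunk_len]

clip = np.zeros(16000 * 25)                  # 25 s of audio at 16 kHz
chunks = list(split_into_chunks(clip, 16000))
print(len(chunks))  # 3 chunks: 10 s + 10 s + 5 s
```

Each chunk can then be converted separately and the results concatenated, at the cost of possible discontinuities at chunk boundaries.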
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Original FreeVC implementation by OlaWod
- ComfyUI framework by comfyanonymous
Citation
If you use this in your research, please cite:
@article{wang2023freevc,
  title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
  author={Wang, Jiarui and Chen, Shilong and Wu, Yu and Zhang, Pan and Xie, Lei},
  journal={arXiv preprint arXiv:2210.15418},
  year={2023}
}