ComfyDeploy: How does ComfyUI-FreeVC_wrapper work in ComfyUI?

What is ComfyUI-FreeVC_wrapper?

A voice conversion extension node for ComfyUI based on [FreeVC](https://github.com/OlaWod/FreeVC), enabling high-quality voice conversion within the ComfyUI framework.

How to install it in ComfyDeploy?

Head over to the machine page:

  1. Click the "Create a new machine" button
  2. Select "Edit build steps"
  3. Add a new step -> Custom Node
  4. Search for ComfyUI-FreeVC_wrapper and select it
  5. Close the build step dialog, then click the "Save" button to rebuild the machine

ComfyUI-FreeVC_wrapper

Features

  • Support for multiple FreeVC models:
    • Standard models (16kHz): FreeVC, FreeVC-s
    • High-quality model (24kHz): FreeVC (24kHz)
  • Enhanced voice mimicry capabilities
  • Advanced audio pre- and post-processing options
  • Stereo and mono audio support
  • Automatic audio resampling
  • Integrated with ComfyUI's audio processing pipeline
  • GPU acceleration support (CUDA)
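
To make the stereo/mono and resampling features concrete, here is a minimal, dependency-free sketch of what downmixing and resampling mean. It is not the extension's implementation: the function names `to_mono` and `resample_linear` are hypothetical, and plain linear interpolation is used only for illustration. A real pipeline would more likely rely on a filtered resampler such as `librosa.resample(y, orig_sr=..., target_sr=...)`.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample (stereo -> mono)."""
    count = len(channels)
    return [sum(frame) / count for frame in zip(*channels)]


def resample_linear(samples, orig_sr, target_sr):
    """Resample by linear interpolation (illustrative; no anti-aliasing filter)."""
    if orig_sr == target_sr or not samples:
        return list(samples)
    out_len = max(1, round(len(samples) * target_sr / orig_sr))
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Example: a short stereo clip at 48 kHz prepared for a 16 kHz model.
stereo = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
mono_16k = resample_linear(to_mono(stereo), orig_sr=48000, target_sr=16000)
```

Because this sketch has no low-pass filter, downsampling real speech this way can introduce aliasing; it is meant only to show the shape of the operation.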

Installation

  1. Install the extension in your ComfyUI's custom_nodes directory:

     cd ComfyUI/custom_nodes
     git clone https://github.com/ShmuelRonen/ComfyUI-FreeVC_wrapper.git
     cd ComfyUI-FreeVC_wrapper

  2. Install required Python packages:

     pip install librosa transformers numpy torch noisereduce

  3. Download required checkpoints:

a. Voice Conversion Models: All three model checkpoint files are available in a single Google Drive folder: Download All Model Checkpoints (Google Drive)

After downloading, extract the archive and place the checkpoints folder in the freevc directory:

ComfyUI-FreeVC_wrapper/freevc/

b. Speaker Encoder: Download the speaker encoder checkpoint from HuggingFace and place it in the custom_nodes/ComfyUI-FreeVC_wrapper/freevc/speaker_encoder/ckpt directory:

| Component | Filename | Required For |
|-----------|----------|--------------|
| Speaker Encoder | pretrained_bak_5805000.pt | FreeVC, FreeVC (24kHz), D-FreeVC, and D-FreeVC (24kHz) models |

Direct download link:

Your final directory structure should look like this:

ComfyUI-FreeVC_wrapper/
└── freevc/
    ├── checkpoints/
    │   ├── freevc.pth         # Standard 16kHz model
    │   ├── freevc-s.pth       # Source-filtering based model
    │   └── freevc-24.pth      # High-quality 24kHz model
    └── speaker_encoder/
        └── ckpt/
            └── pretrained_bak_5805000.pt  # Speaker encoder checkpoint
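
A misplaced or misnamed checkpoint is easy to overlook, so it can help to check the layout above before starting ComfyUI. The following is a minimal sketch, not part of the extension: the function name `missing_checkpoints` is hypothetical, and the path passed in the example is an assumption about where ComfyUI is installed.

```python
from pathlib import Path

# Relative paths taken from the directory layout in this README.
EXPECTED_FILES = [
    "freevc/checkpoints/freevc.pth",
    "freevc/checkpoints/freevc-s.pth",
    "freevc/checkpoints/freevc-24.pth",
    "freevc/speaker_encoder/ckpt/pretrained_bak_5805000.pt",
]


def missing_checkpoints(extension_root):
    """Return the expected checkpoint paths that are absent under extension_root."""
    root = Path(extension_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical location; adjust to your own ComfyUI installation.
    missing = missing_checkpoints("ComfyUI/custom_nodes/ComfyUI-FreeVC_wrapper")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All checkpoints found.")
```

On case-sensitive file systems the names must match exactly, so `FreeVC.pth` would be reported as missing.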

Usage

  1. In ComfyUI, locate the "FreeVC Voice Converter v2 🎤" node under the "audio/voice conversion" category
  2. Connect your inputs:
    • Source audio: The audio you want to convert
    • Reference audio: The target voice style
    • (Optional) Secondary reference: Additional reference for more robust voice matching
    • Select model type: Choose between standard and diffusion-enhanced models
  3. Configure the conversion parameters:
    • Source processing: Noise reduction, source neutralization, clarity enhancement
    • Conversion settings: Temperature, diffusion parameters (for diffusion models)
    • Post-processing: Voice matching strength, presence boost, normalization
  4. Connect the output to your desired audio output node

Model Selection Guide

  • FreeVC: Good for general-purpose voice conversion at 16kHz
  • FreeVC-s: Better preservation of source speech content, recommended for maintaining clarity
  • FreeVC (24kHz): Higher quality output with better audio fidelity
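
The guide above can be combined with the checkpoint names and the speaker encoder table from the Installation section into a small lookup. This is an illustrative sketch only: `MODELS` and `describe_model` are hypothetical names, not the extension's API. FreeVC-s is marked as not needing the speaker encoder purely because the table does not list it, which is an assumption. The D-FreeVC variants are omitted because this README lists no checkpoints for them.

```python
# Hypothetical summary of the three models described in this README.
MODELS = {
    "FreeVC": {
        "sample_rate": 16000,
        "checkpoint": "freevc.pth",
        "needs_speaker_encoder": True,
    },
    "FreeVC-s": {
        "sample_rate": 16000,
        "checkpoint": "freevc-s.pth",
        # Assumption: not listed in the speaker encoder table.
        "needs_speaker_encoder": False,
    },
    "FreeVC (24kHz)": {
        "sample_rate": 24000,
        "checkpoint": "freevc-24.pth",
        "needs_speaker_encoder": True,
    },
}


def describe_model(name):
    """Return a one-line summary for a model name from MODELS."""
    info = MODELS[name]
    rate_khz = info["sample_rate"] // 1000
    encoder = "requires" if info["needs_speaker_encoder"] else "does not require"
    return (
        f"{name}: {rate_khz} kHz model, checkpoint {info['checkpoint']}, "
        f"{encoder} the speaker encoder"
    )


for model_name in MODELS:
    print(describe_model(model_name))
```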

Tips for Better Voice Conversion

  1. Use longer reference samples: 5-10 seconds of clean speech works best
  2. Try multiple reference samples: Use the secondary reference input for more robust voice profiles
  3. Adjust voice mimicry settings:
    • Increase voice_match_strength (0.6-0.8) for stronger character matching
    • Use neutralize_source (0.3-0.5) to reduce source voice influence
    • Add presence_boost (0.3-0.5) for more "in the room" sound
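
The suggested ranges above can be turned into a quick sanity check for a set of node settings. The sketch below is illustrative and not part of the extension: `review_settings` is a hypothetical helper, and the ranges are the suggestions from these tips rather than hard limits enforced by the node.

```python
# Ranges suggested in the tips above; parameter names are the ones this README uses.
RECOMMENDED_RANGES = {
    "voice_match_strength": (0.6, 0.8),
    "neutralize_source": (0.3, 0.5),
    "presence_boost": (0.3, 0.5),
}
MIN_REFERENCE_SECONDS = 5.0


def review_settings(settings, reference_seconds):
    """Return notes for values that fall outside the suggested ranges."""
    notes = []
    if reference_seconds < MIN_REFERENCE_SECONDS:
        notes.append(
            f"reference clip is {reference_seconds:.1f}s; "
            "5-10s of clean speech is suggested"
        )
    for name, (low, high) in RECOMMENDED_RANGES.items():
        if name not in settings:
            continue  # parameter not set, nothing to review
        value = settings[name]
        if not low <= value <= high:
            notes.append(f"{name}={value} is outside the suggested {low}-{high} range")
    return notes


for note in review_settings({"voice_match_strength": 0.2}, reference_seconds=2.0):
    print(note)
```

A value outside a range is not an error; the notes only flag settings that differ from the starting points recommended here.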

Known Issues and Troubleshooting

  1. File Not Found Errors:
    • Ensure all checkpoint files are in the correct directory
    • Verify file names match exactly (case-sensitive)
  2. CUDA Out of Memory:
    • Try processing shorter audio clips
    • Use CPU if GPU memory is insufficient
    • Lower diffusion steps for diffusion-based models
  3. Audio Quality Issues:
    • Try different models - each has strengths for different source/target voices
    • For diffusion models, lower the noise coefficient if there's static
    • Increase clarity_enhancement for better intelligibility
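
Processing shorter clips, the first suggestion for CUDA out-of-memory errors, amounts to splitting the source audio into segments and converting them one at a time. A minimal sketch of the index arithmetic follows. The function name `chunk_ranges` and the 10-second default are assumptions chosen for illustration, and the extension itself may handle long audio differently.

```python
def chunk_ranges(num_samples, sample_rate, max_seconds=10.0):
    """Return (start, end) sample index pairs that cover the clip in order."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and max_seconds must be > 0")
    chunk = max(1, int(sample_rate * max_seconds))  # samples per segment
    return [
        (start, min(start + chunk, num_samples))
        for start in range(0, num_samples, chunk)
    ]


# Example: a 25-second clip at 16 kHz split into segments of at most 10 seconds.
for start, end in chunk_ranges(25 * 16000, 16000):
    print(f"convert samples {start}..{end}")
```

Hard cuts between segments can produce audible seams after conversion, so cutting at pauses in the speech is usually a safer choice than cutting at fixed intervals.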

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Citation

If you use this in your research, please cite:

@article{li2022freevc,
  title={FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion},
  author={Li, Jingyi and Tu, Weiping and Xiao, Li},
  journal={arXiv preprint arXiv:2210.15418},
  year={2022}
}