# ComfyDeploy: How does ComfyUI-Lightning work in ComfyUI?
## What is ComfyUI-Lightning?

Accelerates FLUX inference speed for ComfyUI.
## How to install it in ComfyDeploy?

Head over to the machine page:

- Click on the "Create a new machine" button
- Select `Edit` build steps
- Add a new step -> Custom Node
- Search for `ComfyUI-Lightning` and select it
- Close the build step dialog and then click on the "Save" button to rebuild the machine
# ⚡ComfyUI-Lightning

## Introduction
This repository integrates all the tricks I know to speed up Flux inference:

- Use `TeaCache`, `FBCache`, `MBCache`, or `ToCa`;
- Skip some unnecessary blocks;
- Compile and quantize the model;
- Use fast CuDNN attention kernels;
- Use `SageAttention` or `SpargeAttn`;
- Fix `AttributeError: 'SymInt' object has no attribute 'size'` to speed up recompilation after resolution changes.
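Several of these tricks map onto standard PyTorch APIs. As a rough, hedged illustration of the compilation and fast-attention-kernel ideas (the tensor shapes here are assumptions for demonstration, not this repository's actual code):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Dummy attention inputs standing in for a Flux transformer block:
# (batch, heads, sequence, head_dim). Requires a CUDA GPU.
q = k = v = torch.randn(1, 24, 4096, 128, device="cuda", dtype=torch.bfloat16)

# Prefer the fused cuDNN attention kernel (available in recent PyTorch
# releases), falling back to flash attention where it is unavailable.
with sdpa_kernel([SDPBackend.CUDNN_ATTENTION, SDPBackend.FLASH_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)

# Compilation is a separate trick; for any nn.Module `m` it is simply:
#   m = torch.compile(m, mode="max-autotune")
```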
`MBCache` extends `FBCache` and caches multiple blocks. The code is adapted from SageAttention, ComfyUI-TeaCache, comfyui-flux-accelerator, and Comfy-WaveSpeed; see those repositories for more details.
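For intuition only, here is a hedged sketch of the first-block-cache idea (all names are hypothetical; the repository's real implementation differs): if the first block's output barely changes between diffusion steps, the cached output of the remaining blocks is reused instead of being recomputed.

```python
import torch

class FirstBlockCache:
    """Sketch of the FBCache idea: if the first transformer block's output
    changes little between diffusion steps, reuse the cached output of the
    remaining blocks instead of recomputing them."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.prev_first = None   # first-block output from the previous step
        self.cached_rest = None  # cached output of the remaining blocks

    def should_reuse(self, first_out: torch.Tensor) -> bool:
        if self.prev_first is None or self.cached_rest is None:
            return False
        rel_change = ((first_out - self.prev_first).abs().mean()
                      / self.prev_first.abs().mean().clamp_min(1e-8))
        return rel_change.item() < self.threshold

    def __call__(self, x, first_block, rest_blocks):
        first_out = first_block(x)
        if self.should_reuse(first_out):
            out = self.cached_rest        # skip the expensive blocks
        else:
            out = first_out
            for block in rest_blocks:     # full forward pass
                out = block(out)
            self.cached_rest = out
        self.prev_first = first_out
        return out

# Toy usage with identity "blocks":
cache = FirstBlockCache(threshold=0.1)
blocks = [torch.nn.Identity() for _ in range(4)]
y = cache(torch.randn(1, 16), blocks[0], blocks[1:])
```

`MBCache` generalizes this by keeping such checkpoints at several block boundaries rather than only after the first block.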
## Updates
- [2025/3/10] Add SpargeAttn. For more details, see Usage.
- [2025/2/27] Add ToCa.
- [2025/1/24] Now supports Sana. Get your 1024*1024 images within 2s. All the code is adapted from Sana.
## Usage

### For Flux
<img src="./assets/flux_generation_results.png" alt="Flux Generation Results" width="80%"/>You can use XXCache
, SageAttention
, and torch.compile
with the following examples:
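At the library level, `SageAttention` acts as a drop-in replacement for PyTorch's scaled-dot-product attention; the node wires it into Flux for you. A minimal standalone sketch (shapes are assumptions, and the call uses the `sageattention` package's `sageattn` function):

```python
import torch
from sageattention import sageattn

# Assumed (batch, heads, sequence, head_dim) tensors standing in for
# Flux attention inputs; requires a CUDA GPU and fp16/bf16 inputs.
q = k = v = torch.randn(1, 24, 4096, 128, device="cuda", dtype=torch.float16)

# Same call shape as F.scaled_dot_product_attention, but with
# SageAttention's quantized kernels under the hood.
out = sageattn(q, k, v, is_causal=False)
```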
More specifically:

- Download the Flux diffusion model and VAE image decoder from FLUX.1-dev or FLUX.1-schnell. Put the `flux1-dev.safetensors` or `flux1-schnell.safetensors` file into `models/diffusion_models` and the `ae.safetensors` file into `models/vae`;
- Download the Flux text encoders from flux_text_encoders and put all the `.safetensors` files into `models/clip`;
- Run the example workflow. (A scripted download sketch follows this list.)
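If you'd rather script the downloads, here is a hedged sketch using `huggingface_hub` (repo and file names follow the Hugging Face repositories named above; paths assume you run it from your ComfyUI root):

```python
from huggingface_hub import hf_hub_download

# FLUX.1-dev is gated on the Hub; log in with `huggingface-cli login` first.
# Swap in black-forest-labs/FLUX.1-schnell for the schnell variant.
repo = "black-forest-labs/FLUX.1-dev"
hf_hub_download(repo, "flux1-dev.safetensors", local_dir="models/diffusion_models")
hf_hub_download(repo, "ae.safetensors", local_dir="models/vae")

# Text encoders (CLIP-L plus a T5-XXL variant).
for name in ["clip_l.safetensors", "t5xxl_fp16.safetensors"]:
    hf_hub_download("comfyanonymous/flux_text_encoders", name, local_dir="models/clip")
```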
### For Sana

<img src="./assets/sana_generation_results.png" alt="Sana Generation Results" width="80%"/>

- Download the Sana diffusion model from the Model Zoo and put the `.pth` file into `models/diffusion_models`;
- Download the Gemma text encoder from google/gemma-2-2b-it, unsloth/gemma-2b-it-bnb-4bit, or Efficient-Large-Model/gemma-2-2b-it and put the whole folder into `models/text_encoders`;
- Download the DCAE image decoder from mit-han-lab/dc-ae-f32c32-sana-1.0 and put the `.safetensors` file into `models/vae`;
- Run the example workflow. (A scripted download sketch follows this list.)
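Again, a hedged `huggingface_hub` sketch for the Sana assets (the Sana `.pth` checkpoint itself comes from the Model Zoo linked above; the target folder name and DCAE filename here are assumptions):

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Gemma text encoder: the whole folder goes under models/text_encoders.
# google/gemma-2-2b-it is gated; the unsloth mirror is not.
snapshot_download("unsloth/gemma-2b-it-bnb-4bit",
                  local_dir="models/text_encoders/gemma-2b-it-bnb-4bit")

# DCAE image decoder (filename assumed to be model.safetensors).
hf_hub_download("mit-han-lab/dc-ae-f32c32-sana-1.0", "model.safetensors",
                local_dir="models/vae")

# The Sana .pth diffusion checkpoint is fetched from the Model Zoo
# and placed in models/diffusion_models by hand.
```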
### For SpargeAttn

SpargeAttn is an attention acceleration method based on SageAttention that requires hyperparameter tuning before use. The tuning process consists of the following steps:
- First, follow the steps below to install `SpargeAttn`. If you have problems installing it, see the original repository;

  ```bash
  git clone https://github.com/thu-ml/SpargeAttn.git
  cd ./SpargeAttn
  pip install -e .
  ```
- If you do not have a hyperparameter file, you should perform a few rounds of quality fine-tuning to get one first. You just need to turn on `enable_tuning_mode` on the `Apply SpargeAttn` node and perform the generation, e.g. generate 50-step 512*512 images for 10 different prompts (very time-consuming);

  <img src="./assets/spargeattn_autotune.png" alt="SpargeAttn Autotune" width="35%"/>

  - The `skip_DoubleStreamBlocks` and `skip_SingleStreamBlocks` arguments are used to skip certain blocks that do not require `SpargeAttn`, mainly so it can work with `TeaCache` and `FBCache`.
  - Enable `parallel_tuning` to utilize multiple GPUs and accelerate tuning. In this case, you need to start ComfyUI with the argument `--disable-cuda-malloc`.
  - [New] Follows the author's code updates to liberalize the use of the `l1` and `pv_l1` tuning parameters.
- Turn off `enable_tuning_mode` and use the `Save Finetuned SpargeAttn Hyperparams` node to save your hyperparameter file;

  <img src="./assets/spargeattn_saving.png" alt="SpargeAttn Saving" width="90%"/>

- Remove or disable the `Save Finetuned SpargeAttn Hyperparams` node and place the saved hyperparameter file in the `models/checkpoints` folder. Load this hyperparameter file in the `Apply SpargeAttn` node;

  <img src="./assets/spargeattn_loading.png" alt="SpargeAttn Loading" width="90%"/>

- Enjoy yourself.
To make tuning hyperparameters easier, I've provided an example workflow here. This workflow defaults to generating a 50-step 512*512 image for each of the 10 preset prompts (which can be modified as you see fit). Click on the "Queue" button to start tuning. Of course, you need to make sure you have the right environment before you start. Again, this process is very time-consuming.
If you have a well-tuned hyperparameter file, feel free to share it.