Furkan Gözükara

Kohya FLUX LoRA Training Full Tutorial For Local Windows and Cloud RunPod and Massed Compute

Added 2025-10-02 14:00:00 +0000 UTC

Full step-by-step Kohya SS GUI FLUX LoRA training. Includes the very best optimized configs for GPUs from 4 GB to 48 GB. For Windows, RunPod and Massed Compute

Patreon exclusive posts index to easily find our scripts, Patreon scripts update history to see which updates arrived in which scripts, and our amazing Patreon special generative scripts list that you can use in any of your tasks.

Join our Discord to get help, chat and discuss, and also tell me your Discord username to get your special rank : SECourses Discord

Please also Star, Watch and Fork our Stable Diffusion & Generative AI GitHub repository and join our Reddit subreddit and follow me on LinkedIn (my real profile)

=======

Latest zip file : Kohya_FLUX_DreamBooth_LoRA_v32.zip
- Use the LoRA_Tab_LoRA_Training_Best_FLUX_Configs folder for LoRA configs
Quick new Massed Compute install (Oct 2025) : https://www.youtube.com/watch?v=Ym9rdfy2VZ0

Windows Tutorial Published - 68 Min - 74 Video Chapters

Link > https://youtu.be/nySGu12Y05k
It is a prerequisite for the cloud tutorials, so please watch the Windows tutorial fully
Please also upvote and leave a comment on this Reddit post if possible
Please register for RunPod using this link : https://runpod.io?ref=1aka98lq
Register for Massed Compute using the following link:
https://vm.massedcompute.com/signup?linkId=lp_034338&sourceId=secourses&tenantId=massed-compute

Tutorials & Resources

Cloud Tutorial Published - For GPU Poor and Multi GPU
- https://youtu.be/-uhL2nW7Ddw
Full Fine Tuning Configs Published
- https://www.patreon.com/posts/112099700
Extract LoRA from Fine Tuned model
- https://www.patreon.com/posts/how-to-extract-112335162
Convert LoRA into FP8 to save huge disk space
- https://www.patreon.com/posts/115376830

29 October 2025 Update v32

I have added an amazing new tool called Image Preprocessing
This tool is extremely important and useful when you train with bucketing enabled
I recommend using this tool to preprocess your training images and check how your images are actually used during training
Just run Windows_Install_or_Update_Kohya.bat to update
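To see why preprocessing matters with bucketing, here is a simplified Python sketch of aspect-ratio bucketing in the spirit of Kohya's sd-scripts. It is not the Image Preprocessing tool's code and not Kohya's exact implementation. The bucket grid (1024x1024 maximum area, 64 px steps, sides from 256 to 2048 px) is an assumption, and `make_buckets` / `assign_bucket` are hypothetical helpers.

```python
import math

def make_buckets(max_area=1024 * 1024, step=64, min_side=256, max_side=2048):
    """Enumerate (width, height) buckets whose area stays within max_area (assumed grid)."""
    buckets = set()
    # Always include the largest square bucket.
    side = int(math.sqrt(max_area) // step) * step
    buckets.add((side, side))
    width = min_side
    while width <= max_side:
        # Tallest height (a multiple of step) that keeps width * height <= max_area.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))
        width += step
    return sorted(buckets)

def assign_bucket(img_w, img_h, buckets):
    """Pick the bucket with the closest aspect ratio, then compute resize and crop."""
    aspect = img_w / img_h
    bw, bh = min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
    # Scale so the image fully covers the bucket; the overflow is center-cropped.
    scale = max(bw / img_w, bh / img_h)
    resized = (round(img_w * scale), round(img_h * scale))
    crop = (resized[0] - bw, resized[1] - bh)
    return (bw, bh), resized, crop
```

For a 1536x1024 photo this sketch picks the 1216x832 bucket, resizes the image to 1248x832 and center-crops 32 px of width: the kind of silent crop that is worth checking before training.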

2 October 2025 Update

Sadly, Bmaltais stopped developing the Kohya GUI, so I forked his repo and we are now going to use my own fork
- One advantage of this is that we will now always use the latest version of Kohya's SD Scripts
I have heavily optimized and significantly improved the installation, so it is now way faster and more reliable on Windows, RunPod and Massed Compute
I have updated libraries to Torch 2.8, CUDA 12.9, Accelerate 0.48, xFormers 0.33, Flash Attention 2.8.3, Sage Attention 2.2 and Triton 3.4 on all platforms
- Now it supports all GPUs from the GTX 1000 series to the RTX 5000 series + cloud GPUs like RTX A6000, A100, H200, B200 etc
Moreover, I made the app auto-recognize FLUX Krea Dev and FLUX SRPO models as FLUX.1 - remember you have to enable the FLUX.1 checkbox
- After loading the config, all you need to do is select the downloaded FLUX SRPO model as the base model in the model path, instead of the FLUX Dev model
I have trained the new FLUX SRPO model with our existing DreamBooth configs and compared it to the FLUX Krea and FLUX Dev base models
I can confidently say that the FLUX SRPO model is perfectly trainable with our config and it is a little bit more realistic than FLUX Dev
- So for realism from now on I recommend FLUX SRPO
Below are the base 1024x1024 results, with no face restoration or upscaling applied
- Remember that our upscale preset in SwarmUI improves quality by 100%+, as in this post of mine : https://www.patreon.com/posts/133166462 (this was on FLUX Dev, not on SRPO)
- FLUX Dev Grid : FLUX_Dev_LoRA.jpg
- FLUX SRPO Grid : FLUX_SRPO_LoRA.jpg
- FLUX Krea Grid : FLUX_Krea_LoRA.jpg
- FLUX Dev vs SRPO vs Krea Grid : FLUX_Dev_vs_SRPO_vs_Krea_175_Epoch_LoRA.jpg
FLUX SRPO is an extremely realistic base model compared to FLUX Dev - it is a special fine tune : https://github.com/Tencent-Hunyuan/SRPO
If you want to upgrade to the latest version, I recommend getting the latest zip file and making a fresh install into a new folder, since a lot of the installation process has changed
The model downloader script has been upgraded to our special ultra-fast and robust model downloader - like uGet, with 16 connections + SHA-256 verification
The Windows_Download_Training_Model_Files.bat will ask you which model you want to download
On RunPod and Massed Compute read the instruction txt files and you will see commands to download any of the models directly
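The SHA-256 verification mentioned above boils down to hashing the finished file and comparing it with a published digest. A minimal standard-library Python sketch, assuming you have the expected hex digest (for example from the file's page on Hugging Face); `sha256_of` and `verify_sha256` are hypothetical helpers, not part of the downloader script.

```python
import hashlib

def sha256_of(path, chunk_size=16 * 1024 * 1024):
    """Hash a file in chunks so multi-GB model files never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_sha256(path, expected_hex):
    """Return True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage would look like `verify_sha256("flux1-dev.safetensors", "<expected digest>")`; a mismatch means the download is corrupt or incomplete and should be repeated.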

Windows Requirements

Python 3.10.11, FFmpeg, CUDA 12.9, cuDNN 9.12, C++ Tools, MSVC and Git
- Only Python and Git should be sufficient since I precompile the libraries, but to be sure I still recommend installing all of them
If you get any errors, follow the video below and its source link
https://youtu.be/DrhUHnYfwC0
https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-111553210

Massed Compute (Recommended Cloud) :

Please register via this link : https://vm.massedcompute.com/signup?linkId=lp_034338&sourceId=secourses&tenantId=massed-compute
- Use our coupon SECourses
- Our coupon works on all GPUs now
  - H100 has an amazing price and speed, but you can also use GPUs like the RTX A6000 ADA
  - Full details here : https://www.patreon.com/posts/26671823
- Then select our image SECourses from Creator dropdown
- Then follow Massed_Compute_Instructions_READ.txt
- Same as any of my other Massed Compute installer scripts
- Example tutorial to learn how to install and use Massed Compute
  - (Starts at 12:58) : https://youtu.be/KW-MHmoNcqo?si=G1WbG-Qw4ujWvOtG&t=778

RunPod (Cloud):

Please register via this link : https://runpod.io?ref=1aka98lq
- Then follow Runpod_Instructions_READ.txt
- Same as any of my other RunPod installer scripts
- Use the template written in the Runpod_Instructions_READ.txt file
- Example tutorial to learn how to install and use RunPod
  - (starts at 22:03) : https://youtu.be/KW-MHmoNcqo?si=QN8X8Sjn13ZYu-EU&t=1323

13 August 2025 Update

I have trained the FLUX Krea Dev model with our FLUX Dev LoRA configs (inside the LoRA_Tab_LoRA_Training_Best_FLUX_Configs folder) and compared the results
- Our model downloader in zip file now auto downloads FLUX Krea Dev too
- So after loading your config just change base model to FLUX Krea Dev
- FLUX Krea Dev Tutorial here
  - https://youtu.be/8MvvuX4YPeo
    - 15:31 FLUX Krea Dev vs FLUX Dev: A Detailed Side-by-Side Image Comparison
    - 16:26 How to Easily Train Your Own LoRAs on the New FLUX Krea Dev Model
    - 17:02 Complete Workflow for Generating High-Quality Images with FLUX Krea Dev
    - 18:20 The Final Verdict: Side-by-Side Result of FLUX Krea Dev vs FLUX Dev
I feel like FLUX Krea Dev needs a little bit higher learning rate or longer training
- I recommend longer training
You can see full size grid comparisons below - trained on 28_imgs_dataset.png
- Massive Grid click to download
I may also research the Chroma model and publish presets for it; currently my focus is Qwen Image, which I believe will be better than FLUX Dev in every aspect
Hopefully a full tutorial and very easy-to-use workflows and presets for Qwen Image model training are coming soon - I am working on a Gradio app and presets

13 July 2025 Update

Gradio is broken, so I added a temporary fix : Temp_Fix_Gradio_Error.bat
On RunPod and Massed Compute the fix is applied automatically

29 May 2025 Update

32 GB RAM configs added - system RAM, not VRAM
- They are inside the 32 GB RAM Configs - Not VRAM - RAM folder inside LoRA_Tab_LoRA_Training_Best_FLUX_Configs
- The difference is that you have to use flux1-dev-fp8.safetensors, and the config now has fp8 base UNet enabled
Windows_Download_Training_Model_Files.bat updated to prevent possible errors during download of models

13 May 2025 Update

Now on RunPod and Massed Compute our installer supports RTX 5000 series as well as older GPUs like RTX 3090, RTX 4090 etc
Upgraded to Torch 2.7 and CUDA 12.8

4 May 2025 Update

First run the installer and then run Windows_RTX5000_Series_Upgrade_Run_After_Install_Finished.bat
- Now it uses the official Torch 2.7, CUDA 12.8, and xFormers compiled by me
- This is required for all GPUs
Training models are uploaded to my own XET-enabled repo for even faster and more stable downloads : https://huggingface.co/MonsterMMORPG/Kohya_Train/tree/main
All configs are up to date with the best settings
22 amazing special prompts for testing woman trainings added into the Test_Prompts folder

17 November 2024 Update

Huge improvements arrived with Kohya's newest block swapping feature
Model downloaders are updated and made much faster than before on all platforms (Windows, RunPod and Massed Compute) - up to 10 times faster
- On Massed Compute, downloading all training models (over 30 GB) took only 1 minute
All configs are updated; please look at the 0_Configs_Explanation_Must_Read.jpg inside the LoRA_Tab_LoRA_Training_Best_FLUX_Configs folder
Pick the config depending on your GPU, the quality you target and the speed you need
Please watch the tutorials listed above to fully learn how to use them
Update your Kohya to the latest version via Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat, or better, make a fresh install of Kohya

31 October 2024 Update

xFormers and Torch 2.5.1 have been officially published
So use the Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat file
Massed Compute and RunPod installers are also updated for Torch 2.5.1 and xFormers 0.0.28.post3
All configs, both Fine-Tuning / DreamBooth and LoRA, are updated to use xFormers instead of SDPA
- I find that xFormers yields slightly better results
The recommended RunPod template has changed to the one below
- RunPod Pytorch 2.2.0
  - runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04

7 October 2024 Update

New amazing prompts added inside New_Test_Prompts folder

4 October 2024 Update

Installer files updated according to the latest Bmaltais updates
xFormers is kept at 0.0.28.post1 unless you install Torch 2.5 for FLUX
There is no xFormers build available for Torch 2.5 yet
Bmaltais added a new option to the bat file, so we no longer overwrite the gui.bat file - the new command is : --noverify
When you use Windows_Update_Kohya_and_Fix_FLUX_Step2.bat on an existing installation you will get an error; either do a fresh install, or open a cmd in the kohya_ss folder, execute git stash and then run the updater
The bat files are renamed for clarity, in the order below
- 1: Windows_Install_Step_1.bat
- 2: Windows_Update_Kohya_and_Fix_FLUX_Step2.bat
- 3: Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat
- To start 4: Windows_Start_Kohya_SS.bat

19 September 2024 Update

Famous 1-layer training configs added
There are 19 double blocks, 38 single blocks in the FLUX model
I have tested 8 blocks from double blocks and 16 blocks from single blocks and decided that single block 7 is best
However, the quality is lower than full LoRA training
The only advantage is the smaller generated file size
I have shared my full thoughts, comparisons, research results and conclusions here: Link will be added once published
New 1-layer training configs are inside Single_Layer_Configs folder
Quality 1 is better than Quality 2 and so on, so the best quality config is Quality 1
Quality_1_25800MB_3_7_Second_IT.json means that it uses a minimum of 25800 MB VRAM and its per-step speed is 3.7 seconds on an RTX A6000 (almost the same as an RTX 3090)
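If you keep several of these configs around, the naming scheme can be parsed to pick the best one that fits your card. A hypothetical Python sketch, assuming every file follows the Quality_<n>_<VRAM>MB_<seconds>_Second_IT.json pattern of the example above, with the underscore between the digits standing for the decimal point:

```python
import re

# Pattern assumed from the single example name Quality_1_25800MB_3_7_Second_IT.json
CONFIG_NAME = re.compile(r"^Quality_(\d+)_(\d+)MB_(\d+)_(\d+)_Second_IT\.json$")

def parse_single_layer_config(name):
    """Return (quality_rank, min_vram_mb, seconds_per_step) for a config file name."""
    m = CONFIG_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected config name: {name}")
    rank, vram, whole, frac = m.groups()
    return int(rank), int(vram), float(f"{whole}.{frac}")

def pick_best_config(names, available_vram_mb):
    """Return the best-quality config (lowest Quality number) that fits in VRAM, or None."""
    parsed = [(parse_single_layer_config(n), n) for n in names]
    fitting = [(info, n) for info, n in parsed if info[1] <= available_vram_mb]
    return min(fitting)[1] if fitting else None
```

Since Quality 1 is the best, the helper simply prefers the lowest Quality number among the configs whose minimum VRAM fits.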

14 September 2024 Update

New test prompts added
Install_Torch_2_5_Dev_Huge_Speed_Up.bat will install Torch 2.5 release candidate version
A music video has been published with images generated after the 256-image training experiment
Video link : https://youtu.be/aVwu2Faw8Iw
The 256-image training experiment post has been finalized with lots of details :
> https://www.patreon.com/posts/training-flux-111891669

9 September 2024 Update

I have tested training the T5 XXL text encoder with 7 new unique trainings
The results are not a clear improvement
Check the grids below to see the results
T5-Training-Experiments-Prompt-Set-1-Full-Grid.jpg, T5-Training-Experiments-Prompt-Set-2-Full-Grid.jpg, T5-Training-Experiments-Raw-vs-T5-Captioned-Full-Grid.jpg
Training T5 XXL with the same LR adds almost nothing new
Training T5 XXL with a reduced UNet LR may give somewhat better results, but this is subjective and likeness is reduced in some images, so you need to generate more images to get perfect likeness
I also tested T5 XXL training with captions (JoyCaption used), but I still didn't see any reason to use it or any improvement
All results are in the grids above
One of our followers said that T5 XXL helps when you train something that has text on it
So if you are training something with text, you can try it
New testing prompts added to the Test_Prompts folder
I started using face_yolov9c.pt - create a yolov8 folder inside the SwarmUI Models folder and put it there
I have included the new configs; the latest configs are as below
Rank 3 - T5 barely fits on a 24 GB GPU, so make sure to reduce your other VRAM usage as much as possible

Ultra Detailed Research and Development Article

You can see ultra detailed, lengthy research post here : https://www.patreon.com/posts/110293257
So far I have done 73 different trainings with each having different configuration and parameters. Each training is 3000 steps. You can see entire list here : https://www.patreon.com/posts/110838414
So this tutorial, the configs and the workflows are the result of analyzing over 64 full trainings

If Your Training Terminates at the Caching Latents Stage

That means your accelerate is not configured correctly
Watch this tutorial to fix it : https://youtu.be/adVhm9aI9Gc
If it still fails, replace your accelerate config with the yaml file below - yours is in the accelerate folder under your Hugging Face cache
- https://gist.github.com/FurkanGozukara/4ffbde360e99414f9a8b6d7963ccc039
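To locate the file to replace, this Python sketch mirrors the lookup order I assume Hugging Face Accelerate uses: `HF_HOME` if set, otherwise `XDG_CACHE_HOME/huggingface`, otherwise `~/.cache/huggingface`, followed by `accelerate/default_config.yaml`. `accelerate_default_config` is a hypothetical helper; double-check the result against the path `accelerate config` prints when it saves.

```python
import os
from pathlib import Path

def accelerate_default_config(env=None):
    """Return the assumed default accelerate config path for the given environment."""
    env = os.environ if env is None else env
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        cache = env.get("XDG_CACHE_HOME") or os.path.join("~", ".cache")
        hf_home = Path(cache).expanduser() / "huggingface"
    return hf_home / "accelerate" / "default_config.yaml"
```

On a default Windows setup this typically resolves to C:\Users\<you>\.cache\huggingface\accelerate\default_config.yaml, which is the file to overwrite with the gist's yaml.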

Automatic Installers and Configs

Download the attached Kohya_GUI_Flux_Installer zip file
The latest version will be at the top of the post and in the attachments
Extract it into a folder on a main drive, like c:/Kohya_GUI_Flux_Installer_v17 or d:/Kohya_GUI_Flux_Installer_v17
Don't put it inside c:/windows, c:/users, etc.
Currently the Kohya SS GUI main branch doesn't have FLUX, so we are going to install the sd3-flux.1 branch
Use the Windows_Install_Step_1.bat file
Select option 1
Once it is completed, set up accelerate as shown here (56-second video) : https://youtu.be/adVhm9aI9Gc
Then close it and do not install any other options
After this step, run the Update_Kohya_and_Fix_FLUX_Step2.bat file to update the installation to the latest version and fix libraries
Whenever you want to update Kohya SS GUI to the latest version, use this file
You can run this file every time before starting a new training to get the latest fixes and changes
Once FLUX arrives in the main repo, I will update the installer bat files
After these steps, use the Windows_Download_Training_Model_Files.bat file
It will download the files needed for training into the same folder
These files come from the links below
Models Links Downloads
- FLUX dev FP16 (23.8 GB) : https://huggingface.co/OwlMaster/realgg/resolve/main/flux1-dev.safetensors
- CLIP L (250 MB) : https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
- T5 XXL FP16 (9.8 GB) : https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
- FLUX VAE (335 MB) : https://huggingface.co/OwlMaster/realgg/resolve/main/ae.safetensors
Do not use other copies of these files that you may already have, because you may get errors due to incompatibility
Support for the FP8 base model has been added, but I didn't train with it, so I can't tell if it works as expected. Using the FP16 base model will not cause more VRAM usage, since it is automatically cast to the correct precision. I prefer and suggest the FP16 base model.
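To check whether a model file you already have is the FP16/BF16 or the FP8 variant, and whether its download is complete, you can read the safetensors header without loading any weights. The layout used here (an 8-byte little-endian header length, then a JSON header with `dtype` and `data_offsets` per tensor) is the public safetensors format; `safetensors_summary` is a hypothetical helper, and it does not validate the tensor bytes themselves.

```python
import json
import os
import struct
from collections import Counter

def safetensors_summary(path):
    """Return (dtype counts, complete) from a .safetensors header, without loading tensors."""
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor entry
    dtypes = Counter(info["dtype"] for info in header.values())
    # data_offsets are relative to the first byte after the header.
    data_end = max((info["data_offsets"][1] for info in header.values()), default=0)
    complete = 8 + header_len + data_end <= file_size
    return dtypes, complete
```

For example, `safetensors_summary("flux1-dev-fp8.safetensors")` should report mostly FP8 dtypes (such as F8_E4M3) if it really is the FP8 file, and `complete` is False when the file is shorter than its header says.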

How To Use Config Files

After starting Kohya SS GUI, pick a config from the Best_Configs or Best_Configs_Better_Colors folder (more info below) according to the comparison chart below
Set your training dataset folders, output folder, output file name, number of epochs, save every N epochs checkpoint, training model file paths and you are set
By default, the configs train for 200 epochs and save a checkpoint every 25 epochs (you can also save, for example, every 10 epochs)
But if you have many images, like 50 or 100, you can set a lower epoch count as well; still, if you have time, train more and save checkpoints more frequently
We don't use regularization / classification images with FLUX because they don't improve results
Comparisons are posted in the research article linked at the top
Thus we use repeating 1 - very important
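To estimate how long a run is and how many checkpoint files you will get, the step count follows from images x repeats x epochs / batch size. A Python sketch assuming a single GPU, no gradient accumulation and batch size 1 (with larger batches Kohya batches each bucket separately, so the real count can be slightly higher); `training_plan` is a hypothetical helper:

```python
import math

def training_plan(num_images, epochs=200, save_every_n_epochs=25, repeats=1, batch_size=1):
    """Estimate total steps and the epochs at which checkpoints are saved.

    Assumes one GPU, no gradient accumulation and a single bucket.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    total_steps = steps_per_epoch * epochs
    checkpoint_epochs = list(range(save_every_n_epochs, epochs + 1, save_every_n_epochs))
    return total_steps, checkpoint_epochs
```

With, for example, 28 images and the default 200 epochs / save every 25 epochs, this gives 5600 steps and 8 checkpoints (epochs 25, 50, ..., 200).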

Best_Configs_Better_Colors Folder

The difference in this folder is that it uses Timestep Sampling set to Shift with a Discrete Flow Shift value of 1
You can see a quick research comparison grid here : Shift_Value_1_Test.jpg
An imgsli comparison (10 different images) here (a is the Sigmoid sampler, b is the Shift sampler) : https://imgsli.com/MjkxNzAy
Shift timestep sampling definitely improved colors and overall composition
In some images you will see reduced likeness but in some images even better
It requires more research

Batch Size Experiments And Multi GPU Usage

I used the Rank_2_27360MB_Fast.json config to test the impact of batch size on an RTX A6000
The speed gain from a larger batch size is almost none, so I don't suggest it, since you lose quality
Lower batch size = better quality - I have tested this many times, and people don't test this properly
Use a larger batch size only when you need more speed
Only batch size 2 gives you some gain, so you may use it if you wish
Batch size 1 : 4.54 s/it : effective speed per step 4.54 s/it (same)
Batch size 2 : 7.98 s/it : effective speed per step 3.99 s/it
Batch size 3 : 12.43 s/it : effective speed per step 4.14 s/it
Batch size 4 : 15.28 s/it : effective speed per step 3.82 s/it
Batch size 5 : 20.18 s/it : effective speed per step 4.03 s/it
Therefore I have added a batch size 1 config for 4x A6000 GPUs : 4x_GPU_Batch_Size_1.json
With this config you get 5.75 s/it, and the effective speed per step is 1.4375 s/it
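The effective speeds above are simply seconds per iteration divided by the number of images processed per iteration (batch size x number of GPUs). A small Python sketch reproducing them from the measured numbers; `effective_seconds_per_image` is a hypothetical helper name:

```python
def effective_seconds_per_image(seconds_per_it, batch_size=1, num_gpus=1):
    """Wall-clock seconds per iteration divided by images processed per iteration."""
    return seconds_per_it / (batch_size * num_gpus)

# Measured s/it from the batch size experiment above (RTX A6000, Rank_2_27360MB_Fast.json)
measured = {1: 4.54, 2: 7.98, 3: 12.43, 4: 15.28, 5: 20.18}
for bs, s in measured.items():
    print(f"batch size {bs}: {effective_seconds_per_image(s, bs):.3f} s per image")

# The 4x A6000, batch size 1 config: 5.75 s/it spread over 4 GPUs
print(f"4x GPU: {effective_seconds_per_image(5.75, 1, 4):.4f} s per image")
```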
When you use multiple GPUs, you need to divide the epoch count by the number of GPUs
So 200 epochs becomes 50 for 4x GPUs
Also increase the LR with this formula: best LR x (batch size x number of GPUs / 2), so in this case 0.00005 x (1 x 4 / 2) = 0.0001
So if you use batch size 2, it becomes 0.00005 x (2 x 4 / 2) = 0.0002
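The two multi-GPU rules of thumb above (divide epochs by the GPU count, scale the LR by batch size x GPUs / 2) can be written as a hypothetical Python helper. It encodes only the formulas stated here for the multi-GPU configs; rounding the epoch count up when it does not divide evenly is my own assumption:

```python
import math

def multi_gpu_settings(best_lr, epochs, batch_size=1, num_gpus=1):
    """Apply this post's multi-GPU rules of thumb.

    - LR is scaled by (batch_size * num_gpus / 2)
    - epochs are divided by the number of GPUs (rounded up, an assumption)
    """
    scaled_lr = best_lr * (batch_size * num_gpus / 2)
    scaled_epochs = math.ceil(epochs / num_gpus)
    return scaled_lr, scaled_epochs
```

With 0.00005 and 200 epochs on 4 GPUs this reproduces the numbers above: 0.0001 and 50 epochs at batch size 1, and 0.0002 at batch size 2.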
The zip file now has 4x_GPU_Batch_Size_1.json and 4x_GPU_Batch_Size_2.json
I suggest using 4x_GPU_Batch_Size_1.json on a 4x RTX A6000 GPU machine

How To Prepare Dataset

I have done extensive testing as shared in above research article
I find that ohwx man yields the best results when training a person
If you are going to train multiple people, you can omit the class prompt man and just train with ohwx, bbuk and similar random weird words
Since FLUX has an internal text-encoder-like system, it will still have an internal captioning effect and learn fully
For style training, captioning may still yield better results
I suggest using our JoyCaption app, which fully supports multi-GPU and batch size and has lots of features : https://www.patreon.com/posts/joycaption-image-110613301
To batch edit generated captions (replacing words, injecting words, etc.), use our self-developed batch caption editor Gradio app : https://www.patreon.com/posts/108992085
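For quick scripted edits without the Gradio app, the core of a batch caption edit (replace a word, inject a prefix such as a trigger token) is a few lines of standard-library Python. This is a hypothetical stand-in, not the app's code, and it edits files in place, so keep a backup:

```python
from pathlib import Path

def batch_replace_in_captions(folder, old, new, prepend=""):
    """Replace `old` with `new` in every .txt caption in `folder`, optionally prepending text.

    Returns the number of caption files that changed. Files are overwritten in place.
    """
    changed = 0
    for txt in sorted(Path(folder).glob("*.txt")):
        original = txt.read_text(encoding="utf-8")
        edited = original.replace(old, new)
        if prepend and not edited.startswith(prepend):
            edited = prepend + edited
        if edited != original:
            txt.write_text(edited, encoding="utf-8")
            changed += 1
    return changed
```

For example, `batch_replace_in_captions("dataset/1_ohwx man", "a man", "ohwx man")` would swap a generic subject phrase for the trigger token in every caption of that folder.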
Hopefully I will do more research on style, objects like clothing, and multi-concept training
Below is the dataset I used
It is a low-quality dataset, since it doesn't have varied expressions, distant shots, different backgrounds and clothing
But it still works really well with FLUX
So when you prepare a better dataset, you will get better results
Make sure that your images have very good lighting and focus
Training at higher resolution yields very slightly better results (see the research thread above)
Training at lower resolution yields significantly lower quality - images should be at least 1024x1024
I have amazing auto subject zoom, crop and resizing scripts
Check this video : https://youtu.be/Fbuyu35TkE4
Auto subject zoom + cropping and then resizing scripts shared here : https://www.patreon.com/posts/sota-subject-and-88391247
I have very extensively tested reg / class images and they don't help. You can see the results in the research post linked above
You can use .txt files for caption tokens or just folder names like 1_ohwx man (equivalent to having .txt files that contain the text ohwx man)
Set repeating 1 since we do not use regularization / classification images
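How the folder name doubles as a caption can be sketched in Python: the number before the first underscore is the repeat count, and the rest is used as the caption when an image has no caption file. This assumes the caption extension is .txt as in this post (Kohya's sd-scripts CLI defaults to .caption); `parse_dataset_folder` and `caption_for` are hypothetical helpers, not Kohya's code.

```python
from pathlib import Path

def parse_dataset_folder(folder_name):
    """Split a Kohya-style '<repeats>_<caption>' folder name, e.g. '1_ohwx man'."""
    repeats, sep, caption = folder_name.partition("_")
    if not sep or not repeats.isdigit():
        raise ValueError(f"expected '<repeats>_<caption>', got {folder_name!r}")
    return int(repeats), caption

def caption_for(image_path):
    """Use the image's .txt caption if present, else fall back to the folder-name caption."""
    image_path = Path(image_path)
    txt = image_path.with_suffix(".txt")
    if txt.exists():
        return txt.read_text(encoding="utf-8").strip()
    return parse_dataset_folder(image_path.parent.name)[1]
```

So a folder named 1_ohwx man means 1 repeat, and every uncaptioned image in it is trained with the caption ohwx man.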
What repeating means and how it works is explained here by Kohya : https://github.com/kohya-ss/sd-scripts/issues/640
Below is the dataset I used

How To Use FLUX and LoRAs After Trainings Have Been Completed

I prefer using SwarmUI but you can use ComfyUI and Forge Web UI as well
I have excellent tutorials for SwarmUI
Main SwarmUI tutorial 90 minutes fully chaptered : https://youtu.be/HKX8_F1Er_w
SwarmUI Cloud tutorial (RunPod, Massed Compute, Kaggle) : https://youtu.be/XFUZof6Skkw
SwarmUI backend argument to improve performance on RTX 4000 series cards : --fast
SwarmUI FLUX tutorial (Windows, RunPod, Massed Compute, Kaggle) : https://youtu.be/bupRePUOA18
I have auto FLUX models downloader for SwarmUI for Windows, RunPod and Massed Compute : https://www.patreon.com/posts/109289967
You can also use them in Forge Web UI : https://www.patreon.com/posts/110323512
The post above has auto FLUX model downloaders for Forge Web UI on Windows, RunPod and Massed Compute, and auto installers of Forge Web UI for Massed Compute and Windows
ComfyUI automatic Installers for Windows, RunPod, Massed Compute, Linux : https://www.patreon.com/posts/105023709
nvitop command - open a cmd and type : pip install nvitop
After installing, type : nvitop

How To Train and Use On RunPod and Massed Compute

Please register for RunPod using this link : https://runpod.io?ref=1aka98lq
Register for Massed Compute using the following link:
https://vm.massedcompute.com/signup?linkId=lp_034338&sourceId=secourses&tenantId=massed-compute
The zip file contains installation instructions for RunPod and Massed Compute
I suggest using Massed Compute; it is better
To use your trained models on RunPod, you can use SwarmUI, ComfyUI or Forge Web UI
Follow the tutorials above for usage

How To Connect From Your PC to SwarmUI Running on Massed Compute - Preferred Cloudflare Way

First, open a terminal and execute the commands below to install cloudflared

wget https://github.com/cloudflare/cloudflared/releases/download/2024.8.2/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared-linux-amd64.deb

Then run SwarmUI once to update it to the latest version, then close the SwarmUI terminal that it started

Then open a new terminal and execute the commands below; they will start SwarmUI and give you a Cloudflare link (like my-pills-sailing-pad-netherlands.trycloudflare.com) that you can connect to

cd /home/Ubuntu/apps/StableSwarmUI/
./launch-linux.sh --launch_mode none --cloudflared-path cloudflared

How To Save and Download Your Models From Hugging Face - CivitAI

Use this amazing notebook : https://www.patreon.com/posts/104672510
It is fully optimized for Hugging Face upload and download
Especially useful for cloud services like RunPod and Massed Compute
It was also recently updated to V5 with huge improvements
Follow this tutorial : https://www.youtube.com/watch?v=X5WVZ0NMaTg
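If you prefer a script over the notebook, uploading finished checkpoints boils down to two `huggingface_hub` calls. A minimal sketch, assuming `huggingface_hub` is installed and you are logged in (for example with `huggingface-cli login`); the repo id you pass is a placeholder and both helpers are hypothetical, not the notebook's code:

```python
from pathlib import Path

def list_checkpoints(output_dir, pattern="*.safetensors"):
    """Collect trained LoRA checkpoint files from the Kohya output folder, sorted by name."""
    return sorted(Path(output_dir).glob(pattern))

def upload_checkpoints(output_dir, repo_id):
    """Upload every checkpoint to a private Hugging Face model repo."""
    # Imported lazily so list_checkpoints works even without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id, repo_type="model", private=True, exist_ok=True)
    for ckpt in list_checkpoints(output_dir):
        api.upload_file(
            path_or_fileobj=str(ckpt),
            path_in_repo=ckpt.name,
            repo_id=repo_id,
            repo_type="model",
        )
```

A call such as `upload_checkpoints("/path/to/output", "your-username/flux-lora-checkpoints")` would then push each saved epoch before you terminate a cloud instance.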

Best SDXL and SD 1.5 Configs

For SDXL and SD 1.5 configs we use reg images (they improve quality a lot) : https://www.patreon.com/posts/87700469
Best updated SDXL config for Kohya SS GUI : https://www.patreon.com/posts/89213064
Best updated SD 1.5 config for Kohya SS GUI : https://www.patreon.com/posts/97379147
If you need LoRA for SD 1.5 or SDXL, do a full fine-tuning / DreamBooth with above configs and extract LoRA as shown here : https://www.patreon.com/posts/108634568