Furkan Gözükara



Kohya FLUX LoRA Training Full Tutorial For Local Windows and Cloud RunPod and Massed Compute

Full step-by-step Kohya SS GUI FLUX LoRA training tutorial. Includes the best optimized configs for GPUs from 4 GB to 48 GB, for Windows, RunPod, and Massed Compute.

Patreon exclusive posts index to find our scripts easily, Patreon scripts update history to see which updates arrived for which scripts, and a list of special Patreon generative scripts that you can use in any of your tasks.

Join our Discord to get help, chat, and discuss, and tell me your Discord username to get your special rank: SECourses Discord

Please also Star, Watch, and Fork our Stable Diffusion & Generative AI GitHub repository, join our Reddit subreddit, and follow me on LinkedIn (my real profile).

=======

Windows Tutorial Published - 68 Min - 74 Video Chapters

Tutorials & Resources

29 October 2025 Update v32

2 October 2025 Update

Windows Requirements

Massed Compute (Recommended Cloud):

RunPod (Cloud):


13 August 2025 Update

13 July 2025 Update

29 May 2025 Update

13 May 2025 Update

4 May 2025 Update

17 November 2024 Update

31 October 2024 Update

7 October 2024 Update

4 October 2024 Update

19 September 2024 Update

14 September 2024 Update

9 September 2024 Update

Ultra Detailed Research and Development Article

If Your Training Terminates at the Stage of Caching Latents

Automatic Installers and Configs

How To Use Config Files

Best_Configs_Better_Colors Folder

Batch Size Experiments And Multi GPU Usage

How To Prepare Dataset

How To Use FLUX and LoRAs After Trainings Have Been Completed

How To Train and Use On RunPod and Massed Compute

How To Connect From Your PC to SwarmUI Running on Massed Compute - Preferred Cloudflare Way

First, open a terminal and execute the commands below to install cloudflared.

Then run SwarmUI so that it updates to the latest version, and close the SwarmUI terminal once it has started.

Then open a new terminal and execute the commands below. This starts SwarmUI and gives you a Cloudflare link (for example, my-pills-sailing-pad-netherlands.trycloudflare.com) that you can connect to.

How To Save and Download Your Models From Hugging Face - CivitAI

Best SDXL and SD 1.5 Configs


Comments

I don't think you can train on a Mac, sadly. So train on Massed Compute or RunPod, then download the models and use them on your Mac with ComfyUI or SwarmUI.

Furkan Gözükara

Hi, I have a MacBook Pro M4 Max with 64 GB. Can I do this on it? I'm having trouble finding ComfyUI material for it. Can you tell me a workflow for FLUX image generation and how to train a LoRA? If you have videos, show me which ones I should watch. Thank you very much.

MR

enable full bf16 training.
Traceback (most recent call last):
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\flux_train_network.py", line 547, in <module>
    trainer.train(args)
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\train_network.py", line 880, in train
    unet = self.prepare_unet_with_accelerator(args, accelerator, unet)  # accelerator does some magic here
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\flux_train_network.py", line 518, in prepare_unet_with_accelerator
    accelerator.unwrap_model(flux).prepare_block_swap_before_forward()
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\library\flux_models.py", line 1009, in prepare_block_swap_before_forward
    self.offloader_double.prepare_block_devices_before_forward(self.double_blocks)
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 246, in prepare_block_devices_before_forward
    weighs_to_device(b, torch.device("cpu"))  # make sure weights are on cpu
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 106, in weighs_to_device
    module.weight.data = module.weight.data.to(device, non_blocking=True)
torch.AcceleratorError: CUDA error: resource already mapped
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "C:\Users\Flori\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Flori\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 6, in <module>
    sys.exit(main())
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 50, in main
    args.func(args)
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1213, in launch_command
    simple_launcher(args)
  File "E:\koya\Kohya_FLUX_DreamBooth_LoRA_v31\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 795, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\\koya\\Kohya_FLUX_DreamBooth_LoRA_v31\\kohya_ss\\venv\\Scripts\\python.exe', 'E:/koya/Kohya_FLUX_DreamBooth_LoRA_v31/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'E:/koya/Kohya_FLUX_DreamBooth_LoRA_v31/training/training_run/training_ohwx_lora_flux\\model/config_lora-20251010-224918.toml']' returned non-zero exit status 1.

Floppe

Hi, I keep getting this error every time I start the training. Could you tell me what I’m doing wrong? Thanks a lot for the great tutorial. Best, Floppe

Floppe

Yes, that means there is an install error somewhere. Can you try again and send me the install logs: monstermmorpg@gmail.com

Furkan Gözükara

After following the instructions for v31 on a Massed Compute A6000, it says gradio is not found, even though I executed everything in the Massed Compute install instructions.

Leon Emmanuel ISHIMWE

I recommend training on this example dataset to see if it works: https://www.patreon.com/posts/114972274

Furkan Gözükara

Hello, I'm having a strange issue with the resulting character LoRAs: all of them produce just a latent image when generating with FLUX. At low LoRA weights like 0.2-0.8 the output just doesn't catch the character; after 0.8-1 the image starts getting noisy, and it becomes just a latent image if I set the weight to 1.5-2. I'm using the latest Kohya, the latest sd-scripts, and the 24 GB VRAM configs (tried them all) on my RTX 5090. I also tried changing many different options (fp8 and fp16 for both the FLUX model and the text encoder, in all combinations), including a full reinstall of Kohya and training inside ComfyUI with the ComfyUI-FluxTrainer node.

Ariloum

You probably don't need that many. When I used 256 images, my best epoch was 70. Even 50 can be sufficient.

Furkan Gözükara

Thanks for this tutorial. I learned a loooot ! 🥳 I am training a style, fully captioned by hand, my dataset contains 247 images. Should I still aim for 200 epochs ? It seems like a lot

Yvan Nost

It works right away with our workflow. Just change the base model to Krea. It is tested and verified.

Furkan Gözükara

Hi thank you for this tutorial, I very successfully trained flux several times with this technique and your presets, I am interested in trying it with flux krea, on a RTX590, I wondered if you had any plans to explore this?

DavidO

he added that into musubi tuner recently

Furkan Gözükara

Is it possible to train Kontext LoRAs with Kohya_ss?

Christopher

Hello, I need more information. What are you trying to do exactly? Did you follow this tutorial? https://youtu.be/X5WVZ0NMaTg

Furkan Gözükara

It is. When you train on multiple GPUs, you need to divide the epoch count by the number of GPUs. So 100 epochs on 2 GPUs equals 200 epochs on 1 GPU.

Furkan Gözükara
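The multi-GPU rule of thumb in the reply above can be written as a small calculation. This is only a sketch of the stated arithmetic (epoch count scaled by GPU count); the function names are made up for illustration and are not part of Kohya.

```python
import math


def single_gpu_equivalent_epochs(multi_gpu_epochs: int, num_gpus: int) -> int:
    """Epochs a single GPU would need to match a multi-GPU run."""
    if multi_gpu_epochs < 1 or num_gpus < 1:
        raise ValueError("epochs and GPU count must be positive")
    return multi_gpu_epochs * num_gpus


def multi_gpu_epochs(single_gpu_epochs: int, num_gpus: int) -> int:
    """Epochs to set when a single-GPU recipe is spread over several GPUs."""
    if single_gpu_epochs < 1 or num_gpus < 1:
        raise ValueError("epochs and GPU count must be positive")
    # Round up so the multi-GPU run never trains less than the recipe.
    return math.ceil(single_gpu_epochs / num_gpus)


print(single_gpu_equivalent_epochs(100, 2))  # 200, the example from the reply
print(multi_gpu_epochs(200, 2))              # 100
```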

Hello. Thank you for your work, it has helped me a lot. I want to run a LoRA training on Massed Compute. I'm having trouble following your instructions. Before starting the training, I tried to upload a file to the Hugging Face repository, but I'm getting this error. I've tried multiple options, and this seems to be the last one. Could you please tell me what might be causing this issue?
/////////////////////////////////////////////////////////////////////////////////////////
"""Upload a large folder to the Hub in the most resilient way possible.
■ Stable-Diffusion 14m ago 5255 5256
Several workers are started to upload files in an optimized way. Before being committed to a repo, files must be style_models
ignore_patterns-ignore_patterns, num_workers=num_workers, print_report=print_report, print_report_every=print_report_every,
File -/apps/venv/lib/python3.10/site-packages/huggingface_hub/_upload_large_folder.py:84, in upload_large_folder_internal (api, repo_id, folder_path, repo_type, revision, private, allow_patterns, ignore_patterns, num_workers, print_report, print_report_every)
     82 folder_path = Path(folder_path).expanduser().resolve()
     83 if not folder_path.is_dir():
---> 84     raise ValueError(f"Provided path: '{folder_path}' is not a directory")
     86 if ignore_patterns is None:
     87     ignore_patterns = []
ValueError: Provided path: '/apps/Stable SwarmUI/Models/Lora' is not a directory
Upload a single file with specific name to remote repo - Wait till UPLOAD COMPLETED printed
[ ]: # This cell is used to upload single file into a repo with certain name
from huggingface_hub import HfApi
api = HfApi()
api.upload_file(
    path_or_fileobj=r"/home/ubuntu/apps/stable-diffusion-webui/models/Stable-diffusion/model_name.safetensors",
    path_in_repo="model_name.safetensors"

v. svobodov
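The `ValueError` in the comment above is raised by a plain directory check, so it can be reproduced locally before any upload is attempted. The sketch below repeats the two lines shown in the traceback (`expanduser().resolve()` followed by `is_dir()`); the helper name `resolve_upload_folder` is hypothetical. Note that the failing path starts at `/apps/...`, while the single-file cell in the same paste uses `/home/ubuntu/apps/...`, so the folder path may simply be incomplete.

```python
from pathlib import Path
import tempfile


def resolve_upload_folder(folder_path: str) -> Path:
    """Mirror the check from the traceback: the path must be an existing directory."""
    resolved = Path(folder_path).expanduser().resolve()
    if not resolved.is_dir():
        raise ValueError(f"Provided path: '{resolved}' is not a directory")
    return resolved


# Demo with a directory that certainly exists and one that certainly does not.
with tempfile.TemporaryDirectory() as existing:
    print("OK:", resolve_upload_folder(existing))

try:
    resolve_upload_folder("/no/such/folder/for/this/demo")
except ValueError as error:
    print("Rejected:", error)
```

Running the same check on the real folder before the upload cell shows whether the path or the upload itself is the problem.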

is multi gpu training not faster when training sdxl ?

Hazard

Full style training tutorial here: https://huggingface.co/MonsterMMORPG/3D-Cartoon-Style-FLUX

Furkan Gözükara

What are the best step and epoch counts for style LoRA training?

Hazard

Or was it because I selected "end current session" while connecting through ThinLinc?

Daniel Cardona Ramirez

I was in the middle of a training in massed compute and I had to restart my local PC. When I connected again I noticed it was closed as if the massed compute PC restarted. Is this normal?

Daniel Cardona Ramirez

Sorry for the late reply. Did you proceed and run it? It should run.

Furkan Gözükara

No, it is still there. It is for LoRA training. Are you on the LoRA tab or on the DreamBooth tab? https://pasteboard.co/a6jecC5cPfIA.png

Furkan Gözükara

In the video "Blazing Fast & Ultra Cheap FLUX LoRA Training on Massed Compute & RunPod Tutorial - No GPU Required!" you said at minute 22:30 that I could disable T5 attention to hugely speed up training while losing some quality, but I can't find that field in the current version of Kohya. Was this removed in an update?

Daniel Cardona Ramirez

Hello! I'm having trouble with the “(Optional) Manually configure Accelerate” step. I typed “5” and pressed Enter, but nothing happens, even after waiting. I’ve uninstalled and reinstalled multiple times with no success. My GPU is an RTX 5090 + cuda toolkit 12.8

Victor Gabriel Carrillo

Well, I compared several models in the past and you can certainly use it. Normally I use the official dev model since my datasets are good, but if yours are not very good it can be helpful. Check this article: https://medium.com/@furkangozukara/thoroughly-experimented-with-fine-tuning-dreambooth-training-of-flux-dev-de-distill-pixelwave-13fab8c625bf

Furkan Gözükara

I've seen articles that recommend using dev2pro for Flux LoRA and fine-tuning training. Have you experimented with it? I would appreciate your perspective on this. Additionally, do you recommend using the official dev model?

洋次郎 山崎

All presets work. Only the dataset needs change slightly: you need a better dataset.

Furkan Gözükara

Do you have Kohya settings for training an object? Like a purse, cellphone, etc.... or will using the latest flux lora presets work for that?

Basilio Soto

Hi, your model download failed for some reason, so your model file is corrupted. Delete the model and redownload it.

Furkan Gözükara
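One way to confirm a suspected corrupted download before retraining is to hash the file and compare it with a published checksum. This is a minimal sketch that assumes the download page lists a SHA-256 value for the model file; the helper names and the example path in the comment are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte models do not fill RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path: str, expected_sha256: str) -> bool:
    """True when the local file matches the checksum published for it."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Hypothetical usage; the path and checksum must come from your own setup:
# if not download_is_intact(r"F:\models\model_name.safetensors", "<published sha256>"):
#     print("Checksum mismatch: delete the file and download it again")
```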

Hi, thanks for your tutorials! I followed the one to install Kohya and I wanted to start a training run with one of your configurations. I am getting the following message, maybe you can help:
Traceback (most recent call last):
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\train_network.py", line 1884, in <module>
    trainer.train(args)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\train_network.py", line 589, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\train_network.py", line 179, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\library\train_util.py", line 5480, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\library\train_util.py", line 5435, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\library\model_util.py", line 1000, in load_models_from_stable_diffusion_checkpoint
    _, state_dict = load_checkpoint_with_text_encoder_conversion(ckpt_path, device)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\sd-scripts\library\model_util.py", line 977, in load_checkpoint_with_text_encoder_conversion
    checkpoint = torch.load(ckpt_path, map_location=device)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\venv\lib\site-packages\torch\serialization.py", line 1548, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Unsupported operand 192
Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.

Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "F:\Programme\Kohya_FLUX_DreamBooth_LoRA_v28\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\Programme\\Kohya_FLUX_DreamBooth_LoRA_v28\\kohya_ss\\venv\\Scripts\\python.exe', 'F:/Programme/Kohya_FLUX_DreamBooth_LoRA_v28/kohya_ss/sd-scripts/train_network.py', '--config_file', 'F:\\Programme\\Lora_Training\\Neuer Ordner (3)\\model/config_lora-20250519-102357.toml']' returned non-zero exit status 1.
10:24:22-233429 INFO Training has ended.

Petz

Best DreamBooth: 48GB_GPU_28200MB_6.3_second_it_Tier_1.json
Best LoRA: 48GB_GPU_Quality_Tier_3_28400MB_4.8_Second_IT.json

Furkan Gözükara
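The config file names quoted in this thread appear to encode the GPU class, the VRAM used, the speed in seconds per iteration, and a quality tier. The sketch below pulls those fields out of a name so configs can be compared; this reading of the naming scheme is an assumption based only on the names quoted here, and `parse_config_name` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional


class ConfigInfo(NamedTuple):
    gpu_gb: int            # GPU class the config targets, e.g. 48
    vram_mb: int           # VRAM figure embedded in the name, e.g. 28200
    seconds_per_it: float  # speed figure embedded in the name, e.g. 6.3
    tier: Optional[int]    # tier number, if present


def parse_config_name(filename: str) -> ConfigInfo:
    """Extract the numeric fields from a config file name (assumed scheme)."""
    gpu = re.search(r"^(\d+)GB_GPU", filename)
    vram = re.search(r"_(\d+)MB_", filename)
    speed = re.search(r"_(\d+(?:\.\d+)?)_second_it", filename, re.IGNORECASE)
    tier = re.search(r"Tier_(\d+)", filename)
    if not (gpu and vram and speed):
        raise ValueError(f"Unrecognized config name: {filename}")
    return ConfigInfo(
        gpu_gb=int(gpu.group(1)),
        vram_mb=int(vram.group(1)),
        seconds_per_it=float(speed.group(1)),
        tier=int(tier.group(1)) if tier else None,
    )


for name in (
    "48GB_GPU_28200MB_6.3_second_it_Tier_1.json",
    "48GB_GPU_Quality_Tier_3_28400MB_4.8_Second_IT.json",
):
    print(parse_config_name(name))
```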

Hello Furkan, what are the best settings for Dreambooth and Lora training with an RTX 5090?

puk

Hi, bat files are only for Windows. Follow this file: RunPod_Install_Instructions.txt. If you are using permanent storage, yes, run the installers every time, because they install some components to the local storage, which is deleted every time you change pods. It should still be faster.

Furkan Gözükara

Hello, welcome. I recommend DreamBooth, and if you have 64 GB RAM it works great with 6 GB GPUs: https://youtu.be/FvpWy1x5etM
If you want LoRA for sure, here is the tutorial: https://youtu.be/nySGu12Y05k
For LoRA use this config: 10GB_GPU_Quality_Tier_4_FP8_9250MB_5.8_Second_IT.json
For DreamBooth use: 12GB_GPU_11100MB_9.5_second_it_Tier_1.json or 10GB_GPU_8700MB_14.0_second_it_Tier_1.json
Make sure that you are using the minimum amount of VRAM before starting the training.

Furkan Gözükara

Hello, came to this patreon coz saw some ads in a reddit group about training flux lora with a 3060... where can I find that tutorial?

Alexandre Leitão

I use RunPod with a permanent network volume. I rented a cheap RTX A5000 to install everything and download all models. Everything went fine and I was able to open Kohya. I closed the pod to open a new one with four A40s. Then I tried to execute Windows_Start_Kohya_SS.bat and nothing happened. I got this response:
bash: @echo: command not found
bash: REM: command not found
bash: REM: command not found
bash: gui.bat: command not found
bash: REM: command not found
bash: pause: command not found
Do I have to do a whole installation every time I open a new pod, even with a network volume? Maybe I did something wrong with my installation in the first place? I ran
chmod +x RunPod_Kohya_FLUX_Installer_part1.sh
./RunPod_Kohya_FLUX_Installer_part1.sh
then
chmod +x RunPod_Kohya_FLUX_Installer_part2.sh
./RunPod_Kohya_FLUX_Installer_part2.sh
then I copied all the text from Windows_Kohya_Update.bat and executed it in the terminal, and then I did the same with Windows_RTX5000_Series...Finished.bat

stasiu andraczek

testing right now with a fresh install

Furkan Gözükara

I installed it here, followed your instructions, and also ran Windows_RTX5000_Series_Upgrade_Run_After_Install_Finished.bat because I have an RTX 5090, but when I start a training I always get an error. Any tip on what may cause it?
10:46:23-348967 INFO Saving training config to D:/Ai/Train_Assets/waterline/model\flux_waterline_v1_20250502-104623.json...
10:46:23-351968 INFO Executing command: C:\AI\Flux_Train\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/AI/Flux_Train/kohya_ss/sd-scripts/flux_train_network.py --config_file D:/Ai/Train_Assets/waterline/model/config_lora-20250502-104623.toml
Could not find the bitsandbytes CUDA binary at WindowsPath('C:/AI/Flux_Train/kohya_ss/venv/lib/site-packages/bitsandbytes/libbitsandbytes_cuda128.dll')
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Traceback (most recent call last):
  File "C:\AI\Flux_Train\kohya_ss\sd-scripts\flux_train_network.py", line 14, in <module>
    import train_network
  File "C:\AI\Flux_Train\kohya_ss\sd-scripts\train_network.py", line 26, in <module>
    from diffusers.models.autoencoders.autoencoder_kl import AutoencoderKL
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\models\autoencoders\__init__.py", line 1, in <module>
    from .autoencoder_asym_kl import AsymmetricAutoencoderKL
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\models\autoencoders\autoencoder_asym_kl.py", line 22, in <module>
    from ..modeling_utils import ModelMixin
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 35, in <module>
    from ..quantizers import DiffusersAutoQuantizer, DiffusersQuantizer
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\quantizers\__init__.py", line 15, in <module>
    from .auto import DiffusersAutoQuantizer
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\quantizers\auto.py", line 22, in <module>
    from .bitsandbytes import BnB4BitDiffusersQuantizer, BnB8BitDiffusersQuantizer
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\quantizers\bitsandbytes\__init__.py", line 2, in <module>
    from .utils import dequantize_and_replace, dequantize_bnb_weight, replace_with_bnb_linear
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\diffusers\quantizers\bitsandbytes\utils.py", line 32, in <module>
    import bitsandbytes as bnb
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 15, in <module>
    from .nn import modules
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\bitsandbytes\nn\__init__.py", line 21, in <module>
    from .triton_based_modules import (
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\bitsandbytes\nn\triton_based_modules.py", line 7, in <module>
    from bitsandbytes.triton.int8_matmul_mixed_dequantize import (
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\bitsandbytes\triton\int8_matmul_mixed_dequantize.py", line 12, in <module>
    from triton.ops.matmul_perf_model import early_config_prune, estimate_matmul_time
ModuleNotFoundError: No module named 'triton.ops'

Traceback (most recent call last):
  File "C:\Users\smere\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\smere\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\AI\Flux_Train\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\AI\Flux_Train\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\AI\\Flux_Train\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/AI/Flux_Train/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Ai/Train_Assets/waterline/model/config_lora-20250502-104623.toml']' returned non-zero exit status 1.
10:46:30-568552 INFO Training has ended.

Sérgio Merêces

Yes, this happens with LoRA training. It can be avoided with DreamBooth: I am able to get 2048x2048 with no banding at all when I do DreamBooth. I don't know the exact solution for banding with LoRA yet.

Furkan Gözükara

Hi there Doctor, I have good results with the Kohya workflows that you gave. I am using a 4090 for the training and the speeds are also good. I have no issues up to Tier 3, but for Tier 2 and Tier 1, using FLUX workflows, I seem to be getting FLUX banding, kind of like scan lines on the image, whenever the image is larger than 1x in fred workflows, so above 1024 by 1024. It almost disappears below 1x. The images are very detailed at higher resolutions, but because of the banding/scan lines when using the LoRA they are unusable.

Ken

Just the installer. I recommend using Kohya_FLUX_LoRA_DreamBooth_v22.zip. It installs the latest branch of the GitHub repo. The author combined and merged it into main.

Furkan Gözükara

Hey Furkan, which is the latest? Kohya_FLUX_LoRA_DreamBooth_v22.zip or Kohya_GUI_Flux_Installer_v46.zip ? What is the difference?

Casper Smit

You must have an error somewhere else. Can you do this: get the latest file and make a fresh install. When running, open a cmd window, type the full bat file name, and run it that way. It will show the error and not close.

Furkan Gözükara

added to top of the page

Furkan Gözükara

How to download the configuration file, I don't see the download link.

锦涛 何

After setting python 3.10 as default, the .bat file won't give any error but it doesn't start. It just opens for a second and closes, but nothing happens

Z

You need to have Python 3.10 as the default or it won't work. Please follow this video: https://youtu.be/DrhUHnYfwC0

Furkan Gözükara
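Whether Python 3.10 really is the default can be checked by running a short script with the same `python` command the installer would pick up. This is a minimal sketch; the only requirement taken from the reply above is "3.10", and the helper name is hypothetical.

```python
import sys

REQUIRED = (3, 10)  # major, minor version named in the reply above


def is_required_python(version_info=None) -> bool:
    """True when the interpreter's major.minor version is exactly 3.10."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == REQUIRED


if is_required_python():
    print("Default Python is 3.10")
else:
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Default Python is {found}, expected 3.10")
```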

After fresh install:
Traceback (most recent call last):
  File "E:\TRAINING\kohya_ss\kohya_gui.py", line 6, in <module>
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'

Taiga

Sadly, I don't have good workflows for ComfyUI. That is why I recommend SwarmUI.

Furkan Gözükara

Hello, I have trained a model following your tutorial and would like to test it out in ComfyUI, but I am unable to get it to work with FLUX. Is there a workflow you can recommend for doing this in ComfyUI?

m_sil

Replied, sorry for the late reply. Discord is the best way to communicate.

Furkan Gözükara

I have an issue with CUDA memory when fine tuning full model, can you please check the DM's

Valentyn Shumakher

If you want such precise control, that makes total sense.

Furkan Gözükara

I have trained an object like a car, however it's difficult to get it to be positioned properly. For example if I want the front of the car to be in the foreground, sometimes I get the back of the car. Is this a situation where it would be beneficial to train the object with description of how it is viewed in the photo? For example: "Z3 car front", "Z3 car side" "Z3 car above" "Z3 car rear" "Z3 car front right"

Hockey
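The view-specific captions proposed in the question above could be generated with a small script. This sketch assumes the trainer reads one `.txt` caption file per image with the same base name; that convention, the file names, and the helper names are assumptions for illustration only.

```python
from pathlib import Path
import tempfile

TRIGGER = "Z3 car"  # trigger phrase used in the question


def build_caption(trigger: str, view: str) -> str:
    """Join the trigger phrase and the camera view, e.g. 'Z3 car front'."""
    return f"{trigger} {view}".strip()


def write_captions(image_views: dict[str, str], folder: Path,
                   trigger: str = TRIGGER) -> list[Path]:
    """Write one caption file per image, named after the image's base name."""
    written = []
    for image_name, view in image_views.items():
        caption_path = folder / (Path(image_name).stem + ".txt")
        caption_path.write_text(build_caption(trigger, view), encoding="utf-8")
        written.append(caption_path)
    return written


# Hypothetical image names mapped to the view each photo shows.
image_views = {"z3_001.jpg": "front", "z3_002.jpg": "side", "z3_003.jpg": "front right"}

with tempfile.TemporaryDirectory() as tmp:
    for path in write_captions(image_views, Path(tmp)):
        print(path.name, "->", path.read_text(encoding="utf-8"))
```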

Hello, sorry for the late reply. This zip file has FLUX fine-tuning and LoRA. For SDXL and SD 1.5 we have different posts. Use this zip file to install, but use the configs from the posts below:
SD 1.5: https://www.patreon.com/posts/very-best-kohya-97379147
SDXL: https://www.patreon.com/posts/very-best-for-of-89213064

Furkan Gözükara

Hello, I just joined the Patreon today and want to train a LoRA. Can Kohya_GUI_Flux_Installer_v46.zip be used for LoRA with FLUX, SDXL, and SD 1.5? Should I start from here? Thanks for the help.

guni

This config is best for you: 12GB_GPU_11100MB_9.5_second_it_Tier_1.json. Make sure that you are not using more than 800 MB of VRAM before starting the training. You can also compare its speed with 10GB_GPU_8700MB_14.0_second_it_Tier_1.json.

Furkan Gözükara
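The "no more than 800 MB of VRAM in use before training" advice can be checked from a script. The sketch below queries `nvidia-smi` for used memory per GPU and flags any GPU above the limit; it assumes an NVIDIA driver with `nvidia-smi` on the PATH, and the helper names are hypothetical.

```python
import shutil
import subprocess

VRAM_LIMIT_MB = 800  # threshold mentioned in the reply above


def parse_used_mb(smi_output: str) -> list[int]:
    """Turn one-number-per-line nvidia-smi output into a list of MB values."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpus_over_limit(used_mb: list[int], limit_mb: int = VRAM_LIMIT_MB) -> list[int]:
    """Indices of GPUs whose used memory exceeds the limit."""
    return [index for index, used in enumerate(used_mb) if used > limit_mb]


def query_used_mb() -> list[int]:
    """Ask nvidia-smi for the used memory of every GPU, in MB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_used_mb(result.stdout)


if shutil.which("nvidia-smi"):
    try:
        busy = gpus_over_limit(query_used_mb())
        print("GPUs above the limit:", busy if busy else "none")
    except (subprocess.CalledProcessError, ValueError) as error:
        print("Could not read VRAM usage:", error)
else:
    print("nvidia-smi not found; skipping the live check")
```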

Can you please tell me? I am doing a fresh install because I am a newbie at this. Also tell me which one to use from the Best_Configs_Better_Colors folder. My GPU is an RTX 3060 12 GB, so please suggest which one I should go with; there are new versions and old versions there. Thank you.

kalayan rao

i hope so. hopefully will test asap

Furkan Gözükara

2.6.0 torch available ! ! will this speed up the training even more ?

Sonivas Sx

Hi, sorry for the late reply. It looks like you either gave an incorrect model path or didn't download the model correctly: NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Furkan Gözükara

Hi, I'm getting this error on Massed Compute while running DreamBooth. Can you help me? On my local PC, I don't have any issues.
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
    train(args)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 197, in train
    ae.to(accelerator.device, dtype=weight_dtype)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1340, in to
    return self._apply(convert)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 900, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in _apply
    param_applied = fn(param)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1333, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python3.10', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/Downloads/Risultati/Lacoste/model/config_dreambooth-20250124-170720.toml']' returned non-zero exit status 1.
17:07:32-034530 INFO Training has ended.

Sergio Valsecchi

Wow, I never tried this. When training a LoRA you can train T5, so it can be useful there.

Furkan Gözükara

I just want to let everyone know that I discovered an interesting thing and tested it. You can literally prompt the T5xxl encoder in your captions. You just give it instructions after the captions in parentheses. So it would look like this: "A dragon flying in the sky over a town. (Pay attention to the details in the scales and notice the shape of its wings.)" You caption regularly and then in () you give it instructions. I have tested with and without and there is a noticeable difference when training. If there is a specific detail I want the LoRA to learn, I make sure to instruct T5 for the images that have the detail.

VelorianX

When you give the FLUX dev model path, it becomes enabled.

Furkan Gözükara

I already have de-distilled training. What do you mean? Please give more info: https://www.patreon.com/posts/114969137

Furkan Gözükara

Please update this with the new de-distilled lora training method for flux-dev lora training. I am currently trying it and it's way better than training via the regular flux-dev model. Can't emphasize how much of a game-changer this new method is. You will be blown away at how beautifully it converges.

VelorianX

Hi! I am trying to follow this tutorial to fine-tune FLUX on my Linux system. I am running a dockerized version of Kohya from the main repo. I specifically installed the flux 1 branch and then reinstalled Kohya, but I am not seeing the flux 1 option when looking at the fine-tune page in Kohya. Any insight on this?

Nick Prestine

Hello. It doesn't work on Mac. I recommend using Massed Compute; we have installers, a coupon, and everything.

Furkan Gözükara

Hello everyone, I'm currently trying to train a FLUX LoRA model using Kohya SS on my Mac, but I'm running into an issue with the learning rate scheduler.
Machine specs: MacBook with M4 Pro Max, 48 GB RAM, 1000 GB storage, OS: Sequoia 15.2
The problem: in the logs, I see an error related to the learning rate scheduler: TypeError: CosineAnnealingLR.__init__() missing 1 required positional argument: 'T_max'
I understand that for CosineAnnealingLR, the parameter T_max is required so that the scheduler knows over how many steps to adjust the learning rate. It typically should be equal to the total number of training steps (for example, with 86 images and 35 epochs, roughly 3010 steps).
This is my config: https://drive.google.com/file/d/1HUSbsO48nULLRLnRpwZT6ja6Z6px2V29/view?usp=sharing

Jose Cruz

I experienced the same issue when enabling the "Apply T5 Attention Mask" option, but it works fine when I disable it

محمد الذوادي

Well, it is purely experience-based. But for style training, only the style should repeat across the training images, not any other subjects or objects. The style should also be consistent. These are the 2 key things. I also give private consultations if you need one.

Furkan Gözükara

Hello, I want to use Flux to train a style Lora model. I now have 100 different character pictures (white background) of the same style and 100 scene pictures. I will use a 4090 graphics card for training. How do you think I should choose the data? Looking forward to your reply~ ❤️ ❤️

努力生活的超子

great

Furkan Gözükara

Open a cmd window in that folder, type the full bat file name, and start it that way. When you run it that way it won't auto-close and you will see the reason for the error.

Furkan Gözükara

I found the settings I needed thanks to your configs. Not in 5-7 minutes, but also quickly. It remains to make improvements. Thank you.

Александр Щербинин

When I choose the "1.Install kohya_ss GUI" option in the "Kohya_ss setup menu" and press Enter, the CMD window closes. I can't install anymore. What's the problem?

인폴 레

Yes, I know how they work. They use an expensive GPU like the H100, a very high batch size, a higher learning rate, and possibly train only certain layers instead of a full LoRA. All of these reduce quality but bring speed. They could even be using multiple GPUs, since you get an almost linear speed increase for LoRA.

Furkan Gözükara

It's not that you have bad configs. I see that everything works well and the examples are impressive. I thought maybe you already have a ready-made solution or you know how such services work))

Александр Щербинин

My configs are not going to train in 7-8 minutes, and there is absolutely no way my configs will be worse. Compare with the LoRA made with my config here; I shared the dataset too, so you can test and compare: https://civitai.com/models/918952/dwayne-johnson-aka-the-rock-flux-dev-lora-model-for-educational-and-research-purposes-full-tutorial If you need a faster config, I can do research for you with a budget.

Furkan Gözükara

We would like to hear your opinion. There is a service https://www.basedlabs.ai/ that trains Lora in about 7-8 minutes. 2000 steps / Number of photos: 30-50, 1024*1024. The results are quite good, and we would like to achieve the same level or better, but locally or through Runpod. The task is to train a large number of different people in a short period of time. For some reason, the result was worse with your configs ((

Александр Щербинин

use 2.5.0 it works best on windows

Furkan Gözükara

Good afternoon. Tell me, please, does torch 2.5.1 affect the quality of generation? It feels as if the LoRAs began to work worse after installing it.

Александр Щербинин

You are still using Python 3.11. You need to use Python 3.10: "C:\Users\mmerc\AppData\Local\Programs\Python\Python311\Lib\dataclasses.py", line 959, in _process_class
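A small sketch of the version check implied here: the installer expects Python 3.10, while the traceback paths in this thread point at Python311 and Python312. The function name `is_supported_python` is hypothetical. Running this with the same interpreter the venv was created from shows which version is really in use.

```python
import sys

REQUIRED = (3, 10)  # version named in the reply above


def is_supported_python(version_info=None) -> bool:
    """Return True when the interpreter is Python 3.10.x."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) == REQUIRED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported_python() else "not the expected 3.10"
    print(f"Python {found}: {status}")
```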

Furkan Gözükara

During install, I'm getting the following error (note - that this is running locally on a Windows Machine . The machine has multiple versions of python installed, but is set to 3.11 in the venv. I changed this manually as even though environment variables had 311 listed first in the path, it kept pinning to an early 3.10 version which was outside of the range in the script) Any thoughts on how to remediate the issue? [12/09/24 04:51:15] INFO Kohya_ss GUI version: v24.2.0 setup_common.py:372 INFO Python version is 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 setup_common.py:28 bit (AMD64)] INFO Submodule initialized and updated. setup_common.py:53 INFO Installing/Validating requirements from setup_common.py:161 requirements_pytorch_windows.txt... [12/09/24 04:51:16] INFO Looking in indexes: https://pypi.org/simple, setup_common.py:183 https://pypi.ngc.nvidia.com, https://download.pytorch.org/whl/cu124 INFO Obtaining file:///A:/Kohya_GUI_Flux_Installer_v46/kohya_ss/sd-scripts setup_common.py:183 (from -r requirements.txt (line 37)) INFO Preparing metadata (setup.py): started setup_common.py:183 [12/09/24 04:51:17] INFO Preparing metadata (setup.py): finished with status 'done' setup_common.py:183 [12/09/24 04:51:38] INFO Installing collected packages: library setup_common.py:183 INFO Attempting uninstall: library setup_common.py:183 INFO Found existing installation: library 0.0.0 setup_common.py:183 INFO Uninstalling library-0.0.0: setup_common.py:183 [12/09/24 04:51:41] INFO Successfully uninstalled library-0.0.0 setup_common.py:183 INFO Running setup.py develop for library setup_common.py:183 [12/09/24 04:51:42] INFO Successfully installed library-0.0.0 setup_common.py:183 Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 4, in File 
"A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 19, in from accelerate.commands.estimate import estimate_command_parser File "A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Lib\site-packages\accelerate\commands\estimate.py", line 34, in import timm File "A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Lib\site-packages\timm\__init__.py", line 2, in from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \ File "A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Lib\site-packages\timm\models\__init__.py", line 28, in from .maxxvit import * File "A:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Lib\site-packages\timm\models\maxxvit.py", line 225, in @dataclass ^^^^^^^^^ File "C:\Users\mmerc\AppData\Local\Programs\Python\Python311\Lib\dataclasses.py", line 1221, in dataclass return wrap(cls) ^^^^^^^^^ File "C:\Users\mmerc\AppData\Local\Programs\Python\Python311\Lib\dataclasses.py", line 1211, in wrap return _process_class(cls, init, repr, eq, order, unsafe_hash, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\mmerc\AppData\Local\Programs\Python\Python311\Lib\dataclasses.py", line 959, in _process_class cls_fields.append(_get_field(cls, name, type, kw_only)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\mmerc\AppData\Local\Programs\Python\Python311\Lib\dataclasses.py", line 816, in _get_field raise ValueError(f'mutable default {type(f.default)} for field ' ValueError: mutable default for field conv_cfg is not allowed: use default_factory [12/09/24 04:52:05] ERROR Error occurred while running command: accelerate config default setup_common.py:674 ERROR Error: Command 'accelerate config default' returned non-zero exit setup_common.py:675 status 1.

Marc Mercuri

DUDDDDDDDDDDE! Thank you so much. I could not figure this out. No matter what I did, I was getting this error and I was not able to solve the issue. VRAM memory error!

Justin McDonald

Hi, can you tell me exactly which config file you used? You shouldn't modify the config file settings. Please let me know the config name. With 68 GB RAM it should work, but are you sure you have 68 GB RAM? I doubt it. Could it be 64?

Furkan Gözükara

I tried to train a size 768,512 img in my 3060ti VRAM and 68G RAM device it shows like this. enable full bf16 training. running training / 学習開始 num examples / サンプル数: 70 num batches per epoch / 1epochのバッチ数: 70 num epochs / epoch数: 200 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 14000 steps: 0%| | 0/14000 [00:00 train(args) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\sd-scripts\flux_train.py", line 682, in train accelerator.backward(loss) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2159, in backward loss.backward(**kwargs) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\torch\_tensor.py", line 581, in backward torch.autograd.backward( File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\torch\autograd\__init__.py", line 347, in backward _engine_run_backward( File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\torch\autograd\graph.py", line 825, in _engine_run_backward return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 1125, in unpack_hook frame.recompute_fn(*args) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py", line 1519, in recompute_fn fn(*args, **kwargs) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\sd-scripts\library\flux_models.py", line 833, in _forward attn = attention(q, k, v, pe=pe, attn_mask=attn_mask) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\sd-scripts\library\flux_models.py", line 452, in attention x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. 
Of the allocated memory 9.52 GiB is allocated by PyTorch, and 818.46 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) steps: 0%| | 0/14000 [00:48 sys.exit(main()) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_v46\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_v46/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:\\train_imgs\\model/config_dreambooth-20241205-234453.toml']' returned non-zero exit status 1. 23:54:37-269084 INFO Training has ended. 23:58:29-068957 INFO Loading config...
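The error message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A hedged sketch of doing that from Python follows. The helper name is hypothetical, the variable has to be set before torch is imported, and the option may not be supported on every platform. It only helps with fragmentation; in the log above PyTorch has already allocated more than the 8 GiB card holds, so it may well not be enough on its own.

```python
import os


def enable_expandable_segments(env=os.environ) -> str:
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF if it is missing."""
    current = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    options = [opt for opt in current.split(",") if opt]
    if not any(opt.startswith("expandable_segments:") for opt in options):
        options.append("expandable_segments:True")
    env["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(options)
    return env["PYTORCH_CUDA_ALLOC_CONF"]


# Demonstrated on plain dicts so the real environment is left untouched.
print(enable_expandable_segments({}))
print(enable_expandable_segments({"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:128"}))
```

In a real launch script the call would be `enable_expandable_segments()` with no argument, placed before any `import torch`.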

秀煜玩遊戲 Show YoU Gaming

Yes, it can be. We have newer configs for 8 GB GPUs if you have 64 GB RAM.

Furkan Gözükara

I would like to ask whether the training image resolution on an 8 GB GPU can be higher than 512×512.

秀煜玩遊戲 Show YoU Gaming

Hi if you have 32 GB RAM please upgrade to 64 GB and it will be 100% fixed

Furkan Gözükara

hi prof, im using RTX 4060 TI 16gigs vram and when i load your config 16GB_GPU_Quality_Tier_3_14700MB_9.2_Second_IT. the train ended right after i start the training WARNING constant_with_warmup will be good / train_util.py:4873 スケジューラはconstant_with_warmupが良いかもしれません enable full bf16 training. Traceback (most recent call last): File "C:\Kohya\kohya_ss\sd-scripts\flux_train_network.py", line 583, in trainer.train(args) File "C:\Kohya\kohya_ss\sd-scripts\train_network.py", line 664, in train unet = self.prepare_unet_with_accelerator(args, accelerator, unet) # accelerator does some magic here File "C:\Kohya\kohya_ss\sd-scripts\flux_train_network.py", line 554, in prepare_unet_with_accelerator accelerator.unwrap_model(flux).prepare_block_swap_before_forward() File "C:\Kohya\kohya_ss\sd-scripts\library\flux_models.py", line 1006, in prepare_block_swap_before_forward self.offloader_double.prepare_block_devices_before_forward(self.double_blocks) File "C:\Kohya\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 210, in prepare_block_devices_before_forward weighs_to_device(b, "cpu") # make sure weights are on cpu File "C:\Kohya\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 91, in weighs_to_device module.weight.data = module.weight.data.to(device, non_blocking=True) RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
Traceback (most recent call last): File "C:\Users\MTP023\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\MTP023\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Kohya\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\Kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/Kohya/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'C:\\fluxtraining/train_imgs\\model/config_lora-20241205-101408.toml']' returned non-zero exit status 1. 10:15:30-067542 INFO Training has ended.

Michael Liu

Hi, as I replied, it is an image dataset folder naming error: 0 train images with repeating.

Furkan Gözükara

I'm halfway through but cannot get a successful training to run. I have made sure my training dataset of images is in the correct folder many times ? Start training LoRA Flux1 ... 14:13:11-621828 INFO Validating lr scheduler arguments... 14:13:11-622537 INFO Validating optimizer arguments... 14:13:11-623546 INFO Validating lora type is Flux1 if flux1 checkbox is checked... 14:13:11-623546 INFO Validating D:/Sofia/output existence and writability... SUCCESS 14:13:11-624545 INFO Validating D:/Khoya/Kohya_GUI_Flux_Installer_v46/flux1-dev.safetensors existence... SUCCESS 14:13:11-624545 INFO Validating D:/Sofia/trained/test existence... SUCCESS 14:13:11-625544 INFO Regularization factor: 1 14:13:11-625544 INFO Train batch size: 1 14:13:11-626545 INFO Gradient accumulation steps: 1 14:13:11-626545 INFO Epoch: 200 14:13:11-627546 INFO max_train_steps (0 / 1 / 1 * 200 * 1) = 0 14:13:11-628545 INFO stop_text_encoder_training = 0 14:13:11-629545 INFO lr_warmup_steps = 0 14:13:11-630545 INFO Saving training config to D:/Sofia/output\Rank_2_27360MB_Fast_20241203-141311.json... 14:13:11-631546 INFO Executing command: D:\Khoya\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Khoya/Kohya_GUI_Flux_Installer_v46/kohya_ss/sd-scripts/flux_train_network.py --config_file D:/Sofia/output/config_lora-20241203-141311.toml D:\Khoya\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( D:\Khoya\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. 
torch.utils._pytree._register_pytree_node( 2024-12-03 14:13:18 INFO Loading settings from train_util.py:4519 D:/Sofia/output/config_lora-20241203-141311.toml... INFO D:/Sofia/output/config_lora-20241203-141311 train_util.py:4538 INFO highvram is enabled / highvramが有効です train_util.py:4190 2024-12-03 14:13:18 INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:43 INFO t5xxl_max_token_length: 512 flux_train_network.py:141 D:\Khoya\Kohya_GUI_Flux_Installer_v46\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884 warnings.warn( You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 2024-12-03 14:13:19 INFO Using DreamBooth method. train_network.py:325 INFO prepare images. train_util.py:1971 INFO 0 train images with repeating. train_util.py:2012 INFO 0 reg images. train_util.py:2015 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2020 INFO [Dataset 0] config_util.py:567 batch_size: 1 resolution: (1024, 1024) enable_bucket: False network_multiplier: 1.0 INFO [Dataset 0] config_util.py:573 INFO loading image sizes. train_util.py:923 0it [00:00, ?it/s] INFO prepare dataset train_util.py:948 ERROR No data found. 
Please verify arguments (train_data_dir must be the train_network.py:366 parent of folders with images) / 画像がありません。引数指定を確認してください(train_data_dirには画像が あるフォルダではなく、画像があるフォルダの親フォルダを指定する必要があ ります) 14:13:20-905473 INFO Training has ended.
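The "0 train images with repeating" line and the final error both point at the folder layout: train_data_dir must be the parent folder, and the images sit in a subfolder inside it. A rough pre-flight check is sketched below, assuming the usual `<repeats>_<name>` subfolder naming (for example `1_ohwx man`, a hypothetical name). The extension list and function names are illustrative assumptions, not Kohya's own code.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed extension list
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")  # e.g. "1_ohwx man"


def count_training_images(train_data_dir) -> dict:
    """Map each valid subfolder name to (repeats, image_count)."""
    results = {}
    for sub in sorted(Path(train_data_dir).iterdir()):
        if not sub.is_dir():
            continue
        match = FOLDER_PATTERN.match(sub.name)
        if match is None:
            continue  # no numeric repeat prefix, so the folder is skipped
        images = [p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTS]
        results[sub.name] = (int(match.group(1)), len(images))
    return results


def total_images_with_repeats(train_data_dir) -> int:
    """Sum of image_count * repeats over all valid subfolders."""
    return sum(rep * n for rep, n in count_training_images(train_data_dir).values())
```

If `total_images_with_repeats` returns 0 for the folder selected in the GUI, the images are most likely placed directly in train_data_dir or in a subfolder without the numeric prefix.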

Gavin Goodman

Well, with my config, for 40 images I would recommend 200 epochs, so 8000 steps. But make sure to save checkpoints and compare them. Probably around 150 epochs will be best, but it is subjective. Repeats are always 1 with FLUX since we don't use reg images.
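The step counts quoted in this exchange follow from images × repeats × epochs, divided by batch size and gradient accumulation. A small sketch (the function name mirrors the `max_train_steps` label in the log output but is otherwise illustrative; the rounding is an assumption):

```python
import math


def max_train_steps(num_images, epochs, repeats=1, batch_size=1, grad_accum=1):
    """Total optimization steps for a simple single-folder dataset."""
    samples_per_epoch = num_images * repeats
    steps_per_epoch = math.ceil(samples_per_epoch / batch_size / grad_accum)
    return steps_per_epoch * epochs


print(max_train_steps(40, 200))            # 8000: 40 images, 200 epochs, 1 repeat
print(max_train_steps(30, 15, repeats=7))  # 3150: lower end of the Civitai runs
print(max_train_steps(40, 15, repeats=7))  # 4200: upper end of the Civitai runs
print(max_train_steps(0, 200))             # 0: what an empty dataset produces
```

With batch size 1, the 150-epoch checkpoint for 40 images corresponds to 6000 steps.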

Furkan Gözükara

I've loaded them on a freshly restarted PC, so I assume it's something about saving rather than loading, but it's a minor issue. Question regarding the Lora training: Say, I've got a prepared image dataset of a person (30-40 random pics of him). What number of steps should I be aiming to do locally? For the last couple of months, I've been training Loras on Civitai with the following settings: 7 repeats, 15 epochs, 3,4-4,2k steps, LR/Alpha 32 and the results were quite good. What would be your recommendations please?

Justas

Well, I refresh the Gradio interface before loading configs, or it gets messed up :D

Furkan Gözükara

I've tried to save my custom configs on top of yours (and even save them as new ones), but when I loaded them again, some fields were still set to your defaults (i.e. Clip-l or T5 were leading to Ubuntu/downloads/... or so). The Save button is not overwriting them, so you might need to update Kohya a bit :)

Justas

thanks a lot. what was the error you got with new config?

Furkan Gözükara

Greetings, I just tried your config "48GB_GPUz_4x_GPU_Quality_Tier_3_29750MB_5.7_Second_IT" from your current Installer v46 and it didn't work: I got error messages before training started. I tried on RunPod with 4xA40. Maybe when you have the time you can check whether this is a general problem or whether I f*cked something up. I'm currently back to your "4x_GPU_Rank_1_SLOW_Better_Quality" from v42 and it seems to work. Nevertheless, thanks again for providing all your auto-installers for lazy noobs like me.

Shepard4k

Simply lower the quality. You can also reduce the size to 1/4 by converting to fp8: https://www.patreon.com/posts/115376830
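The "1/4" figure matches going from 4 bytes per weight to 1 byte per weight. A back-of-the-envelope sketch, assuming the LoRA was saved in fp32 and ignoring file metadata (the helper name is hypothetical):

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def converted_size_gb(size_gb: float, src: str, dst: str) -> float:
    """Estimate file size after converting weights from src to dst precision."""
    return size_gb * BYTES_PER_PARAM[dst] / BYTES_PER_PARAM[src]


print(converted_size_gb(3.0, "fp32", "fp8"))  # 0.75
print(converted_size_gb(3.0, "bf16", "fp8"))  # 1.5
```

So a 3 GB fp32 file would land around 0.75 GB in fp8. If the file was already saved in bf16, the saving would only be half.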

Furkan Gözükara

Just wondering: I've just made a new LoRA and it's over 3 GB. It's a fairly simple character model, and similar ones I see on Civitai are usually a matter of megabytes rather than gigabytes. Any way to make these smaller?

4401 4401

It looks related to your operating system setup. Uninstall everything, follow this video exactly, including the exact C drive setup, then do a fresh install onto the C drive and let me know: https://youtu.be/DrhUHnYfwC0 Please don't skip a second of this video.

Furkan Gözükara

Hello Furkan, my installation gets stuck when using your .zip file. It stops exactly at wheel creation for fairscale (i dont know what it is) but after this line it stops continuing even if I wait a lot: Here is the last line in cmd window, can you help?: INFO Building wheels for collected packages: fairscale, schedulefree INFO Building wheel for fairscale (pyproject.toml): started INFO Building wheel for fairscale (pyproject.toml): finished with status 'done' INFO Created wheel for fairscale: filename=fairscale-0.4.13-py3-none-any.whl size=332117 sha256=39ab905f6375b7e194e6dd91f3cea3a75c09351518f6c60f84b0abf4f59ad78 a

Mehmet Atakan Çavuşlu

If you have 64 GB RAM, 100%. If you have less, I can't say for sure. 64 GB of system RAM works.

Furkan Gözükara

Can this work on 1070ti 8G?

秀煜玩遊戲 Show YoU Gaming

Well, you very likely need more RAM. I just published a monitoring application; can you monitor and report back to me? : https://www.patreon.com/posts/system-resource-116539881

Furkan Gözükara

Hi, this is an out-of-VRAM error. Your config could be corrupted. Can you tell me your GPU, then get a fresh config and try again, please?

Furkan Gözükara

so easy. when you select the downloaded 23.8 gb flux model in the model selection path it will appear

Furkan Gözükara

Hello ! Please help :) Everything was working quite fine and I wanted to install the new version. To be sure I was not messing with my working version I created a new file. I clicked on Windows_Install_Step_1.bat ... I then clicked on going back to cuda 2.5.0 and now nothing will work no matter what. Here is what I get. I'm sorry. Thank you Traceback (most recent call last): File "C:\kohya_flux_3\kohya_ss\sd-scripts\flux_train_network.py", line 574, in trainer.train(args) File "C:\kohya_flux_3\kohya_ss\sd-scripts\train_network.py", line 639, in train unet = self.prepare_unet_with_accelerator(args, accelerator, unet) # accelerator does some magic here File "C:\kohya_flux_3\kohya_ss\sd-scripts\flux_train_network.py", line 545, in prepare_unet_with_accelerator accelerator.unwrap_model(flux).prepare_block_swap_before_forward() File "C:\kohya_flux_3\kohya_ss\sd-scripts\library\flux_models.py", line 1006, in prepare_block_swap_before_forward self.offloader_single.prepare_block_devices_before_forward(self.single_blocks) File "C:\kohya_flux_3\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 210, in prepare_block_devices_before_forward weighs_to_device(b, "cpu") # make sure weights are on cpu File "C:\kohya_flux_3\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 91, in weighs_to_device module.weight.data = module.weight.data.to(device, non_blocking=True) RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
Traceback (most recent call last): File "C:\Users\Windows\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\Windows\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\kohya_flux_3\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "C:\kohya_flux_3\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\kohya_flux_3\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\kohya_flux_3\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\kohya_flux_3\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/kohya_flux_3/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'C:/Users/Windows/Pictures/voile satin - Copie\\model/config_lora-20241122-214627.toml']' returned non-zero exit status 1. 21:48:00-227345 INFO Training has ended.

kartonpat

I found another post and I will retry the training and see if it solved the issue. "train_double_block_indices": "0,4-18" , it removes some of the blocks that are apparently the culprit for the pattern appearing. So blocks 1 to 3 are removed, they supposedly affect composition a little but not style and likeness.
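To make the "0,4-18" value concrete, here is an illustrative parser for that kind of index list. It is not Kohya's own parsing code, and the default of 19 double blocks (indices 0 to 18) is an assumption about the FLUX dev architecture.

```python
def parse_block_indices(spec: str, num_blocks: int = 19) -> list:
    """Expand a spec such as "0,4-18" into a sorted list of block indices."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            selected.update(range(start, end + 1))  # ranges are inclusive
        else:
            selected.add(int(part))
    out_of_range = [i for i in selected if not 0 <= i < num_blocks]
    if out_of_range:
        raise ValueError(f"indices out of range: {sorted(out_of_range)}")
    return sorted(selected)


trained = parse_block_indices("0,4-18")
skipped = [i for i in range(19) if i not in trained]
print(trained)
print(skipped)  # [1, 2, 3]
```

The skipped list is exactly the blocks 1 to 3 mentioned in the comment.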

baz64565 .

Professor, in the kohya parameters section I am missing Flux as an output Lora type. I noticed you use Flux1 in your training video. How do I get Flux1 to show up here as a dropdown option?

Leonard Bruno

Why is the error cuda out of memory? Also I can run on onetrainer fine, no problem so should not be ram issue. Did you even test configs on these VRAM? Even the 6GB VRAM config says out of cuda memory...

Brett Kelly

I have virtual ram, surely that should be sufficient?

Brett Kelly

Hi, it was removed since it is not necessary anymore. I add and remove bat files as necessary :) Now we have Windows_Kohya_Update.bat

Furkan Gözükara

You have to increase block swap, since it will use far more VRAM. Increase it 1 by 1, test at least 10 steps each time, and see which value works fastest.

Furkan Gözükara

You have far too little RAM. You need to upgrade. Get 2x 32 GB and it will run very well.

Furkan Gözükara

SECourses, any idea why it doesn't run on 8 GB as you say it can?

Brett Kelly

Hi professor, I didn't have this file in my folder ~~~~ Update_Kohya_and_Fix_FLUX_Step2

秀煜玩遊戲 Show YoU Gaming

If I'm training with 1536x1536px on an RTX 3090, how should I modify the settings of '24GB_GPU_22900MB_7.3_second_it_Tier_1', as currently switching from 1024px to 1536px goes from 8s/it to 120s/it. I've tried increasing the block swap from 9 to 16. I can try increase it some more, but is there any other setting I could change to bring this s/it down without impacting quality for 1536px training? Thanks Dr

GH

I have 16GB, I tried T5XXL FP8 still out of memory. Tried to increase the virtual ram but made no difference unfortunately.

Brett Kelly

yes that can help certainly

Furkan Gözükara

Make sure you load a T5xxl that is FP8; the configs for 8 GB will OOM if you load the default T5xxl, which is fp16.

baz64565 .

if you set max resolution to 1536 they wont get downscaled to 1024

Furkan Gözükara

How much RAM do you have in your computer?

Furkan Gözükara

8GB GPU Ram, using your Tier4 and Tier5 scripts and I am getting out of memory. All startup stuff disabled and I have full free VRAM before starting, any ideas?

Brett Kelly

thanks a lot for the info

Furkan Gözükara

Wow, many thanks for the speedy updates! I create sample prompts during training with a 24GB RTX 3090. 24GB_GPU_Quality_Tier_1_22950MB_14.1_Second_IT_Trains_T5_and_T5_Attention unfortunately runs out of memory. With blocks_to_swap 33 it worked. Best regards!

Markus Lienert

Wait what? I thought the max image resolution we could use to train was 1024px? Wouldn't 1536px training images just get resized to 1024px by kohya automatically?

GH

I wouldn't add such close-ups. They may confuse the model, as you noticed. Instead, try something like 1536px training.

Furkan Gözükara

Great, let us know. Also, Triton is not used during training as far as I know.

Furkan Gözükara

I found it hard to explain what I was seeing, but I found this on Reddit and it's exactly what's happening to me when using the LoRAs from Kohya training: a sort of patchwork pattern overlaid on the image, creating noise and unnecessary sharpness. It seems AI-toolkit fixed it; could it still be present in the Kohya code base? Others say it's a FLUX problem when going above 1024x1024, that there's not really a solution, and that it gets worse when combining multiple LoRAs. https://www.reddit.com/r/StableDiffusion/comments/1es91bu/comment/li49tvp/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

baz64565 .

I think Triton is not used during training. We are already getting Linux speed on Windows with torch 2.5.1.

Furkan Gözükara

Thanks to both of you. I really just went with tier 4 of 16gb but based on what you say maybe i can go for tier 2 and change to fp8 base in parameters. Also is this https://github.com/woct0rdho/triton-windows integrated in kohya or is the option of Triton in kohya different? It took 2hr 10m for 24 images on 4080 and 32gb with Dev flux / 1 repeat/ 1600 steps for a LORA which is crazy . what i dont remember though is which option i chose when installing kohya in number 5 either bf16 or fp8. i will try with bigger dataset like the one you did Dr. F and maybe with the fp8 model as well !!!

Sonivas Sx

Thanks! I will try it :) . I previously tried adding close-up images of the clothing, such as the lining, to the dataset, and it did improve the generated results. Later, I attempted to assign different triggers to each part, including the button on the left and the buttonhole on the right (asymmetrical). When I only added close-ups of one side, I found that LoRA could learn it, but it would turn both the left and right buttons into the appearance of the single close-up. However, when I added detailed close-ups of both sides, I found that LoRA couldn’t distinguish between the left and right sides. May I kindly ask if you have any advice or experience with this approach to incorporating close-up data?

yijie fang

yes bigger resolution training and better dataset

Furkan Gözükara

Hello and thank you for your work :) . I’m trying to train a LoRA model for a down jacket, but it struggles with details like buttons and zippers. Do you have any good solutions for this?

yijie fang

are you inpainting the face? you really should do with distant and mid shots as i do in tutorial

Furkan Gözükara

wow that is new for me. thanks a lot @Arder

Furkan Gözükara

Yes, I only removed "Full bf16 training (experimental)" from 6GB_GPU_Quality_Tier_3_14700MB_9.2_Second_IT and it works on 32gb ram

Arder

It seems I cheered too early. The LoRA works, but its use is very limited: only close-ups are amazing. Once the character is in a medium-distance shot, the face looks like it gets too much noise, and the same thing happens when combining with other LoRAs. At first I thought it was an overtrained epoch, but I went back from epoch 200 all the way to 40 and they all have it. I will try again, maybe with network dim 64 and alpha 32; I read somewhere that someone else had a similar issue and it was because of rank 128.

baz64565 .

i think i will add fp8 model download and fp8 option checkmark to the fp8 configs

Furkan Gözükara

So with that fp8 technique, which workflow did you use to make it work on 32 GB RAM?

Furkan Gözükara

In Parameters -> Advanced, uncheck "Full bf16 training (experimental)" and check "fp8 base".
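The same toggle could in principle be applied to a saved config file before loading it in the GUI. The sketch below assumes the JSON config stores these checkboxes under the keys `full_bf16` and `fp8_base`; those key names are an assumption, so check them against your own config file first. The function name is hypothetical.

```python
import json


def switch_to_fp8_base(config: dict) -> dict:
    """Return a copy of the config with full bf16 off and fp8 base on."""
    updated = dict(config)
    updated["full_bf16"] = False  # assumed key for "Full bf16 training (experimental)"
    updated["fp8_base"] = True    # assumed key for "fp8 base"
    return updated


# Example on an in-memory config; a real file would be read with json.load().
original = json.loads('{"full_bf16": true, "fp8_base": false, "train_batch_size": 1}')
print(json.dumps(switch_to_fp8_base(original), indent=2))
```

The original dict is left unchanged, so the edited copy can be written to a new file and the stock config kept as it is.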

Arder

Now I get the same error as Arder. I have 32 GB RAM and had no problem in the past with a 4080. How do I change to fp8: is it the option from number 5 in the installation, or do I use the fp8 model of FLUX?

Sonivas Sx

Yes, it is a minimal difference. 1024 is the best resolution.

Furkan Gözükara

Thanks! Yes, I have 32 GB RAM, so I will stick with fp8 for the moment. How much quality drop is there between fp8 and bf16? It shouldn't be big at 1024 resolution?

Arder

Please follow every step of this video first and then reinstall: https://youtu.be/DrhUHnYfwC0 If you upgrade to the Gold tier, I can also install it for you via remote desktop.

Furkan Gözükara

ok doing a fresh install and after step 1 i get this and cant move to 5 Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 4, in File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 19, in from accelerate.commands.estimate import estimate_command_parser File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Lib\site-packages\accelerate\commands\estimate.py", line 34, in import timm File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Lib\site-packages\timm\__init__.py", line 2, in from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \ File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Lib\site-packages\timm\models\__init__.py", line 28, in from .maxxvit import * File "C:\Kohya\Kohya_GUI_Flux_Installer_v44\kohya_ss\venv\Lib\site-packages\timm\models\maxxvit.py", line 225, in @dataclass ^^^^^^^^^ File "C:\Python312\Lib\dataclasses.py", line 1275, in dataclass return wrap(cls) ^^^^^^^^^ File "C:\Python312\Lib\dataclasses.py", line 1265, in wrap return _process_class(cls, init, repr, eq, order, unsafe_hash, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Python312\Lib\dataclasses.py", line 994, in _process_class cls_fields.append(_get_field(cls, name, type, kw_only)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Python312\Lib\dataclasses.py", line 852, in _get_field raise ValueError(f'mutable default {type(f.default)} for field ' ValueError: mutable default for field conv_cfg is not allowed: use default_factory [11/18/24 13:04:08] ERROR Error occurred while running command: accelerate config default setup_common.py:673 ERROR Error: Command 'accelerate config default' returned non-zero exit status 1.

Sonivas Sx

Not needed. Run Windows_Install_Step_1.bat and pick options 1 and 5, then run Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat, and then use our starter to start. Also, don't forget the temporary fix, which is still needed for now.

Furkan Gözükara

It's been a while since I last did this. I do install step 1, 5, 6. Should I install Triton as well? I heard some news lately about it speeding things up.

Sonivas Sx

You should do a fresh install. After that, use the newly attached Fix_Kohya_GUI_Error_Temporary.bat file; it will revert to an older working commit. The error comes from bmaltais, the maker of the Kohya GUI.

Furkan Gözükara

so should i download kohya again to get the new configs or just update?

Sonivas Sx

How much RAM do you have? That could be the reason. You might need 64 GB of RAM.

Furkan Gözükara

I have a 4070 Ti Super with 16 GB. When I try to use the new config 16GB_GPU_Quality_Tier_3_14700MB_9.2_Second_IT I receive this error: enable full bf16 training. Traceback (most recent call last): File "D:\AI\kohya\kohya_ss\sd-scripts\flux_train_network.py", line 574, in trainer.train(args) File "D:\AI\kohya\kohya_ss\sd-scripts\train_network.py", line 639, in train unet = self.prepare_unet_with_accelerator(args, accelerator, unet) # accelerator does some magic here File "D:\AI\kohya\kohya_ss\sd-scripts\flux_train_network.py", line 545, in prepare_unet_with_accelerator accelerator.unwrap_model(flux).prepare_block_swap_before_forward() File "D:\AI\kohya\kohya_ss\sd-scripts\library\flux_models.py", line 1006, in prepare_block_swap_before_forward self.offloader_single.prepare_block_devices_before_forward(self.single_blocks) File "D:\AI\kohya\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 210, in prepare_block_devices_before_forward weighs_to_device(b, "cpu") # make sure weights are on cpu File "D:\AI\kohya\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 91, in weighs_to_device module.weight.data = module.weight.data.to(device, non_blocking=True) RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last): File "C:\Users\Arder\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\Arder\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\AI\kohya\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "D:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\AI\kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\AI\\kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/AI/kohya/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/AI/SwarmUI/SwarmUI/Models/Lora/config_lora-20241118-023205.toml', '--blocks_to_swap', '23']' returned non-zero exit status 1. Switching to fp8 instead of bf16 works fine and is fast (6.65it). Does this mean bf16 doesn't work with 16 GB, or am I missing something in the configs that could make it work?

Arder

OK, I just tested, and your Kohya is very old. Please make a fresh install with the latest zip file: run Windows_Install_Step_1.bat, then Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat, and then Windows_Start_Kohya_SS.bat.

Furkan Gözükara

Let me update the configs; bmaltais has probably fixed his error.

Furkan Gözükara

got this error after last update "flux_train_network.py: error: unrecognized arguments: --blocks_to_swap 26 Traceback (most recent call last): File "C:\Python3_10_11\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Python3_10_11\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Kohya\Kohya_GUI_Flux_Installer_27\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "C:\Kohya\Kohya_GUI_Flux_Installer_27\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\Kohya\Kohya_GUI_Flux_Installer_27\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\Kohya\Kohya_GUI_Flux_Installer_27\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\Kohya\\Kohya_GUI_Flux_Installer_27\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/Kohya/Kohya_GUI_Flux_Installer_27/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'C:\\Users\\S-Pc\\Desktop\\Training Outputs\\model/config_lora-20241118-021525.toml', '--blocks_to_swap', '26']' returned non-zero exit status 2."

Sonivas Sx

yep. hard to know exactly which checkpoint will be best

Furkan Gözükara

I just finished and it works, but I had to use earlier epochs instead of the final one. It overtrained fast and the character became "noisy". I trained 3200 steps with 16 images, and the last satisfying usable epoch was around 2100 steps, so I could have stopped at around 3 hours.

baz64565 .

4 hours is awesome good job there

Furkan Gözükara

OK, I quit the training and reset the dimensions to 128. It is running stable now at 7 GB / 8 GB used, shared memory at 12.3 GB, swap at 30, resolution 512x512, 16 images, 1 repeat, 200 epochs. 4h10 on a 3060 Ti 8 GB.

baz64565 .

Changing to 512,512 and the block swap from 35 -> 30 made the time drop from 12h to 4h ;) At least that can finish before I have to go to bed. I don't like long run times, because I don't like the possibility of my PC catching fire while I'm sleeping or away from home. I know you mostly concentrate on quality, which is great, but a small note in the tutorial for those who want to tweak the speed/quality balance a little might help a lot of others too, like you did for me just now. I think in the beginning I also decreased network dim and alpha from 128 to 64, because I used to get OOMs and I thought they were caused by the high network dim. Could that be the reason I'm not at high GPU usage, but instead at 5.7/8 GB of dedicated GPU memory used?

baz64565 .

Thanks, that is the accurate answer. Also, it will show as starting from 0, but it will actually be continuing from epoch 125, so set the new epoch count accordingly.

Furkan Gözükara
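The epoch arithmetic in the reply above can be sketched as follows. This is only an illustration and not part of Kohya: `remaining_epochs` is a hypothetical helper, and the 200-epoch target is an assumption borrowed from the config default mentioned elsewhere in this thread.

```python
def remaining_epochs(target_total: int, already_trained: int) -> int:
    """Epoch count to enter when resuming from saved LoRA weights.

    The progress display restarts at 0 after the weights are loaded, so the
    new run should only cover the epochs that are still missing.
    """
    if target_total < 0 or already_trained < 0:
        raise ValueError("epoch counts must be non-negative")
    return max(target_total - already_trained, 0)


# Resuming from the epoch-125 checkpoint with an assumed 200-epoch target.
print(remaining_epochs(200, 125))  # 75
```

With these assumed numbers, the resumed run would be set to 75 epochs even though the counter starts again from 0.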

Yes, do this: change the resolution in the config to 512x512, copy the images into a new folder, and use that folder. Bucket size is not important. For the swap count, go 1 by 1: decrease it by 1, try, decrease again, try, and so on.

Furkan Gözükara

It's a 3060 Ti 8 GB. I'm using 512x512 pictures, but I notice the config resolution is set to 896,896 and bucket_max is at 2048. Do those cause slower times even though I'm using only 512 in my training set? What would be an ideal swap count instead?

baz64565 .

What is your GPU? Reducing the resolution will bring the biggest speed-up during LoRA training. When you reduce the resolution, you can also reduce the block swap count.

Furkan Gözükara
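A rough way to see why resolution matters so much is to compare pixel counts per image. The sketch below is plain arithmetic under the assumption that training cost grows at least in proportion to the number of pixels; `pixel_ratio` is a hypothetical helper, and real step times also depend on block swap count and other settings.

```python
def pixel_ratio(old_res: int, new_res: int) -> float:
    """How many times more pixels a square image has at old_res than at new_res."""
    return (old_res * old_res) / (new_res * new_res)


# Config default in this thread (896x896) versus the reduced setting (512x512).
print(round(pixel_ratio(896, 512), 2))  # 3.06
```

That factor of about 3 is in the same range as the 12h to 4h drop reported in this thread, although that run also lowered the block swap count from 35 to 30.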

Don't use captions. If you want a shirt, it will be a shirt LoRA. I would try to gather around 30 to 40 good images covering different poses, distances, and angles.

Furkan Gözükara

hi replied from discord

Furkan Gözükara

I found out why I OOM'd, I was using the FP16 t5xxl and it was crashing when it had to be loaded into memory. using the FP8 now.

baz64565 .

In the basic options there is a small optional field called "Network weights" (path to an existing LoRA network weights file to resume training from). Give the path to your last trained epoch there.

baz64565 .

I've started the 7200MB training config, and it's going to take about 12 hours. Which parameters can I change to speed it up and sacrifice some quality? Is it the network dim and network alpha settings? Or is there a better setting to reduce, to get it to around 6 hours at least, or 8 hours at most?

baz64565 .

I want to train a LoRA of a shirt. Would that be called a style or a character LoRA? Also, how many training images will be required, with or without captions?

Yash Rami

Hi. I have a problem, probably quite a childish one. I don't know how to resume training from the last checkpoint. I have a 125-epoch checkpoint as a .safetensors file. Could someone help me? How can I load it and resume training?

Magica Barłogi

yep check this out : https://www.reddit.com/r/SECourses/comments/1gofhj0/doing_the_final_flux_dev_model_maximum_quality/

Furkan Gözükara

Ok I will try out the finetune when it gets updated to be a bit faster.

baz64565 .

If you are willing to lose quality, you can do it even faster. It comes down to whether you want quality or speed. My configs target the best quality, and no LoRA will ever reach fine tuning quality.

Furkan Gözükara

I see, how much time difference are we talking about between lora and finetune? the lora usually takes about 2h40 on 8GB (fluxgym), not sure how much the rank12 on kohya_ss normally took with 15 images.

baz64565 .

I really recommend doing fine tuning instead of LoRA. It will be way better. Also, a huge speed-up is about to arrive for fine tuning. I just messaged Kohya to merge the branch. Once updated, it will use only 6900 MB VRAM. The current configs and version also work, just slower.

Furkan Gözükara

I've tried the rank12 8GB config, but i'm still getting out of memory, only 32MB though, I've already closed everything running that is using GPU except for default windows processes : return t.to( torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 7.31 GiB is allocated by PyTorch, and 14.35 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

baz64565 .
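The error message above suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. A minimal sketch of doing that from Python is shown below, assuming the variable is set in the process that launches training before CUDA is initialized. `enable_expandable_segments` is a hypothetical helper, not part of Kohya or PyTorch, and the option may not be supported on every platform.

```python
import os


def enable_expandable_segments(env=None):
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF if it is missing.

    `env` defaults to os.environ; any dict-like mapping works, which makes the
    helper easy to try without touching the real environment.
    """
    env = os.environ if env is None else env
    current = env.get("PYTORCH_CUDA_ALLOC_CONF", "")
    if "expandable_segments" not in current:
        # The allocator config is a comma-separated list of option:value pairs.
        option = "expandable_segments:True"
        env["PYTORCH_CUDA_ALLOC_CONF"] = f"{current},{option}" if current else option
    return env["PYTORCH_CUDA_ALLOC_CONF"]


# Demonstrated on a plain dict so the real environment stays untouched.
print(enable_expandable_segments({}))  # expandable_segments:True
```

This setting only targets fragmentation. In the error above, just 14.35 MiB is reserved but unallocated, so it is unlikely to help when the GPU is simply full.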

That file was for one trainer, and I didn't use it. Kohya works in 2 ways: either it reads the folder name as the caption, or you provide an image_name.txt for every image and write the caption in it.

Furkan Gözükara

About .txt caption files: I noticed that you put just 1 .txt file in your photo folder. When I used JoyCaption, there were 15 caption files for my 15 photos. Should I put 15 caption files in my photo folder, or just 1 file?

Quốc Tuấn Dương
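For the per-image caption route described in this exchange, a small check like the one below can confirm that every image has a matching .txt file. It is a sketch under assumptions: `images_missing_captions` is a hypothetical helper, the image extension list is illustrative, and captions are assumed to use the .txt extension as in the reply.

```python
from pathlib import Path

# Illustrative set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def images_missing_captions(folder):
    """Return the names of images in `folder` that have no same-named .txt file."""
    missing = []
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        if not image.with_suffix(".txt").exists():
            missing.append(image.name)
    return missing
```

Calling `images_missing_captions("path/to/dataset")` on a folder with 15 photos and 15 caption files should return an empty list.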

You're welcome.

Furkan Gözükara

Thank you.

Bayram TATAR

Yes, it is an insufficient VRAM error. Also, you don't have enough RAM either.

Furkan Gözükara

Thank you. I guess we won't be able to do it with this system. Also, even though I have tried several times, I get the following error. Is there a solution for it? I can't fully understand it... In Kohya DreamBooth: enable full bf16 training. Traceback (most recent call last): File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\sd-scripts\flux_train.py", line 998, in train(args) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\sd-scripts\flux_train.py", line 453, in train flux = accelerator.prepare(flux, device_placement=[not is_swapping_blocks]) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1311, in prepare result = tuple( File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1312, in self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1188, in _prepare_one return self.prepare_model(obj, device_placement=device_placement) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1435, in prepare_model model = model.to(self.device) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1340, in to return self._apply(convert) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) [Previous line repeated 1 more time] File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply param_applied = fn(param) File
"D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert return t.to( torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 14.59 GiB is allocated by PyTorch, and 17.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Traceback (most recent call last): File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Kohya_GUI_Flux_Installer_v42\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_v42\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_v42/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:/LOraY\\model/config_dreambooth-20241109-125149.toml']' returned non-zero exit status 1. 12:56:29-151176 INFO Training has ended.

Bayram TATAR

Hello Mr. Bayram. Your graphics card is weak in terms of VRAM, but the bigger problem is having only 16 GB of RAM. If you get 2x32 GB RAM, you can get great results by doing fine tuning directly instead of LoRA. LoRA would still work reasonably well too: Rank_12_7500MB_7_68_Second_IT.json

Furkan Gözükara

Hello Mr. Furkan... I am using a 12th generation i5 processor and an RTX 4060 with 8 GB VRAM, and the PC has 16 GB of system RAM. I have examined the configurations here, but I couldn't fully understand them. I don't know much about this. Can I do this training with this computer? And which configuration file would you recommend I use? Thank you. Note: By the way, I am in Samsun, Turkey. You can answer in Turkish.

Bayram TATAR

Well, LoRAs trained with my configs, with nothing changed except things like the number of epochs, all work with SwarmUI. So it is impossible to know without further debugging. If you upgrade to gold membership, I can quickly set up a training on your computer, as long as you are not doing anything NSFW.

Furkan Gözükara

Thank you again. This is so annoying when it does not work… I managed to train the LoRA, but when I use flux1.dev with Forge and the LoRA, I get errors in Forge: « AssertionError you do not have clip state dict! » and sometimes a message about a missing object that I don't remember. I am so sorry to bother you. LoRA training is extremely important for my work. Thank you

kartonpat

awesome

Furkan Gözükara

When I last tested, fp16 was not learning. But if the results are good, all good.

Furkan Gözükara

Edited: I don't have an antivirus set up on this computer. I have an RTX 4080 16 GB and used the rank 8 config. Is it the right one? Because it changed the setting to float instead of bf16, which seems better. And the size is now 768x768. What do you think?

kartonpat

I did, thank you !

kartonpat

It is an install error. Also, I would never install there, under the Windows user folder on C. Please follow this video to install your requirements properly first: https://youtu.be/DrhUHnYfwC0 and install everything directly into the C drive.

Furkan Gözükara

It is definitely not normal. Something on your system is blocking the Hugging Face downloader, probably your antivirus. Try again and it should resume, but you should allow the Python code to run.

Furkan Gözükara

Hello and thank you for all this nice work! when trying to download the models from your file i get this : clip_l.safetensors: 100%|███████████████████████████████████████████████████████████| 246M/246M [00:06<00:00, 38.0MB/s] Could not set the permissions on the file 'C:\kohya flux 2\.cache\huggingface\download\clip_l.safetensors.660c6f5b1abae9dc498ac2d21e1347d2abdb0cf6c0c0c8576cd796491d9a6cdd.incomplete'. Error: [Errno 13] Permission denied: 'C:\\tmp_cb40941d-37f7-4e01-a619-ee5e315fbe64'.|█▌ | 273M/9.79G [00:06<03:42, 42.8MB/s] Continuing without setting permissions. ae.safetensors: 100%|███████████████████████████████████████████████████████████████| 335M/335M [00:08<00:00, 38.8MB/s] Could not set the permissions on the file 'C:\kohya flux 2\.cache\huggingface\download\ae.safetensors.afc8e28272cd15db3919bacdb6918ce9c1ed22e96cb12c4d5ed0fba823529e38.incomplete'. Error: [Errno 13] Permission denied: 'C:\\tmp_32e18f27-7dfe-498f-a69c-7722f8ece12f'. 4%|█▉ | 357M/9.79G [00:08<03:42, 42.4MB/s] Continuing without setting permissions. Is it normal ?

kartonpat

Hello and thank you for your work :) When I launch the GUI I get this: Starting the GUI... this might take some time... Traceback (most recent call last): File "C:\Users\Windows\Documents\kohya flux\kohya_ss\kohya_gui.py", line 6, in import gradio as gr ModuleNotFoundError: No module named 'gradio' (venv) C:\Users\Windows\Documents\kohya flux\kohya_ss> I checked and updated gradio but I still get this message. Any idea? Thank you

kartonpat

Nothing is wrong. We save as float, thus the size is 2x. You can save as fp16 to halve the size. You can also later save as fp8 via Kohya to halve the size again.

Furkan Gözükara

Hi, I'm training a FLUX LoRA on Massed Compute with 1x RTX A6000 [ALT Config] on 15 images with 2000 steps. Best_Configs/Rank_1_29500MB_8_85_Second_IT.json is my configuration, and so far everything is going fine, except that the checkpoints saved every 25 epochs are 2.5 GB each. As far as I know, the whole point of a LoRA is that it should be a really small file. What am I doing wrong?

Tycjan Gniew
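The size relationship in this exchange can be worked through as simple arithmetic. The sketch below assumes "float" means 4 bytes per weight, fp16 means 2, and fp8 means 1, and it ignores file metadata; `estimated_size_gb` is a hypothetical helper, not a Kohya function.

```python
# Assumed storage cost per weight for each save precision.
BYTES_PER_PARAM = {"float": 4, "fp16": 2, "fp8": 1}


def estimated_size_gb(size_gb: float, saved_as: str, target: str) -> float:
    """Scale a checkpoint size from one save precision to another."""
    return size_gb * BYTES_PER_PARAM[target] / BYTES_PER_PARAM[saved_as]


# The 2.5 GB float checkpoint from the question above.
print(estimated_size_gb(2.5, "float", "fp16"))  # 1.25
print(estimated_size_gb(2.5, "float", "fp8"))   # 0.625
```

Under these assumptions, the same LoRA would be roughly 1.25 GB saved as fp16 and roughly 0.63 GB saved as fp8.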

Sadly, I don't know. Last time I tested, it was working. I need to test again later. It could be that the pod is broken or Kohya is broken. Did you try on Massed Compute?

Furkan Gözükara

Adhil

I would aim for 100-150 epochs, depending on how long you can wait. There is no such thing as 1000 steps being optimal. You can see I did way more with my 256-image dataset: https://youtu.be/FvpWy1x5etM

Furkan Gözükara

If I use batch size 1, and have 63 image dataset, how many epochs or steps should I aim for? The configuration file suggests 200 epochs which would be over 12000 steps. I thought 1000 steps was said to be optimal?

Colab
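The step counts discussed in this exchange follow from images, repeats, epochs, and batch size. The sketch below is a simplified estimate under the assumption of a single GPU with no gradient accumulation and no bucketing effects; `total_steps` is a hypothetical helper, not a Kohya function.

```python
import math


def total_steps(num_images: int, epochs: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Approximate optimizer steps for a training run."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 63 images at batch size 1, for the epoch counts mentioned above.
print(total_steps(63, 200))  # 12600
print(total_steps(63, 150))  # 9450
print(total_steps(63, 100))  # 6300
```

The same formula matches another run reported in this thread: 16 images, 1 repeat, and 200 epochs give 3200 steps.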

Ty!

Steve

If it is working, then yes, you can ignore it. Also, I updated to torch 2.5.1, which should fix this.

Furkan Gözükara

Hello I tried again, it is running now but still gives the error. Is this ok? epoch 1/100 2024-11-01 20:35:23 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:715 Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory /home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/graph.py:825: UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides(), attempting to materialize a grad_output with matching strides... (Triggered internally at ../aten/src/ATen/native/cudnn/MHA.cpp:674.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass steps: 0%| | 1/48000 [00:12<163:06:27, 12.23s/it, avr_loss=0.384]Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared steps: 0%|

Steve

Hi, I used the latest installer here and had no problems. Can you try again? https://www.patreon.com/posts/112099700 Make sure to follow every step of this tutorial first: https://youtu.be/DrhUHnYfwC0

Furkan Gözükara

Hi, were you able to find the issue? Ty

Steve

You are welcome. Well, I tested Prodigy in the past and personally didn't find it better than Adafactor. Since I found a very good LR for batch size 1 and for each other batch size, I think our configs work well. We use a static learning rate (LR).

Furkan Gözükara

Yes, speed is good, results are also good. One more question, did you also experiment with prodigy optimizer? I read some people mention that prodigy is the best optimizer. It consumes more vram, for rank 3 without t5xxl it consumed about 23.9 gb on 4090. I also did training with prodigy with cosine also at 1 learning rate and the results were also good. Artstyle used for both training were very different so I can't tell which results were better, but both results turned out really good. And again thanks for doing all the work and helping out.

Rustic Engineering

I see it is working, so I think you can ignore it. I see the speed is also good.

Furkan Gözükara

Thanks, I'll also try finetuning. Training is going well with 4090 with Rank_3_18950MB_9_05_Second_IT.json config. I got this library missing error before first epoch but now it has stopped showing. Is this something that can be ignored? https://ibb.co/s27Cp5d

Rustic Engineering

There is very little difference, so you can use rank 3 without T5. Also, the best quality comes from fine tuning rather than LoRA training: https://youtu.be/FvpWy1x5etM

Furkan Gözükara

Thanks. I'll try with the 48gb gpu. And can you please also tell for training style, should I go with t5xxl files, for both 24 gb gpu and 48gb gpus, or without t5xxl files should also be good for artstyles? Would there be major difference in quality in both of them? I'm trying to get as good quality lora as possible.

Rustic Engineering

Yes, that one barely fits into 24 GB, so I see. In your case it did a few steps and hit the error later. So you need to use either a bigger GPU or the config without T5 XXL.

Furkan Gözükara

Thanks for the reply. I think all the error was not copied into my message, maybe there is limit of text. I'll attach the image link here: 1. https://ibb.co/brKbGt3 2. https://ibb.co/5Fs0zkM I'm using Rank_3_T5_XXL_23500MB_11_35_Second_IT.json file.

Rustic Engineering

Hi, which config did you try? It is an out of VRAM error. Are you also generating samples during training?

Furkan Gözükara

I got this error when trying to train on runpod, this is my pod: 1 x RTX 4090, 12 vCPU, 62 GB RAM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 This is the error: steps: 0%| | 0/4000 [00:00 trainer.train(args) File "/workspace/kohya_ss/sd-scripts/train_network.py", line 1207, in train accelerator.backward(loss) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2159, in backward loss.backward(**kwargs) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward torch.autograd.backward( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward _engine_run_backward( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1125, in unpack_hook frame.recompute_fn(*args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1519, in recompute_fn fn(*args, **kwargs) File "/workspace/kohya_ss/sd-scripts/library/flux_models.py", line 746, in custom_forward outputs = func(*cuda_inputs) File "/workspace/kohya_ss/sd-scripts/library/flux_models.py", line 723, in _forward attn = attention(q, k, v, pe=pe, attn_mask=attn_mask) File "/workspace/kohya_ss/sd-scripts/library/flux_models.py", line 449, in attention x = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 946.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 730.81 MiB is free. Process 1700650 has 22.92 GiB memory in use. Of the allocated memory 20.68 GiB is allocated by PyTorch, and 1.77 GiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) steps: 0%|▎ | 11/4000 [01:16<7:45:15, 7.00s/it, avr_loss=0.414] Traceback (most recent call last): File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main()) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python3', '/workspace/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', '/workspace/kohya_ss/dataset/outputs/model/config_lora-20241028-234140.toml', '--cpu_offload_checkpointing']' returned non-zero exit status 1. 23:43:45-777427 INFO Training has ended.

Rustic Engineering

let me make a fresh install and test

Furkan Gözükara

Hello, in massed compute Kohya Flux Dreambooth training I am getting 2024-10-27 15:19:31 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:715 Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory I assume this has to do with CUDA. Do you know how to resolve it?

Steve

using hugging face repos as shown in last video : https://youtu.be/FvpWy1x5etM

Furkan Gözükara

What is a good way to transfer large files 23GB + 23GB saved training state to Massed compute? Transfer speed is very slow. Thanks!

Steve

Your computer might be infected with a virus. I use Kaspersky Internet Security and have no such issue. Here is the VirusTotal report: https://www.virustotal.com/gui/file/9d5b328e342ce9ce5f73fcd8ea7411be3489263568d9863cccfae6284fddce83?nocache=1 The zip file only has .bat and .sh files, which you can open and inspect :)

Furkan Gözükara

When I try to install Kohya_GUI_Flux_Installer_v41.zip, it says a virus was detected, and the download is blocked. I am using the Google Chrome browser. Is there a solution for this?

대훈 조

I can get "Windows_Update_Kohya_and_Fix_FLUX_Step2" to work on a separate install of kohya, however it states that everything is already up to date, but then when loading Flux Dev in Kohya I don't get the Flux or SD3 Checkbox. Edit: Eh I just uninstalled my other version of python and will reinstall later lol.

Twinklixx Jones

Hello there! Do you have information on how to install flux into Kohya without using your .zip file? Your Windows_Install_Step_1.bat file does not account for PCs with multiple python versions installed, thus I can't use your installation method. This is likely the same problem that the person above me is having.

Twinklixx Jones

Please follow this video exactly as it is and then reinstall: https://youtu.be/DrhUHnYfwC0 Also, start the bat file like this: open a cmd window, type the full name of the bat file, and run it that way to see the reason for the error.

Furkan Gözükara

What is your step speed? Yes, 256 images take more time, but certainly not forever on an RTX 4090 :D

Furkan Gözükara

I can't believe I finally achieved a decent result by following this guide. It's been quite frustrating, but I finally got it. I have one question: is there any way to train faster by using another base model which is not dev-1? Right now on my RTX 4090 it's taking about 4 hours to train using 19 images, but I would like to do the same experiment as you with 256 images, and it's going to take forever.... Any advice?

Daniel Cardona Ramirez

Hi all, is anybody facing the same problem? When I run Windows install step 1 and pick option 1, the process and the window close immediately. I have installed the same version of Python, Git, and CUDA. Thanks in advance

Federico Salmaso

Hopefully it will be published once I have finished editing it. I am editing the tutorial now.

Furkan Gözükara

Hi. First of all, congratulations on the amazing work! I would like to know where I can find the tutorial for FLUX fine-tuning/DreamBooth, please.

Icaro Diniz

It is an out of RAM error. Don't use the 24 GB RAM configs, which are the ALT config and the spot instance.

Furkan Gözükara

Massed Compute / DreamBooth INFO caching latents with caching strategy. INFO caching latents... 0%| Traceback (most recent call last): File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main()) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/Downloads/training_imgs/model/config_dreambooth-20241015-224004.toml']' died with . Training has ended.

Pixel Reaction

He just fixed this error after I reported it. Please reinstall: https://github.com/bmaltais/kohya_ss/issues/2901#issuecomment-2414757729

Furkan Gözükara

After clicking on Installer_bat_step1, it is showing this error.. ERROR: Cannot install -r requirements.txt (line 1), -r requirements.txt (line 10), diffusers[torch]==0.25.0 and huggingface-hub==0.24.5 because these package versions have conflicting dependencies.

Zabi G

The best case is training on each model. But you can use a LoRA trained on an SD 1.5 model with different SD 1.5 models, an SDXL LoRA with SDXL models, and a FLUX LoRA with FLUX models. Mixing SD 1.5, SDXL, and FLUX won't work. LoRAs only work on the same base model family.

Furkan Gözükara

Hey, everything works perfectly and my flux lora works great. I'm now wondering if i can use the lora model with another checkpoint like epiCRealism? Should i train a lora model specifically for that or can i somehow use the same lora model to keep my character's face consistent?

ole asdsad

Hi, I am preparing it. Meanwhile, please watch the LoRA tutorial in full; watching it will be mandatory: https://youtu.be/nySGu12Y05k

Furkan Gözükara

Is there a tutorial video on it, on your youtube? I'm not great at following written instructions. A video would help a lot. Thanks

V Santhosh

Rank 3 is slow. if you want speed you need rank 4. but i suggest you do fine tuning instead of lora. it is way better and runs at almost the same speed, 6-7 seconds per iteration. i am going to publish comparison grids hopefully tomorrow : https://www.patreon.com/posts/112099700 fine tuning is way better than lora

Furkan Gözükara

Hi there, I'm using an RTX 4090 and I've enabled all the options as you've shown in the tutorial. When I hit start training I get an error saying T5XXL is already training and cannot be used with cache text encoder outputs, or something like that. So I disabled T5XXL training and started it. What do I lose because of this? Also, on the 4090, if I use the rank 3 24 GB version of the config file, it takes around 6-7 seconds per iteration, which I think is very slow.

V Santhosh

i dont know sadly

Furkan Gözükara

I'll try. In Forge UI, if I try to use my lora with a model, it doesn't work, but if I change the 'Diffusion in Low Bits' option (top right) from 'Automatic' to 'Automatic (fp16)' it works ok. I was just curious why, as I've never had an issue with any other lora I've downloaded. https://imgur.com/a/XpLbCyj

Maso

hello can you elaborate more because i couldn't exactly understand

Furkan Gözükara

I created a flux lora, which I thought was only working with flux-dev (fp16). However, upon further investigation it appears to work with only flux-dev checkpoints, and only if I change the 'Diffusion in low bits' setting in Forge to 'Automatic (fp16 LoRA)'. I just wondered if you had seen something similar? I'm using Rank3 (non T5) and the only Kohya settings I have changed are network dimension 32 and save as fp16, to reduce the size.

Maso

hi. those errors are very likely due to incorrect python, cudnn, or cuda library versions. you need to install the correct ones: python 3.10, cuda 12.4, cudnn 8.9.7, and you also need the c++ tools

Furkan Gözükara

yes caching can cause that 100%

Furkan Gözükara

Due to regional issues, I attempted to rent a GPU on an alternative platform. I'm using Jupyter Notebook, and since this platform is similar to RunPod, I tried to apply RunPod's configuration. However, I encountered numerous and varied errors. I'm unsure of their causes or how to fix them. Occasionally, I manage to solve some issues, but I don't fully understand how or why the solutions work. I have some basic knowledge of Python and Linux. What should I study to better equip myself for building an environment to train LoRA on this new platform? I'd appreciate any suggestions on resources or topics to focus on. My goal is to understand the process better and troubleshoot more effectively. Thank you in advance for your help!

Leonidas Freud

I tried training with the "rank 3" option (with rtx3090). It was taking ages but it would have worked if I left it for days (though I hadn't changed it to save more often). Because it was running slow I tried changing to the "rank 4" option. But that seems more likely to fail with an error like "...Kohya-GUI-Flux-trainer\kohya_ss\sd-scripts\library\flux_models.py", line 1024, in forward if img.ndim != 3 or txt.ndim != 3: AttributeError: 'NoneType' object has no attribute 'ndim' Maybe if it's training with lower precision, when outputting sample images it might be more likely to generate one where one of the values isn't a valid number? Is that what that error message could mean? Could it be anything to do with me increasing network rank to 275 and changing network alpha to 68? Maybe I could have used the "rank 3" one and just increased the learning rate so it could learn faster (but maybe the quality might not be as good then/it would overtrain). I tried training higher res (1280x1280) but that's just too slow with 1 RTX3090, especially when outputting an image sample. The changed Kohya also isn't as good at saving configurations as the standard Kohya because in this version there's no "save as" option to save a copy of a config under a different name. edit: I think that error might have been something to do with the caching prompts options, because it says you can edit the prompts.txt file that has the prompts used for training samples, but if you do and it has cached the prompts to memory/disc it seems to give an error if it finds one that's changed in the prompts.txt file.

cool1

Thanks. Somehow it worked after clicking the "install torch 2.5 dev.. bat file again". I'd already done that before a few times but this time it worked and allowed me to run the start windows bat with it seeming to load okay (haven't tried gui.bat again yet).

cool1

torch 2.5 works i don't know what you are doing :) open a cmd and run bat file from that cmd to see message

Furkan Gözükara

Thanks. I tried that and it quickly exits with the same error messages (too fast to read but they look the same as the ones shown when using gui.bat). That's after re-installing and using the torch 2.5 speed up bat file before it. If I re-install and don't click the "torch 2.5 dev huge speed up" clicking on gui.bat works and loads and says it's running on a particular ip address (like the standard one). Though I assume it will be a lot slower than if torch 2.5 could work (but it doesn't seem compatible with it/the requirements).

cool1

you have to use Windows_Start_Kohya_SS.bat it has skip args

Furkan Gözükara

Thanks. I'll try going through that other video. I tried re-installing everything (including the "torch 2.5 dev huge speed up") and when trying to run the gui.bat it said things like "WARNING Package wrong version: torch 2.5.0+cu124 required 2.4.1+cu124 09:01:34-225354 INFO Installing package: torch==2.4.1+cu124" So it seems like torch 2.5 (that's supposed to give the huge speed up) isn't compatible with it, and the requirements want a lower torch version.

cool1

try reinstall. also i would follow this tutorial and reinstall kohya : https://youtu.be/DrhUHnYfwC0

Furkan Gözükara

I tried installing this Kohya (I have the other) and I ran the "install torch 2.5 dev huge speed up". Then when trying to run gui.bat it says "the procedure entry point ?set_tracing@SavedTensorDefaultHooks... could not be located in the dynamic link library g:\Kohy\...site-packages\torchvision\_C.pyd" and then in the command window "operator torchvision::nms does not exist". What's the easiest fix? Should I re-install without that speed up .bat file (or will that be too slow at running)?

cool1

awesome

Furkan Gözükara

hi open a cmd window and type bat file name and run from there, so we can see error reason. very likely you are missing requirements : https://youtu.be/DrhUHnYfwC0 also dont run as administrator

Furkan Gözükara

hi, when i run the windows install step 1 and i enter 1 to install the gui the bat closes straight away?

Alex puliatti

thanks, it really works when i extend the virtual ram!

Kun HUANG

sure just save as fp16 and reduce lora network dimension to 32 from 128, this will reduce size from 2gb to 250 MB

Furkan Gözükara
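
A rough way to see where the "2gb to 250 MB" figure in the reply above can come from: a LoRA file grows roughly linearly with the network dimension (rank) and with the bytes stored per weight. The sketch below assumes the 2 GB file was saved as 32-bit floats at dimension 128, which the thread does not state; the function name is hypothetical and file metadata is ignored.

```python
def estimate_lora_size_mb(base_size_mb, base_dim, new_dim, base_bytes=4, new_bytes=2):
    """Scale a known LoRA file size to a new rank and precision.

    Assumes size is proportional to rank * bytes-per-weight, which holds
    approximately because each LoRA pair stores rank * (in + out) weights.
    """
    return base_size_mb * (new_dim / base_dim) * (new_bytes / base_bytes)


# 4x fewer weights (128 -> 32) and 2x fewer bytes each (fp32 -> fp16) = 8x smaller
print(estimate_lora_size_mb(2000, base_dim=128, new_dim=32))  # 250.0
```

By the same estimate, getting down to the 60-70 MB range mentioned in the question would need a still lower dimension; whether quality holds at that size is something only a test training can show.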

Hello! Thanks to you, I'm really getting a lot of help. I understand that quality is important, but it seems that the flux LoRAs of about 60-70 MB shared on civitai also have pretty good quality. Even when generating with fp16, a size of over 1 GB is troublesome. Is there any way to generate a LoRA with a size of around 70 MB like this? Or would it be impossible with Kohya? https://www.reddit.com/r/StableDiffusion/comments/1f523bd/good_flux_loras_can_be_less_than_45mb_128_dim/

SUNG SHU LIN

Very likely out of RAM error. how much RAM you have? setup 64 GB virtual ram it should fix it

Furkan Gözükara

I don't know what this error means:
enable full bf16 training.
Traceback (most recent call last):
  File "D:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "D:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:/kohya_ss/outputs/kun\\model/config_dreambooth-20241006-150903.toml', '--cpu_offload_checkpointing']' returned non-zero exit status 3221225477.
15:09:23-564854 INFO Training has ended.

Kun HUANG
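
The exit status in the error above is easier to search for in hexadecimal. A minimal sketch (the helper name is hypothetical): 3221225477 is 0xC0000005, the Windows status code for an access violation. That code only says the Python process crashed on a bad memory access; it does not name the cause, so the larger page file suggested in the reply is a remedy reported in this thread rather than something the code itself proves.

```python
def to_ntstatus_hex(exit_status: int) -> str:
    """Format a Windows process exit status as an 8-digit hex NTSTATUS value."""
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


print(to_ntstatus_hex(3221225477))  # 0xC0000005
```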

5-10 minute usually. you dont like discord but we have top staff of massed compute in discord we can ping :D

Furkan Gözükara

Trying your Massed Compute method out and it seems to be hanging forever on initializing. It's been around 5 minutes now. How long does that typically take? Oh, and apologies, but I dislike Discord with a fiery passion. Worst interface ever.

Charles Leo

nice give it a try if works.

Furkan Gözükara

awesome

Furkan Gözükara

Thanks. If I get a spare moment I might give it a test run even if at lower epochs to see if it breaks. I found this clip by City96 to be better at handling prompts but unsure how. It handles here.

Charles Leo

I did run those commands from a super user. Fixed for now, I have no idea what the problem was. But thanks :)

Oleh Kopyl

Thank you very much :)

Oleh Kopyl

it appears you are using your own ubuntu installation. you need to add sudo to your commands, e.g. sudo apt-get install python3-tk. also try the solutions given by chat gpt : https://poe.com/s/SKnYvYG4V8ngB8oKCT11

Furkan Gözükara

yes i trained but works really bad. now i am following this model : https://www.reddit.com/r/SECourses/comments/1fuhi3f/openflux1_distillation_removed_normal_cfg_flux/

Furkan Gözükara

Can train with Flux Schnell ?

Louis Lee

When I run `./gui.sh --listen=0.0.0.0 --headless` i get this error: ``` Traceback (most recent call last): File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/utils.py", line 29, in import tkinter as tk # python3 ModuleNotFoundError: No module named 'tkinter' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/utils.py", line 36, in import Tkinter as tk # python2 ModuleNotFoundError: No module named 'Tkinter' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 15, in from . import utils as ut File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/utils.py", line 43, in raise ImportError("Unable to find tkinter package.") ImportError: Unable to find tkinter package. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/azureuser/kohya_ss/kohya_gui.py", line 4, in from kohya_gui.class_gui_config import KohyaSSGUIConfig File "/home/azureuser/kohya_ss/kohya_gui/class_gui_config.py", line 2, in from .common_gui import scriptdir File "/home/azureuser/kohya_ss/kohya_gui/common_gui.py", line 5, in from easygui import msgbox, ynbox File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/__init__.py", line 34, in from .boxes.button_box import buttonbox File "/home/azureuser/kohya_ss/venv/lib/python3.10/site-packages/easygui/boxes/button_box.py", line 18, in import global_state ModuleNotFoundError: No module named 'global_state' ``` Is there any way to fix it? Running `apt-get install python3-tk` or `apt-get install python-tk` does not help.

Oleh Kopyl
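
The traceback above starts from a missing tkinter module, and the later errors follow from that first failure. One way to confirm whether an install attempt actually fixed it is to test the import with the same interpreter Kohya uses. This is a minimal sketch, not part of Kohya; the function name is hypothetical, and it should be run with the Python from the kohya_ss venv rather than the system one.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the current interpreter can import the given module."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # If this prints False, python3-tk is still not visible to this interpreter.
    print("tkinter importable:", can_import("tkinter"))
```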

i think it is not suggested. base fp16 models are auto casted according to the settings so it doesnt matter during training

Furkan Gözükara

What might happen if you were to use a different clip model such as t5xxl_fp8_e4m3fn?

Charles Leo

thank you for understanding. yes discord is best.

Furkan Gözükara

No worries and I appreciate it. We all have shit days and moments. I’ll see if I can grab some screens tomorrow and hop on discord.

Charles Leo

by the way sorry for my tone if i sounded arrogant. i am really tired at the moment.

Furkan Gözükara

i really dont understand your questions. you really should join discord and show screenshots or a video of what you are doing, because it is impossible to know what you are doing from text. what that file does is replace the kohya bat file so that when you start the application it will not go back to torch 2.4 from torch 2.5. torch 2.5 speeds up the training on windows a lot. also if you need installation help, i connect to the pc of gold supporters and help them to set up

Furkan Gözükara

Oh FTR, that py file that's a leftover/not even needed and throws out errors via the bat is setting the incorrect Python version... 3.10.9

Charles Leo

Yes. I'm well aware (see above) and that's where I manually moved it from. But your bats do NOT install that into the directory and your output says it's missing... What does that file do/why does it exist? Is it not reasonable for someone to want to know what's being installed or if it's even needed? You said to pay attention to errors, but you haven't even addressed your own. You know, I defended you yesterday on Reddit but now I'm starting to see what others are saying after these interactions. Your tone is condescending and I'd appreciate it if you talked to me like you would want to be treated as well. You're not always going to be the smartest person in the room either. It's not much to ask for.

Charles Leo

I see what you're doing now and I'm frankly surprised there's that many new clueless people trying to generate bad manga at home. But hey you're trying to assist that denominator and I get it as well. I had hoped to circumvent hours of experimentations (as a background I've helped pioneer CUDA and rendering applications over 15 years ago which led to these AI developments) by following some of your tests. I know you're a busy person but it would be great to have a synopsis for more experienced users. I'll also admit that my experience is limited with training LORAs so I was hoping this would be a starter course. Mainly just looking for general settings and a conf. to work with. My big mistake was not realizing that there was a separate branch for Flux so stupid me.

Charles Leo

your error very obvious : https://pasteboard.co/hDtHytOoQNpT.png but please if you are not expert just stick to tutorial

Furkan Gözükara

No offense, but you've contradicted yourself. In your video, you specifically state to look for errors. Upon following your installers, the output (I believe twice) says that this particular file is missing in System32. HOWEVER, it is in your install folder. So I manually added it to the System32 directory only to find out that it breaks the Kohya gui.bat. You may want to consider removing those lines calling for this file as it's possibly a leftover from a previous script? I can't say for certain but it doesn't serve any purpose.

Charles Leo

1 i also dont know. 2 related to your folders. you are not following me. you need to follow tutorial.

Furkan Gözükara

Walking more through your installers. Two more things: 1. would you bother enabling NUMA? 2. I've seen this a few times and manually added it to my directory. Not sure what it does exactly or if it's even important: "python: can't open file 'C:\\Windows\\System32\\Fix_GUI_Bat.py'"

Charles Leo

ye i understand you but sadly it is impossible to cover everyone. so either you have to be experienced to figure out or follow me exactly :D

Furkan Gözükara

Okay, my oversight, as I wasn't using your step 1 (especially since some of us already had Kohya_ss installed prior). You're pulling/checking out a specific branch 'sd3-flux.1' which is probably the missing piece of the puzzle. I'm surprised that it's so many commits ahead of the Master (140 to be precise). Also, following your video and writings here, the order seems a bit mixed up, as I watched half your video first PRIOR to even downloading your configs, not even realizing that you're trying to automate most of the installs for other people, as I'm new here, but not to CG or AI in general. I'm more granular, so by skipping, I missed this piece. Two other things I'd add: one is that your bats assume a specific install location. I prefer to designate my own directory structure/naming convention. The other is that I'd prefer to not have duplicate flux or other models floating around on the same drive and would rather share one file across different applications. A more consistent organization of assets is one of the few reasons I started using Stability Matrix versus manual installations.

Charles Leo

Yes, but to save a little time, are you adding a radio/checkbox for Flux independently that’s not part of the formal Kohya_SS commit? Is it a separate fork that can be found?

Charles Leo

you can edit my installer and see what it does so easy.

Furkan Gözükara

I’ll try giving it another clean install or on another pc. One thing I will say is that I tend to install everything manually first with latest updates and only then load in your config file. I prefer to know ahead of time what I’m installing with other bats and sources.

Charles Leo

I also tried it manually directly from the latest repo prior to trying to install it via SM. I made sure to get the latest updates and even latest 2.5 torch. I took a screenshot but seems like this interface wont allow me to share it here.

Charles Leo

Stability Matrix wont work. you have to use our installer since we switch to accurate branch

Furkan Gözükara

Unfortunately it doesn’t. I even went to install it via Stability Matrix instead of installing Kohya_ss manually thinking the install didn’t go properly and even that version doesn’t provide the option.

Charles Leo

when you select flux dev1 model it appears - select it from your pc

Furkan Gözükara

Going a little crazy as I do not have the checkbox for Flux as you do in your LORA video. It shows v2, v_parameterization, and SDXL only. Everything is up to date.

Charles Leo

thank you so much for comment. it is a good question. if you have too many images i would say not increase. if you dont have many images like only 15 images you can increase as you said

Furkan Gözükara

why don't you try SwarmUI? its x/y/z plot system is amazing

Furkan Gözükara

I wonder if somebody could share (or point me to) a FLUX1.dev workflow for ComfyUI XY-plot to test the generated Loras with several epochs.

Jason Dawn

When using multi GPU (2x GPU A6000) in Massed Compute, do I have to adjust the LR for Lora by your formula slightly, going from 0.00005 to 0.00006 or 0.00007 maybe? Your formula says that I should not adjust the LR, because LR = ( 0.00005 * 2 / 2 ) stays the same LR, but should it not also be adjusted slightly to take the dividing / splitting on 2x GPU into account? Thank you for all your work and all your help here!

Chris

have you tried face enhance of the SUPIR it may help. both enable face and background enhance options

Furkan Gözükara

SUPIR tends to distort or inadequately process faces when they are not the central element of the image or when they are situated deeper in the image's depth of field. Let's hope the new Flux ControlNet Upscale is good. Trying a workflow with it borrowed from a chinese guy.

Robert Arsene

ye upscale is bad use SUPIR : https://youtu.be/OYxVEvDf284

Furkan Gözükara

Any easy way to upscale them in SwarmUI for better quality? Could not find a specific tutorial for Flux.

Robert Arsene

Better now. Thanks. Still trying to get the quality face generation you get. It works well for close-up, but when the subject is further or full body, it barely works.

Robert Arsene

set CFG = 1

Furkan Gözükara

Seems the issue is I only get blurry images, even with other prompts.

Robert Arsene

you must be doing something wrong. i am preparing new prompts and will show in more details hopefully in upcoming video

Furkan Gözükara

I am trying to use some of your test prompts with photo of in SwarmUI. I think this is what makes full body generation with better faces, right? I only get a blurry image and only the face is clear.

Robert Arsene

your rank 3 was probably corrupted due to loading into dreambooth tab :D 3.59 decent speed

Furkan Gözükara

I switched to Rank_4_17250MB_4_85_Second_IT.json it seems to do the job at 3.59s/it

djbuzz

My card is 4090 RTX and I took the rank3 18000 something config

djbuzz

Hi, I installed everything flawlessly, and I commend you for the perfect job you are doing. When I launch the training, I see this: 1/1950 [24:39<800:44:46, 1479.06s/it]

djbuzz
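
To put the progress line above in perspective, the remaining time is simply the remaining steps multiplied by the seconds per iteration. A small sketch (the function name is hypothetical) using the two speeds reported in this thread, 1479.06 s/it with the first config and 3.59 s/it after switching to the Rank 4 config:

```python
def remaining_hours(total_steps: int, done_steps: int, seconds_per_it: float) -> float:
    """Estimate the remaining training time in hours at a constant step speed."""
    return (total_steps - done_steps) * seconds_per_it / 3600


# Speed shown in the progress bar: matches its ETA of roughly 800 hours.
print(round(remaining_hours(1950, 1, 1479.06), 1))  # 800.7

# Speed reported after switching configs: under two hours for all 1950 steps.
print(round(remaining_hours(1950, 0, 3.59), 1))  # 1.9
```

A speed in the hundreds or thousands of seconds per iteration on a 24 GB card is a strong hint that something is misconfigured rather than merely slow.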

Thank you, will try

Chris

if the files exist it will work. i would use the 16 bit T5, it is a little bit better. you need to either select it manually or download it with the specific file name that our downloaders use

Furkan Gözükara

So generation is better quality with them? Because it also generates even when not actively selecting and enabling them in the UI.

Chris

you are welcome

Furkan Gözükara

Thank you

Chris

yes bnb not compatible dont waste time. i suggest use fp8 dev

Furkan Gözükara

When using swarmui and flux bnb-v4 model, loras dont work. can you reproduce that? it always gives error [ComfyUI-0/STDERR] RuntimeError: .to() does not accept copy argument

Chris

you need to put clip and vae file. it auto downloads fp8 T5

Furkan Gözükara

Do I have to put the t5 file, clip file and ae file in the corresponding folders and select them in SwarmUI when creating an image, or does it load them automatically? Or are those files not needed for creation? Is it better with those files, like the t5 file?

Chris

great

Furkan Gözükara

I tried to rename the folder and on rank 5 I'm trying it all the time. finally I changed the encoder from t5xxl_fp16 to t5xxl_fp8_e4m3fn and so far it seems to work

Rolf Steiner

your vram was not sufficient. either reduce your VRAM usage before starting training or use a lower VRAM config. and i see your paths have special chars (složka), it is a big red flag for AI apps. try rank 5, it is slow but best and should work fine on 16 gb

Furkan Gözükara
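
As a quick check for the "special chars" point in the reply above, a training path can be scanned for non-ASCII characters before starting. This is a minimal sketch with a hypothetical helper name; the thread does not establish that the path caused this particular failure (the log reports a CUDA out-of-memory error), so treat it as a precaution only.

```python
def non_ascii_chars(path: str) -> list[str]:
    """Return every non-ASCII character found in a path, in order of appearance."""
    return [ch for ch in path if not ch.isascii()]


print(non_ascii_chars("D:/Nová složka/model"))  # ['á', 'ž']
print(non_ascii_chars("C:/FluxTraining/img"))   # []
```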

INFO move text encoders to gpu flux_train_network.py:216 Traceback (most recent call last): File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\sd-scripts\flux_train_network.py", line 519, in trainer.train(args) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\sd-scripts\train_network.py", line 402, in train self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\sd-scripts\flux_train_network.py", line 218, in cache_text_encoder_outputs_if_needed text_encoders[1].to(accelerator.device) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\transformers\modeling_utils.py", line 2905, in to return super().to(*args, **kwargs) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1340, in to return self._apply(convert) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) [Previous line repeated 4 more times] File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply param_applied = fn(param) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert return t.to( torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB. GPU 0 has a total capacity of 16.00 GiB of which 10.91 GiB is free. Of the allocated memory 3.92 GiB is allocated by PyTorch, and 14.50 MiB is reserved by PyTorch but unallocated. 
If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Traceback (most recent call last): File "C:\Users\fotog\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\fotog\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Kohya_GUI_Flux_Installer_v36\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_v36\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_v36/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Nová složka\\model/config_lora-20240921-105913.toml']' returned non-zero exit status 1. 11:00:18-302466 INFO Training has ended.

Rolf Steiner

use latest zip file it will uninstall xformers

Furkan Gözükara

same issue

Rolf Steiner

Kohya GUI has LoRA merge feature but i doubt that it will work as you expect.

Furkan Gözükara

Hi, I've created 20 LoRAs with a dataset of around 50,000 frames taken from the FMVs of a specific video game. However, since the process would have been very long, I divided them into 20 LoRAs. I'd like to know if there's a way to combine all 20 LoRAs into a single one. Or perhaps some other solution. Otherwise, I'll have to use all 20 of them together to generate images, and I don't know if there will be any conflicts. If it's possible to combine them, I'd be happy to hear it. The theme is always the same since we're talking about FMV of a specific video game, so in theory, it should work? And be compatible with each other? I only used the trigger word without a prompt since I'm interested in the specific style. Then flux does the rest. To create the text files very quickly, I created a Python script that creates a text file for each frame with the same name and my chosen trigger word, and automatically calculates the number of frames in the folder. Best regards

The Room
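
The caption script described in the comment above could look roughly like this. It is a sketch under stated assumptions, not the commenter's actual script: the function name, the list of image extensions, and the example trigger word "ohwx" are all illustrative.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the frames in the dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def write_trigger_captions(folder, trigger_word: str) -> int:
    """Write one .txt caption per image, containing only the trigger word.

    Each caption file gets the same base name as its image.
    Returns the number of images found in the folder.
    """
    images = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    for image in images:
        image.with_suffix(".txt").write_text(trigger_word, encoding="utf-8")
    return len(images)


if __name__ == "__main__":
    import sys

    # Usage: python make_captions.py <frames folder> <trigger word>
    if len(sys.argv) == 3:
        count = write_trigger_captions(sys.argv[1], sys.argv[2])
        print(f"Wrote captions for {count} images")
```

Existing .txt files with the same base name are overwritten, so run it on a copy of the dataset if some frames already have hand-written captions.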

But with Torch 2.5 I see no performance improvement compared to 2.4.1. How can this be?

Bolli Hotshots

No it doesn't work same way as SDXL. Flux really tends to mix multi concepts and you are kind of trying that. We couldn't solve this issue yet sadly

Furkan Gözükara

Hello Furkan, first of all thank you for your incredible work and passion to teach us so much about Lora Training. I am diligently following and trying out your progress with FLUX training. So far my results have been very good. I started with 50 photos and then 100 photos. Finally I have now trained 3,400 photos as I did with SDXL. I train everything locally on my 4090. the training with 3,400 photos took almost 40 hours with 1 repeat and 10 epochs. The results with 3,400 are really good. But what I have noticed. I don't get the elements generated that I got with SDXL. I'll explain briefly what my goal is. I have 3,400 photos of a person in different outfits, locations, poses and expressions. The goal is to end up with a Lora that can reproduce all these elements. With SDXL I have achieved this with captioning. You say that Flux doesn't need detailed captions, but I still did it with the captions from my SDXL set. When I enter them during generation I don't get the results I was hoping for. My question is, does the training in Flux work the same way as in SDXL? Which way would you choose so that I can generate all elements of the training photos?

puk

please download latest zip file and run torch 2.5, it will uninstall xformers and dont select xformers use configs as they are

Furkan Gözükara

hello Furkan, im dealing with this error second day, need help. i have 3070 ti (laptop). 00:35:29-062276 INFO Copy C:/PavelGoliksteal to C:/FluxTraining\img/1_ohwx style... 00:35:29-093924 INFO Regularization images directory is missing... not copying regularisation images... 00:35:29-095832 INFO Done creating kohya_ss training folder structure at C:/FluxTraining... 00:35:55-757651 INFO Start training LoRA Flux1 ... 00:35:55-758655 INFO Validating lr scheduler arguments... 00:35:55-759657 INFO Validating optimizer arguments... 00:35:55-760659 INFO Validating lora type is Flux1 if flux1 checkbox is checked... 00:35:55-761662 INFO Validating C:/FluxTraining\log existence and writability... SUCCESS 00:35:55-763667 INFO Validating C:/FluxTraining\model existence and writability... SUCCESS 00:35:55-764670 INFO Validating C:/WORK/Apps/Kohya/flux1-dev.safetensors existence... SUCCESS 00:35:55-765673 INFO Validating C:/FluxTraining\img existence... SUCCESS 00:35:55-766675 INFO Folder 1_ohwx style: 1 repeats found 00:35:55-767679 INFO Folder 1_ohwx style: 15 images found 00:35:55-768680 INFO Folder 1_ohwx style: 15 * 1 = 15 steps 00:35:55-769683 INFO Regulatization factor: 1 00:35:55-770686 INFO Train batch size: 1 00:35:55-772169 INFO Gradient accumulation steps: 1 00:35:55-772673 INFO Epoch: 150 00:35:55-773672 INFO max_train_steps (15 / 1 / 1 * 150 * 1) = 2250 00:35:55-774675 INFO stop_text_encoder_training = 0 00:35:55-779166 INFO lr_warmup_steps = 0 00:35:55-782174 INFO Saving training config to C:/FluxTraining\model\FirstTryGolik_20240919-003555.json... 
00:35:55-784179 INFO Executing command: C:\WORK\Apps\Kohya\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/WORK/Apps/Kohya/kohya_ss/sd-scripts/flux_train_network.py --config_file C:/FluxTraining\model/config_lora-20240919-003555.toml 2024-09-19 00:36:03 WARNING WARNING[XFORMERS]: xFormers can't load C++/CUDA _cpp_lib.py:148 extensions. xFormers was built for: PyTorch 2.4.0+cu121 with CUDA 1201 (you have 2.0.1+cu117) Python 3.10.11 (you have 3.10.11) Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-x formers) Memory-efficient attention, SwiGLU, sparse and more won't be available. Set XFORMERS_MORE_DETAILS=1 for more details Traceback (most recent call last): File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\xformers\checkpoint.py", line 53, in from torch.utils.checkpoint import SAC_IGNORED_OPS as _ignored_ops # type: ignore ImportError: cannot import name 'SAC_IGNORED_OPS' from 'torch.utils.checkpoint' (C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module return importlib.import_module("." 
+ module_name, self.__name__) File "C:\Users\sasab\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 1006, in _find_and_load_unlocked File "", line 688, in _load_unlocked File "", line 883, in exec_module File "", line 241, in _call_with_frames_removed File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\loaders\ip_adapter.py", line 34, in from ..models.attention_processor import ( File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 31, in import xformers File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\xformers\__init__.py", line 12, in from .checkpoint import ( # noqa: E402, F401 File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\xformers\checkpoint.py", line 57, in from torch.utils.checkpoint import _ignored_ops # type: ignore ImportError: cannot import name '_ignored_ops' from 'torch.utils.checkpoint' (C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py) The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module return importlib.import_module("." 
+ module_name, self.__name__) File "C:\Users\sasab\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 1006, in _find_and_load_unlocked File "", line 688, in _load_unlocked File "", line 883, in exec_module File "", line 241, in _call_with_frames_removed File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 24, in from ...loaders import FromSingleFileMixin, IPAdapterMixin, LoraLoaderMixin, TextualInversionLoaderMixin File "", line 1075, in _handle_fromlist File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__ module = self._get_module(self._class_to_module[name]) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in _get_module raise RuntimeError( RuntimeError: Failed to import diffusers.loaders.ip_adapter because of the following error (look up to see its traceback): cannot import name '_ignored_ops' from 'torch.utils.checkpoint' (C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py) The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\WORK\Apps\Kohya\kohya_ss\sd-scripts\flux_train_network.py", line 13, in from library import flux_models, flux_train_utils, flux_utils, sd3_train_utils, strategy_base, strategy_flux, train_util File "C:\WORK\Apps\Kohya\kohya_ss\sd-scripts\library\flux_train_utils.py", line 17, in from library import flux_models, flux_utils, strategy_base, train_util File "C:\WORK\Apps\Kohya\kohya_ss\sd-scripts\library\train_util.py", line 52, in from diffusers import ( File "", line 1075, in _handle_fromlist File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", 
line 701, in __getattr__ value = getattr(module, name) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__ value = getattr(module, name) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__ module = self._get_module(self._class_to_module[name]) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in _get_module raise RuntimeError( RuntimeError: Failed to import diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion because of the following error (look up to see its traceback): Failed to import diffusers.loaders.ip_adapter because of the following error (look up to see its traceback): cannot import name '_ignored_ops' from 'torch.utils.checkpoint' (C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py) Traceback (most recent call last): File "C:\Users\sasab\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\sasab\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\WORK\Apps\Kohya\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\WORK\Apps\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\WORK\\Apps\\Kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/WORK/Apps/Kohya/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 
'C:/FluxTraining\\model/config_lora-20240919-003555.toml']' returned non-zero exit status 1. 00:36:05-159564 INFO Training has ended

Sasha

Yes, that is a good speed. Linux is faster, but this is what we have on Windows. It is about 100% faster than an RTX 3090.

Furkan Gözükara

Training on 2.5 is still slow. Should it be faster on Windows with a 4090 GPU? 2024-09-17 12:07:26 INFO epoch is incremented. current_epoch: 70, epoch: 71 train_util.py:672 steps: 36%|████████████████████████████████████████████████▎ | 13490/38000 [14:09:23<25:43:15, 3.78s/it, avr_loss=0.466] epoch 72/200

Bolli Hotshots

I have emailed you

JASH MEHTA

First, thank you. The duration sounds right to me. What is your step speed (how many seconds per iteration), and which rank are you using?

Furkan Gözükara

You are a true master. I am currently training a LoRA of my portrait with Flux, using 20 photos and 200 epochs. I train locally on my RTX 4090 graphics card. Is it normal that the LoRA training takes 4 to 5 hours, or did I miss an optimization script or setting?

Judi Godvliet
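
As a rough sanity check on the duration asked about above, the step count and wall-clock time can be estimated from the dataset size. This is a minimal sketch, not Kohya's own code: it mirrors the arithmetic the Kohya GUI prints in its log elsewhere in this thread (`max_train_steps (15 / 1 / 1 * 200 * 1) = 3000`), the function names are made up for illustration, and the 3.78 s/it figure is simply the speed another commenter reported on an RTX 4090.

```python
import math


def max_train_steps(images, repeats, epochs, batch_size=1, grad_accum=1, reg_factor=1):
    """Mirror the log line: images * repeats / batch / grad_accum * epochs * reg_factor."""
    steps_per_epoch = images * repeats
    return math.ceil(steps_per_epoch / batch_size / grad_accum * epochs * reg_factor)


def estimated_hours(total_steps, seconds_per_step):
    """Convert a step count and a measured step speed into hours."""
    return total_steps * seconds_per_step / 3600


# 20 photos, folder repeat count 1, 200 epochs, batch size 1 (values from the question).
steps = max_train_steps(images=20, repeats=1, epochs=200)
print(steps)                                    # 4000
print(round(estimated_hours(steps, 3.78), 1))   # 4.2
```

At that assumed speed, 4000 steps take a little over four hours, which is consistent with the 4 to 5 hours reported.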

We don't use xformers during training.

Furkan Gözükara

Clip L text encoder improves significantly. T5 is very minimal impact in my tests.

Furkan Gözükara

Strange, I have now upgraded to 2.5, but performance on Windows feels slower than with 2.4.1. Is an xformers upgrade also needed?

Bolli Hotshots

What was the impact text encoder training on vs off? If you got a chance to compare.

Manpreet Singh

Download the latest zip file; it has another bat file. Look at all the file names.

Furkan Gözükara

Please link, your Update_Kohya_and_Fix_FLUX_Step2.bat just installs torch 2.4.1...

Bolli Hotshots

sadly hard to debug with this info

Furkan Gözükara

You need to upgrade to Torch 2.5. Use my bat file; it speeds up training.

Furkan Gözükara

Email me the entire logs; this doesn't show the error: monstermmorpg@gmail.com

Furkan Gözükara

hey furkan thanks for the tutorial, I have setup everything correctly but when the training starts it is just getting stuck, what should I do? I have a RTX 4090, also I am using rank 4 config Training has ended. 15:08:48-651407 INFO Start training Dreambooth... 15:08:48-652411 INFO Validating lr scheduler arguments... 15:08:48-652411 INFO Validating optimizer arguments... 15:08:48-653448 INFO Validating C:/Forge/webui_forge_cu121_torch21/webui/models/Stable-diffusion/jash-flux/log existence and writability... SUCCESS 15:08:48-654448 INFO Validating C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model existence and writability... SUCCESS 15:08:48-654448 INFO Validating C:/Users/admin/Downloads/flux_dev.safetensors existence... SUCCESS 15:08:48-655448 INFO Validating C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img existence... SUCCESS 15:08:48-656447 INFO Folder 1_ohwx man: 1 repeats found 15:08:48-657447 INFO Folder 1_ohwx man: 15 images found 15:08:48-657447 INFO Folder 1_ohwx man: 15 * 1 = 15 steps 15:08:48-658447 INFO Regulatization factor: 1 15:08:48-658447 INFO Total steps: 15 15:08:48-659446 INFO Train batch size: 1 15:08:48-659446 INFO Gradient accumulation steps: 1 15:08:48-660448 INFO Epoch: 200 15:08:48-660448 INFO max_train_steps (15 / 1 / 1 * 200 * 1) = 3000 15:08:48-661448 INFO lr_warmup_steps = 0 15:08:48-662593 INFO Saving training config to C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model\jash_flux_20240916-150848 .json... 
15:08:48-663460 INFO Executing command: C:\Kohya_GUI_Flux_Installer_v33\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/Kohya_GUI_Flux_Installer_v33/kohya_ss/sd-scripts/flux_train.py --config_file C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model/config_dreambooth-2024091 6-150848.toml C:\Kohya_GUI_Flux_Installer_v33\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( C:\Kohya_GUI_Flux_Installer_v33\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( C:\Kohya_GUI_Flux_Installer_v33\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( 2024-09-16 15:08:57 INFO Loading settings from train_util.py:4230 C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model/co nfig_dreambooth-20240916-150848.toml... INFO C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model/co train_util.py:4249 nfig_dreambooth-20240916-150848 2024-09-16 15:08:57 INFO Using DreamBooth method. flux_train.py:101 INFO prepare images. 
train_util.py:1807 INFO get image size from name of cache files train_util.py:1745 100%|███████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 2137.19it/s] INFO set image size from cache files: 15/15 train_util.py:1752 INFO found directory train_util.py:1754 C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh wx man contains 15 image files WARNING No caption file found for 15 images. Training will continue without captions for train_util.py:1785 these images. If class token exists, it will be used. / 15枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャ プションなしで学習を続行します。class tokenが存在する場合はそれを使います。 WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1792 wx man\a_photo_of_ohwxman (1).JPG WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1792 wx man\a_photo_of_ohwxman (10).JPG WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1792 wx man\a_photo_of_ohwxman (11).JPG WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1792 wx man\a_photo_of_ohwxman (12).JPG WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1792 wx man\a_photo_of_ohwxman (13).JPG WARNING C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_oh train_util.py:1790 wx man\a_photo_of_ohwxman (14).JPG... and 10 more INFO 15 train images with repeating. train_util.py:1848 INFO 0 reg images. 
train_util.py:1851 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1856 INFO [Dataset 0] config_util.py:570 batch_size: 1 resolution: (1024, 1024) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\img\1_o hwx man" image_count: 15 num_repeats: 1 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: ohwx man caption_extension: .txt INFO [Dataset 0] config_util.py:576 INFO loading image sizes. train_util.py:880 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00 flux_utils.py:69 INFO [Dataset 0] train_util.py:2328 INFO caching latents with caching strategy. train_util.py:988 INFO checking cache validity... train_util.py:998 100%|███████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 2246.87it/s] INFO no latents to cache train_util.py:1038 C:\Kohya_GUI_Flux_Installer_v33\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884 warnings.warn( You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. 
If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 2024-09-16 15:08:58 INFO Building CLIP flux_utils.py:74 INFO Loading state dict from flux_utils.py:167 C:\ComfyUI_windows_portable\ComfyUI\models\clip\clip_l.safetensors INFO Loaded CLIP: flux_utils.py:170 INFO Loading state dict from flux_utils.py:215 C:\ComfyUI_windows_portable\ComfyUI\models\clip\t5xxl_fp16.safetensors INFO Loaded T5xxl: flux_utils.py:218 2024-09-16 15:09:13 INFO [Dataset 0] train_util.py:2349 INFO caching Text Encoder outputs with caching strategy. train_util.py:1111 INFO checking cache validity... train_util.py:1117 100%|███████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 2295.40it/s] INFO no Text Encoder outputs to cache train_util.py:1139 INFO cache Text Encoder outputs for sample prompt: flux_train.py:235 C:\Forge\webui_forge_cu121_torch21\webui\models\Stable-diffusion\jash-flux\model\sam ple/prompt.txt INFO Building Flux model dev flux_utils.py:45 INFO Loading state dict from C:/Users/admin/Downloads/flux_dev.safetensors flux_utils.py:52 INFO Loaded Flux: flux_utils.py:55 FLUX: Gradient checkpointing enabled. CPU offload: False INFO enable block swap: double_blocks_to_swap=0, single_blocks_to_swap=0 flux_train.py:272 number of trainable parameters: 11901408320 prepare optimizer, data loader etc. INFO use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, train_util.py:4541 'warmup_init': False, 'weight_decay': 0.01} WARNING because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 / train_util.py:4569 max_grad_normが設定されているためclip_grad_normが有効になります。0に設定して無効に したほうがいいかもしれません WARNING constant_with_warmup will be good / train_util.py:4573 スケジューラはconstant_with_warmupが良いかもしれません enable full bf16 training. 
running training / 学習開始 num examples / サンプル数: 15 num batches per epoch / 1epochのバッチ数: 15 num epochs / epoch数: 200 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 3000 steps: 0%| | 0/3000 [00:00 2024-09-16 15:09:33 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:672

JASH MEHTA

Hi, the Torch 2.4.1 performance issue under Windows is really severe. Is it possible to downgrade to an older but better-performing torch version, e.g. 2.3.1?

Bolli Hotshots

In total, I tried the 4x FAST, 4x SLOW and the Rank_1_T5_XXL_39200MB_9_46_Second_IT profiles. The GPU id's were set on the 4x profiles and I set it on the third. The two 4x profiles froze after "moving encoders to GPU" and the third froze before that point. I didn't have time to investigate, but I didn't have luck with any today.

Oinksauce

Not important; we don't use xformers. Also, on Windows use Install_Torch_2_5_Dev_Huge_Speed_Up.bat; it will uninstall xformers and upgrade torch to a much faster version.

Furkan Gözükara

Hey Furkan, Apologies if this has already been addressed, but during the installation step, I encountered an error that says: ERROR: Could not find a version that satisfies the requirement xformers==0.0.27.post2 (from versions: 0.0.28.post1) ERROR: No matching distribution found for xformers==0.0.27.post2 Do you have any insight on how to fix this?

GhostDance

good tips thank you

Furkan Gözükara

Take a look at the overall cloudflared tunnel feature to create a bridge rather than the gradio share links. If you need to move files around you can temporarily zip them up and put them in one of the output folders and access them via the tunnel. It also supports much better security options

Bobbie

you are welcome let me know results

Furkan Gözükara

I did not try other configs - will try another rank 1 config and see if that works. TY

Oinksauce

Yes, 512 would be faster but lower quality. The size depends on whether you save as float or fp16; training Clip L and T5 also increases the size. Moreover, the size also depends on the LoRA network rank.

Furkan Gözükara

Did you try other configs? I did the multi-GPU training on Massed Compute. Did you set the GPU IDs? Since it doesn't give an error, I don't know what is wrong. If you upgrade to Gold membership, I can connect to your PC and try to help.

Furkan Gözükara

hmm - Using the 4x GPU SLOW profile, the training keeps hanging at "move encoders to gpu" Using v33. I've tried 3x and on two different Pods on Runpod.

Oinksauce

I tried with virtual memory, around 92GB of it, which I read was the recommended max limit, and I have a Samsung 990 Pro SSD. But the outcome still seems to be the same: the moment it gets to loading "cache text encoder outputs", the training ends. Memory usage while I start training still behaves the same way: it goes up to 100%, drops down to 70%, and then stops. Is there anything else I can do to solve this?

pratham

I showed how to set up virtual RAM in the video; check that part.

Furkan Gözükara

You mean the system RAM? I have got 32gb of ram and also using an RTX 4070tiSuper, I will try resolving this on my own based on your reply but if it fails i will try upgrading to gold :)

pratham

With rank9 file, my loras are about 500mb in size, is that correct? seems a bit small? Also, could i reduce the resolution of the source images to 512px and set this in the training settings? would it be faster in training and use less vram? I have a 3060 12GB card.

Chris

You can save as FP16, which cuts the size in half; other than that, there is no way to reduce the size without sacrificing quality. They are big because Clip L + T5 XXL are also added to the LoRA, the LoRA rank is 128, and we save as Float for max quality :)

Furkan Gözükara
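
To make the size discussion above concrete, here is a minimal sketch of how LoRA file size scales with rank and save precision. It assumes the standard LoRA layout (each adapted linear layer gets a `rank x in` and an `out x rank` matrix) and ignores file metadata; the layer list is hypothetical and is not the real FLUX, Clip L, or T5 XXL layer inventory, so the absolute numbers are illustrative only.

```python
# Bytes per stored weight for each save precision.
BYTES_PER_PARAM = {"float": 4, "fp16": 2, "bf16": 2}


def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA pair: down (rank x in) plus up (out x rank)."""
    return rank * (in_features + out_features)


def lora_size_mib(layer_shapes, rank, save_precision):
    """Approximate weight payload in MiB for a list of (in, out) layer shapes."""
    total = sum(lora_params(i, o, rank) for i, o in layer_shapes)
    return total * BYTES_PER_PARAM[save_precision] / (1024 ** 2)


# Hypothetical model: 100 adapted linear layers of shape 3072 x 3072.
layers = [(3072, 3072)] * 100

for precision in ("float", "fp16"):
    print(precision, lora_size_mib(layers, rank=128, save_precision=precision))
```

In this model, halving the precision or halving the rank each halves the file, and every extra adapted layer (for example text encoder layers) adds to the total, which matches the factors listed in the comments.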

Ok to ask questions here? I'm curious why the LORA's produced here are so large? I'm getting 4.6GB for these, but LORA's on CIVITAI are typically 10-100mb in size. Is there a way to make them smaller without sacrificing quality?

Oinksauce

very nice then use Rank_1_T5_XXL_39200MB_9_46_Second_IT.json

Furkan Gözükara

I train on runpod H100 with 80GB. I don't mind the cost I'm just looking for highest likeness, I also don't mind slightly overfitting. Thanks :)

Victor Mustin

Which GPU do you have? I can tell you based on that: your GPU and its VRAM.

Furkan Gözükara

I was referring to the 256 dataset training, but I have to admit I can't really sort through all these links. But yes say I have 15 images, which is the config that I should use ? The one that maximises likeness

Victor Mustin

Are you using SwarmUI and the settings I used in the video when generating images after training the LoRA? If not, please try my config and see if it makes improvements. I couldn't open your image; you can email me: monstermmorpg@gmail.com

Furkan Gözükara

No, it wouldn't matter much. ohwx is still a rare token for Clip L and possibly T5 XXL, so it is safe to use.

Furkan Gözükara

Hello, I've tried using your settings to train a LoRA two different times and I keep getting images that look off (https://postimg.cc/nCV44Cq2). The likeness is there, but you can see from the image that the quality is bad and the background is way too blurred. Without the LoRA, the images look great. What am I doing wrong?

sonofabear

Hey Furkan, thanks for your work. I wanted to ask, is the ohwx token still working for flux? Wasn’t this created for SD 1.5? Check out this post, that’s how they pulled the rare tokens for SD 1.5 https://www.reddit.com/r/StableDiffusion/comments/zc65l4/rare_tokens_for_dreambooth_training_stable/ If they’re different for flux, maybe it could lead to better results in the model, right?

Axel Blaze

Well, this entire research was done on a 15-image dataset. Or do you mean even smaller? It is already shared in the post, as you can see.

Furkan Gözükara

Incredible! What would be some guidelines on what to change if our dataset is smaller ?

Victor Mustin

Download v33 and run the upgrade now; it will install torch 2.5.

Furkan Gözükara

I am going to update the installers now. I found it: pip3 install torch==2.5.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu124

Furkan Gözükara

You mentioned in your post that torch 2.5 is much faster than 2.4; what about 2.6? This line in your script probably installs the latest nightly build, which is why I am getting 2.6: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124 --upgrade Is there a way I can install version 2.5 instead? I tried to change the script on my own but had no success. EDIT: I tested 2.4 and 2.6 and got the same speed; it would be really nice if there were a way to install version 2.5. Regardless, great tutorial, thanks.

smogas
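
Since the thread above turns on which torch version an installer actually left in the venv, a quick check can save guesswork. The sketch below is an assumption-laden helper, not part of the Kohya installers: it reads the installed version with the standard-library `importlib.metadata` (so torch itself is not imported) and compares only major.minor. The function names are made up for illustration, and it must be run with the venv's own `python.exe` to inspect that environment.

```python
from importlib import metadata


def major_minor(version):
    """Reduce '2.5.0+cu124' or '2.6.0.dev20240915+cu124' to a (major, minor) tuple."""
    public = version.split("+", 1)[0]      # drop the local build tag such as +cu124
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def installed_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not installed in this environment")
else:
    print(f"torch {found}; is 2.5.x:", major_minor(found) == (2, 5))
```

A nightly build would show up here as something like `2.6.0.dev...`, which is how the 2.6 install described above can be told apart from a pinned 2.5.0.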

You can use rank 4 for a huge speedup. Other than that, you would have to reduce the resolution, which degrades quality a lot.

Furkan Gözükara

I used the script to install version 2.5, but it installed version 2.6. I loaded rank 3 on my 4090 and got only 4.5it/s. I have a dataset of 57 uncaptioned images, if that matters. How can I increase the speed?

smogas

If your folder name is 1_ohwx man, then it is okay and you can ignore it.

Furkan Gözükara

Hi Furkan, thanks for the fix, it is working now! However, I see class_tokens: ohwx_man (with an underscore between ohwx and man) instead of class_tokens: ohwx man (which only has a space in between in your video tutorial). Did I do anything wrong here?

TriVectorX
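
For readers puzzled by the same thing, the training folder name follows the `<repeats>_<class tokens>` convention, as the logs in this thread show (`Folder 1_ohwx man: 1 repeats found`, `class_tokens: ohwx man`). The helper below is only an illustrative sketch of that convention, with a hypothetical function name; it is not the parser used by sd-scripts.

```python
def parse_dataset_folder(folder_name):
    """Split '<repeats>_<class tokens>' at the first underscore."""
    repeats, sep, class_tokens = folder_name.partition("_")
    if not sep or not repeats.isdigit() or not class_tokens:
        raise ValueError(f"expected '<repeats>_<class tokens>', got {folder_name!r}")
    return int(repeats), class_tokens


print(parse_dataset_folder("1_ohwx man"))   # (1, 'ohwx man')
print(parse_dataset_folder("1_ohwx_man"))   # (1, 'ohwx_man')
```

Only the first underscore separates the repeat count from the tokens, so a folder named `1_ohwx man` yields the class tokens `ohwx man`, while any further underscore stays part of the tokens.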

Oh I see, thank you so much I will definitely watch that one and try again later. The chosen config file is Rank_3_18950MB_9_05_Second_IT, as I understand 4090 can handle this.

TriVectorX

Hello, thank you. This happens when you don't set up accelerate. Can you follow this video, and which config file are you using? https://youtu.be/adVhm9aI9Gc

Furkan Gözükara

Hi Furkan, thank you very much for the great tutorial! When I start the training, it stops at the step of caching latents 0/15, do you have any idea of how to fix this? I'm training on my windows PC with 4090 and I have 128GB of RAM.

TriVectorX

I'll try that tonight, cheers.

Sonivas Sx

🥲

The Room

It doesn't work atm sadly

Furkan Gözükara

1:1 works best. So first train a model that way to have a baseline; then you can try multiple aspect ratios and resolutions.

Furkan Gözükara

Do we really need a 1:1 (512x512) aspect ratio for the dataset images when training a character, or can we have different dimensions in the same dataset? Is 2:1 OK, or does it yield worse results?

Sonivas Sx

How do I pause and resume LoRA training with Kohya? Is there a tutorial? Best regards

The Room

Yes, the stripes are very obvious. I use SwarmUI, 16-bit precision, and the ipndm sampler. Can you test with the same settings? Also use 40 steps. And at what resolution do you generate images?

Furkan Gözükara

Thank you for letting me know; I hadn’t noticed. I’ve just updated the link. You can now view the dataset and the results generated with LoRA!

Rogério Bernardes

I just checked your images again and they link to the same place. Can you upload your training images to another link?

Furkan Gözükara

I trained on Massed Compute; it wasn't done locally. I used 2 RTX A6000 GPUs and set the rank to 1 (for 1 GPU).

Rogério Bernardes

Can you repeat the training on Massed Compute, train in fp16, and compare? Did you train on your own computer? Which rank? I suspect this is the cause.

Furkan Gözükara

Hello! First of all, thank you so much for the guide. It’s been fantastic, and I've learned a lot from you. I’ve just created my first LoRA — I had never made one before, and Flux seems amazing for this purpose. I'm quite happy with the result, but I noticed that when I use it, the images have faint stripes in the lower-central area. The images in the dataset were of good quality and sized 1024x1024. Could this issue be related to the quality of the dataset? Could anyone help me with this? I'll include some sample images for reference. Dataset -> https://imgur.com/gallery/pOGRXed My Images with LoRA -> https://imgur.com/gallery/faint-stripes-lora-rWwlbjV What could be causing this? Thank you in advance!

Rogério Bernardes

Nothing special happens during training; the model internally captions the images fully. That is why it also removes my eyeglasses if I don't mention them in the prompt.

Furkan Gözükara

lol good call! any idea what may be causing the issue on the training/Lora side?

wylason

In that case, also add black fur to your prompts. In my case I add eyeglasses to get more likeness :D And yes, Flux is next level.

Furkan Gözükara

This guide is great! have gotten some great results with just a few images.... blows SDXL/Dreambooth out of water for both consistency and likeness.... One question though... I've done some trainings on my dog (using 'dog' as the class name) and while it seems to get general likeness and features down.... the color/coat of the dog is completely off (chooses color randomly, while my dog has black fur)... any ideas? Does it have to do with the Flux model itself just not being great with animals/dogs?

wylason

true

Furkan Gözükara

Sounds good. Unfortunately, I see that even fp8 has some slight degradation compared to full model, but it's definitely better than schnell. Thanks!

gawdman

Kohya can't train quantized models, and I don't know of any trainer that can.

Furkan Gözükara

Fp8 works perfectly with LoRAs, so use fp8 + SwarmUI.

Furkan Gözükara

Hi Furkan, Great tutorials. I've been able to follow you and use your configs to do some trainings with good results with the Flux1.dev model. However, I'm curious if you looked into seeing training on the quantized models. The reason is that the LoRAs trained on the Flux1.dev model are OK with the full model, but they perform so-so on the quantized models (which makes sense). I can run the full model on RunPod and MassedCompute just fine, but locally, I obviously can't load the entire Flux1.dev model without something like 32GB minimum VRAM, but I can run fp8 or schnell. Have you tried training LoRAs on those models?

gawdman

great you are welcome

Furkan Gözükara

Since I unchecked the following three checkboxes at once, I am not sure which one is responsible for the error, so I will verify each one once the LoRA training that is running now finishes: Cache Text Encoder Outputs, Cache Text Encoder Outputs to Disk, Memory Efficient Save. Anyway, I am glad that the LoRA training is running. Thank you so much!

shigeto

Yes, but it must have been on; otherwise you wouldn't have to turn off Cache Text Encoder Outputs and Cache Text Encoder Outputs to Disk, unless you have some other problem for some reason.

Furkan Gözükara

You mean Train T5-XXL model check box? I am using your Rank_3_18950MB_9_05_Second_IT.json, so this was off all along.

shigeto

because you are not using our configs. you enabled T5 training.

Furkan Gözükara

Refresh the page; it works at the moment.

Furkan Gözükara

Hi, when I try to download the link Kohya_GUI_Flux_Installer_30.zip, it shows this: { "errors": [ { "code": 902, "code_name": "AttachmentNotFound", "detail": "Attachment with id 20420751 was not found.", "id": "755de518-de0b-513d-b721-4f07529ac096", "status": "404", "title": "Attachment was not found." } ] } How can I deal with it? Thank you.

Goethe Chilwell

Finally I could run this script. I'm not entirely sure, but it seems to have worked with the following turned off: Cache Text Encoder Outputs, Cache Text Encoder Outputs to Disk, Memory Efficient Save.

shigeto

24gb

shigeto

Very likely out of RAM. Set up virtual RAM. Also, what is your GPU? If you upgrade to Gold membership, I can connect to your PC and hopefully fix it.

Furkan Gözükara

hi, the tutorial has been very much helpful as a complete beginner, but in the end I'm still getting an error, i spent around a whole day trying to find out what might be wrong but it seems like i have hit a dead end. how do i fix this? here's what im getting in my cmd 16:19:53-972928 INFO headless: False 16:19:54-002440 INFO Using shell=True when running external commands... Running on local URL: http://127.0.0.1:7860 To create a public link, set `share=True` in `launch()`. 16:20:27-764954 INFO Loading config... 16:22:24-261564 INFO Start training LoRA Flux1 ... 16:22:24-262565 INFO Validating lr scheduler arguments... 16:22:24-263566 INFO Validating optimizer arguments... 16:22:24-264566 INFO Validating C:\trained_images\log existence and writability... SUCCESS 16:22:24-264566 INFO Validating C:\trained_images\model existence and writability... SUCCESS 16:22:24-265564 INFO Validating C:/Kohya_GUI_Flux_Installer_23/flux_files/flux1-dev.safetensors existence... SUCCESS 16:22:24-266566 INFO Validating C:\trained_images\img existence... SUCCESS 16:22:24-267564 INFO Folder 1_ohwx man: 1 repeats found 16:22:24-268564 INFO Folder 1_ohwx man: 28 images found 16:22:24-268564 INFO Folder 1_ohwx man: 28 * 1 = 28 steps 16:22:24-269566 INFO Regulatization factor: 1 16:22:24-269566 INFO Total steps: 28 16:22:24-270566 INFO Train batch size: 1 16:22:24-271564 INFO Gradient accumulation steps: 1 16:22:24-271564 INFO Epoch: 200 16:22:24-272563 INFO max_train_steps (28 / 1 / 1 * 200 * 1) = 5600 16:22:24-272563 INFO stop_text_encoder_training = 0 16:22:24-273566 INFO lr_warmup_steps = 0 16:22:24-275566 INFO Saving training config to C:\trained_images\model\Best_v2_20240909-162224.json... 
16:22:24-276566 INFO Executing command: C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/Kohya_GUI_Flux_Installer_23/kohya_ss/sd-scripts/flux_train_network.py --config_file C:\trained_images\model/config_lora-20240909-162224.toml C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch. @torch.library.impl_abstract("xformers_flash::flash_fwd") C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch. @torch.library.impl_abstract("xformers_flash::flash_bwd") C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( 2024-09-09 16:22:32 INFO Loading settings from train_util.py:4189 C:\trained_images\model/config_lora-20240909-162224.toml... 
INFO C:\trained_images\model/config_lora-20240909-162224 train_util.py:4208 2024-09-09 16:22:32 INFO t5xxl_max_token_length: 512 flux_train_network.py:144 C:\Kohya_GUI_Flux_Installer_23\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884 warnings.warn( You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 2024-09-09 16:22:33 INFO Using DreamBooth method. train_network.py:281 INFO prepare images. train_util.py:1803 INFO get image size from name of cache files train_util.py:1741 100%|████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00<00:00, 1399.78it/s] INFO set image size from cache files: 28/28 train_util.py:1748 INFO found directory C:\trained_images\img\1_ohwx man contains 28 image files train_util.py:1750 INFO 28 train images with repeating. train_util.py:1844 INFO 0 reg images. 
train_util.py:1847 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1852 INFO [Dataset 0] config_util.py:570 batch_size: 1 resolution: (1024, 1024) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "C:\trained_images\img\1_ohwx man" image_count: 28 num_repeats: 1 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: ohwx man caption_extension: .txt INFO [Dataset 0] config_util.py:576 INFO loading image sizes. train_util.py:876 100%|██████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:00 flux_utils.py:55 INFO prepare split model flux_train_network.py:99 2024-09-09 16:22:34 INFO load state dict for lower flux_train_network.py:106 INFO load state dict for upper flux_train_network.py:111 INFO prepare upper model flux_train_network.py:114 16:22:42-596679 INFO Training has ended.

pratham

That is your GPU, but how much system RAM do you have?

Furkan Gözükara

I use an RTX 4090. LoRA training on SDXL was working fine (in the main branch).

shigeto

No, it is not related to xformers. The second problem could be your RAM. How much do you have? Also enable virtual RAM, set it to 60 GB, restart, and try again.

Furkan Gözükara

I have downloaded it again since then, watched the YouTube tutorial carefully from the beginning, and performed the steps, but I still get the following error (I never used the Dreambooth tab). One thing I am wondering: during install, I got an error that xformers could not be installed. Is this related?
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_30\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/outputs/shigeto_flux/model/config_lora-20240909-184210.toml']' returned non-zero exit status 3221225477.

shigeto

Did you set up accelerate? That error message doesn't say what the actual error is.
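The exit status quoted in this exchange can at least be decoded. A minimal sketch, assuming the number is a Windows NTSTATUS value (the small lookup table below is illustrative, not exhaustive): 3221225477 is 0xC0000005, the access-violation code, which means the Python process crashed in native code rather than raising a Python exception. It still does not say which component crashed.

```python
# Decode a Windows process exit status into hex plus a known NTSTATUS name.
# The table only lists a few common crash codes; it is an illustrative subset.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(code: int) -> str:
    # Some tools report the same value as a negative signed 32-bit integer.
    code &= 0xFFFFFFFF
    name = KNOWN_STATUS.get(code, "unknown status")
    return f"0x{code:08X} ({name})"


print(describe_exit_status(3221225477))
```

An exit status of 1, by contrast, is just the generic failure code of the training script, so in that case the real error is in the traceback printed above it.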

Furkan Gözükara

Thank you. After re-installing and re-running from the beginning, this error no longer occurred, but the following error occurred:
  File "C:\Users\amata\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\amata\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_30\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/outputs/shigeto_flux/model/config_lora-20240909-104046.toml']' returned non-zero exit status 3221225477.
10:42:30-927356 INFO Training has ended.

shigeto

Reduce the VAE batch size to 1 if it is more than 1. Also, it is very likely that you loaded the config into the Dreambooth tab at least once and corrupted it permanently.

Furkan Gözükara

The only compression that won't cause noticeable quality loss is saving as fp16, which has minimal loss. Beyond that you have to reduce the LoRA network rank, which will reduce quality.
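To make the size trade-off concrete, here is a rough sketch of how LoRA file size scales. It assumes the standard LoRA layout, where each adapted linear layer stores two matrices of shape rank x d_in and d_out x rank. The layer list is hypothetical and is not the actual FLUX layer inventory, so only the ratios are meaningful, not the absolute numbers.

```python
# Rough LoRA size estimate: parameters grow linearly with rank,
# and bytes per parameter depend on the save precision.
def lora_param_count(layer_shapes, rank):
    # Each adapted layer adds rank * d_in (down) + d_out * rank (up) parameters.
    return sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)


def lora_size_bytes(layer_shapes, rank, bytes_per_param):
    return lora_param_count(layer_shapes, rank) * bytes_per_param


# Hypothetical model: 100 adapted layers of 3072 x 3072 (illustration only).
layers = [(3072, 3072)] * 100

fp32_rank128 = lora_size_bytes(layers, rank=128, bytes_per_param=4)
fp16_rank128 = lora_size_bytes(layers, rank=128, bytes_per_param=2)
fp16_rank16 = lora_size_bytes(layers, rank=16, bytes_per_param=2)

print(fp32_rank128 / fp16_rank128)  # saving as fp16 halves the file
print(fp16_rank128 / fp16_rank16)   # rank 128 -> 16 shrinks it 8x
```

Under these assumptions, precision gives at most a 2x reduction from fp32, so very small LoRA files would mostly come from a much lower rank or from adapting fewer layers.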

Furkan Gözükara

Hi, I just have a general curiosity question. How come the loras we are training are around 2.5 gb? I see many fluxD ones that are as low as 25 mb and produce very good results. Is there a way to "compress" our loras without quality loss?

Michael

Thank you! You were exactly right. I accidentally filled out the dreambooth portion before realizing my mistake. I didn't think it would break it xD. Also had to redownload the config file because I saved it along the way in dreambooth.

Michael

CUDA Out of Memory during Caching Latents Stage - Assistance Needed Hello, Here’s what I’ve tried so far: Verified accelerate settings Followed the tutorial: https://youtu.be/adVhm9aI9Gc Replaced my YAML file with the recommended one in the Hugging Face accelerate folder I am encountering a persistent issue during training at the stage of caching latents. Despite having sufficient GPU memory (23.99 GiB total, 19.18 GiB free), the process fails with a CUDA Out of Memory error. Below is a snippet of the error message: INFO caching latents... train_util.py:1038 0%| | 0/5 [00:00 trainer.train(args) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\train_network.py", line 387, in train train_dataset_group.new_cache_latents(vae, accelerator.is_main_process) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\train_util.py", line 2325, in new_cache_latents dataset.new_cache_latents(model, is_main_process) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\train_util.py", line 1041, in new_cache_latents caching_strategy.cache_batch_latents(model, batch, subset.flip_aug, subset.alpha_mask, subset.random_crop) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\strategy_flux.py", line 230, in cache_batch_latents self._default_cache_batch_latents( File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\strategy_base.py", line 279, in _default_cache_batch_latents latents_tensors = encode_by_vae(img_tensor).to("cpu") File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\strategy_flux.py", line 226, in encode_by_vae = lambda img_tensor: vae.encode(img_tensor).to("cpu") File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\flux_models.py", line 343, in encode z = self.reg(self.encoder(x)) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File 
"D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl return forward_call(*args, **kwargs) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\flux_models.py", line 196, in forward h = self.down[i_level].block[i_block](hs[-1]) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl return forward_call(*args, **kwargs) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\flux_models.py", line 104, in forward h = swish(h) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\library\flux_models.py", line 54, in swish return x * torch.sigmoid(x) torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacity of 23.99 GiB of which 19.18 GiB is free. Of the allocated memory 3.18 GiB is allocated by PyTorch, and 26.62 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 
See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Traceback (most recent call last): File "C:\Users\amata\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\amata\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_30\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Kohya_GUI_Flux_Installer_30/kohya_ss/output/shigeto_flux/model/config_lora-20240909-015530.toml']' returned non-zero exit status 1. 01:55:48-740726 INFO Training has ended.

shigeto

you are welcome

Furkan Gözükara

thanks !!!!!!

ProudChinaLover

Use the Massed_Compute_Kohya_FLUX_Instructions.txt file. It will upgrade Kohya and Torch to 2.4, and yes, it runs at maximum performance.

Furkan Gözükara

For Massed Compute, is it powered by the latest Torch so that I can train much faster?

ProudChinaLover

Hello, I just tested our configs on the newest version. I loaded them directly into the LoRA tab and set my paths. It works perfectly with no issues. So my guess is that you loaded the config into the Dreambooth tab at least once and corrupted it. Get a fresh config, load it into the LoRA tab, and set your paths again. Also, don't enable Train T5-XXL.

Furkan Gözükara

I think I broke something -.- I made a great model and then I updated to v30 and I thought I updated all other files but am now getting this message when I try to start training: Traceback (most recent call last): File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\flux_train_network.py", line 519, in trainer.train(args) File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\train_network.py", line 441, in train self.post_process_network(args, accelerator, network, text_encoders, unet) File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\sd-scripts\flux_train_network.py", line 170, in post_process_network self.train_t5xxl = network.train_t5xxl File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1729, in __getattr__ raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'") AttributeError: 'LoRANetwork' object has no attribute 'train_t5xxl' Traceback (most recent call last): File "C:\Users\Aprucia\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\Aprucia\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\Kohya\Kohya_GUI_Flux_Installer_30\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command 
'['C:\\Kohya\\Kohya_GUI_Flux_Installer_30\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/Kohya/Kohya_GUI_Flux_Installer_30/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'C:/Users/Aprucia/Desktop/AIModel training/Image/Model_Batches/Ted_Wheeler/Training_destination\\model/config_lora-20240907-142501.toml']' returned non-zero exit status 1.

Michael

Incorrect folder setup. Please watch the relevant part of the tutorial and prepare your training dataset folders again.
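The "No data found" error in the question says that train_data_dir must be the parent of the folders that contain images. A minimal sketch of a pre-flight check, assuming subfolders follow the `<repeats>_<class tokens>` naming seen in the logs in this thread (for example `1_ohwx man`). The function name and the list of image extensions are assumptions for illustration and are not part of Kohya.

```python
import re
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")          # "<repeats>_<class tokens>"


def scan_train_data_dir(train_data_dir):
    """Return (folder name, repeats, image count) for each valid subfolder."""
    results = []
    for sub in sorted(Path(train_data_dir).iterdir()):
        match = FOLDER_RE.match(sub.name)
        if not sub.is_dir() or match is None:
            continue
        images = [p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTS]
        results.append((sub.name, int(match.group(1)), len(images)))
    return results


# Demo on a throwaway directory: img/ is the parent, "1_ohwx man" holds images.
with tempfile.TemporaryDirectory() as root:
    image_folder = Path(root) / "img" / "1_ohwx man"
    image_folder.mkdir(parents=True)
    (image_folder / "photo_01.png").touch()

    parent_result = scan_train_data_dir(Path(root) / "img")  # correct setting
    wrong_result = scan_train_data_dir(image_folder)         # common mistake

print(parent_result)
print(wrong_result)
```

An empty result for the path you entered in the GUI would suggest the path points at the image folder itself (or at an empty folder) instead of its parent.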

Furkan Gözükara

Hello, I am running on RunPod and I get this error. How can I fix it?
INFO [Dataset 0] config_util.py:576
INFO loading image sizes. train_util.py:876
0it [00:00, ?it/s]
INFO make buckets train_util.py:882
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically train_util.py:899
INFO number of images (including repeats) train_util.py:928
/workspace/kohya_ss/venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/workspace/kohya_ss/venv/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
INFO mean ar error (without repeats): nan train_util.py:938
ERROR No data found. Please verify arguments (train_data_dir must be the parent of folders with images) train_network.py:332

Ichi Shun

you are welcome. i think support for these models will come eventually

Furkan Gözükara

Ok, thanks.

Albert

Well, its size is not the same as the dev model, so something is off. That is probably why Kohya is failing. The page says "Clips, t5 and VAE are included in the models." You can open an issue thread on the Kohya repository and I am sure he can fix it.

Furkan Gözükara

All of the DEV models I downloaded, which are 11 to 20 GB. For example, STOIQO NewReality Flux Dev.

Albert

Are the other models on Civitai original fp8 or fp16? They are probably not. They need to be either about 23.8 GB or about 11.8 GB.
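A quick way to apply this size rule before starting a training run is sketched below. It assumes the two sizes quoted above are decimal gigabytes and uses an arbitrary tolerance. The function names are hypothetical, and a matching size is only a hint, not proof that the file has the layout Kohya expects.

```python
import os

GB = 1_000_000_000                      # assumption: decimal gigabytes
EXPECTED_GB = {"fp16": 23.8, "fp8": 11.8}  # sizes quoted in the comment above
TOLERANCE_GB = 0.3                      # assumption: arbitrary tolerance


def classify_size(num_bytes):
    """Label a checkpoint size as fp16-like, fp8-like, or unexpected."""
    size_gb = num_bytes / GB
    for label, expected in EXPECTED_GB.items():
        if abs(size_gb - expected) <= TOLERANCE_GB:
            return label
    return "unexpected"


def classify_file(path):
    # Convenience wrapper for a real checkpoint file on disk.
    return classify_size(os.path.getsize(path))


print(classify_size(int(23.8 * GB)))
print(classify_size(20 * GB))
```

A checkpoint that comes back as "unexpected", such as a merged file that bundles CLIP, T5 and the VAE, is a candidate for the load failures described in this exchange.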

Furkan Gözükara

Thanks, I'll update next month. I mean I downloaded other FLUX models from civitai.com. With these models the LoRA training doesn't run, but with the base FLUX dev it does. You said in the video that you can train on other models. I checked, and training on other models is not possible. This is not only my problem: other people who train with your parameters in the future will also be unable to train on other models. I checked three different models, so it seems to me that there is either a bug or some kind of protection.

Albert

maybe post on kohya ss as a github issue : https://github.com/kohya-ss/sd-scripts/issues

Furkan Gözükara

I trained the fp16 dev and fp16 schnell models, not others. If you upgrade to Gold, I can connect to your PC and determine your error.

Furkan Gözükara

Thanks, I set the settings correctly and training works on the base FLUX dev model. When I tried to train on other models, the training fails immediately with an error. I tried 3 different models and none of them train; it only trains on the base FLUX dev model. What is the reason? Have you tried training on other models?

Albert

just wanna figure out why this model not working

guangyu niu

There is no precise number, but for FLUX I see that more is better.

Furkan Gözükara

With 1 product, how many different people is enough? If the hoodie has printing on both the front and the back, should I train it twice, once for the front and once for the back, or put them together in 1 training?

Ichi Shun

Include both, but don't use images of the same person multiple times. That way it will learn only the hoodie and not the person.

Furkan Gözükara

If I want to train a hoodie with a printed image, should I use a product photo or a product photo with the model wearing it? Is there any advice for this situation?

Ichi Shun

use my model downloader. use downloaded models

Furkan Gözükara

Hi, I'm trying to train a FLUX LoRA based on a FLUX fine-tuned model. However, when I use the Kohya script with this fine-tuned model, it reports an error I have never seen before: "NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device." Any idea why this happens? Model link: https://www.liblib.art/modelinfo/ad11ceec4e26466b8da847bd8f6b4068?from=search

guangyu niu

Thank you

Raffaele Massaro

You are not doing anything wrong. With Torch 2.5 and an empty second RTX 3060, I get 20 seconds/it at best. That is the speed loss from having low VRAM.

Furkan Gözükara

First of all, thanks for these tutorials. I followed everything exactly, but with a GeForce 3060 with 12 GB and 32 GB of RAM, it doesn't go below 26 s/it with rank 5 and 24 s/it with rank 6. What am I doing wrong?

Raffaele Massaro

Well, I tested the configs and sadly they didn't reduce it that much. Check the new configs. CuDNN doesn't make a difference.

Furkan Gözükara

Great! my VRAM is 16GB. Do you mean to move to rank 4 or download CuDNN or both ?

Sonivas Sx

Just "ohwx man". Or, if you want to train a style, there is a full dataset with proper captions here that you can check: https://huggingface.co/MonsterMMORPG/3D-Cartoon-Style-FLUX

Furkan Gözükara

Thank you so much for doing all this work. Amazing work I love learning all of these new technologies but just don't have the time between family and work to dedicate as much time as I would like. So thank you for helping me keep up 😌🙏 Any chance you could try providing a couple of different datasets with example captioning? I know you are using the default dataset for consistent testing I would just like to see other datasets too because sometimes I can't tell if my datasets is a problem or my captioning :)

Michael

i just updated v28 and step 2 will fix it. it is fastapi error

Furkan Gözükara

Activate the venv and run pip install fastapi==0.111.0. I will tell Kohya to make a fix ASAP. It is a fastapi error.

Furkan Gözükara

I am about to make a new update. How much VRAM do you have? If it is 16 GB, the update can benefit you a lot.

Furkan Gözükara

I got the same error. I reinstalled 2 times and the result is the same

Cemil Hacimahmutoglu

Thanks for everything! I am doing my first training on a 4080 and I used Rank 5. Can I push it to Rank 4? Also, should I download CuDNN? For a dataset of 24 images it takes around 14 hours. If you have any suggestions to make this faster, or if there is information I can give you, I'd be glad to help.

Sonivas Sx

Hi! I can't load a config into the GUI. No browser window opens for selecting config files. In the top right there is an error: "Connection errored out". Please help.
What I've tried: 1) reinstall 2) different browser 3) update pydantic
Logs:
  File "c:\ai\onetrainer\venv\lib\site-packages\pydantic\_internal\_generate_schema.py", line 789, in _generate_schema_inner
    return self.match_type(obj)
  File "c:\ai\onetrainer\venv\lib\site-packages\pydantic\_internal\_generate_schema.py", line 875, in match_type
    return self._unknown_type_schema(obj)
  File "c:\ai\onetrainer\venv\lib\site-packages\pydantic\_internal\_generate_schema.py", line 415, in _unknown_type_schema
    raise PydanticSchemaGenerationError(
pydantic.errors.PydanticSchemaGenerationError: Unable to generate pydantic-core schema for . Set `arbitrary_types_allowed=True` in the model_config to ignore this error or implement `__get_pydantic_core_schema__` on your type to fully support it. If you got this error by calling handler() within `__get_pydantic_core_schema__` then you likely need to call `handler.generate_schema()` since we do not call `__get_pydantic_core_schema__` on `` otherwise to avoid infinite recursion. For further information visit https://errors.pydantic.dev/2.8/u/schema-for-unknown-type

AeAeAi_Studio

Yes, I plan to cover OneTrainer too. You can actually copy our config there, going through each variable one by one, but I haven't had the chance to do it yet.

Furkan Gözükara

OneTrainer now also supports the FLUX Dev model. Did you try that? Are you planning to release a tutorial on that for different GPUs (mine is a 3060 12GB), or would it be better to just stick with the Kohya tutorial? Thanks for all your great work!

Chris

Message me on Discord and show screenshots of all your settings. I did over 100 trainings and never got that error once.

Furkan Gözükara

I downloaded Forge and checked it with different models. The same image is generated every time and the LoRA is not taken into account. It does not work.

Albert

I trained on runpod and on windows, same error.

Albert

Hi! I have had some success training on SDXL, but now I trained on FLUX dev on RunPod as per the instructions. After training I added the LoRA to ComfyUI, and at generation I get these errors:
lora key not loaded: lora_te1_text_model_encoder_layers_9_self_attn_v_proj.alpha
lora key not loaded: lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_down.weight
lora key not loaded: lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_up.weight
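All three rejected keys share the `lora_te1` prefix. A small sketch for grouping key names by prefix, which makes it easier to see which part of a LoRA a UI refuses to load. It works on plain key-name strings (for example, copied from the console). The reading that `lora_te1` refers to text-encoder weights is an assumption based on the `text_model_encoder` part of the names, not something confirmed in this thread.

```python
from collections import Counter


def count_key_prefixes(keys):
    """Count LoRA keys by their first two underscore-separated tokens."""
    return Counter("_".join(key.split("_")[:2]) for key in keys)


# Key names copied from the "lora key not loaded" messages above.
rejected_keys = [
    "lora_te1_text_model_encoder_layers_9_self_attn_v_proj.alpha",
    "lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_down.weight",
    "lora_te1_text_model_encoder_layers_9_self_attn_v_proj.lora_up.weight",
]

print(count_key_prefixes(rejected_keys))
```

If every rejected key falls under a single prefix like this, the rest of the LoRA may still be loading, which would point at the text-encoder part of the file rather than at the whole training run.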

Albert

awesome please message me from discord and install google remote

Furkan Gözükara

Just updated my membership to gold so you can do that ^^ How should we proceed so you can check my PC?

inès Marzat

For inference yes that is better. But I haven't tested loras with them yet. So don't know if working

Furkan Gözükara

Pretty sure we need to use a GGUF model on 8 GB. I can't work out how to use it in SwarmUI, though. From what I read from others, that is what you need for low VRAM.

Brett Kelly

this is a valid point. it works in swarmui but i didnt test others yet

Furkan Gözükara

I think it is the training after adding CLIP_l that makes LORA invalid. I also encountered the problem that the LORA model is invalid after loading the LORA file.

楠 陈

Well, if you upgrade to Gold membership, I can connect to your PC and check. Nothing else comes to mind.

Furkan Gözükara

I've tried it with v27 and i've set the accelerate, ive reset it to be sure and the problem still exists but v16 still works. thank you for your response

chucky inaba

i can look for it good idea

Furkan Gözükara

I think we need to use GGUF model for loading Lora with low vram cards, maybe you can do tutorial about this?

Brett Kelly

yes you can use any cloud service. great

Furkan Gözükara

Thank you so much, it worked for me with Massed Compute. The only thing that didn't work was when I tried to upload my Flux Lora model to Hugging Face using JupyterLab to download it to my computer. So what I do is connect to an online cloud service (like pCloud, OneDrive, etc.) and upload my Loras there to download them later to my computer. Apart from that, everything was good. Thanks again !

AshHir

Person training. I can't see anything in the SwarmUI logs about using the LoRA. I have an 8 GB card, so maybe that is causing some issue...

Brett Kelly

That is great, so something on that machine is causing it. Sadly, I can't tell what from this info at the moment.

Furkan Gözükara

That's nice but you don't need it. I've got it running on my other system. It just doesn't run on my main computer.

Hans Peter

I sent you a private message, please check it out. If you still can't make it work, upgrade to the Gold tier and I will connect to your PC and set it up for you.

Furkan Gözükara

Doesn't change anything if I load the models from a different folder. The problem remains.

Hans Peter

Sadly, it is hard to know without seeing the entire process :/ If you upgrade to the Gold level, I can connect to your PC and check.

Furkan Gözükara

I've just made an experiment to see if the problem would come from my trained Lora files, downloaded a Lora Flux model on civitai, added it in SwarmUI, but same, it didn't apply the Lora on my generated images :( The Lora file is selected, Flux model as well, followed your tut for parameters... Do you have any ideas of where the problem could come from ? Things I should test or try to fix it ? Especially now that it's also weirdly affecting my Comfy UI installation that used to work well before I updated it ! Thanks again for your help.

inès Marzat

You are welcome thanks for support

Furkan Gözükara

Thanks Dr.!

Tiago4D

Not very much, since the model does not fit into VRAM. You can reduce the resolution to 512px for faster training, but it will reduce quality. Or use cloud services. I just published a full tutorial for that.

Furkan Gözükara

Don't use Pinokio. Follow the tutorials.

Furkan Gözükara

You are using Pinokio. Install as shown in the tutorials instead.

Furkan Gözükara

In this case it's working, but my GPU is a 3060... I'm trying to reduce the training time. Is that possible?

Tiago4D

Also, Kohya does not pick up the parameters from the config file. They are always set to None.

Hans Peter

Training does not start. It loads something into normal RAM but does not transfer it to the graphics card.
INFO Loaded T5xxl: flux_utils.py:216
INFO Building AutoEncoder flux_utils.py:62
INFO Loading state dict from C:/Users/xxx/pinokio/api/comfyui.git/app/models/vae/ae.safetensors flux_utils.py:66
INFO Loaded AE: flux_utils.py:69
import network module: networks.lora_flux
INFO [Dataset 0] train_util.py:2326
INFO caching latents with caching strategy. train_util.py:984
INFO checking cache validity... train_util.py:994
100%|██████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00

Hans Peter

Hello, just proceed with the step 2 installer. It fixes this.

Furkan Gözükara

Hello, I followed the tutorial step by step but I get this error message when installing Kohya_ss:
INFO Kohya_ss GUI version: v24.2.0
INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit4)]
INFO Submodule initialized and updated.
INFO Installing requirements from requirements_pytorch_windows.txt...
Looking in indexes: https://download.pytorch.org/whl/cu124
Collecting torch==2.4.0+cu124
  Using cached https://download.pytorch.org/whl/cu124/torch-2.4.0%2Bcu124-cp310-cp310-win_amd64.whl (2503.3 MB)
Collecting torchvision==0.19.0+cu124
  Using cached https://download.pytorch.org/whl/cu124/torchvision-0.19.0%2Bcu124-cp310-cp310-win_amd64.whl (5.9 MB)
ERROR: Could not find a version that satisfies the requirement xformers==0.0.27.post2 (from versions: none)
ERROR: No matching distribution found for xformers==0.0.27.post2
What do I have to do?

Paco Chanivet

Use the 24 GB dev model file.

Furkan Gözükara

Can you try with SwarmUI? Also, I trained a style and shared the checkpoints here, and they work perfectly. You can see the dataset too. I am generating grids right now. And you trained a FLUX LoRA, right? https://huggingface.co/MonsterMMORPG/3D-Cartoon-Style-FLUX

Furkan Gözükara

Just ran Comfy UI after having updated it and it seems it doesn't apply my other Loras either anymore :( very weird, some weeks/months ago it worked perfectly! Would the problem come from the update ? something external ? Any clue would help a lot.

inès Marzat

I guessed :'( I trained a model with my own images. I'm an artist and I make 3D abstract faces, so it's partly persons and partly concepts. When I followed your tutorials for training with Stable Diffusion and used the result in ComfyUI, I didn't have that issue though. Should I try to test my LoRAs in ComfyUI maybe? I tried it quickly but it didn't recognise the FLUX.1 dev model. Do you have any tutorials on that? Big thanks for your help!

inès Marzat

What is your GPU? Your problem is that it says "GPU not available".
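The startup log in the question shows "Torch reports GPU not available" and later "accelerator device: cpu". A minimal sketch for checking this directly, assuming it is run with the Python interpreter of the Kohya venv so that it sees the same Torch install. The function name is made up for illustration.

```python
def gpu_status():
    """Report whether the Torch install in this environment can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return f"CUDA not available (torch {torch.__version__})"


print(gpu_status())
```

If this prints "CUDA not available" even though an NVIDIA card is installed, training would fall back to the CPU, which matches the "accelerator device: cpu" line in the log.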

Furkan Gözükara

Well, without seeing your entire training workflow it is impossible to know :/ I did over 90 trainings and all of them worked :/ What did you train: a person, an object, or a style?

Furkan Gözükara

I think I have the same issue :(

inès Marzat

Hey! Thanks a lot for doing all those amazing tutorials. I've successfully installed Kohya, trained my LoRA models, and installed SwarmUI. The FLUX models appear in the interface, my LoRAs too, and everything seems to be in the right place and folder, BUT my LoRA models don't seem to affect the generated image (°_°). I've tried many different things but I don't know what to do anymore. Could you help me figure this out? Many thanks!

inès Marzat

It is not working. Do I need to change something in step 01?

Tiago4D

Training went okay, but when I write a prompt and apply the LoRA, the model does not seem to use it. Any ideas what may be causing this?

Brett Kelly

2024-09-04 10:39:58 INFO prepare split model flux_train_network.py:99 INFO load state dict for lower flux_train_network.py:106 INFO load state dict for upper flux_train_network.py:111 INFO prepare upper model flux_train_network.py:114 Traceback (most recent call last): File "C:\kohya_ss\kohya_ss\sd-scripts\flux_train_network.py", line 446, in trainer.train(args) File "C:\kohya_ss\kohya_ss\sd-scripts\train_network.py", line 344, in train model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator) File "C:\kohya_ss\kohya_ss\sd-scripts\flux_train_network.py", line 83, in load_target_model model = self.prepare_split_model(model, weight_dtype, accelerator) File "C:\kohya_ss\kohya_ss\sd-scripts\flux_train_network.py", line 116, in prepare_split_model flux_upper.to(accelerator.device, dtype=target_dtype) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1174, in to return self._apply(convert) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 780, in _apply module._apply(fn) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 805, in _apply param_applied = fn(param) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1167, in convert raise NotImplementedError( NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device. 
Traceback (most recent call last): File "C:\Users\tiago\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\tiago\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\kohya_ss\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\kohya_ss\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/kohya_ss/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'F:/OneDrive/_00_Tiago4D/Pol/Jobs/Titi/lora\\model/config_lora-20240904-103947.toml']' returned non-zero exit status 1. 10:39:59-899928 INFO Training has ended.

Tiago4D

In the GUI, look for the fp8 option: "fp8 base unet".

Furkan Gözükara

Sorry, where do I change it?

Tiago4D

Hi, I followed your tutorial. Whenever I try to train anything, it gets to "caching latents", then nothing happens and it says "Training has ended." What am I doing wrong? Also, when I first load the GUI, I get a message that says the GPU is not available. Might that be it?
At startup:
15:38:05-868768 INFO Kohya_ss GUI version: v24.2.0
15:38:06-183803 INFO Submodule initialized and updated.
15:38:06-185802 INFO nVidia toolkit detected
15:38:07-127402 INFO Torch 2.4.0+cu124
15:38:07-148194 WARNING Torch reports GPU not available
Here is the output when trying to train:
INFO [Dataset 0] config_util.py:576
INFO loading image sizes. train_util.py:876
100%|████████████████████████████████████████████████████████████████████████████████| 59/59 [00:00<00:00, 58794.00it/s]
INFO prepare dataset train_util.py:884
INFO network for CLIP-L only will be trained. T5XXL will not be trained flux_train_network.py:50
INFO preparing accelerator train_network.py:335
accelerator device: cpu
INFO Building Flux model dev flux_utils.py:45
INFO Loading state dict from C:/Kohya_GUI_Flux_Installer_24/flux1-dev.safetensors flux_utils.py:52
INFO Loaded Flux: flux_utils.py:55
INFO Building CLIP flux_utils.py:74
INFO Loading state dict from C:/Kohya_GUI_Flux_Installer_24/clip_l.safetensors flux_utils.py:167
INFO Loaded CLIP: flux_utils.py:170
INFO Loading state dict from C:/Kohya_GUI_Flux_Installer_24/t5xxl_fp16.safetensors flux_utils.py:213
INFO Loaded T5xxl: flux_utils.py:216
INFO Building AutoEncoder flux_utils.py:62
INFO Loading state dict from C:/Kohya_GUI_Flux_Installer_24/vae.safetensors flux_utils.py:66
INFO Loaded AE: flux_utils.py:69
import network module: networks.lora_flux
INFO [Dataset 0] train_util.py:2326
INFO caching latents with caching strategy. train_util.py:984
INFO checking cache validity... train_util.py:994
100%|███████████████████████████████████████████████████████████████████████████████████████████| 59/59 [00:00

C
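The log above contains the line "accelerator device: cpu", while a working run in this thread shows "accelerator device: cuda". As a minimal sketch (the helper names `accelerator_device` and `trained_on_gpu` are made up for illustration, and the sample strings are shortened from the logs in this thread), a pasted log can be checked for that line like this:

```python
import re


def accelerator_device(log_text):
    """Return the device named on the 'accelerator device:' log line, or None."""
    match = re.search(r"accelerator device:\s*(\w+)", log_text)
    return match.group(1) if match else None


def trained_on_gpu(log_text):
    """True only if the log explicitly reports a CUDA device."""
    return accelerator_device(log_text) == "cuda"


# Shortened excerpts from two logs posted in this thread.
cpu_log = "INFO preparing accelerator train_network.py:335 accelerator device: cpu INFO Building Flux model dev"
gpu_log = "INFO prepare accelerator flux_train.py:171 accelerator device: cuda INFO Building AutoEncoder"

print(accelerator_device(cpu_log))
print(trained_on_gpu(gpu_log))
```

Inside the Kohya virtual environment, PyTorch's own `torch.cuda.is_available()` answers the same question directly; the sketch only reads what the trainer already printed.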

Great, thank you!

Mikkel Olsen

We fixed this issue with the v26 configs. Also, did you set up accelerate? https://youtu.be/adVhm9aI9Gc

Furkan Gözükara

Hi, you have 112k steps. Also, I think you loaded the config into the Dreambooth tab at least once and corrupted it, so you have lots of errors. First, get a fresh config from the zip file. Second, set your folder repeat to 1. Then it should work perfectly on an RTX 4090. You can use rank 3 or rank 4.

Furkan Gözükara
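The 112k figure comes straight from the step arithmetic the GUI prints in the log: `max_train_steps (560 / 1 / 1 * 200 * 1) = 112000`. A minimal sketch of that arithmetic follows; the function name and parameter names are illustrative, and the rounding for batch sizes that do not divide evenly is an assumption, since the log only shows a case that divides exactly.

```python
def max_train_steps(images, repeats, epochs, batch_size=1, grad_accum=1, reg_factor=1):
    """Mirror the formula printed in the log:
    (images * repeats) / batch_size / grad_accum * epochs * reg_factor."""
    steps_per_epoch = images * repeats
    # int() truncation is an assumption; the log does not show a non-divisible case.
    return int(steps_per_epoch / batch_size / grad_accum * epochs * reg_factor)


# Folder "40_NJK man": 14 images, 40 repeats, 200 epochs (values from the log).
print(max_train_steps(images=14, repeats=40, epochs=200))

# Same dataset with the folder repeat set to 1, as suggested in the reply.
print(max_train_steps(images=14, repeats=1, epochs=200))
```

With the repeat count in the folder name reduced from 40 to 1, the same 14 images and 200 epochs give 2800 steps instead of 112000.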

Yes, overwrite the existing files. Also, before running Kohya, run Update_Kohya_and_Fix_FLUX_Step2.bat to get the latest version.

Furkan Gözükara

How do you update from one version of the zip to a newer one? Can you overwrite the files in the folder or do you need to start all over with install when unzipping?

Mikkel Olsen

Hi, and thank you for your effort. I followed your tutorial and tried one of the lowest configs, but it is still not working. I have an RTX 4090. Here is what I am getting:

09:58:00-132843 INFO headless: False
09:58:00-172582 INFO Using shell=True when running external commands...
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
09:58:14-090516 INFO Loading config...
09:59:11-912756 INFO Copy D:/NJK to D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\img/40_NJK man...
09:59:11-966755 INFO Regularization images directory is missing... not copying regularisation images...
09:59:11-968755 INFO Done creating kohya_ss training folder structure at D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01...
09:59:48-585619 INFO Start training Dreambooth...
09:59:48-586619 INFO Validating lr scheduler arguments...
09:59:48-587619 INFO Validating optimizer arguments...
09:59:48-587619 INFO Validating D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\log existence and writability... SUCCESS
09:59:48-588618 INFO Validating D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\model existence and writability... SUCCESS
09:59:48-589620 INFO Validating D:/Kohya_GUI_Flux_Installer_25/flux1-dev.safetensors existence... SUCCESS
09:59:48-590619 INFO Validating D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\img existence... SUCCESS
09:59:48-590619 INFO Folder 40_NJK man: 40 repeats found
09:59:48-591619 INFO Folder 40_NJK man: 14 images found
09:59:48-592619 INFO Folder 40_NJK man: 14 * 40 = 560 steps
09:59:48-592619 INFO Regulatization factor: 1
09:59:48-593619 INFO Total steps: 560
09:59:48-593619 INFO Train batch size: 1
09:59:48-594619 INFO Gradient accumulation steps: 1
09:59:48-595619 INFO Epoch: 200
09:59:48-595619 INFO max_train_steps (560 / 1 / 1 * 200 * 1) = 112000
09:59:48-596618 INFO lr_warmup_steps = 0
09:59:48-597618 INFO Saving training config to D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\model\Rank_10_20240904-095948.json...
09:59:48-598619 INFO Executing command: D:\Kohya_GUI_Flux_Installer_25\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Kohya_GUI_Flux_Installer_25/kohya_ss/sd-scripts/flux_train.py --config_file D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\model/config_dreambooth-20240904-095948.toml D:\Kohya_GUI_Flux_Installer_25\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( D:\Kohya_GUI_Flux_Installer_25\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch. @torch.library.impl_abstract("xformers_flash::flash_fwd") D:\Kohya_GUI_Flux_Installer_25\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:344: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch. @torch.library.impl_abstract("xformers_flash::flash_bwd") D:\Kohya_GUI_Flux_Installer_25\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( 2024-09-04 09:59:55 INFO Loading settings from train_util.py:4189 D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\model/config_dreambooth-2024090 4-095948.toml... INFO D:/Kohya_GUI_Flux_Installer_25/OUTPUT/01\model/config_dreambooth-2024090 train_util.py:4208 4-095948 2024-09-04 09:59:55 INFO Using DreamBooth method. 
flux_train.py:101 INFO prepare images. train_util.py:1803 INFO get image size from name of cache files train_util.py:1741 100%|███████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 13964.39it/s] INFO set image size from cache files: 0/14 train_util.py:1748 INFO found directory D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man train_util.py:1750 contains 14 image files WARNING No caption file found for 14 images. Training will continue without train_util.py:1781 captions for these images. If class token exists, it will be used. / 14枚の画像にキャプションファイルが見つかりませんでした。これらの画像につ いてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (1).jpg train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (10).jpg train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (11).jpg train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (12).jpg train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (13).jpg train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man\NJ (14).jpg... train_util.py:1786 and 9 more INFO 560 train images with repeating. train_util.py:1844 INFO 0 reg images. 
train_util.py:1847 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1852 INFO [Dataset 0] config_util.py:570 batch_size: 1 resolution: (896, 896) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "D:\Kohya_GUI_Flux_Installer_25\OUTPUT\01\img\40_NJK man" image_count: 14 num_repeats: 40 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: NJK man caption_extension: .txt INFO [Dataset 0] config_util.py:576 INFO loading image sizes. train_util.py:876 100%|█████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 466.67it/s] INFO prepare dataset train_util.py:884 INFO prepare accelerator flux_train.py:171 accelerator device: cuda INFO Building AutoEncoder flux_utils.py:62 INFO Loading state dict from D:/Kohya_GUI_Flux_Installer_25/ae.safetensors flux_utils.py:66 INFO Loaded AE: flux_utils.py:69 2024-09-04 09:59:56 INFO [Dataset 0] train_util.py:2326 INFO caching latents with caching strategy. train_util.py:984 INFO checking cache validity... train_util.py:994 100%|██████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. 
This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 2024-09-04 09:59:59 INFO Building CLIP flux_utils.py:74 INFO Loading state dict from D:/Kohya_GUI_Flux_Installer_25/clip_l.safetensors flux_utils.py:167 INFO Loaded CLIP: flux_utils.py:170 INFO Loading state dict from flux_utils.py:213 D:/Kohya_GUI_Flux_Installer_25/t5xxl_fp16.safetensors 2024-09-04 10:00:00 INFO Loaded T5xxl: flux_utils.py:216 2024-09-04 10:00:06 INFO [Dataset 0] train_util.py:2347 INFO caching Text Encoder outputs with caching strategy. train_util.py:1107 INFO checking cache validity... train_util.py:1113 100%|██████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:00 flux_utils.py:55 FLUX: Gradient checkpointing enabled. CPU offload: False INFO enable block swap: double_blocks_to_swap=0, single_blocks_to_swap=0 flux_train.py:272 number of trainable parameters: 11901408320 prepare optimizer, data loader etc. INFO use Adafactor optimizer | {'scale_parameter': False, 'relative_step': train_util.py:4500 False, 'warmup_init': False, 'weight_decay': 0.01} WARNING because max_grad_norm is set, clip_grad_norm is enabled. consider set to train_util.py:4528 0 / max_grad_normが設定されているためclip_grad_normが有効になります。0に設定 して無効にしたほうがいいかもしれません WARNING constant_with_warmup will be good / train_util.py:4532 スケジューラはconstant_with_warmupが良いかもしれません enable full bf16 training. running training / 学習開始 num examples / サンプル数: 560 num batches per epoch / 1epochのバッチ数: 560 num epochs / epoch数: 200 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 112000 steps: 0%| | 0/112000 [00:00

JackOppss

Hey, I've been using v16 of the Kohya trainer and that's been working fine, but every time I try to use any of the later versions it refuses to run on the GPU and uses my CPU instead. I checked the params in Kohya but I'm still not sure what's causing it. Help would be much appreciated, thanks.

chucky inaba

Only FP8 and FP16 base models are supported. Use the FP16 base model; it gets auto-cast.

Furkan Gözükara

Hi Dr.! How can I train with the flux1-dev-bnb-nf4 model? Is it possible?

Tiago4D

Rank 4 was broken previously and yes, it is fixed now. Rank 3 has it on, but it slows training down hugely. I think that is also a bug, and I am hoping for a fix soon.

Furkan Gözükara

T5 attention mask is turned off in the v26 rank 4 config. Is that intended? It was enabled in v25; it seems highvram and T5 attention were switched. Better quality?

Dallin Mackay

Good question, but I didn't test its impact since I don't have a suitable dataset.

Furkan Gözükara

So you suggest training at 1024x1024 only is better than bucketing across resolutions?

Manpreet Singh

by the way just fixed rank4 so download newest zip file

Furkan Gözükara

awesome

Furkan Gözükara

It doesn't use it, since bucketing is not enabled. Even if you enable it, it only downscales unless you enable the upscale option, which I don't know how to do :D

Furkan Gözükara

Thanks. It now works.

JamZam WamBam

Looks like the configs have max bucket resolution set to 2048, and you also crop and resize your images to 1024x1024. Does Kohya ever use the 2048 bucket in this case? Or is this effectively using 1024 as the max bucket resolution?

Manpreet Singh

I was able to get it working under Linux as well. Torch 2.4.0+cu124 works fine; the Python version is 3.10.14 (main, Aug 5 2024, 02:53:45) [GCC 13.2.0]. I went through your bat files and executed them in the same order. Training speed is 20.5s/it on a secondary (no graphics) 3060 12GB using rank_5 (11489MiB / 12288MiB).

Kornel Hartung

thank you so much i will try to follow your suggestions

Furkan Gözükara

it requires 3.10.11 minimum

Furkan Gözükara

please download v25 configs it should fix

Furkan Gözükara

you are welcome also download v25 configs please

Furkan Gözükara

Thank you!

C

The second part, Update_Kohya_and_Fix_FLUX_Step2.bat, fixes this issue.

Furkan Gözükara

Hi, I've encountered this error during Step 1. How bad is it? How do I fix it?

13:32:06-340013 INFO Installing requirements from requirements_pytorch_windows.txt...
Looking in indexes: https://download.pytorch.org/whl/cu124
Collecting torch==2.4.0+cu124
Using cached https://download.pytorch.org/whl/cu124/torch-2.4.0%2Bcu124-cp310-cp310-win_amd64.whl (2503.3 MB)
Collecting torchvision==0.19.0+cu124
Using cached https://download.pytorch.org/whl/cu124/torchvision-0.19.0%2Bcu124-cp310-cp310-win_amd64.whl (5.9 MB)
ERROR: Could not find a version that satisfies the requirement xformers==0.0.27.post2 (from versions: none)
ERROR: No matching distribution found for xformers==0.0.27.post2

C

Awesome. I saw that someone else also fixed it that way, so I say restart the computer and repeat until it works :D

Furkan Gözükara

Edit: Actually it still doesn't work

EvilGiggles

This is due to accelerate. Please read this section of the post: If Your Training Terminating at the Stage of Caching Latents

Furkan Gözükara

what is your error exactly?

Furkan Gözükara

If your error is the same: this is due to accelerate. Please read this section of the post: If Your Training Terminating at the Stage of Caching Latents

Furkan Gözükara

It is handled automatically. Just make sure as little VRAM as possible is in use before you start, and that you have set a sufficient amount of virtual RAM as shown in the video.

Furkan Gözükara

How do I control VRAM usage? When training started, RAM was used but VRAM was not. Before starting, VRAM usage was quite low (0%~3%). Windows, RTX4070Ti 12GB, rank8_9500MB...json. Thanks so much.

A~_seXir🔞

The V21 version can be trained normally, but subsequent versions cannot be trained normally.

楠 陈

Yes, the latest V24 version also gets stuck and cannot train, on a Windows 10 system. I hope this problem can be fixed.

楠 陈

I was training just fine with the previous versions, but the newest v24 version gets stuck at caching latents and never starts training. Even with the older configs.

EvilGiggles

2024-09-03 04:47:20 INFO Using DreamBooth method. train_network.py:281 INFO prepare images. train_util.py:1803 INFO get image size from name of cache files train_util.py:1741 100%|████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 3065.62it/s] INFO set image size from cache files: 0/23 train_util.py:1748 INFO found directory C:\SD\Training\KohySaraTrainingOutput\img6\1_sofia train_util.py:1750 vergara girl contains 23 image files INFO 23 train images with repeating. train_util.py:1844 INFO 0 reg images. train_util.py:1847 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1852 INFO [Dataset 0] config_util.py:570 batch_size: 1 resolution: (1024, 1024) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "C:\SD\Training\KohySaraTrainingOutput\img6\1_sofia vergara girl" image_count: 23 num_repeats: 1 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: sofia vergara girl caption_extension: .txt INFO [Dataset 0] config_util.py:576 INFO loading image sizes. 
train_util.py:876 100%|██████████████████████████████████████████████████████████████████████████████████████████| 23/23 [00:00 flux_utils.py:55 INFO prepare split model flux_train_network.py:99 INFO load state dict for lower flux_train_network.py:106 INFO load state dict for upper flux_train_network.py:111 INFO prepare upper model flux_train_network.py:114 2024-09-03 04:47:31 INFO split model prepared flux_train_network.py:129 INFO Building CLIP flux_utils.py:74 INFO Loading state dict from flux_utils.py:167 A:/Forge-flux/webui/models/text_encoder/clip_l.safetensors INFO Loaded CLIP: flux_utils.py:170 INFO Loading state dict from flux_utils.py:213 A:/Forge-flux/webui/models/text_encoder/t5xxl_fp16.safetensors INFO Loaded T5xxl: flux_utils.py:216 INFO Building AutoEncoder flux_utils.py:62 INFO Loading state dict from C:/Users/James/Documents/New folder/ae.safetensors flux_utils.py:66 INFO Loaded AE: flux_utils.py:69 import network module: networks.lora_flux INFO [Dataset 0] train_util.py:2326 INFO caching latents with caching strategy. train_util.py:984 INFO checking cache validity... train_util.py:994 100%|███████████████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 23001.67it/s] INFO caching latents... train_util.py:1038 0%| | 0/23 [00:00

JamZam WamBam

For some reason before training starts it freezes on 'caching latents'. It started doing this since i installed v24. Have you any idea why? Thanks!

JamZam WamBam

I had the same error. The solution was: I had used the "Dreambooth" tab in Kohya, not the "Lora" tab .... :-D

Doc Snyder

great ty for update

Furkan Gözükara

In case anyone was wondering, I had set my max train epochs to 35 when it should have been set to 0. Problem solved! Thanks Furkan!

Michael

Okay. I just added you on Discord. Once you accept I can send it to you. Unless you want me to just post it in a channel?

Michael

I wonder if they tested it properly. You can also do multi-resolution training: create multiple 1_ohwx man folders, one at 512x512, another at 768x768, and a third at 1024x1024, and it will train on all of them. Enable buckets too. But lower resolution lowers the quality.

Furkan Gözükara

2kpr has tested multi-resolution training with the new timestep sampling method "flux_shift" and got better results than single-res for likeness. faster too.

Dallin Mackay

Kohya needs Python 3.10.11. You should follow this tutorial. 3.10.11 works with all the open source AI apps I have tested :D https://youtu.be/-NjNy7afOQ0

Furkan Gözükara

both true ty for reply Douglas

Furkan Gözükara

Are you using shared VRAM? For 8 GB to work you need to minimize VRAM usage: less than 500 MB should be in use before you start the training. Can you verify both?

Furkan Gözükara

Yes, you should have 5 checkpoints. Can you send me your JSON file on Discord so I can check?

Furkan Gözükara
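The expected count of 5 follows from saving every 20 epochs over 100 epochs. A minimal sketch of that count, assuming a save happens on every epoch number divisible by the interval (the function name is illustrative; whether the last one is written as a numbered checkpoint or as the final model file is not stated in this thread):

```python
def checkpoint_epochs(total_epochs, save_every_n):
    """List the epochs at which a save would be expected."""
    if save_every_n <= 0:
        return []  # assumption: a non-positive interval means no periodic saves
    return [e for e in range(1, total_epochs + 1) if e % save_every_n == 0]


epochs = checkpoint_epochs(total_epochs=100, save_every_n=20)
print(epochs)
print(len(epochs))
```

A later comment in this thread reports that a "max train epochs" value of 35 instead of 0 was the cause of the missing checkpoints in that case.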

True. In addition to this, save as fp16.

Furkan Gözükara

We save as float at rank 128. If you really need a smaller file, save as fp16, which makes it 1 GB; halving the rank as well makes it 512 MB.

Furkan Gözükara
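The numbers in the reply above are consistent with file size scaling linearly with both the rank and the bytes stored per weight. A rough sketch under those assumptions: "float" is taken to mean 4 bytes per weight and fp16 2 bytes, the 2048 MB baseline is an assumed round figure for the "over 2GB" file mentioned in the question, and the function name is illustrative.

```python
BYTES_PER_WEIGHT = {"float": 4, "fp16": 2}  # assumption: "float" means 32-bit


def estimated_size_mb(base_size_mb, base_rank, base_precision, rank, precision):
    """Scale a known LoRA file size to a different rank and save precision."""
    rank_scale = rank / base_rank
    precision_scale = BYTES_PER_WEIGHT[precision] / BYTES_PER_WEIGHT[base_precision]
    return base_size_mb * rank_scale * precision_scale


BASE_MB = 2048  # assumed baseline: rank 128 saved as float

print(estimated_size_mb(BASE_MB, 128, "float", 128, "fp16"))
print(estimated_size_mb(BASE_MB, 128, "float", 64, "fp16"))
```

Real files also carry metadata, so actual sizes will differ slightly from these estimates.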

please refresh page and download v24. i think you checked while i was updating

Furkan Gözükara

Attached zip file is missing.

BecauseReasons

The configs use 128 for rank and dimensions. Change both to 64 for a smaller file with some quality loss; I've even dropped both to 32 and it's much smaller. 128 is best for quality, 64 is a mix of quality and size, and 32 is for when you want to prioritise size. Test all three. (Am I right in thinking you can resize the 128 to 64 later? This would avoid retraining.)

s h a r k e y

Hey! Can I ask why the trained LoRAs are over 2GB in size? Shouldn't they be a few hundred MB?

Robert Arsene

Hi, I really enjoy this tutorial, thank you. I have one question: In Kohya, under "Save every N epochs" I have it set to 20 with my total epoch being 100. Theoretically I should have 5 checkpoints, however at the end of training I only have one. No other checkpoints are being saved. Any ideas on what I might be getting wrong? I am running this with the new torch 2.5. Thank you

Michael

Try with Python 3.10.11 (that's the one I'm using...) Tip: you can use PYENV (python version manager) to handle multiple python versions.

Douglas Davila

Hey, first of all, congratulations on this content. It's awesome. I followed the guide and was able to configure and start the LoRA training process. However, the completion forecast when training a LoRA with <10 face images is far from my expectations: 24 hours to complete using an RTX3060 8GB. I'm starting to wonder whether something is wrong in my configuration or whether this is actually the expected time for an 8GB VRAM GPU.

Douglas Davila

This is the error when I try to set up Kohya:

Kohya_ss setup menu:
1. Install kohya_ss GUI
2. (Optional) Install CuDNN files (to use the latest supported CuDNN version)
3. (DANGER) Install Triton 2.1.0 for Windows... only do it if you know you need it... might break training...
4. (Optional) Install specific version of bitsandbytes
5. (Optional) Manually configure Accelerate
6. (Optional) Launch Kohya_ss GUI in browser
7. Exit Setup
Select an option: 1
09:02:47-811404 INFO Kohya_ss GUI version: v24.2.0
09:02:47-819404 INFO Python version is 3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)]
09:02:47-822404 ERROR The current version of python (sys.version_info(major=3, minor=9, micro=0, releaselevel='final', serial=0)) is not appropriate to run Kohya_ss GUI
09:02:47-824404 ERROR The python version needs to be greater or equal to 3.10.9 and less than 3.11.0
D:\Software\AI Software\Kohya_Flux\kohya_ss>

Brett Baker
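The error message states the accepted range: greater or equal to 3.10.9 and less than 3.11.0. The log shows the setup script found Python 3.9.0, which falls outside it. A small sketch of the same range check, useful for confirming which interpreter is actually being picked up (the constant and function names are illustrative, not taken from the Kohya code):

```python
import sys

MIN_VERSION = (3, 10, 9)     # inclusive lower bound quoted in the error message
MAX_EXCLUSIVE = (3, 11, 0)   # exclusive upper bound quoted in the error message


def is_supported(version):
    """Check a (major, minor, micro, ...) version against the quoted range."""
    return MIN_VERSION <= tuple(version[:3]) < MAX_EXCLUSIVE


print(is_supported((3, 9, 0)))      # the version reported in the log
print(is_supported((3, 10, 11)))    # the version recommended in this thread
print(is_supported(sys.version_info))  # the interpreter running this script
```

Running this with the same `python` command the installer uses shows whether a different installed version is first on the PATH.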

The 11700 config needs a lot of VRAM and may fail. Are you monitoring the VRAM usage with nvitop? Also, extract the config again and load it into the LoRA tab; make sure it was never loaded into the Dreambooth tab. And re-enable shared VRAM.

Furkan Gözükara

With Kohya_GUI_Flux_Installer_24 and the "Rank_5_11700MB_12_40_Second_IT" config I get an out-of-memory error. I use a GTX 1070 8GB for display and an RTX 3080 12GB for training, with 56GB RAM.

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 486.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 9.76 GiB is allocated by PyTorch, and 1.05 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

Vlad
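The PyTorch error text itself suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. A minimal sketch of how that variable could be passed to a child process from Python follows. The helper name is hypothetical, and whether the option helps, or is supported on a given platform, is not confirmed anywhere in this thread; the sketch only shows how the variable would be set.

```python
import os


def env_with_expandable_segments(base_env=None):
    """Return a copy of the environment with the allocator option from the error message."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


# Example with a tiny stand-in environment instead of the real one.
env = env_with_expandable_segments({"PATH": "/usr/bin"})
print(env["PYTORCH_CUDA_ALLOC_CONF"])
```

The resulting dictionary can be handed to `subprocess.run(..., env=env)` when launching a training command. The variable has to be in place before the process initialises CUDA, so setting it inside an already running trainer would be too late.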

great ty

Furkan Gözükara

I'll look into it first thing when I wake up 🤣 I'm sure you can agree this keeps you up all night. I did see you mention about not needing to go past batch size 2 so will need to look into it next. Ideally a lower batch size should give me greater accuracy on a smaller dataset (I think for styles or finetuning this might be worth while possibly) I will let you know my findings.

s h a r k e y

No difference. I just made it easier to download the repo as a Hugging Face snapshot. I uploaded the same files :D

Furkan Gözükara

This is totally a research topic. You need to test every case and compare. But beyond batch size 2 there wasn't any actual speed improvement. Have you tested it? Bigger batch size = lower quality, so if there isn't a speed improvement you shouldn't increase the batch size.

Furkan Gözükara

Currently testing Flux.Dev from a different angle: how fast can I train a likeness of a person with low quality images on an RTX 4090, with the goal of using as few images as possible and completing the training as quickly as I can. I'm happy with 512 resolution (for now), and I have managed to train at a batch size of 8 before it hits shared RAM and slows to a crawl. I wonder how far Flux can be pushed. Batch size 4 with 0.001 for 50 epochs, then retraining the best LoRA output at 0.0001 for maybe 20 epochs, has been pretty fast and the likeness is decent. I will test more, and hopefully you can weigh in on what you might suggest *if we were to consider speed being priority over quality*

s h a r k e y

Is there any difference between taking the models from the links in this post and taking them from the official repository? Or why have you decided to link to https://huggingface.co/OwlMaster/realgg/ instead of https://huggingface.co/black-forest-labs/FLUX.1-dev/tree/main ?

Charlie Krogars

Yes, looks like you ran out of RAM. Please set virtual RAM as shown in the tutorial: https://www.youtube.com/watch?v=nySGu12Y05k (start from 39:59).

Furkan Gözükara

Hi ! Followed all the steps and I get this error after only 5 min of starting the training: D:\Kohya_GUI_Flux_Installer_21>Windows_Start_Kohya_SS.bat 20:44:45-377372 INFO headless: False 20:44:45-423277 INFO Using shell=True when running external commands... Running on local URL: http://127.0.0.1:7860 To create a public link, set `share=True` in `launch()`. 20:45:01-113093 INFO Loading config... 20:49:34-936952 INFO Removing existing directory D:/Kohya_GUI_Flux_Installer_21/train_imgs\img/1_ohwx man... 20:49:34-948916 INFO Copy D:/TRAINING 2 to D:/Kohya_GUI_Flux_Installer_21/train_imgs\img/1_ohwx man... 20:49:35-013720 INFO Regularization images directory is missing... not copying regularisation images... 20:49:35-015715 INFO Done creating kohya_ss training folder structure at D:/Kohya_GUI_Flux_Installer_21/train_imgs... 20:50:11-458173 INFO Start training LoRA Flux1 ... 20:50:11-459197 INFO Validating lr scheduler arguments... 20:50:11-461193 INFO Validating optimizer arguments... 20:50:11-463176 INFO Validating D:/Kohya_GUI_Flux_Installer_21/train_imgs\log existence and writability... SUCCESS 20:50:11-464157 INFO Validating D:/Kohya_GUI_Flux_Installer_21/train_imgs\model existence and writability... SUCCESS20:50:11-466179 INFO Validating D:/Kohya_GUI_Flux_Installer_21/flux1-dev.safetensors existence... SUCCESS 20:50:11-467176 INFO Validating D:/Kohya_GUI_Flux_Installer_21/train_imgs\img existence... 
SUCCESS 20:50:11-468173 INFO Folder 1_ohwx man: 1 repeats found 20:50:11-469170 INFO Folder 1_ohwx man: 15 images found 20:50:11-470168 INFO Folder 1_ohwx man: 15 * 1 = 15 steps 20:50:11-471166 INFO Regulatization factor: 1 20:50:11-473133 INFO Total steps: 15 20:50:11-474157 INFO Train batch size: 1 20:50:11-475127 INFO Gradient accumulation steps: 1 20:50:11-476125 INFO Epoch: 200 20:50:11-477122 INFO max_train_steps (15 / 1 / 1 * 200 * 1) = 3000 20:50:11-479116 INFO stop_text_encoder_training = 0 20:50:11-479116 INFO lr_warmup_steps = 0 20:50:11-481143 INFO Saving training config to D:/Kohya_GUI_Flux_Installer_21/train_imgs\model\Best_v2_20240901-205011.json... 20:50:11-483134 INFO Executing command: D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train_network.py --config_file D:/Kohya_GUI_Flux_Installer_21/train_imgs\model/config_lora-20240901-205011.toml D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. 
torch.utils._pytree._register_pytree_node( 2024-09-01 20:50:21 INFO Loading settings from D:/Kohya_GUI_Flux_Installer_21/train_imgs\model/config_lora-20240901-205011.toml... train_util.py:4189 INFO D:/Kohya_GUI_Flux_Installer_21/train_imgs\model/config_lora-20240901-205011 train_util.py:4208 highvram is enabled / highvramが有効です 2024-09-01 20:50:21 INFO t5xxl_max_token_length: 512 flux_train_network.py:144 D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884 warnings.warn( You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 INFO Using DreamBooth method. train_network.py:281 INFO prepare images. train_util.py:1803 INFO get image size from name of cache files train_util.py:1741 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 7424.42it/s] INFO set image size from cache files: 0/15 train_util.py:1748 INFO found directory D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man contains 15 image files train_util.py:1750 WARNING No caption file found for 15 images. Training will continue without captions for these images. If class token exists, it will be used. 
/ train_util.py:1781 15枚の画像にキャプションファイルが見つかりませんでした。これらの画像についてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-1.png train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-10.png train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-11.png train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-12.png train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-13.png train_util.py:1788 WARNING D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man\dan-14.png... and 10 more train_util.py:1786 INFO 15 train images with repeating. train_util.py:1844 INFO 0 reg images. train_util.py:1847 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1852 INFO [Dataset 0] config_util.py:570 batch_size: 1 resolution: (1024, 1024) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "D:\Kohya_GUI_Flux_Installer_21\train_imgs\img\1_ohwx man" image_count: 15 num_repeats: 1 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: ohwx man caption_extension: .txt INFO [Dataset 0] config_util.py:576 INFO loading image sizes. 
train_util.py:876 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 326.75it/s] 2024-09-01 20:50:22 INFO prepare dataset train_util.py:884 INFO preparing accelerator train_network.py:335 accelerator device: cuda INFO Building Flux model dev flux_utils.py:45 INFO Loading state dict from D:/Kohya_GUI_Flux_Installer_21/flux1-dev.safetensors flux_utils.py:52 Traceback (most recent call last): File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Kohya_GUI_Flux_Installer_21\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 'D:/Kohya_GUI_Flux_Installer_21/train_imgs\\model/config_lora-20240901-205011.toml']' returned non-zero exit status 3221225477. 20:50:25-856086 INFO Training has ended. Any idea what I'm doing wrong ?

DGMDATAINTERACTIF
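The exit status 3221225477 in the log above is easier to read in hexadecimal. A minimal sketch, assuming the status is a Windows NTSTATUS value; the helper name and the two-entry lookup table are illustrative only, not part of Kohya or accelerate:

```python
# Illustrative subset of Windows NTSTATUS codes; not an exhaustive table.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exit_status(status: int) -> str:
    """Render a Windows process exit status as hex plus a known name, if any."""
    code = status & 0xFFFFFFFF  # normalise negative values to unsigned 32-bit
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(describe_exit_status(3221225477))
```

An access violation suggests the process crashed in native code while the model was loading, which is why the Python traceback only shows the launcher failing. It does not by itself identify the cause.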

How many images do you have, and are you training at 1024x1024? I did over 80 trainings and also trained on another person; with 24 images it worked perfectly. Many of my users have also gotten perfect results so far. Can you message me on Discord and give me details?

Furkan Gözükara

The difference between the fast and slow configs is T5 attention masking. It slightly improves quality but greatly reduces speed :)

Furkan Gözükara

Yes, I already use it in SwarmUI. The prompt is: photo of ohwx man

Furkan Gözükara

This is weird. I tried Rank_3_18246MB_Slow.json and the likeness is not good. I trained for 300 epochs and tested every 20 epochs. It gets overfitted before I get a good likeness. With ai-toolkit I get extremely good results with the exact same dataset.

RayHell

Very nice! What are the configuration differences between rank 3-slow and rank 4-fast? I can't find any config differences. Thanks!

Steve

It's an odd error since my python version is 3.10

Brett Baker

Here is the error: Kohya_ss setup menu: 1. Install kohya_ss GUI 2. (Optional) Install CuDNN files (to use the latest supported CuDNN version) 3. (DANGER) Install Triton 2.1.0 for Windows... only do it if you know you need it... might break training... 4. (Optional) Install specific version of bitsandbytes 5. (Optional) Manually configure Accelerate 6. (Optional) Launch Kohya_ss GUI in browser 7. Exit Setup Select an option: 1 09:02:47-811404 INFO Kohya_ss GUI version: v24.2.0 09:02:47-819404 INFO Python version is 3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] 09:02:47-822404 ERROR The current version of python (sys.version_info(major=3, minor=9, micro=0, releaselevel='final', serial=0)) is not appropriate to run Kohya_ss GUI 09:02:47-824404 ERROR The python version needs to be greater or equal to 3.10.9 and less than 3.11.0 D:\Software\AI Software\Kohya_Flux\kohya_ss>

Brett Baker
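The installer log above reports Python 3.9.0 and asks for a version of at least 3.10.9 and below 3.11.0. A minimal sketch of that range check, which can be run with whichever `python` the installer picks up; the function name is hypothetical and the bounds are copied from the error message:

```python
import sys

MIN_VERSION = (3, 10, 9)  # inclusive lower bound quoted in the error message
MAX_VERSION = (3, 11, 0)  # exclusive upper bound quoted in the error message


def is_supported(version: tuple) -> bool:
    """Return True if (major, minor, micro) falls in [3.10.9, 3.11.0)."""
    return MIN_VERSION <= tuple(version[:3]) < MAX_VERSION


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    print(current, "supported" if is_supported(current) else "not supported")
```

If this prints a different version than expected, more than one Python is probably installed and another one comes first on PATH.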

I have 1 question, is there any Adetailer in Flux (as in Stable Diffusion) that can improve the 'face' of the subject?

ProudChinaLover

Please, please, please keep this dumb-proof approach. I love your approach: not too technical, and you explain things in detail. I really like it. I have one suggestion, which you may already have in mind while recording: give a mini-answer or summary first before you explain a new aspect. (I think most of your parts already do this, and I really like it.) I always need some kind of non-technical explanation of how Generative AI works. Thanks for your hard work. Please don't skip steps or assume everyone has watched all of your earlier videos. If you have a detailed explanation in another video, please show it on your YouTube so that I can refer to it, or put it in the comment/description section. Thanks for your good work! I like it!

ProudChinaLover

Can you try with SwarmUI? It is the best one for testing properly.

Furkan Gözükara

Hey, I have a problem with the LoRA I trained following your method. When I run it in ComfyUI, the LoRA doesn't have any effect. Do you have any idea why?

howmuchsize

Please start it in a cmd window so we can see the reason for the error: https://pasteboard.co/o1TMA2cLHwLw.png

Furkan Gözükara

Yes, this is another reason. Good tip.

Furkan Gözükara

You are welcome. Sorry for the delay; hopefully tomorrow.

Furkan Gözükara

Thank you so much. The RunPod and Massed Compute tutorials are already recorded and will hopefully be published tomorrow.

Furkan Gözükara

Yes, rank 3 and below use FP8. FP16 requires at least about 27 GB. Also, use the FP16 base model file; it will be cast automatically. I prefer that, to be sure the scripts work perfectly.

Furkan Gözükara

With Rank_4_16960MB_Fast.json I get around 5 to 5.5 seconds/it on an RTX 3090. Use that config instead of rank 3; the quality is almost the same. Also note that ai-toolkit trains at a lower resolution of 512 px, and 512 px reduces quality.

Furkan Gözükara
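The config file names in this thread appear to encode a rank number and a VRAM figure in MB. A rough sketch of picking a config from free VRAM, assuming that naming convention holds and that a lower rank number is the preferred one; the `pick_config` helper and the `headroom_mb` margin are hypothetical, and the list contains only the names mentioned in these comments:

```python
import re
from typing import Optional

# File names mentioned in this thread; the zip file may contain others.
CONFIGS = [
    "Rank_1_28700MB_Slow.json",
    "Rank_3_18246MB_Slow.json",
    "Rank_4_16960MB_Fast.json",
    "Rank_9_7514MB.json",
]

# Assumed convention: Rank_<number>_<megabytes>MB[_Fast|_Slow].json
NAME_PATTERN = re.compile(r"^Rank_(\d+)_(\d+)MB(?:_\w+)?\.json$")


def pick_config(free_vram_mb: int, headroom_mb: int = 512) -> Optional[str]:
    """Return the lowest-numbered rank whose listed VRAM fits, or None."""
    candidates = []
    for name in CONFIGS:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue
        rank, needed_mb = int(match.group(1)), int(match.group(2))
        if needed_mb + headroom_mb <= free_vram_mb:
            candidates.append((rank, name))
    return min(candidates)[1] if candidates else None


print(pick_config(24576))  # a 24 GB card
print(pick_config(8192))   # an 8 GB card
```

The figure in the file name is only a guide: other programs already using VRAM reduce what is actually free, so a larger headroom may be needed on small cards.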

Please open a cmd window and call that file by name so we can see the reason for the error: https://pasteboard.co/o1TMA2cLHwLw.png

Furkan Gözükara

I've tried multiple times and kohya_ss just won't install. It's odd since all of your other installers work fine. Here is the screen capture: https://www.youtube.com/watch?v=RK98WkZFOUg

Brett Baker

Getting 9 s/it on a 3090, and that's with torch 2.5 dev for speedup. It's more than 2x slower than ai-toolkit, and the high-VRAM setting makes training even slower. After 20 minutes of training, the estimated time to complete has only dropped by 5 minutes (s/it is consistently rising). It's at 10 s/it now, meaning training on just 13 images will take at least 8 hours. That is unusable. Here is a screenshot of the speeds: https://imgur.com/a/4m2GUA5

Dallin Mackay
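The relation between iteration speed and total training time in the comment above is simple arithmetic. A small sketch, where the step count of 2880 is a hypothetical value chosen only because it gives 8 hours at 10 s/it; the real count depends on images, repeats, and epochs:

```python
def estimated_hours(total_steps: int, seconds_per_it: float) -> float:
    """Total training time in hours for a fixed number of steps."""
    return total_steps * seconds_per_it / 3600.0


# Speeds quoted in this thread for an RTX 3090 with different setups.
for speed in (5.5, 9.0, 10.0):
    print(f"{speed} s/it -> {estimated_hours(2880, speed):.1f} h")
```

Because s/it was reported as rising during the run, an estimate like this is a lower bound in that situation.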

Is the rank 3 config supposed to be using the FP8 UNet and the FP8 base option? Because it is.

Dallin Mackay

You need to remove spaces from the directory name.

Dallin Mackay
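A quick way to spot which folders in an install path contain spaces, sketched with paths quoted elsewhere in this thread; the helper name is hypothetical:

```python
from pathlib import PureWindowsPath


def parts_with_spaces(path: str) -> list:
    """Return the components of a Windows path that contain a space."""
    return [part for part in PureWindowsPath(path).parts if " " in part]


print(parts_with_spaces(r"D:\Software\AI Software\Kohya_Flux\kohya_ss"))
print(parts_with_spaces(r"D:\Kohya_GUI_Flux_Installer_21\kohya_ss"))
```

This is meant for the installation path. Dataset subfolders such as "1_ohwx man" contain a space by design, as the training logs in this thread show.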

I should have subscribed to your Patreon earlier. Very informative. Please keep this approach and keep it approachable and not too technical. Thanks for your hard work. I love your explanations.

ProudChinaLover

WOW! Thanks!

Albert

Yes, I have recorded the video for both Massed Compute and RunPod. I am currently editing it. Hopefully it will be available on the channel tomorrow.

Furkan Gözükara

Will there be instructions on how to do the training on the Runpod?

Albert

Please open a cmd window and run the bat file there so we can see the reason. Open a cmd window in the folder, type the full name of the bat file, and press Enter.

Furkan Gözükara

No, it is not OK. Please download manually from the links shared in the post. The permission error is bad, though, and you should fix it: either your Windows account or your antivirus is blocking it.

Furkan Gözükara

I have a problem installing kohya_ss GUI: when I press 1, the command line crashes.

PARALLAX

When downloading the Flux models using the Windows download script, it says, "Could not set the permissions on the file 'C:\Kohya_GUI_Flux_Installer_21\.cache\huggingface\download\flux1-dev.safetensors.4610115bb0c89560703c892c59ac2742fa821e60ef5871b33493ba544683abd7.incomplete'. Error: [Errno 13] Permission denied: 'C:\\tmp_dc590ede-e5ad-4df7-b77a-40bb55a883b8'. Continuing without setting permissions." Is it okay to ignore this error?

druhl

No, it is fine. We don't use xformers for FLUX training; we use SDPA, the torch optimization. It will eventually be fixed with the next xformers version. It happens because we use the latest torch :)

Furkan Gözükara

Following your new tutorial video to install Kohya, after I selected step one everything looks good, except I get this error when scrolling up:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 16.3 MB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement xformers==0.0.27.post2 (from versions: none)
ERROR: No matching distribution found for xformers==0.0.27.post2
The rest looks all good. Should I solve this issue before continuing with the installation?

eduardo

Are you using torch 2.5? If so, can you make a new install without upgrading to torch 2.5 and try again? Also, how much VRAM is in use before you start training? Did you check it? How much RAM do you have? Did you disable shared VRAM? I can connect to your PC and check it out in a few hours; please message me on Discord.

Furkan Gözükara

Hi! I'm running into some trouble with your script while training on my NVIDIA RTX 3080 (8GB VRAM). I'm using the Rank_9_7514MB.json config, which should work on my setup, but it keeps crashing.
What's happening: I get a CUDA out-of-memory error when the script tries to load the T5-XXL text encoder onto the GPU, at around 7.7 GB VRAM usage: text_encoders[1].to(accelerator.device, dtype=weight_dtype)
What I've tried so far:
- Set lowvram to true, enabled gradient_checkpointing, and switched to FP8 precision for the text encoder.
- Turned on mem_eff_save and tried using split_mode and split_qkv to save memory.
- Reduced the network alpha and dimensions from 64 to 4 and 8, respectively.
- Also tried enabling bucket_no_upscale and reducing max token length, but no luck.
Any idea what else I can try to make it work on my 8GB VRAM? Thanks!

esteban orozco

Please follow this part: https://youtu.be/nySGu12Y05k
19:58 Setting the destination directory for saving training data
20:26 Preparing training data in Kohya GUI and generated folder structure
26:06 Kohya GUI and copying info to respective fields
27:01 "Train images image" folder path and its relevance

Furkan Gözükara

Hi! I got this error. Can you tell me how to fix it? "No data found. Please verify arguments (train_data_dir must be the parent of folders with images)" train_network.py:322

Happy_in_happy
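The error above says train_data_dir must be the parent of the image folders. Judging from the logs elsewhere in this thread, where image_dir ends in "img\1_ohwx man" with num_repeats 1 and class_tokens "ohwx man", the subfolder name carries the repeat count and the class tokens. A sketch of that naming rule, assuming the pattern is `<repeats>_<class tokens>`; the function name is hypothetical:

```python
import re
from typing import Optional, Tuple

# Assumed pattern: a repeat count, an underscore, then the class tokens.
FOLDER_PATTERN = re.compile(r"^(\d+)_(.+)$")


def parse_subset_folder(name: str) -> Optional[Tuple[int, str]]:
    """Split a folder name like '1_ohwx man' into (repeats, class_tokens)."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None  # not a valid image subfolder name under this assumption
    return int(match.group(1)), match.group(2)


print(parse_subset_folder("1_ohwx man"))
print(parse_subset_folder("img"))
```

Under this reading, the training image folder setting should point at the "img" folder, not at "1_ohwx man" itself and not at a folder that holds the images directly.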

You are loading into the wrong tab, and very likely your config is corrupted. Please follow the tutorial and use a fresh config from the zip file: https://youtu.be/nySGu12Y05k

Furkan Gözükara

Yes, you can, but it uses more VRAM than SwarmUI, and I don't know if it works with quantized models.

Furkan Gözükara

Sadly, I don't have one. I use SwarmUI: https://youtu.be/nySGu12Y05k

Furkan Gözükara

Do you have a basic FLUX ComfyUI workflow, maybe with some prompt suggestions and a LoRA option, to start with? I was offline for the past 3 weeks, so I have to catch up.

Jan Zhor

Can I use a FLUX LoRA in Forge, sir?

Steven Nguyen

Hi there , salam, I installed this Kohya Flus at least 8 times and it is giving me errors even on SDXL training please solve , thanks : epoch 1/13 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:668 Traceback (most recent call last): File "D:\KOHYA FLUX\kohya_ss\sd-scripts\sdxl_train_network.py", line 210, in trainer.train(args) File "D:\KOHYA FLUX\kohya_ss\sd-scripts\train_network.py", line 1088, in train text_encoder_conds is None UnboundLocalError: local variable 'text_encoder_conds' referenced before assignment steps: 0%| | 0/2000 [00:00 File "D:\KOHYA FLUX\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\KOHYA FLUX\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\KOHYA FLUX\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\KOHYA FLUX\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/KOHYA FLUX/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'C:/MODEL TRAINING/LIA_OHWX\\model/config_lora-20240829-122635.toml']' returned non-zero exit status 1. 12:27:28-029824 INFO Training has ended.

MUNEER AHMED

Probably an incorrect requirements install. Uninstall all Python and CUDA installations, restart your PC, reinstall exactly as shown here: https://youtu.be/-NjNy7afOQ0 and then reinstall Kohya.

Furkan Gözükara

Restart your PC, set the accelerator as shown here: https://youtu.be/adVhm9aI9Gc and restart training.

Furkan Gözükara

It says it can't find your CLIP-L file. Provide the correct path. EnvironmentError( OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'.

Furkan Gözükara

I tried the steps above but my kohya immediately crashes when trying to launch. Any ideas?

visors

Hi! i didn't have this error in previous version and configs.. what is the problem? 2024-08-29 00:05:48 INFO t5xxl_max_token_length: 512 flux_train_network.py:144 Traceback (most recent call last): File "/home/rafis88/Kohya/kohya_ss/sd-scripts/flux_train_network.py", line 445, in trainer.train(args) File "/home/rafis88/Kohya/kohya_ss/sd-scripts/train_network.py", line 258, in train tokenize_strategy = self.get_tokenize_strategy(args) File "/home/rafis88/Kohya/kohya_ss/sd-scripts/flux_train_network.py", line 145, in get_tokenize_strategy return strategy_flux.FluxTokenizeStrategy(t5xxl_max_token_length, args.tokenizer_cache_dir) File "/home/rafis88/Kohya/kohya_ss/sd-scripts/library/strategy_flux.py", line 27, in __init__ self.clip_l = self._load_tokenizer(CLIPTokenizer, CLIP_L_TOKENIZER_ID, tokenizer_cache_dir=tokenizer_cache_dir) File "/home/rafis88/Kohya/kohya_ss/sd-scripts/library/strategy_base.py", line 46, in _load_tokenizer tokenizer = model_class.from_pretrained(model_id, subfolder=subfolder) File "/home/rafis88/Kohya/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2255, in from_pretrained raise EnvironmentError( OSError: Can't load tokenizer for 'openai/clip-vit-large-patch14'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'openai/clip-vit-large-patch14' is the correct path to a directory containing all relevant files for a CLIPTokenizer tokenizer. 
Traceback (most recent call last): File "/home/rafis88/Kohya/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main()) File "/home/rafis88/Kohya/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/home/rafis88/Kohya/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args) File "/home/rafis88/Kohya/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/home/rafis88/Kohya/kohya_ss/venv/bin/python', '/home/rafis88/Kohya/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', '/home/rafis88/Kohya/kohya_ss/dataset/images/img/model/config_lora-20240829-000541.toml']' returned non-zero exit status 1. 00:08:05-153682 INFO Training has ended.

Hrabia M

Djudge

So true, the model is amazing :D

Furkan Gözükara

Can't wait to try new configs, I'm always surprised you get amazing results without using any full length pictures. I'm surprised the model doesn't think you have no legs.

Steve

I would say yes: add more expressions and still go with 100 epochs. It doesn't get overtrained quickly.

Furkan Gözükara

It is a library error, not a particular error to fix. My suggestion would be to uninstall Python and reinstall it exactly as in this video, also install CUDA and the C++ tools, and then reinstall Kohya with the newest zip file: https://www.youtube.com/watch?v=-NjNy7afOQ0&feature=youtu.be Follow this tutorial video step by step.

Furkan Gözükara

i've gone step by step, but after clicking the "start training" button I got this, what should I do? To create a public link, set `share=True` in `launch()`. 15:00:19-460179 INFO Start training LoRA Flux1 ... 15:00:19-462150 INFO Validating lr scheduler arguments... 15:00:19-464150 INFO Validating optimizer arguments... 15:00:19-467724 INFO Validating C:/ace/kenta_joy_prepared\log existence and writability... SUCCESS 15:00:19-470153 INFO Validating C:/ace/kenta_joy_prepared\model existence and writability... SUCCESS 15:00:19-472159 INFO Validating C:/Kohya_GUI_Flux_v20/flux1-dev.safetensors existence... SUCCESS 15:00:19-474184 INFO Validating C:/ace/kenta_joy_prepared\img existence... SUCCESS 15:00:19-476155 INFO Folder 1_ohwx woman: 1 repeats found 15:00:19-478187 INFO Folder 1_ohwx woman: 55 images found 15:00:19-479183 INFO Folder 1_ohwx woman: 55 * 1 = 55 steps 15:00:19-480155 INFO Regulatization factor: 1 15:00:19-482156 INFO Total steps: 55 15:00:19-484155 INFO Train batch size: 1 15:00:19-485157 INFO Gradient accumulation steps: 1 15:00:19-487620 INFO Epoch: 100 15:00:19-488620 INFO max_train_steps (55 / 1 / 1 * 100 * 1) = 5500 15:00:19-489620 INFO stop_text_encoder_training = 0 15:00:19-490619 INFO lr_warmup_steps = 0 15:00:19-493620 INFO Saving training config to C:/ace/kenta_joy_prepared\model\kenta_flux1_ohwx_woman_v01_20240828-150019.json... 15:00:19-496620 INFO Executing command: c:\Kohya_GUI_Flux_v20\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 c:/Kohya_GUI_Flux_v20/kohya_ss/sd-scripts/flux_train_network.py --config_file C:/ace/kenta_joy_prepared\model/config_lora-20240828-150019.toml c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. 
torch.utils._pytree._register_pytree_node( Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed. During handling of the above exception, another exception occurred: Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1603, in _get_module return importlib.import_module("." + module_name, self.__name__) File "C:\Users\ace\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 1006, in _find_and_load_unlocked File "", line 688, in _load_unlocked File "", line 883, in exec_module File "", line 241, in _call_with_frames_removed File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\models\clip\image_processing_clip.py", line 21, in from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\image_processing_utils.py", line 21, in from .image_transforms import center_crop, normalize, rescale File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\image_transforms.py", line 49, in import tensorflow as tf File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\__init__.py", line 38, in from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow # pylint: disable=unused-import File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 85, in raise ImportError( ImportError: Traceback (most recent call last): File 
"c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed. Failed to load the native TensorFlow runtime. See https://www.tensorflow.org/install/errors for some common causes and solutions. If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module return importlib.import_module("." + module_name, self.__name__) File "C:\Users\ace\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1050, in _gcd_import File "", line 1027, in _find_and_load File "", line 1006, in _find_and_load_unlocked File "", line 688, in _load_unlocked File "", line 883, in exec_module File "", line 241, in _call_with_frames_removed File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 20, in from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection File "", line 1075, in _handle_fromlist File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1594, in __getattr__ value = getattr(module, name) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1593, in __getattr__ module = self._get_module(self._class_to_module[name]) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 
1605, in _get_module raise RuntimeError( RuntimeError: Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback): Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed. Failed to load the native TensorFlow runtime. See https://www.tensorflow.org/install/errors for some common causes and solutions. If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\sd-scripts\flux_train_network.py", line 13, in from library import flux_models, flux_train_utils, flux_utils, sd3_train_utils, strategy_base, strategy_flux, train_util File "c:\Kohya_GUI_Flux_v20\kohya_ss\sd-scripts\library\flux_train_utils.py", line 17, in from library import flux_models, flux_utils, strategy_base, train_util File "c:\Kohya_GUI_Flux_v20\kohya_ss\sd-scripts\library\train_util.py", line 48, in from diffusers import ( File "", line 1075, in _handle_fromlist File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__ value = getattr(module, name) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__ value = getattr(module, name) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__ module = self._get_module(self._class_to_module[name]) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in 
_get_module raise RuntimeError( RuntimeError: Failed to import diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion because of the following error (look up to see its traceback): Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback): Traceback (most recent call last): File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in from tensorflow.python._pywrap_tensorflow_internal import * ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed. Failed to load the native TensorFlow runtime. See https://www.tensorflow.org/install/errors for some common causes and solutions. If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message. Traceback (most recent call last): File "C:\Users\ace\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\ace\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "c:\Kohya_GUI_Flux_v20\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['c:\\Kohya_GUI_Flux_v20\\kohya_ss\\venv\\Scripts\\python.exe', 'c:/Kohya_GUI_Flux_v20/kohya_ss/sd-scripts/flux_train_network.py', '--config_file', 
'C:/ace/kenta_joy_prepared\\model/config_lora-20240828-150019.toml']' returned non-zero exit status 1. 15:00:34-985849 INFO Training has ended.

Jan Zhor

yes totally! Thank you ❤️❤️❤️

Samael1976

Hi, how do you think a larger number of training images will do? I tried your settings with 40 images and it works really well (100 epochs / 4000 steps). But now I want the model to capture more expressions of that person, aiming for 75-100 images. Do you think the model needs about the same number of epochs (100 epochs / 7000 steps)? I just don't want it to be overfitted.

Moonlyte
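The step counts in the question above follow the formula the Kohya GUI prints in its log elsewhere in this thread, `max_train_steps (55 / 1 / 1 * 100 * 1) = 5500`. A minimal sketch of the same arithmetic, assuming the terms are images times repeats, batch size, gradient accumulation steps, epochs, and regularization factor in that order; the function and parameter names are mine, not Kohya's:

```python
def max_train_steps(
    images: int,
    repeats: int = 1,
    batch_size: int = 1,
    grad_accum: int = 1,
    epochs: int = 100,
    reg_factor: int = 1,
) -> int:
    """Mirror the logged formula: steps / batch / accum * epochs * reg factor."""
    return int(images * repeats / batch_size / grad_accum * epochs * reg_factor)


# Dataset sizes mentioned in this thread, all at 100 epochs and batch size 1.
for n in (40, 55, 75, 100):
    print(n, "images ->", max_train_steps(n), "steps")
```

At a fixed 100 epochs the step count scales linearly with the number of images, so keeping the same epoch count already gives a larger dataset proportionally more steps.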

For whoever reads this: using a fresh config from the zip file and loading it into the LoRA tab fixed the issue for Samael. Sometimes configs get corrupted.

Furkan Gözükara

So there's a problem with RunPod... I'll do it all again and tell you. Because I assure you that I had taken your configuration file and modified by hand only the parts I needed. While the one I sent you is the one I "saved" and downloaded from the runpod server. I'll let you know soon. Thank you!

Samael1976

You are not using the same config. Please look at all these differences between your config and the Rank_1_28700MB_Slow.json included in my zip: https://pasteboard.co/K1jlQFvFtcsL.png

Furkan Gözükara

- I'm using the template: RunPod Pytorch 2.1 runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 (this is what you wrote in your guide).
- My config is the same as yours, please look more closely, and yours (tested by two people) does not work either.
- Obviously it is modified in the number of epochs, and I set saving at every epoch (since I have a dataset of 900 images). As for the optimizer, I chose AdamW (but I repeat, leaving Adafactor with your additional parameters does not work either).
- I changed the sample prompt and the sampler for the prompt.
- I changed the learning rate (because of the 900 images in the dataset).
Everything else is identical to yours. But I repeat, even using yours and modifying ONLY the epochs (15) and saving at each epoch, it doesn't work.
- I'm using the LoRA tab.

Samael1976

You are not using the correct template. Please use the PyTorch template written in the guide. Your config is also incorrect; compare it with the config in my zip file and you will see the differences. Please use the LoRA tab to load it after extracting it from the zip file. I recorded the video and am editing it to publish.

Furkan Gözükara

This is the error. Tests carried out:
- Both with Adafactor (and all your settings)
- Both with AdamW (and AdamW8bit)
- Tried setting the TE learning rate to 0: "learning_rate_te": 0, "learning_rate_te1": 0, "learning_rate_te2": 0
FIRST STEP INSTALLATION LOG: https://silverider76.iliadboxos.it:27440/share/IKEz4WK2mkaktgh6/01%20-%20First%20Step%20Installation.txt
SECOND STEP INSTALLATION LOG: https://silverider76.iliadboxos.it:27440/share/mRMx-MpQKCliFNiI/02%20-%20Second%20Step%20Installation.txt
PRINT TRAINING COMMANDS: https://silverider76.iliadboxos.it:27440/share/Q9-JLRx79qecFpOa/03%20-%20Printed%20Training%20Command.txt
ERROR LOG: https://silverider76.iliadboxos.it:27440/share/TVPgGmBheL3sLrgA/04%20-%20Error.txt
JSON CONFIGURATION FILE: https://silverider76.iliadboxos.it:27440/share/_1x3ROpOB9TqbxA6/Rank_1_28700MB_Slow_AniToon.json

Samael1976

Sadly, FP16 doesn't work. The last time I tested it, it didn't train. I tested RunPod and it was working. What error did you get on RunPod? Let me know, and I will try again and fix it if it is broken.

Furkan Gözükara

Hi Furkan! Today (yesterday it worked) I made a runpod with version 17. I used different configuration files (both 48 and 24) but none of them worked. Second thing (less important) I tried to make it work at home that I have a 2060 with 12gb. It doesn't work at home either, it always goes out of cuda (even though I tried even lower ranks). I can tell you that obviously I tried to set FP16 (and also FP8) since the 2060 doesn't support BF16. But probably Kohya (I'm not sure) has a bug because when I go to see the execution of the script, in the accelerate.exe phase it always reads BF16 and never FP16. If you can test Runpod (maybe with optimizer AdamW) you would do me a huge favor (because today we tried in two and nothing).

Samael1976

You are right. apply_t5_attn_mask improves the stability of training and increases VRAM usage. Let me know the final results.

Furkan Gözükara

but it may have nothing to do with the json, and it is simply the dataset. It still seems strange to me that when training at 1024x1024 and 88 images it looks worse than when training at 768x768 and 44 images. I will have to repeat the training with the 44 images at 1024x1024 with the 3rank json to discard, otherwise I will simply use the previous version of json that you had placed (24 gb json). I was reviewing the two json and the only difference is this parameter: apply_t5_attn_mask flux. So I guess the problem is mine and has nothing to do with the json. I understand that this parameter increases ram memory consumption but improves the stability of the training.

daniel mendoza
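Comparing two config files by eye, as described in the comment above, is error-prone. A small sketch that lists only the keys whose values differ between two Kohya JSON configs, assuming they are flat JSON objects; the helper names are hypothetical and the values in the example are illustrative, not taken from the real files:

```python
import json

_MISSING = "<missing>"


def diff_configs(a: dict, b: dict) -> dict:
    """Map each differing key to its (value in a, value in b) pair."""
    result = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, _MISSING), b.get(key, _MISSING)
        if left != right:
            result[key] = (left, right)
    return result


def diff_config_files(path_a: str, path_b: str) -> dict:
    """Load two JSON config files from disk and diff them."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        return diff_configs(json.load(fa), json.load(fb))


# Illustrative values only; the key names appear in this thread.
slow = {"apply_t5_attn_mask": True, "learning_rate_te": 0}
fast = {"apply_t5_attn_mask": False, "learning_rate_te": 0}
print(diff_configs(slow, fast))
```

The same function can be pointed at a config saved from the GUI and the original from the zip file to check whether anything changed unintentionally.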

I just did the test in swarm-ui and it actually comes out bad, it even comes out worse than how it comes out through comfyui. To compare I used the two loras, and the truth is that the one I made with the json 24gb looks much better. Compare: 24gb: https://i.postimg.cc/fbJ3sTKR/1225-Amateur-photography-of-a-ohwx-woman-wea-flux1-dev-420838505.png 3 rank: https://i.postimg.cc/YCX4TmQP/1231-Amateur-photography-of-a-ohwx-woman-wea-flux1-dev-1634148689.png

daniel mendoza

Can you try with SwarmUI to see if it produces such an error? The current config is very similar to the latest config; you can compare them.

Furkan Gözükara

The two images are made with the same configuration and the same workflow.

daniel mendoza

https://i.postimg.cc/2rzdcdd3/generacion.png

daniel mendoza

yep hopefully recording today

Furkan Gözükara

How do you generate images with SwarmUI? Resolution doesn't have such an impact; the previous config was also 1024x1024. Check your image generation settings, and don't forget to set CFG to 1.

Furkan Gözükara

I applied the configuration profile "Rank_3_18246MB_Slow.json - 8bit - 8.62 second / it", and at the end of the training, and generating test images, I noticed that they all come out with vertical lines. The same thing didn't happen to me when I used the previous profile you had created called "24_GB_GPUs". You also have to consider that I used twice the amount of images. With the rank 3 profile I used 88, when I had previously used 44. In addition, the training with rank 3 was at 1024x1024 and previously it was with 768x768. I would have to perform a new training with the initial 44 images in 768x768 to confirm that it is a profile error, but it should not generate those obnoxious vertical stripes. Check: https://i.postimg.cc/fLr9GSXC/ejemplo-1.png https://i.postimg.cc/vBBn2t1D/ejemplo-2.png

daniel mendoza

Hi! Are you gonna record a video on the topic?

Vyacheslav Belyaev

