Furkan Gözükara

Kohya FLUX Fine Tuning (Full Checkpoints) / DreamBooth Training Full Tutorial For Local Windows and Cloud RunPod and Massed Compute

The very best complete workflow and configurations for full Fine Tuning of FLUX models on GPUs with as little as 6 GB of VRAM, on Windows and Cloud

Patreon exclusive posts index to find our scripts easily, Patreon scripts update history to see which updates arrived for which scripts, and the Patreon special generative scripts list that you can use in any of your tasks.

Join Discord to get help, chat, and discuss, and tell me your Discord username to get your special rank: SECourses Discord

Please also Star, Watch and Fork our Stable Diffusion & Generative AI GitHub repository, join our Reddit subreddit, and follow me on LinkedIn (my real profile)

=======

Full main tutorial : https://youtu.be/FvpWy1x5etM

Latest zip file : Kohya_FLUX_DreamBooth_LoRA_v32.zip

Quick new Massed Compute install (Oct 2025) : https://www.youtube.com/watch?v=Ym9rdfy2VZ0

29 October 2025 Update v32

2 October 2025 Update

Windows Requirements

Massed Compute (Recommended Cloud):

RunPod (Cloud):

13 August 2025 Update

13 July 2025 Update

29 May 2025 Update

13 May 2025 Update

4 May 2025 Update

20 November 2024 Update

17 November 2024 Update

31 October 2024 Update

14 October 2024 Update


Comparisons

Fine Tuning / DreamBooth vs LoRA Comparisons

Best Found Epochs and Step Counts and Durations

Conclusions

16 September 2024 Update

Why Fine Tuning Is Better Than LoRA

What Is Not Useful


Comments

it is fixed already thankfully

Furkan Gözükara

Where do we stand with GitHub atm?

Michael Harleman

it is still there and it is even easier to use : https://pasteboard.co/2MRoDFYlhLVC.png

Furkan Gözükara

Hello and thank you for your amazing work! How do you activate 'Regional Prompting' that you show at 53:42? If it no longer exists, what equivalent can we use?

mathieu

What issue are you having with Qwen? We already have LoRA configs.

Furkan Gözükara

Thanks for update to Flux training using SRPO. This model has finally improved image quality above flux-dev. I am looking forward to trying Qwen when I can get past the out of memory errors since I think it will further the quality and realism.

Ec Jep

For FLUX DreamBooth it shouldn't make a big difference, but for bigger models like Qwen Image or Wan training it would make a more significant difference. Also, I just tested the RTX 6000 PRO on Massed Compute and it was much faster than my RTX 5090 for FLUX training, which doesn't make sense :D So the difference will come from avoiding block swapping during training. Still, the RTX 6000 PRO will help you with almost every new model since it has 3x the VRAM.

Furkan Gözükara

Hi Furkan, I'm planning to switch from an RTX 5090 to an RTX Pro 6000. Do you know if the training with the Pro will be faster or if there may be problems because of the drivers?

puk

Only the number of epochs changes with the number of images. More images need fewer epochs. So for 100 images go up to 100-150 epochs and compare checkpoints. I haven't tested that model yet, but I should test it, thanks.

Furkan Gözükara
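
The rule of thumb in the reply above (more images, fewer epochs) can be turned into rough step counts with a small sketch. It assumes one epoch is one pass over the dataset, that no separate cap on total steps is set, and that repeats and batch size act as a plain multiplier and divisor. The function name and parameters are illustrative and not taken from Kohya.

```python
import math


def total_steps(num_images, epochs, batch_size=1, repeats=1):
    """Estimate optimizer steps for a run (assumption: no max-step cap)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# 100 images at the suggested 100-150 epochs, batch size 1
for epochs in (100, 150):
    print(epochs, "epochs ->", total_steps(100, epochs), "steps")
```

Comparing checkpoints saved at several points inside that range, as the reply suggests, is still what decides the final epoch count.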

So, if I'm using 100 images, not 256. Is it the same settings, or do I need to adjust it a bit? Also, have you seen the Flux-Dev2Pro that claims to produce better LoRAs than training with Flux-Dev? https://huggingface.co/ashen0209/Flux-Dev2Pro I wonder if you'd like to do the research for us?

ClubJulze

No, you didn't make a mistake. Krea may require fewer or more steps. I noticed the same thing, and it depends on the dataset. So compare checkpoints, and if you see too much overfitting, slightly reduce the LR and train again.

Furkan Gözükara

Hi Professor, The training configs you shared worked amazingly for Flux Dev Dreambooth—thank you so much for providing them. However, when I reran training with the exact same config and dataset (19 images at 1024×1024) but swapped in Flux Krea, the results degraded significantly. I saw far more noticeable AI errors and visual artifacts, especially concentrated around the eyes and facial regions, and these issues became more severe as training progressed. With Flux Dev, the outputs were clean and stable with a sweet spot around 5,000 steps. But with Krea, the images quickly became blurry, and overfitting artifacts and noise appeared much earlier, making the results look distorted. Did I potentially miss something in the setup? I used the same 48GB_GPU_28200MB_6.3_second_it_Tier_1.json config, only replacing the Flux Dev model with Krea. Thanks again for your guidance!

Legenderox Gaming

Yes, it is on my list after Qwen. I am preparing an app.

Furkan Gözükara

Can you please make a tutorial for wan 2.2 lora training or fine-tuning?

Brinet Claus

thanks i will check

Furkan Gözükara

thanks

Furkan Gözükara

Here is the link to the Chroma HD version (sorry I hit enter before pasting): https://huggingface.co/lodestones/Chroma1-HD/tree/main

Edward Ten Eyck

Thank you for all of the wonderful installers and presets! If you decide to check Chroma there is version 1-50 but also an HD version. If you decide to investigate I am wondering if there is a benefit of training Loras or doing a fine tune with source image resolutions higher than 1024x1024?

Edward Ten Eyck

AI dubbing arrived as a feature before this video. Google said they will add it to my older videos too, hopefully.

Furkan Gözükara

Weird, on this video it's not possible to turn on the AI dubbing like on your previous ones. Maybe you have something set in the video settings?

starysmok

The first 73 minutes are the Windows part.

Furkan Gözükara

Is there somewhere that you have these separated by device? I am on windows and it would be a lot easier just to see the windows information vs all of it in one video. It may not be the case but I thought I would ask.

Damien Rufus

Hello, I tried a fresh install of v30 in windows 10 and the cmd crashes when trying to start with option 1 or 6. I have run the windows_kohya_update.bat and the gradio_temp_fix.bat. Any ideas? Thanks!

Steve

It is still working, and sadly your error is not giving me much info. I am training daily on RunPod and on Windows with no issues. The Pro 6000 should work perfectly too.

Furkan Gözükara

I had this running fine on my windows11 a month or so ago, but am currently dealing with this LibUV error: "torch.distributed.DistStoreError: use_libuv was requested but PyTorch was built without libuv support, run with USE_LIBUV=0 to disable it." I have a Pro 6000 with latest drivers, cuda 12.8, Python 3.10.11 and Python 3.11.9 as well as C++ Tools. Can't seem to find a solution online or through Claude/Gemini. Not sure if it matters, but I do also have a 3060ti plugged in just for my monitors and am not trying to use it for Kohya training.

aiworkflowzzz
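
The error message quoted above says to run with USE_LIBUV=0. A minimal sketch of passing that variable to a child process from Python follows. Whether this resolves the problem for Kohya on Windows is not confirmed anywhere in this thread, the helper name is hypothetical, and the child command is only a stand-in that prints the variable.

```python
import os
import subprocess
import sys


def env_with_libuv_disabled(base_env=None):
    """Return a copy of the environment with USE_LIBUV set to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["USE_LIBUV"] = "0"
    return env


if __name__ == "__main__":
    # Stand-in child process: replace this command with the real launcher.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['USE_LIBUV'])"],
        env=env_with_libuv_disabled(),
        check=True,
    )
```

Copying the environment rather than editing os.environ keeps the change limited to the launched process.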

I don't know. Some people have recently been reporting issues with Gradio, so it may be related to that.

Furkan Gözükara

You can give the paths manually and it will work. Also, the parameters will appear after you give the path of the FLUX dev model.

Furkan Gözükara

Can you test with our example dataset to verify? https://www.patreon.com/posts/114972274

Furkan Gözükara

Hi, I tried installing Kohya on another computer and it "runs" but shows nothing in the web UI, and the same happens in other browsers. I've already installed everything from the mandatory tutorials. Any clue why this is happening?

Daniel Cardona Ramirez

Hello! I've been using this setup on Massed Compute and it's always great. In the last few days, it looks like the parameters in Kohya for setting the location of clip, ae, and t5xxl have disappeared. Do you know if this matters? In the command, it shows them being loaded from the downloads folders, so they seem to be used. I just liked it when I could move them into the SwarmUI directory but still use them in Kohya.

Daniel Sturtevant

Hello, I'm using the configuration you have been using for 48 GB. I rented an H100 GPU and trained on a 20-image dataset, but the images generated after training do not even match my training data. Where do you think I might be going wrong? All the parameters are loaded from the 15-image dataset config file. Can you please advise?

badcipher

OK, this is normally really hard. For this you must have photos of you together, captioned together. So you will have photos of you, of your wife, and of both. Caption them as ohwx man, bbuk woman, and (for a photo together, let's say) ohwx man at the left and bbuk woman at the right. This way you can train.

Furkan Gözükara

Hello, I've been able to train models of myself successfully thanks to your amazing tutorial. You're an absolute PRO!! But hey, I've got some questions, hope you can reply: What should I do if I want to train a model that is able to make pics of me and my wife? How should the dataset be? Should I include only pics of both of us together, or should I use pics of her and me alone and some together? What would the caption be, and how should I caption each in a quick way? Can I train a "style" in this same fine-tuning run? FLUX tends to always make "perfect", professional-looking photos, and people notice they're AI generated. I've seen some amazing pics that look natural and realistic since they seem taken by an amateur with a phone, sometimes with low light, noise, flash and those kinds of things that make an image feel life-like. What should I do to get this style when I want it? I've tried some LoRAs but they tend to cause deformations.

Daniel Cardona Ramirez

Yes, something has changed. The RTX 3090 is now slower than before.

Furkan Gözükara

I wouldn't say A6000 speeds compare to 3090 unless something is broken right now. My ftw3 3090 is 11s/it average on your 7 sec 24gb config. that's with torch upgrade

Dallin Mackay

I think it may be possible, but you probably need Linux. We cover multi-GPU training here: https://youtu.be/-uhL2nW7Ddw

Furkan Gözükara

hi, what do you think about fine tuning on dual rtx 5090? is it possible? what could be the batch size and other params?

jw

What model is this? SDXL?

Furkan Gözükara

So I've finally decided to try SwarmUI but I can't get it to work. When I click "generate" it gives an error that says it can't find my model, even though I've already put my safetensors models in the Stable-Diffusion folder. Do you have any clue what I am doing wrong? I've converted the models to FP8 with Kohya. Error shown: All available backends failed to load the model 'E:/KOHYA_SS_SDECOURSES/SwarmUI/Models/Stable-Diffusion/LALITO_V5-000125_FP8.safetensors'. Possible reason: ComfyUI execution error: No such file or directory: "E:\\KOHYA_SS_SDECOURSES\\SwarmUI\\Models\\Stable-Diffusion\\LALITO_V5-000125_FP8.safetensors"

Daniel Cardona Ramirez

No, each one is a checkpoint. They should work perfectly in SwarmUI on a 4090.

Furkan Gözükara

Hello, I finally finished the training after 48 hours on my RTX 4090. But I don't know if I did something wrong: each output file is over 22 GB, which makes it impossible to use them as checkpoints because image generation takes way too long. Maybe there are some steps left that I didn't complete?

Daniel Cardona Ramirez
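
The "over 22 GB" figure in the question above is roughly what a full 16-bit checkpoint would be expected to weigh. The following back-of-the-envelope sketch assumes about 12 billion parameters for the FLUX dev transformer, which is an approximate public figure and not something stated in this thread. The function name is illustrative, and text encoders and the VAE are ignored.

```python
GIB = 1024 ** 3  # bytes per GiB

# Assumption: approximate parameter count, not taken from this thread.
FLUX_PARAMS = 12_000_000_000


def checkpoint_size_gib(num_params, bytes_per_param):
    """Rough on-disk size of the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


print(f"16-bit weights: {checkpoint_size_gib(FLUX_PARAMS, 2):.1f} GiB")
print(f"FP8 weights:    {checkpoint_size_gib(FLUX_PARAMS, 1):.1f} GiB")
```

Under that assumption an FP8 conversion comes out at about half the 16-bit size.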

You are welcome. Also, sadly reg images do not work as well for FLUX as they do in SDXL, but you can try.

Furkan Gözükara

Hi, you need to report it, but this is exactly why I don't recommend samples. They always cause problems.

Furkan Gözükara

Possible. Have both of you in the same pictures. Caption like ohwx man bbuk woman, so the model will learn. Have both together and separate images. Use imagename.txt for captions.

Furkan Gözükara
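
The imagename.txt convention from the reply above can be sketched as a small helper that writes one caption file next to each image. The tokens ohwx man and bbuk woman come from the reply; the function name, the file names, and the choice to join tokens with a space are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def write_captions(dataset_dir, tokens_by_image):
    """Write <image stem>.txt next to each image, holding its caption."""
    dataset_dir = Path(dataset_dir)
    written = []
    for image_name, tokens in tokens_by_image.items():
        caption_path = (dataset_dir / image_name).with_suffix(".txt")
        caption_path.write_text(" ".join(tokens), encoding="utf-8")
        written.append(caption_path)
    return written


if __name__ == "__main__":
    # Hypothetical file names; a temporary folder stands in for the dataset.
    with tempfile.TemporaryDirectory() as folder:
        paths = write_captions(folder, {
            "him_01.jpg": ["ohwx man"],
            "her_01.jpg": ["bbuk woman"],
            "both_01.jpg": ["ohwx man", "bbuk woman"],
        })
        for path in paths:
            print(path.name, "->", path.read_text(encoding="utf-8"))
```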

Is it possible to train two people at the same time? I would like to create photos of me and my wife

Daniel Cardona Ramirez

I'm running kohya ss to train a FLUX model. All goes well, but when it tries to generate sample images it crashes:

Traceback (most recent call last):
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\sd-scripts\flux_train.py", line 850, in <module>
    train(args)
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\sd-scripts\flux_train.py", line 708, in train
    flux_train_utils.sample_images(
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\sd-scripts\library\flux_train_utils.py", line 70, in sample_images
    text_encoders = [accelerator.unwrap_model(te) for te in text_encoders]
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\sd-scripts\library\flux_train_utils.py", line 70, in <listcomp>
    text_encoders = [accelerator.unwrap_model(te) for te in text_encoders]
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2866, in unwrap_model
    return extract_model_from_parallel(model, keep_fp32_wrapper, keep_torch_compile)
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\venv\lib\site-packages\accelerate\utils\other.py", line 176, in extract_model_from_parallel
    has_compiled = has_compiled_regions(model)
  File "D:\Kohya_FLUX_DreamBooth_LoRA_v29\kohya_ss\venv\lib\site-packages\accelerate\utils\other.py", line 70, in has_compiled_regions
    if module._modules:
AttributeError: 'NoneType' object has no attribute '_modules'
steps: 1%|▍

Casper

True, it is not necessary anymore. Please use Windows_RTX5000_Series_Upgrade_Run_After_Install_Finished.bat

Furkan Gözükara

I can't find the Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat in the last downloadable file

Daniel Cardona Ramirez

When I load the workflow into Comfyui within Swarm it looks like the error happens after the second Ksampler where the resampled face becomes either black or noise. Once this error occurs the original Ksampler will only make a black image or noise until I load a different model.

aiworkflowzzz

Not sure why, but running trained model in SwarmUI it tends to break Swarm once you use Segmentation. If I run same workflow in standalone Comfy with Segmentation (yolo) it works fine. In Swarm once I try yolo once it'll create a black box around face and then all subsequent generations are either black or all noise. Doing a standard generation with just Flux.Dev seems to reset it, but once I put segmentation back in it breaks again. Running Blackwell 6000 so maybe it's a driver issue?

aiworkflowzzz

It worked perfectly in an RTX 5090. 150 images-100 epochs at 3.89 s/it. I have checked it fuses the created character with the class. I guess to avoid that, regularization images are needed. It is marked in the video some update on this. I hope you discuss that issue in the future. Thank you!

Juanmyth

Just the ohwx man token is sufficient. Don't use very detailed captions.

Furkan Gözükara

I trained a few days ago on RunPod and even 8 GPUs worked perfectly. What is the error you are getting?

Furkan Gözükara

Doc, can you or your team check on the install for Latest zip file : Kohya_FLUX_DreamBooth_LoRA_v28.zip as it relates to Runpod? I am selecting the correct build: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel However, after install, on a RTX 4090, it will fail on every occasion training attempts to begin. I don't have the error in front of me now, but I was waiting a few days assuming someone else would mention and it would be resolved. It throws an error to do with drivers or GPU, and it's been like that since the release of v28 if I'm not mistaken.

Pew

I don't understand why you can trigger your image perfectly with just a name. Do I need to add more tag words every time?

诗杰 温

Nope, definitely not. But keep going and see if it works.

Furkan Gözükara

Hi, is this expected? I've seen it once before on another install: [05/14/25 16:02:10] ERROR Failed to install requirements: ERROR: THESE PACKAGES DO NOT MATCH THE setup_common.py:194 HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. onnxruntime-gpu==1.19.2 from https://files.pythonhosted.org/packages/92/82/95e3446724f9e99299c40495d 5e04cb7cb319c3a4836c724dbdceb2facd9/onnxruntime_gpu-1.19.2-cp310-cp310- win_amd64.whl (from -r requirements_windows.txt (line 4)): Expected sha256 b895920bb5e4241299f68874e0becdc2635ea0142939c11e7ff5ae5b28993613 Got 99331ed68e09dda1b16d6241236fea9e27810dd091d6c697593232461743966b [notice] A new release of pip is available: 23.0.1 -> 25.1.1 [notice] To update, run: python.exe -m pip install --upgrade pip 'accelerate' is not recognized as an internal or external command, operable program or batch file. ERROR Error occurred while running command: accelerate config default setup_common.py:683 ERROR Error: Command 'accelerate config default' returned non-zero exit setup_common.py:684 status 1.

Pew

No, 1024x1024 is best for FLUX. Enabling buckets can certainly help, depending on the dataset.

Furkan Gözükara

Hi, I have photos of different resolutions in my dataset: some are full body, for example 1080x1920, and others are 1024x1024. Does turning buckets on help with realism? Second question: does changing the max resolution from 1024x1024 to something bigger help training quality?

Marcin Jarosz

Focus on whatever you want to generate after training. 1024x1024 works best if possible. So have all the poses in the training dataset that you want to generate after training. If you want mid shots, include them; if you want close shots, include them. Actually, having close shots helps with better faces and face inpainting.

Furkan Gözükara

In my use case—athlete photography—full-body images are very common, and the style of the athlete’s poses and movements is also relevant. Given that, would you recommend building the dataset with a mix of full-body photos and close-up face shots, including a wide range of poses? Or should I focus exclusively on full-body photos? I can photograph my client specifically for dataset creation, so I have full control over the material. My main question is whether pose variety should be prioritized during training, or if pose control is better handled later during image generation (e.g., with ControlNet), allowing me to focus the training solely on the subject’s face and body. And in the case of using mostly full-body shots, should I still keep the resolution at 1024x1024?

Geziel Machado

If possible, collect a high-res dataset and avoid AI upscaling.

Furkan Gözükara

Should I use SUPIR to upscale images before I use the ULT Image Processing when preparing my dataset?

4401 4401

Hello. The configs are suitable for literally everything and are already tested. What matters is your dataset.

Furkan Gözükara

This learning rate is really great for fine details, but perhaps you can reduce it by 0.1 and do more epochs.

Furkan Gözükara

This turned out to be great, but can you suggest a few more learning rates for increased fine detail, maybe?

Ken

Hello, I was wondering if you have a template for character lora training for Illustrious ? (anime and realistic) Thank you!

Armando Alva

You have to reduce the LoRA strength / scale. FLUX LoRAs are very overtrained, so they overwrite your own trained model's data.

Furkan Gözükara

As soon as I try to use other LoRAs with my fine-tuned model, quality declines so much. Any suggestions?

ole asdsad

Hi, I will tell MC to add that file. Meanwhile, you can use the downloader to download it: https://www.patreon.com/posts/114517862

Furkan Gözükara

Hi, everything worked flawlessly for me, but it seems that a very important component does not work anymore, while it did in the past for me. When running everything in MassedCompute, after creating the yolov8 folder in Home/apps/Models and pasting in the face_yolov9c.pt file, I get an error when generating images. In the past it used to work just fine, so I assume some update in SwarmUI does not recognize it anymore? The error I get is: "ComfyUI execution error: Model face_yolov9c.pt not found, or yolov8 folder path not defined" Since the yolo file works wonders for the images I want, could you please take a look and identify why what you show in your video tutorial does not work anymore?

Sean Prasing

Sure, here is the dataset: https://www.patreon.com/posts/114972274

Furkan Gözükara

Hello, you have a great dataset for training. Tell me, can I download your dataset and create my own dataset from it via ControlNet + OpenPose?

Dmitry

Follow every step of this video 100%, then reinstall. I installed yesterday and trained with 0 issues: https://youtu.be/DrhUHnYfwC0 If you upgrade to gold member, I can connect to your PC and install it for you.

Furkan Gözükara

Hey, i tried re-installing from the beginning when Kohya_FLUX_DreamBooth_v19.zip is downloaded and it still shows the same error. Any idea on how I can tackle this?

Certified

Yep, start from the beginning. You can use your previously downloaded FLUX model files. You can move them.

Furkan Gözükara

By fresh install, do you mean I have to repeat the entire process in the tutorial, starting from when we download "Kohya_FLUX_DreamBooth_v19.zip" from this page? I just want to make sure. Thanks for the quick reply! That caught me by surprise haha

Certified

Hello. You can't move Python-based app installations. You can only move static files like models. So make a fresh install and it will be fixed. This is because a Python venv is tied to its folder path.

Furkan Gözükara

For some reason, whenever I run "Windows_Start_Kohya_SS", the Kohya gui doesn't appear. When I did it the first time, it worked during installation. All I did was move the "Kohya_FLUX_DreamBooth_v19" from the Downloads folder to another folder. Any idea what I can do? I reinstalled Kohya + Updates but it still isn't working. This is what it said when I tried to run as administrator as last resort: "The system cannot find the path specified. 'gui.bat' is not recognized as an internal or external command, operable program or batch file. Press any key to continue . . ." Update 1: I tried opening the gui.bat inside the Kohya folder and this is what it says: "Starting the GUI... this might take some time... Traceback (most recent call last): File "C:\Users\___\Desktop\___\Kohya_FLUX_DreamBooth_v19\kohya_ss\kohya_gui.py", line 6, in import gradio as gr ModuleNotFoundError: No module named 'gradio'"

Certified

Still Python 3.10. Sorry for the late reply.

Furkan Gözükara

Should we change the python version or keep using 3.10?

Robert Arsene

Done with v18.

Furkan Gözükara

Sure, doing it right now.

Furkan Gözükara

@SECourses: Would you please update the file? I would like to try using a 5090 with Kohya (Windows) to train FLUX DreamBooth.

ProudChinaLover

Of course. The A100 and H100 are much faster.

Furkan Gözükara

Ah, got it. I'll try starting another pod. By the way, since you've already tested the H100, does it deliver fine-tuning faster than the 4090?

Matheus Flores

The cloud machine's HDD died, so all data was lost :D

Furkan Gözükara

awesome

Furkan Gözükara

I trained on an H100 with 0 issues. Sadly I don't know the exact reason, but I would say your pod is broken.

Furkan Gözükara

I'm getting this error while trying to train on RunPod using an H100. I saw that someone else had the same error and managed to fix it, but they didn't explain how. I've tried everything I know, but nothing works. What could it be? ------------------------------------------------------------------- running training / 学習開始 num examples / サンプル数: 445 num batches per epoch / 1epochのバッチ数: 445 num epochs / epoch数: 4 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 1600 steps: 0%| | 0/1600 [00:00 train(args) File "/workspace/kohya_ss/sd-scripts/flux_train.py", line 690, in train accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2303, in clip_grad_norm_ self.unscale_gradients() File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2253, in unscale_gradients self.scaler.unscale_(opt) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_ optimizer_state["found_inf_per_device"] = self._unscale_grads_( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_ raise ValueError("Attempting to unscale FP16 gradients.") ValueError: Attempting to unscale FP16 gradients. 
steps: 0%| | 0/1600 [00:01 sys.exit(main()) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python3', '/workspace/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/workspace/Final_Train_Models/model/config_dreambooth-20250226-191450.toml']' returned non-zero exit status 1. 19:17:15-651662 INFO Training has ended.

Matheus Flores

I'm getting the same error. How did you manage to make it work?

Matheus Flores

Yes, FLUX will bleed for this. You can train a style, but you can't train multiple concepts at once like that. I see now. Flex can be tested for this purpose, I agree.

Furkan Gözükara

I have a captioned dataset of 714 images. The purpose is to generate city, settlement, and region maps for a D&D campaign. I've categorized the images based on what it depicts: a settlement, a landscape, and also the perspective of the image: top-down view or at an angle. There's another category where the images are taken from a specific source, I have a unique trigger word for that so I can emulate the art style. Finally, I have many images that have the filename appended in their respective captions. These file names are the locations in the images. My intent is for the model to be trained enough to where I can use that location's name to generate an image that will be similar to the original. I need the model to be very well-trained, so when I hear Flux can degrade easily at a certain point in training, that worries me.

Nicholas Agranoff

It depends on your purpose. What is your purpose?

Furkan Gözükara

Would you say Flux-dev is worth full Dreambooth finetuning? I'm concerned about it breaking down quickly in comparison to SDXL and being resistant to training. I've read that is a common issue regarding Flux finetuning that still persists. Do you agree that that is an issue or should I not be concerned?

Nicholas Agranoff

I didn't do such intensive testing. Actually, I was going to publish the results, but the cloud machine I used died and all data was lost. However, you can train with our config and Kohya and use the result in SwarmUI; it works.

Furkan Gözükara

Does it appear to be more responsive to training and less likely to break down compared to dev, or no?

Nicholas Agranoff

Hi, I already trained on this. It works, but we need better hyperparameters, perhaps a lower learning rate. Also, this model is not as good as FLUX dev. But I may do more research on it.

Furkan Gözükara

Awesome, it was the right choice.

Furkan Gözükara

Thank you for the answer! RAM sticks ordered :D

K W

Any updates regarding this? Thank you. "Could you please check out this model when you have the opportunity? It's an altered Flux-dev model designed with the purpose of being more receptive to finetuning than ordinary Flux. I'd appreciate your insight on it; I've heard Flux can easily degrade with finetuning and is unusually resistant to training compared to SDXL. I'd like to utilize and Dreambooth Flux for its superior prompt adherence, but not if it is adverse to training new concepts. Thank you. https://huggingface.co/ostris/Flex.1-alpha"

Nicholas Agranoff

Yes, please upgrade to 64 GB RAM and it will work even faster than a 48 GB A6000 GPU :) I tested it.

Furkan Gözükara

Hi! I am getting CUDA out of memory errors with any of your configs. I have an RTX 4090, but I also noticed that when fine tuning starts it tries to fill shared memory first (reaching 100% results in the mentioned error, while the GPU is then loaded at approx. 20%). I have read that this is correct behavior. Is it because I need more RAM (in your tutorial video your RAM usage is 44 GB)? Currently I have 32 GB, so total GPU memory is 39 GB (approx. 24 + 15 GB).

K W

Any updates? Thank you.

Nicholas Agranoff

Yes, I’m trying to generate samples every 500 steps. I’d like to train 3 concepts at once, as I usually do on OneTrainer, so my dataset is quite large. I’m planning to train for just 10 epochs. This absurd number is due to the 200-epoch default in your config files, haha. I’ve reduced it to a maximum of 20 epochs. For some reason, the A6000 instances didn’t work for me. I’m now trying it on an L40, and it looks like it’s working fine so far. I'm using: 48GB_GPU_28200MB_6.3_second_it_Tier_1

Hiago Ramos

Hi. Are you trying to generate samples at 1000 steps? Also, this is too many steps, so something must not be right: 1209000. Are you saving a checkpoint or doing something special at 1000 steps? Which config are you using?

Furkan Gözükara

Weird... whenever training reaches 1000 steps it crashes: AttributeError: 'NoneType' object has no attribute 'ndim' steps: 0%| | 1000/1209000 [1:41:31<2044:05:03, 6.09s/it, avr_loss=0.443] Traceback (most recent call last): File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main()) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python3.10', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/MEGA downloads/Kohya workspace/output/config_dreambooth-20250222-115322.toml']' returned non-zero exit status 1. 13:36:14-073212 INFO Training has ended.

Hiago Ramos

Yes! Thanks for the feedback! It was some bug on the Massed Compute instance. I deleted it, deployed a new one and it worked. However, I'm not able to train using BF16 precision. Kohya is not recognizing the A6000 Tensor Cores for some reason.

Hiago Ramos

I just tried and there are no issues here. Screenshot: https://pasteboard.co/Zx6yxzBHZ7Fu.png - v17 zip file

Furkan Gözükara

Going to test now on a new machine.

Furkan Gözükara

The Massed Compute installer is corrupted. /Massed_Compute_Kohya_FLUX.sh: line 20: ./gui.sh: No such file or directory

Hiago Ramos

Yes, that means some of your images are larger than 1 megapixel. Please reduce the batch size to 6. That config fits very tightly when all images are exactly 1024x1024.

Furkan Gözükara
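
The check implied by the reply above (are any images larger than 1 megapixel?) can be sketched as a filter over image dimensions. It takes "1 megapixel" to mean the 1024x1024 training area, which is an assumption, and it expects the widths and heights to have been read already with whatever image tool is at hand. The function name and sample file names are illustrative.

```python
# Assumption: "1 megapixel" refers to the 1024x1024 training area.
MAX_PIXELS = 1024 * 1024


def oversized(images, max_pixels=MAX_PIXELS):
    """Return names of images whose pixel count exceeds the budget.

    images: iterable of (name, width, height) tuples.
    """
    return [name for name, width, height in images if width * height > max_pixels]


# Hypothetical dataset entries
dataset = [
    ("square.png", 1024, 1024),
    ("portrait.jpg", 1080, 1920),
    ("small.jpg", 768, 1024),
]
print(oversized(dataset))
```

Any name this returns is a candidate for resizing, or a reason to lower the batch size as the reply suggests.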

Hey! Thank you for the guide and all the info. Currently trying to train on Runpod with L40s. I've loaded your config file for Batch size 7, 48gb gpu. I have a dataset of about 560 images. When I try to train with buckets enabled on bf16 I get an OOM error. Do you know how I can successfully load this larger size dataset?

goose1969x

JSON files are config files. As training runs, when it reaches the epoch / step that you set, it will save a .safetensors file. Please watch the tutorials to see the entire workflow.

Furkan Gözükara

PS: I also already added my captions to my images with BooruDatasetTagManager, and I'm using about 450 images, dunno if it's too much

Ophelia Darla

Hi, is it normal if the training saves the file as a JSON file? I thought it would come out as a .safetensors one

Ophelia Darla

Are you using CFG 1 or CFG 7? Use CFG 1. Also, I can't see the image.

Furkan Gözükara

Hi, sorry, another question. I trained a model with the 24 GB config and 39 images (high res), but there is a problem with the model that came out: all the images are very low quality and blurry. Please take a look, I put them here: https://www.canva.com/design/DAGfW4YSoK8/SGbQ0znP7YrixUY8IzTDAA/edit. What could be the reason and how can I fix it, please?

Valentyn Shumakher

Hi, please send me the entire log.

Furkan Gözükara

Hi, I've been trying to fine-tune a FLUX model and I keep getting this error. I already reinstalled everything on a different storage volume and still get this issue:
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python3', '/workspace/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/workspace/trainedfinal7/model/config_dreambooth-20250216-145941.toml']' returned non-zero exit status 1.
And in the beginning it says: epoch is incremented. Can you please help?

Valentyn Shumakher

I ran the trainings and will hopefully publish the results today.

Furkan Gözükara

Interesting! Thanks for sharing!

Chris

Thanks, let me start a training today to test.

Furkan Gözükara

Could you please check out this model when you have the opportunity? It's an altered Flux-dev model designed to be more receptive to fine-tuning than ordinary Flux. I'd appreciate your insight on it; I've heard Flux can easily degrade with fine-tuning and is unusually resistant to training compared to SDXL. I'd like to DreamBooth Flux for its superior prompt adherence, but not if it is averse to learning new concepts. Thank you. https://huggingface.co/ostris/Flex.1-alpha

Nicholas Agranoff

Sadly, I don't know ZeroGPU well either. I think if you ask Hugging Face they would help you.

Furkan Gözükara

Hello, Dr! I'm having a problem. If you can help me, I would appreciate it. I followed the tutorial and managed to use the checkpoints to generate images in SwarmUI (both on Massed Compute and on Kaggle). However, I can’t figure out how to use one of my checkpoints through a Hugging Face zerogpu space. Since I know very little about image generation models and the diffusers and transformers libraries, I’m not sure if I’m making some obvious mistake or if it’s even worth continuing to try. I’ve tried various ways to set up my private repository with one of my checkpoints and all of the necessary components and config files (including cloning and trying to adapt the FLUX.1-dev repository: https://huggingface.co/black-forest-labs/FLUX.1-dev) and I’ve already tested several scripts in my space to try to use this repository (including trying to adapt the files from the FLUX.1-dev zerogpu space: https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev). Anyway, what I’d like to know is: 1- Is it possible/recommended to use one of my checkpoints in a Hugging Face zerogpu space? 2- If so, how should I set up the repository? Is it really necessary to convert or split the “.safetensors” file? And what about the application to run on my HF space? Is there any reference that I can easily adapt? Would it be more advisable (if possible) to install SwarmUI on my HF space and simply load my finetune in the right folder? Anyway, thank you!

Victor Teixeira

Sure. I am assuming your 600-image dataset is consistent; that is the key.

Furkan Gözükara

Okay thank you. I'll quickly restart the training, it's just been a half hour.

Deniz Oliveira Erdinc

For 600 images, do a maximum of 40 epochs, save every 5 epochs, and compare.

Furkan Gözükara

Hi, I finally got it to work. Let's see what happens with my 600 picture dataset. I got a pretty good result for this dataset with ai toolkit and high LoRA ranks and low learning rates aaand... Prodigy (made a huge difference) after a lot of experimenting, but well.. that was a LoRA, this is a fine-tune. I just want to ask: for 600 pictures and 200 epochs, I'm looking at 8 days of training and 150 dollars with an L40 on RunPod (I don't have a GPU at home). Is there any way I could reduce the cost and the training time? If not, is there any way I could stop and then restart training? I've looked at all your videos about Kohya and also done some research, but I couldn't understand how to do that on RunPod and do it cleanly... I thank you for your beautiful work and your patience.

Deniz Oliveira Erdinc
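The 8 days / 150 dollars estimate above is for 200 epochs, while the suggestion is a maximum of 40 epochs with a save every 5. A rough sketch of how the estimate changes, assuming training time and cost scale linearly with the epoch count (same dataset, batch size and per-step speed). The function name is hypothetical.

```python
def scale_run(days, cost, epochs_now, epochs_new):
    """Scale a time/cost estimate linearly with the number of epochs (assumption)."""
    factor = epochs_new / epochs_now
    return days * factor, cost * factor


days, cost = scale_run(days=8, cost=150, epochs_now=200, epochs_new=40)
print(f"{days:.1f} days, about {cost:.0f} dollars")

# Epochs at which a checkpoint would be saved with "save every 5 epochs".
save_epochs = list(range(5, 40 + 1, 5))
print(save_epochs)
```

Under that assumption the run drops to roughly a fifth of the original time and cost, and yields eight checkpoints to compare.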

The config is inside the zip file, in the DreamBooth_Tab_Fine_Tuning_Best_FLUX_Configs folder. Are you still having the error?

Furkan Gözükara

Hi Furkan, thank you for the quick reply. By base model you mean Flux? That's what I used. I didn't change the quantization; I think it was fp16 by default. I couldn't find the .json file in the latest zip file (Kohya_FLUX_DreamBooth_v17.zip) when I downloaded it today, so I tried setting it up manually with the YouTube video. Other things I changed which could have messed things up: I didn't crop or upscale images and put 512,1024... believing it would automatically bucket them into these two resolutions. I did this because the cropping utility was cropping some full body shots that I wanted the model to learn a character's anatomy from, and because I also have some low resolution images in my dataset, so low that upscaling makes them look grainy. Except for these, I followed everything to the book. Haha, maybe I didn't follow everything 'to the book' that much :D I have just a bit of experience with Flux LoRAs, having trained maybe 20 or so on ai toolkit, changing hyperparameters, etc. But! I am new to Kohya =(((

Deniz Oliveira Erdinc

Hi, which base model? It looks like a model error: ValueError: Attempting to unscale FP16 gradients.

Furkan Gözükara

Hi, I'm getting the following error message. Trying to train on RunPod on an H100.
running training / 学習開始
num examples / サンプル数: 608
num batches per epoch / 1epochのバッチ数: 608
num epochs / epoch数: 200
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 121600
steps: 0%| | 0/121600 [00:00 train(args)
File "/workspace/kohya_ss/sd-scripts/flux_train.py", line 690, in train accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2303, in clip_grad_norm_ self.unscale_gradients()
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2253, in unscale_gradients self.scaler.unscale_(opt)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_ optimizer_state["found_inf_per_device"] = self._unscale_grads_(
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_ raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
steps: 0%| | 0/121600 [00:04 sys.exit(main())
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python3', '/workspace/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/workspace/final_training_models/model/config_dreambooth-20250207-193741.toml']' returned non-zero exit status 1.
19:41:11-849673 INFO Training has ended.

Deniz Oliveira Erdinc

awesome

Furkan Gözükara

Hi! I did a repeat training session. Everything is successful! Thank you!

Alexon

Yes, it is next on my list. So sorry for the delay.

Furkan Gözükara

Hello, I hope you will do an update for Kohya? Musubi Tuner is a new addition for Kohya, for training Hunyuan Video LoRA models.

Daniels MV

We use Python 3.10 as default. Currently all of my apps work with Python 3.10, since DeepSpeed and Triton are now also available for Python 3.10.

Furkan Gözükara

I recommend using only ohwx man or ohwx woman. Having detailed captions reduces likeness.

Furkan Gözükara

I have a normal dataset; I trained Brad Pitt. Shouldn't you be training with captions? The training went great and so did the quality, there's just no binding to the character. How else can you make a hard binding when training? I think LoRA is still better as a model.

Alexon

I think you need a better dataset. You can look at our example dataset and its model: https://civitai.com/models/911087/dwayne-johnson-aka-the-rock-flux-dev-fine-tuning-dreambooth-model-for-educational-and-research-purposes-dwayne-johnson-aka-the-rock-flux-dev-lora-model-for-educational-and-research-purposes-full-tutorial https://www.patreon.com/posts/114972274

Furkan Gözükara

Hi! I did everything according to the instructions. Training was successful on 30 photos at 1024x1024 without captions, but when generating the trained person, the problem is that the trigger word ohwx man produces different, similar-looking people. Sometimes the person who was trained comes up, or part of his face, but the hairstyle and other features differ. I tried different prompts and it does not work. How can I solve this problem? Maybe do the training with a different trigger word?

Alexon

I had to change the priority on Python Path

Robert Arsene

I think this is because of Python version. Was it Python 3.11 on this last version of Kohya or 3.10?

Robert Arsene

Well, I tried reinstalling it and now I get an error when trying to config accelerate. Do you know why this could be?
[01/31/25 14:36:13] ERROR Error occurred while running command: accelerate config default setup_common.py:673
ERROR Error: Command 'accelerate config default' returned non-zero exit status 1. setup_common.py:674

Robert Arsene

Yes, but it works. I tested it.

Furkan Gözükara

I replaced the CLIP_L with the newest one and I get some compatibility warnings with the vision model on training start, but the training continues. Have you experienced this?

Robert Arsene

you are welcome

Furkan Gözükara

Thanks!

Victor Teixeira

FP16 is not preferred at any tier. We train at BF16. Also, all Tier 1 configs are equal in quality; only the training speed changes, due to the block swap optimization.

Furkan Gözükara

So fp16 in the 48GB tier? just checking. I thought bf16 was preferred.

Casper Smit

Yes, you can. We already have a Kaggle notebook which works on free Kaggle: https://www.patreon.com/posts/106650931

Furkan Gözükara

Hey! Is it possible to use the fine-tuned checkpoints to generate images in the free version of Google Colab? I tried here and even managed to install and use SwarmUI in Google Colab, but I can’t get the models I trained on Massed Compute to work, and I’m not sure if it’s worth continuing to try…

Victor Teixeira

Hello, it looks like you have given an incorrect model. Are you training on Massed Compute?

Furkan Gözükara

running training / 学習開始
num examples / サンプル数: 878
num batches per epoch / 1epochのバッチ数: 126
num epochs / epoch数: 100
batch size per device / バッチサイズ: 7
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 12543
steps: 0%| | 0/12543 [00:00 train(args)
File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 634, in train packed_noisy_model_input = flux_utils.pack_latents(noisy_model_input) # b, c, h*2, w*2 -> b, h*w, c*4
File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/flux_utils.py", line 362, in pack_latents x = einops.rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/einops/einops.py", line 591, in rearrange return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/einops/einops.py", line 533, in reduce raise EinopsError(message + "\n {}".format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "b c (h ph) (w pw) -> b (h w) (c ph pw)". Input tensor shape: torch.Size([7, 16, 135, 240]). Additional info: {'ph': 2, 'pw': 2}.
Shape mismatch, can't divide axis of length 135 in chunks of 2
steps: 0%| | 0/12543 [00:00 sys.exit(main())
File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args)
File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command simple_launcher(args)
File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python3.10', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/apps/StableSwarmUI/Models/diffusion_models/model/config_dreambooth-20250121-060535.toml']' returned non-zero exit status 1.

Tushaar
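The traceback above fails because the latent height 135 is odd, so it cannot be split into the 2x2 patches that the rearrange pattern expects (ph=2, pw=2). A small sketch of that arithmetic, assuming the FLUX VAE downscales each image side by 8, which would make a 135 x 240 latent correspond to a 1080 x 1920 image. The function names are hypothetical, and the sketch does not say which setting produced that resolution.

```python
VAE_FACTOR = 8  # assumption: the VAE downscales each image side by 8
PATCH = 2       # ph=2, pw=2 in the rearrange pattern from the traceback


def can_pack(latent_h, latent_w, patch=PATCH):
    """True if both latent sides split evenly into patch-sized chunks."""
    return latent_h % patch == 0 and latent_w % patch == 0


def snap_down(pixels, multiple=VAE_FACTOR * PATCH):
    """Largest multiple of `multiple` that does not exceed `pixels`."""
    return (pixels // multiple) * multiple


print(can_pack(135, 240))                    # latent shape from the error message
print(135 * VAE_FACTOR, 240 * VAE_FACTOR)    # implied image height and width
print(snap_down(1080), snap_down(1920))      # nearest sizes divisible by 16
```

Under these assumptions, image sides that are multiples of 16 (for example 1072 instead of 1080) would produce latents that pack cleanly.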

Well, the more images, the better it works :D The class prompt depends solely on the product; I try a different one for each case.

Furkan Gözükara

Thank you Furkan for the answer. It gives me energy knowing that I am on the right track. Would you please elaborate on your findings a little bit? As a rule of thumb, how many pictures, and what name, class and type of prompt usually yield a good result?

Fred Abba

I have done a lot of product trainings for clients. The key is the dataset: how you prepare it, and using proper prompting at inference. Sadly, it comes with experience.

Furkan Gözükara

Hi Furkan, First and foremost, thank you for your incredible effort and outstanding documentation, truly impressive work! I have a question where I believe your expertise is essential. I'm looking to train Flux (either DreamBooth or LoRA) to generate product photography for specific items like furniture or perfume. The goal is to ensure the model can reproduce my exact product, including its shape, texture, and other details. Here's what I've done so far:
• I ran a test using your DreamBooth configuration (24GB, 200 epochs) with 15 product training images taken from various angles.
• Around epochs 125–150, the results showed good texture generation, with exact colors and textures, when I prompted with my trained token (I used name = mcft and class = chair). However, the model failed to retain the chair's shape, producing completely different designs.
It seems the model learns the texture well but struggles with preserving the product's shape. In your experience, which approach would yield better results for this use case: DreamBooth fine-tuning or LoRA? And do you have any suggestions on how to ensure the model retains both the shape and texture of the product accurately? Looking forward to your advice!

Fred Abba

Certainly, you can use any number of images. For 30 images I would still recommend training up to 200 epochs and comparing. 30 is not that many.

Furkan Gözükara

Thank you deeply for your guidance. It worked incredibly well and produced amazing results. Additionally, I have a question: are there any other options besides using 15 or 256 images? If I set the standard to 30 images, I’m wondering if 80 epochs would be an appropriate value.

Leo

If my dataset worked and yours didn't, there can be 2 cases. First, you made a mistake when training on your dataset. Second, your dataset is badly prepared, or your captioning and inference prompts are wrong. I use this config and workflow on all kinds of datasets and so far it works 100%. But dataset preparation is really crucial. I give private consultation and do model training if you need.

Furkan Gözükara

Hello! I came here through YouTube, and I tried to train DreamBooth using the configuration you provided in the article. I first trained using the 3D rendering dataset you provided, and the results were very good. The model had even learned this style by the 10th epoch. Then I used my own dataset for training. My dataset was screenshots of one mobile app with a uniform style: 18 pictures at 1024*1024. I hoped to learn the style of this app and then generate some pages in a similar style. But with the same configuration, having only changed the dataset, the results were completely wrong. The model did not show anything similar to the style of the training data, even after I trained 200 epochs. I used the x-flux framework to train a Flux LoRA before. It can learn the style, but as you said, the generalization of the model is not good, so I found your video hoping to learn DreamBooth training. Is there anything I can improve? By the way, like in your video, I didn't add caption text to the images.

Sam Young

Please watch this video, it will help you tremendously: https://youtu.be/FvpWy1x5etM

Furkan Gözükara

I am sorry, but I need a few more context clues. I don't know where to select Flux dev. Thank you for all of the laborious content you create! And thank you for your speedy response!

MarcusCooper10@gmail.com

Select Flux dev and it will become 1024x1024 by default.

Furkan Gözükara

When I load the Cloudflare SwarmUI link like you suggest for Massed Compute, the resolution is set to 512x512. How do I change it to 1024x1024?

MarcusCooper10@gmail.com

awesome feedback ty

Furkan Gözükara

Just wanted to share my experience with fine-tuning LoRAs: flux_dev_fp8_scaled_diffusion_model, not good at all. flux1-dev-fp8, best. flux1-dev-Q8_0, works well. So if you don't get a good likeness from your LoRA, it might be solved by using another Flux model. Also try with and without segment face; sometimes it is better, sometimes not. I can't see a difference between the 24 GB file version and the 12 GB file version. I have a face recognition program that says my best epoch has a 64.83% likeness with one of the training images, however I disagree with it. So I don't think we can trust these face recognition programs yet. That was in SwarmUI using IPNDM with DDIM at 40 steps. However, in ForgeUI the best likeness was 70.75% with [Forge] Flux Realistic (2x Slow) with simple at 20 steps. It should be noted that IPNDM with DDIM at 40 steps in Forge was a bit blurry and the best likeness was 67.00%, while it wasn't blurry in Swarm.

Hockey

awesome

Furkan Gözükara

I did a fresh install now it is running, thanks.

Rafael Villalba

Hello Furkan, sorry for the inconvenience. I have followed the entire tutorial up to this minute: https://youtu.be/FvpWy1x5etM?t=5995 When we have to do the update and start SwarmUI, I get an error in the logs that does not allow me to use the trained models:
Failed to auto-update comfy backend: System.ComponentModel.Win32Exception (2): An error occurred trying to start process '/bin/git' with working directory '/home/Ubuntu/apps/StableSwarmUI/dlbackend/ComfyUI'. No such file or directory
at System.Diagnostics.Process.ForkAndExecProcess(ProcessStartInfo startInfo, String resolvedFilename, String[] argv, String[] envp, String cwd, Boolean setCredentials, UInt32 userId, UInt32 groupId, UInt32[] groups, Int32& stdinFd, Int32& stdoutFd, Int32& stderrFd, Boolean usesTerminal, Boolean throwOnNoExec)
at System.Diagnostics.Process.StartCore(ProcessStartInfo startInfo)
at System.Diagnostics.Process.Start(ProcessStartInfo startInfo)
at SwarmUI.Utils.Utilities.RunGitProcess(String args, String dir, Boolean canRetry) in /home/Ubuntu/apps/StableSwarmUI/src/Utils/Utilities.cs:line 987
at SwarmUI.Builtin_ComfyUIBackend.ComfyUISelfStartBackend.<>c__DisplayClass20_0.<b__0>d.MoveNext() in /home/Ubuntu/apps/StableSwarmUI/src/BuiltinExtensions/ComfyUIBackend/ComfyUISelfStartBackend.cs:line 318
Thanks for your help.

Rafael Villalba
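The error above names both an executable ('/bin/git') and a working directory, and the message alone may not make clear which of the two is missing. A minimal diagnostic sketch using only the Python standard library; the function name `diagnose` is hypothetical, and the path is copied from the log.

```python
import os
import shutil


def diagnose(tool, workdir):
    """Report whether the executable and the working directory both exist."""
    return {
        "tool_path": shutil.which(tool),           # None if the tool is not on PATH
        "workdir_exists": os.path.isdir(workdir),  # False if the folder is missing
    }


report = diagnose("git", "/home/Ubuntu/apps/StableSwarmUI/dlbackend/ComfyUI")
print(report)
```

If either entry comes back empty or false, that is the likelier culprit. The poster later reported that a fresh install resolved it.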

Sadly there isn't a formula, but for 1000 images do around 30 epochs, save every 5 epochs, and compare them later. 1000 is a lot.

Furkan Gözükara

You have a lot of experience in model training, it's amazing. I would like to ask you a question. Over the past two days I have prepared about 1000 training images, but I don't know how the number of training images relates to the number of epochs. What is the relationship between them? As for the learning rate and other parameters, are there any ideal recommended settings? I hope you can give me some advice on how to set these parameters to get ideal large model checkpoints.

jean1992

Hello. Kohya made fundamental changes. Currently all configs are fully up to date. For fine-tuning we have 6GB_GPU_5400MB_14.1_second_it_Tier_3_512px.json, and for LoRA 6GB_GPU_Quality_Tier_6_FP8_5370MB_10.0_Second_IT_768px_and_T5_Attention.json.

Furkan Gözükara

Hello Furkan, hope you are well. =) In the video I see a file named 6GB_GPU_4850MB_24.3_second_it_Tier_1.json for Dreambooth but I can't find it in Kohya Installer v46. Has it been withdrawn? It would also be helpful if the files had a RAM minimum requirement or can that be solved by increasing Windows virtual memory? Another question, Flux files take a lot of space, I have a flash drive and a hard drive, would I lose speed if I had Swarm and safetensor files on my slower hard drive? Friendly Regards /Hockey

Hockey

awesome

Furkan Gözükara

The problem with reporting errors has been resolved, thank you very much!

jean1992

Thank you for your reply. I look forward to working with you when I have the chance. Today, while fine-tuning the large model, the training image set I specified was not recognized. Could I trouble you to guide me? Below is the reported error:
18:26:43-159273 INFO The running process has been terminated.
18:26:43-677005 INFO Training has ended.
18:28:21-252734 INFO Start training Dreambooth...
18:28:21-253731 INFO Validating lr scheduler arguments...
18:28:21-254515 INFO Validating optimizer arguments...
18:28:21-255515 INFO Validating E:\comfyui\comfyui\ComfyUI-aki-v1.3\models\unet\model existence and writability... SUCCESS
18:28:21-256511 INFO Validating E:/comfyui/comfyui/ComfyUI-aki-v1.3/models/unet/flux1-dev.safetensors existence... SUCCESS
18:28:21-257507 INFO Validating F:/Kohya_FLUX_DreamBooth_v17/kohya_ss/dataset/images/reg existence... SUCCESS
18:28:21-258504 INFO Validating F:/Kohya_FLUX_DreamBooth_v17/kohya_ss/dataset/images/train existence... SUCCESS
18:28:21-280879 WARNING Regularization images are used... Will double the number of steps required...
18:28:21-281875 INFO Regularization factor: 2
18:28:21-283995 INFO Total steps: 0
18:28:21-284366 INFO Train batch size: 1
18:28:21-285402 INFO Gradient accumulation steps: 1
18:28:21-285402 INFO Epoch: 200
18:28:21-286388 INFO max_train_steps (0 / 1 / 1 * 200 * 2) = 0
18:28:21-287390 INFO lr_warmup_steps = 0
18:28:21-290416 INFO Saving training config to E:\comfyui\comfyui\ComfyUI-aki-v1.3\models\unet\model\flux_yanbao_269_20241227-182821.json...
18:28:21-292341 INFO Executing command: F:\Kohya_FLUX_DreamBooth_v17\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 F:/Kohya_FLUX_DreamBooth_v17/kohya_ss/sd-scripts/flux_train.py --config_file E:\comfyui\comfyui\ComfyUI-aki-v1.3\models\unet\model/config_dreambooth-20241227-182821.toml
F:\Kohya_FLUX_DreamBooth_v17\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node(
F:\Kohya_FLUX_DreamBooth_v17\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node(
2024-12-27 18:28:27 INFO Loading settings from E:\comfyui\comfyui\ComfyUI-aki-v1.3\models\unet\model/config_dreambooth-20241227-182821.toml... train_util.py:4528
INFO E:\comfyui\comfyui\ComfyUI-aki-v1.3\models\unet\model/config_dreambooth-20241227-182821 train_util.py:4547
2024-12-27 18:28:27 INFO Using DreamBooth method. flux_train.py:115
INFO prepare images. train_util.py:1971
INFO 0 train images with repeating. train_util.py:2012
INFO 0 reg images. train_util.py:2015
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2020
INFO [Dataset 0] config_util.py:567 batch_size: 1 resolution: (1024, 768) enable_bucket: False network_multiplier: 1.0
INFO [Dataset 0] config_util.py:573
INFO loading image sizes. train_util.py:923
0it [00:00, ?it/s]
INFO prepare dataset train_util.py:948
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:43
ERROR No data found. Please verify the metadata file and train_data_dir option. / 画像がありません。メタデータおよびtrain_data_dirオプションを確認してください。 flux_train.py:169
18:28:29-169699 INFO Training has ended.

jean1992
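The log above reports "0 train images with repeating" and max_train_steps = 0 even though the train folder exists. One common cause is the folder layout. This sketch assumes the DreamBooth convention in which images sit in subfolders named `<repeats>_<name>` (for example `1_ohwx man`) inside the training directory rather than directly in it. The helper name, regex and extension list are my own illustration, not Kohya code.

```python
import re
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")  # assumed "<repeats>_<name>" convention


def count_with_repeats(train_dir):
    """Return (folder name, image count, repeats) for each matching subfolder."""
    rows = []
    for sub in sorted(Path(train_dir).iterdir()):
        match = FOLDER_RE.match(sub.name) if sub.is_dir() else None
        if match is None:
            continue  # loose files and folders without the prefix are skipped
        images = [p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTS]
        rows.append((sub.name, len(images), int(match.group(1))))
    return rows


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "loose.png").touch()  # not counted: sits directly in the train folder
    good = root / "1_ohwx man"
    good.mkdir()
    for i in range(3):
        (good / f"{i}.png").touch()
    demo_rows = count_with_repeats(root)
    print(demo_rows)
```

An empty result for your real training folder would be consistent with the "0 train images" line in the log.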

Yes, it can improve variety but not particular elements. That is the key difference. So you can randomly get better packages, but not particular packages. Have as many good images as possible and it should work. Sadly, I didn't understand your latter question from here, but I can give you private consultation if you need.

Furkan Gözükara

Thank you for your reply. I'm honored to discuss this with you. You mentioned enhancing the model's understanding of overall package design. Does that mean a certain number of packaging renderings from various brands is required to fine-tune the large model, or is it simply the more the better? I also see that you have replied to other people that, at present, Flux can only learn a single concept. But packaging renderings from various brands contain a variety of different styles and images. Can they still play a positive role in fine-tuning the large model? Or is there a difference between training on product renderings and style training on artists' concept painting styles?

jean1992

Nope, it won't work. Currently Flux is only able to learn a single concept. It can be a style, object, person, etc., but sadly you can't teach multiple distinct styles.

Furkan Gözükara

Hi. I've followed your guide and settings and got really good results in past fine-tunes. I used to train with only 30-40 style images and got very good results. Now I'm thinking of training on a lot more images of different styles, around 500-1000 images. I'll mostly be using images from ArtStation of concept art that I like from different people. It'll be different artists and different styles. Do you think Flux would learn all of the different art styles and compositions of different artists well with one fine-tune? Will it learn the colors and details of so many different artists? Is there anything I should keep in mind while training it? And will using a 48 GB GPU with a batch size of 5-7 be better for quality than using a 3090 with a small batch size? I want to get the best quality possible. Thanks.

Rustic Engineering

You can train a style, but training multiple individual elements doesn't work for Flux at the moment. So it is possible for you to improve the overall package design capability of the model, and once you have that, you can train a LoRA on top of that improved model.

Furkan Gözükara

Thank you very much for sharing your results on large model training. I am currently encountering some problems and would appreciate your reply. I trained a LoRA with 200 gift box images and the result is very good, but the generated elements are not rich enough. So I fine-tuned the large model to expand its understanding of package design, so that it can generate more artistic images with better-quality details. Is this the right way to think about it? Another question: when I use the fine-tuned large model with a LoRA trained on the official Flux base model, I get bad images, with poor image quality and incomplete denoising. Is it possible to train a LoRA on top of the fine-tuned large model? Is it better to use it that way?

jean1992

You can continue from the last saved checkpoint. Instead of giving Flux dev as the base, give your last checkpoint and reduce your number of epochs to total_epochs - last_completed_epoch.

Furkan Gözükara
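The resume rule above is plain arithmetic; a tiny sketch makes it concrete. The function name and the example numbers (200 planned epochs, last checkpoint saved at epoch 75) are hypothetical.

```python
def remaining_epochs(total_epochs, last_completed_epoch):
    """Epochs left to train when restarting from the last saved checkpoint."""
    if not 0 <= last_completed_epoch <= total_epochs:
        raise ValueError("last_completed_epoch must be between 0 and total_epochs")
    return total_epochs - last_completed_epoch


# Example: planned 200 epochs, last checkpoint was saved at epoch 75.
print(remaining_epochs(200, 75))
```

The result is the epoch count to enter in the config when the last checkpoint is given as the base model.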

Hello Furkan, if I exit during the training process, how do I continue training from the last progress?

小龙 李

Do you mean regularization / classification images, or generating images during training? The latter increases your VRAM and RAM usage during training and thus can cause training to be terminated. It also sometimes does not work properly and can be misleading. As for the former, they don't help Flux.

Furkan Gözükara

Hey quick question, any reason why you're not using prior preservation loss?

Legenderox Gaming

It reduces bleeding just a little bit, but sadly does not fix it.

Furkan Gözükara

Great, highly interesting. I was still looking for a proper cfg>1 value for training dedistilled. All methods have limits for the time being and the most natural looks still came from Flux dev, but dedistilled has the greater flexibility. Could the character bleeding maybe be reduced by training to a non-existing class, let's say ohwx and class "h3r0"? ... with increased training time / size for the class to sink in?

anmine42

If it is working, it is not important, but here: https://youtu.be/adVhm9aI9Gc

Furkan Gözükara

Sure, I published very extensive research here: https://www.patreon.com/posts/114969137

Furkan Gözükara

Hi Furkan! Great work, honestly. You shared methods and approaches far from obvious with us and I can confirm your stunning results. You mentioned testing dedistilled somewhere, I have been for a while and results are very promising. Have you had any insights yet? Anyway, thank you for all the effort.

anmine42

Hello, I have a similar issue. I have 8 GB VRAM and 32 GB of system RAM. Is it not possible to run a test with the provided models? I was able to train LoRAs with the same hardware.

Iván López

About 'Manual configure Accelerate' in Kohya setup: in the video you say you explained it in another video, or something like that as I remember. Is it an important option? Where should I check for this?

Pablo Montero

Oh shit, it was my fault, I had an older folder and I was confused

Pablo Montero

Hi, please download the latest zip file and make a fresh install. We don't have that fix script anymore.

Furkan Gözükara

I'm facing this error when I open 'Windows_Start_Kohya_SS - Shortcut':
The system cannot find the path specified.
'gui.bat' is not recognized as an internal or external command, operable program or batch file.
Press any key to continue . . .
Then I try 'Fix_GUI_Bat', open it in Visual Studio, Run, but nothing:
c:/AI/Kohya_GUI_Flux_Installer_v37/Fix_GUI_Bat.py
Traceback (most recent call last):
File "c:\AI\Kohya_GUI_Flux_Installer_v37\Fix_GUI_Bat.py", line 34, in delete_and_download_gui_bat()
File "c:\AI\Kohya_GUI_Flux_Installer_v37\Fix_GUI_Bat.py", line 26, in delete_and_download_gui_bat with open(existing_file_path, 'wb') as file:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'kohya_ss\\gui.bat'

Pablo Montero

Yes, too slow at the moment. You must be making changes that are breaking your config. If you send me your config on Discord I can check your error. Also wait a few more steps and see if the step time keeps going down. Also, please try the 6 GB GPU config and see how it performs.

Furkan Gözükara

After 7 minutes i got this: Seems like it will take forever like this? I dont know what is wrong tbh steps: 0%| | 1/2000 [06:23<212:46:32, 383.19s/it, avr_loss=0.51]

Mehmet Atakan Çavuşlu

Do you mean this? Seems like I have 2000 steps. 20 pictures, 1 batch, 100 epoch. Total steps: 20 01:25:02-651089 INFO Train batch size: 1 01:25:02-652092 INFO Gradient accumulation steps: 1 01:25:02-653096 INFO Epoch: 100 01:25:02-653597 INFO max_train_steps (20 / 1 / 1 * 100 * 1) = 2000 For per step speed I dont see the speed, it is stuck at 0. Maybe it will fill a bit when I wait longer.. steps: 0%| | 0/2000 [00:00

Mehmet Atakan Çavuşlu
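The step count in the log above follows the pattern the GUI prints: `max_train_steps (20 / 1 / 1 * 100 * 1) = 2000`. Below is a minimal sketch of that arithmetic together with a rough time estimate. It assumes the factors are image steps (images × repeats), batch size, gradient accumulation steps, epochs and regularization factor, in that order. The function names are illustrative, and rounding up is an assumption.

```python
import math


def max_train_steps(image_steps, batch_size, grad_accum, epochs, reg_factor=1):
    """Mirror the GUI log line: steps / batch / grad_accum * epochs * reg_factor."""
    # Rounding up is an assumption; the log only shows exact divisions.
    return int(math.ceil(image_steps / batch_size / grad_accum * epochs * reg_factor))


def eta_hours(total_steps, seconds_per_step, done_steps=0):
    """Rough remaining time in hours, assuming a constant per-step speed."""
    return (total_steps - done_steps) * seconds_per_step / 3600


steps = max_train_steps(image_steps=20, batch_size=1, grad_accum=1, epochs=100)
remaining = eta_hours(steps, seconds_per_step=383.19, done_steps=1)
print(steps)                 # total optimization steps
print(round(remaining, 1))   # hours left at 383.19 s/it
```

At 383.19 s/it the remaining steps come to roughly 213 hours, which is in line with the progress bar estimate quoted in this thread, so the estimate itself is not a display glitch; the per-step speed is the thing to fix.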

yes batch size increases VRAM usage. how many steps are you doing? what is your per step speed?

Furkan Gözükara

Solved problem by decreasing batch size to 1, with 125 epochs on 45 images. It gave me around 385 hours training time in first epoch ahaha, i guess i will not be able to pull it off with my 11GB RTX 2080 Ti, should I go SD1.5 way?

Mehmet Atakan Çavuşlu

Hello again, I increased my RAM to 64gb but now it gives another memory error: I am using 8gb config, I dont understand why it tries to allocate more and fails. Do you have any idea? torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.29 GiB. GPU 0 has a total capacity of 11.00 GiB of which 3.21 GiB is free. Of the allocated memory 6.31 GiB is allocated by PyTorch, and 328.88 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

Mehmet Atakan Çavuşlu
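The error above asks for 13.29 GiB in a single allocation on a card with 11.00 GiB in total, so the fragmentation hint in the message (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`) is unlikely to help on its own. Here is a small sketch for pulling those numbers out of such a message. The regular expressions are assumptions based on the wording quoted above and may need adjusting for other PyTorch versions; the helper names are hypothetical.

```python
import re

_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}

_PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "capacity": r"total capacity of ([\d.]+) (MiB|GiB)",
    "free": r"of which ([\d.]+) (MiB|GiB) is free",
}


def parse_cuda_oom(message):
    """Extract requested, total and free memory (in GiB) from an OOM message."""
    info = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        # None means the field was not found in this message.
        info[name] = float(match.group(1)) * _TO_GIB[match.group(2)] if match else None
    return info


def exceeds_card(info):
    """True when one allocation is larger than the whole GPU."""
    if info["requested"] is None or info["capacity"] is None:
        return False
    return info["requested"] > info["capacity"]


msg = (
    "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.29 GiB. "
    "GPU 0 has a total capacity of 11.00 GiB of which 3.21 GiB is free."
)
info = parse_cuda_oom(msg)
print(info, exceeds_card(info))
```

When `exceeds_card` is true, no allocator setting can make the request fit, and the training configuration itself is what has to change.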

yes 32 GB is your problem. if you can upgrade to 64 GB RAM it should be fixed. so many people fixed their problems with such an upgrade

Furkan Gözükara

Hello Furkan, I am having this error during training start: RuntimeError: CUDA error: out of memory I have RTX2080Ti with 11 GB of VRAM, and I tried both 10GB and 8GB config files yet the same error happens. I believe it gives error because of Shared GPU Memory, I have 32 GB of system RAM, if I update to 64 GB would it solve the problem? Thanks for the detailed tutorial!

Mehmet Atakan Çavuşlu

yes you cant generate specific garments with such training. you have to train each garment individually. this is a limitation of flux that no one was able to solve yet

Furkan Gözükara

Hello! I trained a LoRA on over a hundred high-quality hand-drawn garment renderings, and it was able to generate better garment renderings, but for garments not in the training set the output was more generic. For some specific garments (e.g., big-name couture styles), FLUX could not fully understand the styles, so it was unable to generate them. My problem is that there is hardly any rendering material for these garments online, only real show images. 1: Is it possible to add some high fashion show pictures as training material and fine-tune FLUX so that the model retains the rendering style but can also generate the high fashion look? 2: Will this cause the generated high fashion to drift away from the hand-painted style? 3: If 1 is feasible, how should I balance the material ratio? (I have about three hundred hand-drawn renderings.) Thank you for your hard work! I gained so much from your tutorials that I didn't hesitate to pay for them. Very much looking forward to your reply, thanks!

jc

both are wrong. name the parent folder like 1_ohwx man so that kohya will read from there. i recommend watching the relevant part of this video : https://youtu.be/nySGu12Y05k 20:26 Preparing training data in Kohya GUI and generated folder structure

Furkan Gözükara
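The folder name in the reply above carries two pieces of information. Here is a minimal sketch of how such a name can be split, assuming the convention is `<repeats>_<class tokens>` (so `1_ohwx man` means 1 repeat with the class tokens `ohwx man`). The helper is hypothetical and is not taken from Kohya's source.

```python
import re


def parse_kohya_folder(name):
    """Split '<repeats>_<class tokens>' into (repeats, class_tokens)."""
    match = re.fullmatch(r"(\d+)_(.+)", name)
    if not match:
        raise ValueError(f"expected '<repeats>_<class tokens>', got {name!r}")
    return int(match.group(1)), match.group(2)


repeats, class_tokens = parse_kohya_folder("1_ohwx man")
print(repeats, class_tokens)
```

A name such as `ohwx_man` has no leading repeat count, so it does not match the pattern and the helper rejects it.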

Yes it's a single person. So I have to create like this file : ohwx_man.txt that contains : ohwx man. Or should it contain a description for each image like this : img1 : ohwx man from front angle ; img2 : .......

Chahrazad Seh

hi sadly i can't tell what happened from this little information.

Furkan Gözükara

Hiii, I liked your work a lot. I trained the model (using the pod method). Due to an emergency I had to stop after the training, and now that I'm connecting back I'm not able to find the files that I need to put in the Stable-Diffusion folder.

Abhishek

if your dataset is single subject like a particular person, object, product, style, yes use as i do. like ohwx man, ohwx woman, ohwx style and such. you dont need detailed captions

Furkan Gözükara

Hello, thank you very much for this amazing tutorial. I have a question concerning the dataset. Should it be like yours (images and one annotation file), or can I use my dataset format, which is one annotation file for each image? If I have to change it to your format, is there anything specific about the annotation file structure?

Chahrazad Seh

100% setup must be wrong. if you send me your used config (modified by you), screenshot of your input folders i can comment. you can send me from discord

Furkan Gözükara

Hello, thanks for the guide! I was trying to create a style checkpoint with 15 photos. I did this with LoRAs before, and there was always a likeness visible, but with the full checkpoint training nothing shows up. I'm using the trigger word in the prompt and trained with 150 epochs. Any idea what I could be doing wrong? It generates images fine, but ignores the trained trigger word. I'm using SwarmUI, but just using Comfy also doesn't change this. Thanks!

Florian Maas

you are welcome. so many people solved issues with 64 GB RAM.

Furkan Gözükara

Ok, thank you very much for solving my doubt and thank you for your time.

Myke Guty

yes exactly RAM memory. 16 GB is not enough. please upgrade to 64 GB and it will work perfect i give 100% guarantee

Furkan Gözükara

if you are referring to RAM memory I have 16Gb

Myke Guty

hello how much system RAM you have? if you have 32 gb please upgrade to 64 GB and you can train 1024x1024 with 0 issues.

Furkan Gözükara

I have an 8GB RTX 3070 TI or if you want to be more precise 8192Megabytes. The configuration of the images is 512x512 with a total of 24 images. The configuration I'm using for Dreambooth is the JSON file: DreamBooth_Tab_Fine_Tuning_Best_FLUX_Configs/6GB_GPU_5400MB_14.1_second_it_Tier_3_512px.json The only thing I have changed would be the epochs to 140 and the saving of each epoch every 35, the rest I have not touched anything else.

Myke Guty
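The two changes described above (140 epochs, saving every 35 epochs) can be put into numbers with a few lines. This is only a sketch: it assumes 1 repeat and batch size 1, it reads the `14.1_second_it` part of the config file name as 14.1 seconds per step, and it assumes a checkpoint is written at every multiple of the save interval. Real speed depends on the GPU and on how much is offloaded to system RAM.

```python
def save_points(epochs, save_every):
    """Epoch numbers at which a checkpoint would be written (assumed behaviour)."""
    return list(range(save_every, epochs + 1, save_every))


images, repeats, epochs, save_every = 24, 1, 140, 35
seconds_per_step = 14.1  # taken from the config file name; an assumption

total_steps = images * repeats * epochs
estimated_hours = total_steps * seconds_per_step / 3600

print(save_points(epochs, save_every))
print(total_steps, round(estimated_hours, 1))
```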

also i prefer discord better

Furkan Gözükara

ye please use unified downloader as shown in this video : https://youtu.be/hewDdVJEqOQ

Furkan Gözükara

yes this is out of VRAM error. how much RAM you have? which config you used?

Furkan Gözükara

I don’t currently have a way to share images or videos directly. If there’s a platform you prefer for sharing such files, please let me know, and I’d be happy to use it. I’ve attached a 4-image compilation that I created to demonstrate my visibility of SwarmUI, following the instructions from the video. Each image includes details of the files I pasted into the corresponding folders. https://imgur.com/a/xqUj41W

GhostDance

I have returned to do the steps you told me and if I have seen a breakthrough, now shows another type of error, I have closed all applications and shows me the following error: Traceback (most recent call last): File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\flux_train.py", line 849, in train(args) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\flux_train.py", line 461, in train flux = accelerator.prepare(flux, device_placement=[not is_swapping_blocks]) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1311, in prepare result = tuple( File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1312, in self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1188, in _prepare_one return self.prepare_model(obj, device_placement=device_placement) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 1435, in prepare_model model = model.to(self.device) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1340, in to return self._apply(convert) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply module._apply(fn) [Previous line repeated 1 more time] File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply param_applied = fn(param) File 
"C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert return t.to( torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 14.57 GiB is allocated by PyTorch, and 6.34 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) Traceback (most recent call last): File "C:\Users\Michael\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\Michael\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\AI\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\AI\\Kohya_FLUX_DreamBooth_v16\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/AI/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/AI/SwarmUI/Models/diffusion_models/model/config_dreambooth-20241209-172601.toml']' returned non-zero exit status 1. 17:27:46-882424 INFO Training has ended.

Myke Guty

model error. either the wrong model was given or your downloaded model is corrupted : "C:\AI\kohya_ss\sd-scripts\library\model_util.py", line 267, in convert_ldm_unet_checkpoint new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"] KeyError: 'time_embed.0.weight'. delete and redownload and give the correct models

Furkan Gözükara
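The `KeyError: 'time_embed.0.weight'` quoted above comes from `convert_ldm_unet_checkpoint`, a function that expects Stable Diffusion UNet key names. One way to see what a `.safetensors` file actually contains without loading the weights is to read only its header. The sketch below assumes the usual safetensors layout (an 8-byte little-endian header length followed by a JSON header). The key fragments used for the guess (`double_blocks.`, `single_blocks.`, `time_embed.0.weight`) are assumptions, and the helper names are hypothetical.

```python
import json
import struct


def read_safetensors_keys(path):
    """Return tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [key for key in header if key != "__metadata__"]


def guess_architecture(keys):
    """Very rough guess from key names; the fragments are assumptions."""
    if any("double_blocks." in k or "single_blocks." in k for k in keys):
        return "flux-like"
    if any("time_embed.0.weight" in k for k in keys):
        return "sd-unet-like"
    return "unknown"


print(guess_architecture(["double_blocks.0.img_attn.qkv.weight"]))
print(guess_architecture(["model.diffusion_model.time_embed.0.weight"]))
```

A header that cannot be parsed at all points to a broken download. A header that parses and looks FLUX-like points to the file being handed to a Stable Diffusion loader.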

I have the following error when launching it, I have a RTX 3070TI with 8GbVram. I have the photos in 512x512 resolution, I'm using your Dreambooth configuration: 6GB_GPU_5400MB_14.1_second_it_Tier_3_512px --- 13:45:41-256892 INFO SD v2 base model selected. Setting --v2 parameter 13:50:10-888972 INFO Loading config... 13:52:02-917648 INFO Loading config... 13:55:42-684442 INFO Copy C:/Users/Michael/Pictures/Michael_IA_Entrenamiento/2025_Dreambooth_SECourses to C:/AI/SwarmUI/Models/diffusion_models\img/1_ohwx man... 13:55:42-748413 INFO Regularization images directory is missing... not copying regularisation images... 13:55:42-751446 INFO Done creating kohya_ss training folder structure at C:/AI/SwarmUI/Models/diffusion_models... 14:00:38-379443 INFO Save... 14:00:56-914095 INFO Start training Dreambooth... 14:00:56-916048 INFO Validating lr scheduler arguments... 14:00:56-917078 INFO Validating optimizer arguments... 14:00:56-919049 INFO Validating C:/AI/SwarmUI/Models/diffusion_models/model existence and writability... SUCCESS 14:00:56-920048 INFO Validating C:/AI/ORGINAL_FLUX/flux1-dev.safetensors existence... SUCCESS 14:00:56-921049 INFO Validating C:/AI/SwarmUI/Models/diffusion_models\img existence... SUCCESS 14:00:56-924048 INFO Folder 1_ohwx man: 1 repeats found 14:00:56-925048 INFO Folder 1_ohwx man: 24 images found 14:00:56-928048 INFO Folder 1_ohwx man: 24 * 1 = 24 steps 14:00:56-929075 INFO Regulatization factor: 1 14:00:56-930048 INFO Total steps: 24 14:00:56-932048 INFO Train batch size: 1 14:00:56-933064 INFO Gradient accumulation steps: 1 14:00:56-934073 INFO Epoch: 140 14:00:56-935048 INFO max_train_steps (24 / 1 / 1 * 140 * 1) = 3360 14:00:56-937047 INFO lr_warmup_steps = 0 14:00:56-942051 INFO Saving training config to C:/AI/SwarmUI/Models/diffusion_models/model\Quality_3_20241209-140056.json... 
14:00:56-984079 INFO Executing command: C:\AI\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --gpu_ids 0 --mixed_precision bf16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 C:/AI/kohya_ss/sd-scripts/train_db.py --config_file C:/AI/SwarmUI/Models/diffusion_models/model/config_dreambooth-20241209-140056.toml 14:00:56-990049 INFO Command executed. 2024-12-09 14:01:16 INFO Loading settings from train_util.py:4174 C:/AI/SwarmUI/Models/diffusion_models/model/config_dreambooth-20241209-1 40056.toml... INFO C:/AI/SwarmUI/Models/diffusion_models/model/config_dreambooth-20241209-1 train_util.py:4193 40056 2024-12-09 14:01:16 INFO prepare tokenizer train_util.py:4665 INFO update token length: 75 train_util.py:4682 INFO prepare images. train_util.py:1815 INFO found directory C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man train_util.py:1762 contains 24 image files WARNING No caption file found for 24 images. Training will continue without train_util.py:1793 captions for these images. If class token exists, it will be used. / 24枚の画像にキャプションファイルが見つかりませんでした。これらの画像につ いてはキャプションなしで学習を続行します。class tokenが存在する場合はそれを使います。 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty (1).jpg train_util.py:1800 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty (10).jpg train_util.py:1800 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty (11).jpg train_util.py:1800 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty (12).jpg train_util.py:1800 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty (13).jpg train_util.py:1800 WARNING C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man\mykeguty train_util.py:1798 (14).jpg... and 19 more INFO 24 train images with repeating. train_util.py:1856 INFO 0 reg images. 
train_util.py:1859 WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1864 INFO [Dataset 0] config_util.py:572 batch_size: 1 resolution: (512, 512) enable_bucket: False network_multiplier: 1.0 [Subset 0 of Dataset 0] image_dir: "C:\AI\SwarmUI\Models\diffusion_models\img\1_ohwx man" image_count: 24 num_repeats: 1 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_separator: , secondary_separator: None enable_wildcard: False caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, alpha_mask: False, is_reg: False class_tokens: ohwx man caption_extension: .txt INFO [Dataset 0] config_util.py:578 INFO loading image sizes. train_util.py:911 100%|███████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 12006.60it/s] INFO prepare dataset train_util.py:919 INFO prepare accelerator train_db.py:106 accelerator device: cuda INFO loading model for process 0/1 train_util.py:4823 INFO load StableDiffusion checkpoint: train_util.py:4779 C:/AI/ORGINAL_FLUX/flux1-dev.safetensors Traceback (most recent call last): File "C:\AI\kohya_ss\sd-scripts\train_db.py", line 529, in train(args) File "C:\AI\kohya_ss\sd-scripts\train_db.py", line 123, in train text_encoder, vae, unet, load_stable_diffusion_format = train_util.load_target_model(args, weight_dtype, accelerator) File "C:\AI\kohya_ss\sd-scripts\library\train_util.py", line 4825, in load_target_model text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model( File "C:\AI\kohya_ss\sd-scripts\library\train_util.py", line 4780, in _load_target_model text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint( File "C:\AI\kohya_ss\sd-scripts\library\model_util.py", line 1005, in load_models_from_stable_diffusion_checkpoint 
converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config) File "C:\AI\kohya_ss\sd-scripts\library\model_util.py", line 267, in convert_ldm_unet_checkpoint new_checkpoint["time_embedding.linear_1.weight"] = unet_state_dict["time_embed.0.weight"] KeyError: 'time_embed.0.weight' Traceback (most recent call last): File "C:\Users\Michael\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\Michael\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\AI\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in sys.exit(main()) File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command simple_launcher(args) File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\AI\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/AI/kohya_ss/sd-scripts/train_db.py', '--config_file', 'C:/AI/SwarmUI/Models/diffusion_models/model/config_dreambooth-20241209-140056.toml']' returned non-zero exit status 1. 14:01:19-478101 INFO Training has ended.

Myke Guty

sadly i dont know. if you make a video of this i can comment better or screenshots

Furkan Gözükara

Hi Dr. Furkan, I had to redownload everything from scratch and am now running SwarmUI v0.9.4.0. I’ve followed the steps to the best of my ability, but I’ve noticed that no checkpoint appears on my "Models" tab, even though items are showing up on the VAE tab before I plug in configurations. Do you have any suggestions for what I can try to resolve this? Thank you for your help in advance!

GhostDance

Interesting. Thanks a lot.

Zd Ba

just a little bit i tested : https://www.patreon.com/posts/114969137

Furkan Gözükara

Ok thanks. Dedistilled model could help?

Zd Ba

true. currently this is biggest issue of flux and no definitive or easy solution yet. flux has huge class bleed problem

Furkan Gözükara

Hi, I fine-tuned FLUX dev on a person following your video. I used 35 images, and from 70 to 110 epochs it looks nice. I used medium-length captions with the “special” and “class” words at the beginning. I know you recommend no captions, only the directory name, for FLUX Kohya training; I will try that next time. But it looks to me like the DreamBooth fine-tuning overtrained all other ‘man’ faces. With prompts like asxM man, young man, or teen boy I get more or less the same face. With asxMan man I get a perfect face, and other men look similar. Is the result without captions better? I mean, how do I make it so that the asxM man prompt gives my face and the young man prompt gives a different one? Thanks.

Zd Ba

Enable advanced options and that option appears when you select FLUX dev model or other FLUX dev models like fill depth canny

Furkan Gözükara

Hey Doc & Others, I’m currently running SwarmUI v0.9.3.2 and following the tutorial as best as I can since I’m new to this. However, I’ve noticed that some options don’t align with the tutorial. Specifically, I’m unable to locate the Flux Guidance Scale under the Sampling section. Are there any recommended next steps to locate it, or alternative actions I should take?

GhostDance

there is no such thing. go up to 150 epochs for 140 images

Furkan Gözükara

Hi, I still couldn't get the epoch count right. Is there any equation I can follow? I have 140 images to train with.

JackOppss

sure

Cgpress.us

please send me your config file from discord so i can tell your error. your modified and run config please

Furkan Gözükara

Yes, I have 4 GPUs in one machine, but I don't check Multi GPU in the Accelerate launch settings.

Cgpress.us

you say GPUs. are you trying multiple GPU?

Furkan Gözükara

It was only the path of the model, clip, T5-XXL, and VAE that was changed. No other configuration was changed. My GPUs are A5000 also I used the DreamBooth_Tab_Fine_Tuning_Best_FLUX_Configs/24GB_GPU_22900MB_7.3_second_it_Tier_1.json

Cgpress.us

hi it looks like you changed the config. you should load my configs and use them as they are. i dont know what you changed. which config are you trying to use? on which gpu?

Furkan Gözükara

I have an issue with Kohya FLUX Fine Tuning, my error is:
Traceback (most recent call last):
  File "C:\Users\Node 1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Node 1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\AI\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1084, in launch_command
    args, defaults, mp_from_config_flag = _validate_launch_command(args)
  File "C:\AI\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 957, in _validate_launch_command
    raise ValueError(
ValueError: Less than two GPU IDs were configured and tried to run on on multiple GPUs. Please ensure at least two are specified for '--gpu_ids', or use '--gpu_ids='all''.

Cgpress.us
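The `ValueError` above states a simple rule: a multi-GPU launch needs at least two entries in `--gpu_ids`, or the value `all`. The sketch below restates that rule so a launch setup can be checked before starting. It is a hypothetical helper written for illustration and is not accelerate's own validation code.

```python
def validate_gpu_ids(gpu_ids, multi_gpu):
    """Raise if a multi-GPU launch is requested with fewer than two GPU ids."""
    if gpu_ids == "all":
        return True
    ids = [part.strip() for part in str(gpu_ids).split(",") if part.strip()]
    if multi_gpu and len(ids) < 2:
        raise ValueError("multi-GPU launch needs at least two GPU ids or 'all'")
    return True


print(validate_gpu_ids("0", multi_gpu=False))    # single-GPU run with one id
print(validate_gpu_ids("0,1", multi_gpu=True))   # multi-GPU run with two ids
```

On a machine with several GPUs, the two consistent setups are therefore either a single id with multi-GPU turned off, or multi-GPU turned on with two or more ids.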

awesome

Furkan Gözükara

yes i tried but i didnt see much difference.

Furkan Gözükara

awesome

Furkan Gözükara

OK, i found out the issue, I did not press load when I loaded your settings :D Now its running. thanks for the help!

Manuel Garrido Peña

Nevermind! I activated the venv and did pip install --upgrade timm. that worked. Thanks for the awesome work

What to watch high

Any idea what could cause this: Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 4, in File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 19, in from accelerate.commands.estimate import estimate_command_parser File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Lib\site-packages\accelerate\commands\estimate.py", line 34, in import timm File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Lib\site-packages\timm\__init__.py", line 2, in from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \ File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Lib\site-packages\timm\models\__init__.py", line 28, in from .maxxvit import * File "P:\AI\Kohya_GUI_Flux_Installer_v45\kohya_ss\venv\Lib\site-packages\timm\models\maxxvit.py", line 225, in @dataclass ^^^^^^^^^ File "C:\Program Files\Python311\Lib\dataclasses.py", line 1230, in dataclass return wrap(cls) ^^^^^^^^^ File "C:\Program Files\Python311\Lib\dataclasses.py", line 1220, in wrap return _process_class(cls, init, repr, eq, order, unsafe_hash, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Program Files\Python311\Lib\dataclasses.py", line 958, in _process_class cls_fields.append(_get_field(cls, name, type, kw_only)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Program Files\Python311\Lib\dataclasses.py", line 815, in _get_field raise ValueError(f'mutable default {type(f.default)} for field ' ValueError: mutable default for field conv_cfg is not allowed: use default_factory

What to watch high
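The `ValueError` at the end of this traceback is raised by Python's `dataclasses` module, which rejects mutable defaults and, as the message says, asks for `default_factory` instead. Python 3.11 applies this check to more kinds of default values than earlier versions, which would explain why a package version that imports fine on 3.10 can fail on 3.11. A minimal sketch of the rule, using hypothetical class names rather than timm's real ones:

```python
from dataclasses import dataclass, field


@dataclass
class ConvCfg:  # hypothetical config class, for illustration only
    kernel_size: int = 3


try:
    @dataclass
    class BadBlockCfg:
        layers: list = []  # mutable default: rejected when the class is created
except ValueError as err:
    error_text = str(err)
else:
    error_text = ""


@dataclass
class BlockCfg:
    # default_factory builds a fresh object for every instance
    conv_cfg: ConvCfg = field(default_factory=ConvCfg)
    layers: list = field(default_factory=list)


first, second = BlockCfg(), BlockCfg()
first.layers.append("conv")
print(error_text)
print(second.layers)  # unaffected by the change to `first`
```

For an installed package, the practical options are the ones reported in this thread: upgrade the package to a release that uses `default_factory`, or run on the Python version the installer targets.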

I don't use the 8-bit AdamW optimizer, so this is not my config.

Furkan Gözükara

you are right, running it with 3.10.11 made it work. I am experiencing however, the same error I had when I was running the tool from linux here are the logs. ``` Starting the GUI... this might take some time... 14:06:02-013747 INFO Kohya_ss GUI version: v24.2.0 14:06:02-415472 INFO Submodule initialized and updated. 14:06:02-419469 INFO nVidia toolkit detected 14:06:03-805505 INFO Torch 2.5.0+cu124 14:06:03-821505 INFO Torch backend: nVidia CUDA 12.4 cuDNN 90100 14:06:03-824509 INFO Torch detected GPU: NVIDIA GeForce RTX 3090 VRAM 24576MB Arch 8.6 Cores 82 14:06:03-825509 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] 14:06:03-826509 INFO Installing/Validating requirements from requirements_pytorch_windows.txt... 14:06:04-607000 INFO Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124 14:06:04-608000 INFO Obtaining file:///D:/Backup/Proyectos/stable-diffusion/sec_courses_patreon_tutorial/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts (from -r D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\requirements.txt (line 37)) 14:06:04-610000 INFO Preparing metadata (setup.py): started 14:06:05-033374 INFO Preparing metadata (setup.py): finished with status 'done' 14:06:05-720831 INFO Installing collected packages: library 14:06:05-721831 INFO Attempting uninstall: library 14:06:05-722831 INFO Found existing installation: library 0.0.0 14:06:05-724831 INFO Uninstalling library-0.0.0: 14:06:06-794508 INFO Successfully uninstalled library-0.0.0 14:06:06-795507 INFO Running setup.py develop for library 14:06:07-522457 INFO Successfully installed library 14:06:07-805399 INFO headless: False 14:06:07-813395 INFO Using shell=True when running external commands... INFO: Could not find files for the given pattern(s). * Running on local URL: http://127.0.0.1:7860 To create a public link, set `share=True` in `launch()`. 
14:12:54-839919 INFO Destination training directory is missing... can't perform the required task... 14:13:16-950094 INFO Copy D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\raw_input\combined_cropped_resized to D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\img/1_ohwx man... 14:13:19-144154 INFO Regularization images directory is missing... not copying regularisation images... 14:13:19-145153 INFO Done creating kohya_ss training folder structure at D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models... 16:22:21-149268 INFO Start training Dreambooth... 16:22:21-150269 INFO Validating lr scheduler arguments... 16:22:21-151268 INFO Validating optimizer arguments... 16:22:21-152267 INFO Validating D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\log existence and writability... SUCCESS 16:22:21-152267 INFO Validating D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\model existence and writability... SUCCESS 16:22:21-153772 INFO Validating D:/Backup/Proyectos/stable-diffusion/kohya_dreambooth/models/flux1-dev.safetensors existence... SUCCESS 16:22:21-154777 INFO Validating D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\img existence... SUCCESS 16:22:21-155777 INFO Folder 1_ohwx man: 1 repeats found 16:22:21-155777 INFO Folder 1_ohwx man: 118 images found 16:22:21-156777 INFO Folder 1_ohwx man: 118 * 1 = 118 steps 16:22:21-157776 INFO Regularization factor: 1 16:22:21-157776 INFO Total steps: 118 16:22:21-158781 INFO Train batch size: 1 16:22:21-159779 INFO Gradient accumulation steps: 1 16:22:21-160779 INFO Epoch: 100 16:22:21-160779 INFO Max train steps: 1600 16:22:21-161780 INFO lr_warmup_steps = 0.1 16:22:21-163778 INFO Saving training config to D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\model\100_imgs_BS_1_20241125-162221.json... 
16:22:21-165781 INFO Executing command: D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 D:/Backup/Proyectos/stable-diffusion/sec_courses_patreon_tutorial/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts/flux_train.py --config_file D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\model/config_dreambooth-20241125-162221.toml D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead. torch.utils._pytree._register_pytree_node( 2024-11-25 16:22:37 INFO Loading settings from D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\model/config_dreambooth-20241125-162221.toml... train_util.py:4519 INFO D:\Backup\Proyectos\stable-diffusion\kohya_dreambooth\SwarmUI\Models\diffusion_models\model/config_dreambooth-20241125-162221 train_util.py:4538 2024-11-25 16:22:37 INFO Using DreamBooth method. flux_train.py:115 INFO prepare images. 
train_util.py:1971 INFO get image size from name of cache files train_util.py:1886 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 118/118 [00:00 flux_utils.py:152 INFO [Dataset 0] train_util.py:2495 INFO caching latents with caching strategy. train_util.py:1048 INFO caching latents... train_util.py:1097 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 118/118 [00:27<00:00, 4.23it/s] tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. 
This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 2024-11-25 16:23:15 INFO Building CLIP-L flux_utils.py:163 INFO Loading state dict from D:/Backup/Proyectos/stable-diffusion/kohya_dreambooth/models/clip_l.safetensors flux_utils.py:259 2024-11-25 16:23:16 INFO Loaded CLIP-L: flux_utils.py:262 INFO Loading state dict from D:/Backup/Proyectos/stable-diffusion/kohya_dreambooth/models/t5xxl_fp16.safetensors flux_utils.py:314 2024-11-25 16:23:17 INFO Loaded T5xxl: flux_utils.py:317 INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:43 INFO Building Flux model dev from BFL checkpoint flux_utils.py:101 INFO Loading state dict from D:/Backup/Proyectos/stable-diffusion/kohya_dreambooth/models/flux1-dev.safetensors flux_utils.py:118 2024-11-25 16:31:27 INFO Loaded Flux: flux_utils.py:137 number of trainable parameters: 11901408320 prepare optimizer, data loader etc. 
2024-11-25 16:31:28 INFO use 8-bit AdamW optimizer | {} train_util.py:4673 running training / 学習開始 num examples / サンプル数: 118 num batches per epoch / 1epochのバッチ数: 118 num epochs / epoch数: 14 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 1600 steps: 0%| | 0/1600 [00:00 train(args) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\flux_train.py", line 690, in train accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2303, in clip_grad_norm_ self.unscale_gradients() File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\accelerator.py", line 2253, in unscale_gradients self.scaler.unscale_(opt) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\amp\grad_scaler.py", line 338, in unscale_ optimizer_state["found_inf_per_device"] = self._unscale_grads_( File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\torch\amp\grad_scaler.py", line 260, in _unscale_grads_ raise ValueError("Attempting to unscale FP16 gradients.") ValueError: Attempting to unscale FP16 gradients. 
steps: 0%| | 0/1600 [01:25 sys.exit(main()) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['D:\\Backup\\Proyectos\\stable-diffusion\\sec_courses_patreon_tutorial\\Kohya_FLUX_DreamBooth_v16\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Backup/Proyectos/stable-diffusion/sec_courses_patreon_tutorial/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:\\Backup\\Proyectos\\stable-diffusion\\kohya_dreambooth\\SwarmUI\\Models\\diffusion_models\\model/config_dreambooth-20241125-162221.toml']' returned non-zero exit status 1. 16:34:26-153574 INFO Training has ended. ```

Manuel Garrido Peña

This is your error: your Python version is 3.11.9. Make Python 3.10 the default instead, and also use the latest zip file.

Furkan Gözükara

You can use the fp16 versions of everything directly. You don't need to make a choice. The config and Kohya handle all of this automatically, so don't change these settings.

Furkan Gözükara

I installed all windows requirements, and tested other bat installations (facefusion or liveportrait) and they work. However running v26 of the zip file for kohya fails when installing kohya_ss gui with the following error: ``` Select an option: 1 [11/25/24 13:25:37] INFO Kohya_ss GUI version: v24.2.0 setup_common.py:371 INFO Python version is 3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)] setup_common.py:28 INFO Submodule initialized and updated. setup_common.py:53 INFO Installing/Validating requirements from requirements_pytorch_windows.txt... setup_common.py:161 [11/25/24 13:25:38] INFO Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu124 setup_common.py:183 INFO Obtaining file:///D:/Backup/Proyectos/stable-diffusion/sec_courses_patreon_tutorial/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts (from -r requirements.txt (line 37)) setup_common.py:183 INFO Preparing metadata (setup.py): started setup_common.py:183 INFO Preparing metadata (setup.py): finished with status 'done' setup_common.py:183 [11/25/24 13:25:41] INFO Installing collected packages: library setup_common.py:183 INFO Attempting uninstall: library setup_common.py:183 INFO Found existing installation: library 0.0.0 setup_common.py:183 INFO Uninstalling library-0.0.0: setup_common.py:183 [11/25/24 13:25:42] INFO Successfully uninstalled library-0.0.0 setup_common.py:183 INFO Running setup.py develop for library setup_common.py:183 INFO Successfully installed library-0.0.0 setup_common.py:183 Traceback (most recent call last): File "", line 198, in _run_module_as_main File "", line 88, in _run_code File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 4, in File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 19, in from 
accelerate.commands.estimate import estimate_command_parser File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Lib\site-packages\accelerate\commands\estimate.py", line 34, in import timm File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Lib\site-packages\timm\__init__.py", line 2, in from .models import create_model, list_models, is_model, list_modules, model_entrypoint, \ File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Lib\site-packages\timm\models\__init__.py", line 28, in from .maxxvit import * File "D:\Backup\Proyectos\stable-diffusion\sec_courses_patreon_tutorial\Kohya_FLUX_DreamBooth_v16\kohya_ss\venv\Lib\site-packages\timm\models\maxxvit.py", line 225, in @dataclass ^^^^^^^^^ File "C:\Python311\Lib\dataclasses.py", line 1232, in dataclass return wrap(cls) ^^^^^^^^^ File "C:\Python311\Lib\dataclasses.py", line 1222, in wrap return _process_class(cls, init, repr, eq, order, unsafe_hash, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Python311\Lib\dataclasses.py", line 958, in _process_class cls_fields.append(_get_field(cls, name, type, kw_only)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Python311\Lib\dataclasses.py", line 815, in _get_field raise ValueError(f'mutable default {type(f.default)} for field ' ValueError: mutable default for field conv_cfg is not allowed: use default_factory [11/25/24 13:25:47] ERROR Error occurred while running command: accelerate config default setup_common.py:673 ERROR Error: Command 'accelerate config default' returned non-zero exit status 1. ```

Manuel Garrido Peña

Sorry for asking so many questions, but I have 8 GB VRAM and 64 GB RAM. When fine tuning on this system, which pretrained model should I use: flux1-fp8 or flux1-dev? Which CLIP: CLIP-L or CLIP-G? Which T5XXL: fp8 or the normal one? And in the float setting, should I select fp16 or bf16? I know little about these topics, so I can't decide. Could you give me a recommendation? Thank you for your help and attention.

Bayram TATAR

Checking the float option was the mistake. It directly doubles the size.

Furkan Gözükara
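The doubling follows from the number of bytes stored per weight. A rough sketch of the arithmetic, using the "number of trainable parameters" value printed in a training log elsewhere in this thread (11901408320) and assuming the saved file is dominated by those weights (metadata and any extra tensors are ignored):

```python
# Rough checkpoint size estimate from parameter count and bytes per parameter.
# PARAM_COUNT comes from a training log in this thread; everything else is arithmetic.
PARAM_COUNT = 11_901_408_320

BYTES_PER_PARAM = {"fp32 (float)": 4, "fp16 / bf16": 2}


def checkpoint_size(param_count: int, bytes_per_param: int) -> tuple[float, float]:
    """Return (size in GB, size in GiB), ignoring file metadata."""
    total_bytes = param_count * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30


for name, width in BYTES_PER_PARAM.items():
    gb, gib = checkpoint_size(PARAM_COUNT, width)
    print(f"{name}: {gb:.1f} GB ({gib:.1f} GiB)")
```

Under these assumptions the 4-byte case comes out at about 44.3 GiB, which is consistent with the 44 GB checkpoint reported in this thread, and the 2-byte case at about 23.8 GB.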

Hello Mr. Furkan, I hope your work is going well. My RAM arrived, and as you said, I can train now. However, there is a problem: the generated checkpoint is 44 GB (far too large). I used the following configuration: Folder 1_ohwx man: 1 repeats found 11:40:45-449658 INFO Folder 1_ohwx man: 15 images found 11:40:45-450658 INFO Folder 1_ohwx man: 15 * 1 = 15 steps 11:40:45-451658 INFO Regularization factor: 1 11:40:45-452658 INFO Total steps: 15 11:40:45-452658 INFO Train batch size: 1 11:40:45-453657 INFO Gradient accumulation steps: 1 11:40:45-454659 INFO Epoch: 100 11:40:45-454659 INFO max_train_steps (15 / 1 / 1 * 100 * 1) = 1500 11:40:45-455659 INFO lr_warmup_steps = 0 Unlike your setup, I used flux1-dev-fp8.safetensors as the pretrained model and t5xxl_fp8_e4m3fn.safetensors for T5XXL. Everything else is the same as yours. I used the 8 GB 1024x1024 configuration. I also checked the float option by mistake. Where could I have gone wrong? Thanks in advance.

Bayram TATAR
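The `max_train_steps (15 / 1 / 1 * 100 * 1) = 1500` line in the log above can be reproduced with a few lines. This is a minimal sketch inferred from the order of the numbers in that log line; the function name and the rounding up are assumptions, not the GUI's confirmed code:

```python
import math


def estimate_max_train_steps(images, repeats, batch_size, grad_accum_steps, epochs, reg_factor=1):
    """Mirror the formula printed in the log: steps / batch / accumulation * epochs * reg factor."""
    steps_per_pass = images * repeats  # "15 * 1 = 15 steps" in the log
    return math.ceil(steps_per_pass / batch_size / grad_accum_steps * epochs * reg_factor)


# 15 images, 1 repeat, batch size 1, no accumulation, 100 epochs
print(estimate_max_train_steps(15, 1, 1, 1, 100))  # 1500
```

The same arithmetic with 140 epochs gives 2100, which matches the "total optimization steps" shown in another 15-image log in this thread.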

Many people solved it that way. 64 GB RAM is quite good, enjoy it.

Furkan Gözükara

I had installed 32+8 thinking 32 might be enough. I have ordered another 32 GB module. Hopefully that solves the problem, because upgrading the VRAM is practically not an option for me, it is too costly. Thank you for your attention. Also, there is a graphics card that looks affordable. Would you recommend it? Palit Nvidia Geforce RTX4060Ti Jetstream 16GB 128Bit GDDR6 Graphics Card NE6406T019T1-1061J

Bayram TATAR

Well, you really need to open an issue thread here: https://github.com/bmaltais/kohya_ss/issues I am pretty sure bmaltais can make a fix. Write a detailed post.

Furkan Gözükara

Unfortunately, the problem is entirely that the RAM is not enough. If you upgrade to 64 GB, there will be no problem.

Furkan Gözükara

Thanks. It's working in my standard-branch Kohya SS (on my C drive, installed a long time ago), so I can edit the captions there. But that is an extra step when the Flux Kohya already has an option for it that no longer works. I assume that if I update the standard branch it will install the new Gradio into that folder too and it will stop working there as well; then I'd probably need some other program to edit the captions (unless it's something that could easily be put in a .bat file or source code file). I understand they want it more secure, but the Kohya Flux branch programmer doesn't seem to have edited the app to take that change into account (maybe he hasn't tested manual caption editing with the new Gradio it now uses, or maybe clearer messages are needed about what users have to do). For your caption editing app, I assume you can add that for loop to allow all drives to be accessed. Maybe the Kohya source code could be edited to do something like that. He says it's insecure that way, but for local drives it should be okay. It would probably need to be a change made by the programmer of Kohya / the Flux branch, because if we added it ourselves it would get overwritten when it checked for updates. Or there could be a menu during installation for choosing which drives and folders the app may access, and it could allow just those. I did try your app for manual captions previously. Maybe that still works if it hasn't been updated. The Kohya one splits the captions and images into pages, though, so you can easily go from page 1 to the last page and back to check you've done them all, but yours didn't. Maybe splitting into pages also makes it faster, since it only needs to load one page of images and captions at a time.

cool1

Hello Mr. Furkan, we talked before. I currently have a system with 40 GB RAM and 8 GB VRAM (RTX 4060). I can run FLUX LoRA training without problems, but I have not been able to get FLUX fine tuning to run. I do exactly what you say but I get an error. I also tried several different configurations, to no avail. I couldn't solve it, so I'm writing to you as a last resort. Could you take a look via remote connection when you have time, or help in some other way? Thank you in advance. The general output of the error I get is as follows: enable full bf16 training. running training / 学習開始 num examples / サンプル数: 15 num batches per epoch / 1epochのバッチ数: 15 num epochs / epoch数: 140 batch size per device / バッチサイズ: 1 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 2100 steps: 0%| | 0/2100 [00:00 train(args) File "D:\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\flux_train.py", line 570, in train accelerator.unwrap_model(flux).prepare_block_swap_before_forward() File "D:\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\library\flux_models.py", line 1006, in prepare_block_swap_before_forward self.offloader_single.prepare_block_devices_before_forward(self.single_blocks) File "D:\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 210, in prepare_block_devices_before_forward weighs_to_device(b, "cpu") # make sure weights are on cpu File "D:\Kohya_FLUX_DreamBooth_v16\kohya_ss\sd-scripts\library\custom_offloading_utils.py", line 91, in weighs_to_device module.weight.data = module.weight.data.to(device, non_blocking=True) RuntimeError: CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. 
steps: 0%| | 0/2100 [00:30 sys.exit(main()) File "C:\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main args.func(args) File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command simple_launcher(args) File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['C:\\Python310\\python.exe', 'D:/Kohya_FLUX_DreamBooth_v16/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:\\swarm_ui\\SwarmUI\\Models\\diffusion_models\\model/config_dreambooth-20241124-201249.toml']' returned non-zero exit status 1. 20:14:32-493930 INFO Training has ended.

Bayram TATAR

It is because of Gradio. You can see my issue thread here: https://github.com/gradio-app/gradio/issues/9809 So please report this to Kohya SS (bmaltais).

Furkan Gözükara

Today I installed the newer version of this Kohya SS Flux trainer (from the v16.zip file), after using an older version that you posted previously. On this newer version I ran option 1 of windows_install_step1.bat, which seemed to be all you said was needed (I also tried the "windows update" after the problem appeared, but that didn't make it work). After clicking the Windows start bat file, going to manual captioning, and selecting a folder with existing .jpgs and .txt files (which had loaded in old versions of Kohya), it now says: "gradio.exceptions.InvalidPathError: Cannot move [the jpeg path and filename] to the gradio cache dir because it was not created by the application or it is not located in either the current working directory or your system's temp directory. To fix this error, please ensure your function returns files located in either the current working directory(..) your system's temp directory, or add ([that specific path and folder] to the allowed_paths parameter of launch()...". Why is this happening now when it didn't in the previously installed version? I didn't need to edit paths in the old one. I'm not sure exactly what I need to edit.

cool1
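The error message quoted above points at the `allowed_paths` parameter of Gradio's `launch()`. As a rough illustration of the "allow all local drives" idea discussed in this thread, the sketch below builds a list of existing Windows drive roots. The helper name `existing_drive_roots` and the app variable `demo` are hypothetical; this is not the actual kohya_ss code or fix, and widening `allowed_paths` lets the web UI serve files from those locations, so it only makes sense for a purely local setup:

```python
import os
import string


def existing_drive_roots(exists=os.path.exists):
    """Return roots such as 'C:\\' for every drive letter that currently exists."""
    return [f"{letter}:\\" for letter in string.ascii_uppercase if exists(f"{letter}:\\")]


print(existing_drive_roots())

# Hypothetical usage with a Gradio Blocks app named `demo`:
# demo.launch(allowed_paths=existing_drive_roots())
```

The `exists` argument is only there so the helper can be tried out without real drives; on a non-Windows machine the default call simply prints an empty list.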

You need to upgrade to 64 GB RAM to be able to train properly. Also, someone said that disabling full bf16 training fixed it, but I don't know for sure. If you message me on Discord I can connect to your computer and try to make it work, so we can tell others too if it works.

Furkan Gözükara

Hey tried with a dataset of 20 images and worked really well! I'm wondering if you have done testing with also training the T5 text encoder? If so any tactical advice to increase consistency between output images and the input objects?

Blendi Bylygbashi

What is the minimum RAM required? I have 32 GB RAM and a 4070 Ti (12 GB), and I always get an OOM error when starting the training, with shared VRAM hitting its limit. Do you have any suggestion for which parameter I should change to make it fit in 32 GB RAM?

Kingbund17z

ok sure let me know progress

Furkan Gözükara

Yes, 7 images is too low. I recommend cropping your images accurately at different distances and positions to bring your image count to around 20. I am currently training a biscuit package for a client and I did that to increase the image count. This works every time.

Furkan Gözükara
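One way to picture the cropping suggestion is to generate crop rectangles at several zoom levels and offsets and then keep only those that still frame the subject well. The sketch below is an assumption about how this could be scripted, not the workflow actually used in the tutorial; the function name, scales, and positions are illustrative:

```python
def crop_boxes(width, height, scales=(1.0, 0.8, 0.6), positions=(0.0, 0.5, 1.0)):
    """Return unique (left, top, right, bottom) boxes at several zoom levels and offsets."""
    boxes = []
    for scale in scales:
        crop_w, crop_h = int(width * scale), int(height * scale)
        for fx in positions:
            for fy in positions:
                # fx / fy slide the crop window from one edge (0.0) to the other (1.0)
                left = int((width - crop_w) * fx)
                top = int((height - crop_h) * fy)
                box = (left, top, left + crop_w, top + crop_h)
                if box not in boxes:
                    boxes.append(box)
    return boxes


boxes = crop_boxes(1024, 1024)
print(len(boxes), boxes[:3])

# With Pillow installed (an assumption), each box could be applied as:
# from PIL import Image
# Image.open("photo.jpg").crop(box).save("photo_crop.jpg")
```

Automatically generated crops still need a manual check, since a crop that cuts off the subject would hurt rather than help training.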

Hi there! This is super useful, thanks so much for putting this tutorial together. For some reason the model is not learning the object I am training it on. I have 7 input images and am using the 24GB_GPU_23150MB_10.2_second_it_Tier_1.json config file on my RTX 4090. The model is learning very little about the object. Any ideas why this could be happening?

Blendi Bylygbashi

Well, in the video you selected 16gb, just in case. I will try using the 24gb config and if not, i will try installing everything natively on windows

Manuel Garrido Peña

thanks for reminding

Furkan Gözükara

I also have an RTX 3090. It is working perfectly at full speed on Windows at the moment. And why not use the 24 GB config?

Furkan Gözükara

OpenFlux is de-distilled Schnell. https://huggingface.co/ostris/OpenFLUX.1

The Zet

Since I'm used to working with Linux, I thought I would try with it. My GPU is a GeForce RTX 3090 (24 GB VRAM). I ran the training with the 16 GB settings.

Manuel Garrido Peña

You don't need WSL now; Windows speed is the same as Linux. Also, what is your GPU?

Furkan Gözükara

running this on WSL2 linux, had to update cuda to 12.4 (and torch to 2.5.1) . Running the training fails with this error: ``` steps: 0%| | 0/1600 [00:00 train(args) File "/mnt/d/Backup/Proyectos/stable-diffusion/kohya_dreambooth/kohya_ss/sd-scripts/flux_train.py", line 690, in train accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm) File "/mnt/d/Backup/Proyectos/stable-diffusion/kohya_dreambooth/kohya_ss/.venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2303, in clip_grad_norm_ self.unscale_gradients() File "/mnt/d/Backup/Proyectos/stable-diffusion/kohya_dreambooth/kohya_ss/.venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2253, in unscale_gradients self.scaler.unscale_(opt) File "/mnt/d/Backup/Proyectos/stable-diffusion/kohya_dreambooth/kohya_ss/.venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 338, in unscale_ optimizer_state["found_inf_per_device"] = self._unscale_grads_( File "/mnt/d/Backup/Proyectos/stable-diffusion/kohya_dreambooth/kohya_ss/.venv/lib/python3.10/site-packages/torch/amp/grad_scaler.py", line 260, in _unscale_grads_ raise ValueError("Attempting to unscale FP16 gradients.") ValueError: Attempting to unscale FP16 gradients. ```

Manuel Garrido Peña

I see. I like to play around between 140 - 180 epochs, so I will try out the "save last N epoch states" setting.

Chris

It will be like that at the end, true. But you can't tell it to save specific epochs, e.g. 100, 125 and 160.

Furkan Gözükara

Ok, thank you for the suggestion.

David Fava

Just curious, with 20 images that were more or less similar to each other coming from 4 images, how many epochs did you use? 160?

Chris

But when you train 180 epochs, and you set "save every 20 epochs" and then from above advanced setting to "Save last N epochs" or maybe "Save last N epochs state" to "3", shouldn't that mean that the script saves epoch 140, 160, 180 to disk and disregards all smaller epochs?

Chris
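The 140 / 160 / 180 expectation can be checked with a tiny simulation. This is a sketch based only on the option descriptions quoted in this thread ("old epochs will be deleted"); the function name is hypothetical, and it assumes checkpoints are written at every multiple of the save interval and that 0 means "keep everything":

```python
def retained_epochs(total_epochs, save_every_n, save_last_n=0):
    """Return the epoch numbers whose checkpoints remain on disk after training."""
    saved = list(range(save_every_n, total_epochs + 1, save_every_n))
    if save_last_n > 0:
        # Only the newest `save_last_n` checkpoints survive; older ones are deleted.
        saved = saved[-save_last_n:]
    return saved


print(retained_epochs(180, 20, 3))  # [140, 160, 180]
print(retained_epochs(180, 20))     # all nine checkpoints from 20 to 180
```

Note that during training the disk still briefly holds the newest checkpoint alongside the ones about to be removed, and arbitrary epochs such as 100, 125 and 160 cannot be selected this way.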

Yes, now you can use "Save last N epochs". But we still can't tell it to save particular steps or epochs :(

Furkan Gözükara

Regarding the Kohya script: to save only the last few epochs (like 140, 160, 180) instead of every N epochs because of limited disk space, I just found these options in the GUI in the advanced parameters section. Is this the functionality mentioned? If so, maybe a short update to the YouTube tutorial for everyone? The options, each currently set to 0, are: "Save every N steps (Optional)": the model is saved every specified steps. "Save last N steps (Optional)": save only the specified number of models (old models will be deleted). "Save last N steps state (Optional)": save only the specified number of states (old models will be deleted). "Save last N epochs (Optional)": save only the specified number of epochs (old epochs will be deleted). "Save last N epochs state (Optional)": save only the specified number of epochs states (old models will be deleted).

Chris

You need to find a de-distilled Flux Schnell model. I think there was a project, but I can't remember it right now. If you find one, then yes, only change the training CFG to 3.5.

Furkan Gözükara

Any idea how "de-distill" training works for Flux Schnell? Could we just fine tune using guidance set at 3.5 instead of 1?

Henry

Hi, it was an error on bmaltais's side. He just fixed it, so please update Kohya to the latest version again: https://github.com/bmaltais/kohya_ss/commit/309a9bbc1b333d965592efc1a4c762b2bd489c1c

Furkan Gözükara

Hello - Thanks for all your work! After removing kohya and starting fresh with the 17 November update, I have been getting this error: File "C:\kohya\kohya_ss\kohya_gui\dreambooth_gui.py", line 817, in train_model log.info(max_train_steps_info) UnboundLocalError: local variable 'max_train_steps_info' referenced before assignment This is using 8GB_GPU_7500MB_20.5_second_it_Tier_1 config and with the four model files downloaded from the links in the Flux LoRa article. I have left Max Train Steps as default 0, as well as tried setting a number and got the same error. Any thoughts?

Daniel Sturtevant
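For readers unfamiliar with this kind of traceback, `UnboundLocalError` typically means a local variable is assigned only on some code paths and then read on all of them. The sketch below is a hypothetical reduction of that pattern, not the actual `dreambooth_gui.py` code; the function names and the `use_epochs` flag are invented for illustration:

```python
def describe_steps_buggy(use_epochs: bool) -> str:
    # Hypothetical reduction of the pattern, not the actual kohya_ss code.
    if use_epochs:
        max_train_steps_info = "max_train_steps computed from epochs"
    # When use_epochs is False the name was never bound, so this line raises.
    return max_train_steps_info


def describe_steps_fixed(use_epochs: bool) -> str:
    # Binding a default before the branch makes every path safe.
    max_train_steps_info = "max_train_steps taken from the setting"
    if use_epochs:
        max_train_steps_info = "max_train_steps computed from epochs"
    return max_train_steps_info


try:
    describe_steps_buggy(use_epochs=False)
except UnboundLocalError as err:
    print(f"buggy version failed: {type(err).__name__}")

print(describe_steps_fixed(use_epochs=False))
```

Errors of this kind are fixed in the application code itself, which is why updating to a corrected version is the practical remedy for users.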

thanks for info

Furkan Gözükara

Nope. You can do data augmentation: crop the 2 images at different positions, distances, etc. to get about 20 images, and then train. I recently did this for a client who had only 4 images, and it worked really well.

Furkan Gözükara

Use 12GB_GPU_11100MB_9.5_second_it_Tier_1. It works faster for some reason that even Kohya couldn't figure out :D

Furkan Gözükara

It is like a checkpoint in a video game: a snapshot of a specific moment of the training that you can go back to and review.

Manuel Garrido Peña

Quick question. Assuming I have only 2 images of the subject (a person), what would be the recommended flow to fine tune? More epochs?

David Fava

Hi! I have an RTX 3090 with 16 GB of VRAM and I plan to use 100-image datasets every time to have a streamlined, homogeneous workflow. In your experience, and given all these elements, what would be the optimal DreamBooth fine tuning setting to ensure both ultra high fidelity and flexible stylization? Is this one OK (16GB_GPU_15150MB_11.6_second_it_Tier_1) or should I use this one (12GB_GPU_11100MB_9.5_second_it_Tier_1)? Thanks in advance for your reply!

Skuuurt

A checkpoint is a full standalone model, a base model, like the 23.8 GB FLUX model.

Furkan Gözükara

Apologies for my ignorance, what is a checkpoint for?

Juan Grande

Yes, it is hopefully coming soon. I will hopefully start working on it.

Furkan Gözükara

You don't need more than 30 GB VRAM for maximum quality at batch size 1. More VRAM helps with a higher batch size, but each batch size requires a new learning rate, and even with the best learning rate it reduces quality slightly.

Furkan Gözükara

What is the purpose of having more VRAM? I am using an A100 with 80 GB, so should I increase the "Train batch size"? Does this not affect the quality of the training?

Minastrin

Will the flux training script work on training SD3.5 Large?

Henry

You can't split it, but you can train on each GPU and thus multiply the effective batch count.

Furkan Gözükara

Yes, we have a 2 GPU tutorial here: https://youtu.be/-uhL2nW7Ddw The only difference for fine tuning is that you need to use GPUs with at least 80 GB and use SXM machines, not PCIe.

Furkan Gözükara

You can't split VRAM usage like that; you need a single GPU with enough VRAM for fine tuning. Multiple GPUs are only for generating more than one picture at the same time.

Markus

How can I use 2 GPUs for this on RunPod? Is there any tutorial or guide?

Minastrin

Thanks, I just asked the Massed Compute team about that.

Furkan Gözükara

Thinlinc client is extremely laggy and slow, it's almost unusable. Do you have any advice for this? Thank you for amazing tutorials!

Ava Frigg

Yep. And it will become faster with the newest branch once it arrives, around 5 seconds / it.

Furkan Gözükara

I've done two tests on a 4090 today. 12 GB training, 10 photos: 1600/1600 [5:14:05<00:00, 11.78s/it, avr_loss=0.427]. 24 GB training, 10 photos: 1600/1600 [3:09:32<00:00, 7.11s/it, avr_loss=0.417]. Does this speed make sense for a 4090?

Dmitrii

You're welcome

Furkan Gözükara

I understand. Thank you for taking the time.

Ii I

Shared memory is mandatory since it needs 28 GB to fine tune. It speeds up by at least around 30%. Your setup currently seems slower than expected to me.

Furkan Gözükara

Thank you for your response. So, with this new branch, would the processing time be roughly cut in half, reducing from 30 hours to about 15 hours? Also, I'm concerned about using shared memory - is this slowdown normal for everyone, or is it just my setup? Are you also experiencing these issues with shared memory, or could there be something wrong with my PC configuration or method?

Ii I

If you are using RunPod, all paths start with /, so it has to be /workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json

Furkan Gözükara
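Why the leading `/` matters can be shown with the standard library: a path without it is resolved relative to the process's current working directory. In this sketch the working directory `/workspace/kohya_ss` is an assumption taken from the install location visible in the tracebacks in this thread, and the helper name is illustrative:

```python
import posixpath

CONFIG_NAME = "48GB_GPU_28200MB_6.4_second_it_Tier_1.json"


def resolve(path, cwd):
    """Show which file a Linux process would actually open for `path` when started in `cwd`."""
    return posixpath.normpath(posixpath.join(cwd, path))


cwd = "/workspace/kohya_ss"  # assumed working directory of the GUI process

print(resolve(f"workspace/config/{CONFIG_NAME}", cwd))
# /workspace/kohya_ss/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json

print(resolve(f"/workspace/config/{CONFIG_NAME}", cwd))
# /workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json
```

An absolute path only helps if the file really exists at that location, so on other providers the folder layout may need to be checked first.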

Your speed is slower than expected. Also, the fast branch is about to arrive, which runs at 5.5 seconds / it on the RTX 4090 we tested. I am waiting for Kohya to merge it.

Furkan Gözükara

Hello (•ᴗ•❁), thank you for your easy-to-follow video tutorials and useful toolkit. I have a question about running DreamBooth locally with an RTX 4090. How long should it typically take? In my case, I'm seeing the following output: steps: 1%|▌| 118/10050 [22:23<31:24:31, 11.38s/it, avr_loss=0.435] I'm using the 24GB_GPU_23150MB_10.2_second_it_Tier_1.json configuration. The GPU is using 20~22 GB of its dedicated memory and about 10 GB of shared memory. My system has 64 GB of RAM. It's estimating over 30 hours to complete. Is this normal? I'd really appreciate your insight on whether these training times are typical for this setup.

Ii I
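The estimate in that progress bar is simply steps multiplied by seconds per iteration. A small sketch of the arithmetic, using only numbers quoted in this thread (10050 steps, 11.38 s/it, and the 5.5 s/it figure mentioned for the faster branch); the function name is illustrative:

```python
from datetime import timedelta


def total_duration(steps, seconds_per_it):
    """Total wall-clock training time for a constant speed."""
    return timedelta(seconds=round(steps * seconds_per_it))


print(total_duration(10050, 11.38))  # 1 day, 7:46:09
print(total_duration(10050, 5.5))    # 15:21:15
```

So at the reported speed the whole run takes a little under 32 hours, and at 5.5 s/it the same step count would take roughly 15 hours, assuming the speed stays constant for the entire run.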

im not able to load any config or any model, when you are adding path for flux dev, its automatically detecting but not in my case, im using jarvis labs, when i tried to load configs got file not found error, even if i put '/' first here is the complete log--- [notice] A new release of pip is available: 23.0.1 -> 24.3.1 [notice] To update, run: pip install --upgrade pip 2024-11-11 15:18:38.888816: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-11-11 15:18:38.888844: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-11-11 15:18:38.890127: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-11-11 15:18:38.896984: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2024-11-11 15:18:39.995053: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT 15:18:44-133104 WARNING Skipping requirements verification. 15:18:44-137678 INFO headless: False 15:18:44-138992 INFO Using shell=True when running external commands... * Running on local URL: http://0.0.0.0:6007 * Running on public URL: https://345ef0cd6caac1f47a.gradio.live This share link expires in 72 hours. 
For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces) Traceback (most recent call last): File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events response = await route_utils.call_process_api( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api output = await app.get_blocks().process_api( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 2018, in process_api result = await self.call_function( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1567, in call_function prediction = await anyio.to_thread.run_sync( # type: ignore File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread return await future File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run result = context.run(func, *args) File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/utils.py", line 846, in wrapper response = f(*args, **kwargs) File "/workspace/kohya_ss/kohya_gui/dreambooth_gui.py", line 455, in open_configuration with open(file_path, "r", encoding="utf-8") as f: FileNotFoundError: [Errno 2] No such file or directory: 'workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json' Traceback (most recent call last): File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events response = await route_utils.call_process_api( File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api output = await 
app.get_blocks().process_api(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync( # type: ignore
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/workspace/kohya_ss/kohya_gui/dreambooth_gui.py", line 455, in open_configuration
    with open(file_path, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
Traceback (most recent call last):
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 2014, in process_api
    inputs = await self.preprocess_data(
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1709, in preprocess_data
    processed_input.append(block.preprocess(inputs_cached))
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/components/dropdown.py", line 193, in preprocess
    choice_values = [value for _, value in self.choices]
  File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/gradio/components/dropdown.py", line 193, in <listcomp>
    choice_values = [value for _, value in self.choices]
ValueError: not enough values to unpack (expected 2, got 0)
[The same two tracebacks repeat from here; only the final error line of each repeat is kept, in the original order.]
ValueError: not enough values to unpack (expected 2, got 0)
ValueError: not enough values to unpack (expected 2, got 0)
ValueError: not enough values to unpack (expected 2, got 0)
ValueError: not enough values to unpack (expected 2, got 0)
ValueError: not enough values to unpack (expected 2, got 0)
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: '/lab/tree/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: 'lab/tree/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: 'lab/tree/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: '/tree/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: 'tree/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: 'workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'
FileNotFoundError: [Errno 2] No such file or directory: 'workspace/config/48GB_GPU_28200MB_6.4_second_it_Tier_1.json'

402

I think that is a good speed, because 1536x1536 has 2.25 times more pixels than 1024x1024.

Furkan Gözükara
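
The 2.25x figure is just the ratio of pixel counts between the two square resolutions. A minimal sketch of that arithmetic follows; the linear scaling of step time with pixel count is an assumption made here for illustration, and the 5.5 s/it baseline is the number one user reports in this thread for 1024x1024, measured with a different blocks_to_swap value.

```python
def pixel_ratio(new_side: int, base_side: int) -> float:
    """Ratio of pixel counts between two square training resolutions."""
    return (new_side * new_side) / (base_side * base_side)


ratio = pixel_ratio(1536, 1024)
print(ratio)  # 2.25

# Baseline quoted in the thread for 1024x1024 (seconds per iteration).
base_seconds_per_it = 5.5
# Naive linear estimate only; real speed also depends on blocks_to_swap and PCIe bandwidth.
print(base_seconds_per_it * ratio)  # 12.375
```

Anything at or below that naive estimate suggests the higher resolution is not being slowed down by extra overhead.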

Yes, I tried: https://www.reddit.com/r/SECourses/comments/1f4v9lh/trained_a_lora_with_flux_schnell_turbo_model_with/

Furkan Gözükara

Just curious, have you tried to fine tune Schnell?

Henry

Training 1536x1536 now with a 4090 at 9.1 s/it. It seems like setting blocks_to_swap to 16 works.

Henry

The x4 lanes could definitely be impacting block swapping, because it moves data between the GPU and system RAM. Good catch.

Furkan Gözükara

Got it running with blocks_to_swap set to 12 instead of 10 for 1024x1024. I noticed a funny thing with 2 GPUs. I tested 2 machines with 2x 4090 each, and on both of them GPU 0 runs at 5.5 s/it while GPU 1 runs at 7.5 s/it when training with the exact same config. I'm using the integrated motherboard video output. In the setup script for GPUs I chose "all", uncommented "set CUDA_VISIBLE_DEVICES=1" in the startup bat file, and chose the GPU id in the UI config section as 0 or 1. Maybe it has to do with the PCIe x8 lanes on GPU 0 vs. x4 lanes on GPU 1? I suspect the slow x4 lanes on GPU 1 are affecting blocks_to_swap. Hopefully the 5090 won't have to swap and will not be affected by this.

Henry
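
One way to check the PCIe suspicion above is to ask the driver which link each GPU is actually using. The sketch below assumes `nvidia-smi` is on the PATH and uses its `pcie.link.gen.current` and `pcie.link.width.current` query fields; the helper names are made up for this example. The link generation can drop while a GPU is idle, so it is best run while training is active.

```python
import csv
import io
import subprocess

QUERY = "index,name,pcie.link.gen.current,pcie.link.width.current"


def parse_pcie_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader` output into one dict per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        index, name, gen, width = (f.strip() for f in fields)
        rows.append({"index": int(index), "name": name, "gen": gen, "width": width})
    return rows


def query_pcie() -> list[dict]:
    """Ask nvidia-smi for the current PCIe generation and lane width of each GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_pcie_csv(out)


if __name__ == "__main__":
    try:
        for gpu in query_pcie():
            print(f"GPU {gpu['index']} ({gpu['name']}): PCIe gen {gpu['gen']}, x{gpu['width']}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi not available: {exc}")
```

If one card reports x4 and the other x8 or x16, the slower card is the one that has to move swapped blocks over the narrower link.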

It is not related to my installers. Kohya doesn't support the sharded 00001-of-00003.safetensors files; it supports the single 23.8 GB safetensors file.

Furkan Gözükara
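
A quick pre-flight check can catch this before training starts. This is only a sketch: the four file names are the ones from the wget commands quoted in this thread, the folder is the path from the reported error, and the helper names are hypothetical.

```python
from pathlib import Path

# File names taken from the wget commands quoted in this thread.
REQUIRED_FILES = (
    "flux1-dev.safetensors",
    "clip_l.safetensors",
    "t5xxl_fp16.safetensors",
    "ae.safetensors",
)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required single-file checkpoints that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def sharded_files(model_dir: str) -> list[str]:
    """Return sharded weights such as *-00001-of-00003.safetensors found under model_dir."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(str(p) for p in folder.rglob("*-of-*.safetensors"))


if __name__ == "__main__":
    model_dir = "/workspace/_models_flux"  # path from the thread; adjust to your setup
    print("missing:", missing_model_files(model_dir))
    print("sharded:", sharded_files(model_dir))
```

An empty "missing" list means the single-file checkpoints are in place; any "sharded" entries are files the trainer should not be pointed at.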

Well, you need to test the swap count for each resolution while monitoring your VRAM usage. I don't recommend bucketing: it may not work, or it may cause errors, as has happened in the past.

Furkan Gözükara
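
Monitoring VRAM while testing a swap count can be done with a short script. This sketch assumes `nvidia-smi` is on the PATH and uses its `memory.used` and `memory.total` query fields; the function names are made up for illustration.

```python
import subprocess


def parse_memory_csv(text: str) -> list[tuple[int, int, int]]:
    """Parse 'index, used, total' rows (MiB) from nvidia-smi CSV output without units."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:
            rows.append((int(parts[0]), int(parts[1]), int(parts[2])))
    return rows


def query_memory() -> list[tuple[int, int, int]]:
    """Return (gpu index, used MiB, total MiB) for every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)


if __name__ == "__main__":
    try:
        for index, used, total in query_memory():
            print(f"GPU {index}: {used} / {total} MiB used, {total - used} MiB headroom")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi not available: {exc}")
```

Running it a few times during the first training steps shows whether a given blocks_to_swap value leaves any headroom at that resolution.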

Your configs are awesome! What do you recommend for the blocks_to_swap value on a 24 GB VRAM card for fine tuning at larger resolutions such as 1280x1280, 1536x1536, or 2048x2048? Also, the script freezes after 2 epochs when I try to fine tune with 800 images with buckets enabled. I suspect I need to raise the blocks_to_swap value there from 10 to something like 12? Or maybe not enable buckets?

Henry

I don't know what I am doing wrong. I downloaded Kohya_FLUX_DreamBooth_v11.zip, unzipped it on RunPod, and followed your install instructions as they are written. When I start training I get this error:
FileNotFoundError: No such file or directory: "/workspace/_models_flux/transformer/diffusion_pytorch_model-00001-of-00003.safetensors"
Then the training ends automatically. I've used your installers below, and v10 gave me no issues. v11 seems to introduce the attempt to locate /transformer/diffusion_pytorch_model-00001-of-00003.safetensors. The only thing I did differently was change the path to where I have downloaded:
wget https://huggingface.co/OwlMaster/realgg/resolve/main/flux1-dev.safetensors
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
wget https://huggingface.co/OwlMaster/realgg/resolve/main/ae.safetensors

Pew

Why are you doing that? As far as I know, Kohya doesn't support training that way.

Furkan Gözükara

Hey Doc, I'm using RunPod and I continue to get:
FileNotFoundError: No such file or directory: "/workspace/_models_flux/transformer/diffusion_pytorch_model-00001-of-00003.safetensors"
This is with 'Kohya_FLUX_DreamBooth_v11.zip', for which I've modified 'Download_Train_Models.py' as shown at the end of this message. Note that I've also downloaded these same files using:
wget https://huggingface.co/OwlMaster/realgg/resolve/main/flux1-dev.safetensors
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
wget https://huggingface.co/OwlMaster/realgg/resolve/main/ae.safetensors
The outcome is the same: whether I download directly or via the script, I get the FileNotFoundError above. I have reinstalled Kohya_ss using the Kohya_FLUX_DreamBooth_v11.zip installer for RunPod, and there is no difference; the issue remains.
```
from huggingface_hub import snapshot_download
import os

# Set the directory to save the downloaded files
local_dir = "/workspace/_models_flux/"

# Ensure the local directory exists
os.makedirs(local_dir, exist_ok=True)

# Specify the repository to download
repo_id = "OwlMaster/FLUX_LoRA_Train"

# Download snapshot
snapshot_download(local_dir=local_dir, repo_id=repo_id)

print(".\n.\nDOWNLOAD COMPLETED To The Bat File Run Folder - Check Folder Content")
```

Pew

You are out of disk space. Get a new pod and set the temp disk size to 30 GB.

Furkan Gözükara
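
Free space can be checked from Python before launching anything. A minimal sketch using the standard library; the two paths are the ones that appear in the error log, and the 30 GiB threshold mirrors the suggested temp disk size. On a pod, a per-user quota (the "Disk quota exceeded" error) can be stricter than what the filesystem reports, so treat the result as an upper bound.

```python
import os
import shutil


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path: str, needed_gib: float) -> bool:
    """True when at least `needed_gib` GiB are free at `path`."""
    return free_gib(path) >= needed_gib


# Paths from the thread's error log; adjust to your pod layout.
for path in ("/tmp", "/workspace"):
    if os.path.exists(path):
        print(f"{path}: {free_gib(path):.1f} GiB free, room for 30 GiB: {has_room(path, 30)}")
    else:
        print(f"{path}: not present on this machine")
```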

cd /workspace
export HF_HOME="/workspace"
export PYTHONWARNINGS="ignore"
source venv/bin/activate
python app.py --share
Matplotlib created a temporary cache directory at /workspace/matplotlib-b9zhu27t because the default path (/root/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
WARNING ⚠️ user config directory '/root/.config/Ultralytics' is not writeable, defaulting to '/tmp' or CWD.Alternatively you can define a YOLO_CONFIG_DIR environment variable for this path.
Traceback (most recent call last):
  File "/workspace/app.py", line 7, in <module>
    from ultralytics import YOLO
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/__init__.py", line 11, in <module>
    from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/models/__init__.py", line 3, in <module>
    from .fastsam import FastSAM
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/models/fastsam/__init__.py", line 3, in <module>
    from .model import FastSAM
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/models/fastsam/model.py", line 5, in <module>
    from ultralytics.engine.model import Model
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/engine/model.py", line 11, in <module>
    from ultralytics.cfg import TASK2DATA, get_cfg, get_save_dir
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/cfg/__init__.py", line 10, in <module>
    from ultralytics.utils import (
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/utils/__init__.py", line 816, in <module>
    USER_CONFIG_DIR = Path(os.getenv("YOLO_CONFIG_DIR") or get_user_config_dir()) # Ultralytics settings dir
  File "/workspace/venv/lib/python3.10/site-packages/ultralytics/utils/__init__.py", line 799, in get_user_config_dir
    path.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
OSError: [Errno 122] Disk quota exceeded: '/tmp/Ultralytics'
I am getting this error on RunPod. Please help me resolve it. I am using an A6000 GPU.

vijay kumaran

"accelerator device: cpu" means your accelerator is not set up correctly. What is your GPU?

Furkan Gözükara

It always stays at 0% and is unable to proceed. Is there any way to fix it?

佳伟 赵

INFO [Dataset 0] config_util.py:573
INFO loading image sizes. train_util.py:923
100%|█████████████████████████████████████████████████████████████████████████████████| 45/45 [00:00<00:00, 344.01it/s]
2024-11-05 13:07:49 INFO prepare dataset train_util.py:948
INFO preparing accelerator train_network.py:373
accelerator device: cpu
INFO Checking the state dict: Diffusers or BFL, dev or schnell flux_utils.py:43
INFO Building Flux model dev from BFL checkpoint flux_utils.py:101
INFO Loading state dict from D:/sdai/KaraDetroit_lora/flux1-dev.safetensors flux_utils.py:118
2024-11-05 13:07:51 INFO Loaded Flux: flux_utils.py:137
INFO Building CLIP-L flux_utils.py:163
INFO Loading state dict from D:/sdai/KaraDetroit_lora/clip_l.safetensors flux_utils.py:259
2024-11-05 13:07:54 INFO Loaded CLIP-L: flux_utils.py:262
INFO Loading state dict from D:/sdai/KaraDetroit_lora/t5xxl_fp16.safetensors flux_utils.py:314
2024-11-05 13:07:55 INFO Loaded T5xxl: flux_utils.py:317
INFO Building AutoEncoder flux_utils.py:144
INFO Loading state dict from D:/sdai/KaraDetroit_lora/ae.safetensors flux_utils.py:149
2024-11-05 13:07:58 INFO Loaded AE: flux_utils.py:152
import network module: networks.lora_flux
2024-11-05 13:07:59 INFO [Dataset 0] train_util.py:2493
INFO caching latents with caching strategy. train_util.py:1048
INFO caching latents... train_util.py:1097
0%| | 0/45 [00:00

佳伟 赵

OK, that is why. Many people upgraded to 64 GB and their problem was fixed.

Furkan Gözükara

i have 32 GB RAM

Mostafa Elnajar

How much RAM do you have? People with less than 64 GB are also getting similar errors.

Furkan Gözükara

I installed v10 and it is the same: training does not start, but LoRA works perfectly.

Mostafa Elnajar

Sadly this doesn't show the error. Do a fresh install. Also install the requirements as in this video: https://youtu.be/DrhUHnYfwC0 and after that install the v10 file. Also do a fresh NVIDIA driver install and reset all settings.

Furkan Gözükara

steps: 0%| | 0/1600 [00:18
    sys.exit(main())
  File "C:\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\Python310\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Python310\\python.exe', 'C:/Users/Mostafa/Desktop/Kohya_FLUX_DreamBooth_v9/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/Users/Mostafa/Desktop/New folder (5)\\model/config_dreambooth-20241101-210644.toml']' returned non-zero exit status 1.
21:07:37-555734 INFO Training has ended.
It gives me this every time I run DreamBooth; it stays at steps: 0%. My GPU is a 4070 Ti Super.

Mostafa Elnajar

Sadly no such formula exists. Batch size 1 is best, but to speed things up you can test which batch size gives you the most speed. However, each batch size needs its own learning rate.

Furkan Gözükara

Hi, thank you for the great content! Is there a formula that can give the best epoch count given the number of training images? And any suggestion for the right batch size in general?

David Fava
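
There is no formula for the best epoch, but the relationship between images, epochs, batch size and the step counter shown in the training logs is plain arithmetic. The sketch below assumes the usual accounting of one step per batch, with no gradient accumulation and a single GPU; the numbers in the example are hypothetical and do not come from any config in this post.

```python
import math


def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps for a run, assuming one step per batch on a single GPU."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


# Hypothetical dataset: 8 images, 1 repeat, 200 epochs.
print(total_steps(8, 1, 200, batch_size=1))  # 1600
print(total_steps(8, 1, 200, batch_size=4))  # 400
```

A larger batch size lowers the step count for the same number of epochs, which is one reason the learning rate has to be retuned for each batch size.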

You're welcome. Actually, the 24 GB config should work with 32 GB RAM, but I suppose I was wrong about this :D

Furkan Gözükara

Damn, thank you.

Neven Krcmarek

32 GB is your problem. Sadly you need a minimum of 48 GB physical RAM. There were a few other people with 32 GB; they upgraded to 64 GB and their issues were solved.

Furkan Gözükara
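
System RAM can be checked up front against the 48 GB minimum mentioned above. This is a best-effort sketch: it uses the optional third-party `psutil` package when installed and falls back to `os.sysconf`, which exists only on Unix-like systems. The 5% slack is an assumption added here because the operating system usually reports slightly less than the installed amount.

```python
import os
from typing import Optional

MIN_RAM_GIB = 48  # minimum physical RAM suggested in the thread


def total_ram_gib() -> Optional[float]:
    """Best-effort physical RAM in GiB; None when it cannot be determined."""
    try:
        import psutil  # optional third-party package
        return psutil.virtual_memory().total / 1024**3
    except ImportError:
        pass
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows without psutil


def ram_is_sufficient(total_gib: float, minimum_gib: float = MIN_RAM_GIB) -> bool:
    """Compare against the minimum with 5% slack for memory reserved by the system."""
    return total_gib >= minimum_gib * 0.95


ram = total_ram_gib()
if ram is None:
    print("Could not determine physical RAM on this system.")
else:
    print(f"{ram:.1f} GiB RAM, sufficient: {ram_is_sufficient(ram)}")
```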

Hmm, now I was able to see via nvitop what is going on. Regular RAM is maxed out and then it reports the problem above. It has nothing to do with VRAM. I've tried the 10 GB config and VRAM never goes above 10 GB, as expected. I have 32 GB of regular RAM and virtual memory is set to system managed size.

Neven Krcmarek

There are new drivers available 566.03. Should I update to this new version also?

Neven Krcmarek

I'm loading into the DreamBooth tab. Shared memory is enabled (16357 MB). NVIDIA drivers are 555.99. I tried 3 configs:
24GB_GPU_23150MB_10.2_second_it_Tier_1.json
16GB_GPU_15150MB_13.8_second_it_Tier_1.json
12GB_GPU_10800MB_17.2_second_it_Tier_1.json
By resetting all the settings, do you mean in Kohya SS? Thank you.

Neven Krcmarek

Are you loading into the DreamBooth tab? How much RAM do you have? Is shared VRAM enabled or disabled? You can also try a fresh NVIDIA driver install and reset all settings.

Furkan Gözükara

Yes. Give the epoch 60 checkpoint as the base model and train 40 more epochs.

Furkan Gözükara
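
In config terms, that advice means pointing the base model at the saved checkpoint and lowering the epoch count to the remainder. The sketch below edits a config held as a Python dict. The key names `pretrained_model_name_or_path` and `epoch` are assumptions about how the GUI's saved JSON is laid out, so check them against your own config file. The checkpoint path is hypothetical, and starting from a checkpoint this way does not restore optimizer state.

```python
def make_resume_config(config: dict, checkpoint_path: str,
                       target_epochs: int, finished_epochs: int) -> dict:
    """Return a copy of `config` that continues from a saved checkpoint.

    Key names are assumed, not confirmed; compare with your own saved config.
    """
    if finished_epochs >= target_epochs:
        raise ValueError("nothing left to train")
    resumed = dict(config)
    resumed["pretrained_model_name_or_path"] = checkpoint_path  # assumed key name
    resumed["epoch"] = target_epochs - finished_epochs          # assumed key name
    return resumed


# Hypothetical original settings and checkpoint location.
original = {"pretrained_model_name_or_path": "/workspace/flux1-dev.safetensors", "epoch": 100}
resumed = make_resume_config(original, "/workspace/output/model-000060.safetensors", 100, 60)
print(resumed["epoch"])  # 40
```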

That is an out-of-memory error. Tell me your config name and the GPU you are using.

Furkan Gözükara

Is there any way to continue training after it was interrupted? The last checkpoint I have saved is from epoch 60, and I wanted to train until 100 epochs. Is there any way to continue from the epoch 60 checkpoint? I'm using RunPod.

Rustic Engineering

I'm on a 3090 and I've tried the 24 GB, 16 GB and 12 GB configs and always get this error:
Traceback (most recent call last):
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\sd-scripts\flux_train.py", line 998, in <module>
    train(args)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\sd-scripts\flux_train.py", line 787, in train
    model_pred = flux(
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\sd-scripts\library\flux_models.py", line 1098, in forward
    wait_for_blocks_move(unit_idx, futures)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\sd-scripts\library\flux_models.py", line 1075, in wait_for_blocks_move
    ftr.result()
  File "C:\Users\Neven\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "C:\Users\Neven\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "C:\Users\Neven\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\sd-scripts\library\flux_models.py", line 1053, in move_blocks
    block.to("cpu", non_blocking=True)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
steps: 0%| | 0/6500 [03:43
    sys.exit(main())
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "d:\Kohya_FLUX_DreamBooth_v9\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['d:\\Kohya_FLUX_DreamBooth_v9\\kohya_ss\\venv\\Scripts\\python.exe', 'd:/Kohya_FLUX_DreamBooth_v9/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'D:/LORA_for_Clients/Mike/output\\model/config_dreambooth-20241031-104818.toml']' returned non-zero exit status 1.
10:56:00-146229 INFO Training has ended.

Neven Krcmarek

yes usually i train up to 200 epochs and compare checkpoints to be sure

Furkan Gözükara

yes i have a full tutorial for art style here : https://huggingface.co/MonsterMMORPG/3D-Cartoon-Style-FLUX

Furkan Gözükara

Sorry, I did more training and started getting good results at around 80 epochs. Before that I was testing at around 60 epochs.

Rustic Engineering

Hi. Did you also try training it with art styles? I have done my first fine-tune on an art style but the results are not very good. The style looks diluted, similar to how an artist's style becomes less visible as the prompt gets longer, and anatomy also got worse. With shorter prompts the style shows more.

Rustic Engineering

Follow the video again; you are giving an incorrect folder path. I explained the logic in this video: https://youtu.be/nySGu12Y05k

Furkan Gözükara

My images are stored in a folder named example. This example folder is placed inside kohya_ss/dataset. I have specified the image file path as kohya_ss/dataset, but every time I try, it shows that I have 0 images. Do you have any ideas on what might be causing this issue? Things I’ve tried: 1. Changing the image file path to kohya_ss/dataset/example. 2. Renaming example to 1_example (since the step count is 1). Even after these changes, it still doesn’t work. I am currently using RunPod.

대훈 조
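
A quick way to debug a "0 images" report is to list what the trainer would see. The sketch below assumes the Kohya DreamBooth folder convention, where the path given in the GUI is the parent folder and each subfolder inside it is named `<repeats>_<name>` (for example `1_example`). The function name `scan_dataset` and the extension list are illustrative assumptions, not part of Kohya.

```python
import re
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}  # assumed extension list

def scan_dataset(image_dir):
    """Return {subfolder: (repeats, image_count)} for '<repeats>_<name>' subfolders."""
    found = {}
    for sub in sorted(Path(image_dir).iterdir()):
        match = re.match(r"^(\d+)_(.+)$", sub.name)
        if not sub.is_dir() or match is None:
            continue  # folders without a numeric repeat prefix are skipped
        images = [p for p in sub.iterdir() if p.suffix.lower() in IMAGE_EXTS]
        found[sub.name] = (int(match.group(1)), len(images))
    return found

# Demo on a throwaway directory that mimics kohya_ss/dataset/1_example
with tempfile.TemporaryDirectory() as root:
    good = Path(root) / "1_example"
    good.mkdir()
    (good / "photo (1).png").touch()
    (good / "photo (1).txt").touch()   # caption file, not counted as an image
    (Path(root) / "example").mkdir()   # no repeat prefix, so it is ignored
    report = scan_dataset(root)
    print(report)
```

In this check, pointing at the image subfolder itself instead of its parent, or leaving the subfolder without the numeric prefix, both give an empty result.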

thanks. de-distilled models hopefully coming soon :D

Furkan Gözükara

A couple of things for you: the RunPod readme text says "Then you can set their paths on GUI check Example Full Setup.jpg", but I can't find that JPG anywhere. Also, we really need a version for the new De-Distilled Flux versions. :)

Andrew Taylor

Awesome! Thank you for the quick reply and sorry for deleting the main message but I had some private information in the logs I did not see.

Wolf

Hi, please reinstall; he fixed the errors a few minutes ago.

Furkan Gözükara

Wolf

Looks like the bmaltais repo is broken; I'm going to report it. Until it gets fixed you can check out this commit: 64fd4b201243d0da2db3e3d537aa994328252975

Furkan Gözükara

I can't get started on Massed Compute. This command doesn't seem to work anymore:
chmod +x Massed_Compute_Kohya_FLUX.sh
./Massed_Compute_Kohya_FLUX.sh
Here's what the terminal looks like when running it: https://drive.google.com/file/d/1BJpAR0r6QZTwAkDBLHaayiL9Q-6pHxwJ/view?usp=drive_link Any idea why? Update: Same thing happens on RunPod as well

Julius

The only changed parameter is SDPA to xformers. Can you try SDPA? I tested on Linux and it still uses 7 GB VRAM.

Furkan Gözükara

With the latest Torch 2.5 the 8GB card config isn't working. 64GB system RAM, running headless so only 0-200MB of memory is used on the card. Older configs used to work, but now there is no way to go back. Here are some of the errors. Tried both the 8GB and 6GB configs, with 35/36 block swap.
Error:
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
OR:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
OR:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.90 GiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 8.05 GiB is allocated by PyTorch, and 2.07 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.

UC Arcade

Thanks, yes that is a workaround that works

Chris

awesome strategy

Furkan Gözükara

great strategy

Furkan Gözükara

I found a solution during my training. Do a first training run up to 140 epochs and save only the last checkpoint. Then use the resulting model as the source model and train for another 60 epochs (200 - 140), saving every 10 epochs.

io io

Thank you very much for your feedback. I finished my first training session and the result is as you claim. I found a little trick to save some disk space during training. I train for 100 epochs, saving only the final model. Then I take that model and continue training it for another 100 epochs, saving every 10 epochs. This way the first 9 checkpoints never have to be stored.

io io
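
The two-run trick above can be checked with a little arithmetic. This sketch assumes each full checkpoint is about 24 GB (the figure quoted elsewhere in this thread) and that the second run restarts its epoch counter at 0. `saved_epochs` is a hypothetical helper, not a Kohya function.

```python
CHECKPOINT_GB = 24  # approximate size of one full FLUX checkpoint, per this thread

def saved_epochs(total_epochs, save_every, offset=0):
    """Effective epochs at which a run writes a checkpoint.

    offset is the number of epochs already baked into the base model.
    The final epoch is always included.
    """
    epochs = list(range(save_every, total_epochs + 1, save_every))
    if not epochs or epochs[-1] != total_epochs:
        epochs.append(total_epochs)
    return [offset + e for e in epochs]

# Single run: 200 epochs, saving every 10
single = saved_epochs(200, 10)

# Two runs: 140 epochs keeping only the last file, then 60 more saving every 10
phase1 = saved_epochs(140, 140)
phase2 = saved_epochs(60, 10, offset=140)

print(len(single) * CHECKPOINT_GB)                   # disk for the single run, in GB
print((len(phase1) + len(phase2)) * CHECKPOINT_GB)   # disk for the two-run variant
print(phase2)                                        # effective epochs kept
```

Under these assumptions the single run writes 20 checkpoints, while the two-run variant writes 7 and still covers effective epochs 150 through 200.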

same i am waiting 5090 as well the pictures leaked :D

Furkan Gözükara

i asked this but he didnt do sadly

Furkan Gözükara

22.5 seconds per iteration sounds like a decent speed to me, so it did find CUDA; otherwise it wouldn't work. Hopefully I will update the installers and xformers for Torch 2.5, but it shouldn't affect speed if you already use Torch 2.5.

Furkan Gözükara

very good question :))

io io

can't upgrade to 64GB system ram, motherboard is too old :D time to save money for a new pc with a 5090 then :D

Chris

okay thanks, will have a look!

Chris

Hi Furkan, do you think the following feature can be added by kohya into his dreambooth ui? only start saving checkpoints when a certain epoch is reached. for example, i only want to compare epoch 140,160,180,200 full checkpoint files, so no need to save epoch 20,40.....120.... but only start saving at epoch 140 and then every XY steps. maybe this would be a useful feature? can you request that feature at kohya?

Chris

Hi Furkan. First, thank you and congratulations on your work. I'm trying to find a solution on my own, but if someone has hit the same error it may be faster to fix. I'm training on a 4060 with 16 GB VRAM and 64 GB RAM on Windows. I get a warning message on every training run I try: WARNING: Failed to find CUDA. WARNING: Failed to find CUDA. I don't get any error message when I install the prerequisites and Kohya. My speed is around 22.50 s/it after 50 steps, so I don't know if that's normal or if I need to fix something. I tried uninstalling and reinstalling the prerequisites, but nothing changed.

io io

Yes, you need at least 48 GB RAM. Renting 4x A6000 won't improve your fine tuning speed, but it is true that you will get 1024 GB of storage, and you can run 4 trainings at the same time. Alternatively, you can rent a single L40S to train faster and get 640 GB, or you can rent 4x A6000 and do a 4x faster LoRA training.

Furkan Gözükara

If I understood correctly, I won't be able to run this training locally on my RTX 3060 12GB if I only have 16GB of RAM? If I'm right, with 220 images, what would be the best configuration in your opinion, and which graphics card should I rent from Massed Compute? I know I need lots of space, so for example if I rent A6000 x 4 I will have 1024GB of storage, but will I have any speed advantage when training with 4 GPUs versus 1? I'm not sure which setup is best for the price.

Arkadiusz D

It even goes up to ~50GB when loading the model. 48 might not be enough; haven't tried though as I have 64GB :)

Diffusor

nope it doesnt work that way :) actually fine tuning needs 75 GB VRAM on each GPU when you use 2x GPU :)))

Furkan Gözükara

Just a thought - if finetuning requires 28gb VRAM and most high end GPUs have 24gb, would it make sense to get a cheap 2nd graphics card to take the total GPU VRAM over 28gb (e.g. an RTX 4090 + GTX 1070), or would the different GPU specs cause more trouble than it's worth? I know it's best to use two of the same graphic cards, but if the finetuning is going to use shared RAM regardless for the extra 4gb it needs, wouldn't it still be better if this 4gb came from another low/mid-end GPU rather than system RAM?

GH

woops, sorry Chris

GH

awesome

Furkan Gözükara

Because it really doesn't tell you whether the model is undertrained or overtrained. Look at your loss and at your best checkpoint after training, picked via manual analysis as I do (grid generation + comparison), and you will see what I mean.

Furkan Gözükara

Thanks for link GH. If you have 12 GB VRAM, you need min 48 GB system RAM

Furkan Gözükara

With 32 GB RAM you can do LoRA at the moment. Sadly, for Fine Tuning we have to wait and see whether Kohya can reduce the RAM requirement. But upgrading RAM is cheap.

Furkan Gözükara

You already can - https://www.patreon.com/posts/huge-news-for-as-113531833

GH

Hi Furkan, do you think it will at some point be possible to train full checkpoints with a 3060 12GB VRAM and only 32GB system RAM? Are there any optimizations in the pipeline coming from Kohya that you know of?

Chris

I fixed it!!!! How embarrassing hahahaha. If someone runs into the same problem as me, the fix is simple: just create a new clean Kohya installation and then it will work fine. Now running at 10.45s/it, I'm so happy!! Thank you so much for your patience.

Vicent Guardiola

Huh, that's the first time I've heard someone say that loss is meaningless in this context! Could you explain why? I thought loss measured how different the generated image is from the actual training image at each step. Wouldn't that show how close the model is to being progressed or cooked?

GH

great idea. almost ready

Furkan Gözükara

loss is totally meaningless when training text to image diffusion models :D

Furkan Gözükara

you can't improve speed sadly. only reducing training resolution may improve but it will reduce quality.

Furkan Gözükara

If you become a gold member, try everything you can, and if it still fails, message me on Discord and I will connect to your PC and solve it. Also, I just recorded a new, very detailed fine tuning tutorial that shows again how to install and set up on RunPod. Hopefully it will be published very soon.

Furkan Gözükara

Also, out of curiosity, what does your average loss value tend to be as you near the end of your training? I know this will be highly subjective to the dataset, number of steps, quality of training prompts etc, but just curious what value range it typically is

GH

Hey Furkan, I'm using "Quality_1_22460MB_15_12_Second_IT" on an RTX 3090 with 64gb RAM, and I'm getting around 14 s/it. If I want to improve the speed without reducing quality by any noticeable amount, what changes would you recommend? I.e. what changes would only reduce the quality very slightly but bring decent speed improvements? Thanks!

GH

Hi, tried this last night on Runpod. Followed your video exactly. Didn't even come close to working. I tried it again this morning from a new permanent storage (clean install from scratch). Didn't work, again. Not sure what to do. I have images of the errors in the jupyterlab but they are beyond my ability to understand.

John Habermeier

Yes, my computer is doing nothing but training. I will wait to see your new tutorial and try it step by step with it. For me, it's a bit strange, some weeks ago I successfully trained a full model with the same PC. Never mind, I will wait for the new tutorial and I will let you know if it's working, probably I'm doing something wrong.

Vicent Guardiola

hi yes. you should get like 10 second / it when your computer is doing nothing except training. so you have a mistake somewhere - i tested this on my PC

Furkan Gözükara

Hi! I think something is wrong with my training. RTX 3090, and it is using 23.7 GB of VRAM and 4GB of shared VRAM. The speed is horrible; I'm pasting a bit of the log here. The config is the 24GB config for full training, not LoRA. Thanks!
enable full bf16 training.
running training / 学習開始
num examples / サンプル数: 11
num batches per epoch / 1epochのバッチ数: 11
num epochs / epoch数: 260
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 2860
steps: 0%| | 0/2860 [00:00

Vicent Guardiola
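
The step count in the log above follows directly from the dataset size. Here is a minimal sketch of the arithmetic, assuming no regularization images and a single GPU. The function names are illustrative, and the 10.2 seconds per iteration is only the figure embedded in the Tier 1 config file name, not a measured value.

```python
import math

def total_steps(num_images, epochs, repeats=1, batch_size=1, grad_accum=1):
    """Optimization steps for a run."""
    batches_per_epoch = math.ceil(num_images * repeats / batch_size)
    return math.ceil(batches_per_epoch / grad_accum) * epochs

def hours(steps, seconds_per_it):
    """Wall-clock estimate in hours at a constant seconds-per-iteration rate."""
    return steps * seconds_per_it / 3600

steps_log = total_steps(num_images=11, epochs=260)    # the run in the log above
steps_sofa = total_steps(num_images=16, epochs=200)   # a 16-image run from this thread
print(steps_log, steps_sofa, round(hours(steps_sofa, 10.2), 1))
```

The same function shows why batch size matters for duration: with a larger `batch_size` the number of steps per epoch drops, although each step takes longer.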

I extensively tried using prior preservation loss and class images and neither worked. But there is a new feature that was merged just today, and hopefully I will test it intensively.

Furkan Gözükara

Hi Dr. Have you tried using prior loss during object customization with Dreambooth (instead of the current focus on character personalization)? When working on personalizing a dog with Dreambooth, I found that using prior preservation loss did not yield good results.

Dragon DarkMoon

wow that is a significant difference. yes drivers can make impact 100%

Furkan Gözükara

Guys, when you're training on Windows make sure to use NVIDIA driver 531.18. On my 3060 with 12GB:
- with 560.94 I'm getting ~33 s/it
- with 531.18 it goes down to 29.3 s/it
It's not much but it's honest work :)

Diffusor

I am editing a roughly 3 hour fine tuning tutorial right now. Yes, you can do that, but FLUX is currently not very good with multiple subjects. It is being researched and we might get there soon.

Furkan Gözükara

Sure, you can give the epoch 160 checkpoint as the base and it will be like continuing. But remember it will show as starting from 0, so set your new epoch count accordingly. For example, if you set 60 the total will be 160+60.

Furkan Gözükara

Okay thanks! Will try with 160, I can resume at 160 if i leave the trained directory where it is and train again and adjust for 200 epochs, right?

Chris

hi ping me in the discord so i will ping the staff members to reply you

Furkan Gözükara

there is sadly not a rule for all. so i would train up to 200 and compare

Furkan Gözükara

each checkpoint is 24 gb. you can rent multiple GPUs and delete existing apps and models on the massed compute. i have shown that in the upcoming tutorial.

Furkan Gözükara

How much storage does dreambooth training require? I noticed that for 1 GPU A6000 on MassedCompute I only get 256GB of storage, pretty sure it used to be 1TB before? My training got aborted due to storage full…

Gregory Nilsson

Hi Furkan, do you think that using 20 or 30 Images instead of 15, one should aim for another epoch saved checkpoint? Like 180 or 200? Or do 15,20,30 pictures not make any significant difference and 160 epochs are always a good choice?

Chris

Hi Dr! What is Massed Compute? This company is not registered anywhere... no terms and conditions, no privacy policy... very weird!

Ralph Erre

I'd rather train both Actors at once than consecutively.

Diffusor

Hi, just a quick question: are you planning on making a tutorial for fine-tuning? Let's say I fine-tune on an actor; the resulting 23 GB sft file will have that actor baked into the base model, is this correct? Then can I use this model to train another actor and save it as another 23 GB sft file? Will the second sft file have both actors baked in, i.e., without LoRAs, can I give their names in prompts and get results?

V Santhosh

100% awesome

Furkan Gözükara

Ok, so I upgraded my RAM to 64GB just to be able to finetune locally and it works. Getting 30sec/it on my 3060 with 12GB - which is only three times slower than the 4090 I had rented on runpod. Not too bad.

Diffusor

Just submit the final fine-tuned .safetensors as the base model, and yes. There is state saving, but I never used it :/

Furkan Gözükara

Thanks, my first training has completed successfully. The final loss/epoch is around 0.3, which I think is a bit high, but I'll test it out. When I trained loras on ai-toolkit I was getting around 0.2 on a good dataset. So, my next question :) What is the best way to resume training from the GUI if I want to train for say extra 100 epochs? Do I just submit the final finetuned .safetensors as the base model and use the same training images? It would reset all the counters in file names though, and the tensorboard graph will start fresh. I'd prefer to continue the training from where it left off, if possible.

st

i think you need to test both cases as they are very experimental. i usually improve dataset and train from 0 :D

Furkan Gözükara

he just fixed this error. please reinstall : https://github.com/bmaltais/kohya_ss/issues/2908#issuecomment-2422522249

Furkan Gözükara

I reinstalled everything, but now when I start windows_start_koyha_ss.bat my first line is: WARNING Skipping requirements verification. This problem is gone when I start gui.bat from the folder itself. But then I get the following error when starting a training: ImportError: cannot import name 'cached_download' from 'huggingface_hub'

Winant Veldhuizen

The SwarmUI developer made diffusion_models the official folder, so he changed from unet to diffusion_models.

Furkan Gözükara

100% normal. because fine tuning needs minimum 28 GB VRAM therefore it will use shared VRAM. 7s / it is very decent speed

Furkan Gözükara

If you record a video of your entire process I can tell. Otherwise it is impossible to guess. If you also upgrade to the gold tier I can check your computer quickly and tell you your mistake.

Furkan Gözükara

Hi! I tried your configs for LoRA (tier 3, which fits on a 4090) and Fine Tuning with 2 datasets (car and person), but both turn out bad. I see some things carry over from the training data, but the results do not look at all like the person or the car. I used Fluxgym in the past with the same datasets and they turned out fine. I am using 1 repeat and 200/300 epochs (making around 5000 steps, and saving every 50 epochs to test). I am using your installer and it is up to date. Do you have any clue what could be the issue?

Winant Veldhuizen

Hi Furkan, thank you for your work. I have a few questions. My setup:
* running dreambooth training on a 4090, 32GB RAM
* using config "24GB_GPU_23150MB_10.2_second_it_Tier_1.json"
* before training, dedicated GPU VRAM usage is exactly 0 because I'm using the integrated motherboard video output for the monitor
During training, dedicated GPU VRAM usage is around 21-22GB, but shared GPU memory usage is also around 10GB. Is that normal? I thought it would all fit into 24GB dedicated VRAM? Nonetheless I'm getting around 7s/it speed. I guess it's a normal speed?

st

For fine tuning, if the 4090's 24 GB of video memory spills into shared video memory, will the speed be very slow?

sunny

I see you wrote "The generated checkpoint files should be put into \SwarmUI\Models\diffusion_models". I was using the Unet folder and it worked fine. I have now pointed the models folder to my ComfyUI Unet folder so I don't duplicate the same models. Is there a reason for putting the checkpoints in diffusion_models?

Robert Arsene

Thanks for the reply. I will try again with a different dataset. I have another question. If I want to keep improving one checkpoint, should I continue fine-tuning it with DreamBooth to add more concepts, or can I just merge it with another LoRA?

Mime of Culture

i am recording a video right this moment :d

Furkan Gözükara

Could you make a YouTube tutorial on this? would be awesome as many of the latest posts are regarding this subject. Keep doing what you doing, great research!

Arian Moeini

I don't think it's necessary. If your training set is consistent, you can just use this: zettkose sectional sofa

Furkan Gözükara

Yes, I am preparing a step-by-step guide. Please turn on the notification bell on our YouTube channel: https://www.youtube.com/secourses

Furkan Gözükara

Thanks. It is impossible to know without seeing all the details. I did tens of trainings and never had such an issue.

Furkan Gözükara

Great work! I tried Flux DreamBooth for the first time last night. The generated images have many artifact lines or small squares in them. Do you know the cause?

Mime of Culture

I still don't quite understand it. Can you post a detailed graphic tutorial? Thank you

sunny

My model's trigger word is "zettkose" and the class is "sectional sofa". Each image's txt file contains text like the following, which I generated with JoyCaption and edited slightly (I chose to use "zettkose sectional sofa" only once; I don't know whether that was the right call):
This image is a photograph of a modern, L-shaped zettkose sectional sofa. The sofa is upholstered in a dark gray fabric with a textured, woven appearance that adds a sense of depth and comfort. The sofa features a clean, minimalist design with straight lines and simple curves. The backrest and armrests are padded and have a slightly rounded shape, providing a comfortable seating experience. The sofa has a single seat cushion and two back cushions, all upholstered in the same dark gray fabric. Each back cushion is adorned with a decorative pillow that adds a touch of style and comfort. These pillows are rectangular in shape and feature a white grid pattern on a black background, enhancing the modern and contemporary aesthetic of the sofa. The legs of the sofa are dark brown, adding a touch of warmth and contrast to the overall design. The legs are slender and tapered, contributing to the sleek and modern appearance of the piece. The background is white.

Pixel Reaction

you are welcome

Furkan Gözükara

What did you write inside the txt files? The npz files are normal.

Furkan Gözükara

Ok! Thanks!!

Vicent Guardiola

I think the second image inside the "DreamBooth_Tab_Fine_Tuning_Best_FLUX_Configs" folder misled me. You wrote that batch size 1 gives good results for realistic outputs and 7 for stylized ones, so I set it to 1. Also, I prepared and added a separate .txt file for each image caption in my dataset; would that cause a problem? Like the example below:
owh man (1).png
owh man (1).txt
owh man (2).png
owh man (2).txt
One last thing: while training was running I looked into the img folder and saw that a flux_te.npz and a 2048x2048_flux_te.npz file had been created for each image. Is that normal? I don't want to take up your time with beginner questions, but I want to learn the process correctly.

Pixel Reaction

If you are training on Massed Compute, use batch size 7; it will be considerably faster. That config is for 24 GB GPUs :) Kohya automatically downscales to 1024 px. Yes, we train in bf16 and save in fp16; that is how it needs to be.

Furkan Gözükara

Hello, I was finally able to start a test on Massed Compute. I don't know if it makes sense, but I wanted to try training a product (a corner sofa). 16 images / 200 epochs / 3200 steps / 1 repeat; it looks like it will take 8-10 hours. (I'm sure a tutorial video on object training would get a lot of interest; as you know, there aren't many resources online.) I used 1x RTX A6000 GPU and the config file you provided, in the DreamBooth tab (24GB_GPU_23150MB_10.2_second_it_Tier_1). By the way, although the images are 2048*2048, I noticed that I did not set any resolution in Kohya. What effect does this have? In your tutorial video we install Kohya with bf16, but in the interface "Save precision: Float" was selected (in the video). When I loaded the config file, fp16 was selected, so I left it alone. Considering the points above and some warnings I got when I started training (below), should I let the training continue? It has been running for about 1 hour. (I did not include the full terminal log to keep this short, but I can add it if needed.) Thanks.
WARNING no regularization images
WARNING because max_grad_norm is set, clip_grad_norm is enabled. consider set to 0 / max_grad_norm
WARNING constant_with_warmup will be good / スケジューラはconstant_with_warmupが良いかもしれません train_util.py:4767
enable full bf16 training.

Pixel Reaction

not yet but hopefully very soon. it is exactly same as LoRA but you load into DreamBooth tab

Furkan Gözükara

no you can put anywhere

Furkan Gözükara

Should the Kohya_FLUX_DreamBooth_v7 folder be placed in the root directory of Kohya?

sunny

Is there a video tutorial for FLUX Fine Tuning? I don't understand how to do it in this article. Thank you

sunny

Everything is the same as LoRA multi-GPU. For the learning rate I recommend: new LR = square root of GPU count x the batch size 1 learning rate.

Furkan Gözükara
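
The square-root rule above, written out. The base learning rate used here is a placeholder for illustration only, not a value taken from the configs, and `scaled_lr` is a hypothetical helper.

```python
import math

def scaled_lr(base_lr, gpu_count):
    """Learning rate for multi-GPU training: sqrt(GPU count) x single-GPU LR."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return math.sqrt(gpu_count) * base_lr

base_lr = 1e-5  # placeholder single-GPU, batch-size-1 learning rate
for gpus in (1, 2, 4):
    print(gpus, scaled_lr(base_lr, gpus))
```

With 4 GPUs the rule doubles the learning rate; with 2 GPUs it multiplies it by roughly 1.41.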

Sure, hopefully I will record it tomorrow. I'm waiting for my 64 GB RAM to arrive; my current RAM is problematic.

Furkan Gözükara

I tried a DreamBooth training using the earlier tutorial for LoRA, but unfortunately no luck, errors etc. I'm waiting for the tutorial, please.

Arkadiusz D

Hey Doc, I see you've included: "Multi GPU requires 80 GB A100 or above GPUs, 48 GB is not sufficient sadly" However, I do not see any commentary or recommended hyper-parameters for multi-GPU using 80 GB VRAM? Can you make any recommendations there? I am interested in renting 80 GB GPU x 2 or even x 4. Thank you!

Pew

for fine tuning we have to use shared vram. for lora we have configs still working. fine tuning need min 28 gb so we have to use shared vram

Furkan Gözükara

Hi! Can you please revise the configs for 24GB and 16GB? Both make my 3090 use shared memory. I monitor the VRAM usage with nvitop and I am using 500 MB of VRAM without any training. With the configs from a while ago I was able to train the model with the 24GB config without any problem. Thanks!

Vicent Guardiola

awesome

Furkan Gözükara

Great, it works now, thank you!

Neven Krcmarek

Thanks for replying... just seen this you're such a sweetheart; in the meantime I have looked into your posts and I've found your tool to process images. Fucking amazing results

Matei

you are welcome

Furkan Gözükara

he just fixed this error after i reported please reinstall : https://github.com/bmaltais/kohya_ss/issues/2901#issuecomment-2414757729

Furkan Gözükara

This error was fixed a few minutes ago after I reported it. Please make a fresh install. It was an error by the Kohya GUI developer.

Furkan Gözükara

I've installed this 2 times and get this error when I start Windows_Install_Step_1.bat:
Traceback (most recent call last):
  File "d:\Kohya_FLUX_DreamBooth_v7\kohya_ss\kohya_gui.py", line 6, in
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'
Kohya_GUI_Flux_Installer_v37 works great but this dreambooth v7 does not. Am I doing something wrong? Thank you.

Neven Krcmarek

ERROR: Cannot install -r requirements.txt (line 1), -r requirements.txt (line 10), diffusers[torch]==0.25.0 and huggingface-hub==0.24.5 because these package versions have conflicting dependencies.
The conflict is caused by:
    The user requested huggingface-hub==0.24.5
    accelerate 0.33.0 depends on huggingface-hub>=0.21.0
    diffusers[torch] 0.25.0 depends on huggingface-hub>=0.19.4
    gradio 5.0.1 depends on huggingface-hub>=0.25.1
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
-----------------------------------------------
I can complete the installation by modifying the requirements.txt file to use HF Hub 0.25.2, and then I get an error when extracting a FLUX LoRA. I think there's a typo.
FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
torch.utils._pytree._register_pytree_node(
------------------------------------------------
If you don't correct the typos, errors like the following will occur:
Kohya_FLUX_DreamBooth_v7\kohya_ss\venv\lib\site-packages\gradio\blocks.py:1749: UserWarning: A function (extract_flux_lora) returned too many output values (needed: 0, returned: 1). Ignoring extra values.
Output components: []
Output values returned: [None]
warnings.warn(

SUNG SHU LIN
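
The resolver message above boils down to one pin that cannot satisfy every lower bound. The sketch below replays that check with plain tuple comparison, using only the constraints quoted in the error. It handles simple dotted versions only and is not a substitute for pip's resolver; `parse` and `blockers` are illustrative names.

```python
def parse(version):
    """Turn '0.25.1' into (0, 25, 1) for comparison."""
    return tuple(int(part) for part in version.split("."))

# Lower bounds on huggingface-hub quoted in the pip error
MINIMUMS = {
    "accelerate 0.33.0": "0.21.0",
    "diffusers[torch] 0.25.0": "0.19.4",
    "gradio 5.0.1": "0.25.1",
}

def blockers(pinned):
    """Packages whose minimum huggingface-hub version the pin does not meet."""
    return [pkg for pkg, low in MINIMUMS.items() if parse(pinned) < parse(low)]

print(blockers("0.24.5"))  # the pin from requirements.txt
print(blockers("0.25.2"))  # the version the commenter switched to
```

Only the gradio requirement rejects the original pin, which is consistent with the commenter getting past the install step after raising the pin.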

i will try ! Thx a lot !!!!

Raph wess

Yep, your problem is the RTX A6000 spot instance. Terminate it and get an RTX A6000 with 48 GB RAM using the SECourses coupon.

Furkan Gözükara

Yes, I’m using a config file, and it's the one you recommend in your tutorial: 24GB_GPU_23150MB_10.2_second_it_Tier_1.json. And here’s the configuration I’m using on Massed Compute: SECourses Created: October 13th, 4:01pm Europe/Paris 1x RTX A6000 [Spot] 6 vCPU, 24 GB RAM, 256 GB storage Thank you very much for your help.

Raph wess

looks to me like out of RAM error. are you using ALT config? how much RAM do you see?

Furkan Gözükara

Hi there, I'm not sure what's going on, but when I try to save the first epoch, it says it's saved successfully. However, I get errors afterward, and when I check the file, there's no epoch saved. Do you have any idea what might be causing this? I'm using an A6000 from Massed Compute. Error on terminal:
2024-10-15 15:16:46 INFO train_util.py:5553
INFO saving checkpoint: train_util.py:5554
/home/Ubuntu/Downloads/Training/Clarins_doubleSerum_Dreambooth_v6/model/Qiriness_dreambooth_adzerq-000100.safetensors
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in
    sys.exit(main())
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/Downloads/Training/Clarins_doubleSerum_Dreambooth_v6/model/config_dreambooth-20241015-121718.toml']' died with .
15:17:18-500422 INFO Training has ended.

Raph wess

you are welcome

Furkan Gözükara

Ah, I found it in your article. I must have missed that. Thank you.

SUNG SHU LIN

I understand. I was referring to the results for each epoch. Like this: 256 dataset / each 30 40 50 60 70 epoch. 15 dataset / 25 50 75 100 125 150 170 epoch. Did I fail to find the article? If there are no samples, don't worry about it! I'm planning to test it on a cloud GPU soon too.

SUNG SHU LIN

Already posted. Read the post from top to bottom and look at the grids.

Furkan Gözükara

Is there any plan to post the results of the epoch grid?

SUNG SHU LIN

totally visual testing. it is personal for everyone

Furkan Gözükara

Hey doctor, I have a question. Is the optimal epoch value based on your visual testing results? Or is it a mathematical conclusion? If you could post images for each epoch in a grid, it would be a great help to me. Sometimes collecting too many datasets can be very difficult, so I'm compromising with about 100 images. According to your test, I concluded that running 92 epochs would be optimal.

SUNG SHU LIN

ye it should work way faster. also put your models into fast disk for fast load :D

Furkan Gözükara

Okay i will try

Chris

use SwarmUI it has better optimizations

Furkan Gözükara

When I try to use my full checkpoint 23GB file to create images in forge (haven't switched to swarmui yet), it takes ages to generate 1 image on my 3060 12gb, 32gb system ram, 40gb virtual swap file ram. all the system ram is used, then half of the virtual ram. i load up ae.safetensors, t5xxl and clip-l in forge, then it takes about 10 minutes to start creating an image, and image creation starts with 600s/it :-O do you have an idea what i can try in order to get this better working?

Chris

you need to use 23.8gb base model and need 64 gb RAM - physical RAM

Furkan Gözükara

Okay thanks!

Chris

could be out of RAM. how much RAM you have? ye it doesnt tell error reason

Furkan Gözükara

everything same just change dataset.

Furkan Gözükara

Ok, just tested. The above issue is solved. However, even using the 6GB config results in an out of memory error on a 10GB card. I suspect you actually need more to do the initial load, even if during training it drops back down. As mentioned in my other comment, I got around this issue for LoRA by using the fp8 t5. But this trick doesn't work for DreamBooth as it says "Runtime Error: index_select_cuda not implemented for 'Float8_e4m3fn'".

RedrockVP

Ah, didn't see the other reply, thanks for that 😊 will give it another test when I'm able.

RedrockVP

Also just a follow up here. I found with my 10GB of VRAM, even for Lora training, loading the full fp16 version of t5xxl text encoder doesn't work and I get an OOM error. Lora training works if I load the fp8 t5, but Dreambooth doesn't support using it it seems. Also, with the above error, it got past that issue when I disabled Cache Latents and Cache Latents to disk.

RedrockVP

Hi! Great findings, always love your research! Do you also have a tutorial for stylized full checkpoint training with 15-20 images on massed compute with batch size = 7? What other steps are needed in comparison to realism training? Captioning? Also, good to know that i do not need to go all the way up to epoch 200 with 15-20 images i guess. saves time and money while training.

Chris

My training stopped with this error:
Traceback (most recent call last):
  File "C:\Python3_10_11\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python3_10_11\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python3_10_11\Scripts\accelerate.EXE\__main__.py", line 7, in
  File "C:\Python3_10_11\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Python3_10_11\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\Python3_10_11\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Python3_10_11\\python.exe', 'C:/KohyaFLUX/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/ComfyUI/ComfyUI_windows_portable/ComfyUI/models/unet/config_dreambooth-20241014-165848.toml']' returned non-zero exit status 3221225477.
17:58:28-450140 INFO Training has ended.
No indication of what went wrong. Everything was updated one hour ago.

Dallin Mackay
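
The exit status in the log above is easier to read in hexadecimal. A minimal sketch, assuming the training process ran on Windows (as the paths suggest); `describe_exit_status` is a hypothetical helper and the lookup table is only a small illustrative subset of NTSTATUS codes:

```python
# Decode a Windows process exit status into its NTSTATUS hex form.
# KNOWN_STATUS is an illustrative subset, not an exhaustive table.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(code: int) -> str:
    # Some tools report the same status as a negative signed 32-bit value.
    code &= 0xFFFFFFFF
    name = KNOWN_STATUS.get(code, "unknown status")
    return f"0x{code:08X} ({name})"


print(describe_exit_status(3221225477))
```

An access violation means the process crashed in native code, which would explain why no Python traceback names the cause. On its own it does not say whether the trigger was memory exhaustion, a driver problem, or a library bug.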

Please make a fresh install and use a fresh config from the zip file. Don't forget: load LoRA configs into the LoRA tab and fine tuning configs into the DreamBooth tab, not into the Fine Tuning tab.
scripts\library\train_util.py", line 1909, in __init__
    with open(subset.metadata_file, "rt", encoding="utf-8") as f:
PermissionError: [Errno 13] Permission denied: '/'

Furkan Gözükara

This error was just fixed; they fixed it after I reported it. Please do a fresh install or update to the latest version.

Furkan Gözükara

Yes, this is the newest installation. Older ones will work. It was broken about 4 hours ago.

Furkan Gözükara

Oh, I see. v40 working fine.

SUNG SHU LIN

Kohya broke the script. Please reply here: https://github.com/kohya-ss/sd-scripts/issues/1696

Furkan Gözükara

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 998, in
[rank0]:     train(args)
[rank0]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 193, in train
[rank0]:     train_dataset_group.new_cache_latents(ae, accelerator.is_main_process)
[rank0]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/train_util.py", line 2467, in new_cache_latents
[rank0]:     dataset.new_cache_latents(model, accelerator)
[rank0]:   File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/train_util.py", line 1066, in new_cache_latents
[rank0]:     num_processes = accelerator.num_processes
[rank0]: AttributeError: 'bool' object has no attribute 'num_processes'
I am getting the same error in the new version, on both the local version and Massed Compute.

SUNG SHU LIN

It was broken by Kohya. Please reply to this post: https://github.com/kohya-ss/sd-scripts/issues/1696

Furkan Gözükara

It gets checked only when you pick your model file path, as shown in the LoRA tutorial.

Furkan Gözükara

Testing this. bmaltais could have broken it.

Furkan Gözükara

Yes, it is. Maybe bmaltais broke it, let me test.

Furkan Gözükara

thanks a lot

Furkan Gözükara

You need to use the DreamBooth tab and also use our Massed Compute FLUX updater script. It shows this error: AttributeError: 'bool' object has no attribute 'num_processes'

Furkan Gözükara

A 48 GB GPU can fit a maximum batch size of 7. I also wanted to thoroughly test its impact, because you need multiple GPUs for a true speed-up. So now we know it all :D

Furkan Gözükara

Thanks for all the tests! These are super valuable experiments. I'm curious though.. what made you decide to compare a batch size of 7? Instead of 2,3,4,5 etc... Thinking I could still get a speed boost with a batch size >1 but less than 7 (since you mentioned the larger batch sizes suffer some quality issues)

Goldwaters

When using Batch_Size_7_48GB_GPU_46250MB_28.9_second_it_Tier_1 I am constantly getting an error with your settings. What am I doing wrong??
INFO Loaded AE:
INFO [Dataset 0] train_util.py:2466
INFO caching latents with caching strategy. train_util.py:1039
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 998, in
    train(args)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 193, in train
    train_dataset_group.new_cache_latents(ae, accelerator.is_main_process)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/train_util.py", line 2467, in new_cache_latents
    dataset.new_cache_latents(model, accelerator)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/train_util.py", line 1066, in new_cache_latents
    num_processes = accelerator.num_processes
AttributeError: 'bool' object has no attribute 'num_processes'
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in
    sys.exit(main())
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/Ubuntu/apps/kohya_ss/venv/bin/python', '/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py', '--config_file', '/home/Ubuntu/Desktop/model/config_dreambooth-20241014-183322.toml']' returned non-zero exit status 1.
18:33:33-396950 INFO Training has ended.

Daan Tilburg

god... so good info!

SUNG SHU LIN

Hi there, have been able to successfully train Lora with your configs on a 10GB 3080, so thanks for that :) Having an issue with the Dreambooth training though.. Getting this error, but not sure why. I did run through the steps to manually configure Accelerate during the install of Kohya: num_processes = accelerator.num_processes AttributeError: 'bool' object has no attribute 'num_processes' This seems really weird to me since this is supposed to be an integer value?

RedrockVP
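
On why a bool shows up here: in the tracebacks posted in this thread, the caller passes `accelerator.is_main_process` (a bool) into the position where the callee expects the accelerator object itself, so the failing lookup is on the argument, not on `num_processes`. A minimal sketch that reproduces the same message; `FakeAccelerator` and this `new_cache_latents` are hypothetical stand-ins, not the real sd-scripts code:

```python
class FakeAccelerator:
    """Hypothetical stand-in exposing the two attributes seen in the traceback."""
    num_processes = 1
    is_main_process = True


def new_cache_latents(model, accelerator):
    # The callee expects the accelerator object, not one of its attributes.
    return accelerator.num_processes


acc = FakeAccelerator()
print(new_cache_latents(None, acc))  # works: the object is passed

try:
    # Mirrors the call in the traceback: a bool is passed instead of the object.
    new_cache_latents(None, acc.is_main_process)
except AttributeError as err:
    message = str(err)
    print(message)
```

So the integer `num_processes` is never reached. The mismatch is between two call sites in the training scripts, which is consistent with the replies saying the fix had to come from upstream rather than from the user's Accelerate configuration.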

Just FYI, 24GB_GPU_23150MB_10.2_second_it_Tier_1.json didn't have "Flux.1" checked. I don't know if this was intentional. Folder = DreamBooth_Tab_Fine_Tuning_Best_FLUX_Configs

N V

I suggest making all of them 1024x1024. Otherwise you need to enable bucketing. If you do, compare it with 1024x1024 training.

Furkan Gözükara

For dataset preparation; do I need to set them all for 1024x1024 ? As I have resized most to 1024x1365 -- does it matter if I crop?

Matei
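
If you crop instead of enabling bucketing, the crop box for a 1024x1365 image can be worked out as below. A minimal sketch; `center_crop_box` is a hypothetical helper, and center cropping is only one possible choice (it may cut off part of the subject, so check each image):

```python
def center_crop_box(width, height, target_w=1024, target_h=1024):
    """Return a (left, upper, right, lower) box for a centered crop."""
    if width < target_w or height < target_h:
        raise ValueError("image is smaller than the target size")
    left = (width - target_w) // 2
    top = (height - target_h) // 2
    return (left, top, left + target_w, top + target_h)


print(center_crop_box(1024, 1365))  # (0, 170, 1024, 1194)
```

The tuple follows the (left, upper, right, lower) convention used by Pillow's `Image.crop`, so it can be passed to that method directly.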

Yes, you did something wrong. None of my configs generates such a low-dimension LoRA.

Furkan Gözükara

I have sorted it out by following your tutorial and using SwarmUI now. After training on 4x GPUs with 11 images, I have ended up with 8 files that are around 153MB each (I have seen in your video that you ended up with much larger files, 2.4GB). Likeness is not satisfactory either; it barely looks like the person I've trained. Have I done something wrong or could it be my dataset?

Matei

I use SwarmUI, so I don't know how to do it in ComfyUI.

Furkan Gözükara

Thanks a lot! Forgot to change the tab :D I am still getting "ComfyUI execution error: Model face_yolov9c.pt not found, or yolov8 folder path not defined" when running your prompts? How do I get the Face Yolo thing?

Matei

This is full checkpoint training. We train the entire model and we get 23.8 GB checkpoints.

Furkan Gözükara

Thanks for all the hard work you put into helping us all. I think I misunderstood the post title though: this is not the full checkpoint training tutorial yet, and that one is still on its way?

Lee

thanks

Furkan Gözükara

Yes, it brings a significant speed-up. Also disable fused backward pass to get better speed.

Furkan Gözükara

If there is enough VRAM available, does disabling gradient checkpointing bring any speed improvement?

San Milano

Great News. Can't wait for tutorial for this research and Fine-Tuning / DreamBooth tutorial

Arkadiusz D

this happens when you load rank 3 lora into dreambooth tab.

Furkan Gözükara

Sometimes on MassedCompute I am getting [rank3]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 3 has a total capacity of 47.44 GiB of which 53.75 MiB is free. Including non-PyTorch memory, this process has 47.36 GiB memory in use. Of the allocated memory 46.50 GiB is allocated by PyTorch, and 440.12 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Matei
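
The `PYTORCH_CUDA_ALLOC_CONF` hint from the message above has to be in the environment before the training process starts. A minimal sketch of one way to do that from Python; `training_env` is a hypothetical helper, and how you actually launch the trainer (GUI, shell script, `subprocess`) is up to your setup:

```python
import os


def training_env(base=None):
    """Return a copy of the environment with the allocator hint set.

    An existing PYTORCH_CUDA_ALLOC_CONF value is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return env


env = training_env()
print(env["PYTORCH_CUDA_ALLOC_CONF"])
# Pass `env=env` to subprocess.run(...) when starting the trainer,
# or export the variable in the shell that launches the GUI.
```

Whether it helps here is uncertain: the message reports only 440.12 MiB reserved but unallocated, with 46.50 GiB actually allocated, so the GPU may simply be full rather than fragmented.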

Nope, use the DreamBooth tab.

Furkan Gözükara

Does the Finetune tab work in Kohya?
Traceback (most recent call last):
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\flux_train.py", line 905, in
    train(args)
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\flux_train.py", line 127, in train
    train_dataset_group = config_util.generate_dataset_group_by_blueprint(blueprint.dataset_group)
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\library\config_util.py", line 485, in generate_dataset_group_by_blueprint
    dataset = dataset_klass(subsets=subsets, **asdict(dataset_blueprint.params))
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\sd-scripts\library\train_util.py", line 1909, in __init__
    with open(subset.metadata_file, "rt", encoding="utf-8") as f:
PermissionError: [Errno 13] Permission denied: '/'
Traceback (most recent call last):
  File "C:\Users\Eug\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Eug\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\Kohya_Flux\Kohya_GUI_Flux_Installer_21\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Kohya_Flux\\Kohya_GUI_Flux_Installer_21\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/Kohya_Flux/Kohya_GUI_Flux_Installer_21/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/Kohya_Flux/Kohya_GUI_Flux_Installer_21/kohya_ss/models/Dream001\\model/config_finetune-20241013-233519.toml']' returned non-zero exit status 1.

Steve

Fixed, thanks for letting me know. If such a thing happens, always look at the attachments.

Furkan Gözükara

I have never used a saved training state; instead I would use a checkpoint. So for the next training, use the checkpoint and it should work. I bet Kohya didn't test training state :D

Furkan Gözükara

Click the link below the post, that one works

Lukas Kuhn

Best_Configs_v5.zip gives an Expired URL error EDIT: Link below the post works!

Lukas Kuhn

same for me EDIT: link below the post works

Lukas Kuhn

The link is invalid for Best_Configs_v5.zip

Daniel Limia Aspas

Best_Configs_v5.zip link is invalid, Pls update it, Thanks.

Huang Howard

I've found the cause of the problem: TensorFlow requires AVX instructions on the CPU, which I don't have.

Jan Zhor

INFO resume training from local state: train_util.py:4362 D:/Lora/LoraTrain/test/1/model/test-000030-state
INFO Loading states from accelerator.py:3085 D:/Lora/LoraTrain/test/1/model/test-000030-state
2024-10-09 11:05:30 INFO All model weights loaded successfully checkpointing.py:214
D:\Lora\Kohya_GUI_Flux_Installer_v40\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:220: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  optimizer_state = torch.load(input_optimizer_file, map_location=map_location)
2024-10-09 11:05:31 INFO All optimizer states loaded successfully checkpointing.py:222
D:\Lora\Kohya_GUI_Flux_Installer_v40\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:228: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  scheduler.load_state_dict(torch.load(input_scheduler_file))
INFO All scheduler states loaded successfully checkpointing.py:229
INFO All dataloader sampler states loaded successfully checkpointing.py:241
D:\Lora\Kohya_GUI_Flux_Installer_v40\kohya_ss\venv\lib\site-packages\accelerate\checkpointing.py:251: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  states = torch.load(input_dir.joinpath(f"{RNG_STATE_NAME}_{process_index}.pkl"))
INFO All random states loaded successfully checkpointing.py:262
INFO Loading in 0 custom states accelerator.py:3170
running training / 学習開始
num examples / サンプル数: 34
num batches per epoch / 1epochのバッチ数: 34
num epochs / epoch数: 200
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 6800
steps: 0%|| 0/6800 [00:00 ?, ?it/s]
epoch 1/200
2024-10-09 11:05:32 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:701
D:\Lora\Kohya_GUI_Flux_Installer_v40\kohya_ss\venv\lib\site-packages\torch\autograd\graph.py:825: UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides(), attempting to materialize a grad_output with matching strides... (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\cudnn\MHA.cpp:676.)
  return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
steps: 0%| | 9/6800 [02:09<27:03:57, 14.35s/it, avr_loss=0.33]

Vlad

Does anyone have problems with resuming training from a saved training state? I resumed from saved training state 30, but the training starts from the beginning.

Vlad

It should be faster than 10 s/it. Also, you probably set repeat to something other than 1. Repeat has to be 1.

Furkan Gözükara

I think fp8 won't work, or it will upcast back into fp16. You can try :D Although you can convert your checkpoints into fp8 later.

Furkan Gözükara

I have a 4090 with 24 GB VRAM and 96 GB RAM. Step speed was around 10 s/it.

Mario Santiago

What if the base model itself was an fp8 model? wouldn't that save vram? If one could train using fp8 t5 and fp8 unet then that is much savings in vram. This is what flux gym does for loras (kohya backend). I saw image comparisons of fp16 and fp8 dev image outputs they were nearly identical.

Mario Santiago

I see. Looks like this method can't use virtual RAM. I see that Python uses 35 GB RAM with the 12 GB config.

Furkan Gözükara

Yes, exactly

Tranquil

Well, sadly I can't test since I have 64 GB. But if you had a 24 GB GPU, I think 32 GB RAM would work. So even though you have virtual RAM set, it didn't work, right?

Furkan Gözükara

Thanks for your answer, maybe time to wait for the 5090 and a new pc then :D

Chris

I'm not sure if Fine Tuning requires 64GB of RAM. Due to my limited knowledge, I think we might need to add some information above to clarify this and avoid wasting time on unnecessary attempts. Btw Thank you for your help.

Tranquil

I see. T5 won't make a difference because we cache the text encoders, since we don't train them. So your only option is either getting a 24 GB GPU or upgrading to 64 GB RAM. Also, getting more RAM is always helpful.

Furkan Gözükara

No, upgrade your system RAM from 32 GB to 64 GB by buying new RAM sticks.

Furkan Gözükara

I asked this, and currently Kohya doesn't know how to train in FP8.

Furkan Gözükara

Sorry, I don't understand your comment. Are you saying I have to increase my virtual RAM to 64GB, or do you mean I need to upgrade my hardware?

Tranquil

Dr., have you done any studies on fine tuning using either fp8 Kohya settings or an fp8 model? I don't want to use shared GPU memory, as that will take over a year to train. 23 GB VRAM or less preferred. Even slightly lower quality is better if the speed is faster and it uses fewer resources.

Mario Santiago

Unfortunately, no combination of config file and virtual RAM pagefile works. My system fills up the native RAM up to the whole 32GB, then tries to fill the Windows virtual memory swap file (64GB swap file) a little bit, and then crashes with a CUDA out of memory error. Do you think there are more optimizations available in order to fit the training in 32GB system RAM with a 3060 12GB VRAM card? Tried the 12GB, 10GB and 6GB config files. Maybe load a smaller T5xxl file, like a t5xxl_fp8 model file (I thought I saw this somewhere). Maybe RAM can be saved somewhere?

Chris

Yes, something is off. What is your GPU and what is your step speed?

Furkan Gözükara

awesome

Furkan Gözükara

It needs 35 GB RAM for the 12 GB config, so the 8 GB config probably needs more. I think you have to upgrade to 64 GB RAM.

Furkan Gözükara
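
To check installed physical RAM against the figure mentioned above before starting a run, something like the following can be used. A minimal sketch using only the standard library; both function names are hypothetical, the 35 GB default is just the usage reported in this thread for the 12 GB config (not an official requirement), and the non-Windows branch assumes a POSIX system that exposes `SC_PHYS_PAGES`:

```python
import ctypes
import os
import sys


def total_physical_ram_gib() -> float:
    """Return installed physical RAM in GiB (page file / swap not included)."""
    if sys.platform == "win32":
        class MEMORYSTATUSEX(ctypes.Structure):
            _fields_ = [
                ("dwLength", ctypes.c_ulong),
                ("dwMemoryLoad", ctypes.c_ulong),
                ("ullTotalPhys", ctypes.c_ulonglong),
                ("ullAvailPhys", ctypes.c_ulonglong),
                ("ullTotalPageFile", ctypes.c_ulonglong),
                ("ullAvailPageFile", ctypes.c_ulonglong),
                ("ullTotalVirtual", ctypes.c_ulonglong),
                ("ullAvailVirtual", ctypes.c_ulonglong),
                ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
            ]

        status = MEMORYSTATUSEX()
        status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
        if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
            raise OSError("GlobalMemoryStatusEx failed")
        total_bytes = status.ullTotalPhys
    else:
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return total_bytes / 1024**3


def enough_ram(total_gib: float, required_gib: float = 35.0) -> bool:
    # required_gib defaults to the usage reported in this thread.
    return total_gib >= required_gib


ram = total_physical_ram_gib()
print(f"{ram:.1f} GiB physical RAM, enough: {enough_ram(ram)}")
```

Only physical RAM is counted, which matches the reports in this thread that enlarging the Windows page file did not prevent the out-of-memory error.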

I have 32GB of RAM, and I tried setting the virtual memory to 49152MB for the initial size and 65536MB for the maximum size, but the problem still persists. I just tried again, and I was able to run the LoRA training normally, but I still can't run fine-tuning with your 8GB configuration

Tranquil

solved by moving to massed compute. works perfectly. thanks a lot

Jan Zhor

Library error for some reason. Also, this looks incorrect: C:/!/ludwig/ I would reinstall everything as shown in this video: https://youtu.be/DrhUHnYfwC0 If you can't solve it, I can connect to your PC and help if you upgrade to gold member.

Furkan Gözükara

You need to enable shared VRAM. Also, how much RAM do you have?

Furkan Gözükara

great work and strategy.

Furkan Gözükara

I've gotten similar errors. Try asking ChatGPT how to fix the first error shown, do what it says, then try again, and keep having ChatGPT troubleshoot. It worked for me.

Freethink

It says [4:58:15<1798:59:51, that's a long time. Was there something wrong with the config?

Mario Santiago

Hello, I'm getting this error when I start training (FLUX Fine Tuning). I only have one GPU with 16 GB VRAM. I've tried both the 12 GB and 16 GB settings, with torch 2.4.1 and 2.5, but all attempts have failed with the same error. Here is my full log if you want details (https://www.mediafire.com/file/kn4ad962qkgl5cc/bug.txt/file)
steps: 0%| | 0/3600 [00:00
Traceback (most recent call last):
  File "C:\portablefiles2\SE_Kohya\kohya_ss\sd-scripts\flux_train.py", line 994, in
    train(args)
....
  File "C:\portablefiles2\SE_Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
.....
  File "C:\portablefiles2\SE_Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
  File "C:\portablefiles2\SE_Kohya\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Tranquil

I did a fresh install, loaded the new best configs and I'm getting this error. What to do? I have CUDA 12.4 as default, installed cuDNN from the Kohya menu and manually configured accelerate as shown in your troubleshooting video.
c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\outputs.py:63: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.
  torch.utils._pytree._register_pytree_node(
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1603, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Python3_10_11\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\models\clip\image_processing_clip.py", line 21, in
    from ...image_processing_utils import BaseImageProcessor, BatchFeature, get_size_dict
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\image_processing_utils.py", line 21, in
    from .image_transforms import center_crop, normalize, rescale
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\image_transforms.py", line 49, in
    import tensorflow as tf
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\__init__.py", line 38, in
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow # pylint: disable=unused-import
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 85, in
    raise ImportError(
ImportError: Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 710, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "C:\Python3_10_11\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1050, in _gcd_import
  File "", line 1027, in _find_and_load
  File "", line 1006, in _find_and_load_unlocked
  File "", line 688, in _load_unlocked
  File "", line 883, in exec_module
  File "", line 241, in _call_with_frames_removed
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py", line 20, in
    from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
  File "", line 1075, in _handle_fromlist
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1594, in __getattr__
    value = getattr(module, name)
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1593, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\transformers\utils\import_utils.py", line 1605, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\sd-scripts\flux_train.py", line 31, in
    from library import deepspeed_utils, flux_train_utils, flux_utils, strategy_base, strategy_flux
  File "c:\Kohya_v40\kohya_ss\sd-scripts\library\flux_train_utils.py", line 17, in
    from library import flux_models, flux_utils, strategy_base, train_util
  File "c:\Kohya_v40\kohya_ss\sd-scripts\library\train_util.py", line 53, in
    from diffusers import (
  File "", line 1075, in _handle_fromlist
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__
    value = getattr(module, name)
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 701, in __getattr__
    value = getattr(module, name)
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 700, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\diffusers\utils\import_utils.py", line 712, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion because of the following error (look up to see its traceback):
Failed to import transformers.models.clip.image_processing_clip because of the following error (look up to see its traceback):
Traceback (most recent call last):
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 70, in
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: A dynamic link library (DLL) initialization routine failed.
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.
Traceback (most recent call last):
  File "C:\Python3_10_11\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python3_10_11\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\Kohya_v40\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "c:\Kohya_v40\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['c:\\Kohya_v40\\kohya_ss\\venv\\Scripts\\python.exe', 'c:/Kohya_v40/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/!/ludwig/ready\\model/config_dreambooth-20241008-111926.toml']' returned non-zero exit status 1.

Jan Zhor

For fine tuning, a minimum of 28.5 GB is necessary, so shared VRAM usage is totally normal. 8 seconds/it is not very bad.

Furkan Gözükara

I download these and might later forget what each one is for. If your naming scheme could include what the config is for (LoRA, DreamBooth/fine tune), that would be great. I am trying the 24 GB config and noticed that 10 GB of shared GPU VRAM is used, bringing the speed to 8 seconds per iteration. Is there no way to get this transferred to RAM for faster speed? I have 96 GB RAM and 24 GB VRAM.

Mario Santiago

please let us know

Furkan Gözükara

Save as fp16 and it will be 24 GB.

Furkan Gözükara
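
The sizes reported in this thread (about 24 GB when saved as fp16, 47 GB otherwise) are consistent with the same weights stored at 2 versus 4 bytes per parameter. A rough sketch of that arithmetic, assuming the file is dominated by model weights and taking the 23.8 GB fp16 checkpoint size quoted in this thread as the starting point; `estimate_size_gb` is a hypothetical helper:

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def estimate_size_gb(fp16_size_gb: float, target_dtype: str) -> float:
    """Scale a known fp16 checkpoint size to another save precision."""
    params = fp16_size_gb * 1e9 / BYTES_PER_PARAM["fp16"]
    return params * BYTES_PER_PARAM[target_dtype] / 1e9


for dtype in ("fp32", "fp16", "fp8"):
    print(dtype, round(estimate_size_gb(23.8, dtype), 1))
```

A 47 GB file therefore suggests the checkpoint was written in fp32, so the save precision setting is the first thing to check.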

for loras we have another post here : https://www.patreon.com/posts/110879657

Furkan Gözükara

Will try with 10GB Config File and set Virtual Memory even higher than 64GB. Maybe that solves it. Will report then

Chris

These best configs are just for fine tuning or for loras as well?

Mario Santiago

Got everything working but with the v5 config settings for 24GB (4090), my checkpoints are coming in at 47GB. Any idea what I could've missed?

Anderson Deestovel

awesome

Furkan Gözükara

It uses a lot of shared VRAM. I am not sure if the culprit is you having only 32 GB RAM, because it uses 35 GB RAM for the RTX 3060 on my PC.

Furkan Gözükara

I am not sure if you and I had the same issue. I ran across the problem where my VRAM and shared gpu ram was being used. I assumed things were not working as a result of seeing shared gpu ram being used and was prematurely shutting down. After letting it run into a few iterations, I was getting about 23s/it which is good enough for me as these configs work pretty well and produce great results!

Patrick Major

I got it running. Thanks!

JamZam WamBam

I just deleted the Kohya folder and installed fresh from your latest installer. Did the step 1 install, step 2 and then step 2.5. I used your 16GB Kohya DreamBooth config (from the best configs v5) in the Kohya DreamBooth tab and only updated the model locations, image path and model path, and lowered the epochs to 125 (as I have 25 images). I left everything else the same. It is using 34GB of RAM. Ever since this Kohya update the memory usage has been brutal for me no matter what I do. EDIT: I think it might just be that as the model starts up, it takes up huge memory and then "calms down". After the model loads and things are going, I see the following memory stats: dedicated GPU: 14GB / 16GB; shared GPU: 20GB; GPU Memory: 32.6GB (which is dedicated and shared combined). I only have 16GB in total. The GOOD NEWS IS: even though I can see shared memory is being used, which usually means doomsday, 20 days of training time, I am seeing 23.95s/it, which is only 20.5 hours of training time for 3,125 STEPS. I prematurely thought it wasn't working due to the shared memory usage, but for now all seems good. I'll report back later to confirm it is working, but it is strange behaviour. If anyone else sees this, just sit back and wait until the first couple of steps have started; at that point, even though memory is showing a bit weird, the seconds per iteration metric is not too shabby.
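The arithmetic behind those figures can be sketched as follows, assuming batch size 1 and one repeat per image (neither is stated in the comment). With 25 images, 125 epochs and 23.95 s/it it gives 3,125 steps and roughly 20.8 hours, in line with the estimate quoted above.

```python
def total_steps(num_images: int, epochs: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Steps for a single-GPU run: every image is seen `repeats` times per epoch."""
    return num_images * repeats * epochs // batch_size


def training_hours(steps: int, seconds_per_it: float) -> float:
    """Wall-clock estimate from the steady-state seconds per iteration."""
    return steps * seconds_per_it / 3600


if __name__ == "__main__":
    steps = total_steps(num_images=25, epochs=125)
    print(steps, "steps,", round(training_hours(steps, 23.95), 1), "hours")
```

The estimate only holds once the speed has settled; the first few iterations are typically slower while the model loads.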

Patrick Major

Hi there, I tried the new 12 GB config file for full training. Even when my VRAM is "empty", I set the computer's virtual RAM in the Windows system settings to 64GB (I have 32GB RAM natively), and I enable the shared VRAM fallback to system memory in the NVIDIA system settings, I still get the CUDA out of memory error. I have a 3060 12GB card. Do you know what else I can try?

Chris

Yes, this one: Windows_Install_Torch_2_5_Dev_Huge_Speed_Up.bat. Also use our start file; it adds --noverify

Furkan Gözükara

Does your installer install the correct Torch 2.5.0? I was having trouble with this; it kept installing 2.6.0 (or not finding the right version).

BecauseReasons

Don't enable bucketing; make all images 1024 and let me know. Edit all your images with Paint.NET and make sure they are all saved as 1024 px PNGs.

Furkan Gözükara

ERROR Invalid cmdline parsed arguments. This should be a bug. / config_util.py:381
コマンドラインのパース結果が正しくないようです。プログラムのバグの可能性が高いです。
Traceback (most recent call last):
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\sd-scripts\flux_train.py", line 994, in
    train(args)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\sd-scripts\flux_train.py", line 128, in train
    blueprint = blueprint_generator.generate(user_config, args)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\sd-scripts\library\config_util.py", line 406, in generate
    sanitized_argparse_namespace = self.sanitizer.sanitize_argparse_namespace(argparse_namespace)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\sd-scripts\library\config_util.py", line 378, in sanitize_argparse_namespace
    return self.argparse_config_validator(argparse_namespace)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\voluptuous\schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\voluptuous\schema_builder.py", line 465, in validate_object
    out = base_validate(path, iterable, {})
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\voluptuous\schema_builder.py", line 433, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected int for object value @ data['max_bucket_reso']
Traceback (most recent call last):
  File "C:\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
    args.func(args)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
    simple_launcher(args)
  File "C:\Users\James\Documents\Kohya-Flux-v40\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\James\\Documents\\Kohya-Flux-v40\\kohya_ss\\venv\\Scripts\\python.exe', 'C:/Users/James/Documents/Kohya-Flux-v40/kohya_ss/sd-scripts/flux_train.py', '--config_file', 'C:/SD/Training/KohyaAbbTrainingOutput/config_dreambooth-20241007-145921.toml']' returned non-zero exit status 1.
14:59:29-420848 INFO Training has ended.
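The validator message in this log says that max_bucket_reso must be an integer. A minimal sketch of a check to run before launching, assuming the generated config has already been loaded into a plain dict: the helper name, the second field (min_bucket_reso) and the sample values are illustrative assumptions, not taken from the log.

```python
# Fields expected to hold integers. Only max_bucket_reso appears in the log;
# min_bucket_reso is added here as an assumption for illustration.
INT_FIELDS = ("max_bucket_reso", "min_bucket_reso")


def non_integer_fields(config: dict, fields=INT_FIELDS) -> dict:
    """Return {field: value} for every listed field whose value is not an int."""
    bad = {}
    for name in fields:
        if name in config:
            value = config[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                bad[name] = value
    return bad


if __name__ == "__main__":
    sample = {"max_bucket_reso": 2048.0, "min_bucket_reso": 256}  # hypothetical values
    print(non_integer_fields(sample))
```

A float such as 2048.0 or a string such as "2048" would both be flagged; rewriting the value as a plain integer in the config is the kind of fix the message points to.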

JamZam WamBam

Hi Dr Furkan. I keep getting an Error

JamZam WamBam

yep it is really great tutorial

Furkan Gözükara

thanks i will watch it all !

Neil Rhodes

https://youtu.be/nySGu12Y05k at 19:00: "Instance and class prompts and their importance in training" (the timestamp may not be exact)

Furkan Gözükara

Thanks. I now have to learn what what you just said means. I don't know what "find a class prompt" or "train with ohwx class" means, or how to do that. Which video of yours do you recommend I watch to get some understanding? Thanks

Neil Rhodes

If you want to train a style, find a class prompt that defines your style, prepare a consistent dataset and train with ohwx class_prompt. It is the same as LoRA training, but this will just produce way better quality results with less overfitting.

Furkan Gözükara

This is exciting. I'm still trying to understand all the lingo being thrown in. What steps would I need to take to make FLUX understand a particular type of photo better, say a type of landscape or architecture, or even a specific window type? Would I need a set of images, caption them and train? What is the fine tune? And now this can be done locally with a 12GB card, awesome. I just need a more basic explanation so I can figure out what I need to try for my needs.

Neil Rhodes

If you can make a video of what you are doing, I can pinpoint the error, or if you upgrade to gold member I can connect to your PC and check it out.

Furkan Gözükara

Please make a fresh install of Kohya, get a fresh config and load it into the DreamBooth tab; it should work. Yesterday I trained on an RTX 3060 12 GB :)

Furkan Gözükara

It is for DreamBooth; the rest is the same as LoRA. If you want bucketing, sure, enable it and it should work.

Furkan Gözükara

Hi Furkan, one question: are these settings only for DreamBooth, not for LoRA? When I use these settings for DreamBooth, must I crop the images to 1024x1024, or can I use Enable buckets like in LoRA?

puk

I was unable to get my usage below 500 MB, and I did try the lowest config of quality 1; I still had the same problem. I will maybe try installing everything fresh; if that does not work I will just use Massed Compute.

Luno Lux

Thanks for answering!

Luno Lux

I tried the above and also downloaded the best configs v5 you have just posted, specifically the "16GB_GPU_15150MB_13.8_second_it_Tier_1.json" config. I am still finding it goes up to 24GB, though. I was so excited; I was checking the params and was thrilled that you somehow got 1024 working within 16GB. I also tried an experiment where I set the blocks to swap up to 32 when the memory was overrunning so badly (as the kohya_ss UX says 32 is recommended). I thought for sure it would drive the memory usage down, but no, still 24GB. I also lowered the image resolution from 1024,1024 to 896,896 with no success in lowering the VRAM usage. It's really strange, because it was literally working for me last week.

Patrick Major

Follow my other replies. Now even 8 GB can train :D

Furkan Gözükara

Meanwhile, what you need to do is set Single Blocks to swap (deprecated) = 0, Double Blocks to swap (deprecated) = 0 and Blocks to swap = 12. As you increase Blocks to swap, it will use less VRAM.
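Those three settings can be sketched as a small patch on a loaded config. The key names below are assumptions that mirror the GUI labels (the real JSON keys may differ), and apply_block_swap is a hypothetical helper, not part of Kohya.

```python
def apply_block_swap(config: dict, blocks_to_swap: int = 12) -> dict:
    """Return a copy of `config` with the block-swap settings described above.

    ASSUMPTION: the key names mirror the GUI labels; check them against your
    own config file before relying on this.
    """
    updated = dict(config)
    updated["single_blocks_to_swap"] = 0  # deprecated option, left at 0
    updated["double_blocks_to_swap"] = 0  # deprecated option, left at 0
    updated["blocks_to_swap"] = blocks_to_swap  # higher value = less VRAM, slower
    return updated


if __name__ == "__main__":
    old = {"single_blocks_to_swap": 16, "double_blocks_to_swap": 8}  # hypothetical values
    print(apply_block_swap(old))
```

Returning a copy rather than editing in place keeps the original config available for comparison if the new values do not help.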

Furkan Gözükara

OK, I found the reason. I have to update the configs according to the newest version. Will do hopefully today.

Furkan Gözükara

Patrick Major

I will test this and report back; maybe Kohya is broken. Will do a fresh install. My defaults are CUDA 12.4 and cuDNN 8.9.7.

Furkan Gözükara

Something seems to have changed. I installed fresh from your "Kohya_GUI_Flux_Installer_v39" and am using the "Quality_2_15100MB_30_Second_IT" from your best fine tuning configs v4. This was working *very* well. I am not sure if it was the update to the kohya installer v39, but now the training takes upwards of 23 GB, whereas it was working previously with my 16GB GPU for many trainings. I noticed you have your installer v27 associated with the Flux fine tunings. Has anyone else noticed that the 15GB training now causes 20+GB of memory to be used? If so, has anyone resolved this yet?

Patrick Major

Hello. Such seconds per iteration happen when you use shared VRAM. Which config file did you try? How much VRAM are you using before starting training?

Furkan Gözükara

I found this in the training cmd: ..\kohya_ss\venv\lib\site-packages\torch\utils\checkpoint.py:1399: FutureWarning: `torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead. with device_autocast_ctx, torch.cpu.amp.autocast(**cpu_autocast_kwargs), recompute_context: # type: ignore[attr-defined] I tried updating torch to 2.5 but still have the same problem. Also it says this: epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:701

Luno Lux

Hey Furkan, for some reason I get a speed of 400-800 seconds per iteration when training. I have a 4090 and when I check the performance it is running at 100%, so the training is not running on CPU. I have followed your essential AI video and installed Kohya with your installers for FLUX. The configuration is loaded into DreamBooth. The training starts but it is super slow. Do you have any idea why this could be? THANK YOU FOR YOUR HARD WORK!!!

Luno Lux

it is not supported yet and i dont think these distilled models are ready yet - not good enough

Furkan Gözükara

This has been working great on my 4090. Wondering if you've tried a full finetune on the new undistilled version of flux dev that came out yesterday? https://huggingface.co/nyanko7/flux-dev-de-distill -- this should solve a lot of the problems I'm seeing in my full finetunes. Trying to do a full finetune in Kohya, but getting "NotImplementedError: Cannot copy out of meta tensor; no data!" No problem finetuning base dev Flux on my 4090, but getting the same error as another user described here: https://github.com/kohya-ss/sd-scripts/issues/1665 Any ideas, or is support simply not implemented in Kohya yet?

John James

Yes, I am open to consultations. I am going to message you now - best way to learn :)

Furkan Gözükara

Hello! I tried your config for the 4x GPUs and it worked fantastically on Massed Compute. I was wondering if the same config could be used for 8x GPUs. Besides that, would you mind explaining to me how training works with multiple GPUs? Would it make sense to increase the batch size to the number of GPUs being used? Also, are you open to consultations? If yes, could you send me a DM so we can discuss that? Thanks in advance for your answer!

San Milano

thanks for heads up.

Furkan Gözükara

Sorry for the confusion: it is not flowchart, it is flowmatch, and it is not an optimizer, it is a scheduler.

Anduvo

thanks

Furkan Gözükara

OK great, thanks for your answer. I will try to mimic multi-res training in Kohya. Yes, ai-toolkit is using an optimizer that I have never seen before, which is flowchart. Unfortunately I don't see it in Kohya. I also suppose it is not as memory friendly as Adafactor, but quality-wise it is the best I have seen. Please give ai-toolkit a try; it doesn't have a GUI, but paradoxically I find it much easier to edit the config YAML than Kohya's labyrinth of a UI. Not all available parameters are listed in the example YAML. Also, I don't think it can do a full fine tune yet, only LoRA. I will keep you posted with my test comparisons.

Anduvo

ai-toolkit trains at lower resolutions; that is why. You can mimic it by having folders for 512, 768 and 1024 and setting the same repeat count for each. It also might not have some things that slow training down, like the T5 attention mask. I have never trained with ai-toolkit, but some new optimizers may have better quality. I will hopefully test them.

Furkan Gözükara

Hi Furkan, 10000 steps with ai-toolkit LoRA multi-res 1024x1024, 768x768, 512x512 takes around 8h, and a Kohya fine tune at 1024x1024 for 10000 steps is taking 30h. I know I am comparing LoRA to fine tune, but is it normal that it takes more than 3 times longer? Also, in first results, a rank 640 LoRA extracted from the 5000-step fine tune has much worse detail quality than the ai-toolkit rank 64 LoRA at 10000 steps. I know that is double the steps, which is why I am doing another training to try to make a fair comparison, but I expected much finer details already from the rank 640 LoRA. The dataset is 288 images of a car. Any insights on the quality and timings?

Anduvo

I asked Kohya this and he didn't give me any answer either. I think people are using ComfyUI nodes to convert and save. I will also ask bmaltais if he can add it.

Furkan Gözükara

Hi I understand that you cannot train FP8, but could you let us know once the training is done how could I convert the FP16 into an FP8? I tried to find something on the net without any success. Thanks

Frederic Collin

Hi. Quality 1 means they are all equal quality; only VRAM usage and speed change. For a 24 GB GPU use Quality_1_23100MB_14_12_Second_IT.json; if you get out of memory, use Quality_1_22460MB_15_12_Second_IT.json. Hopefully I will update the configs; still doing a lot of experiments.
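The naming convention can be read mechanically. The sketch below assumes names follow Quality_<rank>_<VRAM>MB_<seconds>_Second_IT and that "14_12" means 14.12 seconds per iteration; both are inferences from the file names quoted in this thread, and the helper names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# ASSUMPTION: Quality_<rank>_<VRAM>MB_<sec>[_<fraction>]_Second_IT[.json],
# with "14_12" read as 14.12 seconds per iteration.
_PATTERN = re.compile(
    r"Quality_(?P<rank>\d+)_(?P<vram>\d+)MB_(?P<sec>\d+)(?:_(?P<frac>\d+))?_Second_IT(?:\.json)?"
)


@dataclass(frozen=True)
class ConfigInfo:
    name: str
    quality_rank: int
    vram_mb: int
    seconds_per_it: float


def parse_config_name(name: str) -> Optional[ConfigInfo]:
    """Parse one config file name; return None if it does not match the pattern."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        return None
    seconds = float(f"{match['sec']}.{match['frac'] or 0}")
    return ConfigInfo(name, int(match["rank"]), int(match["vram"]), seconds)


def pick_config(names: Iterable[str], free_vram_mb: int) -> Optional[ConfigInfo]:
    """Pick the config with the largest VRAM requirement that still fits."""
    parsed = (parse_config_name(n) for n in names)
    fitting = [c for c in parsed if c is not None and c.vram_mb <= free_vram_mb]
    return max(fitting, key=lambda c: c.vram_mb, default=None)


if __name__ == "__main__":
    names = [
        "Quality_1_23100MB_14_12_Second_IT.json",
        "Quality_1_22460MB_15_12_Second_IT.json",
    ]
    print(pick_config(names, free_vram_mb=23000))
```

Choosing the largest requirement that fits follows the advice above: within one quality rank, a config that uses more VRAM is listed with a lower seconds per iteration.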

Furkan Gözükara

Hey Furkan, I've downloaded best_configs_v3, but I'm confused by the naming convention; I have no idea what those numbers stand for. By the way, I have a 3090 24GB and want to fine-tune on Windows. Which one do you recommend?

Pablo Montero

The RunPod one would work better since it installs from scratch; make sure to add sudo to the commands.

Furkan Gözükara

Hi Furkan, if I want to install Kohya in Ubuntu locally on my computer, which of the installers should I use ?

Anduvo

Tell me your device details: how much RAM? VRAM? Very likely you are missing virtual RAM. Follow this video, it explains how to set it: https://youtu.be/nySGu12Y05k

Furkan Gözükara

I am getting an error in SD Forge: "OutOfMemoryError: Allocation on device". Please give me a solution.

Rajasekar R

You are welcome. Hopefully I will make a video; doing more research at the moment.

Furkan Gözükara

Okay, I understand, thanks!

peter balafas

OK, then all you need to do is get a fresh config from the zip file, load it into the DreamBooth tab, and select the FLUX dev model path and the other paths, exactly the same as in LoRA training. Each checkpoint will be 24 GB, so be careful of that.

Furkan Gözükara

I'm trying to finetune a flux checkpoint

peter balafas

DreamBooth configs are different. Which model are you trying to fine tune: SDXL or SD 1.5?

Furkan Gözükara

Hi, thanks for the effort, you're doing amazing work. I have done numerous LoRAs that look great. I'm trying to fine tune an old checkpoint. I loaded your best config in DreamBooth, but I'm baffled about where I should load my checkpoint; I don't see that option anywhere in the GUI.

peter balafas

Yes, just halve the number of epochs. The displayed epoch is not important, but the step count should match: 2x the displayed step count, since you have 2x GPUs.

Furkan Gözükara

If I train 21 photos with 2x A100 80GB and want to train approx. 3600 steps, do I only have to enter 80 instead of 160 epochs in the configuration? The terminal then indicates that 152 epochs are being trained, but only 1300 steps approx. 1300 steps are too few in my experience. So do I have to leave everything the same with 2x GPU as with 1x GPU, i.e. 160 epochs, or do I really have to halve the number of epochs?

Chris

I am officially retiring my SDXL models (30+ models) and earlier ones. Flux is incredibly good: training is top notch, prompt following is great, there is no need for YOLO face or hands or other fixes, general people in the background do not get converted to the main subject, skin is not plastic, and so much more. The quality is absolutely incredible (note that some of the samplers reduce age by many years, so I found it is important to define appropriate ages in the prompt, e.g. 50 yo man).

Ec Jep

Maybe the LR was too low. You can raise the LR from 0.000004 to 0.000006 and train again.

Furkan Gözükara

I trained 21 images for 160 epochs with your full checkpoint config rank 1 on Massed Compute. The result barely looks like the trained subject; it looks like training with 200 steps or something like that. It should resemble something more like the 3600 steps that I trained. Any idea? Training a LoRA with the 21 images looks very good (also up to 3000 steps), so the pictures are fine, I guess.

Chris

Watching right now 😁

Chris

you should watch these 2 https://youtu.be/HKX8_F1Er_w https://youtu.be/bupRePUOA18

Furkan Gözükara

cfg 1 was the solution! still trying to get comfortable with swarmui, coming from months of a1111 / forge :D

Chris

Use cfg 1. And try to generate without your model to verify your system working

Furkan Gözükara

Will try. Unfortunately the first generated image was totally blurry and crippled, overburned or something like that. Don't know what went wrong.

Chris

Also update your swarm ui to latest and restart pc

Furkan Gözükara

Well, when I test on my RTX 3060 it doesn't use that much. Can you restart the computer and try again?

Furkan Gözükara

Put in the wrong folder 😶 Works now, but image Generation takes ages. Vram usage is 22gb, my card has only 12gb though. Is recognized correctly when starting swarmui

Chris

Move the models into SwarmUI\Models\diffusion_models. Is it a checkpoint or a LoRA?

Furkan Gözükara

Now in SwarmUI I have this error when loading the trained full model: [Error] Error loading model on backend 0 (ComfyUI Self-Starting): ComfyUI execution error: ERROR: Could not detect model type of: G:\swarmui\SwarmUI\Models\Stable-Diffusion\martlins_fluxfull-000090.safetensors 17:32:03.397 [Warning] Tried 1 backends but none were able to load model 'martlins_fluxfull-000090.safetensors'

Chris

I have a downloader for FLUX; I suggest you use it. It has even better samplers: https://www.patreon.com/posts/109289967 The better sampler is iPNDM.

Furkan Gözükara

In SwarmUI, do i also need to put clip.safetensor and t5.safetensor in the directories or does it work without those files?

Chris

In FORGE we have the Flux Improved Sampler to choose instead of EULER. Does something like this also exist for SwarmUI?

Chris

I suggest you use SwarmUI; it works better and faster. You don't even need the nf4 model. It will auto cast the fp16 dev model into fp8.

Furkan Gözükara

When I do a full checkpoint training and want to use the safetensor in FORGE, there is an error: AssertionError: You do not have CLIP state dict! If I then select dict-l and t5 in the VAE menu in the web interface, my whole PC crashes, blue screen with error regarding memory allocation. I have a 3060 12GB. You are training on the dev-1 flux version, maybe I should train on the nf4Flux1_nf4Bnb.safetensors version or flux1-dev-bnb-nf4-v2.safetensor? I can use it to generate images in FORGE. What do you think?

Chris

It is used as a UNet. Currently, put them into the diffusion_models folder: \SwarmUI\Models\diffusion_models

Furkan Gözükara

Is the fine tuned saved safetensor file to be used as Unet or Checkpoint?

Robert Arsene

Yep, currently swapping is slow, but Kohya said he is hopefully going to improve it. I think when the RTX 5090 Ti arrives it will fit directly into VRAM and it will be ultra fast.

Furkan Gözükara

All Rank 1 and Rank 2, whichever you prefer according to speed / quality. slower ones are slightly better quality

Furkan Gözükara

At the moment FLUX is bleeding / mixing multiple concepts. It is its biggest issue. But when you have multiple persons in the same image, it learns.

Furkan Gözükara

Sadly, currently the biggest problem of FLUX is that it bleeds multiple concepts :/

Furkan Gözükara

That would be very interesting for me as well. I created single FLUX LoRAs of my wife, me and my dog. Used separately, they work amazingly well, but when I want to combine them to have e.g. my wife and me in the same image, rendering gets extremely slow due to VRAM issues, even though I have a 24GB RTX 4090. So having a trained checkpoint with all three of us would be amazing!

Jason Dawn

from more training and using flux it is substantially better than sdxl. I don't think i'm going back. One amazing feature is to take an sdxl image and use it as init but with the new trained flux model - wow. Flux is >>> SDXL. Your research and tips are extremely helpful!! BTW I found 3-3.5 guidance, stating the age of the subject (ex 60 yo man) and DEIS sampler is amazing. Other: it takes 10-14 hrs to train on my 4090 using block swapping but it is worth the wait.

Ec Jep

I'm just a bit confused by the MB size listings in the Best Configs folder. Which one is the Rank 1 config for a 48GB A6000 Massed Compute instance?

Oinksauce

Is it in general possible to fine tune a FLUX model with different characters? I tried to train dev with person1 (trigger word lsVK) and then used the trained model "dev+person1" as the base model for my training with person2 (trigger word nicIl). But the result was that using "lsVK" as the trigger gave me an image of person2. Do you have any ideas how to train a model with more than one individual character?

MathiasF

So true. You need to delete older ones or get at least 2x GPUs. Hopefully I will mention this in the video.

Furkan Gözükara

If you are training a single subject, I don't suggest it. Currently I am not training a subject; general improvement is my aim.

Furkan Gözükara

I recall you saying you don't need captions for Lora training (just a trigger word). Is this the case for fine-tuning as well, or should we use detailed captions for fine-tuning?

GH

Just a quick nit on this: When running on massed compute and using the best config scripts with one A6000, the storage space is only 250GB, each checkpoint is ~24gb, so using the base config you'll be running 200 epochs and saving every 25. You will run out of space before training completes due to the number of checkpoints being saved.
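A rough budget for the numbers above, assuming every saved checkpoint is kept and each is about 24 GB: 200 epochs saved every 25 gives 8 checkpoints, or about 192 GB of the 250 GB disk. That leaves roughly 58 GB for the OS, the base models and everything else, which is presumably how the disk fills up before training completes. The function names are hypothetical.

```python
def checkpoints_saved(epochs: int, save_every_n_epochs: int) -> int:
    """Number of checkpoints written if one is saved every N epochs."""
    return epochs // save_every_n_epochs


def checkpoint_disk_gb(epochs: int, save_every_n_epochs: int, checkpoint_gb: float) -> float:
    """Total disk space used by checkpoints when none are deleted."""
    return checkpoints_saved(epochs, save_every_n_epochs) * checkpoint_gb


if __name__ == "__main__":
    used = checkpoint_disk_gb(epochs=200, save_every_n_epochs=25, checkpoint_gb=24)
    print(f"{used:.0f} GB of checkpoints, {250 - used:.0f} GB left on a 250 GB disk")
```

Saving less often or deleting older checkpoints during training are the obvious ways to bring the total down.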

Oinksauce

Same rank = same quality. choose according to your VRAM

Furkan Gözükara

In the Best Configs file - are all Rank 1 choices the same quality? Do you have a recommendation for which one is the best quality?

Oinksauce

Yep it is number 1. Your ordering is looking accurate

Furkan Gözükara

Thanks again for all the research! - saves me unbelievable amounts of time. So far this is my rank in terms of quality in training people Flux>realvisxl5>realvisxl4>sdxl>sd1.5. Flux is truly amazing how well it follows prompts.

Ec Jep

I haven't tried a person with a darker skin tone, but it captured my skin tone accurately, and my body shape as well, when I trained with 256 images of myself.

Furkan Gözükara

Thank you for your contributions. I've learned a lot. I'm curious if you'd be willing to try training on subjects that Flux is less amenable to. For example, I notice Flux seems to be quite poor at picking up different body types (e.g. overweight and/or short) and darker skin tones. During training you'll often notice it picks up a subject's facial features before skin tones, so you'll end up with a white version of the subject long before it picks up their skin tone. In this respect, SDXL seems to be much more perceptive than Flux. Training on a subject that resembles Oprah or Queen Latifah, with Flux you'll end up with a 6 foot slender version. The face will match though. SDXL on the other hand would be quite accurate given the same training data. Thanks for listening :)

Jay

i added info to the top

Furkan Gözükara

I don't have a 4x GPU config in the zip file because I couldn't make it work, sadly :( I even opened an issue and asked Kohya.

Furkan Gözükara

I don't mean to sound like a jerk, but I just spent 4.5 hours trying to get this to work on 4xA6000 but didn't see this comment until now. I feel this is really important info and should be highlighted at the TOP of the post. I love your work, you've made the step into training much less daunting.

Error_404_unknown

Multi-GPU fine tuning requires 80 GB VRAM, sadly. I also couldn't make it work.

Furkan Gözükara

Hi, i think multi GPU training is broken. When loading Rank 1 file for full checkpoint training and setting "enable multi gpu" and GPU id to 0,1,2,3 (it shows 0123 at nvitop tool) it gives the following error: rank3]: Traceback (most recent call last): [rank3]: File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 908, in [rank3]: train(args) [rank3]: File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 193, in train [rank3]: accelerator.wait_for_everyone() [rank3]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2564, in wait_for_everyone [rank3]: wait_for_everyone() [rank3]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/other.py", line 138, in wait_for_everyone [rank3]: PartialState().wait_for_everyone() [rank3]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/state.py", line 374, in wait_for_everyone [rank3]: torch.distributed.barrier() [rank3]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper [rank3]: return func(*args, **kwargs) [rank3]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3936, in barrier [rank3]: work = default_pg.barrier(opts=opts) [rank3]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, invalid usage (run with NCCL_DEBUG=WARN for details), NCCL version 2.20.5 [rank3]: ncclInvalidUsage: This usually reflects invalid usage of NCCL library. 
[rank3]: Last error: [rank3]: Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 2000 [rank0]: Traceback (most recent call last): [rank0]: File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 908, in [rank0]: train(args) [rank0]: File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 193, in train [rank0]: accelerator.wait_for_everyone() [rank0]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2564, in wait_for_everyone [rank0]: wait_for_everyone() [rank0]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/other.py", line 138, in wait_for_everyone [rank0]: PartialState().wait_for_everyone() [rank0]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/state.py", line 374, in wait_for_everyone [rank0]: torch.distributed.barrier() [rank0]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper [rank0]: return func(*args, **kwargs) [rank0]: File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3936, in barrier [rank0]: work = default_pg.barrier(opts=opts) [rank0]: torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/NCCLUtils.hpp:275, invalid usage (run with NCCL_DEBUG=WARN for details), NCCL version 2.20.5 [rank0]: ncclInvalidUsage: This usually reflects invalid usage of NCCL library. 
[rank0]: Last error: [rank0]: Duplicate GPU detected : rank 0 and rank 3 both on CUDA device 2000 W0918 17:54:55.253000 139563853816960 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 8310 closing signal SIGTERM W0918 17:54:55.254000 139563853816960 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 8311 closing signal SIGTERM W0918 17:54:55.254000 139563853816960 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 8312 closing signal SIGTERM E0918 17:54:55.582000 139563853816960 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 3 (pid: 8313) of binary: /home/Ubuntu/apps/kohya_ss/venv/bin/python Traceback (most recent call last): File "/home/Ubuntu/apps/kohya_ss/venv/bin/accelerate", line 8, in sys.exit(main()) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main args.func(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command multi_gpu_launcher(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher distrib_run.run(args) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run elastic_launch( File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ============================================================ /home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py FAILED ------------------------------------------------------------ Failures: 
------------------------------------------------------------ Root Cause (first observed failure): [0]: time : 2024-09-18_17:54:55 host : 0068-kci-prxmx10116 rank : 3 (local_rank: 3) exitcode : 1 (pid: 8313) error_file: traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html ============================================================ 17:54:56-880890 INFO Training has ended.

Chris

Yes, you can use your existing ComfyUI as the back end. We use it on Kaggle.

Furkan Gözükara

It’s like auto1111 with more features, plus you can use ComfyUI side by side. Are you able to swap the preloaded ComfyUI with one you already have that has all the custom nodes and the manager, without messing up SwarmUI?

Brandon

The JSON file names include the required VRAM amounts, e.g. Quality_1_23100MB_14_12_Second_IT.json

Furkan Gözükara

I cannot tell which JSON matches 24 GB?

shen oracle

Great. Yes, try fine tuning; it should be way better quality.

Furkan Gözükara

They are excellent. I wanted to orient myself. The datasets I use to imitate a video game are 500 images, both in-game and FMV, and for the characters I have to do some tests. I use JoyCaption, which I think is really useful, especially with the latest update where I can put trigger words as a prefix. I'm still studying Flux and I need to understand whether long or short prompts are better. I'm not sure about the repetitions. I also noticed that the checkpoints it creates are around 2 GB, but with Kohya I can reduce the size. I want to try your new fine tuning settings and see how they behave. Thank you for your time. I follow you with pleasure. Sorry for my English, I use Google Translate.

The Room

Well, I think you did a decent training. Aren't the results satisfactory? You need to analyze and see whether it is undertrained or overtrained. Did you take checkpoints and compare?

Furkan Gözükara

Hi, how many images, epochs, repeats and steps do you recommend to create a graphic style or a character? I did an experiment using 500 images taken from a video game, 3 repetitions, 10 epochs for a total of 25,000 steps; AVR was between 0.46 and 0.48. But do you think all these images are necessary? My system has a speed of 3s/it with your old rank 1 low setting and I would like to try these new settings with fine tuning. I would like to recreate the style of a specific video game and I would like you to give me a hand; I'm still inexperienced. Also, what differences in settings are there between creating a character and a style? Thanks

The Room

Yes, SwarmUI works amazingly out of the box; it applies all the optimizations.

Furkan Gözükara

ComfyUI is very slow with LoRA generations. Not sure if it's my node configurations. I have a 3090. You said you recommend using SwarmUI for trained LoRAs?

Brandon

There is no precise number. For 15 images of a person, 2250 steps; for 256 images it was 80 epochs = 20480 steps, but since I used 8x GPUs it was 2560 steps; for 66 style images it was 66*75 = 4950 steps.
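The step counts in this reply can be reproduced with a short sketch, assuming batch size 1 per GPU, one repeat per image, and that each displayed step on N GPUs processes N images in parallel. The 150-epoch figure for the 15-image case is derived from 2250 / 15 rather than stated.

```python
def image_views(num_images: int, epochs: int, repeats: int = 1) -> int:
    """Total number of times images are shown to the model during training."""
    return num_images * repeats * epochs


def displayed_steps(num_images: int, epochs: int, num_gpus: int = 1,
                    batch_size: int = 1, repeats: int = 1) -> int:
    """Steps shown in the progress bar when work is split across GPUs."""
    return image_views(num_images, epochs, repeats) // (num_gpus * batch_size)


if __name__ == "__main__":
    print(displayed_steps(15, 150))              # 15 person images (150 epochs derived)
    print(image_views(256, 80))                  # 256 images, 80 epochs
    print(displayed_steps(256, 80, num_gpus=8))  # same run on 8 GPUs
    print(displayed_steps(66, 75))               # 66 style images, 75 epochs
```

Read this way, a multi-GPU run shows fewer steps than a single-GPU run with the same number of image views, so step counts are only comparable after multiplying by the GPU count.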

Furkan Gözükara

What would be a good number of steps to use for training a person or specific clothes? In the end it’s not epochs or repeats that matter but the number of steps. Thanks!

Frederic Collin

it is useless in diffusers training, i never pay attention :)

Furkan Gözükara

Great work, in the meantime, do you find any relation with the loss rate number?

ZXM

true

Furkan Gözükara

Well yeah, no perfectly tuned ones right now, we have to wait for the release of some kind of realistic vision flux.

Albert

Ah, not yet. I don't think there is any worthy fine-tuned FLUX yet.

Furkan Gözükara

Cool, Have you tried training images on a modified non-basic Flux model?

Albert

