Aitrepreneur : 1-Click INSTALL FLUXGYM - EASY FLUX LORA TRAINING!

Yep, like others sadly crashes for me at 'writing request stream'...shame, looked so good :-(

Steve Saunders

2025-07-02 15:22:20 +0000 UTC

You putting out a run pod for wan2.1 i2v/t2v loras?

Steve

2025-06-24 16:18:47 +0000 UTC

Dear Aitrepreneur, love your videos, and thanks for all your efforts to make these tools accessible to hobbyists like me. I have tried to start training a Flux LoRA on RunPod and ran into an issue that is probably pretty basic, so hopefully easy to solve. I haven't found the solution in the existing conversation, hence my reaching out. The installation ran smoothly; I could open the training UI window without issue and apply all the settings. Then, when I start training, it gives the error: "Cannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors. Access to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in." According to what I could find online, the way to resolve this is to enter the text: from huggingface_hub import login, followed by token = "token_name". I have prepared the appropriate token and accepted the T&C on Hugging Face for the model, but typing this in the command window (with token_name replaced by the actual token, obviously) doesn't work. The workaround I used on my PC was simply to download the model file manually and copy it to the folder. I would do the same in the RunPod environment, but I don't know how to do that. Alternatively, I can upload the model, but that would take ages. Hope you can help.
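One possible way to do the manual download directly on the pod, as a sketch under assumptions: Hugging Face file URLs of the resolve/main form accept an access token sent as an "Authorization: Bearer" header, so a short standard-library script can fetch a gated file once the license has been accepted for that account. The function names, the HF_TOKEN variable name and the destination file name are illustrative choices, not something the installer provides.

```python
import shutil
import urllib.request

MODEL_URL = (
    "https://huggingface.co/black-forest-labs/FLUX.1-schnell/"
    "resolve/main/flux1-schnell.safetensors"
)


def build_request(url: str, token: str) -> urllib.request.Request:
    """Attach the Hugging Face access token as a Bearer header."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def download(url: str, token: str, dest: str) -> None:
    """Stream the response to disk in 1 MiB chunks so the model never sits in RAM."""
    with urllib.request.urlopen(build_request(url, token)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1024 * 1024)


# Usage (assumption: the token was exported first, e.g. `export HF_TOKEN=hf_...`,
# and the destination is wherever your FluxGym install expects the model):
#   import os
#   download(MODEL_URL, os.environ["HF_TOKEN"], "flux1-schnell.safetensors")
```

Run it from the pod's terminal or a Jupyter cell; the file is pulled straight from Hugging Face to the pod, so nothing has to be uploaded from a home connection.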

Blonde Adonis

2025-06-22 11:40:07 +0000 UTC

Hi, does the one-click installer still work? I get: Writing web request Writing request stream... (Number of bytes written: 24131810). It runs like this for a little while, then cuts out.

Waynethejockrohnson

2025-06-14 15:38:29 +0000 UTC

[2025-05-24 16:08:28] [INFO] gradient accumulation steps / 勾配を合計するステップ数 = 1
[2025-05-24 16:08:28] [INFO] total optimization steps / 学習ステップ数: 1800
[2025-05-24 16:08:41] [INFO] 2025-05-24 16:08:41 INFO unet dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1323
[2025-05-24 16:08:41] [INFO] INFO text_encoder [0] dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1329
[2025-05-24 16:08:41] [INFO] INFO text_encoder [1] dtype: torch.bfloat16, device: cpu train_network.py:1329
[2025-05-24 16:08:42] [INFO] steps: 0%| | 0/1800 [00:00

Robert

2025-05-24 16:11:50 +0000 UTC

idk. starting to think @aitrepreneur is a runpod agent trying to make us spend as much money on the platform as possible by promising easy one-click solutions for 5 dollars a month, but delivering non-working, buggy tutorials that waste our weekends and credits. geez man… fix your stuff.

Robert

2025-05-24 15:53:20 +0000 UTC

Is there any chance of a fluxgym installer for rtx 50 series cards, please?

John Holden

2025-05-23 21:26:47 +0000 UTC

This might be a dumb question, but could this be modified to work with HIDREAM or even Pony based model?

Plaiboy Magazine

2025-05-18 04:19:52 +0000 UTC

Happens to me too.

Bjarki Kjellsson

2025-04-28 19:46:53 +0000 UTC

When I launch the .bat, it closes after the Python installer download: [process exited with code 0]. Any solutions?

Snick3rs

2025-04-24 21:14:34 +0000 UTC

What's new in the V2 version?

Virtamouse

2025-04-19 22:14:11 +0000 UTC

Same problem here. I tried reinstalling it and used multiple methods, but always hit the same problem.

Lukáš Hájek

2025-03-22 17:12:29 +0000 UTC

I'm getting this error now all of a sudden when it worked fine before.

Virtamouse

2025-03-22 16:19:37 +0000 UTC

Does anyone here have this error? "mat1 and mat2 shapes cannot be multiplied (1x2304 and 2816x1280)"

Rluu

2025-03-21 14:40:01 +0000 UTC

Inside the fluxgym folder there is a bat file, "LAUNCHER.bat". Run that to reopen it.

GRL

2025-03-18 02:00:50 +0000 UTC

I had that happen to me the first time I successfully ran this. Check the log for errors: mine had an argument coming in that could not run on my setup (--optimizer_type adamw8bit). For some reason it uses that setting even if you do not check it in the advanced menu, so I had to change it to (--optimizer_type adamw) and it worked. Some of these options require your computer to be set up for them before running the tool in its environment.

GRL

2025-03-18 01:55:00 +0000 UTC

After "Training Complete. Check the outputs folder for the LoRA files." There are no safetensors files saved. What might be the problem?

Fabi AI

2025-03-17 12:29:01 +0000 UTC

If a fresh install doesn't work, you can also try updating the current installation. Here is the command line for that: pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 But make sure to activate the env in the scripts folder before doing so.

GRL

2025-03-17 03:34:57 +0000 UTC

I found a thing that worked for me! I made sure I was in the correct environment for fluxgym and ran this update to get the latest PyTorch: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 This assumes you are running the latest CUDA version for the 50 series. You can check which CUDA version you are running with this command: nvcc --version If the output says "release 12.8", you need cu128, and the pip3 install command above will install it for you. Alternatively, you can use this Florence-based tool I found to caption your dataset: https://github.com/MNeMoNiCuZ/florence2-caption-batch I hope this helps.
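The mapping from the nvcc output to the index URL can be sketched like this. It is only an illustration: the function names are made up here, the sample line is an example of the kind of text nvcc --version prints, and not every CUDA release necessarily has a matching PyTorch nightly index.

```python
import re
from typing import Optional


def wheel_tag_from_nvcc(nvcc_output: str) -> Optional[str]:
    """Turn the 'release X.Y' part of `nvcc --version` output into a tag like 'cu128'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


def nightly_index_url(tag: str) -> str:
    """Build the nightly index URL in the same form as the pip command above."""
    return f"https://download.pytorch.org/whl/nightly/{tag}"


# Example line (illustrative); paste your own `nvcc --version` output instead.
sample = "Cuda compilation tools, release 12.8, V12.8.61"
tag = wheel_tag_from_nvcc(sample)
if tag is not None:
    print(tag, nightly_index_url(tag))
```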

GRL

2025-03-17 01:26:25 +0000 UTC

It looks to me that this has nothing to do with the installer he provided but with your computer failing to install the libraries. This may be caused by so many different things that it's difficult to predict. Try running the bat as admin; that may resolve permissions to install that environment to run the rest of the bat file properly.

GRL

2025-03-17 00:55:59 +0000 UTC

Yes! I tried selecting images that already have a txt file with the same name and dropping it in the "drop file here" area, and it will load your descriptions according to the file name, for example, "image_001.jpg" and "image_001.txt" If they are named correctly, it will populate the fields for you. I hope this helps.
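Before dragging a folder in, it can help to check that every image really has a caption file with the same base name. A minimal sketch, assuming the naming rule described above; the function name and the list of image extensions are illustrative, not taken from fluxgym:

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Assumption: adjust to the image types you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def split_by_caption(filenames: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return ({image: caption_file}, [images without a same-named .txt])."""
    names = set(filenames)
    paired: Dict[str, str] = {}
    missing: List[str] = []
    for name in sorted(names):
        path = Path(name)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip the .txt files themselves and anything else
        caption = path.with_suffix(".txt").name
        if caption in names:
            paired[name] = caption
        else:
            missing.append(name)
    return paired, missing


# Usage on a real folder (hypothetical path):
#   paired, missing = split_by_caption(p.name for p in Path("my_dataset").iterdir())
```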

GRL

2025-03-17 00:52:10 +0000 UTC

I have the same issue (also a 5090 user). Maybe it's related to PyTorch not having support for this drive. I tried to update using the following: pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128 I found a tutorial for using Comfy on the 50 series and was told that we need to use this in the environment. But even after this, I still have that error when trying to run Florence-2. I hope we can find a solution to this.

GRL

2025-03-17 00:45:58 +0000 UTC

Hey. What is your suggestion for parameters if we have 40 person images to train for flux schnell with 16 GB VRAM?

no name

2025-03-16 21:52:15 +0000 UTC

Hi Aitrepreneur, how about this error? It just hangs my training forever without completing it.
[2025-03-11 21:10:02] [INFO] 2025-03-11 21:10:02 INFO Checking the state dict: flux_utils.py:43
[2025-03-11 21:10:02] [INFO] Diffusers or BFL, dev or schnell
[2025-03-11 21:10:02] [INFO] INFO t5xxl_max_token_length: flux_train_network.py:157
[2025-03-11 21:10:02] [INFO] 512
[2025-03-11 21:10:03] [INFO] F:\AI_Work\FluxGym\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2025-03-11 21:10:03] [INFO] warnings.warn(
[2025-03-11 21:10:03] [INFO] You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2025-03-11 21:10:03] [INFO] 2025-03-11 21:10:03 INFO Loading dataset config from train_network.py:488
[2025-03-11 21:10:03] [INFO] F:\AI_Work\FluxGym\fluxgym\ou
[2025-03-11 21:10:03] [INFO] tputs\lenniev1\dataset.toml
[2025-03-11 21:10:03] [INFO] INFO prepare images. train_util.py:2049
[2025-03-11 21:10:03] [INFO] INFO get image size from name of train_util.py:1942
[2025-03-11 21:10:03] [INFO] cache files
[2025-03-11 21:10:03] [INFO] 0%| | 0/21 [00:00
[2025-03-11 21:10:03] [INFO] INFO Cast FLUX model to fp8. flux_train_network.py:108
[2025-03-11 21:10:03] [INFO] This may take a while.
[2025-03-11 21:10:03] [INFO] You can reduce the time
[2025-03-11 21:10:03] [INFO] by using fp8 checkpoint.
[2025-03-11 21:10:03] [INFO] /
[2025-03-11 21:10:03] [INFO] FLUXモデルをfp8に変換し
[2025-03-11 21:10:03] [INFO] ています。これには時間が
[2025-03-11 21:10:03] [INFO] かかる場合があります。fp
[2025-03-11 21:10:03] [INFO] 8チェックポイントを使用
[2025-03-11 21:10:03] [INFO] することで時間を短縮でき
[2025-03-11 21:10:03] [INFO] ます。
[2025-03-11 21:10:58] [INFO] 2025-03-11 21:10:58 INFO Building CLIP-L flux_utils.py:179
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:275
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\clip\clip_l.safetensors
[2025-03-11 21:10:58] [INFO] INFO Loaded CLIP-L:
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:330
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\clip\t5xxl_fp16.safetensors
[2025-03-11 21:10:58] [INFO] INFO Loaded T5xxl:
[2025-03-11 21:10:58] [INFO] INFO Building AutoEncoder flux_utils.py:144
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:149
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\vae\ae.sft
[2025-03-11 21:10:59] [INFO] 2025-03-11 21:10:59 INFO Loaded AE:
[2025-03-11 21:10:59] [INFO] import network module: networks.lora_flux
[2025-03-11 21:10:59] [INFO] INFO [Dataset 0] train_util.py:2585
[2025-03-11 21:10:59] [INFO] INFO caching latents with caching train_util.py:1095
[2025-03-11 21:10:59] [INFO] strategy.
[2025-03-11 21:10:59] [INFO] INFO caching latents... train_util.py:1144
[2025-03-11 21:11:03] [INFO] 0%| | 0/21 [00:00

Elranzer

2025-03-11 13:27:04 +0000 UTC

I have not. Thanks for letting me know.

Bruce

2025-03-09 05:44:44 +0000 UTC

Onetrainer has Hunyuan Lora training now. Have you looked into that yet?

Slap Dash Dolt

2025-03-09 01:24:38 +0000 UTC

This has been working great, but today, using the same image dataset as before, I'm getting this error when trying to upload images: HTTP 413: 413 Request Entity Too Large nginx/1.18.0. I've tried multiple pods and the same thing happens in fluxgym.

Alex Kilbee

2025-03-06 11:03:25 +0000 UTC

Send me a dm

Aitrepreneur

2025-03-04 23:58:13 +0000 UTC

I have the same problem, I installed GIT and Python and I started the installation again and the menu disappears... I have no idea how to fix the situation

Vinnyfm

2025-03-04 23:54:04 +0000 UTC

Or do you have the first version of the installer, rather than FLUX-LORA-FLUXGYM-INSTALL-V2.bat? Thanks.

Raiod

2025-03-04 17:06:05 +0000 UTC

So I have this problem: I'm stuck on one command (python -m venv env). After I hit Enter, it always tells me that Python was not found, even though I already have it installed.

Raiod

2025-03-04 17:05:13 +0000 UTC

Is there any way to use the Florence Models from the all in one workflow inside FluxGym instead of having to connect to Huggingface?

Slap Dash Dolt

2025-03-02 23:22:22 +0000 UTC

Hi, while training for an object, I got this message: .... commercial-license', 'license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md'] license_str = license: other license_name: flux-1-dev-non-commercial-license license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md no samples. Any idea what's going on?

BOB

2025-03-01 23:09:36 +0000 UTC

I have a bunch of old fantasy art from a popular artist at the time. I'd like to try to make a Lora using that art style. Do you go about it in the same way as a model? And how do you tell comfy ui to make art using that type of art style?

Chris Christopher

2025-02-28 19:41:01 +0000 UTC

install GIT and Python first manually.

Jason Blake

2025-02-28 06:27:27 +0000 UTC

I don't understand what's going on. I'm running the v2.bat. It does a download, then the terminal closes, and there's no folder or anything.

Liam

2025-02-27 22:58:28 +0000 UTC

Heads up on a mistake I made, which I found by copying and pasting the errors into ChatGPT: make sure you don't accidentally hit "O" instead of zero :( Its answer: "It looks like the error is coming from a typo in your command line arguments. Specifically, the error argument --multires_noise_discount: invalid float value: 'o.3' indicates that the argument value is being read as the string "o.3" (using the letter "o") rather than the float 0.3 (with a zero). Similarly, check your --noise_offset argument, which appears as "o.1"; it should probably be "0.1". To fix the error, update your command and replace --multires_noise_discount o.3 with --multires_noise_discount 0.3, and --noise_offset o.1 with --noise_offset 0.1. These changes should allow the arguments to be correctly parsed as floats."
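This kind of typo can be caught before launching. The following is a small sketch, not part of fluxgym: it scans a command string for the two flags named in the error and reports any value that Python cannot parse as a float. The flag set and the function name are illustrative.

```python
from typing import List, Tuple

# The two flags from the error above; extend for other float-valued arguments.
FLOAT_FLAGS = {"--multires_noise_discount", "--noise_offset"}


def find_bad_floats(command: str) -> List[Tuple[str, str]]:
    """Return (flag, value) pairs whose value does not parse as a float."""
    tokens = command.split()
    bad = []
    for flag, value in zip(tokens, tokens[1:]):
        if flag in FLOAT_FLAGS:
            try:
                float(value)
            except ValueError:
                bad.append((flag, value))
    return bad


print(find_bad_floats("--multires_noise_discount o.3 --noise_offset 0.1"))
# prints [('--multires_noise_discount', 'o.3')]
```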

Jason Blake

2025-02-27 21:32:37 +0000 UTC

thank you

Jason Blake

2025-02-27 21:24:41 +0000 UTC

Think I got it. The picture size is 1080x1498 because I cropped them. I had to go into "max_bucket_reso" and set the value to 1500. All I can say is thank god for Claude.ai. I am no code expert lol

Zachary Garner

2025-02-27 19:24:11 +0000 UTC

In RunPod it immediately says the training is complete right after it downloads flux.dev. I've done everything in the video.
flux1-dev.sft: 100%|██████████▉| 23.8G/23.8G [00:21<00:00, 1.09GB/s]
downloading ae.sft...
ae.sft: 100%|██████████▉| 335M/335M [00:00<00:00, 647MB/s]
download clip_l.safetensors
clip_l.safetensors: 100%|██████████▉| 246M/246M [00:00<00:00, 725MB/s]
download t5xxl_fp16.safetensors
t5xxl_fp16.safetensors: 100%|██████████▉| 9.79G/9.79G [00:07<00:00, 1.37GB/s]
concept_sentence=Zack
lora_name Zack, concept_sentence=Zack, output_name=zack
license_items=['license: other', 'license_name: flux-1-dev-non-commercial-license', 'license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md']
license_str = license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
no samples

Zachary Garner

2025-02-27 18:29:58 +0000 UTC

Yup, I think I found the cause of it. I was using a laptop, and when I'm not working I close it. After a short while I get the error, so it seems that if you leave the Gradio interface unattended it returns the error. Leaving my PC on (monitor off) and the tab open (in the background) did the trick for me.

Benni

2025-02-27 15:35:35 +0000 UTC

Same thing happening to me, did you find an answer?

Jason Blake

2025-02-27 15:30:32 +0000 UTC

It worked great for me! Thank you so much!

Lance Kedron

2025-02-26 23:08:06 +0000 UTC

Git for windows and python 3.10.11 added to path. I will change the way the future installer works but these two should be installed manually for the best compatibility

Aitrepreneur

2025-02-26 22:36:39 +0000 UTC

just send me a dm man, otherwise I can't follow up

Aitrepreneur

2025-02-26 22:35:19 +0000 UTC

Ok, I can never get these installers to install properly - are there any pre-requisites that need to be installed on your pc prior to running the .bat?

Kenneth Kraft

2025-02-26 18:03:45 +0000 UTC

You're right, I let my frustration get the better of me... though I hope to receive some help. I'm trying to make a project for a friend and it saddens me that it won't work.

Kevin Vandendriessche

2025-02-26 17:58:48 +0000 UTC

Rude and factually incorrect. Most people have little or no trouble with K's installers. When problems do occur he is there to help solve them. The alternative is to install manually... good luck.

LW

2025-02-26 16:45:24 +0000 UTC

ERROR: Exception:
Traceback (most recent call last):
  File "e:\fluxgym\env\lib\site-packages\pip\_internal\cli\base_command.py", line 180, in _main
    status = self.run(options, args)
  File "e:\fluxgym\env\lib\site-packages\pip\_internal\cli\req_command.py", line 204, in wrapper
    return func(self, options, args)
  File "e:\fluxgym\env\lib\site-packages\pip\_internal\commands\install.py", line 318, in run
    requirement_set = resolver.resolve(
  File "e:\fluxgym\env\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 127, in resolve
    result = self._result = resolver.resolve(
  File "e:\fluxgym\env\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 473, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "e:\fluxgym\env\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 384, in resolve
    raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2000000
Error: Failed to install PyTorch.

Kevin Vandendriessche

2025-02-26 15:12:40 +0000 UTC

would be fun if any of your stuff actually worked

Kevin Vandendriessche

2025-02-26 14:54:00 +0000 UTC

I had a problem with 2 GPU's, I temporarily disabled 1 in device manager. It's now working

Dale Romanov

2025-02-24 18:10:25 +0000 UTC

Is there a way to load a dataset that has already been created?

Bruce

2025-02-22 20:50:56 +0000 UTC

Hello! After clicking on "Add AI captions with Florance-2" having this error and I can't get a caption on my images :( (I have 5090 btw, maybe this is a reason?) RuntimeError: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
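A "no kernel image is available" error generally means the installed PyTorch build was not compiled for the GPU's architecture. The sketch below shows one way to compare the two. Assumptions: the helper names are made up, the example architecture list is illustrative rather than copied from a real build, and (12, 0) is used as the compute capability of a 5090. The exact-match test is deliberately conservative, since some builds can also run on newer GPUs of the same major version or through embedded PTX.

```python
from typing import Iterable, Tuple


def arch_name(capability: Tuple[int, int]) -> str:
    """(12, 0) -> 'sm_120', the naming used in PyTorch's architecture list."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_lists_gpu(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """True if the build lists a binary for exactly this GPU architecture."""
    return arch_name(capability) in set(arch_list)


# With PyTorch installed in the fluxgym env, the real inputs would come from:
#   import torch
#   capability = torch.cuda.get_device_capability(0)
#   arch_list = torch.cuda.get_arch_list()
example_arch_list = ["sm_80", "sm_86", "sm_90"]  # illustrative only
print(build_lists_gpu((12, 0), example_arch_list))
```

If the check comes back False for your card, installing a build that lists the card's architecture, such as the nightly cu128 wheels mentioned elsewhere in this thread, is the usual next step.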

Тимур Ахманов

2025-02-22 20:20:27 +0000 UTC

What am I doing wrong here:
[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python -m pip install --upgrade pip
root@a02dea72446b:/workspace# python -m pip install --upgrade pip
Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (24.0)
Collecting pip
Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 8.3 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.0
Uninstalling pip-24.0:
Successfully uninstalled pip-24.0
Successfully installed pip-25.0.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@a02dea72446b:/workspace#

AI Mastery

2025-02-22 18:42:53 +0000 UTC

Probably a silly question but how do I simply start fluxgym again after restarting the pod?

CJ Rademeyer

2025-02-22 16:33:34 +0000 UTC

In the video he has a manual install too. I had the same issue as you and what I did was follow each step of the manual install, after I would install one program from the video, I would run the one click installer again, and if it crashed out I would install the next step in the manual installer video, run one click again, and after a few installs, the one click installer worked. To answer the next possible question, in the one click installer video for fluxgym, he tells you how to manually install everything towards the end of the video. Best of luck.

Jaiven

2025-02-22 16:18:06 +0000 UTC

Sometimes when I train a LoRA in RunPod, it exits with the following error: "Terminating process Killing process:". I get this output in the Jupyter terminal. This usually happens within the first few epochs. Gradio doesn't show an error; it just sits there doing nothing. Another follow-up question, about this line in the Gradio terminal: "Cast FLUX model to fp8. This may take a while. You can reduce the time by using fp8". Why does it have to cast to fp8 when we want to use fp16?

Benni

2025-02-22 15:29:41 +0000 UTC

Please make a new updated video. :|

ahazy

2025-02-22 14:42:54 +0000 UTC

It's a legit blank Windows install. I'm gonna go outside and throw some ice at a wall and come back and try again!

ahazy

2025-02-22 14:33:56 +0000 UTC

At least you got it working with less than 20 images!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

ahazy

2025-02-22 14:32:31 +0000 UTC

?????????????????????

ahazy

2025-02-22 14:30:58 +0000 UTC

I double clicked the .bat file on a brand new install of windows on a new SSD on my user folder with no spaces. It looks like it's installing and then NOTHING HAPPENED!!!! BRO! The installer closed out! What do I do next? Should i reformat the hard drive and start over?????????????? I followed every single step!!! We need a simple step by step guide either video or text.

ahazy

2025-02-22 14:07:44 +0000 UTC

Heya. If I add more than 20 images, it gives me an error. So, I've just been reducing my image pool to 20. Is there a reason for this? Is there a setting so I can have 50 images? Thanks <3

Pete

2025-02-21 09:43:18 +0000 UTC

Wait, new issue. So it seems to have resumed when I did both of those, but it's stopping at epoch 6 for some reason. I have 70 training images. I set the max train epochs parameter to 10, but it keeps saying: Training Complete. Check the outputs folder for the LoRA files.

Virtamouse

2025-02-21 00:13:29 +0000 UTC

A little confusing, since there are two options: "--resume: saved state to resume training / 学習再開するモデルのstate" and "--initial_epoch: initial epoch number, 1 means first epoch (same as not specifying). NOTE: initial_epoch/step doesn't affect to lr scheduler. Which means lr scheduler will start from 0 without `--resume`". So would I input 6 in the initial epoch field, or enter the folder path into "saved state to resume training"? And is that the folder path, or folderpath\epochname.safetensors?

Virtamouse

2025-02-20 23:51:22 +0000 UTC

There is an option in the advanced parameters, "resume training from". Just input the path of the epoch.

Aitrepreneur

2025-02-20 23:12:15 +0000 UTC

This is probably a bad initial Python install. You need to uninstall your current Python installation and reinstall it correctly. Go to Add or Remove Programs, search for Python, and uninstall both the current Python version and the Python install program. Once this is done, download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe Run it, check the "Add python 3.10 to Path" checkbox, and continue with the installation. You can check that the right Python version is installed by opening a new command prompt window and typing: python --version It should report version 3.10.11. Then just relaunch the 1-click installer in a new folder and try again.
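When several Python installs coexist, it can also help to confirm which interpreter actually answers to the python command. A minimal sketch (the helper name is made up; 3.10 is the version line named above):

```python
import sys
from typing import Sequence, Tuple

EXPECTED: Tuple[int, int] = (3, 10)  # the reply above installs 3.10.11


def is_expected_python(version: Sequence[int], expected: Tuple[int, int] = EXPECTED) -> bool:
    """Compare only major.minor, so any 3.10.x counts as a match."""
    return tuple(version[:2]) == tuple(expected)


# Shows which executable is on PATH and whether it is a 3.10.x build.
print(sys.executable)
print(sys.version.split()[0], "OK" if is_expected_python(sys.version_info) else "unexpected version")
```

Save it as a .py file and run it with python from a new command prompt. If the printed path is not the freshly installed 3.10, an older install is still ahead of it on PATH.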

Aitrepreneur

2025-02-20 23:11:28 +0000 UTC

it says that you didn't input an output name, not even counting that you didn't follow any of the parameters that I showed in the video either
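In the posted log this shows up as --output_name followed directly by --timestep_sampling, with no value in between. A small sketch of how to spot that pattern in a generated command; the function name and the set of flags to check are illustrative, not part of fluxgym or sd-scripts:

```python
from typing import List, Set


def flags_missing_value(command: str, flags: Set[str]) -> List[str]:
    """Return the listed flags that are followed by another flag or by nothing."""
    tokens = command.split()
    missing = []
    for i, token in enumerate(tokens):
        if token in flags:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt is None or nxt.startswith("--"):
                missing.append(token)
    return missing


# Excerpt in the same shape as the end of the logged command.
excerpt = "--output_dir outputs --output_name --timestep_sampling shift"
print(flags_missing_value(excerpt, {"--output_dir", "--output_name"}))
# prints ['--output_name']
```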

Aitrepreneur

2025-02-20 23:10:48 +0000 UTC

how to resume training if I got an error and the training stopped at epoch 6? can it be done without starting over?

Virtamouse

2025-02-20 20:23:40 +0000 UTC

I have the same issue... :( I'll send a DM

Jean Dupont

2025-02-20 15:03:35 +0000 UTC

When I run the script I get the following error: ERROR: Could not find a version that satisfies the requirement accelerate==0.33.0 (from -r requirements.txt (line 1)) (from versions: 0.0.1, 0.1.0, 0.2.0, 0.2.1, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.6.0, 0.6.1, 0.6.2, 0.7.0, 0.7.1, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.13.2, 0.14.0, 0.15.0, 0.16.0, 0.17.0, 0.17.1, 0.18.0, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.20.3) ERROR: No matching distribution found for accelerate==0.33.0 (from -r requirements.txt (line 1)) Tried installing accelerate with pip but then I get another error saying pytorch is wrong. What should I do?

darmok72

2025-02-20 13:40:40 +0000 UTC

POTENTIAL ERROR FIX FOR SOME: If you type your own captions or use a program/AI that isn't Florence-2 to help generate captions, make sure that all characters in the caption are UTF-8 approved characters. I used an LLM to generate longer captions to help increase my model's quality but unknowingly had multiple instances of "curly" apostrophes (’) and quotes (“ ”), longer dashed lines, and accented letters -- which aren't allowed and stopped the training before it even started. It's tedious, but best practice is to pull your previous dataset (images & text files) into a fresh instance of Fluxgym BUT first read the failed log output to see which text file had the issue, and alter that file before dragging and dropping it into your new instance. Rinse and repeat. Maybe it was just me, but editing the text within the browser while it already had a correlated text file didn't save my changes and it still failed... so I recommend you edit it at the source before dragging it over, or copy and paste the words verbatim into only the dragged-over image and forgo the non-UTF-8-compliant text file altogether.
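Editing each caption by hand can be avoided with a small clean-up pass. The sketch below swaps the usual LLM typography for plain ASCII and lists whatever non-ASCII characters remain, such as accented letters. Whether the failure comes from the characters themselves or from the encoding the text file was saved in is not established here, and the replacement table and function names are only an illustration.

```python
from typing import List

# Typographic characters LLMs often emit, mapped to plain ASCII.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
}
TABLE = str.maketrans(REPLACEMENTS)


def normalize_caption(text: str) -> str:
    """Replace curly quotes and long dashes with their ASCII equivalents."""
    return text.translate(TABLE)


def remaining_non_ascii(text: str) -> List[str]:
    """List the distinct non-ASCII characters still present after normalizing."""
    return sorted({ch for ch in text if ord(ch) > 127})


caption = "a \u201cred\u201d hat \u2014 it\u2019s from a caf\u00e9"
cleaned = normalize_caption(caption)
print(cleaned)
print(remaining_non_ascii(cleaned))
```

Anything reported by the second function still needs a manual decision, for example rewriting an accented letter, before the caption file is written back and dragged into fluxgym.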

SenoTakai

2025-02-20 03:59:22 +0000 UTC

[2025-02-20 03:48:49] [INFO] Running C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs\train.bat [2025-02-20 03:48:49] [INFO] [2025-02-20 03:48:49] [INFO] (env) C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --split_mode --network_args "train_blocks=single" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs\dataset.toml" --output_dir "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs" --output_name --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 [2025-02-20 03:48:54] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead: [2025-02-20 03:48:54] [INFO] `--num_processes` was set to a value of `1` [2025-02-20 03:48:54] [INFO] `--num_machines` was set to a value of `1` [2025-02-20 03:48:54] [INFO] `--dynamo_backend` was set to a value of 
`'no'` [2025-02-20 03:48:54] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. [2025-02-20 03:48:58] [INFO] usage: flux_train_network.py [-h] [2025-02-20 03:48:58] [INFO] [--console_log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [2025-02-20 03:48:58] [INFO] [--console_log_file CONSOLE_LOG_FILE] [2025-02-20 03:48:58] [INFO] [--console_log_simple] [--v2] [2025-02-20 03:48:58] [INFO] [--v_parameterization] [2025-02-20 03:48:58] [INFO] [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH] [2025-02-20 03:48:58] [INFO] [--tokenizer_cache_dir TOKENIZER_CACHE_DIR] [2025-02-20 03:48:58] [INFO] [--train_data_dir TRAIN_DATA_DIR] [--cache_info] [2025-02-20 03:48:58] [INFO] [--shuffle_caption] [2025-02-20 03:48:58] [INFO] [--caption_separator CAPTION_SEPARATOR] [2025-02-20 03:48:58] [INFO] [--caption_extension CAPTION_EXTENSION] [2025-02-20 03:48:58] [INFO] [--caption_extention CAPTION_EXTENTION] [2025-02-20 03:48:58] [INFO] [--keep_tokens KEEP_TOKENS] [2025-02-20 03:48:58] [INFO] [--keep_tokens_separator KEEP_TOKENS_SEPARATOR] [2025-02-20 03:48:58] [INFO] [--secondary_separator SECONDARY_SEPARATOR] [2025-02-20 03:48:58] [INFO] [--enable_wildcard] [2025-02-20 03:48:58] [INFO] [--caption_prefix CAPTION_PREFIX] [2025-02-20 03:48:58] [INFO] [--caption_suffix CAPTION_SUFFIX] [--color_aug] [2025-02-20 03:48:58] [INFO] [--flip_aug] [2025-02-20 03:48:58] [INFO] [--face_crop_aug_range FACE_CROP_AUG_RANGE] [2025-02-20 03:48:58] [INFO] [--random_crop] [--debug_dataset] [2025-02-20 03:48:58] [INFO] [--resolution RESOLUTION] [--cache_latents] [2025-02-20 03:48:58] [INFO] [--vae_batch_size VAE_BATCH_SIZE] [2025-02-20 03:48:58] [INFO] [--cache_latents_to_disk] [--skip_cache_check] [2025-02-20 03:48:58] [INFO] [--enable_bucket] [2025-02-20 03:48:58] [INFO] [--min_bucket_reso MIN_BUCKET_RESO] [2025-02-20 03:48:58] [INFO] [--max_bucket_reso MAX_BUCKET_RESO] [2025-02-20 03:48:58] [INFO] [--bucket_reso_steps BUCKET_RESO_STEPS] 
[2025-02-20 03:48:58] [INFO] [--bucket_no_upscale] [2025-02-20 03:48:58] [INFO] [--token_warmup_min TOKEN_WARMUP_MIN] [2025-02-20 03:48:58] [INFO] [--token_warmup_step TOKEN_WARMUP_STEP] [2025-02-20 03:48:58] [INFO] [--alpha_mask] [--dataset_class DATASET_CLASS] [2025-02-20 03:48:58] [INFO] [--caption_dropout_rate CAPTION_DROPOUT_RATE] [2025-02-20 03:48:58] [INFO] [--caption_dropout_every_n_epochs CAPTION_DROPOUT_EVERY_N_EPOCHS] [2025-02-20 03:48:58] [INFO] [--caption_tag_dropout_rate CAPTION_TAG_DROPOUT_RATE] [2025-02-20 03:48:58] [INFO] [--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON] [2025-02-20 03:48:58] [INFO] [--dataset_repeats DATASET_REPEATS] [2025-02-20 03:48:58] [INFO] [--output_dir OUTPUT_DIR] [2025-02-20 03:48:58] [INFO] [--output_name OUTPUT_NAME] [2025-02-20 03:48:58] [INFO] [--huggingface_repo_id HUGGINGFACE_REPO_ID] [2025-02-20 03:48:58] [INFO] [--huggingface_repo_type HUGGINGFACE_REPO_TYPE] [2025-02-20 03:48:58] [INFO] [--huggingface_path_in_repo HUGGINGFACE_PATH_IN_REPO] [2025-02-20 03:48:58] [INFO] [--huggingface_token HUGGINGFACE_TOKEN] [2025-02-20 03:48:58] [INFO] [--huggingface_repo_visibility HUGGINGFACE_REPO_VISIBILITY] [2025-02-20 03:48:58] [INFO] [--save_state_to_huggingface] [2025-02-20 03:48:58] [INFO] [--resume_from_huggingface] [--async_upload] [2025-02-20 03:48:58] [INFO] [--save_precision {None,float,fp16,bf16}] [2025-02-20 03:48:58] [INFO] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS] [2025-02-20 03:48:58] [INFO] [--save_every_n_steps SAVE_EVERY_N_STEPS] [2025-02-20 03:48:58] [INFO] [--save_n_epoch_ratio SAVE_N_EPOCH_RATIO] [2025-02-20 03:48:58] [INFO] [--save_last_n_epochs SAVE_LAST_N_EPOCHS] [2025-02-20 03:48:58] [INFO] [--save_last_n_epochs_state SAVE_LAST_N_EPOCHS_STATE] [2025-02-20 03:48:58] [INFO] [--save_last_n_steps SAVE_LAST_N_STEPS] [2025-02-20 03:48:58] [INFO] [--save_last_n_steps_state SAVE_LAST_N_STEPS_STATE] [2025-02-20 03:48:58] [INFO] [--save_state] [--save_state_on_train_end] [2025-02-20 03:48:58] [INFO] [--resume 
RESUME] [2025-02-20 03:48:58] [INFO] [--train_batch_size TRAIN_BATCH_SIZE] [2025-02-20 03:48:58] [INFO] [--max_token_length {None,150,225}] [2025-02-20 03:48:58] [INFO] [--mem_eff_attn] [--torch_compile] [2025-02-20 03:48:58] [INFO] [--dynamo_backend {eager,aot_eager,inductor,aot_ts_nvfuser,nvprims_nvfuser,cudagraphs,ofi,fx2trt,onnxrt,tensort,ipex,tvm}] [2025-02-20 03:48:58] [INFO] [--xformers] [--sdpa] [--vae VAE] [2025-02-20 03:48:58] [INFO] [--max_train_steps MAX_TRAIN_STEPS] [2025-02-20 03:48:58] [INFO] [--max_train_epochs MAX_TRAIN_EPOCHS] [2025-02-20 03:48:58] [INFO] [--max_data_loader_n_workers MAX_DATA_LOADER_N_WORKERS] [2025-02-20 03:48:58] [INFO] [--persistent_data_loader_workers] [--seed SEED] [2025-02-20 03:48:58] [INFO] [--gradient_checkpointing] [2025-02-20 03:48:58] [INFO] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [2025-02-20 03:48:58] [INFO] [--mixed_precision {no,fp16,bf16}] [--full_fp16] [2025-02-20 03:48:58] [INFO] [--full_bf16] [--fp8_base] [2025-02-20 03:48:58] [INFO] [--ddp_timeout DDP_TIMEOUT] [2025-02-20 03:48:58] [INFO] [--ddp_gradient_as_bucket_view] [2025-02-20 03:48:58] [INFO] [--ddp_static_graph] [--clip_skip CLIP_SKIP] [2025-02-20 03:48:58] [INFO] [--logging_dir LOGGING_DIR] [2025-02-20 03:48:58] [INFO] [--log_with {tensorboard,wandb,all}] [2025-02-20 03:48:58] [INFO] [--log_prefix LOG_PREFIX] [2025-02-20 03:48:58] [INFO] [--log_tracker_name LOG_TRACKER_NAME] [2025-02-20 03:48:58] [INFO] [--wandb_run_name WANDB_RUN_NAME] [2025-02-20 03:48:58] [INFO] [--log_tracker_config LOG_TRACKER_CONFIG] [2025-02-20 03:48:58] [INFO] [--wandb_api_key WANDB_API_KEY] [--log_config] [2025-02-20 03:48:58] [INFO] [--noise_offset NOISE_OFFSET] [2025-02-20 03:48:58] [INFO] [--noise_offset_random_strength] [2025-02-20 03:48:58] [INFO] [--multires_noise_iterations MULTIRES_NOISE_ITERATIONS] [2025-02-20 03:48:58] [INFO] [--ip_noise_gamma IP_NOISE_GAMMA] [2025-02-20 03:48:58] [INFO] [--ip_noise_gamma_random_strength] [2025-02-20 03:48:58] 
[INFO] [--multires_noise_discount MULTIRES_NOISE_DISCOUNT] [2025-02-20 03:48:58] [INFO] [--adaptive_noise_scale ADAPTIVE_NOISE_SCALE] [2025-02-20 03:48:58] [INFO] [--zero_terminal_snr] [2025-02-20 03:48:58] [INFO] [--min_timestep MIN_TIMESTEP] [2025-02-20 03:48:58] [INFO] [--max_timestep MAX_TIMESTEP] [2025-02-20 03:48:58] [INFO] [--loss_type {l1,l2,huber,smooth_l1}] [2025-02-20 03:48:58] [INFO] [--huber_schedule {constant,exponential,snr}] [2025-02-20 03:48:58] [INFO] [--huber_c HUBER_C] [--huber_scale HUBER_SCALE] [2025-02-20 03:48:58] [INFO] [--lowram] [--highvram] [2025-02-20 03:48:58] [INFO] [--sample_every_n_steps SAMPLE_EVERY_N_STEPS] [2025-02-20 03:48:58] [INFO] [--sample_at_first] [2025-02-20 03:48:58] [INFO] [--sample_every_n_epochs SAMPLE_EVERY_N_EPOCHS] [2025-02-20 03:48:58] [INFO] [--sample_prompts SAMPLE_PROMPTS] [2025-02-20 03:48:58] [INFO] [--sample_sampler {ddim,pndm,lms,euler,euler_a,heun,dpm_2,dpm_2_a,dpmsolver,dpmsolver++,dpmsingle,k_lms,k_euler,k_euler_a,k_dpm_2,k_dpm_2_a}] [2025-02-20 03:48:58] [INFO] [--config_file CONFIG_FILE] [--output_config] [2025-02-20 03:48:58] [INFO] [--metadata_title METADATA_TITLE] [2025-02-20 03:48:58] [INFO] [--metadata_author METADATA_AUTHOR] [2025-02-20 03:48:58] [INFO] [--metadata_description METADATA_DESCRIPTION] [2025-02-20 03:48:58] [INFO] [--metadata_license METADATA_LICENSE] [2025-02-20 03:48:58] [INFO] [--metadata_tags METADATA_TAGS] [2025-02-20 03:48:58] [INFO] [--prior_loss_weight PRIOR_LOSS_WEIGHT] [2025-02-20 03:48:58] [INFO] [--conditioning_data_dir CONDITIONING_DATA_DIR] [2025-02-20 03:48:58] [INFO] [--masked_loss] [--deepspeed] [2025-02-20 03:48:58] [INFO] [--zero_stage {0,1,2,3}] [2025-02-20 03:48:58] [INFO] [--offload_optimizer_device {None,cpu,nvme}] [2025-02-20 03:48:58] [INFO] [--offload_optimizer_nvme_path OFFLOAD_OPTIMIZER_NVME_PATH] [2025-02-20 03:48:58] [INFO] [--offload_param_device {None,cpu,nvme}] [2025-02-20 03:48:58] [INFO] [--offload_param_nvme_path OFFLOAD_PARAM_NVME_PATH] 
[2025-02-20 03:48:58] [INFO] [--zero3_init_flag] [--zero3_save_16bit_model] [2025-02-20 03:48:58] [INFO] [--fp16_master_weights_and_gradients] [2025-02-20 03:48:58] [INFO] [--optimizer_type OPTIMIZER_TYPE] [2025-02-20 03:48:58] [INFO] [--use_8bit_adam] [--use_lion_optimizer] [2025-02-20 03:48:58] [INFO] [--learning_rate LEARNING_RATE] [2025-02-20 03:48:58] [INFO] [--max_grad_norm MAX_GRAD_NORM] [2025-02-20 03:48:58] [INFO] [--optimizer_args [OPTIMIZER_ARGS ...]] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_type LR_SCHEDULER_TYPE] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_args [LR_SCHEDULER_ARGS ...]] [2025-02-20 03:48:58] [INFO] [--lr_scheduler LR_SCHEDULER] [2025-02-20 03:48:58] [INFO] [--lr_warmup_steps LR_WARMUP_STEPS] [2025-02-20 03:48:58] [INFO] [--lr_decay_steps LR_DECAY_STEPS] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_num_cycles LR_SCHEDULER_NUM_CYCLES] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_power LR_SCHEDULER_POWER] [2025-02-20 03:48:58] [INFO] [--fused_backward_pass] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_timescale LR_SCHEDULER_TIMESCALE] [2025-02-20 03:48:58] [INFO] [--lr_scheduler_min_lr_ratio LR_SCHEDULER_MIN_LR_RATIO] [2025-02-20 03:48:58] [INFO] [--dataset_config DATASET_CONFIG] [2025-02-20 03:48:58] [INFO] [--min_snr_gamma MIN_SNR_GAMMA] [2025-02-20 03:48:58] [INFO] [--scale_v_pred_loss_like_noise_pred] [2025-02-20 03:48:58] [INFO] [--v_pred_like_loss V_PRED_LIKE_LOSS] [2025-02-20 03:48:58] [INFO] [--debiased_estimation_loss] [2025-02-20 03:48:58] [INFO] [--weighted_captions] [2025-02-20 03:48:58] [INFO] [--cpu_offload_checkpointing] [--no_metadata] [2025-02-20 03:48:58] [INFO] [--save_model_as {None,ckpt,pt,safetensors}] [2025-02-20 03:48:58] [INFO] [--unet_lr UNET_LR] [2025-02-20 03:48:58] [INFO] [--text_encoder_lr [TEXT_ENCODER_LR ...]] [2025-02-20 03:48:58] [INFO] [--fp8_base_unet] [2025-02-20 03:48:58] [INFO] [--network_weights NETWORK_WEIGHTS] [2025-02-20 03:48:58] [INFO] [--network_module NETWORK_MODULE] [2025-02-20 03:48:58] 
[INFO] [--network_dim NETWORK_DIM] [2025-02-20 03:48:58] [INFO] [--network_alpha NETWORK_ALPHA] [2025-02-20 03:48:58] [INFO] [--network_dropout NETWORK_DROPOUT] [2025-02-20 03:48:58] [INFO] [--network_args [NETWORK_ARGS ...]] [2025-02-20 03:48:58] [INFO] [--network_train_unet_only] [2025-02-20 03:48:58] [INFO] [--network_train_text_encoder_only] [2025-02-20 03:48:58] [INFO] [--training_comment TRAINING_COMMENT] [2025-02-20 03:48:58] [INFO] [--dim_from_weights] [2025-02-20 03:48:58] [INFO] [--scale_weight_norms SCALE_WEIGHT_NORMS] [2025-02-20 03:48:58] [INFO] [--base_weights [BASE_WEIGHTS ...]] [2025-02-20 03:48:58] [INFO] [--base_weights_multiplier [BASE_WEIGHTS_MULTIPLIER ...]] [2025-02-20 03:48:58] [INFO] [--no_half_vae] [--skip_until_initial_step] [2025-02-20 03:48:58] [INFO] [--initial_epoch INITIAL_EPOCH] [2025-02-20 03:48:58] [INFO] [--initial_step INITIAL_STEP] [2025-02-20 03:48:58] [INFO] [--validation_seed VALIDATION_SEED] [2025-02-20 03:48:58] [INFO] [--validation_split VALIDATION_SPLIT] [2025-02-20 03:48:58] [INFO] [--validate_every_n_steps VALIDATE_EVERY_N_STEPS] [2025-02-20 03:48:58] [INFO] [--validate_every_n_epochs VALIDATE_EVERY_N_EPOCHS] [2025-02-20 03:48:58] [INFO] [--max_validation_steps MAX_VALIDATION_STEPS] [2025-02-20 03:48:58] [INFO] [--cache_text_encoder_outputs] [2025-02-20 03:48:58] [INFO] [--cache_text_encoder_outputs_to_disk] [2025-02-20 03:48:58] [INFO] [--text_encoder_batch_size TEXT_ENCODER_BATCH_SIZE] [2025-02-20 03:48:58] [INFO] [--disable_mmap_load_safetensors] [2025-02-20 03:48:58] [INFO] [--weighting_scheme {sigma_sqrt,logit_normal,mode,cosmap,none,uniform}] [2025-02-20 03:48:58] [INFO] [--logit_mean LOGIT_MEAN] [--logit_std LOGIT_STD] [2025-02-20 03:48:58] [INFO] [--mode_scale MODE_SCALE] [2025-02-20 03:48:58] [INFO] [--blocks_to_swap BLOCKS_TO_SWAP] [2025-02-20 03:48:58] [INFO] [--clip_l CLIP_L] [--t5xxl T5XXL] [--ae AE] [2025-02-20 03:48:58] [INFO] [--controlnet_model_name_or_path CONTROLNET_MODEL_NAME_OR_PATH] [2025-02-20 
03:48:58] [INFO] [--t5xxl_max_token_length T5XXL_MAX_TOKEN_LENGTH] [2025-02-20 03:48:58] [INFO] [--apply_t5_attn_mask] [2025-02-20 03:48:58] [INFO] [--guidance_scale GUIDANCE_SCALE] [2025-02-20 03:48:58] [INFO] [--timestep_sampling {sigma,uniform,sigmoid,shift,flux_shift}] [2025-02-20 03:48:58] [INFO] [--sigmoid_scale SIGMOID_SCALE] [2025-02-20 03:48:58] [INFO] [--model_prediction_type {raw,additive,sigma_scaled}] [2025-02-20 03:48:58] [INFO] [--discrete_flow_shift DISCRETE_FLOW_SHIFT] [2025-02-20 03:48:58] [INFO] [--split_mode] [2025-02-20 03:48:58] [INFO] flux_train_network.py: error: argument --output_name: expected one argument [2025-02-20 03:48:58] [INFO] Traceback (most recent call last): [2025-02-20 03:48:58] [INFO] File "", line 198, in _run_module_as_main [2025-02-20 03:48:58] [INFO] File "", line 88, in _run_code [2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Scripts\accelerate.exe\main.py", line 7, in [2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main [2025-02-20 03:48:58] [INFO] args.func(args) [2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command [2025-02-20 03:48:58] [INFO] simple_launcher(args) [2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher [2025-02-20 03:48:58] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) [2025-02-20 03:48:58] [INFO] subprocess.CalledProcessError: Command '['C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 
'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\outputs\\dataset.toml', '--output_dir', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\outputs', '--output_name', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 2. [2025-02-20 03:48:59] [ERROR] Command exited with code 1 [2025-02-20 03:48:59] [INFO] Runner: got this error
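A note on the log above: it ends with "argument --output_name: expected one argument", and in the launched command '--output_name' is followed directly by '--timestep_sampling'. So the output name seems to have been passed empty. A blank LoRA name field in the UI would explain that, but this is a guess, since the log does not say it. Here is a minimal sketch of how argparse reacts in that situation. It uses a hypothetical two-flag parser, not the real sd-scripts one:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real parser: only the two flags involved.
    parser = argparse.ArgumentParser(prog="flux_train_network.py")
    parser.add_argument("--output_name")
    parser.add_argument("--timestep_sampling")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or the exit code argparse raised."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code


# Value missing: the next token looks like a flag, so argparse exits with status 2.
broken = try_parse(["--output_name", "--timestep_sampling", "shift"])

# Value present: parsing succeeds.
fixed = try_parse(["--output_name", "mylora", "--timestep_sampling", "shift"])
```

The exit status 2 matches the "returned non-zero exit status 2" line in the log, so giving the LoRA a name before starting training is the first thing to try.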

Anton Karpuzikov

2025-02-20 01:51:06 +0000 UTC

OK, then try this: go inside the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder, look for the folder called "models--multimodalart--Florence-2-large-no-flash-attn" and delete it. Then, while still inside the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder, click on the folder path, type cmd and press Enter. This opens a command prompt window inside that folder. In it, type these two commands, one after the other:
git lfs install
git clone https://huggingface.co/Aitrepreneur/Florence-2-large-no-flash-attn
The download takes a few minutes. Once everything has been downloaded (the whole folder should be 1.4 GB), rename that folder from Florence-2-large-no-flash-attn to models--multimodalart--Florence-2-large-no-flash-attn. Then you can relaunch Fluxgym.

Aitrepreneur

2025-02-19 23:59:41 +0000 UTC

No, you need to train video LoRAs specifically for the Hunyuan model. Each model needs its own LoRA training, since every architecture is different.

Aitrepreneur

2025-02-19 23:09:18 +0000 UTC

As in my other comment, there was still an issue with the dual-GPU system. I installed on a single 4080 system with the same images and it worked fine. I am sure there is extra stuff I could do to make it work, but the 4080 is fine.

NewBe2

2025-02-19 20:40:30 +0000 UTC

It's still the dual 4090. I installed on a single 4080 system and it runs fine. Same images, same settings.

NewBe2

2025-02-19 20:36:16 +0000 UTC

Can you elaborate or link to an article that defines what "weird image dimensions" are? I believe I'm having a similar issue. I was successful on a smaller dataset of 40 images, but I tried to expand it to 60 images using new images that had to have a significant amount of the image cropped out, so the aspect ratio of some of the new images is >2:1. Once everything is run through the bucket, that makes the resolution >2048:1024.

SenoTakai

2025-02-19 20:18:17 +0000 UTC

Log in to JupyterLab (Connect) and click on workspace/fluxgym. There is an outputs folder. Right-click it and choose download ;-)

Marc

2025-02-19 19:29:23 +0000 UTC

Great video! As I am using Runpod, where can I find the LORAs once the training is done?

Leonardo Piumi

2025-02-19 17:01:26 +0000 UTC

Same. I've tried everything. I reached out to Aitrepreneur and he said to delete the cache in C:\Users\YOURUSERNAME\.cache\huggingface\hub, but that didn't work for me either. I'm currently reinstalling to see if that helps.

Reign2294

2025-02-19 15:56:08 +0000 UTC

I can't get florence 2 to work inside Gymflux. Sometimes it starts to download, other times it crashes out right away: run_captioning concept sentence captions ('', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '') device=cuda pytorch_model.bin: 0%| | 0.00/1.54G [00:13

JW

2025-02-19 14:00:13 +0000 UTC

Yeah, the trained LoRAs I have created of my wife work fantastically when generating images via F8 & GGUF Image Generation workflows, but they don't seem to work at all when I try to use them with Hunyuan Text to Video. Is this because the LoRAs created in Fluxgym are incompatible? Rgthree perhaps? If so, can we convert them?

Osvaldo Alfaro

2025-02-19 06:22:03 +0000 UTC

[2025-02-18 21:35:44] [INFO] Running C:\Users\chat1\ai\fluxgym\outputs\robink\train.bat [2025-02-18 21:35:44] [INFO] [2025-02-18 21:35:44] [INFO] (env) C:\Users\chat1\ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 --num_processes=1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Users\chat1\ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Users\chat1\ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Users\chat1\ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Users\chat1\ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 16 --optimizer_type adamw8bit --learning_rate 5e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 10 --save_every_n_epochs 1 --dataset_config "C:\Users\chat1\ai\fluxgym\outputs\robink\dataset.toml" --output_dir "C:\Users\chat1\ai\fluxgym\outputs\robink" --output_name robink --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 --enable_bucket --min_snr_gamma 5 --multires_noise_discount 0.3 --multires_noise_iterations 6 --noise_offset 0.1 --train_batch_size 2 [2025-02-18 21:35:48] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead: [2025-02-18 21:35:48] [INFO] `--num_machines` was set to a value of `1` [2025-02-18 21:35:48] [INFO] `--dynamo_backend` was set to a value of `'no'` [2025-02-18 21:35:48] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. 
[2025-02-18 21:35:51] [INFO] 2025-02-18 21:35:51 INFO highvram is enabled / train_util.py:4305 [2025-02-18 21:35:51] [INFO] highvramが有効です [2025-02-18 21:35:51] [INFO] WARNING cache_latents_to_disk is train_util.py:4322 [2025-02-18 21:35:51] [INFO] enabled, so cache_latents is [2025-02-18 21:35:51] [INFO] also enabled / [2025-02-18 21:35:51] [INFO] cache_latents_to_diskが有効なた [2025-02-18 21:35:51] [INFO] め、cache_latentsを有効にします [2025-02-18 21:35:51] [INFO] 2025-02-18 21:35:51 INFO Checking the state dict: flux_utils.py:43 [2025-02-18 21:35:51] [INFO] Diffusers or BFL, dev or schnell [2025-02-18 21:35:51] [INFO] INFO t5xxl_max_token_length: flux_train_network.py:152 [2025-02-18 21:35:51] [INFO] 512 [2025-02-18 21:35:51] [INFO] C:\Users\chat1\ai\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884 [2025-02-18 21:35:51] [INFO] warnings.warn( [2025-02-18 21:35:51] [INFO] You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 [2025-02-18 21:35:51] [INFO] INFO Loading dataset config from train_network.py:446 [2025-02-18 21:35:51] [INFO] C:\Users\chat1\ai\fluxgym\out [2025-02-18 21:35:51] [INFO] puts\robink\dataset.toml [2025-02-18 21:35:51] [INFO] INFO prepare images. 
train_util.py:2062 [2025-02-18 21:35:51] [INFO] INFO get image size from name of train_util.py:1951 [2025-02-18 21:35:51] [INFO] cache files [2025-02-18 21:35:51] [INFO] 0%| | 0/24 [00:00 [2025-02-18 21:35:51] [INFO] trainer.train(args) [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\sd-scripts\train_network.py", line 521, in train [2025-02-18 21:35:51] [INFO] accelerator = train_util.prepare_accelerator(args) [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\sd-scripts\library\train_util.py", line 5384, in prepare_accelerator [2025-02-18 21:35:51] [INFO] accelerator = Accelerator( [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\accelerator.py", line 383, in init [2025-02-18 21:35:51] [INFO] self.state = AcceleratorState( [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\state.py", line 846, in init [2025-02-18 21:35:51] [INFO] PartialState(cpu, **kwargs) [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\state.py", line 270, in init [2025-02-18 21:35:51] [INFO] self.num_processes = torch.distributed.get_world_size() [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 2020, in get_world_size [2025-02-18 21:35:51] [INFO] return _get_group_size(group) [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 986, in _get_group_size [2025-02-18 21:35:51] [INFO] default_pg = _get_default_group() [2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 1150, in _get_default_group [2025-02-18 21:35:51] [INFO] raise ValueError( [2025-02-18 21:35:51] [INFO] ValueError: Default process group has not been initialized, please make sure to call init_process_group. 
[2025-02-18 21:35:52] [INFO] Traceback (most recent call last): [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main [2025-02-18 21:35:52] [INFO] return _run_code(code, main_globals, None, [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code [2025-02-18 21:35:52] [INFO] exec(code, run_globals) [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\Scripts\accelerate.exe\main.py", line 7, in [2025-02-18 21:35:52] [INFO] sys.exit(main()) [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main [2025-02-18 21:35:52] [INFO] args.func(args) [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command [2025-02-18 21:35:52] [INFO] simple_launcher(args) [2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in ensimple_launcher [2025-02-18 21:35:52] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) [2025-02-18 21:35:52] [INFO] subprocess.CalledProcessError: Command '['C:\\Users\\chat1\\ai\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 
'networks.lora_flux', '--network_dim', '16', '--optimizer_type', 'adamw8bit', '--learning_rate', '5e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '10', '--save_every_n_epochs', '1', '--dataset_config', 'C:\\Users\\chat1\\ai\\fluxgym\\outputs\\robink\\dataset.toml', '--output_dir', 'C:\\Users\\chat1\\ai\\fluxgym\\outputs\\robink', '--output_name', 'robink', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2', '--enable_bucket', '--min_snr_gamma', '5', '--multires_noise_discount', '0.3', '--multires_noise_iterations', '6', '--noise_offset', '0.1', '--train_batch_size', '2']' returned non-zero exit status 1. [2025-02-18 21:35:52] [ERROR] Command exited with code 1 [2025-02-18 21:35:52] [INFO] Runner: So I added --num_processes=1 and it got past the multiple-GPU issue. However, I ran into the one above. I have tried different images, different numbers of images, different resolutions, your suggested changes, and default values. I even closed and restarted the app between runs. Always the same error.

NewBe2

2025-02-19 06:02:07 +0000 UTC

Sweet thanks! It works :)

Verratanectu

2025-02-19 03:50:05 +0000 UTC

This is probably a bad initial python install, you need to uninstall your current python installation and reinstall it correctly. Go to the add and remove programs, search for python and uninstall both the current python version and the python install program. Once this is done, go here and download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe Run it and check the “Add python 3.10 to Path” checkbox and continue with the installation. You can check that the right python version is installed by opening a new command prompt window and typing: python --version and it should give you the 3.10.11 version Then just relaunch the 1-click installer in a new folder and try again.
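If you prefer to check the version from a script instead of reading the output of python --version, here is a small sketch. The file name check_python.py and the helper names are made up for illustration, and it assumes the 3.10 series is what the installer expects, as the download link above suggests:

```python
import sys

REQUIRED = (3, 10)  # version series targeted by the installer link above


def is_expected_python(version_info=sys.version_info, required=REQUIRED) -> bool:
    """True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


def describe(version_info=sys.version_info) -> str:
    """Human-readable summary such as 'Python 3.10.11: OK'."""
    found = ".".join(str(part) for part in version_info[:3])
    status = "OK" if is_expected_python(version_info) else "unexpected"
    return f"Python {found}: {status}"


if __name__ == "__main__":
    print(describe())
```

Save it as check_python.py and run python check_python.py in a new command prompt. If Windows still answers "Python was not found", the PATH checkbox step did not take effect.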

Aitrepreneur

2025-02-19 03:37:12 +0000 UTC

Got the following error Starting FluxGym installation... Python already installed. Git already installed. Cloning FluxGym repository... Cloning into 'fluxgym'... remote: Enumerating objects: 271, done. remote: Counting objects: 100% (156/156), done. remote: Compressing objects: 100% (59/59), done. remote: Total 271 (delta 127), reused 97 (delta 97), pack-reused 115 (from 2) Receiving objects: 100% (271/271), 16.52 MiB | 37.17 MiB/s, done. Resolving deltas: 100% (156/156), done. Cloning SD Scripts repository... Cloning into 'sd-scripts'... remote: Enumerating objects: 9097, done. remote: Counting objects: 100% (37/37), done. remote: Compressing objects: 100% (22/22), done. remote: Total 9097 (delta 29), reused 17 (delta 15), pack-reused 9060 (from 3) Receiving objects: 100% (9097/9097), 11.14 MiB | 31.77 MiB/s, done. Resolving deltas: 100% (6592/6592), done. Downloading launcher... % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 52 100 52 0 0 172 0 --:--:-- --:--:-- --:--:-- 172 Creating Python virtual environment... Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases. Error: Failed to create virtual environment. Press any key to continue . . .

Verratanectu

2025-02-19 03:19:44 +0000 UTC

https://c.tenor.com/cwoN93BINOMAAAAC/tenor.gif

Aitrepreneur

2025-02-19 02:46:44 +0000 UTC

Are there any plans to make something like this for Hunyuan Video? That would be awesome.

Bruce

2025-02-19 02:42:24 +0000 UTC

Yes, I'm using the 16GB training preset. I also tried activating --full_bf16 (I couldn't figure out a way to make fp16 work), --fused_backward_pass and --xformers, and lowering network_dim to 10. I noticed a small improvement, but it still takes around 1 hour to train 1 epoch. I also noticed that when I have more than 1 picture in the training batch, it tends to offload some of the training onto my RAM, slowing the whole process, so I kept it at 1. The problem is that I'm not sure roughly how long it is supposed to take to train 1 epoch with this setup. Maybe it's normal that it takes so much time, but I was hoping this new graphics card was strong enough to handle more AI work. (Same issue with Hunyuan, where it takes ages to generate anything...) On the other hand, image generation with Flux Q8 in GGUF takes around 50s for one image. Let me know if you need any other specific information, and thanks again for your help!

LeGregouz

2025-02-19 01:33:28 +0000 UTC

Edited app.py to add the parameter. It crashed at a different spot. I might have issues with the training data. Will work on it and get back.

NewBe2

2025-02-19 01:04:12 +0000 UTC

It's because you have multiple GPUs, and it doesn't really support that correctly. To fix this, you need to edit the app.py file. At line 453, press Enter and put: --num_processes=1 {line_break} so that it looks like this: sh = f"""accelerate launch {line_break} --num_processes=1 {line_break} --mixed_precision bf16 {line_break} Then save the file and reload Fluxgym.
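To make the edit easier to picture, here is a rough sketch of how the start of that command string is assembled once the extra flag is in place. The function name build_launch_command is made up for illustration, and "^" as the value of line_break (the cmd.exe continuation character) is an assumption, since app.py defines its own value. The other flags are copied from the launch command in the log further down the thread:

```python
def build_launch_command(line_break: str = "^", num_processes: int = 1) -> str:
    """Assemble the first lines of the accelerate command.

    `line_break` is the shell's line-continuation character. "^" is an
    assumption for Windows cmd.exe; app.py sets its own value.
    """
    return (
        f"accelerate launch {line_break}\n"
        f"  --num_processes={num_processes} {line_break}\n"
        f"  --mixed_precision bf16 {line_break}\n"
        f"  --num_cpu_threads_per_process 1 {line_break}\n"
        f"  sd-scripts/flux_train_network.py"
    )


command = build_launch_command()
```

With --num_processes=1 passed explicitly, accelerate no longer has to fall back to the default it picked in the multi-GPU log (a value of 2), which is what triggered the multi-GPU launcher.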

Aitrepreneur

2025-02-19 00:58:45 +0000 UTC

stops before it starts training with the following log [2025-02-18 16:41:20] [INFO] (env) C:\Users\chat1\ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Users\chat1\ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Users\chat1\ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Users\chat1\ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Users\chat1\ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "C:\Users\chat1\ai\fluxgym\outputs\robin1\dataset.toml" --output_dir "C:\Users\chat1\ai\fluxgym\outputs\robin1" --output_name robin1 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 [2025-02-18 16:41:23] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead: [2025-02-18 16:41:23] [INFO] `--num_processes` was set to a value of `2` [2025-02-18 16:41:23] [INFO] More than one GPU was found, enabling multi-GPU training. [2025-02-18 16:41:23] [INFO] If this was unintended please pass in `--num_processes=1`. [2025-02-18 16:41:23] [INFO] `--num_machines` was set to a value of `1` [2025-02-18 16:41:23] [INFO] `--dynamo_backend` was set to a value of `'no'` [2025-02-18 16:41:23] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`. 
[2025-02-18 16:41:23] [INFO] W0218 16:41:23.645788 4916 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
[2025-02-18 16:41:25] [INFO] Traceback (most recent call last):
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2025-02-18 16:41:25] [INFO] return _run_code(code, main_globals, None,
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
[2025-02-18 16:41:25] [INFO] exec(code, run_globals)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2025-02-18 16:41:25] [INFO] sys.exit(main())
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-18 16:41:25] [INFO] args.func(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1097, in launch_command
[2025-02-18 16:41:25] [INFO] multi_gpu_launcher(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 734, in multi_gpu_launcher
[2025-02-18 16:41:25] [INFO] distrib_run.run(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\run.py", line 910, in run
[2025-02-18 16:41:25] [INFO] elastic_launch(
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\launcher\api.py", line 138, in __call__
[2025-02-18 16:41:25] [INFO] return launch_agent(self._config, self._entrypoint, list(args))
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\launcher\api.py", line 260, in launch_agent
[2025-02-18 16:41:25] [INFO] result = agent.run()
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 696, in run
[2025-02-18 16:41:25] [INFO] result = self._invoke_run(role)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 849, in _invoke_run
[2025-02-18 16:41:25] [INFO] self._initialize_workers(self._worker_group)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 668, in _initialize_workers
[2025-02-18 16:41:25] [INFO] self._rendezvous(worker_group)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 500, in _rendezvous
[2025-02-18 16:41:25] [INFO] rdzv_info = spec.rdzv_handler.next_rendezvous()
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
[2025-02-18 16:41:25] [INFO] self._store = TCPStore( # type: ignore[call-arg]
[2025-02-18 16:41:25] [INFO] RuntimeError: use_libuv was requested but PyTorch was build without libuv support
[2025-02-18 16:41:26] [ERROR] Command exited with code 1
[2025-02-18 16:41:26] [INFO] Runner:
The only difference between my system and yours is I have dual 4090's. Not sure if that is the issue but I can't manually change the num_processes=1.
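
The traceback above goes through `multi_gpu_launcher`, which suggests that accelerate picked up both GPUs. One thing that might be worth trying is hiding the second card from the training process with the standard `CUDA_VISIBLE_DEVICES` variable. This is only a sketch under that assumption; it has not been confirmed in this thread as a fix for the libuv error, and `single_gpu_env` is a hypothetical helper name.

```python
import os

def single_gpu_env(base_env=None, gpu_index=0):
    # Copy the environment and expose only one GPU to child processes.
    # CUDA_VISIBLE_DEVICES is read by CUDA itself; the helper name is made up.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when launching `app.py`, or the same variable can be set in the command prompt before starting Fluxgym.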

NewBe2

2025-02-19 00:45:30 +0000 UTC

I saw on runpod that stopping the pod kinda destroys the env, so it's better to just completely delete the pod and redo the install each time unfortunately. Otherwise you can also do this, once you are inside the fluxgym folder, click on the terminal icon on the right then type:
cd sd-scripts
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python app.py
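
For anyone who restarts the same pod often, the commands in this reply could be wrapped in a small script. This is a minimal sketch, assuming it is run with the Python interpreter of the fluxgym environment; `restart_commands` and `run` are hypothetical names and not part of Fluxgym.

```python
import subprocess
import sys
from pathlib import Path

def restart_commands(fluxgym_dir):
    # Hypothetical helper: (working directory, command) pairs in the same
    # order as the commands listed in the reply above.
    root = Path(fluxgym_dir)
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        (root / "sd-scripts", pip + ["-r", "requirements.txt"]),
        (root, pip + ["-r", "requirements.txt"]),
        (root, pip + ["--pre", "torch", "torchvision", "torchaudio",
                      "--index-url", "https://download.pytorch.org/whl/cu121"]),
        (root, [sys.executable, "app.py"]),
    ]

def run(fluxgym_dir):
    # Stop at the first failing step so its error message stays visible.
    for cwd, cmd in restart_commands(fluxgym_dir):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling pip as `sys.executable -m pip` is a design choice here: it keeps the installs tied to the interpreter that runs the script, which may help with the "No module named 'gradio'" error reported further down when the wrong environment is active.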

Aitrepreneur

2025-02-19 00:26:27 +0000 UTC

Looks like there is a problem with your dataset, seems like you might have some weird image dimension for 1 or multiple images. Make sure your images aren't too big either for training.

Aitrepreneur

2025-02-19 00:24:26 +0000 UTC

It's difficult to say just from this message, especially since Fluxgym is so bad at showing progress. Have you chosen the specific 16GB or 12GB training presets in Fluxgym?

Aitrepreneur

2025-02-19 00:22:41 +0000 UTC

Yes, for that particular error that is one of the possible fixes (tbh not even sure why that value isn't set to 0 by default already in the project but oh well...)

Aitrepreneur

2025-02-19 00:20:42 +0000 UTC

What do you mean? You can't just train multiple loras at the same time (if that's what you mean...), it's better to just train a lora separately for each object and then use them inside the prompt. Or you can just merge those loras inside the flux model as well but you will lose the flexibility of a lora. Also you can already use multiple loras together, so I'm not sure about your question.

Aitrepreneur

2025-02-19 00:19:39 +0000 UTC

go inside the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder and look for the folder called "models--multimodalart--Florence-2-large-no-flash-attn" and delete it. Then try again. You can also use something like everything.exe (https://www.voidtools.com/downloads) to search for "models--multimodalart--Florence-2-large-no-flash-attn" and delete the folder
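
The same cleanup can be done from Python. This is a minimal sketch, assuming the Hugging Face cache sits in its default location under the user's home folder (it can be elsewhere if `HF_HOME` or a similar variable is set); `remove_cached_model` is a hypothetical helper name.

```python
import shutil
from pathlib import Path

# Folder name taken from the reply above.
FOLDER = "models--multimodalart--Florence-2-large-no-flash-attn"

def remove_cached_model(hub_dir=None):
    # Default location named in the reply: <home>/.cache/huggingface/hub
    hub = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "huggingface" / "hub"
    target = hub / FOLDER
    if target.is_dir():
        shutil.rmtree(target)  # the model is downloaded again on the next run
        return True
    return False
```

It returns `False` when the folder is not found, which is a hint that the cache lives somewhere else and a file search is the better route.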

Aitrepreneur

2025-02-19 00:15:55 +0000 UTC

just select them, then right click and download.

Aitrepreneur

2025-02-19 00:12:00 +0000 UTC

I think there is a limit of 150 for the number of images you can use in fluxgym. You don't need more anyway

Aitrepreneur

2025-02-19 00:11:34 +0000 UTC

send me a dm

Aitrepreneur

2025-02-19 00:10:02 +0000 UTC

quality beats quantity, if you already have at least 20 images, just add more varied photos, don't just use selfies, use different angles, different lighting, etc but always as high quality as possible

Aitrepreneur

2025-02-19 00:09:49 +0000 UTC

same question as healingpaint, what's your installed python version? You need the 3.10.11, and added to path correctly.

Aitrepreneur

2025-02-19 00:08:11 +0000 UTC

I saw on runpod that stopping the pod kinda destroys the env, so it's better to just completely delete the pod and redo the install each time unfortunately. Otherwise you can also do this, once you are inside the fluxgym folder, click on the terminal icon on the right then type:
cd sd-scripts
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python app.py

Aitrepreneur

2025-02-19 00:06:55 +0000 UTC

Does not work - maybe there is no activated environment? If so - how do we activate it? :-)

Marc

2025-02-18 19:52:45 +0000 UTC

Hi, newbie here. How do I restart Fluxgym after I stop the pod and run the same pod again? All the files are still there. I tried entering the env and installing the requirements, then it got stuck.....

Nathan lee

2025-02-18 16:41:38 +0000 UTC

[2025-02-18 17:36:21] [INFO] (env) C:\Ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --split_mode --network_args "train_blocks=single" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 10 --save_every_n_epochs 1 --dataset_config "C:\Ai\fluxgym\outputs\nelly-trinket\dataset.toml" --output_dir "C:\Ai\fluxgym\outputs\nelly-trinket" --output_name nelly-trinket --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 --enable_bucket --huggingface_repo_visibility private --min_snr_gamma 5 --multires_noise_discount 0.3 --multires_noise_iterations 6 --noise_offset 0.1
[2025-02-18 17:36:24] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2025-02-18 17:36:24] [INFO] `--num_processes` was set to a value of `1`
[2025-02-18 17:36:24] [INFO] `--num_machines` was set to a value of `1`
[2025-02-18 17:36:24] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2025-02-18 17:36:24] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2025-02-18 17:36:26] [INFO] Traceback (most recent call last):
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\flux_train_network.py", line 14, in <module>
[2025-02-18 17:36:26] [INFO] import train_network
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\train_network.py", line 26, in <module>
[2025-02-18 17:36:26] [INFO] from library import deepspeed_utils, model_util, strategy_base, strategy_sd
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\library\strategy_sd.py", line 7, in <module>
[2025-02-18 17:36:26] [INFO] from library import train_util
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\library\train_util.py", line 291
[2025-02-18 17:36:26] [INFO] raise ValueError(f"Invalid image dimensions: {image_width}x{image_height} in file {self.image_path}")
[2025-02-18 17:36:26] [INFO] IndentationError: expected an indented block after 'if' statement on line 290
[2025-02-18 17:36:27] [INFO] Traceback (most recent call last):
[2025-02-18 17:36:27] [INFO] File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2025-02-18 17:36:27] [INFO] return _run_code(code, main_globals, None,
[2025-02-18 17:36:27] [INFO] File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
[2025-02-18 17:36:27] [INFO] exec(code, run_globals)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2025-02-18 17:36:27] [INFO] sys.exit(main())
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-18 17:36:27] [INFO] args.func(args)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2025-02-18 17:36:27] [INFO] simple_launcher(args)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2025-02-18 17:36:27] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2025-02-18 17:36:27] [INFO] subprocess.CalledProcessError: Command '['C:\\Ai\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'C:\\Ai\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\Ai\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\Ai\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\Ai\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '10', '--save_every_n_epochs', '1', '--dataset_config', 'C:\\Ai\\fluxgym\\outputs\\nelly-trinket\\dataset.toml', '--output_dir', 'C:\\Ai\\fluxgym\\outputs\\nelly-trinket', '--output_name', 'nelly-trinket', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2', '--enable_bucket', '--huggingface_repo_visibility', 'private', '--min_snr_gamma', '5', '--multires_noise_discount', '0.3', '--multires_noise_iterations', '6', '--noise_offset', '0.1']' returned non-zero exit status 1.
[2025-02-18 17:36:28] [ERROR] Command exited with code 1
[2025-02-18 17:36:28] [INFO] Runner:
Get this error...

Killy_Blame

2025-02-18 16:38:07 +0000 UTC

Hi! In my system (AMD + RTX3090 + Win 11) it is stopping while running the 'write web request' with V2. Any help is very much appreciated.

Speedy2023

2025-02-18 05:29:13 +0000 UTC

Hello, Thank you very much for all the hard work, everything is easy to use all the time and that's absolutely great! So far no issue with any of the one-click installs you made! But I still have a question regarding training LoRAs for Flux. I know it's supposed to take time to train a LoRA, but setting up everything as you showed, it still takes a tremendous amount of time for me. I have an ASUS TUF Gaming GeForce RTX 4080 OC edition with 16GB of VRAM and I thought it might take a couple of hours to run at least 5 epochs, but so far it's been like 2 hours and I only have 1 epoch trained. I tried it with a dataset of thirty pictures and with a train batch size set to null and another one set to 4. Everything seems to work fine, but I don't see any difference in the time it takes to do one epoch, whether I increase the train batch size or not. I feel like with a 4080 SUPER with 16GB of VRAM it should take less time, but maybe I'm wrong. Is there something I'm missing? Maybe my GPU isn't giving all its potential for whatever reason (despite being at 100% in Task Manager)? Or is it absolutely normal? Also, can my CPU (in my case a Ryzen 9 7950X3D) help in any way? Also, for example, are the options --full_fp16, --fused_backward_pass or --xformers viable solutions to improve speed without losing too much quality? Thank you for your answer! (And if anyone else has any information about the time it should take with my setup or just how much time it roughly takes for you with your setup, feel free to share! It could give everyone a rough idea of how long it takes to train one epoch for each setup 😊)

LeGregouz

2025-02-18 00:59:04 +0000 UTC

to get it to work on my machine, I just edited the app.py file, near the top. look for this HF_HUB_ENABLE_HF_TRANSFER and set the value to 0
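
In code terms, the edit described here amounts to something like the sketch below. The exact line in `app.py` may look different, and `hf_transfer_enabled` is only a hypothetical helper for checking the value; the list of "truthy" spellings in it is an assumption, not the library's own parser.

```python
import os

# Needs to run before huggingface_hub is imported, since the library reads
# this variable when it is first loaded.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

def hf_transfer_enabled():
    # Hypothetical check: treat common "true" spellings as enabled.
    value = os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", "0")
    return value.strip().upper() in {"1", "ON", "YES", "TRUE"}
```

With the value at 0 the downloads go through the regular, slower path, which according to the error message itself gives better error handling.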

Bob Winberry

2025-02-17 23:09:29 +0000 UTC

Same

Alex Kilbee

2025-02-17 19:00:23 +0000 UTC

Episode Topic Suggestion: "Multi-LORAs" , i.e. training multiple LORA , one for each object and then using multiple LORAs to make image with e.g. three newly trained objects? (here is some preliminary research I asked perplexity and deepresearch answered that it should be possible: https://www.perplexity.ai/search/i-see-tutorials-training-makin-SHc0lJBHR0SRd9ww6_0PdA )

Grzegorz Wierzowiecki

2025-02-17 18:43:19 +0000 UTC

When I click on add AI captions it errors with Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn' .... it's not downloading it
return model_class.from_pretrained(
File "E:\AI\FluxGym-Training\fluxgym\env\lib\site-packages\transformers\modeling_utils.py", line 3644, in from_pretrained
raise EnvironmentError(
OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Robb

2025-02-17 18:13:27 +0000 UTC

Same, commenting in case somebody finds the solution ✋

BS

2025-02-17 16:50:03 +0000 UTC

Hi, I have a really basic question. I have trained a LoRA on runpod using Fluxgym and I can see the .safetensors files in the code notebook, but how do I download them?

Siddharth Shukla

2025-02-17 15:50:26 +0000 UTC

In K's video he uses 40 images for LoRa training. Is it possible to use say 100 or even 200 images without getting tuple index errors?

LW

2025-02-17 14:51:26 +0000 UTC

Actually, I haven't used Florence-2 yet. Working with some old hand-edited images. It seems that Florence must go in the folder "multimodalart", so you could try searching for that folder. If the folder is there but empty, you might be able to download the required model (Florence-2-large-no-flash-attn) directly from HuggingFace.

LW

2025-02-17 14:26:46 +0000 UTC

Installed flawlessly, already have two runs and they worked fantastic. Only took between 20-30mins for each run on my setup. Really appreciate this.

Michael Moeller

2025-02-17 07:14:25 +0000 UTC

V2 still does nothing other than download the Python installation program. And nothing else...

Bobépine

2025-02-17 07:10:24 +0000 UTC

looks like it worked. I trained it on images of myself. Some of them don't work. This is the first time I've attempted to train a lora, so any ideas on getting better results? More images?

dattrax

2025-02-17 01:51:58 +0000 UTC

FYI, fixed by updating to Python 3.10.11 and making sure it was in my path as primary (top most)

HealingPaint

2025-02-17 00:51:33 +0000 UTC

Thanks I installed it, but without the uninstall part and made sure it was on top of the path/environmental variables and this seemed to have resolved it. Appreciate it!

HealingPaint

2025-02-17 00:51:10 +0000 UTC

I'm getting " Traceback (most recent call last): File "/workspace/fluxgym/app.py", line 8, in import gradio as gr ModuleNotFoundError: No module named 'gradio'" when typing that into the runpod terminal

Tyler89537

2025-02-16 23:50:15 +0000 UTC

you need to uninstall your current python installation and reinstall it correctly. Go to the add and remove programs, search for python and uninstall both the current python version and the python install program. Once this is done, go here and download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe Run it and check the “Add python 3.10 to Path” checkbox and continue with the installation. You can check that the right python version is installed by opening a new command prompt window and typing: python --version and it should give you the 3.10.11 version Then just relaunch the 1-click installer in a new folder and try again.
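
Besides `python --version`, a short script can show which interpreter is actually picked up first on the path. This is a minimal sketch; `python_ok` is a hypothetical helper, and it only compares the major and minor version against the 3.10 line recommended in this thread.

```python
import sys

REQUIRED = (3, 10)  # the thread recommends Python 3.10.11

def python_ok(version_info=sys.version_info):
    # Compare major.minor only; a 3.9 install was reported as failing here.
    return tuple(version_info[:2]) == REQUIRED

if __name__ == "__main__":
    print(sys.executable)  # path of the interpreter that was found first
    print(sys.version.split()[0], "OK" if python_ok() else "not 3.10.x")
```

If the printed path points at a different Python than the one just installed, the path order is the likely culprit rather than the installer.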

Aitrepreneur

2025-02-16 23:37:40 +0000 UTC

Is there a quick command prompt or powershell way to update this version and add to path? Thanks for the help.

HealingPaint

2025-02-16 23:30:29 +0000 UTC

Ahh I'm using Python 3.9.19

HealingPaint

2025-02-16 23:29:12 +0000 UTC

what's your installed python version? You need the 3.10.11, added to path correctly as well of course

Aitrepreneur

2025-02-16 23:24:02 +0000 UTC

My issue still happens...Any fix for this?
Installation completed successfully
Launching application...
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\_templates\LORA Training (Flux)\fluxgym\app.py", line 19, in <module>
from library import flux_train_utils, huggingface_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_train_utils.py", line 17, in <module>
from library import flux_models, flux_utils, strategy_base, train_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 366, in <module>
class ModelSpec:
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 369, in ModelSpec
ckpt_path: str | None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Press any key to continue . . .

HealingPaint

2025-02-16 23:07:32 +0000 UTC

unless actually modifying the whole app.py file no. Maybe creating a quick macro for the browser?

Aitrepreneur

2025-02-16 21:23:03 +0000 UTC

I uploaded the V2 installer, might solve this error

Aitrepreneur

2025-02-16 21:22:05 +0000 UTC

type: python app.py

Aitrepreneur

2025-02-16 21:21:46 +0000 UTC

Yes I just uploaded the V2, it should take care of the huggingface issue during the install, otherwise there is an additional option to change in the actual fluxgym repository, people can dm me for the rest

Aitrepreneur

2025-02-16 21:21:18 +0000 UTC

You should have DMed me for that. One thing you can do is open a cmd window and drag and drop the installer inside then press enter, this will at least keep the window from closing and will give at least an error message we can use to troubleshoot the issue.

Aitrepreneur

2025-02-16 21:10:27 +0000 UTC

it's crystools

Aitrepreneur

2025-02-16 21:09:06 +0000 UTC

it's a separate webui

Aitrepreneur

2025-02-16 21:07:57 +0000 UTC

as I said it will work with 8gb, it will just take longer but it works

Aitrepreneur

2025-02-16 21:07:47 +0000 UTC

Got it working. Had a ChatGPT session and we got it all worked out. 👍

MJ

2025-02-16 20:54:28 +0000 UTC

Seems like this is broken for a lot of people . Any fix coming?

HealingPaint

2025-02-16 20:49:32 +0000 UTC

Question. Do you have to adjust the advance settings every time or is there a way to save the settings?

Virtamouse

2025-02-16 20:41:17 +0000 UTC

Ok so the problem with Network Volumes is that we cannot just run the same script you've provided because the file paths already exist... So the environment and whatever else still needs to be initialized or installed onto the pod, even though the files are already there.... I've messed with this for quite some time, but Perplexity is letting me down. I've been unable to make a new script that will download the dependencies and not re-download all the files that already exist... Any suggestions?

Tyler

2025-02-16 20:12:50 +0000 UTC

Found a solution from some guy on reddit :D Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables. Click Environment Variables.... Under User variables or System variables, click New. Set: Variable name: HF_HUB_ENABLE_HF_TRANSFER Variable value: 0

MPG

2025-02-16 19:29:55 +0000 UTC

This is the error I get:
File "G:\FUXgym\fluxgym\env\lib\site-packages\huggingface_hub\file_download.py", line 437, in http_get
raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

patreon@winberry.com

2025-02-16 18:25:10 +0000 UTC

@LW I am having trouble downloading the Florence-2. Does anyone know which exact folder that should go into via manual download? OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

Christian

2025-02-16 17:06:54 +0000 UTC

How do I relaunch Fluxgym in runpod?

Bruce

2025-02-16 16:46:57 +0000 UTC

Not doing well, seems like I get an error right from the start. Anyone else getting this error, and if so how did you fix it. "WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available." Here is the complete train wreck! 😑
Installing SD Scripts dependencies...
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Obtaining file:///C:/AI/FLUX%20LOAR%20FLUXGYM/fluxgym/sd-scripts (from -r requirements.txt (line 46))
Preparing metadata (setup.py) ... done
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
Could not fetch URL https://pypi.org/simple/accelerate/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/accelerate/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
ERROR: Could not find a version that satisfies the requirement accelerate==0.33.0 (from versions: none)
ERROR: No matching distribution found for accelerate==0.33.0
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
Error: Failed to install SD Scripts dependencies.
Press any key to continue . . .
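
The pip warning above means the Python interpreter that ran the installer could not load its built-in `ssl` module, so every HTTPS download failed. A quick way to check an interpreter for this is sketched below; `ssl_available` is a hypothetical helper, and the thread does not say what fixed it in this particular case.

```python
import importlib

def ssl_available():
    # pip needs the standard-library ssl module to reach https://pypi.org
    try:
        importlib.import_module("ssl")
        return True
    except ImportError:
        return False
```

If this returns `False`, a clean reinstall of Python 3.10.11 as described elsewhere in the comments is a reasonable first thing to try.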

MJ

2025-02-16 13:45:57 +0000 UTC

EDIT: PROBLEM SOLVED SEE BELOW (at the bottom) The installation process seems to work ok but once I set Flux Gym up to make a LoRa it tries to download Flux1-dev.sft file and keeps failing. I get the following error message: RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling. I have tried downloading the Flux1-dev.sft file from Huggingface and putting that in the unet folder (...fluxgym\models\unet\flux1-dev.sft) but Flux Gym still tries to download the file when I start a LoRa training. I have tried the following fix suggested by user MPG: Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables. Click Environment Variables.... Under User variables or System variables, click New. Set: Variable name: HF_HUB_ENABLE_HF_TRANSFER Variable value: 0 However, even after a computer restart, I get the same error when Flux Gym tries to download the Flux1-dev.sft model. Any suggestions? EDIT: PROBLEM SOLVED:- The problem I have had is with HF-TRANSFER failing to download large files (the models). I'm not sure why. However, if you download them directly from HuggingFace and put them in the proper locations then the process works. I tried this above but I mistakenly used Flux1-dev.safetensors rather than Flux1-dev.sft. (Apparently "it" does make you go blind after all! OMG, I'm in trouble now!) Anyway.... ......\fluxgym\models\unet should contain Flux1-dev.sft (not Flux1-dev.safetensors .... you can just change the extension!) ......\fluxgym\models\clip should contain t5xxl_fp16.safetensors Those are the two files Flux Gym had problems downloading for me. You can download them from Hugging Face or you may already have them in your ComfyUI (other programs are available) set up, so you can just copy paste. Have fun
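
The manual workaround above can be checked with a few lines of Python. This is a minimal sketch, assuming the lowercase file name `flux1-dev.sft` seen in the training command elsewhere in the thread; `prepare_models` is a hypothetical helper and only covers the two files named in this comment.

```python
from pathlib import Path

# Paths relative to the fluxgym folder, as listed in the comment above.
EXPECTED = ["models/unet/flux1-dev.sft", "models/clip/t5xxl_fp16.safetensors"]

def prepare_models(fluxgym_dir):
    root = Path(fluxgym_dir)
    # If the model was saved as .safetensors, rename it to the .sft name.
    wrong = root / "models" / "unet" / "flux1-dev.safetensors"
    right = root / "models" / "unet" / "flux1-dev.sft"
    if wrong.is_file() and not right.exists():
        wrong.rename(right)
    # Report whichever expected files are still missing.
    return [p for p in EXPECTED if not (root / p).is_file()]
```

An empty list as the result would mean both files are in place under the names Fluxgym looks for.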

LW

2025-02-16 13:08:10 +0000 UTC

hmm. i get an error like this. OSError: cannot write mode RGBA as JPEG
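
This message comes from the Pillow imaging library when an image with transparency is saved as JPEG, which has no alpha channel. A possible cause, and this is an assumption rather than something confirmed in the thread, is a PNG with transparency in the dataset. Converting such images to RGB before uploading them might avoid it; the sketch below needs Pillow installed, and both function names are hypothetical.

```python
def needs_rgb_conversion(mode):
    # JPEG cannot store alpha or palette data. This set of modes is an
    # assumed, non-exhaustive list.
    return mode in {"RGBA", "LA", "P"}

def save_as_jpeg(src, dst):
    from PIL import Image  # Pillow, imported lazily so the check above works without it
    with Image.open(src) as img:
        if needs_rgb_conversion(img.mode):
            img = img.convert("RGB")  # drops the alpha channel
        img.save(dst, "JPEG")
```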

Killy_Blame

2025-02-16 11:34:42 +0000 UTC

I ended up using Pinokio to get Fluxgym installed, as person commented earlier.

Greg

2025-02-16 11:08:35 +0000 UTC

After a lot of other errors, I am now stuck on this one as well.

Thomas

2025-02-16 09:44:10 +0000 UTC

Seems there is a checkbox that says Add Path. I think that is it. That's what i did anyway.

Thomas

2025-02-16 09:30:04 +0000 UTC

Amazing. Can we also use the generated LoRA for Hunyuan Video generation in your ultimate workflow? I used OneTrainer to train but for some reason my generated LoRA did not work.

Ryan Tavan

2025-02-16 09:21:18 +0000 UTC

I am kinda hard stuck. I downloaded everything, installed Python with PATH enabled, then I installed RUST, then it seems to have run to completion but didn't launch the app. I've tried opening the LAUNCHER.bat file, but it opens for a second then closes immediately. Any ideas what might be going wrong?

Ish

2025-02-16 08:24:22 +0000 UTC

How do you enable PATH? I've run into this same problem.

Lexi Barber

2025-02-16 08:16:20 +0000 UTC

Manually downloading the latest versions of Python (with PATH enabled) and Git fixed it for me

Rich Vol

2025-02-16 07:20:08 +0000 UTC

Found a solution from some guy on reddit :D Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables. Click Environment Variables.... Under User variables or System variables, click New. Set: Variable name: HF_HUB_ENABLE_HF_TRANSFER Variable value: 0

MPG

2025-02-16 06:44:18 +0000 UTC

I get this error when I run the 1-click installer (using Windows 11) - any ideas??
Installation completed successfully
Launching application...
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\_templates\LORA Training (Flux)\fluxgym\app.py", line 19, in <module>
from library import flux_train_utils, huggingface_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_train_utils.py", line 17, in <module>
from library import flux_models, flux_utils, strategy_base, train_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 366, in <module>
class ModelSpec:
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 369, in ModelSpec
ckpt_path: str | None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Press any key to continue . . .
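
The `str | None` annotation in the traceback uses union syntax that only works on Python 3.10 and newer, which matches the later finding in this thread that the install was running on Python 3.9. The sketch below just demonstrates the behaviour; `union_syntax_supported` is a hypothetical helper, and the fix reported in the thread was to install Python 3.10.11 rather than to edit the sd-scripts files.

```python
import sys
from typing import Optional

def union_syntax_supported():
    # `str | None` is evaluated when the class body runs; before Python 3.10
    # the `|` operator is not defined for types and raises TypeError.
    try:
        str | None
        return True
    except TypeError:
        return False

# Optional[str] expresses the same annotation and also works on older versions.
Equivalent = Optional[str]
```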

HealingPaint

2025-02-16 05:20:38 +0000 UTC

Hi, I receive the following error when trying to use the local installer:
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [48 lines of output]
Traceback (most recent call last):
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
^^
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
self.run_setup()
^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
exec(code, locals())
^^^^^^^^^^^^^^^^
File "<string>", line 128, in <module>
File "C:\Python313\Lib\subprocess.py", line 414, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\Python313\Lib\subprocess.py", line 395, in call
with Popen(*popenargs, **kwargs) as p:
~^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python313\Lib\subprocess.py", line 1036, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pass_fds, cwd, env,
^^^^^^^^^^^^^^^^^^^
...<5 lines>...
gid, gids, uid, umask,
^^^^^^^^^^^^^^^^^^^^^^
start_new_session, process_group)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python313\Lib\subprocess.py", line 1548, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
~~~~~^^^^^^^^^^^^^^^^^^
# no special security
^^^^^^^^^^^^^^^^^^^^^
...<4 lines>...
cwd,
^^^^
startupinfo)
^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Error: Failed to install SD Scripts dependencies.
Press any key to continue . . .

Greg

2025-02-16 04:48:53 +0000 UTC

Does anyone else run into the issue where the .bat just closes after downloading python.310.11 , even after you have it already installed ?

Thomas

2025-02-16 04:26:00 +0000 UTC

Thanks for the tip! New patreon here myself

Herman

2025-02-16 01:51:15 +0000 UTC

Hey there, long time watcher, new Patreon. Love your stuff. You should really tell people to create a network volume under 'storage' on runpod, set THAT to 100GB, and launch a pod with the network volume/storage. 100GB is $7/mo and you won't lose your data, while the daily cost with the way you've shared it is like $0.50/day. It adds up fast! Hope this helps, as I messed with runpod for HOURS to get it working and find a more cost-effective solution. You can delete a pod, and your data will stay on the network volume so you can resume with a new pod easily and not have to redownload models/files. $$$
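
To put the two numbers from this comment side by side, here is the arithmetic as a small sketch. The prices are the ones quoted above and may not match current RunPod pricing; the 30-day month and the `monthly_cost` helper are assumptions for illustration.

```python
def monthly_cost(daily_rate, days=30):
    # Simple projection of a daily charge over an assumed 30-day month.
    return daily_rate * days

volume = 7.00                     # $/month for a 100 GB network volume, per the comment
stopped_pod = monthly_cost(0.50)  # $0.50/day, per the comment
print(f"network volume: ${volume:.2f}/mo, pod storage: ${stopped_pod:.2f}/mo")
```

On these figures the daily charge comes to $15 over a month, a bit more than twice the network volume price.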

Tyler

2025-02-16 01:01:02 +0000 UTC

I love these 1-step installers. Great for us lazy ass... er busy people who don't know a git from a hub :) And, as usual the video makes it all crystal clear. Thanks K. I think I can turn the heating off as my GPU will be glowing for a while :)

LW

2025-02-15 23:18:14 +0000 UTC

I don't know why but I have Python installed but your installers have never worked for me. They always close whenever I try to run them and have to do all my installs manually. I've been installing these programs via git for a couple of years now so I'm not a total noob but not an expert either so doing it manually isn't a deal breaker for me but it would be nice to not have to go through the trouble every now and then. Just wish I knew what was going on. Any time I run the installer, the installer closes immediately. It says the installer ran successfully but nothing actually happens. It just opens and then immediately closes.

lokitsar

2025-02-15 23:16:37 +0000 UTC

I noticed your CPU, GPU, Vram overlay in the video. Which program are you using? I am not very happy with any I have found thus far.

Mark

2025-02-15 22:54:06 +0000 UTC

Heya, trying to run this right now. I'm pretty sure I'm following along well enough, but I'm getting this error: RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

MPG

2025-02-15 22:52:46 +0000 UTC

FYI you may have to install Rust. There happens to be a link to the language download if the installer fails.

Mark

2025-02-15 22:39:10 +0000 UTC

Hi! Can it be installed in the new ComfyUI created for the V3-ULTIMATE_FLUX_ALL-IN-ONE-WORKFLOW? If yes, in what folder should the installer be launched? Like: ././ComfyUI_windows_portable/ ?

Alex Arangon

2025-02-15 22:11:58 +0000 UTC

Is there any way to make FluxGym work with 8GB of VRAM?

Demitri Grigori

2025-02-15 21:28:53 +0000 UTC

Finally can make Man Bear Pig lora on my potato gpu. Thank you

Wes C

2025-02-15 20:47:06 +0000 UTC

Comments

idk. starting to think @aitrepreneur is a runpod agent trying to make us spend as much money on the platform as possible by promising easy one-click solutions for 5 dollars a month, but delivering non-working, buggy tutorials that waste our weekends and credits. geez man… fix your stuff.

Is there any chance of a fluxgym installer for rtx 50 series cards, please?

This might be a dumb question, but could this be modified to work with HIDREAM or even Pony based model?

Happens to me too.

When I launch the .bat, it closes after the Python installer download. [process exited with code 0]. Any solutions?

What's new in the V2 version?

same problem here, tried to reinstall it, use multiple methods, but always the same problem

I'm getting this error now all of a sudden when it worked fine before.

Does anyone here have this error? "mat1 and mat2 shapes cannot be multiplied (1x2304 and 2816x1280)"

Inside the fluxgym folder there is a bat file, "LAUNCHER.bat". Execute that to reopen it.

After "Training Complete. Check the outputs folder for the LoRA files." There are no safetensors files saved. What might be the problem?

Hey. What is your suggestion for parameters if we have 40 person images to train for flux schnell with 16 GB VRAM?

I have not. Thanks for letting me know.

Onetrainer has Hunyuan Lora training now. Have you looked into that yet?

This has been working great - but today, using the same image data set as before I'm now getting this error on trying to upload images HTTP 413: 413 Request Entity Too Large nginx/1.18.0 I've tried multiple pods and same thing happens in fluxgym

Send me a dm

I have the same problem, I installed GIT and Python and I started the installation again and the menu disappears... I have no idea how to fix the situation

or do you have the first version of FLUX-LORA-FLUXGYM-INSTALL-V2.bat not the V2? thanks

So I have this problem: I'm stuck on one command (python -m venv env). After I hit enter it always tells me that Python was not found, even though I already have it installed.

Is there any way to use the Florence Models from the all in one workflow inside FluxGym instead of having to connect to Huggingface?

I have a bunch of old fantasy art from a popular artist at the time. I'd like to try to make a Lora using that art style. Do you go about it in the same way as a model? And how do you tell comfy ui to make art using that type of art style?

Install Git and Python manually first.

I don't understand whats going on. I'm doing the v2.bat. It does a download, then the terminal closes down, and there's no folder or anything.

thank you

Think I got it. The picture size is 1080x1498 because I cropped them. I had to go into "max_bucket_reso" and set the value to 1500. All I can say is thank god for Claude.ai. I am no code expert lol
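
A quick way to spot images that would trip the same limit is to compare each image's longest side against the bucket cap before training. This is a minimal sketch: `oversized_images` is a hypothetical helper, the file names are made up, and the 1024 cap is only an example value, not a confirmed FluxGym default. It also does not model how the trainer actually assigns images to buckets.

```python
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def oversized_images(sizes: Dict[str, Size], max_bucket_reso: int) -> List[str]:
    """Return names of images whose longest side exceeds max_bucket_reso."""
    return [name for name, (w, h) in sizes.items() if max(w, h) > max_bucket_reso]


# Hypothetical dataset; 1080x1498 is the size mentioned in the comment.
dataset = {"portrait_01.png": (1080, 1498), "portrait_02.png": (1024, 1024)}

print(oversized_images(dataset, max_bucket_reso=1024))
print(oversized_images(dataset, max_bucket_reso=1500))
```

Either raising the cap, as the comment did, or resizing the flagged images should bring the dataset back within the limit.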

Yup, I think I found the cause of it. I was using a laptop, and when I'm not working I close it. After a short while I get the error, so it seems that if you leave the Gradio interface unattended it returns the error. Leaving my PC on (monitor off) and the tab open (in the background) did the trick for me.

Same thing happening to me, did you find an answer?

It worked great for me! Thank you so much!

Git for Windows and Python 3.10.11, added to PATH. I will change the way the future installer works, but these two should be installed manually for the best compatibility.
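
Before running the .bat, the two prerequisites above can be checked from Python itself. This is only a sketch: `python_matches` and `missing_tools` are hypothetical helper names, and the check reports on the interpreter that runs the script, which may not be the one the installer finds.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10, 11)  # version named in the comment above


def python_matches(required=REQUIRED_PYTHON, actual=None) -> bool:
    """True if the given (or currently running) Python version equals `required`."""
    if actual is None:
        actual = tuple(sys.version_info[:3])
    return tuple(actual) == tuple(required)


def missing_tools(tools=("git", "python")) -> list:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Running Python:", ".".join(map(str, sys.version_info[:3])))
print("Matches 3.10.11:", python_matches())
print("Not found on PATH:", missing_tools())
```

If `python` shows up as missing even though it is installed, the "added to PATH" step was most likely skipped during installation.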

just send me a dm man, otherwise I can't follow up

Ok, I can never get these installers to install properly - are there any pre-requisites that need to be installed on your pc prior to running the .bat?

You're right, I let my frustration get the better of me... though I hope to receive some help. I'm trying to make a project for a friend and it saddens me that it won't work.

Rude and factually incorrect. Most people have little or no trouble with K's installers. When problems do occur he is there to help solve them. The alternative is to install manually... good luck.

would be fun if any of your stuff actually worked

I had a problem with 2 GPUs. I temporarily disabled one in Device Manager. It's now working.

Is there a way to load a dataset that has already been created?

Probably a silly question but how do I simply start fluxgym again after restarting the pod?

Please make a new updated video. :|

It's a legit blank Windows install. I'm gonna go outside and throw some ice at a wall and come back and try again!

At least you got it working with less than 20 images!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

?????????????????????

Heya. If I add more than 20 images, it gives me an error. So, I've just been reducing my image pool to 20. Is there a reason for this? Is there a setting so I can have 50 images? Thanks <3

Wait, new issue. So it seems to have resumed when I did both of those, but it's stopping at epoch 6 for some reason. I have 70 training images. I set the parameter to 10 max train epochs, but it keeps saying info: Training Complete. Check the outputs folder for the LoRA files.

there is an option in the advanced parameters resume training from, just input the path of the epoch

it says that you didn't input an output name, not even counting that you didn't follow any of the parameters that I showed in the video either

how to resume training if I got an error and the training stopped at epoch 6? can it be done without starting over?

I have the same issue... :( I'll send a DM

no you need to train video loras for specifically the hunyuan model. Each model needs their own lora training since every architecture is different

as in my other comment. there was still an issue with a dual system. I installed on a single 4080 system with the same images and it worked fine. I am sure there are added stuff I could do to make it work but the 4080 is fine.

its the dual 4090 still. I installed on a single 4080 system and it runs fine. same images, settings.

Log in to JupyterLab (Connect) and click on workspace/flxgym. There is an output folder. Right click, download ;-)

Great video! As I am using Runpod, where can I find the LORAs once the training is done?

Same. I've tried everything. I reached out to Aitrepreneur and he said to delete cache in C:\Users\'username\.cache\huggingface\hub but that didn't work for me either. I'm reinstalling currently to see if that helps.
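
To confirm which folder is actually being used as the cache before deleting anything, the location can be worked out in a few lines. This is a sketch under stated assumptions: it mirrors the documented defaults of `huggingface_hub` (the `HF_HUB_CACHE` and `HF_HOME` environment variables, falling back to `.cache/huggingface/hub` in the home directory), `hub_cache_dir` is a hypothetical helper, and it only prints the path rather than removing it.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None) -> Path:
    """Best-guess location of the Hugging Face hub cache for the given environment."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


cache = hub_cache_dir()
print("Hub cache:", cache)
print("Exists:", cache.exists())
```

If the printed path differs from the one in the comment, deleting the folder under `C:\Users` would have had no effect on the cache in use.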

Sweet thanks! It works :)

https://c.tenor.com/cwoN93BINOMAAAAC/tenor.gif

Are there any plans of making something like this for hunyuan video. Because that would be awesome.

edited app.py to add the parameter. Crashed at a different spot. Might have issues with training data. Will work on it and get back.

Looks like there is a problem with your dataset, seems like you might have some weird image dimension for 1 or multiple images. Make sure your images aren't too big either for training.

It's difficult to say just from this message, especially since Fluxgym is so bad at showing progress. Have you chosen the specific 16GB or 12GB training presets in fluxgym?

Yes, for that particular error that is one of the possible fixes (tbh not even sure why that value isn't set to 0 by default already in the project, but oh well...)

just select them, then right click and download.

I think there is a limit of 150 for the number of images you can use in fluxgym. You don't need more anyway

send me a dm

quality beats quantity, if you already have at least 20 images, just add more varied photos, don't just use selfies, use different angles, different lighting, etc but always as high quality as possible

same question as healingpaint, what's your installed python version? You need the 3.10.11, and added to path correctly.

Does not work - maybe there is no activated environment? If so, how do we activate it? :-)

Hi, newbie here. How do I restart fluxgym after I stop the pod and run the same pod again? All the files are still there. I tried entering the env mode and installing the requirements, then it got stuck.....

Hi! In my system (AMD + RTX3090 + Win 11) it is stopping while running the 'write web request' with V2. Any help is very much appreciated.

to get it to work on my machine, I just edited the app.py file, near the top. look for this HF_HUB_ENABLE_HF_TRANSFER and set the value to 0
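
For reference, the change described above amounts to one environment variable. A minimal sketch, assuming `app.py` sets the flag through `os.environ` near the top of the file (the exact line in your copy may look different):

```python
import os

# Disable the hf_transfer download backend so huggingface_hub falls back to
# its regular downloader. Set this before huggingface_hub is imported, since
# the library typically reads the flag when it is first loaded.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

print(os.environ["HF_HUB_ENABLE_HF_TRANSFER"])
```

The regular downloader is usually slower, but its error messages tend to be more specific than the generic `hf_transfer` failure.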

Same

Same, commenting in case somebody finds the solution ✋

Hi, I have a really basic question. I have trained a LoRA on runpod using fluxgym and I can see the .safetensors files in the code notebook, but how do I download them?

In K's video he uses 40 images for LoRa training. Is it possible to use say 100 or even 200 images without getting tuple index errors?

Installed flawlessly, already have two runs and they worked fantastic. Only took between 20-30mins for each run on my setup. Really appreciate this.

V2 still does nothing other than download the Python installer. And nothing else...