1-Click INSTALL FLUXGYM - EASY FLUX LORA TRAINING!
Added 2025-02-15 20:41:44 +0000 UTC
Hey everyone! I've created a 1-click installer for FLUXGYM! The BEST and EASIEST webui for Training a LORA I have EVER SEEN!!!! It's just amazing!
You can check out the video right here: https://youtu.be/LILai5jIW1w
The installer of course automates the entire install process for fluxgym and its launcher!
DON'T FORGET TO USE THE NEW SPECIAL WORKFLOWS RIGHT HERE:
FLUX LORA TRAINING + LORA TESTING WORKFLOW | Patreon
LOCAL INSTALL!
1. Download the FLUX-LORA-FLUXGYM-INSTALL.bat
2. Run the bat file
3. ????
4. Profit
IF YOU ARE USING RUNPOD:
1. Create an account if you haven't already: Runpod
2. Click on Pod (on the left side) then click deploy
3. Choose a GPU with at least 24 GB of VRAM (a cheap 3090, 4090 or an A30 are great) and choose a PyTorch template. Edit the template to set 100 GB for both the container disk and the volume disk, then deploy on demand
4. Go to my pods, wait for everything to finish and then click "connect", then "Connect to Jupyter lab"
5. Then drag and drop the FLUX-LORA-FLUXGYM-INSTALL-RUNPOD.sh file onto the left side of the UI, then click on the "Terminal" icon on the right side of the UI
6. Copy and paste these two lines then press enter:
chmod +x FLUX-LORA-FLUXGYM-INSTALL-RUNPOD.sh
./FLUX-LORA-FLUXGYM-INSTALL-RUNPOD.sh
7. Wait for everything to be installed
8. Then click on the public url
9. ???
10. Profit
As always, supporting me on Patreon allows me to keep creating helpful resources like this for the community. Thank you for your support - now go have some fun!
Yep, like others sadly crashes for me at 'writing request stream'...shame, looked so good :-(
Steve Saunders
2025-07-02 15:22:20 +0000 UTC
You putting out a run pod for wan2.1 i2v/t2v loras?
Steve
2025-06-24 16:18:47 +0000 UTC
Dear Aitrepreneur
Love your videos and thanks for making all of your efforts to make these tools accessible to hobbyists like me.
I have tried to start training a flux LORA on runpod and encountered an issue, that is probably pretty basic, so hopefully easy to solve. I haven't found the solution in the existing conversation, hence my reaching out. The installation ran smoothly, I could open the training UI window without issue and apply all the settings. Then, when I start training it gives the error: "Cannot access gated repo for url https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors. Access to model black-forest-labs/FLUX.1-schnell is restricted. You must have access to it and be authenticated to access it. Please log in."
The way to resolve this according to what I could find online is by entering the text: from huggingface_hub import login
token = "token_name"
I have prepared the appropriate token and accepted T&C on hugging face for the model, but typing this in the command window (with the token_name replaced for the actual token obviously), doesn't work.
The workaround that I have for this on my PC was to simply download the model file manually and copy it to the folder. I would do the same on the runpod environment, but don't know how to do that. Alternatively, I can upload the model, but that would take ages.
Hope you can help.
Blonde Adonis
2025-06-22 11:40:07 +0000 UTC
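A note on the gated-repo error above: `from huggingface_hub import login` is Python, so pasting it into the Jupyter terminal (a shell) will not run it, and assigning `token` by itself does not log in without a `login(token=token)` call. As an alternative, here is a minimal stdlib-only sketch that fetches the file named in the error message with the token sent as a Bearer header. Assumptions: the token is exported as `HF_TOKEN`, the model's terms are already accepted on Hugging Face, and the destination path is hypothetical (replace it with the folder your FluxGym install reads models from). This is untested against the pod described above.

```python
import shutil
import urllib.request

# URL copied from the error message in the comment above.
MODEL_URL = (
    "https://huggingface.co/black-forest-labs/FLUX.1-schnell"
    "/resolve/main/flux1-schnell.safetensors"
)


def build_request(url: str, token: str) -> urllib.request.Request:
    """Create a request that carries the access token as a Bearer header."""
    request = urllib.request.Request(url)
    # "Unredirected" means the token goes to huggingface.co only and is not
    # forwarded if the server redirects the download to another host.
    request.add_unredirected_header("Authorization", f"Bearer {token}")
    return request


def download(url: str, token: str, dest: str) -> None:
    """Stream the file to disk so a multi-gigabyte model never sits in memory."""
    with urllib.request.urlopen(build_request(url, token)) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
```

From a Python prompt on the pod, a call would look like `download(MODEL_URL, os.environ["HF_TOKEN"], "flux1-schnell.safetensors")` after `import os`. A 401 or 403 response would suggest the token is wrong or the model terms were not accepted for that account.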
Hi does the one click installer still work? i get - Writing web request Writing request stream... (Number of bytes written: 24131810) it runs for a little bit like this then cuts out
Waynethejockrohnson
2025-06-14 15:38:29 +0000 UTC
[2025-05-24 16:08:28] [INFO] gradient accumulation steps = 1
[2025-05-24 16:08:28] [INFO] total optimization steps: 1800
[2025-05-24 16:08:41] [INFO] 2025-05-24 16:08:41 INFO unet dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1323
[2025-05-24 16:08:41] [INFO] INFO text_encoder [0] dtype: torch.float8_e4m3fn, device: cuda:0 train_network.py:1329
[2025-05-24 16:08:41] [INFO] INFO text_encoder [1] dtype: torch.bfloat16, device: cpu train_network.py:1329
[2025-05-24 16:08:42] [INFO] steps: 0%| | 0/1800 [00:00
Robert
2025-05-24 16:11:50 +0000 UTC
idk. starting to think @aitrepreneur is a runpod agent trying to make us spend as much money on the platform as possible by promising easy one-click solutions for 5 dollars a month, but delivering non-working, buggy tutorials that waste our weekends and credits. geez man... fix your stuff.
Robert
2025-05-24 15:53:20 +0000 UTC
Is there any chance of a fluxgym installer for rtx 50 series cards, please?
John Holden
2025-05-23 21:26:47 +0000 UTC
This might be a dumb question, but could this be modified to work with HIDREAM or even Pony based model?
Plaiboy Magazine
2025-05-18 04:19:52 +0000 UTC
happens to me too
Bjarki Kjellsson
2025-04-28 19:46:53 +0000 UTC
When I launch the .bat, it closes after the Python installer download. [process exited with code 0]
Any solutions ?
Snick3rs
2025-04-24 21:14:34 +0000 UTC
Whats new in the V2 version?
Virtamouse
2025-04-19 22:14:11 +0000 UTC
same problem here, tried to reinstall it, use multiple methods, but always the same problem
Lukáš Hájek
2025-03-22 17:12:29 +0000 UTC
I'm getting this error now all of a sudden when it worked fine before.
Virtamouse
2025-03-22 16:19:37 +0000 UTC
Does anyone here have this error?
"mat1 and mat2 shapes cannot be multiplied (1x2304 and 2816x1280)"
Rluu
2025-03-21 14:40:01 +0000 UTC
Inside of the fluxgym folder there is a bat file "LAUNCHER.bat" execute that to re open it.
GRL
2025-03-18 02:00:50 +0000 UTC
I had that happen to me the first time I successfully ran this; check the log for errors; mine had an argument that was coming in that was not possible to run on my setup (--optimizer_type adamw8bit); for some reason, it runs on that specification even if you do not check it on the advanced menu. so I had to change that to (--optimizer_type adamw) and it worked. There are some of these that will require your computer to be set before running the tool in its environment.
GRL
2025-03-18 01:55:00 +0000 UTC
After "Training Complete. Check the outputs folder for the LoRA files." There are no safetensors files saved. What might be the problem?
Fabi AI
2025-03-17 12:29:01 +0000 UTC
If just installing doesn't work you can also try to update the current installation. here is the command line for that:
pip3 install --upgrade --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
But make sure to activate the env in the scripts folder before doing so.
GRL
2025-03-17 03:34:57 +0000 UTC
I found a thing that worked for me!
I made sure that I was in the correct environment for the fluxgym and ran this update for the latest pytorch :
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
This is assuming you are running the latest CUDA version for the 50 series, you can check what CUDA version you are running by using this command :
nvcc --version
if the release number is "release 12.8" that means you need cu128 and the pip3 install command above will install it for you.
Or you can use this Florence-based tool I found to caption your dataset:
https://github.com/MNeMoNiCuZ/florence2-caption-batch
I hope this helps.
GRL
2025-03-17 01:26:25 +0000 UTC
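The version check described above (read the "release" number from `nvcc --version`, then pick the matching `cuXYZ` index) can be sketched in Python. This is only an illustration: the helper names are made up for this example, and it assumes a nightly index exists for whatever tag it derives, which the comment only confirms for cu128.

```python
import re
import subprocess
from typing import Optional

# Base of the index URL quoted in the comment above.
NIGHTLY_INDEX = "https://download.pytorch.org/whl/nightly/"


def cuda_tag(nvcc_output: str) -> Optional[str]:
    """Turn 'release 12.8' in nvcc's output into a wheel tag such as 'cu128'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"cu{match.group(1)}{match.group(2)}"


def pip_command(tag: str) -> str:
    """Build the install command from the comment for a given tag."""
    return (
        "pip3 install --pre torch torchvision torchaudio "
        f"--index-url {NIGHTLY_INDEX}{tag}"
    )


def detect_tag() -> Optional[str]:
    """Run nvcc if it is installed; return None when it is missing or unparseable."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return cuda_tag(result.stdout)
```

Calling `pip_command(detect_tag())` inside the activated fluxgym environment would print the command to run, provided `detect_tag()` did not return `None`.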
It looks to me that this has nothing to do with the installer he provided but with your computer failing to install the libraries. This may be caused by so many different things that it's difficult to predict. Try running the bat as admin; that may resolve permissions to install that environment to run the rest of the bat file properly.
GRL
2025-03-17 00:55:59 +0000 UTC
Yes! I tried selecting images that already have a txt file with the same name and dropping it in the "drop file here" area, and it will load your descriptions according to the file name, for example, "image_001.jpg" and "image_001.txt"
If they are named correctly, it will populate the fields for you.
I hope this helps.
GRL
2025-03-17 00:52:10 +0000 UTC
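Before dragging a dataset in, the naming rule described above ("image_001.jpg" next to "image_001.txt") can be checked with a small script. This is a sketch under assumptions: the list of image extensions is a guess, and `pair_captions` is a name invented for this example, not part of FluxGym.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Assumption: these are the image types in the dataset folder.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def pair_captions(folder: Path) -> Tuple[Dict[Path, Path], List[Path]]:
    """Match each image with a .txt file of the same base name.

    Returns the matched pairs and the images that have no caption file.
    """
    paired: Dict[Path, Path] = {}
    missing: List[Path] = []
    for image in sorted(folder.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption = image.with_suffix(".txt")
        if caption.is_file():
            paired[image] = caption
        else:
            missing.append(image)
    return paired, missing
```

Any path that shows up in the second return value is an image whose caption field would stay empty after the drop.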
I have the same issue (also a 5090 user). Maybe it's related to the pytorch not having support for this drive, I tried to update using the following :
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
I found a tutorial for using Comfy on the 50 series and was told that we need to use this in the environment. But even after this, I still have that error when trying to run Florence-2.
I hope we can find a solution to this.
GRL
2025-03-17 00:45:58 +0000 UTC
Hey. What is your suggestion for parameters if we have 40 person images to train for flux schnell with 16 GB VRAM?
no name
2025-03-16 21:52:15 +0000 UTC
Hi Aitrepreneur, How about this error?
It just hangs my training forever without completing it.
[2025-03-11 21:10:02] [INFO] 2025-03-11 21:10:02 INFO Checking the state dict: flux_utils.py:43
[2025-03-11 21:10:02] [INFO] Diffusers or BFL, dev or schnell
[2025-03-11 21:10:02] [INFO] INFO t5xxl_max_token_length: flux_train_network.py:157
[2025-03-11 21:10:02] [INFO] 512
[2025-03-11 21:10:03] [INFO] F:\AI_Work\FluxGym\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2025-03-11 21:10:03] [INFO] warnings.warn(
[2025-03-11 21:10:03] [INFO] You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2025-03-11 21:10:03] [INFO] 2025-03-11 21:10:03 INFO Loading dataset config from train_network.py:488
[2025-03-11 21:10:03] [INFO] F:\AI_Work\FluxGym\fluxgym\ou
[2025-03-11 21:10:03] [INFO] tputs\lenniev1\dataset.toml
[2025-03-11 21:10:03] [INFO] INFO prepare images. train_util.py:2049
[2025-03-11 21:10:03] [INFO] INFO get image size from name of train_util.py:1942
[2025-03-11 21:10:03] [INFO] cache files
[2025-03-11 21:10:03] [INFO] 0%| | 0/21 [00:00
[2025-03-11 21:10:03] [INFO] INFO Cast FLUX model to fp8. flux_train_network.py:108
[2025-03-11 21:10:03] [INFO] This may take a while.
[2025-03-11 21:10:03] [INFO] You can reduce the time
[2025-03-11 21:10:03] [INFO] by using fp8 checkpoint.
[2025-03-11 21:10:58] [INFO] 2025-03-11 21:10:58 INFO Building CLIP-L flux_utils.py:179
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:275
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\clip\clip_l.safetensors
[2025-03-11 21:10:58] [INFO] INFO Loaded CLIP-L:
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:330
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\clip\t5xxl_fp16.safetensors
[2025-03-11 21:10:58] [INFO] INFO Loaded T5xxl:
[2025-03-11 21:10:58] [INFO] INFO Building AutoEncoder flux_utils.py:144
[2025-03-11 21:10:58] [INFO] INFO Loading state dict from flux_utils.py:149
[2025-03-11 21:10:58] [INFO] F:\AI_Work\FluxGym\fluxgym\model
[2025-03-11 21:10:58] [INFO] s\vae\ae.sft
[2025-03-11 21:10:59] [INFO] 2025-03-11 21:10:59 INFO Loaded AE:
[2025-03-11 21:10:59] [INFO] import network module: networks.lora_flux
[2025-03-11 21:10:59] [INFO] INFO [Dataset 0] train_util.py:2585
[2025-03-11 21:10:59] [INFO] INFO caching latents with caching train_util.py:1095
[2025-03-11 21:10:59] [INFO] strategy.
[2025-03-11 21:10:59] [INFO] INFO caching latents... train_util.py:1144
[2025-03-11 21:11:03] [INFO] 0%| | 0/21 [00:00
Elranzer
2025-03-11 13:27:04 +0000 UTC
I have not. Thanks for letting me know.
Bruce
2025-03-09 05:44:44 +0000 UTC
Onetrainer has Hunyuan Lora training now. Have you looked into that yet?
Slap Dash Dolt
2025-03-09 01:24:38 +0000 UTC
This has been working great - but today, using the same image data set as before I'm now getting this error on trying to upload images
HTTP 413:
413 Request Entity Too Large
nginx/1.18.0
I've tried multiple pods and same thing happens in fluxgym
Alex Kilbee
2025-03-06 11:03:25 +0000 UTC
Send me a dm
Aitrepreneur
2025-03-04 23:58:13 +0000 UTC
I have the same problem, I installed GIT and Python and I started the installation again and the menu disappears... I have no idea how to fix the situation
Vinnyfm
2025-03-04 23:54:04 +0000 UTC
or do you have the first version of FLUX-LORA-FLUXGYM-INSTALL-V2.bat not the V2? thanks
Raiod
2025-03-04 17:06:05 +0000 UTC
So I have this problem: I'm stuck on one command (python -m venv env). After I hit enter it always tells me that python was not found, even though I already have it installed.
Raiod
2025-03-04 17:05:13 +0000 UTC
Is there any way to use the Florence Models from the all in one workflow inside FluxGym instead of having to connect to Huggingface?
Slap Dash Dolt
2025-03-02 23:22:22 +0000 UTC
hi, while training for an object, I've got the message: .... commercial-license', 'license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md']
license_str = license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
no samples .
Any idea what's going on?
BOB
2025-03-01 23:09:36 +0000 UTC
I have a bunch of old fantasy art from a popular artist at the time. I'd like to try to make a Lora using that art style. Do you go about it in the same way as a model? And how do you tell comfy ui to make art using that type of art style?
Chris Christopher
2025-02-28 19:41:01 +0000 UTC
install GIT and Python first manually.
Jason Blake
2025-02-28 06:27:27 +0000 UTC
I don't understand whats going on. I'm doing the v2.bat. It does a download, then the terminal closes down, and there's no folder or anything.
Liam
2025-02-27 22:58:28 +0000 UTC
Heads up on a mistake I made (I found it by copying and pasting the errors to ChatGPT): make sure you don't accidentally hit "O" instead of zero :(
It looks like the error is coming from a typo in your command line arguments. Specifically, the error
argument --multires_noise_discount: invalid float value: 'o.3'
indicates that the argument value is being read as the string "o.3" (using the letter "o") rather than the float 0.3 (with a zero). Similarly, check your --noise_offset argument which appears as "o.1"; it should probably be "0.1".
To fix the error, update your command and replace:
--multires_noise_discount o.3 with --multires_noise_discount 0.3
--noise_offset o.1 with --noise_offset 0.1
These changes should allow the arguments to be correctly parsed as floats.
Jason Blake
2025-02-27 21:32:37 +0000 UTC
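The "o.3" typo above is easy to catch before launching a run. Here is a minimal sketch that scans a command string for the two float flags named in the comment and reports any value Python cannot parse as a number. The function name and the choice to check only those two flags are assumptions made for this example.

```python
import shlex
from typing import List, Sequence, Tuple

# Assumption: only the two flags mentioned in the comment are checked.
FLOAT_FLAGS = ("--multires_noise_discount", "--noise_offset")


def find_bad_floats(
    command: str, flags: Sequence[str] = FLOAT_FLAGS
) -> List[Tuple[str, str]]:
    """Return (flag, value) pairs whose value is not a valid float."""
    tokens = shlex.split(command)
    bad: List[Tuple[str, str]] = []
    for index, token in enumerate(tokens[:-1]):
        if token in flags:
            value = tokens[index + 1]
            try:
                float(value)
            except ValueError:
                bad.append((token, value))
    return bad
```

For the arguments quoted above, `find_bad_floats("--multires_noise_discount o.3 --noise_offset o.1")` flags both values. Note that `shlex.split` treats backslashes as escapes, so Windows paths in a full command line should be left out or quoted before checking.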
thank you
Jason Blake
2025-02-27 21:24:41 +0000 UTC
Think I got it. Picture size is 1080x1498 because I cropped them. I had to go into the "max_bucket_reso" and add the value of 1500 to it. All I can say is thank god for Claude.ai. I am no code expert lol
Zachary Garner
2025-02-27 19:24:11 +0000 UTC
in runpod it immediately says the training is complete right after it downloads the flux.dev. I've done everything in the video.
flux1-dev.sft: 100%|██████████| 23.8G/23.8G [00:21<00:00, 1.09GB/s]
downloading ae.sft...
ae.sft: 100%|██████████| 335M/335M [00:00<00:00, 647MB/s]
download clip_l.safetensors
clip_l.safetensors: 100%|██████████| 246M/246M [00:00<00:00, 725MB/s]
download t5xxl_fp16.safetensors
t5xxl_fp16.safetensors: 100%|██████████| 9.79G/9.79G [00:07<00:00, 1.37GB/s]
concept_sentence=Zack
lora_name Zack, concept_sentence=Zack, output_name=zack
license_items=['license: other', 'license_name: flux-1-dev-non-commercial-license', 'license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md']
license_str = license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
no samples
Zachary Garner
2025-02-27 18:29:58 +0000 UTC
Yup, I think I found the cause of it. I was using a laptop, and when I'm not working I close it. After a short while I get the error, so it seems that if you leave the Gradio interface unattended it returns the error. Leaving my PC on (monitor off) and the tab open (in the background) did the trick for me.
Benni
2025-02-27 15:35:35 +0000 UTC
Same thing happening to me, did you find an answer?
Jason Blake
2025-02-27 15:30:32 +0000 UTC
It worked great for me! Thank you so much!
Lance Kedron
2025-02-26 23:08:06 +0000 UTC
Git for windows and python 3.10.11 added to path.
I will change the way the future installer works but these two should be installed manually for the best compatibility
Aitrepreneur
2025-02-26 22:36:39 +0000 UTC
just send me a dm man, otherwise I can't follow up
Aitrepreneur
2025-02-26 22:35:19 +0000 UTC
Ok, I can never get these installers to install properly - are there any pre-requisites that need to be installed on your pc prior to running the .bat?
Kenneth Kraft
2025-02-26 18:03:45 +0000 UTC
You're right, I let my frustration get the better of me... though I hope to receive some help. I'm trying to make a project for a friend and it saddens me that it won't work.
Kevin Vandendriessche
2025-02-26 17:58:48 +0000 UTC
Rude and factually incorrect. Most people have little or no trouble with K's installers. When problems do occur he is there to help solve them. The alternative is to install manually... good luck.
LW
2025-02-26 16:45:24 +0000 UTC
ERROR: Exception:
Traceback (most recent call last):
File "e:\fluxgym\env\lib\site-packages\pip\_internal\cli\base_command.py", line 180, in _main
status = self.run(options, args)
File "e:\fluxgym\env\lib\site-packages\pip\_internal\cli\req_command.py", line 204, in wrapper
return func(self, options, args)
File "e:\fluxgym\env\lib\site-packages\pip\_internal\commands\install.py", line 318, in run
requirement_set = resolver.resolve(
File "e:\fluxgym\env\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 127, in resolve
result = self._result = resolver.resolve(
File "e:\fluxgym\env\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 473, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
File "e:\fluxgym\env\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 384, in resolve
raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 2000000
Error: Failed to install PyTorch.
Kevin Vandendriessche
2025-02-26 15:12:40 +0000 UTC
would be fun if any of your stuff actually worked
Kevin Vandendriessche
2025-02-26 14:54:00 +0000 UTC
I had a problem with 2 GPU's, I temporarily disabled 1 in device manager. It's now working
Dale Romanov
2025-02-24 18:10:25 +0000 UTC
Is there a way to load a dataset that has already been created?
Bruce
2025-02-22 20:50:56 +0000 UTC
Hello! After clicking on "Add AI captions with Florence-2" I get this error and I can't get a caption on my images :( (I have a 5090 btw, maybe this is the reason?)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ะขะธะผัั ะั
ะผะฐะฝะพะฒ
2025-02-22 20:20:27 +0000 UTC
what am i doing wrong here: [notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python -m pip install --upgrade pip
root@a02dea72446b:/workspace# python -m pip install --upgrade pip
Requirement already satisfied: pip in /usr/local/lib/python3.10/dist-packages (24.0)
Collecting pip
Downloading pip-25.0.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-25.0.1-py3-none-any.whl (1.8 MB)
━━━━━━━━━━ 1.8/1.8 MB 8.3 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 24.0
Uninstalling pip-24.0:
Successfully uninstalled pip-24.0
Successfully installed pip-25.0.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@a02dea72446b:/workspace#
AI Mastery
2025-02-22 18:42:53 +0000 UTC
Probably a silly question but how do I simply start fluxgym again after restarting the pod?
CJ Rademeyer
2025-02-22 16:33:34 +0000 UTC
In the video he has a manual install too. I had the same issue as you and what I did was follow each step of the manual install, after I would install one program from the video, I would run the one click installer again, and if it crashed out I would install the next step in the manual installer video, run one click again, and after a few installs, the one click installer worked. To answer the next possible question, in the one click installer video for fluxgym, he tells you how to manually install everything towards the end of the video. Best of luck.
Jaiven
2025-02-22 16:18:06 +0000 UTC
Sometimes when I train a LoRA in runpod it exits with the following error:
Terminating process
Killing process:
I get this output in the terminal from Jupyter.
This usually happens within the first few epochs.
In Gradio it doesn't show an error, it just sits there doing nothing.
Another follow-up question about this line in the Gradio terminal:
Cast FLUX model to fp8. This may take a while. You can reduce the time by using fp8
Why does it have to cast to fp8 when we want to use fp16?
Benni
2025-02-22 15:29:41 +0000 UTC
Please make a new updated video. :|
ahazy
2025-02-22 14:42:54 +0000 UTC
It's a legit blank Windows install. I'm gonna go outside and throw some ice at a wall and come back and try again!
ahazy
2025-02-22 14:33:56 +0000 UTC
At least you got it working with less than 20 images!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
ahazy
2025-02-22 14:32:31 +0000 UTC
?????????????????????
ahazy
2025-02-22 14:30:58 +0000 UTC
I double clicked the .bat file on a brand new install of windows on a new SSD on my user folder with no spaces. It looks like it's installing and then NOTHING HAPPENED!!!! BRO! The installer closed out! What do I do next? Should i reformat the hard drive and start over?????????????? I followed every single step!!! We need a simple step by step guide either video or text.
ahazy
2025-02-22 14:07:44 +0000 UTC
Heya. If I add more than 20 images, it gives me an error. So, I've just been reducing my image pool to 20. Is there a reason for this? Is there a setting so I can have 50 images? Thanks <3
Pete
2025-02-21 09:43:18 +0000 UTC
Wait, new issue. So it seems to have resumed when I did both of those, but it's stopping at epoch 6 for some reason. I have 70 training images. I set the parameter to 10 max train epochs, but it keeps showing this info:
Training Complete. Check the outputs folder for the LoRA files.
Virtamouse
2025-02-21 00:13:29 +0000 UTC
Little confusing since there is --resume
saved state to resume training
--initial_epoch
initial epoch number, 1 means first epoch (same as not specifying). NOTE: initial_epoch/step doesn't affect to lr scheduler. Which means lr scheduler will start from 0 without `--resume`.
So would I input 6 in the initial epoch field?
Or enter the folder path into "saved state to resume training"? Folder path or folderpath\epochname.safetensors?
Virtamouse
2025-02-20 23:51:22 +0000 UTC
there is an option in the advanced parameters resume training from, just input the path of the epoch
Aitrepreneur
2025-02-20 23:12:15 +0000 UTC
This is probably a bad initial python install, you need to uninstall your current python installation and reinstall it correctly. Go to the add and remove programs, search for python and uninstall both the current python version and the python install program. Once this is done, go here and download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe
Run it and check the "Add python 3.10 to Path" checkbox and continue with the installation.
You can check that the right python version is installed by opening a new command prompt window and typing:
python --version
and it should give you the 3.10.11 version
Then just relaunch the 1-click installer in a new folder and try again
Aitrepreneur
2025-02-20 23:11:28 +0000 UTC
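The `python --version` check above can also be done from a short script, which additionally shows which `python` executable the PATH resolves to. This is a sketch, not part of the installer: the function names are invented for the example, and it has to be started with whichever interpreter does launch (for instance by double-clicking it or through the `py` launcher, if that is installed).

```python
import shutil
import sys
from typing import Tuple

# Version recommended in the comment above.
EXPECTED = (3, 10, 11)


def version_matches(
    found: Tuple[int, ...], expected: Tuple[int, int, int] = EXPECTED
) -> bool:
    """True when the major, minor and patch numbers all match."""
    return tuple(found[:3]) == expected


def report() -> str:
    """Describe the running interpreter and where 'python' resolves on PATH."""
    found = tuple(sys.version_info[:3])
    location = shutil.which("python") or "not found on PATH"
    status = "OK" if version_matches(found) else "differs from the recommended 3.10.11"
    version_text = ".".join(str(part) for part in found)
    return f"Running Python {version_text} ({status}); 'python' on PATH: {location}"
```

If `print(report())` says "not found on PATH", that would match the "python was not found" message some commenters see, and points at the PATH checkbox in the installer.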
it says that you didn't input an output name, not even counting that you didn't follow any of the parameters that I showed in the video either
Aitrepreneur
2025-02-20 23:10:48 +0000 UTC
how to resume training if I got an error and the training stopped at epoch 6? can it be done without starting over?
Virtamouse
2025-02-20 20:23:40 +0000 UTC
I have the same issue... :( I'll send a DM
Jean Dupont
2025-02-20 15:03:35 +0000 UTC
When I run the script I get the following error:
ERROR: Could not find a version that satisfies the requirement accelerate==0.33.0 (from -r requirements.txt (line 1)) (from versions: 0.0.1, 0.1.0, 0.2.0, 0.2.1, 0.3.0, 0.4.0, 0.5.0, 0.5.1, 0.6.0, 0.6.1, 0.6.2, 0.7.0, 0.7.1, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.13.2, 0.14.0, 0.15.0, 0.16.0, 0.17.0, 0.17.1, 0.18.0, 0.19.0, 0.20.0, 0.20.1, 0.20.2, 0.20.3)
ERROR: No matching distribution found for accelerate==0.33.0 (from -r requirements.txt (line 1))
Tried installing accelerate with pip but then I get another error saying pytorch is wrong.
What should I do?
darmok72
2025-02-20 13:40:40 +0000 UTC
POTENTIAL ERROR FIX FOR SOME:
If you type your own captions or use a program/AI that isn't Florence-2 to help generate captions, make sure that all characters in the caption are UTF-8 approved characters.
I used an LLM to generate longer captions to help increase my model's quality but unknowingly had multiple instances of "curly" apostrophes (') and quotes ("), longer dashed lines, and accented letters -- which aren't allowed and stopped the training before it even started.
It's tedious, but best practice is to pull your previous dataset (images & text files) into a fresh instance of Fluxgym BUT read the failed log output to see which text file had the issue and alter that file before drag and dropping it into your new instance. Wash and repeat. Maybe it was just me but editing the text within the browser while it already had a correlated text file didn't overwrite my changes and it still failed... so I recommend you edit it at the source before dragging it over, or copy and paste the words verbatim into only the dragged over image and forego the non UTF-8 compliant text file altogether.
SenoTakai
2025-02-20 03:59:22 +0000 UTC
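The caption clean-up described above can be scripted instead of done by hand. One caveat first: curly quotes and long dashes are valid Unicode, so the failure may come from the text file being saved in a non-UTF-8 encoding; I can't confirm which characters FluxGym itself rejects. The sketch below covers both cases under stated assumptions: undecodable files are treated as Windows-1252, the typographic characters named in the comment are swapped for plain ASCII, and whatever non-ASCII characters remain (such as accented letters) are reported so you can edit them at the source.

```python
from pathlib import Path
from typing import List

# Typographic characters named in the comment, mapped to plain ASCII.
REPLACEMENTS = {
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / curly apostrophe
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
}


def normalize_caption(text: str) -> str:
    """Replace curly quotes and long dashes with their ASCII equivalents."""
    return text.translate(str.maketrans(REPLACEMENTS))


def non_ascii_chars(text: str) -> List[str]:
    """List the distinct non-ASCII characters still present in the text."""
    return sorted({char for char in text if ord(char) > 127})


def clean_caption_file(path: Path) -> List[str]:
    """Rewrite one caption file as UTF-8 and return leftover non-ASCII characters."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: a file that is not UTF-8 was saved as Windows-1252.
        text = raw.decode("cp1252", errors="replace")
    cleaned = normalize_caption(text)
    path.write_text(cleaned, encoding="utf-8")
    return non_ascii_chars(cleaned)
```

Running `clean_caption_file` over every `.txt` in the dataset folder (on a copy, since it overwrites files) gives a per-file list of characters that still need a manual decision.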
[2025-02-20 03:48:49] [INFO] Running C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs\train.bat
[2025-02-20 03:48:49] [INFO]
[2025-02-20 03:48:49] [INFO] (env) C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --split_mode --network_args "train_blocks=single" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs\dataset.toml" --output_dir "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\outputs" --output_name --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2025-02-20 03:48:54] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2025-02-20 03:48:54] [INFO] `--num_processes` was set to a value of `1`
[2025-02-20 03:48:54] [INFO] `--num_machines` was set to a value of `1`
[2025-02-20 03:48:54] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2025-02-20 03:48:54] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2025-02-20 03:48:58] [INFO] usage: flux_train_network.py [-h]
[2025-02-20 03:48:58] [INFO] [--console_log_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[2025-02-20 03:48:58] [INFO] [--console_log_file CONSOLE_LOG_FILE]
[2025-02-20 03:48:58] [INFO] [--console_log_simple] [--v2]
[2025-02-20 03:48:58] [INFO] [--v_parameterization]
[2025-02-20 03:48:58] [INFO] [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
[2025-02-20 03:48:58] [INFO] [--tokenizer_cache_dir TOKENIZER_CACHE_DIR]
[2025-02-20 03:48:58] [INFO] [--train_data_dir TRAIN_DATA_DIR] [--cache_info]
[2025-02-20 03:48:58] [INFO] [--shuffle_caption]
[2025-02-20 03:48:58] [INFO] [--caption_separator CAPTION_SEPARATOR]
[2025-02-20 03:48:58] [INFO] [--caption_extension CAPTION_EXTENSION]
[2025-02-20 03:48:58] [INFO] [--caption_extention CAPTION_EXTENTION]
[2025-02-20 03:48:58] [INFO] [--keep_tokens KEEP_TOKENS]
[2025-02-20 03:48:58] [INFO] [--keep_tokens_separator KEEP_TOKENS_SEPARATOR]
[2025-02-20 03:48:58] [INFO] [--secondary_separator SECONDARY_SEPARATOR]
[2025-02-20 03:48:58] [INFO] [--enable_wildcard]
[2025-02-20 03:48:58] [INFO] [--caption_prefix CAPTION_PREFIX]
[2025-02-20 03:48:58] [INFO] [--caption_suffix CAPTION_SUFFIX] [--color_aug]
[2025-02-20 03:48:58] [INFO] [--flip_aug]
[2025-02-20 03:48:58] [INFO] [--face_crop_aug_range FACE_CROP_AUG_RANGE]
[2025-02-20 03:48:58] [INFO] [--random_crop] [--debug_dataset]
[2025-02-20 03:48:58] [INFO] [--resolution RESOLUTION] [--cache_latents]
[2025-02-20 03:48:58] [INFO] [--vae_batch_size VAE_BATCH_SIZE]
[2025-02-20 03:48:58] [INFO] [--cache_latents_to_disk] [--skip_cache_check]
[2025-02-20 03:48:58] [INFO] [--enable_bucket]
[2025-02-20 03:48:58] [INFO] [--min_bucket_reso MIN_BUCKET_RESO]
[2025-02-20 03:48:58] [INFO] [--max_bucket_reso MAX_BUCKET_RESO]
[2025-02-20 03:48:58] [INFO] [--bucket_reso_steps BUCKET_RESO_STEPS]
[2025-02-20 03:48:58] [INFO] [--bucket_no_upscale]
[2025-02-20 03:48:58] [INFO] [--token_warmup_min TOKEN_WARMUP_MIN]
[2025-02-20 03:48:58] [INFO] [--token_warmup_step TOKEN_WARMUP_STEP]
[2025-02-20 03:48:58] [INFO] [--alpha_mask] [--dataset_class DATASET_CLASS]
[2025-02-20 03:48:58] [INFO] [--caption_dropout_rate CAPTION_DROPOUT_RATE]
[2025-02-20 03:48:58] [INFO] [--caption_dropout_every_n_epochs CAPTION_DROPOUT_EVERY_N_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--caption_tag_dropout_rate CAPTION_TAG_DROPOUT_RATE]
[2025-02-20 03:48:58] [INFO] [--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON]
[2025-02-20 03:48:58] [INFO] [--dataset_repeats DATASET_REPEATS]
[2025-02-20 03:48:58] [INFO] [--output_dir OUTPUT_DIR]
[2025-02-20 03:48:58] [INFO] [--output_name OUTPUT_NAME]
[2025-02-20 03:48:58] [INFO] [--huggingface_repo_id HUGGINGFACE_REPO_ID]
[2025-02-20 03:48:58] [INFO] [--huggingface_repo_type HUGGINGFACE_REPO_TYPE]
[2025-02-20 03:48:58] [INFO] [--huggingface_path_in_repo HUGGINGFACE_PATH_IN_REPO]
[2025-02-20 03:48:58] [INFO] [--huggingface_token HUGGINGFACE_TOKEN]
[2025-02-20 03:48:58] [INFO] [--huggingface_repo_visibility HUGGINGFACE_REPO_VISIBILITY]
[2025-02-20 03:48:58] [INFO] [--save_state_to_huggingface]
[2025-02-20 03:48:58] [INFO] [--resume_from_huggingface] [--async_upload]
[2025-02-20 03:48:58] [INFO] [--save_precision {None,float,fp16,bf16}]
[2025-02-20 03:48:58] [INFO] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--save_every_n_steps SAVE_EVERY_N_STEPS]
[2025-02-20 03:48:58] [INFO] [--save_n_epoch_ratio SAVE_N_EPOCH_RATIO]
[2025-02-20 03:48:58] [INFO] [--save_last_n_epochs SAVE_LAST_N_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--save_last_n_epochs_state SAVE_LAST_N_EPOCHS_STATE]
[2025-02-20 03:48:58] [INFO] [--save_last_n_steps SAVE_LAST_N_STEPS]
[2025-02-20 03:48:58] [INFO] [--save_last_n_steps_state SAVE_LAST_N_STEPS_STATE]
[2025-02-20 03:48:58] [INFO] [--save_state] [--save_state_on_train_end]
[2025-02-20 03:48:58] [INFO] [--resume RESUME]
[2025-02-20 03:48:58] [INFO] [--train_batch_size TRAIN_BATCH_SIZE]
[2025-02-20 03:48:58] [INFO] [--max_token_length {None,150,225}]
[2025-02-20 03:48:58] [INFO] [--mem_eff_attn] [--torch_compile]
[2025-02-20 03:48:58] [INFO] [--dynamo_backend {eager,aot_eager,inductor,aot_ts_nvfuser,nvprims_nvfuser,cudagraphs,ofi,fx2trt,onnxrt,tensort,ipex,tvm}]
[2025-02-20 03:48:58] [INFO] [--xformers] [--sdpa] [--vae VAE]
[2025-02-20 03:48:58] [INFO] [--max_train_steps MAX_TRAIN_STEPS]
[2025-02-20 03:48:58] [INFO] [--max_train_epochs MAX_TRAIN_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--max_data_loader_n_workers MAX_DATA_LOADER_N_WORKERS]
[2025-02-20 03:48:58] [INFO] [--persistent_data_loader_workers] [--seed SEED]
[2025-02-20 03:48:58] [INFO] [--gradient_checkpointing]
[2025-02-20 03:48:58] [INFO] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[2025-02-20 03:48:58] [INFO] [--mixed_precision {no,fp16,bf16}] [--full_fp16]
[2025-02-20 03:48:58] [INFO] [--full_bf16] [--fp8_base]
[2025-02-20 03:48:58] [INFO] [--ddp_timeout DDP_TIMEOUT]
[2025-02-20 03:48:58] [INFO] [--ddp_gradient_as_bucket_view]
[2025-02-20 03:48:58] [INFO] [--ddp_static_graph] [--clip_skip CLIP_SKIP]
[2025-02-20 03:48:58] [INFO] [--logging_dir LOGGING_DIR]
[2025-02-20 03:48:58] [INFO] [--log_with {tensorboard,wandb,all}]
[2025-02-20 03:48:58] [INFO] [--log_prefix LOG_PREFIX]
[2025-02-20 03:48:58] [INFO] [--log_tracker_name LOG_TRACKER_NAME]
[2025-02-20 03:48:58] [INFO] [--wandb_run_name WANDB_RUN_NAME]
[2025-02-20 03:48:58] [INFO] [--log_tracker_config LOG_TRACKER_CONFIG]
[2025-02-20 03:48:58] [INFO] [--wandb_api_key WANDB_API_KEY] [--log_config]
[2025-02-20 03:48:58] [INFO] [--noise_offset NOISE_OFFSET]
[2025-02-20 03:48:58] [INFO] [--noise_offset_random_strength]
[2025-02-20 03:48:58] [INFO] [--multires_noise_iterations MULTIRES_NOISE_ITERATIONS]
[2025-02-20 03:48:58] [INFO] [--ip_noise_gamma IP_NOISE_GAMMA]
[2025-02-20 03:48:58] [INFO] [--ip_noise_gamma_random_strength]
[2025-02-20 03:48:58] [INFO] [--multires_noise_discount MULTIRES_NOISE_DISCOUNT]
[2025-02-20 03:48:58] [INFO] [--adaptive_noise_scale ADAPTIVE_NOISE_SCALE]
[2025-02-20 03:48:58] [INFO] [--zero_terminal_snr]
[2025-02-20 03:48:58] [INFO] [--min_timestep MIN_TIMESTEP]
[2025-02-20 03:48:58] [INFO] [--max_timestep MAX_TIMESTEP]
[2025-02-20 03:48:58] [INFO] [--loss_type {l1,l2,huber,smooth_l1}]
[2025-02-20 03:48:58] [INFO] [--huber_schedule {constant,exponential,snr}]
[2025-02-20 03:48:58] [INFO] [--huber_c HUBER_C] [--huber_scale HUBER_SCALE]
[2025-02-20 03:48:58] [INFO] [--lowram] [--highvram]
[2025-02-20 03:48:58] [INFO] [--sample_every_n_steps SAMPLE_EVERY_N_STEPS]
[2025-02-20 03:48:58] [INFO] [--sample_at_first]
[2025-02-20 03:48:58] [INFO] [--sample_every_n_epochs SAMPLE_EVERY_N_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--sample_prompts SAMPLE_PROMPTS]
[2025-02-20 03:48:58] [INFO] [--sample_sampler {ddim,pndm,lms,euler,euler_a,heun,dpm_2,dpm_2_a,dpmsolver,dpmsolver++,dpmsingle,k_lms,k_euler,k_euler_a,k_dpm_2,k_dpm_2_a}]
[2025-02-20 03:48:58] [INFO] [--config_file CONFIG_FILE] [--output_config]
[2025-02-20 03:48:58] [INFO] [--metadata_title METADATA_TITLE]
[2025-02-20 03:48:58] [INFO] [--metadata_author METADATA_AUTHOR]
[2025-02-20 03:48:58] [INFO] [--metadata_description METADATA_DESCRIPTION]
[2025-02-20 03:48:58] [INFO] [--metadata_license METADATA_LICENSE]
[2025-02-20 03:48:58] [INFO] [--metadata_tags METADATA_TAGS]
[2025-02-20 03:48:58] [INFO] [--prior_loss_weight PRIOR_LOSS_WEIGHT]
[2025-02-20 03:48:58] [INFO] [--conditioning_data_dir CONDITIONING_DATA_DIR]
[2025-02-20 03:48:58] [INFO] [--masked_loss] [--deepspeed]
[2025-02-20 03:48:58] [INFO] [--zero_stage {0,1,2,3}]
[2025-02-20 03:48:58] [INFO] [--offload_optimizer_device {None,cpu,nvme}]
[2025-02-20 03:48:58] [INFO] [--offload_optimizer_nvme_path OFFLOAD_OPTIMIZER_NVME_PATH]
[2025-02-20 03:48:58] [INFO] [--offload_param_device {None,cpu,nvme}]
[2025-02-20 03:48:58] [INFO] [--offload_param_nvme_path OFFLOAD_PARAM_NVME_PATH]
[2025-02-20 03:48:58] [INFO] [--zero3_init_flag] [--zero3_save_16bit_model]
[2025-02-20 03:48:58] [INFO] [--fp16_master_weights_and_gradients]
[2025-02-20 03:48:58] [INFO] [--optimizer_type OPTIMIZER_TYPE]
[2025-02-20 03:48:58] [INFO] [--use_8bit_adam] [--use_lion_optimizer]
[2025-02-20 03:48:58] [INFO] [--learning_rate LEARNING_RATE]
[2025-02-20 03:48:58] [INFO] [--max_grad_norm MAX_GRAD_NORM]
[2025-02-20 03:48:58] [INFO] [--optimizer_args [OPTIMIZER_ARGS ...]]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_type LR_SCHEDULER_TYPE]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_args [LR_SCHEDULER_ARGS ...]]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler LR_SCHEDULER]
[2025-02-20 03:48:58] [INFO] [--lr_warmup_steps LR_WARMUP_STEPS]
[2025-02-20 03:48:58] [INFO] [--lr_decay_steps LR_DECAY_STEPS]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_num_cycles LR_SCHEDULER_NUM_CYCLES]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_power LR_SCHEDULER_POWER]
[2025-02-20 03:48:58] [INFO] [--fused_backward_pass]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_timescale LR_SCHEDULER_TIMESCALE]
[2025-02-20 03:48:58] [INFO] [--lr_scheduler_min_lr_ratio LR_SCHEDULER_MIN_LR_RATIO]
[2025-02-20 03:48:58] [INFO] [--dataset_config DATASET_CONFIG]
[2025-02-20 03:48:58] [INFO] [--min_snr_gamma MIN_SNR_GAMMA]
[2025-02-20 03:48:58] [INFO] [--scale_v_pred_loss_like_noise_pred]
[2025-02-20 03:48:58] [INFO] [--v_pred_like_loss V_PRED_LIKE_LOSS]
[2025-02-20 03:48:58] [INFO] [--debiased_estimation_loss]
[2025-02-20 03:48:58] [INFO] [--weighted_captions]
[2025-02-20 03:48:58] [INFO] [--cpu_offload_checkpointing] [--no_metadata]
[2025-02-20 03:48:58] [INFO] [--save_model_as {None,ckpt,pt,safetensors}]
[2025-02-20 03:48:58] [INFO] [--unet_lr UNET_LR]
[2025-02-20 03:48:58] [INFO] [--text_encoder_lr [TEXT_ENCODER_LR ...]]
[2025-02-20 03:48:58] [INFO] [--fp8_base_unet]
[2025-02-20 03:48:58] [INFO] [--network_weights NETWORK_WEIGHTS]
[2025-02-20 03:48:58] [INFO] [--network_module NETWORK_MODULE]
[2025-02-20 03:48:58] [INFO] [--network_dim NETWORK_DIM]
[2025-02-20 03:48:58] [INFO] [--network_alpha NETWORK_ALPHA]
[2025-02-20 03:48:58] [INFO] [--network_dropout NETWORK_DROPOUT]
[2025-02-20 03:48:58] [INFO] [--network_args [NETWORK_ARGS ...]]
[2025-02-20 03:48:58] [INFO] [--network_train_unet_only]
[2025-02-20 03:48:58] [INFO] [--network_train_text_encoder_only]
[2025-02-20 03:48:58] [INFO] [--training_comment TRAINING_COMMENT]
[2025-02-20 03:48:58] [INFO] [--dim_from_weights]
[2025-02-20 03:48:58] [INFO] [--scale_weight_norms SCALE_WEIGHT_NORMS]
[2025-02-20 03:48:58] [INFO] [--base_weights [BASE_WEIGHTS ...]]
[2025-02-20 03:48:58] [INFO] [--base_weights_multiplier [BASE_WEIGHTS_MULTIPLIER ...]]
[2025-02-20 03:48:58] [INFO] [--no_half_vae] [--skip_until_initial_step]
[2025-02-20 03:48:58] [INFO] [--initial_epoch INITIAL_EPOCH]
[2025-02-20 03:48:58] [INFO] [--initial_step INITIAL_STEP]
[2025-02-20 03:48:58] [INFO] [--validation_seed VALIDATION_SEED]
[2025-02-20 03:48:58] [INFO] [--validation_split VALIDATION_SPLIT]
[2025-02-20 03:48:58] [INFO] [--validate_every_n_steps VALIDATE_EVERY_N_STEPS]
[2025-02-20 03:48:58] [INFO] [--validate_every_n_epochs VALIDATE_EVERY_N_EPOCHS]
[2025-02-20 03:48:58] [INFO] [--max_validation_steps MAX_VALIDATION_STEPS]
[2025-02-20 03:48:58] [INFO] [--cache_text_encoder_outputs]
[2025-02-20 03:48:58] [INFO] [--cache_text_encoder_outputs_to_disk]
[2025-02-20 03:48:58] [INFO] [--text_encoder_batch_size TEXT_ENCODER_BATCH_SIZE]
[2025-02-20 03:48:58] [INFO] [--disable_mmap_load_safetensors]
[2025-02-20 03:48:58] [INFO] [--weighting_scheme {sigma_sqrt,logit_normal,mode,cosmap,none,uniform}]
[2025-02-20 03:48:58] [INFO] [--logit_mean LOGIT_MEAN] [--logit_std LOGIT_STD]
[2025-02-20 03:48:58] [INFO] [--mode_scale MODE_SCALE]
[2025-02-20 03:48:58] [INFO] [--blocks_to_swap BLOCKS_TO_SWAP]
[2025-02-20 03:48:58] [INFO] [--clip_l CLIP_L] [--t5xxl T5XXL] [--ae AE]
[2025-02-20 03:48:58] [INFO] [--controlnet_model_name_or_path CONTROLNET_MODEL_NAME_OR_PATH]
[2025-02-20 03:48:58] [INFO] [--t5xxl_max_token_length T5XXL_MAX_TOKEN_LENGTH]
[2025-02-20 03:48:58] [INFO] [--apply_t5_attn_mask]
[2025-02-20 03:48:58] [INFO] [--guidance_scale GUIDANCE_SCALE]
[2025-02-20 03:48:58] [INFO] [--timestep_sampling {sigma,uniform,sigmoid,shift,flux_shift}]
[2025-02-20 03:48:58] [INFO] [--sigmoid_scale SIGMOID_SCALE]
[2025-02-20 03:48:58] [INFO] [--model_prediction_type {raw,additive,sigma_scaled}]
[2025-02-20 03:48:58] [INFO] [--discrete_flow_shift DISCRETE_FLOW_SHIFT]
[2025-02-20 03:48:58] [INFO] [--split_mode]
[2025-02-20 03:48:58] [INFO] flux_train_network.py: error: argument --output_name: expected one argument
[2025-02-20 03:48:58] [INFO] Traceback (most recent call last):
[2025-02-20 03:48:58] [INFO] File "<frozen runpy>", line 198, in _run_module_as_main
[2025-02-20 03:48:58] [INFO] File "<frozen runpy>", line 88, in _run_code
[2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-20 03:48:58] [INFO] args.func(args)
[2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2025-02-20 03:48:58] [INFO] simple_launcher(args)
[2025-02-20 03:48:58] [INFO] File "C:\AI\Aitrepreneur\FLUX-LORA-FLUXGYM-INSTALL-V2\fluxgym\env\Lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2025-02-20 03:48:58] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2025-02-20 03:48:58] [INFO] subprocess.CalledProcessError: Command '['C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '16', '--save_every_n_epochs', '4', '--dataset_config', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\outputs\\dataset.toml', '--output_dir', 'C:\\AI\\Aitrepreneur\\FLUX-LORA-FLUXGYM-INSTALL-V2\\fluxgym\\outputs', '--output_name', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 2.
[2025-02-20 03:48:59] [ERROR] Command exited with code 1
[2025-02-20 03:48:59] [INFO] Runner:
got this error
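For reference, the line that matters in the log above is "flux_train_network.py: error: argument --output_name: expected one argument". The command at the bottom shows '--output_name' followed directly by '--timestep_sampling', so no name value was passed (presumably the LoRA name was empty when the command was built, but that part is a guess). A minimal sketch of how argparse reacts in that case; the parser below is a hypothetical stand-in with only two options, not the real sd-scripts parser:

```python
import argparse


def parse_output_name(argv):
    """Parse a tiny stand-in command line and return the output name.

    Hypothetical two-option parser for illustration only; the real
    training script defines many more arguments.
    """
    parser = argparse.ArgumentParser(prog="flux_train_network.py")
    parser.add_argument("--output_name")
    parser.add_argument("--timestep_sampling")
    try:
        return parser.parse_args(argv).output_name
    except SystemExit as exc:
        # argparse prints the usage text plus the error, then exits.
        return f"exit status {exc.code}"


# With a value after --output_name the parse succeeds.
print(parse_output_name(["--output_name", "robink", "--timestep_sampling", "shift"]))
# With the value missing, the next flag is not accepted as the name.
print(parse_output_name(["--output_name", "--timestep_sampling", "shift"]))
```

The second call fails the same way as the log: the usage text is printed, followed by "expected one argument".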
Anton Karpuzikov
2025-02-20 01:51:06 +0000 UTC
OK, then try this:
Go to the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder, look for the folder called "models--multimodalart--Florence-2-large-no-flash-attn" and delete it.
Then, while you are still inside the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder,
click on the folder path, type cmd and press Enter. This opens a command prompt window in that folder. In it, type:
git lfs install
git clone https://huggingface.co/Aitrepreneur/Florence-2-large-no-flash-attn
The download takes a few minutes. Once everything has been downloaded (the whole folder should be 1.4 GB), rename that folder from Florence-2-large-no-flash-attn to
models--multimodalart--Florence-2-large-no-flash-attn
Then you can relaunch Fluxgym.
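If you would rather script those steps than do them by hand, here is a rough Python sketch of the same sequence (delete the old folder, clone, rename). The function name and the run_clone switch are made up for this example, and it assumes git and git-lfs are already installed and on your PATH:

```python
import shutil
import subprocess
from pathlib import Path

CACHED_NAME = "models--multimodalart--Florence-2-large-no-flash-attn"
CLONE_NAME = "Florence-2-large-no-flash-attn"
CLONE_URL = "https://huggingface.co/Aitrepreneur/Florence-2-large-no-flash-attn"


def replace_florence_cache(hub_dir, run_clone=True):
    """Delete the old cached folder, clone the repo, rename the clone.

    hub_dir is the huggingface hub cache folder. run_clone=False skips
    the git commands (useful if the clone was already done by hand).
    """
    hub_dir = Path(hub_dir)
    cached = hub_dir / CACHED_NAME
    cloned = hub_dir / CLONE_NAME

    # Step 1: remove the existing cached folder if it is there.
    if cached.exists():
        shutil.rmtree(cached)

    # Step 2: same two git commands as above, run inside the hub folder.
    if run_clone:
        subprocess.run(["git", "lfs", "install"], check=True)
        subprocess.run(["git", "clone", CLONE_URL], cwd=hub_dir, check=True)

    # Step 3: rename the cloned folder to the name Fluxgym looks for.
    cloned.rename(cached)
    return cached
```

You would call it as replace_florence_cache(Path.home() / ".cache" / "huggingface" / "hub") and then relaunch Fluxgym.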
Aitrepreneur
2025-02-19 23:59:41 +0000 UTC
No, you need to train video LoRAs specifically for the Hunyuan model. Each model needs its own LoRA training, since every architecture is different.
Aitrepreneur
2025-02-19 23:09:18 +0000 UTC
As in my other comment, there was still an issue with the dual-GPU system. I installed on a single 4080 system with the same images and it worked fine. I am sure there is more I could do to make it work, but the 4080 is fine.
NewBe2
2025-02-19 20:40:30 +0000 UTC
It's still the dual 4090. I installed on a single 4080 system and it runs fine with the same images and settings.
NewBe2
2025-02-19 20:36:16 +0000 UTC
Can you elaborate or link to an article that defines what "weird image dimensions" are? I believe I'm having a similar issue. I was successful with a smaller dataset of 40 images, but I tried to expand it to 60 images using new images that had to have a significant part of the image cropped out, so the aspect ratio of some of the new images is >2:1. Once everything is run through the bucket, that makes the resolution >2048:1024.
SenoTakai
2025-02-19 20:18:17 +0000 UTC
Log in to Jupyter Lab (Connect) and click on workspace/fluxgym. There is an outputs folder. Right-click it and download ;-)
Marc
2025-02-19 19:29:23 +0000 UTC
Great video!
As I am using Runpod, where can I find the LORAs once the training is done?
Leonardo Piumi
2025-02-19 17:01:26 +0000 UTC
Same. I've tried everything. I reached out to Aitrepreneur and he said to delete the cache in C:\Users\username\.cache\huggingface\hub, but that didn't work for me either. I'm currently reinstalling to see if that helps.
Reign2294
2025-02-19 15:56:08 +0000 UTC
I can't get Florence 2 to work inside Fluxgym. Sometimes it starts to download, other times it crashes out right away: run_captioning
concept sentence
captions ('', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')
device=cuda
pytorch_model.bin: 0%| | 0.00/1.54G [00:13
JW
2025-02-19 14:00:13 +0000 UTC
Yeah, the trained LoRAs I have created of my wife work fantastically when generating images via F8 & GGUF Image Generation workflows, but they don't seem to work at all when I try to use them with Hunyuan Text to Video. Is this because the LoRAs created in Fluxgym are incompatible? Rgthree perhaps? If so, can we convert them?
Osvaldo Alfaro
2025-02-19 06:22:03 +0000 UTC
[2025-02-18 21:35:44] [INFO] Running C:\Users\chat1\ai\fluxgym\outputs\robink\train.bat
[2025-02-18 21:35:44] [INFO]
[2025-02-18 21:35:44] [INFO] (env) C:\Users\chat1\ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 --num_processes=1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Users\chat1\ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Users\chat1\ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Users\chat1\ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Users\chat1\ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 16 --optimizer_type adamw8bit --learning_rate 5e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 10 --save_every_n_epochs 1 --dataset_config "C:\Users\chat1\ai\fluxgym\outputs\robink\dataset.toml" --output_dir "C:\Users\chat1\ai\fluxgym\outputs\robink" --output_name robink --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 --enable_bucket --min_snr_gamma 5 --multires_noise_discount 0.3 --multires_noise_iterations 6 --noise_offset 0.1 --train_batch_size 2
[2025-02-18 21:35:48] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2025-02-18 21:35:48] [INFO] `--num_machines` was set to a value of `1`
[2025-02-18 21:35:48] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2025-02-18 21:35:48] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2025-02-18 21:35:51] [INFO] 2025-02-18 21:35:51 INFO highvram is enabled / train_util.py:4305
[2025-02-18 21:35:51] [INFO] highvramใๆๅนใงใ
[2025-02-18 21:35:51] [INFO] WARNING cache_latents_to_disk is train_util.py:4322
[2025-02-18 21:35:51] [INFO] enabled, so cache_latents is
[2025-02-18 21:35:51] [INFO] also enabled /
[2025-02-18 21:35:51] [INFO] cache_latents_to_diskใๆๅนใชใ
[2025-02-18 21:35:51] [INFO] ใใcache_latentsใๆๅนใซใใพใ
[2025-02-18 21:35:51] [INFO] 2025-02-18 21:35:51 INFO Checking the state dict: flux_utils.py:43
[2025-02-18 21:35:51] [INFO] Diffusers or BFL, dev or schnell
[2025-02-18 21:35:51] [INFO] INFO t5xxl_max_token_length: flux_train_network.py:152
[2025-02-18 21:35:51] [INFO] 512
[2025-02-18 21:35:51] [INFO] C:\Users\chat1\ai\fluxgym\env\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
[2025-02-18 21:35:51] [INFO] warnings.warn(
[2025-02-18 21:35:51] [INFO] You are using the default legacy behaviour of the . This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2025-02-18 21:35:51] [INFO] INFO Loading dataset config from train_network.py:446
[2025-02-18 21:35:51] [INFO] C:\Users\chat1\ai\fluxgym\out
[2025-02-18 21:35:51] [INFO] puts\robink\dataset.toml
[2025-02-18 21:35:51] [INFO] INFO prepare images. train_util.py:2062
[2025-02-18 21:35:51] [INFO] INFO get image size from name of train_util.py:1951
[2025-02-18 21:35:51] [INFO] cache files
[2025-02-18 21:35:51] [INFO] 0%| | 0/24 [00:00
[2025-02-18 21:35:51] [INFO] trainer.train(args)
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\sd-scripts\train_network.py", line 521, in train
[2025-02-18 21:35:51] [INFO] accelerator = train_util.prepare_accelerator(args)
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\sd-scripts\library\train_util.py", line 5384, in prepare_accelerator
[2025-02-18 21:35:51] [INFO] accelerator = Accelerator(
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\accelerator.py", line 383, in __init__
[2025-02-18 21:35:51] [INFO] self.state = AcceleratorState(
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\state.py", line 846, in __init__
[2025-02-18 21:35:51] [INFO] PartialState(cpu, **kwargs)
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\state.py", line 270, in __init__
[2025-02-18 21:35:51] [INFO] self.num_processes = torch.distributed.get_world_size()
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 2020, in get_world_size
[2025-02-18 21:35:51] [INFO] return _get_group_size(group)
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 986, in _get_group_size
[2025-02-18 21:35:51] [INFO] default_pg = _get_default_group()
[2025-02-18 21:35:51] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\distributed_c10d.py", line 1150, in _get_default_group
[2025-02-18 21:35:51] [INFO] raise ValueError(
[2025-02-18 21:35:51] [INFO] ValueError: Default process group has not been initialized, please make sure to call init_process_group.
[2025-02-18 21:35:52] [INFO] Traceback (most recent call last):
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2025-02-18 21:35:52] [INFO] return _run_code(code, main_globals, None,
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
[2025-02-18 21:35:52] [INFO] exec(code, run_globals)
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2025-02-18 21:35:52] [INFO] sys.exit(main())
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-18 21:35:52] [INFO] args.func(args)
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2025-02-18 21:35:52] [INFO] simple_launcher(args)
[2025-02-18 21:35:52] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2025-02-18 21:35:52] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2025-02-18 21:35:52] [INFO] subprocess.CalledProcessError: Command '['C:\\Users\\chat1\\ai\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\Users\\chat1\\ai\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '16', '--optimizer_type', 'adamw8bit', '--learning_rate', '5e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '10', '--save_every_n_epochs', '1', '--dataset_config', 'C:\\Users\\chat1\\ai\\fluxgym\\outputs\\robink\\dataset.toml', '--output_dir', 'C:\\Users\\chat1\\ai\\fluxgym\\outputs\\robink', '--output_name', 'robink', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2', '--enable_bucket', '--min_snr_gamma', '5', '--multires_noise_discount', '0.3', '--multires_noise_iterations', '6', '--noise_offset', '0.1', '--train_batch_size', '2']' returned non-zero exit status 1.
[2025-02-18 21:35:52] [ERROR] Command exited with code 1
[2025-02-18 21:35:52] [INFO] Runner:
So I added --num_processes=1 and it got past the multiple-GPU issue. However, I ran into the error above. I have tried different images, different numbers of images, different resolutions, your suggested changes and the default values. I even closed and restarted the app between runs. Always the same error.
NewBe2
2025-02-19 06:02:07 +0000 UTC
Sweet thanks! It works :)
Verratanectu
2025-02-19 03:50:05 +0000 UTC
This is probably a bad initial Python install. You need to uninstall your current Python installation and reinstall it correctly. Go to Add or remove programs, search for Python and uninstall both the current Python version and the Python install program. Once this is done, download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe
Run it, check the "Add python 3.10 to Path" checkbox and continue with the installation.
You can check that the right Python version is installed by opening a new command prompt window and typing:
python --version
It should report version 3.10.11.
Then just relaunch the 1-click installer in a new folder and try again.
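If you want to double-check which interpreter the "python" command actually resolves to, a small Python sketch like this can help. The function name is made up for this example, and it only compares the major.minor version (3.10), not the exact patch release:

```python
import sys

EXPECTED = (3, 10)  # the installer linked above is Python 3.10.11


def is_expected_python(version_info=sys.version_info, expected=EXPECTED):
    """Return True when major.minor matches the expected Python version."""
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    # Shows the interpreter that is really being run and its version.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("matches 3.10:", is_expected_python())
```

Save it as a .py file and run it with "python yourfile.py" from a new command prompt. If the command itself fails with the "Python was not found" message, the PATH checkbox step was probably missed.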
Aitrepreneur
2025-02-19 03:37:12 +0000 UTC
Got the following error
Starting FluxGym installation...
Python already installed.
Git already installed.
Cloning FluxGym repository...
Cloning into 'fluxgym'...
remote: Enumerating objects: 271, done.
remote: Counting objects: 100% (156/156), done.
remote: Compressing objects: 100% (59/59), done.
remote: Total 271 (delta 127), reused 97 (delta 97), pack-reused 115 (from 2)
Receiving objects: 100% (271/271), 16.52 MiB | 37.17 MiB/s, done.
Resolving deltas: 100% (156/156), done.
Cloning SD Scripts repository...
Cloning into 'sd-scripts'...
remote: Enumerating objects: 9097, done.
remote: Counting objects: 100% (37/37), done.
remote: Compressing objects: 100% (22/22), done.
remote: Total 9097 (delta 29), reused 17 (delta 15), pack-reused 9060 (from 3)
Receiving objects: 100% (9097/9097), 11.14 MiB | 31.77 MiB/s, done.
Resolving deltas: 100% (6592/6592), done.
Downloading launcher...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 52 100 52 0 0 172 0 --:--:-- --:--:-- --:--:-- 172
Creating Python virtual environment...
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
Error: Failed to create virtual environment.
Press any key to continue . . .
Verratanectu
2025-02-19 03:19:44 +0000 UTC
https://c.tenor.com/cwoN93BINOMAAAAC/tenor.gif
Aitrepreneur
2025-02-19 02:46:44 +0000 UTC
Are there any plans to make something like this for Hunyuan Video? That would be awesome.
Bruce
2025-02-19 02:42:24 +0000 UTC
Yes, I'm using the 16 GB training preset. I also tried activating --full_bf16 (I couldn't figure out a way to make fp16 work), --fused_backward_pass and --xformers, and lowering network_dim to 10. I noticed a small improvement, but it still takes around 1 hour to train 1 epoch.
I also noticed that when I have more than 1 picture in the training batch, it tends to offload some of the training onto my RAM, slowing the whole process, so I kept it at 1.
The problem is that I'm not sure roughly how long it is supposed to take to train 1 epoch with this setup. Maybe it's normal that it takes so much time, but I was hoping this new graphics card was strong enough to handle more AI work.
(Same issue with Hunyuan, where it takes ages to generate anything...) On the other hand, image generation with Flux Q8 in GGUF takes around 50 s for one image.
Let me know if you need any other specific information, and thanks again for your help!
LeGregouz
2025-02-19 01:33:28 +0000 UTC
Edited app.py to add the parameter. It crashed at a different spot. I might have issues with the training data. Will work on it and get back.
NewBe2
2025-02-19 01:04:12 +0000 UTC
It's because you have multiple GPUs, and Fluxgym doesn't really support that correctly. To fix this, you need to edit the app.py file. At line 453, press Enter and add:
--num_processes=1 {line_break}
so that it looks like this:
sh = f"""accelerate launch {line_break}
--num_processes=1 {line_break}
--mixed_precision bf16 {line_break}
Then save the file and reload Fluxgym.
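To make it clearer what that edit does, here is a stripped-down Python sketch of the same f-string pattern. The function name is made up, the command is cut to the first few flags, and the value passed for line_break ("^" as the Windows line-continuation character) is an assumption rather than something copied from app.py:

```python
def build_launch_prefix(line_break):
    """Build the start of the launch command, one flag per line.

    Sketch only: the real command goes on with the training script
    path and all of its arguments.
    """
    return f"""accelerate launch {line_break}
  --num_processes=1 {line_break}
  --mixed_precision bf16 {line_break}
  --num_cpu_threads_per_process 1"""


if __name__ == "__main__":
    # Assumed continuation character for a Windows .bat file.
    print(build_launch_prefix("^"))
```

With --num_processes=1 passed explicitly, accelerate no longer falls back to its default of one process per detected GPU, which is what the "More than one GPU was found, enabling multi-GPU training" message in the log refers to.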
Aitrepreneur
2025-02-19 00:58:45 +0000 UTC
It stops before training starts, with the following log:
[2025-02-18 16:41:20] [INFO] (env) C:\Users\chat1\ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Users\chat1\ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Users\chat1\ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Users\chat1\ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Users\chat1\ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adamw8bit --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 16 --save_every_n_epochs 4 --dataset_config "C:\Users\chat1\ai\fluxgym\outputs\robin1\dataset.toml" --output_dir "C:\Users\chat1\ai\fluxgym\outputs\robin1" --output_name robin1 --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2
[2025-02-18 16:41:23] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2025-02-18 16:41:23] [INFO] `--num_processes` was set to a value of `2`
[2025-02-18 16:41:23] [INFO] More than one GPU was found, enabling multi-GPU training.
[2025-02-18 16:41:23] [INFO] If this was unintended please pass in `--num_processes=1`.
[2025-02-18 16:41:23] [INFO] `--num_machines` was set to a value of `1`
[2025-02-18 16:41:23] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2025-02-18 16:41:23] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2025-02-18 16:41:23] [INFO] W0218 16:41:23.645788 4916 Lib\site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
[2025-02-18 16:41:25] [INFO] Traceback (most recent call last):
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2025-02-18 16:41:25] [INFO] return _run_code(code, main_globals, None,
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
[2025-02-18 16:41:25] [INFO] exec(code, run_globals)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in <module>
[2025-02-18 16:41:25] [INFO] sys.exit(main())
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-18 16:41:25] [INFO] args.func(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1097, in launch_command
[2025-02-18 16:41:25] [INFO] multi_gpu_launcher(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 734, in multi_gpu_launcher
[2025-02-18 16:41:25] [INFO] distrib_run.run(args)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\run.py", line 910, in run
[2025-02-18 16:41:25] [INFO] elastic_launch(
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\launcher\api.py", line 138, in __call__
[2025-02-18 16:41:25] [INFO] return launch_agent(self._config, self._entrypoint, list(args))
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\launcher\api.py", line 260, in launch_agent
[2025-02-18 16:41:25] [INFO] result = agent.run()
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 696, in run
[2025-02-18 16:41:25] [INFO] result = self._invoke_run(role)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 849, in _invoke_run
[2025-02-18 16:41:25] [INFO] self._initialize_workers(self._worker_group)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 668, in _initialize_workers
[2025-02-18 16:41:25] [INFO] self._rendezvous(worker_group)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\metrics\api.py", line 137, in wrapper
[2025-02-18 16:41:25] [INFO] result = f(*args, **kwargs)
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\agent\server\api.py", line 500, in _rendezvous
[2025-02-18 16:41:25] [INFO] rdzv_info = spec.rdzv_handler.next_rendezvous()
[2025-02-18 16:41:25] [INFO] File "C:\Users\chat1\ai\fluxgym\env\lib\site-packages\torch\distributed\elastic\rendezvous\static_tcp_rendezvous.py", line 67, in next_rendezvous
[2025-02-18 16:41:25] [INFO] self._store = TCPStore( # type: ignore[call-arg]
[2025-02-18 16:41:25] [INFO] RuntimeError: use_libuv was requested but PyTorch was build without libuv support
[2025-02-18 16:41:26] [ERROR] Command exited with code 1
[2025-02-18 16:41:26] [INFO] Runner:
The only difference between my system and yours is I have dual 4090's. Not sure if that is the issue but I can't manually change the num_processes=1.
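One way to test whether the second GPU is what triggers the multi-GPU launcher is to hide it from CUDA before fluxgym starts. A minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` environment variable; the helper name `single_gpu_env` is hypothetical, and this is not a confirmed fix for the libuv error above:

```python
import os


def single_gpu_env(base_env=None, gpu_index=0):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA-based libraries such as PyTorch only see the devices listed here.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Hypothetical usage from inside the fluxgym folder (not run here):
#   subprocess.run([sys.executable, "app.py"], env=single_gpu_env(), check=True)
```

With a single visible device, `accelerate launch` should fall back to a `--num_processes` value of `1`, as it does in the single-GPU log elsewhere in this thread.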
NewBe2
2025-02-19 00:45:30 +0000 UTC
I saw on runpod that stopping the pod kinda destroys the env, so it's better to just completely delete the pod and redo the install each time unfortunately. Otherwise you can also do this, once you are inside the fluxgym folder, click on the terminal icon on the right then type:
cd sd-scripts
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python app.py
Aitrepreneur
2025-02-19 00:26:27 +0000 UTC
Looks like there is a problem with your dataset, seems like you might have some weird image dimension for 1 or multiple images. Make sure your images aren't too big either for training.
Aitrepreneur
2025-02-19 00:24:26 +0000 UTC
It's difficult to say just from this message, especially since Fluxgym is so bad at showing progress. Have you chosen the specific 16GB or 12GB training presets in fluxgym?
Aitrepreneur
2025-02-19 00:22:41 +0000 UTC
Yes, for that particular error that is one of the possible fixes (tbh, not even sure why that value isn't set to 0 by default already in the project, but oh well...)
Aitrepreneur
2025-02-19 00:20:42 +0000 UTC
What do you mean? You can't just train multiple loras at the same time (if that's what you mean...), it's better to just train a lora separately for each object and then use them inside the prompt. Or you can just merge those loras inside the flux model as well, but you will lose the flexibility of a lora. Also, you can already use multiple loras together, so I'm not sure about your question.
Aitrepreneur
2025-02-19 00:19:39 +0000 UTC
Go inside the C:\Users\YOURUSERNAME\.cache\huggingface\hub folder, look for the folder called "models--multimodalart--Florence-2-large-no-flash-attn" and delete it.
Then try again. You can also use something like everything.exe (https://www.voidtools.com/downloads) to search for "models--multimodalart--Florence-2-large-no-flash-attn" and delete the folder
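The same cleanup can be scripted. A minimal sketch, assuming the default Hugging Face cache location under the user's home directory (a custom cache path would need to be passed in); the helper name `remove_cached_model` is hypothetical:

```python
import shutil
from pathlib import Path

FLORENCE_DIR = "models--multimodalart--Florence-2-large-no-flash-attn"


def remove_cached_model(hub_dir=None, folder=FLORENCE_DIR):
    """Delete one cached model folder; return True if something was removed."""
    # Assumed default location: <home>/.cache/huggingface/hub
    hub = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "huggingface" / "hub"
    target = hub / folder
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Hypothetical usage:
#   print(remove_cached_model())
```

After the folder is gone, the next "add AI captions" attempt should download the model again from scratch.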
Aitrepreneur
2025-02-19 00:15:55 +0000 UTC
just select them, then right click and download.
Aitrepreneur
2025-02-19 00:12:00 +0000 UTC
I think there is a limit of 150 for the number of images you can use in fluxgym. You don't need more anyway
Aitrepreneur
2025-02-19 00:11:34 +0000 UTC
send me a dm
Aitrepreneur
2025-02-19 00:10:02 +0000 UTC
quality beats quantity, if you already have at least 20 images, just add more varied photos, don't just use selfies, use different angles, different lighting, etc but always as high quality as possible
Aitrepreneur
2025-02-19 00:09:49 +0000 UTC
same question as healingpaint, what's your installed python version? You need the 3.10.11, and added to path correctly.
Aitrepreneur
2025-02-19 00:08:11 +0000 UTC
I saw on runpod that stopping the pod kinda destroys the env, so it's better to just completely delete the pod and redo the install each time unfortunately. Otherwise you can also do this, once you are inside the fluxgym folder, click on the terminal icon on the right then type:
cd sd-scripts
pip install -r requirements.txt
cd ..
pip install -r requirements.txt
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
python app.py
Aitrepreneur
2025-02-19 00:06:55 +0000 UTC
Does not work - maybe there is no activated environment? If so - how do we activate it? :-)
Marc
2025-02-18 19:52:45 +0000 UTC
Hi, newbie here. How do I restart fluxgym after I stop the pod and run the same pod again? All the files are still there. I tried entering the env and installing the requirements, then it got stuck...
Nathan lee
2025-02-18 16:41:38 +0000 UTC
[2025-02-18 17:36:21] [INFO] (env) C:\Ai\fluxgym>accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/flux_train_network.py --pretrained_model_name_or_path "C:\Ai\fluxgym\models\unet\flux1-dev.sft" --clip_l "C:\Ai\fluxgym\models\clip\clip_l.safetensors" --t5xxl "C:\Ai\fluxgym\models\clip\t5xxl_fp16.safetensors" --ae "C:\Ai\fluxgym\models\vae\ae.sft" --cache_latents_to_disk --save_model_as safetensors --sdpa --persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --network_module networks.lora_flux --network_dim 4 --optimizer_type adafactor --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" --split_mode --network_args "train_blocks=single" --lr_scheduler constant_with_warmup --max_grad_norm 0.0 --learning_rate 8e-4 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --fp8_base --highvram --max_train_epochs 10 --save_every_n_epochs 1 --dataset_config "C:\Ai\fluxgym\outputs\nelly-trinket\dataset.toml" --output_dir "C:\Ai\fluxgym\outputs\nelly-trinket" --output_name nelly-trinket --timestep_sampling shift --discrete_flow_shift 3.1582 --model_prediction_type raw --guidance_scale 1 --loss_type l2 --enable_bucket --huggingface_repo_visibility private --min_snr_gamma 5 --multires_noise_discount 0.3 --multires_noise_iterations 6 --noise_offset 0.1
[2025-02-18 17:36:24] [INFO] The following values were not passed to `accelerate launch` and had defaults used instead:
[2025-02-18 17:36:24] [INFO] `--num_processes` was set to a value of `1`
[2025-02-18 17:36:24] [INFO] `--num_machines` was set to a value of `1`
[2025-02-18 17:36:24] [INFO] `--dynamo_backend` was set to a value of `'no'`
[2025-02-18 17:36:24] [INFO] To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2025-02-18 17:36:26] [INFO] Traceback (most recent call last):
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\flux_train_network.py", line 14, in
[2025-02-18 17:36:26] [INFO] import train_network
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\train_network.py", line 26, in
[2025-02-18 17:36:26] [INFO] from library import deepspeed_utils, model_util, strategy_base, strategy_sd
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\library\strategy_sd.py", line 7, in
[2025-02-18 17:36:26] [INFO] from library import train_util
[2025-02-18 17:36:26] [INFO] File "C:\Ai\fluxgym\sd-scripts\library\train_util.py", line 291
[2025-02-18 17:36:26] [INFO] raise ValueError(f"Invalid image dimensions: {image_width}x{image_height} in file {self.image_path}")
[2025-02-18 17:36:26] [INFO] IndentationError: expected an indented block after 'if' statement on line 290
[2025-02-18 17:36:27] [INFO] Traceback (most recent call last):
[2025-02-18 17:36:27] [INFO] File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
[2025-02-18 17:36:27] [INFO] return _run_code(code, main_globals, None,
[2025-02-18 17:36:27] [INFO] File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
[2025-02-18 17:36:27] [INFO] exec(code, run_globals)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\Scripts\accelerate.exe\__main__.py", line 7, in
[2025-02-18 17:36:27] [INFO] sys.exit(main())
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
[2025-02-18 17:36:27] [INFO] args.func(args)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 1106, in launch_command
[2025-02-18 17:36:27] [INFO] simple_launcher(args)
[2025-02-18 17:36:27] [INFO] File "C:\Ai\fluxgym\env\lib\site-packages\accelerate\commands\launch.py", line 704, in simple_launcher
[2025-02-18 17:36:27] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2025-02-18 17:36:27] [INFO] subprocess.CalledProcessError: Command '['C:\\Ai\\fluxgym\\env\\Scripts\\python.exe', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', 'C:\\Ai\\fluxgym\\models\\unet\\flux1-dev.sft', '--clip_l', 'C:\\Ai\\fluxgym\\models\\clip\\clip_l.safetensors', '--t5xxl', 'C:\\Ai\\fluxgym\\models\\clip\\t5xxl_fp16.safetensors', '--ae', 'C:\\Ai\\fluxgym\\models\\vae\\ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '10', '--save_every_n_epochs', '1', '--dataset_config', 'C:\\Ai\\fluxgym\\outputs\\nelly-trinket\\dataset.toml', '--output_dir', 'C:\\Ai\\fluxgym\\outputs\\nelly-trinket', '--output_name', 'nelly-trinket', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2', '--enable_bucket', '--huggingface_repo_visibility', 'private', '--min_snr_gamma', '5', '--multires_noise_discount', '0.3', '--multires_noise_iterations', '6', '--noise_offset', '0.1']' returned non-zero exit status 1.
[2025-02-18 17:36:28] [ERROR] Command exited with code 1
[2025-02-18 17:36:28] [INFO] Runner:
Get this error...
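The first traceback above ends in an `IndentationError`, which Python raises while parsing `train_util.py`, before any training code runs. A minimal sketch for checking whether a file parses, using only the built-in `compile`; the helper name `syntax_error_in` is hypothetical:

```python
from pathlib import Path


def syntax_error_in(path):
    """Return a short description of the first syntax error in a file, or None."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        # Parse only; nothing in the file is executed.
        compile(source, str(path), "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return None


# Hypothetical usage from inside the fluxgym folder:
#   print(syntax_error_in("sd-scripts/library/train_util.py"))
```

If this reports an error, the file itself is probably damaged (for example by a manual edit), and restoring the original file may be simpler than patching it by hand.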
Killy_Blame
2025-02-18 16:38:07 +0000 UTC
Hi! In my system (AMD + RTX3090 + Win 11) it is stopping while running the 'write web request' with V2. Any help is very much appreciated.
Speedy2023
2025-02-18 05:29:13 +0000 UTC
Hello,
Thank you very much for all the hard work, everything is easy to use all the time and that's absolutely great !
So far no issues with any of the one-click installers you made! But I still have a question regarding training loras for Flux. I know it's supposed to take time to train a Lora, but setting up everything as you showed, it still takes a tremendous amount of time for me. I have an ASUS TUF Gaming GeForce RTX 4080 OC edition with 16GB of VRAM and I thought it might take a couple of hours to run at least 5 epochs, but so far it's been like 2 hours and I only have 1 epoch trained.
I tried it with a dataset of thirty pictures, once with the train batch size set to null and once with it set to 4. Everything seems to work fine, but I don't see any difference in the time it takes to do one epoch, whether I increase the train batch size or not.
I feel like with a 4080 SUPER with 16GB of VRAM it should take less time, but maybe I'm wrong. Is there something I'm missing? Maybe my GPU isn't giving all its potential for whatever reason (despite being at 100% in Task Manager)? Or is it absolutely normal? Also, can my CPU (in my case a Ryzen 9 7950X3D) help in any way? Also, for example, are the options --full_fp16, --fused_backward_pass or --xformers viable solutions to improve speed without losing too much quality?
Thank you for your answer !
(And if anyone else has any information about the time it should take with my setup, or just how long it roughly takes for you with your setup, feel free to share! It could give everyone a rough idea of how long it takes to train one epoch on each setup.)
LeGregouz
2025-02-18 00:59:04 +0000 UTC
to get it to work on my machine, I just edited the app.py file, near the top. look for this HF_HUB_ENABLE_HF_TRANSFER and set the value to 0
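The edit described above amounts to setting the variable before `huggingface_hub` is imported, since the library reads it when it is first loaded. A minimal sketch, assuming it runs at the very top of the script; the helper name `disable_hf_transfer` is hypothetical:

```python
import os


def disable_hf_transfer(env=None):
    """Turn off the hf_transfer download backend for this process."""
    env = os.environ if env is None else env
    env["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
    return env


# Call this before anything imports huggingface_hub.
disable_hf_transfer()
```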
Bob Winberry
2025-02-17 23:09:29 +0000 UTC
Same
Alex Kilbee
2025-02-17 19:00:23 +0000 UTC
Episode Topic Suggestion: "Multi-LORAs" , i.e. training multiple LORA , one for each object and then using multiple LORAs to make image with e.g. three newly trained objects?
(here is some preliminary research I asked perplexity and deepresearch answered that it should be possible: https://www.perplexity.ai/search/i-see-tutorials-training-makin-SHc0lJBHR0SRd9ww6_0PdA )
Grzegorz Wierzowiecki
2025-02-17 18:43:19 +0000 UTC
When I click on add AI captions it errors with Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn' .... it's not downloading it
return model_class.from_pretrained(
File "E:\AI\FluxGym-Training\fluxgym\env\lib\site-packages\transformers\modeling_utils.py", line 3644, in from_pretrained
raise EnvironmentError(
OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Robb
2025-02-17 18:13:27 +0000 UTC
Same, commenting in case somebody finds the solution
BS
2025-02-17 16:50:03 +0000 UTC
Hi, I have a really basic question. I have trained a LORA on runpod using fluxgym and I can see the .safetensors files in the code notebook, but how do I download them?
Siddharth Shukla
2025-02-17 15:50:26 +0000 UTC
In K's video he uses 40 images for LoRa training. Is it possible to use say 100 or even 200 images without getting tuple index errors?
LW
2025-02-17 14:51:26 +0000 UTC
Actually, I haven't used Florence-2 yet. Working with some old hand-edited images. It seems that Florence must go in the folder "multimodalart", so you could try searching for that folder. If the folder is there but empty, you might be able to download the required model (Florence-2-large-no-flash-attn) directly from HuggingFace.
LW
2025-02-17 14:26:46 +0000 UTC
Installed flawlessly, already have two runs and they worked fantastic. Only took between 20-30mins for each run on my setup. Really appreciate this.
Michael Moeller
2025-02-17 07:14:25 +0000 UTC
V2 still does nothing except download the Python installer. And nothing else...
Bobépine
2025-02-17 07:10:24 +0000 UTC
looks like it worked. I trained it on images of myself. Some of them don't work. This is the first time I've attempted to train a lora, so any ideas on getting better results? More images?
dattrax
2025-02-17 01:51:58 +0000 UTC
FYI, fixed by updating to Python 3.10.11 and making sure it was in my path as primary (top most)
HealingPaint
2025-02-17 00:51:33 +0000 UTC
Thanks I installed it, but without the uninstall part and made sure it was on top of the path/environmental variables and this seemed to have resolved it. Appreciate it!
HealingPaint
2025-02-17 00:51:10 +0000 UTC
I'm getting " Traceback (most recent call last):
File "/workspace/fluxgym/app.py", line 8, in
import gradio as gr
ModuleNotFoundError: No module named 'gradio'" when typing that into the runpod terminal
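A `ModuleNotFoundError` like this usually means the Python that ran `app.py` is not the one the requirements were installed into. A minimal sketch for checking that from the same terminal; the helper names are hypothetical:

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("interpreter:", sys.executable)
print("virtual env active:", in_virtualenv())
print("missing:", missing_modules(["gradio", "torch", "accelerate"]))
```

If `gradio` shows up as missing, re-running the `pip install -r requirements.txt` steps with that same interpreter is the likely next step.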
Tyler89537
2025-02-16 23:50:15 +0000 UTC
you need to uninstall your current python installation and reinstall it correctly. Go to the add and remove programs, search for python and uninstall both the current python version and the python install program. Once this is done, go here and download this installer: https://www.python.org/ftp/python/3.10.11/python-3.10.11-amd64.exe
Run it and check the "Add python 3.10 to Path" checkbox and continue with the installation.
You can check that the right python version is installed by opening a new command prompt window and typing:
python --version
and it should give you the 3.10.11 version
Then just relaunch the 1-click installer in a new folder and try again.
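The same check can be done from inside Python. The `str | None` annotation in the traceback reported in this thread needs Python 3.10 or newer (PEP 604), which is why 3.9 fails with a `TypeError`. A minimal sketch; the helper name `python_status` is hypothetical and the 3.10 target follows the recommendation above:

```python
import sys

RECOMMENDED = (3, 10)  # the thread recommends Python 3.10.11


def python_status(version_info=sys.version_info):
    """Classify an interpreter version against the recommended 3.10 series."""
    major_minor = tuple(version_info[:2])
    if major_minor < RECOMMENDED:
        return "too old"  # e.g. 3.9 cannot evaluate `str | None`
    if major_minor > RECOMMENDED:
        return "newer than recommended"
    return "ok"


print(sys.version.split()[0], "->", python_status())
```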
Aitrepreneur
2025-02-16 23:37:40 +0000 UTC
Is there a quick command prompt or powershell way to update this version and add to path? Thanks for the help.
HealingPaint
2025-02-16 23:30:29 +0000 UTC
Ahh I'm using Python 3.9.19
HealingPaint
2025-02-16 23:29:12 +0000 UTC
what's your installed python version? You need the 3.10.11, added to path correctly as well of course
Aitrepreneur
2025-02-16 23:24:02 +0000 UTC
My issue still happens...Any fix for this?
Installation completed successfully
Launching application...
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\_templates\LORA Training (Flux)\fluxgym\app.py", line 19, in
from library import flux_train_utils, huggingface_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_train_utils.py", line 17, in
from library import flux_models, flux_utils, strategy_base, train_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 366, in
class ModelSpec:
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 369, in ModelSpec
ckpt_path: str | None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Press any key to continue . . .
HealingPaint
2025-02-16 23:07:32 +0000 UTC
unless actually modifying the whole app.py file no. Maybe creating a quick macro for the browser?
Aitrepreneur
2025-02-16 21:23:03 +0000 UTC
I uploaded the V2 installer, might solve this error
Aitrepreneur
2025-02-16 21:22:05 +0000 UTC
type:
python app.py
Aitrepreneur
2025-02-16 21:21:46 +0000 UTC
Yes I just uploaded the V2, it should take care of the huggingface issue during the install, otherwise there is an additional option to change in the actual fluxgym repository, people can dm me for the rest
Aitrepreneur
2025-02-16 21:21:18 +0000 UTC
You should have DMed me for that. One thing you can do is open a cmd window, drag and drop the installer inside, then press enter. This will at least keep the window from closing and give us an error message we can use to troubleshoot the issue.
Aitrepreneur
2025-02-16 21:10:27 +0000 UTC
it's crystools
Aitrepreneur
2025-02-16 21:09:06 +0000 UTC
it's a separate webui
Aitrepreneur
2025-02-16 21:07:57 +0000 UTC
as I said it will work with 8gb, it will just take longer but it works
Aitrepreneur
2025-02-16 21:07:47 +0000 UTC
Got it working. Had a ChatGPT session and we got it all worked out.
MJ
2025-02-16 20:54:28 +0000 UTC
Seems like this is broken for a lot of people . Any fix coming?
HealingPaint
2025-02-16 20:49:32 +0000 UTC
Question. Do you have to adjust the advance settings every time or is there a way to save the settings?
Virtamouse
2025-02-16 20:41:17 +0000 UTC
Ok so the problem with Network Volumes is that we cannot just run the same script you've provided because the file paths already exist... So the environment and whatever else still needs to be initialized or installed onto the pod, even though the files are already there... I've messed with this for quite some time, but Perplexity is letting me down. I've been unable to make a new script that will download the dependencies without re-downloading all the files that already exist... Any suggestions?
Tyler
2025-02-16 20:12:50 +0000 UTC
Found a solution from some guy on reddit :D
Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables.
Click Environment Variables....
Under User variables or System variables, click New.
Set:
Variable name: HF_HUB_ENABLE_HF_TRANSFER
Variable value: 0
MPG
2025-02-16 19:29:55 +0000 UTC
This is the error I get: File "G:\FUXgym\fluxgym\env\lib\site-packages\huggingface_hub\file_download.py", line 437, in http_get
raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
patreon@winberry.com
2025-02-16 18:25:10 +0000 UTC
@LW I am having trouble downloading the Florence-2. Does anyone know which exact folder that should go into via manual download?
OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Christian
2025-02-16 17:06:54 +0000 UTC
How do I relaunch Fluxgym in runpod?
Bruce
2025-02-16 16:46:57 +0000 UTC
Not doing well, seems like I get an error right from the start. Anyone else getting this error, and if so how did you fix it.
"WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available."
Here is the complete train wreck!
Installing SD Scripts dependencies...
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Obtaining file:///C:/AI/FLUX%20LOAR%20FLUXGYM/fluxgym/sd-scripts (from -r requirements.txt (line 46))
Preparing metadata (setup.py) ... done
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/accelerate/
Could not fetch URL https://pypi.org/simple/accelerate/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/accelerate/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
ERROR: Could not find a version that satisfies the requirement accelerate==0.33.0 (from versions: none)
ERROR: No matching distribution found for accelerate==0.33.0
WARNING: pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping
Error: Failed to install SD Scripts dependencies.
Press any key to continue . . .
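The warning means the Python that pip is running under cannot import its `ssl` module, so no HTTPS download from PyPI can work. A minimal sketch for confirming that with the same interpreter; the helper name `ssl_available` is hypothetical:

```python
import sys


def ssl_available():
    """Return True if this interpreter can import the standard ssl module."""
    try:
        import ssl  # noqa: F401  (imported only to see whether it loads)
    except ImportError:
        return False
    return True


print(sys.executable, "ssl available:", ssl_available())
```

If this prints `False`, reinstalling Python from the official installer, as suggested elsewhere in this thread, is a reasonable first step.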
MJ
2025-02-16 13:45:57 +0000 UTC
EDIT: PROBLEM SOLVED SEE BELOW (at the bottom)
The installation process seems to work ok but once I set Flux Gym up to make a LoRa it tries to download Flux1-dev.sft file and keeps failing. I get the following error message:
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
I have tried downloading the Flux1-dev.sft file from Huggingface and putting that in the unet folder (...fluxgym\models\unet\flux1-dev.sft) but Flux Gym still tries to download the file when I start a LoRa training.
I have tried the following fix suggested by user MPG:
Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables.
Click Environment Variables....
Under User variables or System variables, click New.
Set:
Variable name: HF_HUB_ENABLE_HF_TRANSFER
Variable value: 0
However, even after a computer restart, I get the same error when Flux Gym tries to download the Flux1-dev.sft model.
Any suggestions?
EDIT: PROBLEM SOLVED:-
The problem I have had is with HF-TRANSFER failing to download large files (the models). I'm not sure why. However, if you download them directly from HuggingFace and put them in the proper locations then the process works. I tried this above but I mistakenly used Flux1-dev.safetensors rather than Flux1-dev.sft. (Apparently "it" does make you go blind after all! OMG, I'm in trouble now!)
Anyway....
......\fluxgym\models\unet should contain Flux1-dev.sft (not Flux1-dev.safetensors .... you can just change the extension!)
......\fluxgym\models\clip should contain t5xxl_fp16.safetensors
Those are the two files Flux Gym had problems downloading for me. You can download them from Hugging Face or you may already have them in your ComfyUI (other programs are available) set up, so you can just copy paste.
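A quick way to confirm the files are where fluxgym looks for them. A minimal sketch: the relative paths are taken from the training commands logged in this thread, while the helper name `missing_models` and the example root folder are hypothetical:

```python
from pathlib import Path

# Paths as they appear in the logged `accelerate launch` commands.
EXPECTED_MODELS = (
    "models/unet/flux1-dev.sft",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp16.safetensors",
    "models/vae/ae.sft",
)


def missing_models(fluxgym_root):
    """Return the expected model files that are not present under the root."""
    root = Path(fluxgym_root)
    return [rel for rel in EXPECTED_MODELS if not (root / rel).is_file()]


# Hypothetical usage:
#   print(missing_models(r"C:\Ai\fluxgym"))
```

An empty list means all four files are in place, so fluxgym should not need to download them.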
Have fun
LW
2025-02-16 13:08:10 +0000 UTC
hmm. i get an error like this. OSError: cannot write mode RGBA as JPEG
Killy_Blame
2025-02-16 11:34:42 +0000 UTC
I ended up using Pinokio to get Fluxgym installed, as person commented earlier.
Greg
2025-02-16 11:08:35 +0000 UTC
After a lot of other errors, I am now stuck on this one as well.
Thomas
2025-02-16 09:44:10 +0000 UTC
Seems there is a checkbox that says Add Path. I think that is it. That's what i did anyway.
Thomas
2025-02-16 09:30:04 +0000 UTC
Amazing! Can we also use the generated Lora for Hunyuan Video generation in your ultimate workflow? I used OneTrainer to train, but for some reason my generated lora did not work.
Ryan Tavan
2025-02-16 09:21:18 +0000 UTC
I am kinda hard stuck. I downloaded everything, installed Python with PATH enabled, then I installed RUST, then it seems to have run to completion but didn't launch the app. I've tried opening the LAUNCHER.bat file, but it opens for a second then closes immediately. Any ideas what might be going wrong?
Ish
2025-02-16 08:24:22 +0000 UTC
How do you enable PATH? I've run into this same problem.
Lexi Barber
2025-02-16 08:16:20 +0000 UTC
Manually downloading the latest versions of Python (with PATH enabled) and Git fixed it for me
Rich Vol
2025-02-16 07:20:08 +0000 UTC
Found a solution from some guy on reddit :D
Open the Start Menu, search for "Environment Variables", and select Edit the system environment variables.
Click Environment Variables....
Under User variables or System variables, click New.
Set:
Variable name: HF_HUB_ENABLE_HF_TRANSFER
Variable value: 0
MPG
2025-02-16 06:44:18 +0000 UTC
I get this error when I run the 1-click installer (using Windows 11) - any ideas??
Installation completed successfully
Launching application...
Traceback (most recent call last):
File "E:\ComfyUI_windows_portable\ComfyUI\_templates\LORA Training (Flux)\fluxgym\app.py", line 19, in
from library import flux_train_utils, huggingface_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_train_utils.py", line 17, in
from library import flux_models, flux_utils, strategy_base, train_util
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 366, in
class ModelSpec:
File "e:\comfyui_windows_portable\comfyui\_templates\lora training (flux)\fluxgym\sd-scripts\library\flux_models.py", line 369, in ModelSpec
ckpt_path: str | None
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
Press any key to continue . . .
HealingPaint
2025-02-16 05:20:38 +0000 UTC
Hi, I receive the following error when trying to use the local installer:
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [48 lines of output]
Traceback (most recent call last):
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in
main()
~~~~^^
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
~~~~^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Flux\fluxgym\env\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
self.run_setup()
~~~~~~~~~~~~~~^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\menno\AppData\Local\Temp\pip-build-env-4x9ft98s\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
exec(code, locals())
~~~~^^^^^^^^^^^^^^^^
File "<string>", line 128, in <module>
File "C:\Python313\Lib\subprocess.py", line 414, in check_call
retcode = call(*popenargs, **kwargs)
File "C:\Python313\Lib\subprocess.py", line 395, in call
with Popen(*popenargs, **kwargs) as p:
~~~~~^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python313\Lib\subprocess.py", line 1036, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pass_fds, cwd, env,
^^^^^^^^^^^^^^^^^^^
...<5 lines>...
gid, gids, uid, umask,
^^^^^^^^^^^^^^^^^^^^^^
start_new_session, process_group)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python313\Lib\subprocess.py", line 1548, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
# no special security
^^^^^^^^^^^^^^^^^^^^^
...<4 lines>...
cwd,
^^^^
startupinfo)
^^^^^^^^^^^^
FileNotFoundError: [WinError 2] The system cannot find the file specified
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Error: Failed to install SD Scripts dependencies.
Press any key to continue . . .
Greg
2025-02-16 04:48:53 +0000 UTC
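The traceback above ends in `FileNotFoundError: [WinError 2]` raised from `subprocess.check_call`, which on Windows usually means the build script tried to launch a program that is not on `PATH`. The log does not say which program, so the tool names below are only guesses (`cargo` and `rustc` are included because another comment in this thread mentions needing Rust). The log also shows `C:\Python313` paths, so the build ran under Python 3.13. A minimal sketch for checking which of a few candidate tools are missing:

```python
import shutil


def find_missing_tools(candidates):
    """Return the names from `candidates` that cannot be found on PATH."""
    return [name for name in candidates if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed candidates only; the traceback does not name the missing program.
    guesses = ["git", "cargo", "rustc", "cmake"]
    missing = find_missing_tools(guesses)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All candidate tools were found on PATH.")
```

Anything reported as missing is a candidate to install before rerunning the installer, though it may not be the program the build was actually looking for.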
Does anyone else run into the issue where the .bat just closes after downloading Python 3.10.11, even if you already have it installed?
Thomas
2025-02-16 04:26:00 +0000 UTC
Thanks for the tip! New patreon here myself
Herman
2025-02-16 01:51:15 +0000 UTC
Hey there, long time watcher, new Patreon. Love your stuff. You should really tell people to create a network volume under 'Storage' on RunPod, set THAT to 100GB, and launch a pod with the network volume attached. 100GB is $7/mo and you won't lose your data, while the daily cost the way you've shared it is like $0.50/day. It adds up fast! Hope this helps, as I messed with RunPod for HOURS to get it working and find a more cost-effective solution. You can delete a pod and your data will stay on the network volume, so you can resume with a new pod easily and not have to redownload models/files. $$$
Tyler
2025-02-16 01:01:02 +0000 UTC
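The cost comparison in the comment above can be worked through as simple arithmetic. This sketch takes the two figures quoted there ($7 per month for a 100GB network volume, roughly $0.50 per day otherwise) at face value and assumes a 30-day month; actual RunPod pricing may differ.

```python
DAYS_PER_MONTH = 30  # assumption for illustration


def monthly_cost_from_daily(daily_usd, days=DAYS_PER_MONTH):
    """Convert a per-day cost into a per-month cost."""
    return daily_usd * days


if __name__ == "__main__":
    network_volume_monthly = 7.00  # figure quoted in the comment
    per_day_cost = 0.50            # figure quoted in the comment
    other_monthly = monthly_cost_from_daily(per_day_cost)
    print(f"Per-day option: ${other_monthly:.2f}/month")
    print(f"Network volume: ${network_volume_monthly:.2f}/month")
    print(f"Difference:     ${other_monthly - network_volume_monthly:.2f}/month")
```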
I love these 1-step installers. Great for us lazy ass... er busy people who don't know a git from a hub :) And, as usual the video makes it all crystal clear. Thanks K. I think I can turn the heating off as my GPU will be glowing for a while :)
LW
2025-02-15 23:18:14 +0000 UTC
I don't know why, but even though I have Python installed, your installers have never worked for me. They always close whenever I try to run them, and I have to do all my installs manually. I've been installing these programs via git for a couple of years now, so I'm not a total noob but not an expert either. Doing it manually isn't a deal breaker for me, but it would be nice to not have to go through the trouble every now and then. Just wish I knew what was going on. Any time I run the installer, it closes immediately. It says the installer ran successfully but nothing actually happens. It just opens and then immediately closes.
lokitsar
2025-02-15 23:16:37 +0000 UTC
I noticed your CPU, GPU, VRAM overlay in the video. Which program are you using? I am not very happy with any I have found thus far.
Mark
2025-02-15 22:54:06 +0000 UTC
Heya, trying to run this right now. I'm pretty sure I'm following along well enough, but I'm getting this error:
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.
MPG
2025-02-15 22:52:46 +0000 UTC
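The error message above itself suggests disabling `HF_HUB_ENABLE_HF_TRANSFER`, an environment variable that `huggingface_hub` reads to decide whether to use the `hf_transfer` download backend. A minimal sketch of turning it off from Python, assuming this runs before `huggingface_hub` is imported in the same process (how the fluxgym launcher sets this variable is not shown in the thread):

```python
import os


def disable_hf_transfer(env=None):
    """Set HF_HUB_ENABLE_HF_TRANSFER to "0" in the given mapping (default: os.environ)."""
    target = os.environ if env is None else env
    target["HF_HUB_ENABLE_HF_TRANSFER"] = "0"
    return target


if __name__ == "__main__":
    disable_hf_transfer()
    print("HF_HUB_ENABLE_HF_TRANSFER =", os.environ["HF_HUB_ENABLE_HF_TRANSFER"])
```

With the variable off, downloads should fall back to the default downloader, which, as the message says, gives better error handling and may reveal the underlying cause.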
FYI, you may have to install Rust. There happens to be a link to the language download if the installer fails.
Mark
2025-02-15 22:39:10 +0000 UTC
Hi! Can it be installed in the new ComfyUI created for the V3-ULTIMATE_FLUX_ALL-IN-ONE-WORKFLOW?
If yes, in which folder should the installer be launched?
Like: ././ComfyUI_windows_portable/ ?
Alex Arangon
2025-02-15 22:11:58 +0000 UTC
Is there any way to make FluxGym work with 8GB of VRAM?
Demitri Grigori
2025-02-15 21:28:53 +0000 UTC
Finally can make Man Bear Pig lora on my potato gpu. Thank you
Wes C
2025-02-15 20:47:06 +0000 UTC