'Allo folks!
Summer's over, kind of. Ups and downs. Let's get the bad first: workshop relining (aka plumbing renovation) is STILL not done. 2 months late now. Been blocked from accessing like 90% of my gear, tools and workspaces for ages now, which is frustrating as hell what with having the big-rig pretty much done. Pretty much scuttled all my summer workshop plans.
On the upside, they've re-connected the "real" pipes now and are in the process of cleaning things out, so it's finally about to wrap. Downside (yeah, roller coaster) is that it's right as I am about to have two weeks of business trips. Week in Zagreb, week in Krakow, including some travel days during the weekends. I'll have a little bit of time to clean and prepare around the travels, at least, and when I'm back I damn well plan on planting my arse in the workshop for a week solid. Fingers crossed the plan holds!
Since I haven't been able to do proper workshopping, I dug deeper into the AI stuff. Lots to talk about there! Let's see how well the 'images integrated into post' thing works while we're at it, as I still haven't actually done that despite Patreon adding it like 6 months ago.
First off, throw an eye at the May post if you haven't already. A bit of speculation on how media is about to get weird what with AI-generated image/video being able to replicate and replace at scale. Not just for skeevy scams, but for legit cases. It'll simply be cheaper to generate media than produce it for real, which means it'll be everywhere, which means we as media consumers will end up having a new kind of relationship and default assumption about media, etc.
I posted some experiments earlier from when I was testing things out at openart.ai (which is still pretty solid for cheap/fast cloud experimentation), but like many commercial offerings it has NSFW limitations. That's clearly an issue, because we aren't in the SFW business here. Mostly. With that in mind, I spent a decent chunk of the vacation on setting up and familiarizing myself with local workflows, i.e. running things on your own machine. Turns out to be almost scary easy.
GPU VRAM is the main thing that matters. For most current open-source/local things you'll want 16GB or more. Nvidia cards are a bit better supported than AMD-things, but both are viable. By now there's plenty of mid-range consumer grade hardware that fits the bill. Many of the cheaper (relatively speaking) cards in Nvidia's RTX 40/50-series pack the necessary punch, e.g. Asus' 5060 Ti 16GB that retails for around $500 (https://www.asus.com/motherboards-components/graphics-cards/dual/dual-rtx5060ti-16g/).
Snagging a faster card (i.e. more power but same VRAM) gets you slightly faster generations, but that's about it. More VRAM on the other hand lets you run more advanced ("smarter") models that generate better quality things that conform better to prompts etc.
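For a rough feel of why VRAM is the bottleneck, here's a back-of-the-envelope sketch. It only counts the model weights, and the inputs are assumptions on my part: about 12 billion parameters (the figure usually quoted for Flux.1 dev) and a made-up 2 GiB of headroom for everything else (text encoder, VAE, activations), which in practice can be more.

```python
def weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_param: int,
         vram_gib: float, headroom_gib: float = 2.0) -> bool:
    """True if weights plus an assumed headroom fit in the given VRAM."""
    return weights_gib(params_billion, bits_per_param) + headroom_gib <= vram_gib


# Assumed model size: ~12B parameters. Compare common weight precisions
# against a 16 GiB card.
for bits in (16, 8, 4):
    size = weights_gib(12, bits)
    print(f"{bits:>2}-bit weights: {size:5.1f} GiB, "
          f"fits in 16 GiB: {fits(12, bits, 16)}")
```

The takeaway is just the scaling: halve the bits per weight and the footprint halves, which is why quantized model variants are what usually ends up running on 16GB cards.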
You CAN run things on regular RAM too, but it's 10x+ slower and may need a bit of tweaking to get working. Gets too slow for any kind of sensible iterative work. I can get my consumer-grade GPU to spit out a high-resolution image per minute, which is pretty okay for adjusting/tweaking 10-20 times in a single session. If it was 10-20min/image you'd likely not be arsed to do more than one or two tweaks, so it kind of limits your realistic workflow options.
OS doesn't matter too much, that's just a preference matter. Some models/applications then have their own wrapper software (commonly a gradio application), or plug into one of the major workflow orchestrators: ComfyUI or Forge. On my end I went deep into ComfyUI since it seemed a lot more flexible, and I haven't really touched Forge, so no major commentary on the inner workings. Lots on the internet though, e.g. https://www.reddit.com/r/FluxAI/comments/1h8ffp3/forge_vs_comfy/
ComfyUI, anyway, is a graphical boxes-and-strings workflow system. Does everything a written script would, but via 'nodes' on a canvas instead. Can end up as a tangled web of spaghetti quite easily, but I've got to admit that the half-GUI handling is pretty nifty when you want partial/mid-step visualizations and such. Pretty easy to set up workflows yourself, and very common to find workflows getting shared on e.g. reddit (r/StableDiffusion deals with both SD & Flux).
Typical scenario from my own experimenting below. Zoom-out kills the text content, but y'all get the general gist of how it works.

Speaking of models, when it comes to image generation there are two main families to be aware of - StableDiffusion and Flux. As of writing, Flux is a bit better at generating realistic things, whereas StableDiffusion goes stylistic / artsy. New models are released constantly and get tweaked by the communities around them, so take the statements with a pinch of salt depending on time-of-reading. For all practical purposes, either works fine. I mainly poked with Flux.
The interesting thing is when we start looking at getting what we want out of the models. Flux, straight out of the box, is decent. Size-wise it can handle about 2MP until results get weird, but good/accurate upscaling is easy to implement into the workflow. The content is limited by what it's been trained on, however. Clearly they haven't scraped a site-rip of rubberpassion.
That's where LoRAs come in. In short, a LoRA is a small-ish amount of additional training packaged as a model that loads on top of the base Flux/StableDiffusion. A LoRA can be trained on a picture style (black/white, comic, HDR photo, ...), clothing (weird latex gear!) or specific people (aka 'how to get in trouble' both legally and ethically). This is where the whole 'automate away self-photoshoots' thing gets put into practice though.
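For the curious, the "loads on top" bit can be sketched in a few lines. LoRA stands for low-rank adaptation: instead of retraining a big weight matrix W, you train two thin matrices B and A and add their product to W. The toy numbers, the 3072 layer width and the rank of 16 below are purely illustrative assumptions, not anything read out of a real Flux checkpoint.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A): the base weights plus the low-rank update."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def full_params(d_out, d_in):
    """Number of values in a full d_out x d_in weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Number of values in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Toy example: a 2x2 "base model" and a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
print(apply_lora(W, B, A))

# Hypothetical layer size, to show why LoRA files are small.
print(full_params(3072, 3072), lora_params(3072, 3072, 16))
```

Stacking two LoRAs amounts to adding two such updates onto the same base weights, which is one way to picture why they can step on each other's toes.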
Thanks to LoRAs we can get reminded of that time when LatexLucy won the Berlin go-kart grand prix, etc:

I took a bunch of photos from a dressup round a while back, cleaned them a tiny bit (background removal via a ComfyUI workflow, etc) and threw them into FluxGym overnight. It's a fairly easy to use LoRA trainer that supports doing training on consumer-grade HW.
~8-10h training later, and voila, a LoRA that does a surprisingly good job of replicating the look considering the somewhat sub-optimal collection of reference photos.
Real photos:
(that got used in the training pile)



Pure AI photos:
Pure prompt-to-image with no image input after the LoRA training:




Looking at the AI generated images - it gets the skin texture, creases and a lot of the tattoo details right. Handles the face a bit less accurately (and makes it a lot prettier), which is attributable to the lack of Really High Detail Jank Closeups. Judge for yourself on the overall accuracy though.
It's also possible to combine LoRAs, e.g. one clothing-trained LoRA and one face/character-trained LoRA. Results may vary, and they sometimes don't play nicely together, but it's relatively simple to get combinations going that way. Since the one I'm using for these pics around here is a full-body+face one, it doesn't really mesh well, so eh - examples of that another day when I've banged my head against the problem a bit more.
In any case - if I was a greedy media monetizer, instead of a philosophical dork attempting to make Real Things to an obsessive degree, I'd be jumping onto this straight away. Grab a bunch of character/person photos, stack some outfits, and go hog-wild with near-zero-effort media productions. Et voila, "my latest beach outing please buy photoset for $10", etc.

Now, it's not perfect. Typical issues/limitations include:
Training time. My rig runs a ~100pic FluxGym round in roughly 12h, so an overnighter. If you screwed something up and need to iterate, yeah, it's painful.
Detail level. Local training is a bit limited in what kind of image size can be used for the actual training, so the images are commonly downsized by quite a lot. 2000x3000 (6MP) -> 512x768, and so on. If you only include a bunch of full body pictures in a training set, you're likely to lose out on e.g. facial details.
Detail locking. Depending on what you're training on, you may get a model that is obsessed with an unwanted detail. The examples above on my photo-batch were trained on underwear photos, so the model ends up obsessed with underwear. Requires a lot of effort/iterations/prompting to get any real clothes on, and even then there's often some stylistic trace of the underwear in e.g. dress lines or so.
LoRA bleed-over. If you have e.g. a character LoRA of a blonde woman with a ponytail and prompt a scene with background characters in it, many of them will look like blonde women with ponytails.
LoRA combinations. It's a bit unpredictable how well they'll play together. Can easily happen that your 12h-training-round was wasted because the trainer picked up some obscure detail that coincidentally conflicts with a similar obscure detail on your wanted pairing. If they don't play well, you get blurry/frayed images that are completely unusable.
Complex things that are theoretically describable, but highly niche, require detailed LoRAs. Masks, disguises, suits and such are anathema to the base version of Flux. Very, very difficult if not impossible to prompt a successful "realistic mask", unmasking/dressup scenario on the raw model, even if it technically has seen the base visuals needed. I'll try my hand at LoRAing it someday.
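On the detail level point in the list above, the arithmetic is worth a quick look. This sketch assumes the trainer simply scales the short side down to a target size and keeps the aspect ratio; real trainers may bucket or crop differently. The 300px face width is a hypothetical number for a full-body shot.

```python
def downsized(width: int, height: int, short_side: int = 512):
    """Scale so the shorter side equals short_side, keeping aspect ratio.

    Returns (new_width, new_height, scale_factor).
    """
    scale = short_side / min(width, height)
    return round(width * scale), round(height * scale), scale


new_w, new_h, scale = downsized(2000, 3000)
print(f"2000x3000 -> {new_w}x{new_h} (scale {scale:.3f})")

# Hypothetical: a face that is 300 px wide in the original full-body photo.
face_px = 300
print(f"face width after downsizing: ~{round(face_px * scale)} px")
```

A face that ends up well under a hundred pixels wide doesn't leave the trainer much to learn from, hence the case for mixing closeups into the training pile.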
As a final note, should you experiment with this yourself: it's quite common that generated images embed generation details in the image metadata (exif and the like). As an example, it's possible to find the entire workflow (including prompt):

So mind your skeevyness or wipe your exif :p
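If you'd rather check and wipe than trust your luck, here's a minimal stdlib-only sketch for PNG files. As far as I know, ComfyUI and similar tools usually stash the prompt/workflow in PNG text chunks rather than exif proper, so that's what this looks at. The function names are my own, and it's a sketch for PNGs only, not a general metadata scrubber.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types that carry text or exif metadata in a PNG.
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"eXIf"}


def iter_chunks(data: bytes):
    """Yield (chunk_type, chunk_body) for every chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC


def list_metadata_keys(data: bytes):
    """Return the keywords of all text chunks (e.g. 'prompt', 'workflow')."""
    return [body.split(b"\x00", 1)[0].decode("latin-1")
            for ctype, body in iter_chunks(data)
            if ctype in (b"tEXt", b"zTXt", b"iTXt")]


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk, including its CRC."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def strip_metadata(data: bytes) -> bytes:
    """Return a copy of the PNG with all text/exif chunks removed."""
    kept = [make_chunk(ctype, body)
            for ctype, body in iter_chunks(data)
            if ctype not in METADATA_CHUNKS]
    return PNG_SIGNATURE + b"".join(kept)
```

Typical use would be reading a file with open(path, "rb"), printing list_metadata_keys(...) to see what's hiding in there, and writing the result of strip_metadata(...) to a new file. Re-check the output before posting it anywhere.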
Anyway, idk, might make a pack or three for fun somewhere along the line. We'll see how things chug along. For now the Highest Goddamn Priority will be FINISHING THE BIG-RIG WHEN THE WORKSHOP IS ACCESSIBLE AGAIN!
TODO on bigrig:
Clean off the first cast, see if there's something to fix.
Reset the rig.
Attach the foot sections.
Sort out the improved drainage collector.
CAST!
#######################################
Again, as always, thank you all for your continued support. The plumbing renovations rendered the summer a lot less productive than I'd hoped, but things should pick up speed after the mid-July business trips. If I glossed over something, wrote about something especially interesting or so, don't hesitate to comment/DM/ping in the chat!
Avalon I
2025-07-06 06:29:33 +0000 UTC