Hi there, I’m trying to get (more or less) historically accurate images from the Early and High Middle Ages, but none of the models seems to grasp what “maille armor” or “bucket helmets” are, and I either get complete garbage or fantasy armor that mostly resembles the early modern period (the stereotypical shining-knight armor).

I assume a LoRA trained on images of the armor and weapons I’d like to include could fix this problem. I found some neat tutorials for making LoRAs and think I’ll give it a shot.

Do any of you have experience making these kinds of style LoRAs? What should I watch out for? I will be manually downloading images that fit my aesthetic and tagging them by hand. How many images should I use? Any input here is highly appreciated.

  • FactorSD@lemmy.dbzer0.com · 1 year ago

    I too have just started on my LORA-making journey, and I too am interested in, ahem, specialist apparel. My experience is that most LORA are made by non-enthusiasts who don’t necessarily know how enthusiasts refer to things, and to some degree non-enthusiasts want visual variety so they can churn out “dwarf in armour” and “elf in armour” prompts and get things that actually look different. That is fine for most people, who just want some nice pictures to go along with their D&D campaign or whatever. But if you are a discerning connoisseur then yes, you kinda do need to roll up your sleeves and make it yourself.

    There are some guides out there for LORA making. As ever, they are a mix of helpful and unhelpful, and you are going to end up having to work things out yourself. You are definitely going to waste a lot of compute time on LORAs that just fail; that’s part of the process. You are also going to see a lot of parameters which you don’t understand and which have seemingly absurd values.

    Before I jump into the rest of this, I strongly advise you to start out with LORAs that do one specific thing, and only that thing. So, a LORA just for bucket helmets, using just images of bucket helmets. You can make more complex LORAs, but holy crap this is a complex process with a lot of moving parts, so start out with one thing where you can easily tell whether it’s working and how well.

    It is good to hear that you are mentally prepared to manually tag your own images because this is utterly essential and you need to do a very thorough job of it. When training stuff the rule is “garbage in, garbage out” and there is no shortcut here. I honestly haven’t found a good tagging methodology yet, but the advice I’m trying to follow is that you want to tag anything that you DON’T want included in the training term, and don’t tag things that you DO want included.

    So, you have a picture of some reenactor in brigandine. You tag it as “brigandine”, but you don’t tag “rivets” or “visible plates”. You would tag “black leather”, because brigandine could be any colour, so you call out the colour so the AI can see that the colour is separate from the armour design. You would also tag the trousers, the helmet, the person wearing the armour, the background, the lighting and the time of day, and then also add in “good quality” or “cropped” or other terms about the photograph itself. This sounds like overkill, but if you don’t do it right then the LORA will do all kinds of weird things that you didn’t intend.
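    A minimal sketch of that tagging rule, assuming a trainer that reads a `.txt` caption file with the same name as each image. The function names and the example tags are hypothetical; the point is only the ordering: the training term first, then everything you want kept separate from it, and nothing that describes the armour itself.

```python
from pathlib import Path


def build_caption(training_term, separate_tags):
    """Comma-separated caption: the training term first, then every attribute
    you want the model to treat as separate from it. Duplicates are dropped."""
    seen = set()
    ordered = []
    for tag in [training_term, *separate_tags]:
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            ordered.append(tag)
    return ", ".join(ordered)


def write_caption(image_path, caption, extension=".txt"):
    """Write the caption next to the image, e.g. img_001.jpg -> img_001.txt."""
    Path(image_path).with_suffix(extension).write_text(caption + "\n", encoding="utf-8")


# Hypothetical example for the brigandine photo: colour, clothing, person,
# setting, lighting and photo properties are tagged; rivets and plates are not,
# so they get absorbed into "brigandine". The repeated tag is dropped.
example = build_caption(
    "brigandine",
    ["black leather", "wool trousers", "helmet", "man", "forest",
     "overcast", "daytime", "good quality", "cropped", "black leather"],
)
```

    You would call `write_caption("img_001.jpg", example)` once per image after writing its tag list by hand.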

    To give an example: on the first run of my first LORA I was actually kinda shocked that I was getting good results for the garment I had trained on… but the LORA was also changing the skin tones and the white balance in the image. The training data was skewed towards very warm light and tanned skin tones, and I hadn’t tagged that, so the AI also associated the training terms with olive skin and incandescent light, and they couldn’t easily be separated. I had to go back, reprocess the images, retag them and then train again.
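    One way to catch that kind of skew before training is to audit the finished captions: decide which attributes vary across your photos (lighting, skin tone and so on) and list every caption that never mentions them. This is a sketch; the category names and tag lists are hypothetical placeholders to fill in from your own data, and it only checks that you tagged something, not that the tag is correct.

```python
from collections import Counter

# Hypothetical categories: fill these in with the tags you actually use.
REQUIRED_CATEGORIES = {
    "lighting": {"warm light", "incandescent light", "daylight", "overcast"},
    "skin tone": {"pale skin", "tan skin", "olive skin", "dark skin"},
}


def parse_caption(text):
    """Split a comma-separated caption into a set of stripped tags."""
    return {tag.strip() for tag in text.split(",") if tag.strip()}


def missing_categories(captions, required=REQUIRED_CATEGORIES):
    """Map caption name -> list of categories it has no tag for."""
    report = {}
    for name, text in captions.items():
        tags = parse_caption(text)
        gaps = [cat for cat, options in required.items() if not tags & options]
        if gaps:
            report[name] = gaps
    return report


def tag_counts(captions):
    """How often each tag appears across the set, useful for spotting which
    attributes dominate it (e.g. warm light on nearly every image)."""
    counts = Counter()
    for text in captions.values():
        counts.update(parse_caption(text))
    return counts
```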

    Which brings me on to the images. You want the largest, highest-quality images you can find. You want a range from long shots to close-ups, but don’t use anything too close up, because SD needs to see how the armour relates to the rest of the figure.
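    If you start from a pile of 100+ downloads, a first mechanical pass can at least drop anything too small and sort the rest by size before you judge quality by eye. A sketch, assuming you already have each file’s pixel dimensions (for example from an image library); the 512 px minimum is an assumption based on SD 1.5’s native 512x512 resolution, so raise it for larger base models.

```python
def prune_candidates(candidates, keep=50, min_short_side=512):
    """candidates: iterable of (name, width, height).
    Drops images whose shorter side is below min_short_side, then returns the
    names of the `keep` largest remaining images by pixel count."""
    big_enough = [c for c in candidates if min(c[1], c[2]) >= min_short_side]
    big_enough.sort(key=lambda c: c[1] * c[2], reverse=True)
    return [name for name, _, _ in big_enough[:keep]]
```

    Resolution is only a proxy here; choosing the best 30-50 from what survives is still a judgment call.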

    You don’t need to train on huge sets, but I strongly advise that you grab yourself 100+ images and then aggressively prune that collection down to around 30-50 of the best quality images. You should run all of them through some denoising, and for almost all of them you should fiddle with the colour balance. You don’t need to perfectly match colours or anything, but when you see that the light is a bit orange or the image is dark, just change the levels a bit so they are more neutral. You also want to manually crop the images, and (AFAIK) they do have to be 1:1 squares which often means having to crop figures.
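    The crop and colour steps can be partly scripted. A sketch in plain Python, not the method described above (which was adjusting levels by hand): `square_crop_box` returns a (left, top, right, bottom) box, the same order Pillow’s `Image.crop` expects, centred on a subject point you pick so figures get cropped as little as possible; `gray_world_gains` is a simple swapped-in white-balance technique (the “grey world” assumption) that nudges an orange or blue cast towards neutral. It won’t replace eyeballing each image and can overcorrect scenes that really are dominated by one colour. Also, if your trainer supports aspect-ratio bucketing (Kohya’s scripts have an `enable_bucket` option), square crops may not be strictly required.

```python
def square_crop_box(width, height, center_x=None, center_y=None):
    """Largest square (left, top, right, bottom) that fits inside the image,
    centred on (center_x, center_y) as far as the borders allow.
    Defaults to the image centre."""
    side = min(width, height)
    cx = width / 2 if center_x is None else center_x
    cy = height / 2 if center_y is None else center_y
    # Clamp so the square never extends past the image edges
    left = max(0, min(int(round(cx - side / 2)), width - side))
    top = max(0, min(int(round(cy - side / 2)), height - side))
    return (left, top, left + side, top + side)


def gray_world_gains(pixels):
    """Per-channel multipliers that pull the image's average colour to neutral
    grey (the 'grey world' assumption). pixels: list of (r, g, b) tuples."""
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    grey = sum(means) / 3
    return tuple(grey / m if m else 1.0 for m in means)


def apply_gains(pixels, gains):
    """Scale each channel and clip to the 0-255 range."""
    return [tuple(min(255, round(c * g)) for c, g in zip(p, gains)) for p in pixels]
```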

    As for the actual LORA settings: don’t ask me what they do, I don’t know. I have been kinda kludging together suggestions from different guides and just seeing what happens. I know for certain that my LORAs are training too fast and typically burn out by about epoch 6 or 7, but I have no idea how to fix that.
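    Not a fix, but a way to reason about it: in Kohya-style datasets the number of repeats is encoded in the folder name (e.g. `10_bucket helmet`), and each epoch is roughly images * repeats / batch size steps. If a run burns out around epoch 6, that tells you roughly how many total steps your settings tolerate; one knob (alongside a lower learning rate) is fewer repeats, which spreads those steps over more epochs and gives you finer-grained checkpoints to choose from. The sketch assumes overfitting tracks total steps at a fixed learning rate, which is only a rough heuristic, and the numbers are made up.

```python
import math


def steps_per_epoch(num_images, repeats, batch_size):
    """Each epoch shows every image `repeats` times, `batch_size` at a time."""
    return math.ceil(num_images * repeats / batch_size)


def repeats_folder(repeats, concept):
    """Kohya-style dataset folder name, e.g. '10_bucket helmet'."""
    return f"{repeats}_{concept}"


def repeats_for_target(num_images, batch_size, target_steps_per_epoch):
    """Repeats needed to hit roughly the target steps per epoch (at least 1)."""
    return max(1, round(target_steps_per_epoch * batch_size / num_images))


# 40 images, 10 repeats, batch size 2 -> 200 steps per epoch, so a burn-out at
# epoch 6 is roughly 1200 steps. Halving steps per epoch moves that to ~epoch 12.
current = steps_per_epoch(40, 10, 2)
new_repeats = repeats_for_target(40, 2, current / 2)
```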

    I would recommend that while you are learning you set the trainer to save every epoch and print a sample every epoch. This is because Kohya is fucking twitchy and can sometimes just hang during an 8-hour run, and also so you can monitor the training in real time and see what is happening. I’ve never had a training session that actually needed to run to the end; the sample images showed training, then a good fit, then blurring and overfitting, and I quit out at that point. Saving every epoch means you keep every version to test, even if there is a crash or you quit, and you can also do a nice X/Y plot of every version of the LORA side by side to find where the sweet spot is.
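    For that X/Y comparison, a small helper can collect the per-epoch files and produce the values for a “Prompt S/R” axis in the AUTOMATIC1111 web UI’s X/Y plot, which swaps the first value in your prompt for each of the others. This assumes per-epoch saves are named like `helmet-000006.safetensors` (output name plus a zero-padded epoch number); check what your trainer actually writes and adjust the pattern. The weight of 1.0 is arbitrary.

```python
import re


def epoch_checkpoints(filenames, output_name):
    """(epoch, filename) pairs, sorted by epoch, for files named like
    '<output_name>-000006.safetensors' (assumed naming scheme)."""
    pattern = re.compile(rf"^{re.escape(output_name)}-(\d+)\.safetensors$")
    found = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


def prompt_sr_values(checkpoints, weight=1.0):
    """Comma-separated LoRA tags, one per checkpoint, for a Prompt S/R axis."""
    tags = []
    for _, filename in checkpoints:
        stem = filename.rsplit(".", 1)[0]
        tags.append(f"<lora:{stem}:{weight}>")
    return ", ".join(tags)
```

    Put the first tag (e.g. `<lora:helmet-000001:1.0>`) in your prompt and paste the whole string into the axis values. If the final epoch is saved without a number, it won’t match the pattern; add it by hand.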

    Also, if a LORA is just for personal use you don’t necessarily need perfection to get results that will work for you. The standard for most LORA is that they are very transparent, compose well with other LORA and work on lots of models. That’s a lot of work, man, and if you aren’t using it that way you won’t even appreciate it all.

    Instead, I have been using a combination of control nets, latent couples and composable LORA to apply my mediocre quality outputs to specific parts of specific images. It’s a faff, but you can generate a nice knightly figure, freeze his pose, mask off just the torso, specify brigandine, then generate an image that will probably be very good even with amateurish LORA creations. And that way you don’t have to worry too much about your LORA melting people’s faces and turning their hair pink.

    Here are some of the resources that I’ve used:

    https://civitai.com/questions/158/guide-lora-style-training
    https://rentry.co/lora_train
    https://civitai.com/models/22530/guide-make-your-own-loras-easy-and-free

    Godspeed, fellow garment enthusiast!

    • Shiimiish@lem.ainyataovi.net (OP) · 1 year ago

      Wow, thank you for the detailed answer! I think you have saved me a couple of wasted hours right there :) I think I’ll start with a LoRA on bucket helmets (which seems to be the most straightforward of the things I have in mind) and see how that goes. Your comments on tagging were especially helpful; before that I thought I would just tag “bucket helmet”, but now it seems I’ll need to be a lot more thorough than that.

      • FactorSD@lemmy.dbzer0.com · 1 year ago

        Yeah, it’s crazy frustrating that the most important part of your exciting AI endeavour is tediously tagging images. But it really is important. And, to be fair, it doesn’t take all that long. Get your images together, processed and cropped, then just sit and blitz the tags out in under an hour.

        I read guides that said auto-tagging is absolutely fine, but remember that a LOT of user-generated SD stuff is anime/manga based, which doesn’t have the problems with lighting and exposure, and the figures all have nice clean outlines. Photos of real armour are a much harder case.

        • Shiimiish@lem.ainyataovi.net (OP) · 1 year ago

          I think I’ll try autotagging 1-2 pictures first, just to see what the format looks like and how it’s supposed to work. Then I’ll do all the real tagging manually, just to make sure I get the syntax right.

          • FactorSD@lemmy.dbzer0.com · 1 year ago

            Yeah, that’s a decent idea. Autotagging will at least spit out one txt file per file, which will save you from having to make them, and they can fill in the “man standing in a field” parts of the prompt.
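            A sketch of that workflow, assuming the autotagger writes comma-separated tags with underscores instead of spaces (booru style, which taggers aimed at anime models often use; check yours). The training term comes first, then your manual tags, then the autotagger’s scene tags; anything describing the concept itself goes in `drop` so it gets absorbed into the training term. The function and tag names are hypothetical.

```python
def merge_tags(training_term, auto_caption, manual_tags, drop=()):
    """Training term first, then manual tags, then autotagger tags not already
    present. Tags in `drop` (compared case-insensitively) are removed."""
    # Assumed autotagger format: comma-separated, underscores for spaces
    auto_tags = [t.strip().replace("_", " ") for t in auto_caption.split(",")]
    dropped = {d.lower() for d in drop}
    seen, result = set(), []
    for tag in [training_term, *manual_tags, *auto_tags]:
        key = tag.lower()
        if tag and key not in seen and key not in dropped:
            seen.add(key)
            result.append(tag)
    return ", ".join(result)


merged = merge_tags(
    "bucket helmet",
    "1boy, helmet, standing, outdoors, grass_field",
    ["mail armor", "overcast"],
    drop=["helmet"],
)
```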

    • polystruct@lemmy.world · 1 year ago

      Nice answer! In your experience, is there a number of concepts beyond which a LORA no longer works? For instance, if I want to make a model that understands all car brands and types (assuming, of course, that the base model doesn’t), would a LORA still be sensible here?

      Most LORAs I find have a narrow, specialized focus, and I don’t know whether I should start with multiple LORAs dealing with individual concepts (a LORA for a “1931 Minerva 8 AL Rollston Convertible Sedan”, a LORA for a “Maybach SW 42, 1939”, etc.) or try to train one LORA to rule them all…

      • FactorSD@lemmy.dbzer0.com · 1 year ago

        It would definitely be unwieldy to make LORAs that were so obsessively granular, but it comes down to how specific you want the output to be. It’s probably pretty easy to make a Porsche LORA that will spit out credible Porsche silhouettes, but if you want accurate and detailed 911 Turbo S models you have to train it just on those. More accurate means less flexible; you have to pick your poison.

        If you want an omni-functional thing then you really want to make your own checkpoint… But LORA are still a potentially good way to approach that because you can merge LORAs into a model, and you can merge LORAs into each other. I have no idea how you do that, but the functionality is there.
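        As for what merging means, here is a sketch of the standard LoRA arithmetic: per layer, a LoRA stores a low-rank pair (a down matrix A of shape rank x in, an up matrix B of shape out x rank) plus an alpha, and its contribution is delta = (alpha / rank) * B A. Merging into a checkpoint bakes that in as W' = W + strength * delta, and merging several LoRAs into a checkpoint is summing their scaled deltas. Real tools do this per layer on tensors; the plain-list version only shows the arithmetic, and the function names are hypothetical. A sum of deltas can have rank up to the sum of the input ranks, so merging LoRAs into one small LoRA (rather than into a checkpoint) needs an approximation such as an SVD.

```python
def matmul(a, b):
    """Plain-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(down, up, alpha):
    """delta = (alpha / rank) * up @ down, with down: rank x in, up: out x rank."""
    scale = alpha / len(down)
    return [[scale * v for v in row] for row in matmul(up, down)]


def merge_into(weight, loras):
    """W' = W + sum(strength * delta) for each (down, up, alpha, strength).
    Returns a new matrix; `weight` is left unchanged."""
    merged = [row[:] for row in weight]
    for down, up, alpha, strength in loras:
        for i, row in enumerate(lora_delta(down, up, alpha)):
            for j, v in enumerate(row):
                merged[i][j] += strength * v
    return merged
```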

        My plan is (eventually) to do exactly that with my own niche area of interest: build a model, but do it in discrete chunks. So start out with one specific LORA for one specific garment that I have a particular interest in, get that to work well, then make others, and then, when I have enough that it’s tiresome to sort through them, start merging them together.

        That said… I am not an expert, not even slightly.