xvasynth

Interim 11 voices (4 new and 7 re-trained to v2)

Added 2021-12-28 19:28:05 +0000 UTC

I hope everyone's having a great holiday.

I've been taking it a little easier on the voice training, and have focused a bit more on re-training some of the main voices from v1 to v2. I've included them here, along with some new voices.

The new voices:

- GTA Vice City: Colonel
- GTA Vice City: Diaz
- GTA Vice City: Lance
- Age of Empires: Narrator (IV)

Voices re-trained to v2:

- Witcher: Triss
- Skyrim: Isran
- Skyrim: FemaleOrc
- Skyrim: FemaleArgonian
- Skyrim: MaleArgonian
- Skyrim: FemaleYoungEager
- Skyrim: FemaleDarkElf

This update adds support for the Age of Empires game series. To enable it, please remember to add the asset files (.json and .jpg) to ./resources/app/assets.
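For anyone who would rather script the asset step than copy files by hand, here is a minimal Python sketch. It assumes the downloaded .json and .jpg files sit together in one folder; the function name `install_assets` and both of its parameters are made up for illustration and are not part of xVASynth itself.

```python
import shutil
from pathlib import Path


def install_assets(download_dir, app_dir):
    """Copy .json and .jpg files from download_dir into
    <app_dir>/resources/app/assets and return the copied file names.

    Both arguments are hypothetical: point them at wherever you
    downloaded the assets and wherever your install lives.
    """
    target = Path(app_dir) / "resources" / "app" / "assets"
    # Create the assets folder if it is missing (assumption: safe to do so).
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(Path(download_dir).iterdir()):
        # Only the two asset types mentioned in the post are copied.
        if src.is_file() and src.suffix.lower() in {".json", ".jpg"}:
            shutil.copy2(src, target / src.name)
            copied.append(src.name)
    return copied
```

Calling something like `install_assets("Downloads/aoe_assets", ".")` from the install folder would copy the files across; both paths there are placeholders.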

---

The next poll is here, too, for the following batch (check the next post).

I'm still enjoying the last few days of time off work, but I've used some of this time to develop xVATrainer further, which is now inching closer to training FastPitch models (Step 3* is now mostly finished). I've also made some tweaks and adjustments to the v2 model training script, to make the models faster to train, and hopefully a bit better in quality too - do let me know if you have any feedback (starting from the next batch onwards)!

And although it's still quite some time away, I've begun the research and development process for v3 models! I've got some great things planned for this that I'm quite excited about, but it's too early to get into the details just yet - stay tuned!

Keep an eye out for the next patch (v2.0.7 or v2.1) as well, which will contain some big fixes to speech-to-speech, amongst other things.

Lastly, in case you missed it on Discord, I've recently lowered the "minimum" required number of lines for training a model to around 150 lines (down from around 250), following improvements to the v2 training script. Of course, many factors will add variation to this, and models won't necessarily fail to train with fewer lines - they'd just be lower quality. But this is now the minimum amount to aim for (for both male and female voices).
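If you want a quick check of a dataset against that rough target, something like the sketch below would do. It assumes the transcript is a plain text file with one voice line per row, which may not match how your data is actually laid out; `MIN_LINES`, `count_lines` and `meets_minimum` are illustrative names only, and the 150 figure is just the approximate number mentioned above.

```python
MIN_LINES = 150  # approximate target from this post, not a hard limit


def count_lines(transcript_path):
    """Count non-empty rows in a text file (assumed: one voice line per row)."""
    with open(transcript_path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def meets_minimum(transcript_path, minimum=MIN_LINES):
    """Return True if the file has at least `minimum` non-empty rows."""
    return count_lines(transcript_path) >= minimum
```

A dataset below the target may still train, just with lower quality, so treat a `False` result as a hint to gather more lines rather than a blocker.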

* Steps copied here from the sticky post on Discord, for those not in the server:

✔ 1) I quickly finish off adding in some new data pre-processing tools that I've written since originally adding in the tools
✔ 2) I design and draft up the UI/flow for model and batch model training
✔ 3) [the hard part] I set up a model training backend, with all the necessary groundwork for managing model training instances, and inter-process communication, for multiple model types
☐ 4) (done together with 5 or 6) Implement a tensorboard-style graph for losses and maybe some metrics, alongside text log feedback
☐ 5 or 6) Integration of modified FastPitch v2.0 model training into the framework
☐ 5 or 6) Integration of HiFi-GAN model training into the framework
☐ 7) Harmonize everything into a good batch training flow, to allow automated training of a list of voices
☐ 8) Optimizations, and other "glue" features