xvasynth

v1.1 beta - Big update + GPU news

Added 2021-01-31 12:29:48 +0000 UTC

This post is two-fold. The first part is a pre-release of the biggest update to xVASynth yet, incorporating many feature requests, fixing numerous bugs, and making several large improvements to existing components and features. The second is a big announcement on the GPU situation, along with a rough new plan for the development of new voice models.

Let's start with the v1.1 pre-release changelog:

* Ability to edit previously generated samples
* Numerical inputs for sliders
* Upgraded WaveGlow to a better version. Bigger and a bit slower, but better sounding output across the board
* Re-implemented a big part of the letter lengths mechanism, fixing many issues in the process
* Ability to edit multiple letters at once (ctrl+click multiple), moving their pitch sliders together as a group
* Compiled using CUDA 11.0 (fix for 3000 series GPUs)
* Shift-click "Keep sample" to first bring up a file name prompt
* Stopped disabling the "Keep sample" button (useful for when outputting several audio post-processing versions)
* Made clicking on the letter bring focus to it (instead of clicking the slider and accidentally changing its value)
* Made the pitch editor taller
* Show the values for the editor sliders in a local tooltip
* Fixed a bug with renaming files when an explicit file format is in the name
* Fixed mp3 audio post-processing output sometimes failing
* Fixed file renaming when full stops were present in the input sequence
* Several other internal changes, including tweaks for any potential memory leaks
* Keyboard shortcuts, with a new "Information" menu, containing this cheat sheet:
    - Left/right arrows: move focus between letters
    - SHIFT+left/right: create a multi-letter selection range
    - Up/down: move pitch up/down for the selected letter(s)
    - CTRL+left/right: change the sequence-wide pacing
    - CTRL+up/down: increase/decrease buttons
    - CTRL+SHIFT+up/down: amplify/flatten buttons
    - Space: bring focus to the input textarea
    - Enter: generate sample
    - CTRL+S: keep sample
    - CTRL+SHIFT+S: keep sample (but with rename prompt)
    - Escape: close modals
    - Y/N: answer prompt modals
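
For anyone curious about the two file-renaming fixes in the changelog, here is a minimal Python sketch of the kind of problem involved: a file name derived from a sentence can contain full stops, and a user may also type an explicit format such as ".mp3". This is not xVASynth's actual code. The helper name `build_output_name`, the list of extensions, and the "sample" fallback are all assumptions made up for illustration.

```python
import re

# Assumed list of formats, for illustration only (not taken from xVASynth).
KNOWN_AUDIO_EXTENSIONS = (".wav", ".mp3", ".ogg", ".flac")


def build_output_name(requested: str, default_ext: str = ".wav") -> str:
    """Hypothetical helper: turn a sentence or typed name into a file name.

    Full stops inside the text are kept as part of the name, and only a
    known audio extension at the very end is treated as a file format.
    """
    name = requested.strip()
    ext = default_ext

    # Only treat the ending as a format if it is a known audio extension,
    # so "War. War never changes" is not split at its first full stop.
    for known in KNOWN_AUDIO_EXTENSIONS:
        if name.lower().endswith(known):
            ext = known
            name = name[: -len(known)]
            break

    # Drop characters that Windows does not allow in file names.
    stem = re.sub(r'[<>:"/\\|?*]', "", name)
    # Windows also rejects names ending in a full stop or a space.
    stem = stem.rstrip(". ")

    # Fall back to a placeholder if nothing usable is left.
    return (stem or "sample") + ext


if __name__ == "__main__":
    print(build_output_name("War. War never changes."))
    print(build_output_name("war never changes.mp3"))
```

The design choice here is to match against a fixed list of extensions instead of splitting on the last full stop, since a sentence ending mid-name would otherwise be mistaken for a file format.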

This is definitely the largest update so far, and while a good number of issues have been fixed, it's entirely possible that some interesting new bugs have been introduced. Hopefully not, but I am posting this pre-release here ahead of posting it on the Nexus, so that people can take it for a spin, should they wish to help test it out.

GDrive link (GPU+CPU): https://drive.google.com/file/d/1MjZxOl-zvyxsYQgn4C1NkduuQyyNJSde/view?usp=sharing
GDrive link (CPU-only): https://drive.google.com/file/d/18sVH26Z8y5ituX3o7QFxegPT0pfl75AF/view?usp=sharing

---

Now about the GPU news. I've already posted this on the Discord server, so there's nothing new here if you've already seen that. Otherwise, the big news is that an amazing member of the community (who has chosen to stay anonymous in this announcement) has donated a 3090 GPU for this project, to speed up development, and enable large-VRAM Tacotron2 training sessions which I could not do before.

I've set the GPU up, and sure enough, I was immediately (after just over a day of training, that is) able to converge a Tacotron2 model! I've since been messing around with different configurations and implementations, but I've also trained a decent-sounding model for Nate. I'm currently moving on to using this Nate-trained Tacotron2 model to pre-process and train the resulting FastPitch model, so expect a new, better Nate voice model soon!

The main request from this "anonymous benefactor" which affects the development process is for me to focus first on Fallout 4 voices (I guess it makes sense, as it's the game with the most quest mods still in the planning/development process). This does not mean I will stop doing voices for other games, just that I will do these first.

So a rough new plan is as follows:
- (in progress) Train up a better voice for Nate
- Try to improve Nora further by training/using a bespoke Tacotron2 pre-processing model
- Make a first pass on every single (doable) Fallout 4 female voice, as before
- Make a first pass on every single (doable) Fallout 4 male voice, as before
- Go back through these voices, and where the quality is not that good, either try using the Nate T2 model, or if there's enough data for the voice, try to fine-tune a T2 model on it, and re-do it with this if possible (success will vary, depending on the amount of data, and strangeness of the voice)
- Fallout 4 robot voices, and some other difficult voices if I can get them to work (e.g. ghouls and supermutants?)
- Having covered Fallout 4 voices, I can then go back to other games and resume them, following this Fallout 4 process (a breadth-first pass first, then depth-first into voices that need extra work)

I will still be doing polls on which voices I work on first, going along these guidelines (the next poll should be ready shortly, as soon as I'm done with Nora), and taking Patreon requests outside of that. Furthermore, I'll still be working on other things in the meantime, such as new features (big and small) for the xVA app.

I will keep the Patreon for the time being (its main reason was funding the new GPU), as there are still hardware components I'll be buying (CPU+motherboard, RAM, etc. - I already had to buy a new power supply to run the GPU) so as not to bottleneck the GPU during training.

All-in-all though, expect faster development of voices from now on! :)