xvasynth

1 year anniversary of v1.0 | xVASynth on Steam!

Added 2022-01-11 12:02:28 +0000 UTC

Well, it's now been an entire year since I first posted the initial full release of the app, on 11th Jan 2021!

Things have come a long way for xVASynth, from its early days in 2018 (YouTube showcase of v0.1). Since this time last year, we've reached:

Over 450 voices trained (not including re-trained models)
Support for 39 games and game series
Almost 2000 members on Discord
Over 55 downstream mods made with xVASynth in the showcase list (the actual number is much higher, given I only add things people send me, and not mods from other platforms)
Countless YouTube videos with xVASynth voice acting
271,711 total downloads on Nexus, 478,275 views, and 2,591 endorsements
4 small spin-off mods/plugins, with 2 more on the way
148,732 total YouTube views on xVASynth videos (v2.0 showcase video)

The app has had a bunch of changes and new features added, including:

Much better UI, internationalization, and keyboard navigation
Audio post-processing via ffmpeg, with controls for pitch, tempo, silence padding, format, and amplitude
HiFi-GAN vocoders, replacing the big, slow WaveGlow vocoders with near-instant, high-quality audio generation
Batch mode, with super-optimized multi-processed, batched mass voice generation
A third-party plugin system, with a developer reference for anyone to build with
Support for a third per-letter control vector for models: energy (intensity)
Explicit pronunciation control between { } brackets via ARPAbet, and dictionary management for automated replacements
A brand-new, much improved pitch/duration/energy editor, along with a much better audio player
A 3D voice embeddings visualiser for search and discoverability of voice similarity
Nexus integration, to enable automated search, download, and installation of voice models
Plugin to automatically generate .lip/.fuz files for Bethesda games
(still in progress, but basically finished) Plugin to enable real-time xVASynth-based TTS in Skyrim via the Fuz ro Bork mod
A speech-to-speech mode (which finally works...) for generating audio for a voice A, in the style of voice A or B in a reference audio file
Support for multiple voice variants to choose between
Many other, smaller things
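
To make the { } pronunciation control and the dictionary replacements a bit more concrete, here is a minimal sketch in Python. It is not the app's actual parser: the function names `split_pronunciation` and `apply_dictionary`, and the shape of the `replacements` dictionary (lowercase word mapped to a space-separated ARPAbet string), are assumptions made purely for illustration.

```python
import re

# Hypothetical helpers for illustration only; not xVASynth's real implementation.
BRACE_SPAN = re.compile(r"\{([^{}]*)\}")
WORD = re.compile(r"[A-Za-z']+")


def split_pronunciation(text):
    """Split text into ("text", str) and ("arpabet", [phonemes]) segments."""
    segments = []
    cursor = 0
    for match in BRACE_SPAN.finditer(text):
        if match.start() > cursor:
            segments.append(("text", text[cursor:match.start()]))
        # Everything between the braces is treated as space-separated phonemes.
        segments.append(("arpabet", match.group(1).split()))
        cursor = match.end()
    if cursor < len(text):
        segments.append(("text", text[cursor:]))
    return segments


def apply_dictionary(text, replacements):
    """Swap dictionary words for braced ARPAbet, leaving existing { } spans alone."""

    def swap(match):
        word = match.group(0)
        phonemes = replacements.get(word.lower())
        return "{" + phonemes + "}" if phonemes else word

    out = []
    for kind, value in split_pronunciation(text):
        if kind == "arpabet":
            out.append("{" + " ".join(value) + "}")
        else:
            out.append(WORD.sub(swap, value))
    return "".join(out)
```

For example, `apply_dictionary("The dragon sleeps", {"dragon": "D R AE1 G AH0 N"})` gives `The {D R AE1 G AH0 N} sleeps`. Replacements are only applied outside existing brackets, so anything typed by hand is never overwritten.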

Alongside all this, I've optimized the crap out of the training scripts, and begun incorporating them into a companion app, xVATrainer, which currently has the following:

A main menu to create/adjust transcripts for speech datasets, with recording capabilities
[Tool] Audio formatting, to convert any audio to the format required for the deep learning models
[Tool] AI speaker diarization, which automatically extracts speech audio from a super long audio file (e.g. a movie), and sorts it into speaker-separated folders of short audio clips
[Tool] AI source separation, which, given a noisy audio file (e.g. with background music or SFX), outputs just the clean speech audio, to make it usable for training
[Tool] Audio normalization, to harmonize audio from multiple sources to the same consistent levels
[Tool] WEM to OGG, to convert from a common format of game file to a previewable audio format
[Tool] Cluster speakers, to take a folder of tens/hundreds/thousands of audio files from multiple speakers, and automatically sort them into folders assigned to each speaker - or if used with only files from one speaker, separate into different "emotions" (speaking styles)
[Tool] Speaker similarity search, to sort a large corpus of audio files by similarity to a query set of audio files
[Tool] Speaker cluster similarity search - same as normal search, but where the corpus is cluster folders
[Tool] Automatically generate transcripts for speech audio files
[Tool] Remove background noise, to get studio-like silence (no background humming, hissing, fan noise, etc)
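
As a rough illustration of the idea behind the speaker similarity search, here is a minimal sketch in Python. It assumes each audio file has already been turned into a fixed-length embedding vector by some speaker-embedding model, and it ranks the corpus by cosine similarity to the average of the query embeddings. The function names and the choice of cosine similarity are assumptions for this example, not a description of what the tool does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def rank_by_similarity(corpus, query_vectors):
    """Sort corpus files by similarity to the query set, most similar first.

    corpus: dict mapping a file name to its (hypothetical) embedding vector.
    query_vectors: list of embedding vectors for the query audio files.
    """
    centroid = mean_vector(query_vectors)
    scored = [(name, cosine_similarity(vector, centroid)) for name, vector in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The cluster variant of the search could reuse the same ranking, with one averaged embedding per cluster folder instead of one per file.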

But more importantly, I am currently working on the training menu for xVATrainer, which, given a dataset fully processed with some or all of the above tools, can train models for you. This bit is quite an undertaking, but it is coming!

(The training menu screenshot contains test data, apart from the system resources graphs.)

xVASynth on Steam

So let's get to that headline link. I've been mentioning for ages, since even before the v2 release, that I'm working on a better way to distribute the app and models. That thing is xVASynth being listed on Steam! This has been an insanely slow process, which I started in September (!), with lots of back-and-forth with Valve over whether they can legally do it, lots of U-turn decisions, and lots of convincing.

There are actually a lot of benefits to also hosting the app on Steam (it's not going away from the Nexus):

Much better distribution speeds, for downloading the main app (which is getting quite big now, at ~5GB), and updates. No more slow Nexus download speeds!
Much easier installation, with fewer likely causes of issues
Automated installation of the Microsoft Visual C++ Redistributable requirement, which does catch people out
Automated installation of updates - always up-to-date with the latest changes
Workshop support! And this is a big one (thank you radbeetle for the idea!). As people start training voices with xVATrainer, it might be hard to keep track of any uploads to the Nexus, or wherever, despite the Repo management menu in xVASynth. The Steam Workshop can be somewhere that people upload models to, complete with user ratings. This should also work for other things too, like ARPAbet dictionaries, game themes, and plugins.
Probably some other Steam community features that I haven't explored yet - I must admit I've never used the Steam community features

There's actually a lot more work that needs doing for the Workshop integration, and I am still waiting for the final go-ahead for the Steam release - subject to some small tweaks (which I've already made) around the fact that I even mention having a Patreon in the app 🙄. The upload process will likely go in xVATrainer anyway.

You can already search for it (and a few people are already using it), but due to Steam's rules, I have to leave it listed as "Coming soon" for a minimum of 2 weeks before I can actually make it publicly downloadable to everyone. So the current release date will perhaps move to 2 weeks after they reply, if they're not fussy about any more details.

On that note, Steam made sure to stress that I should let people know about the release, so that they can add it to their wishlists - apparently this is something they use for their algorithms.

The last quirk to mention around this is that I couldn't include the asset files (the game art) in the build, so the app will download with no images in there. Everything still works the same, just without the images. We'll see if they have a problem with this, but I've hosted the images in a Google Drive folder here, which I will add to as game support expands. There's a link to this in the app; you'll have to download the images from there and manually place them in the ./resources/app/assets folder yourself, as a one-time installation step.
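
If you'd rather not drag the files over by hand, the one-time copy into ./resources/app/assets could be scripted. This is only a sketch: the `install_assets` helper is hypothetical, and so is the assumption that the images are .png/.jpg/.jpeg files sitting in a single download folder. Only the ./resources/app/assets path comes from the app itself.

```python
import shutil
from pathlib import Path

# Assumption for this sketch: the downloaded game art uses these file types.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def install_assets(download_dir, app_dir):
    """Copy downloaded images into <app_dir>/resources/app/assets.

    Returns the list of copied file names, in alphabetical order.
    """
    assets_dir = Path(app_dir) / "resources" / "app" / "assets"
    assets_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(download_dir).iterdir()):
        if source.is_file() and source.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(source, assets_dir / source.name)
            copied.append(source.name)
    return copied
```

You would call it with the folder you downloaded the images to and the folder the app is installed in, for example `install_assets("Downloads/xvasynth_images", "path/to/xVASynth")`, where both paths are placeholders.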

Future plans

Right now, my main priority is getting xVATrainer finished, so that people can train their own voice models. Of course, things will significantly change once this happens, and I will most likely be free to switch to mainly just a research/development role - though I will of course continue to also train voices.

The Discord server/channel design will have to adapt to that, as it becomes much more decentralized/community-driven. I have some plans for new stuff to add to it too, which should be fun, once I upgrade my Raspberry Pi server to something more capable.

Speaking of the research/development stuff, I've actually also made some great progress with v3 models (which I also had to prioritize, for non-xVASynth related reasons), which I'm very excited about. I will post updates on this when I'm closer to the final design/feature spec.

---

All in all, thank you for the insane support over the last year - I never expected this tool to get used as much as it has been so far! I really appreciate all the messages, the support here, the funny videos, the mods, and the community we've built.

Now let's see what the following year brings...