v1.2 beta - the Efficiency/Speed/Quality update
Added 2021-02-07 11:43:15 +0000 UTC
Following feedback on v1.1, where I changed the vocoder model, this update focuses on the quality of the output. The UI has been changed to accommodate multiple vocoder models, so you can pick which WaveGlow model you prefer to use for synthesis (the old one from before v1.1, or the newer, bigger one).
Additionally, I've added support for bespoke vocoders on a per-voice basis. You may have seen the two side-by-side audio samples on the Discord server for Nate's voice. I have been running experiments with different vocoders, and I've begun training HiFi-GAN vocoders on Nate's voice data, fine-tuning the model to do really well for Nate's voice. In the next update to the Nate voice model, there will be a fourth file (about 50 MB). This model will be selectable in the new Vocoder dropdown, alongside both WaveGlow models and the "quick-and-dirty" model, and should be of higher quality than any of them. The bonus is that it is the same type of model as the "quick-and-dirty" model, so although it has the highest quality, it will also be the quickest (tied with the quick-and-dirty model). When one of these bespoke models is available for a voice, it is indicated by a yellow 🗲 icon next to the Vocoder dropdown.
The downside is that fine-tuning the HiFi-GAN model takes a very long time (I've not even finished training Nate's yet). I will aim to have one such bespoke model for every voice (eventually), but they may be slow to come out at first, while I focus on getting more voices out.
There were other changes and fixes, including an amplitude option for audio post-processing, a search bar for voices, and more. The changelog:
- Added support for bespoke voice vocoder models - a fourth file in voice packs, a HiFi-GAN model trained specifically for that voice with higher quality than either WaveGlow vocoder, at the speed of the "quick-and-dirty" model.
- Added support for models saved as FP16, meaning models from now on will be only half as big (in terms of file size)
- Added an option to select which Vocoder you prefer to use (following feedback on v1.1 vocoder model change) - useful for deciding which to use for the best quality
- Added an amplitude multiplier ffmpeg audio post-processing option
- Added a search bar for the models selection panel
- Fixed issue where keyboard shortcuts were active when typing in input fields
- Added a specific error message for when the output directory for saving audio files does not exist
- UI tweaks/changes, including an enlarged minimum size for modals at extremely small window sizes (the window can be resized)
- Fixed some occasional issues with file renaming (and then editing)
- Made the "Enter" key act as a submit action when in the rename modal (and other prompt modals)
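
As a rough illustration of why FP16 halves the file size (this is only a sketch of the arithmetic, not how the app actually stores its models): a 32-bit float takes 4 bytes and a 16-bit float takes 2, so the same number of values packs into half the space. The `packed_size` helper and the `weights` list below are hypothetical stand-ins made up for the example.

```python
import struct

def packed_size(values, fmt_char):
    """Return how many bytes `values` occupy when packed with the given struct format."""
    # "<" selects standard sizes: "f" is a 4-byte float, "e" is a 2-byte half float.
    return len(struct.pack(f"<{len(values)}{fmt_char}", *values))

# Hypothetical stand-in for a model's weights: 1,000 small float values.
weights = [i / 1000.0 for i in range(1000)]

fp32_bytes = packed_size(weights, "f")  # stored as FP32
fp16_bytes = packed_size(weights, "e")  # stored as FP16

print(fp32_bytes, fp16_bytes)  # 4000 2000
```

The trade-off is that FP16 keeps fewer significant digits per value, so each stored number is slightly less precise than its FP32 equivalent.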
Vocoders (if you don't already have the two WaveGlow models): https://drive.google.com/file/d/1-tNQLMF-3WR4K8fQYHdhhuy3JvBqWmrR/view?usp=sharing
CPU-only: https://drive.google.com/file/d/1GQochGLcp0M7totUn1EA5OCTcKxe-dpj/view?usp=sharing
GPU+CPU: https://drive.google.com/file/d/1RW4Hw4-YL6ojQYKnGWTKNRjRrIJPCDaP/view?usp=sharing
Nate v1.2 (in-progress, only for testing out the unfinished bespoke HiFi-GAN model): https://drive.google.com/file/d/1PL-7_OGrZQVJ52x88tvIIqqiUM1ndSHM/view?usp=sharing
Let me know if you try the beta out and there are any issues. If there are none, I can post it to the Nexus sooner!