So I've been working on a project to re-implement the VOCALOID1 engine. I'm basing it on the description in Jordi Bonada's PhD thesis "Voice Processing and Synthesis by Performance Sampling and Spectral Models" rather than the original papers, since the thesis is more detailed, easier to follow, and also describes the VOCALOID2 engine. After a lot of trouble getting TWM (Two-Way Mismatch) f0 estimation to work, I've finally gotten to implementing MFPA (Maximally Flat Phase Alignment). And amazingly, it seems to have worked first try. Compare my results: https://i.ibb.co/dsvgv0fd/Screen-Shot-2026-03-02-at-3-54-48-PM.png with the results in the study: https://i.ibb.co/C3fjdWVd/Screen-Shot-2026-03-02-at-3-55-09-PM.png
