Take it easy!

Screen Shot 2026-04-18 at 11.57.41 PM.png - 230.10 KB (936x2115)

In the last post about my vocal synthesis project, I talked about implementing the Wide-Band Voice Pulse Modeling (WBVPM) algorithm. Since then, I've done some original research of my own and devised what I believe to be three minor improvements to the algorithm.

I implemented WBVPM (from Dr. Jordi Bonada's PhD thesis: https://www.tdx.cat/bitstream/handle/10803/7555/tjbs.pdf) via the upsampling method (specifically, upsampling via a natural cubic spline). The thesis actually proposes two methods, the other being periodization. There is a patent that pertains to WBVPM, but it only covers the periodization version (which is what they used for their results), so I implemented the upsampling method instead. I have been able to validate the main results of the thesis; specifically, its shape-invariance and its lower residual compared to other methods. Two of my three improvements are only possible because I used the spline approach, so in a sense it was good that I had to do it that way. Of the three, I have implemented the first two and shown their advantage over the original WBVPM algorithm.

The score is obtained by taking the mean of the relative residual level (i.e. the level of the difference between the original and reconstructed signals, relative to the level of the original signal). I measured it on an audio sample that deliberately exhibits traits noted as negatively affecting the quality of WBVPM's output: notably, a low-pitched voice with rapid and deep vibrato, transients, strong amplitude modulation, and a large portion of the sample spanning a voiced/unvoiced/voiced transition.

First I should note that my WBVPM implementation is currently far from optimal. The pitch estimation system (via the modified TWM algorithm) has not undergone testing or tuning of its parameters, and there are many variations of the TWM algorithm to consider. Additionally, I have not implemented voiced/unvoiced detection (as far as I can tell, it is not described in Bonada's thesis; presumably it's in prior literature, but I have not researched it yet), so all the algorithms act as if they are always processing a voiced signal even when they are not.

RESILIENT BORDER INTERPOLATION IN SYNTHESIS - When I first implemented the synthesis step for WBVPM, it was late at night and I was tired. I wanted a quick result before I went to bed and didn't understand the wording of the thesis's description of the synthesis step. As such, my original implementation differed significantly. Instead of using overlap-and-add, for each output sample it found the closest voice pulse and evaluated that pulse at the sample's time, taking advantage of the spline that was generated for downsampling and using the periodic nature of the pulse to extend it when the sample fell outside its domain (i.e. the opposite of overlapping). This approach led to high-frequency crackling artifacts due to discontinuities at the voice pulse boundaries.

The following day, I properly understood the synthesis approach and rewrote the synthesis code. Interestingly, this actually gave worse overall results. While the high-frequency artifacts were gone, there were now large low-frequency artifacts that appeared as large modulations in the time domain. I eventually tracked this down to a bug in my implementation of the MFPA algorithm that sometimes resulted in massive errors of up to 1.5 radians. I fixed this bug and the reconstruction no longer had significant artifacts, but I thought it was interesting that my approach, despite having the discontinuity issue, was more resilient to errors in the MFPA estimation. I began wondering whether the two approaches could be combined to create an even better one. (continued in the next post)
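For readers who want to reproduce the score described earlier in this post, here is a minimal sketch of one way the "mean relative residual level" could be computed. The post does not say how the level is measured or what the mean is taken over, so everything specific here is an assumption: the function name `relative_residual_db`, RMS as the level, fixed non-overlapping frames of 1024 samples, and averaging the per-frame values in dB.

```python
import numpy as np

def relative_residual_db(original, reconstructed, frame=1024, eps=1e-12):
    """Mean over frames of (residual RMS / original RMS), expressed in dB.

    Assumptions (not from the post): RMS level, non-overlapping frames,
    and averaging in the dB domain.
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    # Use only whole frames that exist in both signals.
    n = min(len(original), len(reconstructed)) // frame * frame
    if n == 0:
        raise ValueError("signals are shorter than one frame")
    orig = original[:n].reshape(-1, frame)
    resid = (original[:n] - reconstructed[:n]).reshape(-1, frame)
    orig_rms = np.sqrt(np.mean(orig ** 2, axis=1))
    resid_rms = np.sqrt(np.mean(resid ** 2, axis=1))
    # eps keeps silent frames from producing log10(0) or division by zero.
    levels_db = 20.0 * np.log10((resid_rms + eps) / (orig_rms + eps))
    return float(np.mean(levels_db))
```

A lower (more negative) value means the reconstruction is closer to the original. Averaging in dB weights quiet and loud frames equally; averaging the linear ratios first would be an equally plausible reading and would give different numbers.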

>>

I was thinking about why the modulation occurred with the overlap-and-add method. When the fundamental frequency is stationary and the MFPA onsets are perfect, the trapezoidal window function is equivalent to a weighted average between two adjacent voice pulses over a duration of twice the border interpolation size. However, when the MFPA onsets are inaccurate, or even just when the fundamental frequency is non-stationary, this is no longer true. Even worse, from the weighted-average point of view, the weights no longer necessarily sum to one everywhere, hence the modulation.

I then devised a method that does not result in modulation. It works by first synthesizing the 'inner' portion of each pulse (by 'inner', I mean starting where the border interpolation at the start of the pulse ends, and ending where the next border interpolation begins towards the end of the pulse). Then, for the gaps in between the inner portions, we calculate each sample as a weighted average of two values: the values of the two adjacent voice pulses at that time. Since the gap extends beyond the boundaries of each voice pulse, we use the periodic nature of the pulses to compute the effective position in each voice pulse by taking the position modulo the period of that voice pulse's fundamental frequency.

The fundamental frequencies of the two voice pulses may differ, so we change the step in time linearly across the gap. At each end of the gap, the step size for the voice pulse adjacent to that end is one sample in time, while the step for the other voice pulse is the equivalent of one sample in the adjacent voice pulse relative to the other pulse's fundamental frequency. For example, if the second voice pulse has twice the fundamental frequency of the first, then at the end of the gap the step size for the first would be 2 and the step size for the second would be 1. At the start of the gap it is the same, except relative to the first pulse having a step of 1. In between, we interpolate the step size linearly. It is worth noting that in the ideal case where the onsets are exactly correct and the fundamental frequency is stationary, the result of this approach is the same as using the trapezoidal window.

FREQUENCY WARP-CORRECTION - As noted in Bonada's thesis, WBVPM assumes that the fundamental frequency is stationary within each pulse. However, this is not actually true, and the resulting artifacts are particularly apparent for voice signals with a low fundamental frequency, because each period of the signal is longer in time and thus the internal state of the system has more time to change. One of the things that can change over time is the fundamental frequency itself. This modulation can be thought of as a time-domain remapping function that distorts each voice pulse according to a continuous fundamental frequency trajectory.

I discovered a way of correcting this largely by accident, while thinking about the modulation issue discussed in the previous section. I had proposed changing the step size linearly in the gaps between the 'inner' pulses, but what we actually have is a discrete sequence of fundamental frequencies. So what if, instead of changing the step size linearly, we created a spline from the fundamental frequencies and changed the step size based on that? Then I realized that we could also use this for the whole voice pulses and just sample everything with a step size based on the fundamental frequency trajectory. I then realized that this would act like the distortion from changing parameters within each voice pulse, at least in the synthesis stage. Furthermore, since we are already computing splines for each voice pulse to downsample it, this comes at very little additional computational cost. (continued in the next post)
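To illustrate the weighted-average argument from the border interpolation section of this post, here is a small sketch. The window shape is an assumption for illustration (linear ramps of width twice the border size, centred on each pulse onset and on the expected next onset), and the names `trapezoid`, `window_sum` and `gap_steps` are hypothetical. It shows that overlapped trapezoidal windows sum to one only when each pulse starts exactly one period after the previous one, and it gives one reading of the linearly interpolated step sizes in a gap.

```python
import numpy as np

def trapezoid(t, start, period, border):
    """Assumed window: ramps of width 2*border centred on start and start+period.

    Requires period >= 2*border so that a flat top exists.
    """
    rise = np.clip((t - (start - border)) / (2.0 * border), 0.0, 1.0)
    fall = np.clip(((start + period + border) - t) / (2.0 * border), 0.0, 1.0)
    return np.minimum(rise, fall)

def window_sum(onsets, period, border, n):
    """Sum of the overlapped windows at integer sample times 0..n-1."""
    t = np.arange(n, dtype=float)
    total = np.zeros(n)
    for start in onsets:
        total += trapezoid(t, start, period, border)
    return total

def gap_steps(f_a, f_b, n):
    """Per-sample step sizes across a gap of n samples between pulses A and B.

    One reading of the description: A steps by 1 at the gap start and by
    f_b / f_a at the gap end; B steps by f_a / f_b at the start and by 1
    at the end, with linear interpolation in between.
    """
    step_a = np.linspace(1.0, f_b / f_a, n)
    step_b = np.linspace(f_a / f_b, 1.0, n)
    return step_a, step_b

# Perfect onsets: the window sum is flat away from the signal edges.
onsets = np.arange(10) * 100.0
ideal = window_sum(onsets, period=100.0, border=10.0, n=1000)

# One onset estimated 6 samples late: the sum dips and then overshoots,
# which is heard as amplitude modulation after overlap-and-add.
late = onsets.copy()
late[5] += 6.0
jittered = window_sum(late, period=100.0, border=10.0, n=1000)
```

Comparing `ideal[10:990]` with `jittered[10:990]` shows the deviation around the mislocated pulse. In the gap-based approach the two weights are `w` and `1 - w` by construction, so they always sum to one regardless of where the onsets land.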

>>

However, the voice pulses in analysis are already distorted. So then I realized we can do the inverse resampling in the upsampling stage of WBVPM analysis to correct for the non-stationary frequency; the pulses are then redistorted according to the transformed fundamental frequency trajectory in the synthesis stage. This makes the method effectively invariant to modulations in fundamental frequency, so long as the modulation is less than the fundamental frequency and is modeled well by the spline, which should be the case when the modulation period is at least several voice pulses long.

PITCHED/UNPITCHED DECOMPOSITION - As mentioned in Bonada's thesis, WBVPM only models sinusoids, and thus any residual in the input signal is encoded as fluctuations between the spectra of voice pulses. I devised a post-processing technique to separate the voice pulses into sinusoidal and residual components. I have not actually implemented and tested this yet because, as it is a post-processing step, it will not improve the residual level, and in fact will probably make it worse. Its benefit is in the transformation stage, which is much harder to quantify and which I have not finished implementing. However, I believe this approach should work. The technique works as follows:
a) For each voice pulse, and for each harmonic of its spectrum, we compute a spline from the amplitude of that harmonic in the voice pulse and in a fixed number of surrounding voice pulses.
b) Since the time delta between voice pulses can vary, we resample each local harmonic spline with fixed steps in time.
c) We compute the Fourier transform of these resampled local harmonic trajectories.
d) We apply a low-pass and a high-pass filter to separate each one into low-frequency and high-frequency components.
e) We apply the inverse Fourier transform to each of these.

We can then sample the low-pass trajectory at the time of the voice pulse to get the amplitude of the denoised harmonic for that voice pulse. The same can be done for the high-pass trajectory to obtain a pseudo-pulse representing the residual. These residual voice pulses can then be synthesized using the WBVPM synthesis method to obtain a time-domain residual signal, which can be processed separately from the main harmonic signal.

A significant source of error in this process would presumably come from the resampling step. This can be decreased by using a smaller time step, at an increased computational cost. However, the error could probably be greatly reduced by first calculating the difference between the original amplitudes and the amplitudes at the same times in a spline computed from the resampled harmonic spline, before applying the band filters. This difference can later be added back to the low-pass amplitude trajectory.

The denoised harmonic phase can also be computed via the same method, using Bonada's method for unwrapping phase across both frequency and time. The residual phase can be calculated by taking the difference of the original phase from the denoised phase and dividing it by the residual amplitude.

RESULTS: I have tested the first two improvements and obtained the following results for the aforementioned audio sample:
Original WBVPM: -36.355 dB
Warp-correction improvement only: -36.74595 dB
Warp-correction & resilient border interpolation in synthesis: -37.41177 dB

More research is needed to properly evaluate these improvements across more samples with more variety, and to see if these techniques still result in improvements with more accurate pitch and MFPA estimation and with proper handling of voiced/unvoiced frames.
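Since the pitched/unpitched decomposition has not been implemented yet, here is only a rough sketch of steps a) to e) for the amplitude trajectory of a single harmonic. Several things are simplifications or assumptions rather than the method as described: linear interpolation (`np.interp`) stands in for the spline, one global trajectory is used instead of a local one per voice pulse, the low-pass/high-pass pair is a brick-wall split in the frequency domain, and the names `split_harmonic_trajectory`, `step` and `cutoff_hz` are hypothetical.

```python
import numpy as np

def split_harmonic_trajectory(pulse_times, amplitudes, step, cutoff_hz):
    """Split one harmonic's amplitude trajectory into slow and fast parts.

    pulse_times: increasing times of the voice pulses, in seconds
    amplitudes:  amplitude of this harmonic in each voice pulse
    step:        resampling step in seconds (assumed parameter)
    cutoff_hz:   boundary between 'denoised' and 'residual' (assumed parameter)
    """
    pulse_times = np.asarray(pulse_times, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)

    # a) + b) Resample onto a uniform grid. Linear interpolation is used
    # here in place of the spline described in the post.
    grid = np.arange(pulse_times[0], pulse_times[-1] + 0.5 * step, step)
    uniform = np.interp(grid, pulse_times, amplitudes)

    # c) Fourier transform of the resampled trajectory.
    spectrum = np.fft.rfft(uniform)
    freqs = np.fft.rfftfreq(len(uniform), d=step)

    # d) Complementary brick-wall low-pass and high-pass masks.
    low_mask = freqs <= cutoff_hz

    # e) Inverse transform of each band.
    low = np.fft.irfft(spectrum * low_mask, n=len(uniform))
    high = np.fft.irfft(spectrum * ~low_mask, n=len(uniform))

    # Sample both bands back at the voice pulse times.
    denoised = np.interp(pulse_times, grid, low)
    residual = np.interp(pulse_times, grid, high)
    return denoised, residual
```

Because the two masks are complementary, the two bands add back up to the resampled trajectory, so any mismatch between `denoised + residual` and the original amplitudes comes from the resampling step alone. That is the error the difference-correction idea above is meant to remove.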

>>
1w.jpg - 29.34 KB (424x358)

Okay


Vctorique-1024x576-98398285.png - 444.81 KB (1024x576)

Does anyone else hate how men can't talk about themselves and their feelings amongst each other until they're blackout drunk and have to forget everything or pretend to forget everything the day after? Basically no point to it.

>>
image.png - 1438.15 KB (1080x2089)

>>12327 In my defense, it's an "ex"-girlfriend neco

>>

hikarin... you can overcome this femcel madness. i love you.

>>

>>12331 is there hope for a pathetic fakecel like me hikarin cry

>>

>>12338 what's a fakecel

>>
tree-snake-7982626_1280-3302527811.jpg - 267.73 KB (1280x984)

>>12339 a type of snake


89b16d06384a2c86659ae60bffacda3f.gif - 3158.31 KB (384x288)

We got the moves, hikarin!

>>
https://www.youtube.com/watch?v=E40N4MeAtiU

キタ━━━(゚∀゚)━━━!!

>>

I feel like this dance just works with any songs I listen to surprised

>>
wrigglebeckydance.gif - 203.80 KB (200x200)

キタ━━━(゚∀゚)━━━!!

>>

So many Colors

>>
dance.gif - 1042.55 KB

>>10792 Me neither but we can try, hikarin. Soooo, lets do it!


https://www.youtube.com/watch?v=qWXnt2Z2D1E

I want to talk about horror media that is sourced from the internet. Not necessarily full-release movies or well-known published games. I'm thinking more analog horror, creepy videos, horror flash/indie games, even stuff like the SCP foundation falls under this category. Bonus points for more niche stuff or media you consider internet history. Starting this thread off with this video. I love this one, I am a huge huge fan of atmospheric horror and I think this is a great example of such.

>>
https://www.youtube.com/watch?v=7esdLo2f7mo

This one is decently new, came out last year. It's not the spookiest thing I have ever seen but I enjoy it. More than jumpscares or gore, I REALLY love horror that makes me feel deeply unsettled. Uncanny valley and such.

>>

the original SCP - Containment Breach game got an update after 8 years! very cool to see. it's on Steam now too. https://store.steampowered.com/app/2178380/SCP__Containment_Breach/

>>
image.png - 88.19 KB (795x593)

>>12293 As for Demonophobia, I came across it on some Chinese forum and read about the creator (almost nothing is known about them apart from the nickname). The game itself is a labyrinth-style adventure with brutal guro murder scenes (not so brutal).

>>

>>12293 Just found out about this one

https://youtu.be/7iFXyLah2oQ

>>

>>12347 oh my goodness i completely forgot about this video... that brings back memories!



photo_2026-04-14_10-20-26.jpg - 215.91 KB (2560x1152)

I've moved again, and it's annoying that our housing prices are becoming more and more exorbitant. Our teenagers have already realized that they won't even be able to save up for a down payment on a mortgage and are basically living from day to day. I'm the same way.

>>

Just get on welfare. >(windows)

>>

>>12341 I feel you Hikarin... I'll be moving places for the 4th time next month. Everything is so expensive, I feel like I'm throwing money down the sink

>>
4bccbcb108664a9d1f9f064ac61797ce.jpg - 62.73 KB (736x736)

>>12343 I'm too lazy for that and I have a job. Today I got my salary of $290, I paid off my debts, and about half of it went away. >>12344 Yes, and this is infuriating. I hate this life. I'll buy myself a house on wheels and drive around the mountains, drink Chinese tea, and enjoy the views. I'll find another remote job.

>>
1000022206.jpg - 159.44 KB (2048x1152)

I feel you. I'm moving back in with my father at 25 just so I can finish my degree and save up money without working myself ragged (as I have been doing for some years now). I'm taking a significant pay cut swapping jobs but it's in a field I am actually passionate about. Plus I'll only be working 3-4 days instead of 5 which gives me time for my hobbies (that I have been neglecting for lack of time cry). I am thankful to God I have this option, I am so fed up with the rigamarole of life man... >>12345 Living on wheels and just travelling around sounds so nice. Society is sick anyway. It would be better to live as a nomad in an RV or a hermit in a monastery than have to suffer 50 years of the corporate West.


2a6f5e921bc928ba4fb4961b1cc2b05c.jpg - 722.09 KB (1714x1392)

A cat left a dead rat with its guts out in my backyard and now it smells like shit there cry

>>
1000022205.jpg - 285.67 KB (850x808)

You can't blame a nyaggan for following its WILD instincts. neco


4700c2eda93a17685266b7a5bb5f1a25.jpg - 563.51 KB (1457x2064)

What do you know about Hikari3, Hikarin?

>>

>>12305 I got permabanned for saying this about syrno the fascist, but luckily for nord VPN discount code hikari, I can evade syrno's bans eazy. shades

>>

i know that the 3 year anniversary is coming up snicker

>>
en.png - 1128.94 KB (2048x2048)
>>

>>12330 Why should I? What you gonna do? Ban me again? I'll just change VPN again and evade. You'll never stop me. shades

>>

>>12333 like a ghost.... in the wind..... vengence


ampit.jpg - 178.52 KB (600x847)

Anyone ever wonder what it would feel like to get their armpits sniffed and licked?

>>

>>12215 Yes

>>

licking is brutish and barbaric cultured men and women only sniff

>>
84478.jpg - 181.13 KB

Who knows.

>>
37191.gif - 24.25 KB
>>

It would make me feel like shit for sure.


Screenshot_20260403_232058.png - 918.54 KB (2560x1324)

i'm geekin on giko shades talk about gikopoipoi.net and share fun screenshots. https://gikopoipoi.net/

>>
image.png - 196.60 KB (624x351)

No seat left for me

>>
image.png - 147.34 KB (853x443)

>>12290 here, for you happy

>>

what is this site? never heard of it before. (i just joined the link to look around and i found another site like this but with more characters)

>>
image.png - 32.34 KB (403x265)

>>12312 it's a chatroom, basically. you pick a name + avatar, then you can join a list of rooms to chat. can stream music or screenshare too. it's fun to hop on at night and chat with gikos. i believe it has history with the Japanese site 2ch, i know for a fact the characters come from there.

>>
image.png - 462.87 KB (1708x717)

watching someone eat a hot dog

