2024 Note on Registrations

philip_bennefall · 2019-01-13 04:26:01

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

Hi all,

I just wanted to share a little pet project of mine with you all. It is an attempt to create a formant based speech synthesizer from scratch. If you don't know what a formant based speech synthesizer is, think Eloquence or DECTalk. Formant synthesis is based on mathematics rather than prerecorded segments of speech.
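For readers who want a concrete picture of "based on mathematics", here is a minimal sketch of the general technique. It is not Philip's code. It assumes the classic two-pole digital resonator form commonly used in Klatt-style formant synthesizers, and the formant frequencies, bandwidths, pitch and sample rate are ballpark illustrative values for an "ah"-like vowel, not figures from the diary.

```python
import math

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c

def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Run samples through one resonator (one formant)."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(pitch_hz, duration_s, sample_rate):
    """A very crude voice source: one unit pulse per pitch period."""
    period = int(round(sample_rate / pitch_hz))
    n = int(duration_s * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

SAMPLE_RATE = 16000  # assumed
# (frequency Hz, bandwidth Hz) pairs; illustrative assumptions only
FORMANTS = [(730.0, 80.0), (1090.0, 90.0), (2440.0, 120.0)]

signal = impulse_train(110.0, 0.25, SAMPLE_RATE)
for freq, bw in FORMANTS:
    # Cascade arrangement: each formant filters the previous output
    signal = resonate(signal, freq, bw, SAMPLE_RATE)
```

No recorded speech is involved anywhere: the vowel colour comes entirely from where the resonator peaks are placed.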

I have nothing to release yet and probably won't for some time, but I hack away at this on weekends. I am keeping a diary of my efforts with accompanying audio examples. In case anyone is interested, the URL is:
http://blastbay.com/blastvox/

I am still at a very early stage so don't expect anything impressive. Still, it's a lot of fun and I make slow but steady progress.

I will update this from time to time, so feel free to check back every now and then if you're curious.

Kind regards,

Philip Bennefall

Stormy · 2019-01-13 05:47:07

Stormy
Business monkey
Offline

Registered: 2019-01-04
Posts: 271
User Karma: 24

This is a pretty interesting project. Hats off for the hours spent scratching your head at the online articles that never seem to give you the piece of information to make everything click inside your head. I've definitely been there. I'd love to see how far you are able to get with this. I gave the entries a quick read, but I'll probably have more questions once I'm able to sit down and wrap my head around everything.

Trying to free my mind before the end of the world.

philip_bennefall · 2019-01-13 06:07:29

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

Sounds good. I'm definitely open to tips and suggestions from those who are interested. I'm having a lot of fun with this and I welcome feedback from anyone who takes the time to read through the text and listen to the output.

Kind regards,

Philip Bennefall

defender · 2019-01-13 06:36:04

defender
Fainting adventurer
Offline

From: Southwestern United States
Registered: 2012-01-13
Posts: 6,403
User Karma: 1,648

Honestly I'm just glad you're not dead.

PREPARE
YOUR
ANUS!
https://freesound.org/people/SilverIllu … nds/546960

GrannyCheeseWheel · 2019-01-13 06:39:33

GrannyCheeseWheel
Banned
Offline

From: Pennsyltucky
Registered: 2015-01-29
Posts: 12,701
User Karma: 2,272

This is pretty cool.

Facts with Tom MacDonald, Adam Calhoun, and Dax
End racism
End division
Become united

philip_bennefall · 2019-01-13 06:42:01

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

Oh, far from dead. I'm just not doing much of anything relating to audio games, so I haven't had a lot of relevant things to post on here. Strictly speaking this is not really relevant to games either, which is why I put it in the off-topic room, but I figured some people might find it fun to follow along since I'm sure I'm not the only one interested in speech synthesis around here.

Kind regards,

Philip Bennefall

Draq · 2019-01-13 08:17:25

Draq
Playroom playboy
Online

Registered: 2015-09-02
Posts: 1,538
User Karma: 194

Yikes. I always wondered how painful it was to make something like Eloquence. Now I know.

datajake1999 · 2019-01-13 08:29:16

datajake1999
Kingdom crafter
Offline

From: Someware in space
Registered: 2015-02-17
Posts: 445
User Karma: 34

Can you post a link to the git repo? I would like to look at the source code. I think this is also an interesting project.

philip_bennefall · 2019-01-13 09:50:56

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

I'm keeping the source closed for now, just so that I can decide what to do with the final product if it ever reaches acceptable quality. If I release it as open source now, I limit my options later. It is not out of the question that I will release it as open source in the future, but I want to work on it a little more before I make my decision.

Kind regards,

Philip Bennefall

Sage_Lancaster · 2019-01-13 09:58:03

Sage_Lancaster
Hunter grunt
Offline

Registered: 2017-11-13
Posts: 1,295
User Karma: 105

I hope this project works out for you.

ManFromTheDark · 2019-01-13 12:30:59

ManFromTheDark
Dark matter miner
Offline

Registered: 2017-11-02
Posts: 556
User Karma: 30

Nice enthusiasm in here! Keep on hacking the funky stuff and maybe one day there will be another old-school formant voice to use in electronic music.

philip_bennefall · 2019-01-13 12:44:15

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

I just had another hacking session and managed to get semi-voiced sounds working, as well as a short fade at the beginning and at the end to get rid of clicks. There is a new diary entry with audio examples (see post 1 for the link).
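As an illustration of the click-removal idea, here is a minimal sketch of a linear fade-in and fade-out applied to a buffer of samples. The function name `apply_fades`, the 5 ms default and the linear shape are assumptions made for the example; the diary may well use a different length or curve.

```python
def apply_fades(samples, sample_rate, fade_ms=5.0):
    """Return a copy of samples with a linear fade at both ends.

    Ramping the gain up from 0 at the start and down to 0 at the end
    avoids the abrupt jump in amplitude that is heard as a click.
    """
    # Never let the two fades overlap on very short buffers
    n = min(int(sample_rate * fade_ms / 1000.0), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n               # 0.0 at the edge, approaching 1.0 inward
        out[i] *= gain             # fade in
        out[-1 - i] *= gain        # fade out, mirrored from the end
    return out

# Usage: a constant signal makes the effect of the ramps easy to see
faded = apply_fades([1.0] * 1000, 16000)
```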

Kind regards,

Philip Bennefall

visualstudio · 2019-01-13 13:07:14

visualstudio
Human antivirus
Offline

From: in the ground
Registered: 2012-02-16
Posts: 889
User Karma: 46

hi,
maybe you can use deep learning for it as well,
like convolutional neural networks etc.
maybe WaveNet can help you a little bit.
also, eSpeak is somewhat like what you are trying to achieve.

Stormy · 2019-01-13 17:00:06

Stormy
Business monkey
Offline

Registered: 2019-01-04
Posts: 271
User Karma: 24

Questions:
1. Wouldn't the formant/bandwidth values for voiced sounds slightly vary depending on the voice itself? (my point being, if a different person recorded the stories, would your formant values be different?)
2. So, since you ended up needing two sets of filters for voicing, are you running the same signal through both sets of filters in parallel?
3. How long does it take to generate an audio file for a short sentence like the ones you've been using for testing?

Trying to free my mind before the end of the world.

jack · 2019-01-13 18:17:04

jack
Administrator
Offline

From: United States
Registered: 2010-10-16
Posts: 6,801
User Karma: 936

Neural network/deep learning is not possible with a formant synthesizer the likes of what is being developed here. Google, for example, has stupid amounts of compute power allocated to just WaveNet. Granted, they've truncated the resources significantly from when they started, but it's still far from something that can run on an embedded device.
As for the synth itself, I like what I'm hearing so far! I assume you've moved on from Festvox and are just doing this by hand now, since Festvox, from the looks of it, requires quite a lot of resources to run?

Stormy · 2019-01-13 20:19:57

Stormy
Business monkey
Offline

Registered: 2019-01-04
Posts: 271
User Karma: 24

@jack
For something that started out being done by hand, the synth sounds incredible so far to me. Coming from information gathered from the web and a few estimates, it's definitely farther than I'd be able to take it. The processing power needed for WaveNet is insane. I don't remember the exact figure, but generating one second of audio took an extremely long time, even with the top-notch resources that a company like Google has. (At least that was the case when the article I read a while back was posted.)

Trying to free my mind before the end of the world.

ManFromTheDark · 2019-01-13 20:39:42

ManFromTheDark
Dark matter miner
Offline

Registered: 2017-11-02
Posts: 556
User Karma: 30

Formant speech synthesis has nothing to do with that whole WaveNet stuff.

harrylst · 2019-01-13 20:49:17

harrylst
Hunter grunt
Offline

Registered: 2008-03-22
Posts: 1,304
User Karma: 33

This is super interesting. Will it be easy (once you get the synth up and running) to make it bilingual? If, for example, it could support Swedish as well? (Although that'd be a headache in itself because of the different rhythms of the languages...)

skype name: techluver
Feel free to add me.

jack · 2019-01-13 21:28:30

jack
Administrator
Offline

From: United States
Registered: 2010-10-16
Posts: 6,801
User Karma: 936

WaveNet is a concatenative respin that involves sample-for-sample synthesis rather than splicing; either way, it is not formant at all. And Festvox is diphone-concatenative.

philip_bennefall · 2019-01-13 22:14:37

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

The text processing part of the system was trained by way of machine learning by the folks who made Festival/Festvox/Flite, so technically I am using machine learning - just not in the synthesis backend code.

eSpeak is similar to what I want to do for sure, but I am personally not very fond of its output so I wanted to see if I could achieve something different.

@BoundTo:

1. The formant frequencies and bandwidths do vary for different speakers, especially between men, women and children. It is definitely possible to derive a new formant table to add a new voice, though some other tweaks would be needed as well such as defining the average pitch etc.

2. I generate two completely separate signals, one with white noise and one with the pulse; a sawtooth in this case. The noise runs through the unvoiced filters and the sawtooth runs through the voiced ones. They are then combined using envelopes to smoothly turn the two sources on and off as appropriate.

3. Rendering the slow version of the "visual roses" sentence, which comes out to 3.91 seconds, took 31 milliseconds on my laptop. This is generating the whole thing in one go, however, which you would not do in a streaming application - you could generate as little as 5 or 10 milliseconds of audio at a time in many cases. Note that I have not done any work on optimizing the code; I'm sure I could speed it up significantly down the road.
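The two-source arrangement in answer 2 can be sketched roughly as follows. This is a hedged illustration, not the actual implementation: it uses a single two-pole resonator per branch where the real system has whole sets of filters, a naive sawtooth, a linear crossfade as the envelope, and made-up frequencies and bandwidths. All names here (`sawtooth`, `white_noise`, `ramp` and so on) are hypothetical.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed

def sawtooth(pitch_hz, n, sample_rate):
    """Naive (not band-limited) sawtooth in [-1, 1): the voiced source."""
    return [2.0 * ((i * pitch_hz / sample_rate) % 1.0) - 1.0 for i in range(n)]

def white_noise(n, seed=0):
    """Uniform white noise in [-1, 1]: the unvoiced source."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator, standing in for a full set of formant filters."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def ramp(n, start, end):
    """Linear gain envelope from start to end over n samples (n >= 2)."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]

n = SAMPLE_RATE // 4  # a quarter of a second

# Each source goes through its own filter; the filter values are illustrative
voiced = resonate(sawtooth(120.0, n, SAMPLE_RATE), 500.0, 80.0, SAMPLE_RATE)
unvoiced = resonate(white_noise(n), 4000.0, 600.0, SAMPLE_RATE)

# Envelopes turn one source off while the other comes on,
# roughly like a hiss gliding into a vowel
voiced_gain = ramp(n, 0.0, 1.0)
unvoiced_gain = [1.0 - g for g in voiced_gain]

mixed = [v * gv + u * gu
         for v, gv, u, gu in zip(voiced, voiced_gain, unvoiced, unvoiced_gain)]
```

On the timing in answer 3: 3.91 seconds of audio rendered in 0.031 seconds works out to roughly 126 times faster than real time (3.91 / 0.031 ≈ 126).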

@Jack Festvox is a suite of voice building tools; you don't actually use the code in Festvox in the final synthesizer. Since I am using Flite for text processing, I'm still very much making use of Festvox. But you're absolutely right in thinking that all the synthesis code is being done by hand.

@harrylst Yes, you could definitely make a Swedish voice using the Festvox tools as a starting point. You would train models for duration and fundamental frequency prediction based on natural speech, and you would train a letter to sound rule set by analyzing a large pronunciation dictionary. All this is possible in Festvox, and the output can be converted to Flite which means I could get phones, durations and a pitch contour for a sentence. But it would take some tweaking and quite a bit of trial and error to get the various models right, not to mention getting formant settings for all the sounds that differ from English.

Thanks so much for all the positive feedback, everyone!

Kind regards,

Philip Bennefall

ammericandad2005 · 2019-01-13 23:30:32

ammericandad2005
Great word adventurer
Offline

From: Castle Wolfenstein
Registered: 2013-12-03
Posts: 1,782
User Karma: 23

Hey Philip,
how do you plan to release this? SAPI 5 engine, NVDA add-on, or both?

be a hero and stop Coppa now!
https://docs.google.com/document/d/1Dkm … DkWZ8/edit
-id software, 1995

ammericandad2005 · 2019-01-13 23:31:00

ammericandad2005
Great word adventurer
Offline

From: Castle Wolfenstein
Registered: 2013-12-03
Posts: 1,782
User Karma: 23

Or an Android TTS voice?

be a hero and stop Coppa now!
https://docs.google.com/document/d/1Dkm … DkWZ8/edit
-id software, 1995

philip_bennefall · 2019-01-14 00:11:44

philip_bennefall
red potter
Offline

From: Sweden
Registered: 2007-06-07
Posts: 748
User Karma: 121

Probably all of them, if I reach a high enough level of quality. If the output is reasonable, I can package the system in all sorts of ways.

Kind regards,

Philip Bennefall

jaybird · 2019-01-14 01:47:28

jaybird
Hunter grunt
Offline

From: Louisville, Kentucky, USA
Registered: 2005-03-05
Posts: 1,211
User Karma: 84

This sounds really cool! I'm looking forward to learning more about this!

Personally I absolutely cannot stand eSpeak, so I would love to see something better on the open-source market as it were.

visualstudio · 2019-01-14 06:09:04

visualstudio
Human antivirus
Offline

From: in the ground
Registered: 2012-02-16
Posts: 889
User Karma: 46

regarding deep learning and WaveNet:
WaveNet takes some audio as its input for training, and it can produce audio output.
I'm not talking about the computational power either, just talking about the speech model.
it is not only for generating speech, it can even generate music.
regarding Festival/Festvox, they are not so much needed in this project, since Festvox by itself is used to build voices for Festival/Flite.
also, Festival has support for diphone, but its recommended way is clustergen (unit selection).
now, coming to the text processing part:
for converting text into phones, g2p (grapheme-to-phoneme conversion) is your best option.
it gets better when it is trained as a sequence-to-sequence model.
p.s: check out SoLoud.
it has a little synthesizer.


Making a Vintage Sounding TTS Voice (Page 1 of 3)

Posts: 1 to 25 of 53

#1 Topic by philip_bennefall 2019-01-13 04:26:01

#2 Reply by Stormy 2019-01-13 05:47:07

#3 Reply by philip_bennefall 2019-01-13 06:07:29 (edited by philip_bennefall 2019-01-13 06:10:19)

#4 Reply by defender 2019-01-13 06:36:04

#5 Reply by GrannyCheeseWheel 2019-01-13 06:39:33

#6 Reply by philip_bennefall 2019-01-13 06:42:01

#7 Reply by Draq 2019-01-13 08:17:25

#8 Reply by datajake1999 2019-01-13 08:29:16

#9 Reply by philip_bennefall 2019-01-13 09:50:56

#10 Reply by Sage_Lancaster 2019-01-13 09:58:03

#11 Reply by ManFromTheDark 2019-01-13 12:30:59

#12 Reply by philip_bennefall 2019-01-13 12:44:15

#13 Reply by visualstudio 2019-01-13 13:07:14

#14 Reply by Stormy 2019-01-13 17:00:06

#15 Reply by jack 2019-01-13 18:17:04

#16 Reply by Stormy 2019-01-13 20:19:57

#17 Reply by ManFromTheDark 2019-01-13 20:39:42

#18 Reply by harrylst 2019-01-13 20:49:17

#19 Reply by jack 2019-01-13 21:28:30

#20 Reply by philip_bennefall 2019-01-13 22:14:37

#21 Reply by ammericandad2005 2019-01-13 23:30:32

#22 Reply by ammericandad2005 2019-01-13 23:31:00

#23 Reply by philip_bennefall 2019-01-14 00:11:44

#24 Reply by jaybird 2019-01-14 01:47:28

#25 Reply by visualstudio 2019-01-14 06:09:04 (edited by visualstudio 2019-01-14 06:19:11)
