2012-09-26 01:47:14

It sounds like it was made via Festival...
In which case, technically, it could be made to sing, as Flinger seems to more or less be an expansion on Festival. (Assuming it still exists?)

Ear Ninja?

2012-09-26 01:50:32

Heh, it just sounds so awesome... Lol! I just like such old-sounding robotic synths... :-D

2012-09-27 02:43:26

Hi all,

I have now updated the initial post with a new link to a sample that I uploaded to Dropbox. I'll be interested to hear your opinions on the differences.

Kind regards,

Philip Bennefall

2012-09-27 07:50:19

Philip, way to go. It does sound quite a bit better than the last demonstration. As you say, it does require some more work, but I like it so far. :-D

2012-09-27 16:52:32

Hi, nice attempt at the voice. However, I'm still waiting for the final result...

2012-09-28 09:24:36

Wow. Nice improvements there. How do you make a voice like this better? I thought you recorded lots and lots of sentences, which you might have done, but what next? What can you do to improve the speech quality?

2012-09-28 13:22:04 (edited by musicalman 2012-09-28 13:22:34)

The second voice sounds less jumpy, but needs a pretty big intonation increase in my opinion. It sounds monotone to me. But the intelligibility has improved.

2012-09-28 17:32:35

Yes, the latest version does sound less jumpy, but I must agree with Ray that it has next to no inflection. I think the pitch should be raised slightly, not lowered, but that is just my personal preference.

I can't wait to hear the next update! I would also be interested to know how you go about improving the overall quality of the voice.

2013-05-09 16:34:27

Hi Philip!
If I listen very closely, it sounds pretty much like your voice, I mean the second recording. I couldn't believe that it was your "voice" speaking. Lol. How did you make it? I may have missed something, but I heard some mention of "Festival"; is it a development kit for SAPI voices?

Keep up the good work!!!

2013-05-09 17:53:48

Since someone necroed this, I've got to ask. Is this using phonemes? Diphones? Words?
Also, are you blind, Philip? I'm just curious; everything I found made building a voice like this into a very complex and visual process (open up your audio tools, select this view, and start clipping with the mouse on a sample-perfect basis...).
And for the record, at least the eSpeak with NVDA is great. I wouldn't go to anything else now. So, so fast, and yet clear. Not natural, but clear all the same.
The one on Linux, especially with Orca, sucks: you can't change the prosody/inflection/whatever it's officially called. One of these days, I'm going to track someone down who can help me reconfigure it all to actually work well and get Libsonic support without disabling audio permanently (oops, and yay for VM).

2013-05-10 08:26:54

Hello Camlorn,

Using the Festival and Festvox tools, it doesn't have to be a visual process at all. Sure, you have to edit some audio, but the editing is trivial and can be automated for the most part, which suits me as I am completely blind and don't want to mess with visual interfaces either. I really just wanted to see how far I could take the voice, and since I am not particularly happy with the end result I have shelved it for now. I'll take it up again once a new version of Festvox is released that can do better prosodic models, but it's hard to tell when that will be. The method I use is called Clustergen; it creates a statistical model of my voice based on phonemes, and on diphones when available. So the size and phonetic balance/coverage of your dataset are vital when you construct it.
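
For anyone wondering what "phonetic coverage" might look like in practice, here is a minimal sketch. It is not part of the Festvox tools and not necessarily how the dataset for this voice was chosen. It assumes you already have a phoneme transcription for each candidate prompt (the prompt names and phoneme lists below are made up), pads each utterance with a `pau` silence symbol, and greedily picks the prompts that add the most diphones not yet seen. The function names `diphones` and `greedy_select` are hypothetical.

```python
def diphones(phones):
    """Return the adjacent phoneme pairs of one utterance.

    The utterance is padded with a silence symbol ("pau", an assumption)
    so that utterance-initial and utterance-final transitions count too.
    """
    padded = ["pau"] + list(phones) + ["pau"]
    return list(zip(padded, padded[1:]))


def greedy_select(candidates, k):
    """Pick up to k prompts, each time taking the one that adds the most
    diphones not covered by the prompts chosen so far.

    candidates maps a prompt name to its list of phonemes.
    Returns (chosen prompt names in order, set of covered diphones).
    """
    remaining = dict(candidates)
    seen, chosen = set(), []
    for _ in range(min(k, len(remaining))):
        # Number of new diphones each remaining prompt would contribute.
        best = max(remaining,
                   key=lambda name: len(set(diphones(remaining[name])) - seen))
        gain = set(diphones(remaining[best])) - seen
        if not gain:
            break  # nothing left adds coverage
        chosen.append(best)
        seen |= gain
        del remaining[best]
    return chosen, seen


if __name__ == "__main__":
    # Made-up transcriptions, for illustration only.
    prompts = {
        "a": ["h", "ax", "l", "ow"],
        "b": ["h", "ax"],
        "c": ["w", "er", "l", "d"],
    }
    chosen, covered = greedy_select(prompts, 2)
    print(chosen, len(covered))
```

A real prompt list would be far larger, and you would probably also want to track how often each diphone occurs rather than just whether it occurs, since a statistical model benefits from several examples of each unit.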

As for eSpeak, it's not for me. I only use it when I have absolutely no other option or if I am feeling particularly masochistic. Grins.

Kind regards,

Philip Bennefall

2013-05-10 08:31:43

Awww, sorry to hear you've shelved it, I really liked it. :-)

2013-05-10 12:30:39

Yeah, the second recording was quite well made. Of course it's not the best TTS ever, but it would work. :-)

2013-05-10 14:08:57

Well, I love such voices that aren't "the best" as you say. :-) Yeah, I'm weird. Heh.

2013-05-10 20:47:06

To be honest, I thought the first recording of them all was the best. I also now dislike Eloquence and think that the eSpeak with NVDA is the best synthesis ever, so...take it with a grain of salt.
I need to look into this again. If it's using statistics, it could be possible to make the model more accurate by providing more data from different people, or at least to get it sounding more interesting, and I kinda wonder if you couldn't somehow duplicate Eloquence with it by using recordings.
How are you automating the audio editing? I'm not familiar with any audio analysis and modification scripts, save maybe Nyquist, but that's probably overkill. The rest are all geared towards music, or so it seems.
If I can get or find a good microphone and a quiet place to record, this could be a lot of fun to play with.

2017-11-14 17:14:28

Hi guys,

After a few years of silence, I picked up this thread again. There have been some improvements in the voice creation tools, and I regenerated the voice with them using the existing dataset. Here's a new recording for those who may be interested:


I plan to rebuild the voice yet again with the latest snapshot of the tools that came out just a few days ago, but that's a project for the weekend when I'm off work.

Kind regards,

Philip Bennefall

2017-11-14 21:47:29

I personally would just wait to see if Lyrebird releases their AI, as in my opinion it sounds a bit better than whatever you're using right now. Still, I definitely see myself downloading it if you made it a SAPI voice or if you made it for NVDA. Plus, I don't think Lyrebird will be releasing their AI any time in the near future...

2017-11-14 23:23:57

@shotgunshell I definitely agree that Lyrebird sounds better. My goal is really just to experiment with the Festvox system to see what results I can achieve. I will only release it if I get a voice that I personally consider usable, in which case I could easily make it into a SAPI voice or a DLL for NVDA or whatever other format people might want. I'm doing this in my free time, of which I don't have a lot, so I have no idea when/if I'll have something usable. I'm just playing around and wanted to revive this topic to post the current output.

Kind regards,

Philip Bennefall

2017-11-15 01:19:07

Would you be willing to make it an Android TTS voice one day? I'd be willing to pay for it!

I myself quite like it!

2017-11-15 01:56:26

I'm kind of curious about this app myself, does it have command line parameters?

2017-11-15 02:13:01

Your links don't work.

2017-11-15 03:16:10 (edited by defender 2017-11-15 03:17:14)

Sounds like a fuzzy bucket. :-)
Decent inflection though, and it actually does sound like you.
No noticeable artifacts in the portion I heard either, but that crazy smoother thing that makes it sound fuzzy probably hides them all anyway...
Maybe it's the way you wrote it, but it seems kind of droning; not enough commas? The sentences don't have defined separators, and there's no real inflection change at the ends.

2017-11-15 05:21:25

If it were as easy as making several concatenative files with different speech samples, and the whole interface was made accessible, people could definitely put in the effort to make their own. As for whether or not they'd want to sell it...

2017-11-15 09:33:30

The main thing I'm wanting to fix is the inflection, and the endless sentences. I have solutions for both, as well as a few other tweaks I want to do. I'll post another sample when I have it.

The old links don't work, but that version sounded awful. For kicks, here they are for comparison.

The very first link, with a tiny dataset of just 500 recordings:
https://www.dropbox.com/s/upe4x3ckssv5m … 0.wav?dl=1

And the second link, with a few more recordings and a slightly different rendering method:
https://www.dropbox.com/s/0w31xv2h2utvr … l.wav?dl=1

Now that, that's fuzzy for sure. I die a little every time I go back and listen to these. They were made with very very old versions of the tools.

When it comes to platforms, as long as I'm happy with the voice itself, I'm game to try building it for whatever I can get my hands on. But again, I have not decided at all what I'm going to do with it.

Kind regards,

Philip Bennefall

2017-11-15 21:31:54

Hey Philip,
Could you provide a tutorial on how to make a SAPI voice using Festvox?

