2012-09-22 14:02:25 (edited by philip_bennefall 2012-09-27 02:42:32)

Hi all!

For the last few days, I've been attempting to make a synthetic voice. My goals are:

1. It must not sound as jumpy and irregular as many unit selection voices do (think Cepstral).

2. It must be lightweight, cross platform, and self contained.

3. It must be understandable, but it is okay if it is not human sounding in nature.

I generated a prompt list with about 1200 phonetically balanced sentences, and I have recorded 500 of them so far just to see the result, and I thought I'd share it with you. Keep in mind that this is still a very, very rough and early draft, so be warned that it is nowhere near the level of quality that I am striving for.

Blastvox, as I have called it for now, has a message to you all:
https://dl.dropbox.com/u/5121962/birth_500.wav

Update: I have now recorded a total of 1178 sentences, and this shows in the quality of the output. Please compare the above sample with the following:
https://dl.dropbox.com/u/5121962/cheerful.wav

I still don't consider it done, but I do feel that it is a significant step forward.

Let me know your thoughts!

Kind regards,

Philip Bennefall

2012-09-22 15:00:33

Philip, it's actually not too bad. I think it might sound better if the pitch was a bit lower and the volume of the wav file was a bit higher, but it is a pretty decent voice overall. Of course, I'm used to using ESpeak on Linux, and it doesn't take much to sound better than that. Lol!

Anyway, I for one will be interested in hearing how Blastvox improves. This is something that would come in handy for game developers who want a default TTS to go with a game project.

Sincerely,
Thomas Ward
USA Games Interactive
http://www.usagamesinteractive.com

2012-09-22 16:23:25

Hi Tom,

Thank you! It's always nice to get some encouragement, even when the work is at an early stage and sounds far from good. I am relatively new to Festival, Festvox and Flite, so I will see what I can do regarding the pitch settings. I seem to recall finding a settings file in the voice directory/etc, but I am not sure whether I can tweak it as is or whether some steps of the build process have to be redone. I'll definitely let you know how it goes, though. In fact, if I may, I might ask for your advice on how best to package the voice for Linux users. I'm not there yet, but I do plan to distribute it if it turns out to be reasonable in the end.

As a matter of fact, one of the reasons why I wanted to make my TTS voice in the first place, aside from my personal curiosity, was so that I could package a voice with BGT that users could call upon if they wanted to be sure what the output was going to sound like, rather than depending on the user having a good SAPI voice installed. Though of course, we're not there yet either. Smile.

I'll be recording the final 700 prompts on Monday and Tuesday, and then we'll see how much better the statistical model becomes.

Kind regards,

Philip Bennefall

2012-09-22 17:44:44 (edited by blindncool 2012-09-22 17:45:34)

For a rough draft, that's not bad.

“Can we be casual in the work of God — casual when the house is on fire, and people are in danger of being burned?” — Duncan Campbell
“There are four things that we ought to do with the Word of God – admit it as the Word of God, commit it to our hearts and minds, submit to it, and transmit it to the world.” — William Wilberforce

2012-09-22 18:01:07

Pretty nifty. How much will it cost?

2012-09-22 18:44:40

Hi there,

I have not decided at all under which terms I will distribute the voice. It entirely depends on the final quality that I can achieve. If it is significantly better than this I might charge a few dollars, but if it's only a little bit better then it'll be free.

Kind regards,

Philip Bennefall

2012-09-22 19:26:15

I could never get festvox to work in the slightest. I suspect that the fact that my user directory on this computer has a space in it confused Cygwin and made my attempts at working with it even more frustrating than jumping into a Linux-esque shell from scratch would have been on its own.
And that is why I have tried nothing Linux-related since removing the festvox icon from my desktop.

In any case, I like the sound of where this is headed.

Look over here!
"If you want utopia but reality gives you Lovecraft, you don't give up, you carve your utopia out of the corpses of dead gods."
MaxAngor wrote:
    George... Don't do that.

2012-09-22 20:02:44

I wonder if you could make two versions.
Have a lower-quality one for general use, which would be the free one, similar to ESpeak.
If, however, folks would like the higher-quality version, which of course would go towards making more Blastbay games and programs, then they can buy the premium version.

Regardless of which path ya take, I know this voice is gonna be one awesome fellow!

2012-09-22 20:38:43

Haha, sounds like an SVOX male voice.

Thanks much,
Mason

2012-09-22 22:26:31

Like it! Keep it up, man, it's awesome.

To see a world in a grain of sand, and a heaven in a wild flower.
Hold infinity in the palm of your hand, and eternity in an hour.
William Blake - Auguries of Innocence, lines 1 to 4

2012-09-23 02:24:14

Philip, as far as packaging programs goes for Linux it really depends on the distribution or distributions you intend to target. There really isn't a one size fits all solution in terms of package manager. Although, it is possible to install and use other package managers on a distribution other than the one it was designed for.

As you may know, there are many brands of Linux, called distributions. Usually there is a major distribution, and then there are derivatives that use the same package manager and things like that, but perhaps come with different software, a different default desktop, etc. There are the Debian-based distributions, such as Debian, Ubuntu, Vinux, Sonar, etc., which use the DPKG package manager. The Red Hat distributions, such as Red Hat, Fedora, Mandriva, CentOS, and so on, all use a package manager called RPM. Arch Linux uses a package manager called pacman. So it really depends on what Linux distribution or distributions you intend to support here.

That said, the majority of the blind Linux users I know use Debian-based derivatives such as Debian, Ubuntu, Vinux, Sonar, Trisquel, and so on. In terms of numbers, you'd definitely want a DPKG package, as that would get most of the blind Linux market, but certainly not all. I know a few Fedora users out there and know of a few people who use Arch. So you can either build extra packages for those systems, or you might use an application like Alien to try to convert your setup packages from one format to the other.

Of course, there is nothing saying you have to use a specific package manager. You could simply write a shell script that copies files into the proper directories, sets environment variables, and you're done. No hassle figuring out what package manager this or that Linux distribution uses.

In fact, this is what most commercial software developers end up doing. Cepstral, for example, simply uses a shell script to install their commercial voices to /opt/cepstral and set up environment variables, and they don't mess with DPKG, RPM, pacman, etc., because there are just too many different package formats to possibly support them all. I often do this myself when I write Linux applications, as I usually don't know which type of distribution my end users will be running.
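For illustration, here is a minimal sketch of that kind of installer, written in Python rather than shell. The source layout, the /opt/blastvox prefix and the BLASTVOX_HOME variable are hypothetical examples, not anything Blastvox or Cepstral actually uses. Since a script cannot change the environment of the shell that launched it, the sketch writes an export line to a file that the user can source, or that could be copied into /etc/profile.d/.

```python
import shutil
from pathlib import Path


def install_voice(source_dir, prefix, env_var="BLASTVOX_HOME"):
    """Hypothetical installer: copy every file under source_dir into prefix,
    keeping the relative directory layout, then write a small shell snippet
    that exports env_var pointing at the install location."""
    src = Path(source_dir)
    dest = Path(prefix)
    dest.mkdir(parents=True, exist_ok=True)

    installed = []
    for item in sorted(src.rglob("*")):
        if item.is_file():
            # Recreate the same subdirectory structure under the prefix.
            target = dest / item.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(item, target)  # copy2 also preserves timestamps
            installed.append(target)

    # A script can't modify its parent shell's environment, so write an
    # export line that a login script (e.g. /etc/profile.d/) can source.
    env_file = dest / "blastvox_env.sh"
    env_file.write_text(f'export {env_var}="{dest}"\n')
    return installed, env_file
```

For example, `install_voice("build/voice", "/opt/blastvox")` would copy the build output under /opt/blastvox (which normally requires root) and write /opt/blastvox/blastvox_env.sh. No package manager is involved, which is exactly why this approach works the same on Debian, Fedora or Arch.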

Sincerely,
Thomas Ward

2012-09-23 03:25:13

Hi Philip,
Good start.
I took your file and made a stereo version that freed up the middle of the sound space so that target sounds could be heard more clearly. Then I lowered the pitch, sped up the file a bit, and turned it into an mp3.
You can hear it at:
http://www.pcsgames.net/BlastVox_birth_500.mp3

2012-09-23 05:28:01

Philip, perhaps you could include BlastVox in a Linux BGT port, if you decide to do that.

2012-09-23 11:43:47

Just to be clear, Festival voices can work on Windows, provided everything's properly compiled. (I could not compile any of it myself and had to get someone to send me a link to a compiled exe. Which I have not been able to do for Festvox.)

2012-09-23 15:25:21

Hi Philip. I really like your idea of designing a speech synthesizer. The preview was really good for what you could call an alpha version.

2012-09-23 17:59:15 (edited by synthesizer101 2012-09-23 18:00:09)

It's OK, but I hope that the quality improves drastically. Of course, I'm used to Acapela Heather, which is about as fluent as you can get, but this sounds a bit jumpy. Also, it seems to pause in places where it shouldn't: check after "all" in the first sentence, after "I'll", and after "time" in the last sentence. Are there commas there? The word "world" sounds like it has two syllables. Also, there seems to be too much emphasis on the first syllable of "promise". I agree that dropping the pitch slightly would be nice. If this improves, it would be great to include with BGT for text output. Still, if developers use the screen reader output options as well as letting the user choose a SAPI voice, it shouldn't be necessary. If this is based on Festival, could other Festival voices be included?

2012-09-23 18:15:02

So far, only NVDA's got it compiled. Which annoys me!
How in the world do ya get Festival to work under distros anyway? I've tried it with Vinux and Ubuntu, no luck!
Installed the right packages, but no dice!
Even uncommenting the festival thing doesn't work for some very odd reason.
Yes, blastvox would definitely! make a wonderful voice for Linux or Windows!

2012-09-23 18:27:49

Hi there,

Oh, the intonation is a bit whacked, and so are the pauses. A couple of hours ago I converted the current incarnation of the voice to Flite and ran it on Windows, which actually improved the quality. It also fixed the incorrect pauses, so now they are only inserted based on punctuation. I was pleasantly surprised. I have about 700 prompts left to record, which should make a big difference. If I am still not satisfied with it, I'll run the text analysis tool in Festival on some more public domain books to get even more prompts to record. Smile.

Kind regards,

Philip Bennefall

2012-09-23 18:28:14

My conclusion is that Festival was created by supervillains with PhDs to taunt the world. Although, taking that to its logical conclusions gets into things it oughtn't...
... Oh, right. Festival and related applications have been my only reason for looking into anything Linux-related. This probably contributes to my unpleasant feelings toward Linux in general, which it should not. Hmm.

2012-09-24 05:49:54

Trenton, part of the problem with getting Festival voices working on Linux is the sound system it uses. Most modern distributions use PulseAudio for audio output, and if you want older synths like Festival to work, you have to tinker with the configuration to get them to use Pulse or to route the audio through ALSA's OSS emulation. Either way, Festival voices will work, but it requires some tinkering due to the migration from ALSA to Pulse. In short, as usual, Festival and Flite are lagging behind current Linux development releases.

CAE_Jones, bad experiences with Festival and Flite definitely should not contribute to unpleasant feelings towards Linux in general. Linux is a decent OS, and it should be judged as a whole rather than by one or two programs that happen to be difficult to work with in the best of circumstances.

For example, there are graphical desktops and GUI applications for Linux that are very accessible, some only partly accessible, and others not accessible at all. Well, it's hard to blame Linux for that, since Windows applications aren't perfect either. I've found Windows apps that are very accessible, somewhat accessible, and not accessible at all. The best you or I can do is attempt to judge the OS fairly based on what works and what is accessible, and try not to make unfair comparisons between Windows and Linux when coming at it as a new user with limited experience. Basing an opinion of an entire operating system on one or two apps is definitely unfair.

Sincerely,
Thomas Ward

2012-09-24 10:55:09

Hi.
Well, interesting recording. The voice sounds really bad compared to the goals you have stated. I know this is a very rough beta, and I don't know much about how to make a voice like this, but I'm interested to see how this project goes. It doesn't sound bad for a first beta, though. Keep up the great work.

Best regards SLJ.
Feel free to contact me privately if you have something in mind. If you do so, then please send me a mail instead of using the private message on the forum, since I don't check those very often.
Facebook: https://facebook.com/sorenjensen1988
Twitter: https://twitter.com/soerenjensen

2012-09-24 13:02:59

Somehow I really like the fact that it sounds so robotic. I'd love to hear more of this.

2012-09-25 23:22:22

Nice voice, Philip. What was this done with, ModelTalker? If so, I think you have to finish recording all of it before you can use it. I'd be interested in making one of these.

2012-09-25 23:32:27

Maybe the synth could have singing capability like DECtalk does.

2012-09-26 00:26:04

Where did this fermentive voice come from? It hardly sounds concatenative.

Ulysses, KJ7ERC
She/they
Reedsy