2018-10-13 03:15:00

I'm going to post the email I got below. I'm kind of curious what this will mean for screen reader users, i.e., if we might be able to integrate this into NVDA or something similar.

Lyrebird wrote:

Today, we’re excited to share some updates from Lyrebird!

First, the beta version of our Vocal Avatar API is now open to everyone! Our API gives you the ability to integrate your users’ voices with your applications. Click here to learn more. To get started with integrating our API in your application, take a look at our documentation. And if you have any questions, you can reach us in this Slack community.

In addition, our new Slack integration lets you send voice messages in Slack using your Lyrebird Vocal Avatar! You can give it a try here. You can also fork the project on Github to learn how we used our Vocal Avatar API to create this Slack integration.

If you use the API to create an application that you think we’ll love, let us know! We’ll be featuring the best apps on our website to share them with all of our users.

If anyone wants to add me on Skype, it's garrett.brown2014.

2018-10-13 05:57:01

You know, maybe it's just me, but... At a time when data theft, user exploitation, scams, social engineering, harassment, swatting, doxing, hacking, identity theft, deepfakes, and privacy violations are becoming a rampant daily occurrence at ever-increasing scales, I can't help but feel that putting even more sensitive biometric data out there for people to hoard/steal/trade/abuse isn't a good idea.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2018-10-13 06:10:22

@magurp244 That might be true. However, it's pretty easy to tell that Lyrebird voices are fake, even for people who don't have an ear for that sort of thing. Besides, even if technology came out that blurred the lines, I seriously doubt such technology would be released to the public.

If anyone wants to add me on Skype, it's garrett.brown2014.

2018-10-13 09:07:07

Well, the quality of synthesis can depend on the number and quality of samples provided, and techniques for parsing those datasets are improving all the time. As for the technology being available, a lot of these are based on publicly available machine learning APIs, so that ship has already sailed. Just recently Baidu released a research paper on their latest [Deep Voice Cloning] techniques, and it's already being integrated into the open source [Deep Voice 3] GitHub repo, which uses PyTorch, so... yeah.

Also, if anyone wants to play around with Python-based deep voice synthesis, have you heard of Deep Voice 3? Heh.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2018-10-13 12:06:32

I'm sorry to say, but those examples sound noticeably bad, and they definitely don't pass the intelligible synth test, if you ask me.

If anyone wants to add me on Skype, it's garrett.brown2014.

2018-10-13 18:48:23

Let's see someone make a SAPI engine or text-to-audio converter with this API.

watch my brother's twitch stream here:
https://www.twitch.tv/sylvrexe

2018-10-13 20:48:19

Yeah, that would be awesome. Right now it only works with Slack.

If anyone wants to add me on Skype, it's garrett.brown2014.

2018-10-13 21:08:47

What is Slack?

watch my brother's twitch stream here:
https://www.twitch.tv/sylvrexe

2018-10-13 23:58:10

@8, you don't know what Slack is? Amazing. It's a set of collaborative software solutions for teams. It's got chat, and now voice chat, and a lot of other things. It's proprietary though.

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.

2018-10-14 00:00:43

@Ethin What do you mean by proprietary in this case? Is it only available on Windows, or can you not use third-party apps or something?

If anyone wants to add me on Skype, it's garrett.brown2014.

2018-10-14 00:23:06

Slack's proprietary in the sense that its source code isn't available to the public and all data is hosted on Slack's cloud servers.

As to Deep Voice, perhaps it's a matter of subjectivity, but I can make out what they're saying and their accent, though there is some distortion, I'll grant you that. Deep Voice is only one particular example, however, and what makes it more remarkable is how little training data it requires to generate a reasonably good voice copy: 3.7 seconds.

Machine learning is still a very new field, and this kind of speech synthesis has only emerged within the past few years. There are a number of different APIs around that have different properties and take different approaches with different results, for example Google's [Wavenet Library]. That particular library produces audio that can be remarkably difficult to tell apart from a natural human voice, but the trade-off is the amount and quality of training data required to generate those results, such as 20 minutes or more of speech, coupled with training time, such as thousands of iterations over hours.

Similarly, there's [Char2Wav], which is also quite good but also requires a great deal of data and training time. At this point it's not necessarily that they can't already generate human-like voices, so much as doing it efficiently and producing as few artifacts in articulation and pronunciation as possible over a given range. For most practical purposes, though, it doesn't matter if the voice is perfect, only that it's convincing.

There are more research articles and videos I could pull up if you're interested, like the latest Wavenet Tacotron article [here], which has some cool audio samples that demonstrate the voice quality based on how much training data it's fed, along with encoder-decoder training. Wavenet and Char2Wav are both open source, by the way.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2018-10-14 01:23:42

I stand corrected: they can and will put that tech into the hands of anyone, which is kind of a scary thought. I've heard samples of Wavenet and Tacotron before, but never those particular ones. Wavenet sounds like very interesting experiment material, actually.

If anyone wants to add me on Skype, it's garrett.brown2014.
