2014-12-14 19:33:32

Hello,
Ivona Speech Cloud is currently free and open for developers.
Basically, you send it text and it sends your application back a speech file that you can then play.
The speech files are very small and it is super easy to create an algorithm to fetch new files from the server.
I wouldn't use it on something like a MUD, but for most games, this kind of system is fantastic. It is also totally platform-independent.
Here are the SDKs:
Java
Python
PHP

Last time I checked, only the "other" option was getting through on their customer support.

2014-12-14 19:57:58 (edited by camlorn 2014-12-14 20:01:13)

Sorry to be pessimistic:
It says free for development purposes.  You probably have to pay once you reach the point of distribution.
If you use it for a majority of your TTS, your offline game can no longer be played offline.
Those with slow internet connections and no local cache built up will have to wait for everything, even for small audio files.  Writing such a local cache is probably about as hard as writing the screen reader API calling code anyway, something which we already have multiple libraries for.  Coders who want to develop for it will need access to web APIs, so it's easy in Python but not in much else commonly used for audiogame development (primary candidates for difficulty include C++, probably VB6, and, possibly, BGT).
Either way, using it while still letting a realtime game go forward is probably going to involve threads if you're a new programmer, or Twisted if you've been bitten hard by having used threads all over for something as trivial as your TTS.  In both cases, you end up with something much more complicated.
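For anyone wondering what the threaded version looks like, here is a rough sketch. It assumes a hypothetical synthesize(text) callable that blocks on the network and returns audio bytes; it stands in for whatever SDK call you actually use and is not part of any real library. A worker thread does the fetching, and the game loop calls poll() once per frame so it never waits on the network.

```python
import queue
import threading


class SpeechFetcher:
    """Fetches speech on a worker thread so the game loop never blocks."""

    def __init__(self, synthesize):
        # synthesize is a hypothetical blocking callable: text -> audio bytes
        self._synthesize = synthesize
        self._requests = queue.Queue()
        self._results = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            text = self._requests.get()
            if text is None:  # sentinel: shut the worker down
                break
            try:
                self._results.put((text, self._synthesize(text), None))
            except Exception as exc:  # report failures instead of killing the thread
                self._results.put((text, None, exc))

    def request(self, text):
        """Queue text for synthesis; returns immediately."""
        self._requests.put(text)

    def poll(self):
        """Return finished (text, audio, error) tuples without blocking."""
        done = []
        while True:
            try:
                done.append(self._results.get_nowait())
            except queue.Empty:
                return done

    def close(self):
        """Finish pending requests, then stop the worker."""
        self._requests.put(None)
        self._worker.join()
```

Even in this small form you already have two queues, a sentinel, and error passing between threads, which is the extra complexity being described.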
So interesting, probably good for web sites or something, but otherwise not worth it for games.

My Blog
Twitter: @ajhicks1992

2014-12-14 20:42:58

Well,
I was just holding the files locally to be honest and they were less than 1 MB for over 100 menu options and notifications.
I think it would be totally worth it for games like Traders of Known Space, Park Boss, Side-scrollers and other games that just find speech kind of a pain.
There are also something like 40 languages, so multilingual support is something to consider.

I actually think I'm using it a little differently than they intended.

Their payment structure is based on the number of requests made and does not take into consideration that I would just wish to save and distribute the audio files.

Alternatively, you could create an option to download all the speech files for the voice of choice. There are like 16 English voices, so someone is bound to find one they like.
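A rough sketch of what such a "download the whole voice" step might look like. The synthesize(text, voice) callable is hypothetical and stands in for the real SDK call; the directory layout and the index.json file are my own assumptions, not anything Ivona prescribes.

```python
import json
import os


def build_voice_pack(phrases, voice, synthesize, root="speech"):
    """Render every phrase once for one voice and write an index of the files."""
    pack_dir = os.path.join(root, voice)
    os.makedirs(pack_dir, exist_ok=True)
    index = {}
    for number, phrase in enumerate(phrases):
        name = "%04d.ogg" % number
        # synthesize is a hypothetical callable: (text, voice) -> audio bytes
        with open(os.path.join(pack_dir, name), "wb") as f:
            f.write(synthesize(phrase, voice))
        index[phrase] = name
    # The index lets the game look up the file for a phrase at runtime
    with open(os.path.join(pack_dir, "index.json"), "w", encoding="utf-8") as f:
        json.dump(index, f, ensure_ascii=False, indent=2)
    return index
```

Note that every run makes one request per phrase, so with per-request pricing you would want to run this once per voice and keep the results.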
IMO it is much better than SAPI in many circumstances.

2014-12-14 20:47:57 (edited by frastlin 2014-12-14 20:51:04)

Also,
you use requests, as it uses the Amazon S3 API.
Here is the code for a simple speech player using pygame and their SDK:


#coding: utf-8
#Is a sound player
#Our imports, from mostly the ivona python SDK
import pygame, time
from ivonaspeechcloud.client import SpeechCloudClient
from ivonaspeechcloud.const import METHOD_POST
from ivonaspeechcloud.inputs import Voice, OutputFormat
from credentials import access_key, secret_key

#Our constants, for Ivona:
language = "en-GB"
text = "I am very happy"
file = "tmp/test1.ogg"

#Starting pygame
pygame.mixer.pre_init(22050,-16, 2, 400)
pygame.mixer.init()

#Our function for creating speech
def speech():
    """Will produce a file with the selected text from above and in the voice above"""
    client = SpeechCloudClient(access_key, secret_key)
    res = client.create_speech(text, output_format=OutputFormat(codec="OGG"), voice=Voice(language=language), method=METHOD_POST)
    with open(file, 'wb') as f:
        for chunk in res.chunks:
            f.write(chunk)

if __name__ == '__main__':
    speech()

    pygame.mixer.music.load(file)
    pygame.mixer.music.play()

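    #Wait until playback finishes instead of guessing at a fixed delay
    while pygame.mixer.music.get_busy():
        time.sleep(0.1)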


Now you have an ogg file on your computer that you can play over and over again without using the internet.
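To only hit the server for text you have not fetched before, something like the following sketch could wrap the call. It assumes a hypothetical synthesize(text, voice) callable returning audio bytes (for example, a thin wrapper around the SDK call above); the cache directory and the hash-based file names are just illustrative choices.

```python
import hashlib
import os


def cached_speech(text, voice, synthesize, cache_dir="speech_cache"):
    """Return the path to an audio file for text, fetching only on a cache miss."""
    # Key on voice and text so the same phrase in another voice gets its own file
    key = hashlib.sha1(("%s\n%s" % (voice, text)).encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".ogg")
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        audio = synthesize(text, voice)  # hypothetical: returns bytes
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(audio)
        # Rename only after the write completes, so a crash cannot leave
        # a half-written file under the final name
        os.replace(tmp, path)
    return path
```

The returned path can be passed straight to pygame.mixer.music.load().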

2014-12-14 21:53:43

I didn't say this was hard in Python.  I know about requests, and I'm not surprised that they provide an SDK (which you seem to be importing, though maybe these are your own packages).
But seriously.  Unless you're one of the very tiny number of games that will literally never need to say a number, ever, don't.  We have accessible_output2 and Universal Speech.  BGT already provides these functions.  That covers every language I can think of.  The code needed to do it is about the same size.  You lose a few MB off your download.  You work with my speech settings and, considering who you think this is good for, you're probably doing absolutely nothing original with your speech which necessitates having the samples anyway.  Given that the code is literally just as simple, you have no excuse to not just go through the screen reader which is, really, the superior option.
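For comparison, a minimal sketch of the screen reader route in Python. It assumes accessible_output2 is installed and uses its Auto output; the make_speaker and announce_score helpers are names I made up for illustration, and the fallback to print is only there so the sketch still runs without the library.

```python
def make_speaker():
    """Return a speak(text) function backed by the screen reader if possible."""
    try:
        # Auto picks whichever supported screen reader or speech output is available
        from accessible_output2.outputs.auto import Auto
        return Auto().speak
    except Exception:
        # Library missing or no usable output: fall back to printing
        return print


def announce_score(speak, score):
    # Dynamic text such as numbers needs no pre-rendered audio
    speak("Your score is %d" % score)
```

Usage would be speak = make_speaker() followed by announce_score(speak, 1234). Any number works, with no samples to download and the user's own speech settings respected.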
As for downloading the entire voice and concatenating, this is actually harder to do than you think: a good concatenative synth is going to handle stuff like applying intonations, something which their API is doing for you.  If they provide an option to get the voices locally, something which I'm almost sure is the case as I've seen this company mentioned other places, it's going to cost.  It will also be some sort of library that does some additional processing.  Anything you can come up with in the way of getting around their rates by synthesizing bits yourself will not be better than whatever you would get by calling the screen reader, though it might be better than Windows 7 SAPI.  To even get to the point of trying arbitrary concatenative synthesis properly, you need to somehow cut the files into phonemes (this is normally done with manual editing and will take a while) and then arrange for every single pair to concatenate without clicking (which takes more manual editing and maybe some filters).  And you still don't have the pitch down intonation on periods, natural comma pauses...
Games which are "having trouble with speech" have no real excuse unless they're written in something so niche it can't call a C DLL.  I can't think of an example.  I really fail completely to see what this fixes.  If your game relies on it in an online manner, i.e. you don't prefetch everything before distribution, you're also going to be depending on that site remaining up.  The use case for software like this is web sites and other stuff where you're already online anyway, not game development and not applications for which you've already got a perfectly good offline solution.  If you want to pull stuff for cut scenes, there are a number of services that provide voice acting at a reasonably affordable price, and that's certainly better quality than this (they can, at least a bit, capture your emotions, but no promise as to the talent beyond that).  To be honest, I'm failing to see the utility for anything outside the browser, and can't think offhand of a good example for something inside the browser, either.
