2019-03-26 22:47:16

Hey, I'm trying to port a game I'm writing from Linux to Windows. Most of it should work fine since my engine is cross-platform, but TTS is a bit of a sticking point.

I'm using Rust and wrote my own higher-level TTS library that currently supports Linux and the web via WebAssembly. I added support for Tolk yesterday, and while that's fine for screen reader users, the TTS library is meant to be usable by anyone, so I need a native option. Rust seems to give me a couple of ways to achieve this. I can use the winapi crate, which uses COM and supports everything back to XP. There's also a winrt crate, and a grep through its source reveals what appears to be support for a speech synthesis API. The WinRT speech synthesis examples make a lot more sense to me than COM does, since COM has no common Linux analog.

Since there are lots of Windows developers here, I'm wondering if anyone can explain the differences between the COM-based API, which I assume is SAPI, and this seemingly simpler WinRT API that I assume *isn't* SAPI?

I'm also told that using WinRT locks me to at least Windows 8. For those of you writing/selling Windows games, do you have any sense of how much of your audience uses Windows 7?

My choice of one or the other API isn't necessarily final, since my library allows you to select your TTS backend at runtime, and can conditionally compile in backends. I'll almost certainly want to support the WinRT speech synthesis API at some point, once Rust gains UWP support, since I could then potentially compile to Xbox targets. My question is which to support now, and I don't know how broad Windows 7 use still is.
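To make the runtime selection plus conditional compilation idea concrete, here is a minimal sketch of how it could look. Everything in it is hypothetical: the `Backend` trait, the `Recorder` stand-in, the `Sapi` placeholder, and the `sapi` Cargo feature are names made up for illustration, not the library's actual API.

```rust
/// Hypothetical backend trait; the real library's API isn't shown in this thread.
pub trait Backend {
    fn name(&self) -> &'static str;
    fn speak(&mut self, text: &str) -> Result<(), String>;
}

/// Stand-in backend that just records what it was asked to say.
pub struct Recorder {
    pub spoken: Vec<String>,
}

impl Backend for Recorder {
    fn name(&self) -> &'static str {
        "recorder"
    }

    fn speak(&mut self, text: &str) -> Result<(), String> {
        self.spoken.push(text.to_owned());
        Ok(())
    }
}

/// Placeholder for a native backend. It only exists when building for
/// Windows with the (assumed) "sapi" feature enabled.
#[cfg(all(windows, feature = "sapi"))]
pub struct Sapi;

#[cfg(all(windows, feature = "sapi"))]
impl Backend for Sapi {
    fn name(&self) -> &'static str {
        "sapi"
    }

    fn speak(&mut self, _text: &str) -> Result<(), String> {
        // The actual COM calls would go here.
        Err("not implemented in this sketch".to_owned())
    }
}

/// Names of the backends that were compiled into this build.
pub fn available_backends() -> Vec<&'static str> {
    let mut names = Vec::new();
    names.push("recorder");
    #[cfg(all(windows, feature = "sapi"))]
    names.push("sapi");
    names
}

/// Runtime selection: returns None if the name is unknown or if the
/// backend was compiled out.
pub fn create_backend(name: &str) -> Option<Box<dyn Backend>> {
    match name {
        "recorder" => Some(Box::new(Recorder { spoken: Vec::new() })),
        #[cfg(all(windows, feature = "sapi"))]
        "sapi" => Some(Box::new(Sapi)),
        _ => None,
    }
}
```

With this shape, adding a WinRT backend later would mean one more `cfg`-gated struct and one more match arm, without touching callers.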

Thanks.

2019-03-26 23:16:18

Tolk supports SAPI on Windows. Use Tolk_TrySAPI/Tolk_PreferSAPI. The difference between these two is that Tolk_TrySAPI adds SAPI at the end of the screen reader queue, which Tolk will try in order. If no screen reader is available, SAPI will be used. Tolk_PreferSAPI does the opposite, encouraging SAPI use over screen readers.
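The ordering described above can be modeled in a few lines of Rust. This is only a toy model of the behavior as described in this post, not a binding to Tolk: `Driver`, `detection_order`, and `pick` are hypothetical names, and it assumes SAPI is always available when it is in the queue.

```rust
/// A speech driver in the detection queue (toy model, not Tolk's types).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Driver {
    ScreenReader(&'static str),
    Sapi,
}

/// Builds the queue: SAPI is absent unless `try_sapi` is set, goes last
/// by default, and goes first when `prefer_sapi` is also set.
pub fn detection_order(
    screen_readers: &[&'static str],
    try_sapi: bool,
    prefer_sapi: bool,
) -> Vec<Driver> {
    let mut order: Vec<Driver> = screen_readers
        .iter()
        .map(|&name| Driver::ScreenReader(name))
        .collect();
    if try_sapi {
        if prefer_sapi {
            order.insert(0, Driver::Sapi);
        } else {
            order.push(Driver::Sapi);
        }
    }
    order
}

/// Walks the queue in order and returns the first usable driver.
/// Assumption: SAPI is always usable; a screen reader is usable only
/// if it is currently running.
pub fn pick(order: &[Driver], running: &[&str]) -> Option<Driver> {
    order.iter().copied().find(|driver| match driver {
        Driver::Sapi => true,
        Driver::ScreenReader(name) => running.contains(name),
    })
}
```

In this model, with the try option alone a running screen reader wins and SAPI is only the fallback; with the prefer option as well, SAPI wins even when a screen reader is running.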


2019-03-26 23:30:54

Right, I get that Tolk supports SAPI, but it isn't full SAPI support in that it doesn't let you set rate, pitch, volume, etc., unless it just passes through SSML and I can do it that way. It also requires that you download Tolk and correctly place its DLLs/.lib files so your code can find them, and this library isn't just for audio games/screen reader users. I figure we're collectively doing more work with TTS than the average sighted developer, so given that I already support Tolk and am now looking for a native option, I'm trying to get a sense of how many folks I'll abandon if I only support WinRT for now, and whether its speech synthesis API is any better or worse than COM/SAPI.

Thanks.

2019-03-27 02:44:48

Having developed on Windows for quite a few years, I suggest going for SAPI first to cover the largest user base and then implementing the WinRT stuff as an extra bonus - I don't know anything about the latter myself. When I implemented SAPI support in BGT about 9 years ago, I had no end of trouble with the SAPI 5.1 SDK so I ended up abandoning it and using the generic IDispatch interface. When I used the SAPI 5.1 SDK directly, a lot of voices wouldn't work at all - never quite figured out why. With IDispatch everything worked much better. One thing to note is that the internal SAPI resampler is horrendous. It kicks in when you switch voices on an existing audio device. I ended up forcing it to generate a memory stream at the native sample rate used by the given voice and then resampling manually afterwards. Even something basic like linear interpolation plus a single lowpass filter will be a big improvement over the built-in SAPI resampler.
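For anyone wanting to try the manual resampling route in Rust, here is a rough sketch of linear interpolation plus a single one-pole lowpass. It assumes mono `f32` samples already pulled out of the memory stream; the function names and the choice of a one-pole filter are illustrative assumptions, not what BGT actually does.

```rust
/// One-pole lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
/// `alpha` should be in (0, 1]; smaller values filter more heavily.
pub fn one_pole_lowpass(input: &[f32], alpha: f32) -> Vec<f32> {
    let mut out = Vec::with_capacity(input.len());
    // Start from the first sample to avoid a ramp-up from zero.
    let mut prev = match input.first() {
        Some(&sample) => sample,
        None => return out,
    };
    for &x in input {
        prev += alpha * (x - prev);
        out.push(prev);
    }
    out
}

/// Resamples mono audio from `from_hz` to `to_hz` by linear interpolation
/// between the two nearest input samples.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp at the end so the final samples hold the last value.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

When downsampling, the lowpass would normally run before `resample_linear` to reduce aliasing; when upsampling, running it afterwards smooths the interpolated output. Picking `alpha` for a given cutoff is left out here.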

Just my $0.02 in case it helps.

Kind regards,

Philip Bennefall