2019-01-30 19:42:52

Hey folks,

In my continuing efforts to build out a Rust-based audio games infrastructure, I'm trying to add SAPI support to my TTS crate. Rust has a winapi crate, so this is definitely possible, but the winapi docs rightfully send you to MSDN for individual module documentation, since the Rust API just wraps the C/C++ APIs.

Unfortunately, SAPI seems royally undocumented, to put it mildly. I found some SAPI code samples, but they all assume deep knowledge of Windows' own APIs, which I don't have. For example, they all build an entire UI to speak entered text, when all I want is a basic "hello, world."

Does anyone have, or can anyone link me to, a simple SAPI example that just speaks some text through the default audio output? I'm sure I can figure out the finer points from that, but a basic example would tell me where to look in the Rust winapi crate docs. Then I can wrap the complexity behind my simpler cross-platform TTS interface. C/C++ would be great as it's closer to what Rust gives me, but I can probably take and learn something from code in any language provided it actually uses SAPI and not some other higher-level wrapper.

Thanks.

2019-01-30 21:19:22 (edited by Ethin 2019-01-30 21:22:59)

Can't you just use Tolk? Or does the Tolk crate not allow you to do that? That's how I've always done it.
Update: you're right. Searching for TTS through SAPI, I've found examples in .NET and on other platforms, and examples using other services, but never in plain C. Tolk does it, though, so below I've pasted the relevant code from tolk/src/ScreenReaderDriverSAPI.h and tolk/src/ScreenReaderDriverSAPI.cpp:
tolk/src/ScreenReaderDriverSAPI.h:

#include <sapi.h>
#include "ScreenReaderDriver.h"

class ScreenReaderDriverSAPI : public ScreenReaderDriver {
public:
  ScreenReaderDriverSAPI();
  ~ScreenReaderDriverSAPI();

public:
  bool Speak(const wchar_t *str, bool interrupt);
  bool Braille(const wchar_t *) { return false; }
  bool IsSpeaking();
  bool Silence();
  bool IsActive() { return (!!controller); }
  bool Output(const wchar_t *str, bool interrupt) { return Speak(str, interrupt); }

private:
  void Initialize();
  void Finalize();

private:
  ISpVoice *controller;
};

tolk/src/ScreenReaderDriverSAPI.cpp:

ScreenReaderDriverSAPI::ScreenReaderDriverSAPI() :
  ScreenReaderDriver(L"SAPI", true, false),
  controller(NULL)
{
  Initialize();
}

ScreenReaderDriverSAPI::~ScreenReaderDriverSAPI() {
  Finalize();
}

bool ScreenReaderDriverSAPI::Speak(const wchar_t *str, bool interrupt) {
  if (!controller) return false;
  DWORD flags = SPF_ASYNC | SPF_IS_NOT_XML;
  if (interrupt) flags |= SPF_PURGEBEFORESPEAK;
  return SUCCEEDED(controller->Speak(str, flags, NULL));
}

bool ScreenReaderDriverSAPI::IsSpeaking() {
  if (!controller) return false;
  SPVOICESTATUS status;
  // The second parameter to GetStatus() can be NULL,
  // suppress warning when compiling with /analyze.
#pragma warning(suppress:6387)
  if (FAILED(controller->GetStatus(&status, NULL))) return false;
  return (status.dwRunningState == SPRS_IS_SPEAKING);
}

bool ScreenReaderDriverSAPI::Silence() {
  if (!controller) return false;
  const DWORD flags = SPF_ASYNC | SPF_IS_NOT_XML | SPF_PURGEBEFORESPEAK;
  return SUCCEEDED(controller->Speak(NULL, flags, NULL));
}

void ScreenReaderDriverSAPI::Initialize() {
  if (controller || FAILED(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_INPROC_SERVER, IID_ISpVoice, (void **)&controller))) {
    // This is here for symmetry with other drivers
    // and so compiling /analyze won't throw a warning.
    return;
  }
}

void ScreenReaderDriverSAPI::Finalize() {
  if (controller) {
    controller->Release();
    controller = NULL;
  }
}

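That's more than you need, though. I think it boils down to:

#include <windows.h>
#include <sapi.h>

int main() {
  // COM must be initialized on this thread before the voice can be created.
  if (FAILED(CoInitialize(NULL))) return 1;
  ISpVoice *controller = NULL;
  if (SUCCEEDED(CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_INPROC_SERVER, IID_ISpVoice, (void **)&controller))) {
    // Speak() takes a wide string; with SPF_ASYNC it returns immediately.
    controller->Speak(L"This is a test", SPF_ASYNC | SPF_IS_NOT_XML | SPF_PURGEBEFORESPEAK, NULL);
    controller->WaitUntilDone(INFINITE); // block until the speech has finished
    controller->Release();
  }
  CoUninitialize();
  return 0;
}

I haven't tested this, though; I've never invoked SAPI manually. If you're going to be invoking SAPI a lot, remove the controller->WaitUntilDone(INFINITE) line. It blocks the thread until speech is done; see https://docs.microsoft.com/en-us/previo … 3Dmsdn.10) for details. (In a longer-running program you would also keep controller around instead of releasing it right after speaking.)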
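To make the blocking versus non-blocking choice explicit, here is a rough sketch of how the flags could be built in one place, following what Tolk's Speak() does with its interrupt parameter. MakeSpeakFlags is a made-up helper name, not part of Tolk or SAPI, and the non-Windows branch only exists so the snippet compiles off Windows. Its stand-in values are meant to mirror the SPEAKFLAGS enum in sapi.h.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#include <sapi.h>
#else
// Stand-in values so the sketch compiles off Windows. They mirror the
// SPEAKFLAGS enum in sapi.h and are no substitute for the real header.
typedef std::uint32_t DWORD;
enum { SPF_DEFAULT = 0, SPF_ASYNC = 1, SPF_PURGEBEFORESPEAK = 2, SPF_IS_NOT_XML = 16 };
#endif

// Hypothetical helper (not part of Tolk or SAPI): builds the flags argument
// for ISpVoice::Speak().
//   interrupt - purge anything queued or currently speaking first
//   blocking  - leave out SPF_ASYNC so Speak() itself waits for the speech
inline DWORD MakeSpeakFlags(bool interrupt, bool blocking) {
  DWORD flags = SPF_IS_NOT_XML;                 // treat the text as plain text
  if (!blocking) flags |= SPF_ASYNC;            // return immediately
  if (interrupt) flags |= SPF_PURGEBEFORESPEAK; // cut off earlier speech
  return flags;
}
```

With that, a call would look something like controller->Speak(L"Hello", MakeSpeakFlags(true, false), NULL). Passing blocking = true should make a separate WaitUntilDone call unnecessary, since Speak() then doesn't return until the text has been spoken.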

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2019-01-30 21:26:46

Ah, thanks, never thought to see how Tolk did it. My intent is to support Tolk in addition to SAPI, but this is a generic TTS crate that should be able to work without Tolk.

2019-01-30 22:36:37

Aha. I understand now.
