Philip recommended getting the raw audio samples when SAPI speaks and trimming the silence from the start, to help it speak faster.
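To make the idea concrete, here is a minimal sketch of the trimming step on its own, with no SAPI involved. It assumes the audio is raw 16-bit little-endian mono PCM (no WAV header). The function name `trim_leading_silence` and its `threshold` parameter are made up for illustration and are not part of SAPI or any library.

```python
import struct


def trim_leading_silence(pcm: bytes, threshold: int = 0) -> bytes:
    """Drop leading 16-bit little-endian mono samples whose magnitude is <= threshold.

    Hypothetical helper: assumes `pcm` is raw sample data without a WAV header.
    """
    usable = len(pcm) - (len(pcm) % 2)  # ignore a dangling odd byte, if any
    for offset in range(0, usable, 2):
        (sample,) = struct.unpack_from("<h", pcm, offset)
        if abs(sample) > threshold:
            # First audible sample found; keep everything from here on.
            return pcm[offset:]
    return b""  # the buffer held nothing but silence
```

With `threshold=0` only exact zero samples are dropped. A small positive value would also drop very quiet noise at the start, if the voice produces any.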
However, I encountered a problem doing that.
What I have tried is this:
# A script that captures SAPI output and trims leading silence before playback.
from io import BytesIO
import wave
import winsound
from win32com.client.gencache import EnsureDispatch

voice = EnsureDispatch("SAPI.SPVoice")
stream = EnsureDispatch("SAPI.SPMemoryStream")
stream.Format.Type = 34  # SAFT44kHz16BitMono = 34
voice.AudioOutputStream = stream
while True:
    text = input("Enter text to speak: ")
    voice.Speak(text)
    # Raw 16-bit mono PCM; this still contains every earlier utterance.
    samples = stream.GetData().tobytes()
    # Trim leading zero samples from the raw PCM, not from the WAV bytes,
    # because the WAV bytes begin with the RIFF header and never start with 0.
    while samples[:2] == b"\x00\x00":
        samples = samples[2:]
    buffer = BytesIO()
    with wave.open(buffer, "wb") as wavefile:
        wavefile.setnchannels(1)
        wavefile.setsampwidth(2)
        wavefile.setframerate(44100)
        wavefile.writeframes(samples)
    winsound.PlaySound(buffer.getvalue(), winsound.SND_MEMORY)
There is just one obvious problem: I can't find a way to empty the stream! Every time you press Enter, the previous text is repeated as well!
Does anyone know how I can empty it?
If I call stream.SetData(0), it helps, but if I speak a long string and then a short one, parts of the long string can still be heard after the short string has finished.
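One workaround I can imagine is to leave the stream alone and play only the bytes that were added since the last utterance. Below is a minimal sketch of that bookkeeping. It assumes that GetData() always returns the whole accumulated buffer with the newest audio appended at the end, which I have not confirmed. The class `NewAudioTracker` is a hypothetical helper, not part of SAPI.

```python
class NewAudioTracker:
    """Remembers how much of an ever-growing buffer has already been handed out.

    Hypothetical helper; assumes new audio is always appended at the end.
    """

    def __init__(self) -> None:
        self.consumed = 0  # number of bytes already returned to the caller

    def take_new(self, accumulated: bytes) -> bytes:
        if len(accumulated) < self.consumed:
            # The buffer got shorter, so it was presumably reset; start over.
            self.consumed = 0
        fresh = accumulated[self.consumed:]
        self.consumed = len(accumulated)
        return fresh
```

In the loop this would be used as `samples = tracker.take_new(stream.GetData().tobytes())` before trimming and playback. Another option that might work is to create a fresh SAPI.SPMemoryStream for each utterance and assign it to voice.AudioOutputStream, but I have not tested that either.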
Thanks for any help!