It is a somewhat interesting coincidence that I am currently without a voice because of a cold, just as OpenAI has released some really good text-to-speech voices with their Create speech API. So in preparation for a meeting today, I wrote a little script that speaks aloud whatever I type.
Since the voice reads exactly what’s there, I added a spell fixer that (through ChatGPT) automatically fixes typos before the text is sent to the audio API.
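The script itself isn't reproduced here, but the two-step pipeline (optional spell fix, then speech) could look roughly like the sketch below. The function names, the system prompt, and the model names `gpt-3.5-turbo` and `tts-1` are my assumptions for illustration, not necessarily what talk.php uses; the two endpoints are OpenAI's chat completions and Create speech APIs.

```php
<?php
// Hypothetical helpers: a sketch of the pipeline, not the actual talk.php.

const OPENAI_BASE = 'https://api.openai.com/v1';

/** Build a chat completions request that asks the model to fix typos only. */
function spellFixPayload(string $text, string $model = 'gpt-3.5-turbo'): array
{
    return [
        'model' => $model, // assumed model name
        'temperature' => 0,
        'messages' => [
            // Assumed prompt wording; tune as needed.
            ['role' => 'system', 'content' => 'Fix spelling and punctuation. Return only the corrected text.'],
            ['role' => 'user', 'content' => $text],
        ],
    ];
}

/** Build a Create speech request body; speed is clamped to the API's 0.25 to 4.0 range. */
function speechPayload(string $text, string $voice = 'echo', float $speed = 1.0): array
{
    return [
        'model' => 'tts-1', // assumed model name
        'input' => $text,
        'voice' => $voice,
        'speed' => max(0.25, min(4.0, $speed)),
    ];
}

/** POST a JSON payload and return the raw response body. */
function postJson(string $path, array $payload, string $apiKey): string
{
    $ch = curl_init(OPENAI_BASE . $path);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        CURLOPT_POSTFIELDS => json_encode($payload),
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

/** Optionally fix spelling, then return the audio bytes (MP3 by default). */
function speak(string $typed, bool $fix, string $voice, float $speed, string $apiKey): string
{
    if ($fix) {
        $reply = json_decode(postJson('/chat/completions', spellFixPayload($typed), $apiKey), true);
        // Fall back to the raw input if the reply has an unexpected shape.
        $typed = $reply['choices'][0]['message']['content'] ?? $typed;
    }
    return postJson('/audio/speech', speechPayload($typed, $voice, $speed), $apiKey);
}
```

The returned bytes would still need to be written to a file or piped to an audio player. A session with the actual script looks like this: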
$ php talk.php
Voice: echo
Fix spelling: off
Speed: 1.0
> hi everyone and welcoem to tis meetin
> sc
Fix spelling: on
> hi everyone and welcoem to tis meetin
Hi everyone and welcome to this meeting.
> s2
Speed: 2
> my voice is gone because of a pretty string cold that iv pickd up
My voice is gone because of a pretty strong cold that I've picked up.
> s1.1
Speed: 1.1
> my voice is gone because of a pretty string cold that iv pickd up
My voice is gone because of a pretty strong cold that I've picked up.
> turns out, even suing the streaming audio aip, typing and then waiting for the srsult is too lsow for a conversation. but it's been interesting
Turns out, even using the streaming audio API, typing and then waiting for the result is too slow for a conversation. But it's been interesting.
> sc
Fix spelling: off
> without spell fixer it's faster but for good intonation it only makes sense to send full sentences, not single words as soon as they have been typed. maybe that can also be solved, but that's for the next experiment
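The prompt commands in the session above (`sc` to toggle the spell fixer, `s` followed by a number to set the speed) can be handled by a small dispatcher. This is a minimal sketch of how I'd structure it; `handleInput` and the state array are hypothetical names, and the command syntax is inferred only from the transcript.

```php
<?php
/**
 * Interpret one line typed at the prompt.
 * Returns [new state, status message or null, text to speak or null].
 */
function handleInput(string $line, array $state): array
{
    $line = trim($line);

    // "sc" toggles the spell fixer.
    if ($line === 'sc') {
        $state['fix'] = !$state['fix'];
        return [$state, 'Fix spelling: ' . ($state['fix'] ? 'on' : 'off'), null];
    }

    // "s" followed by a number sets the speed, e.g. "s2" or "s1.1".
    if (preg_match('/^s(\d+(?:\.\d+)?)$/', $line, $m)) {
        $state['speed'] = (float) $m[1];
        return [$state, 'Speed: ' . $m[1], null];
    }

    // Anything else is text to be spoken.
    return [$state, null, $line];
}
```

A read loop would call this for each line, print the status message when there is one, and pass any returned text on to the speech step. One consequence of this design is that a line consisting only of `sc` or `s2` can never be spoken, which seems an acceptable trade-off for a throwaway tool.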
In any case, it’s been fun. Thanks, Simon, for highlighting the API.