That is, no known system is flawless. Even OCR manages at best about one error per page of text, and speech-to-text seems to produce one error per line at times (I'm leaning toward the worst case with that estimate).

https://www.netguru.co/blog/voice-recognition-tools-review kicks around the current front runners, but if I were on the team my nod would go to Google's cloud API, plus code and MONEY to do the work. In your case, I think Nuance is your winner.

You won't be happy, though, since none of these is 100% accurate no matter which you use.
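
If you want to put a number on "not 100% accurate" before committing to one of these, a common yardstick is word error rate: the word-level edit distance between a transcript you trust and what the tool produced, divided by the number of words in the trusted one. Here is a minimal sketch in Python. The function name and the lowercase-and-split-on-whitespace normalization are my own assumptions for illustration, not something any of these vendors prescribe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple normalization, lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    trusted = "pay by the word or page"
    machine = "pay by the world or page"
    print(word_error_rate(trusted, machine))
```

One wrong word out of six comes out at roughly 0.17 here. Run the same short recording through each candidate and compare the scores on your own audio rather than relying on anyone's headline numbers.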

Then again, the OLD SCHOOL METHOD was a transcription service where you pay by the word or page.