For the voice-to-text end, a smart developer would use Android's built-in speech recognition (Apple has a similar API). Tutorials are out there if your developer knows to search for them (some won't, and they are poorer for it). Read https://www.androidhive.info/2014/07/android-speech-to-text-tutorial/
So now the voice has been turned into text that your back end can parse. Your code then decides what to reply with and which animation to show on screen.
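To make that reply step concrete, here is a minimal sketch in plain Java. It assumes the recognized speech has already arrived as a string, and it uses simple keyword matching, which is only a stand-in for whatever parsing your back end really does. The names AvatarReply, AvatarBrain, and respond, along with the keywords and animation names, are made up for illustration and are not part of any Android API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical value type: what the avatar says and which clip it plays.
final class AvatarReply {
    final String speech;     // text the avatar should speak
    final String animation;  // name of the animation clip to play

    AvatarReply(String speech, String animation) {
        this.speech = speech;
        this.animation = animation;
    }
}

// Hypothetical rule-based "brain": maps a transcript to a reply.
final class AvatarBrain {
    // Keyword -> reply table, checked in insertion order.
    private static final Map<String, AvatarReply> RULES = new LinkedHashMap<>();
    static {
        RULES.put("hello", new AvatarReply("Hi there, how can I help?", "wave"));
        RULES.put("thank", new AvatarReply("You're welcome.", "nod"));
        RULES.put("bye", new AvatarReply("Goodbye!", "wave"));
    }

    // Used when no keyword matches or the transcript is missing.
    private static final AvatarReply FALLBACK =
            new AvatarReply("Sorry, I did not understand that.", "shrug");

    // Returns the reply for the first keyword found in the transcript.
    static AvatarReply respond(String transcript) {
        if (transcript == null) {
            return FALLBACK;
        }
        String text = transcript.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, AvatarReply> rule : RULES.entrySet()) {
            if (text.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return FALLBACK;
    }
}
```

Calling AvatarBrain.respond("Hello doctor") returns the greeting reply with the "wave" animation, and anything unrecognized falls through to the "shrug" fallback. A real "smart" avatar would swap the keyword table for proper language understanding, but the shape (text in, speech plus animation out) stays the same.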
So yes, it is all possible, but for a new programmer? Figure the usual few years if they are not already coding and writing apps.
I was wondering how difficult it would be, and how long it would take, for a tech person to create an app with a 3D avatar that is "smart"? By that I mean an avatar that can talk and respond to specific questions and situations.
For example, suppose there was an app with a doctor avatar, and after I describe my symptoms to him, he tells me what illness I have. This is purely an example and is not what I'm planning on doing. Thanks a lot!