Concept: Solve the problem of an indoor AR audio tour guide app that can't rely on GPS by integrating a Voice UI that lets users control the experience's progression.
Integrating Google Dialogflow into Unity proved problematic, so I turned instead to IBM Watson and quickly had better results with their SDK. The possible user responses have to be hardcoded, but five or six options per prompt seem to initially cover what I expect users to say in response to the narrator's prerecorded prompts.
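A minimal sketch of what that hardcoded matching might look like, assuming the Watson Speech-to-Text transcript arrives as a plain string (all names and phrase lists here are hypothetical; the actual Watson SDK wiring is omitted):

```csharp
using System;
using System.Linq;

// Hypothetical matcher: maps a Speech-to-Text transcript onto one of a
// handful of expected replies for the current narrator prompt.
public static class ResponseMatcher
{
    public enum Intent { Affirmative, Negative, Unknown }

    // The five or six hardcoded options I expect users to say at this prompt.
    static readonly string[] AffirmativePhrases = { "yes", "yeah", "sure", "okay", "let's go" };
    static readonly string[] NegativePhrases = { "no", "nope", "not yet", "wait" };

    public static Intent Match(string transcript)
    {
        string t = transcript.ToLowerInvariant().Trim();
        if (AffirmativePhrases.Any(p => t.Contains(p))) return Intent.Affirmative;
        if (NegativePhrases.Any(p => t.Contains(p))) return Intent.Negative;
        return Intent.Unknown;
    }
}
```

Simple substring matching like this is brittle, but with a narrow prompt ("Would you like to continue?") and a short list of expected phrases it covers most replies without needing Dialogflow-style intent training.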
Here's the conversation flow chart.
And here's the video showing the implementation within Unity and the Voice recognition at work.
Unfortunately you can't hear my user responses, since I'm recording Unity's output with Soundflower, but you can see in the Console the Speech-to-Text transcription of what I'm saying into my headphone's microphone. You may also notice the inferior sound quality at the end of the narration, where I respond to the user (I used the laptop mic for those short mockup responses). There's also a bug: at the end of the segment both the affirmative and the negative reply fire, so I'll need to add a guard so that only one can fire. Next up is getting these responses to trigger the next scene, or more likely writing a manager script that keeps all the experience's chapters in one Unity scene and just swaps out the JPEGs, WAVs, etc.
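Both fixes could live in one place. Here's a sketch of that idea under my current plan (class and member names are hypothetical, and the Unity scene/audio wiring is left out): a one-shot gate so only the first recognized reply fires per segment, plus a chapter index that stands in for swapping per-chapter assets within a single scene.

```csharp
using System;

// Hypothetical manager: guards against the double-fire bug and tracks which
// chapter's assets (wavs, jpegs, etc.) should currently be loaded.
public class ExperienceManager
{
    // File names standing in for the per-chapter assets to swap in.
    readonly string[] chapterAudio = { "intro.wav", "gallery1.wav", "gallery2.wav" };
    int chapter = 0;
    bool replyHandled = false;  // gate: set once the first reply of a segment lands

    public string CurrentAudio => chapterAudio[chapter];

    // Called by the speech handler. Only the first reply per segment is honored;
    // the second (duplicate) fire returns false and does nothing.
    public bool OnUserReply(bool affirmative)
    {
        if (replyHandled) return false;
        replyHandled = true;
        if (affirmative && chapter < chapterAudio.Length - 1) chapter++;
        return true;
    }

    // Called when the next narration segment begins, re-arming the gate.
    public void BeginSegment() => replyHandled = false;
}
```

The gate means I don't have to untangle why Watson's handler fires twice right away; even if both replies come in, only the first one advances the experience.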
Wrap-up: Voice Interfaces was a great class, and I got a lot out of the level of design thinking and discussion in class as well as from the lectures. It's a fascinating area to learn about, and as digital assistant voice synthesis gets more and more believable, it will be an increasingly important area to research. I'm happy to have been introduced to the history and landscape of Voice UI.