Week 2: Progress / Expert conversations / Research / ...and Lyrebird Karaoke :)

Who is the target audience for this work:

Teens and adults; NYC tourists or locals; people interested in AR, sound design, audio walks, Voice UI, and interactive new media in general.

I have decided for the moment to focus my storyboarding on Grand Central Station.  My initial hesitation that this choice may be too cliché or iconic has been allayed for the following reasons.

  • It's a big enough public space that people should feel free to explore and enjoy the prompts of my audio walk in an anonymously fun and harmlessly voyeuristic way.
  • It offers 100 years of historical anecdotes to pull from for the psychogeographical component.
  • It offers many physical layers and areas to explore the mythogeographical component.
  • Its bustling commuter crowds supply a robust noise level that functions as the input source for exploring the DSP musical constraint/transposition component.  And even if I don't include the DSP, the space lets me offer field recordings of quieter times that powerfully contrast with the real-time noise of the environment.

I've had some really informative conversations with professors regarding my project.

chat with Yotam Mann -

Thinks that JavaScript is a viable way to go for the project.  The location-aware element and the audio asset triggering seem quite feasible.

This RTC library apparently does very good pitch recognition.
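
To think through the triggering logic Yotam and I discussed, here is a minimal JavaScript sketch of location-aware asset triggering. Everything in it is my own assumption for illustration: the zone names, coordinates, and radii are placeholders (not surveyed points), and in a browser the listener position would come from `navigator.geolocation.watchPosition`.

```javascript
// Great-circle distance in meters between two lat/lon points (haversine).
function distanceMeters(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const R = 6371000; // mean Earth radius in meters
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Names of the trigger zones whose radius contains the listener.
function activeZones(lat, lon, zones) {
  return zones
    .filter((z) => distanceMeters(lat, lon, z.lat, z.lon) <= z.radius)
    .map((z) => z.name);
}

// Hypothetical zones near Grand Central (coordinates are rough placeholders).
const zones = [
  { name: "mainConcourse", lat: 40.7527, lon: -73.9772, radius: 50 },
  { name: "whisperGallery", lat: 40.7523, lon: -73.9771, radius: 15 },
];
```

Each time the watched position updates, any zone newly appearing in `activeZones` would start its audio asset (and a zone that disappears could fade its asset out).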

chat with Pedro Galvao Cesar de Oliveira -

Advice: be very careful that location-aware audio guides don't fall flat on execution and story.

I can probably work on projects in his Expressive Interfaces: Voice class that will feed into my thesis in interesting ways, integrating voice UI control over the experience while maintaining a mostly hands-free, non-visual play experience.

Thinks I could potentially build almost all of this in the Google Assistant hosted platform.

chat with Clay Shirky -

Had a great brainstorm with Clay. I've tagged some of the ideas here with keyword themes.


What Canal St was like 100 years ago, 10 years ago, and now... in acoustic storytelling.

#InteractiveJournalism #InteractivePodcast
What does this NYTimes story about this place sound like in 50 deep-fake versions? Which one do you hear when you encounter the piece? Play with the nature of trust.

#ImmersiveTourism #InteractiveLocalism
What is Union Square like at 4 am vs. 4 pm, etc.? You could time/audio-scrobble between different times of day while sitting in the location.

#ImmersiveAI #ArtificeOfTrust
What is it like to have 50 variations of AI-produced audio narration, so that the experience is very different for each user who sits and listens to the audio guide?

A Dario Fo piece tried to reinvent the idea of the scenario, in the sense that the actor embodies the intent of the director or playwright.

Make a series of MP3 recordings that people could play and experience without worrying about the location-aware AR layer. Ask which of these they liked most, what their experiences were while listening, etc., and quickly gather feedback and testing data about which directions to pursue, which kind of story to tell, and so on.

Use Lyrebird to give the narrator of a piece special or ironic significance.
E.g., Trump reads the Declaration of Independence if you stand outside Trump Tower in Midtown. Or, maybe more cutting: Trump reads the screenplays/scripts of Looney Tunes episodes. Or perhaps he reads the accounts of his abuse as written by his accusers (Stormy Daniels, etc.), or Russian files from the Mueller investigation...

Chat with Todd Bryant -

Todd connected me over email with his friend Noah Feehan, whose 2010 MIT Media Lab thesis explored a very similar area of personalized musical walks.

Noah's thesis has surfaced some great projects from the last 15 years that I was unaware of, and we have a coffee date to meet up.

Chat with Harish from Imagination Agency (my summer internship) -

Harish really likes the idea of location-aware audio guides.  He thinks there needs to be an element of seduction or sexiness implicit in the location you pick, and that the overall packaging of the story and experience is really important, echoing what Pedro said in this respect.  That makes sense given that both of them have backgrounds in creative agency and design work.  Important to keep in mind.

chat with Dr. Roginska, Director of the NYU Music Technology Program and my 3D Audio class professor -

Advice on the technical side of my project was encouraging - "Anything is possible".

She encouraged me to pursue a Unity project.  I think it makes sense to pursue this dev platform: I already have experience with it, and working in this environment will be applicable to VR and AR down the road.

She also encouraged ambisonic recordings of the space.  I think we agree that some static binaural base elements of the hall may be good and fine, but the ability to ambisonically rotate and experience the space will be the ultimate goal.

Dr Roginska was helpful in advising my scoping of the project with a multiphase paradigm.

  1. First make a geospatially-aware audio experience that involves no processing.  All reverb is pre-processed.  See how the story feels and get the basic triggering of assets working.
  2. Once that is functional (which should be straightforward), incorporate realtime spatial audio processing.  Assume people can hold their phone out in front of them as they explore, thus giving an accurate gyroscopic reading of their orientation.  This will really make the piece convincingly realistic, immersive, and perhaps even magical.
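
As a rough sketch of the orientation step in phase 2, here is how the phone's compass heading could steer a sound source around the listener. This is my own assumption about the math, not a prescription from Dr. Roginska: in a browser the heading would come from the device orientation events, and it uses a flat-earth approximation that is fine over the tens of meters inside a hall.

```javascript
// Bearing from the listener to a source, in degrees clockwise from north.
// Flat-earth approximation: longitude scaled by cos(latitude).
function bearingDeg(listLat, listLon, srcLat, srcLon) {
  const dNorth = srcLat - listLat;
  const dEast = (srcLon - listLon) * Math.cos((listLat * Math.PI) / 180);
  return ((Math.atan2(dEast, dNorth) * 180) / Math.PI + 360) % 360;
}

// Angle of the source relative to where the phone is pointing, normalized
// to (-180, 180]: 0 = straight ahead, positive = to the listener's right.
function relativeAngleDeg(headingDeg, bearing) {
  let rel = (bearing - headingDeg) % 360;
  if (rel > 180) rel -= 360;
  if (rel <= -180) rel += 360;
  return rel;
}
```

Each frame, the relative angle would update a Web Audio `PannerNode` position (or a Unity AudioSource transform), so a source that is 90 degrees to the right stays to the right as the listener turns.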

Meeting with Ziv Schneider next week to discuss my approach to story and project.  Last semester, in Ziv's class Alt Docs: Inventing New Formats for Documentary Storytelling, I explored telling a story primarily through immersive sound in a group project with Cristobal Valenzuela and Jenny Lim called Bodega.  Ziv is enthusiastic about my preliminary description of the thesis idea and I look forward to her expert input.

Research progress:

Heading to Grand Central today to scout location, interaction, and narrative ideas.  I'll also take the Orpheo audio tour (I don't expect to love it or find much overlap, but I need to know what's out there).

Noah Feehan's MIT thesis paper is providing some nice examples of similar projects to consider.

Immersive Sound by Dr. Roginska is supplying some great knowledge on psychoacoustics, the mechanics of our perception of sound, and how to place sound sources within the spatial soundfield to shape listener behavior.

Leonardo Music Journal (MIT Press) - the new issue has some nice descriptions of current audio walk and sound walk work, providing a good sense of the current landscape.


Played with the Lyrebird AI voice synthesizer.  I'm interested in exploring Text-to-Speech and Speech-to-Text with a Voice UI component to this project.  This could become an interesting investigation into how to make voice synthesis more expressive, and may tie into my interest in incorporating narrative themes of AI, fake news, co-authorship of experience, and algorithmically guided modern life.

After some preliminary training tests on the Lyrebird AI voice synthesizer, I set up a nice mic and trained it properly with 50 performed sentences (20 more than their minimum).  It was an interesting experiment in trying to make a voice synthesizer both expressive and musically useful, using what appears to be the best available voice synthesizer that you can train yourself.  While the track feels ironically comedic and certainly a bit sci-fi, I do hear some amount of effective, emotive longing in the lead vocal when put to this song.

I Would Die 4 U (Lyrebird Karaoke)


I looped a D piano note in my headphones while reading the training sentences into the microphone, monotonously performing all the words in unison with the note, without natural English pitch inflections.  Ironically, I had to read as robotically as possible so that the output synthesis would be musically useful.  In the future I'd like to perform more Lyrebird trainings to build out an arsenal of different notes with this workaround technique, so that I can reimagine new songs with more dynamic lead vocal melodies.

I think the classic Prince song takes on an interesting recontextualization here.  In the original, Prince offers to make the ultimate sacrifice for his lover, invoking religious and cryptic poetics throughout.  Here, those lines blend with the uncanny valley of AI generated vocals and my synthesized TTS lead vocal offers perhaps a more nuanced sacrifice - relinquished immortality.

What first came up in Lyrebird conversations with friends Patrick Presto and Alejandro Matamala was the ability to "cryogenically freeze" a loved one's voice before they pass away.  We imagined capturing an ill grandparent's voice via Lyrebird so that their vocal essence would remain (theoretically immortal), perhaps to read future grandkids bedtime stories or recount family stories at holiday gatherings.  The TTS implications of this are potentially wonderful, a nice thought to offset all the negative possibilities of identity forgery and bad actors in this space.

The training set was performed using a Rode K2 through a Universal Audio Apollo interface, and edited in Ableton for arrangement of stanzas and rhythmic expressiveness.  I downloaded a Prince karaoke track.  Otherwise, all audio is essentially straight from Lyrebird's voice synthesizer.