Here's the video from my final Thesis presentation. A lot of work went into this and I appreciate all the conversations and advice I received throughout the process, from ITP teachers, fellow students, and from outside experts.
Had a wonderful time doing some user testing and documenting at Grand Central with Jacob Krupnick as camera man and co-director. Huge thanks to Jacob!!
We were joined by Marco Guarino and Lindsay Daniels, who did an excellent job appearing in the video and being comfortable with our 2-man crew filming them as they moved within the crowded station trying out the audio tour.
I think the edit gives a very nice Teaser-level of detail about the experience and what people can expect to feel and hear when they put on the headphones. Can't wait to polish this up and put it out into the world!
Here's a brief timelapse video documenting a portion of the user tests for both my intro vignette and my sonified train schedule vignette. In the latter, users hear the trains arriving from different directions and look around to hear a sped-up timelapse of Grand Central's morning rush hour schedule set to music.
The tests were really illuminating. Primarily, it was clear that sound design elements work better at slow tempos. For the 3D head tracking feature to be recognizable to users when there is no visual display to indicate where a sound source is located in the virtual geography, it must ring out with a long enough sustain that someone can look towards it and locate it.
How can I leverage the immersive psychoacoustic properties of 3D audio to build a rich and interactive storytelling experience via a location-specific, psychogeographic audio guide of NYC's iconic Grand Central Station?
When you put someone at the center of a story, they take more of an interest. By exploring spatialized audio cues and immersive audio design as tools to modify behavior within physical space, my Grand Central story will lead users through a delightful and interactive examination of the urban monument that reveals personal stories, historic moments, and psychogeographic critique.
Even audio guides that feature immersive sound are often limited by their static nature. And while sound is the most inherently immersive of our sensory experiences, immersive tech overwhelmingly focuses on the visual display. While the fidelity of a VR headset experience is high, it lacks freedom of movement and is detached from physical reality.
Sound is popularly seen as a companion to the prioritized visual display; often it is an afterthought. Yet spatial audio cues powerfully alter our behavior inside VR headsets, helping us wayfind and discover - so why not ditch the HMD and spend longer in a more pleasant mixed reality?
This piece will interest fans of spatial sound design, audio focused podcasts like Radiolab, NYC tourists or history buffs, and those more generally interested in exploring the perimeter of what augmented reality experiences can be.
My mixed-up reality experience will be a headphones-based experience developed in Unity for iPhone, and will allow users to spend much longer inside it than a visually oriented experience would. It will also be more distributable than an immersive theater piece could be, and will explore the UX tools of mixed reality audio display in a location-aware manner that will be new to most users.
The key features will be a base layer of environmental binaural recordings captured in Grand Central station, with overlaid monaural audio assets placed virtually into the sound field that users can move towards and away from. Real-time audio processing will realistically localize sonic moments and allow users to modify the depth and angle of sounds by walking around.
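Unity's spatializer handles this localization internally, but the underlying geometry is simple. Here is a minimal conceptual sketch in plain Python (the coordinate frame and inverse-distance rolloff are my own assumptions for illustration, not the project's actual implementation):

```python
import math

def localize(listener_pos, listener_yaw_deg, source_pos, ref_dist=1.0):
    """Return (relative_azimuth_deg, gain) for a mono source.

    listener_pos / source_pos are (x, z) floor-plane coordinates in meters;
    listener_yaw_deg is the direction the listener faces (0 = the +z axis).
    This only illustrates the geometry a spatializer resolves per frame.
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dz)
    # Bearing of the source in world space, then relative to where we face,
    # wrapped into the range (-180, 180].
    bearing = math.degrees(math.atan2(dx, dz))
    rel_azimuth = (bearing - listener_yaw_deg + 180) % 360 - 180
    # Simple inverse-distance rolloff, clamped so gain never exceeds 1.
    gain = min(1.0, ref_dist / max(dist, ref_dist))
    return rel_azimuth, gain

# A source 3 m directly to the listener's right:
az, g = localize((0, 0), 0.0, (3, 0))  # az = 90.0, g ~ 0.33
```

Walking toward a source shrinks `dist` (raising `gain`), and turning the head changes `rel_azimuth`, which is exactly the depth-and-angle interaction described above.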
Historical reenactment will be one poetic component of the experience - as, for example, when Mary Lee Read, the organist at Grand Central from 1928 to 1958, played "The Star-Spangled Banner" the day after the Pearl Harbor attacks (instead of her usual Christmas carols) and brought the hall full of commuters to a somber, dramatic standstill.
I expect the experience to have a 10-15 minute arc, and it will be designed for one user. Poetic prompts from the narrator will encourage playful voyeurism and offer historical/architectural exploration, perhaps culminating in an interaction with an unwitting stranger. I also hope to include a Voice UI component that allows the experience to remain entirely in the sonic realm, detached from the visual display after initiating the experience.
There is a magic trick that already exists: listening to prerecorded binaural sound in headphones within the space in which the audio was captured. It is at once naturalistic and surreal, delightfully obscuring the distinction between past and present and challenging our brains to encode/decode the psychoacoustic aural cues that we hear with what our eyes perceive. In this moment of blended reality there is a storytelling opportunity that combines a podcast-style narrative with a guided historical walking tour.
Much of the research into soundscapes, acoustic ecology, and psychogeography has been conducted already in Marina Zurkow's Temporary Expert class. Magic Windows with Rui Pereira and Project Dev Studio with Danny Rozin familiarized me with Unity and the different SDKs and map APIs that can serve my project. I'm also currently in Dr. Roginska's 3D Audio class at the Music Technology program, which will be a major influence on my thesis work.
Let me cite some things that I love which drew me to this area of focus. Principally, Janet Cardiff's Central Park audio walk "Her Long Black Hair" and Alter Bahnhof video walk, the binaural audio one-man show on Broadway "The Encounter," dérives and the work of the psychogeographers (e.g., Guy Debord's writing, Patrick Keiller's films), and immersive podcasts such as Radiolab.
For the past 10 years I've dedicated my work to one thing - creating amazing experiences with music. As my performances have evolved to include increasingly complex technical choreography, I've become more interested in adding visual layers to the sounds of the musicians. Mostly this has focused on editing video content to project behind the performers, adding metalayers of meaning to the work.
At ITP my projects have often positioned the user as coauthor within a malleable soundscape, and this thesis project aspires to leap from those mediums of screen and visual display into the realm of the invisible and naturalistic.
Some experts I've consulted with already:
Marina Zurkow, ITP - Temporary Expert teacher from last semester and sound walk author.
Luke DuBois, IDM - currently doing an Independent Study with him; he will assist on the musical and location-specific considerations.
Dr. Agnieszka Roginska, NYU Music Technology - currently taking her 3D Audio course; she will consult directly on considerations related to this critical element of my project.
Charlie Mydlarz and Mark Cartwright, post-doc researchers at NYU CUSP's SONYC machine learning noise pollution project - in conversation with these gentlemen.
Rui Pereira, ITP - took his Magic Windows and Mixed-Up Realities last semester, and am in conversation with him about the best ways to implement location-aware Unity applications in my project.
TK Broderick, ITP - took his Immersive Listening course and continue to converse with him about spatial audio implementation and Unity.
Jean-Luc Cohen, NYU Music Technology - took his Software Synthesis last semester, and will continue to meet with him regarding digital signal processing and procedural audio within Unity.
First off, what I made: an ambisonic sketch of street noise outside Grand Central (where the experience will begin), playing with an audio skybox using the Google Resonance spatial audio SDK in Unity. Not blown away by the head tracking; will investigate further how to optimize. And I need more realistic reflections off the interior spaces.
Then did more scouting during my Museum Hack-led Grand Central Scavenger Hunt this weekend. Side note – didn’t get a huge amount of useful info from this experience, but it was nice to explore the space more and find some new nooks and crannies.
Below are notes on the conversations I had this week. They helped me solidify my scope and vision.
I’ve since re-embraced an original idea of this project, which is to complete a trio of experiments into immersive aural storytelling that explore the relationships between sound and physical space.
The Unity-build audio guide of GCT is one, and still the main one.
Another will be an extension and reinterpretation of a Max project I made using different struck notes from a xylophone to control projection-mapped video, but it will now involve voice synthesis audio (a la my week 1 work) and video recordings of faces to create a circularly structured sound poem about sound, space, and memory. Theoretically it could be viewed as a speculative installation piece for Vanderbilt Hall in Grand Central station.
The third will be an extension of my Ambient Met mobile app, but now pursued in ARKit. Also a Unity build designed for the main concourse of Grand Central, it will be a spatial musical AR interaction involving visuals: a 3D, AR version of Ambient Met or my Sound Objects project, with new sounds that appear to exist with acoustic realism inside the main hall at GCT.
Chat w/ Dan Sheehan of SiegelVision agency
historical moments he likes
the franklin roosevelt train tunnel underneath waldorf astoria / warhol threw parties there
paparazzi moments when marilyn monroe and others would disembark at gct
edison bulbs outside on awnings were a show off move
fisher king scene – magical moment, check out
grand central partnership – they might be interested in working with me
Chat w/ Mike Dory, Design Expo class teacher and Googler
skateboarding in the city
storefront for art and architecture
sound lab work from @Media Lab
check out cooper hewitt accessibility exhibit — which I immediately did. It was pretty good. The sound design exhibit was not that helpful, but the accessibility exhibit was very nice. Ultimately, I think it is too much to include in the scope of my project.
kate hartman – check her ITP thesis
– exploring emotions
allison parrish thesis presentation.
if i were leading this i’d scope 4 months, 4-8ppl
scope down — Oh, Ok.
maybe do small part of it, and then explore other options — Yes, I will. Back to original inspiration.
have fun, it’ll be ok.
Chat w/ Ziv Schneider, Alt Docs professor
consider the no-geo spatial location dumb version
look into gps integration with unity
voice ui – cool! never seen before
further research this week:
— Something like this could be great for the app, where an on-click reveals what it sounds like below your feet: in the train tunnels, etc.
Weeks 4-9: Design and Production
4: Finish Grand Central Historical Research + Create demo Binaural + Monaural Asset – test out Option A (no geospatial awareness, but has realtime processing)
5: Unity Implementation – Option A
6: Implement and refine storyboarded flow of scenes in Unity
7: Assess and, if feasible, pursue Option B (geospatial awareness and/or a Voice UI component)
8: Mini user test / Final Asset creation / Unity Dev
9: Refine visual components and refine/optimize sound spatialization processing
Weeks 10-12: Testing and Refining
10: User Test 1 / Cast and Schedule 2-3 Actors
11: Big Refine
12: User Test 2 / Subtle Refine
Weeks 13-15: Documenting and Presentation Practice
13: Video Shoot in Grand Central, B-Roll Asset creation
14: Editing video / Create Slide Deck / Practice Presentation
15: Practice Presentation
Notes from walking around the space with Director and friend Jacob Krupnick:
a possible entrance location for the piece.
paris opera house styled balcony staircases. conjure the sounds of an opera here?
try placing musical stems around the hall. a choir, organ, strings, horns... walk towards them and live mix with your location.
include a musical scene something like Janet Cardiff's 40-Part Motet, where you can walk into the midst of a group of sound sources in harmony.
the people walking up high in the glass windows are mta employees, jacob says. not open to the public unfortunately, must be a great view. but perhaps they're omnipresent enough to include a reference to them in the narration.
dank air in the vanderbilt 42nd street overpass. feels different here.
grand chandelier lights. could aurally glow with resonant hum when you walk underneath.
orphaned balloons on the ceiling. kids lost their prizes? how common are these balloons? can i include and hope they are there for future users of my experience?
you can freely walk out onto the train platforms. catch glimpses of people just before their departures. imagine you're the one about to leave..
previously people played on a squash court and visitors could watch, says jacob.
a guy down on his luck sings "Under the Boardwalk" on the train (could be in a hallway)
the iconic clock as a meeting place for visitors and commuters - some story element must happen here.
I was very inspired by the Cornell Box assignment, and the chance to imbue metaphor and theme onto a theatrical diorama. A 3-dimensional mood board, apropos of my thesis area of focus. I included a photo and a drawing of Grand Central's main hall and a cutaway elevation view to suggest my ideas of voyeurism and psychogeographic storytelling. Including some images of Jetsons-style robot drawings adds the themes of an AI tour guide and the digital assistant conversational UI/voice recognition which I plan to include in the experience. And of course, suspended in the center of it all - a 3D printed ear. Thanks to Or Fleisher for the ear!
Storyboard for the Grand Central Station concept.
Who is the target audience for this work:
Teens and adults. NYC tourists or locals. People interested in AR, sound design, audio walks and Voice UI, and interactive new media in general.
I have decided for the moment to focus my storyboarding on Grand Central Station. My initial hesitations that this choice may be too cliche or iconic have been allayed for the following reasons:
- It's a big enough public space that people should feel free to explore and enjoy the prompts of my audio walk in an anonymously fun and harmlessly voyeuristic way.
- It offers 100 years of historical anecdotes to pull from for the psychogeographical component.
- It offers many physical layers and areas to explore the mythogeographical component.
- Its bustling commuter crowds supply a robust noise level that functions as the input source for the DSP musical constraint/transposition component. And even if I don't include the DSP, the space allows me to offer field recordings of quieter times that powerfully contrast with the real-time noise of the environment.
I've had some really informative conversations with professors regarding my project.
chat with Yotam Mann -
This RTC library apparently does very good pitch recognition.
chat with Pedro Galvao Cesar de Oliveira -
Advice - be very careful with location aware audio guides that they don’t fall flat on execution and story.
Can probably work on projects in his Expressive Interfaces: Voice class that will work for my thesis in interesting ways to integrate voice UI control over the experience and maintain a mostly hands free/visual play experience.
Thinks I could potentially build almost all of this in the Google Assistant hosted platform.
chat with Clay Shirky -
had a great brainstorm with Clay..tagged some things here with keyword themes..
What canal st was like 100 years ago, 10 years ago, now… in acoustic storytelling
what does this nytimes story about this place sound like in 50 deep fake versions. which one do you hear when you encounter the piece ? play with nature of trust
what is union square like at 4 am, vs 4 pm, etc… you could time/audio scrobble between different times while sitting in the location
what is it like to have 50 variations of AI produced audio narration so that the experiences are very different for different users when they sit and listen to the audio guide.
Dario Fo piece – tried to reinvent the idea of the scenario, in the sense that the actor embodies the intent of the director or playwright
make a series of mp3 recordings that people could play and experience without worrying about the AR location aware stuff. Say which of these did you like most, what were your experiences while doing, etc… and quickly get feedback testing data about which directions to pursue, which kind of story to tell, etc..
use Lyrebird to make the narrator of a piece have special or ironic significance
e.g. Trump reads the declaration of independence if you stand outside Trump Tower in midtown. or maybe more cutting. Trump reads the screenplay/script of Looney Tunes episodes. Or perhaps reads the accounts of his abuse as written by his accusers. Stormy Daniels, etc. Or Russian files from the Mueller investigation…
Chat with Todd Bryant -
who connected me over email with his friend Noah Feehan whose MIT Media Lab thesis from 2010 explored a very similar area of personalized musical walks.
Noah's thesis surfaced some great projects from the last 15 years that I was unaware of, and we have a coffee date to meet up.
Chat with Harish from Imagination Agency (my summer internship) -
really likes the idea of location aware audio guides. Thinks that there needs to be an element of seduction or sexiness implicit to the location that you pick, and that the overall packaging of the story and experience is really important. Echoing what Pedro said in this respect. Makes sense in that both those folks have backgrounds in creative agency and design work. Important to keep in mind.
chat with Dr. Roginska, Director of the NYU Music Technology Program and my 3D Audio class professor,
Advice on the technical side of my project was encouraging - "Anything is possible".
She encouraged me to pursue a Unity project. I think it makes sense to pursue this dev platform since I have experience with it, and working in this environment will be applicable to VR and AR down the road.
She also encouraged ambisonic recordings of the space. I think we agree that some static binaural base elements of the hall may be good and fine, but the ability to ambisonically rotate and experience the space will be the ultimate goal.
Dr Roginska was helpful in advising my scoping of the project with a multiphase paradigm.
- First make a geospatially-aware audio experience that involves no processing. All reverb is pre-processed. See how the story feels and get the basic triggering of assets working.
- Once that is functional (which should be straightforward), incorporate realtime spatial audio processing. Assume people can hold their phone out in front of them as they explore, thus giving an accurate gyroscopic reading of their orientation. This will really make the piece convincingly realistic, immersive, and perhaps even magical.
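The first phase, geospatial triggering of pre-processed assets, essentially reduces to a proximity check against each asset's coordinates. A minimal sketch of that check (the zone coordinates and clip names below are hypothetical placeholders; a real Unity build would read position from the device's location services):

```python
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical trigger zones: (lat, lon, radius in meters, clip name).
ZONES = [
    (40.752726, -73.977229, 15.0, "main_concourse_intro"),
    (40.752357, -73.977060, 8.0, "info_booth_clock_story"),
]

def zones_hit(lat, lon):
    """Names of all audio assets whose trigger radius contains the user."""
    return [name for zlat, zlon, r, name in ZONES
            if haversine_m(lat, lon, zlat, zlon) <= r]
```

Each frame (or GPS update), the app would compare `zones_hit` against the previous result and start playback for any newly entered zone, which is all the "basic triggering of assets" phase requires before layering on realtime spatialization.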
Meeting with Ziv Schneider next week to discuss my approach to story and project. Last semester, in Ziv's class Alt Docs: Inventing New Formats for Documentary Storytelling, I explored telling a story primarily through immersive sound in a group project with Cristobal Valenzuela and Jenny Lim called Bodega. Ziv is enthusiastic about my preliminary description of the thesis idea and I look forward to her expert input.
Heading to Grand Central today to scout location, interaction and narrative ideas. Will also do the Orpheo Audio Tour Guide (don't expect to love this or find much overlap, but need to know what's out there).
Noah Feehan's MIT thesis paper provides some nice examples of similar projects to consider.
Immersive Sound by Dr Roginska is supplying some great knowledge on psychoacoustics, the mechanics of our perception of sound, and how to place sound sources within the spatial soundfield to affect behavior modification.
Leonardo Journal of Music (MIT Press) - new issue has some nice audio walk and sound walk descriptions of current work that provides good sense of current landscape.
Played with the Lyrebird AI voice synthesizer. I'm interested to play with Text To Speech and Speech To Text with a Voice UI component to this project. This could become an interesting investigation into how to make Voice Synthesis more expressive, and may feature more heavily into my interest to incorporate narrative themes of AI, fake news, coauthorship of experience, and algorithmically guided modern life.
After some preliminary training tests on the Lyrebird AI voice synthesizer, I set up a nice mic and trained it properly with 50 sentences performed (20 more than their minimum). It was an interesting experiment in trying to make a voice synthesizer both expressive and musically useful, using what appears to be the best available voice synthesizer that you can train yourself. While the track feels ironically comedic and certainly a bit sci-fi, I do hear some amount of effective, emotive longing in the lead vocal when put to this song.
I looped a D piano note in my headphones while reading the training sentences into the microphone, monotonously performing all the words in unison with it, without natural English-language pitch inflections. Ironically, I had to read as robotically as possible so that the output synthesis would be musically useful. In the future I'd like to perform more Lyrebird trainings to build out an arsenal of different notes with this workaround technique so that I can reimagine new songs with more dynamic lead vocal melodies.
I think the classic Prince song takes on an interesting recontextualization here. In the original, Prince offers to make the ultimate sacrifice for his lover, invoking religious and cryptic poetics throughout. Here, those lines blend with the uncanny valley of AI generated vocals and my synthesized TTS lead vocal offers perhaps a more nuanced sacrifice - relinquished immortality.
What first came up in Lyrebird conversations with friends Patrick Presto and Alejandro Matamala was the ability to "cryogenically freeze" a loved one's voice before they pass away. We imagined capturing an ill grandparent's voice via Lyrebird so that their vocal essence would remain (theoretically immortal), perhaps to read future grandkids' bedtime stories or recount family stories at holiday gatherings. The TTS implications of this are potentially wonderful, a nice thought to offset all the negative possibilities of identity forgery and bad actors in this space.
Training set was performed using a Rode K2 through a Universal Audio Apollo interface, edited in Ableton for arrangement of stanzas and rhythmic expressiveness. Downloaded a Prince karaoke track. Otherwise all audio is essentially straight from Lyrebird's voice synthesizer.
From R. Murray Schafer's The Soundscape: Our Sonic Environment and the Tuning of the World.
#Soundscape #AcousticEcology #Indeterminism
“Cross cultural evidence from around the world must be carefully assembled and interpreted. New methods of educating the public to the importance of environmental sound must be devised. The final question will be: is the soundscape of the world an indeterminate composition over which we have no control, or are we its composers and performers, responsible for giving it form and beauty?”
Are we the composers? Is the world an indeterminate composition? Can I use acoustic ecology recordings and immersive sound design to beautify parts of the city's soundscape?
Apollonian vs Dionysian approach to music.
Dionysian - subjective, passionate, irrational, uses expressive devices (tempo, timbre, etc), romanticism.
Apollonian - exact, serene, mathematical, harmony of the universe, physics, transcendental visions of Utopia and Harmony of the Spheres.
PSYCHOGEOGRAPHY vs. MYTHOGEOGRAPHY
from the website for Phil Smith (Crabman)'s Mythogeography
Psychogeography arises as one of a set of ideas and practices developed by the International Lettristes (who later gave birth to the situationists), a study of how places affect the psychological states of those who pass through them. With a reciprocal meaning: that the places might be changed in order to change the experiences and mental states of their residents and visitors. This was part of a theory of radical activism for the transformation of cities.
In the UK the concept of psychogeography was detached from activist meaning and reconfigured as a literary practice in the work of writers like Iain Sinclair and also gathered some occult trappings during this time from Sinclair, Peter Ackroyd and others.
Mythogeography describes a way of thinking about and visiting places where multiple meanings have been squeezed into a single and restricted meaning (for example, heritage, tourist or leisure sites tend to be presented as just that, when they may also have been homes, jam factories, battlegrounds, lovers' lanes, farms, cemeteries and madhouses). Mythogeography emphasises the multiple nature of places and suggests multiple ways of celebrating, expressing and weaving those places and their multiple meanings.
Mythogeography is influenced by, and draws on, psychogeography – seeking to reconnect with some of its original political edge as well as with its more recent additions. While engaging seriously with academic discourses in areas like geography, tourism studies and spatial theory, mythogeography also draws upon what Charles Fort might have described as ‘the procession of damned data’. So, occulted and anomalous narratives are among those available to mythogeography, not as ends in themselves, but as means and metaphors to explain, engage and disrupt.
"...strolling in the cracks in the pavement and a means to walking out on the Spectacle."
"...addresses the means, uses and consequences of 'walking sideways', of deploying the ordinary act of walking as a lever to prise the lid off everyday life."
From a super important writer at the New Yorker or NYTimes...
Scott Reitherman’s new project _____ plunges participants into an aural experience of New York City in ways that charge the stories with a spatially dynamic liveliness. Building on traditions of psychogeographic dérives, sound walks, and historical neighborhood tours, Reitherman’s project transforms the pedestrian experience into anything but that.
We are accompanied through an aimless stroll of the city with Reitherman as our ghostly narrator, who - through the use of binaural audio recordings - appears at times to walk beside or in front of us, telling us stories of the characters behind the local businesses as we pass them or reading pieces from the New York Times of location specific relevance.
We are prompted to consider classic themes of urban life like voyeurism and loneliness and alter how we perceive our own exteriority within public space via a variety of playful interventions (e.g., “For the next 30 seconds, pretend you are a spy”).
Furthermore, the noise of the street is passed through our headphone mic and spun out in real time to our ears as music, constrained to different keys or moods depending on which zone of the city you’re in. On Houston and Broadway I felt the musical backdrop change to a chaotic, anxious minor chord, yet when I wandered into Central Park I felt the generative score melt into a placid and calming backdrop. Best of all, your experience is guaranteed to differ: particularly depending on time of day, location, and random sonic chance (e.g., car horns, ambulance sirens, people yelling at each other).
These provocative, high-fidelity aural interventions are impactful, and we are left never quite knowing what is real and what is prerecorded sound. Our guide's footsteps subtly pace the experience, and conversations pass by your head that may or may not belong to the humans on the street with you. I found my attention drawn toward other observations as well when I removed the headphones. For example, I'd never been made to consider just how loud and noise polluted this city is until I took this walk with Reitherman.
Along the way we move through cinematic scenes and stories - almost like an interactive podcast - and as we are encouraged to wander freely and without destination our personal acoustic adventure unfolds. The adventure lasts as long as you like, and if you want you can mute the narrator and just enjoy your personalized soundtrack. If you want to feel more co-authorship, there's a map that indicates where to find new narrative easter eggs, subtly gamifying the walk in a way that suggests an acoustically focused spin on part of what made Pokemon GO such a success.
Yet crucially, I found myself mostly unconcerned with seeking out moments on the map; they always found me. The intimacy of this mixed reality experience is impactful: before long you are walking, breathing and wondering as one with Reitherman. It’s a simple and beautifully invasive trick he has cast that will delight fans of podcasts, ambient music, and augmented reality, in ways that are refreshingly focused not on the visual but the powerful immersive storytelling qualities of the aural.
Immersive Acoustic Storytelling -
I will make a series of investigations into immersive acoustic storytelling to try and understand how sound design can tweak our perceptions of the world that surrounds our ears and function as a powerful cognitive primer. Sound is in many ways already the most immersive of our sensory experiences, yet we place so much focus on the visual when we try to dazzle, transport, and tell stories with technology. This will be a continuation of the projects I’ve been making at ITP, and will experiment with spatialized sound design for mixed reality, location-aware audio guides, and manipulation of the soundscapes of New York City’s interior and exterior spaces.
For instance, how can I translate noise pollution into beautiful musical characters? How can I use immersive technologies to let someone improvise musically? And can that improvisation be captured and thus become a compositional tool (e.g., a Unity C# script that exports collision events to MIDI data for use in a DAW)? Can I alter the quality of the sound of the inner monologue inside my head? How can I manipulate moments to be calm and delightful, or overlay contextual information that provokes someone to take a deeper look at the space they inhabit, whether that’s a park, a museum installation, or a neighborhood?
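The collision-events-to-MIDI idea is easy to prototype. Here's a small sketch, written in Python for illustration rather than as the Unity C# script imagined above, that maps timestamped collision events (time, pitch, impact strength) to MIDI-style note data; the event format and note length are my own assumptions:

```python
def events_to_midi_notes(events, bpm=120.0, ppq=480, note_len_beats=0.25):
    """Convert timestamped collision events into
    (tick, note, velocity, duration_ticks) tuples, ready to be written out
    as MIDI note messages (e.g. with a library like mido).

    `events` is a list of (time_seconds, pitch, strength) tuples, where
    strength is 0.0-1.0 (say, derived from collision impact magnitude).
    """
    ticks_per_second = ppq * bpm / 60.0
    notes = []
    for t, pitch, strength in events:
        tick = round(t * ticks_per_second)
        # MIDI velocity is 1-127; scale and clamp the impact strength.
        velocity = max(1, min(127, round(strength * 127)))
        notes.append((tick, pitch, velocity, round(note_len_beats * ppq)))
    return notes

# Two collisions: a medium C4 at t=0 and a hard E4 half a second later.
notes = events_to_midi_notes([(0.0, 60, 0.5), (0.5, 64, 1.0)])
```

In the actual experience, each Unity `OnCollisionEnter` would append one event tuple, and the list could be flushed to a .mid file at the end of a session so the improvisation survives as editable material in a DAW.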
The result of this cognitive priming that I want to better understand is the moment when you take the tech off and experience the world with new ears.