AUDIO ARCHITECTURE MASTERCLASS - Part 1: Digital Music Ecosystem & Stereo-Space-Sound with 0khz

Added 2023-08-03 18:03:09 +0000 UTC

In this Lux Cache article/tutorial series, we delve into the intricate world of ‘audio architecture’, exploring the complex interplay between digital audio creation and music technology. By examining the multifaceted systems and frameworks that shape the digital music landscape, this series illuminates the myriad ways in which music is composed, performed, and perceived in the digital age. In the first chapter of this series, producer and music technologist 0khz guides us through the labyrinth of the Digital Music Ecosystem, shedding light on the blurred boundaries between composers, performers, and audiences, and the myriad ways in which music reaches its listeners. Venturing beyond mere composition, we also consider the impact of distribution, reproduction, and audience engagement on the life cycle of a piece of digital content. This comprehensive exploration of the digital audio landscape offers invaluable insights for anyone seeking to navigate the rapidly changing world of computer music.

This tutorial is available as a Patreon text post and in the preferred .pdf document format. We kindly ask you not to share Lux Cache content outside of the Patreon, as our contributors rely on your donations.

INTRODUCTION
THE DIGITAL MUSIC ECOSYSTEM
STEREO-AUDIO-SPACE
1. INDUCTIVE APPROACH TO COMPOSITION
2. DEDUCTIVE APPROACH TO COMPOSITION
CONCLUSION

INTRODUCTION

Creating audio content on a computer can be challenging. The reception of your content is influenced by a myriad of factors, from psychoacoustics and audio transduction to algorithmic recommendation systems and more. The ambition of releasing a sonic piece into this vast system, and having it perceived, reconstructed, and even appreciated by someone at the other end is indeed a lofty one. In the ensuing article, I endeavor to identify and elucidate several pivotal points to help shape your unique approach to digital audio creation. I've termed this a field guide, as I aim to highlight key aspects of digital audio and their interconnections. This guide is by no means comprehensive, and while I begin with established principles, I progress towards explaining what are, in essence, my personal interpretations of the environment we all operate in. In the initial section, "The Digital Music Ecosystem," I strive to shed light on as many stages in your content's lifecycle as possible and offer my broad understanding of their interrelationships. In the subsequent section, "Stereo-Audio-Space," I discuss prevalent methods to compose and interpret sound as they pertain to working with digital audio, and then delineate my individual approach.

THE DIGITAL MUSIC ECOSYSTEM

Here is what it used to be like to play an instrument:

Working in audio, you may have come across this chain:

There exists a range of similar diagrams tailored to different use cases. The details of each point in the signal chain can be expanded far beyond what is covered in this article.

The subsequent chain takes into account human perception and involvement in the composition and performance of music (where the audio transduction chain refers back to the previous diagram):

With the advent of digital audio workstations, MIDI instruments, sampling, and online music distribution, the chain linking music composition to its audience has grown significantly more intricate. The lines separating composers, performers, and audience members have become thoroughly indistinct.

Producing music may look something like this:

However, the pathways through which music reaches the audience have become much more diverse. The feedback loop between composer and performer has evolved well beyond the simple reproduction of a composition in a specific space. Your work might reach its audience through streaming, sampling, a DJ mix, as background music in a TikTok video, or in numerous other ways. Concurrently, the capabilities of the computer, and the forms of input to the computer, have significantly advanced (MPE, digital instruments, hardware, image synthesis, resampling, algorithmic/generative synthesis, data sonification, AI, and more). Fundamentally, the study of music theory is driven by the need to navigate within whatever process transports music from the composition stage to the audience. How can we begin to do so when the tools at our disposal have become so intricate? The Digital Music Ecosystem is my earnest attempt to elucidate the key players in digital audio distribution as they relate to you, the producer.

Since the dawn of recorded audio, the relationship between composer, performer, and audience has been subject to scrutiny. With the advent of digital audio and the internet, this question has grown even more intricate. When a record is played back, it's straightforward enough to identify who wrote and performed the song. However, when a song is assembled from recorded works, it becomes more challenging. When works composed of sampled audio have been around for decades, and I can play them back at 2x speed on YouTube, who then is the performer? If I screen-record the 2x speed edit and post it on Twitter 30 seconds later, who was the performer then? What if I stripped all identifying information and someone mistook it for an original song? What if I spend an entire day watching Evian Christ live videos from 2016 that are clipping wildly with terrible sound quality? What about the YouTube algorithm that persistently recommends them to me? What if I downloaded as many as I could in anticipation of copyright strikes and shared them via a Google Drive link? Who had the most recent influence on the work's performance? At how many stages was the work in stasis? Who comprised the audience?

The following diagram is an attempt to identify several core points in the lifetime of a piece of digital media:

The digital music ecosystem I've delineated here functions between three primary stages in the lifespan of a piece of digital content. My argument is that even a rendered audio file is, in a way, live, in that each stage of a song's creation and distribution occurs in relation to an audience. Essentially, the composer’s role has been repositioned to a stage prior to the music being synthesized, performed, or recorded in any form.

The process commences with the "Content Creator," where samples, live audio signals, and control data may be assembled, performed, or synthesized (as input to the computer interface). The cloud labelled "Compositional Framework," pointing to "Content Creator," will be discussed further in the subsequent section. For now, what's crucial to recognize is the dialectical relationship between the composer and the digital music ecosystem. You have a variable degree of control over how music is distributed, which will inevitably influence how you compose music. I regard the content creation process as part of the performance in digital audio for a variety of reasons. The assembly of MIDI gestures, mouse movements, and keyboard input encapsulates some form of physical performance. Moreover, previously live performances may be recorded and incorporated. Even after a file is rendered, project files may be revisited numerous times. Lastly, a portion of the audience’s experience of digital audio is tied to perceiving how a song was created. When the methods of creating music are so diverse, it becomes necessary to develop some mental model of how a piece was constructed, whether accurate or not.

In the "Computer Interface" stage, output from the content creator is digitized and manipulated. This could refer to the process of crafting a song in a DAW, but it also extends to less obvious setups like using an iPad to control a Max device. Additional layers of content creation, such as uploading a live recording to YouTube or playing out a song on a live (DJ) setup, are also taken into account.

In the "Output To Audience" phase, audio is disseminated to listeners in some manner. I have grouped them into two categories: Distribution and Performance. Any form of output to the audience holds the potential to ignite further content creation by audience members, thereby restarting the cycle (as detailed below).

The "Distribution" category is often the most readily considered method of output. Typically, audio files are uploaded to streaming platforms and promoted via social media. Physical formats such as CDs and tapes may be produced and sold. I've included licensing in this category, although it's less frequently considered. For instance, digital music may be licensed for use in commercials, as background music, as hold music on the phone, or in video games. In the distribution category, the content creator has the least awareness of how and when their content is reproduced.

The "Performance" category outlines ways your audio content may reach audiences where you have some involvement in its reproduction. This could include playing a song live in a DJ set, performing a song live, or including it in a larger mix distributed on a stream or radio set. These are scenarios where the content creator has slightly more control over the context in which the piece is reproduced, and who it will reach.

Both categories have their own advantages and necessitate consideration within the compositional framework. Copyrighted samples and remixes might be easier to incorporate into a live set, while substantial arrangement considerations may be needed to create a song that can hold its own alongside other artists' tracks in a live setting. A piece of custom software requires significantly more stress testing when it needs to consistently function for live performances.

Branching off from forms of output to the audience, we have "Further Content Creation". At this juncture, your music or performances may be incorporated into another content creator’s own digital media work, thus initiating the cycle anew (where they become the content creator). My aspiration is that the system I've outlined can accommodate the myriad ways in which a piece of content can be repurposed, rehashed, transformed, distorted, and shared.

Finally, the ways in which audiences engage with your work, repurpose it, or respond to it will inevitably impact your compositional framework and influence the life cycle of a piece of content. Our return to the content creation stage may occur as variations in production are created, a song is re-released, or something else entirely. For instance, some artists may tour an album while adjusting it, potentially for years after the announced release date, before it appears on streaming platforms. During this process, countless live recordings, or even live streams of performances may surface, such that dozens of variations of a song are available to audiences before it enters the "Distribution" phase at the behest of the artist. There are far too many possible ways for a piece of content to exist to be listed here. Unfortunately, YOU have to create digital audio for this system.

The interstices labelled on the circle refer to forms of output, positioned where they are most likely to occur. Transduction here is optional in some instances. For example, the content creation process may be played back as audio through the computer numerous times before being output to any audience. We are already familiar with programs that archive the internet or scrape the web for training data, where a piece of content may be stored or algorithmically reprocessed many times before it is next used by a content creator. Similarly, the streaming recommendation algorithms mentioned earlier are an example of a feedback process between the audience and some computer interface. Playlisting is an example where a content creator may be present but contributes no musical input. Audio may also be processed many times without your involvement before reaching its audience, for example, by MP3 compression or loudness normalization.
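To make the idea of processing "without your involvement" a little more tangible, here is a minimal Python sketch of loudness normalization. It is a simplification under stated assumptions: real services typically measure loudness with perceptual weighting, whereas this sketch uses plain RMS level, and the -14 dB target and the function names `rms_dbfs` and `normalize_to` are hypothetical choices made only for illustration.

```python
import math

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0). Fails on pure silence."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(mean_square)

def normalize_to(samples, target_dbfs):
    """Scale samples by one constant gain so their RMS level hits target_dbfs."""
    gain_db = target_dbfs - rms_dbfs(samples)
    gain = 10 ** (gain_db / 20)  # convert dB to a linear multiplier
    return [x * gain for x in samples], gain_db

# A loud test tone standing in for a heavily limited master (assumed data)
loud_master = [0.9 * math.sin(2 * math.pi * t / 100) for t in range(1000)]
normalized, applied_db = normalize_to(loud_master, target_dbfs=-14.0)
print(f"gain applied: {applied_db:.2f} dB")
```

The point of the sketch is that a louder master simply receives a larger negative gain, so the level you chose in the DAW is not necessarily the level your audience hears.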

Developing awareness of your content’s lifespan, and responding to the system in which digital media exists, is an invaluable skill. Experimenting with how a song is distributed, how you perform music, and how you compose for the digital music ecosystem can only enhance your work’s lifespan. In the following section, I outline more "in the box" forces that shape digital audio creation, and elaborate on compositional considerations for the digital music ecosystem.

STEREO-AUDIO-SPACE

Having established a framework for the uncertainties of digital audio as an artistic practice, let's consider what we do know. The stereo audio format consists of two series of bins that store numbers, where each bin is called a sample. One series is for the left channel, and one is for the right. Each bin (sample) stores the amplitude of the signal at a given time for a given channel, so the number of samples is proportional to the duration. Digital audio comes with two additional specifications: the sample rate and bit depth. The sample rate is the resolution in time, and the bit depth is the resolution in amplitude. More specifically, the sample rate denotes how many samples will be played back per second, and the bit depth describes how many bits are dedicated to storing each of our samples in binary. A visualization is shown on the left of the following diagram.
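To make these two specifications concrete, here is a minimal Python sketch. It assumes uncompressed PCM audio with CD-style defaults (44,100 Hz, 16-bit), which are illustrative choices and not values specified in this guide; the function name `describe_stereo_audio` is hypothetical.

```python
def describe_stereo_audio(duration_s, sample_rate=44_100, bit_depth=16):
    """Return basic sizes for an uncompressed stereo (2-channel) PCM recording."""
    channels = 2
    samples_per_channel = round(duration_s * sample_rate)  # bins per channel
    amplitude_levels = 2 ** bit_depth                       # distinct values one bin can hold
    total_bytes = samples_per_channel * channels * bit_depth // 8
    return {
        "samples_per_channel": samples_per_channel,
        "amplitude_levels": amplitude_levels,
        "bytes": total_bytes,
    }

info = describe_stereo_audio(duration_s=3.0)
print(info)
```

Doubling the duration or the sample rate doubles the number of bins, while raising the bit depth leaves the number of bins unchanged and only makes each bin finer-grained.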

Working in stereo-audio-space, we start from a set of clearly defined physical constraints and operate within them to create something much more elusive and indescribable. The overall stereo audio experience that emerges from playing back audio may reach us in a number of ways, but it encompasses a range of perceptual events and experiences that cannot be accounted for by the score or the stereo audio format alone. Between our lists of numbers and the overall sensory experience, I have positioned the "compositional framework." Put simply, the compositional framework is how you are thinking about sound when you are creating sound. What tools are you using, how are you assembling individual elements to create aural experiences, and what limitations are you imposing on yourself? What elements of digital audio and psychoacoustics inform your choices? In the following sections, I aim to shed light on some of the core building blocks of digital music composition and some of the most general starting points for developing your own compositional framework. Additionally, I describe a few approaches to stereo-audio-space composition that may help you move from our established variables into the realm of digital audio.

Here is a plot of a snare-drum sample’s amplitude over time, in this case, the left channel:

If you zoomed in, you would be able to see each amplitude bin (sample) in time. You might also be familiar with frequency domain representations of sound, where the amplitude and phase for each frequency over time can be displayed.
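As a rough sketch of how a frequency domain representation is obtained, the Python below runs a naive discrete Fourier transform over one short window of samples and reads off the amplitude and phase of the strongest frequency. The sample rate, window length, and test tone are assumed values picked so the tone lands exactly on one frequency bin; a spectrogram repeats this kind of analysis over many successive windows, usually with a much faster algorithm.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex value per frequency bin."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

sample_rate = 8_000   # assumed sample rate in Hz
n = 64                # analysis window length in samples
freq = 1_000          # test tone in Hz (bin spacing is 8000 / 64 = 125 Hz)
window = [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]

spectrum = dft(window)
half = spectrum[: n // 2]  # for real input, the upper bins mirror the lower ones
peak_bin = max(range(len(half)), key=lambda k: abs(half[k]))
peak_hz = peak_bin * sample_rate / n
amplitude = 2 * abs(half[peak_bin]) / n   # rescale back to the tone's amplitude
phase = cmath.phase(half[peak_bin])       # in radians

print(peak_hz, round(amplitude, 3), round(phase, 3))
```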

The following diagrams show amplitude per frequency over time, where the 3D representation only displays 0 - 1250 Hz:

On one hand, the number of possible representations of sound is limited by physical constraints (number of channels, duration, sample rate, bit depth). On the other hand, in theory, we should have access to any possible representation of sound within stereo-audio-space.
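As a back-of-the-envelope illustration of how large, yet finite, this space is, the Python sketch below counts the bits in a short stereo file and estimates how many decimal digits are needed to write down the number of distinct files of that size. The defaults (44,100 Hz, 16-bit, two channels) and the function name `stereo_space_size` are assumptions for illustration only.

```python
import math

def stereo_space_size(duration_s, sample_rate=44_100, bit_depth=16, channels=2):
    """Return (total_bits, decimal_digits) for the count of distinct audio files."""
    total_bits = round(duration_s * sample_rate) * channels * bit_depth
    # There are 2 ** total_bits possible files; estimate how many decimal
    # digits that count has without building the huge number itself.
    decimal_digits = math.floor(total_bits * math.log10(2)) + 1
    return total_bits, decimal_digits

total_bits, decimal_digits = stereo_space_size(duration_s=1.0)
print(total_bits, decimal_digits)
```

Even one second of audio yields a count with hundreds of thousands of digits, which is why specifying sound sample by sample is impractical.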

I would find it quite challenging to compose music specifying every sample's amplitude, or every frequency's amplitude and phase over time. I would also find it quite difficult to reach all possible representations of sound in stereo-audio-space via traditional music notation.

Here is how composers most typically represent music (a score).

With the range of synthesizers, effects, audio programming environments, algorithms, and recording methods available, there are a multitude of ways to populate our two channels of audio.

The following mixing diagram describes a stereo mix for a recorded song:

Here, we have forsaken the time variable in order to describe the physical placement of sounds in space. In stereo audio, a variety of effects can be used to simulate positioning around the listener. Stereo wideners, reverb, delay, all-pass filters, binaural panning, autopan/filter effects and more can be used to place a sound in space. Notably, stereo placement and width operate on principles related to both how sound is reproduced and psychoacoustics. For instance, binaural panning will be less effective if our audience is not listening with headphones.
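The simplest of these placement tools is left/right panning. The Python sketch below shows one common way to do it, a constant-power pan law, in which the combined power of the two channels stays the same wherever the sound is placed. This is one pan law among several, and the function name `constant_power_pan` and the -1.0 to 1.0 pan range are assumptions for illustration, not a description of any particular DAW.

```python
import math

def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right); pan runs from -1.0 (left) to 1.0 (right)."""
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)

left, right = constant_power_pan(1.0, 0.0)   # centered: equal level in both channels
hard_left = constant_power_pan(1.0, -1.0)    # all signal in the left channel
print(left, right, hard_left)
```

Effects such as wideners, reverb, and binaural panning go further by also changing timing and frequency content between the channels, which this sketch does not attempt.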

Graphic scores for performance also exist. They exhibit significantly more variance than the following example, though it may be the most familiar format.

What can be appreciated about this type of graphic score over the frequency/time domain representation of a signal is that individual elements are separated, and subjective qualities can be accounted for.

What's important here is the ability to score for a system that no longer requires notes. With the breadth of available tools (generative MIDI plugins, MIDI packs, sample packs, synth presets, etc…), transitioning from a shape denoting timbre, frequency range, and time may be as simple as dragging, dropping, and menu diving. With image synthesis or algorithmic tools, graphic scores can even be interpreted by computers in whatever way you choose. Perhaps someday, you'll be able to select portions of frequency and time to fill with AI-generated content.

Producers work with tools and techniques so hyper-specific and personal that transcription of digital audio to some form of notation may be futile. What then becomes important is describing digital audio across whatever physical/perceptual axes it operates on. Being able to analyze, reproduce, and develop upon identifiable effects in stereo-audio-space then becomes the key to understanding computer music. The compositional framework is augmented by one's repertoire of effects, events, and illusions that can be created in stereo-audio-space. What I propose is that the process of composing is a dialogue between envisioned perceptual effects and our best attempts to reproduce them with available technology. As an added consideration, these perceptual effects will inevitably be filtered through whatever means by which they reach our audience (refer back to the digital music ecosystem).

Over time, you have developed some background idea of what it will look like when you go into the DAW, studio, or software environment you use to make music. This is what I have labelled "Compositional Framework" on the digital music ecosystem diagram. What I hope to establish here is the idea that virtually any approach to computer music, any set of sonic building blocks, and any framework or set of techniques is limited in scope. In turn, as far as I know, any sound or type of music can be approached from a wide variety of perspectives (compositional frameworks). Familiarizing yourself with the axes across which music can be conceived in stereo-audio-space is key to developing your own compositional framework.

When identifiable instrumentation and techniques cease to dominate stereo-audio-space, deconstructing digital audio into physical and perceptual phenomena becomes essential. When music can be made through virtually irreproducible meta-instruments, feedback loops, and post-processing, transcription and composition for individual parts with notation and articulation become futile. With cross-processing effects like vocoding, style transfer, and cross-synthesis, even envisioning what someone’s DAW session looked like can be challenging. Thus, learning to create, reproduce, and deconstruct sound objects in digital audio across observable axes becomes the most operative skill. The crux being that producing some ideal effect across stereo-audio-space then becomes dependent on how your piece of digital audio is distributed, reproduced, and processed. The act in the DAW then becomes one of translating precomposed ideals to actual audience perception under existing physical and technological constraints. The feedback loop between your work and your audience, described in the digital music ecosystem, then becomes another informing factor of the compositional framework. In the following section, I describe my own compositional approach built off of the various perspectives from which to view stereo-audio-space outlined previously.

1. INDUCTIVE APPROACH TO COMPOSITION

We begin with what I am calling an inductive approach to composition. Starting with some small element of the song, we work our way outwards to the desired arrangement, filling out stereo-audio-space as we go.

Through a combination of visual and textual description, I have decided to create the world's most annoying lead. I began by positioning it in stereo/frequency space (frequency on the Y axis).

Next, we extrude the lead in time (Z axis), considering the type of movement in frequency/stereo that is desired. Upon consideration, I have decided to fade in a trance pluck, again with some visual representation of subjective qualities.

Finally, rotating so that time is on the X axis, and frequency is on the Y axis, I have laid out where I want my kick drum to go, and how I want it to sound.

This is quite a simple example, though it bears some resemblance to how I wrote a recently released track, "Smited." The takeaway is that some dialogue has occurred between the sonic events I want to create and what I can do in the DAW as I build up my final audio construction. At this point, I may be using placeholder sounds, or may already have presets and samples ready to assemble. I may reform how I want my sounds to fit together across various axes as I sound design the individual parts. The goal is that by the time I am preparing to distribute my audio work, I have both some audio construct(s) (wav file, DAW session, software), and a unified vision of how people will perceive it in stereo-audio-space. Traditionally then, the process of mixing and mastering will ready your work for audiences, and focus on making your vision a reality. What I suggest is that when digital audio experiences can be constructed on such a basic level, all choices leading up to distribution should be molded according to your understanding of how distribution, reproduction, and perception will occur. Every step of the inductive process should be taken in relation to our idea of how the work will eventually exist in the digital music ecosystem, even if that idea changes during composition. The informing factors of your compositional framework for this inductive approach then become the axes across which you hope to achieve some audio experience, and the production choices, techniques, and tools you will use to achieve the end result.

2. DEDUCTIVE APPROACH TO COMPOSITION

Contrasting the inductive approach is the deductive approach to composition. In the deductive approach, we begin with some overall perceptual experience we wish to create and fight our way into the computer to achieve our best approximation of it. In the process, limitations, tools, and the context the work is to exist in (the compositional framework) move towards compromise with our hypothetical composition. Many perceptual experiences (loudness, brightness, harmonicity, stereo-width) exist relative to the whole composition and the playback context.

Can you create a stereo sub that will survive being ripped from YouTube and played in a club? What should your range of dynamics look like for streaming versus performing live? We are working backwards to craft some illusion that may be wildly different in its construction from how the desired audio experience will be perceived. If we want to place the listener inside of our snare drum as though they’re being hit in the head, is it even helpful to record a snare drum? Might our sound best be constructed from a variety of synthetic layers, or intense processing of completely unrelated samples? It's possible we've decided to restrict ourselves to only a few tools/samples. Often some of the most minimal, hard-hitting sounds in pop/EDM are the result of dozens of infinitesimal technical and creative choices that set them apart from the competition. In the deductive approach, we begin with as specific an idea as possible of the listening experience and shape our methods, and ultimately the composition, to achieve whatever approximation is possible within the compositional framework and the means by which our audio reaches its audience.
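One way to start probing the stereo sub question is to check how much level survives when the two channels are folded down to mono. The Python sketch below does this for a 50 Hz test tone. It assumes, purely for illustration, that some stage of the playback chain sums the low end to mono; the function names and the test signals are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def mono_sum_ratio(left, right):
    """RMS of the mono fold-down divided by the average RMS of the stereo pair."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    stereo_rms = math.sqrt((rms(left) ** 2 + rms(right) ** 2) / 2)
    return rms(mono) / stereo_rms

sample_rate = 4_800  # assumed low rate to keep the example small
sub = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(sample_rate)]
inverted = [-x for x in sub]  # extreme "wide" case: right channel polarity-flipped

print(mono_sum_ratio(sub, sub))       # identical channels survive the fold-down
print(mono_sum_ratio(sub, inverted))  # opposite channels cancel completely
```

A ratio near 1 suggests the sub will translate when summed, while a ratio near 0 suggests the width was created by cancellation that disappears in mono. Real material falls somewhere in between.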

Let us begin with the following image as our desired perceptual experience:

Following a similar approach based on frequency, time, and stereo space, I have some idea of color, timbre, spatialization, sonic overlap, the number of elements and to some extent dynamics. The following images show cross sections in time and stereo, along with added textual specifications.

Textual descriptions may arise before I begin working in the computer as a part of the desired end experience or may arise as I attempt to design sounds for the desired end result. Keep in mind that the diagrams are abstract, and may not necessarily be how our piece appears on the spectrogram or stereoscope, but how I aspire for them to be experienced. The axes could of course be relabeled, flipped, or rotated. How would the compositional process need to change if the initial diagram described only stereo space in 3D with a sound’s probability of occurring in time represented by opacity or density in space? Given that we are composing the ideal perceptual experience, what other qualities could we label our axes/colors with? Brightness? Mood? Meter? Texture? Some qualities rely on others. Loudness, for instance, is perceived based on amplitude, time, frequency, and stereo space. This is where it becomes necessary to develop a focused compositional framework. When a given sound can be synthesized in any number of ways and may be heard in any number of contexts, limiting tools and focusing on a specific listening experience becomes essential. Stereo audio experiences may succeed in some spaces and completely fall apart in others. Audiences familiar with a certain cultural context may better appreciate and understand your composition than those without. Surprise, drama, humor, sophistication, reference and aesthetic value may only translate in a set number of listening contexts. Much like in visual arts, brightness, contrast, saturation, scale and a variety of other effects exist only insofar as elements of your audio composition give rise to them. The characteristics of a single sound or section are constructed in relation to the whole composition.

The deductive approach is so personal and context-dependent that it eludes step-by-step explanations. Our goal is to reach some compromise between the desired listening experience, our stereo audio information, and our compositional framework. Each may need to be reworked in order to reach the destination. It may be easier to consider an even more specific point in the digital music ecosystem with far fewer parameters. Let's imagine we are playing a live DJ set and need to transition between two songs. We’ve compounded the act of any songwriting to our baked audio files, and now are operating with a much smaller goal, a limited number of controls, a significant time constraint, and a specific audience. In the short term, we begin with some goal, in this case, to make a dramatic transition between songs, then work to achieve it in the current framework of parameters, practiced transitions, available samples, audience expectations, and more. Writing a song is not the same as performing a transition on CDJs, but the analogy holds. You have your own set of samples, expectations, and practiced routes through the DAW, within which to achieve the desired audio experience. This may mean expanding, molding, or even contracting your framework (downloading new plugins, learning new sections of the DAW, acquiring new hardware, or limiting yourself to only certain plugins). It may also mean acknowledging the limited scope of your framework, and reworking the hypothetical composition in order to achieve the desired effect. The final audio composition may even bear little resemblance to the idea you started with.

Neither the inductive nor deductive approach is exhaustive in its capabilities. What's more, they may exist in tandem with each other as a piece comes together. Some sounds are so complex that I may dedicate a few live sessions to practicing, experimenting, and attempting to achieve the desired effect (deductive). The resulting sounds may then be used to build entirely different compositions from the ground up. Alternatively, in the process of building a song from the ground up (inductive), I may decide that I need to split up all my current content and use it as a starting point for achieving a different sonic idea. I may write a song inductively, then realize I need to redesign every sound and rearrange sections to create a version that works better live, with some new idea for the end audience experience in my mind. I may start from one sound, and a concept for where and how my song will be played, and work inwards from both ends.

CONCLUSION

Thank you for reading my digital audio field guide. I hope I have provided a useful resource for navigation and further exploration of digital audio as a practice. Whether you're a beginner or an expert, being able to orient yourself within the rapidly changing landscape of computer music is a valuable skill. No matter how you choose to write music, furthering your understanding of any point in the Digital Music Ecosystem, or axis of Stereo-Audio-Space will inevitably strengthen your work and hopefully inspire you too.

0khz is a producer, music technologist and visual designer based in Seattle. You can listen to 0khz’s music on their SoundCloud and Bandcamp pages. You can watch their videos on their YouTube channel and explore their portfolio at 0khz.live.

You can follow them on Instagram & Twitter: @0khz0khz0khz

2023 © Whiston Digital / Lux Media | luxcache.com