Course:PHYS341/2022/Project18

From UBC Wiki

Boots and Cats

Beatboxing is the art of using the voice as a percussive instrument, sometimes mimicking other sounds and instruments[1]. The supraglottal cavity is a flexible instrument, able to produce a range of musical sounds varying in pitch (frequency), volume (amplitude), timbre, and dynamics/articulation/envelope (attack, decay, sustain, release) depending on the place and manner of articulation, airstream mechanisms, and more.

This article gives a brief introduction to the anatomy and acoustics of beatboxing, and explores the production and analysis of some of the most common sounds in beatboxing, with variations around each group of sounds. The notation used is slightly different from Standard Beatbox Notation (“developed in 2002 by Mark Splinter and Revd Gavin Tyte”)[2]: the sections below cover unofficial variations that are categorised in different ways, so each group is represented by the most common notation for its simplest variation.

The three main sounds explored below are the Kick (B), Hihat (t/ts), and Snare (k), which mimic their drum counterparts. They are often taught with the phrase “Boots and Cats”, which progressively sounds more like percussive music and less like speech as the vowels are removed, hence the name of the article. Below those are some additional sounds, including clicks, vocalisations, and bass sounds. Much of this information is transferred from phonetics[3].

A short routine {btttkttt...}
A short routine {btktkbtkt...}
A short routine {btkbtktkt...}
A short routine {ttttstt...}
A short routine {bkbbk...}

Anatomy

Creating and manipulating a specific sound in beatboxing depends on many anatomical factors that affect the acoustic profile of the sound. Most beatbox sounds involve “vibrating flexible parts (eg. trills, vocal folds) or sudden equalisation of pressure (eg. stops)”[3], which produce a longitudinal sound wave: an oscillation of air pressure. Below the larynx and epiglottis, in the subglottal cavity, the lungs provide the “power” for most sounds. The supraglottal cavity consists of the larynx (vocal folds), pharynx, oral cavity, and nasal cavity.

Airstream & Voicing

Sounds can be egressive (outward) or ingressive (inward) (eg. outward vs. inward k snare), and the airstream can be initiated at different stages of the vocal tract: pulmonic (from the lungs), glottalic, or velaric. On top of the place of the sound source and the direction of airflow, sounds that involve the larynx also have voicing and voice quality (creaky, modal, breathy). In voiced sounds the vocal folds come together and vibrate; in voiceless sounds air moves freely past the glottis (eg. a { B } kick will have voicing, and a { pf } snare will not). Air flows from regions of high pressure to regions of low pressure, and Bernoulli's principle relates faster airflow to lower pressure[4], which helps pull the vocal folds back together during voicing.

Place of Articulation

Place of articulation is another concept from phonetics. The supraglottal cavity is flexible, with fixed (eg. teeth) and moveable (eg. tongue) parts, so sounds can be articulated at different points. For example, a bilabial sound like a kick is produced at the lips, a hihat is produced with the apex of the tongue meeting anywhere from the top teeth to the alveolar ridge, and a k snare is produced at the velum. Beatboxing needs finer levels of detail to describe how sounds are produced (eg. a kick can be produced with the closure on the inside of the lips, the outside of the lips, or with some overlap).

Manner of Articulation

This deals with the way in which the sound is created at a given point: is it a full occlusion (creating a tube with two closed ends and high pressure), a frication that allows turbulent airflow, an approximant, or something more vocalic with no obstruction? This has an impact on the timbral shape of the spectrum. A stop or ejective like a kick or snare might have a clear fundamental but no specific harmonic resonances; an affricate that combines a stop and a fricative, like a { ts } hihat or { pf } snare, might have a lot of distributed noise in the high frequencies; and a vocalic whistle or throat bass might have clear harmonics. Different tube models can be used to understand the production of certain resonances. For example, a more open sound might be analysed as a single tube, two tubes, or a Helmholtz resonator[3].
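The tube models mentioned above can be sketched numerically. The following is a minimal illustration using the standard idealised textbook formulas, not measurements from this project's recordings. The speed of sound (343 m/s) and all dimensions used with these functions are assumed illustrative values, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly room temperature


def closed_open_tube_resonances(length_m, count=3):
    """Resonances of an ideal tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L), so only odd multiples of the lowest resonance appear.
    """
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, count + 1)]


def closed_closed_tube_resonances(length_m, count=3):
    """Resonances of an ideal tube closed at both ends: f_n = n * c / (2 * L)."""
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, count + 1)]


def helmholtz_resonance(volume_m3, neck_area_m2, neck_length_m):
    """Resonance of an ideal Helmholtz resonator (a cavity with a narrow neck).

    f = (c / (2 * pi)) * sqrt(A / (V * L)), ignoring end corrections.
    """
    return SPEED_OF_SOUND / (2 * math.pi) * math.sqrt(
        neck_area_m2 / (volume_m3 * neck_length_m)
    )
```

For example, `closed_open_tube_resonances(0.17)` models a 17 cm tube (an assumed length) and gives a lowest resonance near 504 Hz. In all three models, a longer tube or a larger cavity lowers the resonant frequencies.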

A kick (upper lip overlap lower lip) spectrum.
A kick (high pressure) spectrum.
A kick (lower pressure) spectrum.

Acoustics

Envelope

The envelope deals with the dynamic change in amplitude of a sound over time: its attack (how fast it rises to its peak), decay (how fast it falls from the peak to the sustained level), sustain (the level held for the main body of the sound), and release (how fast it fades to silence at the end).
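As a rough illustration of these four stages, the sketch below builds a piecewise-linear amplitude envelope. It is a simplified model rather than a description of how the recordings in this article were analysed. The function name, the linear ramps, and all parameter values are assumptions made for the example.

```python
def adsr_envelope(n_samples, sample_rate, attack_s, decay_s, sustain_level, release_s):
    """Return a piecewise-linear ADSR amplitude envelope as a list of floats.

    The envelope rises from 0 to 1 (attack), falls to sustain_level (decay),
    holds that level (sustain), then falls back towards 0 (release).
    """
    a = round(attack_s * sample_rate)
    d = round(decay_s * sample_rate)
    r = round(release_s * sample_rate)
    s = max(n_samples - a - d - r, 0)  # remaining samples are held at the sustain level

    env = []
    env += [i / a for i in range(a)]                             # attack: 0 -> 1
    env += [1 - (1 - sustain_level) * i / d for i in range(d)]   # decay: 1 -> sustain
    env += [sustain_level] * s                                   # sustain
    env += [sustain_level * (1 - i / r) for i in range(r)]       # release: sustain -> 0
    return env[:n_samples]


# A percussive, kick-like shape under these assumptions: fast attack, no sustained portion.
kick_like_envelope = adsr_envelope(
    n_samples=800, sample_rate=8000,
    attack_s=0.005, decay_s=0.095, sustain_level=0.0, release_s=0.0,
)
```

Multiplying a waveform sample by sample with such an envelope shapes its loudness over time. Stops such as the kick have a very short attack, while a held bass note spends most of its duration in the sustain stage.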

Timbre (Harmonics & Formants)

The source-filter theory shows that the source (eg. the vibrating vocal folds, powered by air from the lungs) provides the fundamental frequency and its harmonics (whole number multiples of the fundamental). Formants are determined by the configuration of the filter, the supraglottal cavity, and can be defined as “naturally high amplitude resonance frequencies of the vocal tract that filter out which harmonics are realised.”[3] Antiformants can occur when airflow is redirected to a forked tube (eg. in a nasal). The principle of superposition shows us that the timbre, or sound quality/unique shape of the spectrum of the sound, is composed of the component resonances (harmonics, formants, and antiformants) of any given sound. The difference between a lip roll bass note and a throat bass of the same fundamental frequency and volume comes from which resonances are present, due to the difference in articulation and creation. Generally, the larger the cavity, the lower the frequency of the given component. The specific type of sound determines which cavity (ie. in front of or behind the articulation) the given component corresponds to[3].
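The source-filter idea can be illustrated with a toy calculation: a source produces harmonics of a fundamental, and a filter with resonance peaks (formants) boosts the harmonics that fall near those peaks. Everything specific here is an assumption for illustration: the 1/n fall-off of the source harmonics, the bell-shaped resonance curve, the bandwidth, and the formant frequencies are hypothetical and are not values measured in this project.

```python
def harmonics(f0_hz, count):
    """Whole number multiples of the fundamental frequency f0."""
    return [n * f0_hz for n in range(1, count + 1)]


def formant_gain(freq_hz, formants_hz, bandwidth_hz=100.0):
    """Toy filter: one bell-shaped resonance peak per formant (gain 1.0 at its centre)."""
    return sum(
        1.0 / (1.0 + ((freq_hz - centre) / bandwidth_hz) ** 2)
        for centre in formants_hz
    )


def shaped_spectrum(f0_hz, formants_hz, count=20):
    """Return (frequency, amplitude) pairs: source harmonics shaped by the filter.

    The source amplitude of harmonic n is assumed to fall off as 1/n.
    """
    return [
        (freq, (1.0 / n) * formant_gain(freq, formants_hz))
        for n, freq in enumerate(harmonics(f0_hz, count), start=1)
    ]


# Same fundamental, two different (hypothetical) filter configurations.
spectrum_a = shaped_spectrum(100.0, [500.0, 1500.0])
spectrum_b = shaped_spectrum(100.0, [300.0, 2500.0])
```

Both spectra contain the same harmonic frequencies, but the amplitudes differ because the resonance peaks sit in different places. This is the sense in which two bass sounds with the same fundamental and volume can still differ in timbre.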

Variations of kicks.
A hihat (open teeth, single articulation) spectrum
A hihat (affricate with "tch") spectrum.
A hihat (affricate with "ts") spectrum.

Kick { B }[5]

The kick is produced by creating a bilabial closure, then quickly releasing the built-up pressure as the lips open again. The demo sounds here show three variations of the voiced plosive: one with low pressure build-up, one with higher pressure build-up, and one where the upper lip is placed over the bottom lip (rather than straight on). A kick drum is the largest drum of the western pop/rock kit and so produces the lowest frequencies, which sit around 200-300 Hz in these recordings. The spectra show a major difference in decay between the low and high pressure variations, with the latter having enough power to carry the harmonics. The third variation shows a decay pattern similar to the high pressure variation, as it was produced with high pressure, but some of the first peaks are higher in frequency, which may be due to the tightness of an asymmetric lip placement.
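A spectrum like the ones described here can be approximated with a discrete Fourier transform, and its strongest peak located. The sketch below is only an illustration: it does not reproduce the tool used for this article's spectra, and it analyses a synthetic stand-in (a decaying 250 Hz sine, an assumed value inside the 200-300 Hz range mentioned above) rather than a real recording. The function names, sample rate, and decay constant are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs from 0 Hz up to the Nyquist frequency.

    Uses a naive O(N^2) discrete Fourier transform, which is fine for short clips.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, abs(total)))
    return spectrum


def peak_frequency(samples, sample_rate):
    """Frequency of the bin with the largest magnitude."""
    return max(magnitude_spectrum(samples, sample_rate), key=lambda pair: pair[1])[0]


# Synthetic stand-in for a kick: a 250 Hz sine with an exponentially decaying amplitude.
SAMPLE_RATE = 4000  # Hz, assumed
kick_like = [
    math.exp(-20 * i / SAMPLE_RATE) * math.sin(2 * math.pi * 250 * i / SAMPLE_RATE)
    for i in range(400)
]
```

Calling `peak_frequency(kick_like, SAMPLE_RATE)` should report a peak close to 250 Hz, within the 10 Hz bin spacing of this short clip. With real recordings, comparing peak frequencies and how quickly the magnitudes fall off at higher frequencies is one way to quantify the differences between the low and high pressure kicks.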

Variations of hihats.
An outward k snare (closer to velar) spectrum.
An outward k snare (closer to uvular) spectrum.
An outward k snare (unaspirated) spectrum.
An inward k snare spectrum.
An outward k snare (aspirated) spectrum.

HiHat { t }[6]

A hihat is produced with a stop similar to the kick's, but with a pre-dental place of articulation. The apex of the tongue is brought up to an area between the teeth and the palate, creating a closure with high pressure build-up behind it that is then quickly released, creating a burst of energy outwards. The hihat is a common cymbal, so the timbre in beatboxing should match its metallic shimmer, for example an open { t } with a single articulation at the alveolar ridge, or an affricate { ts } or { tch }. The spectra show the characteristic distributed noise in the high frequency region, with the single stop { t } having a pronounced fundamental (the “burst” at the release stage of the sound), and the { ch } having louder middle-high frequencies in comparison to the more forward tongue placement of the { s }, which may have to do with the length of the resonator.

Variations of snares.
A bass lip oscillation spectrum.
A tongue trill bass spectrum.
A throat bass spectrum.
An OD bass spectrum.
A bass lip roll spectrum.

Snare { K }[7]

The k snare is produced like the aforementioned stops (and is voiceless like the hihat), but further back around the velum, where the back of the tongue makes contact. K snares are commonly both egressive and ingressive. Since the snare is a mid tone between the kick and hihat, one might expect the spectra to have peaks in the mid range. These spectra show a few variations on the outward snare. In the unaspirated versions at different places of articulation, the further back production (eg. closer to uvular) appears to shift the sound to lower frequencies in comparison to the closer to velar production, likely because the space in front of the closure, potentially the primary resonator, is elongated. There appear to be two distinct peaks, with the second peak being higher than the first. The inward spectrum has a noticeable antiformant and decay of the higher frequencies, with thinner and more distinct peaks. The aspirated outward snare has quite a bit of evenly distributed noise, as the aspiration lets air out without picking up on specific resonances.

Additional Sounds

Variations of basses.

This is a sample of a whole suite of sounds.[8][9][10]

Basses

There are many ways to create bass sounds, as long as the sound has a low frequency oscillation. These variations range from strongly harmonic sounds with strong fundamentals (eg. a throat bass) to an evenly distributed decay of harmonics (eg. the lip oscillation).

Variations of vocalisations.

Vocalisations

These vocalisations include a zipper, siren, and trumpet to scratch the surface of sounds that involve more melodic incorporation of voicing.

Variations of clicks.

Clicks

Clicks are a useful sharp and strong percussive element, like a k snare. The click uses the blade of the tongue in a plosive manner on the palate, sometimes drawn backwards a bit during release. Because of this articulation, it is fairly easy to add lip rounding and to change the exact place of articulation. A click further back has a pronounced fundamental and a peak or two, and is extremely clear and percussive. A more forward click becomes a bit more distributed and shifts the bulk of the peaks into higher frequencies as the space in front of the tongue shortens. As one can see in the rounded vs unrounded spectra, it appears rounding may create an antiformant. Rounding may be easier to do when the tongue is further back (for an English speaker), which may explain the antiformant in the first spectrum.


Overall, to make the clearest sounds, one must maximise pressure for stop sounds, and ensure looseness for the lowest oscillations for bass sounds.


  1. "Beatboxing". Wikipedia.
  2. "SBN". Human Beatbox.
  3. LING 313, Rachel Soo, UBC 2021.
  4. "Bernoulli's Principle". Khan Academy.
  5. "Classic Kick Drum". Human Beatbox.
  6. "HiHats". Human Beatbox.
  7. "Snares". Human Beatbox.
  8. "Beatbox Sounds". Reddit.
  9. "Bzzktt Sounds". Bzzktt.
  10. "Sounds". Human Beatbox.