
Here I describe the digital signal processing that takes place in the Haptic Box. I assume the reader understands basic electronic audio concepts like filters and mixing, but less standard concepts are explained. I make references to the SuperCollider source code here and there, but investigating them shouldn’t be necessary to come to an understanding of the process.1

Overview

All of the electronic audio manipulation in Haptic Box is digital signal processing (DSP) performed by SuperCollider. At the most abstract, the SuperCollider patch processes audio in order to keep it in the tactile range and to generate change depending on the user’s tactile interactions while retaining some kind of apparent structure. The structure is important in order to keep the experience from being a direct correlation between touch and its effects on the process, so the patch attempts to balance tactile correlation and the process’s “internal” character.

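In practical terms, the audio is passed from input, through several processes, to output. From the output, the sound is played on the box’s surface, where it takes on characteristics of the material and the pressure of the touch before being picked up again by the piezos and sent to the DSP input. These processes, described in the sections below, include bandpass filtering, AC hum rejection, 100 Hz low-pass filtering, spectral compression, wavefolding, amplification, and panning.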

The bandpass filters are responsible for creating the movement in the pitches which ultimately feed back to be felt as vibration. The bulk of the SuperCollider patch is dedicated to how they move, which might be the most conventionally “algorithmic” or “compositional” aspect of the process. I discuss this aspect in depth below. The AC hum rejection filter was necessary while prototyping with the system physically connected to my computer and ultimately to ground, and might be removed. The 100 Hz low-pass filter ensures all of the audio stays in a range that is primarily felt as vibratory movement more than it is heard as sound.

The rest of the process

The rest of the process – the spectral compression and wavefolding primarily, but also the amplification and panning – functions to prevent the feedback system from organizing around one or two dominant frequencies. The importance of this function cannot be overstated: though the physical aspect has a complex range of frequencies which define its acoustic spectrum, in practice a limited number dominate its resonance and produce much louder feedback, even compared to tones which are forced to feed back through filtering. Additionally, the strength (propensity to feed back) of these nodes is disproportionate to that of the others, meaning that a touch which is firm enough to stifle the “lesser” resonant nodes may not be enough to have any effect on the dominant ones, and conversely, a touch that is firm enough to stifle a dominant node will also mute any “lesser” nodes. Since the goal is a vibrating surface which responds to touch (without simply dying out!), the playing field needs some levelling. At the same time, because the interaction depends on acoustic input and the material propensities of the box are of interest, a balance must be struck with the physical acoustic tendencies.

Spectral compression is a very useful tool to accomplish this task. Like a regular compressor, spectral compression attenuates signals above a set amplitude threshold. Unlike a regular compressor, spectral compression operates in the frequency domain, attenuating frequency bands individually. In theory, this behaviour is very like having an automatic equalizer which performs the levels manipulation required in order to keep the dominant resonant nodes in check while allowing others to feed back. I first experimented with spectral compression with fftease’s pvcompand~ object for Pure Data while developing Pathside Box, and fortunately the PV_Compander object provides the same functionality in SuperCollider.
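As a rough illustration of the idea only (not of how PV_Compander works internally), per-band attenuation can be sketched in plain sclang on a list of band magnitudes. The function name ~compress_bands, the threshold, and the ratio are hypothetical choices made for this example.

```supercollider
(
// Hypothetical sketch: compress each frequency band on its own.
// ~compress_bands, thresh, and ratio are illustrative assumptions,
// not names or values from the Haptic Box patch or from PV_Compander.
~compress_bands = { |mags, thresh = 0.5, ratio = 4|
    mags.collect { |mag|
        // Bands above the threshold keep only 1/ratio of their excess;
        // bands at or below the threshold pass through unchanged.
        if(mag > thresh) { thresh + ((mag - thresh) / ratio) } { mag }
    }
};

// One loud "dominant" band (0.9) among quieter ones.
~compress_bands.([0.1, 0.9, 0.3, 0.5]).postln;
)
```

In this sketch only the 0.9 band is reduced (to about 0.6), which is the "automatic equalizer" behaviour described above: the dominant band is held back while its neighbours are left alone.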

In practice, two problems prevent spectral compression from being the only solution. Firstly, it can be too effective. Reducing the natural dynamic response of the spectrum results in everything feeding back with similar amplitude and character. At high compression levels, the effect is like an arpeggio over a cluster of sine tones, and these characteristics present at lower levels as well. (Notably, though the sine tone quality of sound doesn’t strictly matter since the material is primarily haptic, the predictability of response does.) Secondly, due to the nature of the Fast Fourier Transform, frequency resolution is limited in the lower register unless additional latency is introduced.2 The implication is that the frequency band(s) in use are so wide that when they are attenuated by the presence of a “dominant” resonant node, they might also suppress the nodes of interest which they are meant to enable.
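The trade-off between frequency resolution and latency can be made concrete with a little arithmetic. The sample rate, FFT size, and the 30 to 60 Hz octave below are assumptions chosen for illustration; the text does not state the values used in the patch.

```supercollider
(
// Assumed values: neither the sample rate nor the FFT size is given in the text.
~sample_rate = 44100;
~fft_size = 4096;

~bin_width = ~sample_rate / ~fft_size;    // Hz covered by each FFT band
~window_dur = ~fft_size / ~sample_rate;   // seconds of audio in one analysis frame

// Assumed lowest octave of interest: 30 Hz to 60 Hz.
~bands_in_octave = (60 - 30) / ~bin_width;

"band width: % Hz".format(~bin_width.round(0.01)).postln;
"frame duration: % s".format(~window_dur.round(0.001)).postln;
"bands in the 30-60 Hz octave: %".format(~bands_in_octave.round(0.01)).postln;
)
```

Under these assumptions each band is a little under 11 Hz wide, so fewer than three bands cover the whole octave, while one analysis frame already spans roughly 93 ms. Doubling the FFT size would halve the band width but double the frame duration.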

Ultimately, spectral compression at mild levels proved useful enough to leave in, but required the use of another harmonic processing measure: wavefolding. Wavefolding performs what is described by its title: the portion of a wave outside of a threshold is mirrored inward (“folded”), and if the folded portion reaches the other threshold, it is folded again, and so on. The result is the introduction of additional harmonic content related to the incoming audio (and indeed that is the original use of wavefolding in the “west-coast” paradigm of analog synthesis).3 Subtle wavefolding proved useful in this application because it effectively moved some of the acoustic energy from the input frequency into higher frequencies, both limiting the “dominant” nodes and encouraging other nodes to resonate by injecting energy into their spectra.

Figure: Folding a sine wave
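The repeated mirroring that defines wavefolding can be written out sample by sample. This is a minimal language-side sketch; ~fold is a hypothetical helper written for this example, not the folding stage used in the patch, and it assumes a positive threshold.

```supercollider
(
// Hypothetical helper: mirror any excursion beyond +/- thresh back inward,
// repeating until the value lies between the two thresholds.
// Assumes thresh > 0 (otherwise the loop would never end).
~fold = { |x, thresh = 1.0|
    var y = x;
    while { y.abs > thresh } {
        y = if(y > thresh) { (2 * thresh) - y } { (-2 * thresh) - y };
    };
    y
};

// One cycle of a sine wave at 1.5 times the threshold, sampled at 8 points.
~folded = 8.collect { |n| ~fold.(1.5 * sin(2pi * n / 8)) };
~folded.round(0.001).postln;
)
```

The peak of the sine (1.5) comes out as 0.5: the part of the wave above the threshold has been reflected downward, which adds the extra harmonic content the paragraph describes.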

Finally, gain and panning manipulation required fine-tuning to bring about the desired behaviour. The input gain needs to be just high enough to give the filters something to work with while still being sensitive to touch. The input and output gain need to be correlated with the fold level to produce appropriate spectral content and amplitudes. (These values ultimately also relate to the physical construction of the box and the audio hardware in use.) A small portion of each channel is cross-panned into the opposite channel (that is, the channels are not “hard panned”) in order to encourage polyphonic effects like beating and harmonic correlation. To an extent, this duplicates the transfer of vibrations resulting from the outputs being physically coupled by virtue of being mounted on different sides of the same object.
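The cross-panning amounts to mixing a little of each channel into the other. A minimal sketch, in which the function name ~cross_pan and the amount of 0.1 are assumptions for illustration rather than values from the patch:

```supercollider
(
// Hypothetical sketch of "soft" panning: each output keeps most of its own
// channel and receives a small share of the opposite one.
// The default amount (0.1) is an assumed value.
~cross_pan = { |left, right, amount = 0.1|
    [
        (left * (1 - amount)) + (right * amount),
        (right * (1 - amount)) + (left * amount)
    ]
};

// A signal present only in the left channel leaks slightly into the right.
~cross_pan.(1.0, 0.0).postln;
)
```

Because the function uses only arithmetic, the same expression could in principle be applied to audio-rate signals inside a SynthDef as well as to plain numbers.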

The vibrations stripped bare by their filters, even

As in Pathside Box, bandpass filtered feedback makes up the core of the algorithmic material of the piece, directly and discretely manipulating sonic events in time. A bandpass filter in a feedback circuit will only allow resonant frequencies within the band to feed back. If there are none in the band, or if the frequencies covered by the band don’t generate enough energy to feed back, no feedback pitch is produced. Slowly moving the band’s centre frequency across a range of frequencies will generate an arpeggio of the frequencies that will feed back. Which frequencies sound is determined by the physical-acoustic properties of the loop and the additional processing (see previous section). The attack and decay times are emergent results of the physical-acoustic properties, other processing, speed of the filter’s movement, and the filter’s cutoff slope.

Figure: A bandpass filter

The Haptic Box patch uses independent banks of four bandpass filters per stereo channel. Though these banks act independently, their behaviour is identical and defined by the same code in the \main SynthDef. The movement performed by these filters which ultimately manifests as the most apparently compositional material takes place in two general spheres of manipulation: amplitude and frequency. Both spheres make use of an “introspective” analysis of the process’s audio amplitude envelope to modulate their parameters.4

For the purposes of this explanation, these banks can be represented as a list: [band, band, band, band]. They are never given as such in the code, and are instead defined by iterating for the number of bands (n_bands.collect( ... )). To start, assume the bands are spaced evenly in pitch across a frequency range:

[
    band 0 (30 Hz),
    band 1 (35 Hz),
    band 2 (41 Hz),
    band 3 (49 Hz),
]
Figure: Several bandpass filters
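The collect idiom mentioned above can reproduce a list like this one. The sketch below is an assumption about how such spacing might be computed, not the code from the \main SynthDef: it takes the 30 Hz and 49 Hz endpoints from the example list and spaces the bands by equal frequency ratios (that is, evenly in pitch). The names ~lo, ~hi, and ~centres are hypothetical.

```supercollider
(
// Hypothetical sketch: build a bank of centre frequencies by iterating over
// the number of bands, as n_bands.collect does in the patch.
~n_bands = 4;
~lo = 30;   // Hz, taken from band 0 in the example list
~hi = 49;   // Hz, taken from band 3 in the example list

~centres = ~n_bands.collect { |i|
    // Equal ratios between neighbours, so the spacing is even in pitch.
    // Parentheses matter: sclang evaluates binary operators left to right.
    ~lo * ((~hi / ~lo) ** (i / (~n_bands - 1)))
};

~centres.round(0.1).postln;
)
```

The result is approximately 30, 35.3, 41.6, and 49 Hz, matching the example list to within rounding.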

Amplitude

Two processes affect the amplitudes of each band, one resembling a tilt equalizer and one resembling a limiter.

A tilt equalizer is one which draws an imaginary line across the frequency spectrum, tilted on an axis, and attenuates or boosts according to the tilt of the line.5 For instance, if the higher portion is boosted, the lower portion is attenuated, and vice-versa. In practice, these are implemented using shelving filters. In the Haptic Box algorithm, the tilt simply boosts the amplitude of some of the bands and cuts others. The axis is centred and the slope value is fixed at 0.4. Because the bands eventually rearrange out of their initial lowest-to-highest frequency order, the effect is primarily to build in a slight asymmetry among their gain levels.

The limiter approach used here draws on a technique commonly used to self-regulate feedback systems.6 The amplitude envelope of the signal is inverted and used to modulate the signal’s output amplitude, allowing quiet signals to pass while reducing louder ones. In the Haptic Box patch, the technique is used to regulate the amplitude of each band relative to the group. If a band’s envelope is above the mean for its channel, it will be attenuated; if it’s below the mean, it will be amplified (to a maximum of double). The intent is both to restrain bands from dominating and to encourage multiple bands to feed back simultaneously.
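The relative limiting can be sketched on a snapshot of four band envelopes. The text gives the behaviour (attenuate above the mean, amplify below it, at most doubling) but not the formula, so the gain law here, the mean divided by the band's envelope, is an assumption, as are the name ~relative_gains and the envelope values.

```supercollider
(
// Hypothetical gain law: gain = mean envelope / band envelope, capped at 2.
// Bands louder than the mean get a gain below 1; quieter bands get a boost.
~relative_gains = { |envs|
    var mean = envs.mean;
    envs.collect { |env|
        // max() guards against dividing by a silent band's envelope.
        (mean / env.max(0.0001)).clip(0, 2)
    }
};

// Invented envelope values for the four bands of one channel (mean is 0.3).
~relative_gains.([0.1, 0.2, 0.3, 0.6]).round(0.01).postln;
)
```

With these values the quietest band is boosted to the ceiling of 2, the band at the mean is left at about 1, and the loudest band is halved.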

Frequency

The filter centres move through vertical pitch space, arpeggiating with feedback the resonant nodes they pass through. Their movement is driven by related sinusoidal low-frequency oscillators (LFOs) which are subject to both frequency and phase modulation determined by the resulting amplitude envelopes. The LFOs begin running at the same frequency in quadrature phase, with their ranges splayed out over a 1.4 octave spread.7 The result is that the filters cover independent frequency ranges with some overlap.

Figure: Initial filter trajectories

As the filter bands move, the speed at which each moves (i.e., its LFO’s frequency) momentarily drops when its band’s amplitude is high. The intent is for the filters to linger around frequencies that feed back. Like the limiter, the amplitude envelope analysis is relative to the channel’s bands: the amplitude threshold over which a band is slowed is 70% of the group’s maximum amplitude, slewed over five seconds.8 In other words, when a band is feeding back louder than most of its group, it lingers to continue to do so.
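The lingering rule can be tried out on a snapshot of envelope values. The threshold (0.7) and the slowed rate (0.3 times the unmodified frequency) come from the text and its notes; the base LFO frequency and the envelope values are invented for this sketch, and the five-second slew on the maximum is left out because it only exists on the server.

```supercollider
(
~lfo_env_thresh = 0.7;   // from the text: 70% of the group's maximum
~slow_factor = 0.3;      // from the notes: slowed frequency is 0.3 times the original
~base_freq = 0.05;       // Hz; an assumed LFO rate for illustration

~chan_envs = [0.2, 0.9, 0.5, 0.65];   // invented envelope snapshot for one channel
~envs_max = ~chan_envs.maxItem;       // the patch slews this value with .lag(5)

~lfo_freqs = ~chan_envs.collect { |env|
    // Slow the band's LFO while its envelope is above the relative threshold.
    if(env > (~envs_max * ~lfo_env_thresh)) { ~base_freq * ~slow_factor } { ~base_freq }
};

~lfo_freqs.postln;
)
```

Here the threshold works out to 0.63, so the bands at 0.9 and 0.65 slow down while the other two keep moving at the base rate.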

The point of having the bands linger is for the system to tend toward feeding back. As the bands move, they “want” to bring out vibrations, slowing down and spending time when they find them and hurrying on to the next afterward. However, despite the dependency on material involvement in this emergent behaviour, it seems likely that the bands would eventually settle into a stable phase relationship defined by their semi-deterministic movement across a fixed medium.

In order to mitigate that tendency toward stasis, the phase relationship between each channel’s bands’ LFOs is manipulated. The initial quadrature phase of the LFOs is defined by i / 4 * 2pi (where i is the band’s index from 0-3), which simply distributes the phases of each LFO at one-quarter increments across the full range of 2pi. The phase manipulation is “injected” into this relationship relative to each band’s index in the group, with the result that the phases are unequally affected (lower-indexed bands are less affected than higher-indexed bands; if the bands were affected equally, their relationship would remain the same). The resulting expression is (i * lfo_phase_injection + i) / 4 * 2pi % 2pi.
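The phase expression can be evaluated directly in sclang to see how the injection disturbs the quadrature relationship. The wrapper ~lfo_phases and the injection value of 0.5 are assumptions for illustration; the expression inside is the one quoted above.

```supercollider
(
// Evaluate the phase expression for each of the four band indices.
// sclang applies binary operators strictly left to right, so this reads as
// ((((i * injection) + i) / 4) * 2pi) % 2pi.
~lfo_phases = { |lfo_phase_injection|
    4.collect { |i|
        (i * lfo_phase_injection + i) / 4 * 2pi % 2pi
    }
};

// Results are divided by pi to make the fractions easy to read.
(~lfo_phases.(0) / pi).postln;     // no injection: plain quadrature
(~lfo_phases.(0.5) / pi).postln;   // assumed injection value of 0.5
)
```

With no injection the phases are 0, 0.5, 1, and 1.5 (in units of pi). With an injection of 0.5 they become approximately 0, 0.75, 1.5, and 0.25: band 0 is untouched, and the displacement grows with the band index until band 3 wraps around past 2pi.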

Like the other modulations, the phase injection value is determined by the feedback amplitude envelope, but the value itself is derived from the phase relationship of the band LFOs. A new value is taken when the mean of the channel’s amplitude envelopes is outside of a given range (0.2–0.8) for at least one second, at most once every five seconds. The value that is taken is the range of the real LFO phases, that is, after the frequency manipulation. Because the real value and not the assigned value is required, the phase is calculated from the LFO’s output with the ~get_phase_kr function. As a result, the phase manipulation is derived from the system’s ongoing behaviour while injecting another chaotic element into it.
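One way to recover a sine LFO's phase from its output alone is to combine the arcsine of the current value with the direction of travel. The sketch below is a language-side guess at the general approach, not the contents of ~get_phase_kr; the helper ~phase_from_samples and the example phases are hypothetical.

```supercollider
(
// Hypothetical helper: estimate the phase (0 to 2pi) of a unit sine from its
// current and previous output values.
~phase_from_samples = { |now, prev|
    var p = now.clip(-1, 1).asin;   // between -pi/2 and pi/2
    // Rising: the phase is asin itself (wrapped to be positive).
    // Falling: the phase lies on the mirrored half of the cycle.
    if(now >= prev) { p % 2pi } { pi - p }
};

// Invented "real" phases for four LFOs; sample each sine now and slightly
// earlier, then recover the phases from the samples alone.
~true_phases = [1.0, 2.5, 4.0, 5.0];
~recovered = ~true_phases.collect { |ph|
    ~phase_from_samples.(sin(ph), sin(ph - 0.01))
};

// The "range" of the phases is taken here to mean largest minus smallest.
~phase_range = ~recovered.maxItem - ~recovered.minItem;
~recovered.round(0.001).postln;
~phase_range.round(0.001).postln;
)
```

The recovered values match the invented phases, and their range comes out at about 4. The estimate is unreliable close to the peaks of the sine, where the direction of travel changes within a single step.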

The injection value is not sampled simply (as in a sample-and-hold) but rather affected by an analog shift register (ASR) with its own feedback. An analog shift register is a series of sample-and-hold registers which sequentially store incoming values when triggered by a clock.9 When the input is sampled, the old value in the first position is “shifted” into the second position, the old second position value is “shifted” into the third, and so on for the number of positions in the register. While this can be useful for maintaining a history of each value, here it is used with feedback: each value in the register is mixed in with the input before the shifting begins. Even though only the first position is used, the feedback behaviour provides a further pocket of memory to the process, inspired by Andrew Fitch’s application of squid neuron research to analog synthesis.10
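A feedback shift register of this kind can be sketched in a few lines of sclang. Everything specific here is an assumption made for illustration: the number of stages, the feedback amount, the 1/(position + 1) weighting standing in for the tilt, and the names ~register, ~weights, and ~asr_clock.

```supercollider
(
~n_stages = 4;        // assumed register length
~feedback = 0.5;      // assumed share of the stored values mixed into the input
~register = Array.fill(~n_stages, { 0.0 });

// Assumed "tilt": later positions contribute less, like a fading memory.
~weights = ~n_stages.collect { |i| 1 / (i + 1) };

// Called once per clock trigger with the newly sampled input value.
~asr_clock = { |input|
    // Weighted average of everything currently stored in the register.
    var fed_back = (~register * ~weights).sum / ~weights.sum;
    // Mix the stored values into the input before shifting.
    var mixed = (input * (1 - ~feedback)) + (fed_back * ~feedback);
    // Shift: the new value enters position 0 and the last value falls off.
    ~register = [mixed] ++ ~register.keep(~n_stages - 1);
    ~register[0]   // only the first position is used as the output
};

// One non-zero input followed by zeros.
~outputs = [1.0, 0.0, 0.0, 0.0].collect { |x| ~asr_clock.(x) };
~outputs.round(0.001).postln;
)
```

A plain sample-and-hold would output zero as soon as the input returned to zero. Here the output keeps a decaying trace of the earlier value, which is the "pocket of memory" the paragraph describes.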

The result

Haptic Box’s signal processing spectrally compresses, filters, and folds the audio just so in order to keep it changing in a particular way in a tangible frequency range. Its self-referential internal feedback generates frequency movement that responds to inputs from other elements in the system and its previous changes, imbuing its behaviour with both emergent structure and unpredictability.11 The processing is one agent among many that interact with the sculpture’s unfolding events: the user’s touch, the input and output transducers, the box’s material, and even the environment in which the process takes place affect the audio stream. The processing is a way for me to influence the flow of events in a way which I hope will draw out a particular quality of engagement.


  1. References to particular places in the code have been given with reference to variable names and definitions as opposed to line numbers, in order to protect against link rot due to the changing nature of code. Be aware that SuperCollider processes mathematical operators from left to right without regard for the correct order of operations, that is, 2 + 3 * 4 evaluates to 20, not 14.↩︎

  2. The FFT buffer represents frequencies linearly, with all bands occupying equal width in Hertz. Therefore a typical width such as 10 Hz, which might be an acceptable resolution in middle and upper frequencies, is too wide for the lower ranges used by Haptic Box, breaking the space of the lowest octave into a mere three segments. FFT frequency resolution is tied to latency (greater precision requires a larger sample to analyse, partly because the wavelengths being analysed are longer in duration). To some extent this can be mitigated by using overlapping analysis windows, which trades audio time for CPU time. Since the target platform is limited in processing power, neither option is a workable solution. For more on Fourier analysis, see Curtis Roads, “Windowed Analysis and Transformation,” in Microsound (Cambridge, MA: MIT Press, 2004).↩︎

  3. Wavefolding can also be used for other applications such as adding complexity to a control signal or asymmetrically rectifying audio in a feedback loop. A brief introduction to the history of wavefolding and waveshaping can be found in Fabián Esqueda et al., “Virtual Analog Models of the Lockhart and Serge Wavefolders,” Applied Sciences 7, no. 12 (2017): 1328, https://doi.org/10.3390/app7121328.↩︎

  4. Due to the line-by-line definition of audio processes in SuperCollider, envelope following code is a little more verbose than its equivalent representation in Pure Data or analog hardware, necessitating buffers to represent the analysed values. This is mainly because of the inconvenient reality that effects generally can’t be examined before they happen. Analysis of the amplitude envelope reflects a point in the continuum of analysis that extends towards the abstract in the direction of FFT and eventually corpus categorization with MFCCs, and towards process in the direction of ADC sampling, modulating processes like filtering, and the passing of vibrations through material.↩︎

  5. EDN, “Implement an Audio-Frequency Tilt-Equalizer Filter,” EDN (blog), February 2, 2012. https://www.edn.com/implement-an-audio-frequency-tilt-equalizer-filter/.↩︎

  6. See for instance Kees Tazelaar’s explanation of Jaap Vink’s well-known ring modulator feedback patch from 1970, which uses the technique: Ring-Modulated Feedback in BEA5, Analogue Studio BEA5 at the Institute of Sonology, den Haag, accessed October 17, 2022, https://youtu.be/watch?v=X_Bcr_HS9XM.↩︎

  7. The splay and spread parameters are left as interactive arguments from an earlier performance version of the patch. spread refers to the octave range above the lower frequency and thus defines the total frequency range the bands move within. splay refers to how the individual bands’ LFO ranges are distributed throughout that range. A value of 0 means each band’s movement occupies the entire range, so the bands are superimposed; a value of 1 means each band occupies one quarter of the range, so the bands are “stacked” with no overlap; intermediary values result in some overlap over the entire range defined by spread, which is in any case always used.↩︎

  8. The condition for a band’s movement being slowed is given by chan_envs[i] > (envs_max.lag(5) * lfo_env_thresh): slow if envelope is higher than the maximum envelope times the threshold (0.7). The slowed LFO frequency is 0.3 times the unmodified frequency.↩︎

  9. “Analog” in this sense does not mean a physical circuit as opposed to a digital implementation (which this clearly is), but rather a continuous numerical value as opposed to a binary true or false one (or zero).↩︎

  10. The notable differences between the feedback ASR in Fitch’s Squid Axon module and the implementation in Haptic Box are first that Squid Axon uses two feedback paths (linear and non-linear) whereas Haptic Box uses only one, and second that Squid Axon feeds back only the value of its final position, while Haptic Box draws from each register, scaling back their values by their position with a tilt function to mimic memory fading with time.↩︎

  11. The correlation of the system with its past states is effected in several ways: in coarse discrete values through the analog shift registers, in fine discrete values through the slewing of parameters and the filters, and in continuous process through the system’s meta positioning in different states (such as those resulting from the phase relationships between the filters’ LFOs and their amplitude envelopes) and through feedback itself. For a conceptual approach to these, see “It feeds back” in Haptic Box Background.↩︎