
TouchLab-Research

Latency and musical coherence in classical ensemble performance

How TERMINAL Enables High-Fidelity Telematic Music-Making

In classical ensemble performance, latency—the delay between an action and its perception—is not an anomaly but a structural component of musical interaction. Acoustic, perceptual, and cognitive factors ensure that musicians routinely perform within latency conditions that are far from zero. Understanding this natural framework is essential for evaluating digital systems intended for high-level artistic use.

Acoustic latency as an inherent part of ensemble playing


Sound travels at approximately 343 m/s. In a symphony orchestra, musicians positioned 15–20 meters apart can therefore produce arrival-time differences of roughly 45–60 ms at the audience position. Yet orchestras are perceived as coherent entities because performers continually compensate for these offsets.
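The arithmetic behind these figures can be checked with a minimal sketch. The function name `acoustic_delay_ms` is purely illustrative, and the calculation assumes a straight-line path at the 343 m/s quoted above; it gives about 43.7 ms for 15 m and 58.3 ms for 20 m, which the text rounds to 45–60 ms.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value quoted in the text


def acoustic_delay_ms(distance_m: float) -> float:
    """Return the time in milliseconds sound needs to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# Distances taken from the text: musicians 15-20 m apart.
for distance_m in (15, 20):
    print(f"{distance_m} m -> {acoustic_delay_ms(distance_m):.1f} ms")
```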


Several mechanisms contribute:

  • Players of low-register instruments, and of instruments with slower attack characteristics, routinely play slightly ahead of the musicians closer to the front of the stage.

  • Listeners perceptually fuse events that occur within roughly 20 ms of one another; within this window, the auditory system treats multiple sources as a single musical event.

  • Large tutti passages naturally “fill in” microscopic discrepancies that would be immediately exposed in a duet.


Thus, classical ensemble performance already operates within a latency environment far exceeding the thresholds typically assumed in digital media engineering.

TERMINAL audio: latency on the order of natural acoustics

The TERMINAL system delivers bidirectional audio with an end-to-end latency that is clearly below 20 ms, and often close to 15 ms within the Netherlands. This places TERMINAL audio squarely within the same temporal range as natural acoustic delays encountered on stage. For performers, the system therefore behaves not as a technical overlay but as an extension of familiar musical conditions.
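To make the comparison with stage acoustics concrete, a latency can be converted into the distance sound would cover in the same time. The sketch below is only an illustration: the helper name `equivalent_distance_m` is an assumption, and the conversion uses the approximate 343 m/s speed of sound. On that basis, 15 ms corresponds to about 5.1 m and 20 ms to about 6.9 m, distances that commonly separate players on a stage.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate speed of sound in air


def equivalent_distance_m(latency_ms: float) -> float:
    """Return the distance in metres that sound travels in latency_ms milliseconds."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


# Latency values taken from the text: "close to 15 ms" and "below 20 ms".
for latency_ms in (15, 20):
    print(f"{latency_ms} ms ~ {equivalent_distance_m(latency_ms):.1f} m of acoustic distance")
```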

Because TERMINAL works with close miking, the auditory experience for both musicians and conductor is highly intimate: timbre, articulation and detail from colleagues are perceived as if everyone were positioned at a distance of roughly 50 cm. In other words, the sound image is much closer and more direct than on a conventional stage. The brain consequently interprets the signals as “nearby”, rather than as delayed, so that the telematic setup feels, if anything, more immediate and intimate than traditional ensemble playing.

Conducting gestures and the multimodal nature of time synchronization

Conducting gestures provide a global temporal reference, but these gestures are not precise millisecond markers. Their power lies in their predictability, shape, and expressive timing. Musicians rely on a multimodal integration of visual information, auditory feedback from colleagues, and learned anticipatory strategies.

Synchrony emerges not from perfect simultaneity, but from shared expectations shaped through training and context.

TERMINAL Conductor transmits conductor gestures as visual information to remote performers. Within national distances in the Netherlands, the end-to-end gesture-to-visualization delay is approximately 40 ms. Technically, this corresponds to a temporal spread comparable to larger physical distances within a hall or to working in an acoustic with a slightly longer response. In practice, however, the combination with the close-miked audio image means that both musicians and conductor experience the interaction very much like conventional ensemble conducting: the sound field is clear, focused and perceived as “close”.

 

The critical factor is therefore not only the absolute delay; consistency across endpoints is equally important. TERMINAL Conductor ensures that all endpoints receive the visual information simultaneously, with inter-endpoint variation kept within the natural tolerance band of ensemble musicianship.
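The idea of inter-endpoint consistency can be sketched as a simple spread check. Everything in this example is an assumption made for illustration: the venue names and delay values are hypothetical rather than TERMINAL measurements, and the 20 ms tolerance simply reuses the perceptual fusion window mentioned earlier, since the text does not state a numeric tolerance band.

```python
from typing import Mapping

# Assumption: reuse the ~20 ms perceptual fusion window as the tolerance band.
FUSION_WINDOW_MS = 20.0


def inter_endpoint_spread_ms(delays_ms: Mapping[str, float]) -> float:
    """Return the difference between the slowest and fastest endpoint delay."""
    if not delays_ms:
        raise ValueError("at least one endpoint delay is required")
    values = list(delays_ms.values())
    return max(values) - min(values)


def is_consistent(delays_ms: Mapping[str, float],
                  tolerance_ms: float = FUSION_WINDOW_MS) -> bool:
    """Return True if all endpoints lie within tolerance_ms of each other."""
    return inter_endpoint_spread_ms(delays_ms) <= tolerance_ms


# Hypothetical gesture-to-visualization delays per venue (not measured data).
example_delays_ms = {"venue_a": 38.0, "venue_b": 41.5, "venue_c": 43.0}

spread = inter_endpoint_spread_ms(example_delays_ms)
print(f"spread: {spread:.1f} ms, consistent: {is_consistent(example_delays_ms)}")
```

The check deliberately looks at the spread between endpoints rather than at the absolute delay, mirroring the distinction drawn above.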

Video between venues: the remaining challenge

While audio and gestural communication operate well within musically acceptable latency bounds, transmitting live video images of performers between venues introduces additional challenges. To remain musically functional, such video should ideally maintain an end-to-end latency below about 100 ms. In practice, this kind of performance is currently only achievable in carefully optimised camera-to-monitor chains.

 

Real-world implementations in concert halls typically involve additional stages: software-based video mixers, scaling, format conversion and display on large projectors or LED walls. Each of these adds latency, so that overall delays in such setups are often well above 100 ms. Reducing these values, potentially via high-refresh displays or dedicated low-latency pipelines, is an important area of ongoing research within the TERMINAL project.
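How the stages of a video chain add up against the 100 ms target can be illustrated with a small latency budget. The per-stage values below are placeholders chosen only to show the arithmetic; they are not measurements from TERMINAL or from any specific equipment, and the stage names merely follow the list above.

```python
from typing import Mapping

TARGET_MS = 100.0  # end-to-end video latency target mentioned in the text


def total_latency_ms(stages_ms: Mapping[str, float]) -> float:
    """Return the end-to-end latency as the sum of all stage latencies."""
    return sum(stages_ms.values())


# Hypothetical per-stage latencies in milliseconds (placeholders, not measured).
hypothetical_chain_ms = {
    "camera capture and output": 20.0,
    "software video mixer": 50.0,
    "scaling and format conversion": 20.0,
    "projector or LED wall": 40.0,
}

total = total_latency_ms(hypothetical_chain_ms)
headroom = TARGET_MS - total  # negative means the chain exceeds the target

for stage, latency in hypothetical_chain_ms.items():
    print(f"{stage:32s} {latency:6.1f} ms")
print(f"{'total':32s} {total:6.1f} ms (headroom {headroom:+.1f} ms)")
```

With real measurements in place of the placeholders, a table like this shows which stage contributes most to the total and is therefore the first candidate for optimisation.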

Conclusion


By aligning its technical parameters with the physiological and acoustic realities of ensemble performance, TERMINAL demonstrates that high-quality telematic music-making is not a compromise but a viable continuation of classical practice. With audio latency below 20 ms (often close to 15 ms within the Netherlands), visual-conductor latency around 40 ms, and strict control over inter-endpoint consistency, TERMINAL operates within the same temporal ecology that musicians already navigate on stage.

©2025 by office INSOMNIO
