Nobody ever admits that the media rivalry between sound and vision is serious, but it has been a talking point ever since radio challenged the primacy of newspapers. Then, in the television age, radio people denied that they were the poor relations of film and TV types. They joked that radio had “the best pictures” - those it created in the listener’s mind.
The BBC’s Sounds Amazing conference showed that audio is now full of genuine confidence - if not gaining the upper hand on pictures, then certainly giving them a run for their money.
Matthew Postgate, chief technology and product officer for the BBC’s Design and Engineering department, opened the conference with a big claim about the primacy of audio: unlike vision, “sound is pervasive whenever we’re awake”. Ears never close: they must be more important than eyes.
He reported that studies have shown the perceived immersiveness of an audio-visual experience is improved more effectively by better sound than by higher picture quality.

Matthew then identified two trends that are helping audio. First, there’s the ever-increasing pervasiveness of media, promoting portable, inescapable audio. Second, there’s the increasing immersiveness of media experiences, with virtual reality (VR) and augmented reality (AR) at the cutting edge, both of which exploit new audio technologies - such as binaural sound, discussed later in the day.
Next up at the conference was James Robinson, the producer of Tracks, a nine-part BBC podcast that was later broadcast on BBC Radio Four. It’s a drama in 44-minute episodes telling the story of an air crash, using extensive sounds of planes, cars and other carefully placed effects to build an atmosphere behind comparatively sparse dialogue.
Robinson played a clip that involved a car crash. He admitted he was still worried: “is it clear what’s going on?” he asked. “It’s fine to lose the audience, but only for a moment.”
James said there’s “an explosion in online audio” (and he wasn’t talking about his plane crash), with three times as many people listening to podcasts as five years ago. But BBC Radio is still on a different scale: 10 times as many people listened to Tracks on Radio Four as listened to the podcast.
To understand the precise relation between pictures and sound on TV, you needed to hear the next presentation, a double act between BBC colleagues Kate Hopkins and Graham Wild, sound designer and sound mixer respectively, both of whom had worked on Blue Planet and other BBC natural history series.
Kate showed an amazingly complex screenshot (below) of the tracks she laid for Graham to mix on a film. They were made up of location sounds, other existing audio and Foley (sounds specially recorded for particular spots in the film). She played an example of a strong visual sequence, of cities at night, whose pictures were handed to her completely mute. The finished sequence was a rich mix of sounds, some matching the pictures, but others, she explained, emphasising shot changes and sped-up shots rather than anything pretending to be naturalistic.

As the mixer, it’s Graham’s job to combine Kate’s tracks with “hundreds” of music tracks and multiple takes of David Attenborough's commentary lines. With so much sound to play with, a moment of silence in the finished film is prized by a mixer. “If I can get silence in, I’m really happy,” Graham said.
It turns out that sound editing is affected by the size of the screen you’re expecting the film to be viewed on. Bigger screens mean slower editing because it takes the viewer a moment to look around the shot.
That Kate and Graham were talking to professionals was obvious when they answered an audience question: how long did they get to work on each Blue Planet episode? The answer - 15 days to track lay and two or three days to mix - produced a round of respectful applause.
There were more comparisons between radio and TV in the next panel, from Mohit Bakaya, commissioning editor, factual, for Radio Four: “television understands what it’s all about: it’s about the pictures,” he said. Radio should be about the sounds, but not enough attention has been paid to that. There’s too much use of library music chosen “haphazardly”, Mohit said. “There’s a huge opportunity for us to think about what audio does well, in the way that film-makers have done for years.”
Producers can enhance their work through sound “to get people to feel as well as think their way into a story”. He’d like sound to be used “to create hermetically-sealed universes” and he wants to commission programmes that “tap into the emotional side of radio, rather than just the thinking side”. Podcasts could benefit from the same approach, said Mohit, and be more than just “chat, chat, chat”.
An obvious new frontier for audio is internet-enabled speakers, which a few months ago were called “digital assistants” but whose field is now, more grandly, known simply as “voice”. Mukul Devichand is the BBC’s new “executive editor, voice” (a title which, he admitted, made his mum think he’s the producer of a talent show). He said the new devices change “the nature of our relationship with the internet to a conversational one”.

Mukul gave a live demonstration of an experimental “skill” (as Amazon calls apps for Alexa) that the BBC has developed for an Echo device. It uses the rushes of an existing BBC interview with Matthew Walker, a sleep expert. Walker’s answers were analysed and linked to possible questions that a user might ask - things like “why do we need sleep?”
If the device recognises a question, it jumps to the appropriate sound-bite from the interview. More impressive than the general questions was the skill’s ability to respond to more obscure ones, like “Alexa, what about Margaret Thatcher?”, which successfully cued a reference to Lady Thatcher’s famously small sleep requirement.
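The BBC didn’t share how the skill was built, but the core idea it described - interview answers analysed and tagged with likely questions, then matched against whatever the listener asks - can be sketched in a few lines. The Python below is a minimal, hypothetical illustration of that matching step only; the SoundBite structure, the keyword sets and the clip offsets are invented for the example, not taken from the BBC’s implementation.

```python
# Hypothetical sketch: matching a spoken question to a pre-cut sound-bite.
# All clip offsets, keywords and names here are invented for illustration;
# they are not the BBC's actual skill code or data.

from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SoundBite:
    start_seconds: float   # offset into the interview recording
    end_seconds: float
    keywords: Set[str]     # terms the answer was tagged with during analysis


# Invented entries standing in for the analysed Matthew Walker interview.
SOUND_BITES = [
    SoundBite(12.0, 55.0, {"why", "need", "sleep", "purpose"}),
    SoundBite(310.0, 342.0, {"margaret", "thatcher", "hours", "night"}),
]


def match_question(question: str) -> Optional[SoundBite]:
    """Return the sound-bite whose keywords best overlap the user's question."""
    words = set(question.lower().replace("?", "").split())
    best, best_score = None, 0
    for bite in SOUND_BITES:
        score = len(words & bite.keywords)
        if score > best_score:
            best, best_score = bite, score
    return best  # None means the skill would fall back to a generic reply


if __name__ == "__main__":
    bite = match_question("Alexa, what about Margaret Thatcher?")
    if bite:
        print(f"Play interview audio from {bite.start_seconds}s to {bite.end_seconds}s")
    else:
        print("No matching sound-bite; fall back to a default response")
```

In this sketch the “obscure” Thatcher question works for the same reason the general ones do: the answer was tagged with distinctive keywords in advance, so the match is a lookup rather than anything the device understands.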
For a preview of where the new world of voice is taking us, Mark Savage, a BBC music reporter, spent four weeks with seven of the devices in his home. He described the experience as “living in an Orwellian nightmare”, but it wasn’t all bad. Talking to the internet lets you “raise your eyes up from your phone” and interact with your family.
But his early enthusiasm waned. Mark found none of the devices were much good at understanding his Northern Irish accent. More fundamentally, when offered the chance to pick from almost any piece of music just by calling its name, he found he couldn’t think of anything he wanted to hear.
