
Robot learns how to lip sync using AI and YouTube

Media caption: Meet EMO, the lip-syncing robot

Robots can already dance, play football, and even help out with jobs around the house. But now, engineers in the US have invented one that can lip sync.

The robot is called EMO, and it can learn and recreate the way that humans move their lips when they talk.

Human lip movements happen due to a complicated combination of muscles, bones and skin working together - which scientists say is very hard to reproduce.

Instead of being given step-by-step instructions to follow, EMO used artificial intelligence (AI) and a process called 'observational learning' - which means learning by watching and copying another person's behaviour.

The robot even learned to sing a song from its own AI-generated debut album, "hello world".

How did the robot learn to lip sync?

Image: children singing into microphones (Getty Images)
Image caption: Could a future robot beat you in a lip sync battle?

First, the robot was programmed to keep moving the 26 motors in its face and to watch itself doing that using its own reflection in a mirror.

It made thousands of random facial expressions and lip gestures, learning how to move its motors to achieve particular movements.
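For readers who like to code, here is a very small Python sketch of the "try random movements and watch what happens" idea. It is not EMO's real software. The mirror is replaced by a made-up function, and the names `observe_in_mirror`, `babble` and `motors_for` are invented for this example. Only the number of motors, 26, comes from the story.

```python
import random

NUM_MOTORS = 26  # the number of face motors mentioned in the story


def observe_in_mirror(motors):
    """Made-up stand-in for the mirror: turns motor positions into
    two invented numbers, mouth openness and mouth width."""
    half = NUM_MOTORS // 2
    openness = sum(motors[:half]) / half
    width = sum(motors[half:]) / half
    return (openness, width)


def babble(trials, seed=0):
    """Try random motor positions and remember what each one looked like."""
    rng = random.Random(seed)
    memory = []
    for _ in range(trials):
        motors = [rng.random() for _ in range(NUM_MOTORS)]
        memory.append((observe_in_mirror(motors), motors))
    return memory


def motors_for(target_shape, memory):
    """Pick the remembered motor positions whose look was closest to the target."""
    def distance(entry):
        seen, _ = entry
        return sum((s - t) ** 2 for s, t in zip(seen, target_shape))
    return min(memory, key=distance)[1]


memory = babble(trials=5000)
target = (0.6, 0.4)  # an invented mouth shape to copy
best = motors_for(target, memory)
print(observe_in_mirror(best))  # should land close to the target
```

The more random movements the sketch tries, the closer its best match gets to the target shape, which is the same reason the robot made thousands of faces in the mirror.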

Then, scientists used hours of YouTube videos to show how humans move their mouths when they speak and sing.

This helped the computer inside the robot match mouth movements to sounds, too.
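As a rough illustration of matching sounds to mouth movements, the Python sketch below counts which mouth shape shows up most often with each sound. The clips, the sound labels, the shape names and the function `learn_shapes` are all made up for this example, and the real robot's AI is far more complicated than a simple count.

```python
from collections import Counter, defaultdict

# Made-up examples of (sound heard, mouth shape seen) pairs
clips = [
    ("b", "lips closed"), ("b", "lips closed"), ("b", "lips rounded"),
    ("w", "lips rounded"), ("w", "lips rounded"),
    ("ah", "mouth open"), ("ah", "mouth open"), ("ah", "lips closed"),
]


def learn_shapes(examples):
    """For each sound, keep the mouth shape that was seen most often."""
    counts = defaultdict(Counter)
    for sound, shape in examples:
        counts[sound][shape] += 1
    return {sound: seen.most_common(1)[0][0] for sound, seen in counts.items()}


sound_to_shape = learn_shapes(clips)
print([sound_to_shape[s] for s in ["b", "ah", "w"]])
# prints ['lips closed', 'mouth open', 'lips rounded']
```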

In the science journal Science Robotics, researchers at Columbia University in New York explained how they tested EMO by playing it lots of sounds, including different languages and songs, to see if it could keep up.

What did the robot scientists say?

Image: robot face with a sunglasses emoji on a black screen (Getty Images)
Image caption: Getting a robot face to act and move like a human's is a big challenge

It's thought to be the first time that a robot has ever been able to do this.

But making EMO wasn't easy.

Scientist Hod Lipson said the robot found it tricky to copy sounds like "B" and "W", which need careful lip movements.

He added: "The more it interacts with humans, the better it will get."