a blog about things that I've been thinking hard about

How to Sync Video and Separately Recorded Audio, Using Only Open-Source Software

4 November, 2012
sync audio and video by aligning two audios

Syncing video and audio is hard when they are recorded separately.

It's easy to sync the low quality audio from your camera with your separately recorded high quality audio.

After syncing the two audio tracks, make a note of the time offset.

Apply the time offset to the high quality audio before substituting it into the video.

A Common Scenario: Home-Recording a Music Performance Video

So, you've practiced playing some music, which might be original or might be a cover, and now you want to record a video of yourself playing it and put that video on YouTube.

At this point you discover a problem that many others before you have discovered:

Some Possible Solutions

Get a camera that accepts stereo line-in or external microphone input

Typically cheaper consumer video recorders do not have external stereo line-in audio inputs, so this is a more expensive solution.

A variant on this solution is the combination of a recent-model iPhone with a USB audio adapter such as the "Mikey" Digital. (I have not used this device, but the FAQ assures us that the line-in is a digital stereo input.)

Alignment Software

PluralEyes from Red Giant claims to solve the alignment problem, and you can see from that page how much it costs: US$199 (at the time of writing). It is also currently available only for Mac, so if you don't have a Mac, you'll have to buy one of those as well, or wait for the Windows version. (And if you don't have either Mac or Windows, then presumably you're out of luck.)

The Clapperboard

The Clapperboard is the traditional approach to this problem: a device that makes a sharp noise at the moment it reaches a particular position that can be visually identified.

In a musical context, the clapperboard can be replaced by playing a particular note on your musical instrument, preferably with a timbre that has a rapid attack. For this to work properly, the video camera has to have a clear view of whatever finger or hand action is producing the note.

Of course once you've recorded your video and audio with the clapperboard clapping at the start of the video, you then have to find some video editing software that makes it easy to identify the moment of the "clap" within both the video and audio tracks, and then carry out the necessary realignment.

An Alternative Solution: Align the Audio Tracks

The best solution which I have found is a variant on the clapperboard method, but it requires only an audio "clap", and it works independently of any particular video editing software.

What it does require is that the video camera has its own microphone which is recording the same sound as the separate higher quality audio recording that you are making. And it requires access to some audio editing software.

The problem of alignment can be stated as the problem of how much time the separate audio recording has to be moved forwards or backwards to align with the video.
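To make the idea of a time offset concrete, here is a minimal Python sketch. It is not the method I actually use (I find the offset by eye and ear in an audio editor, as described below), and the function names `first_onset` and `shift_audio`, the threshold value, and the toy sample data are all made up for illustration. It assumes each track is just a list of sample values between -1.0 and 1.0, and that the reference "clap" is the first loud sound in both tracks.

```python
def first_onset(samples, sample_rate, threshold=0.5):
    """Return the time (in seconds) of the first sample at or above threshold.

    Hypothetical helper: a crude stand-in for spotting the "clap" by eye.
    """
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            return index / sample_rate
    raise ValueError("no sample reaches the threshold")


def shift_audio(samples, sample_rate, offset_seconds):
    """Move audio later (positive offset) or earlier (negative offset).

    A positive offset prepends silence; a negative offset trims the start.
    """
    count = round(abs(offset_seconds) * sample_rate)
    if offset_seconds >= 0:
        return [0.0] * count + list(samples)
    return list(samples[count:])


# Toy data at 10 samples per second (real audio would be 44100 or 48000).
rate = 10
camera = [0.0] * 7 + [0.9, 0.4, 0.1]    # clap heard 0.7 s into the camera audio
external = [0.0] * 3 + [1.0, 0.5, 0.2]  # same clap 0.3 s into the separate audio

# How far the separate recording has to move to line up with the camera.
offset = first_onset(camera, rate) - first_onset(external, rate)
aligned = shift_audio(external, rate, offset)
```

After the shift, the clap in `aligned` falls at the same time as the clap in `camera`, which is all that "aligned" means here. A positive offset means the separate recording started later than the camera did, so it has to be pushed later; a negative offset means the opposite.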

To solve this problem, you only need audio software. The most suitable software for this purpose is Audacity, which is freely available for Mac, Windows and Linux.

Detailed Instructions

To successfully align your video and audio, proceed as follows:

At this point, you still have a camera video with its own audio track, and you have a separately recorded audio track which is now perfectly aligned with the video audio track.

Now you need to use some video editing software to replace the video's original audio track with the separately recorded track.

The video editor I have used is Avidemux. The following steps will create an output video with replacement audio:

Drift

One issue I have not dealt with here is drift, i.e. the audio and video might be in sync at one point in time, but then drift apart. Drift is caused by different recording devices not agreeing on how fast time is, or, to put it another way, disagreeing on how long a second is.

The simplest form of drift is where one device's clock runs faster or slower than the other's by a fixed ratio. In this case, it should be possible to fix the problem with the help of two reference sounds: one at the start of the video, and one at the end of the video. You'll need to note the alleged time of both reference sounds in both recordings, and then do a bit of algebra to figure out the required addition and multiplication to get one track in sync with the other from start to finish.
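The "bit of algebra" can be sketched in a few lines of Python. This is only an illustration under the assumption that the drift really is linear, i.e. that camera time equals the separate recorder's time multiplied by a constant, plus a constant. The function names and the example times are hypothetical, and I have not needed to do this with my own recordings.

```python
def fit_clock_map(ext_start, ext_end, cam_start, cam_end):
    """Return (scale, shift) such that cam_time = scale * ext_time + shift.

    ext_start, ext_end: times of the two reference sounds in the separate audio.
    cam_start, cam_end: times of the same two sounds in the camera audio.
    """
    if ext_end == ext_start:
        raise ValueError("the two reference sounds must be at different times")
    scale = (cam_end - cam_start) / (ext_end - ext_start)  # the multiplication
    shift = cam_start - scale * ext_start                  # the addition
    return scale, shift


def to_camera_time(ext_time, scale, shift):
    """Convert a time in the separate recording to the camera's timeline."""
    return scale * ext_time + shift


# Made-up example: reference sounds at 2.0 s and 602.0 s in the separate
# recording, and at 5.0 s and 605.3 s in the camera recording.
scale, shift = fit_clock_map(2.0, 602.0, 5.0, 605.3)
```

If `scale` comes out as exactly 1, there is no drift and `shift` is just the plain time offset. Otherwise the separate track would need to be stretched or squeezed by `scale` (changing its speed slightly) as well as shifted.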

The PluralEyes software mentioned above does claim to solve this type of problem, and possibly more complicated desynchronization problems as well.

In practice, for the length of recordings I've made, and the equipment I have been using, I haven't had any problems with drift. This is one thing you can verify by the step above where you play the camera track and aligned separate audio track simultaneously. If there is any drift, this should become obvious as a change in the quality of the sound as the tracks play from beginning to end.

My Equipment, and an Example

The equipment I have used to record video and audio is:

For an example of video that I have processed using the steps described above, see the following two videos on YouTube ...

With audio from the camera microphone, and including the reference sound:

With separately recorded audio, and with the reference sound trimmed from the start:
