A Common Scenario: Home-Recording a Music Performance Video
So, you've practiced a piece of music, which might be original or might be a cover,
and now you want to record a video of yourself playing it and put that video
onto YouTube.
At this point you discover a problem that many others before you have discovered:
Your home video camera records video of good enough quality to upload to YouTube.
(Here, a "video camera" might be an actual camcorder, a digital photo camera, or just
your mobile phone.)
Your home video camera has a microphone, but the sound quality it records is not
good enough. Also, it's mono-only (no stereo).
You can easily record quality audio into your home computer. This is full quality stereo
sound, recorded via a line-in or microphone input on the computer's sound card.
But, and here's the problem we need to solve, there is no way to easily and automatically
sync the two recordings.
Some Possible Solutions
Get a camera that accepts stereo line-in or external microphone input
Cheaper consumer video cameras typically do not have external stereo
line-in audio inputs, so this is a more expensive solution.
A variant on this solution is the combination of a recent-model iPhone with a
USB audio adapter such as the
"Mikey" Digital. (I have not
used this device, but the FAQ
assures us that the line-in is a digital stereo input.)
PluralEyes from Red Giant claims to solve the alignment problem, and you can see from its
product page how much it costs: US$199 (at the time of writing). It is also currently available
only for Mac, so if you don't have a Mac, you'll have to buy one of those as well, or wait for the
Windows version. (And if you use neither Mac nor Windows, then presumably you're out of luck.)
A clapperboard is the traditional
approach to this problem: a device that makes a sharp noise at the same moment that it
reaches a particular, visually identifiable position.
In a musical context, the clapperboard can be replaced by playing a particular
note on your musical instrument, preferably with a timbre that has a rapid attack. For
this to work properly, the video camera has to have a clear view of whatever finger or
hand action is producing the note.
Of course once you've recorded your video and audio with the clapperboard clapping
at the start of the video, you then have to find some video editing software
that makes it easy to identify the moment of the "clap" within both the video and audio tracks,
and then carry out the necessary realignment.
An Alternative Solution: Align the Audio Tracks
The best solution which I have found is a variant on the clapperboard method,
but it requires only an audio "clap",
and it works independently of any particular video editing software.
What it does require is that the video camera has its own microphone which is recording
the same sound as the separate higher quality audio recording that you are making.
And it requires access to some audio editing software.
The problem of alignment can be stated as the problem of how much time the separate audio
recording has to be moved forwards or backwards to align with the video.
To solve this problem, you only need audio software. The most suitable software for this
purpose is Audacity, which is available for free
for Mac, Windows and Linux.
To successfully align your video and audio, proceed as follows:
Record the video and separate audio, with a suitable reference sound at the beginning of
the recording (this will be trimmed off once you are finished, so remember to leave a suitable
gap between the reference sound and whatever it is you are actually recording).
If they are not already there, copy the video and audio recordings to your computer.
(For sound, it is recommended to save as uncompressed WAV, compressing only at the last moment
when the final video is created. Video from a consumer video camera is likely to be
compressed already.)
In Audacity, open the video file. Being quite clever about media formats, Audacity
will automatically extract and load the audio track of your video. If it doesn't, then you'll
have to find some other way to get the audio track out of the video.
To add the separately recorded audio track to the same Audacity window, choose the menu item
File => Import => Audio ...
Note that individual tracks can be muted and unmuted. Also note that, because the two tracks were
recorded differently, the correspondence between them may not be completely obvious visually.
For example, the camera audio may contain ambient room noise that is not readily distinguishable
from music just from looking at the waveforms.
For each track, separately, determine the precise millisecond when the reference note occurs.
Do this using a combination of play, stop, zoom in, zoom out and scrolling back and forth in time.
Write down the time of the reference sound in milliseconds for each track.
Using your calculator, subtract one number from the other to determine the size of the gap in
milliseconds.
Now, shift the separately recorded audio track in the required
direction by that number of milliseconds. There are two possible alternatives:
Shift forward, by inserting silence at the start of the track. In Audacity this is done via the
Generate => Silence... menu option, making sure to select the hh:mm:ss + milliseconds
input option (confusingly, the default "seconds" option looks like it might work, but the comma
in it is a thousands separator, not a decimal point, so entering 3,600 will add an hour of silence,
not 3.6 seconds).
Shift backward, by deleting the beginning of the track: first select from time zero up to the
required number of milliseconds (you can, for example, type the actual numbers into the Selection
Start and End values at the bottom of the screen, choosing the hh:mm:ss + milliseconds format so
that you can specify milliseconds), then press the Delete key.
Once shifted, you can test for alignment by playing with both tracks unmuted. If it just sounds
like one audio track, then you're probably good to go.
Now, delete the extracted video audio track from the Audacity window, and then
export the aligned audio track to a new WAV file.
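The manual steps of locating each reference sound and working out which way to shift can also be sketched in Python. This is a minimal sketch under stated assumptions, not the method used above: it assumes both tracks have been exported from Audacity as 16-bit PCM WAV files, and it locates the reference sound crudely as the first sample whose amplitude reaches half of the track's peak, which only works if the reference note is the first loud sound in each recording. The function names are hypothetical, and the result should still be checked by ear as described above.

```python
import array
import sys
import wave


def onset_ms(path, threshold=0.5):
    """Return the time (ms) of the first sample whose absolute value reaches
    `threshold` times the track's peak. Assumes 16-bit PCM WAV; for stereo
    files only the first channel is examined."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        channels = w.getnchannels()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    first = samples[::channels]  # keep only the first channel
    peak = max((abs(s) for s in first), default=0)
    if peak == 0:
        raise ValueError("track is silent")
    limit = threshold * peak
    for i, s in enumerate(first):
        if abs(s) >= limit:
            return i * 1000.0 / rate
    return None  # not reached when 0 < threshold <= 1


def ms_to_hhmmss(ms):
    """Format milliseconds as hh:mm:ss.mmm, i.e. the parts you would enter
    using Audacity's hh:mm:ss + milliseconds format."""
    total = int(round(ms))
    hours, rem = divmod(total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def plan_shift(camera_ms, separate_ms):
    """Say how to shift the separate track so that its reference sound
    lands at the same time as the camera track's reference sound."""
    gap = camera_ms - separate_ms
    if gap > 0:  # separate track is early: delay it
        return f"insert {ms_to_hhmmss(gap)} of silence at the start"
    if gap < 0:  # separate track is late: trim its beginning
        return f"delete the first {ms_to_hhmmss(-gap)}"
    return "already aligned"
```

For example, if the reference sound is at 5000 ms in the camera track and at 2000 ms in the separate track, `plan_shift(5000, 2000)` returns "insert 00:00:03.000 of silence at the start". With real files you would call something like `plan_shift(onset_ms("camera_audio.wav"), onset_ms("separate_audio.wav"))`, where both file names are placeholders.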
At this point, you still have a camera video with its own audio track, and you have a separately
recorded audio track which is now perfectly aligned with the video audio track.
Now you need to use some video editing software to replace the video's original audio track
with the separately recorded track.
The video editor I have used is Avidemux. The
following steps will create an output video with replacement audio:
In Avidemux, choose Open to open the video file.
From the Avidemux menu, choose Audio => Main Track
In the dialog, change the Audio Source option from Video to External WAV
Click on the Browse... button to select the aligned audio track.
This gives you a video with the audio track replaced by the separately recorded and correctly aligned audio track.
WARNING: Avidemux may not keep video and audio correctly aligned when playing the video within Avidemux.
(At least it didn't for me. However, this doesn't prevent you from using Avidemux for simple editing tasks.)
Use the selection options to trim the beginning and end of the video (in particular you won't want
to retain the initial reference sound).
Select desired video encoding, audio encoding and container format.
Click Save and choose a file name to save to.
Wait a while ...
When it's done, press OK, and you're done!
Upload to YouTube, or whatever.
One issue I have not dealt with here is drift, i.e. the audio and video might be in sync
at one point in time, but then drift apart. Drift is caused by different recording devices not
agreeing on how fast time is, or, to put it another way, disagreeing on how long a second is.
The simplest form of drift is where the two devices' clocks run at slightly different but constant
rates, so that the gap between the tracks grows steadily over the recording.
In this case, it should be possible to fix the problem with the help of two reference sounds: one
at the start of the video, and one at the end of the video. You'll need to note the alleged time of
both reference sounds in both tracks, and then do a bit of algebra to figure out the required addition
and multiplication to get one track in sync with the other from start to finish.
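The "bit of algebra" for a constant rate difference can be sketched as follows, assuming both reference sounds have been timed in both tracks in the same way as in the alignment steps. If the separate recording places the reference sounds at s1 and s2, and the camera track places them at c1 and c2, then the mapping camera_time = scale × separate_time + offset with scale = (c2 - c1) / (s2 - s1) and offset = c1 - scale × s1 sends both reference sounds to the right place. The function name is hypothetical, and how you then apply the scale factor to the audio is left open here.

```python
def drift_correction(sep_start_ms, sep_end_ms, cam_start_ms, cam_end_ms):
    """Return (scale, offset_ms) such that
        camera_time = scale * separate_time + offset_ms
    maps both reference sounds of the separate track onto the camera track.
    Assumes the clock-rate ratio between the two devices is constant."""
    sep_span = sep_end_ms - sep_start_ms
    if sep_span <= 0:
        raise ValueError("end reference must come after start reference")
    # Ratio of elapsed times between the two reference sounds.
    scale = (cam_end_ms - cam_start_ms) / sep_span
    # Offset chosen so the start reference sounds coincide.
    offset_ms = cam_start_ms - scale * sep_start_ms
    return scale, offset_ms
```

For example, `drift_correction(2000, 302000, 5000, 305030)` gives a scale of about 1.0001 and an offset of about 2999.8 ms: stretch the separate track by 0.01 % (measured from its start), then shift it later by 2999.8 ms. With no drift, the scale comes out as exactly 1.0 and the offset reduces to the simple gap from the alignment steps.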
The PluralEyes software mentioned above does claim to solve this type of problem,
and possibly more complicated desynchronization problems as well.
In practice, for the length of recordings I've made, and the equipment I have been using, I haven't
had any problems with drift. You can check for drift at the step above where you play the
camera track and the aligned separate audio track simultaneously. If there is any drift, it should become
obvious as a change in the quality of the sound as the tracks play from beginning to end.
My Equipment, and an Example
The equipment I have used to record video and audio is:
A Canon IXUS 100 IS
Dell Inspiron running Ubuntu 11.0 with an Asus Xonar-D1 sound card.
For an example of a video that I have processed using the steps described above, see
the following two videos on YouTube ...
With audio from the camera microphone, and including the reference sound:
With separately recorded audio, and with the reference sound trimmed from the start: