ffmpeg for streamers

2015-08-20

Recording a screencast is fairly trivial these days. You get any of the recording software packages out there, turn on a mic, maybe even a camera, and you start jabbering about the topic at hand. But then what? If everything is fine, just push that video to YouTube or any content platform of your choice. But what if it's not? Here are some tricks to help you with post processing.

TL;DR this post is a collection of my experiences recording videos (on Linux) and some post processing woes you're bound to discover. The first part covers how to use OBS to record multiple audio tracks so you can actually do some post processing when needed. The second part explains away some of the magic of ffmpeg, a powerful video manipulation program, and gives some examples of common use cases when streaming.

This guide is Linux oriented, though I think most of it can be done under Windows or on the Mac as well. It mainly requires a program called "ffmpeg" (ffmpeg.org), a free, cross platform, open source tool that can manipulate all kinds of videos and their audio. Since it is command line only it may have a steep learning curve, but once you figure out how to use it the sky is the limit.

OBS


So first for recording. I turned to OBS (obsproject.com). This is a great free open source recording software that is available cross platform and can do nearly anything a streamer desires, out of the box. I've only used it on Linux so far but it's not hard to imagine this working just as well on other platforms. I'm seriously impressed by what this software offers you for the price of nothing.

There's some setup required for OBS. Of course you need to set the main window and sounds to capture. While initially you may be tempted to record the whole desktop, you should in fact only record the window you are casting about, because for anything else... well why would you. So you create a single scene and add a "Window Capture" item. Double click on this item to show a configuration panel where you can select the window to record. You can also toggle a few things here. I had to enable the "swap red and blue" option, but you'll just have to see for yourself. If you're seeing blue where you ought to be seeing red, just toggle this once and never look back. The "Capture cursor" strongly varies per application. Some do and some don't need this setting enabled. I don't use the rest but if you need them they are pretty self explanatory, and you'll get a preview when configuring them, anyways.

Multi track audio


Okay that settles video. Audio is a different matter. A different beast altogether. Recently OBS added support for multi track audio recording. Which is _awesome_. I jumped in at the perfect moment because I think you really need to record background audio and the microphone separately in case you need some post processing. And since OBS supports this now, there's no need for weird setups to capture the microphone. Grrrreat!

The way to set this up is a bit cumbersome though, and you may have to check its state time and again to make sure everything is still set the way you want it (I've had it revert once; luckily I discovered it in the second video recorded that way, unluckily I really needed it for that video so that one now has bad audio). I'm going to assume you've plugged in a microphone of sorts. I can't help you with microphone calibration; I'm still finding my way through that hell hole. But here's what you do to set up multi track audio recording in OBS:

We are going to record videos with three different audio tracks: one with background sound and microphone sound mixed together, one with just the background sound, and one with just the microphone sound. This serves two main cases: being able to tweak the volume balance between the two tracks, and being able to mute out certain parts of the background audio in case of a copyright claim.

Another use case may be post processing of the microphone track. For example, there is a bug in Ubuntu that causes the volume of one channel of my microphone to be reduced to about 10% after a suspend-resume, while the other channel remains at 100%. This means you hear my voice almost only on the right side, which is super annoying. Luckily we can fix that, and recording the microphone in a separate track means we only apply the fix to the microphone, not to the application sound.

In OBS go to settings. Click on the "Output" tab. By default this opens the streaming settings, but we want the "recording" settings. However, first at the top set "Output Mode" to "Advanced", for otherwise we don't even see the settings we need. Then click on the "Recording" tab right below that drop down.

Keep "Type" to "Standard" unless you know what you're doing.
Set the recording path to some dir for output. Make sure there's enough space to record. My videos are about 500 MB to 3 GB for 30-45 mins at full HD (1920x1080). I think that upper bound was caused by a bad setting, but I'll need to record more to say for certain.
I have recording format set to "mkv". Honestly I doubt the type is really going to matter if you're just going to upload your video anyways, but take note that some players only support certain types. However, in my tests OBS currently only does multi audio tracks properly for mkv, so that's why I'm using it. Results may vary and future releases will change this as it should also work properly for mp4, but it does not for me at the time of writing. Again, I don't think it matters much for you unless you already know it matters. In that case you can try but these steps may simply not work for you.
Keep the first three "Audio track" checkboxes checked, but uncheck the fourth one. We won't use it. So you'll record track 1, 2, and 3.
I have "Encoder" set to "(Use stream encoder)". I can't tell you what the difference will be if you change this.
I don't have "Rescale Output" set for my case. I don't think it's relevant to the audio stuff.

Finally select the "Audio" tab on the horizontal tabs (so the right-most one). Here we can assign names to tracks and set their bitrate. I've named my tracks to clarify their intent. This is optional though as most players don't even use it. Unfortunately you cannot set the track language (yet), so that remains "undetermined" or "und" in most if not all players and even editors. Note that ffmpeg does see this title so it may help you :)

Click on the "Audio" tab in the left part.
Make sure that you've set the "Desktop Audio Device" to whatever you can select here, but not your mic or something silly like that. Just something that reflects your main sound output, where your speakers are connected and such.
Make sure "Mic/Auxiliary Audio Device" is set to your microphone. The other devices are "Disabled" for me. This guide is assuming this so adjust accordingly.
I have "Sample rate" and "Channels" set to their defaults (44.1 kHz/Stereo).

That's all we need for audio in the settings panel, so close it for now. The last important step is actually setting up the tracks. This is done elsewhere, namely the "Mixer". In the main window, right above the audio bars in the middle-bottom, there's a "Mixer" button with a configuration wheel. Click the wheel to open the track editor ("Advanced Audio Properties"). I expect this UI to change soon because it's far too subtle. Here you should see two audio inputs to configure: one should be the mic and the other should be desktop audio. Downmixing to mono may be an option you need but I don't use it. The important part here is to the right, the "Tracks". I suggest recording desktop AND microphone to track 1. This is what most players and services use by default and this is what you want most of the time. Then in track 2 record desktop only (so disable it for your microphone). In track 3 record only your mic (so disable it for desktop). Track 4 is not recorded, but you can uncheck it anyways.

Volume


Last but not least is tweaking the volumes. This is actually the hardest part. It's easier if you record the same thing all the time, but even then you'll need to tweak it the first time. I've read that as a rule of thumb you'll want desktop sound to be about 75% of your microphone level for commentary. Honestly I'm still experimenting with audio and microphone settings myself so I may not be the best person to advise you here. Either way, listen back to test recordings and make sure your voice is louder than the background sound and you should be good to go.

Hotkeys


I've learned it helps to at least have a shortcut button to start/stop recording. In my case, some games use fullscreen mode and the window collapses if you force another window (OBS) to focus. Additionally this prevents those awkward mouse moves to click the stop record button. A macro makes this trivial. You do have to take care not to stop recording too quickly after your last words or they may be truncated. You'll get used to that. You can set up these and other macros in the "Hotkeys" tab in the settings. They work pretty solidly for me.

Record and upload


Note that at this point your microphone bar in the main window should be jumpy when you speak and the desktop audio bar should be jumpy if your computer is making any kind of noise, like playing background music. If either is not the case (or both), you're probably not going to record the sounds you want. I can't help you here. Also, OBS should show a live preview of the target window, including mouse movement, so if that's not the case you'll need to fix that first as well. Don't forget to peek at OBS while recording every once in a while. You won't like it to discover after an hour that OBS is not recording at all, or that your video isn't targeting the right window.

Now. Record a movie. Tweak the audio. And play it back. Make sure the levels are proper or correct them otherwise. Repeat until satisfied. Remember to make sure OBS targets the proper window after testing (it never updates for me so this is a manual step for me). Then record your video. I'll wait.

Finished? Cool. You should now have an "mkv" file of some mega or even gigabytes. Open it, check if it's what you expect. Then upload the video if you're fine with it. Below are some post processing steps you can do to the video if needed. We can also reduce the size of the file first in case upload speed/size is a concern for you.

Post processing


I've identified three generic use cases for simple post processing, where you don't really edit the video visually but only adjust some of the audio. On top of that I have one special use case for myself that I'll explain.

Aside from reducing the file size of the video for uploading, I guess the biggest use case is re-balancing the volume of the desktop and microphone tracks. No matter how well you prepare, your target application may make more or less noise than you had anticipated. Sometimes you just need to adjust these levels. Luckily this can be done with the Swiss army knife called "ffmpeg" (ffmpeg.org).

ffmpeg terminology


Since we'll use ffmpeg let's start off with some clarifications. We're assuming the video setup above, so our input videos will have four tracks, three of which are audio. When processing below, we will discard the first audio track and only use the second (desktop) and third (microphone). In the command line args, these indexes start at zero. That means that the first track of the file, which is video, will be track 0. Track index 1 will be desktop+microphone, track index 2 just desktop, and track index 3 just the microphone.

We can ignore video, so we tell ffmpeg to simply copy the video data as is rather than re-encoding it. This is done with -c:v copy, which means something like "option codec, just for video tracks, for any such track just copy the existing data". The same would work with -c:a copy for audio, or -c copy for any kind of track, but we actually want to re-encode the audio so we do it just for video.

In the command line args we'll refer to tracks in a special way. The args will often repeat forms of 0:a:1, which means "first input file, of all audio tracks, pick the _second_ one" (note: second, because the first one is 0!). Similarly you can use v and some other symbols we won't use here. 3:2 would simply mean "of the fourth file, pick the third track while ignoring actual types when counting".
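Before relying on these indexes it may be worth checking what is actually in the file. The sketch below uses ffprobe, which ships with ffmpeg, to print each stream's global index, its type and its title tag if one was set. The file name in.mkv and the helper name list_streams are just placeholders for illustration, and the command only runs when ffprobe and the file are present.

```shell
#!/bin/sh
# Hypothetical input name; substitute your own recording.
input="in.mkv"

# Print one line per stream: global index, codec type (video/audio)
# and the title tag, if the container carries one.
list_streams() {
    ffprobe -v error \
        -show_entries stream=index,codec_type:stream_tags=title \
        -of csv=p=0 "$1"
}

# Only run when ffprobe is installed and the file exists.
if command -v ffprobe >/dev/null 2>&1 && [ -f "$input" ]; then
    list_streams "$input"
fi
```

With the setup described earlier you would expect one video line followed by three audio lines. The second audio line is what 0:a:1 refers to.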

Merging audio


We will need the -filter_complex option because that's currently the only way to merge two audio tracks into one, apparently. It's also a very powerful sub-tool for manipulating individual tracks inside the file. Generally the format is -filter_complex "input-filter-alias; input-filter-alias; ...; input-filter[-alias]". Each rule is closed with a semi-colon except for the last. There's more to this syntax but I'm omitting the parts we don't need; for example, I omit the output alias on the last rule because its result then implicitly goes to the output file. A rule applies to a track, which may have a filter applied, and then defines an alias for the result. Something like [0:a:1]volume=0.3[MIC] means: pick the second audio track of the first input file, scale its volume to 0.3 times the original (so roughly a third), and make an alias [MIC] for the result.

To merge two audio tracks you can use amix or amerge. As far as I can tell from the docs, amerge stacks the channels of its inputs into one multi-channel stream (two stereo inputs become four channels, which is why it pairs well with pan), while amix mixes the inputs down into a single stream. Truth be told, I didn't really hear the difference in my trials and I did not investigate them thoroughly. Both take two inputs by default, though they can take more than two as long as you tell them so. The default form is [0:a:0][0:a:1]amerge. The inputs can be of arbitrary origin, so whether they come from the same file or from various files is not relevant. To merge more than two tracks use inputs=N like so: [A][B][C][D]amerge=inputs=4.

There are various filters you can use. Search the docs for details. I'll touch on two here: pan and volume. volume is the easiest; just tell ffmpeg how loud you want your track to be. The value for volume can be a relative factor, or given in dB (and maybe more) if that makes more sense to you.
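To relate the two notations: a relative factor and a dB value describe the same thing, with dB = 20 * log10(factor). Here is a small sketch of that conversion using awk. The helper name ratio_to_db is made up for this example.

```shell
#!/bin/sh
# Convert a linear volume factor to decibels: dB = 20 * log10(factor).
# awk's log() is the natural logarithm, hence the division by log(10).
ratio_to_db() {
    awk -v r="$1" 'BEGIN { printf "%.2f", 20 * log(r) / log(10) }'
}

half_db=$(ratio_to_db 0.5)
echo "volume=0.5 is roughly volume=${half_db}dB"
```

So halving the volume comes down to about -6 dB, and either form can be passed to the volume filter.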

pan


The pan filter allows you to do crazy stuff with channels of an audio track on a per-channel-basis. For example, "mono" has just one channel, "stereo" has two channels, and 5.1 has six of them. Most channels have predefined symbols like FL and FR, but I don't know all of them. You can swap them around, merge them, pan. In a way, pan subsumes volume. See this great resource for visual examples on channel manipulation. The syntax looks something like

Code:
[SOURCEa][SOURCEb]amerge,pan=TYPE|OUTPUTCHANNELx=INPUTCHANNELa+INPUTCHANNELc|OUTPUTCHANNELy=INPUTCHANNELb+INPUTCHANNELd[ALIAS]

or as an actual example:

Code:
[0:a:1][0:a:2]amerge,pan=stereo|FL=c0+c2|FR=c1+c3[OUT]

The symbol to the left of the = in FL=c0+c2 is the output channel of the resulting audio stream, FL meaning "Front Left", which I think is an alias for c0. On the right hand side are symbols for channels relative to the input streams, so c0 means the left channel of the first input of the merge (not globally! just the merge) and c2 means the left channel of the second input stream of the merge. This assumes we are merging two stereo input streams, because the channel index simply (it seems) counts upward across the inputs, which is why c2 signifies the first, left, channel of the second stereo stream. I think the link explains it better so do check it out.
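The counting rule can be written down as a tiny helper. This is only a sketch, assuming every merged input is stereo. The name merged_channel is made up for illustration.

```shell
#!/bin/sh
# Channel label inside the merged stream, assuming all inputs are stereo.
# $1 = zero-based position of the input in the merge
# $2 = channel within that input (0 = left, 1 = right)
merged_channel() {
    echo "c$(( $1 * 2 + $2 ))"
}

merged_channel 0 0   # left channel of the first input
merged_channel 1 0   # left channel of the second input
merged_channel 1 1   # right channel of the second input
```

This prints c0, c2 and c3, matching the symbols used in the pan example.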

Examples


Note: for filters to take effect in ffmpeg, you need the target type (audio, video) to be re-encoded. If you prevent the re-encoding with -c copy or something similar, any filters are ignored or an error is raised. Basically, if it takes just a few seconds for ffmpeg to finish on your half hour video, you're not encoding anything and filters like volume and pan are ignored :) Copying is relatively fast, which is why we force copying the video in these examples (-c:v copy).

Okay, having said all that, let's get cracking. We use in.mkv as the input video and store the result in out.mkv, overwriting it without question (-y).

To remove the extra two tracks from the mkv to reduce file size:

Code:
# -map lets you pick specific input tracks to copy; output tracks follow the order of the -map options on the command line
ffmpeg -i in.mkv -map 0:0 -map 0:1 -c copy -y out.mkv

If you want to drop the "all" audio (0), and re-encode the desktop and microphone, and have desktop be 75% of microphone level:

Code:
ffmpeg -i in.mkv -c:v copy -filter_complex "[0:a:1][0:a:2]amerge,pan=stereo|c0<0.75*c0+c2|c1<0.75*c1+c3" -y out.mkv
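If you find yourself re-running this with different balances, the level can be pulled out into a variable. A minimal sketch, assuming the same track layout as above. The variable names and the file names are placeholders, and ffmpeg is only invoked when it is installed and the input exists.

```shell
#!/bin/sh
input="in.mkv"        # hypothetical file names
output="out.mkv"
desktop_gain="0.75"   # desktop level relative to the microphone

# After amerge, c0/c1 are the desktop channels and c2/c3 the microphone channels.
# The '<' form lets pan scale the gains down so the sum does not clip.
filter="[0:a:1][0:a:2]amerge,pan=stereo|c0<${desktop_gain}*c0+c2|c1<${desktop_gain}*c1+c3"
echo "$filter"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
    ffmpeg -i "$input" -c:v copy -filter_complex "$filter" -y "$output"
fi
```

Echoing the filter first makes it easy to eyeball the expression before spending time on a re-encode.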

Mute out a certain part of the background when YouTube puts down a copyright claim on it. Let's say the contested part is 5 minutes in and lasts for 1.5 minutes. We'll keep the microphone track as is but mute out part of the desktop track, then merge the two together as the only audio track of the result. Video stays as is of course. We'll use multiple steps; I think you can combine them but I tend to run into codec issues when trying to do that. So we extract the audio, store it temporarily, then merge everything back in:

Code:
# 300 = 5 * 60 = 5 min. 90 = 1.5 min. -af is the "audio filter" option
ffmpeg -i in.mkv -af "volume=enable='between(t,300,390)':volume=0" -map 0:a:1 -y bg-blipped.mp3
ffmpeg -i in.mkv -map 0:a:2 -y mic.mp3
ffmpeg -i in.mkv -i bg-blipped.mp3 -i mic.mp3 -c:v:0 copy -filter_complex "[1:a:0][2:a:0]amerge" -y out.mkv

# or we can omit the mic step:
ffmpeg -i in.mkv -af "volume=enable='between(t,300,390)':volume=0" -map 0:a:1 -y bg-blipped.mp3
ffmpeg -i in.mkv -i bg-blipped.mp3 -c:v:0 copy -ac 2 -filter_complex "[0:a:2][1]amix[out]" -map 0:v -map "[out]" -y out.mkv

# and as a one-liner (quote [out] so the shell does not treat it as a glob pattern):
ffmpeg -i in.mkv -c:v:0 copy -filter_complex "[0:a:1]volume=enable='between(t,300,390)':volume=0[A]; [A][0:a:2]amix[out]" -map 0:v -map "[out]" -y out.mkv
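Copyright claims usually come with timestamps rather than second counts, so here is a hedged sketch that converts "M:SS" values and builds the one-liner's filter from them. The helper to_seconds and the variable names are invented for this example. The claim is assumed to run from 5:00 to 6:30 as in the text.

```shell
#!/bin/sh
input="in.mkv"     # hypothetical file names
output="out.mkv"

# Convert "M:SS" to seconds; awk reads "08" as decimal 8.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

start=$(to_seconds "5:00")
end=$(to_seconds "6:30")

# Silence the desktop track between start and end, then mix it with the mic.
filter="[0:a:1]volume=enable='between(t,${start},${end})':volume=0[bg];[bg][0:a:2]amix=inputs=2[out]"
echo "$filter"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
    ffmpeg -i "$input" -c:v copy -filter_complex "$filter" \
        -map 0:v -map "[out]" -y "$output"
fi
```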

If your microphone only recorded one channel in a stereo file you can remix that. It's probably lossy but better than having your voice in one ear only. This outputs a file with just one audio track, where the microphone is fixed to be stereo by copying right to left. The desktop audio is kept as is. The result is merged together:

Code:
# using -ac 1 you force mono output
ffmpeg -i in.mkv -map 0:a:2 -ac 1 -y mic.mp3
ffmpeg -i in.mkv -i mic.mp3 -c:v:0 copy -filter_complex "[0:a:1][1]amerge" -y out.mkv

# or using pan
# in my case, both channels did record something but one was significantly softer
# if you have a mute channel, you can simplify this a bit. left as an exercise to the reader ;)
ffmpeg -i in.mkv -c:v copy -filter_complex "[0:a:1][0:a:2]amerge,pan=stereo|c0=c0+c2+c3|c1=c1+c2+c3" -y out.mkv
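For the muted-channel case, one possible simplification is to apply pan to the microphone track alone and copy the healthy channel to both sides before mixing. This is a sketch under the assumption that the right channel (c1) is the good one, not something I've verified on every setup. The names are placeholders.

```shell
#!/bin/sh
input="in.mkv"     # hypothetical file names
output="out.mkv"

# Assumption: the right mic channel (c1) is the healthy one.
# Copy it to both output channels, then mix the result with the desktop track.
filter="[0:a:2]pan=stereo|c0=c1|c1=c1[mic];[0:a:1][mic]amix=inputs=2[out]"
echo "$filter"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
    ffmpeg -i "$input" -c:v copy -filter_complex "$filter" \
        -map 0:v -map "[out]" -y "$output"
fi
```

If the left channel is the healthy one instead, swap c1 for c0 on the right hand side of both assignments.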

There are many many many more things you can do with ffmpeg. Add watermarks or logos to the video. Replace audio tracks. Fade in and out. Montage intros and outros. It's really a versatile tool which I expect to be using more frequently as I get more acquainted with it myself.

Why


If you follow me you may be wondering what I'm using all this for anyways. In that case I've done my job well and will leave you to wonder about that for a little while longer :) Happy hunting, though it shouldn't be that hard.

(This is a very error prone subject. Feel free to report any issues to me...)