7.10. Making a high quality MPEG-4 ("DivX") rip of a DVD movie

One frequently asked question is "How do I make the highest quality rip for a given size?". Another question is "How do I make the highest quality DVD rip possible? I don't care about file size, I just want the best quality."

The latter question is perhaps at least somewhat wrongly posed. After all, if you don't care about file size, why not simply copy the entire MPEG-2 video stream from the DVD? Sure, your AVI will end up being 5GB, give or take, but if you want the best quality and don't care about size, this is certainly your best option.

In fact, the reason you want to transcode a DVD into MPEG-4 is specifically because you do care about file size.

It's difficult to offer a cookbook recipe on how to create a very high quality DVD rip. There are several factors to consider, and you should understand these details or else you're likely to end up disappointed with your results. Below we'll investigate some of these issues, and then have a look at an example. We assume you're using libavcodec to encode the video, although the theory applies to other codecs as well.

If this seems to be too much for you, you should probably use one of the many fine frontends that are listed in the MEncoder section of our related projects page. That way, you should be able to achieve high quality rips without too much thinking, because most of those tools are designed to make clever decisions for you.

7.10.1. Constant Quantizer vs. two pass

There are three approaches to encoding the video: constant bitrate (CBR), constant quantizer, and two pass (ABR, or average bitrate).

In each of these modes, libavcodec breaks the video frame into 16x16 pixel macroblocks and then applies a quantizer to each macroblock. The lower the quantizer, the better the quality and higher the bitrate. The method libavcodec uses to determine which quantizer to use for a given macroblock varies and is highly tunable. (This is an extreme over-simplification of the actual process, but the basic concept is useful to understand.)

When you specify a constant bitrate, libavcodec will encode the video, discarding detail as much as necessary and as little as possible in order to remain lower than the given bitrate. If you truly don't care about file size, you could as well use CBR and specify a bitrate of infinity. (In practice, this means a value high enough so that it poses no limit, like 10000Kbit.) With no real restriction on bitrate, the result is that libavcodec will use the lowest possible quantizer for each macroblock (as specified by vqmin, which is 2 by default). As soon as you specify a low enough bitrate that libavcodec is forced to use a higher quantizer, then you're almost certainly ruining the quality of your video. In order to avoid that, you should probably downscale your video, according to the method described later on in this guide. In general, you should avoid CBR altogether if you care about quality.

With constant quantizer, libavcodec uses the same quantizer, as specified by the vqscale option, on every macroblock. If you want the highest quality rip possible, again ignoring bitrate, you can use vqscale=2. This will yield the same bitrate and PSNR (peak signal-to-noise ratio) as CBR with vbitrate=infinity and the default vqmin of 2.

The problem with constant quantizing is that it uses the given quantizer whether the macroblock needs it or not. That is, it might be possible to use a higher quantizer on a macroblock without sacrificing visual quality. Why waste the bits on an unnecessarily low quantizer? Your CPU has as many cycles as there is time, but there's only so many bits on your hard disk.

With a two pass encode, the first pass will rip the movie as though it were CBR, but it will keep a log of properties for each frame. This data is then used during the second pass in order to make intelligent decisions about which quantizers to use. During fast action or low detail scenes, higher quantizers will likely be used, and during slow moving or high detail scenes, lower quantizers will be used.

If you use vqscale=2, then you're wasting bits. If you use vqscale=3, then you're not getting the highest quality rip. Suppose you rip a DVD at vqscale=3, and the result is 1800Kbit. If you do a two pass encode with vbitrate=1800, the resulting video will have higher quality for the same bitrate.

Since you're now convinced that two pass is the way to go, the real question now is what bitrate to use? The answer is that there's no single answer. Ideally you want to choose a bitrate that yields the best balance between quality and file size. This is going to vary depending on the source video.

If size doesn't matter, a good starting point for a very high quality rip is about 2000Kbit plus or minus 200Kbit. For fast action or high detail source video, or if you just have a very critical eye, you might decide on 2400 or 2600. For some DVDs, you might not notice a difference at 1400Kbit. It's a good idea to experiment with scenes at different bitrates to get a feel.

If you aim at a certain size, you will have to somehow calculate the bitrate. But before that, you need to know how much space you should reserve for the audio track(s), so you should rip those first. You can compute the bitrate with the following equation:

bitrate = (target_size_in_Mbytes - sound_size_in_Mbytes) * 1024 * 1024 / length_in_secs * 8 / 1000

For instance, to squeeze a two-hour movie onto a 702MB CD, with 60MB of audio track, the video bitrate will have to be:

(702 - 60) * 1024 * 1024 / (120*60) * 8 / 1000 = 747.98, or roughly 748kbps
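The bitrate equation is easy to wrap in a small shell helper. The following is only a sketch: the function name video_bitrate and its argument order are made up for this illustration and are not part of MPlayer or MEncoder. It uses awk for the floating-point arithmetic and truncates the result to whole kbit/s so that the estimate never overshoots the target size.

```shell
#!/bin/sh
# video_bitrate TARGET_MB AUDIO_MB LENGTH_SECS
# Hypothetical helper (not an MEncoder feature). Prints the video bitrate
# in kbit/s, truncated to an integer, using the equation from the text:
#   (target - audio) * 1024 * 1024 / length * 8 / 1000
video_bitrate() {
    awk -v target="$1" -v audio="$2" -v secs="$3" 'BEGIN {
        printf "%d\n", (target - audio) * 1024 * 1024 / secs * 8 / 1000
    }'
}

# Two-hour movie on a 702MB CD with 60MB reserved for audio.
video_bitrate 702 60 7200    # prints 747
```

The printed value can then be passed to vbitrate in the two pass encode.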

7.10.2. Constraints for efficient encoding

Due to the nature of MPEG-type compression, there are various constraints you should follow for maximal quality. MPEG splits the video up into 16x16 squares called macroblocks, each composed of 4 8x8 blocks of luma (intensity) information and two half-resolution 8x8 chroma (color) blocks (one for red-cyan axis and the other for the blue-yellow axis). Even if your movie width and height are not multiples of 16, the encoder will use enough 16x16 macroblocks to cover the whole picture area, and the extra space will go to waste. So in the interests of maximizing quality at a fixed filesize, it is a bad idea to use dimensions that are not multiples of 16.

Most DVDs also have some degree of black borders at the edges. Leaving these in place can hurt quality in several ways.

  1. MPEG-type compression is also highly dependent on frequency domain transformations, in particular the Discrete Cosine Transform (DCT), which is similar to the Fourier transform. This sort of encoding is efficient for representing patterns and smooth transitions, but it has a hard time with sharp edges. In order to encode them it must use many more bits, or else an artifact known as ringing will appear.

    The frequency transform (DCT) takes place separately on each macroblock (actually each block), so this problem only applies when the sharp edge is inside a block. If your black borders begin exactly at multiple-of-16 pixel boundaries, this is not a problem. However, the black borders on DVDs rarely come nicely aligned, so in practice you will always need to crop to avoid this penalty.

In addition to frequency domain transforms, MPEG-type compression uses motion vectors to represent the change from one frame to the next. Motion vectors naturally work much less efficiently for new content coming in from the edges of the picture, because it is not present in the previous frame. As long as the picture extends all the way to the edge of the encoded region, motion vectors have no problem with content moving out the edges of the picture. However, in the presence of black borders, there can be trouble:

  2. For each macroblock, MPEG-type compression stores a vector identifying which part of the previous frame should be copied into this macroblock as a base for predicting the next frame. Only the remaining differences need to be encoded. If a macroblock spans the edge of the picture and contains part of the black border, then motion vectors from other parts of the picture will overwrite the black border. This means that lots of bits must be spent either re-blackening the border that was overwritten, or (more likely) a motion vector won't be used at all and all the changes in this macroblock will have to be coded explicitly. Either way, encoding efficiency is greatly reduced.

    Again, this problem only applies if black borders do not line up on multiple-of-16 boundaries.

  3. Finally, suppose we have a macroblock in the interior of the picture, and an object is moving into this block from near the edge of the image. MPEG-type coding can't say "copy the part that's inside the picture but not the black border." So the black border will get copied inside too, and lots of bits will have to be spent encoding the part of the picture that's supposed to be there.

    If the picture runs all the way to the edge of the encoded area, MPEG has special optimizations to repeatedly copy the pixels at the edge of the picture when a motion vector comes from outside the encoded area. This feature becomes useless when the movie has black borders. Unlike problems 1 and 2, aligning the borders at multiples of 16 does not help here.

  4. Despite the borders being entirely black and never changing, there is at least a minimal amount of overhead involved in having more macroblocks.

For all of these reasons, it's recommended to fully crop black borders. Further, if there is an area of noise/distortion at the edge of the picture, cropping this will improve encoding efficiency as well. Videophile purists who want to preserve the original as close as possible may object to this cropping, but unless you plan to encode at constant quantizer, the quality you gain from cropping will considerably exceed the amount of information lost at the edges.

7.10.3. Cropping and Scaling

Native DVD resolution is 720x480 for NTSC, and 720x576 for PAL, but there's an aspect flag that specifies whether it's full-screen (4:3) or wide-screen (16:9). Many (if not most) widescreen DVDs are not strictly 16:9, and will be either 1.85:1 or 2.35:1 (CinemaScope). This means that there will be black bands in the video that will need to be cropped out.

MPlayer provides a crop detection filter that will determine the crop rectangle (-vf cropdetect). Because MPEG-4 uses 16x16 macroblocks, you'll want to make sure that each dimension of the video you're encoding is a multiple of 16 or else you will be degrading quality, especially at lower bitrates. You can do this by rounding the width and height of the crop rectangle down to the nearest multiple of 16. When cropping, you'll want to increase the y-offset by half the difference of the old and the new height so that the resulting video is taken from the center of the frame. And because of the way DVD video is sampled, make sure the offset is an even number. (In fact, as a rule, never use odd values for any parameter when you're cropping and scaling video.) If you're not comfortable throwing a few extra pixels away, you might prefer to scale the video instead. We'll look at this in our example below. You can actually let the cropdetect filter do all of the above for you, as it has an optional round parameter that is equal to 16 by default.
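The manual rounding and re-centering described above can be sketched in plain shell arithmetic. This is an illustration only: the function name align_crop is hypothetical, and the sketch assumes the offsets reported by cropdetect are already even. It rounds both dimensions down to a multiple of 16 and moves each offset by half of the removed pixels, rounded down to an even number.

```shell
#!/bin/sh
# align_crop W H X Y
# Hypothetical helper (not part of MPlayer). Rounds the crop width and
# height down to multiples of 16 and shifts the offsets so that the kept
# area stays roughly centered. Assumes X and Y are already even.
align_crop() {
    w=$1 h=$2 x=$3 y=$4
    new_w=$(( w / 16 * 16 ))
    new_h=$(( h / 16 * 16 ))
    # Half of the removed pixels, rounded down to an even number.
    dx=$(( (w - new_w) / 4 * 2 ))
    dy=$(( (h - new_h) / 4 * 2 ))
    echo "crop=${new_w}:${new_h}:$(( x + dx )):$(( y + dy ))"
}

# A 720x362 crop rectangle at offset 0:58.
align_crop 720 362 0 58    # prints crop=720:352:0:62
```

The output can be pasted after -vf to preview the adjusted rectangle with mplayer before encoding.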

Also, be careful about "half black" pixels at the edges. Make sure you crop these out too, or else you'll be wasting bits there that are better spent elsewhere.

After all is said and done, you'll probably end up with video whose pixels aren't quite 1.85:1 or 2.35:1, but rather something close to that. You could calculate the new aspect ratio manually, but MEncoder offers an option for libavcodec called autoaspect that will do this for you. Absolutely do not scale this video up in order to square the pixels unless you like to waste your hard disk space. Scaling should be done on playback, and the player will use the aspect stored in the AVI to determine the correct resolution. Unfortunately, not all players enforce this auto-scaling information, therefore you may still want to rescale.

First, you should compute the encoded aspect ratio:

ARc = (Wc x (ARa / PRdvd)) / Hc

where:

  • Wc and Hc are the width and height of the cropped video,

  • ARa is the displayed aspect ratio (4/3 or 16/9, as given by the DVD's aspect flag),

  • PRdvd is the pixel ratio of the DVD, which is equal to 1.25=(720/576) for PAL DVDs and 1.5=(720/480) for NTSC DVDs.

Then, you can compute the X and Y resolution, according to a certain Compression Quality (CQ) factor:

ResY = INT(SQRT( 1000*Bitrate/25/ARc/CQ )/16) * 16
ResX = INT( ResY * ARc / 16) * 16

Okay, but what is the CQ? The CQ represents the number of bits per pixel and per frame of the encode. Roughly speaking, the greater the CQ, the less the likelihood to see encoding artifacts. However, if you have a target size for your movie (1 or 2 CDs for instance), there's a limited total number of bits that you can spend; therefore it's necessary to find a good tradeoff between compressibility and quality.

The CQ depends both on the bitrate and the movie resolution. In order to raise the CQ, typically you'd downscale the movie, given that the bitrate is computed as a function of the target size and the length of the movie, which are constant. A CQ below 0.18 usually ends up in a very blocky picture, because there aren't enough bits to code the information of each macroblock (MPEG-4, like many other codecs, groups pixels by blocks of several pixels to compress the image; if there aren't enough bits, the edges of those blocks are visible). It's therefore wise to take a CQ ranging from 0.20 to 0.22 for a 1 CD rip, and 0.26-0.28 for 2 CDs.

Please take note that the CQ is just an indicative figure, as depending on the encoded content, a CQ of 0.18 may look just fine for a Bergman, contrary to a movie such as The Matrix, which contains many high-motion scenes. On the other hand, it's pointless to raise CQ higher than 0.30 as you'd be wasting bits without any noticeable quality gain.
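The aspect ratio and resolution formulas from this section can be combined into one small script. The following is a sketch under stated assumptions, not something MEncoder provides: the function name scaled_resolution and its argument order are invented for this example, and the frame rate is passed as a parameter where the formula in the text uses the fixed value 25. After rounding, it also prints the CQ that the chosen resolution actually yields, computed as bits per pixel per frame.

```shell
#!/bin/sh
# scaled_resolution WC HC ARA PRDVD BITRATE_KBIT FPS CQ
# Hypothetical helper. WC/HC: cropped width and height; ARA: displayed
# aspect ratio as a decimal (1.3333 or 1.7778); PRDVD: 1.25 for PAL,
# 1.5 for NTSC; FPS: output frame rate (the text's formula uses 25).
scaled_resolution() {
    awk -v wc="$1" -v hc="$2" -v ara="$3" -v pr="$4" \
        -v bitrate="$5" -v fps="$6" -v cq="$7" 'BEGIN {
        arc  = (wc * (ara / pr)) / hc
        resy = int(sqrt(1000 * bitrate / fps / arc / cq) / 16) * 16
        resx = int(resy * arc / 16) * 16
        # CQ actually obtained once both dimensions are rounded down.
        actual = 1000 * bitrate / (resx * resy * fps)
        printf "%dx%d cq=%.3f\n", resx, resy, actual
    }'
}

# NTSC 16:9 DVD cropped to 720x352, 748 kbit/s, 23.976 fps, target CQ 0.22.
scaled_resolution 720 352 1.7778 1.5 748 23.976 0.22    # prints 576x240 cq=0.226
```

Because both dimensions are rounded down to multiples of 16, the CQ obtained is slightly higher than the one requested.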

7.10.4. Audio

Audio is a much simpler problem to solve: if you care about quality, just leave it as is. Even AC3 5.1 streams are at most 448Kbit/s, and they're worth every bit. You might be tempted to transcode the audio to high quality Vorbis, but just because you don't have an A/V receiver for AC3 pass-through today doesn't mean you won't have one tomorrow. Future-proof your DVD rips by preserving the AC3 stream. You can keep the AC3 stream either by copying it directly into the video stream during the encoding, or by extracting it in order to mux it later into containers such as NUT or Matroska.

mplayer source_file.vob -aid 129 -dumpaudio -dumpfile sound.ac3
will dump into the file sound.ac3 the audio track number 129 from the file source_file.vob (NB: DVD VOB files usually use a different audio numbering, which means that the VOB audio track 129 is the 2nd audio track of the file).

But sometimes you truly have no choice but to further compress the sound so that more bits can be spent on the video. Most people choose to compress audio with either MP3 or Vorbis audio codecs. While the latter is a very space-efficient codec, MP3 is better supported by hardware players, although this trend is changing.

First of all, you will have to convert the DVD sound into a WAV file that the audio codec can use as input. For example:

mplayer source_file.vob -ao pcm:file=destination_sound.wav -vc dummy -aid 1 -vo null
will dump the second audio track from the file source_file.vob into the file destination_sound.wav.

You may want to normalize the sound before encoding, as DVD audio tracks are commonly recorded at low volumes. You can use the tool normalize for instance, which is available in most distributions. If you're using Windows, a tool such as BeSweet can do the same job.

You will then compress the audio in either Vorbis or MP3. For example:
oggenc -q1 destination_sound.wav
will encode destination_sound.wav with the encoding quality 1, which is roughly equivalent to 80Kb/s, and is the minimum quality at which you should encode if you care about quality. Please note that MEncoder currently cannot mux Vorbis audio tracks into the output file because it only supports AVI and MPEG containers as output, and these may lead to audio/video playback synchronization problems with some players when the file contains VBR audio streams such as Vorbis. Don't worry, this document will show you how you can do that with third party programs.

7.10.5. Interlacing and Telecine

Almost all movies are shot at 24 fps. Because NTSC is 30000/1001 fps, some processing must be done to this 24 fps video to make it run at the correct NTSC framerate. The process is called 3:2 pulldown, commonly referred to as telecine (because pulldown is often applied during the telecine process), and, naively described, it works by slowing the film down to 24000/1001 fps, and repeating every fourth frame.

No special processing, however, is done to the video for PAL DVDs, which run at 25 fps. (Technically, PAL can be telecined, called 2:2 pulldown, but this doesn't become an issue in practice.) The 24 fps film is simply played back at 25 fps. The result is that the movie runs slightly faster, but unless you're an alien, you probably won't notice the difference. Most PAL DVDs have pitch-corrected audio, so when they're played back at 25 fps things will sound right, even though the audio track (and hence the whole movie) has a running time that's 4% less than NTSC DVDs.

Because the video in a PAL DVD hasn't been altered, you needn't worry much about frame rate. The source is 25 fps, and your rip will be 25 fps. However, if you're ripping an NTSC DVD movie, you may need to apply inverse telecine.

For movies shot at 24 fps, the video on the NTSC DVD is either telecined 30000/1001, or else it is progressive 24000/1001 fps and intended to be telecined on-the-fly by a DVD player. On the other hand, TV series are usually only interlaced, not telecined. This is not a hard rule: some TV series are interlaced (such as Buffy the Vampire Slayer) whereas some are a mixture of progressive and interlaced (such as Angel, or 24).

It's highly recommended that you read the section on How to deal with telecine and interlacing in NTSC DVDs to learn how to handle the different possibilities.

However, if you're mostly just ripping movies, likely you're either dealing with 24 fps progressive or telecined video, in which case you can use the pullup filter -vf pullup,softskip.

7.10.6. Filtering

In general, you want to do as little filtering as possible to the movie in order to remain close to the original DVD source. Cropping is often necessary (as described above), but do not scale the video. Although scaling down is sometimes preferred to using higher quantizers, we want to avoid both these things: remember that we decided from the start to trade bits for quality.

Also, do not adjust gamma, contrast, brightness, etc. What looks good on your display may not look good on others. These adjustments should be done on playback only.

One thing you might want to do, however, is pass the video through a very light denoise filter, such as -vf hqdn3d=2:1:2. Again, it's a matter of putting those bits to better use: why waste them encoding noise when you can just add that noise back in during playback? Increasing the parameters for hqdn3d will further improve compressibility, but if you increase the values too much, you risk degrading the image visibly. The suggested values above (2:1:2) are quite conservative; you should feel free to experiment with higher values and observe the results for yourself.

7.10.7. Encoding options of libavcodec

Ideally, you'd probably want to be able to just tell the encoder to switch into "high quality" mode and move on. That would probably be nice, but unfortunately hard to implement as different encoding options yield different quality results depending on the source material. That's because compression depends on the visual properties of the video in question. For example, anime and live action have very different properties and thus require different options to obtain optimum encoding. The good news is that some options should never be left out, like mbd=2, trell, and v4mv. See below for a detailed description of common encoding options.

Options to adjust:

  • vmax_b_frames: 1 or 2 is good, depending on the movie. Note that libavcodec does not yet support closed GOP (the option cgop doesn't currently work), so DivX5 won't be able to decode anything encoded with B-frames.

  • vb_strategy=1: helps in high-motion scenes. Requires vmax_b_frames >= 2. On some videos, vmax_b_frames may hurt quality, but vmax_b_frames=2 along with vb_strategy=1 helps.

  • dia: motion search range. Bigger is better and slower. Negative values are a completely different scale. Good values are -1 for a fast encode, or 2-4 for slower.

  • predia: motion search pre-pass. Not as important as dia. Good values are 1 (default) to 4. Requires preme=2 to really be useful.

  • cmp, subcmp, precmp: Comparison function for motion estimation. Experiment with values of 0 (default), 2 (hadamard), 3 (dct), and 6 (rate distortion). 0 is fastest, and sufficient for precmp. For cmp and subcmp, 2 is good for anime, and 3 is good for live action. 6 may or may not be slightly better, but is slow.

  • last_pred: Number of motion predictors to take from the previous frame. 1-3 or so help at little speed cost. Higher values are slow for no extra gain.

  • cbp, mv0: Controls the selection of macroblocks. Small speed cost for small quality gain.

  • qprd: adaptive quantization based on the macroblock's complexity. May help or hurt depending on the video and other options. This can cause artifacts unless you set vqmax to some reasonably small value (6 is good, maybe as low as 4); vqmin=1 should also help.

  • qns: very slow, especially when combined with qprd. This option will make the encoder minimize noise due to compression artifacts instead of making the encoded video strictly match the source. Don't use this unless you've already tweaked everything else as far as it will go and the results still aren't good enough.

  • vqcomp: Tweak ratecontrol. What values are good depends on the movie. You can safely leave this alone if you want. Reducing vqcomp puts more bits on low-complexity scenes, increasing it puts them on high-complexity scenes (default: 0.5, range: 0-1. recommended range: 0.5-0.7).

  • vlelim, vcelim: Sets the single coefficient elimination threshold for luminance and chroma planes. These are encoded separately in all MPEG-like algorithms. The idea behind these options is to use some good heuristics to determine when the change in a block is less than the threshold you specify, and in such a case, to just encode the block as "no change". This saves bits and perhaps speeds up encoding. vlelim=-4 and vcelim=9 seem to be good for live movies, but seem not to help with anime; when encoding animation, you should probably leave them unchanged.

  • qpel: Quarter pixel motion estimation. MPEG-4 uses half pixel precision for its motion search by default, therefore this option comes with an overhead as more information will be stored in the encoded file. The compression gain/loss depends on the movie, but it's usually not very effective on anime. qpel always incurs a significant cost in CPU decode time (+20% in practice).

  • psnr: doesn't affect the actual encoding, but writes a log file giving the type/size/quality of each frame, and prints a summary of PSNR (Peak Signal to Noise Ratio) at the end.

Options not recommended to play with:

  • vme: The default is best.

  • lumi_mask, dark_mask: Psychovisual adaptive quantization. You don't want to play with those options if you care about quality. Reasonable values may be effective in your case, but be warned this is very subjective.

  • scplx_mask: Tries to prevent blocky artifacts, but postprocessing is better.

7.10.8. Example

So, you've just bought your shiny new copy of Harry Potter and the Chamber of Secrets (widescreen edition, of course), and you want to rip this DVD so that you can add it to your Home Theatre PC. This is a region 1 DVD, so it's NTSC. The example below will still apply to PAL, except you'll omit -ofps 24000/1001 (because the output framerate is the same as the input framerate), and of course the crop dimensions will be different.

After running mplayer dvd://1, we follow the process detailed in the section How to deal with telecine and interlacing in NTSC DVDs and discover that it's 24000/1001 fps progressive video, which means that we needn't use an inverse telecine filter, such as pullup or filmdint.

Next, we want to determine the appropriate crop rectangle, so we use the cropdetect filter:

mplayer dvd://1 -vf cropdetect
Make sure you seek to a fully filled frame (such as a bright scene), and you'll see in MPlayer's console output:
crop area: X: 0..719  Y: 57..419  (-vf crop=720:362:0:58)
We then play the movie back with this filter to test its correctness:
mplayer dvd://1 -vf crop=720:362:0:58
And we see that it looks perfectly fine. Next, we ensure the width and height are a multiple of 16. The width is fine, however the height is not. Since we didn't fail 7th grade math, we know that the nearest multiple of 16 lower than 362 is 352.

We could just use crop=720:352:0:58, but it'd be nice to take a little off the top and a little off the bottom so that we retain the center. We've shrunk the height by 10 pixels, but we don't want to increase the y-offset by 5 pixels since that's an odd number and will adversely affect quality. Instead, we'll increase the y-offset by 4 pixels:

mplayer dvd://1 -vf crop=720:352:0:62
Another reason to shave pixels from both the top and the bottom is that we ensure we've eliminated any half-black pixels if they exist. Note that if your video is telecined, make sure the pullup filter (or whichever inverse telecine filter you decide to use) appears in the filter chain before you crop. If it is interlaced, deinterlace before cropping. (If you choose to preserve the interlaced video, then make sure your vertical crop offset is a multiple of 4.)
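The ordering rule above (inverse telecine or deinterlacing first, then cropping) is easy to get wrong when a filter chain is assembled by hand. The following sketch builds the -vf argument in a fixed order. The function name build_vf is hypothetical and the denoise filter is placed last simply because that is where the encoding example in this section puts it. Empty arguments are skipped, so the same helper works for progressive sources.

```shell
#!/bin/sh
# build_vf IVTC CROP DENOISE
# Hypothetical helper. Joins the non-empty filters with commas, always in
# the order: inverse telecine (or deinterlacer), crop, denoise.
build_vf() {
    chain=""
    for f in "$1" "$2" "$3"; do
        [ -n "$f" ] || continue
        chain="${chain:+$chain,}$f"
    done
    echo "$chain"
}

# Telecined source: pullup runs before the crop.
build_vf "pullup,softskip" "crop=720:352:0:62" "hqdn3d=2:1:2"
# Progressive source: no inverse telecine filter needed.
build_vf "" "crop=720:352:0:62" "hqdn3d=2:1:2"
```

The resulting string is what would follow -vf on the mplayer or mencoder command line.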

If you're really concerned about losing those 10 pixels, you might prefer instead to scale the dimensions down to the nearest multiple of 16. The filter chain would look like:

-vf crop=720:362:0:58,scale=720:352
Scaling the video down like this will mean that some small amount of detail is lost, though it probably won't be perceptible. Scaling up will result in lower quality (unless you increase the bitrate). Cropping discards those pixels altogether. It's a tradeoff that you'll want to consider for each circumstance. For example, if the DVD video was made for television, you might want to avoid vertical scaling, since the line sampling corresponds to the way the content was originally recorded.

On inspection, we see that our movie has a fair bit of action and high amounts of detail, so we pick 2400Kbit for our bitrate.

We're now ready to do the two pass encode. Pass one:

mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \
-lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=1 \
-o Harry_Potter_2.avi
And pass two is the same, except that we specify vpass=2:
mencoder dvd://1 -ofps 24000/1001 -oac copy -vf crop=720:352:0:62,hqdn3d=2:1:2 -ovc lavc \
-lavcopts vcodec=mpeg4:vbitrate=2400:v4mv:mbd=2:trell:cmp=3:subcmp=3:mbcmp=3:autoaspect:vpass=2 \
-o Harry_Potter_2.avi

The options v4mv:mbd=2:trell will greatly increase the quality at the expense of encoding time. There's little reason to leave these options out when the primary goal is quality. The options cmp=3:subcmp=3:mbcmp=3 select a comparison function that yields higher quality than the defaults. You might try experimenting with this parameter (refer to the man page for the possible values) as different functions can have a large impact on quality depending on the source material. For example, if you find libavcodec produces too much blocky artifacting, you could try selecting the experimental NSSE as comparison function via *cmp=10.

For this movie, the resulting AVI will be 138 minutes long and nearly 3GB. And because you said that file size doesn't matter, this is a perfectly acceptable size. However, if you had wanted it smaller, you could try a lower bitrate. Increasing the bitrate has diminishing returns, so while we might clearly see an improvement from 1800Kbit to 2000Kbit, it might not be so noticeable above 2000Kbit. Feel free to experiment until you're happy.
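The size of the result can be estimated before encoding by running the bitrate equation backwards. The sketch below is illustrative only: the function name estimated_size_mb is made up, the audio bitrate of 448 kbit/s is an assumption (the maximum for AC3 5.1 mentioned in the audio section, since the example copies the audio), and container overhead is ignored. As in the bitrate equation, one MB is counted as 1024 * 1024 bytes.

```shell
#!/bin/sh
# estimated_size_mb VIDEO_KBIT AUDIO_KBIT MINUTES
# Hypothetical helper. Prints the estimated file size in MB
# (1 MB = 1024 * 1024 bytes), ignoring container overhead.
estimated_size_mb() {
    awk -v v="$1" -v a="$2" -v m="$3" 'BEGIN {
        printf "%.0f\n", (v + a) * 1000 / 8 * m * 60 / 1024 / 1024
    }'
}

# 2400 kbit/s video plus an assumed 448 kbit/s AC3 track, 138 minutes.
estimated_size_mb 2400 448 138    # prints 2811
```

About 2811 MB is roughly 2.95 billion bytes, which agrees with the "nearly 3GB" figure for this movie.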

Because we passed the source video through a denoise filter, you may want to add some of it back during playback. This, along with the spp post-processing filter, drastically improves the perception of quality and helps eliminate blocky artifacts in the video. With MPlayer's autoq option, you can vary the amount of post-processing done by the spp filter depending on available CPU. Also, at this point, you may want to apply gamma and/or color correction to best suit your display. For example:

mplayer Harry_Potter_2.avi -vf spp,noise=9ah:5ah,eq2=1.2 -autoq 3

7.10.9. Muxing

Now that you have encoded your video, you will most likely want to mux it with one or more audio tracks into a movie container, such as AVI, MPEG, Matroska or NUT. MEncoder is currently only able to output audio and video into MPEG and AVI container formats. For example:

mencoder -oac copy -ovc copy  -o output_movie.avi -audiofile input_audio.mp2 input_video.avi
This would merge the video file input_video.avi and the audio file input_audio.mp2 into the AVI file output_movie.avi. This command works with MPEG-1 layer I, II and III (more commonly known as MP3) audio, WAV and a few other audio formats too.

MEncoder features experimental support for libavformat, which is a library from the FFmpeg project that supports muxing and demuxing a variety of containers. For example:

mencoder -oac copy -ovc copy  -o output_movie.asf -audiofile input_audio.mp2 input_video.avi -of lavf -lavfopts format=asf
This will do the same thing as the previous example, except that the output container will be ASF. Please note that this support is highly experimental (but getting better every day), and will only work if you compiled MPlayer with the support for libavformat enabled (which means that a pre-packaged binary version will not work in most cases).

7.10.9.1. Limitations of the AVI container

Although it is the most widely-supported container format after MPEG-1, AVI also has some major drawbacks. Perhaps the most obvious is the overhead. For each chunk of the AVI file, 24 bytes are wasted on headers and index. This translates into a little over 5 MB per hour, or 1-2.5% overhead for a 700 MB movie. This may not seem like much, but it could mean the difference between being able to use 700 kbit/sec video or 714 kbit/sec, and every bit of quality counts.
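The per-hour overhead figure can be sanity-checked with a short calculation. This sketch rests on assumptions that the text does not state: that each video frame and each audio frame is stored in its own chunk, and that the audio is MP3 at 44.1 kHz (1152 samples per frame, about 38.28 frames per second). The function name avi_overhead_mb_per_hour is hypothetical, and the result is in units of one million bytes.

```shell
#!/bin/sh
# avi_overhead_mb_per_hour VIDEO_FPS AUDIO_CHUNKS_PER_SEC
# Hypothetical helper. Assumes one AVI chunk per video frame and one per
# audio frame, at 24 bytes of header and index overhead per chunk.
avi_overhead_mb_per_hour() {
    awk -v vfps="$1" -v aps="$2" 'BEGIN {
        printf "%.2f\n", (vfps + aps) * 24 * 3600 / 1000000
    }'
}

# 23.976 fps video plus 44.1 kHz MP3 audio (assumed one frame per chunk).
avi_overhead_mb_per_hour 23.976 38.28    # prints 5.38
```

Under these assumptions the estimate lands a little over 5 MB per hour, in line with the figure quoted above. Muxers that pack several audio frames into one chunk would produce less overhead.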

In addition to this gross inefficiency, AVI also has the following major limitations:

  1. Only fixed-fps content can be stored. This is particularly limiting if the original material you want to encode is mixed content, for example a mix of NTSC video and film material. Actually there are hacks that can be used to store mixed-framerate content in AVI, but they increase the (already huge) overhead fivefold or more and so are not practical.

  2. Audio in AVI files must be either constant-bitrate (CBR) or constant-framesize (i.e. all frames decode to the same number of samples). Unfortunately, the most efficient codec, Vorbis, does not meet either of these requirements. Therefore, if you plan to store your movie in AVI, you'll have to use a less efficient codec such as MP3 or AC3.

Having said all that, MEncoder does not currently support variable-fps output or Vorbis encoding. Therefore, you may not see these as limitations if MEncoder is the only tool you will be using to produce your encodes. However, it is possible to use MEncoder only for video encoding, and then use external tools to encode audio and mux it into another container format.

7.10.9.2. Muxing into the Matroska container

Matroska is a free, open standard container format, aiming to offer a lot of advanced features, which older containers like AVI cannot handle. For example, Matroska supports variable bitrate audio content (VBR), variable framerates (VFR), chapters, file attachments, error detection code (EDC) and modern A/V codecs like "Advanced Audio Coding" (AAC), "Vorbis" or "MPEG-4 AVC" (H.264), almost none of which AVI can handle.

The tools required to create Matroska files are collectively called mkvtoolnix, and are available for most Unix platforms as well as Windows. Because Matroska is an open standard you may find other tools that suit you better, but since mkvtoolnix is the most common, and is supported by the Matroska team itself, we will only cover its usage.

Probably the easiest way to get started with Matroska is to use MMG, the graphical frontend shipped with mkvtoolnix, and follow the guide to mkvmerge GUI (mmg).

You may also mux audio and video files using the command line:

mkvmerge -o output.mkv input_video.avi input_audio1.mp3 input_audio2.ac3
This would merge the video file input_video.avi and the two audio files input_audio1.mp3 and input_audio2.ac3 into the Matroska file output.mkv. Matroska, as mentioned earlier, is able to do much more than that, like multiple audio tracks (including fine-tuning of audio/video synchronization), chapters, subtitles, splitting, etc. Please refer to the documentation of those applications for more details.