Merge audio and video from YouTube

Recently, YouTube has separated the video and audio streams of the content you download from it.

You might end up with two MP4 files: one for the video and the other for the audio.

An easy way to merge them is to use ffmpeg.

The command could be:

ffmpeg.exe -i pathOfAudio -i pathOfVideo -acodec copy -vcodec copy output.mp4

To make it easier to use, creating a batch file is handy:

@echo off
title YouTube Merger 20140127
echo FROM:
set /p audio=Drag your AUDIO here and press ENTER: 
set /p video=Drag your VIDEO here and press ENTER: 
rem Dragging a file into the console automatically quotes paths with spaces.
ffmpeg.exe -i %audio% -i %video% -acodec copy -vcodec copy mergedFile.mp4
echo DONE
echo File mergedFile.mp4 created.
pause
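On Linux or macOS, or for anything more elaborate than a batch file, the same command can be assembled in a small script. Here is a minimal Python sketch, assuming ffmpeg is on your PATH; the `build_merge_cmd` helper is my own name, not part of ffmpeg:

```python
import shlex
import subprocess  # only needed if you actually run the command

def build_merge_cmd(audio, video, out="mergedFile.mp4"):
    """Build the ffmpeg argument list that stream-copies (-acodec/-vcodec
    copy, i.e. no re-encoding) one audio and one video input into out."""
    return ["ffmpeg", "-i", audio, "-i", video,
            "-acodec", "copy", "-vcodec", "copy", out]

# Show the command; to actually run it (ffmpeg must be installed), use:
#   subprocess.run(build_merge_cmd("audio.m4a", "video.mp4"), check=True)
print(shlex.join(build_merge_cmd("audio.m4a", "video.mp4")))
```

Passing the command as a list (rather than one string) sidesteps quoting problems with paths that contain spaces.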

Some details below:

To extract audio from an MP4, the basic command is:

ffmpeg -i filename.mp4 filename.mp3

To choose the audio bitrate yourself and explicitly drop the video stream (-vn):

ffmpeg -i video.mp4 -b:a 192K -vn music.mp3
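The two extraction commands above differ only in whether a bitrate is forced. As a sketch, the pattern can be scripted like this (the `build_extract_cmd` helper is my own, not part of ffmpeg):

```python
def build_extract_cmd(src, dst, bitrate=None):
    """Build an ffmpeg command that drops the video stream (-vn) and
    writes only the audio to dst; bitrate like "192k" is optional."""
    cmd = ["ffmpeg", "-i", src, "-vn"]
    if bitrate:
        cmd += ["-b:a", bitrate]  # force the audio bitrate
    return cmd + [dst]

print(" ".join(build_extract_cmd("video.mp4", "music.mp3", "192k")))
# -> ffmpeg -i video.mp4 -vn -b:a 192k music.mp3
```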

A codec is only responsible for the video or audio part, and one or more codecs can be merged into a container. The container is responsible for keeping them together and this is also what you usually open up in your media player of choice.

As you can see, we’ll have to explain a few things here.

What is a codec?

A codec is short for encoder/decoder, which basically just means the following: data generated by an encoder can always be decoded by an appropriate decoder. This happens to be valid for video and audio, but you could also think of cryptography (an encoder needs an appropriate decoder to read an encrypted message).

Let’s take a historical look at things. Nowadays, when a video codec is specified, the institutions that take part in it usually only specify the syntax of the standard. For example, they will say: “The bitstream format has to be like this”, “The 0x810429AAB here will be translated into that”, et cetera. Often they supply a reference encoder and decoder, but how an encoder is then written to match such a format completely is up to manufacturers.

This is the reason why you will find so many encoders for the very same codec, and some of them even commercial.

A case example – H.264

Before we mix up terminology, let’s take an example. Consider the case for H.264. The standard’s name is H.264 – that’s not the name of the actual encoder. Mainconcept is a very good commercial encoder, whereas x264 is a free and open source one. Both claim to deliver good quality, of course.

The mere fact that you can optimize encoding makes for competition here. Both encoders will deliver a standardized bitstream that can always be decoded by an H.264-compliant decoder.

To summarize

So, all in all, let’s just say that an encoder will:

  • take video frames
  • produce a valid bitstream

The decoder will:

  • take that valid bitstream
  • reconstruct the video frames from it

That’s all!

Current codecs

These days, you will probably only find videos encoded with the codecs I mention here. Interestingly, almost all of them were created by the Moving Picture Experts Group (MPEG), with the help of some other joint effort groups.

The procedure is always the same: After they’ve created the draft standard, they will go ahead and let it be standardized by the ISO. This results in video codec standards we know today.

Note that “MPEG” can refer to both codecs and containers, as you will see below. This adds to the confusion, but just know that “MPEG” alone doesn’t mean anything; e.g. “I have a file in MPEG format” is ambiguous.


MPEG-2

MPEG-2 is quite old. Its first public release is from 1996. MPEG-2 video is mostly used for DVDs and TV broadcasting, e.g. DVB. If you watch TV, you’re most likely seeing MPEG-2 video. It offers high quality at the expense of large file sizes. It has been the main choice for video compression for many, many years now. Encoders are often embedded in hardware. The encoding scheme is fairly basic and therefore allows for fast encoding.

Modern applications demanded too much, though: at lower bitrates, its quality just isn’t good enough. While a satellite or cable transmission offered enough bitrate for high-quality MPEG-2 video, that wasn’t good enough for the internet and multimedia age, e.g. our smartphones on 3G cellular networks.

MPEG-2 videos are mostly found in an .MPG container.

MPEG-4 Part 2

This is probably the one used most to encode videos for the web (by which I mean encoding videos for others to download). It offers rather good quality at practical file sizes, which means you can burn a whole 90-minute movie onto a 600 MB CD (whereas with MPEG-2 you would have needed a DVD).

Its drawback is that the quality itself might not be good enough for some viewers, especially with high resolution material (e.g. 1080p).

Some encoders that output MPEG-4 Part 2 video are DivX, its open-source offshoot Xvid, and Nero Digital.

MPEG-4 Part 2 videos mostly come in an .AVI container, but .MP4 is also seen on rare occasions.

MPEG-4 Part 10

This is also known as MPEG-4 Advanced Video Coding (AVC) or H.264: This is the big boss, the number one codec today. It offers superb quality at small file sizes and therefore is perfectly suited for all kinds of video for the internet or mobile devices.

That was its main purpose, actually, which you can see from its history. Originally, it was called H.264 – a name that has nothing to do with MPEG. In fact, the ITU created it based on H.263 – both were meant for videoconferencing. The ITU and MPEG then joined efforts and created this new standard.

You will find H.264 in almost every modern application, from phones to camcorders; even on Blu-ray discs, video is now encoded in H.264. Its quality-to-file-size ratio is just so much better than that of MPEG-2 or even MPEG-4 Part 2. The main disadvantage is that it is very slow to encode, as it includes vast algorithmic improvements over older codecs.

Some encoders for it are: x264, Mainconcept, QuickTime. The videos mostly come in .MP4, .MKV or .MOV containers.

At the moment, the successor to H.264 (H.265, also known as HEVC) is being standardized, but it’ll take a while before it sees widespread use.

The on2 Codecs

Some proprietary codecs are developed not by a committee or institution, but rather a company. One of those was On2, and they produced quite a few codecs that are worth mentioning.

  • VP3 is a codec that On2 made open source. There is actually no specification; they only released the codec implementation, and that’s it. It however became the basis of the free and open source Theora codec by the Xiph.Org Foundation.
  • VP6 is used in Flash 8 by Macromedia/Adobe.
  • VP7 claimed to be better than H.264, but it hasn’t seen much usage.
  • VP8 was later turned by Google into WebM.
  • They also seem to have created the basis for Skype’s video protocol.

Windows Media Video

Microsoft has always tried to popularize its own video codecs, namely Windows Media Video. The first, WMV 7, was introduced in 1999 and closely resembles MPEG-4 Part 2. WMV 8 and 9 followed, with Microsoft releasing VC-1 as a standard in 2006, which is comparable to MPEG-4 Part 10.

It is mostly found in Microsoft-specific environments, e.g. Silverlight and HD-DVDs (before Blu-ray took over).

Official encoders only exist from Microsoft, e.g. Microsoft Expression Encoder.

Video Codecs over time

Here’s a nice graphic that I found on AppleInsider, which shows the proliferation of video codecs over the last few years:

[Image: Proliferation of video codecs]

What is a format (container)?

Until now we’ve only explained the raw “bitstream”, which is basically just really raw video data. You could actually go ahead and watch the video using such a raw bitstream. But in most cases that’s just not enough or not practical.

Therefore, you need to wrap the video in a container. There are several reasons why:

  • Maybe you want some audio along with the video
  • Maybe you want to skip to a certain part in the video (like, “go to 1:32:20.12”)
  • Both audio and video should be perfectly synchronized
  • The video might need to be transmitted over a reliable network and split into packets beforehand
  • The video might even be sent over a lossy network (like 3G) and split into packets beforehand

For all of those reasons, container formats were invented, some simple, some more advanced. What they all do is “wrap” the video bitstream into another bitstream.

A container will synchronize video and audio frames according to their Presentation Time Stamp (PTS), which makes sure they are displayed at exactly the same time. It would also take care of adding information for streaming servers, if necessary, so that a streaming server knows when to send which part of the file.
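To make the PTS idea concrete: a PTS is an integer counted in units of the stream’s time base, so turning it into a wall-clock time is a single multiplication. A small sketch (the helper name and example values are my own, for illustration only):

```python
from fractions import Fraction

def pts_to_seconds(pts, time_base):
    """Convert an integer presentation timestamp into seconds.

    time_base is the duration of one tick, e.g. Fraction(1, 90000)
    for the 90 kHz clock commonly used in MPEG containers."""
    return float(pts * time_base)

# A frame with PTS 180000 in a 90 kHz time base is shown at t = 2.0 s.
print(pts_to_seconds(180000, Fraction(1, 90000)))  # -> 2.0
```

A video frame and an audio frame that convert to the same number of seconds are meant to be presented together, which is exactly what the container’s synchronization amounts to.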

Let’s take a look at some popular containers.

Popular containers

You will find videos mostly wrapped in the following containers. There are other less popular ones as well, but as I said, mostly, it’s those:


AVI

Audio Video Interleave: this is the most basic container; it’s just there to interleave audio and video (the name says it all). It was written in 1992 and is still used today. It has quite a few disadvantages:

  • No aspect ratio information
  • Not all variable bitrate audio is supported
  • No good support for codecs that use predictive coding (like H.264)

All in all, we shouldn’t use AVI anymore, yet we do. Don’t ask me why.


MP4

MP4 is also known as MPEG-4 Part 14 and is based on the QuickTime file format. This is the go-to container for H.264 video, but it also wraps MPEG-4 Part 2 and MPEG-2. It has the advantage of offering vast metadata about a video, plus better and more stable support for predictive coding than AVI.

Interestingly, this container might also wrap audio only, which is why you’ll find so many .mp4 files that are not videos but rather AAC-encoded audio.

Note: The extension “m4v” is usually taken for raw MPEG-4 Part 2 bitstreams!


MKV

Matroska Video is a free and open file format that often wraps H.264 video nowadays.

The fact that it is not yet supported by all players (especially hardware ones like TVs or DLNA streaming devices) is a bit of a disadvantage. It sees more and more support these days, though. I guess in the future, MKV will be omnipresent on all devices, along with MP4.


OGG

The Ogg container is the container of choice for the Theora video codec (and the Vorbis audio codec), both also created by the Xiph.Org Foundation. It’s free and open, just like the codecs. One notable feature is additional XML metadata, which theoretically offers much more than other containers.


FLV

The Flash Video format was created by Adobe for use in its streaming applications.

FLV files can take VP6-encoded video (see above), or H.263 and H.264. These days you’ll usually find H.264 in FLV files, which also means that you can change the container from MP4 to FLV without the need to re-encode.
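Because the H.264 bitstream itself is untouched, that container swap is a plain stream copy. As a sketch, the command can be built like this (the helper name is my own; ffmpeg picks the target container from the output file extension):

```python
def build_remux_cmd(src, dst):
    """Copy every stream as-is (-c copy); only the container changes,
    e.g. MP4 -> FLV, so nothing is re-encoded and no quality is lost."""
    return ["ffmpeg", "-i", src, "-c", "copy", dst]

print(" ".join(build_remux_cmd("input.mp4", "output.flv")))
# -> ffmpeg -i input.mp4 -c copy output.flv
```

The same command works in the other direction (FLV to MP4), as long as the target container supports the codecs inside.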

Popular codecs and formats

Now, which of the following are codecs, which are file formats, and which are neither?

  • QuickTime MOV: .mov is the file extension for the QuickTime File Format, a container created by Apple. This container was later adapted for MP4. It can carry all kinds of codecs. QuickTime is actually a whole media framework; it doesn’t really specify any codec itself, as far as I know.
  • MPEG (1, 2, 3, 4): Standards defined by the Moving Picture Experts Group. See the sections above for details.
  • WMV: Windows Media Video. It’s actually a codec wrapped in an Advanced Systems Format container, which uses the .wmv extension again. Weird, but that’s the way it is.
  • FFmpeg: This is neither a codec nor a container. It is a suite of video tools that, among other things, converts between different codecs and containers. FFmpeg relies on the open source libavcodec and libavformat libraries for handling codecs and containers, respectively. Most video tools you find today are based on it.
  • AVC: Synonym for MPEG-4 Part 10 or H.264.
  • DivX: Another type of encoder for MPEG-4 Part 2 video.
  • Xvid: One type of encoder for MPEG-4 Part 2 video. It’s just the open source, free version of DivX, which of course led to some controversy.
  • H.264: Synonym for MPEG-4 Part 10 or AVC.

On a side note:

Am I even using the correct terminology?

I guess one would prefer to specifically use “codec” and “container” instead of “format” to avoid misunderstandings. A format can theoretically be anything, because both codecs and containers specify a format (i.e. how data should be represented).

That being said, the FFmpeg terminology would be to use “format” for the container. This is also because of the distinction between:

  • libavcodec, the library for encoding/decoding
  • libavformat, the library for the containers

Here is a good article.

