Last time I explained how YouTube videos can be downloaded with the gawk programming language by fetching the YouTube page where the video is displayed and working out how the Flash video player retrieves the FLV (Flash video) media file.
This time I’ll use the Perl programming language, which is my favorite language at the moment, and write a one-liner that downloads a YouTube video.
Instead of parsing the YouTube video page, let’s look at how an embedded YouTube video player on a third-party website gets the video.
Let’s go to this cool video and look at the embed HTML code:
For this video it looks as follows:
<object width="425" height="350"><param name="movie" value="http://www.youtube.com/v/qg1ckCkm8YI"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/qg1ckCkm8YI" type="application/x-shockwave-flash" wmode="transparent" width="425" height="350"></embed></object>
95% of this code is boring; the only interesting part is this URL:
http://www.youtube.com/v/qg1ckCkm8YI
Let’s load this in a browser. As we do, we get redirected to another URL:
http://www.youtube.com/jp.swf?video_id=qg1ckCkm8YI&eurl=&iurl=http://img.youtube.com/vi/qg1ckCkm8YI/default.jpg&t=OEgsToPDskJCPW5DvMKeM3srnQ5e0LSY
So far we have no information about how the Flash player will retrieve the video. The only thing we know is that ‘iurl’ stands for ‘image URL’ and is the location of the thumbnail image.
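To see what else is in that redirect URL, the query string can be split into its parameters. This is only a rough sketch: the helper name query_params is made up for illustration, and it does no percent-decoding and ignores repeated keys.

```perl
use strict;
use warnings;

# The redirect target shown above, copied from the article.
my $url = 'http://www.youtube.com/jp.swf?video_id=qg1ckCkm8YI&eurl=&iurl=http://img.youtube.com/vi/qg1ckCkm8YI/default.jpg&t=OEgsToPDskJCPW5DvMKeM3srnQ5e0LSY';

# Split a URL's query string into a name => value hash reference.
# Deliberately naive: no percent-decoding, last value wins on repeated keys.
sub query_params {
    my ($u) = @_;
    my ($query) = $u =~ /\?(.*)\z/s or return +{};
    my %p;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $p{$name} = defined $value ? $value : '';
    }
    return \%p;
}

my $params = query_params($url);
print "$_ = $params->{$_}\n" for sort keys %$params;
```

Besides ‘iurl’ this lists ‘video_id’, an empty ‘eurl’ and the ‘t’ value that becomes important later on.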
Let’s sniff the traffic again, this time with an excellent (though commercial) Internet Explorer plugin, ‘HttpWatch Professional’.
This plugin displays all the requests the browser makes, whether HTTP or HTTPS, and presents them in a readable way, which makes our job much quicker than using Ethereal.
The Firefox alternative to this tool is the Live HTTP Headers extension, which does basically the same as HttpWatch Professional, but its output takes more time to understand.
Here is what we see with HttpWatch Professional when we load the URL in the browser:
We see that to get the video, the browser first requested:
http://www.youtube.com/get_video?video_id=qg1ckCkm8YI&t=OEgsToPDskJ3bp4DEiMuxUmjx7oumUec&eurl=
then got redirected to:
http://cache.googlevideo.com/get_video?video_id=qg1ckCkm8YI
and then another time to:
http://74.125.13.83/get_video?video_id=qg1ckCkm8YI
This is exactly what we saw in the previous article on downloading videos with gawk!
Now let’s write a Perl one-liner that retrieves this video file!
What is a one-liner, you might ask? Well, my definition of a one-liner is a program you are willing to type out without saving it to disk.
First of all we will need some Perl packages (modules) that ease working with the HTTP protocol. Two widely used ones are available on Perl’s module archive (CPAN): LWP and WWW::Mechanize.
WWW::Mechanize is built on top of LWP, so let’s go to a higher level of abstraction and use this module.
The WWW::Mechanize package is not part of Perl’s core distribution, so you’ll have to install it.
To do that, type
perl -MCPAN -eshell
in your console and, when the CPAN shell appears, type
install WWW::Mechanize
to get the module installed.
If everything goes fine, CPAN will tell you that the module got installed.
I don’t want to go into the details of the Perl language again, nor into the details of the WWW::Mechanize package.
If you want to learn Perl, I recommend this article as a starter, these books and of course perldoc. Once you learn the basics you can quickly pick up the WWW::Mechanize package by reading the documentation and FAQ and trying the examples.
Now finally let’s write the one-liner. So what do we have to do?
First we have to retrieve
http://www.youtube.com/v/qg1ckCkm8YI
then follow the redirect (which WWW::Mechanize will do for us), then get the ‘t’ identifier from the query string, and finally request and save the output of
http://www.youtube.com/get_video?video_id=qg1ckCkm8YI&t=OEgsToPDskJ3bp4DEiMuxUmjx7oumUec&eurl=
That’s it!
So here is the final version which can probably be made even shorter:
perl -MWWW::Mechanize -e '$_ = shift; s#http://|www\.|youtube\.com/|watch\?|v=|##g; $m = WWW::Mechanize->new; ($t = $m->get("http://www.youtube.com/v/$_")->request->uri) =~ s/.*&t=(.+)/$1/; $m->get("http://www.youtube.com/get_video?video_id=$_&t=$t", ":content_file" => "$_.flv")'
A little longer than the usual one-liner, but it does the job nicely. To keep it short, there is no error checking!
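For comparison, here is roughly what the same flow might look like as a small script with the checks put back in. Treat it as a sketch under assumptions, not a tested downloader: the function names are made up, the 11-character ID pattern is only a guess based on the IDs seen in this article, and the URLs are the ones observed above. WWW::Mechanize is loaded only when a download is actually requested, and autocheck makes it die on HTTP errors.

```perl
use strict;
use warnings;

# Reduce a URL (or bare ID) to the video ID and reject anything that does
# not look like one. The 11-character shape is an assumption.
sub checked_id {
    my ($arg) = @_;
    die "usage: $0 <youtube-url-or-id>\n" unless defined $arg && length $arg;
    (my $id = $arg) =~ s#http://|www\.|youtube\.com/|watch\?|v=##g;
    die "'$arg' does not look like a video ID\n" unless $id =~ /\A[\w-]{11}\z/;
    return $id;
}

# Pull the 't' parameter out of the redirect target, dying if it is missing.
sub checked_t {
    my ($uri) = @_;
    my ($t) = $uri =~ /&t=([^&]+)/
        or die "no 't' parameter in $uri\n";
    return $t;
}

# Same two requests as the one-liner, with failures reported.
sub download {
    my ($arg) = @_;
    my $id = checked_id($arg);
    require WWW::Mechanize;                         # needed only for downloads
    my $m   = WWW::Mechanize->new(autocheck => 1);  # die on HTTP errors
    my $res = $m->get("http://www.youtube.com/v/$id");
    my $t   = checked_t($res->request->uri);
    $m->get("http://www.youtube.com/get_video?video_id=$id&t=$t",
            ':content_file' => "$id.flv");
    return "$id.flv";
}

print "saved ", download($ARGV[0]), "\n" if @ARGV;
```

Saved to a file, it would be run with the video URL or ID as its only argument, and it does nothing when started without one.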
To use this one-liner, just copy it to the command line and specify the URL of a YouTube video (or just the ID of the video, or a variation of the URL, such as one without ‘http://’). Like this:
perl -MWWW::Mechanize -e '...' http://www.youtube.com/watch?v=l69Vi5IDc0g
or just
perl -MWWW::Mechanize -e '...' l69Vi5IDc0g
Let’s spread this one-liner over multiple lines and see what it does, as it is not documented.
One could split it into multiple lines by hand, but that’s not what humans are for; let’s make Perl do it. By adding -MO=Deparse to the command line we get Perl’s regenerated source code (I added the line numbers myself):
use WWW::Mechanize;
1) $_ = shift @ARGV;
2) s[http://|www\.|youtube\.com/|watch\?|v=|][]g;
3) $m = 'WWW::Mechanize'->new;
4) ($t = $m->get("http://www.youtube.com/v/$_")->request->uri) =~ s/.*&t=(.+)/$1/;
5) $m->get("http://www.youtube.com/get_video?video_id=$_&t=$t", ':content_file', "$_.flv");
So our one-liner is actually 5 lines.
On line 1 we put the first element of @ARGV into the special variable $_ so we can take advantage of it and save some typing.
On line 2 we reduce the argument to just the video ID by removing the other parts of the URL one by one, so a user can specify the video URL in various formats like ‘www.youtube.com/watch?v=ID’, or just ‘youtube.com/watch?v=ID’, or just ‘v=ID’, or even just ‘ID’. The ID is left in the special $_ variable.
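The substitution from line 2 can be tried on its own. In this sketch it is wrapped in a function I named video_id just for the demonstration, and the trailing empty alternative from the original pattern is dropped because it only ever matches the empty string.

```perl
use strict;
use warnings;

# Same substitution as line 2 of the deparsed one-liner, wrapped in a function.
sub video_id {
    my ($arg) = @_;
    $arg =~ s#http://|www\.|youtube\.com/|watch\?|v=##g;
    return $arg;
}

# Each of these input forms should be reduced to the bare ID.
for my $input (
    'http://www.youtube.com/watch?v=l69Vi5IDc0g',
    'www.youtube.com/watch?v=l69Vi5IDc0g',
    'youtube.com/watch?v=l69Vi5IDc0g',
    'v=l69Vi5IDc0g',
    'l69Vi5IDc0g',
) {
    printf "%-45s -> %s\n", $input, video_id($input);
}
```

Note that anything after the ID, such as extra ‘&name=value’ parameters, is not removed by this pattern and would end up in the ID.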
On line 3 we create a WWW::Mechanize object we are going to use twice.
Line 4 needs more explanation because it does so much. First it retrieves the embedded video URL I talked about earlier. The server redirects us away, so we have to look at the URI of the last request. We save this location into the variable $t and then strip it down to the value of the ‘t’ parameter.
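One thing worth knowing about the substitution on line 4: if the redirect URL has no ‘&t=’ in it, nothing is replaced and $t silently keeps the whole URL. The sketch below compares that with a list-context capture, which returns undef instead. Both function names are made up, and the second URL is a hypothetical one without a ‘t’ parameter.

```perl
use strict;
use warnings;

# Redirect target from the article, and a hypothetical one lacking 't'.
my $with_t    = 'http://www.youtube.com/jp.swf?video_id=qg1ckCkm8YI&eurl=&iurl=http://img.youtube.com/vi/qg1ckCkm8YI/default.jpg&t=OEgsToPDskJCPW5DvMKeM3srnQ5e0LSY';
my $without_t = 'http://www.youtube.com/jp.swf?video_id=qg1ckCkm8YI';

# Approach used in the one-liner: copy the URL, then substitute in place.
sub t_by_substitution {
    my ($uri) = @_;
    (my $t = $uri) =~ s/.*&t=(.+)/$1/;
    return $t;    # the unchanged URL if '&t=' is absent
}

# Alternative: capture in list context; undef when there is no match.
sub t_by_capture {
    my ($uri) = @_;
    my ($t) = $uri =~ /&t=([^&]+)/;
    return $t;
}

for my $uri ($with_t, $without_t) {
    my $captured = t_by_capture($uri);
    print "substitution: ", t_by_substitution($uri), "\n";
    print "capture:      ", (defined $captured ? $captured : '(no match)'), "\n";
}
```

The capture form makes a missing ‘t’ easy to detect, at the cost of a few more characters in a one-liner.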
As a YouTube video is uniquely specified by two IDs, the video ID and the ‘t’ ID, on line 5 we retrieve the file and tell WWW::Mechanize to save the contents to the file ID.flv. WWW::Mechanize handles redirects for us, so everything should work. Indeed, I tested it and it worked.
Can you golf it shorter?
I golfed it a little myself, here is what I came up with:
perl -MWWW::Mechanize -e '$_ = shift; ($y, $i) = m#(http://www\.youtube\.com)/watch\?v=(.+)#; $m = WWW::Mechanize->new; ($t = $m->get("$y/v/$i")->request->uri) =~ s/.*&t=(.+)/$1/; $m->get("$y/get_video?video_id=$i&t=$t", ":content_file" => "$i.flv")'
To use this one-liner you must specify the full URL of the YouTube video, like this one:
http://www.youtube.com/watch?v=l69Vi5IDc0g
This one-liner saves the “http://www.youtube.com” string in variable $y and the ID of the video in variable $i. The $y comes in handy because we don’t have to write out the full YouTube URL; instead we use $y.
July 27th, 2007 at 7:34 am
All of the methods you used from WWW::Mechanize were inherited from LWP…
Here’s a first go at a golf, very similar to yours:
perl -MLWP -e '($y,$i)=shift=~/^(.+m)\/.+v=(.+)/;($m=LWP::UserAgent->new)->get("$y/get_video?video_id=$i&t=".($m->get("$y/v/$i")->request->uri=~/&t=(.+)/)[0],":content_file"=>"$i.flv")' 'http://www.youtube.com/watch?v=l69Vi5IDc0g'
July 27th, 2007 at 7:38 am
With some whitespace:
perl -MLWP -e '($y,$i) = shift =~ /^(.+m)\/.+v=(.+)/; ($m = LWP::UserAgent->new) ->get("$y/get_video?video_id=$i&t=" . ($m->get("$y/v/$i") ->request->uri =~ /&t=(.+)/)[0], ":content_file" => "$i.flv")'
July 27th, 2007 at 7:46 am
Sorry to spam, I’m new to the intertubes.
Third time’s the charm (perltidied):
perl -MLWP -e' ( $y, $i ) = shift =~ /^(.+m)\/.+v=(.+)/; ( $m = LWP::UserAgent->new )->get( "$y/get_video?video_id=$i&t=" . ( $m->get("$y/v/$i")->request->uri =~ /&t=(.+)/ )[0], ":content_file" => "$i.flv" ) '
July 27th, 2007 at 7:52 am
A little shorter:
perl -MWWW::Mechanize -e'$y="http://youtube.com";($i)=pop=~/\w+$/g;$m=new WWW::Mechanize;$m->get("$y/v/$i")->request->uri=~/&t=.+/;$m->get("$y/get_video?video_id=$i$&",":content_file"=>"$i.flv")'
July 27th, 2007 at 7:54 am
heh, same problem as Saldane. here it is with unnecessary \ns after semicolons:
perl -MWWW::Mechanize -e'$y="http://youtube.com"; ($i)=pop=~/\w+$/g; $m=new WWW::Mechanize; $m->get("$y/v/$i")->request->uri=~/&t=.+/; $m->get("$y/get_video?video_id=$i$&",":content_file"=>"$i.flv")'
July 27th, 2007 at 8:13 am
Intermediate Perl
http://www.flazx.com/ebook4407.php
July 27th, 2007 at 6:42 pm
nice.. and quoted for windows shell:
perl -MLWP -e"$y='http://youtube.com';($i)=pop=~/\w+$/g;($m=new LWP::UserAgent)->get(qq{$y/v/$i})->request->uri=~/&t=.+/;$m->get(qq{$y/get_video?video_id=$i$&},':content_file',$i.'.flv')" "l69Vi5IDc0g"
July 27th, 2007 at 6:54 pm
Sweet! Thanks for golfing
I noticed the comments do not look good at all. I will fix the design so that the code did not get cut off if it runs over the edge
July 28th, 2007 at 1:44 am
That is nice. A friend of mine always boasted about Perl and how good it is.
________________
http://www.FreeOpenMoko.com
July 31st, 2007 at 11:46 pm
[…] I said, we will be creating the tool in Perl programming language. In the previous post about YouTube I used the WWW::Mechanize package. I can tell you in advance that it will not work this time there […]
August 12th, 2007 at 3:29 am
I have to say, that I could not agree with you in 100% regarding o.us poetry, but it’s just my opinion, which could be wrong
August 12th, 2007 at 6:50 am
Daniel, what do you mean by ‘o.us poetry’?
August 15th, 2007 at 9:31 am
problem, video -CrLh0xR3FM causes problems
there is a 0xR in it - and perl recognizes this as unicode. Anyway you can quote it to take in the whole string?
September 14th, 2007 at 9:27 am
[…] See also Peteris’ excellent articles on Downloading YouTube Videos and Perls’ Special […]
November 4th, 2007 at 2:54 am
I put together the following korn script from your perl code… It downloads the video and converts it to a DVD-style MPEG. Good work; I hope others will find it useful!
#!/bin/ksh
set -e
if [ -z "$1" ]; then
    echo "Please supply quoted URL as argument."
    exit 1
fi
URL="$1"
FILE=`echo "$URL" | awk -F 'v=' '{print $2}'`
perl -MWWW::Mechanize -e '$_ = shift; ($y, $i) = m#(http://www\.youtube\.com)/watch\?v=(.+)#; $m = WWW::Mechanize->new; ($t = $m->get("$y/v/$i")->request->uri) =~ s/.*&t=(.+)/$1/; $m->get("$y/get_video?video_id=$i&t=$t", ":content_file" => "$i.flv")' "$URL"
mencoder -of mpeg -mpegopts format=dvd -ofps 30000/1001 -oac lavc -ovc lavc -srate 48000 -af lavcresample=48000 -vf scale=704:480,expand=720:480 -lavcopts acodec=ac3:abitrate=192:vcodec=mpeg2video:vrc_buf_size=1835:vrc_maxrate=9800:vbitrate=1856:keyint=18:aspect=4/3 -o "$FILE.mpeg" "$FILE.flv"
November 4th, 2007 at 6:26 am
Greg, thanks for the script
November 7th, 2007 at 11:07 am
I am not able to download youtube video using VBScript file. I am still getting the .dll error even though I opened IE and did the required change? Could you tell me why is that?
Thanks.
-Manish
November 7th, 2007 at 10:34 pm
Kankani, what .dll error?
November 14th, 2007 at 5:54 pm
[…] by Downloading YouTube videos with a Perl one-liner, I’ve put together a piece of code to do the same thing with Groovy. Not as succinct as Perl. […]
February 3rd, 2008 at 5:06 am
I’m not a programmer so I guess I’ll stick to How To Download YouTube Videos The Easy Way For Free.
April 1st, 2008 at 6:05 pm
Hi,
this is great and it works perfectly (I’m working on Windows). I have one question: is it possible to display the percentage completed, to know how long the download will take? If it could be done, that’d be awesome.
Thanks a lot
April 19th, 2008 at 9:48 am
After reading the comment from Vinniemc I thought of taking a stab at showing some kind of progress indicator.
For showing progress indicator I had to find a mechanism where LWP::UserAgent would call my function after it received each chunk of file. I was delighted to find in LWP::UserAgent’s perldoc that its possible to specify a call back method to LWP::UserAgent’s get() method via the special field name “:content_cb”. After trying unsuccessfully to use this call back functionality I went back to the LWP perldoc. On re-reading I found that its not possible to use the option “:content_file” & “:content_cb” at the same time !
After searching some more I found the lwp cookbook which has an example of manually processing http responses. So based on that I was able to hack up the progress indicator. Unfortunately it hardly qualifies as a one liner anymore! In my attempt to still make the script small it has become a little obfuscated and might be difficult to understand. So here is the code
perl -MLWP -e '$_ = shift; ($y, $i) = m#(http://www\.youtube\.com)/watch\?v=(.+)#; $m = LWP::UserAgent->new; ($t = $m->get("$y/v/$i")->request->uri) =~ s/.*&t=(.+)/$1/; open($fh,">$i.flv");binmode($fh);$t1=$t2=time;print "\n";$res = $m->request(HTTP::Request->new(GET => "$y/get_video?video_id=$i&t=$t"),sub { ($c,$res) = @_;$br += length($c);$t2 = time;if($t2 > $t1){if ($res->content_length) {printf STDERR "%d%% - ",100*$br/$res->content_length;$t1= $t2;}}print $fh $c;});close($fh);print "\n";' http://www.youtube.com/watch?v=l69Vi5IDc0g
July 3rd, 2008 at 3:49 am
I want to download videos and movies.
July 9th, 2008 at 6:55 am
Why Perl? You can even do it with bash!
And you also get to download the mp4 format and a download o-meter for free.
July 23rd, 2008 at 10:24 am
wow,perl is so strong.
August 3rd, 2008 at 6:34 pm
Very good..
It was very useful.
September 12th, 2008 at 5:12 am
Download videos from popular video sharing sites like youtube.com,
blip.tv, break.com, google video, ifilm.com, spike.com etc. Simply
copy the video url and paste it to the video url box at vidmaza.com
and click to download.Also vidmaza.com try to find different video
formats automaticly(If avaliable).
Tired of copying and pasting urls of video files, try vidmaza.com search
videos function, it searches entirely in youtube.com for videos according
to your search criteria.Simply choose the format of video at results page
and download.
September 21st, 2008 at 7:55 am
Trying to develop a one liner to download the video from CNBC video pages without success.
for example from:
http://www.cnbc.com/id/15840232?video=861445025&play=1
It generally plays an ad and then the video. Sometimes the ad is missing and it plays the video directly. No way to tell.
Any pointers ?
October 16th, 2008 at 7:58 pm
The price for HttpWatch starts from $300, there are some other good http analyzers. For example http debugger or fiddler.
October 24th, 2008 at 5:41 am
Hi, I tried to do the same thing using Ruby. Not the one liner though. Its a piece of code. It worked great initially as I was able to extract value of ‘t’ from the redirected url.
But now, after a few day, the script stops working. I investigated and found out that the redirected url which initially held the value of ‘t’ does not contain the value of ‘t’ anymore and hence the regex that was used to retrieve it was fetching nil.
How to go about it now? As value of t is unknown. (Or atleast I could not figure out a way to find it out)
October 24th, 2008 at 6:00 am
I am getting redirected to
If that helps. The redirection stops here. Earlier this kind url used to have the ‘t’ value. But it does not anymore.
October 25th, 2008 at 7:04 pm
Chirantan,
You’ll have to use tcpdump, then download the video using a browser with flash. When the browser views the video the output of tcpdump will show the final url.
December 7th, 2008 at 3:25 pm
Beautiful!!!
But how do you write this code for running it on your own pc, e.g. with ActivePerl?
January 11th, 2009 at 5:03 pm
Hi peter,The script Doesnt work with me.dont know why!!
it just run for 2 secs and end with nothing!!!.any clues?
February 4th, 2009 at 2:28 am
youtube must have changed its format since you posted this, since what was being returned did not have a t get variable. The t value was stored in the javascript of the page, so i used the following regex to get it:
which i found out after looking at a python script youtube-dl at http://www.arrakis.es/~rggi3/youtube-dl/ but while their downloader is very robust and object oriented, this one is quick and perl-like but gets the job done. Heres one that works as of february 2009:
use WWW::Mechanize;
use Number::Bytes::Human qw(format_bytes);
for (@ARGV) {
    s{http://|www\.|youtube\.com/|watch\?|v=|}{}g;
    $m = WWW::Mechanize->new;
    $m->get("http://www.youtube.com/watch?v=$_&gl=US&hl=en");
    ($t) = $m->content =~ /, "t": "([^"]+)"/;
    open $f, "> $_.flv";
    binmode $f;
    $m->get(
        "http://youtube.com/get_video?video_id=$_&t=$t",
        ':content_cb' => sub {
            ($c, $r) = @_;
            $b += length($c);
            if ($r->content_length) {
                printf STDERR "$_: %.2f%%: %s of %s \r", 100. * $b / $r->content_length, format_bytes($b), format_bytes($r->content_length);
            }
            print $f $c;
        });
    $b = 0;
    print "\n";
    close $f;
}
I combined some things that i liked from the previous posts (such as the callback).
March 23rd, 2009 at 7:25 pm
$_ = shift; ?
just say “shift;”
April 7th, 2009 at 10:34 am
what????
April 7th, 2009 at 10:35 am
what???
May 24th, 2009 at 5:26 pm
Thank you…
But i found “Zillatube” program download video quickly, and also easy to play those videos
too.
found it at http://www.zillatube.com
June 3rd, 2009 at 5:16 am
PS - If you have to use a browser with flash support, you might as well use a plugin like the author mentioned: e.g. LiveHTTPheaders for Firefox, which can handle regex and gives you a formatted URL. Unless you’re piping the tcpdump output into something else.
YouTube has made it easier to download directly. I think because so much effort has been put by users into making this easier, whether with Perl, Awk, Python, VB, whatever.
If someone publishes an article on how to get the URL without using a flash browser, then I think we will really see a change in how sites use flash. Because ultimately people just want the video; they don’t care how they get it. And not everyone across the globe has a fast internet connection. Streaming is great if you have the bandwidth for it. But not everyone in the world has the bandwidth.
June 5th, 2009 at 12:27 am
Addendum:
There is a Microsoft Research (command line) tool (actually it’s several) that is quite good for nicely formatted HTTP conversations: it’s called STRACE. It comes with an HTTP replay utility as well. You can also use a special wininet.dll with MSIE that MS Research released many years ago.
The tool we really need though is one that has all the ‘handshaking’ (redirections, javascript, swf, etc) capabilities of Mozilla coupled with built-in ‘livehttpheaders’ functions, and _nothing else_. No rendering engines and other bloat. This tool would get you the URL to the content. That is its sole job. Then you download the content with a tool that works (unlike the download managers of the browsers). And finally you play the content offline with a standalone tool that works on any media file (you should know which ones can do that from your own experience trying different ones).
Alas, the developers seem to think they can include all these steps in one application (“plugins”) and have it work, seamlessly… instead of following the UNIX way of letting each tool do its job and piping the task from one application to the next.
Advertising (after all we do need that right?) can be ‘attached’ to the content.
An example is the TED videos, where you see a BMW ad at the start of each video.
This has been said before by people smarter than me: all the html tags flowing through the ‘wire’ are largely unnecessary, and slow things down. Almost all value is in plain text and links to more text or to content. ‘Typesetting’ (e.g. html gimmicks) is only residual value.
June 5th, 2009 at 12:34 am
s/all the html tags/many of the html tags/
What I mean is that there is ‘useful’ markup and then there are ‘gimmicks’.