
Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about. The articles will discuss one Unix program at a time. I'll try to write a good introduction to the tool and give as many examples as I can think of.
Before I start, I want to clarify one thing: why am I starting so many article series? The answer is that I want to write about many topics simultaneously and switch between them as I feel inspired.
The first post in this series is going to be about a not-so-well-known Unix program called Pipe Viewer, or pv for short. Pipe Viewer is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.
Pipe Viewer was written by Andrew Wood, an experienced Unix sysadmin. The homepage of the pv utility is here: pv utility.
If you feel like you are interested in this stuff, I suggest that you subscribe to my rss feed to receive my future posts automatically.
How to use pv?
Ok, let's start with some really easy examples and progress to more complicated ones.
Suppose that you had a file "access.log" that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let's say a gzip archive (.gz). The obvious way would be to do:
$ gzip -c access.log > access.log.gz
As the file is so huge (several gigabytes), you have no idea how long to wait. Will it finish soon? Or will it take another 30 mins?
By using pv you get a running estimate of how long it will take. Take a look at doing the same through pv:
$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59
Pipe viewer acts as "cat" here, except it also adds a progress bar. We can see that gzip processed 611MB of data in 11 seconds. It has processed 15% of all data and it will take 59 more seconds to finish.
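Conceptually, pv really is just cat with bookkeeping attached. Here's a toy sh sketch of that idea (pv_lite is a made-up name for illustration; real pv also shows rate, percentage, and ETA, and works on raw bytes rather than lines):

```shell
# pv_lite: copy stdin to stdout line by line, reporting a running
# byte count to stderr. This is the bare-bones idea behind pv:
# the data stream is untouched, only stderr gets the progress report.
pv_lite() {
    total=0
    while IFS= read -r line; do
        printf '%s\n' "$line"              # pass the data through unchanged
        total=$((total + ${#line} + 1))    # +1 accounts for the newline
        printf '\r%d bytes' "$total" >&2   # progress goes to stderr, not the pipe
    done
    printf '\n' >&2
}
```

Because the progress output goes to stderr, you can drop pv_lite into the middle of a pipeline without corrupting the data, e.g. `pv_lite < access.log | gzip > access.log.gz`.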
You may stick several pv processes in between. For example, you can time how fast the data is being read from the disk and how much data is gzip outputting:
$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
gzip:   34.5MB 0:00:15 [1.74MB/s] [  <=>  ]
Here we specified the "-N" parameter to pv to create a named stream. The "-c" parameter makes sure the output is not garbled by one pv process writing over the other.
This example shows that the "access.log" file is being read at a speed of 37.4MB/s while gzip is writing data at only 1.74MB/s. From those two numbers we can immediately estimate the compression ratio: 37.4/1.74 ≈ 21x!
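The arithmetic is just a division of the two throughput figures shown by the named pv streams. Done in awk, with the sample rates from the output above:

```shell
# 37.4 MB/s read from disk vs 1.74 MB/s written by gzip.
# These are instantaneous throughputs, so the result is an estimate
# of the compression ratio at that moment, not an exact measurement.
awk 'BEGIN { printf "%.1fx\n", 37.4 / 1.74 }'
```

This prints `21.5x`, which the article rounds to 21x.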
Notice how the gzip line does not show how much data is left or when it will finish. That's because the pv process after gzip has no idea how much data gzip will produce (it's just passing along the compressed output). The first pv process, however, knows how much data is left, because it's reading the input file.
Another similar example would be to pack the whole directory of files into a compressed tarball:
$ tar -czf - . | pv > out.tgz
117MB 0:00:55 [2.7MB/s] [>      ]
In this example pv shows just the output rate of the "tar -czf" command. That's not very interesting, and it tells us nothing about how much data is left. We need to tell pv the total size of the data being tarred, which is done this way:
$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
253MB 0:00:05 [46.7MB/s] [> ] 1% ETA 0:04:49
What happens here is we tell tar to create an archive ("-c") of all files in the current directory (".", recursively) and write it to stdout ("-f -"). Next we pass the total size to pv via "-s": the "du -sb . | awk '{print $1}'" command returns the number of bytes in the current directory, and that value becomes pv's "-s" argument. Finally we gzip the stream and write the result to the out.tgz file. This way pv knows how much data is left to process and shows us that it will take another 4 minutes 49 seconds to finish.
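A commenter below points out that `cut -f1` works in place of awk here, since du's output is tab-separated. A sketch of just the size computation (note that `du -sb` is GNU du; BSD and macOS du have no -b flag, so there you would use `du -sk` and multiply by 1024):

```shell
# Byte count of everything under the current directory,
# ready to feed to pv's -s option. du's output is tab-separated,
# so cut -f1 extracts the first (byte-count) field.
SIZE=$(du -sb . | cut -f1)
echo "$SIZE"
# then: tar -cf - . | pv -s "$SIZE" | gzip > out.tgz
```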
Another fine example is copying large amounts of data over network by using help of "nc" utility that I will write about some other time.
Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. The fastest way is to use tar and nc, and time the operation with pv.
# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xf -
That's it. All the files in /path/to/dir on computer A will get transferred to computer B, and you'll be able to see how fast the operation is going.
If you want the progress bar, you have to do the "pv -s $(...)" trick from the previous example (only on computer A).
Another fun example comes from my blog reader alexandru. He shows how to measure how fast the computer reads from /dev/zero:
$ pv /dev/zero > /dev/null
157GB 0:00:38 [4,17GB/s]
That's about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)
How to install pv?
If you're on Debian or Debian based system such as Ubuntu do the following:
$ sudo aptitude install pv
If you're on Fedora or Fedora based system such as CentOS do:
$ sudo yum install pv
If you're on Slackware, go to the pv homepage, download the pv-version.tar.gz archive and do:
$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && make && sudo make install
If you're a Mac user:
$ sudo port install pv
If you're an OpenSolaris user:
$ pfexec pkg install pv
If you're a Windows user on Cygwin:
$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install
The manual of the utility can be read by running man pv.
Have fun measuring your pipes with pv, and until next time!


Comments
i like this one:
$ pv /dev/zero > /dev/null
157GB 0:00:38 [4,17GB/s]
Hi, that's a pretty nifty tool, I've been using 'cwp' for more or less the same thing: http://www.ex-parrot.com/~chris/software.html
I think 'screen' is the biggest life changer.
i like this one:
similar to
$ dd if=/dev/zero of=/dev/null& pid=$!
$ kill -USR1 $pid; sleep 1; kill $pid
from dd manpage
This was really useful.... thanks. I was using the following dd command, but your solution is tidier:
dd if=/dev/zero bs=60M | pv -s 500G | dd of=/dev/null bs=60M

Haha, alexandru. That's a great example. :)
That pv thing rocks...;)
When output is TAB separated (like du's one) cut -f1 is a little shorter than awk '{print $1}' :)
Good advice, Charlie. Totally forgot about cut utility.
Very useful.
How about "cowsay" for your next article :-)
Useful utilities / shell idioms not everyone knows about:
lsof (for finding what process is holding that file)
watch
screen
xargs
which (for people with messy PATHs)
Using backticks in commands
Are there any performance ramifications of inserting this into the stream?
Not by much. I've done some profiling, and pv takes only about 1% of CPU time.
Wow, really useful tool, great post!
oog's is a good list (though I can't imagine anybody not knowing "which"). I also find many people don't know about pstree or od.
@Daniel Watkins, yes, obviously because there is one more process in the pipeline, but judging by Alexandru's example it won't slow things down by much.
Cool, nice utility! Heard about this post from http://arhuaco.org/pipe-viewer .
Great article! I've added "pv" to my sys admin toolkit and subscribed to your blog. Thanks!
Best,
Aleksey
You are right, that is a must have!
RT
grep, sed and cut will do everything you ever need. Well nearly. http://www.coldclimate.co.uk/2009/01/26/insomnia-unix-tools-txtr/
Before I found pv, I used "buffer -z 512K" to get pipe flow feedback. It doesn't have a concept of total size, so you end up doing the ETA calculations yourself.
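Doing those ETA calculations yourself is just arithmetic: (total − done) / rate. With the figures from the gzip example earlier in the article (611MB done at 58.3MB/s; the 4000MB total is a made-up number for the sketch):

```shell
# (total - done) / rate = seconds remaining.
# done and rate are taken from the article's pv output; total is invented.
awk 'BEGIN { done = 611; total = 4000; rate = 58.3
             printf "ETA %.0f s\n", (total - done) / rate }'
```

This prints `ETA 58 s`, close to the 0:00:59 that pv itself estimated.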
Yes, and one way to measure them is to chain several pv's together:

$ pv -c < /dev/zero | pv -c | pv -c | pv -c | pv -c > /dev/null

(buffer comparison included for curiosity's sake.)
Wow thanks for the great post. I've been a UNIX user for 12 years and although I've needed PV many times, I didn't know it existed. Very cool!
BTW it's in macports as well under "pv".
Alan
Joey Hess maintains moreutils, a collection of nifty Unixy tools.
Very nice - haven't come across this one before. Other potential subjects? How about bash history, searching and re-submitting via ! (eg, !! or ! : parameter via !$).
I'd like to read an article about more obscure utilities like 'od', 'nm' or 'objdump'. Do they have any use in general sys administration?
# uname
FreeBSD
# pwd
/usr/ports/sysutils/pipemeter
#
OMG lame. Uniz is shit. its all about DOS. Can you even play games on unix? more like lames anyway lol bc they are lame games (= lames)
Also, Dos is way more secure and it is the basis of unix, making it a more developed language. Even linus torvalds has admitted to using dos for all important work.
If that was sarcasm, yes it was funny :).
If that was not sarcasm, please. Get over it. There is stuff Unix does best and there is stuff Windows does best. Both do their stuff well and its all about personal taste and is subjective. However, what you're saying is completely and utterly wrong.
lol, hilarious dos is the foundation of unix
real admins don't do windows
I too used to bash unix, back before I learned how smart and powerful you feel when you can use something that the average bonehead can't. Linux isn't for everyone; you have to be of at least average intelligence to use it. I wonder why so many servers use unix/linux. I guess because it's so lame and almost never crashes. Hmmm, disk operating system.......... sounds old.
Not to be confused with pv, the phase vocoder.
As for your question, I think rename(1) isn't as well-known as it ought to be.
Good post, but I am just curious: why is this filed under UNIX utilities when you only mention Linux installation steps?
Hi,
I liked this, it looks very useful.
Thank you for doing this write up.
It's also available in OpenSolaris, if you have the "pending" repository setup:
Name: pv
Summary:
State: Installed
Authority: opensolaris-pending
Version: 1.1.4
Build Release: 5.11
Branch: 0.101
Packaging Date: Tue Nov 25 08:37:14 2008
Size: 56.53 kB
FMRI: pkg://opensolaris-pending/pv@1.1.4,5.11-0.101:20081125T083714Z
As a cygwin user, I installed pv from source code and tried the examples :-) . Nice tool!
MSBoob:
You suck at trolling
Unless I was metatrolled. To add to the discussion, someone mentioned lsof. Don't forget about 'fuser' :)
For the Mac users:
macbookair:~ sam$ sudo port install pv
Password:
---> Fetching pv
---> Attempting to fetch pv-1.1.4.tar.bz2 from http://voxel.dl.sourceforge.net/pipeviewer
---> Verifying checksum(s) for pv
---> Building pv
---> Staging pv into destroot
---> Installing pv @1.1.4_0
---> Activating pv @1.1.4_0
---> Cleaning pv
For cygwin users, compiling from source
pv is really helpful. Thanks Peter.
One more utility named jot (which can be used to print sequential or random data)
One of my post on jot is here:
http://unstableme.blogspot.com/2007/12/jot-print-sequential-or-random-data.html
Here's a simple script that I use to copy files with a progress bar. I call it vcp, inspired by a tool with this name.
#!/bin/sh
if (($#!=2)); then
    echo "usage:"
    echo -e "\t$0 SRC DST"
    exit 1
else
    pv "$1" > "$2/${1##*/}"
fi

Yes, it could be improved, especially in error handling. I also posted it to the BashFAQ, where you'll find other alternatives for this task.
Cheers,
redondos
I found pv a while back, the problem was, I kept forgetting to include it in the original slow command. After the gzip, disk image or what ever had been running for a while, I wanted a progress bar, but not to have to restart the operation.
A handy little function to solve this problem is:
rate () {
tailf $1 | pv > /dev/null
}
It tails the end of a file through pv. It won't give "how long to go" information, but it will show the throughput, and that can often be enough to figure out how long the operation will take.
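As an aside, tailf has since been deprecated and removed from util-linux, but plain tail -f does the same job here, so a portable version of the same function (same idea, only the tail command swapped) would be:

```shell
# rate: watch the throughput of a file that some other process is
# currently writing, by tailing its growth through pv.
rate() {
    tail -f "$1" | pv > /dev/null
}
```

Usage is identical: `rate /path/to/growing/file` shows the current write rate.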
Not really a utility, but the -exec flag of find(1) is incredibly useful, especially when you find yourself looping over a list of files and performing some operation on them in bash. This will search /foo/bar for vim swap files, and delete them:
find /foo/bar -name '*.swp' -exec rm '{}' ';'
@Alan:
Or, if you're zsh-enabled:
Wow, pv is really cool. Added to my arsenal.
This was quite an interesting and useful article, Peter - thanks.
Also, it looks like there is more good info in the many comments, which I must read.
Some points by me:
Like some other commenters above, I recommend xargs, lsof and fuser as useful tools. find and xargs together are a powerful combination - find lets you find all files under some directory tree that match some criteria, and xargs lets you execute a command on all those found files.
Of course the command executed can be a shell script, which means that many commands (in the script) can be executed on each of those files.
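A minimal, self-contained illustration of that find + xargs combination (the demo directory and filenames are invented for the example):

```shell
# Build a tiny tree, then delete every *.swp file under it.
# find collects the matches; xargs batches them into one rm invocation.
mkdir -p demo/sub
touch demo/a.swp demo/sub/b.swp demo/keep.txt

# -print0 and -0 pass NUL-separated names, so paths with spaces survive.
find demo -name '*.swp' -print0 | xargs -0 rm -f

find demo -name '*.swp'   # prints nothing: both .swp files are gone
ls demo/keep.txt          # the non-matching file is untouched
```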
Mig:
>I’d like to read an article about more obscure utilities like ‘od’, ‘nm’ or ‘objdump’. Do they have any use in general sys administration
od definitely is useful for both system administrators as well as developers and general users. A common use of it is to display the contents of a file in one of many possible formats like:
- as characters
- as bytes in octal or hexadecimal
- as words in decimal
- etc.
This is useful to:
- see what the file contains, particularly if you don't have a "native" viewer app for it.
- to view the contents of binary files
- also od can be combined with grep and other such tools in useful ways in a pipeline
nm is more of use for developers but can also be useful to system administrators, particularly if they have some developer knowledge/skills (and IMO most good sysadmins do have that).
One common use of it is to display / dump the names of the symbols defined in object files.
- Vasudev
PV is pretty awesome. When I wrote pipemeter, I wanted to add most of the features that pv already seems to have. And the design is even pretty good.
Still.. pipemeter *is* much smaller with the major features still covered.. ;-)
Thanks Peter, I had never heard of pv before. It seems very useful.
I think you should cover screen for your next instalment in the series.
As for the nc over the net command, you can sometimes get a significant performance boost by adding gzip. In most cases the network is the bottleneck, so a few CPU cycles are often worth it.
Also, haven't tried it myself, but I imagine adding -z to the two tar arguments would have the same effect.
What is an example to monitor progress of
tar -jxf linux-source.tar.bz2
Thanks
ThrobbitChevron,
Note that pv is a handy way to add ETA calculations to any task, even ones that don't otherwise involve data flowing through pipes. For example, I wrote a little Perl script to fix a database normalization issue, but it was a big database and I needed to know how long it was going to take to finish. Instead of adding a bunch of date/time logic to an otherwise simple script, I just had it print '.' for each row processed, and ran the output through pv -s $(sql 'select count(*) from table') > /dev/null
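The bookkeeping behind that trick is simple: emit exactly one byte per unit of work, and pv's -s option turns the byte count into a progress percentage and ETA. A sketch with the database swapped out for a fake workload (emit_progress and TOTAL are invented for the example; in real use the loop's output would go to `pv -s "$TOTAL" > /dev/null` instead of wc):

```shell
# One '.' of output per unit of work done is all pv needs for an ETA.
TOTAL=100
emit_progress() {
    i=0
    while [ "$i" -lt "$TOTAL" ]; do
        printf '.'        # in the real script: one dot per database row processed
        i=$((i + 1))
    done
}
emit_progress | wc -c     # counts the dots, confirming one byte per work item
```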
Nice one, a lot of helpful links and information :D
Have been looking for something like this in the past
pv sounds like the 'progress' utility in NetBSD
Cool utility!
Here is another cool one: hilite (google for hilite.c).
I have my 'make' command aliased to 'hilite make'.
All compilation errors are marked red!
(Assuming colors are enabled in the terminal)
In response to efk back in February:
# uname
FreeBSD
# pkg_add -r pv
:-)
Cool article, Peter. Please don't abandon this series!
My word, there's some really good and interesting material on this blog. Rgds Vince
How about a network speed test using netcat and pipe viewer? I talk about a simple example here: http://blog.jamieisaacs.com/archives/309
Awesome utility that I didn't know about, and a well-written article. Thanks!
How the hell did I not know about pv?!? Thanks a lot, this is definitely a fantastic tool.
The nice script from redondos works just fine for copying single files. To copy a bunch of files to a common destination directory - including the usage of wildcards - I took this script as a starting point and wrote this:
#! /bin/sh
if (($#<2)); then
    echo "usage:"
    echo -e "\t$0 SRC DST"
    exit 1
else
    for (( i=1; i < $#; i++ )); do
        pv "${@:${i}:1}" > "${!#}/${@:${i}:1}"
    done
fi

I kept the script name vcp used by redondos. An example for calling this script:
This will show the usual pv progress indicator for every file on a separate line.
I would like another further enhancement similar to the example given further above with a gzip process and a two-line display. In my case the upper line would indicate the progress of the whole copying batch while the lower line would show the progress of the current file - that means what is now produced by my actual script but it would have to stay fixed at the lower line. But I have no idea how to stick the for loop into a pv command, or vice versa ...
Ooops, there went something terribly wrong with the script code. I forgot to escape the < and > characters! Now here is the complete script.
#! /bin/sh
if (($#<2)); then
    echo "usage:"
    echo -e "\t$0 SRC DST"
    exit 1
else
    for (( i=1; i < $#; i++ )); do
        pv "${@:${i}:1}" > "${!#}/${@:${i}:1}"
    done
fi

Another great article! Read your blog backwards as I discovered it today: lsof, nc, and pv. Very informative and useful primers here. A unix utility you should know about is: flip - Converts ASCII files between Unix, MS-DOS/Windows, or Macintosh newline formats.
I think screen would be a great addition to the series.
Peter, thanks a lot for your article!
Thanks for the great post. This:
tar -czf - . | pv > out.tgz
was exactly what I needed.
One question, though. I'm running Linux under VMware, and in the output of the above command, kB/s alternates back and forth between 0 and a higher number, e.g. 93.1kB/s. I'm wondering why? Is it the VM, or simply a behavior of this usage of pv?
arbingersys, it's the VM. I had a similar issue under VMware.
Certainly makes sense. Thanks again for the articles.
I found a new way to use pv:
I've got a bunch of files (2000) being created by somebody else's program, and I wanted to monitor it with pv, but there's no point where pv could be inserted.
So, I got clever (I'm using tcsh):
In one window:
% start:
% find . -name '*.icm' | gawk '{printf "%5d\n",NR}' > a; comm -23 a b >> b; sleep 1; goto start
In another window:
% tail -f b | pv -s 12000 > /dev/null
The '12000' was calculated by hand (6 bytes*total number of files). And voila, a progress meter for a non-pipe!
To do this in bash, use a 'while' loop instead of using 'goto'.
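A sketch of that while-loop version in plain sh (the *.icm filenames, the a/b scratch files, and the one-second interval are all from the tcsh version above; gawk is replaced by plain awk; the loop is meant to run until killed):

```shell
# Poll once a second for newly appeared *.icm files, appending one
# fixed-width counter line per new file to b, which
# `tail -f b | pv -s 12000 > /dev/null` then turns into a progress meter.
poll_progress() {
    : > a; : > b            # start with empty scratch files
    while :; do
        find . -name '*.icm' | awk '{ printf "%5d\n", NR }' > a
        comm -23 a b >> b   # lines in a but not yet in b = new files
        sleep 1
    done
}
```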
Have fun!
Thanks for the examples and showing the pv(1) utility. I'd like to point out some details in your explanation which may not seem accurate IMHO:
This may not be correct: what the command is really telling you is how fast pv(1) is reading data from the file and writing to the pipe, not how fast gzip(1) will finish compressing the whole file. gzip(1) may still be buffering and chewing on those bytes for quite a bit longer.
Your second example goes in the right path IMHO:
The following statement may not be accurate either:
Dividing the current read/write speeds (throughputs) of these two processes may not reveal anything about how well the file will finally be compressed.
srcdir=$1
outfile=$2
tar -Ocf - $srcdir | pv -i 1 -w 50 -berps `du -bs $srcdir | awk '{print $1}'` | 7za a -si $outfile
Thanks for this article. I didn't know pv yet, and I know that it could be useful lot of times...
Sometimes, pv(1) doesn't help. For example, `tar cf - foo | bzip2 -9v > foo.tar.bz2': you don't know the size of the data that needs to pass down the pipe. But I sometimes find it handy to watch tar(1) open the files it reads: `strace -p $(pidof tar) -e trace=open'.
Well, strace is useful for many things but for what you do the -v parameter of tar is enough, you'll get the list of files added to / extracted from the archive...
The first argument to tar should be a function; either one of the letters Acdrtux, or one of the long function names. A function letter need not be prefixed with ``-'', and may be combined with other single-letter options.
There was this argument-generating tool that some person blogged about. He used it to test some drivers or another program related to electric circuits. It used patterns as its argument and generated an expanded list of arguments as output. For e.g. I1-10 would generate I1 I2 I3 ... I10 and so on. I tried to search for the webpage again but could not find it.
Would be great if someone could link to it.
# Copying a big file
pv srcfile | cat - > dstfile
# Measuring disk speed (including raided devices)
pv /dev/sda > /dev/null
pv /dev/md0 > /dev/null
I have /dev/md0 set up as a RAID0 device (using /dev/sda and /dev/sdb) and I did get twice the read speed of /dev/sda
Modern OS X: `brew install pv`