This article is part of the article series "Unix Utilities You Should Know About."

Hi all. I'm starting yet another article series here. This one is going to be about Unix utilities that you should know about. The articles will discuss one Unix program at a time. I'll try to write a good introduction to the tool and give as many examples as I can think of.

Before I start, I want to clarify one thing - Why am I starting so many article series? The answer is that I want to write about many topics simultaneously and switch between them as I feel inspired.

The first post in this series is going to be about a not-so-well-known Unix program called Pipe Viewer, or pv for short. Pipe Viewer is a terminal-based tool for monitoring the progress of data through a pipeline. It can be inserted into any normal pipeline between two processes to give a visual indication of how quickly data is passing through, how long it has taken, how near to completion it is, and an estimate of how long it will be until completion.

Pipe Viewer was written by Andrew Wood, an experienced Unix sysadmin. The homepage of the pv utility is here: pv utility.

If you're interested in this stuff, I suggest that you subscribe to my RSS feed to receive my future posts automatically.

How to use pv?

Ok, let's start with some really easy examples and progress to more complicated ones.

Suppose you have a file "access.log" that is a few gigabytes in size and contains web logs. You want to compress it into a smaller file, let's say a gzip archive (.gz). The obvious way would be to do:

$ gzip -c access.log > access.log.gz

As the file is so huge (several gigabytes), you have no idea how long you'll have to wait. Will it finish soon? Or will it take another 30 minutes?

By using pv you can estimate how long it will take. Take a look at the same operation run through pv:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

Pipe Viewer acts as "cat" here, except it also adds a progress bar. We can see that 611MB of data has been fed to gzip in 11 seconds. That's 15% of the file, and pv estimates it will take 59 more seconds to finish.
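
That also means you can use pv for a plain file copy with a progress bar (the destination path here is just an example):

$ pv access.log > /backup/access.log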

You can stick several pv processes into the same pipeline. For example, you can measure how fast the data is being read from disk and how much data gzip is outputting:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source:  760MB 0:00:15 [37.4MB/s] [=>     ] 19% ETA 0:01:02
  gzip: 34.5MB 0:00:15 [1.74MB/s] [  <=>  ]

Here we pass the "-N" parameter to pv to give each stream a name (shown as a prefix in the output). The "-c" parameter makes sure the output of one pv doesn't get garbled by the other writing over it.

This example shows that the "access.log" file is being read at a speed of 37.4MB/s while gzip is writing data at only 1.74MB/s. From these two rates we can estimate the compression ratio: roughly 37.4/1.74 ≈ 21x!

Notice that the gzip line does not show how much data is left or when it will finish. That's because the pv process after gzip has no idea how much data gzip will produce (it just passes along whatever compressed data comes out of gzip). The first pv process, however, knows how much data is left, because it's reading the file itself.
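
If you happen to know roughly how large the compressed output will be, you can give the second pv an explicit estimate with "-s" (a sketch; the 40M here is just a made-up guess, so the percentage and ETA are only as good as that guess):

$ pv -cN source access.log | gzip | pv -cN gzip -s 40M > access.log.gz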

Another similar example would be to pack the whole directory of files into a compressed tarball:

$ tar -czf - . | pv > out.tgz
 117MB 0:00:55 [2.7MB/s] [>         ]

In this example pv shows just the output rate of the "tar -czf" command. Not very interesting, and it doesn't tell us how much data is left. We need to tell pv the total size of the data we are tarring; it's done this way:

$ tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz
 253MB 0:00:05 [46.7MB/s] [>     ]  1% ETA 0:04:49

What happens here is that we tell tar to create ("-c") an archive of all files in the current directory "." (recursively) and write it to stdout ("-f -"). Next we pass the total size of all the files in the current directory to pv via the "-s" parameter. The "du -sb . | awk '{print $1}'" part returns the number of bytes in the current directory, and that number is what gets fed to "-s". Then we gzip the whole stream and write the result to the out.tgz file. This way pv knows how much data is still left to be processed and shows us that it will take another 4 minutes and 49 seconds to finish.
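
Since du separates its output fields with a tab, "cut -f1" is a slightly shorter way to extract the byte count (same result as the awk version):

$ tar -cf - . | pv -s $(du -sb . | cut -f1) | gzip > out.tgz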

Another fine example is copying large amounts of data over the network with the help of the "nc" utility, which I will write about some other time.

Suppose you have two computers A and B. You want to transfer a directory from A to B very quickly. The fastest way is to use tar and nc, and monitor the transfer with pv.

# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xf -

That's it. All the files in /path/to/dir on computer A will get transferred to computer B, and you'll be able to see how fast the operation is going.

If you want the progress bar, you have to do the "pv -s $(...)" trick from the previous example (only on computer A).
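
On computer A that would look something like this (the same "du -sb" trick as before; the estimate is slightly optimistic because tar adds its own headers):

# on computer A, with IP address 192.168.1.100
$ tar -cf - /path/to/dir | pv -s $(du -sb /path/to/dir | awk '{print $1}') | nc -l -p 6666 -q 5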

Another fun example comes from my blog reader alexandru. He shows how to measure how fast the computer reads from /dev/zero:

$ pv /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]
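
The same trick gives a rough sequential read-speed test for a raw disk (run it as root, only ever redirect the output to /dev/null, and substitute whatever block device you want to test for /dev/sda):

$ pv /dev/sda > /dev/null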

That's about it. I hope you enjoyed my examples and learned something new. I love explaining things and teaching! :)

How to install pv?

If you're on Debian or a Debian-based system such as Ubuntu, do the following:

$ sudo aptitude install pv

If you're on Fedora or a Fedora-based system such as CentOS, do:

$ sudo yum install pv

If you're on Slackware, go to the pv homepage, download the pv-version.tar.gz archive and do:

$ tar -zxf pv-version.tar.gz
$ cd pv-version
$ ./configure && make && sudo make install

If you're a Mac user with MacPorts:

$ sudo port install pv

If you're an OpenSolaris user:

$ pfexec pkg install pv

If you're a Windows user on Cygwin:

$ ./configure
$ export DESTDIR=/cygdrive/c/cygwin
$ make
$ make install

The manual page for the utility can be found here: man pv.

Have fun measuring your pipes with pv, and until next time!


Comments

alexandru Permalink
February 02, 2009, 14:59

i like this one:

$ pv < /dev/zero > /dev/null
157GB 0:00:38 [4,17GB/s]

February 02, 2009, 14:59

Hi, that's a pretty nifty tool. I've been using 'cwp' for more or less the same thing: http://www.ex-parrot.com/~chris/software.html

I think 'screen' is the biggest life changer.

alexandru Permalink
February 02, 2009, 15:00

i like this one:

$ pv < /dev/zero > /dev/null
 157GB 0:00:38 [4,17GB/s]
alexandru Permalink
October 25, 2011, 00:28

similar to

$ dd if=/dev/zero of=/dev/null& pid=$!
$ kill -USR1 $pid; sleep 1; kill $pid

from the dd manpage

February 02, 2009, 15:22

Haha, alexandru. That's a great example. :)

Charlie Permalink
February 02, 2009, 15:38

That pv thing rocks...;)

When the output is TAB-separated (like du's), cut -f1 is a little shorter than awk '{print $1}' :)

February 02, 2009, 15:59

Good advice, Charlie. Totally forgot about cut utility.

NickF Permalink
February 02, 2009, 16:26

Very useful.

How about "cowsay" for your next article :-)

oog robot Permalink
February 02, 2009, 17:14

Useful utilities / shell idioms not everyone knows about:

lsof (for finding what process is holding that file)
watch
screen
xargs
which (for people with messy PATHs)
Using backticks in commands

February 02, 2009, 17:15

Are there any performance ramifications of inserting this into the stream?

February 02, 2009, 17:21

Wow, really useful tool, great post!

February 02, 2009, 17:49

oog's is a good list (though I can't imagine anybody not knowing "which"). I also find many people don't know about pstree or od.

February 02, 2009, 18:15

@Daniel Watkins, yes, obviously, because there is one more process in the pipeline, but judging by Alexandru's example it won't slow things down by much.

February 02, 2009, 18:59

Cool, nice utility! Heard about this post from http://arhuaco.org/pipe-viewer .

February 02, 2009, 19:59

Great article! I've added "pv" to my sys admin toolkit and subscribed to your blog. Thanks!

Best,
Aleksey

David JHbesw Permalink
February 02, 2009, 20:04

You are right, that is a must have!

RT

February 02, 2009, 20:47

grep, sed and cut will do everything you ever need. Well nearly. http://www.coldclimate.co.uk/2009/01/26/insomnia-unix-tools-txtr/

February 02, 2009, 21:30

Before I found pv, I used "buffer -z 512K" to get pipe flow feedback. It doesn't have a concept of total size, so you end up doing the ETA calculations yourself.

Are there any performance ramifications of inserting this into the stream?

Yes, and one way to measure them is to chain several pv's together:

pv -c < /dev/zero | pv -c | pv -c | pv -c | pv -c > /dev/null
 
#    pv (MB/s)   buffer (MB/s)
1    4430        2950
2     514         580
3     271         296
4     206         213
5     201         197
6     162         157

(buffer comparison included for curiosity's sake.)

February 02, 2009, 21:33

... chain several pv’s together: pv -c < /dev/zero | pv -c | pv -c | pv -c | pv -c > /dev/null

February 02, 2009, 22:19

Wow thanks for the great post. I've been a UNIX user for 12 years and although I've needed PV many times, I didn't know it existed. Very cool!

BTW it's in macports as well under "pv".

Alan

skoob Permalink
February 02, 2009, 23:34
Eddy Permalink
February 03, 2009, 00:38

Very nice - haven't come across this one before. Other potential subjects? How about bash history, searching and re-submitting via ! (eg, !! or ! : parameter via !$).

Mig Permalink
February 03, 2009, 01:41

I'd like to read an article about more obscure utilities like 'od', 'nm' or 'objdump'. Do they have any use in general sys administration?

efk Permalink
February 03, 2009, 02:15

# uname
FreeBSD
# pwd
/usr/ports/sysutils/pipemeter
#

MS Bob Permalink
February 03, 2009, 02:42

OMG lame. Uniz is shit. its all about DOS. Can you even play games on unix? more like lames anyway lol bc they are lame games (= lames)

Also, Dos is way more secure and it is the basis of unix, making it a more developed language. Even linus torvalds has admitted to using dos for all important work.

DingDong Permalink
August 22, 2010, 19:46

If that was sarcasm, yes it was funny :).
If that was not sarcasm, please. Get over it. There is stuff Unix does best and there is stuff Windows does best. Both do their stuff well and it's all about personal taste and is subjective. However, what you're saying is completely and utterly wrong.

shazam Permalink
April 25, 2012, 02:38

lol, hilarious dos is the foundation of unix

real admins don't do windows

LinuxMidskillz Permalink
September 12, 2012, 16:40

I too used to bash unix back before I learned how smart and powerful you feel when you can use something that the average bonehead can't. Linux isn't for everyone; you have to be of at least average intelligence to use it. I wonder why so many servers use unix/linux. I guess because it's so lame and almost never crashes. Hmmm, disk operating system..........sounds old.

Tim Permalink
February 03, 2009, 02:55

Not to be confused with pv, the phase vocoder.

As for your question, I think rename(1) isn't as well-known as it ought to be.

Djinn Permalink
February 03, 2009, 02:58

Good post, but I am just curious: why is this filed under UNIX utilities when you only mention Linux installation steps?

Matthew Conolly Permalink
February 03, 2009, 03:30

Hi,
I liked this, it looks very useful.
Thank you for doing this write up.

sean Permalink
February 03, 2009, 04:01

It's also available in OpenSolaris, if you have the "pending" repository set up:

Name: pv
Summary:
State: Installed
Authority: opensolaris-pending
Version: 1.1.4
Build Release: 5.11
Branch: 0.101
Packaging Date: Tue Nov 25 08:37:14 2008
Size: 56.53 kB
FMRI: pkg://opensolaris-pending/pv@1.1.4,5.11-0.101:20081125T083714Z

February 03, 2009, 07:00

As a cygwin user, I installed pv from source code and tried the examples :-) . Nice tool!

Diggers Permalink
February 03, 2009, 07:31

MSBoob:
You suck at trolling

Diggers Permalink
February 03, 2009, 07:33

Unless I was metatrolled. To add to the discussion, someone mentioned lsof. Don't forget about 'fuser' :)

February 03, 2009, 08:20

For the Mac users:

macbookair:~ sam$ sudo port install pv
Password:
---> Fetching pv
---> Attempting to fetch pv-1.1.4.tar.bz2 from http://voxel.dl.sourceforge.net/pipeviewer
---> Verifying checksum(s) for pv
---> Building pv
---> Staging pv into destroot
---> Installing pv @1.1.4_0
---> Activating pv @1.1.4_0
---> Cleaning pv

klang Permalink
February 03, 2009, 09:28

For cygwin users, compiling from source

./configure
export DESTDIR=/cygdrive/c/cygwin
make
make install
February 03, 2009, 09:55

pv is really helpful. Thanks Peter.

One more utility named jot (which can be used to print sequential or random data)

One of my post on jot is here:

http://unstableme.blogspot.com/2007/12/jot-print-sequential-or-random-data.html

February 03, 2009, 12:54

Here's a simple script that I use to copy files with a progress bar. I call it vcp, inspired by a tool with this name.

#!/bin/bash

if (($#!=2)); then
	echo "usage:"
	echo -e "\t$0 SRC DST"
	exit 1
else	
	pv "$1" > "$2/${1##*/}"
fi

Yes, it could be improved, especially in error handling. I also posted it to the BashFAQ, where you'll find other alternatives for this task.

Cheers,
redondos

Jonathan Wright Permalink
February 03, 2009, 12:57

I found pv a while back; the problem was, I kept forgetting to include it in the original slow command. After the gzip, disk image or whatever had been running for a while, I wanted a progress bar, but not to have to restart the operation.

A handy little function to solve this problem is:

rate () {
	tailf "$1" | pv > /dev/null
}

It tails the end of a file through pv. It won't give "how long to go" information, but it will show the throughput, and that can often be enough to figure out how long the operation will take.

February 03, 2009, 16:34

Not really a utility, but the -exec flag of find(1) is incredibly useful, especially when you find yourself looping over a list of files and performing some operation on them in bash. This will search /foo/bar for vim swap files, and delete them:

find /foo/bar -name '*.swp' -exec rm '{}' ';'

February 03, 2009, 16:46

@Alan:

Or, if you're zsh-enabled:

rm /foo/bar/**/*.swp
February 03, 2009, 22:18

Wow, pv is really cool. Added to my arsenal.

February 03, 2009, 22:36

This was quite an interesting and useful article, Peteris - thanks.

Also, it looks like there is more good info in the many comments, which I must read.

Some points by me:

Like some other commenters above, I recommend xargs, lsof and fuser as useful tools. find and xargs together are a powerful combination - find lets you find all files under some directory tree that match some criteria, and xargs lets you execute a command on all those found files.

Of course the command executed can be a shell script, which means that many commands (in the script) can be executed on each of those files.
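
For instance (the path and pattern are just an illustration; -print0 and -0 keep filenames with spaces safe):

$ find /var/log -name '*.gz' -print0 | xargs -0 ls -lh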

Mig:

>I’d like to read an article about more obscure utilities like ‘od’, ‘nm’ or ‘objdump’. Do they have any use in general sys administration?

od definitely is useful for both system administrators as well as developers and general users. A common use of it is to display the contents of a file in one of many possible formats like:

- as characters
- as bytes in octal or hexadecimal
- as words in decimal
- etc.

This is useful to:

- see what the file contains, particularly if you don't have a "native" viewer app for it.

- to view the contents of binary files

- also od can be combined with grep and other such tools in useful ways in a pipeline
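
For instance, a quick way to look at the start of an arbitrary file as hex bytes plus their printable characters (just one of many possible invocations):

$ od -A x -t x1z /bin/ls | head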

nm is more of use for developers but can also be useful to system administrators, particularly if they have some developer knowledge/skills (and IMO most good sysadmins do have that).

One common use of it is to display / dump the names of the symbols defined in object files.
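
For example (the object file name is just a placeholder):

$ nm -g hello.o        # list the external (global) symbols defined in or referenced by hello.o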

- Vasudev

February 03, 2009, 23:53

PV is pretty awesome. When I wrote pipemeter, I wanted to add most of the features that pv already seems to have. And the design is even pretty good.

Still.. pipemeter *is* much smaller with the major features still covered.. ;-)

Marcos Lara Permalink
February 05, 2009, 00:52

Thanks Peter, I had never heard of pv before. It seems very useful.

I think you should cover screen for your next instalment in the series.

wjw Permalink
February 07, 2009, 00:30

As for the nc over the net command, you can sometimes get a significant performance boost by adding gzip. In most cases the network is the bottleneck, so a few CPU cycles are often worth it.

# on computer A, with IP address 192.168.1.100
 $ tar -cf - /path/to/dir | pv | gzip | nc -l -p 6666 -q 5 
# on computer B
 $ nc 192.168.1.100 6666 | gunzip | pv | tar -xf - 

Also, I haven't tried it myself, but I imagine adding -z to the two tar arguments would have the same effect.
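
Spelled out, that -z variant would be (untested, as noted; pv then measures compressed bytes on both ends):

# on computer A, with IP address 192.168.1.100
$ tar -czf - /path/to/dir | pv | nc -l -p 6666 -q 5
# on computer B
$ nc 192.168.1.100 6666 | pv | tar -xzf -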

ThrobbitChevron Permalink
February 09, 2009, 06:47

What is an example to monitor progress of

tar -jxf linux-source.tar.bz2

Thanks

February 09, 2009, 11:46

ThrobbitChevron,

pv linux-source.tar.bz2 | tar -jxf -
February 13, 2009, 20:22

Note that pv is a handy way to add ETA calculations to any task, even ones that don't otherwise involve data flowing through pipes. For example, I wrote a little Perl script to fix a database normalization issue, but it was a big database and I needed to know how long it was going to take to finish. Instead of adding a bunch of date/time logic to an otherwise simple script, I just had it print '.' for each row processed, and ran the output through pv -s $(sql 'select count(*) from table') > /dev/null
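
In shell terms the idea looks roughly like this (fix_rows.pl and the mysql invocation are just placeholders for whatever prints the dots and whatever returns the row count; since each '.' is one byte, pv's byte counter tracks the number of rows processed):

$ ./fix_rows.pl | pv -s $(mysql -N -e 'SELECT COUNT(*) FROM mydb.mytable') > /dev/null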

May 21, 2009, 11:06

Nice one, a lot of helpful links and information :D

Have been looking for something like this in the past

argv Permalink
June 01, 2009, 07:29

pv sounds like the 'progress' utility in NetBSD

Christoph Permalink
June 22, 2009, 14:14

Cool utility!

Here is another cool one: hilite (google for hilite.c).
I have my 'make' command aliased to 'hilite make'.

All compilation errors are marked red!
(Assuming colors are enabled in the terminal)

CorkyAgain Permalink
July 13, 2009, 22:05

In response to efk back in February:

# uname
FreeBSD
# pkg_add -r pv

:-)

Cool article, Peteris. Please don't abandon this series!

July 14, 2009, 16:19

My word, there's some really good and interesting material on this blog. Rgds Vince

August 28, 2009, 21:18

How about a network speed test using netcat and pipe viewer? I talk about a simple example here: http://blog.jamieisaacs.com/archives/309

November 18, 2009, 15:28

Awesome utility that I didn't know about, and a well-written article. Thanks!

December 13, 2009, 14:17

How the hell did I not know about pv?!? Thanks a lot, this is definitely a fantastic tool.

bkant Permalink
December 23, 2009, 23:44

The nice script from redondos works just fine for copying single files. To copy a bunch of files to a common destination directory - including the usage of wildcards - I took this script as a starting point and wrote this:

#!/bin/bash
if (($#<2)); then
	echo "usage:"
	echo -e "\t$0 SRC DST"
	exit 1
else
	for (( i=1; i < $#; i++ ))
	do
		pv "${@:${i}:1}" > "${!#}/${@:${i}:1}"
	done
fi

I kept the script name vcp used by redondos. An example for calling this script:

vcp bunch-of-files*.rar /destination/dir

This will show the usual pv progress indicator for every file on a separate line.

I would like a further enhancement, similar to the two-line gzip example further above. In my case the upper line would indicate the progress of the whole copying batch, while the lower line would show the progress of the current file - that is, what my current script already produces, except it would have to stay fixed on the lower line. But I have no idea how to stick the for loop into a pv command, or vice versa ...
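
One way to get at least an overall bar for the whole batch is to let tar stream everything through a single pv (a sketch; it trades the per-file lines for one total line, and the ETA is slightly optimistic because tar adds header blocks):

$ tar -cf - bunch-of-files*.rar | pv -s $(du -cb bunch-of-files*.rar | tail -1 | cut -f1) | tar -xf - -C /destination/dir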

bkant Permalink
December 23, 2009, 23:50

Ooops, there went something terribly wrong with the script code. I forgot to escape the < and > characters! Now here is the complete script.

#!/bin/bash
if (($#<2)); then
	echo "usage:"
	echo -e "\t$0 SRC DST"
	exit 1
else
	for (( i=1; i < $#; i++ ))
	do
		pv "${@:${i}:1}" > "${!#}/${@:${i}:1}"
	done
fi

Another great article! Read your blog backwards as I discovered it today: lsof, nc, and pv. Very informative and useful primers here. A Unix utility you should know about is flip - it converts ASCII files between Unix, MS-DOS/Windows, and Macintosh newline formats.

I think screen would be a great addition to the series.

Alexander Simakov Permalink
January 11, 2010, 09:08

Peteris, thanks a lot for your article!

February 04, 2010, 22:18

Thanks for the great post. This:

tar -czf - . | pv > out.tgz

was exactly what I needed.

One question, though. I'm running Linux under VMware, and in the output of the above command, kB/s alternates back and forth between 0 and a higher number, e.g. 93.1kB/s. I'm wondering why? Is it the VM, or simply a behavior of this usage of pv?

February 04, 2010, 22:42

arbingersys, it's the VM. I had a similar issue under VMware.

February 08, 2010, 15:09

Certainly makes sense. Thanks again for the articles.

February 28, 2010, 22:51

How the hell did I not know about pv? Thanks a lot, this is definitely a fantastic tool.

P Fudd Permalink
June 18, 2010, 18:05

I found a new way to use pv:

I've got a bunch of files (2000) being created by somebody else's program, and I wanted to monitor it with pv, but there's no point where pv could be inserted.

So, I got clever (I'm using tcsh):

In one window:
% start:
% find . -name '*.icm' | gawk '{printf "%5d\n",NR}' > a; comm -23 a b >> b; sleep 1; goto start

In another window:
% tail -f b | pv -s 12000 > /dev/null

The '12000' was calculated by hand (6 bytes*total number of files). And voila, a progress meter for a non-pipe!

To do this in bash, use a 'while' loop instead of using 'goto'.
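
A bash version of the same polling loop might look like this (same hand-calculated 12000 size as above):

# in one window:
$ while true; do find . -name '*.icm' | gawk '{printf "%5d\n",NR}' > a; comm -23 a b >> b; sleep 1; done
# in another window:
$ tail -f b | pv -s 12000 > /dev/null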

Have fun!

maxwcc Permalink
July 21, 2010, 17:44

Thanks for the examples and for showing the pv(1) utility. I'd like to point out some details in your explanation which may not be accurate IMHO:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=> ] 15% ETA 0:00:59

Pipe viewer acts as "cat" here, except it also adds a progress bar. We can see that gzip processed 611MB of data in 11 seconds. It has processed 15% of all data and it will take 59 more seconds to finish.

This may not be correct, as what the command is really telling you is how fast pv(1) is reading data from the file and writing it to the pipe, not how fast gzip(1) will be done compressing the whole file. gzip(1) may still be buffering and chewing on those bytes for quite a bit longer.

Your second example goes in the right path IMHO:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source: 760MB 0:00:15 [37.4MB/s] [=> ] 19% ETA 0:01:02
gzip: 34.5MB 0:00:15 [1.74MB/s] [ <=> ]

The following statement may not be accurate either:

We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!"

Dividing the current read/write speeds (throughput) of these two processes may not reveal anything about how compressed the file will finally be.

leebert Permalink
July 25, 2010, 19:43

srcdir=$1
outfile=$2

tar -Ocf - $srcdir | pv -i 1 -w 50 -berps `du -bs $srcdir | awk '{print $1}'` | 7za a -si $outfile

September 07, 2010, 20:14

Thanks for this article. I didn't know pv yet, and I know that it could be useful lot of times...

Anonymous Permalink
July 28, 2011, 15:28

Sometimes, pv(1) doesn't help. For example, `tar cf - foo | bzip2 -9v >foo.tar.bz2'. You don't know the size of the data that needs to pass down the pipe. But I sometimes find watching tar(1) open the files to read is handy; `strace -p $(pidof tar) -e trace=open'.

alexandru Permalink
October 25, 2011, 01:11

Well, strace is useful for many things but for what you do the -v parameter of tar is enough, you'll get the list of files added to / extracted from the archive...

alexandru Permalink
October 25, 2011, 01:13

The first argument to tar should be a function; either one of the letters Acdrtux, or one of the long function names. A function letter need not be prefixed with ``-'', and may be combined with other single-letter options.

Alok Permalink
December 14, 2011, 10:13

There was this argument-generating tool that some person blogged about. He used it to test some drivers or other programs related to electric circuits. It took patterns as its argument and generated an expanded list of arguments as output. For example, I1-10 would generate I1 I2 I3 ... I10, and so on. I tried to search for the webpage again but could not find it.

Would be great if someone could link to it.

The Gripmaster Permalink
March 04, 2012, 07:58

# Copying a big file
pv srcfile | cat - > dstfile

# Measuring disk speed (including raided devices)
pv /dev/sda > /dev/null
pv /dev/md0 > /dev/null

I have /dev/md0 set up as a RAID0 device (using /dev/sda and /dev/sdb) and I did get twice the read speed of /dev/sda
