
maxwcc
July 21, 2010, 17:44

Thanks for the examples and for showing the pv(1) utility. I'd like to point out some details in your explanation that may not be entirely accurate, IMHO:

$ pv access.log | gzip > access.log.gz
611MB 0:00:11 [58.3MB/s] [=> ] 15% ETA 0:00:59

Pipe viewer acts as "cat" here, except it also adds a progress bar. We can see that gzip processed 611MB of data in 11 seconds. It has processed 15% of all data and it will take 59 more seconds to finish.

This may not be correct: what the command really tells you is how fast pv(1) is reading data from the file and writing it to the pipe, not how fast gzip(1) is compressing it or when it will finish. gzip(1) may still be buffering and chewing on those bytes for quite a bit longer.

Your second example is on the right track, IMHO:

$ pv -cN source access.log | gzip | pv -cN gzip > access.log.gz
source: 760MB 0:00:15 [37.4MB/s] [=> ] 19% ETA 0:01:02
gzip: 34.5MB 0:00:15 [1.74MB/s] [ <=> ]

The following statement may not be accurate either:

We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!

Dividing the current read and write speeds (throughput) of these two processes may not reveal anything about how compressed the file will finally be.
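To make the point concrete, here is a minimal sketch of how the ratio could be measured from total bytes in and total bytes out once gzip(1) has finished, rather than from two instantaneous speeds. The file name sample.log and its generated contents are made up for illustration; any real log file would do in its place.

```shell
# Hypothetical sample data: 100000 identical log-like lines.
tmpdir=$(mktemp -d)
awk 'BEGIN { for (i = 0; i < 100000; i++) print "GET /index.html HTTP/1.1 200" }' \
    > "$tmpdir/sample.log"

# Compress to a separate file so both sizes can be compared afterwards.
gzip -c "$tmpdir/sample.log" > "$tmpdir/sample.log.gz"

# Total bytes before and after compression (tr strips the padding
# that some wc implementations add).
orig_bytes=$(wc -c < "$tmpdir/sample.log" | tr -d ' ')
gz_bytes=$(wc -c < "$tmpdir/sample.log.gz" | tr -d ' ')

# The compression ratio is total input divided by total output.
ratio=$(awk -v o="$orig_bytes" -v c="$gz_bytes" 'BEGIN { printf "%.1f", o / c }')
echo "original: $orig_bytes bytes, compressed: $gz_bytes bytes, ratio: ${ratio}x"

rm -rf "$tmpdir"
```

For a file that is already compressed, gzip -l access.log.gz reports the compressed and uncompressed sizes along with the ratio, which saves doing the arithmetic by hand.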
