Linode Community Forums
Author Message
Post subject: Anyone using pbzip2?
Posted: Sat Dec 19, 2009 7:55 pm
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Is anyone using pbzip2 for backups or for packaging other large files (hundreds of MB, even a few GB)? Just wondering what your experiences are.

It is in essence a parallel bzip2 and produces bzip2-compatible files, so I suppose it should be much faster than plain bzip2 while still producing smaller files than gzip.
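A quick way to see why pbzip2's output stays bzip2-compatible: bzip2 streams can simply be concatenated, and bunzip2 decompresses them back to back. pbzip2 relies on exactly this, compressing independent blocks on separate cores and gluing the results together. A minimal sketch with plain bzip2:

```shell
# bzip2 streams concatenate: compress two chunks independently (as pbzip2
# does on separate cores), glue them together, and bunzip2 reads the whole
# file as a single sequence of streams.
printf 'first half\n'  | bzip2 -c >  demo.bz2
printf 'second half\n' | bzip2 -c >> demo.bz2
bunzip2 -c demo.bz2    # prints both halves
```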


Posted: Sat Dec 19, 2009 7:56 pm
Joined: Sat May 03, 2008 4:01 pm | Posts: 569
Website: http://www.mattnordhoff.com/
I've seen a few people who use it in #linode, but I myself haven't bothered...


Posted: Sat Dec 19, 2009 9:14 pm
Joined: Sat Aug 30, 2008 1:55 pm | Posts: 1739 | Location: Rochester, New York
I've used it. It worked out very well CPU-wise, although I was doing it in a pipeline with a very I/O-intensive task, so I switched back to using normal bzip2 to slow things down a bit.

It's how I back up my claims on IRC that I'm always able to get ~400% CPU when I want it :-)


Posted: Sun Dec 20, 2009 1:29 am
Joined: Sun Feb 08, 2004 7:18 pm | Posts: 562 | Location: Austin
I use it whenever I can; it's great. Often I find myself in the same boat as Azathoth, where I'm piping the output through gpg, so it doesn't gain me much there. But for other uses, absolutely.


Posted: Mon Dec 21, 2009 8:33 am
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Hmm... I tried using it, but I don't see much improvement for my particular case (1.3G of Gentoo Portage including distfiles and compiled binaries). pbzip2 working on 4 cores completes in a bit less time than gzip (a few seconds less, at about 1 minute overall), but the size difference (a roughly 10% smaller .bz2) is not enough to warrant taxing four cores to get the same job done.

I'm sure pbzip2 works as intended, though, because regular bzip2 would produce a slightly smaller archive but take much longer.

I guess it would work best for larger archives (sizes in the gigabytes).


Posted: Mon Dec 21, 2009 2:04 pm
Joined: Tue May 26, 2009 3:29 pm | Posts: 1691 | Location: Montreal, QC
You're comparing apples and oranges, bzip2 and gzip:

gzip --> pigz
bzip2 --> pbzip2

You're comparing gzip to pbzip2 for some reason. If you want to speed up gzip, use pigz. If you want to speed up bzip2, use pbzip2.
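The parallel wrappers change speed, not the format or the ratio, so the size comparison is really gzip vs. bzip2 regardless of core count. A rough sketch on a trivially compressible sample (absolute sizes on real data will of course differ):

```shell
# Same input through both compressors; pigz/pbzip2 would emit the same
# formats at roughly the same sizes, just faster on multiple cores.
head -c 1000000 /dev/zero > sample.bin
gzip  -c sample.bin > sample.gz
bzip2 -c sample.bin > sample.bz2
wc -c sample.gz sample.bz2   # bzip2 usually wins on size
```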


Posted: Mon Dec 21, 2009 6:32 pm
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Actually, I'm not. The idea is to produce smaller archive files, hence the comparison of the bz2 and gz algorithms; more precisely, I want to achieve bzip2's compression ratio at gzip's speed.


Posted: Tue Dec 22, 2009 1:22 pm
Joined: Tue May 26, 2009 3:29 pm | Posts: 1691 | Location: Montreal, QC
But you can't, because "gz speed" would be the speed achieved with pigz (the parallel version of gzip). And pbzip2 will obviously be nowhere near as fast as pigz.


Posted: Tue Dec 22, 2009 5:45 pm
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Guspaz wrote:
But you can't, because "gz speed" would be the speed achieved with pigz (the parallel version of gzip). And pbzip2 will obviously be nowhere near as fast as pigz.


And pigz would still produce archives as large as gzip's. What I wanted was bzip2's file sizes at gzip's speed or better.

The question remains whether I want increased I/O when four processes start asking for disk access at once, and which matters more to me: the smaller archive or the increased I/O. No one can answer that but me. ;)


Posted: Tue Dec 22, 2009 6:02 pm
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Perhaps I need to explain what I want in more detail.

I have 1.3G of files to archive and ship out compressed and encrypted via FTP every morning at 5am localtime when the server is least loaded.

- gzip takes approx. 56 seconds and produces an 800 MB archive
- bzip2 takes a few minutes and produces a smaller file (I don't remember the exact figures)
- pbzip2 takes 54 seconds and produces a 740 MB archive, but at four times the I/O of gzip, because I use 4 processes (one per core)

Now, if I used pigz, I am sure it would take much less than 50 seconds, but it would produce an 800 MB archive just like gzip, and peak the I/O four times higher.

This is just a test, in preparation for rather larger archives (a few GB) under a rather larger load than I currently have on the server, which will be needed once we start a new local service in January.

So I will need to balance between:

- a smaller or bigger I/O peak
- hogging the network for a longer or shorter time to get the backup over, including a smaller or larger archive to store on the backup server
- backing up locally first and then shipping the encrypted tarball away, or tar, compress, encrypt, and ship on the fly with no local files (and I have yet to test whether parallel compression works with data on stdout)

Sure, pigz will produce the archives faster, but I want them smaller. In doing so, I want to see how much I/O and CPU it takes to produce them, and which is better: longer but less taxing serial compression, or quicker but more I/O-intensive parallel compression. Since I still want the smaller archive, pbzip2 suits me better than pigz.
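On the stdout question: pbzip2, like bzip2, happily compresses a pipe, so tar can stream straight through it with nothing stored locally. A runnable round-trip sketch with plain bzip2 standing in for pbzip2 (the real job would pipe onward through gpg and an FTP client; the demo paths are made up):

```shell
# Stream tar through the compressor's stdin/stdout; for the parallel
# version, swap `bzip2 -c` for `pbzip2 -p4 -c`. The real pipeline would
# continue: ... | gpg -c | <ftp upload>, with no intermediate files.
mkdir -p demo && printf 'payload\n' > demo/file.txt
tar -cf - demo | bzip2 -c > demo.tar.bz2
tar -xjf demo.tar.bz2 -O      # extracts the file contents back to stdout
```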


Posted: Wed Dec 23, 2009 1:29 pm
Joined: Tue May 26, 2009 3:29 pm | Posts: 1691 | Location: Montreal, QC
pbzip2 finishing in 54 seconds should use about the same amount of I/O as gzip does: they're reading the same source data in about the same amount of time.


Posted: Sun Dec 27, 2009 3:04 am
Joined: Sun Jan 18, 2009 2:41 pm | Posts: 830
For limiting CPU and I/O usage, you may want to play around with nice and ionice (article).
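For instance (a sketch with gzip standing in for the compressor; the same wrappers apply to pbzip2, and ionice is Linux-specific):

```shell
# Run the compression job at the lowest CPU priority; on Linux you can
# also prefix the command with `ionice -c3` so it only touches the disk
# when nothing else wants it.
head -c 100000 /dev/urandom > big.bin
nice -n 19 gzip -c big.bin > big.bin.gz
gzip -t big.bin.gz && echo ok
```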


Posted: Tue Dec 29, 2009 1:47 pm
Joined: Tue May 26, 2009 3:29 pm | Posts: 1691 | Location: Montreal, QC
Here's something for consideration: 7zip. It's open-source (and the author made his LZMA algorithm public-domain), and should be in most distro repos. It also supports multi-core compression/decompression.

Compression ratio is a bit better than bzip2's, and it's usually much faster. RAM requirements are higher, though (they depend on the dictionary size, which also affects compression).

It's also got decent support: the app itself is available for *nix as well as other platforms like Windows, and WinRAR can also decompress 7zip archives.
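On most distros the CLI is `7z` (package p7zip), e.g. `7z a -mmt=on archive.7z dir/` with the paths here made up. The same LZMA family is also available as xz, whose `-T0` flag (xz >= 5.2) spreads compression across all cores, much like pbzip2 does for bzip2. A small sketch with xz:

```shell
# LZMA-family compression with multi-threading: -T0 means "use all cores".
head -c 500000 /dev/zero > data.bin
xz -T0 -c data.bin > data.bin.xz
xz -l data.bin.xz     # lists compressed vs. uncompressed size
```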


Posted: Thu Dec 31, 2009 8:21 am
Joined: Mon Dec 07, 2009 6:46 am | Posts: 331
Thanks for all your interesting suggestions.

@Guspaz

I guess you're right. For more or less the same file sizes, a shorter processing time can only mean lower I/O and smaller peaks.


Powered by phpBB® Forum Software © phpBB Group