Linode Community Forums
PostPosted: Thu Apr 24, 2008 3:40 pm 
edavis wrote:
I am anxiously awaiting this feature!

Currently I use Amazon S3 for backup storage and transfer data to/from it using s3sync. It's quirky, and one thing I really miss is the ability to store incremental linked backups a la rsync.


I initially looked at s3sync but didn't like the lack of versioning or incremental backups.

My choice has been duplicity (the latest version). It has worked great for backups. I've configured mine to maintain 6 months of backups, making a complete backup on the first of each month and incremental backups in between. It bundles, compresses, and encrypts using GPG.

Start with this guide, then use the script below:

http://www.randys.org/2007/11/16/how-to ... -duplicity

Also make sure to store a copy of your GPG key in a safe place off the server.

You'll need to change YOUR_ACCESS_KEY, YOUR_SECRET_KEY, YOUR_GPG_PASSPHRASE, YOUR_GPG_KEY, and YOUR_BUCKET_NAME.

A note about includes/excludes: if you want to exclude something inside a directory, you need to exclude the file or subdirectory before including the directory, because includes/excludes work on a 'first match' basis.
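To see the ordering rule in isolation, here is a minimal sketch (the bucket name is a placeholder, and --dry-run keeps duplicity from uploading anything):

```shell
# First match wins: the exclude for /var/tmp must come before the
# include for /var, or /var/tmp would be swept up by the /var include.
duplicity --dry-run \
    --exclude=/var/tmp \
    --include=/var \
    --exclude='/**' \
    / s3+http://YOUR_BUCKET_NAME
```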

Code:
#!/bin/bash
# Export some ENV variables so you don't have to type anything

trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> /var/log/backup.log
}

export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

GPG_KEY=YOUR_GPG_KEY

OLDER_THAN="6M"

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://YOUR_BUCKET_NAME"

FULL=
if [ $(date +%d) -eq 1 ]; then
        FULL=full
fi;

trace "Backup for local filesystem started"

trace "... removing old backups"

duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1

trace "... backing up filesystem"

duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --include=/boot \
    --include=/etc \
    --include=/home \
    --include=/lib \
    --exclude=/root/.jungledisk/cache \
    --exclude=/root/.cpan \
    --include=/root \
    --include=/usr \
    --exclude=/var/tmp \
    --include=/var \
    --exclude=/** \
    ${SOURCE} ${DEST} >> /var/log/backup.log 2>&1

trace "Backup for local filesystem complete"
trace "------------------------------------"

# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
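
To run this unattended, a cron entry along these lines works (the script path is a placeholder; the script already appends its own output to /var/log/backup.log, so only cron's chatter needs redirecting):

```shell
# /etc/cron.d/duplicity-backup: run the backup script every night at 03:00.
# Files in /etc/cron.d use the crontab format with an extra user field.
0 3 * * *  root  /usr/local/sbin/duplicity-backup.sh > /dev/null 2>&1
```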


You can restore a file with this script:

Code:
#!/bin/bash
# Export some ENV variables so you don't have to type anything

export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export PASSPHRASE="YOUR_GPG_PASSPHRASE"

GPG_KEY=YOUR_GPG_KEY

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://YOUR_BUCKET_NAME"

if [ $# -lt 3 ]; then echo "Usage: $0 <time> <file> <restore-to>"; exit 1; fi

duplicity \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --file-to-restore $2 \
    --restore-time $1 \
    ${DEST} $3

# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=


Note that paths are relative, not absolute: /etc/apache2 is backed up as etc/apache2. You can restore whole directories, but the parent of the restore destination must already exist. For example, to restore /etc/apache2 from April 23rd into a local directory 'restore', the following fails because ./etc does not exist:

# cd ~
# mkdir restore
# cd restore
# duplicity-restore.sh "2008-04-23" etc/apache2 etc/apache2

However, doing:

# duplicity-restore.sh "2008-04-23" etc/apache2 apache2

would restore the directory to ./apache2


PostPosted: Mon May 05, 2008 9:15 am 
MaineCoon, many thanks for this. This is the S3 backup solution that I've been searching for.

Works very, very nicely here on CentOS 5. The only slightly tricky bit is getting all the dependencies installed for duplicity, as the CentOS duplicity RPM is way out of date.


PostPosted: Thu Nov 27, 2008 6:15 pm 
I followed the above guide, but running the backup script results in this:

Code:
2008-11-27_17:11:24: Backup for local filesystem started
2008-11-27_17:11:24: ... removing old backups
No old backup sets found, nothing deleted.
2008-11-27_17:11:30: ... backing up filesystem
No signatures found, switching to full backup.
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 463, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 458, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 449, in main
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 155, in full_backup
    bytes_written = write_multivol("full", tarblock_iter, globals.backend)
  File "/usr/bin/duplicity", line 87, in write_multivol
    globals.gpg_profile,globals.volsize)
  File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 225, in GPGWriteFile
    file.close()
  File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 132, in close
    self.gpg_process.wait()
  File "/var/lib/python-support/python2.5/GnuPGInterface.py", line 639, in wait
    raise IOError, "GnuPG exited non-zero, with code %d" % (e << 8)
IOError: GnuPG exited non-zero, with code 131072
close failed: [Errno 32] Broken pipe
2008-11-27_17:11:31: Backup for local filesystem complete
2008-11-27_17:11:31: ------------------------------------


My backitup script is:

Code:
#!/bin/bash
# Export some ENV variables so you don't have to type anything

trace () {
        stamp=`date +%Y-%m-%d_%H:%M:%S`
        echo "$stamp: $*" >> /var/log/backup.log
}

export AWS_ACCESS_KEY_ID="xxxxxxxxxxxxxxxxx..."
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxx..."
export PASSPHRASE=$(cat pwtextfile)

GPG_KEY=XXXXXXXX

OLDER_THAN="6M"

# The source of your backup
SOURCE=/

# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://mybucketname.s3.amazonaws.com"

FULL=
if [ $(date +%d) -eq 1 ]; then
        FULL=full
fi;

trace "Backup for local filesystem started"

trace "... removing old backups"

duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1

trace "... backing up filesystem"

duplicity \
    ${FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --include=/boot \
    --exclude=/** \
    ${SOURCE} ${DEST} >> /var/log/backup.log 2>&1

trace "Backup for local filesystem complete"
trace "------------------------------------"

# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=


I wanted to include only /boot for a start, so as not to waste bandwidth while testing. The /boot directory seemed to be empty, but I created a test text file there and it still didn't work, so that doesn't seem to be the issue.

I used the defaults when generating my key.

EDIT: Hm, the output actually changed when I ran it with the test file created in /boot. It's this now:

Code:
2008-11-27_17:14:55: Backup for local filesystem started
2008-11-27_17:14:55: ... removing old backups
No old backup sets found, nothing deleted.
2008-11-27_17:14:55: ... backing up filesystem
No signatures found, switching to full backup.
Traceback (most recent call last):
  File "/usr/bin/duplicity", line 463, in <module>
    with_tempdir(main)
  File "/usr/bin/duplicity", line 458, in with_tempdir
    fn()
  File "/usr/bin/duplicity", line 449, in main
    full_backup(col_stats)
  File "/usr/bin/duplicity", line 155, in full_backup
    bytes_written = write_multivol("full", tarblock_iter, globals.backend)
  File "/usr/bin/duplicity", line 87, in write_multivol
    globals.gpg_profile,globals.volsize)
  File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 213, in GPGWriteFile
    data = block_iter.next(bytes_to_go).data
  File "/usr/lib/python2.5/site-packages/duplicity/diffdir.py", line 407, in next
    result = self.process(self.input_iter.next(), size)
  File "/usr/lib/python2.5/site-packages/duplicity/diffdir.py", line 284, in get_delta_iter_w_sig
    sigTarFile.close()
  File "/usr/lib/python2.5/site-packages/duplicity/tarfile.py", line 508, in close
    self.fileobj.write("\0" * (RECORDSIZE - remainder))
  File "/usr/lib/python2.5/site-packages/duplicity/dup_temp.py", line 101, in write
    return self.fileobj.write(buf)
  File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 125, in write
    return self.gpg_input.write(buf)
IOError: [Errno 32] Broken pipe
close failed: [Errno 32] Broken pipe
2008-11-27_17:14:55: Backup for local filesystem complete
2008-11-27_17:14:55: ------------------------------------


I'm quite far off from understanding these sorts of errors. :oops:
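
One way to narrow this down is to run GPG by hand the way duplicity invokes it, outside of duplicity entirely. A rough sketch (the key ID and passphrase file path are placeholders):

```shell
# Encrypt-and-sign a test string non-interactively, as duplicity does.
# If this fails the same way, the problem is in the GPG setup, not duplicity.
echo "test" | gpg --batch --no-tty \
    --passphrase-fd 3 3< /path/to/pwtextfile \
    --recipient YOUR_GPG_KEY --local-user YOUR_GPG_KEY \
    --sign --encrypt > /tmp/gpgtest.gpg && echo "gpg ok"
```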


Any ideas?

Thanks


PostPosted: Fri Nov 28, 2008 8:14 am 
Website: http://independentchaos.com
Wild guess: if your /boot is empty, it was probably unmounted after a successful boot.

I found these in a quick Google search:
http://osdir.com/ml/sysutils.backup.dup ... 00036.html
https://bugs.launchpad.net/ubuntu/+sour ... bug/126417

The common thread is that there seems to be a bug in duplicity itself.


PostPosted: Sun Nov 30, 2008 2:22 am 
This tutorial should get you up and running with S3 + rsync, no GUI needed:

http://el-studio.com/article/jungledisk-linux-backups


PostPosted: Mon Dec 01, 2008 8:15 pm 
Thanks, guys. Sorry for not responding earlier. I kinda just dropped this case for the time being and went on with other things, but I appreciate your help.

If duplicity is indeed buggy I'm tending towards not using it at this point. The JungleDisk option seems good, but I dislike the fact that it's proprietary. I guess, though, if no better option is available I might go with it.

Thanks again.


PostPosted: Mon Dec 01, 2008 10:39 pm 
Website: http://www.mattnordhoff.com/
memenode wrote:
Thanks, guys. Sorry for not responding earlier. I kinda just dropped this case for the time being and went on with other things, but I appreciate your help.

If duplicity is indeed buggy I'm tending towards not using it at this point. The JungleDisk option seems good, but I dislike the fact that it's proprietary. I guess, though, if no better option is available I might go with it.

Thanks again.


FWIW, JungleDisk has (or had, last I heard) some GPL software to extract the data from your S3 account. So if JungleDisk goes under or you lose your license or whatever, you won't lose your data, though you'll obviously have to find a new solution for new data.


PostPosted: Tue Dec 02, 2008 1:27 am 
Website: http://www.hearsaynashville.com
So I noticed that my alerts have been going haywire for the last week, and then I looked at my AWS account: far too many GB used. Wow. Apparently, sometime last week duplicity started making full backups every time it runs on BOTH my Linodes. I have no idea what I changed to cause this, but it must have been something I changed on both Linodes, and I'm racking my brain trying to figure it out. Does anyone have an idea what might be causing this?


PostPosted: Tue Dec 02, 2008 1:33 am 
Website: http://www.hearsaynashville.com
Update: It's not random. More than half the storage space used in my AWS account for duplicity backups was filled today, and today is the first day of a new month, the first since I installed duplicity. I know I've got it set to make a full backup on the first of every month, but I didn't realize it would do so on every scheduled run during that first day. Oh well... I suppose I'll delete all of today's full backups except the most recent.
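
One way to avoid that surprise is to gate the full backup on the hour as well as the day, so only one of the day's scheduled runs goes full. A minimal sketch (the function name and the 03:00 hour are made up for illustration):

```shell
#!/bin/bash
# Succeed (exit 0) only on the 1st of the month during the 03:00 hour, so a
# script that cron runs several times a day makes at most one full backup.
# Optional arguments override the current day/hour, which makes this testable.
should_run_full () {
    local day=${1:-$(date +%d)} hour=${2:-$(date +%H)}
    # 10# forces base 10 so zero-padded values like "08" aren't read as octal.
    [ "$((10#$day))" -eq 1 ] && [ "$((10#$hour))" -eq 3 ]
}

FULL=
if should_run_full; then
    FULL=full
fi
echo "mode: ${FULL:-incremental}"
```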


PostPosted: Tue Dec 02, 2008 11:55 am 
mnordhoff wrote:
FWIW, JungleDisk has (or had, last I heard) some GPL software to extract the data from your S3 account. So if JungleDisk goes under or you lose your license or whatever, you won't lose your data, though you'll obviously have to find a new solution for new data.


Yeah, I noticed something like that. Unfortunately, though, this option didn't quite work: running jungledisk /mnt/s3 didn't mount anything; it remained just a local directory. Maybe there's a solution to that, but frankly I'd really like something better.

Well, I guess if all else fails I can just try s3sync or s3fs.


PostPosted: Sun Jan 25, 2009 2:13 pm 
Website: http://chrisboshuizen.com
I had the same problems as mentioned by memenode, and I fixed them by checking three things: install location, permissions, and installed packages.

For install location, I picked a location for the root or backup user to run the script from. I chose /etc/backup, and performed all of the steps there.

For permissions, I made sure that I created all the necessary files, including the GPG key, as the correct user. This seemed to be an important step.

I also chown'd everything to the correct user.

Then, I made sure I installed the correct packages. It seemed to me that missing dependencies were to blame, so I followed these directions:

Code:
sudo aptitude build-dep duplicity

or, in the case of my Ubuntu version (8.04),
Code:
sudo apt-get build-dep duplicity


and then followed the rest of the steps:
Code:
$ sudo aptitude install python-boto ncftp
$ wget http://savannah.nongnu.org/download/duplicity/duplicity-0.5.03.tar.gz
$ tar xvzf duplicity-0.5.03.tar.gz
$ cd duplicity-0.5.03/
$ sudo python setup.py install

(Check for the latest version of duplicity first; at the time of writing it is 0.5.06, so substitute the current version number in the commands above.)
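
After installing, it's worth a quick check that the shell now finds the new build rather than the distribution's outdated package:

```shell
# Should point at the freshly installed build and print its version,
# not the old distribution package's.
which duplicity
duplicity --version
```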

As a final step, make sure that the log can be written by the desired user.

I then made sure that I ran the script as the correct user, and it worked without error.

However, while the files were being correctly transferred on the first test run, it wasn't noticing changes or new files, so something is still not quite right. I will review that and post another topic.


PostPosted: Sun Jan 25, 2009 5:09 pm 
Website: http://chrisboshuizen.com
I just mentioned a problem with the report in the log: it stated that 0 files were changed/added when in fact there had been changes.

To check what was going on, I created some more scripts to do listings and status reports, and the listings do match my changes to the filesystem.

I did some test backups, adding/changing/removing files in between, and could successfully restore any file from any point in time. Thus, it is all working despite the incorrect messages in the log.
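
For anyone wanting to script those same checks, duplicity's own inspection commands cover them. A sketch (the bucket name is a placeholder; AWS credentials and PASSPHRASE are assumed to be exported as in the backup script):

```shell
#!/bin/bash
DEST="s3+http://YOUR_BUCKET_NAME"

# Show the chain of full and incremental backup sets and their dates.
duplicity collection-status ${DEST}

# List every file present in the latest backup.
duplicity list-current-files ${DEST}

# Compare the archive against the local filesystem.
duplicity verify ${DEST} /
```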

*shrug*


 Post subject: Re: 'Storage' Linodes?
PostPosted: Fri Jan 30, 2009 12:47 am 
PaulC wrote:
I know that we can add additional storage to our Linodes, but at $5/GiB/month it's not so attractive for an off-site backup of your photo or music collection.

So... any thought to some sort of 'storage'-oriented linode package(s)? I'm thinking for 'personal backup' usage, not some sort of file download site.

Just wondering if it's something others would find useful, or it's just me. Dunno if it makes business sense, but thought I'd ask.


Why not just mount S3 on your server? Then you'll have endless disk space.


 Post subject: Re: 'Storage' Linodes?
PostPosted: Fri Jan 30, 2009 3:34 am 
poetics5 wrote:
Why not just mount s3 to your server? then you'll have endless diskspace


Yeah, there's S3FS which lets you mount your S3 bucket as if it were a regular filesystem.

Problem?

1) There seem to be at least 3 different programs named S3FS (in addition to JungleDisk which is not free), and each of them uses its own (proprietary) way of storing data in S3. These protocols are not compatible with each other. Accordingly, you can't access the data you stored on S3 unless you go through the same program you used to store it.

2) None of those three programs has reliable error handling. It'll tell you that a file has been copied to S3, but it won't tell you if a few bytes in the middle got corrupted in the process. Coupled with the apparently high failure rate of connections to S3, this is a total deal breaker. What good is a gazillion gigabytes of space if a data transmission error can go unnoticed?

Seriously, S3 is a fantastic service but it has so many limitations if one were to use it like a regular filesystem. Of course there are ways to work around those limitations (usually by storing extra information in the metadata), but I'm not jumping on the bandwagon until I have a standardized and reliable protocol for doing so.

In the meantime, I have a couple of low-cost, high-storage VPSes (50GB) and a backup package from BQBackup and the like (100GB) mounted on my Linode using SSHFS. That protocol is damn slow, but at least it's standardized and reliable.
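
Mounting remote storage over SSHFS looks roughly like this (a sketch; the host, user, and paths are placeholders, and it assumes the fuse module plus an SSH key for non-interactive login):

```shell
# Install sshfs and mount a remote backup host's directory locally over SSH.
sudo apt-get install sshfs
mkdir -p /mnt/backup
sshfs backupuser@backup.example.com:/home/backupuser /mnt/backup \
    -o reconnect,ServerAliveInterval=15

# ...use /mnt/backup like a normal (slow) filesystem...

# Unmount when done.
fusermount -u /mnt/backup
```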


PostPosted: Fri Jan 30, 2009 10:08 am 
I believe the 'FuseOverAmazon' s3fs is the most advanced of the s3fs implementations out there, but I haven't tested any of them. s3fs is quite proprietary in its file format because it creates a virtual block device in the bucket.

I've been testing JungleDisk (JD) on my Linode, and the two main issues are:
(1) Lots of little files cause a performance bottleneck, because each must be posted in its own transaction with S3. JD does this file-based uploading because that's the only way S3 will guarantee "consistency".
(2) The JD cache isn't yet available offline, so files in the JD mount are only visible while there is connectivity to S3. It was designed this way to ensure cache validity when the bucket is shared between multiple machines.

Another downside of JD is that for Linux CLI use, you'll need to download the USB version and create the XML configuration file with it (on a Mac or PC), then paste the config onto your Linode. There's no published spec for the XML file, but it's easy to edit by hand after creating the initial file.

JungleDisk can create and use compatibility buckets that work with other software (though S3 doesn't have any "standard" for this anyway). The biggest advantage of JD is that it can encrypt everything before it leaves your machine, which obviously makes those buckets incompatible with other software. There is an open-source tool to access your JD-encrypted bucket data.

We need both fast and cheaper storage, and backups that are easy to verify and restore. Handling both with the same system might not be optimal (see how Amazon separates EBS from snapshot backups to S3). I really look forward to seeing what Linode comes up with because everything else here rocks!

