|
It is by no means trivial to do what you desire, but I can offer some tips to point you in the right direction.
1. You'll need to use DNS-based failover. When the Japan data center goes down, you'll need to update your DNS records so that your hostname resolves to the US IP address instead. Since you want the failover to happen within 10 minutes, you'll need to set a TTL on your DNS record no greater than 10 minutes, and in practice well below that, because the time it takes to detect the outage counts against the same 10 minutes. For this to happen automatically, you'll need the DNS to be updated dynamically. Conventional DNS servers like BIND support dynamic DNS; newer DNS services like Amazon's Route 53 have an API for making updates to DNS records. You'll need a monitoring process running somewhere that monitors the Japan data center and updates the DNS when it goes down. This is rather tricky to do correctly, since you want to avoid failing over unnecessarily if there's only a momentary outage or if you have a network partition (where both the Japan data center and the monitoring process are running but can't communicate with each other).
2. You'll need to continuously replicate your database from your Japan server to your US server. MySQL and PostgreSQL both have replication support built in. To make life easier, you should put *everything* in a single database. For example, if you're saving stuff to the filesystem (such as images), use a database table instead, since it will be far easier to do replication if there's only one thing to replicate. You should use asynchronous replication instead of synchronous replication, since this provides better performance, with the downside that you'll lose any data that was being written to the database in the moments before Japan goes down (which you've said is OK). NOTE: replication between data centers is only feasible for databases that aren't changed too often. If your webapp is constantly modifying your database, the bandwidth requirements will be too great. It will saturate your bandwidth, causing the replication to fall behind, and/or run up your bandwidth bill from Linode.
3. When Japan goes down, you'll need to promote your US database to master so you can start writing to it. It is VERY VERY hard to do this automatically and correctly. You do not want to accidentally promote your US database to master while Japan is still up; this will cause the two databases to diverge. Remember, because of network partitions, just because your US server can't contact your Japan server doesn't mean that Japan is down.
4. In order to spin up additional Linodes on the fly, you should use configuration management (Puppet, Chef, etc.) to provision your servers.
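To make the "momentary outage" and "network partition" concerns from points 1 and 3 more concrete, here is a minimal sketch of the decision logic a monitor could use. It assumes you run probes from several independent locations and only fail over when a quorum of them has seen Japan down for several consecutive rounds. The class name, the probe location names, and the thresholds are all made up for illustration; this is not a complete monitoring system, and it does not make automatic failover safe on its own.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FailoverDecider:
    """Decide whether to fail over, based on reports from several probes.

    All thresholds here are illustrative assumptions, not recommendations.
    """
    required_failures: int = 3  # consecutive "down" rounds before acting
    quorum: int = 2             # probes that must agree within one round
    history: deque = field(default_factory=deque)

    def record_round(self, probe_results: dict) -> bool:
        """probe_results maps probe location -> True if Japan answered.

        Returns True once failover should be triggered.
        """
        failed = sum(1 for ok in probe_results.values() if not ok)
        # A single failing probe may just be partitioned from Japan,
        # so a round only counts as "down" when a quorum agrees.
        self.history.append(failed >= self.quorum)
        while len(self.history) > self.required_failures:
            self.history.popleft()
        return (len(self.history) == self.required_failures
                and all(self.history))


def worst_case_failover_seconds(check_interval: int,
                                required_failures: int,
                                dns_ttl: int) -> int:
    """Rough budget: detection time plus the time cached DNS answers live."""
    return check_interval * required_failures + dns_ttl
```

With a 60-second check interval, 3 required rounds, and a 300-second TTL, the rough budget comes to 480 seconds, which leaves some slack under the 10-minute target. Treat that as an estimate only, since not every resolver honors TTLs strictly.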
I would strongly recommend against trying to do failover automatically, since it's very difficult for a computer program to correctly decide whether failover is appropriate or not. It's better to "wear a pager" (i.e. set up monitoring that will wake you up at night if necessary) and initiate the failover yourself if you decide it's necessary. Write a script to do the failover so it's quick. However, I can see why this might not work for you, since a major disaster in Japan could prevent you from initiating the failover.
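As a rough idea of what such a failover script might look like, here is a skeleton that only fixes the order of the steps: confirm Japan is really down, promote the US database, and only then repoint DNS, so that traffic never arrives at a read-only database. The three callables are hypothetical placeholders; you would fill them in with whatever commands your database and DNS provider actually require.

```python
from typing import Callable


def run_failover(
    japan_is_down: Callable[[], bool],
    promote_us_database: Callable[[], None],
    point_dns_at_us: Callable[[], None],
    log: Callable[[str], None] = print,
) -> bool:
    """Run the manual failover steps in order. Returns True if it failed over.

    The callables are placeholders (assumptions for this sketch), not real
    APIs of any database or DNS service.
    """
    # Last sanity check: promoting the US database while Japan is still
    # serving writes would make the two databases diverge.
    if not japan_is_down():
        log("Japan still reachable; aborting failover.")
        return False

    log("Promoting US database to master...")
    promote_us_database()

    # DNS goes last so users only reach the US once it can accept writes.
    log("Pointing DNS at the US IP address...")
    point_dns_at_us()

    log("Failover complete.")
    return True
```

Passing the steps in as functions also makes it easy to rehearse the script with dummy steps before you ever need it for real.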
All that said, maybe you should just put your Linodes in the US. Latency between Fremont and Tokyo is ~127ms. For most types of webapps that's not going to be noticeable. You can always serve your static files from a Linode in Japan (or a CDN) to get a slight speed boost for Japanese users.
|