Thursday 29 December 2011

Crashplan Active Bandwidth Control


I love what random things I get up to over the holidays. Crashplan is a Cloud backup service that I use for all my backups. I have devised a way so that Crashplan starts backing up at FULL speed, when everyone is not at home, or are not doing anything on the internet. And then drops back down to a trickle backup when things become busy.

Essentially I have one computer, my server, and it has the Crashplan service running on it. The server is an Ubuntu based Linux distrubution. Nothing fancy there. What is interesting is how I have managed to control the Crashplan bandwidth it utilises. In the previous house, I had a good 18-20Mbps internet connection, so my backups were really nice. Now, I only have a 3Mbps connection so the "usage" is very important. If crashplan hogs the connection you reall notice it.

Crashplan has two ways of controlling it's backup usage. First is on a backup set basis, you set the time that it runs. Helpful, for the past 2 months, my backups have been scheduled for 12am to 7am. Works well, but at that rate, I will be completely backed up in about 12 months time... hrmm.

The other way Crashplan operates is by bandwidth limiting. This is how I roll it now.

Attempt 1 - See when they Connect

I needed to determine the best method to "ascertain" if someone was using the internet. I pondered looking at when the phones (we have 4 adults, 3 iPhones and 1 Android) are "on" or can be seen as that is a good indicator that someone is home. But of course it goes a bit more than that.

I did however, to test the theory, write a quick Perl script which was the start of monitoring when an iPhone connects to the WiFi. The Android apparently uses Zeroconf also, but I didn't test that.
Basically, each time the phone connects for the initial session, it sends a multicast out. If you are listening, you will see it. Simply, this perl script listens on the right port.
I quickly discounted this method when I realised that some devices just wont tell me they connected. And because I want to keep my network largely zero configuation, I want to just use DHCP and not have to bother with static entries, or Static leases for my clients. Which is when it hit me:

Attempt 2 - Scan the Known DHCP Range


The premise behind my final solution is simple. I have two types of devices in use in our household.
  • Infrastructure - servers, printer, wirless and network switches / routers
  • Clients - Laptops, phones (soon tablets)

All my infrastructure devices use static IP addresses, such as 1 to 39. All the clients, use a DHCP IP address. We have no desktops in the house, only laptops and when they are closed, they are not in use.

If they are not in the house, they are not in use. Same goes with the phones, they are either sleeping or not in the house.

The Solution

So, if all the laptops are closed and all the phones are out, I know that none of the "clients" will ping, and therefore I can ramp up the Crashplan bandwidth.

Solution to Attempt 2 - Monitor for Active Clients

I had a few things to solve. But for each I knew that Perl would be all that I would need.

Part 1 - Automatically Raise and Lower Crashplan Bandwidth

I needed to determine how Crashplan "stored" it's bandwidth rate. Turns out it is in a config file, but the web site can also change the config. So, the client (my Crashplan service running on my server) and the Crashplan Cloud keep in contact all the time· If I change the bandwidth on the client, it updates on the website, and vice versa.

I fired off a few support queries to Crashplan and they basically said that the config file I found (/usr/local/crashplan/conf/my.service.xml) is not editable by hand. And that the only supported way is via the Client (desktop Java App) or the Website.

Call WWW::Mechanize - I used this module to login to the Crashplan Website and change the bandwidth (kbps) based on my argument. Simple and it looks like this.

Not rabbit proof, but it works for my needs

Part 2 - Monitor for Activity

This was the hardest part, but as always, was solvable. The router (Sky Broadband Provider) I currently use "knows" what clients are using it. VERY handy. So like the Crashplan website solution above, I again use WWW::Mechanize to determine what IP addresses are currently using the router.

I do a filter out of all the ip addresses that do NOT fall inside my DHCP range.

I won't go into the details of using XPath on the HTML from the Router, but, you can see that it is quite succinct. Essentially the IP Addresses appear in a 2nd column in a table. I use an XPath expression to select the right values.

This approach is brittle in as much as, when / if I change my router then I need to code or come up with another way to determine which hosts, within the DHCP range, are currently "in use". I did first to this by just pinging all 60 possible IP addresses; works, brute force, but of course worked :-)

So .. now putting it all together. The script, a perl script, runs from Cron every minute. and performs the following tasks.

  • Get the Current Rate from the Crashplan Config file
  • Get all the current "hosts" that the Router knows
  • Ping each host to MAKE sure they are active
  • If we have an active host (one or more) - we need to go slow
    • If the current rate that Crashplan is running is more than our slow rate, tell Crashplan to slow down
    • If the current rate is slow, do nothing
  • If we have no active hosts - we can go fast
    • If the current rate that Crashplan is running is less than our fast rate, tell Crashplan to go fast
    • If the current rate is fast, do nothing

Simple! And the full Script looks like below. Enjoy!

Of course, I did all this and know and realise that QoS could be used, but my Sky Broadband Router is just not that smart today and I haven't got the devices spare or the time to reconfigure to installl a dd-wrt based router or some such other. So perl scripts it is for now.

Current 5 booksmarks @ del.icio.us/pefdus