RB Tech: 2011

Thursday, 29 December 2011

Crashplan Active Bandwidth Control

I love what random things I get up to over the holidays. Crashplan is a cloud backup service that I use for all my backups. I have devised a way for Crashplan to back up at FULL speed when no one is at home, or no one is doing anything on the internet, and then drop back down to a trickle backup when things become busy.

Essentially I have one computer, my server, and it has the Crashplan service running on it. The server runs an Ubuntu based Linux distribution. Nothing fancy there. What is interesting is how I have managed to control the bandwidth Crashplan utilises. In the previous house, I had a good 18-20Mbps internet connection, so my backups were really nice. Now I only have a 3Mbps connection, so the "usage" is very important. If Crashplan hogs the connection you really notice it.

Crashplan has two ways of controlling its backup usage. The first is on a backup set basis: you set the time that it runs. Helpful. For the past 2 months, my backups have been scheduled for 12am to 7am. That works well, but at that rate I will be completely backed up in about 12 months' time... hrmm.

The other way Crashplan operates is by bandwidth limiting. This is how I roll it now.

Attempt 1 - See when they Connect

I needed to determine the best method to "ascertain" if someone was using the internet. I pondered looking at when the phones (we have 4 adults, 3 iPhones and 1 Android) are "on" or can be seen as that is a good indicator that someone is home. But of course it goes a bit more than that.

I did however, to test the theory, write a quick Perl script which was the start of monitoring when an iPhone connects to the WiFi. The Android apparently uses Zeroconf also, but I didn't test that.
Basically, each time the phone connects for the initial session, it sends out a multicast. If you are listening, you will see it. This Perl script simply listens on the right port.
I quickly discounted this method when I realised that some devices just won't tell me they connected. And because I want to keep my network largely zero configuration, I want to just use DHCP and not have to bother with static entries or static leases for my clients. Which is when it hit me:

Attempt 2 - Scan the Known DHCP Range

The premise behind my final solution is simple. I have two types of devices in use in our household.

Infrastructure - servers, printer, wireless and network switches / routers
Clients - Laptops, phones (soon tablets)

All my infrastructure devices use static IP addresses, such as 1 to 39. All the clients, use a DHCP IP address. We have no desktops in the house, only laptops and when they are closed, they are not in use.

If they are not in the house, they are not in use. Same goes with the phones, they are either sleeping or not in the house.

The Solution

So, if all the laptops are closed and all the phones are out, I know that none of the "clients" will ping, and therefore I can ramp up the Crashplan bandwidth.

Solution to Attempt 2 - Monitor for Active Clients

I had a few things to solve. But for each I knew that Perl would be all that I would need.

Part 1 - Automatically Raise and Lower Crashplan Bandwidth

I needed to determine how Crashplan "stored" its bandwidth rate. It turns out it is in a config file, but the web site can also change the config. So the client (my Crashplan service running on my server) and the Crashplan Cloud keep in contact all the time. If I change the bandwidth on the client, it updates on the website, and vice versa.

I fired off a few support queries to Crashplan and they basically said that the config file I found (/usr/local/crashplan/conf/my.service.xml) is not editable by hand. And that the only supported way is via the Client (desktop Java App) or the Website.

Call WWW::Mechanize - I used this module to login to the Crashplan Website and change the bandwidth (kbps) based on my argument. Simple and it looks like this.

Not rabbit proof, but it works for my needs

Part 2 - Monitor for Activity

This was the hardest part, but as always, was solvable. The router (Sky Broadband Provider) I currently use "knows" what clients are using it. VERY handy. So like the Crashplan website solution above, I again use WWW::Mechanize to determine what IP addresses are currently using the router.

I do a filter out of all the ip addresses that do NOT fall inside my DHCP range.
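The original filter lives in the Perl script, which is not reproduced here. As a minimal sketch of the same idea in Java, the helper below keeps only addresses whose last octet falls inside a DHCP range. The class name, the example addresses and the 40 to 99 range are assumptions for illustration (the post only says static addresses run from 1 to 39), and it assumes a single /24 IPv4 network.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only the addresses that fall inside the DHCP range. */
final class DhcpRangeFilter {

    /**
     * Returns the addresses whose last octet lies in [low, high].
     * Assumes dotted-quad IPv4 addresses that all share the same /24 network.
     */
    static List<String> inRange(List<String> addresses, int low, int high) {
        List<String> clients = new ArrayList<>();
        for (String address : addresses) {
            // Everything after the final dot is the host part on a /24 network.
            int lastOctet = Integer.parseInt(address.substring(address.lastIndexOf('.') + 1));
            if (lastOctet >= low && lastOctet <= high) {
                clients.add(address);
            }
        }
        return clients;
    }
}
```

Called with the addresses scraped from the router and the DHCP bounds, it returns only the client devices, so infrastructure on static addresses never counts as "activity".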

I won't go into the details of using XPath on the HTML from the Router, but, you can see that it is quite succinct. Essentially the IP Addresses appear in a 2nd column in a table. I use an XPath expression to select the right values.

This approach is brittle in as much as, when / if I change my router, I will need to code or come up with another way to determine which hosts, within the DHCP range, are currently "in use". I did first do this by just pinging all 60 possible IP addresses; brute force, but of course it worked :-)

So .. now putting it all together. The script, a Perl script, runs from Cron every minute and performs the following tasks.

Get the Current Rate from the Crashplan Config file
Get all the current "hosts" that the Router knows
Ping each host to MAKE sure they are active
If we have an active host (one or more) - we need to go slow
- If the current rate that Crashplan is running is more than our slow rate, tell Crashplan to slow down
- If the current rate is slow, do nothing
If we have no active hosts - we can go fast
- If the current rate that Crashplan is running is less than our fast rate, tell Crashplan to go fast
- If the current rate is fast, do nothing
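The decision step in the list above can be sketched as a small pure function. This is not the author's Perl script; it is a minimal Java illustration, and the class name, the enum and the parameter names are all hypothetical. Reading the current rate, querying the router and calling the Crashplan website are deliberately left out.

```java
/** Hypothetical sketch of the per-minute decision; it performs no I/O. */
final class RateDecision {

    enum Action { SLOW_DOWN, SPEED_UP, NOTHING }

    /**
     * @param currentKbps rate Crashplan is currently configured with
     * @param activeHosts number of DHCP clients that answered a ping
     * @param slowKbps    trickle rate to use while someone is online (assumed value)
     * @param fastKbps    full rate to use when nobody is online (assumed value)
     */
    static Action decide(int currentKbps, int activeHosts, int slowKbps, int fastKbps) {
        if (activeHosts > 0) {
            // Someone is using the connection: only act if we are above the slow rate.
            return currentKbps > slowKbps ? Action.SLOW_DOWN : Action.NOTHING;
        }
        // Nobody is home: only act if we are below the fast rate.
        return currentKbps < fastKbps ? Action.SPEED_UP : Action.NOTHING;
    }
}
```

Returning NOTHING when the rate is already correct matters because the script runs every minute; without that check it would log in to the Crashplan website on every run.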

Simple! And the full Script looks like below. Enjoy!

Of course, I did all this knowing that QoS could be used, but my Sky Broadband Router is just not that smart today, and I haven't got the spare devices or the time to reconfigure and install a dd-wrt based router or some such. So Perl scripts it is for now.

Thursday, 27 October 2011

Neo4J - 2nd Look - Setting a Primary Key on Nodes

Primary Key

In my last post I considered the lack of Primary Key like Ids as something I need to solve. My use case is:

The application I will be building out will have, after all is said and done a really simple Web Interface with REST type URLs. So .. for example, I will be able to do.
http://myservice.co.uk/superwebapp/mySpecialThing/detailedView/55
The 55 there will result in a query to Neo4J to locate the "MySpecialThing" object with an ID of 55 and display it.

I also considered using a UUID across objects which is also good, but not really what I was after. I want a class of objects to all share an identifier. It has a lot of use. To solve the problem I arrived at the following solution.

Solution Outline

All Domain Objects extend from a super type of AbstractLongDomain which has getId()/setId() (Long)
An Aspect wrapped around the getId() looks for a null value and if no Id is found. It creates one
In the aspect creation of an ID involves talking to a singleton IdManager for a "nextId()"
nextId() on the manager looks to its cache HashMap to see if it has an IdObject that knows what the next Id is
IdObject self persists to the repository after each call (** this could be slow.. see how we go)
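To make the outline above concrete, here is a minimal in-memory sketch of the per-class counter idea. It is not the code from this post: the class name is hypothetical, there is no aspect and no Neo4J persistence, and starting each class at 1 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the IdManager; nothing is persisted. */
final class InMemoryIdManager {

    // One counter per domain class, mirroring the "per class" IdObject idea.
    private final Map<Class<?>, Long> nextIds = new HashMap<>();

    /** Hands out the next id for the given class, starting at 1 (assumed). */
    synchronized long nextId(Class<?> klass) {
        long id = nextIds.getOrDefault(klass, 1L);
        nextIds.put(klass, id + 1);
        return id;
    }

    /** Drops all cached counters, e.g. between tests. */
    synchronized void flush() {
        nextIds.clear();
    }
}
```

Each class gets its own sequence, so two different domain types can both have an object with id 55. A real version would also need to persist the counter, which is where the transactional questions discussed in this post come in.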

To the Code

All my Objects extend the AbstractLongDomain. The ID Object holds a "per" class Long Id, so each time an ID is needed, one of these objects gives one out. Next we have the IdManager that is managed as a Spring Singleton Bean. Its job is to return an id based on the "class" that needs an Id via getNextId(Class klass). The idRepository you see there is a simple Spring Data Neo4J Repository which has aspect-magic dust sprinkled on it to make the actual implementation.

Because I am not sure if neo4j is the "best" place to store the Ids (though it is the logical one), I created a simple IdGenerator interface, which is simply what the aspect will call and talk to. One implementation (the only one) is the Neo4JBackedIdGenerator, which uses the id objects and IdManager above.

So first the interface for the IdGenerator

And then the actual Neo4JBackedIdGenerator which is created and managed as a spring bean.

I will have to play with the transactional semantics on this one. I recall a horrid situation with a similar design, but via stored procs, many moons ago, where we had the sproc that generated Ids wrapped in transactions. They needed to be in their own transaction to ensure that mass object creation across threads would not get stuck on a lock. It is just an area I know can be sticky and bite, so I put the @Transactional in there, commented out, to remind me.

So last, we have the AspectJ aspect which wraps our getId(). All the domain objects extend org.soqqo.luap.model.AbstractLongDomain, which means we get the Id creation and generation for free each time getId is called. (Technically a catch here is that setId doesn't get checked if it is called manually to set an Id. It could, I guess, look into the repo to see if the Id is already used.)

And finally the Unit Test code to show that it all works

Note the use of @DirtiesContext. Because Neo4J is transactional, the contents are dumped after each test, which means the HashMap cache in the idManager becomes stale: the idManager is a singleton and has the lifecycle of the test class, not just the method. So the fix is either to manually flush() the cache or to use @DirtiesContext, which tells Spring to rebuild the context. Both work, but manually flushing my HashMap (new()) is 2 seconds faster (0.037s for the test) than Spring is at rebuilding.

The 2nd last piece to show is my test context file - model-test-context.xml

The very last piece is what my Maven POM looks like because a lot of people like to see that... Hopefully these are the correct relevant bits. I am happy to provide all this as a ZIP or push it up to github if people want to see more of it.

Wednesday, 26 October 2011

Neo4J - My First Look with Spring Data Graph

Spring Data Neo4J

I have been working on some proof of concept code and decided on a clean route to using NoSQL. Of course there are many choices, and because the application I am working on is highly connected around relationships (not the boyfriend/girlfriend type), I figured I would look at Neo4j. Given that my favourite library of the year is spring-data, I decided to take a look at the recently released spring-data-neo4j library (formerly called Spring Data Graph).

Spring Data provides some funky interface abstraction over your data store, be that an RDBMS, or other type of storage like NoSQL forms.

Specifics to Neo4j and Spring Data Neo4J

A Quick Overview:

Neo4J allows you to store POJOs without the need of a Schema.
POJOs are tied together using Relationships
Neo4J Understands a Node and a Relationship (that is it)
Spring Data Graph makes the "Node storage" and "relationship" tying really simple with Annotations

Let's look at that last point in detail. spring-data-neo4j uses a few special annotations, not unlike JPA's annotations.

Declaring a Node

A Node is declared with the following annotation

@NodeEntity
public class MySpecialPojo {

    @Indexed
    Long id;

    @Indexed(indexType=IndexType.FULLTEXT, indexName = "search")
    String textField;
    //...
}

Effectively these annotations make some magic happen. One of the big magic happen things is to do with some special methods() you will find on the objects. If you include the right "stuff" in your Maven POM for spring-data-neo4j, you will get some good stuff happening. Effectively some "DAO/repository" style methods get woven into your domain objects.

        // for free we get .persist() Which wraps up a call to neo4j and put my Pojo as a Node down to Neo4j.
        MySpecialPojo special = new MySpecialPojo(1,"Some Data").persist();

        MySpecialPojo foundSpecial = this.pojoRepository.findByPropertyValue("textField", "Some Data");

You will also have .remove() and other fun stuff. The best document I have found (as it is very new, 2.0.0.M1) is the following PDF: Spring Data Neo4J - Good Relationships

Missing Identity or Mimicking(sp?) a Primary Key

The application I will be building out will have, after all is said and done a really simple Web Interface with REST type URLs. So .. for example, I will be able to do

http://myservice.co.uk/superwebapp/mySpecialThing/detailedView/55

The 55 there will result in a query to Neo4J to locate the "MySpecialThing" object with an ID of 55 and display it. The problem I have is twofold:

Neo4J just stores objects and does not have "primary keys" other than a "nodeId".
The NodeId is collection wide. So Pojo1 shares the incremental nodeIds with Pojo2.

spring-data-neo4j adds (via an aspect ITD) a getNodeId() method to my POJOs but I don't want to depend on these for my "primary key" (future proofing my app if I move from Neo4J to something else).

So I want a Class wide "Id" so that when an object is persisted it has an ID for it.
I may be thinking about this wrong, and should just mold and accept a collection (database/store) wide ID system.
I am heading down this path

public class Foo { 
   @NodeId(collection=Foo.class)
   private Long id;
}

// or 
@NodeId(collection=Foo.class)
public class Foo extends AbstractLongIdDomainObject { 
    // .... get/setId() is found on super class.
}

This would then store and manage an ID, like we used to in RDBMS days when they did not have Primary Key AUTO INCREMENT or IDENTITY type stuff. The way you would implement Primary ID's is to have a table that stores the ID for "each collection" and a lookup stored-procedure or code that would "get" the next ID for you (within a transaction for example).

I have not perfected this, but I figured I would show what I was thinking. Plus I will do some more reading (maybe Neo4J has some config to allow per class type nodeIds). * I always go the hard way first *


@NodeEntity
public class IdManager { 

    /**
     * This map holds an idObject per "className" for each object we want to "have a Primary KEY id for"
     */
    @Indexed
    private Map idCollection;

}

@NodeEntity
public class IdObject {

    // The next id to hand out for the class this object tracks.
    private Long nextId;

    public Long getNextId() {
        return nextId;
    }

    public void setNextId(Long nextId) {
        this.nextId = nextId;
    }

}

So with this magic code I would then annotate as above with my custom (@NodeId) and some magic happens using aspects and stuff to weave in the "next" ID when an object is "created" and about to be stored through spring-data-neo4j.

Can't Default Values

Probably just because of how the aspects interact with the get/set methods for your fields, I found that you cannot default a field's value like you can with JPA.

@NodeEntity
public class Foo { 

   private Long someNumber = 1L;
   // .. getter setters
}

If you create this object and set someNumber to 55, for example, then call fooInstance.persist() and retrieve it from the repository, it will not have the value of 55, but the value of 1!! Annoying. So that is okay .. but I think the aspects that "populate" the fields are going in too early or something. I have a test case that shows this happening, but it was in a complex Aspect (because of my primary key playing above), so it may be a special case. I'll see.

Summary

Really nice and I like the simplicity that neo4j and spring-data give. GO away SQL.

Friday, 16 September 2011

Released a New Open Source Library

For a while it has slightly or mildly annoyed me how clunky creating random data can be.
You see all manner of approaches from Importing CSV files and XML to loading up JSON objects and the like.
What I have wanted was a library that allows random or specific generation into your Java POJO domain model which you then simply persist using your preferred storage layer (Hibernate, JPA, Neo4j) etc.
DBUnit is fantastic and I have often used http://www.generatedata.com to generate XML files to upload through DBUnit.

Well, now I have scratched that itch over the last day and created the aptly named

random-data-generator - http://code.google.com/p/random-data-generator

In short, see the project page as it explains most of how it works. It is a library for use in your code for "seeding" objects with random data.

Enjoy.

Thursday, 15 September 2011

Love *Nix

I still love *nix after all these years. It only took me 1 minute to convert lines like


1880,"Mary",0.072381,"girl"

to what I wanted.


$ grep \"girl\" baby-names.csv  | head | cut -d, -f2 | sed 's/"//g;s/$/,F/g'
Mary,F
Anna,F
Emma,F
Elizabeth,F
Minnie,F
Margaret,F
Ida,F
Alice,F
Bertha,F
Sarah,F
