RB Tech: 2013

Monday, 4 November 2013

Ramon's Rough Guide to Akka - Top Tips 1


And someone please tell me if the tips are wrong :-)

Over the past 2 years in my spare down time, I have been using Akka for the core of my application. There is so much to grok, but it is also very simple.

This will be a really short post, of things which I wish I had read somewhere, before diving into Akka.

1. Don't pass ActorRefs around :-)

The ActorRef (as I read tonight) is a handle to a specific instance of an Actor. So when that Actor dies and another takes its place (or, in my case, during event sourcing replay), the ActorRef will become stale.

Instead, get the ActorRef (or an ActorSelection) from

system.actorSelection("name")
context.actorSelection("/orPath/name")

(and I uncovered this due to the deprecation of actorFor).
This will give you the Actor you need.

I guess I passed the ActorRef in to places because I am not using any DI (as I always used to with Spring). So instead, the system or the context can now always be used.

2. Read this new post on Actor paths and Actor addressing!

This explains (1) better than I can.

http://doc.akka.io/docs/akka/snapshot/general/addressing.html

That's it for today.

Friday, 11 October 2013

Embedding a Primary Key in a UUID / GUID for CQRS / ES

For some time now I have been working on an Event Sourced system. Today I am going to describe the UUID/GUID primary key approach that I devised. Ignoring much about the event sourced technology and the merits thereof: primary keys are common, and their usage in an Event Sourced system is critical.

The usual route for a primary key in a system is an increasing number (Person ID = 102).

With Event Sourcing and CQRS, the UUID or GUID is often identified as a good implementation choice for your primary key.

If you are interested in the topic, then these two resources are a good read. (nothing to do with Event Sourcing)

The Cost of GUIDs as Primary Keys - http://www.informit.com/articles/printerfriendly.aspx?p=25862
GUIDs as fast primary keys under multiple databases - http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database

But I have a few issues with the UUID. For one, it's not very client friendly, nor is it easily memorable. I want a system wide unique reference for all my objects (a UUID is great for that), and I want it to also function as a friendly and memorable primary key. How? Well, below is my solution.

A little about Event Sourcing

With event sourcing, the primary route to storage is not through the "Domain Object".

Most developers are familiar with an Object-Relational Mapping system (Hibernate, NHibernate etc), where an object is persisted to a table. The usual approach is that a Domain Object is persisted (from a class or class hierarchy structure) to a database table. Event Sourcing instead stores the "event" that creates or modifies the Domain Object, and not the Object itself. A rebuild or some other query is a replay of the Event stream that was persisted.
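To make the "replay" idea concrete, here is a minimal sketch. The Person, PersonCreated and BirthDateCorrected types are hypothetical and are not taken from my system; the point is only that rebuilding state is a left fold over the stored events.

```scala
// Hypothetical domain types, invented for this illustration only.
final case class Person(name: String, birthDate: String)

sealed trait PersonEvent
final case class PersonCreated(name: String, birthDate: String) extends PersonEvent
final case class BirthDateCorrected(newBirthDate: String) extends PersonEvent

object Replay {
  // Apply a single event to the current state (None means "not created yet").
  def applyEvent(state: Option[Person], event: PersonEvent): Option[Person] =
    event match {
      case PersonCreated(name, birthDate) => Some(Person(name, birthDate))
      case BirthDateCorrected(newDate)    => state.map(_.copy(birthDate = newDate))
    }

  // Rebuilding the object is a left fold over the persisted event stream.
  def rebuild(events: Seq[PersonEvent]): Option[Person] =
    events.foldLeft(Option.empty[Person])(applyEvent)
}
```

The object itself is never stored; only the events are, and the current state falls out of replaying them in order.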

I explain all that to say that an Event Sourced (CQRS) system can be designed without "traditional primary keys".

Each event has a key, and the objects created can have keys, but what is common is to use UUIDs across all domain objects.

My Design Goal

I wanted some traditional primary keys; and I wanted the UUID. In short, the system should be simple to use; being both client friendly and a memorable primary key for use.

When you have a UUID as the primary key, and it is used in APIs and gets passed around the office as "Look up client 3422", you need this API to be simple.

An example: /person/fbe645f0-3031-41e3-aa6e-0800200c9a66 is just not a nice URI

However,

/person/2078

/person/5f9

They are simple.

What I have embarked on is to segment my UUID (my primary keys) so that I utilise the entire UUID space - a part for randomness, a part for the Group (or table ID if you like) and a part for the object; in this example case, the person.

The make up of my Custom UUID

The UUID is a 128 bit number, represented as 32 hex characters (with some dashes for legibility)

The UUID specification reserves some bits for version and variant.

With much (actually 30 minutes) thought, I have decided to use Version 6. (it doesn't exist in the IETF RFC 4122, they just went 1, 2, 3, 4 and 5)

and just make the Variant 'a'.

but ignoring all that, what is special about my UUID is that I embed a primary key and a "group" ID into the UUID.

So, my UUIDs look like

pppppppp-pppp-6ggg-arrr-rrrrrrrrrrrr

where p is the Primary key (sequential, incrementing)

where 6 is the Version (always 6)

where g is the Group ID, (akin to an identifier of the table)

where a is the variant (always a)

where r is the random part

These are my working notes:

00000000-0000-6000-a000-000000000000
                    --- ------------   random part: 60 bits (time or other)
                   x                   variant: x == a, b, 8 or 9
               ---                     group ID: 12 bits, value 0 to 4095
              6                        version: always 6 (a Version 4 random UUID would have 4 here)
-------- ----                          primary key ID: 48 bits (2^48 = 281,474,976,710,656 values)

So I have enough bits for a primary ID (halfway between an Int at 32 bits and a Long at 64 bits).
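As a rough sketch of how the fields in this layout can be read back out, assuming the layout above and using only java.util.UUID (the object and method names here are mine, invented for the illustration):

```scala
import java.util.UUID

object CustomUuidFields {
  // The primary key occupies the top 48 bits of the most significant long.
  def primaryKey(u: UUID): Long = u.getMostSignificantBits >>> 16

  // The version nibble sits directly below the primary key.
  def version(u: UUID): Int = ((u.getMostSignificantBits >>> 12) & 0xF).toInt

  // The group ID is the lowest 12 bits of the most significant long.
  def groupId(u: UUID): Int = (u.getMostSignificantBits & 0xFFF).toInt

  // The short, friendly form of the primary key (hex, no leading zeros).
  def shortKey(u: UUID): String = java.lang.Long.toHexString(primaryKey(u))
}
```

Everything I care about lives in the most significant long, so the random half never needs to be touched when decoding.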

The Version

I decided to use 6 as the Version. Versions 1 - 3 have other uses, Version 4 is Random or Pseudo-random (which is almost mine, but not quite) and Version 5 is a SHA-1 Hash. RFC 4122 stipulates the following:

Process of identifier assignment:

Generating a UUID does not require that a registration authority
be contacted. One algorithm requires a unique value over space
for each generator. This value is typically an IEEE 802 MAC
address, usually already available on network-connected hosts.
The address can be assigned from an address block obtained from
the IEEE registration authority. If no such address is available,
or privacy concerns make its use undesirable, Section 4.5
specifies two alternatives. Another approach is to use version 3
or version 4 UUIDs as defined below.

The sentence I want to draw attention to is - "Generating a UUID does not require that a registration authority be contacted"

But then again, because I am using Version 6, it's probably not a UUID.

My reasons for using Version 6 are:

My UUID is not a Version 4, because it is only partially random ( a part of it )
It is not any other Version (1-3, or 5)
Version 6 wasn't used
If someone has an issue with my use, then I'll call it instead a LUUID, a Local UUID

Moving on.

The Group ID

I wanted to have a unique key that represents all objects across the space. In this way I can now have a REST URI that looks like

/any/<uuid>

and the system can appropriately redirect to the correct resource, by looking at the group ID. (pattern matching the -6xxx- )

For example, if we have the UUID

00000023-122e-6aa1-a789-28ef27ab7c62

This is

aa1 for the Group ID for 'person'
23122e for the primary key

/any/00000023-122e-6aa1-a789-28ef27ab7c62

and with a match we can redirect the request to

/person/00000023-122e-6aa1-a789-28ef27ab7c62
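A minimal sketch of that redirect, assuming a lookup table from group ID to resource name. The table is hypothetical: aa1 for 'person' is the example used here, while b02 for 'building' is invented purely for illustration.

```scala
import java.util.UUID
import scala.util.Try

object AnyRedirect {
  // Hypothetical registry of group IDs; only 0xaa1 -> person comes from the example.
  val resources: Map[Int, String] = Map(0xaa1 -> "person", 0xb02 -> "building")

  // Returns the target path, or None for an unparseable UUID or an unknown group.
  def redirect(uuidText: String): Option[String] =
    for {
      uuid <- Try(UUID.fromString(uuidText)).toOption
      group = (uuid.getMostSignificantBits & 0xFFF).toInt
      name <- resources.get(group)
    } yield s"/$name/$uuidText"
}
```

With a UUID in this layout, AnyRedirect.redirect("00000023-122e-6aa1-a789-28ef27ab7c62") gives Some("/person/00000023-122e-6aa1-a789-28ef27ab7c62").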

Clashes in the Primary Key with a Distributed System

CQRS guru Greg Young says of the UUID (and Event Sourced systems)

Having the client originate Ids normally in the form of UUIDs is extremely valuable in distributed systems.

And this is true, but I don't want to give the client that honour, I want to allocate them a UUID (for many reasons) - but that comes with a drawback of needing to cater for clashes. (there is always a tradeoff with IT)

For my UUIDs, the primary key is sequential at the point of allocation (23122e, 23122f, 231230 etc). Having the end of the "UUID" random (e.g. -a789-28ef27ab7c62) means I can cluster (multiple systems) and allow for "duplicates" even in the primary key space (where two servers generate the same ID and Group ID).

Leaving the UUID as it is (with a duplicate in the PK ID space) is ok but not ideal. So we will need to cater for that. (and of course for the lottery day when two systems generate the same UUID even down to the random part!)

Let me explain that : Assume that I have a web application with two servers that generate "people",

www.myco.com

server1-myco; and
server2-myco

If server1-myco and server2-myco BOTH generate a UUID but the random part differs, e.g.

server1 - 00000023-1a32-6aa1-a789-28ef27ab7c62 (Jim)

server2 - 00000023-1a32-6aa1-a25e-6ac5c6ef127c (Mary)

then technically I have two unique IDs, but the initial shorter IDs (231a32) clash. This is not exactly ideal, because I want my friendly PK IDs to be unique also. If I want to minimise that "duplicate", then here are some options.

1. Single ID generation node (creates a single point of failure)
2. Regular re-assignment in case of a clash
3. Generate and check with peers
4. Block reserving

Options 2, 3 and 4 are my preferred ones. With a distributed system, I have that PK ID issue anyway, so it's no different because I am playing with UUIDs.
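Block reserving is the easiest of these to sketch. The following is only an illustration under assumptions: BlockSource stands in for whatever shared store would hand out blocks (here it is just an in-memory counter), and both class names are invented.

```scala
import java.util.concurrent.atomic.AtomicLong

// Stand-in for the shared store that hands out blocks of IDs (hypothetical).
final class BlockSource(blockSize: Long) {
  private val nextStart = new AtomicLong(1L)

  // Atomically reserve the next block and return its (first, last) IDs.
  def reserve(): (Long, Long) = {
    val first = nextStart.getAndAdd(blockSize)
    (first, first + blockSize - 1)
  }
}

// One allocator per server. Not thread-safe; assumes a single caller.
final class BlockAllocator(source: BlockSource) {
  private var next = 0L
  private var last = -1L

  def nextId(): Long = {
    // Block exhausted (or never reserved): go back to the shared source.
    if (next > last) {
      val (lo, hi) = source.reserve()
      next = lo
      last = hi
    }
    val id = next
    next += 1
    id
  }
}
```

Each server only talks to the shared source once per block, and two servers can never hand out the same primary key, at the cost of gaps when a server stops with part of its block unused.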

But what I really like is how I can use the primary key all by itself, outside of the UUID

for example.

Dear Mr Client,

Welcome as a supplier to Company X. For future reference, your client ID is 23122e.

...

/person/23122e

I don't have to use the full UUID everywhere, but rather just where I need it (in the event sourced messages), and as a unique global ID on the system.

Debugging a system will be easier too, looking at logs, someone with a little knowledge will recognise 'people' UUID's vs 'building', or 'schedule' UUID's (because they will distinguish the group ID after '-6xxx-' as the unique group identifier); in effect people will learn those 3 characters and identify what the group Id is.

UUID generation

So now I hear you wonder, how do I generate these mythical UUID's ?

Simple really.

This is the call for generating the UUID (reusing the java.util.UUID class, just giving it the upper and lower longs, 64 bits each).

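/**
 * Create a new UUID given some ID as the groupID and an already sequenced ID
 */
def createUuid(groupId: Int, id: Long): Uuid = {
  val randomBytes = new Array[Byte](8)
  secureRandom.nextBytes(randomBytes)
  val randomLong = java.nio.ByteBuffer.wrap(randomBytes).getLong()
  // groupId has to fit in 12 bits, as the version nibble (6L) goes in directly above it.
  // The masks stop an oversized groupId or id from spilling into the neighbouring fields.
  val mostSigBits = ((id & 0xFFFFFFFFFFFFL) << 16) | (6L << 12) | (groupId & 0xFFFL)
  // variant nibble 'a' (10) on top, then 60 random bits
  val leastSigBits = (10L << 60) | (randomLong >>> 4)
  Uuid(new java.util.UUID(mostSigBits, leastSigBits).toString())
}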

The Primary Key is an incremental sequence on the code that creates a new person, or group, or employee (etc)

In Scala, it is simply a matter of adding in a Trait for the "class" you want ID's sequenced for/ UUID's generated for. e.g: I add with UuidGenerator[Person] and this gives a newUuid() method

class PersonProcessor(val repository: Repository[Uuid, Person]) extends AbstractProcessor[Person]
  with UuidGenerator[Person] { this: Emitter =>

  def klass = classOf[Person]

  ...

  createPerson(newUuid, cmd)

The implementation of that newUuid() method looks as follows:

trait UuidGenerator[D] {

  implicit def klass: Class[_]

  // a mutable map, so the counter can be updated in place
  private val ids = scala.collection.mutable.Map.empty[String, Long]

  // klass is already the Class, so ask it directly for its name
  private[this] def className = klass.getCanonicalName()

  /**
   * return the next ID
   */
  def newUuid: Uuid = {
    val idKey = className
    val nextId = ids.getOrElse(idKey, 0L) + 1
    ids(idKey) = nextId
    Uuid.createUuid(klass, nextId)
  }

...

And Uuid.createUuid looks like :

def createUuid(klass: Class[_], id: Long): Uuid = createUuid(groupId(klass), id)

The Group ID is simply a CRC-12 (a 12 bit CRC) over the class name. This is because I had 12 bits to spare where I placed the groupId (-6aa1-). So given that all 'Person' domain objects extend from com.soqqo.system.domain.Person, my groupIds are consistent there.

On bootstrapping my system, I make sure that the "in use" groupIds do not clash after CRC12-ing them. It could happen, and if it does I'll deal with that then (just halt the bootstrap and change the code to suit).
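That bootstrap check can be sketched roughly as follows. This is not my actual bootstrap code: the object name is invented, and the hash function is passed in as a parameter so the sketch stands on its own (in my system it would be the CRC-12 over the class name).

```scala
object GroupIdCheck {
  // Returns every group ID that more than one class name maps to.
  // groupIdOf is whatever function turns a class name into a group ID.
  def clashes(classNames: Seq[String], groupIdOf: String => Int): Map[Int, Seq[String]] =
    classNames.distinct
      .groupBy(groupIdOf)
      .filter { case (_, names) => names.size > 1 }
}
```

If the returned map is non-empty, the bootstrap can stop and report exactly which classes collided.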

This is the crc12 implementation in Scala. I haven't tested it brutally, but it DOES generate IDs below 4096 for the random bytes I passed in, so it is working the way I need it to.

def crc12(toHash: String): Int = {
  // Using direct (bit by bit) calculation
  var crc: Int = 0xFFF    // initial contents of the LFSR
  val poly: Int = 0xF01   // reverse polynomial
  // UTF-8 is fixed here so the result does not depend on the platform default charset
  val bytes = toHash.getBytes("UTF-8")
  for (b <- bytes) {
    var temp = (crc ^ b) & 0xff
    // read 8 bits one at a time
    for (_ <- 0 to 7) {
      if ((temp & 1) == 1) temp = (temp >>> 1) ^ poly
      else temp = temp >>> 1
    }
    crc = (crc >>> 8) ^ temp
  }
  // flip bits
  crc ^ 0xfff
}

Summary

I hope this helps someone on any of the weird topics I have covered here. I will happily share the Uuid and UuidGenerator classes with anyone that wants them. They are nothing special, but rather the result of a lot of thinking about how I wanted my Uuids to be utilised in the system.

My system entails:

1. spray.io - Web router on top of Spray Can
2. Event Sourced (now also known as akka-persistence)
3. AngularJS

Enjoy!

Tuesday, 2 July 2013

Heroku and Gradle - and jetty-runner Configuration

I recently worked on a project where we used Heroku as the deployment engine.

For speed of the project I chose Maven (simply because I know it well and it is VERY good).

Knowing also that Gradle is now becoming the "next build kid" on the block and, for me, recognising that Gradle is easier to configure, I set about the task today of switching the build to Gradle.

This was very simple, and the last "part of the puzzle" after replicating all the functionality was to set up the Heroku parts. By default, we were using the Maven (jetty-runner) bootstrapping. Arguably it is easier and lighter to run Jetty Embedded (as per https://github.com/heroku/devcenter-gradle) but I wanted to see what is required to use "jetty-runner". This was more an exercise in build comparison than it was "get it onto Heroku".

Heroku will detect a Maven pom.xml, and by default will run

This creates a war. Heroku then runs your app using "whatever" is in the Procfile. The Procfile they suggest looks like this:

The jetty-runner.jar gets in the "target/dependency/" folder due to this Maven Magic.

When Heroku detects a Gradle project, it runs the "stage" task (gradle stage) rather than a packaging command.

So to replicate the same with Gradle I had to write a copy task to get jetty-runner in there, and generate the war and attach that all to a "stage" task.

It is very easy when you look at it, the trick is in the knowing of the API that slows it down. But I resolved it in about 30 minutes.

In short, the changes for Gradle are as follows.

1. Change your Procfile (we will get the jetty-runner.jar in "build/libs") (the war goes there by default)

2. Add a "new" configuration for the dependency on jetty-runner

3. Add a new dependency for Jetty Runner (note I also use newrelic so it goes in there too!)

4. Create a new task which copies the jars from the dependencies "runtimeOnly". We also replicate the Maven method of renaming the jar to have no Version(s).

5. Add a "stage" task, because that is what Heroku will run.

And that is it. When stage is run, it will create a war, and copy jetty-runner.jar into the build/libs folder.

Happy days.

Wednesday, 29 May 2013

The hierarchy of the type X is inconsistent - A Classpath Issue

Often the simple things will slow you down when developing.

Today I had one such moment. It occurred a few days back where my Eclipse project was reporting the following error:

The hierarchy of the type LoggingFilter is inconsistent

The LoggingFilter was simply a class extending the Spring AbstractRequestLoggingFilter

Nothing really special was going on. I was kind of hoping it would be obvious but it wasn't.

In short, my classpath was wrong: I had an import of a library (an older version, perhaps) that did not match the one that AbstractRequestLoggingFilter was depending on somewhere in the stack.

I frequently used mvn dependency:tree to assess what this list was .. and nothing stood out. I checked the javax.servlet-api (yes I was using 3.0.1, and so was Spring). I checked my exclusion of commons logging (I use SLF4J), but that was okay.

Eventually I looked inside the .classpath and to my mild horror I saw Spring 3.0.6 was included.
Looking back at the dependency:tree for Maven, it didn't show in the list.

Which meant one of two things:
1. mvn dependency:tree was wrong
2. mvn eclipse:eclipse was wrong

I looked first at dependency:tree, and then figured that perhaps Maven needed an update (I was using 3.0.4). That took me to check the release notes, where I found: https://cwiki.apache.org/MAVEN/maven-3x-compatibility-notes.html#Maven3.xCompatibilityNotes-DependencyResolution

and a magic note that dependency:tree does not work according to Maven's resolution.
So no worries, I ran debug and I saw that .. yes, Maven was "seeing" 3.0.6 but it was also excluding it.
So that meant that eclipse:eclipse was wrong.. a quick pom change (to the 2.9 eclipse plugin) and it was all good.

mvn eclipse:eclipse excluded 3.0.6 as expected and included 3.2.3 as needed.

Jetty however is still bootstrapping with 3.0.6 .. so I may have to specifically find which of the dependencies is trying to include it, exclude it and then forcibly include spring-context, rather than rely on the transitive dependency.

Moral to the post .. check everything .. and assume nothing!

Friday, 8 March 2013

Scala - Event Sourcing and Spray

Scala + NoSQL

Over the past 24 months I have been diving a bit deeper into Scala by way of using a new architecture. My dabbling started 4 years ago. I haven't touched Java for about 2.

I have been sick of, over the past 10 years, building the typical DB/App stack. If you track back about 5-6 years in my posts you'll notice I shifted to researching and investigating NoSQL enterprises. (db4o etc). The "ick" centred around the simple yet fundamental issue of the ORM Impedance Mismatch. Every man and dog has blogged and written about it and it is very prevalent.

One thing led to another and I found myself loving Scala and its fresh way of code development, but I never really landed on a great NoSQL solution. Being a Java hack I immersed myself in its ways (Scala's, that is) and attended ScalaDays 2012 to shore up my skills. There I met and chatted with a lot of people and found the 3 days brilliant.

At ScalaDays I was researching my next "stack" and homed in on spray.io. After the talk Mathias Doenitz gave at ScalaDays, I caught up with him in the hallways asking a general question of

"I am an old GWT hack and want less complexity; what interfaces do you see being plugged into Spray,.."

to which he and a friend gave some tips on various JS libraries for me to check out (of which I have settled on Twitter Bootstrap). Mathias and / or the other (I don't recall who) suggested that, if I wanted to walk in a new area, I check out the work in the Event Sourcing arena.

I read and could see its benefits. So I figured I would read some more. I ended up watching the threads on the DDD/CQRS. That got me hooked. I have to say though that CQRS is a simple yet massive theory and I wanted something practical; it was to come.

About the same time, Martin Krasser posted some info on JAXB Scala Marshalling - I wanted some of that in my Spray application. Scala has excellent JSON marshalling using lift-json; alternatively you can use a built in library spray-json. Both of these worked well, however I wanted a "single" API definition class that I could expose as JSON or XML without me needing to code it twice. My sample application at the time was still bound to a Scala/Spring JPA and Hypersonic DB stack.

Seeing the "sample" project that Martin Krasser had built, and his excellent blog posts on it, took me right into the eventsourced package where the JAXB marshalling was used. Martin Krasser together with colleagues released the early draft of this Event Sourcing package and after a few iterations it was fully embedded into the Akka way .. and since Spray was too, I have joined the two together ever so simply - and now have the base framework for my perfect "stack".

The Scala ES Spray Stack

So what does it look like ? At the front end, though I have some ways to go there, I have

Twitter Bootstrap, which talks to a
JSON REST API; into
spray.io routing; which delegates the "commands"; to
eventsourced

The commands and the events that are journaled are Scala Case Classes, Annotated with JAXB annotations to support the Un/Marshalling in spray.

Akka Camel will come next, though because I am an old Apache Camel Hack - random contributions and use throughout time, I know what it does and well, so will slot it in later.

A Weak Schema on Historic Events

Event sourcing is great - in essence - keep everything you ever did. Everything. It is very BigData-esque, and actually beneficial for audit tracking. If I keep every Command/Event that the system responds to and reacts with, I have a full "audit" trail of how it operates. CQRS gives me the benefit of no Database (I can stick one on the "read" view if I want). However;

My last "piece" to the framework puzzle is the concept of supporting a weak schema. After you read all about Event Sourcing, you will quickly realise that "Cross Version" software support needs to be managed well. The typical "DB" stack doesn't have this challenge as much, simply because, unless coded for, all history is "thrown" away - and thus the problem is smaller - and DB upgrade scripts decide at "run" what is kept .. or what goes.. and the "change is usually irrevocable".

With Event Sourcing, the "history" is with you, and that is its benefit. So to retain that benefit through future upgrades of your application, you need to support the older events, in whatever form they have. A few ways to achieve this are:

1. Retain, in code, the "V1, V2 and V3" objects that the event messages relate to;

Needless to say, this is quite complex. The amount of code you may have to manage over a long life span of the software could become messy. * discounted *

2. Upgrade the older Events, when you "upgrade" the Application.

This may work - but it feels wrong. Upgrading events to something they never were breaks the model. Adding a field to an event .. what should be the default of the value that was never supplied 4 years ago ? Needless to say, you will recognise this is often the DB way. That is okay. It does work.. there is another way. * discounted *

3. Translate the older events on the fly as they are read in..

My idea was to build a shim in the "serialisation layer" that translates older events V1, V2 up to V3 equivalent .. but again it kind of smells.. so that is not it. * discounted * .. kind of..

4. A Weak Schema..

Greg Young, on the eventsourced mailing list pointed me (in two words .. "Weak Schema") to look at this model .. quickly I ended looking at Google's protobuf. I had read it before but had not the need, until now.

Protobuf is all about "version" management across messages between systems. This is exactly what Google built it for: messages between their index servers, which may be running different versions of software. And as it turns out, it might be a brilliant fit. I have some "tests" to do, which for now I am going to park as the theory seems okay, and when the need warrants I will utilise its power.

I googled in earnest about working with a "weak schema" but there was not a lot to read, but it didn't take too long to work it out.

Let me give you an example:

if we have a "Command" object that we will serialise to disk. It could look like this:

"CorrectTheBirthDate(uuid,newBirthDate)"

In my fictitious application, imagine we serialise this to JSON. It may look like this.

Now imagine 3 years later we figure it is a good idea to record the "reason" why the Birth date had to change; to support this we add a new field for "reason". Simple enough, our Command object changes to CorrectTheBirthDate(uuid,newBirthDate,reason) and the newer JSON is serialised as you would expect.

Ok. So what happens when the system "replays" all the Commands? Well, with protobuf, it just sees that the field is not supplied, so it doesn't deserialise "nothing" into the object. Instead (using ScalaBuff) the case class is marked with Option[] fields, so that the value in that instance becomes "None".

If it were the other case where a field is dropped; then the Case Class will just never have the value "loaded in".
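Here is a small sketch of those two rules. It deliberately does not use protobuf or ScalaBuff: a plain Map stands in for whatever fields the serialisation layer decoded, and the reader itself is hypothetical. Only the shape of the command comes from the example above.

```scala
// The command after the "reason" field was added; older records never carry it.
final case class CorrectTheBirthDate(uuid: String, newBirthDate: String, reason: Option[String])

object WeakSchema {
  // `fields` stands in for the decoded record (an assumption made for this sketch).
  // Required fields must be present; the newer field is optional; unknown fields are ignored.
  def read(fields: Map[String, String]): Option[CorrectTheBirthDate] =
    for {
      uuid <- fields.get("uuid")
      date <- fields.get("newBirthDate")
    } yield CorrectTheBirthDate(uuid, date, fields.get("reason"))
}
```

An old record without "reason" reads back with reason = None, and a record carrying a field that has since been dropped from the case class still reads, with the extra value simply never loaded.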

I am sure there are horrid edge cases lurking, but it feels right to let the serialisation layer deal with the problem .. how it knows .. and so long as the developers know the rules (when things are dropped, or what is added when) then coding can continue. The trick or benefit is that the "protobuf" default should be enough.

Serialising in this way means that regardless of the changes in my APIs, Commands and Events, I will always have the History. Therefore I will always have the ability to scour the depths for stats and reports. Exactly my reasons for Event Sourcing it (amongst so many others).

So, stay tuned. I will post a real sample application once I tidy up the mess and make a real UI to play with.
