Friday 8 March 2013

Scala - Event Sourcing and Spray

Scala + NoSQL

Over the past 24 months I have been diving a bit deeper into Scala by way of building on a new architecture. My dabbling started 4 years ago, and I haven't touched Java for about 2 years.

I have been sick, over the past 10 years, of building the typical DB/App stack. If you track back about 5-6 years in my posts you'll notice I shifted to researching and investigating NoSQL offerings (db4o etc.). The "ick" centred on the simple yet fundamental issue of the ORM impedance mismatch. Every man and his dog has blogged and written about it, and it is still very prevalent.

One thing led to another and I found myself loving Scala and its fresh way of code development, but I never really landed on a great NoSQL solution. Being a Java hack I immersed myself in its ways (Scala, that is) and attended ScalaDays 2012 to shore up my skills; there I met and chatted with a lot of people and found the 3 days brilliant.

At ScalaDays I was researching my next "stack" and homed in on spray.io. After Matthias Doenitz's talk, I caught up with him in the hallway and asked a general question:

"I am an old GWT hack and want less complexity; what interfaces do you see being plugged into Spray,.."

to which he and a friend pointed me at various JS libraries to check out (of which I have settled on Twitter Bootstrap). Matthias, or perhaps his friend (I don't recall who), suggested that if I wanted to explore a new area I should check out the work going on in the Event Sourcing arena.

I read up on it and could see its benefits, so I figured I would read some more. I ended up watching the threads on the DDD/CQRS list, and that got me hooked. I have to say, though, that CQRS is a simple yet massive idea, and I wanted some good practical material; it was to come.

About the same time, Martin Krasser posted some info on JAXB Scala marshalling - I wanted some of that in my Spray application. Scala has excellent JSON marshalling via lift-json; alternatively you can use Spray's own spray-json library. Both of these worked well; however, I wanted a single API definition class that I could expose as JSON or XML without needing to code it twice. My sample application at the time was still bound to a Scala/Spring JPA and Hypersonic DB stack.
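As a quick illustration of the JSON side, a minimal spray-json sketch (the Person class and values are hypothetical, purely for illustration):

    import spray.json._
    import DefaultJsonProtocol._

    object JsonExample extends App {
      // A hypothetical API class; spray-json derives a format from its shape.
      case class Person(name: String, age: Int)
      implicit val personFormat = jsonFormat2(Person)

      println(Person("Alice", 42).toJson.compactPrint) // {"name":"Alice","age":42}
      println(JsonParser("""{"name":"Alice","age":42}""").convertTo[Person])
    }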

Seeing the "sample" project that Martin Krasser had built, and his excellent blog posts on it, took me right into the eventsourced package where the JAXB marshalling was used. Martin Krasser together with colleagues released the early draft of this Event Sourcing package and after a few iterations it was fully embedded into the Akka way .. and since Spray was too, I have joined the two together ever so simply - and now have the base framework for my perfect "stack".

The Scala ES Spray Stack

So what does it look like? At the front end, though I have some way to go there, I have:
  • Twitter Bootstrap, which talks to a
  • JSON REST API, into
  • spray.io routing, which delegates the "commands" to
  • eventsourced
The commands and the events that are journaled are Scala case classes, annotated with JAXB annotations to support the un/marshalling in Spray.
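To sketch how these pieces join up - the class name, route path and processor wiring are my own assumptions, and the eventsourced and XML marshalling plumbing is elided:

    import javax.xml.bind.annotation.{XmlAccessType, XmlAccessorType, XmlRootElement}
    import akka.actor.ActorRef
    import spray.httpx.SprayJsonSupport._
    import spray.json.DefaultJsonProtocol
    import spray.routing.HttpService

    // One API definition class: the JAXB annotations serve the XML side,
    // while a spray-json format serves the JSON side.
    @XmlRootElement(name = "correctTheBirthDate")
    @XmlAccessorType(XmlAccessType.FIELD)
    case class CorrectTheBirthDate(uuid: String, newBirthDate: String) {
      private def this() = this(null, null) // JAXB requires a no-arg constructor
    }

    object CommandProtocol extends DefaultJsonProtocol {
      implicit val correctTheBirthDateFormat = jsonFormat2(CorrectTheBirthDate)
    }

    trait CommandRoute extends HttpService {
      import CommandProtocol._

      def processor: ActorRef // the eventsourced processor; wiring elided

      val commandRoute =
        path("commands" / "correct-birth-date") {
          post {
            entity(as[CorrectTheBirthDate]) { cmd =>
              // in the real stack this would be wrapped in eventsourced's Message envelope
              processor ! cmd
              complete("accepted")
            }
          }
        }
    }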

Akka Camel will come next. Because I am an old Apache Camel hack (random contributions and use over the years), I know what it does, and know it well, so I will slot it in later.

A Weak Schema on Historic Events

Event sourcing is great - in essence: keep everything you ever did. Everything. It is very BigData-esque, and genuinely beneficial for audit tracking. If I keep every Command/Event that the system responds and reacts with, I have a full "audit" trail of how it operates. CQRS gives me the benefit of needing no database (I can stick one on the "read" side if I want). However;

My last "piece" to the framework puzzle is the concept of supporting a weak schema. After you read all about Event Sourcing, you will quickly realise that "Cross Version" software support needs to be managed well. The typical "DB" stack doesn't have this challenge as much, simply because, unless coded for, all history is "thrown" away - and thus the problem is smaller - and DB upgrade scripts decide at "run" what is kept .. or what goes.. and the "change is usually irrevocable".

With Event Sourcing, the "history" is with you, and that is its benefit. So to retain that benefit through future upgrades of your application, you need to support the older events in whatever form they take. A few ways to achieve this are:

1. Retain, in code, the "V1, V2 and V3" objects that the event messages relate to;

Needless to say, this is quite complex. The amount of code you would have to manage over a long life span of the software could become messy. * discounted *

2. Upgrade the older Events, when you "upgrade" the Application.

This may work - but it feels wrong. Upgrading events into something they never were breaks the model. If you add a field to an event, what should the default be for a value that was never supplied 4 years ago? You will recognise this as the typical DB way. That is okay - it does work - but there is another way. * discounted *

3. Translate the older events on the fly as they are read in..

My idea was to build a shim in the "serialisation layer" that translates older events (V1, V2) up to their V3 equivalents .. but again it kind of smells, so that is not it. * discounted * .. kind of ..

4. A Weak Schema.. 

Greg Young, on the eventsourced mailing list, pointed me (in two words: "Weak Schema") to look at this model. I quickly ended up looking at Google's protobuf. I had read about it before but had never had the need, until now.

Protobuf is all about "version" management across messages between systems. That is exactly what Google built it for: messages between their index servers, which may be running different versions of the software. As it turns out, it might be a brilliant fit. I have some "tests" to do, which for now I am going to park, as the theory seems okay; when the need warrants, I will utilise its power.

I googled in earnest about working with a "weak schema", but there was not a lot to read; still, it didn't take too long to work it out.

Let me give you an example: 

Say we have a "Command" object that we will serialise to disk. It could look like this:

"CorrectTheBirthDate(uuid,newBirthDate)"

In my fictitious application, imagine we serialise this to JSON. It might look something like this (field values invented for illustration):
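    { "uuid": "9a6e1d2c-0b1f-4c7e-8d2a-5f6e7a8b9c0d", "newBirthDate": "1975-06-14" }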

Now imagine that 3 years later we figure it is a good idea to record the "reason" why the birth date had to change; to support this we add a new field for "reason". Simple enough: our Command object changes to CorrectTheBirthDate(uuid,newBirthDate,reason) and the newer JSON is serialised as you would expect, again with invented values:
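    { "uuid": "9a6e1d2c-0b1f-4c7e-8d2a-5f6e7a8b9c0d", "newBirthDate": "1975-06-14", "reason": "data entry error" }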
OK, so what happens when the system "replays" all the Commands? Well, with protobuf, it just sees that the field is not supplied, so it doesn't deserialise "nothing" into the object. Instead (using ScalaBuff) the case class is generated with Option[...] fields, so the value in that instance simply becomes None.
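To make that concrete, here is a hand-written sketch of the shape such a ScalaBuff-generated class takes - the real generated code differs, and the proto definition in the comment is my own assumption:

    // Proto sketch (proto2 syntax, assumed for illustration):
    //   message CorrectTheBirthDate {
    //     required string uuid         = 1;
    //     required string newBirthDate = 2;
    //     optional string reason       = 3;  // added in the later version
    //   }

    case class CorrectTheBirthDate(
      uuid: String,
      newBirthDate: String,
      reason: Option[String] = None // absent from old journal entries
    )

    object Replay {
      // Replaying an old command: "reason" was never on disk, so it is simply None.
      def describe(cmd: CorrectTheBirthDate): String = cmd.reason match {
        case Some(r) => s"Birth date corrected because: $r"
        case None    => "Birth date corrected (no reason recorded)"
      }
    }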

In the other case, where a field is dropped, the case class will just never have that value loaded in.

I am sure there are horrid edge cases lurking, but it feels right to let the serialisation layer deal with the problem of how it knows. So long as the developers know the rules (what was dropped when, and what was added when), coding can continue. The trick, or benefit, is that the protobuf default should be enough.

Serialising in this way means that, regardless of the changes in my APIs, Commands and Events, I will always have the history. Therefore I will always have the ability to scour the depths for stats and reports. Exactly my reason for Event Sourcing it (amongst so many others).

So, stay tuned. I will post a real sample application once I tidy up the mess and make a real UI to play with.


