Wednesday, 16 April 2014

SBT not ready for the Corporate World

I have been using SBT now for about two years and I really like it's features. I very much grok it.

It is definitely growing up, in terms of new features and stability. However, of course there is an however, "I can't give this to my friends".

It is just not ready for the corporate world. The likes of Gradle, and yes even Maven make it so much easier to explain the build process. Actually, I can explain the build process with gradle and maven. But SBT is SUCH a massive barrier. We really can do better folks.!

I think I can base my gripes down to two areas: Syntax, and then a higher level, the complexity of creating a build setup.

Syntax
In short, SBT has confusing and ugly syntax. (shocking really) Watching Martin Odersky's key note at the Scala 2013 days, he discussed the notion that we should be very careful with Symbolic vs alphabetic Method names.

http://parleys.com/play/51c1994ae4b0d38b54f4621b/chapter35/about (see from 59:10)

Here Martin discusses Alphabetic vs Symbolic method names. Can we have some normal reading config please ? Looked at a gradle build file lately ? I can explain that to a junior developer of 3 months very easy - but sbt. not a chance (I have tried). It just represents - "forget it - what do you know about node.js ?"

Build and Customisation Complexity

What Ruby on Rails and Maven and (some other stuff I am sure) have in common is the Convention over Configuration. Gradle does this well, where to extend a plugins execution, just define "more stuff for it to do".

The way settings are applied in  a .scala definition is almost appalling. It really needs a rethink as when I show people the various methods, you get blank stares or squizzed up faces.

With SBT, I find I have to write some very complex logic if I want to move files, or process config files etc etc. I feel this comes from SBT lacking a "customer API". It has an Engineers API at the moment but ease of extending it's activities really requires some serious depth knowledge of Scala AND SBT.

My issue

SBT is shaped as the defacto build too for Scala; yes Maven and Gradle can do it. But SBT does it better. Because it is defacto, and because we want new users to learn the languages, and because we want the new users to have good projects (for longevity of maintenance) we want them to use a build tool. But our option .. sticking them in front of SBT ?? oh dear.

Scala is a fantastic language, simply my favourite and I know a LOT of languages and environments. Ant was a brilliant help, and Maven improved that massively (yes yes, ant v maven). So when I train new users (ok / so so developers) in a project, I don't want to confuse them with "take up", I just want to get them on the road to exploring. Training Java, it might be Spring, or Drools or JAX-RS or something else. With Maven (and now with Gradle as that is my preferred Java build tool) I can simply setup a build script, easily (and I mean easy), explain the script to someone, and get them on their way.

The new-ish developer does not need to know much about Java to understand Maven and even less knowledge to understand Gradle as it reads better.

But if I wanted to get someone up and running with Scala, giving them a default Build tool, I can show them an SBT build setup. Oops. that doesn't work.

Parallel this poor experience with how Scala compares to Java, I can show them a case class and how simple that is, so at the high level, Scala as a language is easy, but SBT is just killing take up for the less experienced.

I have read that sbt is like vi, once you grok it, you get it.. and yes I am definitely there. (vi is my favourite editor) - but please can we work on a new build definition language - and I really mean something that it's GOAL is user friendliness, to overlay on top of the vi-ness ?

If we get that right, because SBT is a defacto build tool, we might get even more developers!

Is there something in the works that will help SBT be a tool for the masses ? Because at the moment you need a degree and a year on Scala to understand how to do more complex things (and sometimes something basic).

Every so often I do consider switching a project to Gradle but, of course, SBT just does work better with Scala and I like it, but I can't share a build setup with a new junior developer. I have tried and it just confuses to the point on non-productiveness

So I have an itch, I may scratch it.. .. but I haven't yet

There, I can have a coffee now.

<rant-off/>

Saturday, 15 March 2014

akka-persistence and DDD

I have been working for some years now on a few projects. In many respects just code to exercise the brain.

About 3 years back I picked up event sourcing from a Scala Days conference and I am sold. now more than ever.

So I have released the core components of my scala - akka-persistence - DDD (CQRS-esq) implementation over on github for all to peruse.

In the coming weeks I will add a "Petstore" ala, sample application to the implementation (and it will be a case of eat your own dog food).

https://github.com/rbuckland/io-straight-fw

This ties together the very odd Uuid generator I blogged about about a year a go and Event Sourcing of objects.

I am using AngularJS as my front library which works really well because with spray.io and the Jackson Marshalling, I get the Scala Case classess coming back to the browser as JSON with next to no effort at all.

And because it is all in memory, the speed is just blindingly quick! Very very impressed.

Thanks have to go to
- Martin Krasser for eventsourced - and akka-persistence (it's successor)
- Mathias Doentiz and the spray.io team
- All the Typesafe Akka guys. Seriously brilliant stuff
- The whole AngularJS team - job really well done.
- Jackson JSON Marshalling. Truely the best JSON marshaller there is.
(is that it ? ) .. for now yep.


Monday, 4 November 2013

Ramon's Rough Guide to Akka - Top Tips 1

Ramon's Rough Guide to Akka - Top Tips 1

And someone please tell me if the tips are wrong :-)

Over the past 2 years in my spare down time, I have been using Akka for the core of my application. There is so much to grok, but it is also very simple.

This will be a really short post, of things which I wish I had read somewhere, before diving into Akka.

1. Don't pass ActorRef's around :-)

The ActorRef (as I read tonight) is a handle to a specific instance of an Actor. So when that Actor dies an another takes it's place (or in my case during event sourcing replay) - The ActorRef will become stale.

Instead, get the ActorRef (or an ActorSelection) from

   system.actorSelection("name")
   context.actorSelection("/orPath/name")

(and I uncovered this due to the deprecation of actorFor).
This will give you the Actor you need.

I guess I passed the ActorRef in to places because I am not using any DI (like I used to always with Spring). So instead .. now system or the context can be always used.


2. Read this new Post of Actor paths and Actor addressing!

This explains (1) better than I can.

http://doc.akka.io/docs/akka/snapshot/general/addressing.html

That's it for today.

Friday, 11 October 2013

Embedding a Primary Key in a UUID / GUID for CQRS / ES

For some time now I have been working on an Event Sourced system. Today I am going to describe my UUID/GUID primary key approach that I devised. Ignoring much about the event sourced technology and the merits thereof; primary key's are common and critical and their usage in an Event Sourced system is very critical.

The usual route for a primary key in a system is an increasing number (Person ID = 102).

With Event Sourcing and CQRS, the UUID or GUID is often identified as a good implementation choice for your primary key

If you are interested in the topic, then these two resources are a good read. (nothing to do with Event Sourcing)

But I have a few issues with the UUID. For one, it's not very client friendly, nor is it easily memorable. I want a system wide unique reference for all my objects ; a UUID is great for that; and I want it to also function as a friendly and memorable primary key.. how ? Well here below is my solution.

A little about Event Sourcing

With event sourcing, the primary route to storage is not through the "Domain Object".

Most developers are familiar with an Object-Relational Mapping system (Hibernate, NHibernate etc). (Object persisted to a table). This usual approach is that a Domain object is persisted (from a class or class heirarchy structure) to a database table. Event Sourcing instead stores the "event" that creates or modifies the Domain Object, and not the Object itself. A rebuild or some other query is a replay of the Events stream that was persisted.

I explain all that to say that an Event Sourced (CQRS) system can be designed without "tradtional primary keys".

Each event has a key, and the objects created can have keys, but what is common is to use UUIDs across all domain objects.

My Design Goal

I wanted some traditional primary keys; and I wanted the UUID. In short, the system should be simple to use; being both client friendly and a memorable primary key for use.

When you have a UUID as the primary key and it is used in API's and get's passed around the office as "Look up client 3422" you need this API to be simple.

An example: /person/fbe645f0-3031-41e3-aa6e-0800200c9a66 is just not a nice URI

However,

/person/2078

or

/person/5f9

They are simple.

What I have embarked on is to segment my UUID (my primary keys) so that I utilise the entire UUID space - a part for randomness, a part for the Group (or table ID if you like) and a part for the object; in this example case, the person.

The make up of my Custom UUID

The UUID is a 128 bit number, represented as 32 hex characters (with some dashes for legibility)

The UUID specification reserves some bits for version and variant.

With much (actually 30 minutes) thought, I have decided to use Version 6. (it doesn't exist in the IETF RFC 4122, they just went 1, 2, 3, 4 and 5)

and just make the Variant 'a'.

but ignoring all that, what is special about my UUID is that I embed a primary key and a "group" ID into the UUID.

So, my UUID's look like

pppppppp-pppp-6ggg-arrr-rrrrrrrrrrrr

where p is the Primary key (sequential incremementing)

where 6 is the Version (always 6)

where g is the Group ID, (akin to an identifier of the table)

where a is the variant (always a)

where r is the random part

This is my working notes: 

00000000-0000-6000-a000-000000000000
                    -----------------
                    60 bits Random bits (time or other)
                   -
                   x == a, b, 8 or 9
               ---
               group ID - of 12 bits value is (0 to 4095)
              
              6 == always 6 - (Version 4, Random UUID)
-------- ----
Primary key ID of 48 bits  (max 281,474,976,710,656)

So I have enough bits for a primary ID (halfway between Int *32 bits and Long *64 bits)

The Version

I decided to use 6 as the Version, 1 - 3 are other uses, 4 is Random or Psuedo random, which is almost mine, but not quite and Version 5 is a SHA-1 Hash; the RFC 4122 stipulates the following

Process of identifier assignment:

Generating a UUID does not require that a registration authority
be contacted. One algorithm requires a unique value over space
for each generator. This value is typically an IEEE 802 MAC
address, usually already available on network-connected hosts.
The address can be assigned from an address block obtained from
the IEEE registration authority. If no such address is available,
or privacy concerns make its use undesirable, Section 4.5
specifies two alternatives. Another approach is to use version 3
or version 4 UUIDs as defined below.

The sentence I want to draw attention to is - "Generating a UUID does not require that a registration authority be contacted"

But then again, because I am using Version 6, it's probably not a UUID.

My reasoning to a Version 6 are:

  • My UUID is not a Version 4, because it is only partially random ( a part of it )
  • It is not any other Version (1-3, or 5)
  • Version 6 wasn't used
  • If someone has an issue with my use, then I'll call it instead a LUUID, a Local UUID

Moving on.

The Group ID

I wanted to have a unique key that represents all objects across the space. In this way I can now have a REST URI that looks like

/any/<uuid>

and the system can appropriately redirect to the correct resource, by looking at the group ID. (pattern matching the -6xxx- )

For example, if we have the UUID

000000231-22e-6aa1-a789-28ef27ab7c62

This is

  • aa1 for the Group ID for 'person'
  • 23122e for the primary key

/any/000000231-22e-6aa1-a789-28ef27ab7c62

and with a match we can redirect the request to

/person/000000231-22e-6aa1-a789-28ef27ab7c62

Clashes in the Primary Key with a Distributed System

CQRS guru Greg Young says of the UUID (and Event Sourced systems)

Having the client originate Ids normally in the form of UUIDs is extremely valuable in distributed systems.

And this is true, but I don't want to give the client that honour, I want to allocate them a UUID (for many reasons) - but that comes with a drawback of needing to cater for clashes. (there is always a tradeoff with IT)

For my UUIDs, the primary key is sequential, at point of allocation. (231-22e, 231-22f, 231-230 etc). Having the end of the "UUID" random (e.g.: -a789-28ef27ab7c62)means I can cluster (multiple systems) and allow for "duplicates" even in the primary key space, (where two servers generate the same ID and Group ID).

Leaving the UUID as it is (with a duplicate in the PK ID space) is ok but not ideal. So we will need to cater for that. (and of course for the lottery day when two systems generate the same UUID even down to the random part!)

Let me explain that : Assume that I have a web application with two servers that generate "people",

   www.myco.com  

  • server1-myco; and
  • server2-myco
 
If server1-repo and server2-repo BOTH generate a UUID but the random part differs, e.g.
 
server1 - 000000231-a32-6aa1-a789-28ef27ab7c62   (Jim)
server2 - 000000231-a32-6aa1-a25e-6ac5c6ef127c   (Mary)

then, technically I have two unique ID's, but the initial shorter ID's clash. This is not exactly ideal - because I want my friendly PK ID's to be unique also. If I want to minimise that "duplicate", then here are some options.

  1. Single ID generation node (creates a single point of failure)
  2. Regular re-assignment in case of a clash
  3. Generate and check with peers
  4. Block reserving

Options 2, 3 and 4 will be my preferred. With a distributed system, I have that PK ID issue anyways, so it's not different because I am playing with UUID's.

But what I really like is how I can use the primary key all by itself, outside of the /UUID

for example. 


Dear Mr Client,

Welcome as a supplier to Company X. For future reference, your client ID is 23122e.

...


or

/person/23122e

I don't have to use the full UUID everywhere, but rather just where I need it (in the event sourced messages), and as a unique global ID on the system.

Debugging a system will be easier too, looking at logs, someone with a little knowledge will recognise 'people' UUID's vs 'building', or 'schedule' UUID's (because they will distinguish the group ID after '-6xxx-' as the unique group identifier); in effect people will learn those 3 characters and identify what the group Id is.

UUID generation

So now I hear you wonder, how do I generate these mythical UUID's ?

Simply really.

This is the call for the generating the UUID (reusing the java.util.UUID class, just giving two lower and upper longs (64 bits each)

  /**

   * Create a new UUID given some ID as the groupID and an already sequenced ID

   */

  def createUuid(groupId: Int, id: Long): Uuid = {

 

    val randomBytes = new Array[Byte](8)

    secureRandom.nextBytes(randomBytes)

    val randomLong = java.nio.ByteBuffer.wrap(randomBytes).getLong()

 

    // groupID has to be 12bits as the 4L is going in over the top.

    return Uuid(new java.util.UUID(groupId | (6L << 12) | (id << 16), (10L << 60) | (randomLong >>> 4)).toString())

 

  }

 

The Primary Key is an incremental sequence on the code that creates a new person, or group, or employee (etc)

In Scala, it is simply a matter of adding in a Trait for the "class" you want ID's sequenced for/ UUID's generated for. e.g:  I add with UuidGenerator[Person] and this gives a newUuid() method 

class PersonProcessor(val repository: Repository[Uuid, Person]) extends AbstractProcessor[Person]

with UuidGenerator[Person] 

{ this: Emitter =>

  

  def klass = classOf[Person]

 

...

      createPerson(newUuid, cmd )

 

The implementation of that newUuid() method looks as follows:


trait UuidGenerator[D] {

  

  implicit def klass: Class[_]

  

  private val ids = Map.empty[String, Long]

  

  private[this]def className = klass.getClass().getCanonicalName()

 

  /**

   * return the next ID

   */

  def newUuid:Uuid = {

    val idKey = className

    val currentId = ids.getOrElseUpdate(idKey, 0L)

    ids += (idKey -> (currentId + 1))

    return Uuid.createUuid(klass, currentId + 1)

  }

...


And Uuid.createUuid looks like : 


 

  def createUuid(klass: Class[_], id: Long): Uuid = createUuid(groupId(klass), id)

 

The Group ID is simply CRC-12, or 12 bit CRC over the classname. This is because I had 12 bits to spare where I placed the groupId -6aa1 So given that all 'Person' domain objects extend from com.soqqo.system.domain.Person my groupId's are consistent there.

 

On bootstrapping my system, I make sure that all "in use" groupId's are not clashing on CRC12-ing them - could happen - and if it does I'll deal with that then. (just System.halt bootstrap .. and change code to suit)

 

This is the crc12 implementation in Scala. I haven't tested that brutally, but it DOES generate unique < 4096 ID's for random byte's passed in, so it is working the way I need it to.

  def crc12(toHash: String) = {

 

    /**

     * ************************************************************************

     *  Using direct calculation

     * ************************************************************************

     */

 

    varcrc:Int = 0xFFF; // initial contents of LFBSR

    varpoly: Int = 0xF01; // reverse polynomial

    var bytes = toHash.getBytes()

 

    for (b: Byte <- bytes) {

      var temp = (crc ^ b) & 0xff;

 

      // read 8 bits one at a time

      for (i <- 0 to 7) {

        if ((temp & 1) == 1) temp = (temp >>> 1) ^ poly;

        else temp = (temp >>> 1);

      }

      crc = (crc >>> 8) ^ temp;

    }

 

    // flip bits

    crc = crc ^ 0xfff;

 

    crc;

 

  }

 

Summary

I hope this helps someone on any of the weird topics I have covered here. I will share the Uuid and UuidGenerator classes happily for anyone that wants them. They are anything special, but rather a lot of thinking about how I wanted my Uuid's to be utilised in the system.

My system entails:

1. spray.io - Web router on top of Spray Can
2. Event Sourced (now also known as akka-persistence)
3. AngularJS

Enjoy!

Tuesday, 2 July 2013

Heroku and Gradle - and jetty-runner Configuration

I recently worked on a project where we used Heroku as the deploment engine.

For speed of the project I chose Maven (simply because I know it well and it is VERY good).

Knowing also that Gradle is now becoming the "next build kid" on the block and, for me, recognising that Gradle is easier to configure, I set about the task today of switching the build to Gradle.

This was very simple, and the last "part of the puzzle" after replicating all the functionality was to setup the Heroku parts. By default, we were using the Maven (jetty-runner) boot strapping. Arguably it is easier and lighter to run Jetty Embedded (as per https://github.com/heroku/devcenter-gradle) but I wanted to see what is required to use "jetty-runner"). This was more an excercise in build comparison, that it was "get it onto Heroku".

Heroku will detect a Maven pom.xml, and by default will run

This creates a war. Heroku then runs your app using "whatever" is in the Procfile. The Procfile they suggest looks like this:

The jetty-runner.jar gets in the "target/dependency/" folder due to this Maven Magic.

When Heroku detects a Gradle project, it runs

instead of "gradle package".

So to replicate the same with Gradle I had to write a copy task to get jetty-runner in there, and generate the war and attach that all to a "stage" task.

It is very easy when you look at it, the trick is in the knowing of the API that slows it down. But I resolved it in about 30 minutes.

In short, the changes for Gradle are as follows.

1. Change your Procfile (we will get the jetty-runner.jar in "build/libs") (the war goes there by default)

2. Add a "new" configurations for the depenency of jetty-runner

3. Add a new dependency for Jetty Runner (note I also use newrelic so it goes in there too!)

4. Create a new task which copies the jars from the dependencies "runtimeOnly" We also replicate the Maven method of renaming the jar to have no Version(s).

5. Add a "stage" task, because that is what Heroku will run.

And that is it. When stage is run, it will create a war, and copy jetty-runner.jar into the build/libs folder.

Happy days.

Current 5 booksmarks @ del.icio.us/pefdus