I modeled the HBase implementation after the JPA mailbox implementation. I followed a top-down approach and created most of the classes as stubs that I still have to fill in. I also created the stub test classes.
Things are not very useful yet because there is no connection to an HBase cluster against which to test the good stuff (which isn't here yet). So this is what I'm concentrating on.
We don't have a schema for storing messages in HBase, and I believe it is going to change a few times before we manage to get it right. I figured we need a flexible system to keep up with the changes, so I looked for a way to do this. I found an article by Lars George explaining how to change HBase tables from code, and it inspired me to write an implementation for James.
I wrote one using the Java Architecture for XML Binding, JAXB for short.
My implementation has three main classes that map to a hierarchy in XML: ColumnFamily, Table and Store. The classes are stored in
A store represents a James storage. A store spans multiple tables (one table for storing messages, another for subscriptions, etc.), and each table can contain multiple column families to store different types of information.
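To make the hierarchy concrete, here is a minimal sketch of how a store with its tables and column families could be loaded from XML. It is not the James code: the XML element and attribute names, the table and column family names, and the class SchemaSketch are all made up for illustration, and it uses the JDK's built-in DOM parser instead of JAXB so that it runs without extra libraries.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SchemaSketch {
    // Hypothetical schema document; element, attribute, table and
    // column family names are illustrative assumptions only.
    static final String SAMPLE_XML =
        "<store name=\"james\">"
      + "<table name=\"messages\">"
      + "<columnFamily name=\"headers\"/>"
      + "<columnFamily name=\"body\"/>"
      + "</table>"
      + "<table name=\"subscriptions\">"
      + "<columnFamily name=\"users\"/>"
      + "</table>"
      + "</store>";

    // Lowest level of the hierarchy: one column family inside a table.
    static final class ColumnFamily {
        final String name;
        ColumnFamily(String name) { this.name = name; }
    }

    // A table holds any number of column families.
    static final class Table {
        final String name;
        final List<ColumnFamily> families = new ArrayList<>();
        Table(String name) { this.name = name; }
    }

    // The store is the root and spans several tables.
    static final class Store {
        final String name;
        final List<Table> tables = new ArrayList<>();
        Store(String name) { this.name = name; }
    }

    // Walks the XML tree top-down: store -> table -> columnFamily.
    static Store parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        Store store = new Store(root.getAttribute("name"));
        NodeList tableNodes = root.getElementsByTagName("table");
        for (int i = 0; i < tableNodes.getLength(); i++) {
            Element tableEl = (Element) tableNodes.item(i);
            Table table = new Table(tableEl.getAttribute("name"));
            NodeList familyNodes = tableEl.getElementsByTagName("columnFamily");
            for (int j = 0; j < familyNodes.getLength(); j++) {
                Element familyEl = (Element) familyNodes.item(j);
                table.families.add(new ColumnFamily(familyEl.getAttribute("name")));
            }
            store.tables.add(table);
        }
        return store;
    }

    public static void main(String[] args) throws Exception {
        Store store = parse(SAMPLE_XML);
        for (Table t : store.tables) {
            System.out.println(t.name + ": " + t.families.size() + " column families");
        }
    }
}
```

With JAXB the same three classes would carry binding annotations and the hand-written parse method would go away, which is the main reason for choosing it when the schema is expected to change often.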
While working on this I decided to try Test Driven Development. I found it to be a very useful way of testing your design. It really kept me going because after each test passed successfully I got a good feeling. Small successes can mean a lot in the long run. The main benefit of this development method, besides the fact that you can test your code as you go, is that the test you write today can save you (a lot of grey matter) in the future. This is because (good) software that makes it into production is never finished. You always find some bugs to fix, features to add or remove, etc. Your code will change, but not always for the better. Sometimes things break, and good tests make the difference.
So now I have code to load a mailbox schema written in XML, and I have some tests written. The next step is to create a schema and see how these things fit together. This was discussed in part on the mailing list and with my mentor Eric, and the discussion summary is on a wiki page.
Over the next two weeks I plan to get to the point where I can load a mailbox schema and create it in HBase, and to begin testing different ways of storing messages. I'm thinking of buying "HBase: The Definitive Guide" from Amazon. I just hope an e-book version will also be available.
If you have any comments, you can post them on the blog if they are related to the article, or on the James mailing list if they are suggestions for James.
See you next time,