Groovy client for Apache Cassandra
I have been exploring Apache Cassandra recently for a project. Cassandra is a NoSQL (not only SQL) database open sourced by Facebook.
This post shares a Groovy implementation of a Cassandra client that makes accessing Cassandra as simple as accessing a HashMap. The implementation is currently a wrapper over the basic Thrift API. The map keys serve as the query language for slices and range queries.
Cassandra provides a Thrift-based interface, which makes it accessible from different languages. The Thrift definition plays a role similar to a WSDL file for a web service: it can be used to generate client code in different languages. There are two open source wrappers over the Thrift API for accessing Cassandra: Cassandra Java Client and Hector.
Cassandra Java Client is too basic. Hector is a good option, as it provides features like pooling and is JMX enabled.
I tried the Java Thrift API and Hector, but both look cumbersome and verbose when writing code in Groovy or a similar language. Following is the sample Groovy test code, for comparison with a Thrift-based Java example.
void testSingleColumnGetAndSet() {
    k["User/a/user_id"] = "1";
    assert k["User/a/user_id"] == "1";
}
void testSlice() {
    k["User/b/1"] = "1";
    k["User/b/2"] = "2";
    k["User/b/3"] = "3";
    assert k["User/b/[1,2,3]"] == [b:["1", "2", "3"]];
    assert k["User/b/[1-2]"] == [b:["1", "2"]];
    assert k["User/b/[3]"] == [b:["3"]];
}
void testRangeSlices() {
    k["User/d/1"] = "1";
    k["User/d/2"] = "2";
    k["User/e/1"] = "1";
    k["User/e/2"] = "2";
    assert k["User/[e-e]/[1,2,3]"] == [e:["1","2"]];
    assert k["User/[d-e]/[1,2,3]"] == [d:["1","2"],e:["1","2"]];
}
The object 'k' is the keyspace wrapper and 'User' is the column family; the key and the column follow in a '/'-separated path.
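To make the path layout concrete, here is a minimal sketch of how such a path could be broken into its parts. The helper name `splitPath` and the map keys `cf`, `key` and `col` are assumptions made for illustration; the actual client does this through its Selector class.

```groovy
// Hypothetical helper (not part of the original client): splits a
// '/'-separated path into column family, key and column parts.
def splitPath(String path) {
    def parts = path.split('/')
    assert parts.size() == 3 : "expected columnFamily/key/column, got '${path}'"
    [cf: parts[0], key: parts[1], col: parts[2]]
}

def ctx = splitPath('User/a/user_id')
assert ctx == [cf: 'User', key: 'a', col: 'user_id']
```

Because the bracketed selectors contain no '/', a path such as `User/[d-e]/[1,2,3]` still splits into exactly three parts.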
The Cassandra data model is simple; the only complication is the SuperColumn, as explained in the blog (the title of the post should give you an idea).
- Keyspace: Like a database schema
- ColumnFamily: Individual data spaces (User, Tweets)
- Key: A key in the ColumnFamily
- Column: A name, value and timestamp combination
- SuperColumn: A Column that can contain multiple columns (within a ColumnFamily you cannot mix super and 'normal' columns)
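One way to picture this hierarchy is as nested maps. The sketch below is only an illustration of the nesting, not how Cassandra stores data; the `UserProfile`, `address` and `city` names and the timestamp values are made up for the example.

```groovy
// Illustrative only: nested maps standing in for the hierarchy
// Keyspace -> ColumnFamily -> Key -> Column name -> Column (value + timestamp)
def keyspace = [
    User: [
        a: [
            user_id: [value: '1', timestamp: 1L]
        ]
    ],
    // A super column family adds one level (the SuperColumn name)
    // between the key and the columns.
    UserProfile: [
        a: [
            address: [
                city: [value: 'Delhi', timestamp: 1L]
            ]
        ]
    ]
]

assert keyspace.User.a.user_id.value == '1'
assert keyspace.UserProfile.a.address.city.value == 'Delhi'
```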
The following diagram tries to explain the data model.
[Figure: Cassandra DataModel]
Now let's have a look at the Groovy code that accesses Cassandra.
How it works:
This is achieved by providing getAt and putAt methods in the Keyspace object, which is the Groovy way of overloading the subscript operator.
def getAt(String key) {
    def ctx = Selector.readContext(key)
    ctx["ks"] = ks;
    def ret = client.get( ctx);
    return ret
}
void putAt(String key, Object value) {
    def ctx = Selector.writeContext(key)
    ctx["ks"] = ks;
    client.put( ctx, value.toString());
}
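The mechanism can be tried without a Cassandra server. The sketch below uses a hypothetical `InMemoryKeyspace` class backed by a plain map, purely to show how Groovy routes the subscript operator to getAt and putAt; it is not the client's real Keyspace class.

```groovy
// Hypothetical in-memory stand-in for the Keyspace wrapper; no Cassandra involved.
class InMemoryKeyspace {
    private final Map<String, String> store = [:]

    // Invoked for reads: ks["path"]
    def getAt(String path) {
        store[path]
    }

    // Invoked for writes: ks["path"] = value
    void putAt(String path, Object value) {
        store[path] = value.toString()
    }
}

def k = new InMemoryKeyspace()
k['User/a/user_id'] = 1          // calls putAt, which stores "1"
assert k['User/a/user_id'] == '1'  // calls getAt
```

As in the original putAt, the value is converted with toString(), so whatever is written comes back as a String.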
The 'selectors' (the keys used in the subscript operator) are inspired by JXPath and also support simple range queries. Below is a sample of the selectors that can be used.
void testIdentifySetRequest() {
    assert Selector.parse("cf/key/col", SelectorType.SET) == "set_slice_col";
}
void testIdentifyGetRequest() {
    assert Selector.parse("cf/key/col", SelectorType.GET) == "get_col";
    assert Selector.parse("cf/key/[col1,col2]", SelectorType.GET) == "get_slice_col"; // column list
    assert Selector.parse("cf/key/[col1-col2]", SelectorType.GET) == "get_slice_col"; // col range
    assert Selector.parse("cf/key/[col1-]", SelectorType.GET) == "get_slice_col"; // start to end
    assert Selector.parse("cf/key/[*]", SelectorType.GET) == "get_slice_col"; // all cols
    assert Selector.parse("cf/[key1-key2]/[col1-col2]",SelectorType.GET) == "get_range_slice"; // range
}
Selectors are implemented using regular expressions. A selector parses the 'path' expression and provides the result as input to the client. The following snippet shows the expressions used.
def static final SET_REGEX = [
    set_slice_col: ~/(\w+)\/(\w+)\/(\w+)/
];
def static final GET_REGEX = [
    get_col: ~/(\w+)\/(\w+)\/(\w+)/, // column get
    get_slice_col: ~/(\w+)\/(\w+)\/((\[(\w+[,-]?)+\])|(\[\*\]))/, // give me all columns
    get_range_slice: ~/(\w+)\/((\[(\w+[,-]?)+\])|(\[\*\]))\/((\[(\w+[,-]?)+\])|(\[\*\]))/, // get range slice
    get_range_key: ~/(\w+)\/\[(\w+[,-]?)+\]/ // give me a key range
] as LinkedHashMap;
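The post does not show the body of Selector.parse, so the following is only a guess at how the lookup might work: walk the pattern map in order and return the name of the first pattern that matches the whole path. The closure name `identify` is made up for this sketch, and only three of the GET patterns are copied in.

```groovy
// Assumed reconstruction; the real Selector.parse is not shown in the post.
def GET_REGEX = [
    get_col        : ~/(\w+)\/(\w+)\/(\w+)/,
    get_slice_col  : ~/(\w+)\/(\w+)\/((\[(\w+[,-]?)+\])|(\[\*\]))/,
    get_range_slice: ~/(\w+)\/((\[(\w+[,-]?)+\])|(\[\*\]))\/((\[(\w+[,-]?)+\])|(\[\*\]))/
]

// Returns the name of the first pattern that matches the entire path,
// or null when no pattern matches.
def identify = { String path ->
    GET_REGEX.find { name, pattern -> pattern.matcher(path).matches() }?.key
}

assert identify('cf/key/col') == 'get_col'
assert identify('cf/key/[col1,col2]') == 'get_slice_col'
assert identify('cf/key/[*]') == 'get_slice_col'
assert identify('cf/[key1-key2]/[col1-col2]') == 'get_range_slice'
```

Using matches() rather than a partial find matters here, since a plain column path must not be mistaken for the prefix of a longer expression.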
Below is the client code (GHector?) and the Category code that factors out the reusable Cassandra-specific code.
def put(ctx,value) {
    execute { Cassandra.Client client ->
        use (CassandraCategory) {
            client.insert ctx.ks, ctx.key, cpath(ctx), value.serialize(), timestamp, defConLevel
        }
    }
}
def execute(cmd) {
    TTransport tt = new TSocket(server, port);
    TProtocol tp = new TBinaryProtocol(tt);
    Cassandra.Client c = new Cassandra.Client(tp);
    try {
        tt.open();
        cmd(c);
    } finally {
        tt.close();
    }
}
...
}
// Category code
class CassandraCategory {
    static cpath(ks, args) {
        ColumnPath cp = new ColumnPath(args["cf"]);
        cp.setColumn(args["col"].bytes);
        return cp;
    }
    static serialize(String s) {
        return s.getBytes("UTF-8");
    }
    ...
}
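For readers new to Groovy categories, here is a small standalone sketch of the same idea without any Thrift classes. The class name `BytesCategory` is invented for the example; it mirrors only the serialize method shown above.

```groovy
// Hypothetical category, modelled on the serialize method shown above.
class BytesCategory {
    // The first parameter is the receiver, so this adds String.serialize()
    // for the duration of a use block.
    static byte[] serialize(String self) {
        self.getBytes('UTF-8')
    }
}

use (BytesCategory) {
    def bytes = 'abc'.serialize()
    assert bytes.length == 3
    assert new String(bytes, 'UTF-8') == 'abc'
}
```

The extra method is only visible inside the use block, which keeps the Cassandra-specific helpers from leaking into the rest of the application.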
This approach could be useful for applications with a fixed set of queries, since the selectors can then be cached. An application that selects a large number of columns or keys dynamically would need an alternate approach (some sort of query API with filters).
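One possible way to cache selectors, assuming Groovy 1.8 or later, is Closure.memoize(). In the sketch below, `parseSelector` is a hypothetical stand-in for the regex-based parsing step and `parseCount` exists only to show that the second call is served from the cache.

```groovy
int parseCount = 0

// Hypothetical stand-in for the regex-based parsing step.
def parseSelector = { String path ->
    parseCount++
    def parts = path.split('/')
    [cf: parts[0], key: parts[1], col: parts[2]]
}

// memoize() wraps the closure with a cache keyed on its arguments.
def cachedParse = parseSelector.memoize()

def first = cachedParse('User/a/user_id')
def second = cachedParse('User/a/user_id')

assert parseCount == 1   // parsed only once
assert first == second
```

The cached map is shared between callers, so code that adds entries to the context (as getAt does with "ks") would need to work on a copy.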
This is still a work in progress (for example, results could have column names as keys as well... wait a second, that looks like JSON). In the future I will try to implement the client using Hector instead of the raw Thrift API. Your comments on how to make this better are appreciated.



