Introducing Apache Mahout

Nowadays almost every application deals with a fairly large amount of data. Whether the application is in the e-commerce, travel, learning, finance or social networking domain, it holds a good amount of customer data, feedback data, financial records, etc. The rise of frameworks in the NoSQL and distributed computing/storage field has also boosted the [...]

Hadoop – What is MapReduce?

In this post I will explain the concept of MapReduce and how Hadoop uses it. I will talk mostly about what Hadoop is, what it can do, when to use it, etc. So, let's try to associate Hadoop with the technical problem it solves. Suppose the application you are working on has ever [...]

Building a Hadoop Cluster using VirtualBox

Getting a few spare machines to start learning a NoSQL database or distributed software in the early stages looks like overkill ($$). This blog post covers creating an Ubuntu-based virtual machine for running Hadoop and automating virtual machine (VM) creation from a given disk image using VirtualBox. While starting with the NoSQL group at Xebia we decided [...]