NoSQL - Column Oriented Databases


What do we mean by serialization in computing? Serialization is the process of converting an object into a sequence of bytes.

As you can see in the diagram, the object on the left, which has the fields name, company and gender, gets converted into an array of characters.

Once an object is serialized, the bytes can be saved or transferred and then the object can be reconstructed later.

The process of reconstructing objects from a sequence of bytes is called deserialization.
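A minimal sketch of a serialize/deserialize round trip, assuming JSON encoded as UTF-8 as the byte format (one choice among many; real systems may use other formats). The field names follow the diagram, but the values "Joe", "Acme" and "M" are hypothetical.

```python
import json

# Hypothetical object with the fields from the diagram; the values are made up.
person = {"name": "Joe", "company": "Acme", "gender": "M"}

def serialize(obj: dict) -> bytes:
    """Convert a dict into a sequence of bytes (here: UTF-8 encoded JSON)."""
    return json.dumps(obj).encode("utf-8")

def deserialize(data: bytes) -> dict:
    """Reconstruct the dict from the bytes produced by serialize()."""
    return json.loads(data.decode("utf-8"))

payload = serialize(person)
print(payload)               # b'{"name": "Joe", "company": "Acme", "gender": "M"}'
restored = deserialize(payload)
print(restored == person)    # True
```

The bytes in `payload` are what would be written to disk or sent over the network; `deserialize` rebuilds an equal object from them.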

When talking about NoSQL databases, a term we often use is column-oriented data formats or data stores. When converting tabular data into a sequence of bytes, we can go either row-wise or column-wise.

For example, we could convert this tabular data to 10 Joe 12 Mary 11 Cathy. This is called a row-oriented data format, and data stores that save tabular data in this format are called row-oriented data stores. This is the traditional way of storing data.
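The row-wise layout can be sketched as follows. For readability, the byte sequence is rendered as a space-separated string rather than raw bytes; that simplification, and the function name `to_row_oriented`, are illustrative assumptions.

```python
# The example table from the text: three rows, two columns.
rows = [(10, "Joe"), (12, "Mary"), (11, "Cathy")]

def to_row_oriented(rows):
    """Emit every value of row 1, then every value of row 2, and so on."""
    return " ".join(str(value) for row in rows for value in row)

print(to_row_oriented(rows))  # 10 Joe 12 Mary 11 Cathy
```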

If we store the example table as 10 12 11 Joe Mary Cathy, it is called a column-oriented data format or data store. In a column-oriented data store, we first store the first column, then the second column, and so on.
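The column-wise layout differs only in the traversal order: transpose the rows into columns first, then emit each column in turn. As before, the string output and the function name `to_column_oriented` are illustrative assumptions.

```python
# The same example table: three rows, two columns.
rows = [(10, "Joe"), (12, "Mary"), (11, "Cathy")]

def to_column_oriented(rows):
    """Emit every value of column 1, then every value of column 2, and so on."""
    columns = zip(*rows)  # transpose rows into columns: (10, 12, 11), ("Joe", "Mary", "Cathy")
    return " ".join(str(value) for column in columns for value in column)

print(to_column_oriented(rows))  # 10 12 11 Joe Mary Cathy
```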

Since similar values are stored next to each other, column-oriented data formats generally offer better compression.
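To see why adjacency helps, here is a sketch using run-length encoding (RLE), one simple compression scheme chosen purely for illustration; real column stores may use different codecs. The (id, country) table is hypothetical data in which neighbouring rows often share a country.

```python
from itertools import groupby

# Hypothetical table: (id, country). Consecutive rows often repeat the country.
rows = [(1, "US"), (2, "US"), (3, "US"), (4, "IN"), (5, "IN")]

def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

row_stream = [value for row in rows for value in row]  # 1 US 2 US 3 US 4 IN 5 IN
country_column = [country for _, country in rows]      # US US US IN IN

print(len(run_length_encode(row_stream)))  # 10 runs: no adjacent repeats to collapse
print(run_length_encode(country_column))   # [('US', 3), ('IN', 2)]
```

In the row layout the repeated country values are separated by ids, so RLE gains nothing; stored column-wise, the same values sit side by side and collapse into two runs.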

Certain data stores use a hybrid layout called column-family-oriented storage. In such stores, columns are grouped into column families. The data is stored one column family at a time, but the values within a column family are stored row-wise.

In the given example, CF1 is stored first and then CF2. Within each of CF1 and CF2, the data is stored row-wise. So the result is 10 Joe 12 Mary 11 Cathy 23 33 45.
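A minimal sketch of the column-family layout, assuming families are given as an ordered mapping from family name to column names. The column names `id`, `name` and `col3` are hypothetical placeholders (the text does not name them), and pairing 23, 33, 45 with Joe, Mary, Cathy is assumed from the order in which they are stored.

```python
# Example table; "id", "name" and "col3" are placeholder column names.
rows = [
    {"id": 10, "name": "Joe", "col3": 23},
    {"id": 12, "name": "Mary", "col3": 33},
    {"id": 11, "name": "Cathy", "col3": 45},
]

# Hypothetical family definition: CF1 groups the first two columns, CF2 the third.
families = {"CF1": ["id", "name"], "CF2": ["col3"]}

def to_column_family_oriented(rows, families):
    """Store one family after another; inside a family, store values row-wise."""
    out = []
    for columns in families.values():  # family by family, in definition order
        for row in rows:               # row-wise within the family
            out.extend(str(row[c]) for c in columns)
    return " ".join(out)

print(to_column_family_oriented(rows, families))  # 10 Joe 12 Mary 11 Cathy 23 33 45
```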

Column-family-oriented storage is a very clever design because it lets you model your data in a row-oriented, column-oriented or hybrid layout. This provides greater flexibility.

In the first example, we have one column family per column; in the second example, we have a single column family for all columns.

You can observe that the first design behaves like a column-oriented data store and the second one behaves like a row-oriented data store.
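Reusing the same family-by-family serializer sketched for the hybrid layout, the two extreme designs can be checked directly. The family names `CF_id`, `CF_name` and `CF_all`, and the column names `id` and `name`, are hypothetical.

```python
rows = [
    {"id": 10, "name": "Joe"},
    {"id": 12, "name": "Mary"},
    {"id": 11, "name": "Cathy"},
]

def to_column_family_oriented(rows, families):
    """Store one family after another; inside a family, store values row-wise."""
    out = []
    for columns in families.values():
        for row in rows:
            out.extend(str(row[c]) for c in columns)
    return " ".join(out)

# Design 1: one column family per column, which behaves like a column-oriented store.
one_family_per_column = {"CF_id": ["id"], "CF_name": ["name"]}
# Design 2: a single column family holding every column, which behaves like a row-oriented store.
single_family = {"CF_all": ["id", "name"]}

print(to_column_family_oriented(rows, one_family_per_column))  # 10 12 11 Joe Mary Cathy
print(to_column_family_oriented(rows, single_family))          # 10 Joe 12 Mary 11 Cathy
```

Only the family definition changes between the two designs; the storage logic stays the same, which is where the flexibility comes from.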