Partition key, composite key and clustering key in Cassandra

There is a lot of confusion around this; I will try to make it as simple as possible.

The primary key is a general concept to indicate one or more columns used to retrieve data from a Table.

The primary key may be SIMPLE:

 create table stackoverflow (
      key text PRIMARY KEY,
      data text      
  );

That means it is made up of a single column.

But the primary key can also be COMPOSITE (aka COMPOUND), made up of several columns:

 create table stackoverflow (
      key_part_one text,
      key_part_two int,
      data text,
      PRIMARY KEY(key_part_one, key_part_two)      
  );

With a COMPOSITE primary key, the “first part” of the key is called the PARTITION KEY (in this example key_part_one is the partition key) and the second part of the key is the CLUSTERING KEY (key_part_two).

Please note that both the partition key and the clustering key can be made up of several columns:

 create table stackoverflow (
      k_part_one text,
      k_part_two int,
      k_clust_one text,
      k_clust_two int,
      k_clust_three uuid,
      data text,
      PRIMARY KEY((k_part_one,k_part_two), k_clust_one, k_clust_two, k_clust_three)      
  );

Behind these names …

  • The Partition Key is responsible for data distribution across your nodes.
  • The Clustering Key is responsible for data sorting within the partition.
  • The Primary Key is equivalent to the Partition Key in a single-field-key table.
  • The Composite/Compound Key is just a multiple-column key.
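The first two roles above can be illustrated with a small Python model. This is only a sketch, not how Cassandra stores data internally: a dictionary stands in for the set of partitions, the column names are borrowed from the two-column table above, and the "messi" row is a hypothetical extra row added so that there is more than one partition.

```python
from collections import defaultdict

# Toy model only: a dict of lists stands in for Cassandra's partitions.
rows = [
    # (key_part_one, key_part_two, data)
    ("ronaldo", 10, "ex-football player"),
    ("messi", 10, "football player"),  # hypothetical row, not from the text
    ("ronaldo", 9, "football player"),
]

partitions = defaultdict(list)
for key_part_one, key_part_two, data in rows:
    # Partition key: decides which partition a row belongs to.
    partitions[key_part_one].append((key_part_two, data))

for partition in partitions.values():
    # Clustering key: decides the order of rows inside one partition.
    partition.sort(key=lambda row: row[0])

print(partitions["ronaldo"])
# [(9, 'football player'), (10, 'ex-football player')]
```

Rows that share a partition key end up together, and within that group they are ordered by the clustering key regardless of insertion order.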

Further usage information: DATASTAX DOCUMENTATION


EDIT due to further requests
Small usage and content examples
SIMPLE KEY:

insert into stackoverflow (key, data) VALUES ('han', 'solo');
select * from stackoverflow where key='han';

table content

key | data
----+------
han | solo

A COMPOSITE/COMPOUND KEY can retrieve “wide rows”: you can query by the partition key alone and get back every row in that partition.

insert into stackoverflow (key_part_one, key_part_two, data) VALUES ('ronaldo', 9, 'football player');
insert into stackoverflow (key_part_one, key_part_two, data) VALUES ('ronaldo', 10, 'ex-football player');
select * from stackoverflow where key_part_one = 'ronaldo';

table content

 key_part_one | key_part_two | data
--------------+--------------+--------------------
      ronaldo |            9 |    football player
      ronaldo |           10 | ex-football player

But you can also query with all key columns:

select * from stackoverflow where key_part_one = 'ronaldo' and key_part_two  = 10;

query output

 key_part_one | key_part_two | data
--------------+--------------+--------------------
      ronaldo |           10 | ex-football player

Important note: the partition key is the minimum specifier needed to perform a query with a WHERE clause. If you have a composite partition key, like the following

eg: PRIMARY KEY((col1, col2), col10, col4)

You can query only by passing at least both col1 and col2, the two columns that define the partition key. The “general” rule for queries is that you must pass at least all partition key columns; then you can optionally add each clustering key, in the order they are declared.

So the valid queries are (excluding secondary indexes):

  • col1 and col2
  • col1 and col2 and col10
  • col1 and col2 and col10 and col4

invalid:

  • col1 and col2 and col4
  • anything that does not contain both col1 and col2
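The rule behind these two lists can be written down as a small check. The following Python function is a hypothetical helper for illustration, not part of Cassandra or any driver. It assumes plain equality restrictions only, and it ignores secondary indexes, ALLOW FILTERING, and range or IN restrictions.

```python
def is_valid_where(partition_cols, clustering_cols, restricted_cols):
    """Check a set of equality-restricted columns against the rule above.

    Hypothetical helper: models only the "all partition key columns, then
    clustering columns in declared order" rule.
    """
    restricted = set(restricted_cols)

    # Columns outside the primary key are not covered by this sketch.
    if not restricted <= set(partition_cols) | set(clustering_cols):
        return False

    # Every partition key column must be present.
    if not set(partition_cols) <= restricted:
        return False

    # Clustering columns must be used as a prefix of their declared order:
    # once one is skipped, no later one may appear.
    skipped = False
    for col in clustering_cols:
        if col not in restricted:
            skipped = True
        elif skipped:
            return False
    return True


partition = ["col1", "col2"]
clustering = ["col10", "col4"]

print(is_valid_where(partition, clustering, ["col1", "col2", "col10"]))  # True
print(is_valid_where(partition, clustering, ["col1", "col2", "col4"]))   # False
```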

Here is a good video about Cassandra modeling explaining how data is stored with partition key and sorted with clustering key.

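This article explains the architecture of Cassandra in detail, including partitioning, replication factor, and bootstrapping.

This Vimeo video explains in detail (1) the ring, (2) the hashing of the primary key's partition key, and (3) the replication mechanism, among other topics. For example, with the RandomPartitioner the partition key is hashed with MD5, which maps the key to a 128-bit number; tokens are assigned to the nodes that make up the cluster (the token ring), and each record is stored on a node depending on the hash of its partition key. Since Cassandra 1.2 the default Murmur3Partitioner uses MurmurHash and 64-bit tokens instead, but the principle is the same.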
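As a rough illustration of the hashing idea, here is a simplified Python sketch. It is not Cassandra's exact token computation, and it leaves out virtual nodes and replication. The three node names and their token positions are made up for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # MD5 yields a 128-bit number


def token_for(partition_key: str) -> int:
    """Map a partition key to a position on the ring using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Hypothetical three-node cluster, each node assigned one token (sorted).
node_tokens = [
    (RING_SIZE // 6, "node-a"),
    (RING_SIZE // 2, "node-b"),
    (5 * RING_SIZE // 6, "node-c"),
]


def node_for(partition_key: str) -> str:
    """Pick the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in node_tokens]
    index = bisect.bisect_left(tokens, token_for(partition_key))
    return node_tokens[index % len(node_tokens)][1]


print(node_for("ronaldo"))
```

The same partition key always hashes to the same token, so every row with that key lands on the same node in this model.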

This post explains how read/write requests are handled in the ring.
