FAQ
Cassandra
DDL and DML

Understand and use the DDL subset of CQL

Keyspace

a top-level namespace for a CQL table schema

Defines the replication strategy for a set of tables
- Keyspace per application is a good idea
Data objects (e.g., tables) belong to a single keyspace

How are primary key, partition key, and clustering columns defined?

Simple partition key, no clustering columns

PRIMARY KEY ( partition_key_column )
Composite partition key, no clustering columns

PRIMARY KEY ( ( partition_key_col1, ..., partition_key_colN ) )
Simple partition key and clustering columns

PRIMARY KEY ( partition_key_column,clustering_column1, ..., clustering_columnM )
Composite partition key and clustering columns

PRIMARY KEY ( ( partition_key_col1, ..., partition_key_colN ), clustering_column1, ...,clustering_columnM ) )

UUID

universally unique identifiers, which is used for avoiding column collision.

Format:hex{8}-hex{4}-hex{4}-hex{4}-hex{12}

TIMEUUID

Embeds a time value within a UUID
CQL function now() generates a new TIMEUUID
CQL function dateOf() extracts the embedded timestamp as a date
TIMEUUID values in clustering columns or in column names are ordered based on time

TIMESTAMP

64-bit integer representing a number of milliseconds since January 1 1970 at 00:00:00 GMT

COUNTER

Cassandra supports distributed counters
Useful for tracking a count
Counter column stores a number that can only be updated
- Incremented or decremented
- Cannot assign an initial value to a counter (initial value is 0)
Counter column cannot be part of a primary key
If a table has a counter column, all non-counter columns must be part of a primary key

What is a secondary index?

Can index additional columns to enable searching by those columns
Cannot be created for
- counter columns
- static columns

When do you want to use a secondary index?

Use with low-cardinality columns

Columns that may contain a relatively small set of distinct values

Do not use:

On high-cardinality columns
On counter column tables
On a frequently updated or deleted columns
To look for a row in a large partition unless narrowly queried

-------------------------------

Understand and use the DML subset of CQL

What is the syntax of the INSERT statement?

INSERT INTO table_name (column1, column2 ...) VALUES (value1, value2 ...)

What is the syntax of the UPDATE statement?

UPDATE <keyspace>.<table> SET column_name1 = value, column_name2 = value WHERE primary_key_column = value;

Row must be identified by values in primary key columns

What is an "upsert"?

Cassandra is a distributed database that avoids reading before a write, so an INSERT or UPDATE sets the column values you specify regardless of whether the row already exists. This means inserts can update existing rows, and updates can create new rows. It also means it’s easy to accidentally overwrite existing data, so keep that in mind.
Both UPDATE and INSERT are write operations
No reading before writing

What are lightweight transactions or ‘compare and set’?

Introduces a new clause IF NOT EXISTS for inserts

Insert operation executes if a row with the same primary key does not exist
Uses a consensus algorithm called Paxos to ensure inserts are done serially
Multiple messages are passed between coordinator and replicas with a large performance penalty
[applied] column returns true if row does not exist and insert executes
[applied] column is false if row exists and the existing row will be returned

Update uses IF to verify the value for column(s) before execution

[applied] column returns true if condition(s) matches and update written
[applied] column is false if condition(s) do not match and the current row will be returned

What is the purpose of the BATCH statement?

BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation

Saves on client-server and coordinator-replica communication
Atomic operation
- If any statement in the batch succeeds, all will
No batch isolation
- Other “transactions” can read and write data being affected by a partially executed batch

Example:

BEGIN BATCH
  DELETE FROM albums_by_performer WHERE performer = 'The Beatles' AND year = 1966 AND title = 'Revolver'; 
  INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock');
APPLY BATCH;

Lightweight transactions in batch

Batch will execute only if conditions for all lightweight transactions are met
All operations in batch will execute serially with the increased performance overhead