Understand and use the DDL subset of CQL
Keyspace
a top-level namespace for a CQL table schema
- Defines the replication strategy for a set of tables
- Keyspace per application is a good idea
- Data objects (e.g., tables) belong to a single keyspace
How are primary key, partition key, and clustering columns defined?
Simple partition key, no clustering columns
PRIMARY KEY ( partition_key_column )
Composite partition key, no clustering columns
PRIMARY KEY ( ( partition_key_col1, ..., partition_key_colN ) )
Simple partition key and clustering columns
PRIMARY KEY ( partition_key_column,clustering_column1, ..., clustering_columnM )
Composite partition key and clustering columns
PRIMARY KEY ( ( partition_key_col1, ..., partition_key_colN ), clustering_column1, ...,clustering_columnM ) )
UUID
universally unique identifiers, which is used for avoiding column collision.
Format:hex{8}-hex{4}-hex{4}-hex{4}-hex{12}
TIMEUUID
- Embeds a time value within a UUID
- CQL function now() generates a new TIMEUUID
- CQL function dateOf() extracts the embedded timestamp as a date
- TIMEUUID values in clustering columns or in column names are ordered based on time
TIMESTAMP
64-bit integer representing a number of milliseconds since January 1 1970 at 00:00:00 GMT
COUNTER
- Cassandra supports distributed counters
- Useful for tracking a count
- Counter column stores a number that can only be updated
- Incremented or decremented
- Cannot assign an initial value to a counter (initial value is 0)
- Counter column cannot be part of a primary key
- If a table has a counter column, all non-counter columns must be part of a primary key
What is a secondary index?
- Can index additional columns to enable searching by those columns
- Cannot be created for
- counter columns
- static columns
When do you want to use a secondary index?
Do not use:
- On high-cardinality columns
- On counter column tables
- On a frequently updated or deleted columns
- To look for a row in a large partition unless narrowly queried
-------------------------------
Understand and use the DML subset of CQL
What is the syntax of the INSERT statement?
INSERT INTO table_name (column1, column2 ...) VALUES (value1, value2 ...)
What is the syntax of the UPDATE statement?
UPDATE <keyspace>.<table> SET column_name1 = value, column_name2 = value WHERE primary_key_column = value;
- Row must be identified by values in primary key columns
What is an "upsert"?
Cassandra is a distributed database that avoids reading before a write, so an INSERT or UPDATE sets the column values you specify regardless of whether the row already exists. This means inserts can update existing rows, and updates can create new rows. It also means it’s easy to accidentally overwrite existing data, so keep that in mind.
Both UPDATE and INSERT are write operations
No reading before writing
What are lightweight transactions or ‘compare and set’?
Introduces a new clause IF NOT EXISTS for inserts
- Insert operation executes if a row with the same primary key does not exist
- Uses a consensus algorithm called Paxos to ensure inserts are done serially
- Multiple messages are passed between coordinator and replicas with a large performance penalty
- [applied] column returns true if row does not exist and insert executes
- [applied] column is false if row exists and the existing row will be returned
Update uses IF to verify the value for column(s) before execution
- [applied] column returns true if condition(s) matches and update written
- [applied] column is false if condition(s) do not match and the current row will be returned
What is the purpose of the BATCH statement?
BATCH statement combines multiple INSERT, UPDATE, and DELETE statements into a single logical operation
- Saves on client-server and coordinator-replica communication
- Atomic operation
- If any statement in the batch succeeds, all will
- No batch isolation
- Other “transactions” can read and write data being affected by a partially executed batch
Example:
BEGIN BATCH
DELETE FROM albums_by_performer WHERE performer = 'The Beatles' AND year = 1966 AND title = 'Revolver';
INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock');
APPLY BATCH;
Lightweight transactions in batch
- Batch will execute only if conditions for all lightweight transactions are met
- All operations in batch will execute serially with the increased performance overhead
Example:
BATCH
UPDATE user SET lock = true IF lock = false; WHERE performer = 'The Beatles' AND year = 1966 AND title = 'Revolver';
INSERT INTO albums_by_performer (performer, year, title, genre) VALUES ('The Beatles', 1966, 'Revolver', 'Rock');
UPDATE user SET lock = false;
APPLY BATCH;