[go: up one dir, main page]

0% found this document useful (0 votes)
24 views17 pages

Cassandra Data Model

data model for casandara db
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views17 pages

Cassandra Data Model

data model for casandara db
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 17

Cassandra Data Model

KeySpace

1. Key space: Outer most container for data


– Attributes:
• Replication factor
• Replica placement strategy ( Simple strategy (Rack aware) , Network Topology strategy
(Data center - shared)
• Column Families – Container for ordered collection of rows. Represents the structure of
data.
– CREATE KEYSPACE [Keyspace name] WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 3};
2. Column Families

Relational Table Cassandra column Family


A schema in a relational model is fixed. Once In Cassandra, although the column families
we define certain columns for a table, while are defined, the columns are not. You can
inserting data, in every row all the columns freely add any column to any column family
must be filled at least with a null value. at any time.
Relational tables define only columns and the In Cassandra, a table contains columns, or
user fills in the table with values. can be defined as a super column family.
Column Family & Column Cont…
The following layout represents a row in a Column Family (CF):

Column : A Basic data structure. It will have 3 values. Name, Value and Time stamp.

A Typical table
Data Model Comparision
RDBMS Cassandra
RDBMS deals with structured data. Cassandra deals with unstructured data.

It has a fixed schema. Cassandra has a flexible schema.

In RDBMS, a table is an array of arrays. (ROW x In Cassandra, a table is a list of “nested key-
COLUMN) value pairs”. (ROW x COLUMN key x COLUMN
value)

Database is the outermost container that Keyspace is the outermost container that
contains data corresponding to an application. contains data corresponding to an application.

Tables are the entities of a database. Tables or column families are the entity of a
keyspace.
Row is an individual record in RDBMS. Row is a unit of replication in Cassandra.

Column represents the attributes of a relation. Column is a unit of storage in Cassandra.

RDBMS supports the concepts of foreign keys, Relationships are represented using collections.
joins.
Cassandra Query Language (CQL).
Options Usage

cqlsh --help Shows help topics about the options of cqlsh commands.

cqlsh --version Provides the version of the cqlsh you are using.

cqlsh --color Directs the shell to use colored output.

cqlsh --debug Shows additional debugging information.

cqlsh --execute Directs the shell to accept and execute a CQL command.
cql_statement

cqlsh --file= “file name” If you use this option, Cassandra executes the command in the given file and exits.

cqlsh --no-color Directs Cassandra not to use colored output.

cqlsh -u “user name” Using this option, you can authenticate a user. The default user name is: cassandra.

cqlsh-p “pass word” Using this option, you can authenticate a user with a password. The default password
is: cassandra.
CQLSH – Cont…
Documented Shell Commands
• HELP - Displays help topics for all cqlsh commands.
• CAPTURE - Captures the output of a command and adds it to a file.
• CONSISTENCY - Shows the current consistency level, or sets a new consistency level.
• COPY - Copies data to and from Cassandra.
• DESCRIBE - Describes the current cluster of Cassandra and its objects.
• EXPAND - Expands the output of a query vertically.
• EXIT - Using this command, you can terminate cqlsh.
• PAGING - Enables or disables query paging.
• SHOW - Displays the details of current cqlsh session such as Cassandra version, host, or data
type assumptions.
• SOURCE - Executes a file that contains CQL statements.
• TRACING - Enables or disables request tracing.

CQL Data Manipulation Commands


• INSERT - Adds columns for a row in a table.
• UPDATE - Updates a column of a row.
• DELETE - Deletes data from a table.
• BATCH - Executes multiple DML statements at once.
CQLSH – Cont…
CQL Data Definition Commands
• CREATE KEYSPACE - Creates a KeySpace in Cassandra.
• USE - Connects to a created KeySpace.
• ALTER KEYSPACE - Changes the properties of a KeySpace.
• DROP KEYSPACE - Removes a KeySpace
• CREATE TABLE - Creates a table in a KeySpace.
• ALTER TABLE - Modifies the column properties of a table.
• DROP TABLE - Removes a table.
• TRUNCATE - Removes all the data from a table.
• CREATE INDEX - Defines a new index on a single column of a table.
• DROP INDEX - Deletes a named index.

CQL Clauses
• SELECT - This clause reads data from a table
• WHERE - The where clause is used along with select to read a specific data.
• ORDERBY - The orderby clause is used along with select to read a specific data in a specific
order.
CQL data types
CQL Type Constants
Ascii strings
BIGINT Integer. 64 bit
blob blobs
boolean booleans
counter integers
decimal integers, floats

double integers, floats


float integers, floats
frozen tuples, collections, user-defined types, (e.g) frozen <tuple
<int, tuple<text, double>>>
inet strings
int integers
list A collection of one or more ordered elements
map A JSON-style array of literals: { literal : literal, literal :
literal ... }
set A collection of one or more elements
text strings
timestamp integers, strings
timeuuid uuids
tuple Cassandra 2.1 and later. A group of 2-3 fields.
uuid uuids
varchar strings
varint integers
Collection type
A collection column is declared using the collection type, followed by another type,
such as int or text, in angle brackets. For example, you can create a table having a list of textual
elements, a list of integers, or a list of some other element types.

• list<text> list<int> Collection types cannot currently be nested. For example, you cannot
define a list within a list:
• list<list<int>> // not allowed
– In Cassandra 2.1 and later, you can create an index on a column of type map, set, or list.
Using frozen in a collection
• A frozen value serializes multiple components into a single value. Non-frozen types allow
updates to individual fields. Cassandra treats the value of a frozen type as a blob. The entire
value must be overwritten.
Note: You cannot use non-frozen collections for primary key columns. However, you
can use frozen collections for primary key columns.
column_name <collection_type><cql_type, frozen<column_name>>
For example:
CREATE TABLE mykeyspace.users (
id uuid PRIMARY KEY,
name frozen <fullname>,
direct_reports set<frozen <fullname>>, // a collection set
addresses map <text, frozen <address>> // a collection map
);
UUID and timeuuid types
The UUID (universally unique id) comparator type is used to avoid collisions in column names.
• Uuid, 32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -, after the
8th, 12th, 16th, and 20th digits. For example: 01234567-0123-0123-0123-0123456789ab
• Timeuuid, Uses the time in 100 nanosecond intervals since 00:00:00.00 UTC (60 bits), a clock
sequence number for prevention of duplicates (14 bits), plus the IEEE 801 MAC address (48
bits) to generate a unique identifier. For example: d2177dd0-eaa2-11de-a572-001b779c76e3

uuid and Timeuuid functions


• dateOf() Used in a SELECT clause, this function extracts the timestamp of a timeuuid column
in a resultset. This function returns the extracted timestamp as a date. Use
unixTimestampOf() to get a raw timestamp.
• now() In the coordinator node, generates a new unique timeuuid in milliseconds when the
statement is executed. The timestamp portion of the timeuuid conforms to the UTC
(Universal Time) standard. This method is useful for inserting values. The value returned by
now() is guaranteed to be unique.
• minTimeuuid() and maxTimeuuid()Returns a UUID-like result given a conditional time
component as an argument.
For example: SELECT * FROM myTable
WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
AND t < minTimeuuid('2013-02-02 10:00+0000')
Expiring data example

Both the INSERT and UPDATE commands support setting a time for data in a column to expire.
The expiration time (TTL) is set using CQL.

Use the INSERT command to set a password column in the users table to expire in 86400 seconds,
or one day.
INSERT INTO users (user_name, password) VALUES ('cbrown', 'ch@ngem4a') USING
TTL 86400;

Extend the expiration period to five days by using the UPDATE command
• UPDATE users USING TTL 432000 SET password = 'ch@ngem4a' WHERE user_name =
'cbrown';
Few Examples
KEYSPACE
CREATE KEYSPACE JPS WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
CREATE/DROP TABLE
CREATE TABLE emp( emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,
emp_phone varint );
DROP TABLE [TABLE NAME]
ALTER TABLE
ALTER TABLE table name ADD new column datatype;
DROP COLUMN
ALTER table name DROP column name;
DESCRIBE/DESC
DESCRIBE COLUMNFAMILIES;
Executing Batch Statements Using Cqlsh
BEGIN BATCH ... INSERT INTO emp (emp_id, emp_city, emp_name, emp_phone, emp_sal)
values( 4,'Pune','rajeev',9848022331, 30000); ... UPDATE emp SET emp_sal = 50000 WHERE
emp_id =3; ... DELETE emp_city FROM emp WHERE emp_id = 2; ... APPLY BATCH;
EXAMPLES CONT…
UPDATE
UPDATE emp SET emp_city='Delhi',emp_sal=50000 WHERE emp_id=2;
SELECT
SELECT emp_name, emp_sal from emp;
DELETE
DELETE emp_sal FROM emp WHERE emp_id=3;
Using lightweight transactions
• INSERT and UPDATE statements using the IF clause, support lightweight transactions, also known
as Compare and Set (CAS).
• Register a new user.
– INSERT INTO users (login, email, name, login_count) VALUES ('jdoe', 'jdoe@abc.com', 'Jane
Doe', 1) IF NOT EXISTS;
• Perform a CAS operation against a row that does exist by adding the predicate for the operation at
the end of the query. For example, reset Jane Doe's password.
– UPDATE users SET email = ‘janedoe@abc.com’ WHERE login = 'jdoe' IF email =
‘jdoe@abc.com’;
Diff. Partition keys
Compound Primary key
CREATE TABLE emp ( empID int, deptID int, first_name varchar, last_name varchar,
PRIMARY KEY (empID, deptID));

The empID acts as a partition key for distributing data in the table among the various nodes. The
remaining component of the primary key, the deptID, acts as a clustering mechanism and ensures
that the data is stored in ascending order on disk

Using a composite partition key

CREATE TABLE Cats ( block_id uuid, breed text, color text, short_hair boolean, PRIMARY KEY
((block_id, breed), color, short_hair) );

Cassandra will store columns having the same block_id but a different breed on different nodes,
and columns having the same block_id and breed on the same node.
Consistency
Tunable Consistency:
Availability and consistency can be tuned – Data can be made consistent across all the
nodes in a distributed database cluster.
Linearizable consistency:
Linearizable consistency is a serial (immediate) isolation level for lightweight
transactions.
Configuring consistency:
Consistency levels can be configured to manage availability versus data accuracy. You
can configure consistency on a cluster, data center, or individual I/O operation basis. Consistency
among participating nodes can be set globally and also controlled on a per-operation basis .
You can use a new cqlsh command, CONSISTENCY, to set the consistency level for queries from the
current cqlsh session. The WITH CONSISTENCY clause has been removed from CQL commands. You
set the consistency level programmatically (at the driver level).
For example,
call QueryBuilder.insertInto with a setConsistencyLevel argument. The consistency level defaults to
ONE for all write and read operations.
Write Consistency Levels
Level Description Usage
ALL A write must be written to the commit log and memtable on all replica Highest consistency and the lowest availability
nodes in the cluster for that partition.

EACH_QUORUM Strong consistency. A write must be written to the commit log and Used in multiple data center clusters to strictly maintain
memtable on a quorum of replica nodes in all data centers. consistency at the same level in each data center. For
example, if you want a read to fail when a data center is
down.

QUORUM A write must be written to the commit log and memtable on a quorum of Provides strong consistency if you can tolerate some level of
replica nodes. failure.
LOCAL_QUORUM Strong consistency. A write must be written to the commit log and Used in multiple data center clusters with a rack-aware
memtable on a quorum of replica nodes in the same data center. Avoids replica placement strategy, such
latency of inter-data center communication. as NetworkTopologyStrategy, and a properly configured
snitch.

ONE A write must be written to the commit log and memtable of at least one Satisfies the needs of most users because consistency
replica node. requirements are not stringent.
TWO A write must be written to the commit log and memtable of at least two Similar to ONE.
replica nodes.
THREE A write must be written to the commit log and memtable of at least three Similar to TWO.
replica nodes..
LOCAL_ONE A write must be sent to, and successfully acknowledged by, at least one In a multiple data center clusters, a consistency level
replica node in the local datacenter. of ONE is often desirable, but cross-DC traffic is
not. LOCAL_ONEaccomplishes this.

ANY A write must be written to at least one node. If all replica nodes for the Provides low latency and a guarantee that a write never fails.
given partition key are down, the write can still succeed after a hinted Delivers the lowest consistency and highest availability.
handoff has been written.

SERIAL Achieves linearizable consistency for lightweight transactions by preventing You cannot configure this level as a normal consistency level,
unconditional updates. configured at the driver level using the consistency level
field.
LOCAL_SERIAL Same as SERIAL but confined to the data center. A write must be written Same as SERIAL. Used for disaster recovery.
conditionally to the commit log and memtable on a quorum of replica nodes
in the same data center.
Read Consistency Levels
Level Description Usage
ALL Returns the record after all replicas have responded. The read Provides the highest consistency of all levels and the lowest availability
operation will fail if a replica does not respond. of all levels.

EACH_QUORUM Not supported for reads. Not supported for reads.

QUORUM Returns the record after a quorum of replicas has responded Ensures strong consistency if you can tolerate some level of failure.
from any data center.

LOCAL_QUORUM Returns the record after a quorum of replicas in the current Used in multiple data center clusters with a rack-aware replica
data center as the coordinator node has reported. Avoids placement strategy ( NetworkTopologyStrategy) and a properly
latency of inter-data center communication. configured snitch. Fails when using SimpleStrategy.

ONE Returns a response from the closest replica, as determined by Provides the highest availability of all the levels if you can tolerate a
the snitch. By default, a read repair runs in the background to comparatively high probability of stale data being read. The replicas
make the other replicas consistent. contacted for reads may not always have the most recent write.

TWO Returns the most recent data from two of the closest replicas. Similar to ONE.
THREE Returns the most recent data from three of the closest Similar to TWO.
replicas.
LOCAL_ONE Returns a response from the closest replica in the local data Same usage as described in the table about write consistency levels.
center.
SERIAL Allows reading the current (and possibly uncommitted) state To read the latest value of a column after a user has invoked
of data without proposing a new addition or update. If a lightweight transaction to write to the column, use SERIAL. Cassandra
a SERIALread finds an uncommitted transaction in progress, it then checks the inflight lightweight transaction for updates and, if
will commit the transaction as part of the read. Similar to found, returns the latest data.
QUORUM.
LOCAL_SERIAL Same as SERIAL, but confined to the data center. Similar to Used to achieve linearizable consistency for lightweight transactions.
LOCAL_QUORUM.

You might also like