0% found this document useful (0 votes)

24 views17 pages

Cassandra Data Model

data model for casandara db

Uploaded by

Vijayakumar Peruri

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views17 pages

Cassandra Data Model

data model for casandara db

Uploaded by

Vijayakumar Peruri

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PPTX, PDF, TXT or read online on Scribd

You are on page 1/ 17

Cassandra Data Model

KeySpace

1. Key space: Outer most container for data

– Attributes:
• Replication factor
• Replica placement strategy ( Simple strategy (Rack aware) , Network Topology strategy
(Data center - shared)
• Column Families – Container for ordered collection of rows. Represents the structure of
data.
– CREATE KEYSPACE [Keyspace name] WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 3};
2. Column Families

Relational Table Cassandra column Family

A schema in a relational model is fixed. Once In Cassandra, although the column families
we define certain columns for a table, while are defined, the columns are not. You can
inserting data, in every row all the columns freely add any column to any column family
must be filled at least with a null value. at any time.
Relational tables define only columns and the In Cassandra, a table contains columns, or
user fills in the table with values. can be defined as a super column family.
Column Family & Column Cont…
The following layout represents a row in a Column Family (CF):

Column : A Basic data structure. It will have 3 values. Name, Value and Time stamp.

A Typical table
Data Model Comparision
RDBMS Cassandra
RDBMS deals with structured data. Cassandra deals with unstructured data.

It has a fixed schema. Cassandra has a flexible schema.

In RDBMS, a table is an array of arrays. (ROW x In Cassandra, a table is a list of “nested key-
COLUMN) value pairs”. (ROW x COLUMN key x COLUMN
value)

Database is the outermost container that Keyspace is the outermost container that
contains data corresponding to an application. contains data corresponding to an application.

Tables are the entities of a database. Tables or column families are the entity of a
keyspace.
Row is an individual record in RDBMS. Row is a unit of replication in Cassandra.

Column represents the attributes of a relation. Column is a unit of storage in Cassandra.

RDBMS supports the concepts of foreign keys, Relationships are represented using collections.
joins.
Cassandra Query Language (CQL).
Options Usage

cqlsh --help Shows help topics about the options of cqlsh commands.

cqlsh --version Provides the version of the cqlsh you are using.

cqlsh --color Directs the shell to use colored output.

cqlsh --debug Shows additional debugging information.

cqlsh --execute Directs the shell to accept and execute a CQL command.
cql_statement

cqlsh --file= “file name” If you use this option, Cassandra executes the command in the given file and exits.

cqlsh --no-color Directs Cassandra not to use colored output.

cqlsh -u “user name” Using this option, you can authenticate a user. The default user name is: cassandra.

cqlsh-p “pass word” Using this option, you can authenticate a user with a password. The default password
is: cassandra.
CQLSH – Cont…
Documented Shell Commands
• HELP - Displays help topics for all cqlsh commands.
• CAPTURE - Captures the output of a command and adds it to a file.
• CONSISTENCY - Shows the current consistency level, or sets a new consistency level.
• COPY - Copies data to and from Cassandra.
• DESCRIBE - Describes the current cluster of Cassandra and its objects.
• EXPAND - Expands the output of a query vertically.
• EXIT - Using this command, you can terminate cqlsh.
• PAGING - Enables or disables query paging.
• SHOW - Displays the details of current cqlsh session such as Cassandra version, host, or data
type assumptions.
• SOURCE - Executes a file that contains CQL statements.
• TRACING - Enables or disables request tracing.

CQL Data Manipulation Commands

• INSERT - Adds columns for a row in a table.
• UPDATE - Updates a column of a row.
• DELETE - Deletes data from a table.
• BATCH - Executes multiple DML statements at once.
CQLSH – Cont…
CQL Data Definition Commands
• CREATE KEYSPACE - Creates a KeySpace in Cassandra.
• USE - Connects to a created KeySpace.
• ALTER KEYSPACE - Changes the properties of a KeySpace.
• DROP KEYSPACE - Removes a KeySpace
• CREATE TABLE - Creates a table in a KeySpace.
• ALTER TABLE - Modifies the column properties of a table.
• DROP TABLE - Removes a table.
• TRUNCATE - Removes all the data from a table.
• CREATE INDEX - Defines a new index on a single column of a table.
• DROP INDEX - Deletes a named index.

CQL Clauses
• SELECT - This clause reads data from a table
• WHERE - The where clause is used along with select to read a specific data.
• ORDERBY - The orderby clause is used along with select to read a specific data in a specific
order.
CQL data types
CQL Type Constants
Ascii strings
BIGINT Integer. 64 bit
blob blobs
boolean booleans
counter integers
decimal integers, floats

double integers, floats

float integers, floats
frozen tuples, collections, user-defined types, (e.g) frozen <tuple
<int, tuple<text, double>>>
inet strings
int integers
list A collection of one or more ordered elements
map A JSON-style array of literals: { literal : literal, literal :
literal ... }
set A collection of one or more elements
text strings
timestamp integers, strings
timeuuid uuids
tuple Cassandra 2.1 and later. A group of 2-3 fields.
uuid uuids
varchar strings
varint integers
Collection type
A collection column is declared using the collection type, followed by another type,
such as int or text, in angle brackets. For example, you can create a table having a list of textual
elements, a list of integers, or a list of some other element types.

• list<text> list<int> Collection types cannot currently be nested. For example, you cannot
define a list within a list:
• list<list<int>> // not allowed
– In Cassandra 2.1 and later, you can create an index on a column of type map, set, or list.
Using frozen in a collection
• A frozen value serializes multiple components into a single value. Non-frozen types allow
updates to individual fields. Cassandra treats the value of a frozen type as a blob. The entire
value must be overwritten.
Note: You cannot use non-frozen collections for primary key columns. However, you
can use frozen collections for primary key columns.
column_name <collection_type><cql_type, frozen<column_name>>
For example:
CREATE TABLE mykeyspace.users (
id uuid PRIMARY KEY,
name frozen <fullname>,
direct_reports set<frozen <fullname>>, // a collection set
addresses map <text, frozen <address>> // a collection map
);
UUID and timeuuid types
The UUID (universally unique id) comparator type is used to avoid collisions in column names.
• Uuid, 32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -, after the
8th, 12th, 16th, and 20th digits. For example: 01234567-0123-0123-0123-0123456789ab
• Timeuuid, Uses the time in 100 nanosecond intervals since 00:00:00.00 UTC (60 bits), a clock
sequence number for prevention of duplicates (14 bits), plus the IEEE 801 MAC address (48
bits) to generate a unique identifier. For example: d2177dd0-eaa2-11de-a572-001b779c76e3

uuid and Timeuuid functions

• dateOf() Used in a SELECT clause, this function extracts the timestamp of a timeuuid column
in a resultset. This function returns the extracted timestamp as a date. Use
unixTimestampOf() to get a raw timestamp.
• now() In the coordinator node, generates a new unique timeuuid in milliseconds when the
statement is executed. The timestamp portion of the timeuuid conforms to the UTC
(Universal Time) standard. This method is useful for inserting values. The value returned by
now() is guaranteed to be unique.
• minTimeuuid() and maxTimeuuid()Returns a UUID-like result given a conditional time
component as an argument.
For example: SELECT * FROM myTable
WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
AND t < minTimeuuid('2013-02-02 10:00+0000')
Expiring data example

Both the INSERT and UPDATE commands support setting a time for data in a column to expire.
The expiration time (TTL) is set using CQL.

Use the INSERT command to set a password column in the users table to expire in 86400 seconds,
or one day.
INSERT INTO users (user_name, password) VALUES ('cbrown', 'ch@ngem4a') USING
TTL 86400;

Extend the expiration period to five days by using the UPDATE command
• UPDATE users USING TTL 432000 SET password = 'ch@ngem4a' WHERE user_name =
'cbrown';
Few Examples
KEYSPACE
CREATE KEYSPACE JPS WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};
CREATE/DROP TABLE
CREATE TABLE emp( emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,
emp_phone varint );
DROP TABLE [TABLE NAME]
ALTER TABLE
ALTER TABLE table name ADD new column datatype;
DROP COLUMN
ALTER table name DROP column name;
DESCRIBE/DESC
DESCRIBE COLUMNFAMILIES;
Executing Batch Statements Using Cqlsh
BEGIN BATCH ... INSERT INTO emp (emp_id, emp_city, emp_name, emp_phone, emp_sal)
values( 4,'Pune','rajeev',9848022331, 30000); ... UPDATE emp SET emp_sal = 50000 WHERE
emp_id =3; ... DELETE emp_city FROM emp WHERE emp_id = 2; ... APPLY BATCH;
EXAMPLES CONT…
UPDATE
UPDATE emp SET emp_city='Delhi',emp_sal=50000 WHERE emp_id=2;
SELECT
SELECT emp_name, emp_sal from emp;
DELETE
DELETE emp_sal FROM emp WHERE emp_id=3;
Using lightweight transactions
• INSERT and UPDATE statements using the IF clause, support lightweight transactions, also known
as Compare and Set (CAS).
• Register a new user.
– INSERT INTO users (login, email, name, login_count) VALUES ('jdoe', 'jdoe@abc.com', 'Jane
Doe', 1) IF NOT EXISTS;
• Perform a CAS operation against a row that does exist by adding the predicate for the operation at
the end of the query. For example, reset Jane Doe's password.
– UPDATE users SET email = ‘janedoe@abc.com’ WHERE login = 'jdoe' IF email =
‘jdoe@abc.com’;
Diff. Partition keys
Compound Primary key
CREATE TABLE emp ( empID int, deptID int, first_name varchar, last_name varchar,
PRIMARY KEY (empID, deptID));

The empID acts as a partition key for distributing data in the table among the various nodes. The
remaining component of the primary key, the deptID, acts as a clustering mechanism and ensures
that the data is stored in ascending order on disk

Using a composite partition key

CREATE TABLE Cats ( block_id uuid, breed text, color text, short_hair boolean, PRIMARY KEY
((block_id, breed), color, short_hair) );

Cassandra will store columns having the same block_id but a different breed on different nodes,
and columns having the same block_id and breed on the same node.
Consistency
Tunable Consistency:
Availability and consistency can be tuned – Data can be made consistent across all the
nodes in a distributed database cluster.
Linearizable consistency:
Linearizable consistency is a serial (immediate) isolation level for lightweight
transactions.
Configuring consistency:
Consistency levels can be configured to manage availability versus data accuracy. You
can configure consistency on a cluster, data center, or individual I/O operation basis. Consistency
among participating nodes can be set globally and also controlled on a per-operation basis .
You can use a new cqlsh command, CONSISTENCY, to set the consistency level for queries from the
current cqlsh session. The WITH CONSISTENCY clause has been removed from CQL commands. You
set the consistency level programmatically (at the driver level).
For example,
call QueryBuilder.insertInto with a setConsistencyLevel argument. The consistency level defaults to
ONE for all write and read operations.
Write Consistency Levels
Level Description Usage
ALL A write must be written to the commit log and memtable on all replica Highest consistency and the lowest availability
nodes in the cluster for that partition.

EACH_QUORUM Strong consistency. A write must be written to the commit log and Used in multiple data center clusters to strictly maintain
memtable on a quorum of replica nodes in all data centers. consistency at the same level in each data center. For
example, if you want a read to fail when a data center is
down.

QUORUM A write must be written to the commit log and memtable on a quorum of Provides strong consistency if you can tolerate some level of
replica nodes. failure.
LOCAL_QUORUM Strong consistency. A write must be written to the commit log and Used in multiple data center clusters with a rack-aware
memtable on a quorum of replica nodes in the same data center. Avoids replica placement strategy, such
latency of inter-data center communication. as NetworkTopologyStrategy, and a properly configured
snitch.

ONE A write must be written to the commit log and memtable of at least one Satisfies the needs of most users because consistency
replica node. requirements are not stringent.
TWO A write must be written to the commit log and memtable of at least two Similar to ONE.
replica nodes.
THREE A write must be written to the commit log and memtable of at least three Similar to TWO.
replica nodes..
LOCAL_ONE A write must be sent to, and successfully acknowledged by, at least one In a multiple data center clusters, a consistency level
replica node in the local datacenter. of ONE is often desirable, but cross-DC traffic is
not. LOCAL_ONEaccomplishes this.

ANY A write must be written to at least one node. If all replica nodes for the Provides low latency and a guarantee that a write never fails.
given partition key are down, the write can still succeed after a hinted Delivers the lowest consistency and highest availability.
handoff has been written.

SERIAL Achieves linearizable consistency for lightweight transactions by preventing You cannot configure this level as a normal consistency level,
unconditional updates. configured at the driver level using the consistency level
field.
LOCAL_SERIAL Same as SERIAL but confined to the data center. A write must be written Same as SERIAL. Used for disaster recovery.
conditionally to the commit log and memtable on a quorum of replica nodes
in the same data center.
Read Consistency Levels
Level Description Usage
ALL Returns the record after all replicas have responded. The read Provides the highest consistency of all levels and the lowest availability
operation will fail if a replica does not respond. of all levels.

EACH_QUORUM Not supported for reads. Not supported for reads.

QUORUM Returns the record after a quorum of replicas has responded Ensures strong consistency if you can tolerate some level of failure.
from any data center.

LOCAL_QUORUM Returns the record after a quorum of replicas in the current Used in multiple data center clusters with a rack-aware replica
data center as the coordinator node has reported. Avoids placement strategy ( NetworkTopologyStrategy) and a properly
latency of inter-data center communication. configured snitch. Fails when using SimpleStrategy.

ONE Returns a response from the closest replica, as determined by Provides the highest availability of all the levels if you can tolerate a
the snitch. By default, a read repair runs in the background to comparatively high probability of stale data being read. The replicas
make the other replicas consistent. contacted for reads may not always have the most recent write.

TWO Returns the most recent data from two of the closest replicas. Similar to ONE.
THREE Returns the most recent data from three of the closest Similar to TWO.
replicas.
LOCAL_ONE Returns a response from the closest replica in the local data Same usage as described in the table about write consistency levels.
center.
SERIAL Allows reading the current (and possibly uncommitted) state To read the latest value of a column after a user has invoked
of data without proposing a new addition or update. If a lightweight transaction to write to the column, use SERIAL. Cassandra
a SERIALread finds an uncommitted transaction in progress, it then checks the inflight lightweight transaction for updates and, if
will commit the transaction as part of the read. Similar to found, returns the latest data.
QUORUM.
LOCAL_SERIAL Same as SERIAL, but confined to the data center. Similar to Used to achieve linearizable consistency for lightweight transactions.
LOCAL_QUORUM.

Cassandra
No ratings yet
Cassandra
5 pages
Introduction To Cassandra
No ratings yet
Introduction To Cassandra
47 pages
Cassandra CQL Commands
No ratings yet
Cassandra CQL Commands
16 pages
Casandra
No ratings yet
Casandra
57 pages
Cassandra Database Overview
No ratings yet
Cassandra Database Overview
37 pages
Cassandra Data Base1
No ratings yet
Cassandra Data Base1
9 pages
Cassandra
No ratings yet
Cassandra
25 pages
Cqlsh-20 Update
No ratings yet
Cqlsh-20 Update
9 pages
CQL Cheatsheet
No ratings yet
CQL Cheatsheet
20 pages
Apache Cassandra: Database
No ratings yet
Apache Cassandra: Database
55 pages
Cassandra Data Modeling Guide
No ratings yet
Cassandra Data Modeling Guide
50 pages
Cassandra
No ratings yet
Cassandra
7 pages
Unit 2
No ratings yet
Unit 2
18 pages
Learn Cassandra
100% (2)
Learn Cassandra
37 pages
Cassandra
No ratings yet
Cassandra
31 pages
Features of Cassandra
No ratings yet
Features of Cassandra
6 pages
Gheribchaimal 2 Repport 2
No ratings yet
Gheribchaimal 2 Repport 2
5 pages
4 Unit
No ratings yet
4 Unit
10 pages
PR 5 - No SQL
No ratings yet
PR 5 - No SQL
9 pages
Cassandra Complete Notes
No ratings yet
Cassandra Complete Notes
5 pages
Lec 17
No ratings yet
Lec 17
21 pages
Column Oriented Database
No ratings yet
Column Oriented Database
45 pages
Cassandra Presentation Final
100% (3)
Cassandra Presentation Final
71 pages
Ch3 Nosql Wordpress
No ratings yet
Ch3 Nosql Wordpress
15 pages
Dzone Refcard 153 Apache Cassandra 2020
No ratings yet
Dzone Refcard 153 Apache Cassandra 2020
11 pages
4 Column-Family Stores Cassandra
No ratings yet
4 Column-Family Stores Cassandra
44 pages
Apache Cassandra Key-Value Database Guide
No ratings yet
Apache Cassandra Key-Value Database Guide
6 pages
Module 4
No ratings yet
Module 4
22 pages
Actporpares 2sum
No ratings yet
Actporpares 2sum
3 pages
04 Introduction To CassandraDB
No ratings yet
04 Introduction To CassandraDB
19 pages
Unit 4 UDM-1
No ratings yet
Unit 4 UDM-1
13 pages
An Overview of Apache Cassandra: Cassandra Essentials Tutorial Series
No ratings yet
An Overview of Apache Cassandra: Cassandra Essentials Tutorial Series
20 pages
Apache Cassandra: Het Patel Kajal Patel
No ratings yet
Apache Cassandra: Het Patel Kajal Patel
8 pages
ABP W5-W6 Big Data Analytics Lab-CASSANDRA
No ratings yet
ABP W5-W6 Big Data Analytics Lab-CASSANDRA
12 pages
Cassandra Data Model
No ratings yet
Cassandra Data Model
17 pages
Cassandra for Data Engineers
No ratings yet
Cassandra for Data Engineers
50 pages
Cassandra Query Language
No ratings yet
Cassandra Query Language
7 pages
Big Data 76-100
No ratings yet
Big Data 76-100
25 pages
Apache Cassandra Nosql SonuJha 04
No ratings yet
Apache Cassandra Nosql SonuJha 04
14 pages
Cassandra Quick Guide
No ratings yet
Cassandra Quick Guide
60 pages
NoSQL Apache Cassandra
No ratings yet
NoSQL Apache Cassandra
159 pages
Apache Cassandra Query Language (CQL)
No ratings yet
Apache Cassandra Query Language (CQL)
1 page
Pratham SQL 5
No ratings yet
Pratham SQL 5
20 pages
Intro To Cassandra and CQL
No ratings yet
Intro To Cassandra and CQL
29 pages
Intro To NoSQL
No ratings yet
Intro To NoSQL
18 pages
Nosql Column-Family Stores
No ratings yet
Nosql Column-Family Stores
30 pages
Become A Super Modeler
No ratings yet
Become A Super Modeler
29 pages
No SQL Lab Manual
No ratings yet
No SQL Lab Manual
19 pages
Cassandra Design Patterns - Sample Chapter
No ratings yet
Cassandra Design Patterns - Sample Chapter
32 pages
06 CassandraDB Comman
No ratings yet
06 CassandraDB Comman
29 pages
Crud Casaandra COURSERA
No ratings yet
Crud Casaandra COURSERA
6 pages
The Cassandra Data Model
No ratings yet
The Cassandra Data Model
4 pages
Cassandra Introduction
No ratings yet
Cassandra Introduction
99 pages
Introduction to Cassandra Basics
No ratings yet
Introduction to Cassandra Basics
27 pages
Cassandra Interview Questions Answers
No ratings yet
Cassandra Interview Questions Answers
10 pages
Complete PostgreSQL Documentation
No ratings yet
Complete PostgreSQL Documentation
51 pages
Class 3 Cassandra
No ratings yet
Class 3 Cassandra
64 pages
Lecture 25
No ratings yet
Lecture 25
39 pages
Introduction To NOSQL and Cassandra: @rantav @outbrain
No ratings yet
Introduction To NOSQL and Cassandra: @rantav @outbrain
60 pages
Enhance VMS-PACS Integration with PLAI
No ratings yet
Enhance VMS-PACS Integration with PLAI
6 pages
Explore - LeetCodesort
No ratings yet
Explore - LeetCodesort
3 pages
1603454931dom4 24
No ratings yet
1603454931dom4 24
3 pages
Fake News Detect
No ratings yet
Fake News Detect
4 pages
Aml
No ratings yet
Aml
12 pages
Optiplex 780 Tech Guide
No ratings yet
Optiplex 780 Tech Guide
49 pages
Advances in SIoT (Social Internet of Things) H. L. Gururaj PDF Download
No ratings yet
Advances in SIoT (Social Internet of Things) H. L. Gururaj PDF Download
55 pages
New Microsoft Word Document
No ratings yet
New Microsoft Word Document
7 pages
Truck Weighbridge Installation, Foundation Construction and Operation Instructions (BINCEN)
No ratings yet
Truck Weighbridge Installation, Foundation Construction and Operation Instructions (BINCEN)
18 pages
List of Computer Books in Library
100% (1)
List of Computer Books in Library
36 pages
IntractabilityI 207
No ratings yet
IntractabilityI 207
57 pages
MPEG-4 AVC H.264 Video Codec Comparison
No ratings yet
MPEG-4 AVC H.264 Video Codec Comparison
87 pages
IT Lab Manual
No ratings yet
IT Lab Manual
42 pages
Isam Fant-F DS
No ratings yet
Isam Fant-F DS
2 pages
MCA Networking Final
No ratings yet
MCA Networking Final
2 pages
Packet Tracer - Configure Named Standard Ipv4 Acls: Addressing Table
No ratings yet
Packet Tracer - Configure Named Standard Ipv4 Acls: Addressing Table
4 pages
Introduction and AVR Architecture
100% (2)
Introduction and AVR Architecture
31 pages
Alorica Intellicare Member Access User Guide
0% (1)
Alorica Intellicare Member Access User Guide
16 pages
Verilog Notes PDF
No ratings yet
Verilog Notes PDF
61 pages
AI Boosts Oil & Gas Efficiency
No ratings yet
AI Boosts Oil & Gas Efficiency
4 pages
Embedded & Microcontroller Application
No ratings yet
Embedded & Microcontroller Application
8 pages
8085 Assembly: Square Root Program
50% (2)
8085 Assembly: Square Root Program
2 pages
IOS Interview Questions For MNC's (Infosys, Cognizent, HCL, Wipro) - by Randhir Raj - Medium
No ratings yet
IOS Interview Questions For MNC's (Infosys, Cognizent, HCL, Wipro) - by Randhir Raj - Medium
14 pages
In Piston We Trust
100% (1)
In Piston We Trust
20 pages
Unit 2 Notes
No ratings yet
Unit 2 Notes
42 pages
Imperva SecureSphere Device Configuration Guide
No ratings yet
Imperva SecureSphere Device Configuration Guide
4 pages
R Rep BT.2246 4 2015 PDF e
No ratings yet
R Rep BT.2246 4 2015 PDF e
92 pages
JavaScript Variables & Data Types Guide
100% (1)
JavaScript Variables & Data Types Guide
14 pages
NIST SP 800-82r3 Ipd
100% (1)
NIST SP 800-82r3 Ipd
317 pages
Computer Knowledge: Upsc Epfo
No ratings yet
Computer Knowledge: Upsc Epfo
10 pages

Cassandra Data Model

Uploaded by

Cassandra Data Model

Uploaded by

Cassandra Data Model

1. Key space: Outer most container for data

Relational Table Cassandra column Family

It has a fixed schema. Cassandra has a flexible schema.

Column represents the attributes of a relation. Column is a unit of storage in Cassandra.

cqlsh --color Directs the shell to use colored output.

cqlsh --debug Shows additional debugging information.

cqlsh --no-color Directs Cassandra not to use colored output.

CQL Data Manipulation Commands

double integers, floats

uuid and Timeuuid functions

Using a composite partition key

EACH_QUORUM Not supported for reads. Not supported for reads.

You might also like