DSE 6.0 Administrator Guide

Latest 6.0 patch: 6.0.13
Updated: 2020-09-18 (UTC-07:00)

© 2020 DataStax, Inc. All rights reserved.
DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache, Apache Cassandra, Tomcat, Lucene, Solr, Hadoop, Spark, and TinkerPop are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States, and/or other countries.

Kubernetes is the registered trademark of the Linux Foundation.


Contents
Chapter 1. Getting started................................................................................................................................... 16

About advanced functionality............................................................................................................................18

New features.....................................................................................................................................................19

Chapter 2. Release notes.....................................................................................................................................22

DSE release notes............................................................................................................................................22

DSE 6.0.13 release notes.......................................................................................................................... 22

DSE 6.0.12 release notes.......................................................................................................................... 26


DSE 6.0.11 release notes.......................................................................................................................... 29

DSE 6.0.10 release notes.......................................................................................................................... 33

DSE 6.0.9 release notes............................................................................................................................ 38

DSE 6.0.8 release notes............................................................................................................................ 39

DSE 6.0.7 release notes............................................................................................................................ 46

DSE 6.0.6 release notes............................................................................................................................ 53

DSE 6.0.5 release notes............................................................................................................................ 54

DSE 6.0.4 release notes............................................................................................................................ 62

DSE 6.0.3 release notes............................................................................................................................ 64

DSE 6.0.2 release notes............................................................................................................................ 71

DSE 6.0.1 release notes............................................................................................................................ 75

DSE 6.0.0 release notes............................................................................................................................ 81

Bulk loader release notes.................................................................................................................................99

Studio release notes......................................................................................................................................... 99

Chapter 3. Installing........................................................................................................................................... 100

Chapter 4. Configuration....................................................................................................................................101

Recommended production settings................................................................................................................ 101

Configure the chunk cache.......................................................................................................................101

Install the latest Java Virtual Machine......................................................................................................103

Synchronize clocks................................................................................................................................... 103

Set kernel parameters.............................................................................................................................. 103

Disable settings that impact performance................................................................................................ 105

Optimize disk settings...............................................................................................................................107

Set the heap size for Java garbage collection......................................................................................... 108

Check Java Hugepages settings.............................................................................................................. 108


YAML and configuration properties................................................................................................................ 109

cassandra.yaml......................................................................................................................................... 109

dse.yaml.................................................................................................................... 141

remote.yaml...............................................................................................180

cassandra-rackdc.properties..................................................................................................................... 184

cassandra-topology.properties.................................................................................................................. 184

Cloud provider snitches.................................................................................................................................. 185

Amazon EC2 single-region snitch............................................................................................................ 185

Amazon EC2 multi-region snitch.............................................................................................................. 186

Google Cloud Platform............................................................................................................................. 187

Apache CloudStack snitch........................................................................................................................188


JVM system properties................................................................................................................................... 188

Cassandra................................................................................................................................................. 189

JMX........................................................................................................................................................... 191

DSE Search.............................................................................................................................................. 191

TPC........................................................................................................................................................... 192

LDAP......................................................................................................................................................... 193

Kerberos....................................................................................................................................................194

NodeSync..................................................................................................................................................194

Choosing a compaction strategy.................................................................................................................... 194

NodeSync service........................................................................................................................................... 195

About NodeSync....................................................................................................................................... 195

Starting and stopping the NodeSync service........................................................................................... 197

Enabling NodeSync validation.................................................................................................................. 197

Tuning NodeSync validations................................................................................................................... 198

Manually starting NodeSync validation.....................................................................................................199

Using multiple network interfaces...................................................................................................................199

Configuring gossip settings.............................................................................................................................202

Configuring the heap dump directory............................................................................................................. 203

Configuring Virtual Nodes...............................................................................................................................203

Virtual node (vnode) configuration............................................................................................................203

Enabling virtual nodes on an existing production cluster......................................................................... 205

Logging configuration......................................................................................................................................205

Changing logging locations.......................................................................................................................205

Configuring logging................................................................................................................................... 206


Commit log archive configuration............................................................................................................. 210

Change Data Capture (CDC) logging...................................................................................................... 211

Chapter 5. Initializing a cluster......................................................................................................................... 212


Initializing datacenters.................................................................................................................................... 212

Initializing a single datacenter per workload type.................................................................................... 213

Initializing multiple datacenters per workload type................................................................................... 218

Setting seed nodes for a single datacenter................................................................................................... 223

Use cases for listen address.......................................................................................................................... 224

Initializing single-token architecture datacenters............................................................................................ 225

Calculating tokens for single-token architecture nodes............................................................................228

Chapter 6. Security............................................................................................................................................. 234


Chapter 7. DSE advanced functionality.......................................................................................................... 235

DSE Analytics................................................................................................................................................. 235

About DSE Analytics.................................................................................................................................235

Setting the replication factor for analytics keyspaces.............................................................................. 236

DSE Analytics and Search integration..................................................................................................... 236

About DSE Analytics Solo........................................................................................................................ 238

Analyzing data using Spark......................................................................................................................239

DSEFS (DataStax Enterprise file system)................................................................................................288

DSE Search.................................................................................................................................................... 308

About DSE Search................................................................................................................................... 308

Configuring DSE Search...........................................................................................................................315

Search performance tuning and monitoring............................................................................................. 342

DSE Search operations............................................................................................................................ 347

Solr interfaces........................................................................................................................................... 352

HTTP API SolrJ and other Solr clients.....................................................................................................362

DSE Graph......................................................................................................................................................362

About DataStax Enterprise Graph............................................................................................................ 362

DSE Graph Terminology...........................................................................................................................364

DSE Graph Operations.............................................................................................................................365

DSE Graph Tools..................................................................................................................................... 372

Starting the Gremlin console.................................................................................................................... 373

DSE Graph Reference..............................................................................................................................376

DSE Management Services............................................................................................................................383

Performance Service................................................................................................................................ 383


Best Practice Service................................................................................................................................432

Capacity Service....................................................................................................................................... 432

Repair Service.......................................................................................................................................... 433


DSE Advanced Replication.............................................................................................................................433

About DSE Advanced Replication............................................................................................................ 433

Architecture............................................................................................................................................... 433

Traffic between the clusters..................................................................................................................... 439

Terminology...............................................................................................................................................440

Getting started.......................................................................................................................................... 440

Keyspaces.................................................................................................................................................450

Data types.................................................................................................................451

Operations.................................................................................................................451

CQL queries..............................................................................................................................................467

Metrics.......................................................................................................................................................468

Managing invalid messages..................................................................................................................... 474

Managing audit logs................................................................................................................................. 475

dse advrep commands............................................................................................................................. 476

DSE In-Memory.............................................................................................................................................. 509

Creating or altering tables to use DSE In-Memory.................................................................................. 509

Verifying table properties.......................................................................................................................... 511

Managing memory.................................................................................................................................... 511

Backing up and restoring data................................................................................................................. 512

DSE Multi-Instance......................................................................................................................................... 512

About DSE Multi-Instance.........................................................................................................................512

DSE Multi-Instance architecture............................................................................................................... 512

Adding nodes to DSE Multi-Instance....................................................................................................... 514

DSE Multi-Instance commands................................................................................................................ 517

DSE Tiered Storage....................................................................................................................................... 518

About DSE Tiered Storage.......................................................................................................................518

Configuring DSE Tiered Storage.............................................................................................................. 519

Testing configurations............................................................................................................................... 521

Chapter 8. Tools................................................................................................................................................. 523

DSE Metrics Collector.....................................................................................................................................523

nodetool...........................................................................................................................................................523

About the nodetool utility.......................................................................................................................... 523


abortrebuild............................................................................................................................................... 523

assassinate............................................................................................................................................... 524

bootstrap................................................................................................................... 526

cfhistograms.............................................................................................................. 527

cfstats........................................................................................................................................................ 527

cleanup......................................................................................................................................................527

clearsnapshot............................................................................................................................................ 529

compact.....................................................................................................................................................530

compactionhistory..................................................................................................................................... 532

compactionstats........................................................................................................................................ 537

decommission........................................................................................................... 538

describecluster.......................................................................................................... 539

describering...............................................................................................................................................541

disableautocompaction..............................................................................................................................543

disablebackup........................................................................................................................................... 544

disablebinary............................................................................................................................................. 545

disablegossip.............................................................................................................................................547

disablehandoff........................................................................................................................................... 548

disablehintsfordc....................................................................................................................................... 549

drain.......................................................................................................................................................... 551

enableautocompaction.............................................................................................................................. 552

enablebackup............................................................................................................................................ 553

enablebinary..............................................................................................................................................555

enablegossip............................................................................................................................................. 556

enablehandoff............................................................................................................................................557

enablehintsfordc........................................................................................................................................ 558

failuredetector............................................................................................................................................560

flush...........................................................................................................................................................561

garbagecollect........................................................................................................................................... 562

gcstats....................................................................................................................................................... 564

getbatchlogreplaythrottle........................................................................................................................... 566

getcachecapacity.......................................................................................................................................567

getcachekeystosave..................................................................................................................................568

getcompactionthreshold............................................................................................................................ 570

getcompactionthroughput..........................................................................................................................571
getconcurrentcompactors.......................................................................................................................... 572

getconcurrentviewbuilders.........................................................................................................................574

getendpoints..............................................................................................................575

gethintedhandoffthrottlekb.........................................................................................578

getinterdcstreamthroughput...................................................................................................................... 579

getlogginglevels.........................................................................................................................................580

getmaxhintwindow.....................................................................................................................................582

getseeds....................................................................................................................................................583

getsstables................................................................................................................................................ 585

getstreamthroughput................................................................................................................................. 587

gettimeout..................................................................................................................589

gettraceprobability..................................................................................................... 590

gossipinfo.................................................................................................................................................. 592

handoffwindow.......................................................................................................................................... 593

help............................................................................................................................................................595

info.............................................................................................................................................................599

inmemorystatus......................................................................................................................................... 600

invalidatecountercache..............................................................................................................................602

invalidatekeycache.................................................................................................................................... 603

invalidaterowcache....................................................................................................................................605

join.............................................................................................................................................................606

listendpointspendinghints.......................................................................................................................... 607

leaksdetection........................................................................................................................................... 609

listsnapshots..............................................................................................................................................611

mark_unrepaired....................................................................................................................................... 613

move..........................................................................................................................................................614

netstats......................................................................................................................................................616

nodesyncservice........................................................................................................................................618

pausehandoff.............................................................................................................................................629

proxyhistograms........................................................................................................................................ 631

rangekeysample........................................................................................................................................ 633

rebuild........................................................................................................................................................634

rebuild_index............................................................................................................................................. 637

rebuild_view.............................................................................................................................................. 638

refresh....................................................................................................................................................... 640
refreshsizeestimates................................................................................................................................. 641

reloadseeds...............................................................................................................................................643

reloadtriggers............................................................................................................ 644

relocatesstables........................................................................................................ 645

removenode.............................................................................................................................................. 647

repair......................................................................................................................................................... 649

replaybatchlog........................................................................................................................................... 652

resetlocalschema...................................................................................................................................... 654

resume...................................................................................................................................................... 655

resumehandoff.......................................................................................................................................... 656

ring............................................................................................................................ 657

scrub..........................................................................................................................659

sequence...................................................................................................................................................660

setbatchlogreplaythrottle........................................................................................................................... 663

setcachecapacity.......................................................................................................................................665

setcachekeystosave.................................................................................................................................. 666

setcompactionthreshold............................................................................................................................ 668

setcompactionthroughput.......................................................................................................................... 669

setconcurrentcompactors.......................................................................................................................... 670

setconcurrentviewbuilders......................................................................................................................... 671

sethintedhandoffthrottlekb......................................................................................................................... 672

setinterdcstreamthroughput.......................................................................................................................674

setlogginglevel...........................................................................................................................................675

setmaxhintwindow..................................................................................................................................... 677

setstreamthroughput................................................................................................................................. 679

settimeout..................................................................................................................................................680

settraceprobability..................................................................................................................................... 682

sjk.............................................................................................................................................................. 684

snapshot....................................................................................................................................................686

status.........................................................................................................................................................689

statusbackup............................................................................................................................................. 691

statusbinary............................................................................................................................................... 693

statusgossip.............................................................................................................................................. 694

statushandoff.............................................................................................................................................695

stop............................................................................................................................................................697
stopdaemon...............................................................................................................................................698

tablehistograms......................................................................................................................................... 700

tablestats................................................................................................................... 701

toppartitions...............................................................................................................706

tpstats........................................................................................................................................................709

truncatehints..............................................................................................................................................715

upgradesstables........................................................................................................................................ 716

verify..........................................................................................................................................................718

version.......................................................................................................................................................720

viewbuildstatus.......................................................................................................................................... 721

dse commands................................................................................................................722

About dse commands............................................................................................... 722

dse connection options............................................................................................................................. 723

add-node................................................................................................................................................... 724

advrep....................................................................................................................................................... 727

beeline.......................................................................................................................................................760

cassandra..................................................................................................................................................761

cassandra-stop..........................................................................................................................................763

exec...........................................................................................................................................................764

fs................................................................................................................................................................765

gremlin-console......................................................................................................................................... 766

hadoop fs.................................................................................................................................................. 767

list-nodes................................................................................................................................................... 767

pyspark......................................................................................................................................................768

remove-node............................................................................................................................................. 769

spark..........................................................................................................................................................771

spark-class................................................................................................................................................ 773

spark-jobserver..........................................................................................................................................774

spark-history-server...................................................................................................................................776

spark-sql....................................................................................................................................................777

spark-sql-thriftserver..................................................................................................................................778

spark-submit..............................................................................................................................................779

SparkR...................................................................................................................................................... 782

-v............................................................................................................................................................... 783

dse client-tool..................................................................................................................................................783
About dse client-tool................................................................................................................................. 783

client-tool connection options................................................................................................................... 784

cassandra..................................................................................................................786

configuration export.................................................................................................. 788

configuration byos-export..........................................................................................................................789

configuration import.................................................................................................................................. 791

spark..........................................................................................................................................................792

alwayson-sql..............................................................................................................................................794

nodesync......................................................................................................................................................... 796

disable....................................................................................................................................................... 798

enable........................................................................................................................801

help............................................................................................................................804

tracing........................................................................................................................................................807

validation................................................................................................................................................... 817

dsefs shell commands.................................................................................................................................... 819

append...................................................................................................................................................... 819

cat..............................................................................................................................................................820

cd...............................................................................................................................................................822

chgrp......................................................................................................................................................... 824

chmod........................................................................................................................................................825

chown........................................................................................................................................................ 827

cp...............................................................................................................................................................828

df............................................................................................................................................................... 830

du.............................................................................................................................................................. 831

echo...........................................................................................................................................................833

exit.............................................................................................................................................................834

fsck............................................................................................................................................................ 835

get............................................................................................................................................................. 836

ls................................................................................................................................................................837

mkdir..........................................................................................................................................................839

mv..............................................................................................................................................................841

put............................................................................................................................................................. 843

pwd............................................................................................................................................................845

realpath..................................................................................................................................................... 846

rename...................................................................................................................................................... 847
rm.............................................................................................................................................................. 848

rmdir.......................................................................................................................................................... 849

stat.............................................................................................................................851

truncate..................................................................................................................... 852

umount...................................................................................................................................................... 853

dsetool.............................................................................................................................................................854

About dsetool............................................................................................................................................ 854

Connection options................................................................................................................................... 855

core_indexing_status................................................................................................................................ 857

create_core............................................................................................................................................... 859

createsystemkey....................................................................................................... 862

encryptconfigvalue.................................................................................................... 864

get_core_config.........................................................................................................................................864

get_core_schema......................................................................................................................................865

help............................................................................................................................................................867

index_checks.............................................................................................................................................868

infer_solr_schema..................................................................................................................................... 870

inmemorystatus......................................................................................................................................... 871

insights_config...........................................................................................................................................872

insights_filters............................................................................................................................................875

list_index_files........................................................................................................................................... 877

list_core_properties................................................................................................................................... 879

list_subranges........................................................................................................................................... 880

listjt............................................................................................................................................................ 881

managekmip list........................................................................................................................................ 882

managekmip expirekey............................................................................................................................. 883

managekmip revoke..................................................................................................................................884

managekmip destroy.................................................................................................................................885

node_health...............................................................................................................................................886

partitioner.................................................................................................................................................. 887

perf............................................................................................................................................................ 888

read_resource........................................................................................................................................... 891

rebuild_indexes......................................................................................................................................... 892

reload_core............................................................................................................................................... 894

ring............................................................................................................................................................ 896
set_core_property..................................................................................................................................... 897

sparkmaster cleanup.................................................................................................................................899

sparkworker restart................................................................................................................................... 900


status.........................................................................................................................................................901

stop_core_reindex.....................................................................................................................................902

tieredtablestats.......................................................................................................................................... 903

tsreload......................................................................................................................................................905

unload_core...............................................................................................................................................906

upgrade_index_files.................................................................................................................................. 907

write_resource...........................................................................................................................................908

Stress tools..................................................................................................................................................... 909


cassandra-stress tool................................................................................................................................ 909

Interpreting the output of cassandra-stress..............................................................................................919

fs-stress tool..............................................................................................................................................920

SSTable utilities.............................................................................................................................................. 921

About SSTable tools................................................................................................................................. 921

sstabledowngrade..................................................................................................................................... 922

sstabledump.............................................................................................................................................. 924

sstableexpiredblockers.............................................................................................................................. 930

sstablelevelreset........................................................................................................................................931

sstableloader............................................................................................................................................. 933

sstablemetadata........................................................................................................................................ 935

sstableofflinerelevel...................................................................................................................................939

sstablepartitions........................................................................................................................................ 941

sstablerepairedset..................................................................................................................................... 944

sstablescrub.............................................................................................................................................. 946

sstablesplit.................................................................................................................................................948

sstableupgrade..........................................................................................................................................950

sstableutil.................................................................................................................................................. 951

sstableverify.............................................................................................................................................. 953

DataStax tools.................................................................................................................................................954

Preflight check tool......................................................................................................................................... 955

cluster_check and yaml_diff tools.................................................................................................................. 956

Chapter 9. Operations........................................................................................................................................ 957

Starting and stopping DSE............................................................................................................................. 957



Starting as a service.................................................................................................................................957

Starting as a stand-alone process............................................................................................................959

Stopping a node....................................................................................................................................... 961


Adding or removing nodes, datacenters, or clusters......................................................................................962

Adding nodes to vnode-enabled cluster................................................................................................... 962

Adding a datacenter to a cluster.............................................................................................................. 963

Adding a datacenter to a cluster using a designated datacenter as a data source..................................968

Replacing a dead node or dead seed node.............................................................................................972

Replacing a running node........................................................................................................................ 975

Moving a node from one rack to another.................................................................................................976

Decommissioning a datacenter................................................................................................................ 977


Removing a node..................................................................................................................................... 979

Changing the IP address of a node......................................................................................................... 980

Switching snitches.................................................................................................................................... 981

Changing keyspace replication strategy...................................................................................................982

Migrating or renaming a cluster................................................................................................................983

Adding single-token nodes to a cluster.................................................................................................... 984

Adding a datacenter to a single-token architecture cluster...................................................................... 985

Replacing a dead node in a single-token architecture cluster................................................................. 986

Backing up and restoring data....................................................................................................................... 989

About snapshots....................................................................................................................................... 989

Taking a snapshot.................................................................................................................................... 989

Deleting snapshot files..............................................................................................................................990

Enabling incremental backups..................................................................................................................991

Restoring from a snapshot....................................................................................................................... 991

Restoring a snapshot into a new cluster..................................................................................................992

Recovering from a single disk failure using JBOD...................................................................................993

Repairing nodes..............................................................................................................................................995

Manual repair: Anti-entropy repair............................................................................................................ 995

When to run anti-entropy repair............................................................................................................... 998

Changing repair strategies........................................................................................................................999

Monitoring a DSE cluster..............................................................................................................................1001

Tuning the database..................................................................................................................................... 1001

Tuning Java Virtual Machine.................................................................................................................. 1001

Tuning Bloom filters................................................................................................................................ 1006



Configuring memtable thresholds........................................................................................................... 1007

Data caching................................................................................................................................................. 1007

Configuring data caches......................................................................................................................... 1007


Monitoring and adjusting caching........................................................................................................... 1009

Compacting and compressing...................................................................................................................... 1009

Configuring compaction.......................................................................................................................... 1009

Compression........................................................................................................................................... 1010

Testing compaction and compression.................................................................................................... 1011

Migrating data to DSE.................................................................................................................................. 1012

Collecting node health and indexing scores.................................................................................................1012

Clearing data from DSE............................................................................................................................... 1014


Chapter 10. Planning........................................................................................................................................ 1015
Chapter 1. Getting started with DataStax Enterprise 6.0
Information about using DataStax Enterprise for Administrators.
This topic provides basic information and a roadmap to documentation for System Administrators new to DataStax
Enterprise.
Which product?
DataStax Offerings provides basic information to help you choose which product best fits your requirements.
Learn
Before diving into administration tasks, you can save a lot of time when setting up and operating DataStax
Enterprise (DSE) in a production environment by learning a few basics first:

• DataStax Enterprise-based applications and clusters are very different from relational databases; they use
a data model based on the types of queries, not on modeling entities and relationships. Architecture in brief
contains key concepts and terminology for understanding the database.

• You can use DSE OpsCenter and Lifecycle Manager for most administrative tasks.

• Save yourself some time and frustration by spending a few moments looking at DataStax Doc and Search tips.
These short topics talk about navigation and bookmarking aids that will make your journey through the docs
more efficient and productive.

The following are not administrator specific but are presented to give you a fuller picture of the database:

• Cassandra Query Language (CQL) is the query language for DataStax Enterprise.

• DataStax provides drivers in several programming languages for connecting client applications to the
database.

• APIs are available for OpsCenter, DseGraphFrame, DataStax Spark Cassandra Connector, and the drivers.

Plan
The Planning and testing guide contains guidelines for capacity planning and hardware selection in production
environments. Key topics include:

• Estimating disk capacity

• Estimating RAM

• CPU recommendations

Install
DataStax offers a variety of ways to set up a cluster:
Cloud

• Google Cloud Platform (GCP) Marketplace | Google Deployment Guide

• Microsoft Azure Marketplace | Azure Deployment Guide

• AWS Quick Start | Amazon Deployment Guide

On premises

• Installing and deploying DSE using Lifecycle Manager


• Packages for Yum- and Debian-based platforms

• Docker images

• Binary tarball

• Deployment per workload type

For help with choosing an install type, see Which install method should I use?
Secure
DSE Advanced Security provides fine-grained user and access controls to keep application data protected and
compliant with regulatory standards like PCI, SOX, HIPAA, and the European Union’s General Data Protection
Regulation (GDPR). Key topics include:

• Create database users and roles

• Set up and configure LDAP access

• Configure database permissions

The DSE database includes the default role cassandra with password cassandra. This superuser login has
full access to the database. DataStax recommends using the cassandra role only once, during initial Role
Based Access Control (RBAC) setup, to establish your own root account, and then disabling the cassandra
role. See Adding a superuser login.
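For example, a minimal CQL sketch of this recommendation; the role name and password here are placeholders, not values from this guide:

-- Log in once as the default superuser, then create your own root account:
CREATE ROLE dba_root WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'replace-with-a-strong-password';
-- After verifying that the new account can log in, disable the default role:
ALTER ROLE cassandra WITH SUPERUSER = false AND LOGIN = false;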

Tune
Important topics for optimizing the performance of the database include:

• Recommended production settings

• Tuning the Java Virtual Machine

• Enable the Nodesync service (continuous background repair)

• Load test your cluster before deployment

Operations
The most commonly used operations include:

• Starting and stopping DataStax Enterprise per workload type.

• Backup and recovery

• Adding or removing nodes, datacenters, or clusters

• Moving a node from one rack to another

• Tools

Load
The primary tools for getting data into and out of the database are:

• DataStax Bulk Loader

• DataStax Apache Kafka Connector

• DSE Graph Loader

For other methods, see Migrating data to DataStax Enterprise.


Monitor
DataStax provides the following tools to monitor clusters and view metrics:


• DSE OpsCenter

• DSE Metrics Collector

Troubleshooting

• Support Knowledge Base

• Troubleshooting guide

• Submit a support ticket (registered customers)

Upgrading
Key topics in the Upgrade Guide include:

• Upgrading from earlier DSE releases

• Patch release upgrades

• Upgrading from Apache Cassandra to DataStax Enterprise

Advanced Functionality
See Advanced functionality in DataStax Enterprise 6.0.

Advanced functionality in DataStax Enterprise 6.0


Brief descriptions of the advanced functionality in DataStax Enterprise 6.0.
DataStax Enterprise (DSE) version 6.0 is the industry's best distributed cloud database designed for hybrid cloud.
Easily deploy the only active-everywhere database platform that runs wherever needed: on-premises, across
regions or clouds. Benefit from all the capabilities of the best distribution of Apache Cassandra™ with enterprise
tooling and expert support required for production cloud applications.
DSE Analytics
Built on a production-certified version of Apache Spark™, with enhanced capabilities like AlwaysOn SQL
for processing streaming and historical data at cloud scale.
DSE Search
Provides powerful search and indexing capabilities, including support for full-text, relevancy, sub-string,
and fuzzy queries over large data sets, aggregation, and geospatial matchups.
DSE Graph
DSE Graph is optimized for storing billions of items and their relationships to enable you to identify and
analyze hidden relationships between connected data and build powerful modern applications for real-
time use cases: fraud detection, customer 360, social networks, IoT, and recommendation systems. The
DSE Graph Quick Start is a great place to get started.
DSE OpsCenter
Provides visual management and monitoring for DataStax Enterprise, including automatic backups,
reduced manual operations, automatic failover, patch release upgrades, and secure management of
DSE clusters on-premises, in the cloud, or in hybrid environments that span multiple data centers.
Lifecycle Manager
A visual provisioning and monitoring tool for DataStax Enterprise clusters. LCM allows you to define
the cluster configuration including datacenter, node topology, and security. LCM monitoring helps you
troubleshoot installation, configuration, and upgrade jobs.
DSE Advanced Security
Provides fine-grained user and access controls to keep application data protected and compliant
with regulatory standards like PCI, SOX, HIPAA, and the European Union’s General Data Protection
Regulation (GDPR).
DSE Metrics Collector
Aggregates DSE metrics and integrates with existing monitoring solutions to facilitate problem resolution
and remediation.
DSE Management Services


DSE Management Services automatically handle administration and maintenance tasks and assist with
overall database cluster management.
NodeSync service
Continuous background repair that virtually eliminates manual efforts to run repair operations in a
DataStax cluster.
Advanced Replication
Advanced Replication allows a single cluster to have a primary hub with multiple spokes, enabling
configurable, bi-directional distributed data replication between source and destination clusters.
DSE In-Memory
Store and access data exclusively from memory.
DSE Multi-Instance
Run multiple DataStax Enterprise nodes on a single host machine.
DSE Tiered Storage
Automate data movement across different types of storage media.

DataStax Enterprise 6.0 new features


DataStax Enterprise, built on Apache Cassandra™, powers the Right-Now Enterprise with an always-on,
distributed cloud database designed for hybrid cloud. DataStax Enterprise (DSE) 6.0 dramatically increases
performance and eases operational management with new features and enhancements.
Be sure to read the DataStax Enterprise 6.0 release notes.

Feature Description

NodeSync DSE NodeSync removes the need for manual repair operations in DSE's distribution of Cassandra and eliminates
cluster outages that are attributed to manual repair failures. This equates to operational cost savings, reduced support
cycles, and reduced application management pain. NodeSync also makes applications run more predictably, making
capacity planning easier. NodeSync’s advantages for operational simplicity extend across the whole data layer
including database, search, and analytics.
Be sure to read the DSE NodeSync: Operational Simplicity at its Best blog.

Advanced Performance DSE Advanced Performance delivers numerous performance advantages over open-source Apache Cassandra
including:

• Thread per core (TPC) and asynchronous architecture: with a coordination-free design, DSE’s thread-per-core
architecture provides up to 2x more throughput for read and write operations.

• Storage engine optimizations that provide up to half the latency of open source Cassandra and include optimized
compaction.

• DataStax Bulk Loader: up to 4x faster loads and unloads of data than current data loading utilities. Be sure to
read the Introducing DataStax Bulk Loader blog.

• Continuous paging improves DSE Analytics read performance by up to 3x over open source Apache Cassandra
and Apache Spark.

Be sure to read the DSE Advanced Performance blog.

DSE TrafficControl DSE TrafficControl provides a backpressure mechanism to avoid overloading DSE nodes with client or replica
requests that could make DSE nodes unresponsive or lead to long garbage collections and out of memory errors. DSE
TrafficControl is enabled by default and comes pre-tuned to accommodate very different workloads, from simple reads
and writes to the most extreme workloads. It requires no configuration.

Automated Upgrades for patch releases Part of OpsCenter Lifecycle Manager, the Upgrade Service handles patch upgrades of DSE clusters at
the datacenter, rack, or node level with up to 60% less manual involvement. The Upgrade Service allows you to easily clone your
existing configuration profile to ensure compatibility with DSE upgrades. Be sure to read the Taking the Pain Out of
Database Upgrades blog.



DSE Analytics New features in DSE Analytics include:

• AlwaysOn SQL, with advanced security, ensures around-the-clock uptime for analytics queries with the freshest,
secure insight. It is interoperable with existing business intelligence tools that utilize ODBC/JDBC and other Spark-
based tools. Be sure to read the Introducing AlwaysOn SQL for DSE Analytics blog.

• Structured Streaming: simple, efficient, and robust streaming of data from Apache Kafka, file systems, or other
sources.

• Enhanced Spark SQL support allows you to execute Spark queries using a variation of the SQL language. Spark
SQL includes APIs for returning Spark Datasets in Scala and Java, and interactively using an SQL shell or visually
through DataStax Studio notebooks.

Be sure to read the What’s New for DataStax Enterprise Analytics 6 blog.

DSE Graph New features in DSE Graph include:

• Better throughput for DSE Graph due to Advanced Performance improvements, resulting in DSE Graph handling
more requests per node.

• Smart Analytics Query Routing: the DSE Graph engine automatically routes a Gremlin OLAP traversal to the
correct implementation (DSE Graph Frames or Gremlin OLAP) for the fastest and best execution.

• Advanced Schema Management provides the ability to remove any graph schema element, not just vertex labels
or properties.

• The Batches in DSE Graph Fluent API adds the ability to execute DSE Graph statements in batches to speed up
writes to DSE Graph.

• TinkerPop 3.3.0: DataStax has added many enhancements to the Apache TinkerPop™ tool suite, providing
faster, more robust graph querying and a better developer experience.

Be sure to read the What’s New in DSE Graph 6 blog.

DSE Security New security features include:

• Private Schemas: Control who can see what parts of a table definition, critical for security compliance best
practices.

• Separation of Duties: Create administrator roles who can carry out everyday administrative tasks without having
unnecessary access to data.

• Auditing by Role: Focus your audits on the users you need to scrutinize. You can now elect to audit activity by user
type and increase the signal to noise ratio by removing application tier system accounts from the audit trail.

• Unified Authorization for DSE Analytics: Additional protection for data used for analytics operations.

Be sure to read the Safe data? Check. DataStax Enterprise Advanced Security blog.

DSE Search Built with a production-certified version of Apache Solr™ 6, DSE Search requires less configuration and provides
improved search data consistency and a more synchronous write path for indexing data, with fewer moving pieces to tune and monitor.
DSE 5.1 introduced index management CQL and cqlsh commands to streamline operations and development. DSE 6.0
adds a wider array of CQL query functionality and indexing support.
Be sure to read the What’s New for Search in DSE 6 blog.

Drivers DataStax drivers are updated for DSE 6.0, including:

• The Batches in DSE Graph Fluent API adds the ability to execute DSE Graph statements in batches to speed up
writes to DSE Graph.

• The C# and Node.js DataStax drivers, as well as the Java and Python drivers, include Batches in the DSE
Graph Fluent API.

Be sure to read the What’s New With Drivers for DSE 6 blog.



DataStax Studio Improvements to DataStax Studio that further ease DSE development include:

• Notebook Sharing: Easily collaborate with your colleagues to develop DSE applications using the new import and
export capabilities.

• Spark SQL support: Query and analyze data with Spark SQL using DataStax Studio's visual and intelligent
notebooks, which provide syntax highlighting, auto-code completion and correction, and more.

• Interactive Graphs: explore and configure DSE Graph schemas with a whiteboard-like view that allows you to drag
your vertices and edges.

• Notebook History: provides a historical dated record with descriptions and change events that makes it easy to
track and rollback changes.

Be sure to read the Announcing DataStax Studio 6 blog.

Chapter 2. DataStax Enterprise release notes
Release notes for DataStax Enterprise 6.0.

DataStax Enterprise 6.0 release notes


DataStax Enterprise release notes cover cluster requirements, upgrade guidance, components, security updates,
changes and enhancements, issues, and resolved issues for DataStax Enterprise (DSE) 6.0.x.
Each point release includes a highlights and executive summary section to provide guidance and add visibility
to important improvements.

Requirement for Uniform Licensing


All nodes in each DataStax licensed cluster must be uniformly licensed to use the same subscription. For
example, if a cluster contains five nodes, all five nodes within that cluster must be DSE. Mixing different
subscriptions within a cluster is not permitted. The DataStax Advanced Workloads Pack may be added to any
DSE cluster in an incremental fashion. For example, a 10-node DSE cluster may be extended to include three
nodes of the Advanced Workloads Pack. “Cluster” means a collection of nodes running the software which
communicate with one another using gossip. See Enterprise Terms.
Before you upgrade

Before you upgrade to a later major version, upgrade to the latest patch release (6.0.13) on your current
version. Be sure to read the relevant upgrade documentation. Upgrades to DSE 6.0 are supported from:

• DSE 5.1

• DSE 5.0

Check the compatibility page for your products. DSE 6.0 product compatibility:

• OpsCenter 6.5

• Studio 6.0

See Upgrading DataStax drivers. DataStax Drivers: you may need to recompile your client application code.

Use DataStax Bulk Loader for loading and unloading data. It loads data into DSE 5.0 or later and unloads
data from any Apache Cassandra™ 2.1 or later data source.

DSE 6.0.13 release notes


26 August 2020
In this section:

• 6.0.13 Components

• Cassandra enhancements for DSE 6.0.13

• General upgrade advice for DSE 6.0.13

• TinkerPop changes for DSE 6.0.13

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.


In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.13 Components

All components from DSE 6.0.13 are listed. Components that are updated for DSE 6.0.13 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2793

• Apache Spark™ 2.2.3.13

• Apache TinkerPop™ 3.3.7-20190521-f71ce0d7

• Apache Tomcat® 8.0.53

• DSE Java Driver 1.6.10

• Netty 4.1.25.7.dse

• Spark JobServer 0.8.0.45.2

• TinkerPop 3.3.7 with production-certified changes

DSE 6.0.13 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DataStax recommends upgrading all DSE Search nodes to DSE 6.0.13 or later.

6.0.13 DSE core

Changes and enhancements:

• Fixed a StackOverflowError thrown during read repairs (only large clusters or clusters with vnodes enabled
are affected). (DB-4350)

• Increased default direct_reads_size_in_mb value. Previously it was 2M per core + 2M shared. It is now
4M per core + 4M shared. (DB-4348)

• Fixed slow indexing at bootstrap time due to early TPC boundaries computation when a node is replaced by a
node with the same IP. (DB-4049)

• Fixed a problem with the treatment of zeroes in the decimal type that could cause assertion errors, failure to
find some rows whose key is 0 written using different precisions, or both. (DB-4472)

• CQLSH can be run with Python 2 or 3. (DB-4151)

• Fixed the NullPointerException issue described in CASSANDRA-14200: NPE when dumping an SSTable
with null value for timestamp column. (DB-4512)


• Fixed an issue that caused excessive contention during encryption/decryption operations, resulting in an
encryption/decryption performance improvement. (DB-4419)

• A new configuration option was added in cassandra.yaml: snapshot_before_dropping_column, which is
false by default. When enabled, every time a user drops one or more columns from a table, a snapshot is
created on each node in the cluster before the schema change is applied.

• Fixed an issue to prevent an unbounded number of flushing tasks for memtables that are almost empty.
(DB-4376)

• Global BloomFilterFalseRatio is now calculated in the same way as table BloomFilterFalseRatio. Both types
of metrics now include true negatives; the formula is ratio = falsePositiveCount / (truePositiveCount +
falsePositiveCount + trueNegativeCount). (DB-4439)

• Fixed a bug whereby, after a node replacement procedure, the bootstrap indexing in DSE Search was
happening on only one TPC core. (DB-4049)

• DNS Service Discovery is now a part of the DSE/LDAP integration. (DSP-11450)

• Systemd units are included for DSE packages for CentOS and compatible OSes. (DSP-7603)

• The server_host option in dse.yaml now handles multiple, comma-separated LDAP server addresses; see the
dse.yaml sketch at the end of this list. (DSP-20833)

• Cassandra tools now work on encrypted SSTables when security is configured. (DSP-20940)

• Workaround for LOGBACK-1194 - explicit scanPeriod added to logback.xml. (DSP-17911)

• Recording a slow CQL query to the log will no longer block the thread. (DSP-20894)

• Added entries to jvm.options to assist with capturing thread dumps. (DSP-20778)

• The frequency of range queries performed by the lease manager is now configurable via the
dse.lease.refresh.interval.seconds system property (in addition to JMX and the dsetool command).
(DSP-20696)

• Security updates:

# Fixed a CVE-2019-20444 issue in which HttpObjectDecoder.java in Netty, before 4.1.44, allowed an
HTTP header that lacked a colon. (DB-4068)

# The jackson-databind library has been upgraded to 2.9.10.4 to address a Jackson databind vulnerability
(CVE-2020-8840) (DSP-20981)

# DNS Service Discovery is now a part of the DSE/LDAP integration. (DSP-11450)

# Fixed some security vulnerabilities for the Solr HTTP REST API when authorization is enabled. Users
without appropriate permissions can no longer perform search operations, and resources can be deleted
when authorization is enabled, given the correct permissions. (DSP-20749)

# Fixed an issue where the audit logging did not capture search queries. (DSP-21058)

# There are two new LDAP options in dse.yaml, extra_user_search_bases and extra_group_search_bases,
where the user can define additional search bases for users and groups respectively. For users, if the
user is not found in one search base, all other bases are searched. For groups, groups found in all
defined search bases are merged. (DSP-12612)

# While there is no change in default behavior, there is a new render_cql_literals option in dse.yaml
under the audit logging section, which is false by default (also shown in the sketch at the end of this
list). When enabled, bound variables for logged statements are rendered as CQL literals, which means
there is additional quoting and escaping, and values of all complex types (collections, tuples, UDTs)
appear in human-readable format. (DSP-17032)

# Fixed LDAP settings to properly handle nested groups so that LDAP enumerates all ancestors of a
user's distinguishedName. Inherited groups are retrieved with the directory_search and members_search
types.

Fixed fetching parent groups of a role that's mapped to an LDAP group. See new dse.yaml options,
all_groups_xxx in ldap_options, to configure optimized retrieval of parent groups, including inherited
ones, in a single roundtrip. (DSP-20107)

# When DSE tries one authentication scheme and finds that the password is invalid, DSE now tries
another scheme, but only if the user has a scheme permission for that other scheme. (DSP-20903)

# Raised the upper bound limit on DSE LDAP caches. The upper limit for
ldap_options.credentials_validity_in_ms has been increased to 864,000,000 ms, which is
10 days. The upper limit for ldap_options.search_validity_in_seconds has been increased to
864,000 seconds, which is 10 days. (DSP-21072)

# Fixed an error condition when DSE failed to get the LDAP roles while refreshing a database schema.
(DSP-21075)
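
The dse.yaml fragment below sketches several of the new options described in this list. The host names and search bases are placeholders, and the exact nesting (for example, whether the extra search bases are YAML lists) is an assumption to verify against the dse.yaml shipped with this release:

ldap_options:
    # Multiple, comma-separated LDAP servers (DSP-20833)
    server_host: ldap1.example.com, ldap2.example.com
    # Additional search bases for users and groups (DSP-12612); list form assumed
    extra_user_search_bases:
        - ou=people,dc=example,dc=com
    extra_group_search_bases:
        - ou=groups,dc=example,dc=com
audit_logging_options:
    enabled: true
    # Render bound variables as CQL literals (DSP-17032); false by default
    render_cql_literals: true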

6.0.13 DSE Advanced Replication

Changes and enhancements:

• Fixed Advanced Replication OutOfMemoryErrors caused by Roaring bitmap deserialization. (DSP-15675)

6.0.13 DSEFS

Changes and enhancements:

• To minimize fsck impact on overloaded clusters, throttling is possible via the -p or --parallelism
arguments (see the sketch at the end of this list).

• Backported DSP-15762: optimized the remove-recursive implementation, lowering the tombstone impact on
Spark jobs. (DSP-20750)

• The byos-export command exports dsefs configuration for AbstractFileSystem. (DSP-20906)

• Fixed an issue where an excessive number of connections are created to port 5599 when using DSEFS.
(DSP-21021)

• Fixed excessive allocation when running fsck on DSEFS volumes. (DSP-21246)
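
As a sketch of the throttled fsck mentioned above; the invocation form and parallelism value are illustrative, so check the DSEFS shell help for the exact syntax in your version:

$ dse fs "fsck --parallelism 2"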

6.0.13 DSE Search

Changes and enhancements:

• Search-related latency metrics now decay over time like other metrics. Named queries (using the query.name
parameter) now have separate latency metrics. New MBean attributes are available for search
latency metrics: TotalLatency (us), Min, Max, Mean, StdDev, DurationUnit, MeanRate, OneMinuteRate,
FiveMinuteRate, FifteenMinuteRate, RateUnit, 98th, 999th. (DSP-19612)

• Significantly reduced the time to (re)load encrypted search cores. (DSP-20692)

• Fixed some security vulnerabilities for the Solr HTTP REST API when authorization is enabled. Users without
appropriate permissions can no longer perform search operations, and resources can be deleted when
authorization is enabled, given the correct permissions. (DSP-20749)

• Fixed a bug where a decryption block cache occasionally was not operational (SOLR-14498). (DSP-20987)

• Fixed an issue where the audit logging did not capture search queries. (DSP-21058)

• Fixed a bug where, after several months of uptime, an encrypted index wouldn't accept more writes unless
the core was reloaded. (DSP-21234)


Cassandra enhancements for DSE 6.0.13


DataStax Enterprise 6.0.13 is compatible with Apache Cassandra™ 3.11 and includes all DataStax enhancements
from earlier releases.
General upgrade advice for DSE 6.0.13
DataStax Enterprise 6.0.13 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.13
DataStax Enterprise (DSE) 6.0.13 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions. See the TinkerPop upgrade documentation.
DSE 6.0.12 release notes
4 May 2020
In this section:

• 6.0.12 Components

• Cassandra enhancements for DSE 6.0.12

• General upgrade advice for DSE 6.0.12

• TinkerPop changes for DSE 6.0.12

Table 1: DSE functionality


6.0.12 DSE core

6.0.12 DSE Advanced Replication

6.0.12 DSE Analytics

6.0.12 DSEFS

6.0.12 DSE Graph

6.0.12 DSE Search

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.12 Components

All components from DSE 6.0.12 are listed. Components that are updated for DSE 6.0.12 are indicated with an
asterisk (*).


• Apache Solr™ 6.0.1.1.2716

• Apache Spark™ 2.2.3.13

• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.1

• DSE Java Driver 1.6.10

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.25.6.dse

• Spark Jobserver 0.8.0.45.2 DSE custom version

• TinkerPop 3.3.7 with production-certified changes

For a full list, see DataStax Enterprise 6.0.12 third-party software.


DSE 6.0.12 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DataStax recommends upgrading all DSE Search nodes to DSE 6.0.12 or later.

6.0.12 DSE core

Changes and enhancements:

• Added hostname_verification to ldap_options in dse.yaml. (DSP-20302)

• The frequency of range queries performed by lease manager is now configurable via JMX and dsetool
command. (DSP-20696)

• Added dse.ldap.retry_interval.ms system property, which sets the time between subsequent retries
when trying authentication using LDAP server. (DSP-20298)

• Removed Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)

• Added a new JMX attribute, ConnectionSearchPassword, for the LdapAuthenticator bean, which updates the
LDAP search password without the need to restart DSE. (DSP-18928)

• dsetool ring shows in-progress search index building during bootstrap. (DSP-15281)

• Made the search reference visible in the error message for LDAP connections. (DSP-20578)

• DecayingEstimatedHistogram now decays even when there are no updates so invalid metric values do not
linger. (DSP-20674)

• Added functionality to query role_stats when stats is enabled under role_management_options in
dse.yaml. (DB-4283)

• The replica-side filtering dtests test_update_on_wide_table and
test_complementary_update_with_limit_on_static_column_with_not_empty_partitions are more
reliable. (DB-4043)

• Nodesync can now be enabled on all system distributed and protected tables. (DB-3241)

• Improved the estimated values of histogram percentiles reported via JMX. In some cases, the percentiles
may go slightly up. (DB-4275)

• Added anticompaction to nodetool stop command help menu. (DB-3821)

• Added a --disable-history option to cqlsh that disables saving history to disk for the current execution,
and a history section in cqlshrc with a boolean parameter disabled, set to False by default (see the sketch
at the end of this list). (DB-3843)


• Improved error messaging for enabled internode SSL encryption in Cassandra Tools test suite. (DB-3957)

• Removed the serialization header partition/clustering key validation. (DB-4111)

• Security updates:

# Upgraded Jackson Core and Jackson Mapper to address CVE-2019-10172. (DSP-20073)
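
A sketch of both forms of the cqlsh history control described above; the cqlshrc section and parameter names are taken from that description, and the exact value format is an assumption:

$ cqlsh --disable-history

Or persistently, in the cqlshrc file:

[history]
disabled = true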

Resolved issues:

• LDAP cursors leak. (DSP-20623)

• Bug that prevented LIST ROLES and LIST USERS to work with system-keyspace-filtering enabled.
(DB-4221)

• Continuous paging sessions could leak if the continuous result sets on the driver side were not exhausted or
cancelled. (DB-4313)

• Potentially incorrect dropped messages in case of time drifts on a machine. (DB-3891)

• Read inconsistencies. (CASSANDRA-12126) (DB-3873)

• Error that caused nodetool viewbuildstatus to return an incorrect error message. (DB-2397)

6.0.12 DSE Advanced Replication

Resolved issues:

• Advanced Replication's OutOfMemoryErrors caused by Roaring bitmap deserialization. (DSP-15675)

6.0.12 DSE Analytics

Changes and enhancements:

• Internal continuous paging sessions were not closed when LIMIT clause was added in SQL query, which
caused sessions leak and inability to close the Spark application gracefully because the Java driver waited
indefinitely for orphaned sessions to finish. (DSP-19804)

• Removed Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)

• Added the spark.cassandra.session.consistency.level parameter to the Spark Connector. Set the
HiveMetaStore default consistency level to LOCAL_QUORUM instead of ONE (see the sketch at the end of this
list). (DSP-19982)

• During Spark application startup, Exception: java.lang.ExceptionInInitializerError thrown from
the UncaughtExceptionHandler in thread "main" was sometimes logged instead of a meaningful
error. (DSP-20474)

• Security updates:

# Patched hive with HIVE-13390 to fix CVE-2016-3083. (DSP-20612)
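
As an illustration of the new Spark Connector parameter above, passing it on the command line is one option; the value shown matches the new HiveMetaStore default, and where you set it (command line, spark-defaults.conf, or application code) depends on your deployment:

$ dse spark --conf spark.cassandra.session.consistency.level=LOCAL_QUORUM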

6.0.12 DSEFS

Changes and enhancements:

• DSEFS local file system implementation returns alphabetically sorted directories and files when using
wildcards and listing command. (DSP-20057)

• When creating a file through WebHDFS API, DSEFS does not verify WX permissions of parent's parent
when the parent exists. (DSP-20355)


• DSEFS internode encrypted communication doesn't fail when
server_encryption_options.require_endpoint_verification is enabled. (DSP-20689)

Resolved issues:

• DSEFS could not use mixed-case keyspaces (broken by DSP-16825). (DSP-20354)

6.0.12 DSE Graph

Changes and enhancements:

• Exposed configuration and metrics for Gremlin query cache. (DSP-20240)

• Changed classic Graph query so vertices are read from _p tables in Cassandra using SELECT ... WHERE
<vertex primary key columns> statement. The search predicate is applied in memory. (DSP-20230)

6.0.12 DSE Search

Changes and enhancements:

• Error messages related to Solr errors contain better description of the root cause. (DSP-13792)

• The dsetool stop_core_reindex command now mentions the node in the output message. (DSP-17090)

• Added indexing reason to output of dsetool core_indexing_status command. (DSP-17672)

• Improved warnings for search index creation via dsetool or CQL. (DSP-17994)

• Improved guidance with warnings when index rebuild is required for ALTER SEARCH INDEX, RELOAD SEARCH
INDEX, and dsetool reload_core commands. (DSP-19347)

• Improved real-time search to fix a docValues bug. (DSP-20300)

• The suggest request handler now requires SELECT permission. Previously, the suggest request handler
returned a forbidden response when authorization was on, regardless of user permissions. (DSP-20697)

• Security update:

# Upgraded Apache Solr to address CVE-2018-8026. (DSP-16653)

Cassandra enhancements for DSE 6.0.12


DataStax Enterprise 6.0.12 is compatible with Apache Cassandra™ 3.11 and includes all DataStax enhancements
from earlier releases.
General upgrade advice for DSE 6.0.12
DataStax Enterprise 6.0.12 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.12
DataStax Enterprise (DSE) 6.0.12 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions. See the TinkerPop upgrade documentation.
DSE 6.0.11 release notes
10 December 2019
In this section:

• 6.0.11 Components

• DSE 6.0.11 Highlights


• Cassandra enhancements for DSE 6.0.11

• General upgrade advice for DSE 6.0.11

• TinkerPop changes for DSE 6.0.11

Table 2: DSE functionality


6.0.11 DSE core

6.0.11 DSE Analytics

6.0.11 DSE Search

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.11 Components

All components from DSE 6.0.11 are listed. Components that are updated for DSE 6.0.11 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2642 *

• Apache Spark™ 2.2.3.9 *

• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.1

• DSE Java Driver 1.6.10

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.25.6.dse

• Spark Jobserver 0.8.0.45.2 * DSE custom version

• TinkerPop 3.3.7 with production-certified changes

For a full list, see DataStax Enterprise 6.0.11 third-party software.


DSE 6.0.11 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DSE 6.0.11 Highlights


High-value benefits of upgrading to DSE 6.0.11:


DSE Search highlights

DataStax recommends upgrading all DSE Search nodes to DSE 6.0.11 or later.

• Fixed a bug to avoid multiple disposals of Solr filter cache DocSet objects. (DSP-15765)

• Improved performance and logging, and added options for using the Solr timeAllowed parameter in all queries.
The Solr timeAllowed option in queries is now enforced by default to prevent long-running shard queries.
(DSP-19781, DSP-19790)

6.0.11 DSE core

Changes and enhancements:

• Add support for nodesync command to specify different IP addresses for JMX and CQL. (DB-2969)

• Enhancements to the offline sstablescrub utility. (DB-3510, DB-3511)

# Specify which SSTables to scrub.

# Scrub multiple tables of the same keyspace.

# Specify the number of threads to simultaneously scrub SSTables within a table.

• Prevent accepting streamed SSTables or loading SSTables when the clustering order does not match.
(DB-3530)

• Dropping and re-adding the same column with incompatible types is not supported. This change prevents
unreadable SSTables. (DB-3586)

Resolved issues:

• Background compactions block SSTable operations too long. (DB-3682)

• Post-bootstrap indexing is executed by only a single CPU core. (DB-3692)

• Reads against ma and mc SSTables hit more SSTables than necessary due to the bug fixed by
CASSANDRA-14861. (DB-3691)

• Error retrieving expired columns with secondary index on key components. (DB-3764)

• The diff logic used by the secondary index does not always pick the latest schema and results in ERROR
[CoreThread-8] errors on batch writes. (DB-3838)

• Unexpected CoreThread error thrown by LWT.PROPOSE. (DB-3858)

• Fixed concurrency factor calculation for distributed range reads, with a maximum of 10 times the number of
cores. The maximum concurrency factor is configurable with the new JVM argument
-Ddse.max_concurrent_range_requests (see the sketch at the end of this list). (DB-3859)

• Prevent continuous triggers with read defragmentation. (DB-3866)

• Cached serialized mutations can cause G1 GC humongous objects. (DB-3867)

• AIO and DSE Metrics Collector are not available on RHEL/CentOS 6.x because GLIBC_2.14 is not present.
(DSP-18603)

• Upgrade Jackson Databind to address CVE-2019-14540 and CVE-2019-16942. (DSP-19764, DSP-19896)

• Using SELECT JSON for empty BLOB values incorrectly returns an empty string instead of the expected 0x.
(DSP-20022)

• RoleManager cache keeps invalid values if the LDAP connectivity is down. (DSP-20098)


• LDAP user login fails due to parsing failure on user DN with parentheses. (DSP-20106)
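
For example, a sketch of capping the concurrency factor with the new JVM argument mentioned above; the value is illustrative, and the argument would go in jvm.options or on the command line:

-Ddse.max_concurrent_range_requests=32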

6.0.11 DSE Analytics

Changes and enhancements:

• Add new DSE class com/datastax/bdp/spark/* for dse-spark-dependencies. (DSP-16070)

• New du dsefs shell command lists sizes of the files and directories in a specific directory. (DSP-19572)

• Improved configuration of available system resources for Spark Workers. You can now set the total memory
and total cores with new environment variables that take precedence over the resource_manager_options
defined in dse.yaml (see the sketch at the end of this list). (DSP-19673)
The mapping from dse.yaml resource_manager_options settings to environment variables is:

memory_total: SPARK_WORKER_TOTAL_MEMORY

cores_total: SPARK_WORKER_TOTAL_CORES

• Support for multiple contact points is added for the DSEFS implementation of the Hadoop FileSystem.
(DSP-19704)
Provide the FileSystem URI as:

dsefs://host0[:port][,host1[:port]]/
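
For example, a sketch of the new Spark Worker environment variables in spark-env.sh; the file location and values are illustrative:

export SPARK_WORKER_TOTAL_MEMORY=64g
export SPARK_WORKER_TOTAL_CORES=16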

6.0.11 DSE Search

Enhancements:

• The Solr timeAllowed option in queries is now enforced by default to prevent long-running shard queries.
This change prevents complex facets and boolean queries from using system resources after the DSE
Search coordinator considers the queries to have timed out. For all queries, the default timeAllowed
value uses the value of the client_request_timeout_seconds setting in dse.yaml. (DSP-19781, DSP-19790)
While using Solr timeAllowed in queries improves performance for long zombie queries, it can cause
increased per-request latency cost in mixed workloads. If the per-request latency cost is too high, use the
-Ddse.timeAllowed.enabled.default search system property to disable timeAllowed in your queries (see
the sketch at the end of this list).

• Upgraded spray-json to prevent Denial of Service (DoS) vulnerabilities CVE-2018-18854 and
CVE-2018-18853. (DSP-19208)
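
A sketch of disabling the default timeAllowed enforcement via the system property named above; the boolean value format is an assumption, and the property would go in jvm.options or on the command line:

-Ddse.timeAllowed.enabled.default=false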

Resolved issues:

• Error on disposals of Solr filter cache DocSet objects. (DSP-15765)

• Apply filter cache optimization to remote shard requests when RF=N. (DSP-19800)

• Filter cache warming doesn't warm parent-only filter correctly when RF=N. (DSP-19802)

• Memory allocation issue causes performance degradation at query time. (DSP-19805)

Cassandra enhancements for DSE 6.0.11


DataStax Enterprise 6.0.11 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements
from earlier releases, and adds these production-certified changes:

• Handle paging states serialized with a different version than the session version (CASSANDRA-15176)

• Toughen up column drop/recreate type validations (CASSANDRA-15204)

• SSTable min/max metadata can cause data loss (CASSANDRA-14861)


• Use Bounds instead of Range for sstables in anticompaction (CASSANDRA-14411)

General upgrade advice for DSE 6.0.11


DataStax Enterprise 6.0.11 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.11
DataStax Enterprise (DSE) 6.0.11 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions. See the TinkerPop upgrade documentation.
DSE 6.0.10 release notes
19 September 2019
In this section:

• 6.0.10 Components

• DSE 6.0.10 Highlights

• Cassandra enhancements for DSE 6.0.10

• General upgrade advice for DSE 6.0.10

• TinkerPop changes for DSE 6.0.10

Table 3: DSE functionality


6.0.10 DSE core

6.0.10 DSE Analytics

6.0.10 DSE Graph

6.0.10 DSE Search

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.10 Components

All components from DSE 6.0.10 are listed. Components that are updated for DSE 6.0.10 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2507 *

• Apache Spark™ 2.2.3.5 *


• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.1

• DSE Java Driver 1.6.10 *

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.13.dse

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.7 with additional production-certified changes

DSE 6.0.10 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DSE 6.0.10 Highlights

High-value benefits of upgrading to DSE 6.0.10 include these highlights:


DSE Database (DSE core) highlights

• Fixed incorrect handling of frozen type issues to accept all valid CQL statements and reject all invalid CQL
statements. (DB-3084)

• Standalone cqlsh client tool provides an interface for developers to interact with the database and issue
CQL commands without having to install the database software. From DataStax Labs, download the version
of CQLSH that corresponds to your DataStax database version. (DSP-18694)

• New options to select cipher suite and protocol to configure KMIP encryption when connecting to a KMIP
server. (DSP-17294)

DSE Analytics highlights

• Storing and revoking permissions for the application owner is removed. The application owner is explicitly
assumed to have these permissions. (DSP-19393)

DSE Graph highlights

• Fixed an issue where T values are hidden by property keys of the same name in valueMap(). (DSP-19261)

DSE Search highlights

• Improved search query latency. (DSP-18677)

• Unbounded facet searches are no longer allowed. (DSP-18693)

# facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the -Dsolr.max.facet.limit.size system property.

# This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set an explicit facet.limit.

6.0.10 DSE core

Changes and enhancements:

• DSE version now appears as a comment in all configuration files. (DB-1022)

• Improved troubleshooting. A log entry is now created when autocompaction is disabled or enabled for a
table. (DB-1635)


• Enhanced DroppedMessages logging output adds the size percentiles of the dropped messages, their most
common destinations, and the most common tables targeted for read requests or mutations. (DB-1250)

• Reformatted StatusLogger output to reduce details in the INFO level system.log. The detailed output is still
present in the debug.log. (DB-2552)

• For nodetool tpstats -F json and nodetool tpstats -F yaml, wait latencies (in ms) appear in the output. Although not labeled, the wait latencies are included in the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max. See the example after this list. (DB-3401)

• New resources improve debugging leaked chunks before the cache evicts them and provide more
meaningful call stack and stack trace. (DB-3504)

# RandomAccessReader/RandomAccessReader

# AsyncPartitionReader/FlowSource

# AsyncSSTableScanner/FlowSource

• Allocate large buffers directly in the chunk cache. (DB-3506)

• Buffers should return to the pool if a chunk is leaked. (DB-3512)

• New nodetool commands to get current values: getcachecapacity, getcachekeystosave, and gethintedhandoffthrottlekb. See the example after this list. (DB-3618)

• New options to select cipher suite and protocol to configure KMIP encryption when connecting to a KMIP
server. (DSP-17294)

• Standalone cqlsh client tool provides an interface for developers to interact with the database and issue
CQL commands without having to install the database software. From DataStax Labs, download the version
of CQLSH that corresponds to your DataStax database version. (DSP-18694)

• Upgraded Apache MINA Core library to 2.0.21 to prevent a security issue where Apache MINA Core was
vulnerable to information disclosure. (DSP-19213)

• Update Jackson Databind to 2.9.9.1 for all components except DataStax Bulk Loader. (DSP-19441)
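The wait-latency output described above can be inspected by requesting a structured format; this invocation is illustrative only:

$ nodetool tpstats -F yaml
# Wait latencies appear per thread pool in the fixed order noted above:
# 50%, 75%, 95%, 98%, 99%, Min, Max.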
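A sketch of the new get commands; argument-free invocation is assumed here, so check nodetool help for options on your version:

$ nodetool getcachecapacity
$ nodetool getcachekeystosave
$ nodetool gethintedhandoffthrottlekb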

Resolved issues:

• Fix to prevent NPE during repair in mixed-version clusters. (DB-1985)

• On tarball installs that create two instances on the same physical server with remote JMX access, binding separate IPs to port 7199 causes the JMX error Address already in use (Bind failed) because com.sun.management.jmxremote.host is ignored. (DB-2483)

• Prevent changing the replication strategy of system keyspaces. (DB-2960)

• Upgrade Jackson Databind to address CVE-2018-11307 and CVE-2018-19361. (DB-2911, DSP-18099, DSP-19319)

• Slow startup or node hangs when encryption is used. (DB-3050)

• Incorrect handling of frozen type issues: valid CQL statements are not accepted and invalid CQL statements are not properly rejected. (DB-3084)

• DSE fails to start with ERROR Attempted serializing to buffer exceeded maximum of 65535 bytes. Improved
error to identify a workaround for commitlog corruption. (DB-3162)

• sstabledowngrade needs write access to the snapshot folder for a different output location. (DB-3231)

• The number of pending compactions reported by nodetool compactionstats was incorrect (off by one) for
Time Window Compaction Strategy (TWCS). (DB-3284)

• Invalid JSON output for nodetool tpstats -F json. (DB-3401)


• When unable to send mutations to replicas due to overloading, hints are mistakenly created against the local
node. (DB-3421)

• When a non-frozen UDT column is dropped and the table is later re-created from the schema that was
created as part of a snapshot, the dropped column record is invalid and may lead to failure loading some
SSTables. (DB-3434)

• sstablepartitions incorrectly handles -k and -x options. (DB-3442)
Workaround: To specify multiple keys, repeat the -k or -x option for each key, as shown after this list.

• Memory leaks when updating tables with materialized views. (DB-3504)

• Error in custom provider prevents DSE node startup. With this fix, the node will start up but insights
is not active. See the DataStax Support Knowledge Base for steps to resolve existing missing or incorrect
keyspace replication problems. (DSP-19521)
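A sketch of the sstablepartitions workaround above; the key values and the SSTable path are placeholders:

$ sstablepartitions -k key1 -k key2 /var/lib/cassandra/data/ks/tbl-*/aa-1-bti-Data.db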

Known issues:

• On Oracle Linux 7.x, StorageService.java:4970 exception occurs with DSE package installation.
(DSP-19625)
Workaround: On Oracle Linux 7.x operating systems, install DSE using the binary tarball.

6.0.10 DSE Analytics

Changes and enhancements:

• Storing and revoking permissions for the application owner is removed. Instead of explicitly storing
permission of the application owner to manage and view Spark applications, the application owner is
explicitly assumed to have these permissions. (DSP-19393)

Resolved issues:

• Spark applications incorrectly reported that joins were broken; the DirectJoin output check was too strict. (DSP-19063)

• Submitting many Spark apps will reach the default tombstone_failure_threshold before the default 90-day gc_grace_seconds defined for the system_auth.role_permissions table. (DSP-19098)
Workaround provided with this fix:

1. Manually grant permissions to the user before the user starts Spark jobs:

GRANT AUTHORIZE, DESCRIBE, MODIFY ON ANY SUBMISSION IN WORKPOOL 'datacenter_name.workpool' TO role_name;

2. Start Spark jobs for this user.


3. After all Spark jobs are complete for this user, revoke the permissions for this user.

REVOKE AUTHORIZE, DESCRIBE, MODIFY ON ANY SUBMISSION IN WORKPOOL 'datacenter_name.workpool' FROM role_name;

• Credentials are not masked in the debug level logs for Spark Jobserver and Spark submitted jobs.
(DSP-19490)

6.0.10 DSE Graph

Changes and enhancements:

• New graph truncate command to remove all data from graph. (DSP-17609)


• Support for ifExists() before truncate(), like system.graph("foo").ifExists().truncate(), in the DSE Graph (classic graph) API. (DSP-19357)

Resolved issues:

• gremlin-console startup time is improved. (DSP-11550)

• T values get hidden by property keys of the same name in valueMap(). (DSP-19261)

6.0.10 DSE Search

Enhancements:

• DSE 6.0 search query latency is on parity with DSE 5.1. (DSP-18677)

• For token ranges dictated by distribution, filter cache warming occurs when a node is restarted, a search
index is rebuilt, or when node health score is up to 0.9. New per-core metrics for metric type WarmupMetrics
and other improvements. (DSP-8621)

• Unbounded facet searches are no longer allowed. (DSP-18693)

# facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the -Dsolr.max.facet.limit.size system property.

# This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set an explicit facet.limit. See the sketch after this list.
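A hedged sketch of both knobs; the search core name ks.tbl, the field category, and the values shown are placeholders:

$ curl "http://localhost:8983/solr/ks.tbl/select?q=*:*&facet=true&facet.field=category&facet.limit=1000&wt=json"
# An explicit facet.limit in the request keeps the query within the new guardrail.
$ echo "-Dsolr.max.facet.limit.size=50000" >> conf/jvm.options
# Raises the server-side cap; tarball jvm.options path assumed.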

Resolved issues:

• Solr CQL count query incorrectly returns the total data count; it should return the total count minus the start offset. (DSP-16153)

• Validation error does not get returned when docValues are applied when types do not allow docValues.
(DSP-16884)
With this fix, the following exception behavior is applied:

# Throw exception when docValues:true is specified for a column and column type does not support
docValues.

# Do not throw exception and ignore docValues:true for columns with types that do not support docValues
if docValues:true is set for *.

• When using live indexing, also known as Real Time (RT) indexing, stale Solr documents contain data that is
updated in the database. This issue happens when a facet query is run against a search index (core) while
inserting or loading data, and the search core is shut down. (DSP-18786)

• When driver uses paging, CQL query fails when using a Solr index to query with a sort on a field that
contains the primary key name in the field: InvalidRequest: Error from server: code=2200 [Invalid
query] message="Cursor functionality requires a sort containing a uniqueKey field tie
breaker". (DSP-19210)

Known issues:

• The count() query with Solr enabled can be inaccurate or inconsistent. (DSP-19401)

Cassandra enhancements for DSE 6.0.10


DataStax Enterprise 6.0.10 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements from earlier releases, and adds production-certified changes.
General upgrade advice for DSE 6.0.10
DataStax Enterprise 6.0.10 is compatible with Apache Cassandra™ 3.11.


All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.10
DataStax Enterprise (DSE) 6.0.10 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions.
DSE 6.0.9 release notes
9 July 2019
In this section:

• 6.0.9 Components

• 6.0.9 Important bug fix

• Cassandra enhancements for DSE 6.0.9

• General upgrade advice for DSE 6.0.9

• TinkerPop changes for DSE 6.0.9

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.9 Components

All components from DSE 6.0.9 are listed.

• Apache Solr™ 6.0.1.1.2460

• Apache Spark™ 2.2.3.4

• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.13.dse

• Spark Jobserver 0.8.0.45 DSE custom version


• TinkerPop 3.3.7 with additional production-certified changes

DSE 6.0.9 is compatible with Apache Cassandra™ 3.11 and includes all DataStax enhancements from earlier
versions.

DSE 6.0.9 Important bug fix

• Fixed possible data loss when using DSE Tiered Storage. (DB-3404)
If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

Cassandra enhancements for DSE 6.0.9


DataStax Enterprise 6.0.9 is compatible with Apache Cassandra™ 3.11 and includes all DataStax
enhancements from earlier releases.
General upgrade advice for DSE 6.0.9
DataStax Enterprise 6.0.9 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.9
DataStax Enterprise (DSE) 6.0.9 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions.
DSE 6.0.8 release notes
11 June 2019
In this section:

• 6.0.8 Components

• DSE 6.0.8 Highlights

• Cassandra enhancements for DSE 6.0.8

• General upgrade advice for DSE 6.0.8

• TinkerPop changes for DSE 6.0.8

Table 4: DSE functionality


• 6.0.8 DSE core • 6.0.8 DSE Graph

• 6.0.8 DSE Analytics • 6.0.8 DSE Search

• 6.0.8 DSEFS

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.


• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.8 Components

All components from DSE 6.0.8 are listed. Components that are updated for DSE 6.0.8 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2460 *

• Apache Spark™ 2.2.3.4 *

• Apache Tomcat® 8.0.53 *

• DataStax Bulk Loader 1.2.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.13.dse *

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.7 with additional production-certified changes

DSE 6.0.8 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DSE 6.0.8 Highlights

High-value benefits of upgrading to DSE 6.0.8 include these highlights:


DSE Database (DSE core) highlights

• Significant fixes and improvements for native memory, the chunk cache, and async read timeouts.

• New configurable memory leak tracking. (DB-3123)

• Improved lightweight transactions (LWT) handling. (DB-3018, DB-3124)

DSE Analytics highlights

• When DSE authentication is enabled, Spark security is forced to be enabled. (DSP-17274)

• Spark security is turned on in dse.yaml configuration file. (DSP-17271)

DSEFS highlights

• Fix handling of path alternatives in DSEFS shell to provide wildcard support for mkdir and ls commands.
(DSP-17768)

DSE Graph highlights

• Operations through gremlin-console run with anonymous permissions. (DSP-18471)

• You can now dynamically pass cluster and connection configuration for different graph objects. Fixes the
issue where DseGraphFrame cannot directly copy graph from one cluster to another. (DSP-18605)


DSE Search highlights


Changes and improvements:

• Performance improvements and overload protection for search queries. (DSP-15875)

• New configurable memory leak tracking: new nodetool leaksdetection command and memory leak detection settings in cassandra.yaml. (DB-3123)

• Performance improvements to Solr deletes that correspond to Cassandra rows. (DSP-17419)

• Changes to correct uneven distribution of shard requests with the STATIC set cover finder. (DSP-18197)

• New recommended method for case-insensitive text search, faceting, grouping, and sorting with new
LowerCaseStrField Solr field type. This type sets field values as lowercase and stores them as lowercase in
docValues. (DSP-18763)

Important bug fixes:

• The queryExecutorThreads and timeAllowed Solr parameters can be used together. (DSP-18717)

• Avoid interrupting request threads when an internode handshake fails so that the Lucene file channel lock
cannot be interrupted. Fixes LUCENE-8262. (DSP-18211)

6.0.8 DSE core

Changes and enhancements:

• Improved lightweight transactions (LWT) handling:

# Improved lightweight transactions (LWT) performance. New cassandra.yaml LWT configuration options.
(DB-3018)

# Optimized memory usage for direct reads pool when using a high number of LWTs. (DB-3124)
When not set in cassandra.yaml, the default calculated size of direct_reads_size_in_mb changed from
128 MB to 2 MB per TPC core thread, plus 2 MB shared by non-TPC threads, with a maximum value of
128 MB.

• Improved logging identifies which client, keyspace, table, and partition key is rejected when a mutation exceeds the size threshold. (DB-1051)

• Improve status reporting for nodesync validation list. (DB-2707)

• Enable upgrading and downgrading SSTables using a CQL file that contains DDL statements to recreate the
schema. (DB-2951)

• Configurable memory leak tracking. (DB-3123)

# New nodetool leaksdetection command; see the sketch after this list.
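An illustrative invocation of the new command; its options are not documented here, so consult nodetool help leaksdetection on your version:

$ nodetool leaksdetection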

Resolved issues:

• Nodes in a cluster continue trying to connect to a decommissioned node. (DB-2886)

• 32-bit integer overflow in StreamingTombstoneHistogramBuilder during compaction. (DB-3108)

• Possible direct memory leak when part of bulk allocation fails. (DB-3125)

• Counters in memtable allocators and buffer pool metrics can be incorrect when out of memory (OOM)
failures occur. (DB-3126)

• Memory leak occurs when a read from disk times out. (DB-3127)

• AssertionError in temporary buffer pool causes CorruptSSTableException. (DB-3172)


• Memory leak on errors when reading. (DB-3175)

• Bootstrap should fail when the node can't fetch the schema from other nodes in the cluster. (DB-3186)

• Increment pending echos when sending gossip echo requests. (DB-3187)

• Deadlock when replaying schema mutations from commit log during DSE startup. (DB-3190)

• Make the remote host visible in the error message for failed magic number verification. (DSP-18645)

Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

6.0.8 DSE Analytics

Changes and enhancements:

• A warning message is displayed when DSE authentication is enabled, but Spark security is not enabled.
(DSP-17273)

• When DSE authentication is enabled, Spark security is forced to be enabled. (DSP-17274)


When authentication_options is enabled: true in dse.yaml, Spark security is enforced:

# spark_security_enabled: this setting is ignored.

# spark_security_encryption_enabled: this setting is ignored.

• Spark Cassandra Connector: To improve connection for streaming applications with shorter batch times, the
default value for Keep Alive is increased to 1 hour. (DSP-17393)

Resolved issues:

• Cassandra Spark Connector rejects nested UDT when null. (DSP-17965)

• CassandraHiveMetastore does not unquote predicates for server-side filtering. (DSP-18017)

• Reduce probability of hitting max_concurrent_sessions limit for OLAP workloads with BYOS (Bring Your Own Spark). (DSP-18280)
For OLAP workloads with BYOS, DataStax recommends increasing max_concurrent_sessions using this formula as a guideline:

max_concurrent_sessions = spark_executors_threads_per_node x reliability_coefficient

where reliability_coefficient must be greater than 1, with a minimum value between 2 and replication factor (RF) x 2. See the worked example after this list.

• dse spark-submit --status driver_ID command fails. (DSP-18616)

• BYOS DSEFS access fails with AuthenticationException with dseauth_internal_no_otherschemes. (DSP-18822)

• Accessing files from Spark through WebHDFS interface fails with message: java.io.IOException:
Content-Length is missing. (DSP-18559)


• Submitting many Spark applications will reach the default tombstone_failure_threshold before the default 90-day gc_grace_seconds defined for the system_auth.role_permissions table. (DSP-19098)
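A worked example of the guideline above, under assumed values: with spark_executors_threads_per_node = 4 and replication factor RF = 3, reliability_coefficient may be chosen between 2 and RF x 2 = 6; taking 4 gives:

max_concurrent_sessions = 4 x 4 = 16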

6.0.8 DSEFS

Resolved issues:

• Fix handling of path alternatives in DSEFS shell to provide wildcard support for mkdir and ls commands.
(DSP-17768)
For example, to make several subdirectories with a single command:

$ dse fs mkdir -p /datastax/demos/weather_sensors/{byos-daily,byos-monthly,byos-station}

$ dse fs mkdir -p {path1,path2}/dir

6.0.8 DSE Graph

Changes and enhancements:

• The graph configuration and gremlin_server sections in DSE Graph system-level options are now correctly
commented out at the top level. (DSP-18477)

Resolved issues:

• NPE when dropping a graph with an alias in gremlin console. (DSP-13387)

• Time, date, inet, and duration data types are not supported in graph search indexes. (DSP-17694)

• Should prevent sharing Gremlin Groovy closures between scripts that are submitted through session-less
connections, like DSE drivers. (DSP-18146)

• Operations through gremlin-console run with system permissions, but should run with anonymous
permissions. (DSP-18471)

• DseGraphFrame cannot directly copy graph from one cluster to another. You can now dynamically pass
cluster and connection configuration for different graph objects. (DSP-18605)
Workaround for earlier versions:

1. Export the graph to DSEFS:

$ g.V.write.format("csv").save("dsefs://cluster1/tmp/vertices") &&
g.E.write.format("csv").save("dsefs://cluster1/tmp/edges")

2. Import the graph to the other cluster:

$ g.updateVertices(spark.read.format("csv").load("dsefs://cluster1/tmp/vertices")) &&
g.updateEdges(spark.read.format("csv").load("dsefs://cluster1/tmp/edges"))

• Issue querying a search index when the vertex label is set to cache properties. (DSP-18898)

• UnsatisfiedLinkError when inserting a multi edge with DseGraphFrame in BYOS (Bring Your Own Spark). (DSP-18916)

• DSE Graph does not use primary key predicate in Search/.has() predicate. (DSP-18993)

6.0.8 DSE Search


Changes and enhancements:

• Reject requests from the TPC backpressure queue when requests are on the queue for too long.
(DSP-15875)

• Changes to correct uneven distribution of shard requests with the STATIC set cover finder. (DSP-18197)
A new inertia parameter for dsetool set_core_property supports fine tuning; see the sketch after this list. The default value of 1 can be adjusted for environments with vnodes and more than 10 vnodes.

• New recommended method for case-insensitive text search, faceting, grouping, and sorting with new
LowerCaseStrField custom Solr field type. This type sets field values as lowercase and stores them as
lowercase in docValues. (DSP-18763)
DataStax does not support using the TextField Solr field type with solr.KeywordTokenizer and
solr.LowerCaseFilterFactory to achieve single-token, case-insensitive indexing on a CQL text field.
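A hypothetical invocation of the inertia tuning knob named above; the core name and the argument form are assumptions, so verify the exact syntax with dsetool help set_core_property:

$ dsetool set_core_property ks.tbl inertia=2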

Resolved issues:

• SASI queries don't work on tables with row level access control (RLAC). (DB-3082)

• Documents might not be removed from the index when a key element has value equal to a Solr reserved
word. (DSP-17419)

• FQ broken with queryExecutorThreads and timeAllowed set. (DSP-18717)

• Avoid interrupting request threads when an internode handshake fails so that the Lucene file channel lock
cannot be interrupted. Fixes LUCENE-8262. (DSP-18211)
Workaround for earlier versions: Reload the search core without restarting or reindexing.

• Search should error out, rather than timeout, on Solr query with non-existing field list (fl) fields. (DSP-18218)

Cassandra enhancements for DSE 6.0.8


DataStax Enterprise 6.0.8 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from earlier releases.
General upgrade advice for DSE 6.0.8
DataStax Enterprise 6.0.8 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.8
DataStax Enterprise (DSE) 6.0.8 includes these production-certified enhancements to TinkerPop 3.3.7:

• Developed DSL pattern for gremlin-javascript.

• Generated uberjar artifact for Gremlin Console.

• Improved folding of property() step into related mutating steps.

• Added inject() to steps generated on the DSL TraversalSource.

• Removed gperfutils dependencies from Gremlin Console.

• Fixed PartitionStrategy when setting vertex label and having includeMetaProperties configured to
true.

• Ensure gremlin.sh works when directories contain spaces.

• Prevented client-side hangs if metadata generation fails on the server.

• Fixed bug with EventStrategy in relation to addE() where detachment was not happening properly.


• Fixed bug in detachment of Path where embedded collection objects would prevent that process.

• Enabled ctrl+c to interrupt long running processes in Gremlin Console.

• Quieted "host unavailable" warnings for both the driver and Gremlin Console.

• Fixed construction of g:List from arrays in gremlin-javascript.

• Fixed bug in GremlinGroovyScriptEngine interpreter mode around class definitions.

• Implemented EdgeLabelVerificationStrategy.

• Fixed behavior of P for within() and without() in Gremlin Language Variants (GLV) to be consistent with
Java when using variable arguments (varargs).

• Cleared the input buffer after exceptions in Gremlin Console.

• Added parameter to configure the processor in the gremlin-javascript client constructor.

• Docker images now use gremlin user instead of root user.

• Refactored use of commons-lang to use common-lang3 only. Dependencies may still use commons-lang.

• Bumped commons-lang3 to 3.8.1.

• Added GraphSON serialization support for Duration, Char, ByteBuffer, Byte, BigInteger, and BigDecimal in
gremlin-python.

• Added ProfilingAware interface to allow steps to be notified that profile() was being called.

• Fixed bug where profile() could produce negative timings when group() contained a reducing barrier.

• Improved logic determining the dead or alive state of a Java driver connection.

• Improved handling of dead connections and the availability of hosts.

• Bumped httpclient to 4.5.7.

• Bumped slf4j to 1.7.25.

• Bumped commons-codec to 1.12.

• Fixed partial response failures when using authentication in gremlin-python.

• Fixed a bug in PartitionStrategy where addE() as a start step was not applying the partition.

• Improved performance of JavaTranslator by reducing calls to Method.getParameters().

• Implemented EarlyLimitStrategy which is supposed to significantly reduce backend operations for queries that use range().

• Reduced chance of hash collisions in Bytecode and its inner classes.

• Added Symbol.asyncIterator member to the Traversal class to provide support for await ... of
loops (async iterables).

Bug fixes:

• TINKERPOP-2081 PersistedOutputRDD materialises rdd lazily with Spark 2.x.

• TINKERPOP-2091 Wrong/missing feature requirements in StructureStandardTestSuite.

• TINKERPOP-2094 Gremlin Driver Cluster Builder serializer method does not use mimeType as suggested.

• TINKERPOP-2095 GroupStep looks for irrelevant barrier steps.

• TINKERPOP-2096 gremlinpython: AttributeError when connection is closed before result is received.

• TINKERPOP-2100 coalesce() creating unexpected results when used with order().


• TINKERPOP-2105 Gremlin-Python connection not returned back to the pool on exception from the Gremlin
Server.

• TINKERPOP-2113 P.Within() doesn't work when given a List argument.

Improvements:

• TINKERPOP-1889 JavaScript Gremlin Language Variants (GLV): Use heartbeat to prevent connection
timeout.

• TINKERPOP-2010 Generate jsdoc for gremlin-javascript.

• TINKERPOP-2013 Process tests that are auto-ignored stink.

• TINKERPOP-2018 Generate API docs for Gremlin.Net.

• TINKERPOP-2038 Make groovy script cache size configurable.

• TINKERPOP-2050 Add a :bytecode command to Gremlin Console.

• TINKERPOP-2062 Add Traversal class to CoreImports.

• TINKERPOP-2065 Optimize iterate() for remote traversals.

• TINKERPOP-2067 Allow getting raw data from Gremlin.Net.Driver.IGremlinClient.

• TINKERPOP-2068 Bump Jackson Databind 2.9.7.

• TINKERPOP-2069 Document configuration of Gremlin.Net.

• TINKERPOP-2070 gremlin-javascript: Introduce Connection representation.

• TINKERPOP-2071 gremlin-python: the graphson deserializer for g:Set should return a python set.

• TINKERPOP-2073 Generate tabs for static code blocks.

• TINKERPOP-2074 Ensure that only NuGet packages for the current version are pushed.

• TINKERPOP-2077 VertexProgram.Builder should have a default create() method with no Graph.

• TINKERPOP-2078 Hide use of EmptyGraph or RemoteGraph behind a more unified method for
TraversalSource construction.

• TINKERPOP-2084 For remote requests in console, display the remote stack trace.

• TINKERPOP-2092 Deprecate default GraphSON serializer fields.

• TINKERPOP-2097 Create a DriverRemoteConnection with an initialized Client.

• TINKERPOP-2102 Deprecate static fields on TraversalSource related to remoting.

• TINKERPOP-2106 When Gremlin execution times out, throw TimeoutException instead of TraversalInterruptedException/InterruptedIOException.

• TINKERPOP-2110 Allow connection on different path (from /gremlin).

• TINKERPOP-2114 Document common Gremlin anti-patterns.

• TINKERPOP-2118 Bump to Groovy 2.4.16.

• TINKERPOP-2121 Bump Jackson Databind 2.9.8.

DSE 6.0.7 release notes


1 April 2019
In this section:

• 6.0.7 Components


• DSE 6.0.7 Highlights

• Cassandra enhancements for DSE 6.0.7

• General upgrade advice for DSE 6.0.7

• TinkerPop changes for DSE 6.0.7

Table 5: DSE functionality


• 6.0.7 DSE core • 6.0.7 DSE Graph

• 6.0.7 DSE Analytics • 6.0.7 DSE Search

• 6.0.7 DSEFS

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.7 Components

All components from DSE 6.0.7 are listed. Components that are updated for DSE 6.0.7 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2407 *

• Apache Spark™ 2.2.3.4 *

• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.13.dse *

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.6 with additional production-certified changes

DSE 6.0.7 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.


DSE 6.0.7 Highlights

High-value benefits of upgrading to DSE 6.0.7 include these highlights:


DSE Database (DSE core) highlights

• Compaction performance improvement with new cassandra.yaml pick_level_on_streaming option. (DB-1658)

• Improved user tools for SSTable upgrades (sstableupgrade) and downgrades (sstabledowngrade).
(DB-2950)

• New cassandra.yaml direct_reads_size_in_mb option sets the size of the new buffer pool for direct
transient reads. (DB-2958)

• Reduction of LWT contention by improved handling of IO threads. (DB-2965)

• Remedy deadlock during node startup when calculating disk boundaries. (DB-3028)

• Correct handling of dropped UDT columns in SSTables. (DB-3031)


Workaround: If issues with UDTs in SSTables exist after upgrade from DSE 5.0.x, run sstablescrub -e
fix-only offline on the SSTables that have or had UDTs that were created in DSE 5.0.x.

• The frame decoding off-heap queue size is configurable and smaller by default. (DB-3047)

DSE Analytics highlights

• Authorization to AlwaysOn SQL web UI is supported. (DSP-18236)

• Handle quote in cache query of AlwaysOn SQL (AOSS). (DSP-18418)

• Fix leakage in BulkTableWriter. (DSP-18513)

DSE Graph highlights

• Some minor DSE GraphFrame code fixes. (DSP-18215)

• Improved updateEdges and updateVertices usability for single label update. (DSP-18404)

• Operations through gremlin-console run with anonymous instead of system permissions. (DSP-18471)

• Gremlin (groovy) scripts compile faster. (DSP-18025)

• Data caching improvements during DSE GraphFrame operations. (DSP-17870)

DSE Search highlights

• Fixed facets and stats queries when using queryExecutorThreads. (DSP-18237)

• Fixed timestamp PK routing with solr_query. (DSP-18223)

• Search/Solr HTTP request for CSV output is fixed. (DSP-18029)

6.0.7 DSE core

Changes and enhancements:

• Compaction performance improvement with new cassandra.yaml pick_level_on_streaming option. (DB-1658)
Streamed-in SSTables of tables using LCS (leveled compaction strategy) are placed in the same level as on the source node, with possible up-leveling. Set pick_level_on_streaming to true to save compaction work for operations like nodetool refresh and replacing a node. See the sketch after this list.


• The sstableloader downgrade from DSE to OSS Apache Cassandra is supported with new
sstabledowngrade tool. (DB-2756)
The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE
versions.

• TupleType values with null fields NPE when being made byte-comparable. (DB-2872)

• Support for using sstableloader to stream OSS Cassandra 3.x and DSE 5.x data to DSE 6.0 and later.
(DB-2909)

• Memory improvements with these supported changes:

# Configurable memory is supported for offline sstable tools. (DB-2955)


You can use these environment variables with the tools:

# MAX_HEAP_SIZE - defaults to 256 MB

# MAX_DIRECT_MEMORY - defaults to ((system_memory - heap_size) / 4) with a minimum of 1 GB and a maximum of 8 GB.

To specify memory on the command line:

$ MAX_HEAP_SIZE=2g MAX_DIRECT_MEMORY=10g sstabledowngrade keyspace table

The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE
versions.

# Buffer pool, and metrics for the buffer pool, are now in two pools. In cassandra.yaml,
file_cache_size_in_mb option sets the file cache (or chunk cache) and new direct_reads_size_in_mb
option for all other short-lived read operations. (DB-2958)
To retrieve the buffer pool metrics:

$ nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=CachedReadsBufferPool,name=*"

$ nodetool sjk mxdump -q "org.apache.cassandra.metrics:type=DirectReadsBufferPool,name=*"

For legacy compatibility, org.apache.cassandra.metrics:type=BufferPool still exists and is the same as org.apache.cassandra.metrics:type=CachedReadsBufferPool.

# cassandra-env.sh respect heap and direct memory values set in jvm.options or as environment
variables. (DB-2973)
The precedence for heap and direct memory is:

# Environment variables

# jvm.options

# calculations in cassandra-env.sh

# AIO is automatically disabled if the chunk cache size is small enough: less or equal to system RAM / 8.
(DB-2997)

# Limit off-heap frame queues by configurable number of frames and total number of bytes. (DB-3047)
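A minimal sketch of enabling the pick_level_on_streaming option described earlier in this list (assumes the option is not already present in cassandra.yaml; adjust the path for package installs):

$ echo "pick_level_on_streaming: true" >> conf/cassandra.yaml
# Takes effect on the next restart; applies only to tables using LCS.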

Resolved issues:

• Native server Message.Dispatcher.Flusher task stalls under heavy load. (DB-1814)


• Race in CommitLog can cause failed force-flush-all. (DB-2542)

• Unclosed range tombstones in read response. (DB-2601)

• The sstableloader downgrade from DSE to OSS Apache Cassandra is not supported. New
sstabledowngrade tool is required. (DB-2756)

• Unused memory in buffer pool. (DB-2788)

• nodesync fails when validating MV row with empty partition key. (DB-2823)

• TupleType values with null fields NPE when being made byte-comparable. (DB-2872)

• The memory in use in the buffer pool is not identical to the memory allocated. (DB-2904)

• Reference leak in SSTableRewriter in sstableupgrade when keepOriginals is true. (DB-2944)

• Hint-dispatcher file-channel not closed, if open() fails with OOM. (DB-2947)

• Offline sstable tools fail with Out of Direct Memory error. (DB-2955)

• Hints and metadata should not use buffer pool. (DB-2958)

• Lightweight transactions contention may cause IO thread exhaustion. (DB-2965)

• DIRECT_MEMORY is being calculated using 25% of total system memory if -Xmx is set in jvm.options.
(DB-2973)

• Netty direct buffers can potentially double the -XX:MaxDirectMemorySize limit. (DB-2993)

• Increased NIO direct memory because the buffers are not cleaned until GC is run. (DB-2996)

• nodesync cannot be enabled on materialized views (MV). (DB-3008)

• Mishandling of frozen in complex nested types. (DB-3081)

• Check of two versions of metadata for a column fails on upgrade from DSE 5.0.x when type is not of same
class. Loosen the check from CASSANDRA-13776 to prevent Trying to compare 2 different types
ERROR on upgrades. (DB-3021)

• Deadlock during node startup when calculating disk boundaries. (DB-3028)

• cqlsh EXECUTE AS command does not work. (DB-3098)

• Dropped UDT columns in SSTables deserialization are broken after upgrading from DSE 5.0. (DB-3031)

• Kerberos protocol and QoP parameters are not correctly propagated. (DSP-15455)

• RpcExecutionException does not print the user who is not authorized to perform a certain action.
(DSP-15895)

• Leak in BulkTableWriter. (DSP-18513)

Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

6.0.7 DSE Analytics

Changes and enhancements:


• Support configuration to connect to multiple hosts from BYOS connector. (DSP-18231)

Resolved issues:

• After client-to-node SSL is enabled, all Spark nodes must also listen on port 7480. (DSP-15744)

• dse client-tool configuration byos-export does not export required Spark properties. (DSP-15938)

• Downloaded Spark JAR files are executable for all users. (DSP-17692)

• Issue with viewing information for completed jobs when authentication is enabled. (DSP-17854)

• Spark Cassandra Connector does not properly cache manually prepared RegularStatements; see SPARKC-558. (DSP-18075)

• Unexpected gossip failure. java.lang.NullPointerException: null. (DSP-18194)

• Apache Spark local privilege escalation vulnerability: CVE-2018-11760. (DSP-18225)

• Invalid options show for dse spark-submit command line help. (DSP-18293)

• Can't access AlwaysOn SQL (AOSS) UI when authorization is enabled. (DSP-18236)

• Spark SQL function concat_ws results in a compilation error when an array column is included in the column
list and when the number of columns to be concatenated exceeds 8. (DSP-18383)

• Improved error messaging for AlwaysOn SQL (AOSS) client tool. (DSP-18409)

• CQL syntax error when single quote is not correctly escaped before including in save cache query to AOSS
cache table. (DSP-18418)

• Remove class DGFCleanerInterceptor from byos.jar. (DSP-18445)

• GBTClassifier in Spark ML fails when periodic checkpointing is on. (DSP-18450)

Known issue:

• DSE 6.0.7 is not compatible with Zeppelin 0.8.1 for SparkR and PySpark. (DSP-18777)
The Apache Spark™ 2.2.3.4 that is included with DSE 6.0.7 contains the patched protocol, and all versions of DSE are compatible with the Scala interpreter.
However, SparkR and PySpark use a separate channel for communication with Zeppelin. This protocol was vulnerable to attack from other users on the system and was secured in CVE-2018-11760. Zeppelin 0.8.1 fails with SparkR and PySpark because it does not recognize that Spark 2.2.2 and later contain this patched protocol and attempts to use the old protocol. The Zeppelin patch to recognize this protocol is not available in a released Zeppelin build.
Solution: Do not upgrade to DSE 6.0.7 if you use SparkR or PySpark. Wait for a Zeppelin release later than 0.8.1 that recognizes that DSE-packaged Spark can use the secured protocol.

• Submitting many Spark apps will reach the default tombstone_failure_threshold before the default 90-day gc_grace_seconds defined for the system_auth.role_permissions table. (DSP-19098)
Workaround for use cases where a large number of Spark jobs are submitted:

1. Before the user starts the Spark jobs, manually grant permissions to the user:

GRANT AUTHORIZE, DESCRIBE, MODIFY ON ANY SUBMISSION IN WORKPOOL 'datacenter_name.workpool' TO role_name;

2. Start Spark jobs for this user.


3. After this user completes all the Spark jobs, revoke permissions for the user:

REVOKE AUTHORIZE, DESCRIBE, MODIFY ON ANY SUBMISSION IN WORKPOOL 'datacenter_name.workpool' FROM role_name;

6.0.7 DSEFS

Resolved issues:

• Change the dsefs:// default port when the DSEFS setting public_port is changed in dse.yaml. (DSP-17962)
The shortcut dsefs:/// now automatically resolves to broadcast_address:dsefs.public_port, instead of incorrectly using broadcast_address:5598 regardless of the configured port. See the example after this list.

• DSEFS WebHDFS API GETFILESTATUS op returns AccessDeniedException for the file even when user
has correct permission. (DSP-18044)

• Problem with change group ownership of files using the fileSystem.setOwner method. (DSP-18052)
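An illustrative check of the port resolution fix above; it assumes public_port was changed in dse.yaml, and the path is a placeholder:

$ dse fs ls dsefs:///tmp
# dsefs:/// now resolves to broadcast_address:dsefs.public_port instead of port 5598.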

6.0.7 DSE Graph

Changes and enhancements:

• Vertex and especially edge loading is simplified; the idColumn function is no longer required. (DSP-18404)

Resolved issues:

• OLAP traversal duplicates the partition key properties: OLAP g.V().properties() prints 'first' vertex n times
with custom ids. (DSP-15688)

• Edges are inserted with tombstone values set when inserting a recursive edge with multiple cardinality.
(DSP-17377)

• AND operator is ignored in combination with OR operator in graph searches. (DSP-18061)

6.0.7 DSE Search

Resolved issues:

• SASI should discard stale static row. (DB-2956)

• Anti-compaction transaction causes temporary data loss. (DB-3016)

• Solr HTTP request for CSV output is blank. The CSVResponseWriter returns only stored fields if a field list is not provided in the URL. (DSP-18029)
As a workaround, specify a field list in the URL:

/select?q=*%3A*&sort=lst_updt_gdttm+desc&rows=10&fl=field1,field2&wt=csv&indent=true

• Timestamp PK routing on solr_query fails. (DSP-18223)

• Facets and stats queries broken when using queryExecutorThreads. (DSP-18237)

Cassandra enhancements for DSE 6.0.7


DataStax Enterprise 6.0.7 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements
from earlier releases, and adds these production-certified changes:

• Always close RT markers returned by ReadCommand#executeLocally(). (CASSANDRA-14515)


• Severe concurrency issues in STCS, DTCS, TWCS, TMD.Topology, TypeParser. (CASSANDRA-14781)

General upgrade advice for DSE 6.0.7


DataStax Enterprise 6.0.7 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.7
DataStax Enterprise (DSE) 6.0.7 includes production-certified enhancements to TinkerPop 3.3.6. See
TinkerPop upgrade documentation for all changes.

• Disables the ScriptEngine global function cache which can hold on to references to "g" along with some
other minor bug fixes/enhancements.

DSE 6.0.6 release notes


DataStax recommends the latest patch release for most environments.

27 February 2019

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.6 Components

All components from DSE 6.0.6 are listed. Components that are updated for DSE 6.0.6 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2380

• Apache Spark™ 2.2.2.8

• Apache Tomcat® 8.0.53

• DataStax Bulk Loader 1.2.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.12.dse


• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.5 with additional production-certified changes *

DSE 6.0.6 is compatible with Apache Cassandra™ 3.11 and includes all production-certified changes from
earlier versions.
DSE 6.0.6 Important bug fix

• DSE 5.0 SSTables with UDTs are corrupted in DSE 5.1, DSE 6.0, and DSE 6.7. (DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), the SSTable serialization headers are fixed
when DSE is started with DSE 6.0.6 or later.

DSE 6.0.6 Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

Cassandra enhancements for DSE 6.0.6


DataStax Enterprise 6.0.6 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from previous releases.
General upgrade advice for DSE 6.0.6
DataStax Enterprise 6.0.6 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.6
DataStax Enterprise (DSE) 6.0.6 includes all enhancements from previous DSE releases that are in addition to
TinkerPop 3.3.5. See TinkerPop upgrade documentation for all changes.
DSE 6.0.5 release notes
7 February 2019
In this section:

• DSE 6.0.5 Components

• DSE 6.0.5 Highlights

• DSE 6.0.5 Known issues

• Cassandra enhancements for DSE 6.0.5

• General upgrade advice for DSE 6.0.5

• TinkerPop changes for DSE 6.0.5

Table 6: DSE functionality


• 6.0.5 DSE core • 6.0.5 DSE Graph

• 6.0.5 DSE Analytics • 6.0.5 DSE Search

• 6.0.5 DSEFS

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.


The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.5 Components

All components from DSE 6.0.5 are listed. Components that are updated for DSE 6.0.5 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2380 *

• Apache Spark™ 2.2.2.8 *

• Apache Tomcat® 8.0.53 *

• DataStax Bulk Loader 1.2.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.12.dse *

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.5 with additional production-certified changes *

DSE 6.0.5 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

DSE 6.0.5 Highlights

High-value benefits of upgrading to DSE 6.0.5 include these highlights:


DSE Database (DSE core) highlights
Improvements:

• DSE Metrics Collector aggregates DSE metrics and integrates with existing monitoring solutions to facilitate
problem resolution and remediation. (DSP-17319)
See:

# Enable DSE Metrics Collector

# Configuring data and log directories for DSE Metrics Collector

Important bug fixes:


• Fixed resource leak related to streaming operations that affects tiered storage users. Excessive number of
TieredRowWriter threads causing java.lang.OutOfMemoryError. (DB-2463)

• Exception now occurs when user with no permissions returns no rows on restricted table. (DB-2668)

• Upgraded nodes that still have big-format SSTables from DSE 5.x caused errors during read. (DB-2801)

• Fixed an issue where heap memory usage seems higher with default file cache settings. (DB-2865)

• Fixed prepared statement cache issues when using row-level access control (RLAC) permissions. Existing
prepared statements were not correctly invalidated. (DB-2867)

DSE Analytics highlights


Upgrade if:

• DSEFS or AOSS fail to start.

• You use BYOS with Spark 2.3 or 2.4.

• You are getting OOM or authentication errors.

• You use scripts that invoke DSEFS commands and need to handle failures properly.

• You use dse spark-sql-metastore-migrate with DSE Unified Authentication and internal authentication.
(DSP-17632)

• You want to run the DSEFS auth demo. (DSP-17700)

• You have DSE 5.0.x with DSEFS client connected to DSE 5.1.x and later DSEFS server. (DSP-17600)

• You experienced a memory leak in Spark Thrift Server. (DSP-17433)

• You use DSEFS with listen_on_broadcast_address set to true in cassandra.yaml. (DSP-17363)

• You use DSEFS and listen_address is blank in cassandra.yaml. (DSP-16296)

• You are moving directories in DSEFS. (DSP-17347)

• Improve memory handling in AlwaysOn SQL (AOSS) by enabling spark.sql.thriftServer.incrementalCollect to


prevent OOM on large result sets. (DSP-17428)

DSE Graph highlights


Upgrade if:

• You want new JMX operations for graph MBeans. (DSP-15928)

• You get errors for OLAP traversals after dropping schema elements. (DSP-15884)

• You have slow gremlin script compilation times. (DSP-14132)

• You want server side error messages for remote exceptions reported in Gremlin console. (DSP-16375)

• You occasionally get inconsistent query results. (DSP-18005)

• You use graph OLAP and want secret tokens redacted in log files. (DSP-18074)

• You want to build fuzzy-text search indexes on string properties that form part of a vertex label ID.
(DSP-17386)

DSE Search highlights


Upgrade if:

• You want security improvements:

# Upgrade Apache Commons Compress to prevent Denial Of Service (DoS) vulnerability present in
Commons Compress 1.16.1, CVE-2018-11771. (DSP-17019)


# Critical memory leak and corruption fixes for encrypted indexes. (DSP-17111)

# Upgrade Apache Tomcat to prevent Denial Of Service (DoS), CVE-2018-1336. (DSP-17303)

• You index timestamp partition keys. (DSP-17761)

• You do a lot of reindexing. (DSP-17975)

DSE 6.0.5 Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.

6.0.5 DSE core

Changes and enhancements:

• nodetool command changes (illustrative invocations appear after this list):

# New tool sstablepartitions identifies large partitions. (DB-803)

# nodetool listendpointspendinghints command prints hint information about the endpoints this node has
hints for. (DB-1674)

# nodetool rebuild_view rebuilds materialized views for local data. Existing view data is not cleared.
(DB-2451)

# Improved messages for nodetool nodesyncservice ratesimulator command include explanation for
single node clusters and when no tables have NodeSync enabled. (DB-2468)

• Taking a snapshot causes FSError serialization error. (DB-2581)

• Direct Memory field output of nodetool gcstats includes all allocated off-heap memory. Metrics for native
memory are added in org.apache.cassandra.metrics.NativeMemoryMetrics.java. (DB-2796)

• Batch replay is interrupted and good batches are skipped when a mutation of an unknown table is found.
(DB-2855)

• New environment variable MAX_DIRECT_MEMORY overrides the cassandra.yaml value for how much direct memory (NIO direct buffers) the JVM can use; see the example after this list. (DB-2919)

• Improved encryption key error reporting. (DSP-17723)
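
For example, a minimal sketch of overriding the direct memory limit at start-up on a tarball installation (the 16G value is illustrative; for package installations, place the export wherever your service environment is configured):

export MAX_DIRECT_MEMORY=16G
dse cassandra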

Resolved issues:

• Race condition occurs on bootstrap completion. (DB-1383)

• Running the nodetool nodesyncservice enable command reports the error NodeSyncRecord
constructor assertion failed. (DB-2280)
Workaround: Before DSE 6.0.5, a restart of DSE resolves the issue so that you can execute the command
and enable NodeSync without error.


• Rebuild should not fail when a keyspace is not replicated to other datacenters. (DB-2301)

• Repair may skip some ranges due to received range cache. (DB-2432)

• Read and compaction errors with leveled compaction strategy (LCS). (DB-2446)

• Excessive number of TieredRowWriter threads causing java.lang.OutOfMemoryError (DB-2463)

• The nodetool nodesyncservice ratesimulator -deadline-overrides option is not supported. (DB-2468)

• NullPointerException during compaction on table with TimeWindowCompactionStrategy (TWCS). (DB-2472)

• Chunk cache can retain data from a previous version of a file, causing restore failures. (DB-2489)

• LineNumberInference is not failure-safe; failure to find the source information can break the request. (DB-2568)

• Improved error message when Netty Epoll library cannot be loaded. (DB-2579)

• Prevent potential SSTable corruption with nodetool refresh. (DB-2594)

• The nodetool gcstats command output incorrectly reports the GC reclaimed metric in bytes, instead of the
expected MB. (DB-2598)

• TypeParser is not thread safe. (DB-2602)

• STCS, DTCS, TWCS, TMD aren't thread-safe. (DB-2609)

• Possible corruption in compressed files with uncompressed chunks. (DB-2634)

• Incorrect order of application of nodetool garbagecollect leaves tombstones that should be deleted.
(DB-2658)

• Exception should occur when user with no permissions returns no rows on restricted table. (DB-2668)

• DSE fails to start with an Unable to gossip with any peers error when cross_node_timeout is true. (DB-2670)

• Memory leak on unfetched continuous paging requests. (DB-2851)

• Heap memory usage is higher with default file cache settings. (DB-2865)

• Prepared statement cache issues when using row-level access control (RLAC) permissions. Existing
prepared statements are not correctly invalidated. (DB-2867)

• User-defined aggregates (UDAs) that instantiate user-defined types (UDTs) break after restart. (DB-2771)

• Upgraded nodes that still have big-format SSTables from DSE 5.x can cause errors during read. (DB-2801)
Workaround for upgrades from DSE 5.x to DSE versions before 6.0.5 and DSE 6.7.0: Run offline
sstableupgrade before starting the upgraded node.

• Late continuous paging errors can leave unreleased buffers behind. (DB-2862)

• Security: java-xmlbuilder is vulnerable to XML external entities (XXE). (DSP-13962)

• dsetool does not work when native_transport_interface is set in cassandra.yaml. (DSP-16796)
Workaround for earlier versions: Use native_transport_interface_prefer_ipv6 instead.

• Improve config encryption error reporting for missing system key and unencrypted passwords. (DSP-17480)

• Fix sstableloader error when internode encryption, client_encryption, and config encryption are enabled.
(DSP-17536)

• sstableloader throws an error if system_info_encryption is enabled in dse.yaml and a table is encrypted. (DSP-17826)

6.0.5 DSE Analytics


Changes and enhancements:

• Improved error handling: only submission-related error exceptions from Spark submitted applications are
wrapped in a Dse Spark Submit Bootstrapper Failed to Submit error. (DSP-16359)

• Improved error message for dse client-tool when DSE Analytics is not correctly configured. (DSP-17322)

• AlwaysOn SQL (AOSS) improvements:

# Provide a way for clients to determine if AlwaysOn SQL (AOSS) is enabled in DSE. (DSP-17180)

# Improved logging messages with recommended resolutions for AlwaysOn SQL (AOSS). (DSP-17326,
DSP-17533)

# Improved error message for AlwaysOn SQL (AOSS) when the role specified by auth_user does not
exist. (DSP-17358)

# Set default for spark.sql.thriftServer.incrementalCollect to true for AlwaysOn SQL (AOSS); see the configuration sketch after this list. (DSP-17428)

# Structured Streaming support for Bring Your Own Spark (BYOS) with Spark 2.3. (DSP-17593)
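
A minimal sketch of setting the incrementalCollect property explicitly, assuming AlwaysOn SQL reads Spark properties from the spark-alwayson-sql.conf file in the DSE Spark configuration directory (the file name and location are assumptions; adjust for your installation):

spark.sql.thriftServer.incrementalCollect true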

Resolved issues:

• Memory leak in Spark Thrift Server. (DSP-17433)

• Race condition allows Spark Executor working directories to be removed before stopping those executors.
(DSP-15769)

• Restore DseGraphFrame support in BYOS and spark-dependencies artifacts. Include graph frames python
library in graphframe.jar. (DSP-16383)

• Search optimizations for search analytics Spark SQL queries are applied to a datacenter that no longer has
search enabled. Queries launched from a search-enabled datacenter cause search optimizations even when
the target datacenter does not have search enabled. (DSP-16465)

• Unable to get available memory before Spark Workers are registered. (DSP-16790)

• DirectJoin and Spark Extensions don't work with PySpark. (DSP-16904)

• Spark shell error Cannot proxy as a super user occurs when AlwaysOn Spark SQL (AOSS) is running
with authentication. (DSP-17200)

• Spark Connector has hard dependencies on dse-core when running Spark Application tests with dse-
connector. (DSP-17232)

• AlwaysOn SQL (AOSS) should attempt to auto start again on datacenter restart, regardless of the previous
status. (DSP-17359)

• AlwaysOn SQL (AOSS) restart hangs for at least 15 minutes if it cannot start; it should fail with a meaningful error message. (DSP-17264)

• Submission in client mode does not support specifying remote jars (DSEFS) for main application resource
(main jar) and jars specified with --jars / spark.jars. (DSP-17382)

• Incorrect conversions in DirectJoin Spark SQL operations for timestamps, UDTs, and collections.
(DSP-17444)

• DSE 5.0.x DSEFS client is not able to list files when connected to 5.1.x (and up) DSEFS server.
(DSP-17600)

• dse spark-sql-metastore-migrate does not work with DSE Unified Authentication and internal
authentication. (DSP-17632)

• Closing the SparkContext is faulty and significantly increases shutdown time. (DSP-17699)

• Spark Web UI redirection drops path component. (DSP-17877)


6.0.5 DSEFS

Changes and enhancements:

• Improved error message when no available chunks are found. (DSP-16623)

• Add the ability to disable and configure DSEFS internode (node-to-node) authentication. (DSP-17721)

Resolved issues:

• DSEFS throws exceptions and cannot initialize when listen_address is left blank. (DSP-16296)

• Timeout issues in DSEFS startup. (DSP-16875)


Initialization would fail with error messages similar to:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)

• DSEFS exit code is not set in some cases. (DSP-17266)

• Moving a directory under itself causes data loss and orphan data structures. (DSP-17347)

• DSEFS does not support listen_on_broadcast_address as configured in cassandra.yaml. (DSP-17363)

• DSEFS retries resolving corrupted paths. (DSP-17379)

• DSEFS auth demo does not work. (DSP-17700)

6.0.5 DSE Graph

Changes and enhancements:

• New tool fixes inconsistencies in graph data that are caused by schema changes, like label delete, or
improper data loading. (DSP-15884)

# DSE Graph Gremlin console: graph.cleanUp()

# Spark: spark.dseGraph("name").cleanUp()

• New JMX operations for graph MBeans. (DSP-15928)

# adjacency-cache.size - adjacency cache size attribute

# adjacency-cache.clear - operation to clean adjacency cache

# index-cache.size - vertex cache size attribute

# index-cache.clear - operation to clean vertex cache

JMX operations are not cluster-aware. Invoke on each node as appropriate to your environment.

Resolved issues:

• Properties unattached to a vertex show up with null values. (DSP-12300)

• DseGraphFrame label drop hangs with a large number of edges whose both ends have the same label. (DSP-17096)

• Graph/Search escaping fixes. (DSP-17216, DSP-17277, DSP-17816)

• A Gremlin query with search predicate containing \u2028 or \u2029 characters fails. (DSP-17227)

• Geo.inside predicate with Polygon no longer works on secondary index if JTS is not installed. (DSP-17284)

• Search indexes on key fields work only with non-tokenized queries. (DSP-17386)


• g.V().repeat(...).until(...).path() returns incomplete path without edges. (DSP-17933)

• Graph OLTP: Potential ThreadLocal resource leak. (DSP-17808)

• Graph OLTP: Slow gremlin script compilation times. (DSP-14132)

• DseGraphFrame fails to read properties with symbols, like period (.), in names. (DSP-17818)

• DSE GraphFrame operations cache but do not explicitly uncache. (DSP-17870)

• Inconsistent results when using gremlin on static data. (DSP-18005)

• Graph OLAP: secret tokens are unmasked in log files. (DSP-18074)

6.0.5 DSE Search

Changes and enhancements:

• Large queries with oversize frames no longer cause buffer corruption on the receiver. (DSP-15664)
If a client executes a query that results in a shard attempting to send an internode frame larger than the size specified in frame_length_in_mb, the client receives an error message like this:

Attempted to write a frame of <n> bytes with a maximum frame size of <n> bytes

In earlier versions, the query timed out with no message; information was provided only as an error in the logs.

• In earlier releases, CQL search queries failed with UTFDataFormatException on very large SELECT clauses and when tables have a very large number of columns. (DSP-17220)
With this fix, CQL search queries fail with UTFDataFormatException only when the SELECT clause constitutes a string larger than 64 KB of UTF-8 encoded bytes.

• New DSE start-up parameter -Ddse.consistent_replace improves LOCAL_QUORUM and QUORUM consistency on a new node after node replacement. (DB-1577)

• Upgrade Apache Commons Compress to prevent Denial Of Service (DoS) vulnerability present in Commons
Compress 1.16.1, CVE-2018-11771. (DSP-17019)

• Requesting a core reindex with dsetool reload_core or REBUILD SEARCH INDEX no longer builds up a
queue of reindexing tasks on a node. Instead, a single starting reindexing task handles all reindex requests
that are already submitted to that node. (DSP-17045, DSP-13030)

• Upgrade Apache Tomcat to prevent Denial Of Service (DoS), CVE-2018-1336. (DSP-17303)

• The calculated value for maxMergeCount is changed to improve indexing performance (see the worked example after this list). (DSP-17597)

max(max(<maxThreadCount * 2>, <num_tokens * 8>), <maxThreadCount + 5>)

where num_tokens is the number of token ranges to assign to the virtual node (vnode) as configured in cassandra.yaml.

• CQL timestamp field can be part of a Solr unique key. (DSP-17761)
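
For example, with an illustrative maxThreadCount of 8 and num_tokens of 8, the calculation is max(max(8 * 2, 8 * 8), 8 + 5) = max(max(16, 64), 13) = 64.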

Resolved issues:

• Race condition occurs on bootstrap completion and Solr core fails to initialize during node bootstrap.
(DB-1383, DSP-14823)
Workaround: Restart the node that failed to initialize.

• Internode protocol can send oversize frames causing buffer corruption on the receiver. (DSP-15664)


• CQL search queries fail with UTFDataFormatException on very large SELECT clauses. (DSP-17220)
With this fix, CQL search queries fail with UTFDataFormatException only when the SELECT clause constitutes a string larger than 64 KB of UTF-8 encoded bytes.

• java.lang.AssertionError: rtDocValues.maxDoc=5230 maxDoc=4488 error is thrown in the system.log during indexing and reindexing. (DSP-17529)

• Histogram for snapshot is unsynchronized. (DSP-17308)

• Unexpected search index errors occur when non-ASCII characters, like the U+3000 (ideographic space)
character, are in indexed columns. (DSP-17816, DSP-17961)

• TextField type in search index schema should be case-sensitive if created when using copyField.
(DSP-17817)

• gf.V().id().next() causes data to get mismatched with properties in legacy DseGraphFrame. (DSP-17979)

• Loading frozen map columns fails during search read-before-write. (DSP-18073)

Cassandra enhancements for DSE 6.0.5


DataStax Enterprise 6.0.5 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements
from earlier releases, and adds these production-certified changes:

• Pad uncompressed chunks when they would be interpreted as compressed (CASSANDRA-14892)

• Correct SSTable sorting for garbagecollect and levelled compaction (CASSANDRA-14870)

• Avoid calling iter.next() in a loop when notifying indexers about range tombstones (CASSANDRA-14794)

• Fix purging semi-expired RT boundaries in reversed iterators (CASSANDRA-14672)

• DESC order reads can fail to return the last Unfiltered in the partition (CASSANDRA-14766)

• Fix corrupted collection deletions for dropped columns in messages (CASSANDRA-14568)

• Fix corrupted static collection deletions in messages (CASSANDRA-14568)

• Handle failures in parallelAllSSTableOperation (cleanup/upgradesstables/etc) (CASSANDRA-14657)

• Improve TokenMetaData cache populating performance avoid long locking (CASSANDRA-14660)

• Fix static column order for SELECT * wildcard queries (CASSANDRA-14638)

• sstableloader should use discovered broadcast address to connect intra-cluster (CASSANDRA-14522)

• Fix reading columns with non-UTF names from schema (CASSANDRA-14468)

General upgrade advice for DSE 6.0.5


DataStax Enterprise 6.0.5 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.5
DataStax Enterprise (DSE) 6.0.5 includes production-certified enhancements to TinkerPop 3.3.6.
Resolved issues:

• Masked sensitive configuration options in the KryoShimServiceLoader logs.

• Fixed a concurrency issue in TraverserSet.

DSE 6.0.4 release notes


8 October 2018
DataStax recommends the latest patch release for most environments.


• Important bug fix

• 6.0.4 Components

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.4 Important bug fix

• Fix wrong offset in size calculation in trie builder. (DB-2477)

DSE 6.0.4 Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

6.0.4 Components

All components from DSE 6.0.4 are listed. No components were updated from the previous DSE version.

• Apache Solr™ 6.0.1.1.2338

• Apache Spark™ 2.2.2.5

• Apache Tomcat® 8.0.47

• DataStax Bulk Loader 1.1.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.11.dse

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.3 with additional production-certified changes


DSE 6.0.4 is compatible with Apache Cassandra™ 3.11 and includes all production-certified enhancements from
earlier DSE versions.
General upgrade advice for DSE 6.0.4
DataStax Enterprise 6.0.4 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
DSE 6.0.3 release notes
20 September 2018

DataStax recommends installing the latest patch release. Due to DB-2477, DataStax does not recommend
using DSE 6.0.3 for production.

• 6.0.3 Components

• DSE 6.0.3 Highlights

• DSE 6.0.3 Known issues

• General upgrade advice for DSE 6.0.3

• TinkerPop changes for DSE 6.0.3

Table 7: DSE functionality


6.0.3 DSE core
6.0.3 DSE Analytics
6.0.3 DSEFS
6.0.3 DSE Graph
6.0.3 DSE Search

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.3 Components

All components from DSE 6.0.3 are listed. Components that are updated for DSE 6.0.3 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2338 *


• Apache Spark™ 2.2.2.5 *

• Apache Tomcat® 8.0.47

• DataStax Bulk Loader 1.1.0

• DSE Java Driver 1.6.9

• Key Management Interoperability Protocol (KMIP) 1.7.1e

• Netty 4.1.13.11.dse

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.3 with additional production-certified changes *

DataStax Enterprise 6.0.3 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from earlier DSE versions.

DSE 6.0.3 Highlights

High-value benefits of upgrading to DSE 6.0.3 include these highlights:


DSE Database (DSE core) highlights
Improvements:

• Deleting a static column and adding it back as a non-static column introduces corruption. (DB-1630)

• NodeSync command line tool only connects over JMX to a single node. (DB-1693)

• Create a log message when DDL statements are executed. (DB-2383)

Important bug fixes:

• Authentication cache loading can exhaust native threads. (DB-2248)

• The nodesync tasks fail with assertion error. (DB-2323)

• Unexpected behavior change when using row-level permissions with modification conditions like IF EXISTS.
(DB-2429)

• Non-internal users are unable to use permissions granted on CREATE. (DSP-16824)

DSE Analytics highlights


Improvements:

• Improved security isolates Spark applications. (DSP-16093)

• Upgrade to Spark 2.2.2. (DSP-16761)

• Jetty 9.4.11 upgrade addresses security vulnerabilities in Spark dependencies packaged with DSE. (DSP-16893)

• dse spark-submit kill and status commands support an optional explicit Spark Master IP address (see the example after this list). (DSP-16910, DSP-16991)
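
For example, a sketch of checking and killing a submitted driver against an explicit master (the address and driver ID are illustrative):

dse spark-submit --master spark://10.10.1.1:7077 --status driver-20181008123456-0000
dse spark-submit --master spark://10.10.1.1:7077 --kill driver-20181008123456-0000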

Important bug fixes:

• Fixed problems with temporary and data directories for Spark applications. (DSP-15476, DSP-15880)

• Spark Cassandra Connector method saveToCassandra should not require solr_query column when search
is enabled. (DSP-16427)

• Cassandra streaming sink doesn't work with some sources. (DSP-16635)

• Metastore can't handle table with 100+ columns. (DSP-16742)


• Fully qualified paths with resource URL are correctly resolved in Spark structured streaming checkpointing.
Backport SPARK-20894. (DSP-16972)

DSEFS highlights
Important bug fixes:

• Only superusers are allowed to remove corrupted non-empty directories when authentication is enabled for
DSEFS. Improved error message when performing an operation on a corrupted path. (DSP-16340)

• cassandra nonsuperuser gets dsefs AccessDeniedException due to Insufficient permissions. (DSP-16713)

• DSEFS Hadoop layer doesn't properly translate DSEFS exceptions to Hadoop exceptions in some methods.
(DSP-16933)

• Closing DSEFS client before all issued requests are completed causes unexpected message type:
DefaultLastHttpContent error. (DSP-16953)

• Under high loads, DSEFS reports temporary incorrect state for various files/directories. (DSP-17178)

DSE Graph highlights

• Aligned query behavior using geo.inside() predicate for polygon search with and without search indexes.
(DSP-16108)

• Added convenience methods for reading graph configuration: getEffectiveAllowScan and getEffectiveSchemaMode. (DSP-16650)

• Fixed bug where deleting a search index that was defined inside a graph fails. (DSP-16765)

• Changed default write consistency level (CL) for Graph to LOCAL_QUORUM. (DSP-17140)
In earlier DSE versions, the default QUORUM write consistency level (CL) was not appropriate for multi-
datacenter production environments.

DSE Search highlights


Improvements:

• Reduce the number of token filters for distributed searches with vnodes. (DSP-14189)

• Avoid unnecessary exception and error creation in the Solr query parser. (DSP-17147)

Important bug fixes:

• Avoid accumulating redundant router state updates during schema disagreement. (DSP-15615)

• A search enabled node could return different exceptions than a non-search enabled node when a keyspace
or table did not exist. (DSP-16834)

• DSE does not start without appropriate Tomcat JAR scanning exclusions. (DSP-16841)

• CQL single-pass queries have incorrect results when query is run with primary key and search index
schema does not contain all columns in selection. (DSP-16895)

• Node health score of 1 is not obtainable. Search node gets stuck at 0.00 node health score after replacing a
node in a cluster. (DSP-17107)

DSE 6.0.3 Known issues:

• Wrong offset in size calculation in trie builder. (DB-2477)

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.

6.0.3 DSE core

Changes and enhancements:

• Create a log message when DDL statements are executed. (DB-2383)

• Due to the Thread Per Core (TPC) asynchronous request processing architecture, the index_summary_capacity_in_mb and index_summary_resize_interval_in_minutes settings in cassandra.yaml are removed. (DB-2390)

• Connections on non-serialization errors are not dropped. (DB-2233)

• NetworkTopologyStrategy warning about unrecognized option at startup. (DB-2235)

• NodeSync waits to start until all nodes in the cluster are upgraded. (DB-2385)

• Improved error handling and logging for TDE encryption key management. (DSP-15314)

• DataStax does more extensive testing on OpenJDK 8 due to the end of public updates for Oracle JRE/JDK
8. (DSP-16179)

• Non-internal users are unable to use permissions granted on CREATE. (DSP-16824)

Resolved issues:

• NodeSync command line tool only connects over JMX to a single node. (DB-1693)

• TotalBlockedTasksGauge metric value is computed incorrectly. (DB-2002)

• Move TWCS message "No compaction necessary for bucket size" to Trace level or NoSpam. (DB-2022)

• Non-portable syntax (MX4J bash-isms) in cassandra-env.sh broke service scripts. (DB-2123)

• sstableloader options assume the RPC/native (client) interface is the same as the internode (node-to-node)
interface. (DB-2184)

• The nodesync tasks fail with assertion error. (DB-2323)

• NodeSync fails on upgraded nodes while a cluster is in a partially upgraded state. (DB-2385)

• StackOverflowError around IncrementalTrieWriterPageAware#writeRecursive() during compaction. (DB-2364)

• Compaction strategy instantiation errors don't generate meaningful error messages, instead return only
InvocationTargetException. (DB-2404)

• Unexpected behavior change when using row-level permissions with modification conditions like IF EXISTS.
(DB-2429)

• Authentication cache loading can exhaust native threads. The Spark master node is not able to be elected.
(DB-2248)

• Audit events for CREATE ROLE and ALTER ROLE with incorrect spacing expose PASSWORD in plain text. (DB-2285)


• Client warnings are not always propagated via LocalSessionWrapper. (DB-2304)

• Timestamps inserted with ISO 8601 format are saved with wrong millisecond value. (DB-2312)

• Compaction fails with IllegalArgumentException: null. (DB-2329)

• Error out if not all permissions for GRANT/REVOKE/RESTRICT/UNRESTRICT are applicable for a
resource. (DB-2373)

• BulkLoader class exits without printing the stack trace for throwable error. (DB-2377)

• nodetool describecluster incorrectly shows DseDelegateSnitch instead of the snitch configured in cassandra.yaml. (DSP-16158)

• Using geo types does not work when memtable allocation type is set to offheap_objects. (DSP-16302)

• Heap-size calculation is incorrect for RpcCallStatement + SearchIndexStatement. (DSP-16731)

• The -graph option for the cassandra-stress tool failed to generate the target output HTML in the JAR file. (DSP-17046)

Known issue:

• Upgraded nodes that still have big-format SSTables from DSE 5.x can cause errors during read. (DB-2801)
Workaround for upgrades from DSE 5.x to DSE versions before 6.0.5 and DSE 6.7.0: Run offline
sstableupgrade before starting the upgraded node.

6.0.3 DSE Analytics

Changes and enhancements:

• DSE pyspark libraries are added to PYTHONPATH for the dse exec command, adding support for Jupyter integration (see the sketch after this list). (DSP-16797)

• DSE custom strategies allowed in Spark Structured Streaming. (DSP-16856)

• dse spark-submit kill and status commands support an optional explicit master address. (DSP-16910, DSP-16991)

• Jetty 9.4.11 upgrade addresses security vulnerabilities in Spark dependencies packaged with DSE. (DSP-16893)

# Jetty Http Utility CVE-2017-7656

# Jetty Http Utility CVE-2017-7657

# Jetty Http Utility CVE-2017-7658

# Jetty Server Core CVE-2018-12538

# Jetty Utilities CVE-2018-12536
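
A minimal sketch of the Jupyter integration mentioned above, assuming Jupyter is already installed on the node; dse exec wraps the command with the DSE Spark environment, including PYTHONPATH:

dse exec jupyter notebook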

Resolved issues:

• A Spark application can be registered twice in rare instances. (DSP-15247)

• Problems with temporary and data directories for Spark applications. (DSP-15476, DSP-15880)


# DSE client applications, like Spark, will not start if user HOME environment variable is not defined, user
home directory does not exist, or the current user does not have write permissions.

# Temporary data directory for AOSS is /var/log/spark/rdd, the same as the server-side temporary
data location for Spark. Configurable with SPARK_EXECUTOR_DIRS environment variable in spark-
env.sh.

# If TMPDIR environment variable is missing, /tmp is set for all DSE apps. If /tmp directory does not exist, it is created with 1777 permissions. If directory creation fails, DSE performs a hard stop.

• Improved security isolates Spark applications; prevents run_as runner for Spark from running a malicious
program. (DSP-16093)

• Spark Cassandra Connector method saveToCassandra should not require solr_query column when search
is enabled. (DSP-16427)

• Cassandra streaming sink doesn't work with some sources. (DSP-16635)

• cassandra nonsuperuser gets dsefs AccessDeniedException due to Insufficient permissions. (DSP-16713)

• DSE Spark logging does not match OSS Spark logging levels. (DSP-16726)

• Metastore can't handle table with 100+ columns with auto Spark SQL table creation. (DSP-16742)

• DseDirectJoin and reading from Hive Tables does not work in Spark Structured Streaming. (DSP-16856)

• Fully qualified paths with resource URL are resolved in Spark structured streaming checkpointing. Backport
SPARK-20894. (DSP-16972)

• AlwaysOn SQL (AOSS) dsefs directory creation does not wait for all operations to finish before closing
DSEFS client. (DSP-16997)

6.0.3 DSEFS

Changes and enhancements:

• Improved error message when performing an operation on a corrupted path. (DSP-16340)

• Only superusers are able to remove corrupted non-empty directories when authentication is enabled for
DSEFS. (DSP-16340)

Resolved issues:

• 8 ms timeout failure when a data directory is removed. (DSP-16645)

• In DSEFS shell, listing too many local file system directories in a single session causes a file descriptor leak.
(DSP-16657)

• DSEFS fails to start when there is a table with duration type or other type DSEFS can't understand.
(DSP-16825)

• DSEFS Hadoop layer doesn't properly translate DSEFS exceptions to Hadoop exceptions in some methods.
(DSP-16933)

• Closing DSEFS client before all issued requests are completed causes unexpected message type:
DefaultLastHttpContent error. (DSP-16953)

• Under high loads, DSEFS reports temporary incorrect state for various files/directories. (DSP-17178)

6.0.3 DSE Graph

Changes and enhancements:


• Maximum evaluation timeout limit is 1094 days. (DSP-16709)

# Gremlin evaluation_timeout parameter:

schema.config().option('graph.traversal_sources.g.evaluation_timeout').set(Duration.ofDays(1094))

# dse.yaml options: analytic_evaluation_timeout, realtime_evaluation_timeout

• Default write consistency level (CL) for Graph is LOCAL_QUORUM. (DSP-17140)
In earlier DSE versions, the default QUORUM write consistency level (CL) was not appropriate for multi-datacenter production environments.

Known issue:

• Point-in-polygon queries no longer work without JTS. (DSP-17284)
Although point-in-polygon queries previously worked without JTS, the queries used a Cartesian coordinate system implementation that did not understand the dateline. For best results, install JTS. See Spatial queries with polygons require JTS.

Resolved issues:

• Align query behavior using geo.inside() predicate for polygon search with and without search indexes.
(DSP-16108)

• Avoid looping indefinitely when a thread making internode requests is interrupted while trying to acquire a
connection. (DSP-16544)

• Setting graph.traversal_sources.g.evaluation_timeout breaks graph. (DSP-16709)

• Deleting a search index that was defined inside a graph fails. (DSP-16765)

6.0.3 DSE Search

Changes and enhancements:

• Reduce the number of unique token selections for distributed searches with vnodes. (DSP-14189)
Search load balancing strategies are per search index (per core) and are set with dsetool set_core_property.

• Log fewer messages at INFO level in TTLIndexRebuildTask. (DSP-15600)

• Avoid unnecessary exception and error creation in the Solr query parser. (DSP-17147)

Resolved issues:

• Avoid accumulating redundant router state updates during schema disagreement. (DSP-15615)

• Should not allow search index rebuild during drain. (DSP-16504)

• NRT codec is not registered at startup for Solr cores that have switched to RT. (DSP-16663)

• Dropping search index when index build is in progress can interrupt Solr core closure. (DSP-16774)

• Exceptions thrown when search is enabled and table is not found in existing keyspace. (DSP-16834)

• DSE should not start without appropriate Tomcat JAR scanning exclusions. (DSP-16841)

• CQL single-pass queries have incorrect results when query is run with primary key and search index
schema does not contain all columns in selection. (DSP-16895)
Best practice: For optimal single-pass queries, including queries where solr_query is used with a partition
restriction, and queries with partition restrictions and a search predicate, ensure that the columns to
SELECT are not indexed in the search index schema.


Workaround: Since auto-generation indexes all columns by default, you can ensure that the field is not
indexed but still returned in a single-pass query. For example, this statement indexes everything except
for column c3, and informs the search index schema about column c3 for efficient and correct single-pass
queries.

CREATE SEARCH INDEX ON test_search.abc WITH COLUMNS * { indexed : true }, c3 { indexed : false };

• Node health score of 1 is not obtainable. Search node gets stuck at 0.00 node health score after replacing a
node in a cluster. (DSP-17107)

General upgrade advice for DSE 6.0.3


DataStax Enterprise 6.0.3 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.3
DataStax Enterprise (DSE) 6.0.3 includes TinkerPop 3.3.3 and all enhancements from earlier DSE versions.

DSE 6.0.2 release notes


19 July 2018

• 6.0.2 Components

• DSE 6.0.2 Highlights

• DSE 6.0.2 Known issues

• General upgrade advice for DSE 6.0.2

• TinkerPop changes for DSE 6.0.2

Table 8: DSE functionality


6.0.2 DSE core
6.0.2 DSE Analytics
6.0.2 DSEFS
6.0.2 DSE Graph
6.0.2 DSE Search
DataStax Bulk Loader 1.1.0

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.


• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.2 Components

All components from DSE 6.0.2 are listed. Components that are updated for DSE 6.0.2 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2321 *

• Apache Spark™ 2.2.1.2

• Apache Tomcat® 8.0.47

• DataStax Bulk Loader 1.1.0 *

• DSE Java Driver 1.6.5

• Netty 4.1.13.11.dse

• Spark Jobserver 0.8.0.45 DSE custom version

• TinkerPop 3.3.3 with additional production-certified changes *

DataStax Enterprise 6.0.2 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from earlier DSE versions.

DSE 6.0.2 Highlights

High-value benefits of upgrading to DSE 6.0.2 include these highlights:


DSE Analytics and DSEFS

• Fixed issue where CassandraConnectionConf creates excessive database connections and reports too
many HashedWheelTimer instances. (DSP-16365)

DSE Graph

• Fixed several edge cases of using search indexes. (DSP-14802, DSP-16292)

DSE Search

• Search index permissions can be applied at the keyspace level. (DSP-15385)

• Schemas with stored=true work because stored=true is ignored. The workaround for 6.0.x upgrades with
schema.xml fields with “indexed=false, stored=true, docValues=true” is no longer required. (DSP-16392)

• Minor bug fixes and error handling improvements. (DSP-16435, DSP-16061, DSP-16078)

6.0.2 DSE core

Changes and enhancements:

• sstableloader supports custom config file locations (see the sketch after this list). (DSP-16092)

• New -d option creates local encryption keys without configuring the directory in dse.yaml. (DSP-15380)
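
For example, a sketch of pointing sstableloader at a custom cassandra.yaml with the -f flag (paths and host are illustrative):

sstableloader -f /etc/custom/cassandra.yaml -d 10.10.1.1 /var/lib/cassandra/data/ks1/table1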

Resolved issues:

• Show delegated snitch in nodetool describecluster. (DB-2057)

• Use more precise grep patterns to prevent accidental matches in cassandra-env.sh. (DB-2114)


• Add missing equality sign to SASI schema snapshot. (DB-2129)

• For tables using DSE Tiered Storage, nodetool cleanup places cleaned SSTables in the wrong tier.
(DB-2173)

• Support creating system keys before the output directory is configured in dse.yaml. (DSP-15380)

• Client prepared statements are not populated in system.prepared_statements table. (DSP-15900)

• Improved compatibility with external tables stored in the DSE Metastore in remote systems. (DSP-16561)

DSE 6.0.2 Known issue:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.

6.0.2 DSE Analytics

Changes and enhancements:

• Apache Hadoop Azure libraries for Hadoop 2.7.1 have been added to the Spark classpath to simplify
integration with Microsoft Azure and Microsoft Azure Blob Storage. (DSP-15943)

• AlwaysOn SQL (AOSS) improvements:

# AlwaysOn SQL (AOSS) support for enabling Kerberos and SSL at the same time. (DSP-16087)

# Add 120 seconds wait time so that Spark Master recovery process completes before status check of
AlwaysOn SQL (AOSS) app. (DSP-16249)

# AlwaysOn SQL (AOSS) driver continually runs on a node even when DSE is down. (DSP-16297)

# AlwaysOn SQL (AOSS) binds to native_transport_address. (DSP-16469)

# Improved defaults and errors for AlwaysOn SQL (AOSS) workpool. (DSP-16343)

Resolved issues:

• CassandraConnectionConf creates excessive database connections and reports too many HashedWheelTimer instances. (DSP-16365)

• Disabled cluster object JMX metrics reporting to prevent count exception spam in the Spark driver log. (DSP-16442)

• Fixed Spark-Connector dependencies and published SparkBuildExamples. (DSP-16699)

6.0.2 DSEFS

Changes and enhancements:

• DSEFS operations: chown, chgrp, and chmod support recursive (-R) and verbose (-v) flags (see the example after this list). (DSP-14238)


• Client and internode connection improvements. (DSP-14284, DSP-16065)

# DSEFS clients close idle connections after 60 seconds, configurable in dse.yaml.

# Idle DSEFS internode connections are closed after 120 seconds. Configurable with new dse.yaml
option internode_idle_connection_timeout_ms.

# Configurable connection pool with core_max_concurrent_connections_per_host.

• Improvements to DataSourceInputStream remove possible lockup. (DSP-16409)

# If the second read is issued after a failed read, it is not blocked forever. The stream is automatically
closed on errors, and subsequent reads will fail with IllegalStateException.

# The timeout message includes information about the underlying DataSource object.

# No more reads are issued to the underlying DataSource after it reports hasMoreData = false.

# The read loop has been simplified to properly move to the next buffer if the requested number of bytes
hasn't been delivered yet.

# Empty buffer returned from the DataSource when hasMoreData = true is not treated as an EOF. The
read method validates offset and length arguments.

• Security improvement: DSEFS uses an isolated native memory pool for file data and metadata sent between
nodes. This isolation makes it harder to exploit potential memory management bugs. (DSP-16492)
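
For example, a sketch of the new recursive and verbose flags from the DSEFS shell (the path and mode are illustrative):

dse fs 'chmod -R -v 755 /data'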

Resolved issues:

• DSEFS silently fails when TCP port 5599 is not open between nodes. (DSP-16101)

6.0.2 DSE Graph

Changes and enhancements:

• Vertices and vertex properties created or modified with graphframes respect TTL as defined in the schema.
In earlier versions, vertices and vertex properties had no TTL. Edges created or modified with graphframes
continue to have no TTL. (DSP-15555)

• Improved Gremlin console authentication configuration. (DSP-9905)

Resolved issues:

• A value of 0 (zero) for the maximum number of errors before abort is not treated as unlimited. (DGL-307)

• Search indexes are broken for multi cardinality properties. (DSP-14802)

• DGF interceptor does not take into account GraphStep parameters with g.V(id) queries. (DSP-16172)

• The LIMIT clause does not work in a graph traversal with search predicate TOKEN, returning only a subset of expected results. (DSP-16292)

6.0.2 DSE Search

Changes and enhancements:

• The node health option uptime_ramp_up_period_seconds default value in dse.yaml is reduced to 3 hours (10800 seconds); see the configuration sketch after this list. (DSP-15752)

• CQL solr_query supports Solr facet heatmaps. (DSP-16404)

• Improved handling of asynchronous I/O timeouts during search read-before-write. (DSP-16061)


• Schemas with stored=true work because stored=true is ignored. (DSP-16392)

• Use monotonically increasing time source for search query execution latency calculation. (DSP-16435)
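
A minimal dse.yaml sketch of setting this option explicitly, assuming it lives under node_health_options as in the default dse.yaml (the value shown is the new default):

node_health_options:
    uptime_ramp_up_period_seconds: 10800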

Resolved issues:

• Search index permissions can be applied at keyspace level. (DSP-15835)

• The encryptors thread cache in ThreadLocalIndexEncryptionConfiguration leaves entries in the cache. (DSP-16078)

• Classpath conflict between Lucene and SASI versions of Snowball. (DSP-16116)

• Indexing fails if fields have indexed=false, stored=true, and docValues=true. (DSP-16392)

DataStax Bulk Loader 1.1.0

Changes and enhancements:

• DataStax Bulk Loader (dsbulk) version 1.1.0 is automatically installed with DataStax Enterprise 6.0.2, and
can also be installed as a standalone tool. See DataStax Bulk Loader 1.1.0 release notes. (DSP-16484)

General upgrade advice for DSE 6.0.2


DataStax Enterprise 6.0.2 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.2
DataStax Enterprise (DSE) 6.0.2 includes these production-certified enhancements to TinkerPop 3.3.3:

• Implemented TraversalSelectStep, which allows select() of runtime-generated keys.

• Coerced BulkSet to g:List in GraphSON 3.0.

• Deprecated CredentialsGraph DSL in favor of CredentialsTraversalDsl, which uses the recommended method for Gremlin DSL development.

• Allowed iterate() to be called after profile().

• Fixed regression issue where the HTTPChannelizer doesn’t instantiate the specified
AuthenticationHandler.

• Defaulted GLV tests for gremlin-python to run for GraphSON 3.0.

• Fixed a bug with Tree serialization in GraphSON 3.0.

• In gremlin-python, the GraphSON 3.0 g:Set type is now deserialized to List.

DSE 6.0.1 release notes


5 June 2018

• 6.0.1 Components

• 6.0.1 Highlights

• DSE 6.0.1 Known issues

• Cassandra enhancements for DSE 6.0.1

• General upgrade advice DSE 6.0.1

• TinkerPop changes for 6.0.1


Table 9: DSE functionality


6.0.1 DSE core
6.0.1 DSE Analytics
6.0.1 DSEFS
6.0.1 DSE Graph
6.0.1 DSE Search
DataStax Bulk Loader 1.0.2

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

6.0.1 Components

All components from DSE 6.0.1 are listed. Components that are updated for DSE 6.0.1 are indicated with an
asterisk (*).

• Apache Solr™ 6.0.1.1.2295 *

• Apache Spark™ 2.2.1.2 *

• Apache Tomcat® 8.0.47

• DataStax Bulk Loader 1.0.2 *

• DSE Java Driver 1.6.5

• Netty 4.1.13.11.dse

• Spark Jobserver 0.8.0.45 DSE custom version *

• TinkerPop 3.3.3 with additional production-certified changes *

DSE 6.0.1 is compatible with Apache Cassandra™ 3.11 and adds additional production-certified enhancements.

DSE 6.0.1 Highlights

High-value benefits of upgrading to DSE 6.0.1 include these highlights:


DataStax Enterprise core

• Fix binding JMX to any address. (DB-2081)

• DataStax Bulk Loader 1.0.2 is bundled with DSE 6.0.1. (DSP-16206)


DSE Analytics and DSEFS

• Upgrade to Spark 2.2.1 for bug fixes.

• Fixed issue where multiple Spark Masters can be started on the same machine. (DSP-15636)

• Improved Spark Master discovery and reliability. (DSP-15801, DSP-14405)

• Improved AlwaysOn SQL (AOSS) startup reliability. (DSP-15871, DSP-15468, DSP-15695, DSP-15839)

• Resolved the missing /tmp directory in DSEFS after fresh cluster installation. (DSP-16058)

• Fixed handling of Parquet files with partitions. (DSP-16067)

• Fixed the HashedWheelTimer leak in Spark Connector that affected BYOS. (DSP-15569)

DSE Search

• Fix for the known issue that prevented using TTL (time-to-live) with DSE Search live indexing (RT indexing).
(DSP-16038, DSP-14216)

• Addresses security vulnerabilities in libraries packaged with DSE. (DSP-15978)

• Fix for using faceting with non-zero offsets. (DSP-15946)

• Fix for ORDER BY clauses in native CQL syntax. (DSP-16064)

DSE 6.0.1 Known issues:

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.

• dsetool does not work when native_transport_interface is set in cassandra.yaml. (DSP-16796)
Workaround: Use native_transport_interface_prefer_ipv6 instead.

6.0.1 DSE core

Changes and enhancements:

• Improved NodeSync usability with secure environments. (DB-2034)

• sstableloader supports custom config file locations. (DSP-16092)

• LDAP tuning parameters allow all LDAP connection pool options to be set. (DSP-15948)

Resolved issues:

• Use the indexed item type as backing table key validator of 2i on collections. (DB-1121)

• Add getConcurrentCompactors to JMX in order to avoid loading DatabaseDescriptor to check its value in
nodetool. (DB-1730)

• Send a final error message when a continuous paging session is cancelled. (DB-1798)


• Ignore empty counter cells on digest calculation. (DB-1881)

• Apply view batchlog mutation parallel with local view mutations. (DB-1900)

• Use same IO queue depth as Linux scheduler and advise against overriding it. (DB-1909)

• Fix startup error message rejecting COMPACT STORAGE after upgrade. (DB-1916)

• Improve user warnings on startup when libaio package is not installed. (DB-1917)

• Avoid copy-on-heap when flushing. (DB-1916)

• Set MX4J_ADDRESS to 127.0.0.1 if not explicitly set. (DB-1950)

• Prevent OOM due to OutboundTcpConnection backlog by dropping request messages after the queue
becomes too large. (DB-2001)

• Fix exception in trace log messages of non-frozen user types. (DB-2005)

• Limit max cached direct buffer on NIO to 1 MB. (DB-2028)

• Reusing table ID with CREATE TABLE causes failure on restart. (DB-2032)

• BulkLoader class exits without printing the stack trace. (DB-2033)

• Fix binding JMX to any address. (DB-2081)

• sstableloader does not decrypt passwords using config encryption in DSE. (DSP-13492)

• dse client-tool help doesn't work if ~/.dserc file exists. (DSP-15869)

6.0.1 DSE Analytics

• The Spark Jobserver demo has an incorrect version for the Spark Jobserver API. (DSP-15832)
Workaround: In the demo's gradle.properties file, change the version from 0.6.2 to 0.6.2.238.

Changes and enhancements:

• Decreased the number of exceptions logged during master move from node to node. (DSP-14405)

• When querying a remote cluster from a Spark job, the connector does not route requests to data replicas. (DSP-15202)

• Long CassandraRDD.where clauses throw StackOverflow exceptions. (DSP-15438)

• AlwaysOn SQL dependency on JPS is removed. The jps_directory entry in dse.yaml is removed.
(DSP-15468)

• Improved AlwaysOn SQL configuration. (DSP-15734)

• Improved security for Spark JobServer. All uploaded JARs, temporary files, and logs are created under the
current user's home directory: ~/.spark-jobserver. (DSP-15832)

• Improved process scanning for AlwaysOn SQL driver. (DSP-15839)

• In Portfolio demo, pricer is no longer required to be run with sudo. (DSP-15970)

• Scala 2.10 in BYOS is no longer supported. (DSP-15999)

• Improved validation for authentication configuration for AlwaysOn SQL. (DSP-16018)

• Optimize memoizing converters for UDTs. (DSP-16121)

• During misconfigured cluster bootstrap, the AlwaysOn SQL server does not start due to missing /tmp/hive directory in DSEFS. (DSP-16058)


Resolved issues:

• A shard request timeout caused an assertion error from Lucene getNumericDocValues in the log.
(DSP-14216)

• Multiple Spark Masters can be started on the same machine. (DSP-15636)

• Do not start AlwaysOn SQL until Spark Master is ready. (DSP-15695)

• DSE client tool returns wrong Spark Master address. (DSP-15801)

• In some situations, AlwaysOn SQL cannot start unless DSE node is restarted. (DSP-15871)

• Portfolio demo does not work on package installs. (DSP-15970)

• Java driver in Spark Connector uses daemon threads to prevent shutdown hooks from being blocked by
driver thread pools. (DSP-16051)

• dse client-tool spark sql-schema --all exports definitions for solr_admin keyspace. (DSP-16073)

• HashedWheelTimer leak in Spark Connector, affecting BYOS. (DSP-15569)

6.0.1 DSEFS

Resolved issues:

• Can't quote file patterns in DSEFS shell. (DSP-15550)

6.0.1 DSE Graph

Changes and enhancements:

• DseGraphFrame performance improvement reduces number of joins for count() and other id only queries.
(DSP-15554)

• Performance improvements for traversal execution with Fluent API and script-based executions.
(DSP-15686)

Resolved issues:

• edge_threads and vertex_threads can end up being 0. (DGL-305)

• When using graph frames, cannot upload edges when ids for vertices are complex non-text ids.
(DSP-15614)

• CassandraHiveMetastore is prevented from adding multiple partitions for file-based data sources. Fixes
MSCK REPAIR TABLE command. (DSP-16067)

6.0.1 DSE Search

Changes and enhancements:

• Output Solr foreign filter cache warning only on classes other than DSE classes. (DSP-15625)

• Solr security upgrades bundle. (DSP-15978)

# Apache Directory API All: CVE-2015-3250

# Apache Hadoop Common: CVE-2016-5393, CVE-2017-15713, CVE-2016-3086

# Apache Tika parsers: CVE-2018-1339


# Bouncy Castle Provider: CVE-2018-5382

# Data Mapper for Jackson: CVE-2018-5968, CVE-2017-17485, CVE-2017-15095, CVE-2018-7489,


CVE-2018-5968, CVE-2017-7525

# Guava: Google Core Libraries for Java: CVE-2018-10237

# Simple XML: CVE-2017-1000190

# Xerces2-j: CVE-2013-4002

# uimaj-core: CVE-2017-15691

Resolved issues:

• Offline sstable tools fail if a DSE Search index is present on a table. (DSP-15628)

• HTTP read on solr_stress doesn't inject random data into placeholders. (DSP-15727)

• Servlet container shutdown (Tomcat) prematurely stops logback context. (DSP-15807)

• ERROR 500 on distributed http json.facet with non-zero offset. (DSP-15946)

• Search index TTL Expiration thread loops without effect with live indexing (RT indexing). (DSP-16038)

• Search incorrectly assumes only single-row ORDER BY clauses on first clustering key. (DSP-16064)

DataStax Bulk Loader 1.0.2

• DataStax Bulk Loader 1.0.2 is bundled with DSE 6.0.1. (DSP-16206)

DataStax recommends using the latest DataStax Bulk Loader 1.2.0. For details, see DataStax Bulk Loader.
Cassandra enhancements for DSE 6.0.1
DataStax Enterprise 6.0.1 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements
from earlier releases, and adds these production-certified changes:

• cassandra-stress throws NPE if insert section isn't specified in user profile (CASSANDRA-14426)

• nodetool listsnapshots is missing local system keyspace snapshots (CASSANDRA-14381)

• Remove string formatting lines from BufferPool hot path (CASSANDRA-14416)

• Detect OpenJDK jvm type and architecture (CASSANDRA-12793)

• Don't use guava collections in the non-system keyspace jmx attributes (CASSANDRA-12271)

• Allow existing nodes to use all peers in shadow round (CASSANDRA-13851)

• Fix cqlsh to read connection.ssl cqlshrc option again (CASSANDRA-14299)

• Downgrade log level to trace for CommitLogSegmentManager (CASSANDRA-14370)

• CQL fromJson(null) throws NullPointerException (CASSANDRA-13891)

• Serialize empty buffer as empty string for json output format (CASSANDRA-14245)

• Cassandra not starting when using enhanced startup scripts in windows (CASSANDRA-14418)

• Fix progress stats and units in compactionstats (CASSANDRA-12244)

• Better handle missing partition columns in system_schema.columns (CASSANDRA-14379)

• Deprecate background repair and probabilistic read_repair_chance table options (CASSANDRA-13910)


• Delay hints store excise by write timeout to avoid race with decommission (CASSANDRA-13740)

• Add missed CQL keywords to documentation (CASSANDRA-14359)

• Avoid deadlock when running nodetool refresh before node is fully up (CASSANDRA-14310)

• Handle all exceptions when opening sstables (CASSANDRA-14202)

• Handle incompletely written hint descriptors during startup (CASSANDRA-14080)

• Handle repeat open bound from SRP in read repair (CASSANDRA-14330)

• CqlRecordReader no longer quotes the keyspace when connecting, as the java driver will
(CASSANDRA-10751)

• Fix compaction failure caused by reading un-flushed data (CASSANDRA-12743)

• Fix JSON queries with IN restrictions and ORDER BY clause (CASSANDRA-14286)

• Check checksum before decompressing data (CASSANDRA-14284)

General upgrade advice DSE 6.0.1


DataStax Enterprise 6.0.1 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for 6.0.1
DataStax Enterprise (DSE) 6.0.1 includes these production-certified enhancements to TinkerPop 3.3.3:

• Performance enhancement to Bytecode deserialization. (TINKERPOP-1936)

• Path history isn't preserved for keys in mutations. (TINKERPOP-1947)

• Traversal construction performance enhancements (TINKERPOP-1950)

• Bump to Groovy 2.4.15 - resolves a Groovy bug preventing Lambda creation in GLVs in some cases.
(TINKERPOP-1953)

DSE 6.0.0 release notes


17 April 2018

• 6.0.0 Components

• 6.0 New features

• Cassandra enhancements for DSE 6.0

• General upgrade advice for DSE 6.0.0

• TinkerPop changes for DSE 6.0.0

Table 10: DSE functionality


• 6.0.0 DSE core

• 6.0.0 DSE Advanced Replication

• 6.0.0 DSE Analytics

• 6.0.0 DSEFS

• 6.0.0 DSE Graph

• 6.0.0 DSE Search

• 6.0.0 DataStax Studio

• DataStax Bulk Loader 1.0.1

DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.


The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:

• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.

• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.

• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.

• DataStax recommends 16 or more logical cores for Advanced Performance nodes.

DSE 6.0.0: Do not use TTL (time-to-live) with DSE Search live indexing (RT indexing). To use these features
together, upgrade to DSE 6.0.1. (DSP-16038)

6.0.0 Components

• Apache Solr™ 6.0.1.1.2234

• Apache Spark™ 2.2.0.14

• Apache Tomcat® 8.0.47

• DataStax Bulk Loader 1.0.1

• DSE Java Driver 1.6.5

• Netty 4.1.13.11.dse

• Spark Jobserver 0.8.0.44 (DSE custom version)

• TinkerPop 3.3.2 with additional production-certified changes

DSE 6.0 is compatible with Apache Cassandra™ 3.11 and adds additional production-certified enhancements.

6.0 New features

See DataStax Enterprise 6.0 new features.

6.0.0 DSE core

Experimental features. These features are experimental and are not supported for production:

• SASI indexes.

• DSE OpsCenter Labs features in OpsCenter.

Known issues:

• sstableloader incorrectly detects keyspace when working with snapshots. (DB-2649)


Workaround: create a directory that matches the keyspace name, and then create a symbolic link in that
directory, named for the destination table, that points to the snapshot directory. For example:

$ mkdir -p /var/tmp/keyspace1
$ ln -s <path>/cassandra/data/keyspace1/standard1-0e65b961deb311e88daf5581c30c2cd4/snapshots/data-load /var/tmp/keyspace1/standard1

• Possible data loss when using DSE Tiered Storage. (DB-3404)


If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.

• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.

• DSE 6.0 will not start with OpsCenter 6.1 installed. OpsCenter 6.5 is required for managing DSE 6.0
clusters. See DataStax OpsCenter compatibility with DSE. (DSP-15996)

Changes and enhancements:

Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading to DSE 6.0, you
must migrate all tables that have COMPACT STORAGE to CQL table format.
Upgrades from DSE 5.0.x or DSE 5.1.x with Thrift-compatible tables require DSE 5.1.6 or later or DSE 5.0.12
or later.

• For TWCS, flush to separate SSTables based on write time. (DB-42)

• Allow aggregation by time intervals and allow aggregates in GROUP BY results. (DB-75)

• Allow user-defined functions (UDFs), including non-deterministic UDFs, within the GROUP BY clause. New
CQL keywords: DETERMINISTIC and MONOTONIC. The enable_user_defined_functions_threads option in
cassandra.yaml keeps its default behavior of true; set it to false to use UDFs in GROUP BY clauses. (DB-672)
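For illustration, a minimal sketch of the DETERMINISTIC keyword in a CQL function definition (the keyspace,
function name, and body are hypothetical):

CREATE OR REPLACE FUNCTION cycling.flog (input double)
  CALLED ON NULL INPUT
  RETURNS double
  DETERMINISTIC
  LANGUAGE java
  AS 'return Double.valueOf(Math.log(input.doubleValue()));';

Here DETERMINISTIC marks the function as always returning the same result for the same input.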

• Improved architecture with Thread Per Core (TPC) asynchronous read and write paths. (DB-707)
New DSE start-up parameters:

# -Ddse.io.aio.enable

# -Ddse.io.aio.force

Observable metrics with nodetool tpstats.

• New options in cassandra.yaml. (DB-111, DB-707, DB-945, DB-1381, DB-1656)

# aggregated_request_timeout_in_ms

# batchlog_endpoint_strategy to improve batchlog endpoint selection. (DB-1367)

# client_timeout_sec, cancel_timeout_sec, file_cache_size_in_mb, tpc_cores, tpc_io_cores, io_global_queue_depth

# The rpc_* properties are deprecated and renamed to native_transport_*. (DB-1130)

# streaming_connections_per_host

# key_cache_* settings are no longer used in new SSTable format, but retained to support existing
SSTable format


• Removed options in cassandra.yaml:

# buffer_pool_use_heap_if_exhausted, concurrent_counter_writes, concurrent_materialized_view_writes,
concurrent_reads, concurrent_writes, credentials_update_interval_in_ms,
credentials_validity_in_ms, max_client_wait_time_ms, max_threads, native_transport_max_threads,
otc_backlog_expiration_interval_ms, request_scheduler.

# Deprecated options:

  Deprecated option             Replaced with
  rpc_address                   native_transport_address
  rpc_interface                 native_transport_interface
  rpc_interface_prefer_ipv6     native_transport_interface_prefer_ipv6
  rpc_port                      native_transport_port
  broadcast_rpc_address         native_transport_broadcast_address
  rpc_keepalive                 native_transport_keepalive

• Default value changes in cassandra.yaml:

# batch_size_warn_threshold_in_kb: 64

# column_index_size_in_kb: 16

# memtable_flush_writers: 4

# roles_validity_in_ms: 120000 (2 minutes)

# permissions_validity_in_ms: 120000 (2 minutes)

• Legacy auth tables no longer supported. (DB-897)

• Authentication and authorization improvements. RLAC (setting row-level permissions) speed is improved.
(DB-909)

• Incremental repair is opt-in. (DB-1126)

• JMX exposed metrics for external dropped messages include COUNTER_MUTATION, MUTATION,
VIEW_MUTATION, RANGE_SLICE, READ, READ_REPAIR, LWT, HINTS, TRUNCATE, SNAPSHOT,
SCHEMA, REPAIR, OTHER. (DB-1127)

• By default, enable heap histogram logging on OutOfMemoryError. To disable, set the
cassandra.printHeapHistogramOnOutOfMemoryError system property to false. (DB-1498)

• After upgrade is complete and all nodes are on DSE 6.0 and the required schema change occurs,
authorization (CassandraAuthorizer) and audit logging (CassandraAuditWriter) enable the use of new
columns. (DB-1597)

• Automatic fallback of GossipingPropertyFileSnitch to PropertyFileSnitch (cassandra-topology.properties)
is disabled by default and can be enabled by using the
-Dcassandra.gpfs.enable_pfs_compatibility_mode=true startup flag. (DB-1663)

• Improved messages when mixing mutually exclusive YAML properties. (DB-1719)

• Background read-repair. (DB-1771)

• Authentication filters used in DSE Search moved to DSE core. (DSP-12531)

• The DataStax Installer is no longer supported. To upgrade from earlier versions that used the DataStax
Installer, see Upgrading to DSE 6.0 from DataStax Installer installations. For new installations, use a
supported installation method. (DSP-13640)


• Improved authentication and security. (DSP-14173)
Supporting changes:

# Allow granting/revoking multiple permissions in one statement. (DB-792)
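For example, a single CQL statement can now carry several permissions (the keyspace and role names
are hypothetical):

GRANT SELECT, MODIFY ON KEYSPACE cycling TO analyst;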

# Database administrators can manage role permissions without having access to the data. (DB-757)

# Filter rows from system keyspaces and system_schema tables based on user permissions. New
system_keyspaces_filtering option in cassandra.yaml returns information based on user access to
keyspaces. (DB-404)

# Removed cassandra.yaml options credentials_validity_in_ms and credentials_update_interval_in_ms.
For upgrade impact, see Upgrading from DataStax Enterprise 5.1 to 6.0. (DB-909)

# Warn when the cassandra superuser logs in. (DB-104)

# New metric for replayed batchlogs and trace-level logging include the age of the replayed batchlog.
(DB-1314)

# Decimals with a scale > 100 are no longer converted to a plain string to prevent
DecimalSerializer.toString() being used as an attack vector. (DB-1848)

# Auditing by role: new dse.yaml audit options included_roles and excluded_roles. (DSP-15733)
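A minimal dse.yaml sketch of role-based audit filtering, assuming these options sit under the existing
audit_logging_options section (the role names are hypothetical, and the exact value format may vary):

audit_logging_options:
    enabled: true
    # Audit activity only for these roles ...
    included_roles: dba1, dba2
    # ... or audit everything except these roles
    excluded_roles: svc_batch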

• libaio package dependency for DataStax Enterprise 6.0 installations on RHEL-based systems using Yum
and on Debian-based systems using APT. For optimal performance in tarball installations, DataStax
recommends installing the libaio package. (DSP-14228)

• DSE performance objects metrics changes in tables dse_perf.node_snapshot, dse_perf.cluster_snapshot,
and dse_perf.dc_snapshot. (DSP-14413)

# Metrics are populated in two new columns: background_io_pending and hints_pending.

# Metrics are not populated, -1 is written for columns: read_requests_pending, write_requests_pending,
completed_mutations, and replicate_on_write_tasks_pending.

• The default number of threads used by performance objects increased from 1 to 4. Upgrade restrictions
apply. (DSP-14515)

• All tables are created without COMPACT STORAGE. (DSP-14735)

• Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, migrate all
tables that have COMPACT STORAGE to CQL table format. DSE 6.0 will not start if COMPACT STORAGE
tables are present. See Upgrading from DSE 5.1.x or Upgrading from DSE 5.0.x. (DSP-14839)

• The minimum supported version of Oracle Java SE Runtime Environment 8 (JDK) is 1.8u151. (DSP-14818)

• sstabledump supports the -l option to output each partition as its own JSON object. (DSP-15079)
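For example (the SSTable path is hypothetical):

$ sstabledump -l <path>/data/mykeyspace/mytable-<id>/aa-1-bti-Data.db

With -l, each partition in the SSTable is emitted as its own JSON object rather than as one enclosing JSON array.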

• Audit improvements, new and changed filtering event categories. (DSP-15724)

• Upgrades to OpsCenter 6.5 or later are required before starting DSE 6.0. DataStax recommends upgrading
to the latest OpsCenter version that supports your DSE version. Check the compatibility page for your
products. (DSP-15996)

Resolved issues:

• Warn when the cassandra superuser logs in. (DB-104)

• Prevent multiple serializations of mutation. (DB-370)

• Internal implementation of paging by bytes. (DB-414)

• Connection refused should be logged less frequently. (DB-455)

• Refactor messaging service code. (DB-497)


• Change protocol to allow sending keyspace independent of query string. (DB-600)

• Add result set metadata to prepared statement MD5 hash calculation. (DB-608)

• Add DSE columns to system tables. (DB-716)

system.peers:
dse_version text,
graph boolean,
server_id text,
workload text,
workloads frozen<set<text>>

system.local:
dse_version text,
graph boolean,
server_id text,
workload text,
workloads frozen<set<text>>

• Fix LWT asserts for immutable TableMetadata. (DB-728)

• MigrationManager should use toDebugString() when logging TableMetadata. (DB-739)

• Create administrator roles who can carry out everyday administrative tasks without having unnecessary
access to data. (DB-757)

• When repairing Paxos commits, only block on nodes that are being repaired. (DB-761)

• Allow granting/revoking multiple permissions in one statement. (DB-792)

• SystemKeyspace.snapshotOnVersionChange() never called in production code. (DB-797)

• Error in counting iterated SSTables when choosing whether to defrag in timestamp ordered path. (DB-1018)

• Check for mismatched versions when answering schema pulls. (DB-1026)

• Expose ports (storage, native protocol, JMX) in system local and peers tables. (DB-1040)

• Rename ITrigger interface method from augment to augmentNonBlocking. (DB-1046)

• Load mapped buffer into physical memory after mlocking it for MemoryOnlyStrategy. (DB-1052)

• New STARTUP message parameters identify clients. (DB-1054)

• Emit client warning when a GRANT/REVOKE/RESTRICT/UNRESTRICT command has no effect. (DB-1083)

• Update bundled Python driver to 2.2.0.post0-d075d57. (DB-1152)

• Forbid advancing KeyScanningIterator before exhausting or closing the current iterator. (DB-1199)

• Ensure that empty clusterings with kind==CLUSTERING are Clustering.EMPTY. (DB-1248)

• New nodetool abortrebuild command stops a currently running rebuild operation. (DB-1234)
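For example, a plain invocation stops the rebuild running on the local node (shown here as a sketch; consult
the nodetool help for any additional options):

$ nodetool abortrebuild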

• Batchlog replays do not leverage remote coordinators. (DB-1337)

• Avoid copying EMPTY_STATIC_ROW to heap again with offheap memtable. (DB-1375)

• Allow DiskBoundaryManager to cache different directories. (DB-1454)

• Abort repair when there is only one node. (DB-1511)

• OutOfMemory during view update. (DB-1493)

• Drop response on view lock acquisition timeout and add ViewLockAcquisitionTimeouts metric. (DB-1522)

• Handle race condition on dropping keyspace and opening keyspace. (DB-1570)


• The JVM version check in conf/cassandra-env.sh does not work. (DB-1882)

• dsetool ring prints ERROR when data_file_directories is removed from cassandra.yaml. (DSP-13547)

• Driver: Jackson-databind is vulnerable to remote code execution (RCE) attacks. (DSP-13498)

• LDAP library issue. (DSP-15927)

6.0.0 DSE Advanced Replication

Changes and enhancements:

• Support for DSE Advanced Replication V1 is removed. For V1 installations, you must first upgrade to DSE
5.1.x and migrate your DSE Advanced Replication to V2, and then upgrade to DSE 6.0. (DSP-13376)

• Enhanced CLI security prevents injection attacks and sanitizes and validates the command line inputs.
(DSP-13682)

Resolved issues:

• Improve logging on unsupported operation failure and remove the failed mutation from replog. (DSP-15043)

• Channel creation fails with NPE when using mixed case destination name. (DSP-15538)

6.0.0 DSE Analytics

Experimental features. These features are experimental and are not supported for production:

• Importing graphs using DseGraphFrame.

Known issues:

• DSE Analytics: Additional configuration is required when enabling context-per-jvm in the Spark Jobserver.
(DSP-15163)

Changes and enhancements:

• Previously deprecated environment variables, including SPARK_CLASSPATH, are removed in Spark 2.2.0.
(DSP-8379)

• AlwaysOn SQL service, a HA (highly available) Spark SQL Thrift server. (DSP-10996)

# JPS is required for nodes with AlwaysOn SQL enabled.

# The spark_config_settings and hive_config_settings are removed from dse.yaml. The configuration is
provided in the spark-alwayson-sql.conf file in DSEHOME/resources/spark/conf with the same default
contents as DSEHOME/resources/spark/conf/spark-defaults.conf. (DSP-15837)

• Cassandra File System (CFS) is removed. Use DSEFS instead. Before upgrading to DSE 6.0, remove CFS
keyspaces. See the From CFS to DSEFS dev blog post. (DSP-12470)

• Optimization for SearchAnalytics with SELECT COUNT(*) and no predicates. (DSP-12669)

• Authenticate JDBC users to Spark SQL Thrift Server. Queries that are executed during JDBC session are
run as the user who authenticated through JDBC. (DSP-13395)

• Solr optimization is automatic; spark.sql.dse.solr.enabled is deprecated, use
spark.sql.dse.search.enableOptimization instead. (DSP-13398)

• Optimization for SearchAnalytics with SELECT COUNT(*) and no predicates. (DSP-13398)

• dse spark-beeline command is removed, use dse beeline instead. (DSP-13468)

• cfs-stress tool is replaced by fs-stress tool. (DSP-13549)


• Encryption for data stored on the server and encryption of Spark spill files is supported. (DSP-13841)

• Improved security with Spark. (DSP-13991)

• Spark local applications no longer use /var/lib/spark/rdd; instead, configure and use the .spark directory for
processes started by the user. (DSP-14380)

• Input metrics are not thread-safe and are not used properly in CassandraJoinRDD and
CassandraLeftJoinRDD. (DSP-14569)

• AlwaysOn SQL workpool option adds high availability (HA) for the JDBC or ODBC connections for analytics
node. (DSP-14719)

• CFS is removed. Before upgrade, move HiveMetaStore from CFS to DSEFS and update URL references.
(DSP-14831)

• Include SPARK-21494 to use correct app id when authenticating to external service. (DSP-14140)

• Upgrade to DSE 6.0 must be complete on all nodes in the cluster before Spark Worker and Spark Master
will start. (DSP-14735)

• Spark Cassandra Connector in DSE 6.0.0, has the following changes:

# Changes to default values: spark.output.concurrent.writes: 100, spark.task.maxFailures: 10 (DSP-15164)

# spark.cassandra.connection.connections_per_executor_max is removed; use the
new properties spark.cassandra.connection.local_connections_per_executor,
spark.cassandra.connection.remote_connections_per_executor_min, and
spark.cassandra.connection.remote_connections_per_executor_max. (DSP-15193)

# All Spark-related parameters are now camelCase. Parameters are case-sensitive. The snake_case versions
are automatically translated to the camelCase versions except when the parameters are used as table
options. In SparkSQL and with spark.read.options(...), the parameters are case-insensitive because of the
internal SQL implementation.

# The DSE artifact is com.datastax.dse:spark-connector:6.0.0.

# The DseSparkDependencies JAR is still required. (DSP-15694)

• Use NodeSync (continuous repair) and LOCAL_QUORUM for reading from Spark recovery storage.
(DSP-15219)
Supporting changes:

# Spark Master will not start until LOCAL_QUORUM is achieved for dse_analytics keyspace.

# Spark Master recovery data is first written with LOCAL_QUORUM; if that fails, the write is retried with
LOCAL_ONE. Recovery data is always queried with LOCAL_QUORUM (unlike previous versions of DSE,
which used LOCAL_ONE).

# DSE Analytics internal data moved from spark_system to dse_analytics keyspace.

DataStax strongly recommends enabling NodeSync for continuous repair on all tables in the
dse_analytics keyspace. NodeSync is required on the rm_shared_data keyspace that stores Spark
recovery information.

Resolved issues:

• DSE does not work with Spark Crypto based encryption. (DSP-14140)

6.0.0 DSEFS

Changes and enhancements:


• Wildcard characters are supported in DSEFS shell commands. (DSP-10583)

• DSEFS should support all DSE authentication schemes. (DSP-12956)

• Improved authorization security sets the default permission to 755 for directories and 644 for files. New
DSEFS clusters create the root directory / with 755 permission to prevent non-super users from modifying
root content; for example, by using mkdir or put commands. (DSP-13609)

• Enable SSL for DSEFS encryption. (DSP-13771)

• HTTP communication logging level changed from DEBUG to TRACE. (DSP-14400)

• DSEFS shell history has been moved to ~/.dse/dsefs_history. (DSP-15070)

• New tool to move hive metastore from CFS to DSEFS and update references.

• Add echo command to DSEFS shell. (DSP-15446)

• Changes in dse.yaml for advanced DSEFS settings.

• Alternatives wildcards are Hadoop compatible. (DSP-15249)

6.0.0 DSE Graph

Known issues:

• Dropping a property of vertex label with materialized view (MV) indices breaks graph. To drop a property
key for a vertex label that has a materialized view index, additional steps are required to prevent data loss or
cluster errors. See Dropping graph schema. (DSP-15532)

• Secondary indexes used for DSE Graph queries have higher latency in DSE 6.0 than in the previous
version. (DB-1928)

• Backup snapshots taken with OpsCenter 6.1 will not load to DSE 6.0. Use the backup service in OpsCenter
6.5 or later. (DSP-15922)

Changes and enhancements:

• Improved and simplified data batch loading of pre-formatted data. (DGL-235)


Supporting changes:

# Schema discovery and schema generation are deprecated. (DGL-246)

# Standard vertex IDs are deprecated. Use custom vertex IDs instead. (DSP-13485)

# Standard IDs are deprecated. (DGL-247)

# Transformations are deprecated. (DGL-248)

• Schema API changes: all .remove() methods are renamed to .drop() and schema.clear() is renamed to
schema.drop(). Schema API supports removing vertex/edge labels and property keys. Unify use of drop |
remove | clear in the Schema API and use .drop() everywhere. (DSP-8385, DSP-14150)
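A minimal Gremlin sketch of the renamed methods (the vertex label name is hypothetical):

// Previously schema.vertexLabel('person').remove()
schema.vertexLabel('person').drop()
// Previously schema.clear()
schema.drop()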

• Include materialized view (MV) indexes in query optimizer only if the MV was fully built. (DSP-10219)

• DSE profiling of graph statements from the gremlin shell. (DSP-13484)

• Improve Graph OLAP performance by smart routing of queries to the DseGraphFrame engine with
DseGraphFrameInterceptorStrategy. (DSP-13489)

• OSS TinkerPop 3.3 supports Spark 2.2. (DSP-13632)

• Partitioned vertex tables (PVT) are removed. (DSP-13676)


• Graph online analytical processing (OLAP) supports drop() with DseGraphFrame interceptor. Simple queries
can be used in drop operations. (DSP-13998)

• DSE Graph vertex and edge tables are accessible from SparkSQL and are automatically exposed in the
dse_graph SparkSQL database. (DSP-12046)

• More Gremlin APIs are supported in DSEGraphFrames: dedup, sort, limit, filter, as()/select(), or().
(DSP-13649)

• Some graph and gremlin_server properties in earlier versions of DSE are no longer required for DSE 6.0.
The default settings from the earlier versions of dse.yaml are preserved. These settings were removed from
dse.yaml.

# adjacency_cache_clean_rate

# adjacency_cache_max_entry_size_in_mb

# adjacency_cache_size_in_mb

# gremlin_server_enabled

# index_cache_clean_rate

# index_cache_max_entry_size_in_mb

# schema_mode - default schema_mode is production

# window_size

# ids (all vertex ID assignment and partitioning strategy options)

# various gremlin_server settings

If these properties exist in the dse.yaml file after upgrading to DSE 6.0, logs display warnings. You can
ignore these warnings or modify dse.yaml so that only the required graph system level and gremlin_server
properties are present. (DSP-14308)

• Spark Jobserver is the DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver
API in DataStax repository. (DSP-14152)

• Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. The
string formatting for vertices with text custom IDs has changed. (DSP-14710)
Supporting changes (DSP-15167):

# schema.describe() displays the entire schema, even if it contains illegal names.

# In-place upgrades allow existing schemas with invalid edge label names and property key names.

# Schema elements with illegal names cannot be uploaded or added.

• Invoking toString on a custom vertex ID containing a text property, or on an edge ID that is incident upon a
vertex with a custom vertex ID, now returns a value that encloses the text property value in double quotation
marks and escapes the value's internal double-quotes. This change protects older formats from irresolvable
parsing ambiguity. For example:

// old
{~label=v, x=foo}
{~label=w, x=a"b}
// new
{~label=v, x="foo"}
{~label=w, x="a""b"}

• Support for math()-step (math) to enable scientific calculator functionality within Gremlin. (DSP-14786)
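For example, a sketch along the lines of the TinkerPop math()-step documentation (the labels and property
names are hypothetical):

g.V().as('a').out('knows').as('b').
  math('a + b').by('age')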

• The GraphQueryThreads JMX attribute has been removed. Thread selection occurs with Thread Per Core
(TPC) asynchronous request processing architecture. (DSP-15222)


Resolved issues:

• spark.sql.hive.metastore.barrierPrefixes is set to org.apache.spark.sql.cassandra to properly use
CassandraConnector in the DSE HiveMetastore implementation. (DSP-14120)

• Intermittent KryoException: Buffer underflow error when running order by query in OLTP mode.
(DSP-12694)

• DseGraphFrame does not support infix and() and or(). (DSP-12013)

• Graph could be left in an inconsistent state if a schema migration fails. (DSP-15532)

• DseGraphFrames properties().count() step returns the vertex count instead of the multi-property count.
(DSP-15049)

• GraphSON parsing error prevents proper type detection under certain conditions. (DSP-14066)

6.0.0 DSE Search

Experimental features. These features are experimental and are not supported for production:

• The dsetool index_checks uses an Apache Lucene® experimental feature.

Known issues:

• Search index TTL Expiration thread loops without effect with live indexing (RT indexing). (DSP-16038)

Changes and enhancements:

• DSE Search is very I/O intensive. Performance is impacted by the Thread Per Core (TPC) asynchronous
read and write path architecture. (DB-707)
Before using DSE Search in DSE 6.0 and later, review and follow the DataStax recommendations:

# On search nodes, change the tpc_cores value from its default to the number of physical CPUs. Refer
to Tuning TPC cores.

# Disable AIO and set the file_cache_size_in_mb value to 512. Refer to Disabling AIO.

# Locate DSE Cassandra transactional data and Solr-based DSE Search data on separate Solid State
Drives (SSDs). Refer to Set the location of search indexes.

# Plan for sufficient memory resources and disk space to meet operational requirements. Refer to
Capacity planning for DSE Search.

• Writes are flushed to disk in segments that use a new Lucene codec that does not exist in earlier versions.
Unique key values are no longer stored as both docValues and Lucene stored fields. The unique key values
are now stored only as docValues in a new codec to store managed fields like Lucene. Downgrades to
versions earlier than DSE 6.0 are not supported. (DSP-8465)

• Document inserts and updates using HTTP are removed. Before upgrading, ensure you are using CQL for
all inserts and updates. (DSP-9725)

• DSENRTCachingDirectoryFactory is removed. Before upgrading, change your search index config.
(DSP-10126)

• The <dataDir> parameter in the solrconfig.xml file is not supported. Instead, follow the steps in Set the
location of search indexes. (DSP-13199)

• Improved performance by early termination of sorting. Ideal for queries that need only a few results returned
from a large number of total matches. (DSP-13253)

• Native CQL syntax for search queries. (DSP-13411)
Supporting changes:


# The default for CQL text type changed from solr.TextField to solr.StrField.

# Updated wikipedia demo syntax.

# enable_tokenized_text_copy_fields replaces enable_string_copy_fields in spaceSaving profiles.

# The spaceSavingNoTextfield resource generation profile is removed.

• Delete by id is removed. Delete by query no longer accepts wildcard queries, including queries that match
all documents (for example, <delete><query>*:*</query></delete>). Instead, use CQL to DELETE by
Primary Key or the TRUNCATE command. (DSP-13436)
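For example, instead of a wildcard delete-by-query, remove rows by primary key or truncate the table (the
keyspace, table, and key are hypothetical):

DELETE FROM mykeyspace.mytable WHERE id = 'doc-1';
TRUNCATE mykeyspace.mytable;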

• Search index config changes. (DSP-14137)

# mergePolicy, maxMergeDocs, and mergeFactor are no longer supported.

# RAM buffer size settings are no longer required in search index config. Global RAM buffer usage in
Lucene is governed by the memtable size limits in cassandra.yaml. RAM buffers are counted toward the
memtable_heap_space_in_mb.

• dsetool core_indexing_status --progress option is always true. (DSP-13465)

• The HTTP API for Solr core management is removed. Instead, use CQL commands for search index
management or dsetool search index commands. (DSP-13530)

• The Tika functionality bundled with Apache Solr is removed. Instead, use the stand-alone Apache Tika
project. (DSP-13892)

• Logging configuration improvements. (DSP-14137)

# The solrvalidation.log is removed. You can safely remove appender SolrValidationErrorAppender and
the logger SolrValidationErrorLogger from logback.xml. Indexing errors manifest as:

# failures at the coordinator if they represent failures that might succeed at some later point in time
using the hint replay mechanism

# as messages in the system.log if the failures are due to non-recoverable indexing validation errors
(for data that is written to the database, but not indexed properly)

• The DSE custom update request processor (URP) implementation is deprecated. Use the field input/output
(FIT) transformer API instead. (DSP-14360)

• The stored flag in search index schemas is deprecated and is no longer added to auto-generated schemas.
If the flag exists in custom schemas, it is ignored. (DSP-14425)

• Tuning improvements for indexing. (DSP-14785, DSP-14978)

# Indexing is no longer asynchronous. Document updates are written to the Lucene RAM buffer
synchronously with the mutation backing table.

# See Tuning search for maximum indexing throughput.

# The back_pressure_threshold_per_core in dse.yaml affects only index rebuilding/reindexing. DataStax
recommends not changing the default value of 1024.

# These options in dse.yaml are removed:

# enable_back_pressure_adaptive_nrt_commit

# max_solr_concurrency_per_core

# solr_indexing_error_log_options

DSE 6.0 will not start with these options present.


• StallMetrics MBean is removed. Before upgrading to DSE 6.0, change operators that use the MBean.
(DSP-14860)

• Optimize Paging when limit is smaller than the page size. (DSP-15207)

Resolved issues include all bug fixes up to DSE 5.1.8. Additional 6.0.0 fixes:

• Isolate Solr Resource Loading at startup to the Local DC. (DSP-10911)

DataStax Studio 6.0.0

• For use with DSE 6.0.x, DataStax Studio 6.0.0 is installed as a standalone tool. (DSP-13999, DSP-15623)

For details, see DataStax Studio 6.0.0 release notes.

DataStax Bulk Loader 1.0.1

• DataStax Bulk Loader (dsbulk) version 1.0.1 is automatically installed with DataStax Enterprise 6.0.0, and
can also be installed as a standalone tool. (DSP-13999, DSP-15623)

For details, see DataStax Bulk Loader 1.0.1 release notes.


Cassandra enhancements for DSE 6.0
DataStax Enterprise 6.0.0 is compatible with Apache Cassandra™ 3.11 and adds these production-certified
enhancements:

• Add DEFAULT, UNSET, MBEAN and MBEANS to `ReservedKeywords`. (CASSANDRA-14205)

• Add Unittest for schema migration fix (CASSANDRA-14140)

• Print correct snitch info from nodetool describecluster (CASSANDRA-13528)

• Close socket on error during connect on OutboundTcpConnection (CASSANDRA-9630)

• Enable CDC unittest (CASSANDRA-14141)

• Split CommitLogStressTest to avoid timeout (CASSANDRA-14143)

• Improve commit log chain marker updating (CASSANDRA-14108)

• Fix updating base table rows with TTL not removing view entries (CASSANDRA-14071)

• Reduce garbage created by DynamicSnitch (CASSANDRA-14091)

• More frequent commitlog chained markers (CASSANDRA-13987)

• RPM package spec: fix permissions for installed jars and config files (CASSANDRA-14181)

• More PEP8 compliance for cqlsh (CASSANDRA-14021)

• Fix support for SuperColumn tables (CASSANDRA-12373)

• Fix missing original update in TriggerExecutor (CASSANDRA-13894)

• Improve short read protection performance (CASSANDRA-13794)

• Fix counter application order in short read protection (CASSANDRA-12872)

• Fix MV timestamp issues (CASSANDRA-11500)

• Fix AssertionError in short read protection (CASSANDRA-13747)

• Gossip thread slows down when using batch commit log (CASSANDRA-12966)


• Allow native function calls in CQLSSTableWriter (CASSANDRA-12606)

• Copy session properties on cqlsh.py do_login (CASSANDRA-13847)

• Fix load over calculated issue in IndexSummaryRedistribution (CASSANDRA-13738)

• Obfuscate password in stress-graphs (CASSANDRA-12233)

• ReverseIndexedReader may drop rows during 2.1 to 3.0 upgrade (CASSANDRA-13525)

• Avoid reading static row twice from old format sstables (CASSANDRA-13236)

• Fix possible NPE on upgrade to 3.0/3.X in case of IO errors (CASSANDRA-13389)

• Add duration data type (CASSANDRA-11873)

• Properly report LWT contention (CASSANDRA-12626)

• Stress daemon help is incorrect (CASSANDRA-12563)

• Remove ALTER TYPE support (CASSANDRA-12443)

• Fix assertion for certain legacy range tombstone pattern (CASSANDRA-12203)

• Remove support for non-JavaScript UDFs (CASSANDRA-12883)

• Better handle invalid system roles table (CASSANDRA-12700)

• Upgrade netty version to fix memory leak with client encryption (CASSANDRA-13114)

• Fix trivial log format error (CASSANDRA-14015)

• Allow SSTabledump to do a JSON object per partition (CASSANDRA-13848)

• Remove unused and deprecated methods from AbstractCompactionStrategy (CASSANDRA-14081)

• Fix Distribution.average in cassandra-stress (CASSANDRA-14090)

• Presize collections (CASSANDRA-13760)

• Add GroupCommitLogService (CASSANDRA-13530)

• Parallelize initial materialized view build (CASSANDRA-12245)

• Fix flaky SecondaryIndexManagerTest.assert[Not]MarkedAsBuilt (CASSANDRA-13965)

• Make LWTs send resultset metadata on every request (CASSANDRA-13992)

• Fix flaky indexWithFailedInitializationIsNotQueryableAfterPartialRebuild (CASSANDRA-13963)

• Introduce leaf-only iterator (CASSANDRA-9988)

• Allow only one concurrent call to StatusLogger (CASSANDRA-12182)

• Refactoring to specialised functional interfaces (CASSANDRA-13982)

• Speculative retry should allow more friendly parameters (CASSANDRA-13876)

• Throw exception if we send/receive repair messages to incompatible nodes (CASSANDRA-13944)

• Replace usages of MessageDigest with Guava's Hasher (CASSANDRA-13291)

• Add nodetool command to print hinted handoff window (CASSANDRA-13728)

• Fix some alerts raised by static analysis (CASSANDRA-13799)

• Checksum SSTable metadata (CASSANDRA-13321, CASSANDRA-13593)

• Add result set metadata to prepared statement MD5 hash calculation (CASSANDRA-10786)


• Add incremental repair support for --hosts, --force, and subrange repair (CASSANDRA-13818)

• Refactor GcCompactionTest to avoid boxing (CASSANDRA-13941)

• Expose recent histograms in JmxHistograms (CASSANDRA-13642)

• Add SERIAL and LOCAL_SERIAL support for cassandra-stress (CASSANDRA-13925)

• LCS needlessly checks for L0 STCS candidates multiple times (CASSANDRA-12961)

• Correctly close netty channels when a stream session ends (CASSANDRA-13905)

• Update lz4 to 1.4.0 (CASSANDRA-13741)

• Throttle base partitions during MV repair streaming to prevent OOM (CASSANDRA-13299)

• Use compaction threshold for STCS in L0 (CASSANDRA-13861)

• Fix problem with min_compress_ratio: 1 and disallow ratio < 1 (CASSANDRA-13703)

• Add extra information to SASI timeout exception (CASSANDRA-13677)

• Rework CompactionStrategyManager.getScanners synchronization (CASSANDRA-13786)

• Add additional unit tests for batch behavior, TTLs, Timestamps (CASSANDRA-13846)

• Add keyspace and table name in schema validation exception (CASSANDRA-13845)

• Emit metrics whenever we hit tombstone failures and warn thresholds (CASSANDRA-13771)

• Allow changing log levels via nodetool for related classes (CASSANDRA-12696)

• Add stress profile yaml with LWT (CASSANDRA-7960)

• Reduce memory copies and object creations when acting on ByteBufs (CASSANDRA-13789)

• Simplify mx4j configuration (CASSANDRA-13578)

• Fix trigger example on 4.0 (CASSANDRA-13796)

• Force minimum timeout value (CASSANDRA-9375)

• Add bytes repaired/unrepaired to nodetool tablestats (CASSANDRA-13774)

• Don't delete incremental repair sessions if they still have sstables (CASSANDRA-13758)

• Fix pending repair manager index out of bounds check (CASSANDRA-13769)

• Don't use RangeFetchMapCalculator when RF=1 (CASSANDRA-13576)

• Don't optimise trivial ranges in RangeFetchMapCalculator (CASSANDRA-13664)

• Use an ExecutorService for repair commands instead of new Thread(..).start() (CASSANDRA-13594)

• Fix race / ref leak in anticompaction (CASSANDRA-13688)

• Fix race / ref leak in PendingRepairManager (CASSANDRA-13751)

• Enable ppc64le runtime as unsupported architecture (CASSANDRA-13615)

• Improve sstablemetadata output (CASSANDRA-11483)

• Support for migrating legacy users to roles has been dropped (CASSANDRA-13371)

• Introduce error metrics for repair (CASSANDRA-13387)


• Refactoring to primitive functional interfaces in AuthCache (CASSANDRA-13732)

• Update metrics to 3.1.5 (CASSANDRA-13648)

• batch_size_warn_threshold_in_kb can now be set at runtime (CASSANDRA-13699)

• Avoid always rebuilding secondary indexes at startup (CASSANDRA-13725)

• Upgrade JMH from 1.13 to 1.19 (CASSANDRA-13727)

• Upgrade SLF4J from 1.7.7 to 1.7.25 (CASSANDRA-12996)

• Default for start_native_transport now true if not set in config (CASSANDRA-13656)

• Don't add localhost to the graph when calculating where to stream from (CASSANDRA-13583)

• Allow skipping equality-restricted clustering columns in ORDER BY clause (CASSANDRA-10271)

• Use common nowInSec for validation compactions (CASSANDRA-13671)

• Improve handling of IR prepare failures (CASSANDRA-13672)

• Send IR coordinator messages synchronously (CASSANDRA-13673)

• Flush system.repair table before IR finalize promise (CASSANDRA-13660)

• Fix column filter creation for wildcard queries (CASSANDRA-13650)

• Add 'nodetool getbatchlogreplaythrottle' and 'nodetool setbatchlogreplaythrottle' (CASSANDRA-13614)

• Fix race condition in PendingRepairManager (CASSANDRA-13659)

• Allow noop incremental repair state transitions (CASSANDRA-13658)

• Run repair with down replicas (CASSANDRA-10446)

• Added started & completed repair metrics (CASSANDRA-13598)

• Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130)

• Improve calculation of available disk space for compaction (CASSANDRA-13068)

• Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579)

• Allow sub-range repairs for a preview of repaired data (CASSANDRA-13570)

• NPE in IR cleanup when columnfamily has no sstables (CASSANDRA-13585)

• Fix Randomness of stress values (CASSANDRA-12744)

• Allow selecting Map values and Set elements (CASSANDRA-7396)

• Fast and garbage-free Streaming Histogram (CASSANDRA-13444)

• Update repairTime for keyspaces on completion (CASSANDRA-13539)

• Add configurable upper bound for validation executor threads (CASSANDRA-13521)

• Bring back maxHintTTL property (CASSANDRA-12982)

• Add testing guidelines (CASSANDRA-13497)

• Add more repair metrics (CASSANDRA-13531)

• RangeStreamer should be smarter when picking endpoints for streaming (CASSANDRA-4650)

• Avoid rewrapping an exception thrown for cache load functions (CASSANDRA-13367)


• Log time elapsed for each incremental repair phase (CASSANDRA-13498)

• Add multiple table operation support to cassandra-stress (CASSANDRA-8780)

• Fix incorrect cqlsh results when selecting same columns multiple times (CASSANDRA-13262)

• Fix WriteResponseHandlerTest is sensitive to test execution order (CASSANDRA-13421)

• Improve incremental repair logging (CASSANDRA-13468)

• Start compaction when incremental repair finishes (CASSANDRA-13454)

• Add repair streaming preview (CASSANDRA-13257)

• Cleanup isIncremental/repairedAt usage (CASSANDRA-13430)

• Change protocol to allow sending keyspace independent of query string (CASSANDRA-10145)

• Make gc_log and gc_warn settable at runtime (CASSANDRA-12661)

• Take number of files in L0 in account when estimating remaining compaction tasks (CASSANDRA-13354)

• Skip building views during base table streams on range movements (CASSANDRA-13065)

• Improve error messages for +/- operations on maps and tuples (CASSANDRA-13197)

• Remove deprecated repair JMX APIs (CASSANDRA-11530)

• Fix version check to enable streaming keep-alive (CASSANDRA-12929)

• Make it possible to monitor an ideal consistency level separate from actual consistency level
(CASSANDRA-13289)

• Outbound TCP connections ignore internode authenticator (CASSANDRA-13324)

• Upgrade junit from 4.6 to 4.12 (CASSANDRA-13360)

• Cleanup ParentRepairSession after repairs (CASSANDRA-13359)

• Upgrade snappy-java to 1.1.2.6 (CASSANDRA-13336)

• Incremental repair not streaming correct sstables (CASSANDRA-13328)

• Upgrade the JNA version to 4.3.0 (CASSANDRA-13300)

• Add the currentTimestamp, currentDate, currentTime and currentTimeUUID functions (CASSANDRA-13132)

• Remove config option index_interval (CASSANDRA-10671)

• Reduce lock contention for collection types and serializers (CASSANDRA-13271)

• Make it possible to override MessagingService.Verb ids (CASSANDRA-13283)

• Avoid synchronized on prepareForRepair in ActiveRepairService (CASSANDRA-9292)

• Adds the ability to use uncompressed chunks in compressed files (CASSANDRA-10520)

• Don't flush sstables when streaming for incremental repair (CASSANDRA-13226)

• Remove unused method (CASSANDRA-13227)

• Fix minor bugs related to #9143 (CASSANDRA-13217)

• Output warning if user increases RF (CASSANDRA-13079)

• Remove pre-3.0 streaming compatibility code for 4.0 (CASSANDRA-13081)

• Add support for + and - operations on dates (CASSANDRA-11936)


• Fix consistency of incrementally repaired data (CASSANDRA-9143)

• Increase commitlog version (CASSANDRA-13161)

• Make TableMetadata immutable, optimize Schema (CASSANDRA-9425)

• Refactor ColumnCondition (CASSANDRA-12981)

• Parallelize streaming of different keyspaces (CASSANDRA-4663)

• Improved compactions metrics (CASSANDRA-13015)

• Speed-up start-up sequence by avoiding un-needed flushes (CASSANDRA-13031)

• Use Caffeine (W-TinyLFU) for on-heap caches (CASSANDRA-10855)

• Thrift removal (CASSANDRA-11115)

• Remove pre-3.0 compatibility code for 4.0 (CASSANDRA-12716)

• Add column definition kind to dropped columns in schema (CASSANDRA-12705)

• Add (automate) Nodetool Documentation (CASSANDRA-12672)

• Update bundled cqlsh python driver to 3.7.0 (CASSANDRA-12736)

• Reject invalid replication settings when creating or altering a keyspace (CASSANDRA-12681)

• Clean up the SSTableReader#getScanner API wrt removal of RateLimiter (CASSANDRA-12422)

• Use new token allocation for non bootstrap case as well (CASSANDRA-13080)

• Avoid byte-array copy when key cache is disabled (CASSANDRA-13084)

• Require forceful decommission if number of nodes is less than replication factor (CASSANDRA-12510)

• Allow IN restrictions on column families with collections (CASSANDRA-12654)

• Log message size in trace message in OutboundTcpConnection (CASSANDRA-13028)

• Add timeUnit Days for cassandra-stress (CASSANDRA-13029)

• Add mutation size and batch metrics (CASSANDRA-12649)

• Add method to get size of endpoints to TokenMetadata (CASSANDRA-12999)

• Expose time spent waiting in thread pool queue (CASSANDRA-8398)

• Conditionally update index built status to avoid unnecessary flushes (CASSANDRA-12969)

• cqlsh auto completion: refactor definition of compaction strategy options (CASSANDRA-12946)

• Add support for arithmetic operators (CASSANDRA-11935)

• Add histogram for delay to deliver hints (CASSANDRA-13234)

• Fix cqlsh automatic protocol downgrade regression (CASSANDRA-13307)

• Changing `max_hint_window_in_ms` at runtime (CASSANDRA-11720)

• Nodetool repair can hang forever if we lose the notification for the repair completing/failing
(CASSANDRA-13480)

• Anticompaction can cause noisy log messages (CASSANDRA-13684)

• Switch to client init for sstabledump (CASSANDRA-13683)

• CQLSH: Don't pause when capturing data (CASSANDRA-13743)


General upgrade advice for DSE 6.0.0


DataStax Enterprise 6.0.0 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.0
DataStax Enterprise (DSE) 6.0.0 includes all changes from previous releases plus these production-certified
changes that are in addition to TinkerPop 3.3.2. See TinkerPop upgrade documentation for all changes.

• Made iterate() a first class step. (TINKERPOP-1834)

• Fixed a bug in NumberHelper that led to wrong min/max results if numbers exceeded the Integer limits.
(TINKERPOP-1873)

• Improved error messaging for failed serialization and deserialization of request/response messages.

• Fixed bug in handling of Direction.BOTH in Messenger implementations to pass the message to the
opposite side of the `StarGraph` in VertexPrograms for OLAP traversals. (TINKERPOP-1862)

• Fixed a bug in Gremlin Console which prevented handling of gremlin.sh flags that had an equal sign (=)
between the flag and its arguments. (TINKERPOP-1879)

• Fixed a bug where SparkMessenger was not applying the edgeFunction from MessageScope in
VertexPrograms for OLAP-based traversals. (TINKERPOP-1872)

• TinkerPop drivers prior to 3.2.4 won't authenticate with Kerberos anymore. A long-deprecated option on the
Gremlin Server protocol was removed.

DataStax Bulk Loader release notes


Release notes for DataStax Bulk Loader 1.1.x and 1.0.x.
DataStax Bulk Loader 1.1.x and 1.0.x can migrate data in CSV or JSON format into DSE from another DSE or
Apache Cassandra™ cluster.

• Can unload data from any Cassandra 2.1 or later data source

• Can load data to DSE 5.0 or later

DataStax Studio release notes


Release notes for DataStax Studio 6.0.x.
See the DataStax Studio 6.0 release notes in the DataStax Studio guide.

Chapter 3. Installing DataStax Enterprise 6.0
Installation information is located in the Installation Guide.

Chapter 4. Configuration

Recommended production settings


DataStax recommends the following settings for using DataStax Enterprise (DSE) in production environments.

Depending on your environment, some of the following settings might not persist after reboot. Check with your
system administrator to ensure these settings are viable for your environment.

Use the Preflight check tool to run a collection of tests on a DSE node to detect and fix node configurations. The
tool can detect and optionally fix many invalid or suboptimal configuration settings, such as user resource limits,
swap, and disk settings.
Configure the chunk cache
Beginning in DataStax Enterprise (DSE) 6.0, the amount of native memory used by the DSE process has
increased significantly.
The main reason for this increase is the chunk cache (or file cache), which is like an OS page cache. The
following sections provide additional information:

• See Chunk cache history for a historical description of the chunk cache, and how it is calculated in DSE 6.0
and later.

• See Chunk cache differences from OS page cache to understand key differences between the chunk cache
and the OS page cache.

Consider the following recommendations depending on workload type for your cluster.

DSE recommendations

Regarding DSE, consider the following recommendations when choosing the max direct memory and file cache
size:

• Total server memory size

• Adequate memory for the OS and other applications

• Adequate memory for the Java heap size

• Adequate memory for native raw memory (such as bloom filters and off-heap memtables)

For 64 GB servers, the default settings are typically adequate. For larger servers, increase the max direct
memory (-XX:MaxDirectMemorySize), but leave approximately 15-20% of memory for the OS and other
in-memory structures. The file cache size will be set automatically to half of that. This setting is acceptable,
but the size could be increased gradually if the cache hit rate is too low and there is still available memory on
the server.

DSE Search recommendations

Disabling asynchronous I/O (AIO) and explicitly setting the chunk cache size (file_cache_size_in_mb) improves
performance for most DSE Search workloads. With this configuration, SSTables and Lucene segments, as well
as other minor off-heap elements, reside in the OS page cache and are managed by the kernel.
A potentially negative impact of disabling AIO might be measurably higher read latency when DSE goes to disk,
in cases where the dataset is larger than available memory.


To disable AIO and set the chunk cache size, see Disable AIO.
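As a sketch of what that configuration looks like, assuming the -Ddse.io.aio.enable start-up parameter named
earlier in the 6.0.0 release notes and the file_cache_size_in_mb value recommended above (file locations may
differ by installation type):

# jvm.options: disable asynchronous I/O
-Ddse.io.aio.enable=false

# cassandra.yaml: explicitly size the chunk cache for search workloads
file_cache_size_in_mb: 512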

DSE Analytics recommendations

DSE Analytics relies heavily on memory for performance. Because Apache Spark™ effectively manages its own
memory through the Apache Spark application settings, you must determine how much memory the Apache
Spark application receives. Therefore, you must think about how much memory to allocate to the chunk cache
versus how much memory to allocate for Apache Spark applications. Similar to DSE Search, you can disable
AIO and lower the chunk cache size to provide Apache Spark with more memory.

DSE Graph recommendations

Because DSE Graph heavily relies on several different workloads, it’s important to follow the previous
recommendations for the specific workload. If you use DSE Search or DSE Analytics with DSE Graph, lower the
chunk cache and disable AIO for the best performance. If you use DSE Graph only on top of Apache Cassandra,
increase the chunk cache gradually, leaving 15-20% of memory available for other processes.

Chunk cache differences from OS page cache

There are several differences between the chunk cache and the OS page cache, and a full description is outside
the scope of this information. However, the following differences are relevant to DSE:

• Because the OS page cache is sized dynamically by the operating system, it can grow and shrink depending
on the available server memory. The chunk cache must be sized statically.
If the chunk cache is too small, the available server memory will be unused. For servers with large amounts
of memory (50 GB or more), the memory is wasted. If the chunk cache is too large, the available memory on
the server can reduce enough that the OS will kill the DSE process to avoid an out of memory issue.

At the time of writing, the size of the chunk cache cannot be changed dynamically; to change the size
of the chunk cache, the DSE process must be restarted.

• Restarting the DSE process will destroy the chunk cache, so each time the process is restarted, the chunk
cache will be cold. The OS page cache only becomes cold after a server restart.

• The memory used by the file cache is part of the DSE process memory, and is therefore seen by the OS as
user memory. However, the OS page cache memory is seen as buffer memory.

• The chunk cache uses mostly NIO direct memory, storing file chunks into NIO byte buffers. However, NIO
does have an on-heap footprint, which DataStax is working to reduce.

Chunk cache history

The chunk cache is not new to Apache Cassandra, and was originally intended to cache small parts (chunks) of
SSTable files to make read operations faster. However, the default file access mode was memory mapped until
DSE 5.1, so the chunk cache had a secondary role and its size was limited to 512 MB.
The default setting of 512 MB was configured by the file_cache_size_in_mb parameter in cassandra.yaml.

In DSE 6.0 and later, the chunk cache has increased relevance, not just because it replaces the OS page cache
for database read operations, but because it is a central component of the asynchronous thread-per-core (TPC)
architecture.
By default, the chunk cache is configured to use the following portion of the max direct memory:

• One-half (½) of the max direct memory for the DSE process

• One-fourth (¼) of the max direct memory for tools


The max direct memory is calculated as one-half (½) of the system memory minus the JVM heap size:

Max direct memory = (system memory - JVM heap size) / 2
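For example, on a hypothetical 64 GB server running with a 24 GB JVM heap:

Max direct memory = (64 GB - 24 GB) / 2 = 20 GB
Chunk cache (file cache) = 20 GB / 2 = 10 GB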

You can explicitly configure the max direct memory by setting the JVM MaxDirectMemorySize
(-XX:MaxDirectMemorySize) parameter. See increasing the max direct memory. Alternatively, you can override
the derived file cache size by explicitly configuring the file_cache_size_in_mb parameter in cassandra.yaml.
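A minimal sketch of the two approaches (the values are illustrative only, not recommendations):

# jvm.options: set the max direct memory explicitly
-XX:MaxDirectMemorySize=20G

# cassandra.yaml: or override the derived file cache size directly
file_cache_size_in_mb: 10240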
Install the latest Java Virtual Machine
Configure your operating system to use the latest build of a Technology Compatibility Kit (TCK) Certified
OpenJDK version 8. For example, OpenJDK 8 (1.8.0_151 minimum). Java 9 is not supported.

Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8. This change
is due to the end of public updates for Oracle JRE/JDK 8.

See the installation instructions for your operating system:

• Installing Open JDK 8 on Debian or Ubuntu Systems

• Installing OpenJDK 8 on RHEL-based Systems

Synchronize clocks
Use Network Time Protocol (NTP) to synchronize the clocks on all nodes and application servers.
Synchronizing clocks is required because DataStax Enterprise (DSE) overwrites a column only if there is another
version whose timestamp is more recent, which can happen when machines are in different locations.
DSE timestamps are encoded as microseconds because UNIX Epoch time does not include timezone
information. The timestamp for all writes in DSE is Universal Time Coordinated (UTC). DataStax recommends
converting to local time only when generating output to be read by humans.

1. Install NTP for your operating system:


Operating system Command

Debian-based system $ sudo apt-get install ntpdate

RHEL-based system $ sudo yum install ntpdate ¹

¹ On RHEL 7 and later, chrony is the default network time protocol daemon. The configuration file for chrony is located in /etc/chrony.conf
on these systems.

2. Start the NTP service on all nodes:

$ sudo service ntp start -x

3. Run the ntpdate command to synchronize clocks:

$ sudo ntpdate 1.ro.pool.ntp.org

4. Verify that your NTP configuration is working:

$ ntpstat
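Output similar to the following indicates that the clock is synchronized (values are illustrative):

synchronised to NTP server (203.0.113.10) at stratum 2
   time correct to within 42 ms
   polling server every 1024 s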

Set kernel parameters


Configure the following kernel parameters for optimal traffic and user limits.


Run the following command to view all current Linux kernel settings:

$ sudo sysctl -a

TCP settings
During low traffic intervals, a firewall configured with an idle connection timeout can close connections to local
nodes and nodes in other data centers. To prevent connections between nodes from timing out, set the following
network kernel settings:

1. Set the following TCP keepalive timeout values:

$ sudo sysctl -w \
net.ipv4.tcp_keepalive_time=60 \
net.ipv4.tcp_keepalive_probes=3 \
net.ipv4.tcp_keepalive_intvl=10

These values set the TCP keepalive timeout to 60 seconds with 3 probes, 10 seconds gap between each.
The settings detect dead TCP connections after 90 seconds (60 + 10 + 10 + 10). The additional traffic is
negligible, and permanently leaving these settings is not an issue. See Firewall idle connection timeout
causes nodes to lose communication during low traffic times on Linux .

2. Change the following settings to handle thousands of concurrent connections used by the database:

$ sudo sysctl -w \
net.core.rmem_max=16777216 \
net.core.wmem_max=16777216 \
net.core.rmem_default=16777216 \
net.core.wmem_default=16777216 \
net.core.optmem_max=40960 \
net.ipv4.tcp_rmem='4096 87380 16777216' \
net.ipv4.tcp_wmem='4096 65536 16777216'

Instead of changing the system TCP settings, you can prevent reset connections during streaming by tuning
the streaming_keep_alive_period_in_secs setting in cassandra.yaml.

Set user resource limits


Use the ulimit -a command to view the current limits. Although limits can also be temporarily set using this
command, DataStax recommends making the changes permanent.
For more information, see Recommended production settings.
Debian-based systems

1. Edit the /etc/pam.d/su file and uncomment the following line to enable the pam_limits.so module:

session required pam_limits.so

This change to the PAM configuration file ensures that the system reads the files in the /etc/security/
limits.d directory.

2. If you run DSE as root, some Linux distributions (such as Ubuntu), require setting the limits for the root user
explicitly instead of using cassandra_user:

root - memlock unlimited
root - nofile 1048576
root - nproc 32768
root - as unlimited

RHEL-based systems


1. Set the nproc limits to 32768 in the /etc/security/limits.d/90-nproc.conf configuration file:

cassandra_user - nproc 32768

All systems

1. Add the following line to /etc/sysctl.conf:

vm.max_map_count = 1048575

2. Open the configuration file for your installation type:


Installation type Configuration file

Tarball installation /etc/security/limits.conf

Package installation /etc/security/limits.d/cassandra.conf

3. Configure the following settings for the <cassandra_user> in the configuration file:

<cassandra_user> - memlock unlimited
<cassandra_user> - nofile 1048576
<cassandra_user> - nproc 32768
<cassandra_user> - as unlimited

4. Reboot the server or run the following command to make all changes take effect:

$ sudo sysctl -p

Persist updated settings

1. Add the following values to the /etc/sysctl.conf file:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=40960
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216

2. Load the settings using one of the following commands:

$ sudo sysctl -p /etc/sysctl.conf

$ sudo sysctl -p /etc/sysctl.d/*.conf

3. To confirm the user limits are applied to the DSE process, run the following command where pid is the
process ID of the currently running DSE process:

$ cat /proc/pid/limits
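For example, on a package installation where DSE runs as the cassandra user, the process ID can be looked up with pgrep (a sketch; the service user and process name depend on your installation):

$ cat /proc/$(pgrep -u cassandra -f java | head -1)/limits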

Disable settings that impact performance


Disable the following settings, which can cause issues with performance.


Disable CPU frequency scaling


Recent Linux systems include a feature called CPU frequency scaling or CPU speed scaling. This feature allows
a server's clock speed to be dynamically adjusted so that the server can run at lower clock speeds when the
demand or load is low. This change reduces the server's power consumption and heat output, which significantly
impacts cooling costs. Unfortunately, this behavior has a detrimental effect on servers running DSE, because
throughput can be capped at a lower rate.
On most Linux systems, a CPUfreq governor manages the scaling of frequencies based on defined rules. The
default ondemand governor switches the clock frequency to maximum when demand is high, and switches to the
lowest frequency when the system is idle.

Do not use governors that lower the CPU frequency. To ensure optimal performance, reconfigure all CPUs to
use the performance governor, which locks the frequency at maximum.

The performance governor will not switch frequencies, which means that power savings will be bypassed to
always run at maximum throughput. On most systems, run the following command to set the governor:

for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
  [ -f $CPUFREQ ] || continue
  echo -n performance > $CPUFREQ
done

If this directory does not exist on your system, refer to one of the following pages based on your operating
system:

• Debian-based systems: cpufreq-set command on Debian systems

• RHEL-based systems: CPUfreq setup on RHEL systems

For more information, see High server load and latency when CPU frequency scaling is enabled in the DataStax
Help Center.
Disable zone_reclaim_mode on NUMA systems
The Linux kernel can be inconsistent in enabling/disabling zone_reclaim_mode, which can result in odd
performance problems.
To ensure that zone_reclaim_mode is disabled:

$ echo 0 > /proc/sys/vm/zone_reclaim_mode
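To persist this setting across reboots, an entry such as the following can be added to /etc/sysctl.conf (assuming your distribution applies sysctl.conf at boot):

vm.zone_reclaim_mode = 0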

For more information, see Peculiar Linux kernel performance problem on NUMA systems.
Disable swap
Failure to disable swap entirely can severely lower performance. Because the database has multiple replicas
and transparent failover, it is preferable for a replica to be killed immediately when memory is low rather than
go into swap. This allows traffic to be immediately redirected to a functioning replica instead of continuing to
hit the replica that has high latency due to swapping. If your system has a lot of DRAM, swapping still lowers
performance significantly because the OS swaps out executable code so that more DRAM is available for
caching disks.
If you insist on using swap, you can set vm.swappiness=1. This allows the kernel to swap out only the
absolute least used parts of memory.
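For example, to apply that setting at runtime (add the same key to /etc/sysctl.conf to persist it):

$ sudo sysctl -w vm.swappiness=1

Otherwise, disable swap entirely: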

$ sudo swapoff --all

To make this change permanent, remove all swap file entries from /etc/fstab.
For more information, see Nodes seem to freeze after some period of time.


Optimize disk settings


The default disk configurations on most Linux distributions are not optimal. Follow these steps to optimize
settings for your Solid State Drives (SSDs) or spinning disks.

Complete the optimization settings for either SSDs or spinning disks. Do not complete both procedures for
either storage type.

Optimize SSDs
Complete the following steps to ensure the best settings for SSDs.

1. Ensure that the SysFS rotational flag is set to false (zero).


This overrides any detection by the operating system to ensure the drive is considered an SSD.
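To check the current value, where device_name is a placeholder for your device (0 indicates the drive is treated as an SSD):

$ cat /sys/block/device_name/queue/rotational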

2. Apply the same rotational flag setting for any block devices created from SSD storage, such as mdarrays.

3. Determine your devices by running lsblk:

$ lsblk

NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda    253:0    0  32G  0 disk
|-sda1 253:1    0   8M  0 part
|-sda2 253:2    0  32G  0 part /
In this example, the current devices are sda1 and sda2.

4. Set the IO scheduler to either deadline or noop for each of the listed devices:
For example:

$ echo deadline > /sys/block/device_name/queue/scheduler

where device_name is the name of the device you want to apply settings for.

• The deadline scheduler optimizes requests to minimize IO latency. If in doubt, use the deadline
scheduler.

$ echo deadline > /sys/block/device_name/queue/scheduler

• The noop scheduler is the right choice when the target block device is an array of SSDs behind a high-
end IO controller that performs IO optimization.

$ echo noop > /sys/block/device_name/queue/scheduler

5. Set the nr_requests value to indicate the maximum number of read and write requests that can be queued:
Machine size Value

Large machines $ echo 128 > /sys/block/device_name/queue/nr_requests

Small machines $ echo 32 > /sys/block/device_name/queue/nr_requests

6. Set the readahead value for the block device to 8 KB.


This setting tells the operating system not to read extra bytes, which can increase IO time and pollute the
cache with bytes that weren’t requested by the user.


The recommended readahead setting for RAID on SSDs is the same as that for SSDs that are not being
used in a RAID installation.

a. Open /etc/rc.local for editing.

b. Add the following lines to set the readahead on startup:

touch /var/lock/subsys/local
echo 0 > /sys/class/block/sda/queue/rotational
echo 8 > /sys/class/block/sda/queue/read_ahead_kb

c. Save and close /etc/rc.local.

Optimize spinning disks


1. Check to ensure the readahead value is not set to 65536:

$ sudo blockdev --report /dev/spinning_disk

2. Set the readahead to 128, which is the recommended value:

$ sudo blockdev --setra 128 /dev/spinning_disk

Set the heap size for Java garbage collection


The default JVM garbage collection (GC) is G1 for DSE 5.1 and later.
DataStax does not recommend using G1 when using Java 7. This is due to a problem with class unloading in
G1. In Java 7, PermGen fills up indefinitely until a full GC is performed.

Heap size is usually between ¼ and ½ of system memory. Do not devote all memory to heap because it is also
used for offheap cache and file system cache.
See Tuning Java Virtual Machine for more information on tuning the Java Virtual Machine (JVM).

If you want to use Concurrent-Mark-Sweep (CMS) garbage collection, contact the DataStax Services team for
configuration help. Tuning Java resources provides details on circumstances where CMS is recommended,
though using CMS requires time, expertise, and repeated testing to achieve optimal results.

The easiest way to determine the optimum heap size for your environment is:

1. Set the MAX_HEAP_SIZE in the jvm.options file to a high arbitrary value on a single node.

2. View the heap used by that node:

• Enable GC logging and check the logs to see trends.

• Use List view in OpsCenter.

3. Use the value for setting the heap size in the cluster.

This method decreases performance for the test node, but generally does not significantly reduce cluster
performance.
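For example, to pin the heap at a fixed size in jvm.options, set the minimum and maximum to the same value (an illustrative 24 GB; MAX_HEAP_SIZE corresponds to the -Xmx flag):

-Xms24G
-Xmx24G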

If you don't see improved performance, contact the DataStax Services team for additional help in tuning the JVM.
Check Java Hugepages settings
Many modern Linux distributions ship with the Transparent Hugepages feature enabled by default. When Linux
uses Transparent Hugepages, the kernel tries to allocate memory in large chunks (usually 2MB), rather than 4K.
This allocation can improve performance by reducing the number of pages the CPU must track. However, some
applications still allocate memory based on 4K pages, which can cause noticeable performance problems when
Linux tries to defragment 2MB pages.
For more information, see the Cassandra Java Huge Pages blog and this RedHat bug report.
To solve this problem, disable defrag for Transparent Hugepages:

$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

For more information, including a temporary fix, see No DSE processing but high CPU usage.

YAML and configuration properties


cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file for DataStax Enterprise. The dse.yaml file is the primary
configuration file for security, DSE Search, DSE Graph, and DSE Analytics.

After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two
spaces. Adhere to the YAML syntax and retain the spacing.
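For example, in the following entry from the Default directories section, the parent key starts at column zero and its child entry is indented two spaces:

data_file_directories:
  - /var/lib/cassandra/data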

• Literal default values are shown as literal.

• Calculated values are shown as calculated.

• Default values that are not defined are shown as Default: none.

• Internally defined default values are described.


Default values can be defined internally, commented out, or have implementation dependencies on other
properties in the cassandra.yaml file. Additionally, some commented-out values may not match the
actual default values. The commented out values are recommended alternatives to the default values.

Organization
The configuration properties are grouped into the following sections:

• Quick start
The minimal properties needed for configuring a cluster.

• Default directories
If you have changed any of the default directories during installation, set these properties to the new
locations. Make sure you have root access.

• Commonly used
Properties most frequently used when configuring DataStax Enterprise.

• Performance tuning
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O,
CPU, reads, and writes.

• Advanced
Properties for advanced users or properties that are less commonly used.

• Security


DSE Unified Authentication provides authentication, authorization, and role management.

• Continuous paging options
Properties that configure memory, threads, and duration when pushing pages continuously to the client.

• Memory leak detection settings
Properties that configure memory leak detection.

Quick start properties


The minimal properties needed for configuring a cluster.

cluster_name: 'Test Cluster'
listen_address: localhost
# listen_interface: wlan0
# listen_interface_prefer_ipv6: false

See Initializing a DataStax Enterprise cluster.

cluster_name
The name of the cluster. This setting prevents nodes in one logical cluster from joining another. All
nodes in a cluster must have the same value.
Default: 'Test Cluster'
listen_address
The IP address or hostname that the database binds to for connecting this node to other nodes.

• Never set listen_address to 0.0.0.0.

• Set listen_address or listen_interface, do not set both.

Default: localhost
listen_interface
The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a
single address. IP aliasing is not supported.
Set listen_address or listen_interface, not both.
Default: commented out (wlan0)
listen_interface_prefer_ipv6
Use IPv4 or IPv6 when interface is specified by name.

• false - use first IPv4 address.

• true - use first IPv6 address.

When only a single address is used, that address is selected without regard to this setting.
Default: commented out (false)
Default directories

data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
cdc_raw_directory: /var/lib/cassandra/cdc_raw
hints_directory: /var/lib/cassandra/hints
saved_caches_directory: /var/lib/cassandra/saved_caches

If you have changed any of the default directories during installation, set these properties to the new locations.
Make sure you have root access.
data_file_directories
The directory where table data is stored on disk. The database distributes data evenly across the
location, subject to the granularity of the configured compaction strategy. If not set, the directory is
$DSE_HOME/data/data.
For production, DataStax recommends RAID 0 and SSDs.
Default: - /var/lib/cassandra/data
commitlog_directory
The directory where the commit log is stored. If not set, the directory is $DSE_HOME/data/commitlog.
For optimal write performance, place the commit log on a separate disk partition, or ideally on a
separate physical device from the data file directories. Because the commit log is append only, a hard
disk drive (HDD) is acceptable.

DataStax recommends explicitly setting the location of the DSE Metrics Collector data directory.
When the DSE Metrics Collector is enabled and when the insights_options data dir is not explicitly
set in dse.yaml, the default location of the DSE Metrics Collector data directory is the same directory
as the commitlog directory.

Default: /var/lib/cassandra/commitlog
cdc_raw_directory
The directory where the change data capture (CDC) commit log segments are stored on flush. DataStax
recommends a physical device that is separate from the data directories. If not set, the directory is
$DSE_HOME/data/cdc_raw. See Change Data Capture (CDC) logging.
Default: /var/lib/cassandra/cdc_raw
hints_directory
The directory in which hints are stored. If not set, the directory is $CASSANDRA_HOME/data/hints.
Default: /var/lib/cassandra/hints
saved_caches_directory
The directory location where table key and row caches are stored. If not set, the directory is
$DSE_HOME/data/saved_caches.
Default: /var/lib/cassandra/saved_caches
Commonly used properties
Properties most frequently used when configuring DataStax Enterprise.
Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.

• Common initialization properties

• Common compaction settings

• Common memtable settings

• Common automatic backup settings

Common initialization properties

commit_failure_policy: stop
prepared_statements_cache_size_mb:
# disk_optimization_strategy: ssd
disk_failure_policy: stop
endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
enable_user_defined_functions_threads: true

Be sure to set the properties in the Quick start section as well.

commit_failure_policy
Policy for commit disk failures:

• die - Shut down the node and kill the JVM, so the node can be replaced.

• stop - Shut down the node, leaving the node effectively dead, available for inspection using JMX.

• stop_commit - Shut down the commit log, letting writes collect but continuing to service reads.

• ignore - Ignore fatal errors and let the batches fail.

Default: stop
prepared_statements_cache_size_mb
Maximum size of the native protocol prepared statement cache. Change this value only if there are
more prepared statements than fit in the cache.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.

Specifying a value that is too large results in long running GCs and possibly out-of-memory errors.
Keep the value at a small fraction of the heap.
Constantly re-preparing statements is a performance penalty. When not set, the default is automatically
calculated to heap / 256 or 10 MB, whichever is greater.
Default: calculated
disk_optimization_strategy
The strategy for optimizing disk reads.

• ssd - solid state disks

• spinning - spinning disks

When commented out, the default is ssd.


Default: commented out (ssd)
disk_failure_policy
Sets how the database responds to disk failure. Recommend settings: stop or best_effort. Valid values:

• die - Shut down gossip and client transports, and kill the JVM for any file system errors or single
SSTable errors, so the node can be replaced.

• stop_paranoid - Shut down the node, even for single SSTable errors.

• stop - Shut down the node, leaving the node effectively dead, but available for inspection using
JMX.

• best_effort - Stop using the failed disk and respond to requests based on the remaining available
SSTables. This setting allows obsolete data at consistency level of ONE.

• ignore - Ignore fatal errors and lets the requests fail; all file system errors are logged but otherwise
ignored.

See Recovering from a single disk failure using JBOD.


Default: stop
endpoint_snitch
A class that implements the IEndpointSnitch interface. The database uses the snitch to locate nodes
and route requests.
Use only snitch implementations bundled with DSE.

• DseSimpleSnitch
Appropriate only for development deployments. Proximity is determined by DSE workload, which
places transactional, analytics, and search nodes into their separate datacenters. Does not
recognize datacenter or rack information.

• GossipingPropertyFileSnitch
Recommended for production. Reads rack and datacenter for the local node in cassandra-
rackdc.properties file and propagates these values to other nodes via gossip. For migration from
the PropertyFileSnitch, uses the cassandra-topology.properties file if it is present.

• PropertyFileSnitch
Determines proximity by rack and datacenter that are explicitly configured in cassandra-
topology.properties file.

• Ec2Snitch
For EC2 deployments in a single region. Loads region and availability zone information from the
Amazon EC2 API. The region is treated as the datacenter, the availability zone is treated as the
rack, and uses only private IP addresses. For this reason, Ec2Snitch does not work across multiple
regions.

• Ec2MultiRegionSnitch
Uses the public IP as the broadcast_address to allow cross-region connectivity. This means you
must also set seed addresses to the public IP and open the storage_port or ssl_storage_port
on the public IP firewall. For intra-region traffic, the database switches to the private IP after
establishing a connection.

• RackInferringSnitch
Proximity is determined by rack and datacenter, which are assumed to correspond to the 3rd and
2nd octet of each node's IP address, respectively. Best used as an example for writing a custom
snitch class (unless this happens to match your deployment conventions).

• GoogleCloudSnitch
Use for deployments on Google Cloud Platform across one or more regions. The region is
treated as a datacenter and the availability zones are treated as racks within the datacenter. All
communication occurs over private IP addresses within the same logical network.

• CloudstackSnitch
Use the CloudstackSnitch for Apache Cloudstack environments.

See Snitches.
Default: com.datastax.bdp.snitch.DseSimpleSnitch
seed_provider
The addresses of hosts that are designated as contact points in the cluster. A joining node contacts one
of the nodes in the -seeds list to learn the topology of the ring.
Use only seed provider implementations bundled with DSE.

• class_name - The class that handles the seed logic. It can be customized, but this is typically not
required.
Default: org.apache.cassandra.locator.SimpleSeedProvider

• - seeds - A comma delimited list of addresses that are used by gossip for bootstrapping new nodes
joining a cluster. If your cluster includes multiple nodes, you must change the list from the default
value to the IP address of one of the nodes.
Default: "127.0.0.1"

Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a
small seed list (approximately three nodes per datacenter).

See Initializing a single datacenter per workload type and Initializing multiple datacenters per
workload type.
Default: org.apache.cassandra.locator.SimpleSeedProvider
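For example, a multi-node cluster with three designated seed nodes might use a configuration like the following (addresses are illustrative):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.10.1.1,10.10.1.2,10.10.2.1"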


enable_user_defined_functions
Enables user defined functions (UDFs). UDFs present a security risk, since they are executed on the
server side. UDFs are executed in a sandbox to contain the execution of malicious code.

• true - Enabled. Supports Java as the code language. Detects endless loops and unintended memory
leaks.

• false - Disabled.

Default: false (disabled)


enable_scripted_user_defined_functions
Enables the use of JavaScript language in UDFs.

• true - Enabled. Allow JavaScript in addition to Java as a code language.

• false - Disabled. Only allow Java as a code language.

If enable_user_defined_functions is false, this setting has no impact.


Default: false
enable_user_defined_functions_threads
Enables asynchronous UDF execution which requires a function to complete before being executed
again.

• true - Enabled. Only one instance of a function can run at one time. Asynchronous execution
prevents UDFs from running too long or forever and destabilizing the cluster.

• false - Disabled. Allows multiple instances of the same function to run simultaneously. Required to
use UDFs within GROUP BY clauses.
Disabling asynchronous UDF execution implicitly disables the security manager. You must
monitor the read timeouts for UDFs that run too long or forever, which can cause the cluster to
destabilize.

Default: true
Common compaction settings

compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100

compaction_throughput_mb_per_sec
The MB per second to throttle compaction for the entire system. The faster the database inserts data,
the faster the system must compact in order to keep the SSTable count down.

• 16 to 32 x rate of write throughput in MB/second, recommended value.

• 0 - disable compaction throttling

See Configuring compaction.


Default: 16
compaction_large_partition_warning_threshold_mb
The partition size threshold before logging a warning.
Default: 100
Common memtable settings

memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048

memtable_heap_space_in_mb
The amount of on-heap memory allocated for memtables. The database uses the total of this amount
and the value of memtable_offheap_space_in_mb to set a threshold for automatic memtable flush.
See memtable_cleanup_threshold and Tuning the Java heap.
Default: calculated 1/4 of heap size (2048)
memtable_offheap_space_in_mb
The amount of off-heap memory allocated for memtables. The database uses the total of this amount
and the value of memtable_heap_space_in_mb to set a threshold for automatic memtable flush.
See memtable_cleanup_threshold and Tuning the Java heap.
Default: calculated 1/4 of heap size (2048)
Common automatic backup settings

incremental_backups: false
snapshot_before_compaction: false

incremental_backups
Enables incremental backups.

• true - Enable incremental backups to create a hard link to each SSTable flushed or streamed
locally in a backups subdirectory of the keyspace data. Incremental backups enable storing
backups off site without transferring entire snapshots.
The database does not automatically clear incremental backup files. DataStax recommends
setting up a process to clear incremental backup hard links each time a new snapshot is
created.

• false - Do not enable incremental backups.

See Enabling incremental backups.


Default: false
snapshot_before_compaction
Whether to take a snapshot before each compaction. A snapshot is useful to back up data when there
is a data format change.
Be careful using this option, the database does not clean up older snapshots automatically.

See Configuring compaction.


Default: false
snapshot_before_dropping_column
When enabled, every time the user drops a column/columns from a table, a snapshot is created on
each node in the cluster before the change in schema is applied. Those snapshots have the same
name on each node. For example: auto-snapshot_drop-column-columnname_20200515143511000.
The name includes the name of the dropped column and the timestamp (UTC) when the column was
dropped.
The database does not automatically clear these snapshots. DataStax recommends setting up a
process to remove them once the schema change has been verified.
Default: false
Performance tuning properties
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU,
reads, and writes.
Performing tuning properties include:

• Commit log settings

• Lightweight transactions (LWT) settings


• Change-data-capture (CDC) space settings

• Common compaction settings

• Common memtable settings

• Cache and index settings

• Streaming settings

Commit log settings

commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
# commitlog_sync_group_window_in_ms: 1000
# commitlog_sync_batch_window_in_ms: 2 //deprecated
commitlog_segment_size_in_mb: 32
# commitlog_total_space_in_mb: 8192
# commitlog_compression:
# - class_name: LZ4Compressor
# parameters:
# -

commitlog_sync
Commit log synchronization method:

• periodic - Send ACK signal for writes immediately. Commit log is synced every
commitlog_sync_period_in_ms.

• group - Send ACK signal for writes after the commit log has been flushed to disk. Wait up to
commitlog_sync_group_window_in_ms between flushes.

• batch - Send ACK signal for writes after the commit log has been flushed to disk. Each incoming
write triggers the flush task.

Default: periodic
commitlog_sync_period_in_ms
Use with commitlog_sync: periodic. Time interval between syncing the commit log to disk. Periodic
syncs are acknowledged immediately.
Default: 10000
commitlog_sync_group_window_in_ms
Use with commitlog_sync: group. The time that the database waits between flushing the commit log
to disk. DataStax recommends using group instead of batch.
Default: commented out (1000)
commitlog_sync_batch_window_in_ms
Deprecated. Use with commitlog_sync: batch. The maximum length of time that queries may be
batched together.
Default: commented out (2)
commitlog_segment_size_in_mb
The size of an individual commitlog file segment. A commitlog segment may be archived, deleted, or
recycled after all its data has been flushed to SSTables. This data can potentially include commitlog
segments from every table in the system. The default size is usually suitable, but for commitlog
archiving you might want a finer granularity; 8 or 16 MB is reasonable.

If you set max_mutation_size_in_kb explicitly, then you must set commitlog_segment_size_in_mb to:

2 * max_mutation_size_in_kb / 1024

The value must be positive and less than 2048.
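For example, if max_mutation_size_in_kb is explicitly set to 16384 (16 MB), then commitlog_segment_size_in_mb must be set to 2 * 16384 / 1024 = 32.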

See Commit log archive configuration.


Default: 32
max_mutation_size_in_kb
The maximum size of a mutation before the mutation is rejected. Before increasing the commitlog
segment size of the commitlog segments, investigate why the mutations are larger than expected. Look
for underlying issues with access patterns and data model, because increasing the commitlog segment
size is a limited fix. When not set, the default is calculated as (commitlog_segment_size_in_mb *
1024) / 2.
Default: calculated
commitlog_total_space_in_mb
Disk usage threshold for commit logs before triggering the database flushing memtables to disk. If the
total space used by all commit logs exceeds this threshold, the database flushes memtables to disk for
the oldest commitlog segments to reclaim disk space by removing those log segments from the commit
log. This flushing reduces the amount of data to replay on start-up, and prevents infrequently updated
tables from keeping commitlog segments indefinitely. If the commitlog_total_space_in_mb is small,
the result is more flush activity on less-active tables.
See Configuring memtable thresholds.
Default for 64-bit JVMs: calculated (8192 or 25% of the total space of the commit log volume,
whichever is smaller)
Default for 32-bit JVMs: calculated (32 or 25% of the total space of the commit log volume,
whichever is smaller)
commitlog_compression
The compressor to use if commit log is compressed. To make changes, uncomment the
commitlog_compression section and these options:

# commitlog_compression:
# - class_name: LZ4Compressor
# parameters:
# -

• class_name: LZ4Compressor, Snappy, or Deflate

• parameters: optional parameters for the compressor

When not set, the default compression for the commit log is uncompressed.
Default: commented out
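For example, to compress the commit log with LZ4, the section can be uncommented as follows (a minimal sketch; compressor parameters omitted):

commitlog_compression:
  - class_name: LZ4Compressor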
Lightweight transactions (LWT) settings

# concurrent_lw_transactions: 128
# max_pending_lw_transactions: 10000

concurrent_lw_transactions
Maximum number of permitted concurrent lightweight transactions (LWT).

• A higher number might improve throughput if non-contending LWTs are in heavy use, but will use
more memory and might be less successful with contention.

• When not set, the default value is 8x the number of TPC cores. This default value is appropriate for
most environments.

Default: calculated 8x the number of TPC cores


max_pending_lw_transactions
Maximum number of lightweight transactions (LWT) in the queue before node reports
OverloadedException for LWTs.
Default: 10000
Change-data-capture (CDC) space settings

cdc_enabled: false
cdc_total_space_in_mb: 4096
cdc_free_space_check_interval_ms: 250

See also cdc_raw_directory.


cdc_enabled
Enables change data capture (CDC) functionality on a per-node basis. This modifies the logic used for
write path allocation rejection.

• true - use CDC functionality to reject mutations that contain a CDC-enabled table if at space limit
threshold in cdc_raw_directory.

• false - standard behavior, never reject.

Default: false
cdc_total_space_in_mb
Total space to use for change-data-capture (CDC) logs on disk. If space allocated for CDC exceeds
this value, the database throws WriteTimeoutException on mutations, including CDC-enabled tables.
A CDCCompactor (a consumer) is responsible for parsing the raw CDC logs and deleting them when
parsing is completed.
Default: calculated (4096 or 1/8th of the total space of the drive where the cdc_raw_directory resides)
cdc_free_space_check_interval_ms
Interval between checks for new available space for CDC-tracked tables when the
cdc_total_space_in_mb threshold is reached and the CDCCompactor is running behind or experiencing
back pressure. When not set, the default is 250.
Default: commented out (250)
Compaction settings

#concurrent_compactors: 1
# concurrent_validations: 0
concurrent_materialized_view_builders: 2
sstable_preemptive_open_interval_in_mb: 50
# pick_level_on_streaming: false

See also compaction_throughput_mb_per_sec in the common compaction settings section and Configuring
compaction.

concurrent_compactors
The number of concurrent compaction processes allowed to run simultaneously on a node, not
including validation compactions for anti-entropy repair. Simultaneous compactions help preserve
read performance in a mixed read-write workload by limiting the number of small SSTables that
accumulate during a single long-running compaction. If your data directories are backed by SSDs,
increase this value to the number of cores. If compaction running too slowly or too fast, adjust
compaction_throughput_mb_per_sec first.
Increasing concurrent compactors leads to more use of available disk space for compaction,
because concurrent compactions happen in parallel, especially for STCS. Ensure that adequate disk
space is available before increasing this configuration.

Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.
Default: calculated (the smaller of the number of disks or the number of cores, with a minimum of 2
and a maximum of 8)
concurrent_validations
Number of simultaneous repair validations to allow. When not set, the default is unbounded. Values less
than one are interpreted as unbounded.
Default: commented out (0) unbounded
concurrent_materialized_view_builders
Number of simultaneous materialized view builder tasks allowed to run concurrently. When a view
is created, the node ranges are split into (num_processors * 4) builder tasks and submitted to this
executor.
Default: 2
sstable_preemptive_open_interval_in_mb
The size of the SSTables to trigger preemptive opens. The compaction process opens SSTables before
they are completely written and uses them in place of the prior SSTables for any range previously
written. This process helps to smoothly transfer reads between the SSTables by reducing cache churn
and keeps hot rows hot.
A low value has a negative performance impact and will eventually cause heap pressure and GC
activity. The optimal value depends on hardware and workload.
Default: 50
pick_level_on_streaming
The compaction level for streamed-in SSTables.

• true - streamed-in SSTables of tables using LeveledCompactionStrategy (LCS) are placed on the
same level as the source node. For operational tasks like nodetool refresh or replacing a node, true
improves performance for compaction work.

• false - streamed-in SSTables are placed in level 0.

When not set, the default is false.


Default: commented out (false)
Memtable settings

memtable_allocation_type: heap_buffers
# memtable_cleanup_threshold: 0.34
memtable_flush_writers: 4

memtable_allocation_type
The method the database uses to allocate and manage memtable memory.

• heap_buffers - On heap NIO (non-blocking I/O) buffers.

• offheap_buffers - Off heap (direct) NIO buffers.

• offheap_objects - Native memory, eliminating NIO buffer heap overhead.

Default: heap_buffers
memtable_cleanup_threshold
Ratio used for automatic memtable flush.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.
When not set, the calculated default is 1/(memtable_flush_writers + 1)
Default: commented out (0.34)
memtable_flush_writers
The number of memtable flush writer threads per disk and the total number of memtables that can
be flushed concurrently, generally a combination of compute that is I/O bound. Memtable flushing
is more CPU efficient than memtable ingest. A single thread can keep up with the ingest rate of a
server on a single fast disk, until the server temporarily becomes I/O bound under contention, typically
with compaction. Generally, the default value is appropriate and does not need adjusting for SSDs.
However, the recommended default for HDDs: 2.
Default for SSDs: 4
Cache and index settings

column_index_size_in_kb: 16
# file_cache_size_in_mb: 4096
# direct_reads_size_in_mb: 128

column_index_size_in_kb
Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek
time. Lower density nodes might benefit from decreasing this value to 4, 2, or 1.
Default: 16
file_cache_size_in_mb
DSE 6.0.0-6.0.6: Maximum memory for buffer pooling and SSTable chunk cache. 32 MB is reserved
for pooling buffers, the remaining memory is the cache for holding recent or frequently used index
pages and uncompressed SSTable chunks. This pool is allocated off heap and is in addition to the
memory allocated for heap. Memory is allocated only when needed.
DSE 6.0.7 and later: The buffer pool is split into two pools; this setting defines the maximum memory
for file buffers that are stored in the file cache, also known as the chunk cache. Memory is allocated
only when needed but is not released. The other buffer pool is direct_reads_size_in_mb.
See Tuning Java Virtual Machine.
Default: calculated (0.5 of -XX:MaxDirectMemorySize)
direct_reads_size_in_mb
DSE 6.0.7 and later: The buffer pool is split into two pools; this setting defines the pool for
transient read operations. A buffer is typically used by a read operation and then returned to this pool
when the operation is finished so that it can be reused by other operations. The other buffer pool is
file_cache_size_in_mb. When not set, the default is calculated as 2 MB per TPC core thread, plus 2 MB
shared by non-TPC threads, with a maximum value of 128 MB.
Default: calculated
Streaming settings

# stream_throughput_outbound_megabits_per_sec: 200
# inter_dc_stream_throughput_outbound_megabits_per_sec: 200
# streaming_keep_alive_period_in_secs: 300
# streaming_connections_per_host: 1

stream_throughput_outbound_megabits_per_sec
Throttle for the throughput of all outbound streaming file transfers on a node. The database does
mostly sequential I/O when streaming data during bootstrap or repair which can saturate the network
connection and degrade client (RPC) performance. When not set, the value is 200 Mbps.
Default: commented out (200)
inter_dc_stream_throughput_outbound_megabits_per_sec
Throttle for all streaming file transfers between datacenters, and for network stream traffic as configured
with stream_throughput_outbound_megabits_per_sec. When not set, the value is 200 Mbps.
Should be set to a value less than or equal to stream_throughput_outbound_megabits_per_sec
since it is a subset of total throughput.
Default: commented out (200)
streaming_keep_alive_period_in_secs
Interval to send keep-alive messages to prevent reset connections during streaming. The stream
session fails when a keep-alive message is not received for 2 keep-alive cycles. When not set, the
default is 300 seconds (5 minutes) so that a stalled stream times out in 10 minutes.
Default: commented out (300)
streaming_connections_per_host
Maximum number of connections per host for streaming. Increase this value when you notice that joins
are CPU-bound, rather than network-bound. For example, a few nodes with large files. When not set,
the default is 1.
Default: commented out (1)
Fsync settings

trickle_fsync: true
trickle_fsync_interval_in_kb: 10240

trickle_fsync
When set to true, causes fsync to force the operating system to flush the dirty buffers at the set
interval trickle_fsync_interval_in_kb. Enable this parameter to prevent sudden dirty buffer flushing from
impacting read latencies. Recommended for use with SSDs, but not with HDDs.
Default: false
trickle_fsync_interval_in_kb
The size of the fsync in kilobytes.
Default: 10240
max_value_size_in_mb
The maximum size of any value in SSTables. SSTables are marked as corrupted when the threshold is
exceeded.
Default: 256
Thread Per Core (TPC) parameters

#tpc_cores:
# tpc_io_cores:
io_global_queue_depth: 128

tpc_cores
The number of concurrent CoreThreads. The CoreThreads are the main workers in a DSE 6.x node,
and process various asynchronous tasks from their queue. If not set, the default is the number of cores
(processors on the machine) minus one. Note that configuring tpc_cores affects the default value for
tpc_io_cores.
To achieve optimal throughput and latency, for a given workload, set tpc_cores to half the number
of CPUs (minimum) to double the number of CPUs (maximum). In cases where there are a large
number of incoming client connections, increasing tpc_cores to more than the default usually results in
CoreThreads receiving more CPU time.

DSE Search workloads only: set tpc_cores to the number of physical CPUs. See Tuning search
for maximum indexing throughput.
Default: commented out; defaults to the number of cores minus one.
tpc_io_cores
The subset of tpc_cores that process asynchronous IO tasks. (That is, disk reads.) Must be smaller or
equal to tpc_cores. Lower this value to decrease parallel disk IO requests.
Default: commented out; by default, calculated as min(io_global_queue_depth/4, tpc_cores)
io_global_queue_depth
Global IO queue depth used for reads when AIO is enabled, which is the default for SSDs. Set this to
the optimal queue depth for your disk setup, as determined with the fio tool.
Default: 128
NodeSync parameters

nodesync:
rate_in_kb: 1024

By default, the NodeSync service runs on every node.


Manage the NodeSync service using the nodetool nodesyncservice command.

See Setting the NodeSync rate.
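For example, to inspect and adjust the rate at runtime (subcommand names as provided by DSE nodetool; verify them with nodetool help nodesyncservice on your version):

$ nodetool nodesyncservice getrate
$ nodetool nodesyncservice setrate 2048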

rate_in_kb
The maximum kilobytes per second for data validation on the local node. The optimum validation rate
for each node may vary.
Default: 1024


Advanced properties
Properties for advanced users or properties that are less commonly used.
Advanced initialization properties

batch_size_warn_threshold_in_kb: 64
batch_size_fail_threshold_in_kb: 640
unlogged_batch_across_partitions_warn_threshold: 10
# broadcast_address: 1.2.3.4
# listen_on_broadcast_address: false
# initial_token:
# num_tokens: 128
# allocate_tokens_for_local_replication_factor: 3
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800

auto_bootstrap
This setting has been removed from default configuration.

• true - causes new (non-seed) nodes to migrate the right data to themselves automatically.

• false - use when initializing a fresh cluster without data.

See Initializing a DataStax Enterprise cluster.


When not set, the internal default is true.
Default: not present
batch_size_warn_threshold_in_kb
Threshold to log a warning message when any multiple-partition batch size exceeds this value in
kilobytes.
Increasing this threshold can lead to node instability.
Default: 64
batch_size_fail_threshold_in_kb
Threshold to fail and log WARN on any multiple-partition batch whose size exceeds this value. The
default value is 10X the value of batch_size_warn_threshold_in_kb.
Default: 640
unlogged_batch_across_partitions_warn_threshold
Threshold to log a WARN message on any batches not of type LOGGED that span across more
partitions than this limit.
Default: 10
broadcast_address
The public IP address this node uses to broadcast to other nodes outside the network or across regions
in multiple-region EC2 deployments. If this property is commented out, the node uses the same IP
address or hostname as listen_address. A node does not need a separate broadcast_address in a
single-node or single-datacenter installation, or in an EC2-based network that supports automatic
switching between private and public communication. It is necessary to set a separate listen_address
and broadcast_address on a node with multiple physical network interfaces or other topologies where
not all nodes have access to other nodes by their private IP addresses. For specific configurations, see
the instructions for listen_address.
Default: listen_address
listen_on_broadcast_address
Enables the node to communicate on both interfaces.

• true - If this node uses multiple physical network interfaces, set a unique IP address for
broadcast_address

• false - if this node is on a network that automatically routes between public and private networks,
like Amazon EC2 does


See listen_address.
Default: false
initial_token
The token to start the contiguous range. Set this property for single-node-per-token architecture, in
which a node owns exactly one contiguous range in the ring space. Setting this property overrides
num_tokens.
If your installation is not using vnodes or this node's num_tokens is set it to 1 or is commented out, you
should always set an initial_token value when setting up a production cluster for the first time, and
when adding capacity. See Generating tokens.
Use this parameter with num_tokens (vnodes) only in special cases, such as Restoring from a
snapshot.
Default: commented out (disabled)
num_tokens
Define virtual node (vnode) token architecture.
All other nodes in the datacenter must have the same token architecture.

• 1 - disable vnodes and use 1 token for legacy compatibility.

• a number between 2 and 128 - the number of token ranges to assign to this virtual node (vnode). A
higher value increases the probability that the data and workload are evenly distributed.
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured for
your environment.

Using vnodes can impact performance for your cluster. DataStax recommends testing the
configuration before enabling vnodes in production environments.

When the token number varies between nodes in a datacenter, the vnode logic assigns a
proportional number of ranges relative to other nodes in the datacenter. In general, if all nodes
have equal hardware capability, each node should have the same num_tokens value.

Default: 1 (disabled)
To migrate an existing cluster from single node per token range to vnodes, see Enabling virtual nodes
on an existing production cluster.
allocate_tokens_for_local_replication_factor

• RF of keyspaces in datacenter - triggers the recommended algorithmic allocation for the RF and
num_tokens for this node.
The allocation algorithm optimizes the workload balance using the target keyspace replication
factor. DataStax recommends setting the number of tokens to 8 to distribute the workload with
~10% variance between nodes. The allocation algorithm attempts to choose tokens in a way that
optimizes replicated load over the nodes in the datacenter for the specified RF. The load assigned
to each node is close to proportional to the number of vnodes.

The allocation algorithm is supported only for the Murmur3Partitioner and RandomPartitioner
partitioners. The Murmur3Partitioner is the default partitioning strategy for new clusters and the
right choice for new clusters in almost all cases.

• commented out - uses the random selection algorithm to assign token ranges randomly.
Over time, loads in a datacenter using the random selection algorithm become unevenly
distributed. DataStax recommends using only the allocation algorithm.

Default: commented out (use random selection algorithm)


See Virtual node (vnode) configuration, and for set up instructions see Adding nodes to vnode-enabled
cluster or Adding a datacenter to a cluster.
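For example, to use the algorithmic allocation with the recommended 8 vnodes in a datacenter whose keyspaces use a replication factor of 3:

num_tokens: 8
allocate_tokens_for_local_replication_factor: 3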
partitioner
The class that distributes rows (by partition key) across all nodes in the cluster. Any IPartitioner
may be used, including your own as long as it is in the class path. For new clusters use the default
partitioner.
DataStax Enterprise provides the following partitioners for backward compatibility:

• RandomPartitioner

• ByteOrderedPartitioner (deprecated)

• OrderPreservingPartitioner (deprecated)

Use only partitioner implementations bundled with DSE.

See Partitioners.
Default: org.apache.cassandra.dht.Murmur3Partitioner
tracetype_query_ttl
TTL for different trace types used during logging of the query process.
Default: 86400
tracetype_repair_ttl
TTL for different trace types used during logging of the repair process.
Default: 604800
Advanced automatic backup setting

auto_snapshot: true

auto_snapshot
Enables snapshots of the data before truncating a keyspace or dropping a table. To prevent data loss,
DataStax strongly advises using the default setting. If you set auto_snapshot to false, you lose data on
truncation or drop.
Default: true
Global row properties

column_index_cache_size_in_kb: 2
# row_cache_class_name: org.apache.cassandra.cache.OHCProvider
row_cache_size_in_mb: 0
row_cache_save_period: 0
# row_cache_keys_to_save: 100

When creating or modifying tables, you can enable or disable the row cache for that table by setting the caching
parameter. Other row cache tuning and configuration options are set at the global (node) level. The database
uses these settings to automatically distribute memory for each table on the node based on the overall workload
and specific table usage. You can also configure the save periods for these caches globally.

See Configuring caches.
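For example, to enable a small row cache that is saved every four hours (illustrative values; caching must also be enabled per table with its caching parameter):

row_cache_size_in_mb: 128
row_cache_save_period: 14400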

column_index_cache_size_in_kb
(Only applies to BIG format SSTables) Threshold for the total size of all index entries for a partition that
the database stores in the partition key cache. If the total size of all index entries for a partition exceeds
this amount, the database stops putting entries for this partition into the partition key cache.
Default: 2
row_cache_class_name
The classname of the row cache provider to use. Valid values:

• org.apache.cassandra.cache.OHCProvider - fully off-heap

• org.apache.cassandra.cache.SerializingCacheProvider - partially off-heap, available in earlier
releases


Use only row cache provider implementations bundled with DSE.


When not set, the default is org.apache.cassandra.cache.OHCProvider (fully off-heap)
Default: commented out (org.apache.cassandra.cache.OHCProvider)
row_cache_size_in_mb
Maximum size of the row cache in memory. The row cache can save time, but it is space-intensive
because it contains the entire row. Use the row cache only for hot rows or static rows. If you reduce the
size, you may not get the hottest keys loaded on start up.

• 0 - disable row caching

• MB - Maximum size of the row cache in memory

Default: 0 (disabled)
row_cache_save_period
The number of seconds that rows are kept in cache. Caches are saved to saved_caches_directory. This
setting has limited use as described in row_cache_size_in_mb.
Default: 0 (disabled)
row_cache_keys_to_save
The number of keys from the row cache to save. When not set, all keys are saved.
Default: commented out (100)
Counter caches properties

counter_cache_size_in_mb:
counter_cache_save_period: 7200
# counter_cache_keys_to_save: 100

Counter cache helps to reduce counter locks' contention for hot counter cells. In case of RF = 1 a counter cache
hit causes the database to skip the read before write entirely. With RF > 1 a counter cache hit still helps to
reduce the duration of the lock hold, helping with hot counter cell updates, but does not allow skipping the read
entirely. Only the local (clock, count) tuple of a counter cell is kept in memory, not the whole counter, so it is
relatively cheap.

If you reduce the counter cache size, the database may not load the hottest keys on start-up.

counter_cache_size_in_mb
When no value is set, the database uses the smaller of 2.5% of the heap or 50 megabytes
(MB). If your system performs counter deletes and relies on low gc_grace_seconds, you should disable
the counter cache. To disable, set to 0.
Default: calculated
counter_cache_save_period
The time, in seconds, after which the database saves the counter cache (keys only). The database
saves caches to saved_caches_directory.
Default: 7200 (2 hours)
counter_cache_keys_to_save
Number of keys from the counter cache to save. When not set, the database saves all keys.
Default: commented out (disabled, saves all keys)
Tombstone settings

tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

When executing a scan, within or across a partition, the database must keep tombstones in memory to allow
them to return to the coordinator. The coordinator uses tombstones to ensure that other replicas know about the
deleted rows. Workloads that generate numerous tombstones may cause performance problems and exhaust
the server heap. Adjust these thresholds only if you understand the impact and want to scan more tombstones.
You can adjust these thresholds at runtime using the StorageServiceMBean.


See the DataStax Developer Blog post Cassandra anti-patterns: Queues and queue-like datasets.

tombstone_warn_threshold
The database issues a warning if a query scans more than this number of tombstones.
Default: 1000
tombstone_failure_threshold
The database aborts a query if it scans more than this number of tombstones.
Default: 100000
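As an illustrative sketch only (the jmxterm jar name and port are placeholders; any JMX client works), the thresholds can be changed at runtime through the StorageService MBean mentioned above:

$ java -jar jmxterm.jar -l localhost:7199
$> set -b org.apache.cassandra.db:type=StorageService TombstoneWarnThreshold 2000
$> set -b org.apache.cassandra.db:type=StorageService TombstoneFailureThreshold 200000

Changes made over JMX do not persist across restarts; update cassandra.yaml to make them permanent.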
Network timeout settings

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
aggregated_request_timeout_in_ms: 120000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
# cross_dc_rtt_in_ms: 0

read_request_timeout_in_ms
How long the coordinator waits for read operations to complete before timing them out.
Default: 5000 (5 seconds)
range_request_timeout_in_ms
How long the coordinator waits for sequential or index scans to complete before timing them out.
Default: 10000 (10 seconds)
aggregated_request_timeout_in_ms
How long the coordinator waits for aggregated queries, such as SELECT COUNT(*) or MIN(x), to
complete before timing them out. Lowest acceptable value is 10 ms.
Default: 120000 (2 minutes)
write_request_timeout_in_ms
How long the coordinator waits for write requests to complete with at least one node in the local
datacenter. Lowest acceptable value is 10 ms.
See Hinted handoff: repair during write path.
Default: 2000 (2 seconds)
counter_write_request_timeout_in_ms
How long the coordinator waits for counter writes to complete before timing it out.
Default: 5000 (5 seconds)
cas_contention_timeout_in_ms
How long the coordinator continues to retry a CAS (compare and set) operation that contends with other
proposals for the same row. If the coordinator cannot complete the operation within this timespan, it
aborts the operation.
Default: 1000 (1 second)
truncate_request_timeout_in_ms
How long the coordinator waits for a truncate (the removal of all data from a table) to complete before
timing it out. The long default value allows the database to take a snapshot before removing the data. If
auto_snapshot is disabled (not recommended), you can reduce this time.
Default: 60000 (1 minute)
request_timeout_in_ms
The default timeout value for other miscellaneous operations. Lowest acceptable value is 10 ms.
See Hinted handoff: repair during write path.
Default: 10000
cross_dc_rtt_in_ms
How much to increase the cross-datacenter timeout (write_request_timeout_in_ms +
cross_dc_rtt_in_ms) for requests that involve only nodes in a remote datacenter. This setting is
intended to reduce hint pressure.


DataStax recommends using LOCAL_* consistency levels (CL) for read and write requests in multi-
datacenter deployments to avoid timeouts that may occur when remote nodes are chosen to satisfy
the CL, such as QUORUM.
Default: commented out (0)
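For example, in cqlsh the session consistency level can be inspected and switched to a LOCAL_* level:

cqlsh> CONSISTENCY
Current consistency level is ONE.
cqlsh> CONSISTENCY LOCAL_QUORUM
Consistency level set to LOCAL_QUORUM.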
slow_query_log_timeout_in_ms
How long, in milliseconds, before a node logs slow queries. SELECT queries that exceed this value
generate an aggregated log message to identify slow queries. To disable, set to 0.
Default: 500
Inter-node settings

storage_port: 7000
cross_node_timeout: false
# internode_send_buff_size_in_bytes:
# internode_recv_buff_size_in_bytes:
internode_compression: dc
inter_dc_tcp_nodelay: false

storage_port
The port for inter-node communication. Follow security best practices: do not expose this port to the
internet, and apply firewall rules.
See Securing DataStax Enterprise ports.
Default: 7000
cross_node_timeout
Enables operation timeout information exchange between nodes to accurately measure request
timeouts. If this property is disabled, the replica assumes any requests are forwarded to it instantly by
the coordinator. During overload conditions this means extra time is required for processing already-
timed-out requests.
Before enabling this property, make sure NTP (Network Time Protocol) is installed and the clocks are
synchronized among the nodes.
Default: false
internode_send_buff_size_in_bytes
The sending socket buffer size, in bytes, for inter-node calls.
See TCP settings.

This setting and internode_recv_buff_size_in_bytes are limited by net.core.wmem_max and
net.core.rmem_max. If this property is not set, net.ipv4.tcp_wmem determines the buffer size. For
more details, run man tcp and refer to:

• /proc/sys/net/core/wmem_max

• /proc/sys/net/core/rmem_max

• /proc/sys/net/ipv4/tcp_wmem

• /proc/sys/net/ipv4/tcp_rmem

Default: not set


internode_recv_buff_size_in_bytes
The receiving socket buffer size in bytes for inter-node calls.
Default: not set
internode_compression
Controls whether traffic between nodes is compressed. Valid values:

• all - Compresses all traffic

• dc - Compresses traffic between datacenters only

• none - No compression.


Default: dc
inter_dc_tcp_nodelay
Enables tcp_nodelay for inter-datacenter communication. When disabled, the network sends larger,
but fewer, network packets. This reduces overhead from the TCP protocol itself. However, disabling
inter_dc_tcp_nodelay may increase latency by blocking cross datacenter responses.
Default: false
Native transport (CQL Binary Protocol)

start_native_transport: true
native_transport_port: 9042
# native_transport_port_ssl: 9142
# native_transport_max_frame_size_in_mb: 256
# native_transport_max_concurrent_connections: -1
# native_transport_max_concurrent_connections_per_ip: -1
native_transport_address: localhost
# native_transport_interface: eth0
# native_transport_interface_prefer_ipv6: false
# native_transport_broadcast_address: 1.2.3.4
native_transport_keepalive: true

See also native_transport_port_ssl in SSL Ports.

start_native_transport
Enables or disables the native transport server.
Default: true
native_transport_port
The port where the CQL native transport listens for clients. For security reasons, do not expose this port
to the internet. Firewall it if needed.
Default: 9042
native_transport_max_frame_size_in_mb
The maximum allowed size of a frame. Frames (requests) larger than this are rejected as invalid.
Default: 256
native_transport_max_concurrent_connections
The maximum number of concurrent client connections.
Default: -1 (unlimited)
native_transport_max_concurrent_connections_per_ip
The maximum number of concurrent client connections per source IP address.
Default: -1 (unlimited)
native_transport_address
When left blank, uses the configured hostname of the node. Unlike the listen_address, this value
can be set to 0.0.0.0, but you must set the native_transport_broadcast_address to a value other than
0.0.0.0.
Set native_transport_address OR native_transport_interface, not both.
Default: localhost
native_transport_interface
The network interface to bind the native transport server to, as an alternative to specifying
native_transport_address. IP aliasing is not supported.
Set native_transport_address OR native_transport_interface, not both.
Default: eth0
native_transport_interface_prefer_ipv6
Use IPv4 or IPv6 when interface is specified by name.

• false - use first IPv4 address.

• true - use first IPv6 address.

When only a single address is used, that address is selected without regard to this setting.
Default: commented out (false)
native_transport_broadcast_address


Native transport address to broadcast to drivers and other DSE nodes. This cannot be set to 0.0.0.0.

• blank - uses the value of native_transport_address

• IP_address - an explicit address is required when native_transport_address is set to 0.0.0.0

Default: commented out (1.2.3.4)


native_transport_keepalive
Enables keepalive on native connections.
Default: true
Advanced fault detection settings
Settings to handle poorly performing or failing components.

# gc_log_threshold_in_ms: 200
# gc_warn_threshold_in_ms: 1000
# otc_coalescing_strategy: DISABLED
# otc_coalescing_window_us: 200
# otc_coalescing_enough_coalesced_messages: 8

gc_log_threshold_in_ms
GC pauses longer than this threshold, in milliseconds, are logged at the INFO level. Adjust to minimize
logging.
Default: commented out (200)
gc_warn_threshold_in_ms
Threshold for GC pause. Any GC pause longer than this interval is logged at the WARN level. By
default, the database logs any GC pause greater than 200 ms at the INFO level.

See Configuring logging.

Default: commented out (1000)


otc_coalescing_strategy
Strategy to combine multiple network messages into a single packet for outbound TCP connections
to nodes in the same data center. See the DataStax Developer Blog post Performance doubling with
message coalescing.
Use only strategy implementations bundled with DSE.
Supported strategies are:

• FIXED

• MOVINGAVERAGE

• TIMEHORIZON

• DISABLED

Default: commented out (DISABLED)


otc_coalescing_window_us
How many microseconds to wait for coalescing messages to nodes in the same datacenter.

• For FIXED strategy - the amount of time after the first message is received before it is sent with any
accompanying messages.

• For MOVINGAVERAGE strategy - the maximum wait time and the interval at which messages must
arrive on average to enable coalescing.

Default: commented out (200)


otc_coalescing_enough_coalesced_messages
The threshold number of messages to nodes in the same datacenter above which messages are no
longer coalesced. Must be greater than 2 and less than 128.
Default: commented out (8)
seed_gossip_probability


The percentage of time that gossip messages are sent to a seed node during each round of gossip.
Higher values decrease the time required to propagate gossip changes across the cluster.
Default: 1.0 (100%)
Backpressure settings

back_pressure_enabled: false
back_pressure_strategy:
- class_name: org.apache.cassandra.net.RateBasedBackPressure
parameters:
- high_ratio: 0.90
factor: 5
flow: FAST

back_pressure_enabled
Enables the coordinator to apply the specified back pressure strategy to each mutation that is sent to
replicas.
Default: false
back_pressure_strategy
To add new strategies, implement org.apache.cassandra.net.BackpressureStrategy and provide a
public constructor that accepts a Map<String, Object>.
Use only strategy implementations bundled with DSE.
class_name
The default class_name uses the ratio between incoming mutation responses and outgoing mutation
requests.
Default: org.apache.cassandra.net.RateBasedBackPressure
high_ratio
When outgoing mutations are below this value, they are rate limited according to the incoming rate
decreased by the factor (described below). When above this value, the rate limiting is increased by the
factor.
Default: 0.90
factor
A number between 1 and 10. When backpressure is below high ratio, outgoing mutations are rate
limited according to the incoming rate decreased by the given factor; if above high ratio, the rate limiting
is increased by the given factor.
Default: 5
flow
The flow speed to apply rate limiting:

• FAST - rate limited to the speed of the fastest replica

• SLOW - rate limit to the speed of the slowest replica

Default: FAST
dynamic_snitch_badness_threshold
The performance threshold for dynamically routing client requests away from a poorly performing
node. Specifically, it controls how much worse a poorly performing node has to be before the dynamic
snitch prefers other replicas. A value of 0.2 means the database continues to prefer the static snitch
values until the node response time is 20% worse than the best performing node. Until the threshold is
reached, incoming requests are statically routed to the closest replica as determined by the snitch.
Default: 0.1
dynamic_snitch_reset_interval_in_ms
Time interval after which the database resets all node scores. This allows a bad node to recover.
Default: 600000
dynamic_snitch_update_interval_in_ms
The time interval, in milliseconds, between the calculation of node scores. Because score calculation is
CPU intensive, be careful when reducing this interval.
Default: 100


Hinted handoff options

hinted_handoff_enabled: true
# hinted_handoff_disabled_datacenters:
# - DC1
# - DC2
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /var/lib/cassandra/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
#hints_compression:
# - class_name: LZ4Compressor
# parameters:
# -
batchlog_replay_throttle_in_kb: 1024
# batchlog_endpoint_strategy: random_remote

See Hinted handoff: repair during write path.

hinted_handoff_enabled
Enables or disables hinted handoff. A hint indicates that the write needs to be replayed to an
unavailable node. The database writes the hint to a hints file on the coordinator node.

• false - do not enable hinted handoff

• true - globally enable hinted handoff, except for datacenters specified for
hinted_handoff_disabled_datacenters

Default: true
hinted_handoff_disabled_datacenters
A blacklist of datacenters that will not perform hinted handoffs. To disable hinted handoff on a certain
datacenter, add its name to this list.
Default: commented out
max_hint_window_in_ms
Maximum amount of time during which the database generates hints for an unresponsive node.
After this interval, the database does not generate any new hints for the node until it is back up and
responsive. If the node goes down again, the database starts a new interval. This setting can prevent a
sudden demand for resources when a node is brought back online and the rest of the cluster attempts
to replay a large volume of hinted writes.
See About failure detection and recovery.
Default: 10800000 (3 hours)
hinted_handoff_throttle_in_kb
Maximum amount of traffic per delivery thread in kilobytes per second. This rate reduces proportionally
to the number of nodes in the cluster. For example, if there are two nodes in the cluster, each delivery
thread uses the maximum rate. If there are three, each node throttles to half of the maximum, since
two nodes are expected to deliver hints simultaneously.
When applying this limit, the calculated hint transmission rate is based on the uncompressed hint
size, even if internode_compression or hints_compression is enabled.
Default: 1024
hints_flush_period_in_ms
The time, in milliseconds, to wait before flushing hints from internal buffers to disk.
Default: 10000
max_hints_delivery_threads
Number of threads the database uses to deliver hints. In multiple datacenter deployments, consider
increasing this number because cross datacenter handoff is generally slower.
Default: 2
max_hints_file_size_in_mb


The maximum size for a single hints file, in megabytes.


Default: 128
hints_compression
The compressor for hint files. Supported compressors: LZ4, Snappy, and Deflate. When not set, the
database does not compress hint files.
Default: commented out (hint files are not compressed)
batchlog_replay_throttle_in_kb
Total maximum throttle, in KB per second, for replaying hints. Throttling is reduced proportionally to the
number of nodes in the cluster.
Default: 1024
batchlog_endpoint_strategy
Strategy to choose the batchlog storage endpoints.

• random_remote - Default, purely random. Avoids the local rack, if possible. Same behavior as
earlier releases.

• dynamic_remote - Uses DynamicEndpointSnitch to select batchlog storage endpoints. Avoids
the local rack, if possible. This strategy offers the same availability guarantees as random_remote,
but selects the fastest endpoints according to the DynamicEndpointSnitch. DynamicEndpointSnitch
tracks reads but not writes; write-only, or mostly-write, workloads might not benefit from this
strategy. Note: this strategy falls back to random_remote if dynamic_snitch is not enabled.

• dynamic - Mostly the same as dynamic_remote, except that the local rack is not excluded, which
offers a lower availability guarantee than random_remote or dynamic_remote. Note: this strategy
falls back to random_remote if dynamic_snitch is not enabled.

Default: random_remote
Security properties
DSE Advanced Security fortifies DataStax Enterprise (DSE) databases against potential harm due to deliberate
attack or user error. Configuration properties include authentication and authorization, permissions, roles,
encryption of data in-flight and at-rest, and data auditing. DSE Unified Authentication provides authentication,
authorization, and role management. Enabling DSE Unified Authentication requires additional configuration in
dse.yaml, see Configuring DSE Unified Authentication.

authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
# internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
role_manager: com.datastax.bdp.cassandra.auth.DseRoleManager
system_keyspaces_filtering: false
roles_validity_in_ms: 120000
# roles_update_interval_in_ms: 120000
permissions_validity_in_ms: 120000
# permissions_update_interval_in_ms: 120000

authenticator
The authentication backend. The only supported authenticator is DseAuthenticator for external
authentication with multiple authentication schemes such as Kerberos, LDAP, and internal
authentication. Authenticators other than DseAuthenticator are deprecated and not supported. Some
security features might not work correctly if other authenticators are used. See authentication_options in
dse.yaml.
Use only authentication implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseAuthenticator
internode_authenticator
Internode authentication backend to enable secure connections from peer nodes.
Use only authentication implementations bundled with DSE.
Default: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
authorizer


The authorization backend. Authorizers other than DseAuthorizer are not supported. DseAuthorizer
supports enhanced permission management of DSE-specific resources. Authorizers other than
DseAuthorizer are deprecated and not supported. Some security features might not work correctly if
other authorizers are used. See Authorization options in dse.yaml.
Use only authorization implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseAuthorizer
system_keyspaces_filtering
Enables system keyspace filtering so that users can access and view only schema information
for rows in the system and system_schema keyspaces to which they have access. When
system_keyspaces_filtering is set to true:

• Data in the system.local and system.peers tables are visible

• Data in the following tables of the system keyspace are filtered based on the role's DESCRIBE
privileges for keyspaces; only rows for appropriate keyspaces are displayed in:

  - size_estimates

  - sstable_activity

  - built_indexes

  - built_views

  - available_ranges

  - view_builds_in_progress

• Data in all tables in the system_schema keyspace are filtered based on a role's DESCRIBE privileges
for keyspaces stored in the system_schema tables.

• Read operations against other tables in the system keyspace are denied

Security requirements and user permissions apply. Enable this feature only after appropriate user
permissions are granted. You must grant the DESCRIBE permission on a keyspace to a role for that
role to see the keyspace's rows in the system keyspaces. If you do not grant the permission, an error
states that the keyspace is not found.

GRANT DESCRIBE ON KEYSPACE keyspace_name TO ROLE role_name;

See Controlling access to keyspaces and tables and Configuring the security keyspaces replication
factors.
Default: false
role_manager
The DSE Role Manager supports LDAP roles and internal roles supported by the
CassandraRoleManager. Role options are stored in the dse_security keyspace. When using the DSE
Role Manager, increase the replication factor of the dse_security keyspace. Role managers other than
DseRoleManager are deprecated and not supported. Some security features might not work correctly if
other role managers are used.
Use only role manager implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseRoleManager
roles_validity_in_ms
Validity period for roles cache in milliseconds. Determines how long to cache the list of roles assigned
to the user; users may have several roles, either through direct assignment or inheritance (a role that
has been granted to another role). Adjust this setting based on the complexity of your role hierarchy,
tolerance for role changes, the number of nodes in your environment, and activity level of the cluster.
Fetching permissions can be an expensive operation, so this setting allows flexibility. Granted roles
are cached for authenticated sessions in AuthenticatedUser. After the specified time elapses, role


validity is rechecked. This caching is automatically disabled when internal authentication is not enabled
in the DseAuthenticator.

• 0 - disable role caching

• milliseconds - how long to cache the list of roles assigned to the user

Default: 120000 (2 minutes)


roles_update_interval_in_ms
Refresh interval for roles cache. After this interval, cache entries become eligible for refresh. On next
access, the database schedules an async reload, and returns the old value until the reload completes. If
roles_validity_in_ms is non-zero, then this value must also be non-zero. When not set, the default is
the same value as roles_validity_in_ms.
Default: commented out (120000)
permissions_validity_in_ms
How long permissions in cache remain valid to manage performance impact of permissions queries.
Fetching permissions can be resource intensive. Set the cache validity period to your security
tolerances. The cache is used for the standard authentication and the row-level access control (RLAC)
cache. The cache is quite effective at small durations.

• 0 - disable permissions cache

• milliseconds - time, in milliseconds

REVOKE does not automatically invalidate cached permissions. Permissions are invalidated the next
time they are refreshed.
Default: 120000 (2 minutes)
permissions_update_interval_in_ms
Sets refresh interval for the standard authentication cache and the row-level access control
(RLAC) cache. After this interval, cache entries become eligible for refresh. On next access,
the database schedules an async reload and returns the old value until the reload completes. If
permissions_validity_in_ms is non-zero, this value must also be non-zero. When not set, the default is
the same value as permissions_validity_in_ms.
Default: commented out (120000)
permissions_cache_max_entries
The maximum number of entries that are held by the standard authentication cache and row-level
access control (RLAC) cache. With the default value of 1000, the RLAC permissions cache can have
up to 1000 entries in it, and the standard authentication cache can have up to 1000 entries. This single
option applies to both caches. To size the permissions cache for use with Setting up Row Level Access
Control (RLAC), use this formula:

numRlacUsers * numRlacTables + 100

For example, 20 RLAC users querying 10 RLAC tables would need 20 * 10 + 100 = 300 entries. If this
option is not present in cassandra.yaml, add it manually to use a value other than 1000. See
Enabling DSE Unified Authentication.
Default: not set (1000)
Inter-node encryption options
Node-to-node (internode) encryption protects data that is transferred between nodes in a cluster using SSL.

server_encryption_options:
internode_encryption: none
keystore: resources/dse/conf/.keystore
keystore_password: cassandra
truststore: resources/dse/conf/.truststore
truststore_password: cassandra
# More advanced defaults below:
# protocol: TLS
# algorithm: SunX509
# store_type: JKS


# cipher_suites:
[TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
# require_client_auth: false
# require_endpoint_verification: false

server_encryption_options
Inter-node encryption options. If enabled, you must also generate keys and provide the appropriate key
and truststore locations and passwords. No custom encryption options are supported.
The passwords used in these options must match the passwords used when generating the keystore
and truststore. For instructions on generating these files, see Creating a Keystore to Use with JSSE.

See Configuring SSL for node-to-node connections.
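As a minimal sketch (the alias, distinguished name, and passwords are placeholders), a keystore and truststore pair can be prepared with Java's keytool; the node certificate is exported and then imported into the truststore that is distributed to the other nodes:

$ keytool -genkeypair -keyalg RSA -keysize 2048 -alias node1 -keystore .keystore -storepass cassandra -keypass cassandra -dname "CN=node1.example.com, OU=cluster, O=example, C=US"

$ keytool -exportcert -alias node1 -file node1.cer -keystore .keystore -storepass cassandra

$ keytool -importcert -alias node1 -file node1.cer -keystore .truststore -storepass cassandra -noprompt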


internode_encryption
Encryption options for inter-node communication using the TLS_RSA_WITH_AES_128_CBC_SHA
cipher suite for authentication, key exchange, and encryption of data transfers. Use the DHE/ECDHE
ciphers, such as TLS_DHE_RSA_WITH_AES_128_CBC_SHA, if running in FIPS 140 (Federal
Information Processing Standard) compliant mode.

• all - Encrypt all inter-node communications

• none - No encryption

• dc - Encrypt the traffic between the datacenters

• rack - Encrypt the traffic between the racks

Default: none
keystore
Relative path from DSE installation directory or absolute path to the Java keystore (JKS) suitable for
use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer
(SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to
encrypt outgoing messages.
Default: resources/dse/conf/.keystore
keystore_password
Password for the keystore. This must match the password used when generating the keystore and
truststore.
Default: cassandra
truststore
Relative path from DSE installation directory or absolute path to truststore containing the trusted
certificate for authenticating remote servers.
Default: resources/dse/conf/.truststore
truststore_password
Password for the truststore.
Default: cassandra
protocol
Default: commented out (TLS)
algorithm
Default: commented out (SunX509)
store_type
Valid types are JKS, JCEKS, and PKCS12.
PKCS11 is not supported.
Default: commented out (JKS)
truststore_type
Valid types are JKS, JCEKS, and PKCS12.


PKCS11 is not supported. Also, due to an OpenSSL issue, you cannot use a PKCS12 truststore that
was generated via OpenSSL. For example, a truststore generated via the following command will not
work with DSE:

$ openssl pkcs12 -export -nokeys -out truststore.pfx -in intermediate.chain.pem

However, truststores generated via Java's keytool and then converted to PKCS12 work with DSE.
Example:

$ keytool -importcert -alias rootca -file rootca.pem -keystore truststore.jks

$ keytool -importcert -alias intermediate -file intermediate.pem -keystore truststore.jks

$ keytool -importkeystore -srckeystore truststore.jks -destkeystore truststore.pfx -deststoretype pkcs12

Default: commented out (JKS)


cipher_suites
Supported ciphers:

• TLS_RSA_WITH_AES_128_CBC_SHA

• TLS_RSA_WITH_AES_256_CBC_SHA

• TLS_DHE_RSA_WITH_AES_128_CBC_SHA

• TLS_DHE_RSA_WITH_AES_256_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

Default: commented out


require_client_auth
Whether to enable certificate authentication for node-to-node (internode) encryption. When not set, the
default is false.
Default: commented out (false)
require_endpoint_verification
Whether to verify the connected host and the host name in the certificate match. When not set, the
default is false.
Default: commented out (false)
Client-to-node encryption options
Client-to-node encryption protects in-flight data from client machines to a database cluster using SSL (Secure
Sockets Layer) and establishes a secure channel between the client and the coordinator node.

client_encryption_options:
enabled: false
# If enabled and optional is set to true, encrypted and unencrypted connections over native transport are handled.
optional: false
keystore: resources/dse/conf/.keystore
keystore_password: cassandra
# require_client_auth: false
# Set truststore and truststore_password if require_client_auth is true
# truststore: resources/dse/conf/.truststore
# truststore_password: cassandra
# More advanced defaults below:


# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites:
[TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]

See Configuring SSL for client-to-node connections.

client_encryption_options
Options for client-to-node encryption. You must also generate keys and provide the appropriate
key and truststore locations and passwords. No custom encryption options are supported in
DataStax Enterprise.
Advanced settings:
enabled
Whether to enable client-to-node encryption.
Default: false
optional
When optional is set to true, both encrypted and unencrypted connections over native transport are
allowed. This is a necessary transition state when enabling client-to-node encryption on live clusters
without inducing an outage for existing unencrypted clients. Once existing clients are migrated to
encrypted connections, set optional back to false to enforce native transport encryption.
Default: false
keystore
Relative path from DSE installation directory or absolute path to the Java keystore (JKS) suitable for
use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer
(SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to
encrypt outgoing messages.
Default: resources/dse/conf/.keystore
keystore_password
Password for the keystore.
Default: cassandra
require_client_auth
Whether to enable certificate authentication for client-to-node encryption. When not set, the default is
false.
When set to true, client certificates must be present on all nodes in the cluster.
Default: commented out (false)
truststore
Relative path from DSE installation directory or absolute path to truststore containing the trusted
certificate for authenticating remote servers.
Default: resources/dse/conf/.truststore
truststore_password
Password for the truststore. This must match the password used when generating the keystore and
truststore.
The truststore password and path are only required when require_client_auth is set to true.
Default: cassandra
protocol
Default: commented out (TLS)
algorithm
Default: commented out (SunX509)
store_type
Valid types are JKS, JCEKS and PKCS12. For file-based keystores, use PKCS12.
PKCS11 is not supported.
Default: commented out (JKS)
truststore_type


Valid types are JKS, JCEKS, and PKCS12.


PKCS11 is not supported. Also, due to an OpenSSL issue, you cannot use a PKCS12 truststore that
was generated via OpenSSL. For example, a truststore generated via the following command will not
work with DSE:

$ openssl pkcs12 -export -nokeys -out truststore.pfx -in intermediate.chain.pem

However, truststores generated via Java's keytool and then converted to PKCS12 work with DSE.
Example:

$ keytool -importcert -alias rootca -file rootca.pem -keystore truststore.jks

$ keytool -importcert -alias intermediate -file intermediate.pem -keystore truststore.jks

$ keytool -importkeystore -srckeystore truststore.jks -destkeystore truststore.pfx -deststoretype pkcs12

Default: commented out (JKS)


cipher_suites
Supported ciphers:

• TLS_RSA_WITH_AES_128_CBC_SHA

• TLS_RSA_WITH_AES_256_CBC_SHA

• TLS_DHE_RSA_WITH_AES_128_CBC_SHA

• TLS_DHE_RSA_WITH_AES_256_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

Default: commented out
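As a client-side sketch (the certificate path is a placeholder), cqlsh can connect to a node with client-to-node encryption enabled by adding an [ssl] section to ~/.cassandra/cqlshrc and starting cqlsh with the --ssl flag, or with ssl = true in the [connection] section:

[connection]
ssl = true

[ssl]
certfile = /path/to/rootca.pem
validate = true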


Transparent data encryption options

transparent_data_encryption_options:
enabled: false
chunk_length_kb: 64
cipher: AES/CBC/PKCS5Padding
key_alias: testing:1
# CBC IV length for AES must be 16 bytes, the default size
# iv_length: 16
key_provider:
- class_name: org.apache.cassandra.security.JKSKeyProvider
parameters:
- keystore: conf/.keystore
keystore_password: cassandra
store_type: JCEKS
key_password: cassandra

transparent_data_encryption_options
DataStax Enterprise supports this option only for backward compatibility. When using DSE, configure
data encryption options in the dse.yaml; see Transparent data encryption.
TDE properties:


• enabled: (Default: false)

• chunk_length_kb: (Default: 64)

• cipher: AES/CBC/PKCS5Padding (algorithm/mode/padding)

• key_alias: testing:1

• iv_length: 16
iv_length is commented out in the default cassandra.yaml file. Uncomment only if cipher is set
to AES. The value must be 16 (bytes).

• key_provider:
  - class_name: org.apache.cassandra.security.JKSKeyProvider
    parameters:
    - keystore: conf/.keystore
      keystore_password: cassandra
      store_type: JCEKS
      key_password: cassandra

SSL Ports

ssl_storage_port: 7001
native_transport_port_ssl: 9142

See Securing DataStax Enterprise ports.

ssl_storage_port
The SSL port for encrypted communication. Unused unless enabled in server_encryption_options.
Follow security best practices: do not expose this port to the internet, and apply firewall rules.
Default: 7001
native_transport_port_ssl
Dedicated SSL port where the CQL native transport listens for clients with encrypted communication.
For security reasons, do not expose this port to the internet. Firewall it if needed.

• commented out (disabled) - the native_transport_port will encrypt all traffic

• port number different than native_transport_port - use encryption for native_transport_port_ssl,
and keep native_transport_port unencrypted, to allow both unencrypted and encrypted traffic

Default: 9142
Continuous paging options

continuous_paging:
max_concurrent_sessions: 60
max_session_pages: 4
max_page_size_mb: 8
max_local_query_time_ms: 5000
client_timeout_sec: 600
cancel_timeout_sec: 5


paused_check_interval_ms: 1

continuous_paging
Options to tune continuous paging that pushes pages, when requested, continuously to the client:

• Maximum memory used:

max_concurrent_sessions * max_session_pages * max_page_size_mb

Default: calculated (60 * 4 * 8 = 1920 MB)

Guidance

• Because memtables and SSTables are used by the continuous paging query, you can define the
maximum period of time during which memtables cannot be flushed and compacted SSTables
cannot be deleted.

• If fewer threads exist than sessions, a session cannot execute until another one is swapped out.

• Distributed queries (CL > ONE or non-local data) are swapped out after every page, while local
queries at CL = ONE are swapped out after max_local_query_time_ms.

max_concurrent_sessions
The maximum number of concurrent sessions. Additional sessions are rejected with an unavailable
error.
Default: 60
max_session_pages
The maximum number of pages that can be buffered for each session. If the client is not reading from
the socket, the producer thread is blocked after it has prepared max_session_pages.
Default: 4
max_page_size_mb
The maximum size of a page, in MB. If an individual CQL row is larger than this value, the page can be
larger than this value.
Default: 8
max_local_query_time_ms
The maximum time for a local continuous query to run. When this threshold is exceeded, the
session is swapped out and rescheduled. Swapping and rescheduling ensures the release of
resources that prevent the memtables from flushing and ensures fairness when max_threads <
max_concurrent_sessions. Adjust when high write workloads exist on tables that have continuous
paging requests.
Default: 5000
client_timeout_sec
How long the server will wait, in seconds, for clients to request more pages if the client is not reading
and the server queue is full.
Default: 600
cancel_timeout_sec
How long to wait before checking if a paused session can be resumed. Continuous paging sessions
are paused because of backpressure or when the client has not requested more pages with
backpressure updates.
Default: 5
paused_check_interval_ms
How long to wait, in milliseconds, before checking if a continuous paging session can be resumed,
when that session is paused because of backpressure.
Default: 1


Fault detection setting

# phi_convict_threshold: 8

phi_convict_threshold
The sensitivity of the failure detector on an exponential scale. Generally, this setting does not need
adjusting.
See About failure detection and recovery.
When not set, the internal value is 8.
Default: commented out (8)
Memory leak detection settings

#leaks_detection_params:
# sampling_probability: 0
# max_stacks_cache_size_mb: 32
# num_access_records: 0
# max_stack_depth: 30

sampling_probability
The sampling probability to track for the specified resource. For resources tracked, see nodetool
leaksdetection.

• 0 - disable tracking. Default.

• 1 - enable tracking all the time

• A number between 0 and 1 - the percentage of time to randomly track a resource. For example,
0.5 will track resources 50% of the time.

Tracking incurs a significant stack trace collection cost for every access and consumes heap space.
Enable tracking only when directed by DataStax Support.
Default: commented out (0)
max_stacks_cache_size_mb
The size, in MB, of the cache for call stack traces. Stack traces are used to debug leaked resources
and consume heap memory; this setting caps the heap memory dedicated to stack traces for each
tracked resource.
Default: commented out (32)
num_access_records
Set the average number of stack traces kept when a resource is accessed. Currently only supported for
chunks in the cache.
Default: commented out (0)
max_stack_depth
The depth of the stack traces collected. Changes only the depth of stack traces collected from the
time the parameter is set. Deeper stacks are more unique, so increasing the depth may require
increasing max_stacks_cache_size_mb.
Default: commented out (30)
dse.yaml configuration file
The dse.yaml file is the primary configuration file for security, DSE Search, DSE Graph, and DSE Analytics.
After changing properties in the dse.yaml file, you must restart the node for the changes to take effect.

Package installations: /etc/dse/dse.yaml

Tarball installations: installation_location/resources/dse/conf/dse.yaml

The cassandra.yaml file is the primary configuration file for the DataStax Enterprise database.


Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least
two spaces. Adhere to the YAML syntax and retain the spacing. For example, no spaces before the parent
node_health_options entry, and at least two spaces before the child settings:

node_health_options:
refresh_rate_ms: 50000
uptime_ramp_up_period_seconds: 10800
dropped_mutation_window_minutes: 30

Organization
The DataStax Enterprise configuration properties are grouped into the following sections:

• Security and authentication options

• DSE In-Memory

• Node health

• Health-based routing

• Lease metrics

• DSE Search options

• DSE Analytics options

• Performance Service options

• DSE Metrics Collector options

• Audit logging

• audit_logging_options

• DSE Tiered Storage

• DSE Advanced Replication

• Inter-node messaging

• DSE Multi-Instance

• DSE Graph options

Security and authentication options

• Authentication options

• Role management options

• Authorization options

• Kerberos options

• LDAP options

• Encrypt sensitive system resources

• Encrypted configuration properties settings

• KMIP encryption options

• DSE Search index encryption settings


Authentication options
Authentication options for the DSE Authenticator allow you to use multiple schemes for authentication in a
DataStax Enterprise cluster. Additional authenticator configuration is required in cassandra.yaml.
Internal and LDAP schemes can also be used for role management; see role_management_options.

See Enabling DSE Unified Authentication.

# authentication_options:
# enabled: false
# default_scheme: internal
# other_schemes:
# - ldap
# - kerberos
# scheme_permissions: false
# transitional_mode: disabled
# allow_digest_with_kerberos: true
# plain_text_without_ssl: warn

authentication_options
Options for the DseAuthenticator to authenticate users when the authenticator option in
cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator. Authenticators other than
DseAuthenticator are not supported.
enabled
Enables user authentication.

• true - The DseAuthenticator authenticates users.

• false - The DseAuthenticator does not authenticate users and allows all connections.

When not set, the default is false.
Default: commented out (false)
default_scheme
Sets the first scheme to validate a user against when the driver does not request a specific scheme.

• internal - Plain text authentication using the internal password authentication.

• ldap - Plain text authentication using pass-through LDAP authentication.

• kerberos - GSSAPI authentication using the Kerberos authenticator.

Default: commented out (internal)


other_schemes
List of schemes that are also checked if validation against the first scheme fails and no scheme was
specified by the driver. Same scheme names as default_scheme.
scheme_permissions
Whether roles need to have permission granted to them in order to use specific authentication
schemes. These permissions can be granted only when the DseAuthorizer is used. Set to one of the
following values:

• true - Use multiple schemes for authentication. Every role requires permissions to a scheme in
order to be assigned.

• false - Do not use multiple schemes for authentication. Prevents unintentional role assignment that
might occur if user or group names overlap in the authentication service.

See Binding a role to an authentication scheme.


When not set, the default is false.
Default: commented out (false)
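For example, when scheme_permissions is true, each role must be granted EXECUTE on the scheme it authenticates with; the role names below are hypothetical:

GRANT EXECUTE ON KERBEROS SCHEME TO analytics_role;

GRANT EXECUTE ON ALL AUTHENTICATION SCHEMES TO admin_role;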
allow_digest_with_kerberos


Controls whether DIGEST-MD5 authentication is also allowed with Kerberos. The DIGEST-MD5
mechanism is not directly associated with an authentication scheme, but is used by Kerberos to pass
credentials between nodes and jobs.

• true - DIGEST-MD5 authentication is also allowed with Kerberos. In analytics clusters, set to true to
use Hadoop inter-node authentication with Hadoop and Spark jobs.

• false - DIGEST-MD5 authentication is not used with Kerberos.

Analytics nodes require true to use internode authentication with Hadoop and Spark jobs. When not set,
the default is true.
Default: commented out (true)
plain_text_without_ssl
Controls how the DseAuthenticator responds to plain text authentication requests over unencrypted
client connections. Set to one of the following values:

• block - Block the request with an authentication error.

• warn - Log a warning about the request but allow it to continue.

• allow - Allow the request without any warning.

Default: commented out (warn)


transitional_mode
Whether to enable transitional mode for temporary use during authentication setup in an already
established environment.
Transitional mode allows access to the database using the anonymous role, which has all permissions
except AUTHORIZE.

• disabled - Transitional mode is disabled. All connections must provide valid credentials and map to
a login-enabled role.

• permissive - Only super users are authenticated and logged in. All other authentication attempts
are logged in as the anonymous user.

• normal - Allow all connections that provide credentials. Maps all authenticated users to their role
AND maps all other connections to anonymous.

• strict - Allow only authenticated connections that map to a login-enabled role OR connections that
provide a blank username and password as anonymous.

Credentials are required for all connections after authentication is enabled; use a blank username
and password to login with anonymous role in transitional mode.

Default: commented out (disabled)
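As a minimal sketch of an enabled configuration, a cluster that validates against internal credentials first, falls back to LDAP, and stays in a transitional mode during migration might uncomment the section as:

authentication_options:
  enabled: true
  default_scheme: internal
  other_schemes:
    - ldap
  scheme_permissions: false
  transitional_mode: normal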


Role management options

#role_management_options:
# mode: internal
# stats: false

See Enabling DSE Unified Authentication.

role_management_options
Options for the DSE Role Manager. To enable role manager, set:

• authorization_options enabled to true

• role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager

See Setting up logins and users.


When scheme_permissions is enabled, all roles must have permission to execute on the authentication
scheme; see Binding a role to an authentication scheme.
mode
Set to one of the following values:

• internal - Scheme that manages roles per individual user in the internal database. Allows nesting
roles for permission management.

• ldap - Scheme that assigns roles by looking up the user name in LDAP and mapping the group
attribute (ldap_options) to an internal role name. To configure an LDAP scheme, complete the
steps in Defining an LDAP scheme.

Internal role management allows nesting roles for permission management; when using LDAP mode,
role nesting is disabled. Using GRANT role_name TO role_name results in an error.
Default: commented out (internal)
stats
Set to true to enable logging of DSE role creation and modification events in the
dse_security.role_stats system table. All nodes must have the stats option enabled and must be
restarted for the functionality to take effect.
To query role events:

SELECT * FROM dse_security.role_stats;

 role  | created                         | password_changed
-------+---------------------------------+---------------------------------
 user1 | 2020-04-13 00:44:09.221000+0000 | null
 user2 | 2020-04-12 23:49:21.457000+0000 | 2020-04-12 23:49:21.457000+0000

(2 rows)

Default: commented out (false)


Authorization options

#authorization_options:
# enabled: false
# transitional_mode: disabled
# allow_row_level_security: false

See Enabling DSE Unified Authentication.

authorization_options
Options for the DSE Authorizer.
enabled
Whether to use the DSE Authorizer for role-based access control (RBAC).

• true - use the DSE Authorizer for role-based access control (RBAC)

• false - do not use the DSE Authorizer

When not set, the default is false.


Default: commented out (false)
transitional_mode
Allows the DSE Authorizer to operate in a temporary transitional mode during setup of authorization in a
cluster. Set to one of the following values:

• disabled - Transitional mode is disabled.

• normal - Permissions can be passed to resources, but are not enforced.


• strict - Permissions can be passed to resources, and are enforced on authenticated users.
Permissions are not enforced against anonymous users.

Default: commented out (disabled)


allow_row_level_security
Whether to enable row-level access control (RLAC) permissions; use the same setting on all nodes.

• true - use row-level security

• false - do not use row-level security

When not set, the default is false.


Default: commented out (false)
Kerberos options

kerberos_options:
keytab: resources/dse/conf/dse.keytab
service_principal: dse/_HOST@REALM
http_principal: HTTP/_HOST@REALM
qop: auth

See Defining a Kerberos scheme.

kerberos_options
Options to configure security for a DataStax Enterprise cluster using Kerberos.
keytab
The file path of dse.keytab.
service_principal
The service_principal that the DataStax Enterprise process runs under must use the form dse_user/
_HOST@REALM, where:

• dse_user is the name of the user that starts the DataStax Enterprise process.

• _HOST is converted to a reverse DNS lookup of the broadcast address.

• REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.

http_principal
The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat
web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI security mechanism
(Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be
uppercase.
qop
A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each
connection. The client can have multiple QOP values, while the server can have only a single QOP
value. The valid values are:

• auth - Authentication only.

• auth-int - Authentication plus integrity protection for all transmitted data.

• auth-conf - Authentication plus integrity protection and encryption of all transmitted data.
Encryption using auth-conf is separate and independent of whether encryption is done using
SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax
recommends choosing only one method and using it for both encryption and authentication.

LDAP options
Define LDAP options to authenticate users against an external LDAP service and/or for Role Management using
LDAP group look up.


See Enabling DSE Unified Authentication.

# ldap_options:
# server_host:
# server_port: 389
# hostname_verification: false
# search_dn:
# search_password:
# use_ssl: false
# use_tls: false
# truststore_path:
# truststore_password:
# truststore_type: jks
# user_search_base:
# user_search_filter: (uid={0})
# user_memberof_attribute: memberof
# group_search_type: directory_search
# group_search_base:
# group_search_filter: (uniquemember={0})
# group_name_attribute: cn
# credentials_validity_in_ms: 0
# search_validity_in_seconds: 0
# connection_pool:
# max_active: 8
# max_idle: 8

Microsoft Active Directory (AD) example, for both authentication and role management:

ldap_options:
server_host: win2012ad_server.mycompany.lan
server_port: 389
search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
search_password: lookup_user_password
use_ssl: false
use_tls: false
truststore_path:
truststore_password:
truststore_type: jks
#group_search_type: directory_search
group_search_type: memberof_search
#group_search_base:
#group_search_filter:
group_name_attribute: cn
user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
user_search_filter: (sAMAccountName={0})
user_memberof_attribute: memberOf
connection_pool:
max_active: 8
max_idle: 8

See Defining an LDAP scheme.

ldap_options
Options to configure LDAP security. When not set, LDAP authentication is not used.
Default: commented out
server_host
A comma separated list of LDAP server hosts.
Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same
host (localhost) is appropriate only in single node test or development environments.
Default: none


server_port
The port on which the LDAP server listens.

• 389 - the default port for unencrypted connections

• 636 - typically used for encrypted connections; the default SSL port for LDAP is 636

Default: commented out (389)


hostname_verification
Enable hostname verification. The following conditions must be met:

• Either use_ssl or use_tls must be set to true.

• A valid truststore with the correct path specified in truststore_path must exist. The truststore
must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the
hostname of the LDAP server.

Default: false
search_dn
Distinguished name (DN) of an account with read access to the user_search_base and
group_search_base. For example:

• OpenLDAP: uid=lookup,ou=users,dc=springsource,dc=com

• Microsoft Active Directory (AD): cn=lookup, cn=users, dc=springsource, dc=com

Do not create/use an LDAP account or group called cassandra. The DSE database comes with a
default login role, cassandra, that has access to all database objects and uses the consistency level
QUORUM.
When not set, an anonymous bind is used for the search on the LDAP server.
Default: commented out
search_password
The password of the search_dn account.
Default: commented out
use_ssl
Whether to use an SSL-encrypted connection.

• true - use an SSL-encrypted connection, set server_port to the LDAP port for the server (typically
port 636)

• false - do not enable SSL connections to the LDAP server

Default: commented out (false)


use_tls
Whether to enable TLS connections to the LDAP server.

• true - enable TLS connections to the LDAP server, set server_port to the TLS port of the LDAP
server.

• false - do not enable TLS connections to the LDAP server

Default: commented out (false)


truststore_path
The path to the truststore for SSL certificates.
Default: commented out
truststore_password
The password to access the trust store.
Default: commented out
truststore_type
The type of truststore.
Default: commented out (jks)
user_search_base


Distinguished name (DN) of the object to start the recursive search for user entries for authentication
and role management memberof searches. For example, to search all users in example.com:
ou=users,dc=example,dc=com.

• For your LDAP domain, set the ou and dc elements. Typically set to
ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.

• Active Directory uses a different search base, typically
CN=search,CN=Users,DC=ActDir_domname,DC=internal. For example,
CN=search,CN=Users,DC=example-sales,DC=internal.

Default: commented out

Default: commented out


user_search_filter
Attribute that identifies the user that the search filter uses for looking up user names.

• uid={0} - when using LDAP

• sAMAccountName={0} - when using AD (Microsoft Active Directory). For example,
(sAMAccountName={0})

Default: commented out (uid={0})


user_memberof_attribute
Attribute that contains a list of group names; role manager assigns DSE roles that exactly match any
group name in the list. Required when managing roles using group_search_type: memberof_search
with LDAP (role_manager.mode:ldap). The directory server must have memberof support, which is a
default user attribute in Microsoft Active Directory (AD).
Default: commented out (memberof)
group_search_type
Required when managing roles with LDAP (role_manager.mode: ldap). Define how group membership
is determined for a user. Choose from one of the following values:

• directory_search - Filters the results by doing a subtree search of group_search_base to find
groups that contain the user name in the attribute defined in the group_search_filter. (Default)

• memberof_search - Recursively search for user entries using the user_search_base and
user_search_filter. Get groups from the user attribute defined in user_memberof_attribute.
The directory server must have memberof support.

Default: commented out (directory_search)


group_search_base
The unique distinguished name (DN) of the group record from which to start the group membership
search.
Default: commented out
group_search_filter
Set to any valid LDAP filter.
Default: commented out (uniquemember={0})
group_name_attribute
The attribute in the group record that contains the LDAP group name. Role names are case-sensitive
and must match exactly on DSE for assignment. Unmatched groups are ignored.
Default: commented out (cn)
credentials_validity_in_ms
The duration period of the credentials cache.

• 0 - disable credentials cache

• duration period in milliseconds - enable a search cache and improve performance by reducing the
number of requests that are sent to the internal or LDAP server. See Defining an LDAP scheme.

When not set, the default is 0 (disabled).


Default: commented out (0)
search_validity_in_seconds


The duration period for the search cache.

• 0 - disable search credentials cache

• duration period in seconds - enables a search cache and improves performance by reducing the
number of requests that are sent to the internal or LDAP server

Default: commented out (0, disabled)


connection_pool
The configuration settings for the connection pool for making LDAP requests.
max_active
The maximum number of active connections to the LDAP server.
Default: commented out (8)
max_idle
The maximum number of idle connections in the pool awaiting requests.
Default: commented out (8)
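Putting these options together, a minimal sketch of a directory_search configuration; the enclosing ldap_options section name, host details, DNs, and password are illustrative assumptions, not values prescribed by this guide:

ldap_options:
    search_dn: cn=lookup,ou=users,dc=example,dc=com
    search_password: lookup_password
    use_ssl: true
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: (uid={0})
    group_search_type: directory_search
    group_search_base: ou=groups,dc=example,dc=com
    group_search_filter: (uniquemember={0})
    group_name_attribute: cn
    credentials_validity_in_ms: 30000
    connection_pool:
        max_active: 8
        max_idle: 8

With use_ssl: true, also set server_port to the LDAP SSL port (typically 636) and configure the truststore options described above.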
Encrypt sensitive system resources
Options to encrypt sensitive system resources using a local encryption key or a remote KMIP key.

system_info_encryption:
enabled: false
cipher_algorithm: AES
secret_key_strength: 128
chunk_length_kb: 64
key_provider: KmipKeyProviderFactory
kmip_host: kmip_host_name

DataStax recommends using a remote encryption key from a KMIP provider when using Transparent Data
Encryption (TDE) features. Use a local encryption key only if a KMIP server is not available.

system_info_encryption
Options to set encryption settings for system resources that might contain sensitive information,
including the system.batchlog and system.paxos tables, hint files, and the database commit log.
enabled
Whether to enable encryption of system resources. See Encrypting system resources.
The system_trace keyspace is NOT encrypted by enabling the system_info_encryption
section. In environments that also have tracing enabled, manually configure encryption with
compression on the system_trace keyspace. See Transparent data encryption.
Default: false
cipher_algorithm
The name of the JCE cipher algorithm used to encrypt system resources.
Table 11: Supported cipher algorithm names
cipher_algorithm secret_key_strength

AES 128, 192, or 256

DES 56

DESede 112 or 168

Blowfish 32-448

RC2 40-128

Default: AES
secret_key_strength
Length of key to use for the system resources. See Table 11: Supported cipher algorithm names.


DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an
existing key does not match, the KMIP server automatically generates a new key.
Default: 128
chunk_length_kb
Optional. Size of SSTable chunks when data from the system.batchlog or system.paxos are written to
disk.
To encrypt existing data, run nodetool upgradesstables -a system batchlog paxos on all
nodes in the cluster.
Default: 64
key_provider
KMIP key provider to enable encrypting sensitive system data with a KMIP key. Comment out if using a
local encryption key.
Default: commented out (KmipKeyProviderFactory)
kmip_host
The KMIP key server host. Set to the kmip_group_name that defines the KMIP host in kmip_hosts
section. DSE requests a key from the KMIP host and uses the key generated by the KMIP provider.
Default: commented out
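If a KMIP server is not available, a local encryption key variant is a minimal sketch: enable the section and leave the KMIP lines commented out so that the local system key is used (values shown match the defaults in the block above):

system_info_encryption:
    enabled: true
    cipher_algorithm: AES
    secret_key_strength: 128
    chunk_length_kb: 64
    # key_provider: KmipKeyProviderFactory
    # kmip_host: kmip_host_name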
Encrypted configuration properties settings
Settings for using encrypted passwords in sensitive configuration file properties.

system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (key_filename | KMIP_key_URL )

system_key_directory
Path to the directory where local encryption/decryption key files are stored, also called system keys.
Distribute the system keys to all nodes in the cluster. Ensure that the DSE account is the folder owner
and has read/write/execute (700) permissions.
See Setting up local encryption keys.
This directory is not used for KMIP keys.

Default: /etc/dse/conf
config_encryption_active
Whether to enable encryption on sensitive data stored in tables and in configuration files.

• true - enable encryption of configuration property values using the specified


config_encryption_key_name. When set to true, the configuration values must be encrypted or
commented out. See Encrypting configuration file properties.
Lifecycle Manager (LCM) is not compatible when config_encryption_active is true in DSE
and OpsCenter. For LCM limitations, see Encrypted DSE configuration values.

• false - Do not enable encryption of configuration property values.

Default: false
config_encryption_key_name
Set to the local encryption key filename or KMIP key URL to use for configuration file property value
decryption.
Use dsetool encryptconfigvalue to generate encrypted values for the configuration file
properties.
Default: system_key. The default name is not configurable.
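As an illustrative sketch only (the encrypted string below is a hypothetical placeholder, not real output from dsetool encryptconfigvalue):

config_encryption_active: true
config_encryption_key_name: system_key
# sensitive properties elsewhere in the configuration then hold encrypted values, for example:
# truststore_password: Fk8eJc0a2VtGczT...placeholder...==

Generate each encrypted value with dsetool encryptconfigvalue before setting config_encryption_active to true; any sensitive value left in plain text must be commented out.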


KMIP encryption options


Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key
server or key servers. Enables DataStax Enterprise encryption features to use encryption keys that are stored on a
server that is not running DataStax Enterprise.

kmip_hosts:
your_kmip_groupname:
hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
keystore_path: pathto/kmip/keystore.jks
keystore_type: jks
keystore_password: password
truststore_path: pathto/kmip/truststore.jks
truststore_type: jks
truststore_password: password
key_cache_millis: 300000
timeout: 1000
protocol: protocol
cipher_suites: supported_cipher

kmip_hosts
Connection settings for key servers that support the KMIP protocol.
kmip_groupname
A user-defined name for a group of options to configure a KMIP server or servers, key settings, and
certificates. Configure options for a kmip_groupname section for each KMIP key server or group of
KMIP key servers. Using separate key server configuration settings allows use of different key servers
to encrypt table data, and eliminates the need to enter key server configuration information in DDL
statements and other configurations. Multiple KMIP hosts are supported.
Default: commented out
hosts
A comma-separated list of KMIP hosts (host[:port]) using the FQDN (Fully Qualified Domain Name). DSE
queries the host in the listed order, so add KMIP hosts in the intended failover sequence.
For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries
kmip1.yourdomain.com and then kmip2.yourdomain.com.
keystore_path
The path to a Java keystore created from the KMIP agent PEM files.
Default: commented out (/etc/dse/conf/KMIP_keystore.jks)
keystore_type
The type of keystore.
Default: commented out (jks)
keystore_password
The password to access the keystore.
Default: commented out (password)
truststore_path
The path to a Java truststore that was created using the KMIP root certificate.
Default: commented out (/etc/dse/conf/KMIP_truststore.jks)
truststore_type
The type of truststore.
Default: commented out (jks)
truststore_password
The password to access the truststore.
Default: commented out (password)
key_cache_millis
Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the
encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes
for changes, like revocation, to propagate to the DataStax Enterprise node. DataStax Enterprise uses
concurrent encryption, so multiple threads fetch the secret key from the KMIP key server at the same
time. DataStax recommends using the default value.
Default: commented out (300000)


timeout
Socket timeout in milliseconds.
Default: commented out (1000)
protocol
When not specified, JVM default is used. Example: TLSv1.2
cipher_suites
When not specified, JVM default is used. Examples:

• TLS_RSA_WITH_AES_128_CBC_SHA

• TLS_RSA_WITH_AES_256_CBC_SHA

• TLS_DHE_RSA_WITH_AES_128_CBC_SHA

• TLS_DHE_RSA_WITH_AES_256_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

See cipher_algorithm.
DSE Search index encryption settings

# solr_encryption_options:
# decryption_cache_offheap_allocation: true
# decryption_cache_size_in_mb: 256

solr_encryption_options
Settings to tune encryption of search indexes.
decryption_cache_offheap_allocation
Whether to allocate shared DSE Search decryption cache off JVM heap.

• true - allocate shared DSE Search decryption cache off JVM heap

• false - do not allocate shared DSE Search decryption cache off JVM heap

When not set, the default is true.


Default: commented out (true)
decryption_cache_size_in_mb
The maximum size of shared DSE Search decryption cache in megabytes (MB).
Default: commented out (256)
DSE In-Memory options
To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-
memory tables: fraction or size.

# max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240

max_memory_to_lock_fraction
A fraction of the system memory. The default value of 0.20 specifies to use up to 20% of system
memory. This max_memory_to_lock_fraction value is ignored if max_memory_to_lock_mb is set to a
non-zero value. To specify a fraction, use this option instead of max_memory_to_lock_mb.
Default: commented out (0.20)
max_memory_to_lock_mb
A maximum amount of memory in megabytes (MB).

• not set - use the fraction specified with max_memory_to_lock_fraction

• number greater than 0 - maximum amount of memory in megabytes (MB)


Default: commented out (10240)
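For example, a minimal sketch using one option or the other (the figures are illustrative); a non-zero absolute value always overrides the fraction:

# use up to 25% of system memory for in-memory tables:
max_memory_to_lock_fraction: 0.25

# or cap in-memory tables at 8192 MB instead:
# max_memory_to_lock_mb: 8192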


Node health options

node_health_options:
refresh_rate_ms: 60000
uptime_ramp_up_period_seconds: 10800
dropped_mutation_window_minutes: 30

node_health_options
Node health options are always enabled.
refresh_rate_ms
Default: 60000
uptime_ramp_up_period_seconds
The amount of continuous uptime required for the node's uptime score to advance the node health
score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a
composite score based on dropped mutations and uptime.
If a node is repairing after a period of downtime, you might want to increase the uptime period to the
expected repair time.
Default: commented out (10800, 3 hours)
dropped_mutation_window_minutes
The historic time window over which the rate of dropped mutations affect the node health score.
Default: 30
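For example, if repairs after downtime typically take about six hours, a hedged sketch extends the ramp-up period to match (the 21600 figure is illustrative):

node_health_options:
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 21600  # 6 hours, matching the expected repair time
    dropped_mutation_window_minutes: 30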
Health-based routing

enable_health_based_routing: true

enable_health_based_routing
Whether to consider node health for replication selection for distributed DSE Search queries. Health-
based routing enables a trade-off between index consistency and query throughput.

• true - consider node health when multiple candidates exist for a particular token range.

• false - ignore node health for replication selection. When the primary concern is performance, do
not enable health-based routing.

Default: true
Lease metrics

lease_metrics_options:
enabled: false
ttl_seconds: 604800

lease_metrics_options
Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and
Spark Master nodes.
enabled
Enables (true) or disables (false) log entries related to lease holders. Most of the time you do not want
to enable logging.
Default: false
ttl_seconds
Defines the time, in milliseconds, to persist the log of lease holder changes. Logging of lease holder
changes is always on, and has a very low overhead.
Default: 604800


DSE Search options

• Scheduler settings for DSE Search indexes

• async_bootstrap_reindex

• CQL Solr paging

• Solr CQL query option

• DSE Search resource upload limit

• Shard transport options

• DSE Search indexing settings

Scheduler settings for DSE Search indexes


To ensure that records with TTLs are purged from search indexes when they expire, the search indexes are
periodically checked for expired documents.

ttl_index_rebuild_options:
fixed_rate_period: 300
initial_delay: 20
max_docs_per_batch: 4096
thread_pool_size: 1

ttl_index_rebuild_options
Section of options to control the schedulers in charge of querying for and removing expired records, and
the execution of the checks.
fixed_rate_period
The time interval, in seconds, between checks for expired data.
Default: 300
initial_delay
The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
max_docs_per_batch
The maximum number of documents that the TTL rebuild thread checks and deletes per batch. All
documents determined to be expired are deleted from the index during each check; to avoid memory
pressure, their unique keys are retrieved and deletes are issued in batches.
Default: 4096
thread_pool_size
The maximum number of cores that can execute TTL cleanup concurrently. Set the thread_pool_size
to manage system resource consumption and prevent many search cores from executing simultaneous
TTL deletes.
Default: 1
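For example, a hedged sketch that checks for expired data half as often and lets two search cores run TTL cleanup concurrently (the figures are illustrative, not recommendations):

ttl_index_rebuild_options:
    fixed_rate_period: 600
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 2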
Reindexing of bootstrapped data

async_bootstrap_reindex: false

async_bootstrap_reindex
For DSE Search, configure whether to asynchronously reindex bootstrapped data. Default: false

• If enabled, the node joins the ring immediately after bootstrap and reindexing occurs
asynchronously; the node does not wait for post-bootstrap reindexing, so it is not marked down.
Use the dsetool ring command to check the status of the reindexing.

• If disabled, the node joins the ring after reindexing the bootstrapped data.


CQL Solr paging


Options to specify the paging behavior.

cql_solr_query_paging: off

cql_solr_query_paging

• driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the
driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.

• off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging unless:

  ◦ The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics
    nodes always use driver paging settings.

  ◦ The cqlsh query parameter paging is set to driver.

Even when cql_solr_query_paging: off, paging is dynamically enabled with the
"paging":"driver" parameter in JSON queries.

When not set, the default is off.


Default: commented out (off)
Solr CQL query option
Available option for CQL Solr queries.

cql_solr_query_row_timeout: 10000

cql_solr_query_row_timeout
The maximum time in milliseconds to wait for each row to be read from the database during CQL Solr
queries.
Default: commented out (10000, 10 seconds)
DSE Search resource upload limit

solr_resource_upload_limit_mb: 10

solr_resource_upload_limit_mb
Option to disable or configure the maximum file size of the search index config or schema. Resource
files can be uploaded, but the search index config and schema are stored internally in the database
after upload.

• 0 - disable resource uploading

• upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file
(search index config or schema).

Default: 10
Shard transport options

shard_transport_options:
netty_client_request_timeout: 60000

shard_transport_options
Fault tolerance option for inter-node communication between DSE Search nodes.
netty_client_request_timeout
Timeout behavior during distributed queries. This internal timeout for all search queries prevents
long-running queries. The client request timeout is the maximum cumulative time (in milliseconds) that a
distributed search request will wait idly for shard responses.


Default: 60000 (1 minute)


DSE Search indexing settings

# back_pressure_threshold_per_core: 1024
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
# ram_buffer_heap_space_in_mb: 1024
# ram_buffer_offheap_space_in_mb: 1024

See Tuning search for maximum indexing throughput.

back_pressure_threshold_per_core
The maximum number of queued partitions during search index rebuilding and reindexing. This
maximum number safeguards against excessive heap use by the indexing queue. If set lower than the
number of threads per core (TPC), not all TPC threads can be actively indexing.
Default: commented out (1024)
flush_max_time_per_core
The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at
DSE Search commit time or at flush time. Expert level knowledge is required to change this value.
Always set the value reasonably high to ensure flushing completes successfully to fully sync DSE
Search indexes with the database data. If the configured value is exceeded, index updates are only
partially committed and the commit log is not truncated, which can undermine data durability.
When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely
manner. Live indexing increases the time to flush asynchronous index updates.
Default: commented out (5)
load_max_time_per_core
The maximum time, in minutes, to wait for each DSE Search index to load on startup or create/reload
operations. This advanced option should be changed only if exceptions happen during search index
loading. When not set, the default is 5 minutes.
Default: commented out (5)
enable_index_disk_failure_policy
Whether to apply the configured disk failure policy if IOExceptions occur during index update
operations.

• true - apply the configured Cassandra disk failure policy to index write failures

• false - do not apply the disk failure policy

When not set, the default is false.


Default: commented out (false)
solr_data_dir
The directory to store index data. For example:
solr_data_dir: /var/lib/cassandra/solr.data
See Managing the location of DSE Search data. By default, each DSE Search index is saved in
solr_data_dir/keyspace_name.table_name, or as specified by the dse.solr.data.dir system
property.
Default: commented out
solr_field_cache_enabled
The Apache Lucene® field cache is deprecated. Instead, for fields that are sorted, faceted, or grouped
by, set docValues="true" on the field in the search index schema. Then reload the search index and
reindex. When not set, the default is false.
Default: commented out (false)
ram_buffer_heap_space_in_mb
Global Lucene RAM buffer usage threshold for heap to force segment flush. Setting too low might
induce a state of constant flushing during periods of ongoing write activity. For NRT, forced segment


flushes also de-schedule pending auto-soft commits to avoid potentially flushing too many small
segments. When not set, the default is 1024.
Default: commented out (1024)
ram_buffer_offheap_space_in_mb
Global Lucene RAM buffer usage threshold for offheap to force segment flush. Setting too low might
induce a state of constant flushing during periods of ongoing write activity. For NRT, forced segment
flushes also de-schedule pending auto-soft commits to avoid potentially flushing too many small
segments. When not set, the default is 1024.
Default: commented out (1024)
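For example, a hedged sketch that moves index data to a dedicated disk and doubles the off-heap buffer for indexing-heavy workloads (the path and size are illustrative):

solr_data_dir: /mnt/search_index_data
ram_buffer_offheap_space_in_mb: 2048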
Performance Service options

• Global Performance Service options

• Performance Service options

• DSE Search Performance Service options

• Spark Performance Service options

Global Performance Service options


Available options to configure the thread pool that is used by most plug-ins. A dropped task warning
is issued when the performance service requests more tasks than performance_max_threads +
performance_queue_capacity. When a task is dropped, collected statistics might not be current.

# performance_core_threads: 4
# performance_max_threads: 32
# performance_queue_capacity: 32000

performance_core_threads
Number of background threads used by the performance service under normal conditions. Default: 4
performance_max_threads
Maximum number of background threads used by the performance service.
Default: 32
performance_queue_capacity
The number of tasks that can be queued when all performance_max_threads are busy.
Default: 32000
Performance Service options
These settings are used by the Performance Service to configure collection of performance metrics on
transactional nodes. Performance metrics are stored in the dse_perf keyspace and can be queried with CQL
using any CQL-based utility, such as cqlsh or any application using a CQL driver. To temporarily make changes
for diagnostics and testing, use the dsetool perf subcommands.

See Collecting system level diagnostics.

graph_events
Graph event information.

graph_events:
ttl_seconds: 600

ttl_seconds
The TTL, in seconds, for graph event data.
Default: 600
cql_slow_log_options
Options to configure reporting of CQL queries that take longer than a specified period of time.

# cql_slow_log_options:


# enabled: true
# threshold: 200.0
# minimum_samples: 100
# ttl_seconds: 259200
# skip_writing_to_db: true
# num_slowest_queries: 5

See Collecting slow queries.


enabled
Enables (true) or disables (false) log entries for slow queries. When not set, the default is true.
Default: commented out (true)
threshold
The threshold in milliseconds or as a percentile.

• A value greater than 1 is expressed in time and will log queries that take longer than the specified
number of milliseconds.

• A value of 0 to 1 is expressed as a percentile and will log queries that exceed this percentile.

Default: commented out (200.0, 0.2 seconds)


minimum_samples
The initial number of queries before activating the percentile filter.
Default: commented out (100)
ttl_seconds
Time, in seconds, to keep the slow query log entries.
Default: commented out (259200, 3 days)
skip_writing_to_db
Whether to keep slow queries in-memory only and not write data to database.

• false - write slow queries to the database; the threshold must be >= 2000 ms to prevent a high load
on the database

• true - skip writing to database, keep slow queries only in memory

Default: commented out (true)


num_slowest_queries
The number of slow queries to keep in-memory.
Default: commented out (5)
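For example, a hedged sketch that keeps only the slowest 5% of queries, in memory only (the percentile is illustrative):

cql_slow_log_options:
    enabled: true
    threshold: 0.95            # percentile: log queries above the 95th percentile
    minimum_samples: 100
    skip_writing_to_db: true
    num_slowest_queries: 5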
cql_system_info_options
Options to configure collection of system-wide performance information about a cluster.

cql_system_info_options:
enabled: false
refresh_rate_ms: 10000

enabled
Whether to collect system-wide performance information about a cluster.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
resource_level_latency_tracking_options


Options to configure collection of object I/O performance statistics.

resource_level_latency_tracking_options:
enabled: false
refresh_rate_ms: 10000

See Collecting system level diagnostics.


enabled
Whether to collect object I/O performance statistics.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
db_summary_stats_options
Options to configure collection of summary statistics at the database level.

db_summary_stats_options:
enabled: false
refresh_rate_ms: 10000

See Collecting database summary diagnostics.


enabled
Whether to collect database summary performance information.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
cluster_summary_stats_options
Options to configure collection of statistics at a cluster-wide level.

cluster_summary_stats_options:
enabled: false
refresh_rate_ms: 10000

See Collecting cluster summary diagnostics.


enabled
Whether to collect statistics at a cluster-wide level.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
spark_cluster_info_options


Options to configure collection of data associated with Spark cluster and Spark applications.

spark_cluster_info_options:
enabled: false
refresh_rate_ms: 10000

See Monitoring Spark with Spark Performance Objects.


enabled
Whether to collect Spark performance statistics.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
histogram_data_options
Histogram data for the dropped mutation metrics are stored in the dropped_messages table in the
dse_perf keyspace.

histogram_data_options:
enabled: false
refresh_rate_ms: 10000
retention_count: 3

See Collecting histogram diagnostics.


enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
retention_count
Default: 3
user_level_latency_tracking_options
User-resource latency tracking settings.

user_level_latency_tracking_options:
enabled: false
refresh_rate_ms: 10000
top_stats_limit: 100
quantiles: false

See Collecting user activity diagnostics.


enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms


The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
top_stats_limit
Limit the number of individual metrics.
Default: 100
quantiles
Default: false
DSE Search Performance Service options
These settings are used by the DataStax Enterprise Performance Service.

solr_slow_sub_query_log_options:
enabled: false
ttl_seconds: 604800
threshold_ms: 3000
async_writers: 1

solr_update_handler_metrics_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000

solr_request_handler_metrics_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000

solr_index_stats_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000

solr_cache_stats_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000

solr_latency_snapshot_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000

solr_slow_sub_query_log_options
Options to configure reporting of distributed search sub-queries (query executions on individual
shards) that take longer than a specified period of time. See Collecting slow search queries.
enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
ttl_seconds
The time, in seconds, that slow sub-query log entries are retained.
Default: 604800 (7 days)
async_writers


The number of server threads dedicated to writing in the log. More than one server thread might
degrade performance.
Default: 1
threshold_ms
The threshold, in milliseconds, above which a sub-query is reported as slow.
Default: 3000
solr_update_handler_metrics_options
Options to collect search index direct update handler statistics over time.
See Collecting handler statistics.
enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
ttl_seconds
The time, in seconds, that the collected statistics are retained.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_index_stats_options
Options to record search index statistics over time.
See Collecting index statistics.
enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
ttl_seconds
The time, in seconds, that the collected statistics are retained.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_cache_stats_options
See Collecting cache statistics.
enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
ttl_seconds
The time, in seconds, that the collected statistics are retained.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_latency_snapshot_options
See Collecting Apache Solr performance statistics.
enabled

• false - do not collect metrics

• true - enable collection of metrics


Default: false
ttl_seconds
The time, in seconds, that the collected statistics are retained.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
Spark Performance Service options
See Monitoring Spark application information.

spark_application_info_options:
enabled: false
refresh_rate_ms: 10000
driver:
sink: false
connectorSource: false
jvmSource: false
stateSource: false
executor:
sink: false
connectorSource: false
jvmSource: false

spark_application_info_options
Statistics options.
enabled

• false - do not collect metrics

• true - enable collection of metrics

Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
driver
Options to configure collection of metrics at the Spark Driver.
connectorSource
Whether to collect Spark Cassandra Connector metrics at the Spark Driver.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
jvmSource
Whether to collect JVM heap and garbage collection (GC) metrics from the Spark Driver.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
stateSource
Whether to collect application state metrics at the Spark Driver.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
executor


Options to configure collection of metrics at Spark executors.


sink
Whether to write metrics collected at Spark executors.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
connectorSource
Whether to collect Spark Cassandra Connector metrics at Spark executors.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
jvmSource
Whether to collect JVM heap and GC metrics at Spark executors.

• false - do not collect metrics

• true - enable collection of metrics

Default: false
DSE Analytics options

• Spark

• Starting Spark drivers and executors

• DSE File System (DSEFS) options

• Spark Performance Service

Spark resource and encryption options

spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false

spark_daemon_readiness_assertion_interval: 1000

resource_manager_options:
worker_options:
cores_total: 0.7
memory_total: 0.6

workpools:
- name: alwayson_sql
cores: 0.25
memory: 0.25

spark_ui_options:
encryption: inherit
encryption_options:
enabled: false
keystore: .keystore
keystore_password: cassandra
require_client_auth: false
truststore: .truststore
truststore_password: cassandra
# Advanced settings
# protocol: TLS
# algorithm: SunX509

# store_type: JKS
# cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]

spark_shared_secret_bit_length
The length of a shared secret used to authenticate Spark components and encrypt the connections
between them. This value is not the strength of the cipher for encrypting connections. Default: 256
spark_security_enabled
In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security
is enabled regardless of this setting.
Enables Spark security based on shared secret infrastructure. Enables mutual authentication and
optional encryption between DSE Spark Master and Workers, and of communication channels, except
the web UI.
Default: false
spark_security_encryption_enabled
In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security
is enabled regardless of this setting.
Enables encryption between DSE Spark Master and Workers, and of communication channels,
except the web UI. Uses DIGEST-MD5 SASL-based encryption mechanism. Requires
spark_security_enabled: true.
Configure encryption between the Spark processes and DSE with client-to-node encryption in
cassandra.yaml.
spark_daemon_readiness_assertion_interval
Time interval, in milliseconds, between subsequent retries by the Spark plugin for Spark Master and
Worker readiness to start. Default: 1000
resource_manager_options
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-
automatic fashion. You can define the total amount of physical resources available to Spark Workers,
and optionally add named work pools with specific resources dedicated to them.
worker_options
The amount of system resources that are made available to the Spark Worker.
cores_total
The number of total system cores available to Spark. If the option is not specified, the default value 0.7
is used.
For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_CORES environment variable takes precedence
over this setting.

This setting can be the exact number of cores or a decimal of the total system cores. When the value is
expressed as a decimal, the available resources are calculated in the following way:

Spark Worker cores = cores_total * total system cores

The lowest value that you can assign to Spark Worker cores is 1 core. If the results are lower, no
exception is thrown and the values are automatically limited.

Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available
cores will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and
one core will be reserved.
memory_total
The amount of total system memory available to Spark. This setting can be the exact amount of
memory or a decimal of the total system memory. When the value is an absolute value, you can use
standard suffixes like M for megabyte and G for gigabyte.


When the value is expressed as a decimal, the available resources are calculated in the following way:

Spark Worker memory = memory_total * (total system memory - memory assigned to DataStax Enterprise)

The lowest values that you can assign to Spark Worker memory is 64 MB. If the results are lower, no
exception is thrown and the values are automatically limited.
If the option is not specified, the default value 0.6 is used.
For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_MEMORY environment variable takes
precedence over this setting.

workpools
Named work pools that can use a portion of the total resources defined under worker_options. A
default work pool named default is used if no work pools are defined in this section. If work pools are
defined, the resources allocated to the work pools are taken from the total amount, with the remaining
resources available to the default work pool. The total amount of resources defined in the workpools
section must not exceed the resources available to Spark in worker_options.
A work pool named alwayson_sql is created by default for AlwaysOn SQL. By default, it is configured
to use 25% of the resources available to Spark.
name
The name of the work pool.
cores
The number of system cores to use in this work pool expressed as either an absolute value or a decimal
value. This option follows the same rules as cores_total.
memory
The amount of memory to use in this work pool expressed as either an absolute value or a decimal
value. This option follows the same rules as memory_total.
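Putting worker_options and workpools together, a hedged sketch (the analytics pool name and all figures are illustrative):

resource_manager_options:
    worker_options:
        cores_total: 0.7      # decimal: 70% of system cores; 1 (no decimal point) would mean exactly one core
        memory_total: 16G     # absolute: 16 GB; a decimal such as 0.6 would mean 60%

        workpools:
            - name: alwayson_sql
              cores: 0.25
              memory: 0.25
            - name: analytics
              cores: 0.25
              memory: 0.25

Here the two named pools each take 25% of the Spark Worker resources, leaving the remaining 50% for the default work pool.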
spark_ui_options
Specify the source for SSL settings for Spark Master and Spark Worker UIs. The spark_ui_options
apply only to Spark daemon UIs, and do not apply to user applications even when the user applications
are run in cluster mode.
encryption

• inherit - inherit the SSL settings from the client encryption options.

• custom - use the following encryption_options from dse.yaml.

Default: inherit
encryption_options
Set encryption options for HTTPS of Spark Master and Worker UI. The spark_encryption_options are
not valid for DSE 5.1 and later.
enabled
Whether to enable Spark encryption for Spark client-to-Spark cluster and Spark internode
communication.
Default: false
keystore
The keystore for Spark encryption keys.
The relative file path is the base Spark configuration directory that is defined by the SPARK_CONF_DIR
environment variable. The default Spark configuration directory is resources/spark/conf.
Default: resources/dse/conf/.ui-keystore
keystore_password
The password to access the key store.
Default: cassandra
require_client_auth
Whether to require truststore for client authentication. When not set, the default is false.
Default: commented out (false)
truststore


The truststore for Spark encryption keys.


The relative file path is the base Spark configuration directory that is defined by the SPARK_CONF_DIR
environment variable. The default Spark configuration directory is resources/spark/conf.
Default: commented out (resources/dse/conf/.ui-truststore)
truststore_password
The password to access the truststore.
Default: commented out (cassandra)
protocol
Defines the encryption protocol. The TLS protocol must be supported by JVM and Spark.
Default: commented out (TLS)
algorithm
Defines the key manager algorithm.
Default: commented out (SunX509)
store_type
Defines the keystore type.
Default: commented out (JKS)
cipher_suites
Defines the cipher suites for Spark encryption:

• TLS_RSA_WITH_AES_128_CBC_SHA

• TLS_RSA_WITH_AES_256_CBC_SHA

• TLS_DHE_RSA_WITH_AES_128_CBC_SHA

• TLS_DHE_RSA_WITH_AES_256_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

Default: commented out


Starting Spark drivers and executors

spark_process_runner:
runner_type: default
run_as_runner_options:
user_slots:
- slot1
- slot2

spark_process_runner
Options to configure how Spark driver and executor processes are created and managed.
runner_type

• default - Use the default runner type.

• run_as - Use the run_as_runner_options options. See Running Spark processes as separate
users.

run_as_runner_options
The slot users for separating Spark process users from the DSE service user. See Running Spark
processes as separate users.
Default: slot1, slot2
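To switch from the default runner to run_as with the two slot users shown above, a minimal sketch:

spark_process_runner:
    runner_type: run_as
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2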
AlwaysOn SQL options
Properties to enable and configure AlwaysOn SQL.

# AlwaysOn SQL options


# alwayson_sql_options:
# enabled: false


# thrift_port: 10000
# web_ui_port: 9077
# reserve_port_wait_time_ms: 100
# alwayson_sql_status_check_wait_time_ms: 500
# workpool: alwayson_sql
# log_dsefs_dir: /spark/log/alwayson_sql
# auth_user: alwayson_sql
# runner_max_errors: 10

alwayson_sql_options
The AlwaysOn SQL options enable and configure the server on this node.
enabled
Whether to enable AlwaysOn SQL for this node. The node must be an analytics node. When not set,
the default is false.
Default: commented out (false)
thrift_port
The Thrift port on which AlwaysOn SQL listens.
Default: commented out (10000)
web_ui_port
The port on which the AlwaysOn SQL web UI is available.
Default: commented out (9077)
reserve_port_wait_time_ms
The wait time in milliseconds to reserve the thrift_port if it is not available.
Default: commented out (100)
alwayson_sql_status_check_wait_time_ms
The time in milliseconds to wait for a health check status of the AlwaysOn SQL server.
Default: commented out (500)
workpool
The work pool name used by AlwaysOn SQL.
Default: commented out (alwayson_sql)
log_dsefs_dir
Location in DSEFS of the AlwaysOn SQL log files.
Default: commented out (/spark/log/alwayson_sql)
auth_user
The role to use for internal communication by AlwaysOn SQL if authentication is enabled. Custom roles
must be created with login=true.
Default: commented out (alwayson_sql)
runner_max_errors
The maximum number of errors that can occur during AlwaysOn SQL service runner thread runs before
stopping the service. A service stop requires a manual restart.
Default: commented out (10)
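For example, a minimal sketch that turns the service on for an analytics node and leaves every other option at its commented-out default:

alwayson_sql_options:
    enabled: true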
DSE File System (DSEFS) options
Properties to enable and configure the DSE File System (DSEFS).
DSEFS replaced the Cassandra File System (CFS). DSE version 6.0 and later do not support CFS.

dsefs_options:
enabled:
keyspace_name: dsefs
work_dir: /var/lib/dsefs
public_port: 5598
private_port: 5599
data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
min_free_space: 5368709120

# service_startup_timeout_ms: 30000


# service_close_timeout_ms: 600000
# server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
# compression_frame_max_size: 1048576
# query_cache_size: 2048
# query_cache_expire_after_ms: 2000
# gossip_options:
# round_delay_ms: 2000
# startup_delay_ms: 5000
# shutdown_delay_ms: 10000
# rest_options:
# request_timeout_ms: 330000
# connection_open_timeout_ms: 55000
# client_close_timeout_ms: 60000
# server_request_timeout_ms: 300000
# idle_connection_timeout_ms: 60000
# internode_idle_connection_timeout_ms: 120000
# core_max_concurrent_connections_per_host: 8
# transaction_options:
# transaction_timeout_ms: 3000
# conflict_retry_delay_ms: 200
# conflict_retry_count: 40
# execution_retry_delay_ms: 1000
# execution_retry_count: 3
# block_allocator_options:
# overflow_margin_mb: 1024
# overflow_factor: 1.05

dsefs_options
Enable and configure options for DSEFS.
enabled
Whether to enable DSEFS.

• true - enables DSEFS on this node, regardless of the workload.

• false - disables DSEFS on this node, regardless of the workload.

• blank or commented out (#) - DSEFS will start only if the node is configured to run analytics
workloads.

Default: commented out (blank)


keyspace_name
The keyspace where the DSEFS metadata is stored. You can optionally configure multiple DSEFS file
systems within a single datacenter by specifying different keyspace names for each cluster.
Default: commented out (dsefs)
work_dir
The local directory for storing the local node metadata, including the node identifier. The volume of data
stored in this directory is nominal and does not require configuration for throughput, latency, or capacity.
This directory must not be shared by DSEFS nodes.
Default: commented out (/var/lib/dsefs)
public_port
The public port on which DSEFS listens for clients.
DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this
port to trusted clients. The service on this port is bound to the native_transport_address.
Default: commented out (5598)
private_port
The private port for DSEFS inter-node communication.
Do not open this port to firewalls; this private port must be not visible from outside of the cluster.
Default: commented out (5599)
data_directories
One or more data locations where the DSEFS data is stored.


- dir
Mandatory attribute to identify the set of directories. DataStax recommends segregating these data
directories on physical devices that are different from the devices that are used for DataStax Enterprise.
Using multiple directories on JBOD improves performance and capacity.
Default: commented out (/var/lib/dsefs/data)
storage_weight
The weighting factor for this location specifies how much data to place in this directory, relative to other
directories in the cluster. This soft constraint determines how DSEFS distributes the data. For example,
a directory with a value of 3.0 receives about three times more data than a directory with a value of 1.0.
Default: commented out (1.0)
min_free_space
The reserved space, in bytes, to not use for storing file data blocks. You can use a unit of measure
suffix to specify other size units. For example: terabyte (1 TB), gigabyte (10 GB), and megabyte (5000
MB).
Default: commented out (5368709120)
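For example, a hedged sketch of a two-directory JBOD layout in which the first device receives about three times more data than the second (the mount points are illustrative):

dsefs_options:
    enabled: true
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    data_directories:
        - dir: /mnt/disk1/dsefs/data
          storage_weight: 3.0
          min_free_space: 5368709120
        - dir: /mnt/disk2/dsefs/data
          storage_weight: 1.0
          min_free_space: 5368709120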
Advanced properties for DSEFS
service_startup_timeout_ms
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to bootstrap.
Default: commented out (30000)
service_close_timeout_ms
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to close.
Default: commented out (600000)
server_close_timeout_ms
Wait time, in milliseconds, that the DSEFS server waits during shutdown before closing all pending
connections.
Default: commented out (2147483647)
compression_frame_max_size
The maximum accepted size of a compression frame defined during file upload.
Default: commented out (1048576)
query_cache_size
Maximum number of elements in a single DSEFS Server query cache.
Default: commented out (2048)
query_cache_expire_after_ms
The time to retain the DSEFS Server query cache element in cache. The cache element expires when
this time is exceeded.
Default: commented out (2000)
gossip_options
Options to configure DSEFS gossip rounds.
round_delay_ms
The delay, in milliseconds, between gossip rounds.
Default: commented out (2000)
startup_delay_ms
The delay time, in milliseconds, between registering the location and reading back all other locations
from the database.
Default: commented out (5000)
shutdown_delay_ms
The delay time, in milliseconds, between announcing shutdown and shutting down the node.
Default: commented out (30000)
rest_options
Options to configure DSEFS rest times.
request_timeout_ms
The time, in milliseconds, that the client waits for a response that corresponds to a given request.
Default: commented out (330000)
connection_open_timeout_ms
The time, in milliseconds, that the client waits to establish a new connection.
Default: commented out (55000)
client_close_timeout_ms


The time, in milliseconds, that the client waits for pending transfer to complete before closing a
connection.
Default: commented out (60000)
server_request_timeout_ms
The time, in milliseconds, to wait for the server rest call to complete.
Default: commented out (300000)
idle_connection_timeout_ms
The time, in milliseconds, for RestClient to wait before closing an idle connection. If RestClient does
not close the connection after the timeout, the connection is closed after 2*idle_connection_timeout_ms.

• time - wait time to close idle connection

• 0 - disable closing idle connections

Default: commented out (60000)


internode_idle_connection_timeout_ms
Wait time, in milliseconds, before closing idle internode connection. The internode connections are
primarily used to exchange data during replication. Do not set lower than the default value for heavily
utilized DSEFS clusters.
Default: commented out (120000); a value of 0 disables closing idle internode connections
core_max_concurrent_connections_per_host
Maximum number of connections to a given host per single CPU core. DSEFS keeps a connection pool
for each CPU core.
Default: 8
transaction_options
Options to configure DSEFS transaction times.
transaction_timeout_ms
Transaction run time, in milliseconds, before the transaction is considered for timeout and rollback.
Default: 3000
conflict_retry_delay_ms
Wait time, in milliseconds, before retrying a transaction that was ended due to a conflict. Default: 200
conflict_retry_count
The number of times to retry a transaction before giving up. Default: 40
execution_retry_delay_ms
Wait time, in milliseconds, before retrying a failed transaction payload execution. Default: 1000
execution_retry_count
The number of payload execution retries before signaling the error to the application. Default: 3
block_allocator_options
Controls how much additional data can be placed on the local coordinator before the local node
overflows to the other nodes. The trade-off is between data locality of writes and balancing the cluster.
A local node is preferred for a new block allocation, if:

used_size_on_the_local_node < average_used_size_per_node * overflow_factor + overflow_margin

overflow_margin_mb

• margin_size - overflow margin size in megabytes

• 0 - disable block allocation overflow

Default: commented out (1024)


overflow_factor

• factor - overflow factor on an exponential scale

• 1.0 - disable block allocation overflow

Default: commented out (1.05)


DSE Metrics Collector options


When data_dir is not set, the default location of the DSE Metrics Collector data directory is the
same directory as the commitlog directory as defined in cassandra.yaml.

Uncomment these options only to change the default directories:

# insights_options:
# data_dir: /var/lib/cassandra/insights_data
# log_dir: /var/log/cassandra/

insights_options
Options for DSE Metrics Collector.
data_dir
Directory to store collected metrics. When not set, the default directory is /var/lib/cassandra/
insights_data.
When data_dir is not set, the default location of the /insights_data directory is the same location
as the /commitlog directory, as defined with the commitlog_directory property in cassandra.yaml.
log_dir
Directory to store logs for collected metrics. The log file is dse-collectd.log. The file with the collectd
PID is dse-collectd.pid. When not set, the default directory is /var/log/cassandra/.
Audit database activities
Track database activity using the audit log feature. To get the maximum information from data auditing, turn on
data auditing on every node.
See Setting up database auditing.

audit_logging_options
Options to enable and configure database activity logging.
enabled
Whether to enable database activity auditing.

• true - enables database activity auditing

• false - disables database activity auditing

Default: false
logger
The logger to use for recording events:

• SLF4JAuditWriter - Capture events in a log file.

• CassandraAuditWriter - Capture events in a table, dse_audit.audit_log.

Configure logging level, sensitive data masking, and log file name/location in the logback.xml file.
Default: SLF4JAuditWriter
included_categories
Comma separated list of event categories that are captured, where the category names are:

• QUERY - Data retrieval events.

• DML - (Data manipulation language) Data change events.

• DDL - (Data definition language) Database schema change events.

• DCL - (Data control language) Role and permission management events.

• AUTH - (Authentication) Login and authorization related events.

• ERROR - Failed requests.


• UNKNOWN - Events where the category and type are both UNKNOWN.

Event categories that are not listed are not captured.


Use either included_categories or excluded_categories but not both. When specifying included
categories, leave excluded_categories blank or commented out.
Default: none (include all categories)
excluded_categories
Comma separated list of categories to ignore, where the categories are:

• QUERY - Data retrieval events.

• DML - (Data manipulation language) Data change events.

• DDL - (Data definition language) Database schema change events.

• DCL - (Data control language) Role and permission management events.

• AUTH - (Authentication) Login and authorization related events.

• ERROR - Failed requests.

• UNKNOWN - Events where the category and type are both UNKNOWN.

Events in all other categories are logged.


Use either included_categories or excluded_categories but not both. When specifying excluded
categories, leave included_categories blank or commented out.
Default: none (exclude no categories)
included_keyspaces
The keyspaces for which events are logged. Specify keyspace names in a comma separated list or use
a regular expression to filter on keyspace name.
DSE supports using either included_keyspaces or excluded_keyspaces but not both. When
specifying included keyspaces, leave excluded_keyspaces blank or comment it out.
Default: none (include all keyspaces)
excluded_keyspaces
Log events for all keyspaces that are not listed. Specify a comma separated list of keyspace names or
use a regular expression to filter on keyspace name. Only use this option if included_keyspaces is
blank or commented out.
Default: none (exclude no keyspaces)
included_roles
The roles for which events are logged. Specify roles in a comma separated list.
DSE supports using either included_roles or excluded_roles but not both. When specifying
included_roles, leave excluded_roles blank or comment it out.
Default: none (include all roles)
excluded_roles
The roles for which events are not logged. Specify a comma separated list of role names. Only use this
option if included_roles is blank or commented out.
Default: none (exclude no roles)
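Putting the options together, a hedged sketch that records only authentication and permission-management events to the dse_audit.audit_log table, skipping two system keyspaces (the category and keyspace choices are illustrative):

audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter
    included_categories: AUTH, DCL
    excluded_keyspaces: system, system_schema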
Cassandra audit writer options

retention_time: 0
cassandra_audit_writer_options:
mode: sync
batch_size: 50
flush_time: 250
queue_size: 30000
write_consistency: QUORUM
# dropped_event_log: /var/log/cassandra/dropped_audit_events.log


# day_partition_millis: 3600000

retention_time
The amount of time, in hours, audit events are retained by supporting loggers. Only the
CassandraAuditWriter supports retention time.

• 0 - retain events forever

• hours - the number of hours to retain audit events

Default: 0 (retain events forever)


cassandra_audit_writer_options
Audit writer options.
mode
The mode the writer runs in.

• sync - A query is not executed until the audit event is successfully written.

• async - Audit events are queued for writing to the audit table, but are not necessarily logged before
the query executes. A pool of writer threads consumes the audit events from the queue, and writes
them to the audit table in batch queries.
While async substantially improves performance under load, if there is a failure between when
a query is executed, and its audit event is written to the table, the audit table might be missing
entries for queries that were executed.

Default: sync
batch_size
Available only when mode: async. Must be greater than 0.
The maximum number of events the writer dequeues before writing them out to the table. If
warnings in the logs reveal that batches are too large, decrease this value or increase the value of
batch_size_warn_threshold_in_kb in cassandra.yaml.
Default: 50
flush_time
Available only when mode: async.
The maximum amount of time, in milliseconds, that an event waits in the queue before a writer
removes it and writes it out. This flush time prevents events from waiting too long before being written
to the table when there are not a lot of queries happening.
Default: 500
queue_size
The size of the queue feeding the asynchronous audit log writer threads. When there are more events
being produced than the writers can write out, the queue fills up, and newer queries are blocked until
there is space on the queue. If a value of 0 is used, the queue size is unbounded, which can lead to
resource exhaustion under heavy query load.
Default: 30000
write_consistency
The consistency level that is used to write audit events.
Default: QUORUM
dropped_event_log
The path to the log file that reports dropped events. When not set, the default is /var/log/cassandra/dropped_audit_events.log.
Default: commented out (/var/log/cassandra/dropped_audit_events.log)
day_partition_millis
The interval, in milliseconds, between changing nodes to spread audit log information across multiple
nodes. For example, to change the target node every 12 hours, specify 43200000 milliseconds. When
not set, the default is 3600000 (1 hour).
Default: commented out (3600000) (1 hour)


DSE Tiered Storage options


Options to define one or more disk configurations for DSE Tiered Storage. Specify each disk configuration
as a set of unnamed tiers, each a collection of paths defined in priority order, with the fastest storage media in
the top tier. With heterogeneous storage configurations across the cluster, specify each disk configuration
with config_name:config_settings, and then use this configuration in CREATE TABLE or ALTER TABLE
statements.
DSE Tiered Storage does not change compaction strategies. To manage compression and compaction
options, use the compaction option. See Modifying compression and compaction.

# tiered_storage_options:
#     strategy1:
#         tiers:
#             - paths:
#                 - /mnt1
#                 - /mnt2
#             - paths: [ /mnt3, /mnt4 ]
#             - paths: [ /mnt5, /mnt6 ]
#
#         local_options:
#             k1: v1
#             k2: v2
#
#     'another strategy':
#         tiers: [ paths: [ /mnt1 ] ]

tiered_storage_options
Options to configure the smart movement of data across different types of storage media so that data
is matched to the most suitable drive type, according to the performance and cost characteristics it
requires.
strategy1
The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1
is the configurable name of the tiered storage configuration strategy.
tiers
Each unnamed tier in this section defines a storage tier with the paths that set the priority order.
local_options
Local configuration options overwrite the tiered storage settings for the table schema in the local
dse.yaml file. See Testing DSE Tiered Storage configurations.
- paths
The list of file paths that define the data directories for this tier of the disk configuration. Typically
list the fastest storage media first. These paths are used only to store data that is configured to use
tiered storage. These paths are independent of any settings in the cassandra.yaml file.
- /filepath
The file paths that define the data directories for this tier of the disk configuration.
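As a hedged sketch, a table can then reference the strategy1 configuration above from CQL. The compaction class, tiering strategy, and age thresholds shown here are illustrative; confirm them against the DSE Tiered Storage documentation:

ALTER TABLE ks_name.table_name
    WITH COMPACTION = {
        'class' : 'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
        'tiering_strategy' : 'TimeWindowStorageStrategy',
        'config' : 'strategy1',
        'max_tier_ages' : '3600,7200' };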
DSE Advanced Replication configuration settings
DSE Advanced Replication configuration options to replicate data from remote clusters to central data hubs.

# advanced_replication_options:
#     enabled: false
#     conf_driver_password_encryption_enabled: false
#     advanced_replication_directory: /var/lib/cassandra/advrep
#     security_base_path: /base/path/to/advrep/security/files/

advanced_replication_options
Options to enable and configure DSE Advanced Replication.
enabled
Whether to enable an edge node to collect data in the replication log.


Default: commented out (false)


conf_driver_password_encryption_enabled
Whether to enable encryption of driver passwords. When enabled, the stored driver password is
expected to be encrypted. See Encrypting configuration file properties.
Default: commented out (false)
advanced_replication_directory
The directory for storing advanced replication CDC logs. A replication_logs directory is created
in the specified directory.
Default: commented out (/var/lib/cassandra/advrep)
security_base_path
The base path to prepend to paths in the Advanced Replication configuration locations, including
locations to SSL keystore, SSL truststore, and so on.
Default: commented out (/base/path/to/advrep/security/files/)
Inter-node messaging options
Configuration options for the internal messaging service used by several components of DataStax Enterprise. All
internode messaging requests use this service.

internode_messaging_options:
    port: 8609
    # frame_length_in_mb: 256
    # server_acceptor_threads: 8
    # server_worker_threads: 16
    # client_max_connections: 100
    # client_worker_threads: 16
    # handshake_timeout_seconds: 10
    # client_request_timeout_seconds: 60

internode_messaging_options
Configuration options for inter-node messaging.
port
The mandatory port for the inter-node messaging service.
Default: 8609
frame_length_in_mb
Maximum message frame length. When not set, the default is 256.
Default: commented out (256)
server_acceptor_threads
The number of server acceptor threads. When not set, the default is the number of available
processors.
Default: commented out
server_worker_threads
The number of server worker threads. When not set, the default is the number of available processors *
8.
Default: commented out
client_max_connections
The maximum number of client connections. When not set, the default is 100.
Default: commented out (100)
client_worker_threads
The number of client worker threads. When not set, the default is the number of available processors *
8.
Default: commented out
handshake_timeout_seconds
Timeout for communication handshake process. When not set, the default is 10.
Default: commented out (10)
client_request_timeout_seconds


Timeout for non-query search requests like core creation and distributed deletes. When not set, the
default is 60.
Default: commented out (60)
DSE Multi-Instance server_id
server_id
In DSE Multi-Instance /etc/dse-nodeId/dse.yaml files, the server_id option is generated to uniquely
identify the physical server on which multiple instances are running. The server_id default value is the
media access control address (MAC address) of the physical server. You can change server_id when
the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.
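For example, in a hypothetical /etc/dse-node1/dse.yaml (the value shown is an illustrative MAC-style identifier):

server_id: 00:25:B3:4F:A1:7C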
DSE Graph options

• DSE Graph system-level options

• DSE Graph Gremlin Server options

DSE Graph system-level options


These graph options are system-level configuration options and options that are shared between graph
instances. Add an option if it is not present in the provided dse.yaml file.

# graph:
#     analytic_evaluation_timeout_in_minutes: 10080
#     realtime_evaluation_timeout_in_seconds: 30
#     schema_agreement_timeout_in_ms: 10000
#     system_evaluation_timeout_in_seconds: 180
#     index_cache_size_in_mb: 128
#     max_query_queue: 10000
#     max_query_threads: (no explicit default)
#     max_query_params: 16

graph
The top-level key for system-level graph options that are shared between graph instances.
Option names and values expressed in the ISO 8601 format used in earlier DSE 5.0 releases are still
valid, but the ISO 8601 format is deprecated.
analytic_evaluation_timeout_in_minutes
Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate. When not set, the default is
10080 (168 hours).
Default: commented out (10080)
realtime_evaluation_timeout_in_seconds
Maximum time to wait for an OLTP real-time traversal to evaluate. When not set, the default is 30
seconds.
Default: commented out (30)
schema_agreement_timeout_in_ms
Maximum time to wait for the database to agree on schema versions before timing out. When not set,
the default is 10000 (10 seconds).
Default: commented out (10000)
system_evaluation_timeout_in_seconds
Maximum time to wait for a graph system-based request to execute, like creating a new graph. When
not set, the default is 180 (3 minutes).
Default: commented out (180)
schema_mode
Controls the way that the schemas are handled.

• Production = Schema must be created before data insertion. Schema cannot be changed after
data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to
TRUE.

• Development = No schema is required to write data to a graph. Schema can be changed after data
is inserted. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.

When not set, the default is Production. If this option is not present, manually enter it to use
Development.
Default: not present
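For example, to run in Development mode, manually add the option under graph in dse.yaml:

graph:
    schema_mode: Development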
index_cache_size_in_mb
The amount of RAM to allocate to the index cache. When not set, the default is 128.
Default: commented out (128)
max_query_queue
The maximum number of CQL queries that can be queued as a result of Gremlin requests. Incoming
queries are rejected if the queue size exceeds this setting. When not set, the default is 10000.
Default: commented out (10000)
max_query_threads
The maximum number of threads to use for queries to the database. When this option is not set, the
default is calculated:

• If gremlinPool is present and nonzero:


10 * the gremlinPool setting

• If gremlinPool is not present in this file or set to zero:


The number of available CPU cores

See gremlinPool.
Default: calculated
max_query_params
The maximum number of parameters that can be passed on a graph query request for TinkerPop
drivers and drivers using the Cassandra native protocol. Passing very large numbers of parameters
on requests is an anti-pattern, because the script evaluation time increases proportionally. DataStax
recommends reducing the number of parameters to speed up script compilation times. Before you
increase this value, consider alternate methods for parameterizing scripts, like passing a single map. If
the graph query request requires many arguments, pass a list.
Default: commented out (16)
DSE Graph Gremlin Server options
The Gremlin Server is configured using Apache TinkerPop specifications.

# gremlin_server:
#     port: 8182
#     threadPoolWorker: 2
#     gremlinPool: 0
#     scriptEngines:
#         gremlin-groovy:
#             config:
#                 sandbox_enabled: false
#                 sandbox_rules:
#                     whitelist_packages:
#                         - package.name
#                     whitelist_types:
#                         - fully.qualified.type.name
#                     whitelist_supers:
#                         - fully.qualified.class.name
#                     blacklist_packages:
#                         - package.name
#                     blacklist_supers:
#                         - fully.qualified.class.name

gremlin_server
The top-level configurations in Gremlin Server.
port
The available communications port for Gremlin Server. When not set, the default is 8182.


Default: commented out (8182)


threadPoolWorker
The number of worker threads that handle non-blocking read and write (requests and responses) on the
Gremlin Server channel, including routing requests to the right server operations, handling scheduled
jobs on the server, and writing serialized responses back to the client. When not set, the default is 2.
Default: commented out (2)
gremlinPool
The number of Gremlin threads available to execute actual scripts in a ScriptEngine. This pool
represents the workers available to handle blocking operations in Gremlin Server.

• 0 - the value of the JVM property cassandra.available_processors, if that property is set

• When not set - the value of Runtime.getRuntime().availableProcessors()

Default: commented out (0)


scriptEngines
Section to configure gremlin server scripts.
gremlin-groovy
Section for gremlin-groovy scripts.
sandbox_enabled
Sandbox is enabled by default. To disable the gremlin groovy sandbox entirely, set to false.
sandbox_rules
Section for sandbox rules.
whitelist_packages
List of packages, one package per line, to whitelist.
- package.name
Retain the hyphen before the fully qualified package name.
whitelist_types
List of types, one type per line, to whitelist.
- fully.qualified.type.name
Retain the hyphen before the fully qualified type name.
whitelist_supers
List of super classes, one class per line, to whitelist.
- fully.qualified.class.name
Retain the hyphen before the fully qualified class name.
blacklist_packages
List of packages, one package per line, to blacklist.
- package.name
Retain the hyphen before the fully qualified package name.
blacklist_supers
List of super classes, one class per line, to blacklist.
- fully.qualified.class.name
Retain the hyphen before the fully qualified class name.
See also the remote.yaml file for Gremlin console configuration.
remote.yaml configuration file
The remote.yaml file is the primary configuration file for DSE Graph Gremlin console connection to the Gremlin
Server.
The dse.yaml file is the primary configuration file for the DataStax Enterprise Graph configuration, and includes
the setting for the Gremlin Server options.
Synopsis
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least
two spaces. Adhere to the YAML syntax and retain the spacing. For example, no spaces before the parent
node_health_options entry, and at least two spaces before the child settings:

node_health_options:
    refresh_rate_ms: 50000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30

DSE Graph Gremlin basic options


An Apache TinkerPop YAML file, remote.yaml, is configured with Gremlin Server information; the Gremlin
Server itself is configured using Apache TinkerPop specifications.

hosts: [localhost]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0,
              config: { ioRegistries: [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}

hosts
Identifies one or more hosts running a DSE node with Gremlin Server. You may need to use the
native_transport_address value set in cassandra.yaml.
Default: [localhost]
You can also connect to the Spark Master node for the datacenter by either running the console from
the Spark Master or specifying the Spark Master in the hosts field in the remote.yaml file.
port
Identifies a port on a DSE node running Gremlin Server. The port value needs to match the port value
specified for gremlin_server: in the dse.yaml file.
Default: 8182
serializer
Specifies the class and configuration for the serializer used to pass information between the Gremlin
console and the Gremlin Server.
Default: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0,
config: { ioRegistries:
[org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
DSE Graph Gremlin connectionPool options
The connectionPool settings specify a number of options that will be passed between the Gremlin console and
the Gremlin Server.

connectionPool: {
  enableSsl: false,
  maxContentLength: 65536000,
  maxInProcessPerConnection: 4,
  maxSimultaneousUsagePerConnection: 16,
  maxSize: 8,
  maxWaitForConnection: 3000,
  maxWaitForSessionClose: 3000,
  minInProcessPerConnection: 1,
  minSimultaneousUsagePerConnection: 8,
  minSize: 2,
  reconnectInterval: 1000,
  resultIterationBatchSize: 64,
  # trustCertChainFile: /etc/dse/graph/gremlin-console/conf/mycert.pem
  # Note: trustCertChainFile deprecated as of TinkerPop 3.2.10; instead use trustStore.
  trustStore: /full/path/to/jsse/truststore/file
}

enableSsl
Determines if SSL should be enabled. If enabled on the server, SSL must be enabled on the client.
To configure the Gremlin console to use SSL, when SSL is enabled on the Gremlin Server, edit the
connectionPool section of remote.yaml:


• Set enableSsl to true.

• Specify the path to the:

# Java Secure Socket Extension (JSSE) truststore file via the trustStore parameter, or

# the PEM-based trustCertChainFile.

trustCertChainFile is deprecated as of TinkerPop 3.2.10. If SSL is enabled, switch to
specifying the JSSE truststore file via the trustStore parameter in remote.yaml when
you can.

Example:

hosts: [localhost]
username: Cassandra_username
password: Cassandra_password
port: 8182
...
connectionPool: {
  enableSsl: true,
  trustStore: /full/path/to/JSSE/truststore/file,
  ...
}

For related information, refer to the TinkerPop security documentation.


Default: false
maxContentLength
The maximum length in bytes of a message that can be sent to the server. This number can be no greater
than the setting of the same name in the server configuration.
Default: 65536000
maxInProcessPerConnection
The maximum number of in-flight requests that can occur on a connection.
Default: 4
maxSimultaneousUsagePerConnection
The maximum number of times that a connection can be borrowed from the pool simultaneously.
Default: 16
maxSize
The maximum size of a connection pool for a host.
Default: 8
maxWaitForConnection
The amount of time in milliseconds to wait for a new connection before timing out.
Default: 3000
maxWaitForSessionClose
The amount of time in milliseconds to wait for a session to close before timing out (does not apply to
sessionless connections).
Default: 3000
minInProcessPerConnection
The minimum number of in-flight requests that can occur on a connection.
Default: 1
minSimultaneousUsagePerConnection
The minimum number of times that a connection can be borrowed from the pool simultaneously.
Default: 8
minSize
The minimum size of a connection pool for a host.
Default: 2
reconnectInterval
The amount of time in milliseconds to wait before trying to reconnect to a dead host.
Default: 1000
resultIterationBatchSize


The override value for the size of the result batches to be returned from the server.
Default: 64
trustCertChainFile
The location of the public certificate from the DSE truststore file, in PEM format. Also set enableSsl:
true.

Deprecated as of TinkerPop 3.2.10. Instead use trustStore.

If you are using the deprecated trustCertChainFile in your version of remote.yaml, here are
the details. Depending on how you created the DSE truststore file, you may already have the
PEM format certificate file from the root Certificate Authority. If so, specify the PEM file with this
trustCertChainFile option. If not, export the public certificate from the DSE truststore (CER format)
and convert it to PEM format. Then specify the PEM file with this option. Example:

$ pwd
/etc/dse/graph/gremlin-console/conf
$ keytool -export -keystore /etc/dse/keystores/client.truststore -alias clusterca -file mycert.cer
$ openssl x509 -inform der -in mycert.cer -out mycert.pem

In this example, the connectionPool section of remote.yaml should then include the following options
(assuming you are aware that trustCertChainFile is deprecated, as noted above).

connectionPool: {
  enableSsl: true,
  trustCertChainFile: /etc/dse/graph/gremlin-console/conf/mycert.pem,
  ...
}

Default: Unspecified
trustStore
The location of the Java Secure Socket Extension (JSSE) truststore file. Trusted certificates for verifying
the remote client's certificate. Similar to setting the JSSE property javax.net.ssl.trustStore. If
this value is not provided in remote.yaml and if SSL is enabled (via enableSSL: true), the default
TrustManager is used.
Default: Unspecified
DSE Graph Gremlin AuthProperties options
Security considerations for authentication between the Gremlin console and the Gremlin server require additional
options in the remote.yaml file.

# jaasEntry:
# protocol:
# username: xxx
# password: xxx

jaasEntry
Sets the AuthProperties.Property.JAAS_ENTRY properties for authentication to Gremlin Server.
Default: commented out (no value)
protocol
Sets the AuthProperties.Property.PROTOCOL properties for authentication to Gremlin Server.
Default: commented out (no value)


username
The username to submit on requests that require authentication.
Default: commented out (xxx)
password
The password to submit on requests that require authentication.
Default: commented out (xxx)
cassandra-rackdc.properties file
The GossipingPropertyFileSnitch, Ec2Snitch, and Ec2MultiRegionSnitch use the cassandra-rackdc.properties
configuration file to determine which datacenters and racks nodes belong to. They inform the database about the
network topology to route requests efficiently and distribute replicas evenly. Settings for this file depend on the
type of snitch:

• GossipingPropertyFileSnitch

• Configuring the Amazon EC2 single-region snitch

• Configuring Amazon EC2 multi-region snitch

This page also includes instructions for migrating from the PropertyFileSnitch to the GossipingPropertyFileSnitch.
GossipingPropertyFileSnitch
This snitch is recommended for production. It uses rack and datacenter information for the local node defined in
the cassandra-rackdc.properties file and propagates this information to other nodes via gossip.
To configure a node to use GossipingPropertyFileSnitch, edit the cassandra-rackdc.properties file as follows:

• Define the datacenter and rack that include this node. The default settings:

dc=DC1
rack=RAC1

Datacenter and rack names are case-sensitive. For examples, see Initializing a single datacenter per
workload type and Initializing multiple datacenters per workload type.

• To save bandwidth, add the prefer_local=true option. This option tells DataStax Enterprise to use the
local IP address when communication is not across different datacenters.
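A combined cassandra-rackdc.properties for this snitch, using the default names plus the optional bandwidth-saving setting, looks like:

dc=DC1
rack=RAC1
prefer_local=true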

Migrating from the PropertyFileSnitch to the GossipingPropertyFileSnitch


To allow migration from the PropertyFileSnitch, the GossipingPropertyFileSnitch uses the
cassandra-topology.properties file when present. Delete the file after the migration is complete. For more
information about migration, see Switching snitches.

The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that
file is present. Remove the file from each node on any new cluster or any cluster migrated from the
PropertyFileSnitch.

cassandra-topology.properties file
The PropertyFileSnitch uses the cassandra-topology.properties file for datacenter and rack names and to
determine network topology, so that requests are routed efficiently and the database can distribute replicas
evenly.
The GossipingPropertyFileSnitch snitch is recommended for production. See Migrating from the
PropertyFileSnitch to the GossipingPropertyFileSnitch.

PropertyFileSnitch
This snitch determines proximity by rack and datacenter. It uses the network details located in the
cassandra-topology.properties file. When using this snitch, you can define your datacenter names to be whatever
you want. Make sure that the datacenter names correlate to the names of your datacenters in the keyspace
definition. Every node in the cluster should be described in the cassandra-topology.properties file, and this
file should be exactly the same on every node in the cluster.
Setting datacenters and rack names
If you had non-uniform IPs and two physical datacenters with two racks in each, and a third logical datacenter for
replicating analytics data, the cassandra-topology.properties file might look like this:

Datacenter and rack names are case-sensitive.

# datacenter One

175.56.12.105=DC1:RAC1
175.50.13.200=DC1:RAC1
175.54.35.197=DC1:RAC1

120.53.24.101=DC1:RAC2
120.55.16.200=DC1:RAC2
120.57.102.103=DC1:RAC2

# datacenter Two

110.56.12.120=DC2:RAC1
110.50.13.201=DC2:RAC1
110.54.35.184=DC2:RAC1

50.33.23.120=DC2:RAC2
50.45.14.220=DC2:RAC2
50.17.10.203=DC2:RAC2

# Analytics Replication Group

172.106.12.120=DC3:RAC1
172.106.12.121=DC3:RAC1
172.106.12.122=DC3:RAC1

# default for unknown nodes


default=DC3:RAC1

Configuring snitches for cloud providers


Configure a cloud provider snitch that corresponds to the provider.
Configuring the Amazon EC2 single-region snitch
Use the Ec2Snitch for simple cluster deployments on Amazon EC2 where all nodes in the cluster are within a
single region. Because private IPs are used, this snitch does not work across multiple regions.
In EC2 deployments, the region name is treated as the datacenter name and availability zones are treated as
racks within a datacenter. For example, if a node is in the us-east-1 region, us-east is the datacenter name and
1 is the rack location. (Racks are important for distributing replicas, but not for datacenter naming.)
If you are using only a single datacenter, you do not need to specify any properties.
If you need multiple datacenters, set the dc_suffix options in the cassandra-rackdc.properties file. Any other
lines are ignored.
For example, for each node within the us-east region, specify the datacenter in its cassandra-
rackdc.properties file:
Datacenter names are case-sensitive.

• node0
dc_suffix=_1_cassandra


• node1
dc_suffix=_1_cassandra

• node2
dc_suffix=_1_cassandra

• node3
dc_suffix=_1_cassandra

• node4
dc_suffix=_1_analytics

• node5
dc_suffix=_1_search

This results in three datacenters for the region:

us-east_1_cassandra
us-east_1_analytics
us-east_1_search

The datacenter naming convention in this example is based on the workload. You can use other conventions,
such as DC1, DC2 or 100, 200.

Keyspace strategy options


When defining your keyspace strategy options, use the EC2 region name, such as us-east, as your
datacenter name.
Configuring Amazon EC2 multi-region snitch
Use the Ec2MultiRegionSnitch for deployments on Amazon EC2 where the cluster spans multiple regions.
You must configure settings in both the cassandra.yaml file and the property file (cassandra-
rackdc.properties) used by the Ec2MultiRegionSnitch.

Configuring cassandra.yaml for cross-region communication


The Ec2MultiRegionSnitch uses public IP designated in the broadcast_address to allow cross-region
connectivity. Configure each node as follows:

1. In the cassandra.yaml, set the listen_address to the private IP address of the node, and the
broadcast_address to the public IP address of the node. (A combined example follows this list.)
This allows DataStax Enterprise nodes in one EC2 region to bind to nodes in another region, thus enabling
multiple datacenter support. For intra-region traffic, DataStax Enterprise switches to the private IP after
establishing a connection.

2. Set the addresses of the seed nodes in the cassandra.yaml file to the public IP addresses. Private IPs are not
routable between networks. For example:

seeds: 50.34.16.33, 60.247.70.52

To find the public IP address, from each of the seed nodes in EC2:

$ curl http://instance-data/latest/meta-data/public-ipv4

Do not make all nodes seeds; see Internode communications (gossip).


3. Be sure that the storage_port or ssl_storage_port is open on the public IP firewall.
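Putting steps 1 and 2 together, the relevant cassandra.yaml entries on a node might look like the following sketch (all addresses are hypothetical; the seeds list is shown in its full seed_provider form):

listen_address: 172.16.1.25      # private IP of this node
broadcast_address: 50.34.16.33   # public IP of this node
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "50.34.16.33,60.247.70.52"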

Configuring the snitch for cross-region communication


In EC2 deployments, the region name is treated as the datacenter name and availability zones are treated as
racks within a datacenter. For example, if a node is in the us-east-1 region, us-east is the datacenter name and
1 is the rack location. (Racks are important for distributing replicas, but not for datacenter naming.)
For each node, specify its datacenter in the cassandra-rackdc.properties. The dc_suffix option defines the
datacenters used by the snitch. Any other lines are ignored.
In the example below, there are two DataStax Enterprise datacenters, each named for its workload. You can
use other conventions, such as DC1, DC2 or 100, 200. (Datacenter names are case-sensitive.)

Region: us-east

Node and datacenter:

• node0
dc_suffix=_1_transactional

• node1
dc_suffix=_1_transactional

• node2
dc_suffix=_2_transactional

• node3
dc_suffix=_2_transactional

• node4
dc_suffix=_1_analytics

• node5
dc_suffix=_1_search

This results in four us-east datacenters:

us-east_1_transactional
us-east_2_transactional
us-east_1_analytics
us-east_1_search

Region: us-west

Node and datacenter:

• node0
dc_suffix=_1_transactional

• node1
dc_suffix=_1_transactional

• node2
dc_suffix=_2_transactional

• node3
dc_suffix=_2_transactional

• node4
dc_suffix=_1_analytics

• node5
dc_suffix=_1_search

This results in four us-west datacenters:

us-west_1_transactional
us-west_2_transactional
us-west_1_analytics
us-west_1_search

Keyspace strategy options


When defining your keyspace strategy options, use the EC2 region name, such as us-east, as your
datacenter name.
Configuring the Google Cloud Platform snitch
Use the GoogleCloudSnitch for DataStax Enterprise deployments on Google Cloud Platform across one or
more regions. The region is treated as a datacenter and the availability zones are treated as racks within the
datacenter. All communication occurs over private IP addresses within the same logical network.
The region name is treated as the datacenter name and zones are treated as racks within a datacenter. For
example, if a node is in the us-central1-a zone, us-central1 is the datacenter name and a is the rack location.
(Racks are important for distributing replicas, but not for datacenter naming.) This snitch can work across
multiple regions without additional configuration.
If you are using only a single datacenter, you do not need to specify any properties.
If you need multiple datacenters, set the dc_suffix options in the cassandra-rackdc.properties file. Any other
lines are ignored.
For example, for each node within the us-central1 region, specify the datacenter in its cassandra-
rackdc.properties file:
Datacenter names are case-sensitive.


Node dc_suffix

node0 dc_suffix=_a_transactional

node1 dc_suffix=_a_transactional

node2 dc_suffix=_a_transactional

node3 dc_suffix=_a_transactional

node4 dc_suffix=_a_analytics

node5 dc_suffix=_a_search

Configuring the Apache CloudStack snitch


Use the CloudstackSnitch for Apache CloudStack environments. Because zone naming is free-form in Apache
CloudStack, this snitch uses the widely-used <country> <az> notation.

Setting system properties during startup


Use the system property (-D) switch to modify the DataStax Enterprise (DSE) settings during start up.
To automatically pass the settings each time DSE starts, uncomment or add the switch to the jvm.options file.

Synopsis
Change the start up parameters using the following syntax:

• Command line:

dse cassandra -Dparameter_name=value

• jvm.options file:

-Dparameter_name=value

• cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Dparameter_name=value"

Only pass the parameter to the start-up operation once. If the same switch is passed to the start operation
multiple times, for example from both the jvm.options file and on the command line, DSE may fail to start or
may use the wrong parameter.

Startup examples
Starting a node without joining the ring:

• Command line:

dse cassandra -Dcassandra.join_ring=false

• jvm.options:

-Dcassandra.join_ring=false

Replacing a dead node:


• Command line:

dse cassandra -Dcassandra.replace_address=10.91.176.160

• jvm.options:

-Dcassandra.replace_address=10.91.176.160

Changing LDAP authentication retry interval from its default of 10 ms:

• Command line:

dse cassandra -Ddse.ldap.retry_interval.ms=20

• jvm.options:

-Ddse.ldap.retry_interval.ms=20

Cassandra system properties


Cassandra native Java Virtual Machine (JVM) system parameters.
-Dcassandra.auto_bootstrap
Set auto_bootstrap to false during the initial set up of a node to override the default setting in the
cassandra.yaml file.
Default: true.
-Dcassandra.available_processors
Number of processors available to DSE. In a multi-instance deployment, each instance independently
assumes that all CPU processors are available to it. Use this setting to specify a smaller set of
processors.
Default: all_processors.
-Dcassandra.config
Set to the directory location of the cassandra.yaml file.
Default: depends on the type of installation.
-Dcassandra.consistent.rangemovement
Set to true to make bootstrap operations use consistent range movement.
Default: false.
-Ddse.consistent_replace
Specify the level of consistency required during a node replacement (ONE, QUORUM, or LOCAL_QUORUM).
The default value, ONE, may result in possibly stale data but uses less system resources. If set to
QUORUM or LOCAL_QUORUM, the replacement node coordinates repair among a (local) quorum of replicas
concurrently with replacement streaming. Repair transfers the differences to the replacement node,
ensuring it is consistent with other replicas when the replacement process is finished, assuming data is
inserted using either QUORUM or LOCAL_QUORUM consistency levels.

The value for consistent replace should match the value for application read consistency.
Default: ONE
-Ddse.consistent_replace.parallelism
Specify how many ranges will be repaired simultaneously during a consistent replace. The higher
the parallelism, the more resources are consumed cluster-wide, which may affect overall cluster
performance. Used only in conjunction with -Ddse.consistent_replace.
Default: 2
-Ddse.consistent_replace.retries
Specify how many times a failed repair will be retried during a replace. If all retries fail, the replace fails.
Used only in conjunction with -Ddse.consistent_replace.
Default: 3
-Ddse.consistent_replace.whitelist


Specify keyspaces and tables on which to perform a consistent replace. The keyspaces and tables
can be specified as: “ks1, ks2.cf1”. The default is blank, in which case all keyspaces and tables are
replaced. Used only in conjunction with -Ddse.consistent_replace.
Default: blank (not set)
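As a sketch, these properties are passed at startup alongside the replace address (the address is the same example used earlier on this page):

dse cassandra -Dcassandra.replace_address=10.91.176.160 -Ddse.consistent_replace=LOCAL_QUORUM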
-Dcassandra.disable_auth_caches_remote_configuration
Set to true to disable remote (JMX) configuration of the authentication caches, for example the caches
used for credentials, permissions, and roles. Those cache options can then only be set (persistently) in
cassandra.yaml and require a restart for new values to take effect.
Default: false.
-Dcassandra.expiration_date_overflow_policy
Set the policy (REJECT or CAP) for any TTL (time to live) timestamps that exceeds the maximum value
supported by the storage engine, 2038-01-19T03:14:06+00:00. The database storage engine can
only encode TTL timestamps through January 19 2038 03:14:07 UTC due to the Year 2038 problem.

• REJECT: Reject requests that contain an expiration timestamp later than
2038-01-19T03:14:06+00:00.

• CAP: Allow requests and insert expiration timestamps later than 2038-01-19T03:14:06+00:00 as
2038-01-19T03:14:06+00:00.

• CAP-NOWARN: Allow requests and insert expiration timestamps later than
2038-01-19T03:14:06+00:00 as 2038-01-19T03:14:06+00:00, but do not emit a warning.

Default: REJECT.
-Dcassandra.force_default_indexing_page_size
Set to true to disable dynamic calculation of the page size used when indexing an entire partition
during initial index build or a rebuild. Fixes the page size to the default of 10000 rows per page.
Default: false.
-Dcassandra.ignore_dc
Set to true to ignore the datacenter name change on startup. Applies only when using
DseSimpleSnitch.
Default: false.
-Dcassandra.initial_token
Use when DSE is not using virtual nodes (vnodes). Set to the initial partitioner token for the node on the
first start up.
Default: blank (not set).
Vnodes automatically select tokens.
-Dcassandra.join_ring
Set to false to prevent the node from joining a ring on startup.
Add the node to the ring afterwards using nodetool join and a JMX call.
Default: true.
-Dcassandra.load_ring_state
Set to false to clear all gossip state for the node on restart.
Default: true.
-Dcassandra.metricsReporterConfigFile
Enables pluggable metrics reporter and configures it from the specified file.
Default: blank (not set).
-Dcassandra.native_transport_port
Set to the port number that CQL native transport listens for clients.
Default: 9042.
-Dcassandra.native_transport_startup_delay_seconds
Set to the number of seconds to delay the native transport server start up.
Default: 0 (no delay).
-Dcassandra.partitioner
Set to the partitioner name.
Default: org.apache.cassandra.dht.Murmur3Partitioner.
-Dcassandra.partition_sstables_by_token_range
Set to false to disable JBOD SSTable partitioning by token range to multiple data_file_directories.


Advanced setting that should only be used with guidance from DataStax Support.
Default: true.
-Dcassandra.printHeapHistogramOnOutOfMemoryError
Set to false to disable a heap histogram dump on an OutOfMemoryError.
Default: false.
-Dcassandra.replace_address
Set to the listen_address or the broadcast_address when replacing a dead node with a new node. The
new node must be in the same state as before bootstrapping, without any data in its data directory.
The broadcast_address defaults to the listen_address, except when the ring uses the
Ec2MultiRegionSnitch (see Configuring Amazon EC2 multi-region snitch).
-Dcassandra.replace_address_first_boot
Same as -Dcassandra.replace_address but only runs the first time the Cassandra node boots.
This property is preferred over -Dcassandra.replace_address since it has no effect on subsequent
boots if it is not removed from jvm.options or cassandra-env.sh.
-Dcassandra.replayList
Allows restoring specific tables from an archived commit log.
-Dcassandra.ring_delay_ms
Set to the number of milliseconds the node waits to hear from other nodes before formally joining the
ring.
Default: 30000.
-Dcassandra.ssl_storage_port
Sets the SSL port for encrypted communication.
Default: 7001.
-Dcassandra.start_native_transport
Enables or disables the native transport server. See start_native_transport in cassandra.yaml.
Default: true.
-Dcassandra.storage_port
Sets the port for inter-node communication.
Default: 7000.
-Dcassandra.write_survey
Set to true to enable a tool for testing new compaction and compression strategies. write_survey
allows you to experiment with different strategies and benchmark write performance differences without
affecting the production workload. See Testing compaction and compression.
Default: false.
Java Management Extension system properties
DataStax Enterprise exposes metrics and management operations via Java Management Extensions (JMX).
JConsole and the nodetool utility are JMX-compliant management tools.
-Dcom.sun.management.jmxremote.port
Sets the port number on which the database listens for JMX connections.
By default, you can interact with DataStax Enterprise using JMX on port 7199 without authentication.
Default: 7199
-Dcom.sun.management.jmxremote.ssl
Change to true to enable SSL for JMX.
Default: false
-Dcom.sun.management.jmxremote.authenticate
True enables remote authentication for JMX.
Default: false
-Djava.rmi.server.hostname
Sets the interface hostname or IP that JMX should use to connect. Uncomment and set if you are
having trouble connecting.
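For example, to require authenticated remote JMX connections on the default port, the corresponding entries in jvm.options (or the equivalent JVM_OPTS lines in cassandra-env.sh) would be:

-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.authenticate=true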
Search system properties
DataStax Enterprise (DSE) Search system properties.
-Ddse.search.client.timeout.secs


Set the timeout in seconds for native driver search core management calls using the dsetool search-specific commands.
Default: 600 (10 minutes).
-Ddse.search.query.threads
Sets the number of Search queries that can execute in parallel. Consider increasing this value or
reducing client/driver requests per connection if EnqueuedRequestCount does not stabilize near zero.
Default: two times the number of CPUs (including hyperthreading).
-Ddse.timeAllowed.enabled.default
The Solr timeAllowed option is enforced by default to prevent long-running shard queries (such as
complex facets and Boolean queries) from using system resources after they have timed out from the
DSE Search coordinator.
DSE Search checks the timeout per segment instead of during document or terms iteration. The
system property solr.timeAllowed.docsPerSample has been removed.
By default for all queries, the timeAllowed value is the same as the
internode_messaging_options.client_request_timeout_seconds setting in dse.yaml. For more
details, see Limiting queries by time.
Using the Solr timeAllowed parameter may cause a latency cost. If you find the cost for queries is
too high in your environment, consider setting the -Ddse.timeAllowed.enabled.default property
to false at DSE startup time. Or set timeAllowed.enable to false in the query.
Default: true.
-Ddse.solr.data.dir
Set the path to store DSE Search data. See Set the location of search indexes.
-Dsolr.offheap.enable
The DSE Search per-segment filter cache is moved off-heap by using native memory, to reduce on-heap
memory consumption and garbage collection overhead. The off-heap filter cache is enabled by default.
To disable it, set this JVM system property to false at startup time. When not set, the default is true.
Default: true
Threads per core system properties
Tune TPC using the Netty system parameters.
-Ddse.io.aio.enable
Set to false to have all read operations use the AsynchronousFileChannel regardless of the
operating system or disk type.
The default setting true allows dynamic switching of libraries for read operations as follows:

• LibAIO on solid state drives (SSD) and EXT4/XFS

• AsynchronousFileChannel for read operations on hard disk drives and all non-Linux operating
systems

Use this advanced setting only with guidance from DataStax Support.
Default: true
-Ddse.io.aio.force
Set to true to force all read operations to use LibAIO regardless of the disk type or operating system.
Use this advanced setting only with guidance from DataStax Support.
Default: false
-Dnetty.eventloop.busy_extra_spins=N
Set to the number of iterations in the epoll event loops performed when queues are empty before
moving on to the next backoff stage. Increasing the value reduces latency while increasing CPU usage
when the loops are idle.
Default: 10
-Dnetty.epoll_check_interval_nanos


Sets the granularity, in nanoseconds, for calling an epoll select, which is a system call. Setting the value
too low impacts performance by making too many system calls. Setting the value too high impacts
performance by delaying the discovery of new events.
Default: 2000
-Dnetty.schedule_check_interval_nanos
Sets the granularity, in nanoseconds, for checking whether scheduled events are ready to execute.
Specifying a value below 1 nanosecond is not productive. Too high a value delays scheduled tasks.
Default: 1000
LDAP system properties for DataStax Enterprise Authentication
-Ddse.ldap.connection.timeout.ms
The number of milliseconds before the connection times out.
Default:
-Ddse.ldap.retry_interval.ms
Allows you to set the time in milliseconds between subsequent retries when authenticating via an LDAP
server.
Default: 10
-Ddse.ldap.pool.min.idle
Finer control over the connection pool for the DataStax Enterprise LDAP authentication connector. The
min idle setting determines the minimum number of connections allowed in the pool before the evictor
thread creates new connections. This setting has no effect if the evictor thread is not configured to run.
Default:
-Ddse.ldap.pool.exhausted.action
Determines what the pool does when it is full. It can be one of:

• fail - the pool will throw an exception

• block - the pool will block for max wait ms (default)

• grow - the pool will just keep growing (not recommended)

Default: block
-Ddse.ldap.pool.max.wait
When the dse.ldap.pool.exhausted.action is block, sets the number of milliseconds to block the
pool before throwing an exception.
Default:
-Ddse.ldap.pool.test.borrow
Tests a connection when it is borrowed from the pool.
Default:
-Ddse.ldap.pool.test.return
Tests a connection returned to the pool.
Default:
-Ddse.ldap.pool.test.idle
Tests any connections in the eviction loop that are not being evicted. Only works if the time between
eviction runs is greater than 0ms.
Default:
-Ddse.ldap.pool.time.between.evictions
Determines the time in ms (milliseconds) between eviction runs. When used with
dse.ldap.pool.test.idle, this becomes a basic keep-alive for connections.
Default:
-Ddse.ldap.pool.num.tests.per.eviction
Number of connections in the pool that are tested during each eviction run. If this is set the same as max
active (the pool size), then all connections are tested during each eviction run.
Default:
-Ddse.ldap.pool.min.evictable.idle.time.ms
Determines the minimum time in ms (milliseconds) that a connection can sit in the pool before it
becomes available for eviction.
Default:
-Ddse.ldap.pool.soft.min.evictable.idle.time.ms


Determines the minimum time in ms (milliseconds) that a connection can sit in the pool before it
becomes available for eviction, with the proviso that the number of connections does not fall below
dse.ldap.pool.min.idle.
Default:
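As a sketch, the eviction-related pool properties work together; the values below are purely illustrative (the test options are assumed to take boolean values):

-Ddse.ldap.pool.time.between.evictions=30000
-Ddse.ldap.pool.test.idle=true
-Ddse.ldap.pool.min.idle=2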
Kerberos system properties
-Ddse.sasl.protocol
Kerberos principal name, user@realm. For example, dse_admin@EXAMPLE.com.
-Djava.security.auth.login.config
The path to the JAAS configuration file for DseClient.
NodeSync system parameters
-Ddse.nodesync.controller_update_interval_sec
Set the frequency to execute NodeSync auto-tuning process in seconds.
Default: 300 (5 minutes).
-Ddse.nodesync.log_reporter_interval_sec
Set the frequency of short INFO progress report in seconds.
Default: 600 (10 minutes).
-Ddse.nodesync.min_validation_interval_sec
Set to the minimum number of seconds between validations of the same segment, mostly to avoid busy
spinning on new/empty clusters.
Default: 300 (5 minutes).
-Ddse.nodesync.min_warn_interval_sec
Set to the minimum number of seconds between logging warnings.
Avoid logging warnings too often.
Default: 36000 (10 hours).
-Ddse.nodesync.rate_checker_interval_sec
Set the frequency, in seconds, of comparing the currently configured rate to tables and their deadlines. A
warning is logged if the rate is considered too low.
Default: 1800 (30 minutes).
-Ddse.nodesync.segment_lock_timeout_sec
Set the Time-to-live (TTL) on locks inserted in the status table in seconds.
Default: 600 (10 minutes).
-Ddse.nodesync.segment_size_target_bytes
Set to the targeted maximum size for segments in bytes.
Default: 209715200 (200 MB).
-Ddse.nodesync.size_checker_interval_sec
Set the frequency, in seconds, to check whether the depth used for a table should be updated due to
data size changes.
Default: 7200 (2 hours).
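For example, to report NodeSync progress every 5 minutes instead of every 10 on a test cluster (the value is illustrative):

dse cassandra -Ddse.nodesync.log_reporter_interval_sec=300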

Choosing a compaction strategy


To implement a compaction strategy, follow these steps:

1. Read how data is maintained to understand the compaction strategies.

2. Answer the questions below to determine the appropriate compaction strategy for each table.

3. Configure each table to use the appropriate compaction strategy.

4. Test the compaction strategy with your data.

Which compaction strategy is best?


The following questions are based on developer and user experience with the compaction strategies.
Does your table process time series data?


If the answer is yes, use TWCS (TimeWindowCompactionStrategy). If the answer is no, read the
following questions.
Does your table handle more reads than writes, or more writes than reads?
LCS (LeveledCompactionStrategy) is appropriate if there are twice or more reads than writes, especially
randomized reads. If the reads and writes are approximately equal, the performance penalty from LCS
may not be worth the benefit. Be aware that LCS can be overwhelmed by a high number of writes. One
advantage of LCS is that it keeps related data in a small set of SSTables.
Does the data in your table change often?
If your data is immutable or there are few upserts, use STCS (SizeTieredCompactionStrategy), which
does not have the write performance penalty of LCS.
Do you require predictable levels of read and write activity?
LCS keeps the SSTables within predictable sizes and numbers. For example, if your table's read and
write ratio is small, and the read activity is expected to conform to a Service Level Agreement (SLA), it
may be worth the LCS write performance penalty to keep read rates and latency at predictable levels.
And, you may be able to overcome the LCS write penalty by adding more nodes.
Will your table be populated by a batch process?
For batched reads and writes, STCS performs better than LCS. The batch process causes little or no
fragmentation, so the benefits of LCS are not realized; batch processes can overwhelm tables that use
LCS.
Does your system have limited disk space?
LCS handles disk space more efficiently than STCS: LCS requires about 10% headroom in addition to
the space occupied by the data. In some cases, STCS and DTCS (DateTieredCompactionStrategy) require
as much as 50% more headroom than the data space. (DTCS is deprecated.)
Is your system reaching its limits for input and output?
LCS is significantly more input and output intensive than DTCS or STCS. Switching to LCS may
introduce extra input and output load that offsets the advantages.
Configuring and running compaction
Set the table compaction strategy in the CREATE TABLE or ALTER TABLE statement parameters. See
table_options.
You can start compaction manually using the nodetool compact command.
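For example, a sketch of switching a hypothetical time series table to TWCS (the keyspace, table, and window values are illustrative):

ALTER TABLE cycling.events
    WITH compaction = { 'class' : 'TimeWindowCompactionStrategy',
                        'compaction_window_unit' : 'DAYS',
                        'compaction_window_size' : 1 };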
Testing compaction strategies
To test the compaction strategy:

• Create a three-node cluster using one of the compaction strategies, then stress test the cluster using
the cassandra-stress utility and measure the results.

• Set up a node on your existing cluster and enable the write survey mode option on the node to analyze live
data.

NodeSync service
About NodeSync
NodeSync is an easy-to-use continuous background repair with low overhead that provides consistent
performance and virtually eliminates the manual effort of running repair operations in a DataStax cluster.

• Continuously validates that data is in sync on all replicas.

• Always running, but with low impact on cluster performance.

• Fully automatic; no manual intervention needed.

• Completely replaces anti-entropy repairs.


For write-heavy workloads, where more than 20% of the operations are writes, you may notice CPU
consumption overhead associated with NodeSync. If that is the case for your environment, DataStax
recommends using nodetool repair instead of enabling NodeSync. See nodetool repair.

NodeSync service
By default, each node runs the NodeSync service. The service is idle unless it has something to validate.
NodeSync is enabled on a per-table basis. The service continuously validates local data ranges for NodeSync-
enabled tables and repairs any inconsistency found. The local data ranges are split into small segments, which
act as validation save points. Segments are prioritized in order to try to meet the per-table deadline target.
Segments
A segment is a small local token range of a table. NodeSync recursively splits local ranges in half a certain
number of times (depth) to create segments. The depth is calculated using the total table size, assuming equal
distribution of data. Typically segments cover no more than 200 MB. The token ranges can be no smaller than a
single partition, so very large partitions can result in segments larger than the configured size.
Validation process and status
After a segment is selected for validation, NodeSync reads the entirety of the data it covers from all replicas
(using paging), checks for inconsistencies, and repairs if needed. When a node validates a segment, it “locks”
it in a system table to avoid work duplication by other nodes. It is not a race-free lock; there is a possibility of
duplicated work, which is accepted to save the complexity and cost of true distributed locking.
Segment validation is saved on completion in the system_distributed.nodesync_status table, which is used
internally for resuming on failure, prioritization, segment locking, and by tools. It is not meant to be read directly.

• Validation status is:

# successful: All replicas responded and all inconsistencies (if any) were properly repaired.

# full_in_sync: All replicas were already in sync.

# full_repaired: Some replicas were repaired.

# unsuccessful: Either some replicas did not respond or repairs on inconsistent replicas failed.

# partial_in_sync: Not all replicas responded, but all respondents were in sync.

# partial_repaired: Not all replicas responded, and some that responded were repaired.

# uncompleted: At most 1 node was available/responded; no validation happened.

# failed: Some unexpected errors occurred. (Check the node logs.)


If validation of a large segment is interrupted, the amount of redundant work increases.

Limitations

• For debugging/tuning, understanding of traditional repair will be mostly unhelpful, since NodeSync depends
on the read repair path

• No special optimizations for remote DC - may perform poorly on particularly bad WAN links

• In aggregate, CPU consumption of NodeSync might exceed traditional repair

• NodeSync only makes internal adjustments to try to hit the configured rate - operators must ensure this
configured throughput is sufficient to meet the gc_grace_seconds commitment and can be achieved by the
hardware

Tables with NodeSync enabled will be skipped for repair operations run against all or specific keyspaces. For
individual tables, running the repair command will be rejected when NodeSync is enabled.


Starting and stopping the NodeSync service


The NodeSync service automatically starts with the dse cassandra command. You can manually start and stop
the service on each node.

1. Verify the status of the NodeSync service:

$ nodetool nodesyncservice status

The output should indicate running.

The NodeSync service is running

2. Disable the NodeSync service:

$ nodetool nodesyncservice disable

On the next restart of DataStax Enterprise (DSE), the NodeSync service will start up.

3. Verify the status of the NodeSync service:

$ nodetool nodesyncservice status

The output should indicate not running.

The NodeSync service is not running

Enabling NodeSync validation


By default, NodeSync is disabled when a table is created. It is also disabled on tables that were migrated from
earlier versions. To continuously verify data consistency in the background without the need for anti-entropy
repairs, enable NodeSync on one or more tables.

Data only needs to be validated if the table is in more than one datacenter or is in a datacenter where the
keyspace has a replication factor of 2 or more.

• Enable on an existing table:

# Change the NodeSync setting on a single table using CQL syntax:

ALTER TABLE table_name WITH nodesync = {'enabled': 'true'};

# All tables in a keyspace using nodesync enable:

$ nodesync enable -v -k keyspace_name "*"

# A list of tables using nodesync enable:

$ nodesync enable keyspace_name.table_name keyspace_name.table_name

• Create a table with nodesync enabled:

CREATE TABLE table_name ( column_list ) WITH nodesync = {'enabled': 'true'};

Tuning NodeSync validations


NodeSync tries to validate all tables within their respective deadlines, while respecting the configured rate limit.
For example, if a table is 10 GB with deadline_target_sec=10 and rate_in_kb set to 1 MB/sec, validation
cannot happen quickly enough (10 GB at 1 MB/sec takes roughly 10,000 seconds). Configure the rate and
deadlines realistically, take data sizes into account, and adapt with data growth.
For write-heavy workloads, where more than 20% of the operations are writes, you may notice CPU
consumption overhead associated with NodeSync. If that is the case for your environment, DataStax
recommends using nodetool repair instead of enabling NodeSync. See nodetool repair.

NodeSync records warnings to the system.log, if it detects any of the following conditions:

• rate_in_kb is too low to validate all tables within their deadline, even under ideal circumstances.

• rate_in_kb cannot be sustained by the node (too high for the node load/hardware).

Setting the NodeSync rate


Estimating rate setting impact
The rate_in_kb sets the per node rate of the local NodeSync service. It controls the maximum number of bytes
per second used to validate data. There is a fundamental tradeoff between how fast NodeSync validates data
and how many resources it consumes. The rate is a limit on the amount of resources used and a target that
NodeSync tries to achieve by auto-tuning internals. The set rate might not be achieved in practice, because
validation can complete at a slower rate on a new or small cluster, or because the node might temporarily or
permanently lack available resources.
Initial rate setting
There is no strong requirement to keep all nodes validating at the same rate; some nodes simply validate
more data than others. When setting the rate, start with the simplest method: the defaults.

1. Check the rate_in_kb setting within the nodesync section in the cassandra.yaml file.

2. Try increasing or decreasing the value at run time:

$ nodetool nodesyncservice setrate value_in_kb_sec

3. Check the configured rate.

$ nodetool nodesyncservice getrate

The configured rate is different from the effective rate, which can be found in the NodeSync Service
metrics.

Simulating NodeSync rates


When adjusting rates, use the NodeSync rate simulator to help determine the configuration settings by
computing the rate necessary to validate all tables within their allowed deadlines.
Unfortunately, no perfect value exists, because NodeSync also deals with many unknown or difficult-to-predict
factors, such as:

• Failures - When a node fails, it does not participate in NodeSync validation while it is offline.

• Temporary overloads - During periods of overload, such as unexpected events, nodes cannot achieve
the configured rate.

• Data size variation - The rate required to repair all tables within a fixed amount of time directly depends on
the size of the data to validate, which is typically a moving target.


All these factors can impact the overall NodeSync rate. Therefore, build safety margins into the configured
rate. The NodeSync rate simulator helps to set the rate.
Setting the NodeSync deadline
Each table with NodeSync enabled has a deadline_target_sec property. This is the target for the maximum time
between two validations of the same data. As long as the deadline is met, all parts of the ring (for the table) are
validated at least that often.
The deadline (deadline_target_sec) relates to the grace period (gc_grace_seconds). The deadline should
always be less than or equal to the grace period. As long as the deadline is met, no data is resurrected due to
tombstone purging.
The deadline defaults to whichever is longer: the grace period or four days. This is typically an acceptable
default, unless the table has a grace period of zero. For testing, the deadline can be set to less than the grace
period. Before lowering the gc_grace value, verify over a few weeks that the lower value is realistic and risk-free.
NodeSync prioritizes segments in order to try to meet the deadline. The next segment to validate at any given
time is the one closest to missing its deadline. For example, if table 1 has half the deadline of table 2, table 1 is
validated approximately twice as often as table 2.
Use OpsCenter to get a graphical representation of the NodeSync validation status. See Viewing NodeSync
Status.
The syntax to change the per-table nodesync property:

ALTER TABLE table_name
WITH nodesync = { 'enabled': 'true',
                  'deadline_target_sec': value };
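
As an illustration (the keyspace, table, and value are hypothetical; 864000 seconds equals 10 days, the default
gc_grace_seconds), the deadline can be pinned to the grace period:

ALTER TABLE cycling.comments
WITH nodesync = { 'enabled': 'true',
                  'deadline_target_sec': 864000 };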

Manually starting NodeSync validation


Force NodeSync to repair specific segments. After a user validation is submitted, it takes precedence over
normal NodeSync work. Normal work resumes automatically after the validation finishes.
For write-heavy workloads, where more than 20% of the operations are writes, you may notice CPU
consumption overhead associated with NodeSync. If that's the case for your environment, DataStax
recommends using nodetool repair instead of enabling NodeSync. See nodetool repair.

This is an advanced tool. Usually, it is better to let NodeSync prioritize segments on its own.

• Submitting user validations:

$ nodesync validation submit keyspace_name.table_name

• Listing user validations:

$ nodesync validation list

• Canceling user validations:

$ nodesync validation cancel validation_id

See nodesync validation.

Using multiple network interfaces


Steps for configuring DataStax Enterprise for multiple network interfaces or when using different regions in cloud
implementations.
You must configure settings in both the cassandra.yaml file and the relevant property file:


• cassandra-rackdc.properties (GossipingPropertyFileSnitch, Ec2Snitch, or Ec2MultiRegionSnitch)

• cassandra-topology.properties (PropertyFileSnitch)

Configuring cassandra.yaml for multiple networks or across regions in cloud


implementations
In multiple networks or cross-region cloud scenarios, communication between datacenters can only take
place using an external IP address. The external IP address is defined in the cassandra.yaml file using the
broadcast_address setting. Configure each node as follows:

1. In the cassandra.yaml file, set the listen_address to the private IP address of the node, and the
broadcast_address to the public address of the node.
This allows nodes to communicate with nodes in another network or region, enabling multiple datacenter
support. For intra-network or intra-region traffic, DSE switches to the private IP after establishing a connection.

2. Set the addresses of the seed nodes in the cassandra.yaml file to the public IP. Private IPs are not
routable between networks. For example:

seeds: 50.34.16.33, 60.247.70.52

Do not make all nodes seeds, see Internode communications (gossip).

3. Be sure that the storage_port or ssl_storage_port is open on the public IP firewall.

Be sure to enable encryption and authentication when using public IPs. See Configuring SSL for node-to-node
connections. Another option is to use a custom VPN to have local, inter-region/datacenter IPs.
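
For example, a minimal cassandra.yaml sketch for one such node (the addresses are hypothetical):

listen_address: 10.0.1.15          # private IP, used within the local network or region
broadcast_address: 203.0.113.15    # public IP, used across networks or regions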

Additional cassandra.yaml configuration for non-EC2 implementations


If multiple network interfaces are used in a non-EC2 implementation, enable the listen_on_broadcast_address
option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not automatically enabled. Enabling
listen_on_broadcast_address allows DSE to listen on both listen_address and broadcast_address with
two network interfaces.
Configuring the snitch for multiple networks
External communication between the datacenters can only happen when using the broadcast_address (public IP).
The GossipingPropertyFileSnitch is recommended for production. The cassandra-rackdc.properties file defines
the datacenters used by this snitch. Enable the option prefer_local to ensure that traffic to broadcast_address
will re-route to listen_address.
For each node in the network, specify its datacenter in the cassandra-rackdc.properties file.
In the example below, there are two datacenters, and each datacenter is named for its workload. You can use
other naming conventions, such as DC1, DC2 or 100, 200. (Datacenter names are case-sensitive.)


Node and datacenter assignments (identical on Network A and Network B):

Node     Datacenter            Rack
node0    DC_A_transactional    RAC1
node1    DC_A_transactional    RAC1
node2    DC_B_transactional    RAC1
node3    DC_B_transactional    RAC1
node4    DC_A_analytics        RAC1
node5    DC_A_search           RAC1
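
For instance, the cassandra-rackdc.properties on node0 might contain the following (prefer_local is an optional
setting commonly enabled for multi-interface deployments, as described above):

dc=DC_A_transactional
rack=RAC1
prefer_local=true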

Configuring the snitch for cross-region communication in cloud implementations


Be sure to use the appropriate snitch for your implementation. If deploying on Amazon EC2, see the
instructions in Ec2MultiRegionSnitch.

In cloud deployments, the region name is treated as the datacenter name and availability zones are treated as
racks within a datacenter. For example, if a node is in the us-east-1 region, us-east is the datacenter name and 1
is the rack location. (Racks are important for distributing replicas, but not for datacenter naming.)
In the example below, each DataStax Enterprise datacenter is named for its workload. You can use other
naming conventions, such as DC1, DC2 or 100, 200. (Datacenter names are case-sensitive.)
For each node, specify its datacenter in the cassandra-rackdc.properties. The dc_suffix option defines the
datacenters used by the snitch. Any other lines are ignored.


Node dc_suffix assignments (identical in Region: us-east and Region: us-west):

Node     dc_suffix
node0    _1_transactional
node1    _1_transactional
node2    _2_transactional
node3    _2_transactional
node4    _1_analytics
node5    _1_search

This results in four datacenters per region:

us-east_1_transactional    us-west_1_transactional
us-east_2_transactional    us-west_2_transactional
us-east_1_analytics        us-west_1_analytics
us-east_1_search           us-west_1_search

Configuring gossip settings


When a node first starts up, it looks at its cassandra.yaml configuration file to determine the name of the cluster it
belongs to; which nodes (called seeds) to contact to obtain information about the other nodes in the cluster; and
other parameters for determining port and range information.

1. In the cassandra.yaml file, set the following parameters:

Property Description

cluster_name Name of the cluster that this node is joining. Must be the same for every node in the
cluster.

listen_address The IP address or hostname that the database binds to for connecting this node to other
nodes.

listen_interface Use this option instead of listen_address to specify the network interface by name rather
than by address/hostname.

(Optional) broadcast_address The public IP address this node uses to broadcast to other nodes outside the network
or across regions in multiple-region EC2 deployments. If this property is commented
out, the node uses the same IP address or hostname as listen_address. A node
does not need a separate broadcast_address in a single-node or single-datacenter
installation, or in an EC2-based network that supports automatic switching between
private and public communication. It is necessary to set a separate listen_address and
broadcast_address on a node with multiple physical network interfaces or other topologies
where not all nodes have access to other nodes by their private IP addresses. For specific
configurations, see the instructions for listen_address. The default is the listen_address.

seed_provider The -seeds list is a comma-delimited list of hosts (IP addresses) that gossip uses to learn the
topology of the ring. Every node should have the same list of seeds.
Making every node a seed node is not recommended because of increased
maintenance and reduced gossip performance. Gossip optimization is not critical, but
it is recommended to use a small seed list (approximately three nodes per datacenter).

storage_port The inter-node communication port (default is 7000). Must be the same for every node in
the cluster.


initial_token For legacy clusters. Set this property for single-node-per-token architecture, in which a
node owns exactly one contiguous range in the ring space.

num_tokens For new clusters. The number of tokens randomly assigned to this node in a cluster that
uses virtual nodes (vnodes).
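
Putting these parameters together, a minimal cassandra.yaml sketch for a vnode-enabled node (the cluster name
and addresses are hypothetical):

cluster_name: 'ProdCluster'
listen_address: 10.0.1.15
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.1.15,10.0.2.20,10.0.3.25"
storage_port: 7000
num_tokens: 8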

Configuring the heap dump directory


Analyzing the heap dump file can help troubleshoot memory problems. The JVM starts with the option -XX:
+HeapDumpOnOutOfMemoryError, which triggers a heap dump in the event of an out-of-memory condition. The
heap dump file consists of references to the objects that caused the heap to overflow. By default, when running
as a service, the database puts the file in a subdirectory of the working, root directory. If the database does not
have write permission to the root directory, the heap dump fails. If the root directory is too small to accommodate
the heap dump, the server crashes.
The DataStax Help Center also provides troubleshooting information.
To ensure that a heap dump succeeds and to prevent crashes, configure a heap dump directory that is:

• Accessible to the database for writing

• Large enough to accommodate a heap dump

Base the size of the directory on the value of the Java -Xmx option.

Set the location of the heap dump in the cassandra-env.sh file.

1. Open the cassandra-env.sh file for editing.

2. Scroll down to the comment about the heap dump path:

# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR


if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$
$.hprof"
fi

3. On the line after the comment, set the CASSANDRA_HEAPDUMP_DIR to the desired path:

# set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR


export CASSANDRA_HEAPDUMP_DIR=path
if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$
$.hprof"
fi

4. Save the cassandra-env.sh file and restart.
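
To sanity-check the new location, confirm that the directory is writable by the DSE user and that free space
exceeds the JVM heap size set with -Xmx (the path below is hypothetical):

$ df -h /var/lib/cassandra/heapdumps
$ touch /var/lib/cassandra/heapdumps/probe && rm /var/lib/cassandra/heapdumps/probe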

Configuring Virtual Nodes


Virtual node (vnode) configuration
Virtual nodes simplify many tasks in DataStax Enterprise, such as eliminating the need to determine the partition
range (calculate and assign tokens), rebalancing the cluster when adding or removing nodes, and replacing dead
nodes. For a complete description of virtual nodes and how they work, see Virtual nodes.



Guidelines for using virtual nodes


• DSE requires the same token architecture on all nodes in a datacenter.
All nodes must use either vnode-enabled or single-token architecture. Across the entire cluster, datacenter
architecture can vary.
For example, a single cluster with:

# A transaction-only datacenter running OLTP.

# A single-token architecture search datacenter (no vnodes).

# An analytics datacenter with vnodes.

• DataStax recommends using 8 vnodes (tokens).


DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured for your
environment.

Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on
performance.

• Ensure correct vnode configuration with cassandra.yaml settings:

# When adding a vnode-enabled node to an existing cluster or setting up nodes in a new datacenter,
set allocate_tokens_for_local_replication_factor to the target replication factor (RF) of the keyspaces
in the datacenter.

# The allocation algorithm distributes the token ranges proportionately using the num_tokens setting.
All systems in the datacenter should have the same num_tokens setting unless performance varies
between systems. To distribute more of the workload to the higher-performance hardware, increase the
number of tokens for those systems.
The allocation algorithm efficiently balances the workload using fewer tokens; when systems are added
to a datacenter, the algorithm maintains the balance. Using a higher number of tokens distributes the
workload more evenly, but also significantly increases token management overhead.
Set the number of vnode tokens based on the workload distribution requirements of the datacenter:
Table 12: Allocation algorithm workload distribution variance

Replication factor   4 vnodes (tokens)   8 vnodes (tokens)   64 vnodes (tokens)   128 vnodes (tokens)
2                    ~17.5%              ~12.5%              ~3%                  ~1%
3                    ~14%                ~10%                ~2%                  ~1%
5                    ~11%                ~7%                 ~1%                  ~1%

• Add nodes to the cluster one at a time.


When adding multiple nodes to the cluster using the allocation algorithm, ensure that nodes are added
one at a time. If nodes are added concurrently, the algorithm assigns the same tokens to different nodes.


Enabling vnodes
In the cassandra.yaml file:

1. Uncomment num_tokens and set the required number of tokens.

2. (Recommended) To use the allocation algorithm, uncomment allocate_tokens_for_local_replication_factor
and set it to the target replication factor for the keyspaces in the datacenter. If the replication factor varies,
alternate between the replication factor (RF) settings.

3. Comment out the initial_token property or leave it unset.
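
As a sketch, the relevant cassandra.yaml lines for a vnode-enabled datacenter whose keyspaces use a
replication factor of 3 (the RF value is an assumption):

num_tokens: 8
allocate_tokens_for_local_replication_factor: 3
# initial_token:    (leave commented out when using vnodes)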

To upgrade existing clusters to vnodes, see Enabling virtual nodes on an existing production cluster.
Disabling vnodes
If you do not use vnodes, make sure that each node is responsible for roughly an equal amount of data:
assign each node an initial_token value and calculate the tokens for each datacenter as described in
Generating tokens.

1. In the cassandra.yaml file:

a. Comment out the num_tokens and allocate_tokens_for_local_replication_factor.

b. Uncomment the initial_token and set it to 1 or to the value of a generated token for a multi-node cluster.
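
For example, assuming the default Murmur3Partitioner, evenly spaced initial tokens for a hypothetical six-node
datacenter can be computed with a one-liner:

$ # Divide the Murmur3 token range (-2**63 .. 2**63 - 1) into six equal slices
$ python3 -c 'n = 6; print(*[(2**64 // n) * i - 2**63 for i in range(n)], sep="\n")'
-9223372036854775808
-6148914691236517206
-3074457345618258604
-2
3074457345618258600
6148914691236517202

Assign one value per node as its initial_token.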

Enabling virtual nodes on an existing production cluster


You cannot directly convert single-token nodes to vnodes. However, you can add another datacenter with
vnodes already enabled and allow automatic distribution of the existing data into the new nodes. This method
has the least impact on performance.

DataStax recommends not using vnodes with DSE Search. However, if you decide to use vnodes with DSE
Search, do not use more than 8 vnodes and ensure that the allocate_tokens_for_local_replication_factor option
in cassandra.yaml is correctly configured for your environment.

1. Add a new datacenter to the cluster.

2. Once the new datacenter with vnodes enabled is up, switch your clients to use the new datacenter.

3. Run a full repair with nodetool repair.


This step ensures that after you move the client to the new datacenter that any previous writes are
added to the new datacenter and that nothing else, such as hints, is dropped when you remove the old
datacenter.

4. Update your schema to no longer reference the old datacenter.

5. Remove the old datacenter from the cluster.


See Decommissioning a datacenter.

Logging configuration
Changing logging locations
Logging locations are set at installation. Generally, the default logs location is /var/log. For example, /var/
log/cassandra and /var/log/tomcat.
For details, see Default file locations for package installations and Default file locations for tarball installations.
You can also change logging locations with OpsCenter Configuration Profiles.


1. To change logging locations after installation:

• To generate all logs in the same location, add CASSANDRA_LOG_DIR to the dse-env.sh file:

export CASSANDRA_LOG_DIR="/your/log/location"

• For finer-grained control, edit the logback.xml file and replace ${cassandra.logdir} with the path.

2. To change the Tomcat server log locations for DSE Search, edit one of these files:

• Set TOMCAT_LOGS in the cassandra-env.sh file:

export TOMCAT_LOGS="/your/log/location"

• Set the locations in resources/tomcat/conf/logging.properties.

3. After you change logging locations, restart DataStax Enterprise.

Configuring logging
Logging functionality uses Simple Logging Facade for Java (SLF4J) with a logback backend. Logs are written
to the system.log and debug.log in the logging directory. You can configure logging programmatically or
manually. Manual ways to configure logging are:

• Run the nodetool setlogginglevel command.

• Configure the logback-test.xml or logback.xml file installed with DataStax Enterprise.

• Use the JConsole tool to configure logging through JMX.

Logback looks for the logback-test.xml file first, and then for the logback.xml file.
The following example details the XML configuration of the logback.xml file:

<configuration scan="true">
<jmxConfigurator />

<!-- SYSTEMLOG rolling file appender to system.log (INFO level) -->

<appender name="SYSTEMLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">


<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<file>${cassandra.logdir}/system.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/system.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>20</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>20MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
</encoder>
</appender>

<!-- DEBUGLOG rolling file appender to debug.log (all levels) -->

<appender name="DEBUGLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">


<file>${cassandra.logdir}/debug.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/debug.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>20</maxIndex>


</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>20MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
</encoder>
</appender>

<!-- ASYNCDEBUGLOG asynchronous appender to debug.log (all levels) -->

<appender name="ASYNCDEBUGLOG" class="ch.qos.logback.classic.AsyncAppender">


<queueSize>1024</queueSize>
<discardingThreshold>0</discardingThreshold>
<includeCallerData>true</includeCallerData>
<appender-ref ref="DEBUGLOG" />
</appender>

<!-- STDOUT console appender to stdout (INFO level) -->

<if condition='isDefined("dse.console.useColors")'>
<then>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<withJansi>true</withJansi>
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<encoder>
<pattern>%highlight(%-5level) [%thread] %green(%date{ISO8601})
 %yellow(%X{service}) %F:%L - %msg%n</pattern>
</encoder>
</appender>
</then>
</if>
<if condition='isNull("dse.console.useColors")'>
<then>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
</encoder>
</appender>
</then>
</if>

<include file="${SPARK_SERVER_LOGBACK_CONF_FILE}"/>
<include file="${GREMLIN_SERVER_LOGBACK_CONF_FILE}"/>

<!-- Uncomment the LogbackMetrics appender and the corresponding appender-ref in the
root to activate
<appender name="LogbackMetrics"
class="com.codahale.metrics.logback.InstrumentedAppender" />
-->

<root level="${logback.root.level:-INFO}">
<appender-ref ref="SYSTEMLOG" />
<appender-ref ref="STDOUT" />
<!-- Comment out the ASYNCDEBUGLOG appender to disable debug.log -->
<appender-ref ref="ASYNCDEBUGLOG" />
<!-- Uncomment LogbackMetrics and its associated appender to enable metric collecting for
logs. -->
<!-- <appender-ref ref="LogbackMetrics" /> -->
<appender-ref ref="SparkMasterFileAppender" />
<appender-ref ref="SparkWorkerFileAppender" />


<appender-ref ref="GremlinServerFileAppender" />


</root>

<!--audit log-->
<appender name="SLF4JAuditWriterAppender"
class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${cassandra.logdir}/audit/audit.log</file>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/audit/audit.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>200MB</maxFileSize>
</triggeringPolicy>
</appender>

<logger name="SLF4JAuditWriter" level="INFO" additivity="false">


<appender-ref ref="SLF4JAuditWriterAppender"/>
</logger>

<appender name="DroppedAuditEventAppender"
class="ch.qos.logback.core.rolling.RollingFileAppender" prudent=$
<file>${cassandra.logdir}/audit/dropped-events.log</file>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/audit/dropped-events.log.%i.zip</
fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>200MB</maxFileSize>
</triggeringPolicy>
</appender>

<logger name="DroppedAuditEventLogger" level="INFO" additivity="false">


<appender-ref ref="DroppedAuditEventAppender"/>
</logger>

<logger name="org.apache.cassandra" level="DEBUG"/>


<logger name="com.datastax.bdp.db" level="DEBUG"/>
<logger name="com.datastax.driver.core.NettyUtil" level="ERROR"/>
<logger name="com.datastax.bdp.search.solr.metrics.SolrMetricsEventListener"
level="DEBUG"/>
<logger name="org.apache.solr.core.CassandraSolrConfig" level="WARN"/>
<logger name="org.apache.solr.core.SolrCore" level="WARN"/>
<logger name="org.apache.solr.core.RequestHandlers" level="WARN"/>
<logger name="org.apache.solr.handler.component" level="WARN"/>
<logger name="org.apache.solr.search.SolrIndexSearcher" level="WARN"/>
<logger name="org.apache.solr.update" level="WARN"/>
<logger name="org.apache.lucene.index" level="INFO"/>
<logger name="com.cryptsoft" level="OFF"/>
<logger name="org.apache.spark.rpc" level="ERROR"/>


</configuration>

The appender configurations specify where each log is written and how it is rolled and formatted. Each appender
is defined by an <appender name="..."> element; the appenders are described as follows.
SYSTEMLOG
Directs logs to the /var/log/cassandra/system.log file and ensures that WARN and ERROR messages are
written synchronously.
DEBUGLOG | ASYNCDEBUGLOG
Generates the /var/log/cassandra/debug.log file, which contains an asynchronous log of events
written to the system.log file, plus production logging information useful for debugging issues.
STDOUT
Directs logs to the console in a human-readable format.
LogbackMetrics
Records the rate of logged events by their logging level.
SLF4JAuditWriterAppender | DroppedAuditEventAppender
Used by the audit logging functionality. See Setting up database auditing for more information.
The following logging functionality is configurable:

• Rolling policy

# The policy for rolling logs over to an archive

# Location and name of the log file

# Location and name of the archive

# Minimum and maximum file size to trigger rolling

• Format of the message

• The log level

Log levels
The valid values for setting the log level include ALL for logging information at all levels, TRACE through
ERROR, and OFF for no logging. TRACE creates the most verbose log, and ERROR, the least.

• ALL

• TRACE

• DEBUG

• INFO (Default)

• WARN

• ERROR

• OFF

When set to TRACE or DEBUG, output appears only in debug.log. When set to INFO, debug.log is
disabled.

Increasing logging levels can generate heavy logging output on a moderately trafficked cluster.

Use the nodetool getlogginglevels command to see the current logging configuration.

$ nodetool getlogginglevels

Logger Name                                            Log Level
ROOT                                                   INFO
com.thinkaurelius.thrift                               ERROR

To add debug logging to a class permanently using the logback framework, use nodetool setlogginglevel to
confirm the component or class name before setting it in the logback.xml file in installation_location/conf.
Modify the file to include the following line, or similar, at the end:

<logger name="org.apache.cassandra.gms.FailureDetector" level="DEBUG"/>

Restart the node to invoke the change.
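
To try a level at runtime before persisting it in logback.xml, set it temporarily with nodetool; the change does
not survive a restart:

$ nodetool setlogginglevel org.apache.cassandra.gms.FailureDetector DEBUG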


Migrating to logback from log4j
If you upgrade from an earlier version that used log4j, you can convert log4j.properties files to logback.xml
using the logback PropertiesTranslator web-application.
Using log file rotation
The default policy rolls the system.log file after the size exceeds 20MB. Archives are compressed in zip format.
Logback names the log files system.log.1.zip, system.log.2.zip, and so on. For more information, see
logback documentation.
Enabling extended compaction logging
To configure collection of in-depth information about compaction activity on a node, and write it to a dedicated
log file, see the log_all property for compaction.
Commit log archive configuration
DataStax Enterprise provides commit log archiving and point-in-time recovery. The commit log is archived at
node startup and when a commit log is written to disk, or at a specified point-in-time. You configure this feature in
the commitlog_archiving.properties configuration file.
The archive_command and restore_command options each expect a single command with arguments. The
parameters must be entered verbatim. STDOUT, STDIN, and multiple commands cannot be used. As a
workaround, script multiple commands in a file and point the option at that script. To disable a command, leave it
blank.

• Archive a commit log segment:

Command archive_command=

Parameters %path Fully qualified path of the segment to archive.

%name Name of the commit log.

Example archive_command=/bin/ln %path /backup/%name

• Restore an archived commit log:

Command restore_command=

Parameters %from Fully qualified path of the archived commitlog segment from the restore_directories.

%to Name of live commit log directory.

Example restore_command=cp -f %from %to

• Set the restore directory location:

Command restore_directories=

Format restore_directories=restore_directory_location

• Restore mutations created up to and including the specified timestamp:


Command restore_point_in_time=

Format <timestamp> (YYYY:MM:DD HH:MM:SS)

Example restore_point_in_time=2013:12:11 17:00:00

Restore stops when the first client-supplied timestamp is greater than the restore point timestamp.
Because the order in which the database receives mutations does not strictly follow the timestamp order,
this can leave some mutations unrecovered.
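
Putting the options together, a hedged commitlog_archiving.properties sketch (the paths and timestamp are
illustrative, not defaults):

archive_command=/bin/ln %path /backup/%name
restore_command=cp -f %from %to
restore_directories=/backup/commitlog
restore_point_in_time=2020:04:01 12:00:00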

Change Data Capture (CDC) logging


Change Data Capture (CDC) logging captures and tracks data that has changed. CDC logging is configured
per table, with limits on the amount of disk space to consume for storing the CDC logs. CDC logs use the same
binary format as the commit log.
Upon flushing the memtable to disk, CommitLogSegments that contain data for CDC-enabled tables are moved
to the configured cdc_raw directory. After the disk space limit is reached, CDC-enabled tables reject writes until
space is freed.
Prerequisites: Before enabling CDC logging, define a plan for moving and consuming the CDC log information.
DataStax recommends a physical device for the CDC log that is separate from the data directories.

1. Enable CDC logging and configure CDC directories and space in cassandra.yaml.
For example, to enable CDC logging with default values:

cdc_enabled: true
cdc_total_space_in_mb: 4096
cdc_free_space_check_interval_ms: 250
cdc_raw_directory: /var/lib/cassandra/cdc_raw

2. To enable CDC logging for a database table, create or alter the table with the table property.
For example, to enable CDC logging on the cycling table:

ALTER TABLE cycling WITH cdc=true;
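
CDC segments appear in the cdc_raw directory only after the corresponding memtables flush. A quick way to
observe this (the keyspace name and directory below are assumptions based on the defaults above):

$ nodetool flush cycling_keyspace cycling
$ ls /var/lib/cassandra/cdc_raw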

Chapter 5. Initializing a DataStax Enterprise cluster
Complete the following tasks before initializing a DSE cluster.

• Establish a firm understanding of how the database works. Be sure to read at least Understanding the
database architecture and Data replication.

• Ensure the environment is suitable for the use case and workload.

• Review recommended production settings.

• Choose a name for the cluster.

• For a mixed-workload cluster, determine the purpose of each node.

• Determine the snitch and replication strategy. The GossipingPropertyFileSnitch and NetworkTopologyStrategy
are recommended for production environments.

• Obtain the IP address of each node.

• Ensure that DataStax Enterprise is installed on each node.

• Determine which nodes are seed nodes. Do not make all nodes seed nodes.
Seed nodes are not required for DSE Search datacenters, see Internode communications (gossip).

• Review and make appropriate changes to other property files, such as cassandra-rackdc.properties.

• Set virtual nodes correctly for the type of datacenter. DataStax recommends using 8 vnodes (tokens). See
Virtual nodes for more information.

Initializing datacenters
In most circumstances, each workload type, such as search, analytics, and transactional, should be organized
into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can
be combined in SearchAnalytics nodes when there is not a large demand for analytics, or when analytics queries
must use a DSE Search index. Generally, combining transactional (OLTP) and analytics (OLAP) workloads
results in decreased performance.
When creating a keyspace using CQL, DataStax Enterprise creates a virtual datacenter for a cluster, even a one-
node cluster, automatically. You assign nodes that run the same type of workload to the same datacenter. The
separate, virtual datacenters for different types of nodes segregate workloads that run DSE Search from those
nodes that run other workload types.
Single datacenters per workload type
If all nodes reside in a single physical datacenter, use a single-datacenter-per-workload deployment.
Multiple datacenters per workload type
If nodes span multiple physical datacenters, consider a multiple-datacenter-per-workload deployment.
The following scenarios describe some benefits of using multiple, physical datacenters:

• Isolating replicas from external infrastructure failures, such as networking between datacenters and power
outages.

• Distributing data replication across multiple, geographically-dispersed nodes.

• Adding separation between different physical racks in a physical datacenter.

• Diversifying assets between public cloud providers and on-premise managed datacenters.


• Preventing the slow down of a real-time analytics cluster by a development cluster running analytics jobs on
live data.

• Using virtual datacenters within a physical datacenter to ensure that reads are local to the requesting
datacenter, especially when using a consistency level greater than ONE. This strategy lowers latency because
it avoids, for example, one read served from a node in New York and another from a node in Los Angeles.

Initializing a single datacenter per workload type


In this scenario, a mixed workload cluster has only one datacenter for each type of workload. For example, an
eight-node cluster with the following nodes would use three datacenters, one for each workload type:

• DC1 = 3 DSE Analytics nodes

• DC2 = 3 Transactional nodes

• DC3 = 2 DSE Search nodes

In contrast, a multiple datacenter cluster has more than one datacenter for each type of workload.
The eight-node cluster spans two racks across three datacenters. Applications in each datacenter will use a
default consistency level of LOCAL_QUORUM. One node per rack will serve as a seed node.

Node     IP address      Type            Seed   Rack
node0    110.82.155.0    Transactional   yes    RAC1
node1    110.82.155.1    Transactional   -      RAC1
node2    110.54.125.1    Transactional   -      RAC2
node3    110.54.125.2    Analytics       -      RAC1
node4    110.54.155.2    Analytics       yes    RAC2
node5    110.82.155.3    Analytics       -      RAC1
node6    110.54.125.3    Search          -      RAC1
node7    110.82.155.4    Search          -      RAC2

Prerequisites:
To prepare the environment, complete the prerequisite tasks outlined in Initializing a DataStax Enterprise
cluster.

If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:

1. If the nodes are behind a firewall, open the required ports for internal/external communication.

2. Decommission each node that will be added to the new datacenter.

3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.

4. Install DSE on each node.

1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

a. Configure client applications to use the DCAwareRoundRobinPolicy.


b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.

c. If using the QUORUM consistency level, change to LOCAL_QUORUM.

d. If using the ONE consistency level, set to LOCAL_ONE.

See the programming instructions for your driver.

2. Configure every keyspace that uses SimpleStrategy to use the NetworkTopologyStrategy replication strategy
instead, including (but not restricted to) the following keyspaces:

a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
    {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3};

• DSE security: system_auth, dse_security

• DSE performance: dse_perf

• DSE analytics: dse_leases, dsefs

• System resources: system_traces, system_distributed

• OpsCenter (if installed)

• All keyspaces created by users

b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.

DESCRIBE SCHEMA ;

CREATE KEYSPACE dse_perf WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_leases WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dsefs WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_security WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;

3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.

Use the same version of DSE on all nodes in the cluster.

4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.


Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.

a. Configure node properties:

• -seeds: internal_IP_address of each seed node


Include at least one seed node from each datacenter. DataStax recommends more than
one seed node per datacenter, in more than one rack. Do not make all nodes seed
nodes.

• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.

• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.

• endpoint_snitch: snitch
See endpoint_snitch and snitches.

Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.

Snitch Configuration file

GossipingPropertyFileSnitch cassandra-rackdc.properties file

Amazon EC2 single-region snitch

Amazon EC2 multi-region snitch

Google Cloud Platform snitch

PropertyFileSnitch cassandra-topology.properties file

• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.

b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings

• Set num_tokens to 8 (recommended).

• Set allocate_tokens_for_local_replication_factor to the target replication factor for keyspaces


in the new datacenter. If the keyspace RF varies, alternate the settings to use all the
replication factors.

• Comment out the initial_token property.

DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.


For more information, refer to Virtual node (vnode) configuration.


Single-token architecture settings

• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.

• Comment out both num_tokens and allocate_tokens_for_local_replication_factor.

5. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties


(PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a
default datacenter name and rack name for unknown nodes.

Migration information: The GossipingPropertyFileSnitch always loads cassandra-


topology.properties when the file is present. Remove the file from each node on any new cluster,
or any cluster migrated from the PropertyFileSnitch.

# Transactional Node IP=Datacenter:Rack


110.82.155.0=DC_Transactional:RAC1
110.82.155.1=DC_Transactional:RAC1
110.54.125.1=DC_Transactional:RAC2
110.54.125.2=DC_Analytics:RAC1
110.54.155.2=DC_Analytics:RAC2
110.82.155.3=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1
110.82.155.4=DC_Search:RAC2

# default for unknown nodes


default=DC1:RAC1

After making any changes in the configuration files, you must restart the node for the changes to
take effect.

6. Make the following changes in the existing datacenters.

a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.

b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.

7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:

• Package installations: Starting DataStax Enterprise as a service

• Tarball installations: Starting DataStax Enterprise as a stand-alone process

8. Continue starting DSE, rotating through the racks, until all the nodes are up.

9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
    {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.

$ nodetool rebuild -- datacenter_name

You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.

a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.

b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.

11. Check that your cluster is up and running:

$ dsetool status

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

12. Complete steps 3 through 11 to add the third datacenter (DC3) to the cluster.

The datacenters in the cluster are now replicating with each other.

DC: Cassandra Workload: Cassandra Graph: no


==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 110.82.155.0 21.33 KB 256 33.3% a9fa31c7-f3c0-... RAC1
UN 110.82.155.1 21.33 KB 256 33.3% f5bb416c-db51-... RAC1
UN 110.54.125.1 21.33 KB 256 16.7% b836748f-c94f-... RAC2

DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.2 28.44 KB 13.0% e2451cdf-f070- ... -922337.... RAC1
UN 110.82.155.2 44.47 KB 16.7% f9fa427c-a2c5- ... 30745512... RAC2
UN 110.82.155.3 54.33 KB 23.6% b9fc31c7-3bc0- ..- 45674488... RAC1

DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.3 15.44 KB 50.2% e2451cdf-f070- ... 9243578.... RAC1
UN 110.82.155.4 18.78 KB 49.8% e2451cdf-f070- ... 10000 RAC2

Initializing multiple datacenters per workload type


In this scenario, a mixed workload cluster has more than one datacenter for each type of workload. For example,
the following ten-node cluster spans five datacenters, whereas a single datacenter cluster has only one
datacenter for each node type.

• DC1 = 2 DSE Analytics nodes

• DC2 = 2 Transactional nodes

• DC3 = 2 DSE Search nodes

• DC4 = 2 DSE Analytics nodes

• DC5 = 2 Transactional nodes

The ten-node cluster spans two racks across five datacenters. Applications in each datacenter will use a default
consistency level of LOCAL_QUORUM. One node per rack will serve as a seed node.

Node     IP address      Type            Seed   Rack
node0    110.82.155.0    Transactional   yes    RAC1
node1    110.82.155.1    Transactional   -      RAC1
node2    110.54.125.1    Transactional   -      RAC2
node3    110.55.120.1    Transactional   -      RAC1
node4    110.54.125.2    Analytics       -      RAC1
node5    110.54.155.2    Analytics       yes    RAC2
node6    110.82.155.3    Analytics       -      RAC1
node7    110.55.120.2    Analytics       -      RAC1
node8    110.54.125.3    Search          -      RAC1
node9    110.82.155.4    Search          -      RAC2

Prerequisites:
Complete the prerequisite tasks outlined in Initializing a DataStax Enterprise cluster to prepare the
environment.

If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:

1. If the nodes are behind a firewall, open the required ports for internal/external communication.

2. Decommission each node that will be added to the new datacenter.

3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.

4. Install DSE on each node.

1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:


If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

a. Configure client applications to use the DCAwareRoundRobinPolicy.

b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.

c. If using the QUORUM consistency level, change to LOCAL_QUORUM.

d. If using the ONE consistency level, set to LOCAL_ONE.

See the programming instructions for your driver.

2. Configure every keyspace that uses SimpleStrategy to use the NetworkTopologyStrategy replication strategy
instead, including (but not restricted to) the following keyspaces:

a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
    {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3};

• DSE security: system_auth, dse_security

• DSE performance: dse_perf

• DSE analytics: dse_leases, dsefs

• System resources: system_traces, system_distributed

• OpsCenter (if installed)

• All keyspaces created by users

b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.

DESCRIBE SCHEMA ;

CREATE KEYSPACE dse_perf WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_leases WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dsefs WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_security WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;

3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.


Use the same version of DSE on all nodes in the cluster.

4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.

Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.

a. Configure node properties:

• -seeds: internal_IP_address of each seed node


Include at least one seed node from each datacenter. DataStax recommends more than
one seed node per datacenter, in more than one rack. Do not make all nodes seed
nodes.

• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.

• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.

• endpoint_snitch: snitch
See endpoint_snitch and snitches.

Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.

Snitch Configuration file

GossipingPropertyFileSnitch cassandra-rackdc.properties file

Amazon EC2 single-region snitch

Amazon EC2 multi-region snitch

Google Cloud Platform snitch

PropertyFileSnitch cassandra-topology.properties file

• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.

b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings

• Set num_tokens to 8 (recommended).

• Set allocate_tokens_for_local_replication_factor to the target replication factor for keyspaces


in the new datacenter. If the keyspace RF varies, alternate the settings to use all the
replication factors.

• Comment out the initial_token property.


DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings

• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.

• Comment out both num_tokens and allocate_tokens_for_local_replication_factor.

5. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties


(PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a
default datacenter name and rack name for unknown nodes.

Migration information: The GossipingPropertyFileSnitch always loads cassandra-


topology.properties when the file is present. Remove the file from each node on any new cluster,
or any cluster migrated from the PropertyFileSnitch.

# Transactional Node IP=Datacenter:Rack


110.82.155.0=DC_Transactional:RAC1
110.82.155.1=DC_Transactional:RAC1
110.54.125.1=DC_Transactional:RAC2
110.54.125.2=DC_Analytics:RAC1
110.54.155.2=DC_Analytics:RAC2
110.82.155.3=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1
110.82.155.4=DC_Search:RAC2

# default for unknown nodes


default=DC1:RAC1

After making any changes in the configuration files, you must restart the node for the changes to
take effect.

6. Make the following changes in the existing datacenters.

a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.

b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.

7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:

• Package installations: Starting DataStax Enterprise as a service

• Tarball installations: Starting DataStax Enterprise as a stand-alone process

8. Continue starting DSE, rotating through the racks, until all the nodes are up.

9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.

ALTER KEYSPACE keyspace_name WITH REPLICATION =


{'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.

$ nodetool rebuild -- datacenter_name

You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.

a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.

b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.

11. Check that your cluster is up and running:

$ dsetool status

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

12. Complete steps 3 through 11 to add the remaining datacenters to the cluster.

The datacenters in the cluster are now replicating with each other.

DC: Cassandra Workload: Cassandra Graph: no


==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 110.82.155.0 21.33 KB 256 50.2% a9fa31c7-f3c0-... RAC1
UN 110.82.155.1 21.33 KB 256 49.8% f5bb416c-db51-... RAC1

DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.2 28.44 KB 50.2% e2451cdf-f070-... -922337.... RAC1
UN 110.82.155.2 44.47 KB 49.8% f9fa427c-a2c5-... 30745512... RAC2

DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.3 15.44 KB 50.2% e2451cdf-f070-... 9243578.... RAC1
UN 110.82.155.4 18.78 KB 49.8% e2451cdf-f070-... 10000 RAC2


DC: Cassandra2 Workload: Cassandra Graph: no


==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 110.54.125.1 21.33 KB 256 16.7% b836748f-c94f-... RAC2
UN 110.55.120.1 21.33 KB 256 16.7% b354798g-c94f-... RAC2

DC: Analytics2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.82.155.3 54.33 KB 50.2% b9fc31c7-3bc0-... 45674488... RAC1
UN 110.55.120.2 54.33 KB 49.8% b8gd45e4-3bc0-... 45674488... RAC2

What's next:

• Initializing single-token architecture datacenters

• Configuring the security keyspaces replication factors

Setting seed nodes for a single datacenter


This overview is a simple example of setting seed nodes for a new datacenter with 5 nodes.
About seed nodes:

• A seed node is used to bootstrap the gossip process for new nodes joining a cluster.

• To learn the topology of the ring, a joining node contacts one of the nodes in the -seeds list in
cassandra.yaml.

• The first time you bring up a node in a new cluster, only one node is the seed node.

• The seeds list is a comma-delimited list of addresses. Since this example cluster includes 5 nodes, you must
change the list from the default value "127.0.0.1" to the IP address of one of the nodes.

• After all nodes are added, all nodes in the datacenter must be configured to use the same seed nodes.

Preventing problems in gossip communications


To prevent problems in gossip communications, be sure to use the same list of seed nodes for all nodes in a
cluster. This is most critical the first time a node starts up. By default, a node remembers other nodes it has
gossiped with between subsequent restarts. The seed node designation has no purpose other than bootstrapping
the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they
have any other special purpose in cluster operations beyond the bootstrapping of nodes.

Making every node a seed node is not recommended because of increased maintenance and reduced gossip
performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately
three nodes per datacenter).

This single datacenter example has 5 nodes, where nodeA, nodeB, and nodeC are seed nodes.

Node    IP address      Seed

nodeA   110.82.155.0    yes
nodeB   110.82.155.1    yes
nodeC   110.54.125.1    yes
nodeD   110.54.125.2
nodeE   110.54.155.2

1. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.

Use the same version of DSE on all nodes in the cluster.

2. For nodeA, nodeB, and nodeC, configure only nodeA as the seed node:

a. In cassandra.yaml:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "110.82.155.0"

3. Start the seed nodes one at a time nodeA, nodeB, and then nodeC.

4. For nodeA, nodeB, and nodeC, change cassandra.yaml to configure nodeA, nodeB, and nodeC as seed
nodes:

a. In cassandra.yaml:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "110.82.155.0,110.82.155.1,110.54.125.1"

You do not need to restart nodeA, nodeB, or nodeC after changing the seed node entry in
cassandra.yaml; the nodes will reread the seed nodes.

5. For nodeD and nodeE, change cassandra.yaml to configure nodeA, nodeB, and nodeC as seed nodes:

a. In cassandra.yaml:

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "110.82.155.0,110.82.155.1,110.54.125.1"

6. Start nodeD and nodeE.


Result: All nodes in the datacenter have the same seed nodes: nodeA, nodeB, and nodeC.

Use cases for listen address


Correct cassandra.yaml listen_address settings for various use cases.

• Never set listen_address to 0.0.0.0.

• Set listen_address or listen_interface, do not set both.

• Single-node installations: do one of the following:

# Comment out the listen_address property. If the node is properly configured (host name, name
resolution, and so on), the database uses InetAddress.getLocalHost() to get the local address from
the system.

# Leave the default setting, localhost.


• Node in a multi-node installation: set the listen_address property to the node's IP address or hostname,
or set listen_interface.

• Node in a multi-network or multi-datacenter installation, within an EC2 environment that supports


automatic switching between public and private interfaces: set listen_address to the node's IP address
or hostname, or set listen_interface.

• Node with two physical network interfaces in a multi-datacenter installation or cluster deployed
across multiple Amazon EC2 regions using the Ec2MultiRegionSnitch (see the sketch after this list):

1. Set listen_address to this node's private IP or hostname, or set listen_interface (for communication
within the local datacenter).

2. Set broadcast_address to the second IP or hostname (for communication between datacenters).

3. Set listen_on_broadcast_address to true.

4. If this node is a seed node, add the node's public IP address or hostname to the seeds list.

• Open the storage_port or ssl_storage_port on the public IP firewall.
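A minimal cassandra.yaml sketch for the two-interface case described above; the addresses are examples only:

listen_address: 10.10.1.5          # private IP for intra-datacenter traffic (example address)
broadcast_address: 203.0.113.20    # public IP for cross-datacenter traffic (example address)
listen_on_broadcast_address: true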

Initializing single-token architecture datacenters


Follow these steps only when not using virtual nodes (vnodes).
In most circumstances, each workload type, such as search, analytics, and transactional, should be organized
into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can
be combined in SearchAnalytics nodes when there is not a large demand for analytics, or when analytics queries
must use a DSE Search index. Generally, combining transactional (OLTP) and analytics (OLAP) workloads
results in decreased performance.
When creating a keyspace using CQL, DataStax Enterprise creates a virtual datacenter for a cluster, even a one-
node cluster, automatically. You assign nodes that run the same type of workload to the same datacenter. The
separate, virtual datacenters for different types of nodes segregate workloads that run DSE Search from those
nodes that run other workload types.
Prerequisites:
Complete the tasks outlined in Initializing a DataStax Enterprise cluster to prepare the environment.

These steps provide information about setting up a cluster having one or more datacenters.

1. Suppose you install DataStax Enterprise on these nodes:

• node0 10.168.66.41 (seed1)

• node1 10.176.43.66

• node2 10.168.247.41

• node3 10.176.170.59 (seed2)

• node4 10.169.61.170

• node5 10.169.30.138

2. Calculate the token assignments as described in Calculating tokens for single-token architecture nodes.
The following tables list tokens for a 6 node cluster with a single datacenter or two datacenters.


Table 13: Single Datacenter


Node Token

node0 0

node1 21267647932558653966460912964485513216

node2 42535295865117307932921825928971026432

node3 63802943797675961899382738893456539648

node4 85070591730234615865843651857942052864

node5 106338239662793269832304564822427566080

Table 14: Multiple Datacenters


Node Token Offset Datacenter

node0 0 NA DC1

node1 56713727820156410577229101238628035242 NA DC1

node2 113427455640312821154458202477256070485 NA DC1

node3 100 100 DC2

node4 56713727820156410577229101238628035342 100 DC2

node5 113427455640312821154458202477256070585 100 DC2

3. If the nodes are behind a firewall, open the required ports for internal/external communication.

4. If DataStax Enterprise is running, stop the node and clear the data:

• Package installations: To stop DSE:

$ sudo service dse stop

To remove data from the default directories:

$ sudo rm -rf /var/lib/cassandra/*

• Tarball installations:
From the installation location, stop the database:

$ bin/dse cassandra-stop

Remove all data:

$ cd /var/lib/cassandra && sudo rm -rf data/* commitlog/* saved_caches/* hints/*

5. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.

Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.


a. Configure node properties.

• initial_token: token_value_from_calculation

• num_tokens: 1

• -seeds: internal_IP_address of each seed node


Include at least one seed node from each datacenter. DataStax recommends more than
one seed node per datacenter. Do not make all nodes seed nodes.

• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.

• auto_bootstrap: false
Add the bootstrap setting only when initializing a new cluster with no data.

• endpoint_snitch: snitch
See endpoint_snitch and snitches.

Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.

Snitch                             Configuration file

GossipingPropertyFileSnitch        cassandra-rackdc.properties
Amazon EC2 single-region snitch    see Configuring the Amazon EC2 single-region snitch
Amazon EC2 multi-region snitch     see Configuring the Amazon EC2 multi-region snitch
Google Cloud Platform snitch       see Configuring the Google Cloud Platform snitch
PropertyFileSnitch                 cassandra-topology.properties

• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.

6. Set the properties in the dse.yaml file as required by your use case.

7. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties


(PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a
default datacenter name and rack name for unknown nodes.

Migration information: The GossipingPropertyFileSnitch always loads cassandra-


topology.properties when the file is present. Remove the file from each node on any new cluster, or
any cluster migrated from the PropertyFileSnitch.

# Transactional Node IP=Datacenter:Rack


110.82.155.0=DC_Transactional:RAC1
110.82.155.1=DC_Transactional:RAC1
110.54.125.1=DC_Transactional:RAC2
110.54.125.2=DC_Analytics:RAC1
110.54.155.2=DC_Analytics:RAC2
110.82.155.3=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1


110.82.155.4=DC_Search:RAC2

# default for unknown nodes


default=DC1:RAC1

After making any changes in the configuration files, you must restart the node for the changes to take
effect.

8. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time,
and then start the rest of the nodes:

• Package installations: Starting DataStax Enterprise as a service

• Tarball installations: Starting DataStax Enterprise as a stand-alone process

9. Check that your cluster is up and running:

$ dsetool status

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

Datacenter: Cassandra
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 110.82.155.0 21.33 KB 256 33.3% a9fa31c7-f3c0-... RAC1
UN 110.82.155.1 21.33 KB 256 33.3% f5bb416c-db51-... RAC1
UN 110.82.155.2 21.33 KB 256 16.7% b836748f-c94f-... RAC1

Calculating tokens for single-token architecture nodes


This page contains information on manually calculating tokens.
DataStax recommends using Lifecycle Manager in DSE OpsCenter instead.
About single-token architecture
Use single-token architecture when not using virtual nodes (vnodes). See Guidelines for using virtual nodes. You
do not need to calculate tokens when using vnodes.
When you start a DataStax Enterprise cluster without vnodes, you must ensure that the data is evenly divided
across the nodes in the cluster using token assignments and that no two nodes share the same token even if
they are in different datacenters. Tokens are hash values that partitioners use to determine where to store rows
on each node. This value determines the node's position in the ring and what data the node is responsible for.
Each node is responsible for the region of the cluster between itself (inclusive) and its predecessor (exclusive).
As a simple example, if the range of possible tokens is 0 to 100 and there are four nodes, the tokens for the
nodes are: 0, 25, 50, 75. This division ensures that each node is responsible for an equal range of data. For
more information, see Data distribution overview.
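The Murmur3Partitioner divides its range of -2^63 to 2^63-1 in the same way. As an informal sketch (not a replacement for the token-generator tool described below), the tokens for a 6-node datacenter can be computed from the command line:

$ python -c "print('\n'.join(str(i * (2**64 // 6) - 2**63) for i in range(6)))"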
Before starting each node in the cluster for the first time, comment out the num_tokens property and assign an
initial_token value in the cassandra.yaml configuration file.
Using the Token generator
Use the token-generator tool for:

• Calculating tokens for a single datacenter with one rack


• Calculating tokens for a single datacenter with multiple racks

• Calculating tokens for a multiple datacenter cluster

• Calculating tokens when adding or replacing nodes/datacenters

Usage:

• Package installations:

$ token-generator num_of_nodes_in_dc ... [options]

• Tarball installations:

$ installation_location/resources/cassandra/tools/bin/token-generator num_of_nodes_in_dc
... [options]

If no options are entered, Token Generator Interactive Mode is invoked.

Option                               Description

-h, --help                           Show help.

--murmur3 | --random                 Specify the partitioner. Murmur3Partitioner uses a maximum
                                     possible range of hash values from -2^63 to 2^63-1 and is the
                                     default partitioner if not specified. RandomPartitioner uses a
                                     range from 0 to 2^127-1 and was the default partitioner before
                                     DataStax Enterprise 3.1/Apache Cassandra™ 1.2.

--ringoffset offset                  Offset token values. Use when adding or replacing dead nodes or
                                     datacenters.

--ringrange range_start range_end    Specify token values within a specified range.

--test                               Displays various ring arrangements and generates an HTML file
                                     showing these arrangements.

Calculating tokens for a single datacenter with one rack


Example:

$ token-generator 6

DC #1:
Node #1: -9223372036854775808
Node #2: -6148914691236517206
Node #3: -3074457345618258604
Node #4: -2
Node #5: 3074457345618258600
Node #6: 6148914691236517202

Calculating tokens for a single datacenter with multiple racks


DataStax recommends that each rack have the same number of nodes so you can alternate the rack
assignments.


1. Calculate the tokens:

$ token-generator 8

DC #1:
Node #1: -9223372036854775808
Node #2: -6917529027641081856
Node #3: -4611686018427387904
Node #4: -2305843009213693952
Node #5: 0
Node #6: 2305843009213693952
Node #7: 4611686018427387904
Node #8: 6917529027641081856

2. Assign the tokens to nodes on alternating racks in the cassandra-rackdc.properties or the cassandra-
topology.properties file.

Figure 1: Alternating rack assignments

Calculating tokens for a multiple datacenter cluster


Do not use SimpleStrategy for this type of cluster. You must use the NetworkTopologyStrategy. This strategy
determines replica placement independently within each datacenter.
Example:


1. Calculate the tokens:

$ token-generator 4 4

DC #1:
Node #1: -9223372036854775808
Node #2: -4611686018427387904
Node #3: 0
Node #4: 4611686018427387904
DC #2:
Node #1: -4690182801719768975
Node #2: -78496783292381071
Node #3: 4533189235135006833
Node #4: 9144875253562394737

2. After calculating the tokens, assign the tokens so that the nodes in each datacenter are evenly dispersed
around the ring.


Figure 2: Token position and datacenter assignments



3. Alternate the rack assignments as described above.

Calculating tokens when adding or replacing nodes/datacenters


To avoid token collisions, use the --ringoffset option.

1. Calculate the tokens with the offset:

$ token-generator 3 2 --ringoffset 100

The results show the generated token values for the Murmur3Partitioner for one datacenter with 3 nodes
and one datacenter with 2 nodes with an offset:

DC #1:
Node #1: 6148914691236517105
Node #2: 12297829382473034310
Node #3: 18446744073709551516
DC #2:
Node #1: 9144875253562394637
Node #2: 18368247290417170445

The offset value is applied to the first node; the tokens for all other nodes are calculated for even distribution
from the offset.
The tokens without the offset are:

$ token-generator 3 2

DC #1:
Node #1: -9223372036854775808
Node #2: -3074457345618258603
Node #3: 3074457345618258602
DC #2:
Node #1: -78496783292381071
Node #2: 9144875253562394737

2. After calculating the tokens, assign the tokens so that the nodes in each datacenter are evenly dispersed
around the ring and alternate the rack assignments.

Chapter 6. Security
For securing DataStax Enterprise 6.0, see the DataStax Security Guide.

Chapter 7. Using DataStax Enterprise advanced
functionality
Information on using DSE Analytics, DSEFS, DSE Search, DSE Graph, DSE Advanced Replication, DSE In-
Memory, DSE Multi-Instance, DSE Tiered Storage and DSE Performance services.

DSE Analytics
DataStax Enterprise (DSE) integrates real-time and batch operational analytics capabilities with an enhanced
version of Apache Spark™. With DSE Analytics you can easily generate ad-hoc reports, target customers with
personalization, and process real-time streams of data. The analytics toolset lets you write code once and then
use it for both real-time and batch workloads.
About DSE Analytics
DataStax Enterprise (DSE) integrates real-time and batch operational analytics capabilities with an enhanced
version of Apache Spark™. With DSE Analytics you can easily generate ad-hoc reports, target customers with
personalization, and process real-time streams of data. The analytics toolset lets you write code once and then
use it for both real-time and batch workloads.
DSE Analytics jobs can use the DataStax Enterprise File System (DSEFS) to handle the large data sets typical
of analytic processing. DSEFS replaces CFS (Cassandra File System).
DSE Analytics features
No single point of failure
DSE Analytics supports a peer-to-peer, distributed cluster for running Spark jobs. Being peers, any
node in the cluster can load data files, and any analytics node can assume the responsibilities of Spark
Master.
Spark Master management
DSE Analytics provides automatic Spark Master management.
Analytics without ETL
Using DSE Analytics, you run Spark jobs directly against data in the database. You can perform real-
time and analytics workloads at the same time without one workload affecting the performance of the
other. Starting some cluster nodes as Analytics nodes and others as pure transactional real-time nodes
automatically replicates data between nodes.
DataStax Enterprise file system (DSEFS)
DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system
within DataStax Enterprise. It is designed for use cases that need to leverage a distributed file system
for data ingestion, data staging, and state management for Spark Streaming applications (such
as checkpointing or write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment
complexity and single point of failure typical of HDFS. DSEFS is HDFS-compatible and is designed to
work in place of HDFS in Spark and other systems.
DSE Analytics Solo
DSE Analytics Solo datacenters are devoted entirely to DSE Analytics processing, for deployments that
require separation of analytics jobs from transactional data.
Integrated security
DSE Analytics uses the advanced security features of DSE, simplifying configuration and deployment.
AlwaysOn SQL
AlwaysOn SQL is a highly-available service that provides JDBC and ODBC interfaces to applications
accessing DSE Analytics data.
Enabling DSE Analytics
To enable Analytics, follow the architecture guidelines for choosing a workload type for the datacenters in the
cluster.


Setting the replication factor for analytics keyspaces


Keyspaces and tables are automatically created when DSE Analytics nodes are started for the first time. The
replication factor must be adjusted for these keyspaces in order for the analytics features to work properly and to
avoid data loss.
The keyspaces used by DSE Analytics are the following:

• dse_analytics

• dse_leases

• dsefs

• "HiveMetaStore"

All analytics keyspaces are initially created with the SimpleStrategy replication strategy and a replication
factor (RF) of 1. Each of these must be updated in production environments to avoid data loss. After starting
the cluster, alter each keyspace to use the NetworkTopologyStrategy replication strategy with appropriate
settings for the replication factor and datacenters. For most environments using DSE Analytics, a suitable
replication factor is either 3 or the cluster size, whichever is smaller.
For example, use a CQL statement to configure the dse_leases keyspace for a replication factor of 3 in both
DC1 and DC2 datacenters using NetworkTopologyStrategy:

ALTER KEYSPACE dse_leases


WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'DC1': '3',
'DC2': '3'
};

Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters. DSEFS does not support
replication to other datacenters, and the dsefs keyspace only contains metadata, not the data stored in
DSEFS. Each DSE Analytics datacenter should have its own DSEFS instance.

The datacenter name used is case-sensitive. If needed, use the dsetool status command to confirm the exact
datacenter spelling.
After adjusting the replication factor, nodetool repair must be run on each node in the affected datacenters.
For example to repair the altered keyspace dse_leases:

$ nodetool repair -full dse_leases

Repeat the above steps for each of the analytics keyspaces listed above. For more information see Changing
keyspace replication strategy.
DSE Analytics and Search integration
An integrated DSE SearchAnalytics cluster allows analytics jobs to be performed using CQL queries. This
integration allows finer-grained control over the types of queries that are used in analytics workloads, and
improves performance by reducing the amount of data that is processed. However, a DSE SearchAnalytics
cluster does not provide workload isolation and there are no detailed guidelines for provisioning and performance
in production environments.
Nodes that are started in SearchAnalytics mode allow you to create analytics queries that use DSE Search
indexes. These queries return RDDs that are used by Spark jobs to analyze the returned data.
The following code shows how to use a DSE Search query from the DSE Spark console.

val table = sc.cassandraTable("music","solr")


val result = table.select("id","artist_name")
.where("solr_query='artist_name:Miles*'")


.take(10)

You can use Spark Datasets/DataFrames instead of RDDs.

val table = spark.read.format("org.apache.spark.sql.cassandra")


.options(Map("keyspace"->"music", "table" -> "solr"))
.load()
val result =
table.select("id","artist_name").where("solr_query='artist_name:Miles*'")
.show(10)

Alternatively, use a Spark SQL query.

val result = spark.sql("SELECT id, artist_name FROM music.solr WHERE solr_query =


'artist_name:Miles*' LIMIT 10")

For a detailed example, see Running the Wikipedia demo with SearchAnalytics.
Configuring a DSE SearchAnalytics cluster
1. Create DSE SearchAnalytics nodes in a mixed-workload cluster, as described in Initializing a single
datacenter per workload type.
The name of the datacenter is set to SearchAnalytics when using the DseSimpleSnitch. Do not modify
existing search or analytics nodes that use DseSimpleSnitch to be SearchAnalytics nodes. If you use
another snitch like GossipingPropertyFileSnitch you can have a mixed workload within a datacenter.

2. Perform load testing to ensure your hardware has enough CPU and memory for the additional resource
overhead that is required by Spark and Solr.
SearchAnalytics nodes always use driver paging settings. See Using pagination (cursors) with CQL Solr
queries.

SearchAnalytics nodes might consume more resources than search or analytics nodes. Resource
requirements of the nodes greatly depend on the type of query patterns you are using.

Considerations for DSE SearchAnalytics clusters


Care should be taken when enabling both Search and Analytics on a DSE node. Since both workloads will be
enabled, ensure proper resources are provisioned for these simultaneous workloads. This includes sufficient
memory and compute resources to accommodate the specific indexing, query, and processing appropriate to the
use case.
SearchAnalytics clusters are appropriate for production environments, provided they have sufficient resources
for the specific workload, as is the case for all DSE clusters.
All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the search index schema
definition. Fields that are not defined in the search index schema columns are excluded from the results returned
from Spark queries.
Using predicate push down on search indexes in Spark SQL
Search predicate push down allows queries in SearchAnalytics datacenters to use Solr-
indexed columns in Spark SQL queries. To enable Search predicate push down, set
the spark.sql.dse.search.enableOptimization property to on or auto. By default,
spark.sql.dse.search.enableOptimization is set to auto.


When in auto mode, the predicate push down performs a COUNT operation against the Search indexes both with
and without the predicate filters applied. The optimization occurs automatically if the number of records matching
the predicate filter is less than:

spark.sql.dse.search.autoRatio * total_number_of_records

The spark.sql.dse.search.autoRatio property is user configurable. The default value is 0.03.
The performance of DSE Search is directly related to the number of records returned in a query. Requests
which require a large portion of the dataset are likely better served by a full table scan without using predicate
push downs.
To enable Solr predicate push down on a Scala dataset:

val solrEnabledDataSet = spark.read


.format("org.apache.spark.sql.cassandra")
.options(Map(
"keyspace" -> "ks",
"table" -> "tab",
"spark.sql.dse.search.enableOptimization" -> "on")
.load()

To create a temporary table in Spark SQL with Solr predicate push down enabled:

CREATE TEMPORARY TABLE temp USING org.apache.spark.sql.cassandra OPTIONS (


table "tab",
keyspace "ks",
spark.sql.dse.search.enableOptimization "on");

Set the spark.sql.dse.search.enableOptimization property globally by adding it to the server configuration


file.
The optimizer works on the push down level so only predicates which are being pushed to the source
can be optimized. Use the explain command to see exactly what predicates are being pushed to the
CassandraSourceRelation.

val query = spark.sql("query")


query.explain

Logging optimization plans


The optimization plans for a query using predicate push downs are logged by setting the
org.apache.spark.sql.SolrPredicateRules logger to DEBUG in the Spark logging configuration files.

<logger name="org.apache.spark.sql.SolrPredicateRules" level="DEBUG"/>

About DSE Analytics Solo


DSE Analytics Solo datacenters provide analytics processing with Spark and distributed storage using DSEFS
without storing transactional database data.
DataStax Enterprise is flexible when deploying analytic processing in concert with transactional workloads. There
are two main ways to deploy DSE Analytics: collocated with the database processing nodes, and on segregated
machines in their own datacenter.


Figure 3: Traditional and DSE Analytics Solo deployments

Traditional DSE Analytics deployments have both the DataStax database process and the Spark process
running on the same machine. This allows for simple deployment of analytic processing when the analysis is not
as intensive, or the database is not as heavily used.
DSE Analytics Solo allows customers to deploy DSE Analytics processing on segregated hardware
configurations in a different datacenter from the transactional DSE nodes. This ensures consistent behavior of
both engines in a configuration that does not compete for computer resources. This configuration is good for
processing-intensive analytic workloads.
DSE Analytics Solo allows the flexibility to have more nodes dedicated to data processing than are used for
database transactions. This is particularly good for situations where the processing needs far exceed the
transactional resource needs. For example, suppose you have a Spark Streaming job that will analyze and
filter 99.9% of the incoming data, storing only a few records after analysis. The resources required by the
transactional datacenter are much smaller than the resources required to analyze the data.
DSE Analytics Solo is more elastic in terms of scaling up, or down, the analytic processing in the cluster. This is
particularly useful when you need extra analytics processing, such as end of the day or end of the quarter surges
in analytics jobs. Since a DSE Analytics Solo node does not store database data, when new nodes are added to
a cluster there is very little data moved across the network to the new nodes. In an analytics and transactional
collocated environment, adding a node means moving transactional data between the existing nodes and the
new nodes.
For information on creating a DSE Analytics Solo datacenter, see Creating a DSE Analytics Solo datacenter.
Analyzing data using Spark
Spark is the default mode when you start an analytics node in a packaged installation.


About Spark
Apache Spark is a framework for analyzing large data sets across a cluster, and is enabled when you start an
Analytics node. Spark runs locally on each node and executes in memory when possible. Spark uses multiple
threads instead of multiple processes to achieve parallelism on a single node, avoiding the memory overhead of
several JVMs.
Apache Spark integration with DataStax Enterprise includes:

• Spark Cassandra Connector for accessing data stores in DSE

• DSE Resource Manager for managing Spark components in a DSE cluster

• Spark Job Server

• Spark SQL support

• AlwaysOn SQL

• Spark SQL Thrift Server

• Spark streaming

• DataFrames API to manipulate data within Spark

• SparkR integration

Spark architecture
The software components for a single DataStax Enterprise analytics node are:

• Spark Worker

• DataStax Enterprise File System (DSEFS)

• The database

A Spark Master acts purely as a resource manager for Spark applications. Spark Workers launch executors that
are responsible for executing part of the job that is submitted to the Spark Master. Each application has its own
set of executors. Spark architecture is described in the Apache documentation.
DSE Spark nodes use a different resource manager than standalone Spark nodes. The DSE Resource
Manager simplifies integration between Spark and DSE. In a DSE Spark cluster, client applications use the
CQL protocol to connect to any DSE node, and that node redirects the request to the Spark Master.
The communication between the Spark client application (or driver) and the Spark Master is secured the same
way as connections to DSE, which means that plain password authentication as well as Kerberos authentication
is supported, with or without SSL encryption. Encryption and authentication can be configured per application,
rather than per cluster. Authentication and encryption between the Spark Master and Worker nodes can be
enabled or disabled regardless of the application settings.
Spark supports multiple applications. A single application can spawn multiple jobs and the jobs run in parallel.
An application reserves some resources on every node and these resources are not freed until the application
finishes. For example, every session of Spark shell is an application that reserves resources. By default, the
scheduler tries to allocate the application to the highest number of different nodes. For example, if the application
declares that it needs four cores and there are ten servers, each offering two cores, the application most likely
gets four executors, each on a different node, each consuming a single core. However, the application can
also get two executors on two different nodes, each consuming two cores. You can configure the application
scheduler. Spark Workers and Spark Master are part of the main DSE process. Workers spawn executor JVM
processes which do the actual work for a Spark application (or driver). Spark executors use native integration to
access data in local transactional nodes through the Open Source Spark-Cassandra Connector. The memory
settings for the executor JVMs are set by the user submitting the driver to DSE.
In deployment for each Analytics datacenter one node runs the Spark Master, and Spark Workers run on each
of the nodes. The Spark Master comes with automatic high availability.


Figure 4: Spark integration with DataStax Enterprise

As you run Spark, you can access data in the Hadoop Distributed File System (HDFS), or the DataStax
Enterprise File System (DSEFS) by using the URL for the respective file system.
Highly available Spark Master
The Spark Master High Availability mechanism uses a special table in the dse_analytics keyspace to
store information required to recover Spark workers and the application. Reads to the recovery data in
dse_analytics are always performed using the LOCAL_QUORUM consistency level. Writes are attempted


first using LOCAL_QUORUM, and if that fails, the write is retried using LOCAL_ONE. Unlike the high availability
mechanism mentioned in Spark documentation, DataStax Enterprise does not use ZooKeeper.
If the original Spark Master fails, the reserved one automatically takes over. To find the current Spark Master,
run:

$ dse client-tool spark leader-address

DataStax Enterprise provides Automatic Spark Master management.

The Spark Master will not start until LOCAL_QUORUM is attainable for the dse_analytics keyspace.

Unsupported features
The following Spark features and APIs are not supported:

• Writing to blob columns from Spark


Reading columns of all types is supported; however, you must convert collections of blobs to byte arrays
before serializing.

Using Spark with DataStax Enterprise


DataStax Enterprise integrates with Apache Spark to allow distributed analytic applications to run using
database data.
Starting Spark
Before you start Spark, configure Authorizing remote procedure calls (RPC) for the DseClientTool object.
RPC permission for the DseClientTool object is required to run Spark because the DseClientTool object
is called implicitly by the Spark launcher.

By default DSEFS is required to execute Spark applications. DSEFS should not be disabled when Spark is
enabled on a DSE node. If there is a strong reason not to use DSEFS as the default file system, reconfigure
Spark to use a different file system. For example to use a local file system set the following properties in
spark-daemon-defaults.conf:

spark.hadoop.fs.defaultFS=file:///
spark.hadoop.hive.metastore.warehouse.dir=file:///tmp/warehouse

How you start Spark depends on the installation and if you want to run in Spark mode or SearchAnalytics
mode:
Package installations:
To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set
SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node. You can
enable additional components.

Mode                   Option in /etc/default/dse   Description

Spark                  SPARK_ENABLED=1              Start the node in Spark mode.

SearchAnalytics mode   SPARK_ENABLED=1              SearchAnalytics mode requires testing in your
                       SEARCH_ENABLED=1             environment before it is used in production
                                                    clusters. In dse.yaml, cql_solr_query_paging:
                                                    driver is required.

Tarball installations:


To start the Spark trackers on a cluster of analytics nodes, use the -k option:

$ installation_location/bin/dse cassandra -k

Nodes started with -k are automatically assigned to the default Analytics datacenter if you do not
configure a datacenter in the snitch property file.
You can enable additional components:
Mode Option Description

Spark -k Start the node in Spark mode.

SearchAnalytics mode -k -s In dse.yaml, cql_solr_query_paging: driver is required.

For example:
To start a node in SearchAnalytics mode, use the -k and -s options.

$ installation_location/bin/dse cassandra -k -s

Starting the node with the Spark option starts a node that is designated as the master, as shown by the
Analytics(SM) workload in the output of the dsetool ring command:

$ dsetool ring

Address         DC         Rack   Workload       Graph  Status  State   Load       Owns  Token                 Health [0,1]
10.200.175.149  Analytics  rack1  Analytics(SM)  no     Up      Normal  185 KiB    ?     -9223372036854775808  0.90
10.200.175.148  Analytics  rack1  Analytics(SW)  no     Up      Normal  194.5 KiB  ?     0                     0.90
Note: you must specify a keyspace to get ownership information.

Launching Spark
After starting a Spark node, use dse commands to launch Spark.
Usage:
Package installations: dse spark
Tarball installations: installation_location/bin/dse spark
You can use Cassandra specific properties to start Spark. Spark binds to the listen_address that is specified
in cassandra.yaml.
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
dse spark
Enters interactive Spark shell, offers basic auto-completion.
Package installations: dse spark
Tarball installations: installation_location/bin/ dse spark
dse spark-submit


Launches applications on a cluster like spark-submit. Using this interface you can use Spark cluster
managers without the need for separate configurations for each application. The syntax for package
installations is:

$ dse spark-submit --class class_name jar_file other_options

For example, if you write a class that defines an option named d, enter the command as follows:

$ dse spark-submit --class com.datastax.HttpSparkStream target/


HttpSparkStream.jar -d $NUM_SPARK_NODES

The JAR file can be located in a DSEFS directory. If the DSEFS cluster is secured, provide
authentication credentials as described in DSEFS authentication.

The dse spark-submit command supports the same options as Apache Spark's spark-submit. For
example, to submit an application using cluster mode using the supervise option to restart in case of
failure:

$ dse spark-submit --deploy-mode cluster --supervise --class


com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES

The directory in which you run the dse Spark commands must be writable by the current user.

Internal authentication is supported.


Use the optional environment variables DSE_USERNAME and DSE_PASSWORD to increase security and prevent the
user name and passwords from appearing in the Spark log files or in the process list on the Spark Web UI. To
specify a user name and password using environment variables, add the following to your Bash .profile or
.bash_profile:

export DSE_USERNAME=user
export DSE_PASSWORD=secret

These environment variables are supported for all Spark and dse client-tool commands.

DataStax recommends using the environment variables instead of passing user credentials on the
command line.

You can provide authentication credentials in several ways, see Credentials for authentication.
Specifying Spark URLs
You do not need to specify the Spark Master address when starting Spark jobs with DSE. If you connect to any
Spark node in a datacenter, DSE will automatically discover the Master address and connect the client to the
Master.
Specify the URL for any Spark node using the following format:

dse://[Spark node address[:port number]]?[parameter name=parameter value;]...

By default the URL is dse://?, which is equivalent to dse://localhost:9042. Any parameters you set in the
URL will override the configuration read from DSE's Spark configuration settings.
You can specify the work pool in which the application will be run by adding the workpool=work pool name as
a URL parameter. For example, dse://1.1.1.1:123?workpool=workpool2.
Valid parameters are CassandraConnectorConf settings with the spark.cassandra. prefix stripped. For
example, you can set the spark.cassandra.connection.local_dc option to dc2 by specifying dse://?
connection.local_dc=dc2.


Or to specify multiple spark.cassandra.connection.host addresses for high-availability if the specified


connection point is down: dse://1.1.1.1:123?connection.host=1.1.2.2,1.1.3.3.
If the connection.host parameter is specified, the host provided in the standard URL is prepended to the list
of hosts set in connection.host. If the port is specified in the standard URL, it overrides the port number set
in the connection.port parameter.
Connection options when using dse spark-submit are retrieved in the following order: from the Master URL,
then the Spark Cassandra Connector options, then the DSE configuration files.
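For example, a hypothetical submission that overrides the local datacenter and targets a named work pool can combine both parameters in the master URL (the address, datacenter, and pool names are examples):

$ dse spark-submit --master "dse://10.10.1.5:9042?connection.local_dc=dc2;workpool=workpool2" myApplication.jar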
Detecting Spark application failures
DSE has a failure detector for Spark applications, which detects whether a running Spark application is dead
or alive. If the application has failed, the application will be removed from the DSE Spark Resource Manager.
The failure detector works by keeping an open TCP connection from a DSE Spark node to the Spark Driver in
the application. No data is exchanged, but regular TCP connection keep-alive control messages are sent and
received. When the connection is interrupted, the failure detector will attempt to reacquire the connection every
1 second for the duration of the appReconnectionTimeoutSeconds timeout value (5 seconds by default). If it
fails to reacquire the connection during that time, the application is removed.
A custom timeout value is specified by adding appReconnectionTimeoutSeconds=value in the master URI
when submitting the application. For example to set the timeout value to 10 seconds:

$ dse spark --master dse://?appReconnectionTimeoutSeconds=10

Running Spark commands against a remote cluster


To run Spark commands against a remote cluster, you must export the DSE configuration from one of the
remote nodes to the local client machine.
To run a driver application remotely, there must be full public network communication between the remote
nodes and the client machine.
Prerequisites:
The local client requires Spark driver ports on the client to be accessible by the remote DSE cluster nodes.
This might require configuring the firewall on the client machine and the remote DSE cluster nodes to allow
communication between the machines.
Spark dynamically selects ports for internal communication unless the ports are manually set. To use
dynamically chosen ports, the firewall needs to allow all port access from the remote cluster.
To set the ports manually, set the ports in the respective properties in spark-defaults.conf as shown in this
example:

spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005

For a full list of ports used by DSE, see Securing DataStax Enterprise ports.

1. Export the DataStax Enterprise client configuration from the remote node to the client node:


a. On the remote node:

$ dse client-tool configuration export dse-config.jar

b. Copy the exported JAR to the client nodes.

$ scp dse-config.jar user@clientnode1.example.com:

c. On the client node:

$ dse client-tool configuration import dse-config.jar

2. Run the Spark command against the remote node.

$ dse spark-submit submit options myApplication.jar

To set the driver host to a publicly accessible IP address, pass in the spark.driver.host option.

$ dse spark-submit --conf spark.driver.host=IP address myApplication.jar

Monitoring Spark with the web interface


A web interface, bundled with DataStax Enterprise, facilitates monitoring, debugging, and managing Spark.
Using the Spark web interface
To use the Spark web interface enter the listen IP address of any Spark node in a browser followed by port
number 7080 (configured in the spark-env.sh configuration file). Starting in DSE 5.1, all Spark nodes within
an Analytics datacenter will redirect to the current Spark Master.
If the Spark Master is not available, the UI will keep polling for the Spark Master every 10 seconds until the
Master is available.
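For example, assuming a Spark node listening on 10.10.1.5 (an example address), open:

http://10.10.1.5:7080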
The Spark web interface can be secured using SSL. SSL encryption of the web interface is enabled by default
when client encryption is enabled.
If authentication is enabled, and plain authentication is available, you will be prompted for authentication
credentials when accessing the web UI. We recommend using SSL with authentication.

Kerberos authentication is not supported in the Spark web UI. If authentication is enabled and either LDAP
or Internal authentication is not available, the Spark web UI will not be accessible. If this occurs, disable
authentication for the Spark web UI only by removing the spark.ui.filters setting in spark-daemon-
defaults.conf located in the Spark configuration directory.

DSE SSL encryption and authentication only apply to the Spark Master and Worker UIs, not the Spark Driver
UI. To use encryption and authentication with the Driver UI, refer to the Spark security documentation.


The UI includes information on the number of cores and amount of memory available to Spark in total and in
each work pool, and similar information for each Spark worker. The applications list the associated work pool.
See the Spark documentation for information on using the Spark web UI.
Authorization in the Spark web UI
When authorization is enabled and an authenticated user accesses the web UI, what they can see and do
is controlled by their permissions. This allows administrators to control who has permission to view specific
application logs, view the executors for the application, kill the application, and list all applications. Viewing and
modifying applications can be configured per datacenter, work pool, or application.
See Using authorization with Spark for details on granting permissions.
Displaying fully qualified domain names in the web UI
To display fully qualified domain names (FQDNs) in the Spark web UI, set the SPARK_PUBLIC_DNS variable in
spark-env.sh on each Analytics node.
Set SPARK_PUBLIC_DNS to the FQDN of the node if you have SSL enabled for the web UI.
Redirecting to the fully qualified domain name of the master
Set the SPARK_LOCAL_IP or SPARK_LOCAL_HOSTNAME in the spark-env.sh file on each node to the fully qualified
domain name (FQDN) of the node to force any redirects to the web UI using the FQDN of the Spark master.
This is useful when enabling SSL in the web UI.

export SPARK_LOCAL_HOSTNAME=FQDN of the node

Filtering properties in the Spark Driver UI


The Spark Driver UI has an Environment tab that lists the Spark configuration and system properties used
by Spark. This can include sensitive information like passwords and security tokens. DSE Spark filters these
properties and masks their values with sequences of asterisks. The spark.redaction.regex filter is configured
as a regular expression that by default includes all properties that contain the string "secret", "token", or
"password" as well as all system properties. To modify the filter, edit the spark.redaction.regex property in
spark-defaults.conf in the Spark configuration directory.
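As an illustrative sketch only, a broadened pattern that also masks any property whose name contains "apikey" (a hypothetical requirement, not a default) could be set in spark-defaults.conf:

spark.redaction.regex (?i)secret|token|password|apikey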


Using DSE Spark with third party tools and integrations


The dse exec command sets the required environment variables required to run third-party tools that integrate
with Spark.

$ dse exec command

If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a
remote cluster.
Jupyter integration
Download and install Jupyter notebook on a DSE node.
To launch Jupyter notebook:

$ dse exec jupyter notebook

A Jupyter notebook starts with the correct Python path. You must create a context to work with DSE. In
contrast to Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a
context.
Livy integration
Download and install Livy on a DSE node. By default Livy runs Spark in local mode. Before starting Livy create
a configuration file by copying the conf/livy.conf.template to conf/livy.conf, then uncomment or add
the following two properties:

livy.spark.master = dse:///
livy.repl.enable-hive-context = true

To launch Livy:

$ dse exec livy-server

RStudio integration
Download and install R on all DSE Analytics nodes, install RStudio desktop on one of the nodes, then run
RStudio:

$ dse exec rstudio

In the RStudio session start a Spark session:

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))


sparkR.session()

These instructions are for RStudio desktop, not RStudio Server. In multiuser environments, we recommend
using AlwaysOn SQL and JDBC connections rather than SparkR.

Zeppelin integration
Download and install Zeppelin on a DSE node. To launch Zeppelin server:

$ dse exec zeppelin.sh

By default Zeppelin runs Spark in local mode. Update the master property to dse:/// in the Spark session in
the Interpreters configuration page. No configuration file changes are required to run Zeppelin.


Configuring Spark
Configuring Spark for DataStax Enterprise includes:
Configuring Spark nodes
Modify the settings for Spark nodes security, performance, and logging.
To manage Spark performance and operations:

• Set the replication factor for DSE Analytics keyspaces

• Set environment variables

• Protect Spark directories

• Grant access to default Spark directories

• Secure Spark nodes

• Configure Spark memory and cores

• Configure Spark logging options

Set environment variables


DataStax recommends using the default values of Spark environment variables unless you need to increase
the memory settings due to an OutOfMemoryError condition or garbage collection taking too long. Use the
Spark memory configuration options in the dse.yaml and spark-env.sh files.
You can set a user-specific SPARK_HOME directory if you also set ALLOW_SPARK_HOME=true in your environment
before starting DSE.
For example, on Debian or Ubuntu using a package installation:

$ export SPARK_HOME=$HOME/spark && export ALLOW_SPARK_HOME=true && sudo service dse


start

The temporary directory for shuffle data, RDDs, and other ephemeral Spark data can be configured for both
the locally running driver and for the Spark server processes managed by DSE (Spark Master, Workers,
shuffle service, executor and driver running in cluster mode).
For the locally running Spark driver, the SPARK_LOCAL_DIRS environment variable can be customized in the
user environment or in spark-env.sh. By default, it is set to the system temporary directory. For example,
on Ubuntu it is /tmp/. If there's no system temporary directory, then SPARK_LOCAL_DIRS is set to a .spark
directory in the user's home directory.
For all other Spark server processes, the SPARK_EXECUTOR_DIRS environment variable can be customized in
the user environment or in spark-env.sh. By default it is set to /var/lib/spark/rdd.

The default SPARK_LOCAL_DIRS and SPARK_EXECUTOR_DIRS environment variable values differ from non-
DSE Spark.

To configure worker cleanup, modify the SPARK_WORKER_OPTS environment variable and add the cleanup
properties. The SPARK_WORKER_OPTS environment variable can be set in the user environment or in spark-
env.sh. For example, the following enables worker cleanup, sets the cleanup interval to 30 minutes (1800
seconds), and sets the length of time application worker directories are retained to 7 days (604800 seconds).

$ export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
  -Dspark.worker.cleanup.enabled=true \
  -Dspark.worker.cleanup.interval=1800 \
  -Dspark.worker.cleanup.appDataTtl=604800"

Protect Spark directories


After you start up a Spark cluster, DataStax Enterprise creates a Spark work directory for each
Spark Worker on worker nodes. A worker node can have more than one worker, configured by the
SPARK_WORKER_INSTANCES option in spark-env.sh. If SPARK_WORKER_INSTANCES is undefined, a
single worker is started. The work directory contains the standard output and standard error of executors and
other application-specific data stored by the Spark Worker and executors; the directory is writable only by the
DSE user.
By default, the Spark parent work directory is located in /var/lib/spark/work, with each worker in a
subdirectory named worker-number, where the number starts at 0. To change the parent worker directory,
configure SPARK_WORKER_DIR in the spark-env.sh file.
The Spark RDD directory is the directory where RDDs are placed when executors decide to spill them to
disk. This directory might contain the data from the database or the results of running Spark applications.
If the data in the directory is confidential, prevent access by unauthorized users. The RDD directory might
contain a significant amount of data, so locate it on a fast disk. The directory is writable only by the
cassandra user. The default location of the Spark RDD directory is /var/lib/spark/rdd. To change the
RDD directory, configure SPARK_EXECUTOR_DIRS in the spark-env.sh file.

Grant access to default Spark directories


Before starting up nodes on a tarball installation, you need permission to access the default Spark directory
locations: /var/lib/spark and /var/log/spark. Change ownership of these directories as follows:

sudo mkdir -p /var/lib/spark/rdd; sudo chmod a+w /var/lib/spark/rdd; sudo chown -R $USER:$GROUP /var/lib/spark/rdd &&
sudo mkdir -p /var/log/spark; sudo chown -R $USER:$GROUP /var/log/spark

In clusters with multiple datacenters, use a virtual datacenter to isolate Spark jobs. Spark jobs consume
resources that can affect latency and throughput.
DataStax Enterprise supports the use of virtual nodes (vnodes) with Spark.
Secure Spark nodes
Client-to-node SSL
Ensure that the truststore entries in cassandra.yaml are present as described in Client-to-node
encryption, even when client authentication is not enabled.
Enabling security and authentication
Security is enabled using the spark_security_enabled option in dse.yaml. Enabling it turns on
authentication between the Spark Master and Worker nodes, and allows you to
enable encryption. To encrypt Spark connections for all components except the web UI, enable
spark_security_encryption_enabled. The length of the shared secret used to secure Spark
components is set using the spark_shared_secret_bit_length option, with a default value of 256
bits. These options are described in DSE Analytics options. For production clusters, enable both
authentication and encryption. Doing so does not significantly affect performance.
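As a sketch, enabling all three options in dse.yaml using the option names described above:

spark_security_enabled: true
spark_security_encryption_enabled: true
spark_shared_secret_bit_length: 256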
Authentication and Spark applications
If authentication is enabled, users need to be authenticated in order to submit an application.
Authorization and Spark applications
If DSE authorization is enabled, users need permission to submit an application. Additionally, the
user submitting the application automatically receives permission to manage the application, which
can optionally be extended to other users.
Database credentials for the Spark SQL Thrift server
In the hive-site.xml file, configure authentication credentials for the Spark SQL Thrift server. Ensure
that you use the hive-site.xml file in the Spark directory:

• Package installations: /etc/dse/spark/hive-site.xml

• Tarball installations: installation_location/resources/spark/conf/hive-site.xml

Kerberos with Spark


With Kerberos authentication, the Spark launcher connects to DSE with Kerberos credentials and
requests DSE to generate a delegation token. The Spark driver and executors use the delegation
token to connect to the cluster. For valid authentication, the delegation token must be renewed
periodically. For security reasons, the user who is authenticated with the token should not be able to
renew it. Therefore, delegation tokens have two associated users: token owner and token renewer.
The token renewer is none so that only a DSE internal process can renew it. When the application is
submitted, DSE automatically renews delegation tokens that are associated with Spark application.
When the application is unregistered (finished), the delegation token renewal is stopped and the
token is cancelled.
To set Kerberos options, see Defining a Kerberos scheme.
Configure Spark memory and cores
Spark memory options affect different components of the Spark ecosystem:
Spark History server and the Spark Thrift server memory
The SPARK_DAEMON_MEMORY option configures the memory used by the Spark SQL
Thrift server and the Spark history server. Add or change this setting in the spark-env.sh file on nodes that run
these server applications.
Spark Worker memory
The memory_total option in the resource_manager_options.worker_options section of dse.yaml
configures the total system memory that you can assign to all executors that are run by the work
pools on the particular node. The default work pool will use all of this memory if no other work pools
are defined. If you define additional work pools, you can set the total amount of memory by setting the
memory option in the work pool definition.
Application executor memory
You can configure the amount of memory that each executor can consume for the application. Spark
uses a 512MB default. Use either the spark.executor.memory option, described in "Spark Available
Properties", or the --executor-memory mem argument to the dse spark command.
Application memory
You can configure additional Java options that are applied by the worker when spawning an executor for
the application. Use the spark.executor.extraJavaOptions property, described in Spark 1.6.2 Available
Properties. For example: spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"

Core management
You can manage the number of cores by configuring these options.

• Spark Worker cores


The cores_total option in the resource_manager_options.worker_options section of dse.yaml
configures the total number of system cores available to Spark Workers for executors. If no work pools are
defined in the resource_manager_options.workpools section of dse.yaml the default work pool will
use all the cores defined by cores_total. If additional work pools are defined, the default work pool will
use the cores available after allocating the cores defined by the work pools.
A single executor can borrow more than one core from the worker. The number of cores used by the
executor relates to the number of parallel tasks the executor might perform. The number of cores offered
by the cluster is the sum of cores offered by all the workers in the cluster.

• Application cores
In the Spark configuration object of your application, you configure the number of application cores that
the application requests from the cluster using either the spark.cores.max configuration property or the
--total-executor-cores cores argument to the dse spark command.
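
As a sketch, an application can request cores and executor memory together on its Spark configuration
object; the application name and values below are illustrative, not recommendations:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("core-allocation-example") // illustrative name
  .set("spark.cores.max", "4")           // total cores requested from the cluster
  .set("spark.executor.memory", "2g")    // memory per executor
val spark = SparkSession.builder.config(conf).getOrCreate()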

See the Spark documentation for details about memory and core allocation.
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-automatic
fashion. The resource_manager_options.worker_options section in the dse.yaml file has options to
configure the proportion of system resources that are made available to Spark Workers and any defined
work pools, or explicit resource settings. When specifying decimal values of system resources the available
resources are calculated in the following way:

• Spark Worker memory = memory_total * (total system memory - memory assigned to DSE)


• Spark Worker cores = cores_total * total system cores

This calculation is used for any decimal values. If the setting is not specified, the default value 0.7 is used. If
the value does not contain a decimal place, the setting is the explicit number of cores or amount of memory
reserved by DSE for Spark.

Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available cores
will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and one core will
be reserved.

The lowest values you can assign to a named work pool's memory and cores are 64 MB and 1 core,
respectively. If the calculated values are lower, no exception is thrown and the values are automatically raised
to these minimums.
The following example shows a work pool named workpool1 with 1 core and 512 MB of RAM assigned to it.
The remaining resources calculated from the values in worker_options are assigned to the default work
pool.

resource_manager_options:
worker_options:
cores_total: 0.7
memory_total: 0.7

workpools:
- name: workpool1
cores: 1
memory: 512M

Running Spark clusters in cloud environments


If you are using a cloud infrastructure provider like Amazon EC2, you must explicitly open the ports for publicly
routable IP addresses in your cluster. If you do not, the Spark workers will not be able to find the Spark Master.
One work-around is to set the prefer_local setting in your cassandra-rackdc.properties snitch setup file to
true:

# Uncomment the following line to make this snitch prefer the internal ip when possible,
as the Ec2MultiRegionSnitch does.
prefer_local=true

This tells the cluster to communicate only on private IP addresses within the datacenter rather than the public
routable IP addresses.
Configuring the number of retries to retrieve Spark configuration
When Spark fetches configuration settings from DSE, it will not fail immediately if it cannot retrieve the
configuration data, but will retry 5 times by default, with increasing delay between retries. The number of
retries can be set in the Spark configuration, by modifying the spark.dse.configuration.fetch.retries
configuration property when calling the dse spark command, or in spark-defaults.conf.
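
For example, a sketch raising the retry count to 10 for a single Spark shell session:

$ dse spark --conf spark.dse.configuration.fetch.retries=10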
Disabling continuous paging
Continuous paging streams bulk amounts of records from DSE to the DataStax Java Driver
used by DSE Spark. By default, continuous paging in queries is enabled. To disable it, set the
spark.dse.continuous_paging_enabled setting to false when starting the Spark SQL shell or in spark-
defaults.conf. For example:

$ dse spark-sql --conf spark.dse.continuous_paging_enabled=false

Using continuous paging can potentially improve performance up to 3 times, though the improvement
will depend on the data and the queries. Some factors that impact the performance improvement are the
number of executor JVMs per node and the number of columns included in the query. Greater performance
gains were observed with fewer executor JVMs per node and more columns selected.

Configuring the Spark web interface ports


By default the Spark web UI runs on port 7080. To change the port number, do the following:

1. Open the spark-env.sh file in a text editor.

2. Set the SPARK_MASTER_WEBUI_PORT variable to the new port number. For example, to set it to port 7082:

export SPARK_MASTER_WEBUI_PORT=7082

3. Repeat these steps for each Analytics node in your cluster.

4. Restart the nodes in the cluster.

Enabling Graphite Metrics in DSE Spark


Users can add third-party JARs to Spark nodes by adding them to the Spark lib directory on each node and
restarting the cluster. Add the Graphite Metrics JARs to this directory to enable metrics in DSE Spark.
The default location of the Spark lib directory depends on the type of installation:

• Package installations: /usr/share/dse/spark/lib

• Tarball installations: /var/lib/spark

To add the Graphite JARs to Spark in a package installation, copy them to the Spark lib directory:

$ cp metrics-graphite-3.1.2.jar /usr/share/dse/spark/lib/ && cp metrics-json-3.1.2.jar /usr/share/dse/spark/lib/
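
With the JARs in place, Spark's standard metrics configuration can point at a Graphite endpoint. A minimal
metrics.properties sketch, where graphite.example.com and port 2003 are placeholders:

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10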

Setting Spark properties for the driver and executor


Additional Spark properties for the Spark driver and executors are set in spark-defaults.conf. For example, to
enable Spark's commons-crypto encryption library:

spark.network.crypto.enabled true

Using authorization with Spark


See Analytic applications and Setting up DSE Spark application permissions.
Spark server configuration
The spark-daemon-defaults.conf file configures DSE Spark Masters and Workers.

Table 15: Spark server configuration properties


dse.spark.application.timeout
Default = 30. The duration in seconds after which the application is considered dead if no heartbeat is
received.
spark.dseShuffle.sasl.port
Default = 7447. The port number on which a shuffle service for SASL secured applications is started. Bound
to the listen_address in cassandra.yaml.
spark.dseShuffle.noSasl.port
Default = 7437. The port number on which a shuffle service for unsecured applications is started. Bound to
the listen_address in cassandra.yaml.


By default Spark executor logs, which log the majority of your Spark application output, are
redirected to standard output. The output is managed by Spark Workers. Configure logging by adding
spark.executor.logs.rolling.* properties to the spark-daemon-defaults.conf file.

spark.executor.logs.rolling.maxRetainedFiles 3
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 50000

Additional Spark properties that affect the master and driver can be added to spark-daemon-defaults.conf.
For example, to enable Spark's commons-crypto encryption library:

spark.network.crypto.enabled true

Automatic Spark Master election


Spark Master elections are automatically managed, and do not require any manual configuration.
DSE Analytics datacenters communicate with each other to elect one of the nodes as the Spark Master and
another as the reserve Master. The Master keeps track of each Spark Worker and application, storing the
information in a system table. If the Spark Master node fails, the reserve Master takes over and a new reserve
Master is elected from the remaining Analytics nodes.
Each Analytics datacenter elects its own master.
For dsetool commands and options, see dsetool.
Determining the Spark Master address
You do not need to specify the Master address when configuring or using Spark with DSE Analytics.
Configuring applications with a valid URL is sufficient for DSE to connect to the Master node and run the
application. The following commands give information about the Spark configuration of DSE:

• To view the URL used to configure Spark applications:

$ dse client-tool spark master-address

dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63;

• To view the current address of the Spark Master in this datacenter:

$ dse client-tool spark leader-address

10.200.181.62

• Workloads for Spark Master are flagged as Workload: Analytics(SM).

$ dsetool ring

Address        DC         Rack   Workload       Graph  Status  State   Load        Owns  Token                 Health [0,1]
10.200.181.62  Analytics  rack1  Analytics(SM)  no     Up      Normal  111.91 KiB  ?     -9223372036854775808  0.10

• Query the dse_leases.leases table to list all the masters from each data center with Analytics nodes:

select * from dse_leases.leases ;

name              | dc                   | duration_ms | epoch   | holder
-------------------+----------------------+-------------+---------+---------------
Leader/master/6.0 | Analytics            | 30000       | 805254  | 10.200.176.42
Leader/master/6.0 | SearchGraphAnalytics | 30000       | 1300800 | 10.200.176.45
Leader/master/6.0 | SearchAnalytics      | 30000       | 7       | 10.200.176.44

Ensure that the replication factor is configured correctly for the dse_leases keyspace
If the dse_leases keyspace is not properly replicated, the Spark Master might not be elected.
Every time you add a new datacenter, you must manually increase the replication factor of the dse_leases
keyspace for the new DSE Analytics datacenter. If DataStax Enterprise or Spark security options are
enabled on the cluster, you must also increase the replication factor for the dse_security keyspace across
all logical datacenters.
The initial node in a multi-datacenter cluster has a replication factor of 1 for the dse_leases keyspace. For new
datacenters, the first node is created with the dse_leases keyspace with a replication factor of 1 for that
datacenter. However, any datacenters that you add have a replication factor of 0 and require configuration
before you start DSE Analytics nodes. You must change the replication factor of the dse_leases keyspace for
multiple analytics datacenters. See Setting the replication factor for analytics keyspaces.
Monitoring the lease subsystem
All changes to lease holders are recorded in the dse_leases.logs table. Most of the time, you do not want to
enable logging.

1. To turn on logging, ensure that the lease_metrics_options is enabled in the dse.yaml file:

lease_metrics_options:
    enabled: true
    ttl_seconds: 604800

2. Look at the dse_leases.logs table:

select * from dse_leases.logs ;

name              | dc  | monitor       | at                              | new_holder    | old_holder
-------------------+-----+---------------+---------------------------------+---------------+------------
Leader/master/6.0 | dc1 | 10.200.180.44 | 2018-05-17 00:45:02.971000+0000 | 10.200.180.44 |
Leader/master/6.0 | dc1 | 10.200.180.49 | 2018-05-17 02:37:07.381000+0000 | 10.200.180.49 |

3. When lease_metrics_options is enabled, you can examine the acquire, renew, resolve, and disable
operations. Most of the time, these operations should complete in 100 ms or less:

select * from dse_perf.leases ;


name | dc | monitor | acquire_average_latency_ms | acquire_latency99ms | acquire_max_latency_ms | acquire_rate15 | disable_average_latency_ms | disable_latency99ms | disable_max_latency_ms | disable_rate15 | renew_average_latency_ms | renew_latency99ms | renew_max_latency_ms | renew_rate15 | resolve_average_latency_ms | resolve_latency99ms | resolve_max_latency_ms | resolve_rate15 | up | up_or_down_since
Leader/master/6.0 | dc1 | 10.200.180.44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 100 | 100 | 0 | 8 | 26 | 26 | 0 | True | 2018-05-03 19:30:38.395000+0000
Leader/master/6.0 | dc1 | 10.200.180.49 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 32 | 32 | 0 | True | 2018-05-03 19:30:55.656000+0000

4. If the log warnings and errors do not contain relevant information, edit the logback.xml file and add:

<logger name="com.datastax.bdp.leasemanager" level="DEBUG"/>

5. Restart the node for the debugging settings to take effect.

Troubleshooting
Perform these various lease holder troubleshooting activities before you contact DataStax Support.
Verify the workload status
Run the dsetool ring command:

$ dsetool ring

If the replication factor is inadequate or if the replicas are down, the output of the dsetool ring
command contains a warning:

Address         DC                    Rack   Workload             Graph  Status  State   Load        Owns  Token                 Health [0,1]
10.200.178.232  SearchGraphAnalytics  rack1  SearchAnalytics      yes    Up      Normal  153.04 KiB  ?     -9223372036854775808  0.00
10.200.178.230  SearchGraphAnalytics  rack1  SearchAnalytics(SM)  yes    Up      Normal  92.98 KiB   ?     0                     0.00

If the automatic Job Tracker or Spark Master election fails, verify that an appropriate replication factor
is set for the dse_leases keyspace.
Use cqlsh commands to verify the replication factor of the analytics keyspaces


1. Describe the dse_leases keyspace:

DESCRIBE KEYSPACE dse_leases;

CREATE KEYSPACE dse_leases WITH replication =
    {'class': 'NetworkTopologyStrategy', 'Analytics1': '1'}
    AND durable_writes = true;

2. Increase the replication factor of the dse_leases keyspace:

ALTER KEYSPACE dse_leases WITH replication =
    {'class': 'NetworkTopologyStrategy', 'Analytics1': '3', 'Analytics2': '3'};

3. Run nodetool repair.
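
For example, to repair only the affected keyspace:

$ nodetool repair dse_leases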

Configuring Spark logging options


You can configure Spark logging options for the Spark logs.
Log directories
The Spark logging directory is the directory where the Spark components store individual log files. DataStax
Enterprise places logs in the following locations:
Executor logs

• SPARK_WORKER_DIR/worker-n/application_id/executor_id/stderr

• SPARK_WORKER_DIR/worker-n/application_id/executor_id/stdout

Spark Master/Worker logs


Spark Master: the global system.log
Spark Worker: SPARK_WORKER_LOG_DIR/worker-n/worker.log
The default SPARK_WORKER_LOG_DIR location is /var/log/spark/worker.
Default log directory for Spark SQL Thrift server
The default log directory for starting the Spark SQL Thrift server is $HOME/spark-thrift-server.
Spark Shell and application logs
Spark Shell and application logs are output to the console.
SparkR shell log
The default location for the SparkR shell is $HOME/.sparkR.log
Log configuration file
Log configuration files are located in the same directory as spark-env.sh.

To configure Spark logging options:

1. Configure logging options, such as log levels, in the following files:


Executors: logback-spark-executor.xml
Spark Master: logback.xml
Spark Worker: logback-spark-server.xml
Spark Driver (Spark Shell, Spark applications): logback-spark.xml
SparkR: logback-sparkR.xml

2. If you want to enable rolling logging for Spark executors, add the following options to spark-daemon-
defaults.conf.


Enable rolling logging with 3 log files retained before deletion. The log files are broken up by size with a
maximum size of 50,000 bytes.

spark.executor.logs.rolling.maxRetainedFiles 3
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 50000

The default location of the Spark configuration files depends on the type of installation:

• Package installations: /etc/dse/spark/

• Tarball installations: installation_location/resources/spark/conf

3. Configure a safe communication channel to access the Spark user interface.

When user credentials are specified in plain text on the dse command line, like dse -u username
-p password, the credentials are present in the logs of Spark workers when the driver is run in
cluster mode.
The Spark Master, Spark Worker, executor, and driver logs might include sensitive information.
Sensitive information includes passwords and digest authentication tokens for Kerberos that are
passed in the command line or Spark configuration. DataStax recommends using only safe
communication channels like VPN and SSH to access the Spark user interface.

You can provide authentication credentials in several ways, see Credentials for authentication.

Running Spark processes as separate users


Spark processes can be configured to run as separate operating system users.
By default, processes started by DSE are run as the same OS user who started the DSE server process. This
is called the DSE service user. One consequence of this is that all applications that are run on the cluster can
access DSE data and configuration files, and access files of other applications.
You can delegate running Spark applications to runner processes and users by changing options in dse.yaml.
Overview of the run_as process runner
The run_as process runner allows you to run Spark applications as a different OS user than the DSE service
user. When this feature is enabled and configured:

• All simultaneously running applications deployed by a single DSE service user will be run as a single OS
user.

• Applications deployed by different DSE service users will be run by different OS users.

• All applications will be run as a different OS user than the DSE service user.

This allows you to prevent an application from accessing DSE server private files, and prevent one application
from accessing the private files of another application.
How the run_as process runner works
DSE uses sudo to run Spark applications components (drivers and executors) as specific OS users. DSE
doesn't link a DSE service user with a particular OS user. Instead, a configurable number of spare user
accounts or slots are used. When a request to run an executor or a driver is received, DSE finds an unused
slot, and locks it for that application. Until the application is finished, all of that application's processes run as
that slot user. When the application completes, the slot user will be released and will be available to other
applications.
Since the number of slots is limited, a single slot is shared among all the simultaneously running applications
run by the same DSE service user. Such a slot is released once all the applications of that user are removed.
When there are not enough slots to run an application, an error is logged and DSE tries to run the executor or
driver on a different node. DSE does not limit the number of slots you can configure. If you need to run more
applications simultaneously, create more slot users.
Slots assignment is done on a per node basis. Executors of a single application may run as different slot users
on different DSE nodes. When DSE is run on a fat node, different DSE instances running within the same OS
should be configured with different sets of slot users. If they use the same slot users, a single OS user may run
the applications of two different DSE service users.
When a slot is released, all directories which are normally managed by Spark for the application are removed.
If the application doesn't finish, but all executors are done on a node, and a slot user is about to be released,
all the application files are modified so that their ownership is changed to the DSE service user with owner-
only permission. When a new executor for this application is run on this node, the application files are
reassigned back to the slot user assigned to that application.
Configuring the run_as process runner
The administrator needs to prepare slot users in the OS before configuring DSE. The run_as process runner
requires:

• Each slot user has its own primary group, whose name is the same as the name of the slot user. This is
typically the default behavior of the OS. For example, the slot1 user's primary group is slot1.

• The DSE service user is a member of each slot's primary group. For example, if the DSE service user is
cassandra, the cassandra user is a member of the slot1 group.

• The DSE service user is a member of a group with the same name as the service user. For example, if
the DSE service user is cassandra, the cassandra user is a member of the cassandra group.

• sudo is configured so that the DSE service user can execute any command as any slot user without
providing a password.

Override the umask setting to 007 for slot users so that files created by sub-processes will not be accessible by
anyone else by default, and DSE configuration files are not visible to slot users.
You may further secure the DSE server environment by modifying the OS's limits.conf file to set exact disk
space quotas for each slot user.
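As a sketch, assuming the standard /etc/security/limits.conf format where the fsize item caps the maximum
file size in KB (the values below are illustrative):

slot1  hard  fsize  10485760
slot2  hard  fsize  10485760
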
After adding the slot users and groups and configuring the OS, modify the dse.yaml file. In the
spark_process_runner section enable the run_as process runner and set the list of slot users on each node.

spark_process_runner:
# Allowed options are: default, run_as
runner_type: run_as

run_as_runner_options:
user_slots:
- slot1
- slot2

Example configuration for run_as process runner


In this example, two slot users, slot1 and slot2 will be created and configured with DSE. The default DSE
service user of cassandra is used.


1. Create the slot users.

$ sudo useradd -r -s /bin/false slot1 && sudo useradd -r -s /bin/false slot2

2. Add the slot users to the DSE service user's group.

$ sudo usermod -a -G slot1,slot2 cassandra

3. Make sure the DSE service user is a member of a group with the same name as the service user. For
example, if the DSE service user is cassandra:

$ groups cassandra

cassandra : cassandra

4. Log out and back in again to make the group changes take effect.

5. Modify the sudoers file with the slot users.

Runas_Alias SLOTS = slot1, slot2


Defaults>SLOTS umask=007
Defaults>SLOTS umask_override
cassandra ALL=(SLOTS) NOPASSWD: ALL

6. Modify dse.yaml to enable the run_as process runner and add the new runners.

# Configure the way how the driver and executor processes are created and managed.
spark_process_runner:
# Allowed options are: default, run_as
runner_type: run_as

    # RunAs runner uses sudo to start Spark drivers and executors. A set of predefined
    # fake users, called slots, is used for this purpose. All drivers and executors owned
    # by some DSE user are run as some slot user x. At the same time drivers and
    # executors of any other DSE user use different slots.
run_as_runner_options:
user_slots:
- slot1
- slot2

Configuring the Spark history server


The Spark history server provides a way to load the event logs from Spark jobs that were run with event
logging enabled. The Spark history server works only when files were not flushed before the Spark Master
attempted to build a history user interface.

To enable the Spark history server:

1. Create a directory for event logs in the DSEFS file system:

$ dse fs 'mkdir -p /spark/events'

2. On each node in the cluster, edit the spark-defaults.conf file to enable event logging and specify the
directory for event logs:

#Turns on logging for applications submitted from this machine


spark.eventLog.dir dsefs:///spark/events
spark.eventLog.enabled true

#Sets the logging directory for the history server
spark.history.fs.logDirectory dsefs:///spark/events
# Optional property that changes permissions set to event log files
# spark.eventLog.permissions=777

3. Start the Spark history server on one of the nodes in the cluster:
The Spark history server is a front-end application that displays logging data from all nodes in the
Spark cluster. It can be started from any node in the cluster.
If you've enabled authentication, set the authentication method and credentials in a properties file and
pass it to the dse command. For example, for basic authentication:

spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=role name
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=password

If you set the event log location in spark-defaults.conf, set the spark.history.fs.logDirectory
property in your properties file.

spark.history.fs.logDirectory=dsefs:///spark/events

$ dse spark-history-server start

With a properties file:

$ dse spark-history-server start --properties-file path_to_properties_file

If you specify a properties file, none of the configuration in spark-defaults.conf is used. The
properties file should contain all the required configuration properties.

The history server is started and can be viewed by opening a browser to http://node_hostname:18080.

The Spark Master web UI does not show the historical logs. To work around this known issue,
access the history from port 18080.

4. When event logging is enabled, the default behavior is for all logs to be saved, which causes the storage
to grow over time. To enable automated cleanup edit spark-defaults.conf and edit the following
options:

spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d

For these settings, automated cleanup is enabled, the cleanup is performed daily, and logs older than
seven days are deleted.

Setting Spark Cassandra Connector-specific properties


Spark integration uses the Spark Cassandra Connector under the hood. You can use the configuration options
defined in that project to configure DataStax Enterprise Spark. Spark recognizes system properties that have
the spark. prefix and adds the properties to the configuration object implicitly upon creation. You can avoid
adding system properties to the configuration object by passing false for the loadDefaults parameter in the
SparkConf constructor.
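For example, a minimal sketch that skips the implicit pickup of spark.* system properties and sets a
connector option explicitly (the application name and address are placeholders):

import org.apache.spark.SparkConf

val conf = new SparkConf(loadDefaults = false)
  .setAppName("connector-example")                    // illustrative name
  .set("spark.cassandra.connection.host", "10.0.0.1") // placeholder address
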
The full list of parameters is included in the Spark Cassandra Connector documentation.


You pass settings for Spark, Spark Shell, and other DataStax Enterprise Spark built-in applications using the
intermediate application spark-submit, described in Spark documentation.
Configuring the Spark shell
Pass Spark configuration arguments using the following syntax:

$ dse spark [submission_arguments] [application_arguments]

where submission_arguments are:

[--help] [--verbose]
[--conf name=spark.value|sparkproperties.conf]
[--executor-memory memory]
[--jars additional-jars]
[--master dse://?appReconnectionTimeoutSeconds=secs]
[--properties-file path_to_properties_file]
[--total-executor-cores cores]

--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option to add to the Spark configuration, prefixed by spark.

• name=spark.value - sets a single named property

• sparkproperties.conf - a configuration file of Spark properties

--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
--help
Shows a help message that displays all options except DataStax Enterprise Spark shell options.
--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the
settings from spark-defaults.conf.
--total-executor-cores cores
The total number of cores the application uses.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.
Spark shell application arguments:
-i app_script_file
Spark shell application argument that runs a script from the specified file.
Configuring Spark applications
You pass the Spark submission arguments using the following syntax:

$ dse spark-submit [submission_arguments] application_file [application_arguments]

All of the submission_arguments listed above, plus these additional spark-submit submission_arguments:


--class class_name
The full name of the application main class.
--name appname
The application name as displayed in the Spark web application.
--py-files files
A comma-separated list of the .zip, .egg, or .py files that are set on PYTHONPATH for Python
applications.


--files files
A comma-separated list of files that are distributed among the executors and available for the
application.
In general, Spark submission arguments are translated into system properties -Dname=value and other VM
parameters like classpath. The application arguments are passed directly to the application.
Property list
When you run dse spark-submit on a node in your Analytics cluster, all the following properties are set
automatically, and the Spark Master is automatically detected. Only set the following properties if you need to
override the automatically managed properties.
spark.cassandra.connection.native.port
Default = 9042. Port for native client protocol connections.
spark.cassandra.connection.rpc.port
Default = 9160. Port for thrift connections.
spark.cassandra.connection.host
The host name or IP address to which the Thrift RPC service and native transport is bound.
The native_transport_address property in the cassandra.yaml, which is localhost by default,
determines the default value of this property.
You can explicitly set the Spark Master address using the --master master address parameter to dse spark-
submit.

$ dse spark-submit --master master address application JAR file

For example, if the Spark node is at 10.0.0.2:

$ dse spark-submit --master dse://10.0.0.2? myApplication.jar

The following properties can be overridden for performance or availability:


Connection properties
spark.cassandra.session.consistency.level
Default = LOCAL_ONE. The default consistency level for sessions which are accessed from the
CassandraConnector object as in CassandraConnector.withSessionDo.
This property does not affect the consistency level of DataFrame and RDD read and write
operations. Use spark.cassandra.input.consistency.level for read operations and
spark.cassandra.output.consistency.level for write operations.

Read properties
spark.cassandra.input.split.size
Default = 100000. Approximate number of rows in a single Spark partition. The higher the value, the
fewer Spark tasks are created. Increasing the value too much may limit the parallelism level.
spark.cassandra.input.fetch.size_in_rows
Default = 1000. Number of rows being fetched per round-trip to the database. Increasing this value
increases memory consumption. Decreasing the value increases the number of round-trips. In earlier
releases, this property was spark.cassandra.input.page.row.size.
spark.cassandra.input.consistency.level
Default = LOCAL_ONE. Consistency level to use when reading.
Write properties
You can set the following properties in SparkConf to fine tune the saving process.
spark.cassandra.output.batch.size.bytes
Default = 1024. Maximum total size of a single batch in bytes.
spark.cassandra.output.consistency.level
Default = LOCAL_QUORUM. Consistency level to use when writing.
spark.cassandra.output.concurrent.writes
Default = 100. Maximum number of batches executed in parallel by a single Spark task.
spark.cassandra.output.batch.size.rows
Default = None. Number of rows per single batch. The default is unset, which means the connector
will adjust the number of rows based on the amount of data in each row.
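As a sketch, several of these write properties can be set together on the application's configuration object
before the session is created (the values below are illustrative, not recommendations):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.cassandra.output.concurrent.writes", "50")        // fewer parallel batches
  .set("spark.cassandra.output.batch.size.bytes", "2048")       // larger batches
  .set("spark.cassandra.output.consistency.level", "LOCAL_ONE") // weaker write consistency
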
See the Spark Cassandra Connector documentation for details on additional, low-level properties.
Creating a DSE Analytics Solo datacenter
DSE Analytics Solo datacenters do not store any database or search data, but are strictly used for analytics
processing. They are used in conjunction with one or more datacenters that contain database data.
Creating a DSE Analytics Solo datacenter within an existing DSE cluster
In this example scenario, there is an existing datacenter, DC1 which has existing database data. Create a new
DSE Analytics Solo datacenter, DC2, which does not store any data but will perform analytics jobs using the
database data from DC1.

• Make sure all keyspaces in the DC1 datacenter use NetworkTopologyStrategy. If necessary, alter the
keyspace.

ALTER KEYSPACE mykeyspace
    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };

• Add nodes to a new datacenter named DC2, then enable Analytics on those nodes.

• Configure the dse_leases and dse_analytics keyspaces to replicate to both DC1 and DC2. For example:

ALTER KEYSPACE dse_leases
    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };

• When submitting Spark applications specify the --master URL with the name or IP address of a node in
the DC2 datacenter, and set the spark.cassandra.connection.local_dc configuration option to DC1.

dse spark-submit --master "dse://?connection.local_dc=DC2" \
  --class com.datastax.dse.demo.loss.Spark10DayLoss \
  --conf "spark.cassandra.connection.local_dc=DC1" portfolio.jar

The Spark workers read the data from DC1.

Accessing an external DSE transactional cluster from a DSE Analytics Solo cluster
To access an external DSE transactional cluster, explicitly set the connection to the transactional cluster when
creating RDDs or Datasets within the application.
In the following examples, the external DSE transactional cluster has a node running on 10.10.0.2.
To create an RDD from the transactional cluster's data:

import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext

def analyticsSoloExternalDataExample(sc: SparkContext) = {
  val connectorToTransactionalCluster =
    CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "10.10.0.2"))

  val rddFromTransactionalCluster = {
    // Sets connectorToTransactionalCluster as the default connection for everything in this code block
    implicit val c = connectorToTransactionalCluster
    // get the data from the test.words table
    sc.cassandraTable("test", "words")
  }
}


Creating a Dataset from the transactional cluster's data:

import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf

// set params for the particular cluster
spark.setCassandraConf("TransactionalCluster",
  CassandraConnectorConf.ConnectionHostParam.option("10.10.0.2"))

val df = spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "words", "keyspace" -> "test"))
  .load()

When you submit the application to the DSE Analytics Solo cluster, it will retrieve the data from the external
DSE transactional cluster.
Spark JVMs and memory management
Spark jobs running on DataStax Enterprise are divided among several different JVM processes, each with
different memory requirements.
DataStax Enterprise and Spark Master JVMs
The Spark Master runs in the same process as DataStax Enterprise, but its memory usage is negligible. The
only way Spark could cause an OutOfMemoryError in DataStax Enterprise is indirectly, by executing queries
that fill the client request queue: for example, by running a query with a high limit with paging disabled, or by
using a very large batch to update or insert data in a table. The heap size is controlled by MAX_HEAP_SIZE in
cassandra-env.sh. If you see an OutOfMemoryError in system.log, treat it as a standard OutOfMemoryError
and follow the usual troubleshooting steps.
Spark executor JVMs
The Spark executor is where Spark performs transformations and actions on the RDDs and is usually
where a Spark-related OutOfMemoryError would occur. An OutOfMemoryError in an executor will show
up in the stderr log for the currently executing application (usually in /var/lib/spark). There are several
configuration settings that control executor memory and they interact in complicated ways.

• The memory_total option in the resource_manager_options.worker_options section of dse.yaml


defines the maximum fraction of system memory to give all executors for all applications running on a
particular node. It uses the following formula:
memory_total * (total system memory - memory assigned to DataStax Enterprise)

• spark.executor.memory is a system property that controls how much executor memory a specific
application gets. It must be less than or equal to the calculated value of memory_total. It can be specified
in the constructor for the SparkContext in the driver application, or via --conf spark.executor.memory
or --executor-memory command line options when submitting the job using spark-submit.
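
For example, a sketch requesting 4 GB per executor when submitting a hypothetical application:

$ dse spark-submit --executor-memory 4g myApplication.jar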

The client driver JVM


The driver is the client program for the Spark job. Normally it shouldn't need very large amounts of memory
because most of the data should be processed within the executor. If it does need more than a few gigabytes,
your application may be using an anti-pattern like pulling all of the data in an RDD into a local data structure by
using collect or take. Generally you should never use collect in production code and if you use take, you
should be only taking a few records. If the driver runs out of memory, you will see the OutOfMemoryError in
the driver stderr or wherever it's been configured to log. This is controlled in one of two places:

• SPARK_DRIVER_MEMORY in spark-env.sh


• spark.driver.memory system property which can be specified via --conf spark.driver.memory or


--driver-memory command line options when submitting the job using spark-submit. This cannot be
specified in the SparkContext constructor because by that point, the driver has already started.
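
For example, a sketch giving the driver 2 GB at submission time (the JAR name is a placeholder):

$ dse spark-submit --driver-memory 2g myApplication.jar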

Spark worker JVMs


The worker is a watchdog process that spawns the executor, and should never need its heap size increased.
The worker's heap size is controlled by SPARK_DAEMON_MEMORY in spark-env.sh. SPARK_DAEMON_MEMORY also
affects the heap size of the Spark SQL thrift server.
Using Spark modules with DataStax Enterprise

Getting started with Spark Streaming


Spark Streaming allows you to consume live data streams from sources, including Akka, Kafka, and Twitter.
This data can then be analyzed by Spark applications, and the data can be stored in the database.
You use Spark Streaming by creating an org.apache.spark.streaming.StreamingContext instance based
on your Spark configuration. You then create a DStream instance, or discretized stream, an object that
represents an input stream. DStream objects are created by calling one of the methods of StreamingContext,
or using a utility class from external libraries to connect to other sources like Twitter.
The data you consume and analyze is saved to the database by calling one of the saveToCassandra methods
on the stream object, passing in the keyspace name, the table name, and optionally the column names and
batch size.

Spark Streaming applications require synchronized clocks to operate correctly. See Synchronize clocks.

The following Scala example demonstrates how to connect to a text input stream at a particular IP address
and port, count the words in the stream, and save the results to the database.

1. Import the streaming context objects.

import org.apache.spark.streaming._

2. Create a new StreamingContext object based on an existing SparkConf configuration object, specifying
the interval in which streaming data will be divided into batches by passing in a batch duration.

val sparkConf = ...
val ssc = new StreamingContext(sc, Seconds(1)) // Uses the context automatically created by the spark shell

Spark allows you to specify the batch duration in milliseconds, seconds, and minutes.

3. Import the database-specific functions for StreamingContext, DStream, and RDD objects.

import com.datastax.spark.connector.streaming._

4. Create the DStream object that will connect to the IP and port of the service providing the data stream.

val lines = ssc.socketTextStream(server IP address, server port number)

5. Count the words in each batch and save the data to the table.

val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.saveToCassandra("streaming_test", "words_table", SomeColumns("word", "count"))

6. Start the computation.

ssc.start()
ssc.awaitTermination()

In the following example, you start a service using the nc utility that repeats strings, then consume the
output of that service using Spark Streaming.
Using cqlsh, start by creating a target keyspace and table for streaming to write into.

CREATE KEYSPACE IF NOT EXISTS streaming_test
    WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1 };

CREATE TABLE IF NOT EXISTS streaming_test.words_table
    (word TEXT PRIMARY KEY, count COUNTER);

In a terminal window, enter the following command to start the service:

$ nc -lk 9999
one two two three three three four four four four someword

In a different terminal start a Spark shell.

$ dse spark

In the Spark shell enter the following:

import org.apache.spark.streaming._
import com.datastax.spark.connector.streaming._

val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.saveToCassandra("streaming_test", "words_table", SomeColumns("word", "count"))
wordCounts.print()
ssc.start()
ssc.awaitTermination()
exit()

Using cqlsh, connect to the streaming_test keyspace and run a query to show the results.

$ cqlsh -k streaming_test

select * from words_table;

word | count
---------+-------
three | 3
one | 1
two | 2
four | 4
someword | 1

What's next:
Run the http_receiver demo. See the Spark Streaming Programming Guide for more information, API
documentation, and examples on Spark Streaming.
Creating a Spark Structured Streaming sink using DSE
Spark Structured Streaming is a high-level API for streaming applications. DSE supports Structured
Streaming for storing data into DSE.
The following Scala example shows how to store data from a streaming source to DSE using the
cassandraFormat method.

val query = source.writeStream
  .option("checkpointLocation", checkpointDir.toString)
  .cassandraFormat("table name", "keyspace name")
  .outputMode(OutputMode.Update)
  .start()

This example sets the OutputMode to Update, described in the Spark API documentation.
The cassandraFormat method is equivalent to calling the format method and passing in
org.apache.spark.sql.cassandra.

val query = source.writeStream
  .option("checkpointLocation", checkpointDir.toString)
  .format("org.apache.spark.sql.cassandra")
  .option("keyspace", ks)
  .option("table", "kv")
  .outputMode(OutputMode.Update)
  .start()
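
The source in these examples can be any streaming Dataset. A minimal sketch creating one from Spark's
built-in socket source (the host and port are placeholders):

val source = spark.readStream
  .format("socket")
  .option("host", "localhost") // placeholder host
  .option("port", 9999)        // placeholder port
  .load()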

Using Spark SQL to query data


Spark SQL allows you to execute Spark queries using a variation of the SQL language. Spark SQL includes
APIs for returning Spark Datasets in Scala and Java, and interactively using a SQL shell.
Spark SQL basics
In DSE, Spark SQL allows you to perform relational queries over data stored in DSE clusters, and executed
using Spark. Spark SQL is a unified relational query language for traversing over distributed collections of
data, and supports a variation of the SQL language used in relational databases. Spark SQL is intended as a
replacement for Shark and Hive, including the ability to run SQL queries over Spark data sets. You can use
traditional Spark applications in conjunction with Spark SQL queries to analyze large data sets.
The SparkSession class and its subclasses are the entry point for running relational queries in Spark.
DataFrames are Spark Datasets organized into named columns, and are similar to tables in a traditional
relational database. You can create DataFrame instances from any Spark data source, like CSV files, Spark
RDDs, or, for DSE, tables in the database. In DSE, when you access a Spark SQL table from the data in a DSE
transactional cluster, it registers that table to the Hive metastore so SQL queries can be run against it.

Any tables you create or destroy, and any table data you delete, in a Spark SQL session will not be
reflected in the underlying DSE database, but only in that session's metastore.


Starting the Spark SQL shell


The Spark SQL shell allows you to interactively perform Spark SQL queries. To start the shell, run dse spark-
sql:

$ dse spark-sql

The Spark SQL shell in DSE automatically creates a Spark session and connects to the Spark SQL Thrift
server to handle the underlying JDBC connections.
If the schema changes in the underlying database table during a Spark SQL session (for example, a column
was added using CQL), drop the table and then refresh the metastore to continue querying the table with the
correct schema.

DROP TABLE tablename;
SHOW TABLES;

Queries to a table whose schema has been modified cause a runtime exception.
Spark SQL limitations
• You cannot load data from one file system to a table in a different file system.

CREATE TABLE IF NOT EXISTS test (id INT, color STRING) PARTITIONED BY (ds STRING);
LOAD DATA INPATH 'hdfs2://localhost/colors.txt' OVERWRITE INTO TABLE test PARTITION
(ds ='2008-08-15');

The first line creates a table on the default file system. The second line attempts to load data into that
table from a path on a different file system, and will fail.

Querying database data using Spark SQL in Scala


When you start Spark, DataStax Enterprise creates a Spark session instance to allow you to run
Spark SQL queries against database tables. The session object is named spark and is an instance of
org.apache.spark.sql.SparkSession. Use the sql method to execute the query.

1. Start the Spark shell.

$ dse spark

2. Use the sql method to pass in the query, storing the result in a variable.

val results = spark.sql("SELECT * from my_keyspace_name.my_table")

3. Use the returned data.

results.show()

+--------------------+-----------+
| id|description|
+--------------------+-----------+
|de2d0de1-4d70-11e...| thing|
|db7e4191-4d70-11e...| another|
|d576ad50-4d70-11e...|yet another|
+--------------------+-----------+

Querying database data using Spark SQL in Java


Java applications that query table data using Spark SQL first need an instance of
org.apache.spark.sql.SparkSession.
The Spark session object is used to connect to DataStax Enterprise.
Create the Spark session instance using the builder interface:

SparkSession spark = SparkSession
    .builder()
    .appName("My application name")
    .config("option name", "option value")
    .master("dse://1.1.1.1?connection.host=1.1.2.2,1.1.3.3")
    .getOrCreate();

After the Spark session instance is created, you can use it to create a DataFrame instance from the query.
Queries are executed by calling the SparkSession.sql method.

Dataset<Row> employees = spark.sql("SELECT * FROM company.employees");
employees.createOrReplaceTempView("employees");
Dataset<Row> managers = spark.sql("SELECT name FROM employees WHERE role = 'Manager'");

The returned Dataset object supports the standard Spark operations.

employees.collect();

Querying DSE Graph vertices and edges with Spark SQL


Spark SQL can query DSE Graph vertex and edge tables. The dse_graph database holds the vertex
and edge tables for each graph. The naming format for the tables is graph name_vertices and graph
name_edges. For example, if you have a graph named gods, the vertices and edges are accessible in Spark
SQL in the dse_graph.gods_vertices and dse_graph.gods_edges tables.

select * from dse_graph.gods_vertices;

If you have properties that are spelled the same but with different capitalizations (for example, id and Id),
start Spark SQL with the --conf spark.sql.caseSensitive=true option.
Prerequisites:
Start your cluster with both Graph and Spark enabled.

1. Start the Spark SQL shell.

$ dse spark-sql

2. Query the vertices and edges using SELECT statements.

USE dse_graph;
SELECT * FROM gods_vertices where name = 'Zeus';

3. Join the vertices and edges in a query.


Vertices are identified by id columns. Edge tables have src and dst columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example, to find all vertex ids that are reached by the out edges:

SELECT gods_edges.dst FROM gods_vertices JOIN gods_edges ON gods_vertices.id = gods_edges.src;

What's next: The same steps work from the Spark shell using spark.sql() to run the query statements, or
using the JDBC/ODBC driver and the Spark SQL Thrift Server.
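
For instance, here is a minimal sketch of the same join issued through spark.sql() from the Spark shell, assuming the gods graph used above:

val reached = spark.sql(
  "SELECT gods_edges.dst FROM dse_graph.gods_vertices " +
  "JOIN dse_graph.gods_edges ON gods_vertices.id = gods_edges.src")
reached.show()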
Using Spark predicate push down in Spark SQL queries
Spark predicate push down to database allows for better optimized Spark queries. A predicate is a condition
on a query that returns true or false, typically located in the WHERE clause. A predicate push down filters
the data in the database query, reducing the number of entries retrieved from the database and improving
query performance. By default the Spark Dataset API will automatically push down valid WHERE clauses to the
database.
You can also use predicate push down on DSE Search indices within SearchAnalytics data centers.
Restrictions on column filters
Partition key columns can be pushed down as long as:

• All partition key columns are included in the filter.

• The filter includes no more than one equivalence predicate per column.

Use an IN clause to specify multiple restrictions for a particular column:

val primaryColors = List("red", "yellow", "blue")

val df = spark.read.cassandraFormat("cars", "inventory").load
df.filter(df("car_color").isin(primaryColors: _*))

Clustering key columns can be pushed down with the following rules (a sketch follows the list):

• Only the last predicate in the filter can be a non-equivalence predicate.

• If there is more than one predicate for a column, the predicates cannot be equivalence predicates.
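
The following Spark shell sketch satisfies these rules; the metrics.sensors table and its columns are hypothetical (partition key sensor_id, clustering columns day and ts):

val df = spark.read.cassandraFormat("sensors", "metrics").load
// one equivalence predicate per column, and a single non-equivalence
// predicate on the last clustering column referenced by the filter
val filtered = df.filter(df("sensor_id") === "s1" && df("day") === "2020-01-01" && df("ts") > "2020-01-01 12:00:00")
filtered.explain()  // look for PushedFilters in the physical plan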

When predicate push down occurs


When a Dataset has no push down filters, all requests on the Dataset do a full unfiltered table scan. Adding
predicate filters on the Dataset for eligible database columns modifies the underlying query to narrow its
scope.
Determining if predicate push down is being used in queries
By using the explain method on a Dataset (or EXPLAIN in Spark SQL), queries can be analyzed to see whether the predicates are pushed down to the database. For example, create the following CQL table:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
  'replication_factor': 1 };
USE test;
CREATE table words (
    user TEXT,
    word TEXT,
    count INT,
    PRIMARY KEY (user, word));

INSERT INTO words (user, word, count ) VALUES ( 'Russ', 'dino', 10 );
INSERT INTO words (user, word, count ) VALUES ( 'Russ', 'fad', 5 );
INSERT INTO words (user, word, count ) VALUES ( 'Sam', 'alpha', 3 );

INSERT INTO words (user, word, count ) VALUES ( 'Zebra', 'zed', 100 );

Then create a Spark Dataset in the Spark console using that table, and look for PushedFilters in the output of the explain method:

val df = spark.read.cassandraFormat("words", "test").load
df.explain

== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation [user#0,word#1,count#2]
ReadSchema: struct<user:string,word:string,count:int>

Because this query doesn't filter on columns capable of being pushed down, there are no PushedFilters in
the physical plan.
Adding a filter, however, does change the physical plan to include PushedFilters:

val dfWithPushdown = df.filter(df("word") > "ham")
dfWithPushdown.explain

== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[user#0,word#1,count#2] PushedFilters: [*GreaterThan(word,ham)], ReadSchema:
struct<user:string,word:string,count:int>

The PushedFilters section of the physical plan includes the GreaterThan push down filter. The asterisk indicates that the push down filter is handled at the datasource level.
Troubleshooting predicate push down
When creating Spark SQL queries that use comparison operators, making sure the predicates are pushed
down to the database correctly is critical to retrieving the correct data with the best performance.
For example, given a CQL table with the following schema:

CREATE TABLE test.common (
    year int,
    birthday timestamp,
    userid uuid,
    likes text,
    name text,
    PRIMARY KEY (year, birthday, userid)
)

Suppose you want to write a query that selects all entries where the birthday is earlier than a given date:

SELECT * FROM test.common WHERE birthday < '2001-1-1';

Use the EXPLAIN command to see the query plan:

EXPLAIN SELECT * FROM test.common WHERE birthday < '2001-1-1';

== Physical Plan ==
*Filter (cast(birthday#1 as string) < 2001-1-1)
+- *Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[year#0,birthday#1,userid#2,likes#3,name#4] ReadSchema:
struct<year:int,birthday:timestamp,userid:string,likes:string,name:string>
Time taken: 0.72 seconds, Fetched 1 row(s)

Note that the Filter directive is treating the birthday column, a CQL TIMESTAMP, as a string. The query
optimizer looks at this comparison and needs to make the types match before generating a predicate. In
this case the optimizer decides to cast the birthday column as a string to match the string '2001-1-1',
but cast functions cannot be pushed down. The predicate isn't pushed down, and it doesn't appear in
PushedFilters. A full table scan will be performed at the database layer, with the results returned to Spark
for further processing.
To push down the correct predicate for this query, use the cast function to specify that the predicate is comparing the birthday column to a TIMESTAMP, so the types match and the optimizer can generate the correct predicate.

EXPLAIN SELECT * FROM test.common WHERE birthday < cast('2001-1-1' as TIMESTAMP);

== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[year#0,birthday#1,userid#2,likes#3,name#4]
PushedFilters: [*LessThan(birthday,2001-01-01 00:00:00.0)],
ReadSchema: struct<year:int,birthday:timestamp,userid:string,likes:string,name:string>
Time taken: 0.034 seconds, Fetched 1 row(s)

Note the PushedFilters indicating that the LessThan predicate will be pushed down for the column data in
birthday. This should speed up the query as a full table scan will be avoided.
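
The same fix can be applied from the Spark shell with the Dataset API; a minimal sketch against the test.common table above:

import org.apache.spark.sql.functions.{col, lit}

val df = spark.read.cassandraFormat("common", "test").load
// cast the literal, not the column, so the LessThan predicate can be pushed down
df.filter(col("birthday") < lit("2001-01-01").cast("timestamp")).explain()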

Supported syntax of Spark SQL

The following syntax defines a SELECT query.

SELECT [DISTINCT] [column names]|[wildcard]
FROM [keyspace name.]table name
[JOIN clause table name ON join condition]
[WHERE condition]
[GROUP BY column name]
[HAVING conditions]
[ORDER BY column names [ASC | DESC]]
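
For example, a query exercising several of these clauses, issued through spark.sql() in the Spark shell against the test.words table created earlier (a sketch):

val totals = spark.sql(
  "SELECT user, SUM(count) AS total FROM test.words " +
  "GROUP BY user ORDER BY user ASC")
totals.show()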

A SELECT query using joins has the following syntax.

SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT SEMI JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT
OUTER JOIN | FULL JOIN | FULL OUTER JOIN]
ON join condition

Several select clauses can be combined in a UNION, INTERSECT, or EXCEPT query.

SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT]
SELECT statement 2

Select queries run on new columns return '', or empty results, instead of None.


The following syntax defines an INSERT query.

INSERT [OVERWRITE] INTO [keyspace name.]table name
VALUES values

The following syntax defines a CACHE TABLE query.

CACHE TABLE table name [AS table alias]

You can remove a table from the cache using a UNCACHE TABLE query.

UNCACHE TABLE table name

Keywords in Spark SQL


The following keywords are reserved in Spark SQL.
ALL
AND
AS
ASC
APPROXIMATE
AVG
BETWEEN
BY
CACHE
CAST
COUNT
DESC
DISTINCT
FALSE
FIRST
LAST
FROM
FULL
GROUP
HAVING
IF
IN
INNER
INSERT
INTO
IS
JOIN
LEFT
LIMIT
MAX
MIN
NOT
NULL
ON
OR
OVERWRITE
LIKE
RLIKE
UPPER
LOWER
REGEXP
ORDER
OUTER
RIGHT
SELECT
SEMI
STRING
SUM
TABLE
TIMESTAMP
TRUE
UNCACHE
UNION
WHERE
INTERSECT
EXCEPT
SUBSTR
SUBSTRING
SQRT
ABS
Inserting data into tables with static columns using Spark SQL
Static columns are mapped to different columns in Spark SQL and require special handling. Spark SQL Thrift servers use Hive. When you run an insert query, you must pass data to those columns.
To work around the different columns, set cql3.output.query in the insertion Hive table properties to
limit the columns that are being inserted. In Spark SQL, alter the external table to configure the prepared
statement as the value of the Hive CQL output query. For example, this prepared statement takes values that
are inserted into columns a and b in mytable and maps these values to columns b and a, respectively, for
insertion into the new row.

spark-sql> ALTER TABLE mytable SET TBLPROPERTIES ('cql3.output.query' =
  'update mykeyspace.mytable set b = ? where a = ?');
spark-sql> ALTER TABLE mytable SET SERDEPROPERTIES ('cql3.update.columns' = 'b,a');

Running HiveQL queries using Spark SQL


Spark SQL supports queries written using HiveQL, a SQL-like language that produces queries that are
converted to Spark jobs. HiveQL is more mature and supports more complex queries than Spark SQL. To
construct a HiveQL query, first create a new HiveContext instance, and then submit the queries by calling
the sql method on the HiveContext instance.
See the Hive Language Manual for the full syntax of HiveQL.

Creating indexes with DEFERRED REBUILD is not supported in Spark SQL.


1. Start the Spark shell.

$ bin/dse spark

2. Use the provided HiveContext instance sqlContext to create a new query in HiveQL by calling the sql method on the sqlContext object.

val results = sqlContext.sql("SELECT * FROM my_keyspace.my_table")

Using the DataFrames API


The Spark DataFrames API encapsulates data sources, including DataStax Enterprise data, organized into
named columns.
The Spark Cassandra Connector provides an integrated DataSource to simplify creating DataFrames. For
more technical details, see the Spark Cassandra Connector documentation that is maintained by DataStax
and the Cassandra and PySpark DataFrames post.
Examples of using the DataFrames API
This Python example shows using the DataFrames API to read from the table ks.kv and insert into a different
table ks.othertable.

$ dse pyspark

table1 = spark.read.format("org.apache.spark.sql.cassandra")
.options(table="kv", keyspace="ks")
.load()
table1.write.format("org.apache.spark.sql.cassandra")
.options(table="othertable", keyspace = "ks")
.save(mode ="append")

Using the DSE Spark console, the following Scala example shows how to create a DataFrame object from
one table and save it to another.

$ dse spark

val table1 = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map( "table" -> "words", "keyspace" -> "test"))
  .load()
table1.createCassandraTable("test", "otherwords", partitionKeyColumns =
  Some(Seq("word")), clusteringKeyColumns = Some(Seq("count")))
table1.write.cassandraFormat("otherwords", "test").save()

The write operation uses one of the helper methods, cassandraFormat, included in the Spark Cassandra Connector. This is a simplified way of setting the format and options for a standard DataFrame operation. The following command is equivalent to the write operation using cassandraFormat:

table1.write.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "othertable", "keyspace" -> "test"))
.save()

Using the Spark SQL Thriftserver


The Spark SQL Thriftserver uses JDBC and ODBC interfaces for client connections to the database.
The AlwaysOn SQL service is a high-availability service built on top of the Spark SQL Thriftserver. The Spark SQL Thriftserver is started manually on a single node in an Analytics datacenter, and will not fail over to another node. Both AlwaysOn SQL and the Spark SQL Thriftserver provide JDBC and ODBC interfaces to DSE, and share many configuration settings.

1. If you are using Kerberos authentication, in the hive-site.xml file, configure your authentication
credentials for the Spark SQL Thrift server.

<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>thriftserver/_HOST@EXAMPLE.COM</value>
</property>

<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/dse/dse.keytab</value>
</property>

Ensure that you use the hive-site.xml file in the Spark directory:

• Package installations: /etc/dse/spark/hive-site.xml

• Tarball installations: installation_location/resources/spark/conf/hive-site.xml

2. Start DataStax Enterprise with Spark enabled as a service or in a standalone installation.

3. Start the server by entering the dse spark-sql-thriftserver start command as a user with
permissions to write to the Spark directories.
To override the default settings for the server, pass in the configuration property using the --hiveconf
option. See the HiveServer2 documentation for a complete list of configuration properties.

$ dse spark-sql-thriftserver start

By default, the server listens on port 10000 on the localhost interface on the node from which it was
started. You can specify the server to start on a specific port. For example, to start the server on port
10001, use the --hiveconf hive.server2.thrift.port=10001 option.

$ dse spark-sql-thriftserver start --hiveconf hive.server2.thrift.port=10001

You can configure the port and bind address permanently in resources/spark/conf/spark-env.sh:

export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=1.1.1.1

You can specify general Spark configuration settings by using the --conf option.

$ dse spark-sql-thriftserver start --conf spark.cores.max=4

4. Use DataFrames to read and write large volumes of data. For example, to create the table_a_cass_df
table that uses a DataFrame while referencing table_a:

CREATE TABLE table_a_cass_df using org.apache.spark.sql.cassandra OPTIONS (table "table_a", keyspace "ks")

With DataFrames, compatibility issues exist with UUID and Inet types when inserting data with the
JDBC driver.


5. Use the Spark Cassandra Connector tuning parameters to optimize reads and writes.

6. To stop the server, enter the dse spark-sql-thriftserver stop command.

$ dse spark-sql-thriftserver stop

What's next:
You can now connect your application to the server at the URI jdbc:hive2://hostname:port number using the Simba JDBC driver, using the Simba ODBC driver, or using dse beeline.
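
As an illustration, a minimal Scala sketch that connects over JDBC and runs a query. It assumes the Hive JDBC driver is on the classpath; the host, port, credentials, and table name are placeholders:

import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")  // register the driver if it is not auto-loaded
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10001", "username", "password")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("SELECT * FROM ks.table_a_cass_df")
while (rs.next()) println(rs.getString(1))
conn.close()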

Using SparkR with DataStax Enterprise


Apache SparkR is a front-end for the R programming language for creating analytics applications. DataStax
Enterprise integrates SparkR to support creating data frames from DSE data.
SparkR support in DSE requires you to first install R on the client machines on which you will be using SparkR. To use R user-defined functions and distributed functions, the same version of R should be installed on all the nodes in the Analytics cluster. DSE SparkR is built against R version 3.1.1. Many Linux distributions install older versions of R by default.
For example, on Debian and Ubuntu clients:

$ sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/" >> /etc/apt/sources.list' \
  && gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9 \
  && gpg -a --export E084DAB9 | sudo apt-key add - \
  && sudo apt-get update && sudo apt-get install r-base

On RedHat and CentOS clients:

$ sudo yum install R

Starting SparkR
Start the SparkR shell using the dse command to automatically set the Spark session within R.

1. Start the R shell using the dse command.

$ dse sparkR

Using AlwaysOn SQL service


AlwaysOn SQL is a high availability service that responds to SQL queries from JDBC and ODBC applications.
By default, AlwaysOn SQL is disabled. It is built on top of the Spark SQL Thriftserver, but provides failover and
caching between instances so there is no single point of failure. AlwaysOn SQL provides enhanced security,
leveraging the same user management as the rest of DSE, executing queries to the underlying database as the
user authenticated to AlwaysOn SQL.
In order to run AlwaysOn SQL, you must have:

• A running datacenter with DSE Analytics nodes enabled.

• Enabled AlwaysOn SQL on every Analytics node in the datacenter.

• Modified the replication factor for all Analytics nodes, if necessary.

• Set the native_transport_address in cassandra.yaml to an IP address accessible by the AlwaysOn SQL clients. This address depends on your network topology and deployment scenario.

• Configured AlwaysOn SQL for security, if authentication is enabled.

Lifecycle Manager allows you to enable and configure AlwaysOn SQL in managed clusters.


When AlwaysOn SQL is enabled within an Analytics datacenter, all nodes within the datacenter must have
AlwaysOn SQL enabled. Use dsetool ring to find which nodes in the datacenter are Analytics nodes.

AlwaysOn SQL is not supported when using DSE Multi-Instance or other deployments with multiple DSE
instances on the same server.

The dse client-tool alwayson-sql command controls the server. The command works on the local
datacenter unless you specify the datacenter with the --dc option:

$ dse client-tool alwayson-sql --dc datacenter_name command

Enabling AlwaysOn SQL


Set enabled to true and uncomment the AlwaysOn SQL options in dse.yaml.
Configuring AlwaysOn SQL
The alwayson_sql_options section in dse.yaml, described in detail at AlwaysOn SQL options, has options
for setting the ports, timeout values, log location, and other Spark or Hive configuration settings. Additional
configuration options are located in spark-alwayson-sql.conf.
AlwaysOn SQL binds to the native_transport_address in cassandra.yaml.
If you have changed some configuration settings in dse.yaml while AlwaysOn SQL is running, you can have the
server pick up the new configuration by entering:

$ dse client-tool alwayson-sql reconfig

The following settings can be changed using reconfig:

• reserve_port_wait_time_ms

• alwayson_sql_status_check_wait_time_ms

• log_dsefs_dir

• runner_max_errors

Changing other options requires a restart, except for the enabled option. Enabling or disabling AlwaysOn
SQL requires restarting DSE.
The spark-alwayson-sql.conf file contains Spark and Hive settings as properties. When AlwaysOn SQL is
started, spark-alwayson-sql.conf is scanned for Spark properties, similar to other Spark applications started
with dse spark-submit. Properties that begin with spark.hive are submitted as properties using --hiveconf,
removing the spark. prefix.
For example, if spark-alwayson-sql.conf has the following setting:

spark.hive.server2.table.type.mapping CLASSIC

That setting will be converted to --hiveconf hive.server2.table.type.mapping=CLASSIC when AlwaysOn SQL is started.
Configuring AlwaysOnSQL in a DSE Analytics Solo datacenter
If AlwaysOn SQL is used in a DSE Analytics Solo datacenter, modify spark-alwayson-sql.conf to configure
Spark with the DSE Analytics Solo datacenters. In the following example, the transactional datacenter name is
dc0 and the DSE Analytics Solo datacenter is dc1.


Under spark.master, set the Spark URI to connect to the DSE Analytics Solo datacenter.

spark.master=dse://?connection.local_dc=dc1

Add the spark.cassandra.connection.local_dc property to spark-alwayson-sql.conf and set it to the name of the transactional datacenter.

spark.cassandra.connection.local_dc=dc0

Starting and stopping AlwaysOn SQL


If you have enabled AlwaysOn SQL, it will start when the cluster is started. If AlwaysOn SQL is enabled and
DSE is restarted, AlwaysOn SQL will be started regardless of the previous state of AlwaysOn SQL. You only
need to explicitly start the server if it has been stopped, for example for a configuration change.
To start AlwaysOn SQL service:

$ dse client-tool alwayson-sql start

To start the server on a specific datacenter, specify the datacenter name with the --dc option:

$ dse client-tool alwayson-sql --dc dc-west start

To completely stop AlwaysOn SQL service:

$ dse client-tool alwayson-sql stop

The server must be manually started after issuing a stop command.


To restart a running server:

$ dse client-tool alwayson-sql restart

Checking the status of AlwaysOn SQL


To find the status of AlwaysOn SQL, issue a status command using dse client-tool.

$ dse client-tool alwayson-sql status

You can also view the status in a web browser by going to http://node_name_or_IP_address:AlwaysOn_SQL_web_UI_port. By default, the port is 9077. For example, if 10.10.10.1 is the IP address of an Analytics node with AlwaysOn SQL enabled, navigate to http://10.10.10.1:9077.
The returned status is one of:

• RUNNING: the server is running and ready to accept client requests.

• STOPPED_AUTO_RESTART: the server is being started but is not yet ready to accept client requests.

• STOPPED_MANUAL_RESTART: the server was stopped with either a stop or restart command. If the server
was issued a restart command, the status will be changed to STOPPED_AUTO_RESTART as the server
starts again.

• STARTING: the server is actively starting up but is not yet ready to accept client requests.


Caching tables within Spark SQL queries


To increase performance, you can specify tables to be cached into RAM using the CACHE TABLE directive.
Permanent cached tables will be recached on server restart.
You can cache an existing table by issuing a CACHE TABLE Spark SQL command through a client:

CACHE TABLE keyspace_name.table_name;

CACHE TABLE keyspace_name.table_name AS select statement;

The temporary cache table is only valid for the session in which it was created, and will not be recreated on
server restart.
Create a permanent cache table using the CREATE CACHE TABLE directive and a SELECT query:

CREATE CACHE TABLE keyspace_name.table_name AS select_statement;

The table cache can be destroyed using the UNCACHE TABLE and CLEAR CACHE directives.

UNCACHE TABLE keyspace_name.table_name;

The CLEAR CACHE directive removes all cached tables.

CLEAR CACHE;

Issuing DROP TABLE will remove all metadata including the table cache.
Enabling SSL for AlwaysOn SQL
Communication between the driver and AlwaysOn SQL can be encrypted using SSL.
The following instructions give an example of how to set up SSL with a self-signed keystore and truststore.

1. Ensure client-to-node encryption is enabled and configured correctly.

2. If the SSL keystore and truststore used for AlwaysOn SQL differ from the keystore and truststore
configured in cassandra.yaml, add the required settings to enable SSL to the hive-site.xml configuration
file.

By default the SSL settings in cassandra.yaml will be used with AlwaysOn SQL.

<property>
<name>hive.server2.thrift.bind.host</name>
<value>hostname</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>true</value>
</property>
<property>
<name>hive.server2.keystore.path</name>
<value>path to keystore/keystore.jks</value>
</property>
<property>
<name>hive.server2.keystore.password</name>
<value>keystore password</value>
</property>

3. Start or restart the AlwaysOn SQL service.

Changes in the hive-site.xml configuration file only require a restart of AlwaysOn SQL service,
not DSE.

$ dse client-tool alwayson-sql start

4. Test the connection with Beeline.

$ dse beeline

beeline> !connect jdbc:hive2://hostname:10000/default;ssl=true;sslTrustStore=path to truststore/truststore.jks;trustStorePassword=truststore password

The JDBC URL for the Simba JDBC Driver is:

jdbc:spark://hostname:10000/default;SSL=1;SSLTrustStore=path to truststore/
truststore.jks;SSLTrustStorePwd=truststore password

Using authentication with AlwaysOn SQL


AlwaysOn SQL can be configured to use DSE authentication.
When DSE authentication is enabled, modify the hive-site.xml configuration file to enable JDBC authentication.
DSE supports configurations for password authentication and Kerberos authentication. The hive-site.xml
file has sections with preconfigured settings to use no authentication (the default), password authentication, or
Kerberos authentication. Uncomment the preferred authentication mechanism, then restart AlwaysOn SQL.

DSE supports multiple authentication mechanisms, but AlwaysOn SQL only supports one mechanism per
datacenter.

AlwaysOn SQL supports DSE proxy authentication. The user who executes the queries is the user who
authenticated using JDBC. If AlwaysOn SQL was started by user Amy, and then Bob begins a JDBC session,
the queries are executed by Amy on behalf of Bob. Amy must have permissions to execute these queries on
behalf of Bob.
To enable authentication in AlwaysOn SQL alwayson_sql_options, follow these steps.

1. Create the auth_user role specified in AlwaysOn SQL options and grant the following permissions to the
role.

CREATE ROLE alwayson_sql WITH LOGIN=true; // role name matches auth_user

// Required if scheme_permissions true
GRANT EXECUTE ON ALL AUTHENTICATION SCHEMES TO alwayson_sql;

// Spark RPC settings
GRANT ALL PERMISSIONS ON REMOTE OBJECT DseResourceManager TO alwayson_sql;
GRANT ALL PERMISSIONS ON REMOTE OBJECT DseClientTool TO alwayson_sql;
GRANT ALL PERMISSIONS ON REMOTE OBJECT AlwaysOnSqlRoutingRPC TO alwayson_sql;
GRANT ALL PERMISSIONS ON REMOTE OBJECT AlwaysOnSqlNonRoutingRPC TO alwayson_sql;

// Spark and DSE required table access
GRANT SELECT ON system.size_estimates TO alwayson_sql;
GRANT SELECT, MODIFY ON "HiveMetaStore".sparkmetastore TO alwayson_sql;
GRANT SELECT, MODIFY ON dse_analytics.alwayson_sql_cache_table TO alwayson_sql;
GRANT SELECT, MODIFY ON dse_analytics.alwayson_sql_info TO alwayson_sql;

// Permissions to create and change applications
GRANT CREATE, DESCRIBE ON ANY WORKPOOL TO alwayson_sql;
GRANT MODIFY, DESCRIBE ON ANY SUBMISSION TO alwayson_sql;

See Setting up DSE Spark application permissions for more details.

2. Create the user role.
For internal authentication:

CREATE ROLE 'user_name' WITH LOGIN = true;

If you use Kerberos, set up a role that matches the full Kerberos principal name for each user.

CREATE ROLE 'user_name/example.com@EXAMPLE.COM' WITH LOGIN = true;

3. Grant permissions to access keyspaces and tables to the user role.
For internal roles:

GRANT SELECT ON KEYSPACE keyspace_name TO 'user_name';

For Kerberos roles:

GRANT SELECT ON KEYSPACE keyspace_name TO 'user_name/example.com@EXAMPLE.COM';

4. Allow the AlwaysOn SQL role (auth_user) to execute commands with the user role.
For internal roles:

GRANT PROXY.EXECUTE
ON ROLE 'user_name'
TO alwayson_sql;

For Kerberos roles:

GRANT PROXY.EXECUTE
ON ROLE 'user_name/example.com@EXAMPLE.COM'
TO alwayson_sql;

5. Open the hive-site.xml configuration file in an editor.

6. Uncomment and modify the authentication mechanism used in hive-site.xml.

• If password authentication is used, enable password authentication in DSE.

• If Kerberos authentication is to be used, Kerberos does not need to be enabled in DSE. AlwaysOn
SQL must have its own service principal and keytab.

• The user must have login permissions in DSE in order to login through JDBC to AlwaysOn SQL.


This example shows how to enable Kerberos authentication. Modify the Kerberos domain and path to the
keytab file.

<!-- Start of: configuration for authenticating JDBC users with Kerberos -->
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>

<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>

<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hiveserver2/_HOST@KERBEROS DOMAIN</value>
</property>

<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>path to hiveserver2.keytab</value>
</property>
<!-- End of: configuration for authenticating JDBC users with Kerberos -->

7. Modify the owner of the /spark and /tmp/hive directories in DSEFS so the new role can write to the log
and temp files.

$ dse fs 'chown -R -u alwayson_sql -g alwayson_sql /spark'

$ dse fs 'chown -R -u alwayson_sql -g alwayson_sql /tmp/hive'

8. Restart AlwaysOn SQL.

$ dse client-tool alwayson-sql restart

Simba JDBC Driver for Apache Spark


The Simba JDBC Driver for Spark provides a standard JDBC interface to the information stored in DataStax
Enterprise with AlwaysOn SQL running.
See Installing Simba JDBC Driver for Apache Spark.
Simba ODBC Driver for Apache Spark
The Simba ODBC Driver for Spark provides users access to DataStax Enterprise (DSE) clusters with a
running AlwaysOn SQL. The driver is compliant with the latest ODBC 3.52 specification and automatically
translates any SQL-92 query into Spark SQL.
See Installing Simba ODBC Driver for Apache Spark (https://docs.datastax.com/en/driver-matrix/doc/driver_matrix/common/installSimbaODBCdriver.html).
Connecting to AlwaysOn SQL server using Beeline
You can use the Beeline shell to test AlwaysOn SQL.

1. Start AlwaysOn SQL.


2. Start the Beeline shell.

$ dse beeline

3. Connect to the server using the JDBC URI for your server.

beeline> !connect jdbc:hive2://localhost:10000

4. Connect to a keyspace and run a query from the Beeline shell.

0: jdbc:hive2://localhost:10000> use test;
0: jdbc:hive2://localhost:10000> select * from test;

Accessing DataStax Enterprise data from external Spark clusters


DataStax Enterprise works with external Spark clusters in a bring-your-own-Spark (BYOS) model.
Overview of BYOS support in DataStax Enterprise
BYOS support in DataStax Enterprise consists of a JAR file and a generated configuration file that provides
all the necessary classes and configuration settings for connecting to a particular DataStax Enterprise cluster
from an external Spark cluster. To specify a different classpath to accommodate applications originally written
for open source Apache Spark, specify the -framework option with dse spark commands.
All DSE resources, including DSEFS file locations, can be accessed from the external Spark cluster.
BYOS is tested against the version of Spark integrated into DSE (described in the DataStax Enterprise 6.0
release notes) and the following Spark distributions:

• Hortonworks Data Platform (HDP) 2.5

• Cloudera CDH 5.10

Generating the BYOS configuration file


The byos.properties file is used to connect to a DataStax Enterprise cluster from a Spark cluster. The
configuration file contains connection information about the DataStax Enterprise cluster. This file must
be generated on a node in the DataStax Enterprise cluster. You can specify an arbitrary name for the
generated configuration file. The byos.properties name is used throughout the documentation to refer to this
configuration file.
Prerequisites:
If you are using Graph OLAP queries with BYOS, increase the max_concurrent_sessions setting in your
cluster to 120.

1. Connect to a node in your DataStax Enterprise cluster.

2. Generate the byos.properties file using the dse client-tool command.

$ dse client-tool configuration byos-export ~/byos.properties

This will generate the byos.properties file in your home directory. See dse client-tool for more
information on the options for dse client-tool.

What's next:
The byos.properties file can be copied to a node in the external Spark cluster and used with the Spark shell,
as described in Connecting to DataStax Enterprise using the Spark shell on an external Spark cluster.
Connecting to DataStax Enterprise using the Spark shell on an external Spark cluster
Use the generated byos.properties configuration file and the byos-version.jar from a DataStax Enterprise
node to connect to the DataStax Enterprise cluster from the Spark shell on an external Spark cluster.


Prerequisites:
You must generate the byos.properties on a node in your DataStax Enterprise cluster.

1. Copy the byos.properties file you previously generated from the DataStax Enterprise node to the local
Spark node.

$ scp user@dsenode1.example.com:~/byos.properties .

If you are using Kerberos authentication, specify the --generate-token and --token-renewer
<username> options when generating byos.properties, as described in dse client-tool configuration
byos-export.

2. Copy the byos-version.jar file from the clients directory from a node in your DataStax Enterprise cluster
to the local Spark node.
The byos-version.jar file location depends on the type of installation.

$ scp user@dsenode1.example.com:/usr/share/dse/clients/dse-byos_2.11-6.0.2.jar
byos-6.0.jar

3. Merge external Spark properties into byos.properties.

$ cat ${SPARK_HOME}/conf/spark-defaults.conf >> byos.properties

4. If you are using Kerberos authentication, set up a CRON job or other task scheduler to periodically call
dse client-tool cassandra renew-token <token> where <token> is the encoded token string in
byos.properties.

5. Start the Spark shell using the byos.properties and byos-version.jar file.

$ spark-shell --jars byos-6.0.jar --properties-file byos.properties
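
Once the shell is up, DSE tables are reachable through the same Spark Cassandra Connector API shown earlier; a quick sketch, assuming a test.words table exists in the DSE cluster:

import org.apache.spark.sql.cassandra._

val words = spark.read.cassandraFormat("words", "test").load
words.count()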

Generating Spark SQL schema files


Spark SQL can import schema files generated by DataStax Enterprise.

1. Export the schema file using dse client-tool.

$ dse client-tool --use-server-config spark sql-schema --all > output.sql

2. Copy the schema to an external Spark node.

$ scp output.sql user@sparknode1.example.com:

3. On a Spark node, import the schema using Spark.

$ spark-sql --jars byos-6.0.jar --properties-file byos.properties -f output.sql

Starting Spark SQL Thrift Server with Kerberos


Spark SQL Thrift Server is a long running service and must be configured to start with a keytab file if Kerberos
is enabled. The user principal must be added to DSE, and Spark SQL Thrift Server restarted with the
generated BYOS configuration file and byos-version.jar.
Prerequisites:
These instructions are for the Spark SQL Thrift Server included in HortonWorks 2.4. The Hadoop Spark SQL
Thrift Server principal is hive/_HOST@REALM.


1. Create the principal on the DSE node using cqlsh.

create user 'hive/spark_sql_thrift_server_host@REALM';

2. Login as the hive user on the Spark SQL Thrift Server host.

3. Create a ~/.java.login.config file with a JAAS Kerberos configuration.

4. Merge the existing Spark SQL Thrift Server configuration properties with the generated BYOS
configuration file into a new file.

$ cat /usr/hdp/current/spark-thriftserver/conf/spark-thrift-sparkconf.conf
byos.properties > custom-sparkconf.conf

5. Start Spark SQL Thrift Server with the custom configuration file and byos-version.jar.

$ /usr/hdp/2.4.2.0-258/spark/sbin/start-thriftserver.sh --jars byos-version.jar --properties-file custom-sparkconf.conf

6. Connect using the Beeline client.

$ beeline -u 'jdbc:hive2://hostname:port/default;principal=hive/_HOST@REALM'

What's next:
Generated SQL schema files can be passed to beeline with the -f option to generate a mapping for DSE
tables so both Hadoop and DataStax Enterprise tables will be available through the service for queries.
Using the Spark Jobserver
DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component
for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. Refer to the
Components in the release notes to find the version of the Spark Jobserver included in this version of DSE.
Valid spark-submit options are supported and can be applied to the Spark Jobserver. To use the Jobserver:

• Start the job server:

$ dse spark-jobserver start [any_spark_submit_options]

• Stop the job server:

$ dse spark-jobserver stop

The default location of the Spark Jobserver depends on the type of installation:

• Package installations: /usr/share/dse/spark/spark-jobserver

• Tarball installations: installation_location/resources/spark/spark-jobserver

All the uploaded JARs, temporary files, and log files are created in the user's $HOME/.spark-jobserver
directory, first created when starting Spark Jobserver.
Beneficial use cases for the Spark Jobserver include sharing cached data, repeated queries of cached data,
and faster job starts.

Running multiple SparkContext instances in a single JVM is not recommended. Therefore it is not
recommended to create a new SparkContext for each submitted job in a single Spark Jobserver instance.
We recommend one of the two following Spark Jobserver usages.

• Persistent Context Mode: a single pre-created SparkContext shared by all jobs.

• Context per JVM: each job has its own SparkContext in a separate JVM.


By default, the H2 database is used for storing Spark Jobserver related metadata. In this setup, using
Context per JVM requires additional configuration. See the Spark Jobserver docs for details.

In Context per JVM mode, job results must not contain instances of classes that are not present in the Spark Jobserver classpath. Problems with returning types unknown to the server can be recognized by the following log line:

Association with remote system [akka.tcp://JobServer@127.0.0.1:45153]
has failed, address is now gated for [5000] ms.
Reason: [<unknown type name is placed here>]

See the Spark Jobserver documentation for configuration details.

For an example of how to create and submit an application through the Spark Jobserver, see the spark-
jobserver demo included with DSE.
The default location of the demos directory depends on the type of installation:

• Package installations: /usr/share/dse/demos

• Tarball installations: installation_location/demos
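
For orientation, a minimal sketch of a job written against the open-source Spark Jobserver job API; the trait and package names depend on the bundled Jobserver version, so treat this as illustrative rather than a copy of the bundled demo:

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object WordCountJob extends SparkJob {
  // accept any job configuration; real jobs should validate their inputs here
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  // count word occurrences passed in the (hypothetical) input.string job parameter
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq).countByValue()
}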

Enabling SSL communication with Jobserver


To enable SSL encryption when connecting to Jobserver, you must have a server certificate, and a truststore
containing the certificate. Add the following configuration section to the dse.conf file in the Spark Jobserver
directory.

spray.can.server {
ssl-encryption = on
keystore = "path to keystore"
keystorePW = "keystore password"
}

The default location of the Spark Jobserver depends on the type of installation:

• Package installations: /usr/share/dse/spark/spark-jobserver

• Tarball installations: installation_location/resources/spark/spark-jobserver

Restart the Jobserver after saving the configuration changes.


DSEFS (DataStax Enterprise file system)
DSEFS is the default distributed file system on DSE Analytics nodes.
About DSEFS
DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system within
DataStax Enterprise. It is designed for use cases that need to leverage a distributed file system for data
ingestion, data staging, and state management for Spark Streaming applications (such as checkpointing or
write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment complexity and single point of
failure typical of HDFS. DSEFS is HDFS-compatible and is designed to work in place of HDFS in Spark and
other systems.
DSEFS is the default distributed file system in DataStax Enterprise, and is automatically enabled on all analytics
nodes.
DSEFS stores file metadata (such as file path, ownership, permissions) and file contents separately:

• Metadata is stored in the database.

• File data blocks are stored locally on each node and are replicated onto multiple nodes.


The redundancy factor is set at the DSEFS directory or file level, which is more granular than the
replication factor that is set at the keyspace level in the database.

For performance on production clusters, store the DSEFS data on physical devices that are separate from
the database. For development and testing you may store DSEFS data on the same physical device as the
database.
Deployment overview

• The DSEFS server runs in the same JVM as DataStax Enterprise. Similar to the database, there is no
master node. All nodes running DSEFS are equal.

• A single DSEFS cannot span multiple datacenters. To deploy DSEFS in multiple datacenters, you can
create a separate instance of DSEFS for each datacenter.

• You can use different keyspaces to configure multiple DSEFS file systems in a single datacenter.

• For optimal performance, locate the local DSEFS data on a different physical drive than the database.

• Encryption is not supported. Use operating system access controls to protect the local DSEFS data
directories. Other limitations apply.

• DSEFS uses the LOCAL_QUORUM consistency level to store file metadata. DSEFS will always try to write
each data block to replicated node locations, and even if a write fails, it will retry to another node before
acknowledging the write. DSEFS writes are very similar to the ALL consistency level, but with additional
failover to provide high-availability. DSEFS reads are similar to the ONE consistency level.

Enabling DSEFS
DSEFS is automatically enabled on analytics nodes, and disabled on non-analytics nodes. You can enable the
DSEFS service on any node in a DataStax Enterprise cluster. Nodes within the same datacenter with DSEFS
enabled will join together to behave as a DSEFS cluster.

On each node:

1. In the dse.yaml file, set the properties for the DSE File System options:

dsefs_options:
enabled:
keyspace_name: dsefs
work_dir: /var/lib/dsefs
public_port: 5598
private_port: 5599
data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
min_free_space: 5368709120

a. Enable DSEFS:

enabled: true

If enabled is blank or commented out, DSEFS starts only if the node is configured to run analytics
workloads.

b. Define the keyspace for storing the DSEFS metadata:

keyspace_name: dsefs

You can optionally configure multiple DSEFS file systems in a single datacenter.


c. Define the work directory for storing the DSEFS metadata for the local node. The work directory
should not be shared with other DSEFS nodes:

work_dir: /var/lib/dsefs

d. Define the public port on which DSEFS listens for clients:

public_port: 5598

DataStax recommends that all nodes in the cluster have the same value. Firewalls must open
this port to trusted clients. The service on this port is bound to the native_transport_address.

e. Define the private port for DSEFS inter-node communication:

private_port: 5599

Do not open this port to firewalls; this private port must be not visible from outside of the
cluster.

f. Set the data directories where the file data blocks are stored locally on each node.

data_directories:
- dir: /var/lib/dsefs/data

If you use the default /var/lib/dsefs/data data directory, verify that the directory exists and
that you have root access. Otherwise, you can define your own directory location, change the
ownership of the directory, or both:

$ sudo mkdir -p /var/lib/dsefs/data; sudo chown -R $USER:$GROUP /var/lib/dsefs/data

Ensure that the data directory is writeable by the DataStax Enterprise user. Put the data
directories on different physical devices than the database. Using multiple data directories on
JBOD improves performance and capacity.

g. For each data directory, set the weighting factor to specify how much data to place in this directory,
relative to other directories in the cluster. This soft constraint determines how DSEFS distributes
the data. For example, a directory with a value of 3.0 receives about three times more data than a
directory with a value of 1.0.

data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0

h. For each data directory, define the reserved space, in bytes, to not use for storing file data blocks.
See min_free_space.

data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
min_free_space: 5368709120

2. Restart the node.


3. Repeat steps for the remaining nodes.

4. With guidance from DataStax Support, you can tune advanced DSEFS properties:

# service_startup_timeout_ms: 30000
# service_close_timeout_ms: 600000
# server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
# compression_frame_max_size: 1048576
# query_cache_size: 2048
# query_cache_expire_after_ms: 2000
# gossip_options:
# round_delay_ms: 2000
# startup_delay_ms: 5000
# shutdown_delay_ms: 10000
# rest_options:
# request_timeout_ms: 330000
# connection_open_timeout_ms: 55000
# client_close_timeout_ms: 60000
# server_request_timeout_ms: 300000
# idle_connection_timeout_ms: 60000
# internode_idle_connection_timeout_ms: 120000
# core_max_concurrent_connections_per_host: 8
# transaction_options:
# transaction_timeout_ms: 3000
# conflict_retry_delay_ms: 200
# conflict_retry_count: 40
# execution_retry_delay_ms: 1000
# execution_retry_count: 3
# block_allocator_options:
# overflow_margin_mb: 1024
# overflow_factor: 1.05

5. Continue with using DSEFS.
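
Once DSEFS is running, Spark can address it like any Hadoop-compatible file system through dsefs: URIs. A minimal Spark shell sketch, with illustrative paths:

val events = spark.read.json("dsefs:///data/events.json")   // hypothetical input file
events.write.parquet("dsefs:///data/events.parquet")        // hypothetical output path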

Disabling DSEFS
To disable DSEFS and remove metadata and data:

1. Remove all directories and files from the DSEFS file system:

$ dse fs rm -r filepath

2. Wait a while for all nodes to perform the delete operations.

3. Verify that all DSEFS data directories where the file data blocks are stored locally on each node are empty.
These data directories are configured in dse.yaml. Your directories are probably different from this
default data_directories value:

data_directories:
- dir: /var/lib/dsefs/data

4. Disable the DSEFS entries in all dse.yaml files on all nodes.

5. Restart DataStax Enterprise.

6. Truncate all of the tables in the dsefs keyspace.


Do not remove the dsefs keyspace. If you inadvertently removed the dsefs keyspace, you must
specify a different keyspace name in dse.yaml or create an empty dsefs keyspace (this empty dsefs
keyspace will be populated with tables during DSEFS start up).


Do not delete the data_directories before removing the dsefs keyspace tables, or removing the
node from the cluster.

Configuring DSEFS
You must configure data replication. You can optionally configure multiple DSEFS file systems in a datacenter,
and perform other functions, including setting the Kafka log retention.
DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the
steps below.
DSEFS limitations
Know these limitations when you configure and tune DSEFS. The following functionality and features are not
supported:

• Encryption.
Use operating system access controls to protect the local DSEFS data directories.

• File system consistency checks (fsck) and file repair have only limited support. Running fsck will re-
replicate blocks that were under-replicated because a node was taken out of a cluster.

• File repair.

• Forced rebalancing, although the cluster will eventually reach balance.

• Checksum.

• Automatic backups.

• Multi-datacenter replication.

• Symbolic links (soft links, symlinks) and hardlinks.

• Snapshots.

1. Configure replication for the metadata and the data blocks.


You must set the replication factor appropriately to prevent data loss in the case of node failure.
Replication factors must be set for both the metadata and the data blocks. The replication factor of 3 for
data blocks is suitable for most use-cases.

a. Globally: set replication for the metadata in the dsefs keyspace that is stored in the database.
For example, use a CQL statement to configure a replication factor of 3 on the Analytics datacenter using NetworkTopologyStrategy:

ALTER KEYSPACE dsefs
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'Analytics': '3'};

Datacenter names are case-sensitive. Verify the case using a utility such as dsetool status.

b. Run nodetool repair on the DSEFS keyspace.

$ nodetool repair dsefs

c. Locally: set the redundancy factor on a specific DSEFS file or directory where the data blocks are
stored.


For example, use the command line:

$ dse fs mkdir -n 4 newdirectory

When a redundancy factor is not specified, it is inherited from the parent directory. The default
redundancy factor is 3.

2. If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within
its own datacenter:

a. In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.
For example, on a cluster with logical datacenters DC1 and DC2.
On each node in DC1:

dsefs_options:
...
keyspace_name: dsefs1

On each node in DC2:

dsefs_options:
...
keyspace_name: dsefs2

b. Restart the nodes.

c. Alter the keyspace replication to exist only on the specific datacenters.


On DC1:

ALTER KEYSPACE dsefs1
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '3'};

On DC2:

ALTER KEYSPACE dsefs2
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'DC2': '3'};

d. Run nodetool repair on the DSEFS keyspace.

$ nodetool repair dsefs

For example, in a cluster with multiple datacenters, the keyspace names dsefs1 and dsefs2 define
separate file systems in each datacenter.

3. When bouncing a streaming application, verify the Kafka log configuration (especially the log.retention.check.interval.ms and log.retention.bytes policies). Ensure the Kafka log retention policy is robust enough to handle the length of time expected to bring the application and consumers back up.
For example, if the log retention policy deletes or rolls the logs very frequently to save disk space, users are likely to encounter issues when attempting to recover from a checkpoint that references offsets that are no longer maintained by the Kafka logs.


DSEFS command line tool


The DSEFS functionality supports operations including uploading, downloading, moving, and deleting files,
creating directories, and verifying the DSEFS status.
DSEFS commands are available only in the logical datacenter. DSEFS works with secured and unsecured
clusters, see DSEFS authentication.
You can interact with the DSEFS file system in several modes:

• Interactive command line shell.

To start DSEFS and launch the DSE FS shell:

$ dse fs

• As part of dse commands.

• With a REST API.

Configuring DSEFS shell logging


The default location of the DSEFS shell log file .dsefs-shell.log is the user home directory. The default
log level is INFO. To configure DSEFS shell logging, edit the installation_location/resources/dse/conf/
logback-dsefs-shell.xml file.

Using with the dse command line


Precede the DSEFS command with dse fs:

$ dse [dse_auth_credentials] fs dsefs_command [options]

For example, to list the file system status and disk space usage in human-readable format:

$ dse -u user1 -p mypassword fs "df -h"

Optional command arguments are enclosed in square brackets. For example, [dse_auth_credentials] and [-R].
Variable values are italicized. For example, directory and [subcommand].
Working with the local file system in the DSEFS shell
You can refer to files in the local file system by prefixing paths with file:. For example the following command
will list files in the system root directory:

dsefs dsefs://127.0.0.1:5598/ > ls file:/
bin cdrom dev home lib32 lost+found mnt proc run srv tmp var initrd.img.old vmlinuz.old
boot data etc lib lib64 media opt root sbin sys usr initrd.img vmlinuz

If you need to perform many subsequent operations on the local file system, first change the current working
directory to file: or any local file system path:

dsefs dsefs://127.0.0.1:5598/ > cd file:
dsefs file:/home/user1/path/to/local/files > ls
conf src target build.sbt
dsefs file:/home/user1/path/to/local/files > cd ..
dsefs file:/home/user1/path/to/local >

DSEFS shell remembers the last working directory of each file system separately. To go back to the previous
DSEFS directory, enter:

dsefs file:/home/user1/path/to/local/files > cd dsefs:
dsefs dsefs://127.0.0.1:5598/ >

To go back again to the previous local directory:

dsefs dsefs://127.0.0.1:5598/ > cd file:
dsefs file:/home/user1/path/to/local/files >

To refer to a path relative to the last working directory of the file system, prefix a relative path with either dsefs:
or file:. The following session will create a directory new_directory in the directory /home/user1:

dsefs dsefs://127.0.0.1:5598/ > cd file:/home/user1
dsefs file:/home/user1 > cd dsefs:
dsefs dsefs://127.0.0.1:5598/ > mkdir file:new_directory
dsefs dsefs://127.0.0.1:5598/ > realpath file:new_directory
file:/home/user1/new_directory
dsefs dsefs://127.0.0.1:5598/ > stat file:new_directory
DIRECTORY file:/home/user1/new_directory:
Owner user1
Group user1
Permission rwxr-xr-x
Created 2017-01-15 13:10:06+0200
Modified 2017-01-15 13:10:06+0200
Accessed 2017-01-15 13:10:06+0200
Size 4096

To copy a file between two different file systems, you can also use the cp command with explicit file system
prefixes in the paths:

dsefs file:/home/user1/test > cp dsefs:archive.tgz another-archive-copy.tgz
dsefs file:/home/user1/test > ls
another-archive-copy.tgz archive-copy.tgz archive.tgz

Authentication
For dse_auth_credentials, you can provide user credentials in several ways; see Providing credentials from DSE tools. For authentication with DSEFS, see DSEFS authentication.
Wildcard support
Some DSEFS commands support wildcard pattern expansion in the path argument. Path arguments containing
wildcards are expanded before method invocation into a set of paths matching the wildcard pattern, then the
given method is invoked for each expanded path.
For example in the following directory tree:

dirA
|--dirB
|--file1
|--file2

Giving the stat dirA/* command would be transparently translated into three invocations: stat dirA/dirB,
stat dirA/file1, and stat dirA/file2.


DSEFS supports the following wildcard patterns:

• * matches any file system entry (file or directory) name, as in the example of stat dirA/*.

• ? matches any single character in the file system entry name. For example stat dirA/dir? matches
dirA/dirB.

• [] matches any characters enclosed within the brackets. For example stat dirA/file[0123] matches
dirA/file1 and dirA/file2.

• {} matches any sequence of characters enclosed within the brackets and separated with ,. For example
stat dirA/{dirB,file2} matches dirA/dirB and dirA/file2.

There are no limitations on the number of wildcard patterns in a single path.
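
Wildcard patterns can also be combined in a single path. For example, against the dirA tree above, the following
hypothetical invocation expands to stat dirA/file1 and stat dirA/file2 (the prompt is illustrative):

dsefs dsefs://127.0.0.1:5598/ > stat dirA/file[12]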


Executing multiple commands
The DSEFS shell can execute multiple commands from a single invocation. Enclose each command and its
arguments in quotes; each quoted command is executed separately by DSEFS.

$ dse fs 'cat file1 file2 file3 file4' 'ls dir1'

Forcing synchronization
Before confirming a file write, DSEFS by default forces all blocks of the file to be written to the storage
devices. This behavior can be controlled with the --no-force-sync and --force-sync flags when creating files
or directories in the DSEFS shell with the mkdir, put, and cp commands. If not specified, the force/no-force
behavior is inherited from the parent directory. For example, if a directory is created with --no-force-sync, then
all files in it are created with --no-force-sync unless --force-sync is explicitly set during file creation.
Turning off forced synchronization improves latency and performance at a cost of durability. For example,
if a power loss occurs before writing the data to the storage device, you may lose data. Turn off forced
synchronization only if you have a reliable backup power supply in your datacenter and failure of all replicas is
unlikely, or if you can afford losing file data.
The Hadoop SYNC_BLOCK flag has the same effect as --force-sync in DSEFS. The Hadoop LAZY_PERSIST
flag has the same effect as --no-force-sync in DSEFS.
Removing a DSEFS node
When removing a node running DSEFS from a DSE cluster, additional steps are needed to keep the DSEFS
data set consistent.

Make sure the replication factor for the cluster is greater than one before continuing.

1. From a node in the same datacenter as the node to be removed, start the DSEFS shell.

$ dse fs

2. Show the current DSEFS nodes with the df command.

dsefs > df

Location                             Status DC             Rack  Host              Address       Port Directory           Used Free        Reserved
144e587c-11b1-4d74-80f7-dc5e0c744aca up     GraphAnalytics rack1 node1.example.com 10.200.179.38 5598 /var/lib/dsefs/data 0    29289783296 5368709120
98ca0435-fb36-4344-b5b1-8d776d35c7d6 up     GraphAnalytics rack1 node2.example.com 10.200.179.39 5598 /var/lib/dsefs/data 0    29302099968 5368709120

3. Find the node to be removed in the list and note the UUID value for it under the Location column.

4. If the node is up, unmount it from DSEFS with the command umount UUID.

dsefs > umount 98ca0435-fb36-4344-b5b1-8d776d35c7d6

5. If the node is not up (for example, after a hardware failure), force unmount it from DSEFS with the
command umount -f UUID.

dsefs > umount -f 98ca0435-fb36-4344-b5b1-8d776d35c7d6

6. Run a file system check with the fsck command to make sure all blocks are replicated.

dsefs > fsck

7. Continue with the normal steps for removing a node.

If data was written to a DSEFS node, more nodes were added to the cluster, and the original node was
removed without running fsck, the data in the original node may be permanently lost.

Removing old DSEFS directories


If you have changed the DSEFS data directory and the old directory is still visible, remove it using the umount
command.

1. Start the DSEFS shell as a role with superuser privileges.

$ dse fs

2. Show the current DSEFS nodes with the df command.

dsefs > df

3. Find the directory to be removed in the list and note the UUID value for it under the Location column.

4. Unmount it from DSEFS with the command umount UUID.

dsefs > umount 98ca0435-fb36-4344-b5b1-8d776d35c7d6

5. Run a file system check with the fsck command to make sure all blocks are replicated.

dsefs > fsck

If the file system check results in an IOException, make sure all the nodes in the cluster are running.
Examples
Using the DSEFS shell, these commands put the local bluefile to the remote DSEFS greenfile:

dsefs / > ls -l


dsefs / > put file:/bluefile greenfile

To view the new file in the DSEFS directory:

dsefs / > ls -l
Type Permission Owner Group Length Modified Name

file rwxrwxrwx none none 17 2016-05-11 09:34:26+0000 greenfile

Using the dse command, these commands create the test2 directory and upload the local README.md file to the
new DSEFS directory.

$ dse fs "mkdir /test2" && dse fs "put README.md /test2/README.md"

To view the new directory listing:

$ dse fs "ls -l /test2"

Type Permission Owner Group Length Modified Name


file rwxrwxrwx none none 3382 2016-03-07 23:20:34+0000 README.md

You can use two or more dse commands in a single command line. This is faster because the JVM is launched,
and the connection with DSEFS established and closed, only once. For example:

$ dse fs "mkdir /test2" "put README.md /test2/README.md"

The following example shows how to use the --no-force-sync flag on a directory, and how to check the state
of the --force-sync flag using stat. These commands are run from within the DSEFS shell.

dsefs> mkdir --no-force-sync /tmp


dsefs> put file:some-file.dat /tmp/file.tmp
dsefs> stat /tmp/file.tmp
FILE dsefs://127.0.0.1:5598/tmp/file.tmp
Owner none
Group none
Permission rwxrwxrwx
Created 2017-03-06 17:54:35+0100
Modified 2017-03-06 17:54:35+0100
Accessed 2017-03-06 17:54:35+0100
Size 1674
Block size 67108864
Redundancy 3
Compressed false
Encrypted false
Forces sync false
Comment

DSEFS compression
DSEFS is able to compress files to save storage space and bandwidth. Compression is performed by DSE
during upload upon a user’s explicit request. Decompression is transparent. Data is always uncompressed by
the server before it is returned to the client.
Compression is performed within block boundaries. The unit of compression—the chunk of data that gets
compressed individually—is called a frame and its size can be specified during file upload.


Encoders
DSEFS is shipped with the lz4 encoder which works out of the box.
Compression
To compress files, use the -c or --compression-encoder parameter of the put or cp command. The parameter
specifies the compression encoder to use for the file that is about to be uploaded.

dsefs / > put -c lz4 file /path/to/file

The frame size can optionally be set with the -f, --compression-frame-size option.
The maximum frame size in bytes is set in the compression_frame_max_size option in dse.yaml. If a user
sets the frame size to a value greater than compression_frame_max_size when using put -f, an error is
thrown and the command fails. Modify the compression_frame_max_size setting based on the available
memory of the node.
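
For example, a hypothetical upload that compresses with lz4 using a 1 MiB (1048576-byte) frame; the value is
illustrative and must not exceed compression_frame_max_size:

dsefs / > put -c lz4 -f 1048576 file /path/to/file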
Files that are compressed can be appended to in the same way as uncompressed files. If the file is compressed,
the appended data is transparently compressed with the encoder specified for the initial put operation.
Directories can have a default compression encoder specified during directory creation with the mkdir
command. Files newly added with the put command inherit the default compression encoder from the containing
directory. You can override the default compression encoder with the -c parameter during put operations.

dsefs / > mkdir -c lz4 /some/path

Decompression
Decompression is performed automatically for all commands that transport data to the client. There is no need
for additional configuration to retrieve the original, decompressed file content.
Storage space
Enabling compression creates a distinction between the logical and physical file size.
The logical size is the size of a file before uploading it to DSEFS, where it is then compressed. The logical size
is shown by the stat command under Size.

dsefs dsefs://10.0.0.1:5598/ > stat /tmp/wikipedia-sample.bz2


FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner none
Group none
Permission rwxrwxrwx
Created 2017-04-06 20:06:21+0000
Modified 2017-04-06 20:06:21+0000
Accessed 2017-04-06 20:06:21+0000
Size 7723180
Block size 67108864
Redundancy 3
Compressed true
Encrypted false
Comment

The physical size is the actual size of the data stored on the storage device. The physical size is shown by the df
command, and by the stat -v command for each block separately under the Compressed length column.
Limitations
Truncating compressed files is not possible.
DSEFS authentication
DSEFS works with secured DataStax Enterprise clusters.


For related SSL details, see Enabling SSL encryption for DSEFS.

DSEFS authentication with secured clusters


Authentication is required only when it is enabled in the cluster; authentication is off by default. DSEFS on
secured clusters requires the DseAuthenticator; see Configuring DSE Unified Authentication.
DSEFS supports authentication using DSE Unified Authentication, and supports all authentication schemes
supported by DSE Authenticator, including Kerberos.
DSEFS authentication can secure client to server communication.
Spark applications
For Spark applications, provide authentication credentials in one of these ways:

• Set with the dse spark-submit command using one of the credential options described in Providing
credentials on command line.

• Programmatically set the user credentials in the Spark configuration object before the SparkContext is
created:

conf.set("spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username", <user>)
conf.set("spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password", <pass>)

If a Kerberos authentication token is in use, you do not need to set any properties in the context object. If
you need to explicitly set the token, set the spark.hadoop.cassandra.auth.token property.

• When running the Spark Shell, where the SparkContext is created at startup, set the properties in the
Hadoop configuration object:

sc.hadoopConfiguration.set("com.datastax.bdp.fs.client.authentication.basic.username", <user>)
sc.hadoopConfiguration.set("com.datastax.bdp.fs.client.authentication.basic.password", <pass>)

Note the absence of the spark.hadoop prefix.

• When running a Spark application or the Spark Shell, provide the properties in the Hadoop configuration
file (core-default.xml):

<property>
<name>com.datastax.bdp.fs.client.authentication.basic.username</name>
<value>username</value>
</property>
<property>
<name>com.datastax.bdp.fs.client.authentication.basic.password</name>
<value>password</value>
</property>

Optional: If you want to use this method but do not have privileges to write to core-default.xml, copy
the file to another location and set the HADOOP2_CONF_DIR environment variable to point to it with:

export HADOOP2_CONF_DIR=path

DSEFS shell
Providing authentication credentials in the DSEFS shell works the same way as in other DSE tools. The DSEFS
shell supports the authentication methods listed below, in priority order. When more than one method
can be used, the one with the higher priority is chosen. For example, when the DSE_TOKEN environment variable
is set and the DSEFS shell is also started with a username and password set as environment variables in the
$HOME/.dserc file, the provided username and password are used for authentication because they have higher priority.

1. Specifying a username and password.

• Providing credentials on command line

• Providing credentials in a file

• Providing credentials using environment variable

$ export DSE_USERNAME=username && export DSE_PASSWORD=password

$ dse fs 'mkdir /dir1'

2. Using a Kerberos delegation token. See dse client-tool cassandra for further information.

$ export DSE_TOKEN=`dse -u token_user -p password client-tool cassandra generate-token`

$ dse fs 'mkdir /dir1'

3. Using a cached Kerberos ticket after authenticating using a tool like kinit.

$ kinit username

$ dse fs 'mkdir /dir1'

4. Using a Kerberos keytab file and a login configuration file.


If the configuration file is in a non-default location, specify the location using the
java.security.auth.login.config property in the DSEFS_SHELL_OPTS variable:

$ DSEFS_SHELL_OPTS="-Djava.security.auth.login.config=path to login config file" dse fs
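
A minimal sketch of such a login configuration file, assuming Kerberos through the JDK's Krb5LoginModule;
the section name, keytab path, and principal are illustrative and must match your environment:

DseClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/path/to/user.keytab"
    principal="user@EXAMPLE.COM";
};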

DSEFS authorization
DSEFS authorization verifies user and group permissions on files and directories stored in DSEFS.
DSEFS authorization is disabled by default. It requires no configuration; it is automatically enabled along with
DSE authorization.

For related SSL details, see Enabling SSL encryption for DSEFS.

Owners, groups, and permissions


In unsecured clusters with DSEFS authentication disabled, all newly created files and directories are created
with both the owner and the group set to none. In unsecured clusters, every DSEFS user has full access to every
file and directory.

dsefs dsefs://127.0.0.1:5598/ > ls -l


Type Permission Owner Group Length Modified Name

dir rwxrwxrwx none none - 2016-12-01 15:50:49+0100 some_dir

In secured clusters with DSEFS authentication enabled, all newly created files and directories are created with
the owner set to the authenticated user's username and the group set to the authenticated user's primary role.
See the CQL roles documentation for detailed information on user roles. File and directory permissions can be
specified during creation as a parameter for the put and mkdir commands. Use help put or help mkdir for
details.

dsefs dsefs://127.0.0.1:5598/ > ls -l


Type Permission Owner Group Length Modified Name

dir rwxr-x--- john admin - 2016-12-02 15:52:54+0100 other_dir

To change the owner or group of an existing file or directory, use the chown or chgrp command. Use help
chown or help chgrp for details.
By default, DSEFS creates directories with rwxr-xr-x (octal 755) permissions and files with rw-r--r-- (octal
644) permissions. To change the permissions of an existing file or directory, use the chmod command. Use help
chmod for details.
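
For example, a hypothetical session that changes ownership and permissions on the directory shown above;
the role names are illustrative, and the octal-mode form assumes chmod accepts it (help chmod shows the
exact syntax):

dsefs dsefs://127.0.0.1:5598/ > chown john other_dir
dsefs dsefs://127.0.0.1:5598/ > chgrp admin other_dir
dsefs dsefs://127.0.0.1:5598/ > chmod 750 other_dir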

DSEFS superusers
A DSEFS user is a superuser if and only if the user is a database superuser. Superusers are allowed to
read and write every file and directory stored in DSEFS. Only superusers are allowed to execute DSEFS
maintenance operations like fsck and umount.
DSEFS users
User access is verified against:

• Owner permissions if the file or directory owner name is equal to the authenticated user’s username.

• Group permissions if the file or directory group belongs to the authenticated user’s groups. Groups are
mapped from the database's user role names.

• Other permissions if the above conditions are false.

Each DSEFS command requires its own set of permissions. For a given path a/b/c, c is a leaf and a/b is the
parent path. The following table shows which permissions must be present for a given operation to succeed. R
indicates read, W indicates write, and X indicates execute privileges.

Table 16: Effect of permissions on files by DSEFS command

Command          Path checked for permissions   Parent path permissions   Leaf permissions
append a/b/c     a/b/c                          X                         W
cat a/b/c        a/b/c                          X                         R
cd a/b/c         a/b/c                          X
chgrp            same as in chown for group
chmod a/b/c      a/b/c                          X                         The user must be the owner.
chown a/b/c      a/b/c                          X                         Only superusers can change the owner. To change the group, the user needs to be a member of the target group or be a superuser.
cp               same as in get and then put
expand a/?/c     a/?/c                          X                         X
get a/b/c        a/b/c                          X                         R
ls a/b/c         a/b/c                          X                         RX if c is a directory.
mkdir a/b/c      a/b                            X                         WX
mv a/b/c d/e/f   a/b and d/e                    X                         WX
put a/b/c        a/b                            X                         WX
realpath a/b/c   a/b/c                          X
rename a/b/c d   a/b                            X                         WX
rm a/b/c         a/b                            X                         WX
rmdir a/b/c      a/b                            X                         WX
stat a/b/c       a/b/c                          X
truncate a/b/c   a/b/c                          X                         W

Authorization transitional mode


DSEFS authorization supports the transitional mode provided by DSEAuthorizer. Legacy authorizers, like
TransitionalAuthorizer, are not supported. DSE will not start if an unsupported authorizer is configured, and an
error is reported in the log messages.
Using the DSEFS REST interface
DSEFS provides a REST interface that implements the commands from WebHDFS.
The REST interface is enabled on all DSE nodes running DSEFS. It is available at the following base URI:
http://node_hostname_or_IP_address:5598/webhdfs/v1
For example from a terminal using the curl command:

$ curl -L -X PUT 'localhost:5598/webhdfs/v1/fs/a/b/c/d/e?op=MKDIRS' \
  && curl -L -X PUT -T logfile.txt '127.0.0.1:5598/webhdfs/v1/fs/log?op=CREATE&overwrite=true&blocksize=50000&rf=1' \
  && curl -L -X POST -T logfile.txt 'localhost:5598/webhdfs/v1/fs/log?op=APPEND'

If the DSE cluster has authentication enabled, use the curl --location-trusted parameter when the
WebHDFS noredirect parameter is false (the default value).
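
Other standard WebHDFS operations follow the same URI pattern. For example, a hypothetical status query for
the file uploaded above (GETFILESTATUS is a standard WebHDFS operation; the response is JSON):

$ curl -L 'localhost:5598/webhdfs/v1/fs/log?op=GETFILESTATUS'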

Or from the DSE Spark shell:

val rdd1 = sc.textFile("webhdfs://localhost:5598/webhdfs/v1/fs/log")

Programmatic access to DSEFS


DSEFS can be accessed programmatically from an application by obtaining DSEFS's implementation of
Hadoop's FileSystem interface.
DSE includes a demo project with simple applications that demonstrate how to acquire, configure, and use this
implementation. The demo project demonstrates reading, writing and connecting to a secured DSEFS using the
API. The demo is located in the dsefs directory under the demos directory.
The default location of the demos directory depends on the type of installation:

• Package installations: /usr/share/dse/demos

• Tarball installations: installation_location/demos

The README.md has instructions on building and running the demo applications.
Hadoop FileSystem interface implemented by DseFileSystem
The DseFileSystem class has partial support of the Hadoop FileSystem interface.


Specify the DSEFS URI by using the following form: dsefs://host0[:port][,host1[:port]]/.

Multiple contact points can be specified in the URI, separated by commas. For example:
dsefs://host1.example.com,host2.example.com/. If host1.example.com is down, a connection to
host2.example.com is attempted.
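
As a minimal sketch of obtaining and using this implementation from Scala through the standard Hadoop API;
the host, port, and paths are assumptions for illustration, and on a secured cluster you would also set the
authentication properties shown earlier:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
// On a secured cluster, provide credentials (assumed properties, shown earlier):
// conf.set("com.datastax.bdp.fs.client.authentication.basic.username", "user")
// conf.set("com.datastax.bdp.fs.client.authentication.basic.password", "pass")

// FileSystem.get resolves the dsefs:// scheme to DseFileSystem.
val fs = FileSystem.get(new URI("dsefs://127.0.0.1:5598/"), conf)

// Write a small file, then read it back.
val path = new Path("/example.txt")
val out = fs.create(path)
out.write("hello dsefs".getBytes("UTF-8"))
out.close()

val in = fs.open(path)
val buf = new Array[Byte](32)
val n = in.read(buf)
in.close()
println(new String(buf, 0, n, "UTF-8"))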
The following table outlines which methods have been implemented.

Table 17: Methods of Hadoop FileSystem interface implemented by DseFileSystem


Method Status Comment

getScheme() # since 5.0.12, 5.1.6

getURI() #

getName() # default, deprecated

getDefaultPort() # since 5.0.12, 5.1.6

makeQualified(Path) # default

getDelegationToken(String) # returns null

addDelegationTokens(String, Credentials) #

collectDelegationTokens(...) #

getChildFileSystems() # default, returns null

getFileBlockLocations(FileStatus, long, long) #

getFileBlockLocations(Path, long, long) #

getServerDefaults() # default, deprecated

getServerDefaults(Path) # default

resolvePath(Path) # default

open # all variants, buffer size not supported

create # all variants, checksum options, progress reporting and


APPEND, NEW_BLOCK flags not supported

createNonRecursive # all variants

createNewFile # default

append # all variants, progress reporting not supported

concat # since 5.0.12, 5.1.6

getReplication(Path) #

setReplication(Path, short) # does nothing

rename #

truncate(Path, long) # since 5.0.12, 5.1.6

delete(Path) #

delete(Path, boolean) #

deleteOnExit(Path) # default

cancelDeleteOnExit(Path) # default

exists(Path) #


isDirectory(Path) #

isFile(Path) #

getLength(Path) #

getContentSummary(Path) # default

listStatus # all variants

listCorruptFileBlocks(Path) # throws UnsupportedOperationException

globStatus # default

listLocatedStatus # default

listStatusIterator # default

listFiles # default

getHomeDirectory() # default

getWorkingDirectory() #

setWorkingDirectory() #

getInitialWorkingDirectory() # default, returns null

mkdirs #

copyFromLocalFile # default

moveFromLocalFile # default

copyToLocalFile # default

moveToLocalFile # default

startLocalOutput # default

close #

getUsed # default, slow

getBlockSize #

getDefaultBlockSize() # since 5.0.12, 5.1.6

getDefaultBlockSize(Path) # since 5.0.12, 5.1.6

getDefaultReplication() # since 5.0.12, 5.1.6

getDefaultReplication(Path) # since 5.0.12, 5.1.6

getFileStatus(Path) #

access(Path, FsAction) # default

createSymLink # throws UnsupportedOperationException

getFileLinkStatus # default, same as getFileStatus

supportsSymLinks # returns false

getLinkTarget # throws UnsupportedOperationException

resolveLink # throws UnsupportedOperationException

getFileChecksum # returns null


setVerifyChecksum # does nothing

setWriteChecksum # does nothing

getStatus # default, returns incorrect default data

setPermission #

setOwner #

setTimes # does nothing

createSnapshot # throws UnsupportedOperationException

renameSnapshot # throws UnsupportedOperationException

deleteSnapshot # throws UnsupportedOperationException

modifyAclEntries # throws UnsupportedOperationException

removeAclEntries # throws UnsupportedOperationException

removeDefaultAcl # throws UnsupportedOperationException

removeAcl # throws UnsupportedOperationException

setAcl # throws UnsupportedOperationException

getAclStatus # throws UnsupportedOperationException

setXAttr # throws UnsupportedOperationException

getXAttr # throws UnsupportedOperationException

getXAttrs # throws UnsupportedOperationException

listXAttrs # throws UnsupportedOperationException

removeXAttr # throws UnsupportedOperationException

Using JMX to read DSEFS metrics


DSEFS reports status and performance metrics through JMX in the domain com.datastax.bdp:type=dsefs.
This page describes the classes exposed in JMX.
Location
Location metrics provide information about each DSEFS location status. There is one set of Location metrics
for each DSEFS location. Every DataStax Enterprise (DSE) node knows about all locations, so connect to any
node to get the full status of the cluster. The following gauges are defined:
directory
Path to the directory where DSEFS data is stored. This is a constant value configured in dse.yaml.
estFreeSpace
Estimated amount of free space on the device where the storage directory is located, in bytes. This
value is refreshed periodically, so if you need an up-to-date value, read the BlockStore.freeSpace
metric.
estUsedSpace
Estimated amount of space used by the contents of the storage directory, in bytes. This value is
refreshed periodically, so if you need an up-to-date value, read the BlockStore.usedSpace metric.
minFreeSpace
Amount of reserved space in bytes. Configured statically in dse.yaml.
privateAddress
IP and port of the endpoint for DSEFS internode communication.
publicAddress
IP and port of the endpoint for DSEFS clients.


readOnly
Returns true if the location is in read-only mode.
status
One of the following values: up, down, unavailable:

• If the location is up, the location is fully operational and this node will attempt to read or write from
it.

• If the location is down, the location is on a node that has been gracefully shut down by the
administrator and no reads or writes will be attempted.

• If the location is unavailable, this node has problems communicating with that location, and the
real status is unknown. This node will check the status periodically.

storageWeight
How much data relative to other locations will be stored in this location. This is a static value
configured in dse.yaml.
BlockStore
BlockStore metrics report how fast and how much data is being read and written by the data layer of the DSEFS
node. They are reported only for the locations managed by the node to which you connect with JMX. To get
metrics for all the locations in the cluster, connect to each node running DSEFS individually.
blocksDeleted
How many blocks are deleted, in blocks per second.
blocksRead
Read accesses in blocks per second.
blocksWritten
Writes in blocks per second.
bytesDeleted
How fast data is removed, in bytes per second.
bytesRead
How fast data is being read, in bytes per second.
bytesWritten
How fast data is written, in bytes per second.
readErrors
The total count and rate of read errors (rate in errors per second).
writeErrors
The total count and rate of write errors (rate in errors per second).
directory
The path to the storage directory of this location.
freeSpace
How much space is left on the device in bytes.
usedSpace
Estimated amount of space used by this location in bytes.
RestServer
RestServer reports metrics related to the communication layer of DSEFS, separately for internode traffic and
clients. Each set of these metrics is identified by a scope of the form: listen address:listen port. By default
port 5598 is used for clients, and port 5599 is for internode communication.
connectionCount
The current number of open inbound connections.
connectionRate
The total rate and count of connections since the server was started.
requestRate
The total rate and number of requests, respectively: all, DELETE, GET, POST, and PUT requests. Use
deleteRate, getRate, postRate, or putRate to obtain requests of a specific type.
downloadBytesRate
Throughput in bytes per second of the transfer from server to client.


uploadBytesRate
Throughput in bytes per second of the transfer from client to server.
responseTime
The time that elapses from receiving the full request body to the moment the server starts sending out
the response.
uploadTime
The time it takes to read the request body from the client.
downloadTime
The time that it takes to send the response body to the client.
errors
A counter which is increased every time the service handling the request throws an unexpected error.
errors is not increased by errors handled by the service logic. For example, file not found errors do
not increment errors.
CassandraClient
CassandraClient reports metrics related to the communication layer between DSEFS and the database.
responseTime
Tracks the response times of database queries.
errors
A counter increased by query execution errors (for example, timeout errors).
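
As a minimal sketch, the following Scala snippet lists the DSEFS MBeans over JMX; the JMX port (7199 is the
usual DSE default) and the absence of JMX authentication are assumptions for illustration:

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

// Connect to the node's JMX endpoint (assumed unauthenticated, port 7199).
val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi")
val connector = JMXConnectorFactory.connect(url)
val mbsc = connector.getMBeanServerConnection

// Query every MBean in the com.datastax.bdp:type=dsefs domain described above.
val names = mbsc.queryNames(new ObjectName("com.datastax.bdp:type=dsefs,*"), null)
val it = names.iterator()
while (it.hasNext) println(it.next())

connector.close()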

DSE Search
DSE Search allows you to quickly find data and provide a modern search experience for your users, helping you
create features like product catalogs, document repositories, ad-hoc reporting engines, and more.
Because DataStax Enterprise is a cohesive data management platform, other workloads such as DSE Graph,
DSE Analytics and Search integration, and DSE Analytics can take full advantage of the indexing and query
capabilities of DSE Search.
About DSE Search
DSE Search is part of DataStax Enterprise (DSE). DSE Search allows you to find data and create features like
product catalogs, document repositories, and ad-hoc reports. See DSE Search architecture.
DSE Analytics and Search integration and DSE Analytics can use the indexing and query capabilities of DSE
Search. DSE Search manages search indexes with a persistent store.
The benefits of running enterprise search functions through DataStax Enterprise and DSE Search include:

• DSE Search is backed by a scalable database.

• A persistent store for search indexes.

• A fault-tolerant search architecture across multiple datacenters.

• Add search capacity just like you add capacity in the DSE database.

• Set up replication for DSE Search nodes the same way as other nodes by creating a keyspace or changing
the replication factor of a keyspace to optimize performance.

• DSE Search has two indexing modes: Near-real-time (NRT) and live indexing, also called real-time (RT)
indexing. Configure and tune DSE Search for maximum indexing throughput.

• Near real-time query capabilities.

• TDE encryption of DSE Search data, including search indexes and commit logs. See Encrypting Search
indexes.

• CQL index management commands simplify search index management.

• Local node (optional) management of search indexing resources with dsetool commands.


• Read/write to any DSE Search node and automatically index stored data.

• Examine and aggregate real-time data using CQL.

• Fault-tolerant queries, efficient deep paging, and advanced search node resiliency.

• Virtual nodes (vnodes) support.

• Set the location of the search index.

• Native CQL queries that leverage search indexes for an array of CQL query functionality and indexing
support.

• Using CQL, DSE Search supports partial document updates that enable you to modify existing information
and maintain a lower transaction cost.

• Supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).

• DSE Search is built with a production-certified version of Apache Solr™. Although DSE Search uses some
Solr tools and APIs, the implementation does not guarantee that Solr tools and APIs work as expected. Be
sure to review the unsupported features for DSE Search.

See the DataStax blog post What’s New for Search in DSE 6. Highlights include:

• Simplified indexing pipeline and back-pressure that reduces the frequency of dropped mutations and
requires less configuration. (Soft commit is still required for update visibility.)

• NodeSync search data and data repair is processed automatically by DSE.

• Native CQL queries can use search indexes for additional CQL query functionality and index support.
Search queries do not require a solr_query clause, and some queries that previously required ALLOW
FILTERING no longer have that limitation because search indexes are used automatically.

• Query LIKE operator can be used with search indexes, as shown in the example after this list.

• Default search index configuration provides functionality similar to the ANSI SQL LIKE operator, and
requires less processing to generate the data and less index data for the search.

• Disabled the ability to perform writes and deletes using the Solr HTTP interface.

• Additional logging for shard replica requests to improve troubleshooting.

• Default index behavior from Cassandra is overridden to improve the performance of post-repair index
building.
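
As a brief illustration of native CQL queries backed by a search index, the following hypothetical query uses
the demo.health_data table that appears later in this section; with a search index on the table, the LIKE
predicate is served by the index:

SELECT id, birthplace FROM demo.health_data WHERE birthplace LIKE 'New%';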

DSE Search architecture


In a distributed environment, the data is spread over multiple nodes. Deploy DSE Search nodes in their own
datacenter to run DSE Search on all nodes.
Data is written to the database first, and then the indexes are updated.
When you update a table using CQL, the search index is updated. Indexing occurs automatically after an
update. Writes are durable. All writes to a replica node are recorded in memory and in a commit log before they
are acknowledged as a success. If a crash or server failure occurs before the memory tables are flushed to
disk, the commit log is replayed on restart to recover any lost writes.



DSE Search terms


In DSE Search, there are several names for an index of documents on a single node:

• A search index (formerly referred to as a search core)


• A collection

• One shard of a collection

See the following table for a mapping between database and DSE Search concepts.

Table 18: Relationship between the database and DSE Search concepts
Database       Search (single node environment)
Table          Search index (core) or collection
Row            Document
Primary key    Unique key
Column         Field
Node           n/a
Partition      n/a
Keyspace       n/a

How DSE Search works

• Each document in a search index is unique and contains a set of fields that adhere to a user-defined
schema.

• The schema lists the field types and defines how they should be indexed.

• DSE Search maps search indexes to tables.

• Each table has a separate search index on a particular node.

• Solr documents are mapped to rows, and document fields to columns.

• A shard is indexed data for a subset of the data on the local node.

• The keyspace is a prefix for the name of the search index and has no counterpart in Solr.

• Search queries are routed to enough nodes to cover all token ranges.

# The query is sent to all token ranges to get all possible results.

# The search engine considers the token ranges that each node is responsible for, taking into account
the replication factor (RF), and computes the minimum number of nodes that is required to query all
ranges.

• On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to
route sub-queries to the nodes most capable of handling them. The shard routing is token aware, but is not
limited unless the search query specifies a specific token range.

• With replication, a node or search index contains more than one partition (shard) of table (collection) data.
Unless the replication factor equals the number of cluster nodes, the node or search index contains only a
portion of the data of the table or collection.

DSE Search path


This section provides an overview of the DSE Search path, and how Solr integrates with Cassandra:

1. A row mutation is performed in Cassandra.

2. A thread in the Thread Per Core (TPC) architecture processes the mutation.

3. The mutation is forwarded to the secondary index API.


4. A Lucene document is built from the latest full row in the backing table.

5. The document is placed in the Lucene RAM buffer.

6. Control is returned to Cassandra.

7. The Cassandra write operation is completed.

Note:

• The RAM buffer is flushed when a commit is performed.

# Commits occur when the RAM buffer is full, or a soft commit or hard commit is performed.

# Soft commits are triggered by the auto soft commit timer.

# Hard commits are triggered by a memtable flush on the base Cassandra table.

• The Lucene documents are flushed to disk into a Lucene segment.

• Part of the flush process ensures that for a given document identifier, only one live document exists.
Therefore, any duplicate older documents are deleted.

• Lucene merges segments periodically in a similar way that Cassandra performs compaction.

DSE Search versus Open Source Apache Solr™


By virtue of its integration into DataStax Enterprise, differences exist between DSE Search and Open Source
Solr (OSS).
Major differences

Capability                                      DSE Search   OS Solr   Description
Includes a database                             yes          no        For OSS, create an interface to add a database.
Indexes real-time data                          yes          no        Ingests real-time data and automatically indexes the data.
Provides an intuitive way to update data        yes          no        CQL for loading and updating data.
Supports data distribution                      yes          yes [1]   Transparently distributes real-time, analytics, and search data to multiple nodes in a cluster.
Balances loads on nodes/shards                  yes          no        Unlike Solr and Solr Cloud, DSE Search loads can be efficiently rebalanced.
Spans indexes over multiple datacenters         yes          no        A DSE cluster can have more than one datacenter for different types of nodes.
Makes durable updates to data                   yes          no        Updates are durable and written to the commit log for all updates.
Automatically reindexes search data             yes          no        OSS requires the client to reingest everything to reindex data in Solr.
Upgrades of Apache Lucene® preserve data        yes          no        DataStax integrates Lucene upgrades periodically and data is preserved when you upgrade DSE.
Supports timeAllowed queries with deep paging   yes          no        OSS Solr does not support using timeAllowed queries with deep paging.

[1] OSS requires using Zookeeper.


Feature differences
DSE Search supports limiting queries by time by using the Solr timeAllowed parameter. DSE Search differs
from native Solr:

• If the timeAllowed is exceeded, an exception is thrown.


• If the timeAllowed is exceeded, and the additional shards.tolerant parameter is set to true, the application
returns the partial results collected so far.
When partial results are returned, the CQL custom payload contains the DSESearch.isPartialResults key.

Unsupported features for DSE Search


Unsupported features include Apache Cassandra™ and Apache Solr™ features.
Unsupported Apache Cassandra features
These limitations apply to DSE Search:

• Column aliases are not supported in solr_query queries.

• Continuous paging.

• Static columns

• Counter columns

• Super columns

• Thrift-compatible tables with column comparators other than UTF-8 or ASCII.

• PER PARTITION clause is not supported for DSE Search solr_query queries.

• Indexing frozen maps is not supported. However, indexing frozen sets and lists of native and user-defined
(tuple/UDT) element types is supported.

• Using DSE Search with newly created COMPACT STORAGE tables is deprecated.

Unsupported Apache Solr™ features


These limitations apply to DSE Search:

• DSE Search does not support Solr Managed Resources.

• Solr schema fields that are both dynamic and multiValued (CQL-based search indexes only).

• The deprecated replaceFields request parameters on document updates for CQL-based search indexes.
Instead, use the suggested procedure for inserting/updating data.

• Block joins based on the Lucene BlockJoinQuery in search indexes and CQL tables.

• Schemaless mode.

• Partial schema updates through the REST API after search indexes are changed.
For example, you cannot use the REST API to add a new field to a schema; instead, you must change
the schema.xml file, upload it again, and reload the core (the same applies to copy fields).

• org.apache.solr.spelling.IndexBasedSpellChecker and org.apache.solr.spelling.FileBasedSpellChecker


Instead use org.apache.solr.spelling.DirectSolrSpellChecker for spell checking.

• The commitWithin parameter.

• The SolrCloud CloudSolrServer feature of SolrJ for endpoint discovery and round-robin load balancing.

• The DSE Search configurable SolrFilterCache does not support auto-warming.

• DSE Search does not support the duration Cassandra data type.

• SELECT statements with DISTINCT are not supported with solr_query.

• RealTime Get.

• GetReplicationHandler: Store & Restore.


• useDocValuesAsStored in schema fields and as a query request parameter.

• Solr Graph queries.

• Solr SQLStreaming aggregations.

• Data import handler.

• Tuple/UDT subfield sorting and faceting.

• The dataDir parameter in solrconfig.xml.

• Parallel SQL interface.

Deprecated Solr and Lucene features

• The Tika functionality that is bundled with Apache Solr is deprecated. Instead, use the stand-alone Apache
Tika project.

• Highlighting.

• MoreLikeThis search component.

• SpellCheck search component.

• Suggester (suggest search component).

• ClassicSimilarityFactory class.

Other deprecated features


The following features that were previously available for use with DSE Search are deprecated and no longer
supported.

• The DSE custom URP implementation is deprecated. Use the field input/output (FIT) transformer API
instead.

Other unsupported features

• HTTP delete-by-query, HTTP delete-by-id, and other Solr HTTP updates.

• JBOD mode.

• The Solr updatelog is not supported in DSE Search.


The commit log replaces the Solr updatelog. Consequently, features that require the updateLog are not
supported. Instead of using atomic updates, partial document updates are available by running the update
with CQL.

• CQL Solr queries do not support native functions or column aliases as selectors.

• RamDirectoryFactory or other non-persistent DirectoryFactory implementations.

• Tuple and UDT limitations apply.

Apache Solr and Apache Lucene limitations


This topic lists the Apache Solr and Apache Lucene limitations that apply to DSE Search.

• The 2.1 billion records limitation, per index on each node, as described in Lucene limitations.

• The 1024 maxBoolean clause limit in SOLR-4586.

• Solr field name policy applies to the indexed field names:


# Every field must have a name.

# Field names must consist of alphanumeric or underscore characters only.

# Fields cannot start with a digit.

# Names with both leading and trailing underscores (for example, _version_) are reserved.

Non-compliant field names are not supported by all components. Backward compatibility is not
guaranteed.

• Limitations and known Apache Solr issues apply to DSE Search queries. For example: incorrect SORT
results for tokenized text fields.

Configuring DSE Search

DSE Search reference


Reference information for DSE Search.
Search index config
Reference information to change query behavior for search indexes:

• DataStax recommends CQL CREATE SEARCH INDEX and ALTER SEARCH INDEX CONFIG
commands.

• dsetool commands can also be used to manage search indexes.

Changing search index config


To create and make changes to the search index config, follow these basic steps:

1. Create a search index. For example:

CREATE SEARCH INDEX ON demo.health_data;

2. Alter the search index. For example:

ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 30000;

3. Optionally view the XML of the pending search index. For example:

DESCRIBE PENDING SEARCH INDEX CONFIG on demo.health_data;

4. Make the pending changes active. For example:

RELOAD SEARCH INDEX ON demo.health_data;

Sample search index config

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</
abortOnConfigurationError>
<luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
<dseTypeMappingVersion>2</dseTypeMappingVersion>
<directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
<indexConfig>
<rt>false</rt>
<rtOffheapPostings>true</rtOffheapPostings>
<useCompoundFile>false</useCompoundFile>


<ramBufferSizeMB>512</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</indexConfig>
<jmx/>
<updateHandler class="solr.DirectUpdateHandler2">
<autoSoftCommit>
<maxTime>10000</maxTime>
</autoSoftCommit>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048"
lowWaterMarkMB="1024"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>16</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="true">
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
<httpCaching never304="true"/>
</requestDispatcher>
<requestHandler class="solr.SearchHandler" default="true" name="search">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler
class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler"
name="solr_query">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler class="solr.UpdateRequestHandler" name="/update"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
<requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field"
startup="lazy"/>
<requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document"
startup="lazy"/>
<requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
<requestHandler class="solr.PingRequestHandler" name="/admin/ping">
<lst name="invariants">
<str name="qt">search</str>
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>


</config>

For CQL index management, use configuration element shortcuts with CQL commands.
Configuration elements are listed alphabetically by shortcut. The XML element is shown with the element start
tag. An ellipsis indicates that other elements or attributes are not shown.
autoCommitTime
Defines the time interval between updates to the search index with the most recent data after an
INSERT, UPDATE, or DELETE. By default, changes are automatically committed every 10000
milliseconds. To change the time interval between updates:

1. Set auto commit time on the pending search index:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 30000;

2. You can view the pending search config:

DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;

The resulting XML shows the maximum time between updates is 30000 milliseconds:

<updateHandler class="solr.DirectUpdateHandler2">
<autoSoftCommit>
<maxTime>30000</maxTime>
</autoSoftCommit>
</updateHandler>

3. To make the pending changes active, reload the search index:

RELOAD SEARCH INDEX ON wiki.solr;

See Tuning search for maximum indexing throughput.


defaultQueryField
Name of the default field to query. Default not set. To set the field to use when no field is specified by
the query, see Setting up default query field.
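Setting it follows the same pattern as the other shortcuts; a hypothetical example, assuming the
schema has a field named body:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET defaultQueryField = 'body';
RELOAD SEARCH INDEX ON wiki.solr;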
directoryFactory
The directory factory to use for search indexes. Encryption is enabled per search
index. To enable encryption for a search index, change the class for directoryFactory to
EncryptedFSDirectoryFactory.


1. Enable encryption on the pending search index:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET directoryFactory = EncryptedFSDirectoryFactory;

2. You can view the pending search config:

DESCRIBE PENDING SEARCH INDEX CONFIG on wiki.solr;

The resulting XML shows that encryption is enabled:

<directoryFactory class="solr.EncryptedFSDirectoryFactory"
name="DirectoryFactory"/>

3. To make the pending changes active, reload the search index:

RELOAD SEARCH INDEX ON wiki.solr;

Even though additional properties are available to tune encryption, DataStax recommends using the
default settings.
filterCacheLowWaterMark
Default is 1024 MB. See below.
filterCacheHighWaterMark
Default is 2048 MB.
The DSE Search configurable filter cache reliably bounds the filter cache memory usage for a search
index. This implementation contrasts with the default Solr implementation, which defines bounds for
filter cache usage per segment. SolrFilterCache bounding works by evicting cache entries after the
configured per search index (per core) high watermark is reached, and stopping when the configured
low watermark is reached.

• The filter cache is cleared when the search index is reloaded.

• SolrFilterCache does not support auto-warming.

SolrFilterCache defaults to offheap. In general, the larger the index, the larger the filter cache
should be. A good default is 1 to 2 GB. If the index is 1 billion docs per node, then set it to 4 to 5 GB.

1. To change cache eviction for a large index, set the low and high values one at a time:

ALTER SEARCH INDEX CONFIG ON solr.wiki SET filterCacheHighWaterMark = 5000;

ALTER SEARCH INDEX CONFIG ON solr.wiki SET filterCacheLowWaterMark = 2000;

2. View the pending search index config:

<query>
...
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="5000"
lowWaterMarkMB="2000"/>
...


</query>

3. To make the pending changes active, reload the search index:

RELOAD SEARCH INDEX ON wiki.solr;

mergeFactor
When a new segment causes the number of lowest-level segments to exceed the merge factor value,
then those segments are merged together to form a single large segment. When the merge factor is
10, each merge results in the creation of a single segment that is about ten times larger than each of
its ten constituents. When there are 10 of these larger segments, then they in turn are merged into an
even larger single segment. Default is 10.

1. To change the number of segments to merge at one time:

ALTER SEARCH INDEX CONFIG ON solr.wiki SET mergeFactor = 5;

2. View the pending search index config:

<indexConfig>
...
<mergeFactor>5</mergeFactor>
...
</indexConfig>

3. To make the pending changes active, reload the search index:

RELOAD SEARCH INDEX ON wiki.solr;

mergeMaxThreadCount
Must configure with mergeMaxMergeCount. The number of concurrent merges that Lucene can
perform for the search index. The default mergeScheduler settings are set automatically. Do not
adjust this setting.
Default: ½ the number of tpc_cores
mergeMaxMergeCount
Must configure with mergeMaxThreadCount. The number of pending merges (active and in the
backlog) that can accumulate before segment merging starts to block/throttle incoming writes. The
default mergeScheduler settings are set automatically. Do not adjust this setting.
Default: 2x the mergeMaxThreadCount
ramBufferSize
The index RAM buffer size in megabytes (MB). The RAM buffer holds uncommitted documents. A
larger RAM buffer reduces flushes, and segments are also larger when flushed. Fewer flushes reduce
I/O pressure, which is ideal for higher write workload scenarios.
For example, adjust the ramBufferSize when you configure live indexing:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET autoCommitTime = 100;


ALTER SEARCH INDEX CONFIG ON wiki.solr SET realtime = true;
ALTER SEARCH INDEX CONFIG ON wiki.solr SET ramBufferSize = 2048;
RELOAD SEARCH INDEX ON wiki.solr ;

Default: 512
realtime
Enables live indexing to increase indexing throughput. Enable live indexing on only one search index
per cluster. Live indexing, also called real-time (RT) indexing, supports searching directly against the
Lucene RAM buffer and more frequent, cheaper soft commits, which provide earlier visibility to newly
indexed data.


Live indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent
NRT setup. See Tune RT indexing.
Configuration elements without shortcuts
To specify configuration elements that do not have shortcuts, you can specify the XML path to the setting and
separate child elements using a period.
deleteApplicationStrategy
Controls how to retrieve deleted documents when deletes are being applied. seekexact is the safe
default that most users should choose, but for a little extra performance you can try seekceiling.
Valid case-insensitive values are:

• seekexact
Uses bloom filters to avoid reading from most segments. Use when memory is limited and the
unique key field data does not fit into memory.

• seekceiling
More performant when documents are deleted/inserted into the database with sequential keys,
because this strategy can stop reading from segments when it is known that terms can no longer
appear.

Default: seekexact
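
A hypothetical invocation using the XML-path convention described above; the indexConfig prefix is
an assumption, so verify the element path against the DESCRIBE PENDING SEARCH INDEX CONFIG output:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.deleteApplicationStrategy = 'seekceiling';
RELOAD SEARCH INDEX ON wiki.solr;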
mergePolicyFactory
The AutoExpungeDeletesTieredMergePolicy custom merge policy is based on TieredMergePolicy.
This policy cleans up the large segments by merging them when deletes reach the percentage
threshold. A single auto expunge merge occurs at a time. Use for large indexes that are not merging
the largest segments due to deletes. To determine whether this merge setting is appropriate for your
workflow, view the segments on the Solr Segment Info screen.
When set, the XML is described as:

<indexConfig>
<mergePolicyFactory
class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
<int name="maxMergedSegmentMB">1005</int>
<int name="forceMergeDeletesPctAllowed">25</int>
<bool name="mergeSingleSegments">true</bool>
</mergePolicyFactory>
</indexConfig>

To extend TieredMergePolicy to support automatic removal of deletes:


1. To enable automatic removal of deletes, set the custom policy:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].bool[@name='mergeSingleSegments'] = true;

2. Set the maximum segment size in MB:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='maxMergedSegmentMB'] = 1005;

3. Set the percentage threshold for deleting from the large segments:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.mergePolicyFactory[@class='org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory'].int[@name='forceMergeDeletesPctAllowed'] = 25;

If mergeFactor is in the existing index config, you must drop it from the search index before you alter
the table to support automatic removal of deletes:

ALTER SEARCH INDEX CONFIG ON wiki.solr DROP indexConfig.mergePolicyFactory;

parallelDeleteTasks
Regulates how many tasks are created to apply deletes in parallel during soft/hard commit.
Supported for RT and NRT indexing. Specify a number greater than 0.
Leave parallelDeleteTasks at the default value, except when issues occur with write load when
running a mixed read/write workload. If writes occasionally spike in utilization and negatively impact
your read performance, then set this value lower.
Default: the number of available processors
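
A hypothetical adjustment, again assuming the element lives under indexConfig (verify the path before
relying on it):

ALTER SEARCH INDEX CONFIG ON wiki.solr SET indexConfig.parallelDeleteTasks = 4;
RELOAD SEARCH INDEX ON wiki.solr;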
Search index schema
Search index schema reference information to use for creating and altering a search index schema:

• DataStax recommends CQL CREATE SEARCH INDEX and ALTER SEARCH INDEX SCHEMA
commands.

• dsetool commands can also be used to manage search indexes.

The schema defines the relationship between data in a table and a search index. See Creating a search index
with default values and Quick Start for CQL index management for details and examples.
A sample search index schema XML:
Sample XML

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TextField" name="TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_thyroid_disease"
type="TextField"/>


<field indexed="true" multiValued="false" name="pets" type="TextField"/>


<field indexed="true" multiValued="false" name="secondary_smoke" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_lupus" type="TextField"/>
<field indexed="true" multiValued="false" name="gender" type="TextField"/>
<field indexed="true" multiValued="false" name="birthplace" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="marital_status" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="age_months"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="bird" type="TextField"/>
<field indexed="true" multiValued="false" name="hay_fever" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_hay_fever"
type="TextField"/>
<field indexed="true" multiValued="false" name="routine_medical_coverage"
type="TextField"/>
<field indexed="true" multiValued="false" name="annual_income_20000"
type="TextField"/>
<field indexed="true" multiValued="false" name="exam_status" type="TextField"/>
<field indexed="true" multiValued="false" name="other_pet" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_stroke" type="TextField"/>
<field indexed="true" multiValued="false" name="employer_paid_plan"
type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="family_sequence"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="diagnosed_cataracts"
type="TextField"/>
<field indexed="true" multiValued="false" name="major_medical_coverage"
type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_gout" type="TextField"/>
<field indexed="true" multiValued="false" name="age_unit" type="TextField"/>
<field indexed="true" multiValued="false" name="goiter" type="TextField"/>
<field indexed="true" multiValued="false" name="chronic_bronchitis"
type="TextField"/>
<field indexed="true" multiValued="false" name="county" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="num_smokers"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="screening_month" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_emphysema"
type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_other_cancer"
type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="id"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="dental_coverage" type="TextField"/>
<field indexed="true" multiValued="false" name="health_status" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false"
name="monthly_income_total" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="fish" type="TextField"/>
<field indexed="true" multiValued="false" name="dog" type="TextField"/>
<field indexed="true" multiValued="false" name="asthma" type="TextField"/>
<field indexed="true" multiValued="false" name="ethnicity" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="age"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="diagnosed_asthma" type="TextField"/>
<field indexed="true" multiValued="false" name="race_ethnicity" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure"
type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="family_size"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="race" type="TextField"/>
<field indexed="true" multiValued="false" name="thyroid_disease" type="TextField"/>
<field indexed="true" multiValued="false" name="bronchitis" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="household_size"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="cat" type="TextField"/>


<field indexed="true" multiValued="false" name="diagnosed_goiter" type="TextField"/>


<field indexed="true" multiValued="false" name="diagnosed_skin_cancer"
type="TextField"/>
<field indexed="true" multiValued="false" name="fips" type="TextField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>

dsetool search index commands


dsetool commands for DSE Search
The dsetool commands for DSE Search provide search index management.

• dsetool create_core

• dsetool core_indexing_status

• dsetool get_core_config

• dsetool get_core_schema

• dsetool index_checks (experimental)

• dsetool infer_solr_schema

• dsetool list_index_files

• dsetool read_resource

• dsetool rebuild_indexes

• dsetool reload_core

• dsetool stop_core_reindex

• dsetool unload_core

• dsetool upgrade_index_files

• dsetool write_resource

DataStax recommends using CQL commands to manage search indexes.


Configuration properties
Reference information for DSE Search configuration properties.

• Data location in cassandra.yaml

• Scheduler settings in dse.yaml

• Indexing settings in dse.yaml

• Safety thresholds in cassandra.yaml

• Security in dse.yaml

• Inter-node communication in dse.yaml

• Query options in dse.yaml

• Client connections in dse.yaml

• Performance in cassandra.yaml

• Performance in dse.yaml

Data location in cassandra.yaml


See Set the location of search indexes.


data_file_directories
The directory where table data is stored on disk. The database distributes data evenly across the
location, subject to the granularity of the configured compaction strategy. If not set, the directory is
$DSE_HOME/data/data.
For production, DataStax recommends RAID 0 and SSDs.
Default: /var/lib/cassandra/data
Scheduler settings in dse.yaml
Configuration options to control the scheduling and execution of indexing checks.
ttl_index_rebuild_options
Section of options to control the schedulers in charge of querying for and removing expired records,
and the execution of the checks.
fix_rate_period
Time interval to check for expired data in seconds.
Default: 300
initial_delay
The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
max_docs_per_batch
The maximum number of documents to check and delete per batch by the TTL rebuild thread. All
documents determined to be expired are deleted from the index during each check; to avoid memory
pressure, their unique keys are retrieved and deletes are issued in batches.
Default: 4096
thread_pool_size
The maximum number of cores that can execute TTL cleanup concurrently. Set the thread_pool_size
to manage system resource consumption and prevent many search cores from executing
simultaneous TTL deletes.
Default: 1
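
Putting these defaults together, the options nest under ttl_index_rebuild_options in dse.yaml, for example:

ttl_index_rebuild_options:
    fix_rate_period: 300
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 1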
Indexing settings in dse.yaml
solr_resource_upload_limit_mb
Option to disable or configure the maximum file size of the search index config or schema. Resource
files can be uploaded, but the search index config and schema are stored internally in the database
after upload.

• 0 - disable resource uploading

• upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file
(search index config or schema).

Default: 10
flush_max_time_per_core
The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at
DSE Search commit time or at flush time. Expert level knowledge is required to change this value.
Always set the value reasonably high to ensure flushing completes successfully to fully sync DSE
Search indexes with the database data. If the configured value is exceeded, index updates are only
partially committed and the commit log is not truncated which can undermine data durability.
When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely
manner. Live indexing increases the time to flush asynchronous index updates.
Default: commented out (5)
load_max_time_per_core
The maximum time, in minutes, to wait for each DSE Search index to load on startup or create/reload
operations. This advanced option should be changed only if exceptions happen during search index
loading. When not set, the default is 5 minutes.
Default: commented out (5)
enable_index_disk_failure_policy
Whether to apply the configured disk failure policy if IOExceptions occur during index update
operations.


• true - apply the configured Cassandra disk failure policy to index write failures

• false - do not apply the disk failure policy

When not set, the default is false.


Default: commented out (false)
solr_data_dir
The directory to store index data. For example:
solr_data_dir: /var/lib/cassandra/solr.data
See Managing the location of DSE Search data. By default, each DSE Search index is saved in
solr_data_dir/keyspace_name.table_name, or as specified by the dse.solr.data.dir system
property.
Default: commented out
solr_field_cache_enabled
The Apache Lucene® field cache is deprecated. Instead, for fields that are sorted, faceted, or
grouped by, set docValues="true" on the field in the search index schema. Then reload the search
index and reindex. When not set, the default is false.
Default: commented out (false)
async_bootstrap_reindex
For DSE Search, configure whether to asynchronously reindex bootstrapped data.

• If enabled, the node joins the ring immediately after bootstrap and reindexing occurs
asynchronously. The node does not wait for post-bootstrap reindexing, so it is not marked down
while the index rebuilds. Use the dsetool ring command to check the status of the reindexing.

• If disabled, the node joins the ring after reindexing the bootstrapped data.

Default: false
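
For reference, a dse.yaml fragment that sets the indexing options in this section to the documented defaults might look like the following sketch (illustrative only; most of these lines ship commented out):

solr_resource_upload_limit_mb: 10
flush_max_time_per_core: 5
load_max_time_per_core: 5
enable_index_disk_failure_policy: false
solr_data_dir: /var/lib/cassandra/solr.data
solr_field_cache_enabled: false
async_bootstrap_reindex: false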

Safety thresholds
Configure safety thresholds and fault tolerance for DSE Search with options in dse.yaml and cassandra.yaml.
Safety thresholds in cassandra.yaml
Configuration options include:
read_request_timeout_in_ms
How long, in milliseconds, the coordinator waits for read operations to complete before timing them out.
Default: 5000
Security in dse.yaml
Security options for DSE Search. See DSE Search security checklist.
solr_encryption_options
Settings to tune encryption of search indexes.
decryption_cache_offheap_allocation
Whether to allocate shared DSE Search decryption cache off JVM heap.

• true - allocate shared DSE Search decryption cache off JVM heap

• false - do not allocate shared DSE Search decryption cache off JVM heap

When not set, the default is true.


Default: commented out (true)
decryption_cache_size_in_mb
The maximum size of shared DSE Search decryption cache in megabytes (MB).
Default: commented out (256)
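
In dse.yaml these options nest under solr_encryption_options; a sketch with the defaults shown above:

solr_encryption_options:
    decryption_cache_offheap_allocation: true
    decryption_cache_size_in_mb: 256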
http_principal
The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat
web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI security mechanism
(Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be
uppercase.
Inter-node communication in dse.yaml
Inter-node communication between DSE Search nodes.
shard_transport_options
Fault tolerance option for inter-node communication between DSE Search nodes.
netty_client_request_timeout


The internal timeout for all search queries during distributed processing, used to prevent
long-running queries. This client request timeout is the maximum cumulative time (in milliseconds)
that a distributed search request waits idly for shard responses.
Default: 60000 (1 minute)
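
For example, a dse.yaml sketch that raises the idle wait to two minutes (an illustrative value):

shard_transport_options:
    netty_client_request_timeout: 120000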
Query options in dse.yaml
Options for CQL Solr queries.
cql_solr_query_paging

• driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the
driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.

• off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging
unless:

  - The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics
    nodes always use driver paging settings.

  - The cqlsh query parameter paging is set to driver.

  Even when cql_solr_query_paging: off, paging is dynamically enabled with the
  "paging":"driver" parameter in JSON queries.

When not set, the default is off.

Default: commented out (off)
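
For example, to make CQL Solr queries respect driver paging cluster-wide, uncomment and set the option in dse.yaml:

cql_solr_query_paging: driver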
cql_solr_query_row_timeout
The maximum time in milliseconds to wait for each row to be read from the database during CQL Solr
queries.
Default: commented out (10000, that is, 10 seconds)
Client connections in dse.yaml
The default IP address that the HTTP and Solr Admin interface uses to access DSE Search. See
Changing Tomcat web server settings.
native_transport_address
When left blank, uses the configured hostname of the node. Unlike the listen_address, this value
can be set to 0.0.0.0, but you must set the native_transport_broadcast_address to a value other than
0.0.0.0.
Set native_transport_address OR native_transport_interface, not both.
Default: localhost
Performance in cassandra.yaml
Decreasing the memtable space to make room for Solr caches might improve performance. See
Changing the stack size and memtable space.
memtable_heap_space_in_mb
The amount of on-heap memory allocated for memtables. The database uses the total of this amount
and the value of memtable_offheap_space_in_mb to set a threshold for automatic memtable flush.
See memtable_cleanup_threshold and Tuning the Java heap.
Default: calculated 1/4 of heap size (2048)
Performance in dse.yaml
Node routing options.
node_health_options
Node health options are always enabled.
refresh_rate_ms
Default: 60000
uptime_ramp_up_period_seconds
The amount of continuous uptime required for the node's uptime score to advance the node health
score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a
composite score based on dropped mutations and uptime.
If a node is repairing after a period of downtime, you might want to increase the uptime period to
the expected repair time.


Default: commented out (10800, that is, 3 hours)


dropped_mutation_window_minutes
The historic time window over which the rate of dropped mutations affect the node health score.
Default: 30
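
A sketch of the complete block in dse.yaml, using the documented defaults:

node_health_options:
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30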
Viewing search index schema and config
Search index schema and config are stored internally in the database. When you modify a search index
schema or config, the changes are pending.
Use the RELOAD SEARCH INDEX command to apply the pending changes to the active (in use) search index.
DataStax recommends using CQL to view the pending or active (in use) schema or config.
CQL shell DESCRIBE command
Use the CQL shell command DESCRIBE SEARCH INDEX to view the active and pending schema and config.
Show the active index config for demo.health_data:

DESCRIBE ACTIVE SEARCH INDEX CONFIG ON demo.health_data;

The results are shown in XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</
abortOnConfigurationError>
<luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
<dseTypeMappingVersion>2</dseTypeMappingVersion>
<directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
<indexConfig>
<rt>false</rt>
<rtOffheapPostings>true</rtOffheapPostings>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>512</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</indexConfig>
<jmx/>
<updateHandler class="solr.DirectUpdateHandler2">
<autoSoftCommit>
<maxTime>10000</maxTime>
</autoSoftCommit>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048"
lowWaterMarkMB="1024"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>16</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="true">
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
<httpCaching never304="true"/>
</requestDispatcher>
<requestHandler class="solr.SearchHandler" default="true" name="search">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>


<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler"
name="solr_query">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler class="solr.UpdateRequestHandler" name="/update"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
<requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field"
startup="lazy"/>
<requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document"
startup="lazy"/>
<requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
<requestHandler class="solr.PingRequestHandler" name="/admin/ping">
<lst name="invariants">
<str name="qt">search</str>
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>

Show the pending index schema:

View the pending search index config or schema before it is active. For example, to view the pending index
schema for demo.health_data:

DESCRIBE PENDING SEARCH INDEX SCHEMA ON demo.health_data;

The results are shown in XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TextField" name="TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_thyroid_disease"
type="TextField"/>
<field indexed="true" multiValued="false" name="pets" type="TextField"/>
<field indexed="true" multiValued="false" name="secondary_smoke" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_lupus" type="TextField"/>
<field indexed="true" multiValued="false" name="gender" type="TextField"/>
<field indexed="true" multiValued="false" name="birthplace" type="TextField"/>


<field docValues="true" indexed="true" multiValued="false" name="income_group"


type="TrieIntField"/>
<field indexed="true" multiValued="false" name="marital_status" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="age_months"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="bird" type="TextField"/>
<field indexed="true" multiValued="false" name="hay_fever" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_hay_fever"
type="TextField"/>
<field indexed="true" multiValued="false" name="routine_medical_coverage"
type="TextField"/>
<field indexed="true" multiValued="false" name="annual_income_20000"
type="TextField"/>
<field indexed="true" multiValued="false" name="exam_status" type="TextField"/>
<field indexed="true" multiValued="false" name="other_pet" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_stroke" type="TextField"/>
<field indexed="true" multiValued="false" name="employer_paid_plan" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="family_sequence"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="diagnosed_cataracts"
type="TextField"/>
<field indexed="true" multiValued="false" name="major_medical_coverage"
type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_gout" type="TextField"/>
<field indexed="true" multiValued="false" name="age_unit" type="TextField"/>
<field indexed="true" multiValued="false" name="goiter" type="TextField"/>
<field indexed="true" multiValued="false" name="chronic_bronchitis" type="TextField"/>
<field indexed="true" multiValued="false" name="county" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="num_smokers"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="screening_month" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_emphysema"
type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_other_cancer"
type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="id"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="dental_coverage" type="TextField"/>
<field indexed="true" multiValued="false" name="health_status" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false"
name="monthly_income_total" type="TrieIntField"/>
<field indexed="true" multiValued="false" name="fish" type="TextField"/>
<field indexed="true" multiValued="false" name="dog" type="TextField"/>
<field indexed="true" multiValued="false" name="asthma" type="TextField"/>
<field indexed="true" multiValued="false" name="ethnicity" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="age"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="diagnosed_asthma" type="TextField"/>
<field indexed="true" multiValued="false" name="race_ethnicity" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_congestive_heart_failure"
type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="family_size"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="race" type="TextField"/>
<field indexed="true" multiValued="false" name="thyroid_disease" type="TextField"/>
<field indexed="true" multiValued="false" name="bronchitis" type="TextField"/>
<field docValues="true" indexed="true" multiValued="false" name="household_size"
type="TrieIntField"/>
<field indexed="true" multiValued="false" name="cat" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_goiter" type="TextField"/>
<field indexed="true" multiValued="false" name="diagnosed_skin_cancer"
type="TextField"/>
<field indexed="true" multiValued="false" name="fips" type="TextField"/>
</fields>


<uniqueKey>(id,age)</uniqueKey>
</schema>


Alternate ways to view


Other ways to view the search index schema and config in XML:

• dsetool
View the pending (uploaded) or active (in use) schema or config.

# dsetool get_core_config

# dsetool get_core_schema

• Solr Admin
View only the last uploaded (pending) resource.

Customizing the search index schema


A search schema defines the relationship between data in a table and a search index. The schema identifies
the columns to index and maps column names to Apache Solr™ types.
Schema defaults
DSE Search automatically maps the CQL column type to the corresponding Solr field type, defines the field type
analyzer and filtering classes, and sets the DocValues.
If required, modify the schema using the CQL-Solr type compatibility matrix.

Table and schema definition


Fields with indexed="true" are indexed and stored as secondary files in Lucene so that the fields are
searchable. The indexed fields are stored in the database, not in Lucene, with the exception of copy fields.
Copy field destinations are not stored in the database.
To set field values as lowercase and have them stored as lowercase in docValues, use the custom
LowerCaseStrField type. Refer to Using LowerCaseStrField with search indexes.


Sample schema
The following example from Querying CQL collections uses a simple primary key. The schema version attribute
is the Solr version number for the schema syntax and semantics. In this example, version="1.5".

<schema name="my_search_demo" version="1.5">


<types>
<fieldType class="solr.StrField" multiValued="true" name="StrCollectionField"/>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="text" class="solr.TextField"/>
<fieldType class="solr.TextField" name="textcollection" multiValued="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="id" type="string" indexed="true"/>
<field name="quotes" type="textcollection" indexed="true"/>
<field name="name" type="text" indexed="true"/>
<field name="title" type="text" indexed="true"/>
</fields>
<defaultSearchField>quotes</defaultSearchField>
<uniqueKey>id</uniqueKey>
</schema>

DSE Search indexes the id, quotes, name, and title fields.
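
For orientation, a CQL table definition consistent with this schema might look like the following sketch (the keyspace and table names are hypothetical):

CREATE TABLE mykeyspace.mysolr (
    id text PRIMARY KEY,
    name text,
    title text,
    quotes set<text>
);

The multiValued textcollection field maps to the CQL collection column, and the uniqueKey maps to the single-column primary key.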
Mapping CQL primary keys and Solr unique keys
DSE Search supports CQL tables using simple or compound primary keys.
If the table uses a compound primary key or a multi-column (composite) partition key, the unique key
value is enclosed in parentheses. The schema for this kind of table requires a different syntax than the
simple primary key:

• List each compound primary key column that appears in the CQL table in the schema as a field, just like
any other column.

• Declare the unique key using the key columns enclosed in parentheses.

• Order the keys in the uniqueKey element as the keys are ordered in the CQL table.

• When using composite partition keys, do not include the extra set of parentheses in the uniqueKey.
Simple CQL primary key
  CQL syntax: CREATE TABLE ( ... a type PRIMARY KEY, ... );
  (a is both the partition key and the primary key)
  Solr uniqueKey syntax: <uniqueKey>a</uniqueKey>
  (parentheses are not required for a single key)

Compound primary key
  CQL syntax: CREATE TABLE ( ... PRIMARY KEY ( a, b, c ) );
  (a is the partition key and a b c is the primary key)
  Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>

Composite partition key
  CQL syntax: CREATE TABLE ( ... PRIMARY KEY ( ( a, b ), c ) );
  (a b is the partition key and a b c is the primary key)
  Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>
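
For example, a hypothetical table with a composite partition key, and the matching uniqueKey declaration (note that the inner parentheses from the CQL definition are dropped):

CREATE TABLE demo.readings (
    year int,
    month int,
    day int,
    value double,
    PRIMARY KEY ((year, month), day)
);

<uniqueKey>(year, month, day)</uniqueKey>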

Changing auto-generated search index settings


Using dsetool, you can customize the default settings for auto-generated search indexes by providing a YAML-
formatted file with these options:
auto_soft_commit_max_time:ms
The maximum auto soft commit time in milliseconds.
default_query_field:field
The query field to use when no field is specified in queries.
distributed=true | false
Whether to distribute and apply the operation to all nodes in the local datacenter.


• True applies the operation to all nodes in the local datacenter.

• False applies the operation only to the node it was sent to. False works only when recovery=true.

Default: true
Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.
enable_tokenized_text_copy_fields:( true | false )
Whether to generate tokenized text.
Default: false
exclude_columns: col1, col2, col3, ...
A comma-separated (CSV) list of columns to exclude.
generate_DocValues_for_fields:( * | field1, field2, ... )
The fields to automatically configure DocValues in the generated search index schema. Specify '*' to
add all possible fields:

generate_DocValues_for_fields: '*'

or specify a comma-separated list of fields, for example:

generate_DocValues_for_fields: uuidfield, bigintfield

Due to SOLR-7264, setting docValues to true on a boolean field in the Solr schema does not work. A
workaround for boolean docValues is to use 0 and 1 with a TrieIntField.
generateResources=true | false
Whether to automatically generate search index resources based on the existing CQL table metadata.
Cannot be used with schema= and solrconfig=.
Valid values:

• true - Automatically generate search index schema and configuration resources if resources do
not already exist.

• false - Default. Do not automatically generate search index resources.

include_columns=col1, col2, col3, ...


A comma-separated (CSV) list of columns to include. Empty = includes all columns.
index_merge_factor:segments
How many segments of equal size to build before merging them into a single segment.
index_ram_buffer_size=MB
The index ram buffer size in megabytes (MB).
lenient=( true | false )
Ignore non-supported type columns and continue to generate resources, instead of erroring out when
non-supported type columns are encountered. Default: false
resource_generation_profiles
To minimize index size, specify a CSV list of profiles to apply while generating resources.
Table 19: Resource generation profiles
spaceSavingAll - Applies the spaceSavingNoJoin and spaceSavingSlowTriePrecision profiles.

spaceSavingNoJoin - Does not index a hidden primary key field. Prevents joins across cores.

spaceSavingSlowTriePrecision - Sets trie fields precisionStep to '0', allowing for greater space saving but slower querying.

Using spaceSavings profiles disables auto generation of DocValues.


For example:

resource_generation_profiles: spaceSavingNoJoin, spaceSavingSlowTriePrecision

rt=( true | false )
Whether to enable live indexing to increase indexing throughput. Enable live indexing on only one
search index per cluster. For example, in the options file:

rt: true

CQL index management command examples


For example:

ALTER SEARCH INDEX CONFIG ON wiki.solr SET defaultQueryField = 'last_name';

See About search commands.


Using dsetool
Customize the search index config with YAML-formatted files
Create a config.yaml file that lists the following options to customize the config and schema files:

default_query_field: name
auto_soft_commit_max_time: 1000
generate_DocValues_for_fields: '*'
enable_string_copy_fields: false

Use the dsetool command to generate the search index with these options to customize the config and schema
generation. Use coreOptions to specify the config.yaml file:

$ dsetool create_core demo.health_data coreOptions=config.yaml

Customize the search index with options inline


Use the dsetool command to generate the search index and customize the schema generation. Use
coreOptions to turn on live indexing (also called RT):

$ dsetool create_core udt_ks.users generateResources=true reindex=true coreOptions=rt.yaml

You can verify that DSE Search created the solrconfig and schema by reading core resources using dsetool.
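
For example, using the index created above:

$ dsetool get_core_config udt_ks.users

$ dsetool get_core_schema udt_ks.users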
Enable encryption for a new search index
Specify the class for directoryFactory to solr.EncryptedFSDirectoryFactory with coreOptionsInline:

$ dsetool create_core keyspace_name.table_name generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"

Using LowerCaseStrField with search indexes


DataStax Enterprise 6.0.8 introduces a custom field type, LowerCaseStrField, which provides the following
features:

• Converts the data into lowercase and correctly stores the lowercase data in docValues.

• Converts the query values to lowercase.


You cannot apply LowerCaseStrField to a table's primary key. You also cannot use any analyzers with
LowerCaseStrField.

DataStax advises against using TextField with solr.KeywordTokenizer and
solr.LowerCaseFilterFactory. Unintended search results could occur because the raw data was not stored
as lowercase in docValues, contrary to expectations. Instead, use the custom LowerCaseStrField type as
described in this topic.
For example, to use LowerCaseStrField on a field in a new index:

$ cqlsh -e "CREATE SEARCH INDEX ON healthcare.health_data WITH COLUMNS *, birthplace { lowerCase : true };"

The command creates a search index with birthplace using the LowerCaseStrField field type. The field type
is added automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:

$ dsetool get_core_schema healthcare.health_data

$ cqlsh -e "DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;"

Output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField"
name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" type="StrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace"
type="LowerCaseStrField"/>
<field docValues="true" indexed="true" multiValued="false" name="income_group"
type="TrieIntField"/>
...
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>

To add a new field to an existing index schema with the LowerCaseStrField field type, you can:

• Use the ALTER SEARCH INDEX SCHEMA command in cqlsh

• Or you can display the current schema with dsetool get_core_schema; edit the XML manually; and use
dsetool write_resource to update the schema by specifying your edited schema XML. Refer to dsetool
get_core_schema and dsetool write_resource.


For example, in cqlsh, the following command adds the LowerCaseStrField field type to the new field
medicalNotes if it does not exist:

ALTER SEARCH INDEX SCHEMA ON healthcare.health_data ADD lowerCaseString medicalNotes;

DESCRIBE PENDING SEARCH INDEX SCHEMA ON healthcare.health_data;

No matter which command you choose, using cqlsh or dsetool, be sure to RELOAD and REBUILD the search
index in each datacenter in the cluster.

RELOAD SEARCH INDEX ON healthcare.health_data;

REBUILD SEARCH INDEX ON healthcare.health_data;

DESCRIBE ACTIVE SEARCH INDEX SCHEMA ON healthcare.health_data;

Output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
<fieldType class="com.datastax.bdp.search.solr.core.types.LowerCaseStrField"
name="LowerCaseStrField"/>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
...
<field docValues="true" indexed="true" multiValued="false" name="birthplace"
type="LowerCaseStrField"/>
...
<field docValues="true" indexed="true" multiValued="false" name="medicalNotes"
type="LowerCaseStrField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>


</schema>

There is a workaround to apply LowerCaseStrField to primary key columns. To do so, use the copyField
declaration to copy the primary key field data to the new field that's defined as type LowerCaseStrField.
Example:

ALTER SEARCH INDEX SCHEMA ON <table> ADD lowerCaseString key_column_copy;

ALTER SEARCH INDEX SCHEMA ON <table> ADD copyField[@source='key_column', @dest='key_column_copy'];

RELOAD SEARCH INDEX ON <table>;

REBUILD SEARCH INDEX ON <table>;

The search query is case insensitive. All queries are converted to lowercase and return the same result. For
example, searches for the following values return the same result:

• name

• Name

• NAME
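
As an illustrative sketch, once birthplace uses LowerCaseStrField as in the earlier example, each of the following solr_query statements returns the same rows (the literal value is hypothetical):

SELECT id FROM healthcare.health_data WHERE solr_query = 'birthplace:texas';

SELECT id FROM healthcare.health_data WHERE solr_query = 'birthplace:TEXAS';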

Set the location of search indexes


Data added to a DSE Search node is indexed locally on that node, and data changes on one node are
replicated to the other nodes. DSE Search maintains its own index files; you can control where these
index files are saved.
It is critical that you locate DSE Cassandra transactional data and Solr-based DSE Search data on separate
Solid State Drives (SSDs). Failure to do so will very likely result in sub-optimal search indexing performance.
By default, each DSE Search index is saved in solr_data_dir/keyspace_name.table_name, or as specified by
the dse.solr.data.dir system property.
The dataDir parameter in the solrconfig.xml file is not supported.

1. Shut down the search node.

2. Move the solr.data directory to the new location.

3. Specify the location:


To change dynamically, from the command line:

cd installation_location &&
bin/dse cassandra -s -Ddse.solr.data.dir=My_data_dir

To change permanently, in dse.yaml:

solr_data_dir: My_data_dir

Where My_data_dir is the new location, for example:

solr_data_dir: /var/lib/cassandra/solr.data

4. Start the node.


DSE Search logging


DSE Search logs errors, warnings, debug, trace, and info messages in the system log: /var/log/cassandra/
system.log.

Search logging classes


To add debug logging to a class permanently using the logback framework, first use nodetool
setlogginglevel to confirm the component or class name, then set it in the logback.xml file in
installation_location/conf. Modify the file to include the following line, or similar, at the end:

<logger name="org.apache.cassandra.gms.FailureDetector" level="DEBUG"/>

Restart the node to invoke the change.
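
To change a level dynamically, without editing logback.xml or restarting, you can also use nodetool directly; for example, for one of the query classes listed below:

$ nodetool setlogginglevel org.apache.solr.core.SolrCore DEBUG

Dynamic changes made this way do not persist across a restart.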


Classes for DSE Search:
Admin operations

com.datastax.bdp.search.solr.transport.protocols.admin.ReindexRequestProcessor
com.datastax.bdp.search.solr.transport.protocols.admin.CoreAdminRequestProcessor
com.datastax.bdp.search.solr.core.SolrCoreResourceManager
com.datastax.bdp.search.solr.core.CassandraResourceLoader
org.apache.solr.core.SolrCore

Indexing

com.datastax.bdp.search.solr.log.EncryptedCommitLog
com.datastax.bdp.search.solr.metrics.SolrMetricsEventListener
com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex
org.apache.lucene.store.crypto.EncryptedFSDirectory
org.apache.lucene.index.IndexWriter
org.apache.lucene.index.DocumentsWriter
org.apache.lucene.store.crypto.ThreadLocalIndexEncryptionConfiguration
org.apache.lucene.index.AutoExpungeDeletesTieredMergePolicy

Queries

com.datastax.bdp.search.solr.transport.protocols.query.ShardRequestProcessor
com.datastax.bdp.search.solr.metrics.QueryMetrics
com.datastax.bdp.search.solr.auth.DseHttpRequestAuthenticatorFactory
com.datastax.bdp.search.solr.handler.shard.modern.ModernShardHandler
com.datastax.bdp.search.solr.dht.ShardRouter
com.datastax.bdp.search.solr.transport.protocols.query.RowsRequestProcessor
com.datastax.bdp.search.solr.transport.protocols.update.AbstractUpdateCommandProcessor
org.apache.solr.search.SolrFilterCache
org.apache.solr.search.SolrIndexSearcher
org.apache.solr.handler.component.SearchHandler
org.apache.solr.core.SolrCore
org.apache.solr.handler.component.QueryComponent

See Configuring logging.


Accessing the validation log


Validation errors occur when non-indexable data is sent from nodes other than DSE Search nodes. The
validation errors are logged in:

/var/log/cassandra/solrvalidation.log

For example, if a node that is not running DSE Search puts a string in a date field, an exception is logged for
that column when the data is replicated to the search node.
Enabling multi-threaded queries
Multi-threaded queries are useful for low indexing volumes with longer-running queries.
Multi-threaded queries can offset the load of a query onto the CPU instead of writing and reading to disk.
Benchmarking is recommended; multi-threaded queries do not always improve performance.
Use the CQL index management commands to set the number of queryExecutorThreads for the search index
config:

1. Change the number of threads on an existing table:

ALTER SEARCH INDEX CONFIG ON healthcare.health_data SET


config.queryExecutorThreads=4;

2. To view the pending search index config in XML format, use this CQL shell command:

DESCRIBE PENDING SEARCH INDEX CONFIG ON healthcare.health_data;

The results in XML:

<config>
...
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048"
lowWaterMarkMB="1024"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>16</maxWarmingSearchers>
...
</query>
...
<queryExecutorThreads>4</queryExecutorThreads>
</config>

3. Use the RELOAD SEARCH INDEX command to apply the pending changes to the search index:

RELOAD SEARCH INDEX ON healthcare.health_data;

4. To view the active search index in XML format:

DESCRIBE ACTIVE SEARCH INDEX CONFIG ON healthcare.health_data;

Configuring additional search components


To configure additional search components, add the search component and define it in the handler.
For example, to add the Java spell checking package JaSpell:

<searchComponent class="solr.SpellCheckComponent" name="suggest_jaspell">


<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
<str name="field">suggest</str>
<str name="storeDir">suggest</str>
<str name="buildOnCommit">true</str>
<float name="threshold">0.0</float>
</lst>
</searchComponent>

Configure the parameters for the request handler:

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">


<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.extendedResults">true</str>
</lst>
<arr name="last-components">
<str>suggest_jaspell</str>
</arr>
</requestHandler>

Load balancing for distributed search queries


DSE Search uses algorithms to balance the load for distributed search queries by minimizing the number of
shards that are queried and reducing the amount of data that is transferred from non-local nodes. Strategies are
per search index (per core) and can be changed with dsetool set_core_property. Changes are recognized with
RELOAD SEARCH INDEX and do not require restarting the node. Different search indexes can have different
values.
Core properties
The core properties for load balancing distributed search queries are:

shard.set.cover.finder

The shard set cover finder calculates how to set cover for a query and specify how one node is selected over
others for reading the search data.
The value can be one of:

• STATIC

  - Faster; creates fewer token filters. Set with shard.set.cover.finder=STATIC.
  - For a given index, a particular coordinator accesses the same token ranges from the respective shards.
  - Load balancing on the client side is required to achieve uniform utilization of shards by the coordinator nodes.

• DYNAMIC


  - The default in DSE 6.0. Set with shard.set.cover.finder=DYNAMIC.
  - There is no fixed distribution of shard requests for a given coordinator; for two queries, there may be two different sets of shard requests.
  - Creates a large number of unique token filters, because different queries may yield shard requests accessing different sets of token ranges. This scenario is often a problem, especially with vnodes, because there is a much greater number of possible combinations.
  - In your development environment, compare the load balancing performance when you test using the STATIC or DYNAMIC setting. The DSE 6.0 default of DYNAMIC may not be optimal for your search queries.

shard.shuffling.strategy

When shard.set.cover.finder=DYNAMIC, you can change the shard shuffling strategy to one of these values:

• HOST - Shards are selected based on the host that received the query.

• QUERY - Shards are selected based on the query string.

• HOST_QUERY - Shards are selected by host x query.

• RANDOM - Different random set of shards are selected with each request (default).

• SEED - Selects the same shard from one query to another.

shard.set.cover.finder.inertia

When shard.set.cover.finder=STATIC, you can change the shard cover finder inertia value. Increasing the
inertia value from the default of 1 may improve performance for clusters with more than 1 vnode and more than
20 nodes. The default is appropriate for most workloads.
Changing core properties
Changing core properties is an advanced operation that sets properties in the dse-search.properties
resource for the search index.
These example commands show how to change core properties for the demo keyspace and the health_data
table.


1. To change the shard set cover finder:

$ dsetool set_core_property demo.health_data shard.set.cover.finder=STATIC

2. Only when shard.set.cover.finder=DYNAMIC, you can change the shard shuffling strategy:

$ dsetool set_core_property demo.health_data shard.shuffling.strategy=query

3. To recognize the changes on the node, reload the search index:

RELOAD SEARCH INDEX ON demo.health_data;

4. To view the state of the properties in the dse-search.properties resource:

$ dsetool list_core_properties demo.health_data

Result:

shard.set.cover.finder=STATIC

Log files show the loaded DSE search properties. The dsetool list_core_properties command shows
only the state of the properties in the dse-search.properties resource.

Excluding hosts from distributed queries


To exclude hosts from distributed queries, perform these steps on each node that you want to send queries to:

1. Navigate to the solr/conf directory.


The default Solr conf location depends on the type of installation:

• Package installations: /usr/share/dse/resources/solr/conf

• Tarball installations: installation_location/resources/solr/conf

2. Open the exclude.hosts file, and add the list of nodes to be excluded. Separate each name with a
newline character.

3. Update the list of routing endpoints on each node by calling the JMX operation refreshEndpoints() on
the com.datastax.bdp:type=ShardRouter mbean.
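
Any JMX client can invoke the operation. As a sketch using the open-source jmxterm utility (the jar name and JMX port here are assumptions for your environment):

$ echo "run -b com.datastax.bdp:type=ShardRouter refreshEndpoints" | \
  java -jar jmxterm-uber.jar -l localhost:7199 -n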

DSE Search performance tuning and monitoring


Use these tuning options for DSE Search:
Tuning search for maximum indexing throughput
To tune DataStax Enterprise (DSE) Search for maximum indexing throughput, follow the recommendations in
this topic. Also see the related topics in DSE Search performance tuning and monitoring. If search throughput
improves in your development environment, consider using the recommendations in production.
Locate transactional and search data on separate SSDs
It is critical that you locate DSE Cassandra transactional data and Solr-based DSE Search data on separate
Solid State Drives (SSDs). Failure to do so will very likely result in sub-optimal search indexing performance.

For the steps to accomplish this task, refer to Set the location of search indexes.
In addition, plan for sufficient memory resources and disk space to meet operational requirements. Refer to
Capacity planning for DSE Search.


Determine physical CPU resources


Before you tune anything, determine how many physical CPUs you have. The JVM does not know whether
CPUs are using hyper-threading.
Assess the IO throughput
DSE Search can be very IO intensive. Performance is impacted by the Thread Per Core (TPC) asynchronous
read and write paths architecture. In your development environment, check the iowait system metric by using
the iostat command during peak load. For example, on Linux:

iostat -x -c -d -t 1 600

IOwait is a measure of the time over a given period that a CPU (or all CPUs) spent idle because all runnable tasks
were waiting for an IO operation to complete. While each environment is unique, a general guideline is to
check whether iowait is above 5% more than 5% of the time. If that scenario occurs, try upgrading to faster
SSD devices or tune the machine to use less IO, and test again. Again, it is important to locate the search data
on dedicated SSDs, separate from the transactional data.
Disable AIO
All DSE Search index updates first perform a read-before-write against the partition or row being indexed. This
functionality means DSE uses the core database's internal read path, which in turn uses the asynchronous I/O
(AIO) chunk cache apparatus.
If you are experiencing poor performance during search indexing, or during read or write queries of frequently
used datasets, DataStax recommends that you try the following steps. Starting in your development
environment:

1. Disable AIO.

2. Set file_cache_size_in_mb to 512.

To disable AIO, pass -Ddse.io.aio.enabled=false to DSE at startup. Once enforced, SSTables and Lucene
segments, as well as other minor off-heap elements, will reside in the OS page cache and will be managed by
the kernel.
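
A minimal sketch of the two changes, assuming a tarball install where JVM flags are added to conf/jvm.options:

# conf/jvm.options - disable asynchronous I/O at startup
-Ddse.io.aio.enabled=false

# cassandra.yaml - cap the file cache as recommended above
file_cache_size_in_mb: 512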
Disabling AIO will generate a WARN entry in system.log. Example:

WARN [main] 2019-10-01 21:37:16,563 StartupChecks.java:632
- Asynchronous I/O has been manually disabled (through the 'dse.io.aio.enabled'
system property). This may result in subpar performance.

If performance improves, consider using these settings in production.


DSE 6.0 and later use AIO and a custom chunk cache that replaces the OS page cache for SSTable data.
However, in DSE 6.0.7 and later 6.0.x releases, and in DSE 6.7.3 and later 6.7.x releases, AIO is disabled
automatically if the file cache size is less than one-eighth (1/8) of the system memory. By default, the chunk
cache is configured to use one-half (½) of the max direct memory for the DSE process.
For related information, refer to increasing the max direct memory.
The only scenario where you could consider leaving AIO enabled is when you have mostly DSE/Cassandra
database workloads and your DSE Search usage is very light.
Differences between indexing modes
There are two indexing modes in DSE Search:

• Near-real-time (NRT) indexing is the default indexing mode for Apache Solr™ and Apache Lucene®.

• Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM
buffer and more frequent, cheaper soft-commits, which provide earlier visibility to newly indexed data.
However, RT indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent
NRT setup.


Tune NRT reindexing


DSE Search provides multi-threaded asynchronous indexing with a back pressure mechanism to avoid
saturating available memory and to maintain stable performance. Multi-threaded indexing improves
performance on machines that have multiple CPU cores.
For reindexing only, the IndexPool MBean provides operational visibility and tuning through JMX.
For NRT only, to maximize NRT throughput during a manual re-index, adjust these settings in the search index
config:

• Increase the soft commit time, which is set to 10 seconds (10000 ms) by default. For example, increase the
time to 60 seconds and then reload the search index:

ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 60000;

To make the pending changes active:

RELOAD SEARCH INDEX ON demo.health_data;

A disadvantage of increasing the soft commit time is that newly updated rows take longer than the default
(10000 ms) to appear in search results.
Tune RT indexing
Live indexing reduces the time it takes for newly written documents to become searchable.

1. To enable live indexing (also known as RT):

ALTER SEARCH INDEX CONFIG ON demo.health_data SET realtime = true;

2. To configure live indexing, set the autoCommitTime to a value between 100-1000 ms:

ALTER SEARCH INDEX CONFIG ON demo.health_data SET autoCommitTime = 1000;

Test with tuning values of 100-1000 ms. An optimal setting in this range depends on your hardware and
environment. For live indexing (RT), this refresh interval saturates at 1000 ms. A value higher than 1000
ms is not recognized.

3. Ensure that search nodes have at least 14 GB heap.

4. If you change the heap, restart DSE to use live indexing with the changed heap size.

Tune TPC cores


DSE Search workloads do not benefit from hyper-threading for writes (indexing). To optimize DSE Search for
indexing throughput for both modes (NRT and RT), change tpc_cores in cassandra.yaml from the default to
the number of physical CPUs. Change this setting only on search nodes, because this change might degrade
throughput for workloads other than search.
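For example, a minimal cassandra.yaml sketch, assuming a search node with 16 physical cores (the core count is illustrative; substitute your hardware's physical core count):

# Set on search nodes only: match physical cores, not hyper-threaded cores
tpc_cores: 16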
Size RAM buffer
The default settings for RAM buffer in dse.yaml are appropriate for:

• ram_buffer_heap_space_in_mb: 1024

• ram_buffer_offheap_space_in_mb: 1024
Because NRT does not use offheap, these settings apply only to RT.

Adjust these settings to configure how much global memory all Solr cores use to accumulate updates before
flushing segments. Setting this value too low can induce a state of constant flushing during periods of ongoing
write activity. For NRT, these forced segment flushes will also de-schedule pending auto-soft commits to avoid
potentially flushing too many small segments.


JMX MBean path: com.datastax.bdp.metrics.search.RamBufferSize
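For example, a hedged dse.yaml sketch that doubles both buffers for a write-heavy live-indexing (RT) workload; the 2048 MB values are illustrative starting points, not a recommendation:

# Global memory that Solr cores use to accumulate updates before flushing segments
ram_buffer_heap_space_in_mb: 2048
ram_buffer_offheap_space_in_mb: 2048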


Check back pressure setting
The back_pressure_threshold_per_core in dse.yaml affects only index rebuilding/reindexing. If you upgraded to
DSE 6.0 from earlier versions, ensure that you use the new default value of 1024.
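For reference, the corresponding dse.yaml entry with the 6.0 default:

back_pressure_threshold_per_core: 1024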
Use default mergeScheduler
The default mergeScheduler settings are set automatically. Do not adjust these settings in DSE 6.0 and later.
In earlier versions, the default settings were different and might have required tuning.
Resolving query timeouts on restarted nodes
When restarting nodes with large indexes (hundreds of megabytes), initial queries might time out due to the time
it takes to build the token range filter queries.

To work around timeouts:

1. Run with a replication factor greater than 1 to ensure that replicas are always available.

2. Configure the dse.yaml settings enable_health_based_routing and uptime_ramp_up_period_seconds so that
the ramp-up period is larger than the amount of time it takes for the first query to answer. One hour is
usually enough.

3. After restarting the node, issue several match-all queries, for example q=*:*, to warm up the filters (see the sketch after these steps).

4. If you're using the Java Driver, create an ad-hoc session with only the node to warm up in the white list.
Issuing many queries increases the chances that all token ranges are used.

After the uptime ramp-up period, the node starts to be hit by distributed queries. The filters are warmed up
already and timeouts should not occur.
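For step 3, a hedged warm-up sketch, assuming a restarted node at 10.0.0.1 with a search index on demo.health_data (the address and index name are placeholders):

$ curl "http://10.0.0.1:8983/solr/demo.health_data/select?q=*:*&rows=1"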
Table compression can optimize reads
Search nodes typically engage in read-dominated tasks, so maximizing storage capacity of nodes, reducing the
volume of data on disk, and limiting disk I/O can improve performance. You can configure data compression on
a per-table basis to optimize performance of read-dominated tasks.
You can implement custom compression classes using the
org.apache.cassandra.io.compress.ICompressor interface. See CQL table properties.
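For example, a minimal sketch that enables LZ4 compression with a larger chunk length on the demo table used elsewhere in this guide (the chunk length is illustrative):

ALTER TABLE demo.health_data
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64};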

Parallelizing large row reads


For performance, you can configure DSE Search to parallelize the retrieval of a large number of rows.
Configure the queryResponseWriter in the search index as follows:

<queryResponseWriter name="javabin" class="solr.BinaryResponseWriter">
  <str name="resolverFactory">com.datastax.bdp.search.solr.response.ParallelRowResolver$Factory</str>
</queryResponseWriter>

By default, the parallel row resolver uses up to x threads to execute parallel reads, where x is the number of
CPUs. Each thread sequentially reads a batch of rows equal to the total requested rows divided by the number
of CPUs:
Rows read = Total requested rows / Number of CPUs
You can change the batch size per request, by specifying the cassandra.readBatchSize HTTP request
parameter. Smaller batches use more parallelism, while larger batches use less.
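For example, a hedged HTTP sketch that retrieves 1000 rows in read batches of 100 (the host, index name, and values are illustrative):

$ curl "http://localhost:8983/solr/demo.health_data/select?q=*:*&rows=1000&cassandra.readBatchSize=100"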
Changing the stack size and memtable space
Some Solr users have reported that increasing the stack size improves performance under Tomcat. To
increase the stack size, uncomment and modify the default -Xss256k setting in the cassandra-env.sh file.
Also, decreasing the memtable space to make room for Solr caches might improve performance. Modify the


memtable space using the memtable_heap_space_in_mb and memtable_offheap_space_in_mb properties in


the cassandra.yaml file.
Tuning index size and range query speed
In DataStax Enterprise, you can trade off search index size for range query speed and vice versa. You make
this tradeoff to suit a particular use case and on a core-by-core basis by setting up the precision step of two
special token field types that are used by DataStax Enterprise.
Use extreme care when performing this tuning. This advanced tuning feature is recommended for use only in rare
cases. In most cases, using the default values is best. To perform this tuning, you change the precision step
of one or both DataStax Enterprise internal field types:

• token_long
Used for filtering over token fields during query routing.

• ttl_long
Used for searching for expiring documents.

To change the precision step:

1. In the fieldType definition, set the class attribute of token_long and ttl_long to solr.TrieLongField.

2. Set the precisionStep attribute from the default 8 to another number. Choose this number based on an
understanding of its impact. Usually, a smaller precision step increases the index size and range query
speed, while a larger precision step reduces index size, but potentially reduces range query speed.
The following snippet of the schema.xml shows an example of the required configuration of both field types:

<?xml version="1.0" encoding="UTF-8" ?>


<schema name="test" version="1.0">
<types>
. . .
<fieldType name="token_long" class="solr.TrieLongField" precisionStep="16" />
<fieldType name="ttl_long" class="solr.TrieLongField" precisionStep="16" />
. . .
</types>
<fields>
. . .
</fields>
</schema>

DataStax Enterprise ignores one or both of these field type definitions and uses the default precision step if you
make any of these mistakes:

• The field type is defined using a name other than token_long or ttl_long.

• The class is something other than solr.TrieLongField.

• The precision step value is not a number. DataStax Enterprise logs a warning.

The definition of a fieldType alone sets up the special field. You do not need to use token_long or ttl_long
types as fields in the <fields> tag.
Improving read performance
You can increase DSE Search read performance by increasing the number of replicas. You define a strategy
class, the names of datacenters, and the number of replicas. For example, you can add replicas using the
NetworkTopologyStrategy replica placement strategy.

For example, if you are using a PropertyFileSnitch, perform these steps:


1. Examine your datacenter and nodes. The following example shows two datacenters with one node in each
datacenter, which is a suboptimal configuration.

nodetool -h localhost ring

Datacenter: DC1
==========
Address          Rack   Status  State   Load        Owns     Token
145.101.134.121  rack1  Up      Normal  160.54 KiB  100.00%  -9223372036854775808

Datacenter: DC2
==========
Address          Rack   Status  State   Load        Owns     Token
145.101.134.122  rack1  Up      Normal  160.54 KiB  100.00%  -9223372036854775808

The datacenter names, DC1 and DC2 in this example, must match the datacenter name configured for
your snitch.

2. Start CQL on the command line and create a keyspace that specifies the number of replicas.
To improve read performance, increase the number of replicas in the datacenters. For example, at least
three replicas in DC1 and three in DC2.
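For example, a keyspace definition matching the datacenters above (the keyspace name is a placeholder):

CREATE KEYSPACE search_demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};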

DSE Search operations


You can run DSE Search on one or more nodes. Typical operations include configuring nodes and policies,
query routing, load balancing, and communications.
DSE Search initial data migration
Best practices and guidelines for loading data into DSE Search.
When you initially load data into DataStax Enterprise (DSE), resource contention requires planning to ensure
performance.

• DSE is performant when writing data.

• Apache Solr™ is resource intensive when creating a search index.

These two activities compete for resources, so proper resource allocation is critical to maximize efficiency for
initial data load.
Recommendations

• For maximum throughput, store the search index data and DataStax Enterprise (Cassandra) data on
separate physical disks.
If you are unable to use separate disks, DataStax recommends that SSDs have a minimum of 500 MB/s
read/write speeds (bandwidth).

• Enable OpsCenter 6.1 repair service.

Also see memory recommendations in the planning guide.


Initial bulk loading
DataStax recommends following this high-level procedure:

1. Install DSE and configure nodes for search workloads.


2. Use the CQL CREATE SEARCH INDEX command to create search indexes.

3. Tune the index for maximum indexing throughput.

4. Load data into the database using best practices for data loading. For example, load data with the driver
with the consistency level at LOCAL_ONE (CL.LOCAL_ONE) and a sufficiently high write timeout.
After data loading is completed, there might be lag time because indexing is asynchronous.

5. Verify the indexing QueueSize with the IndexPool MBean. After the index queue size has receded, run this
CQL query to verify that the number of records is as expected:

SELECT count(*) FROM ks.table WHERE solr_query = '*:*';

New data is automatically indexed.


Troubleshooting
If the record count does not stabilize:

• If dropped mutations exist in the nodetool tpstats output for some nodes, and OpsCenter repair service is
not enabled, run manual repair on those nodes.

• If dropped mutations do not exist, check the system.log and the Solr validation log for indexing errors.

Shard routing for distributed queries


On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route
sub-queries to the nodes most capable of handling them. The best node is determined by a chain of node
comparisons. Selection occurs in the following order using these criteria:

1. Is node active?
Preference to active nodes.

2. Is the requested core indexing, or has it failed to index?
Preference to nodes that are in neither of these states.

3. Node health rank, an exponentially increasing number between 0 and 1, describes the health of the node. If
all the previous criteria are equal, the node with the better score is chosen first. The node health rank value is
exposed as a JMX metric under ShardRouter.
Node health rank is a comparison of uptime and dropped mutations:

node health = uptime / (1 + drop_rate)

where:

• drop_rate = the rate of dropped mutations per minute over a sliding window of configurable length.
To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.
A high dropped-mutation rate indicates an overloaded node; mutations include database insertions and
updates.

• uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent
downtime.

For example, a node with a perfect uptime score of 1.0 and a drop rate of 3 dropped mutations per minute
has a node health of 1.0 / (1 + 3) = 0.25.

4. Is the node close to the node that is issuing the query?
Node selection uses endpoint snitch proximity; preference is given to closer nodes.

After using these criteria, node selection is random.


To check on the shard router, add shards.info=true to the search query.


The ShardRouter MBean, not present in open source Solr, provides information about how DSE Search routes
queries.
Deleting a search index
To delete a search index:

1. Launch cqlsh and execute the CQL command to drop the search index:

DROP SEARCH INDEX wiki.solr;

2. Exit cqlsh and verify that the files are deleted from the file system:

ls /var/lib/cassandra/data/solr.data/wiki.solr/index

Verifying indexing status


You can check the indexing status using dsetool, the Core Admin, or the logs.
Examples
These examples use the demo keyspace and health_data table.
To view the indexing status for the local node:

$ dsetool core_indexing_status demo.health_data

The results are displayed:

[demo.health_data]: INDEXING, 38% complete, ETA 452303 milliseconds (7 minutes 32 seconds), reason: USER_REQUEST

To view the indexing status for a search index on a specified node:

$ dsetool -h 200.192.10.11 core_indexing_status demo.health_data

To view indexing status of all search indexes in the data center:

$ dsetool core_indexing_status demo.health_data --all

The results are displayed for 3 nodes in the data center:

Address Core Indexing Status


200.192.10.11 FINISHED
200.192.10.12 FINISHED
200.192.10.23 FINISHED

Checking the indexing status using the Core Admin


To check the indexing status, open the Solr Admin and click Core Admin.


Checking the indexing status using the logs


You can also check the logs to get the indexing status. For example, you can check information about the
plugin initializer:

INDEXING / REINDEXING -
INFO SolrSecondaryIndex plugin initializer. 2013-08-26 19:25:43,347
SolrSecondaryIndex.java (line 403) Reindexing 439171 keys for core wiki.solr

Or you can check the SecondaryIndexManager.java information:

INFO Thread-38 2013-08-26 19:31:28,498 SecondaryIndexManager.java (line 136) Submitting
index build of wiki.solr for data in SSTableReader(path='/mnt/cassandra/data/wiki/solr/
wiki-solr-ic-5-Data.db'), SSTableReader(path='/mnt/cassandra/data/wiki/solr/wiki-solr-
ic-6-Data.db')

FINISH INDEXING -
INFO Thread-38 2013-08-26 19:38:10,701 SecondaryIndexManager.java (line 156) Index build
of wiki.solr complete

Backing up DSE Search data directories


As a starting point, use these steps to create backups for DSE Search data directories. These steps apply when
the backups are intended to restore a cluster with the same token layout, and the backups can be created in a
rolling fashion.

For each node:


1. Drain the node to ensure that the search indexes are in sync with their backing tables. This command
forces a memtable flush that forces a Solr hard commit:

$ nodetool drain

2. Shut down the node.

3. Manually back up your data directories (see the sketch after these steps). The default location for index files
is /var/lib/cassandra/data/solr.data.

4. Restart the node.
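For step 3, a minimal sketch that archives the search index directory, assuming the default location (the backup destination is a placeholder):

$ tar -czf /backups/solr-data-$(hostname).tar.gz /var/lib/cassandra/data/solr.data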

Restoring a search node from backup


Reload the data and rebuild the indexes as the data loads.

1. Use the DataStax Enterprise restore steps with indexing enabled, and let indexing proceed as the data is
written. For example, use the OpsCenter Backup Service.

2. Follow the steps in DSE Search initial data migration.

Monitoring DSE Search


Uploading the search index schema and config
After generating or changing the search index schema and configuration, use dsetool to upload to a DSE
Search node to create a search index. You can also post additional resource files.
You can configure the maximum resource file size or disable resource upload with the DSE Search resource
upload limit option in dse.yaml.

Using custom resources is not supported by the CQL CREATE SEARCH INDEX command.

Index resources are stored internally in the database, not in the file system. The schema and configuration
resources are persisted in the solr_admin.solr_resources database table.

1. Write the schema:

$ dsetool write_resource keyspace.table name=schema.xml file=schemaFile.xml

2. Post the configuration file:

$ dsetool write_resource keyspace.table name=solrconfig.xml file=solrconfigFile.xml

3. Post any other resources that you might need.

$ dsetool write_resource keyspace.table name=ResourceFile.xml file=schemaFile.xml

You can specify a path for the resource file:

$ dsetool write_resource keyspace.table name=ResourceFile.xml file=myPath1/myPath2/schemaFile.xml

4. To verify the resources after they are posted:


For example:

$ dsetool read_resource keyspace.table name=ResourceFile.xml file=myPath1/myPath2/schemaFile.xml

Solr interfaces

Changing the Solr connector port


To change the Solr port from the default port 8983, change the http.port setting in the catalina.properties
file.
The file is installed with DSE in installation_location/tomcat/conf, usually /usr/share/dse/tomcat/conf/
catalina.properties.
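For example, a minimal catalina.properties sketch that moves the connector to port 8984 (the port value is illustrative):

http.port=8984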

Accessing search indexes from Solr Admin UI (deprecated)


When DataStax Enterprise authorization is enabled, access to search indexes (cores) is restricted from the Solr
Admin UI. You must grant permissions to roles of Solr Admin UI users for HTTP operations.
Table Required permissions Operation

solr_admin.solr_resources SELECT Read a resource

solr_admin.solr_resources MODIFY Write a resource

core table ALTER Stop core reindex

core table SELECT Query core and all remaining admin query
operations on core

core table MODIFY Commit and delete

Permissions are inherited. Granting permissions on a keyspace allows users with that role to access all
tables in the keyspace.

Examples
To grant permission to read resources:

GRANT SELECT ON solr_admin.solr_resources TO role_name;
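Similarly, a hedged sketch that grants query plus commit and delete access on a hypothetical indexed table:

GRANT SELECT ON demo.health_data TO role_name;
GRANT MODIFY ON demo.health_data TO role_name;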

Changing Tomcat web server settings


To configure security for DSE Search, change the IP address for client connections to DSE Search using the
HTTP and Solr Admin interfaces in the Tomcat web server.xml file.

Make configuration changes in the Tomcat server.xml file:

1. Change the IP address for client connections to DSE Search.


The default IP address that the HTTP and Solr Admin interface uses to access DSE Search is defined
with native_transport_address in the cassandra.yaml file.

• The default native_transport_address value localhost enables Tomcat to listen only on the
localhost.

• To enable Tomcat to listen on all configured interfaces, set native_transport_address to 0.0.0.0.

To change the IP address for client connections to DSE Search using the HTTP and Solr Admin
interfaces, either change native_transport_address in the cassandra.yaml file or create a Tomcat
connector.
Create a Tomcat connector:


In the <Service name="Solr"> section of the server.xml file:

<Connector
port="PORT"
protocol="HTTP/1.1"
address="IP_ADDRESS"
connectionTimeout="20000"
redirectPort="8443"
/>

Change the native_transport_address:

Change native_transport_address in the cassandra.yaml file. The native_transport_address
is read on startup only.

2. For advanced users only: In the Tomcat server.xml file, specify a client connection port other than the
default port 8983. However, when specifying a non-default connection port, the automatic SSL connection
configuration performed by DataStax Enterprise is not done. You must provide the valid connector
configuration, including keystore path and password. See the DataStax Support article Configuring the
DSE Solr HTTP/HTTPS port.

3. After making changes, restart the node.

Configuring the Solr library path


The location for library files in DataStax Enterprise is not the same location as open source Apache Solr™.
Contrary to the examples shown in the solrconfig.xml file that indicate support for relative paths, DSE Search
does not support the relative path values that are set for the <lib> property and cannot find files in directories
that are defined by the <lib> property. The workaround is to place custom code or Solr contrib modules in the
Solr library directories.
The default Solr library path location depends on the type of installation:

• Package installations: /usr/share/dse/solr/lib

• Tarball installations: installation_location/resources/solr/lib

When the plugin JAR file is not in the directory that is defined by the <lib> property, attempts to deploy custom
Solr libraries in DataStax Enterprise fail with java.lang.ClassNotFoundException and an error in the
system.log like this:

ERROR [http-8983-exec-5] 2015-12-06 16:32:33,992 CoreContainer.java (line 956) Unable to


create core: boogle.main
org.apache.solr.common.SolrException: Error loading class
'com.boogle.search.CustomQParserPlugin'
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:851)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:640)
at
com.datastax.bdp.search.solr.core.CassandraCoreContainer.doCreate(CassandraCoreContainer.java:675)
at
com.datastax.bdp.search.solr.core.CassandraCoreContainer.create(CassandraCoreContainer.java:234)
at
com.datastax.bdp.search.solr.core.SolrCoreResourceManager.createCore(SolrCoreResourceManager.java:256)
at
com.datastax.bdp.search.solr.handler.admin.CassandraCoreAdminHandler.handleCreateAction(CassandraCoreAdminHan
...
Caused by: org.apache.solr.common.SolrException: Error loading class
'com.boogle.search.CustomQParserPlugin'
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:405)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:541)
...
Caused by: java.lang.ClassNotFoundException: com.boogle.search.CustomQParserPlugin


at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
...

Workaround
Using the class in this example with the JAR file name com.boogle.search.CustomQParserPlugin-1.0.jar,
follow these steps to get the custom plugin working on all DSE Search nodes.

1. Define the parser in the search index config file:

<queryParser name="myCustomQP" class="com.boogle.search.CustomQParserPlugin"/>

2. Place custom code or Solr contrib modules in the Solr library directories.

3. Deploy the JAR file on all DSE Search nodes in the cluster in the appropriate lib/ directory.
For example, package installations: /usr/share/dse/solr/lib/
com.boogle.search.CustomQParserPlugin-1.0.jar

4. Reload the search index with the new configuration.

Changing the HTTP interface to Apache JServe Protocol


In addition to the widely used HTTP interface, you can configure DSE Search to use AJP (Apache JServe
Protocol). AJP is an optimized, binary version of HTTP that facilitates Tomcat communication with an Apache
web server using mod_jk. Typically, you use AJP when HTTPS serves a web application and DSE Search
powers the backend.
By default the AJP connector is disabled. To enable the AJP connector, uncomment the connector configuration
in the Tomcat server.xml file. Remove the comments as shown:

<!-- Define an AJP 1.3 Connector on port 8009 -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

Field transformer (FIT)


DataStax Enterprise (DSE) supports using a field input/output transformer (FIT) API.
A field input/output transformer, an alternative for handling update requests, is executed later than a URP at
indexing time. See the DataStax Developer Blog post An Introduction to DSE Field Transformers.

The DSE custom URP implementation is deprecated.

DSE custom URP provided similar functionality to the Solr URP chain, but appeared as a plugin to Solr. The
classic URP is invoked when updating a document using HTTP and the custom URP is invoked when updating
a table using DSE. If both classic and custom URPs are configured, the classic version is executed first. The
custom URP chain and the FIT API work with CQL and HTTP updates.
Examples are provided for using the field input/output transformer API and the deprecated custom URP.
Field input/output (FIT) transformer API
Use the field input/output transformer API as an option to the input/output transformer support in Apache
Solr™. An Introduction to DSE Field Transformers provides details on the transformer classes.
DSE Search includes the released version of a plugin API for Solr updates and a plugin to the
CassandraDocumentReader. The plugin API transforms data from the secondary indexing API before data
is submitted. The plugin to the CassandraDocumentReader transforms the results data from the database to
DSE Search.
Using the API, applications can tweak a Solr Document before it is mapped and indexed according to the
schema.xml. The API is a counterpart to the input/output transformer support in Solr.
The field input transformer (FIT) requires:


• name="dse"

• A trailing Z for date field values

To use the API:

1. Define the plugin in the top level <config> element in the solrconfig.xml for a table (search core).

<config>
  ...
  <fieldInputTransformer name="dse"
      class="com.datastax.bdp.cassandra.index.solr.functional.BinaryFieldInputTransformer">
  </fieldInputTransformer>

  <fieldOutputTransformer name="dse"
      class="com.datastax.bdp.cassandra.index.solr.functional.BinaryFieldOutputTransformer">
  </fieldOutputTransformer>
  ...
</config>

2. Write a transformer class something like this reference implementation to tweak the data in some way.

3. Export the class to a JAR file. You must place the JAR file in this location:

• Tarball installations: install-location/resources/solr/lib

• Package installations: /usr/share/dse/solr/lib

The JAR is added to the CLASSPATH automatically.

4. Test your implementation using something like the reference implementation.

FIT transformer class examples


The DataStax Developer Blog provides an introduction to DSE Field Transformers.
Here are examples of field input and output transformer (FIT) classes.
Input transformer example

package com.datastax.bdp.search.solr.functional;

import java.io.IOException;

import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.document.Document;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.SchemaField;

import com.datastax.bdp.search.solr.FieldInputTransformer;
import org.apache.solr.schema.IndexSchema;

public class BinaryFieldInputTransformer extends FieldInputTransformer


{
@Override
public boolean evaluate(String field)
{
return field.equals("binary");
}

@Override
public void addFieldToDocument(SolrCore core,
IndexSchema schema,


String key,
Document doc,
SchemaField fieldInfo,
String fieldValue,
DocumentHelper helper)
throws IOException
{
try
{
byte[] raw = Hex.decodeHex(fieldValue.toCharArray());
byte[] decomp = DSP1493Test.decompress(raw);
String str = new String(decomp, "UTF-8");
String[] arr = StringUtils.split(str, ",");
String binary_name = arr[0];
String binary_type = arr[1];
String binary_title = arr[2];

SchemaField binaryNameField = core.getSchema().getFieldOrNull("binary_name");


SchemaField binaryTypeField = core.getSchema().getFieldOrNull("binary_type");
SchemaField binaryTitleField = core.getSchema().getFieldOrNull("binary_title");

helper.addFieldToDocument(core, core.getSchema(), key, doc, binaryNameField,


binary_name);
helper.addFieldToDocument(core, core.getSchema(), key, doc, binaryTypeField,
binary_type);
helper.addFieldToDocument(core, core.getSchema(), key, doc, binaryTitleField,
binary_title);
}
catch (Exception ex)
{
throw new RuntimeException(ex);
}
}
}

Output transformer example

package com.datastax.bdp.search.solr.functional;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.StoredFieldVisitor;
import com.datastax.bdp.search.solr.FieldOutputTransformer;

public class BinaryFieldOutputTransformer extends FieldOutputTransformer


{
@Override
public void binaryField(FieldInfo fieldInfo, byte[] value,
StoredFieldVisitor visitor, DocumentHelper helper) throws IOException
{
byte[] bytes = DSP1493Test.decompress(value);
String str = new String(bytes, "UTF-8");
String[] arr = StringUtils.split(str, ",");
String binary_name = arr[0];
String binary_type = arr[1];
String binary_title = arr[2];

FieldInfo binary_name_fi = helper.getFieldInfo("binary_name");


FieldInfo binary_type_fi = helper.getFieldInfo("binary_type");
FieldInfo binary_title_fi = helper.getFieldInfo("binary_title");

visitor.stringField(binary_name_fi, binary_name);
visitor.stringField(binary_type_fi, binary_type);


visitor.stringField(binary_title_fi, binary_title);
}
}

Custom URP example (deprecated)


DSE Search includes the released version of a plugin API for Solr updates and a plugin to the
CassandraDocumentReader. The plugin API transforms data from the secondary indexing API before data
is submitted. The plugin to the CassandraDocumentReader transforms the results data from the database to
DSE Search.

The DSE custom URP implementation is deprecated. A custom URP is almost always unnecessary;
DataStax recommends using the field input/output (FIT) transformer API instead.

Using the API, applications can tweak a search document before it is mapped and indexed according to the
index schema.
The field input transformer (FIT) requires a trailing Z for date field values.

To use the API:

1. Configure the custom URP in the solrconfig.xml.

<dseUpdateRequestProcessorChain name="dse">
<processor
class="com.datastax.bdp.search.solr.functional.DSEUpdateRequestProcessorFactoryExample">
</processor>
</dseUpdateRequestProcessorChain>

2. Write a class to use the custom URP that extends the Solr UpdateRequestProcessor. For example:

package com.datastax.bdp.search.solr.functional;

import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.datastax.bdp.search.solr.handler.update.CassandraAddUpdateCommand;
import com.datastax.bdp.search.solr.handler.update.CassandraCommitUpdateCommand;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class TestUpdateRequestProcessor extends UpdateRequestProcessor


{
protected final Logger logger =
LoggerFactory.getLogger(TestUpdateRequestProcessor.class);

public TestUpdateRequestProcessor(UpdateRequestProcessor next)


{
super(next);
}

public void processAdd(AddUpdateCommand cmd) throws IOException


{
if (cmd instanceof CassandraAddUpdateCommand)
{
logger.info("Processing Cassandra-actuated document update.");
}
else
{
logger.info("Processing HTTP-based document update.");
}


super.processAdd(cmd);
}

public void processCommit(CommitUpdateCommand cmd) throws IOException


{
if (cmd instanceof CassandraCommitUpdateCommand)
{
logger.info("Processing DSE-actuated commit.");
}
else
{
logger.info("Processing client-actuated commit.");
}
super.processCommit(cmd);
}
}

3. Export the class to a JAR, and place the JAR in this location:

• Tarball installations: install-location/resources/solr/lib

• Package installations: /usr/share/dse/solr/lib

The JAR is added to the CLASSPATH automatically.

4. Test your implementation. For example:

package com.datastax.bdp.search.solr.functional;

import com.datastax.bdp.search.solr.handler.update.DSEUpdateProcessorFactory;
import org.apache.solr.core.SolrCore;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class DSEUpdateRequestProcessorFactoryExample extends DSEUpdateProcessorFactory
{
SolrCore core;

public DSEUpdateRequestProcessorFactoryExample(SolrCore core) {


this.core = core;
}

public UpdateRequestProcessor getInstance(


UpdateRequestProcessor next)
{
return new TestUpdateRequestProcessor(next);
}
}

Interface for custom field types


DSE Search implements a CustomFieldType interface that marks Apache Solr™ custom field types and
provides their actual stored field type. The custom field type stores an integer trie field as a string representing
a comma-separated list of integer values. When indexed, the string is split into its integer values, each one
indexed as a trie integer field. This class effectively implements a multi-valued field based on its string
representation.
A CustomFieldType can override this method to provide the FieldType for the binary response writer to look at
when it determines whether to call the field's toObject(). This allows the binary response writer, for instance, to
return java.util.Date in place of text for a CustomFieldType that extends TrieDateField.
To ensure that custom field types control their serialized value, use:

public Class<? extends FieldType> getKnownType()


{
return getClass();
}

See the example reference implementation.

To use the CustomFieldType interface:

1. Implement a custom field type class something like the following reference implementation.

2. Export the class to a JAR, and place the JAR in this location:

• Package installations: /usr/share/dse/solr/lib

• Tarball installations: install-location/resources/solr/lib

The JAR is added to the CLASSPATH automatically.

Reference implementation
Here is an example of a custom field type class:

package com.datastax.bdp.search.solr.functional;

import com.datastax.bdp.search.solr.CustomFieldType;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.StrField;
import org.apache.solr.schema.TrieField;

public class CustomTestField extends TrieField implements CustomFieldType


{
public CustomTestField()
{
this.type = TrieField.TrieTypes.INTEGER;
}

@Override
public FieldType getStoredFieldType()
{
return new StrField();
}

@Override
public boolean multiValuedFieldCache()
{
return true;
}

@Override
public List<IndexableField> createFields(SchemaField sf, Object value)
{
String[] values = ((String) value).split(" ");
List<IndexableField> fields = new ArrayList<IndexableField>();
for (String v : values)
{
fields.add(createField(sf, v));
}
return fields;
}


@Override
public String toInternal(String value)
{
return value;
}

@Override
public String toExternal(IndexableField f)
{
return f.stringValue();
}

public Class<? extends FieldType> getKnownType()
{
return TrieField.class;
}
}

Deleting by query
Delete by query no longer accepts wildcard queries, including queries that match all documents (for example,
<delete><query>*:*</query></delete>). Instead, use the CQL TRUNCATE command.
Delete by query supports deleting data based on search criteria. After you issue a delete by query, documents
start getting deleted immediately and deletions continue until all documents are removed. For example, you can
delete the data that you inserted using this command:

$ curl http://localhost:8983/solr/mykeyspace.mysolr/update --data '<delete><query>color:red</query></delete>' -H 'Content-type:text/xml; charset=utf-8'

Setting the allowPartialDeletes parameter to false (the default) prevents deletes if a node is down. Setting
allowPartialDeletes to true causes the delete to fail if a node is down and the delete does not meet a
consistency level of quorum. Delete by queries using *:* are an exception to these rules. These queries issue a
truncate, which requires all nodes to be up in order to succeed.
Best practices
DataStax recommends that queries for delete-by-query operations touch columns that are not updated. For
example, a column that is not updated is one of the elements of a compound primary key.
Delete by query problem example
The following workflow demonstrates that not following this best practice is problematic:

• When a search coordinator receives a delete-by-query request, the request is forwarded to every node in
the search datacenter.

• At each search node, the query is run locally to identify the candidates for deletion, and then the
LOCAL_ONE consistency level deletes the queries for each of those candidates.

• When those database deletes are perceived at the appropriate nodes across the cluster, the records are
deleted from the search index.

For example, in a certificates table, each certificate has a date of issue that is a timestamp. When a certificate
is renewed, the new issue date is written to the row, and that write is propagated to all replicas. In this example,
let's assume that one replica misses it. If you run a periodic delete-by-query that removes all of the certificates
with issue dates older than a specified date, unintended consequences occur when the replica that just missed
the renewal write matches the delete query. The certificate is deleted across the entire
cluster, on all datacenters, making the delete unrecoverable.


Monitoring Solr segments


To monitor Solr segments, use the Segments Info screen in the Solr Administration User Interface
application. You can also use this API endpoint and specify the Solr collection path name:

/solr/<Solr collection path name>/admin/segments

The following example obtains the segment information for the Solr GeoNames collection. The example
specifies the IP address and port, and specifies that the output is to be returned in JSON format and indented.

$ curl "http://127.0.0.1:8983/solr/solr.geonames/admin/segments?wt=json&indent=true"

The following output shows the segment information, which is truncated for brevity:

{
"responseHeader":{
"status":0,
"QTime":3},
"segments":{
"_0":{
"name":"_0",
"delCount":5256,
"sizeInBytes":1843747,
"size":6439,
"sizeMB":1.7583341598510742,
"delRatio":0.816275819226588,
"age":"2017-06-15T15:21:09.730Z",
"source":"flush"},
"_1":{
"name":"_1",
"delCount":5351,
"sizeInBytes":1881895,
"size":6554,
"sizeMB":1.7947149276733398,
"delRatio":0.816447970704913,
"age":"2017-06-15T15:21:09.786Z",
"source":"flush"},
"_3":{
"name":"_3",
"delCount":5553,
"sizeInBytes":1952348,
"size":6850,
"sizeMB":1.8619041442871094,
"delRatio":0.8106569343065694,
"age":"2017-06-15T15:21:09.790Z",
"source":"flush"},
...


The following table describes the segment properties in the previous output:

Property Description

name Segment name

delCount Number of documents deleted from the segment

sizeInBytes Segment size in bytes

size Number of documents in the segment

sizeMB Segment size in megabytes

delRatio Delete ratio: the segment delete count divided by the number of documents in the segment (delCount / size)

age Date and time that the segment was created

source Segment source; flush sends the recent index changes to stable storage

For more information, see the Apache Solr online reference guide located at https://lucene.apache.org/solr/
guide.
HTTP API SolrJ and other Solr clients
Apache Solr™ clients work with DataStax Enterprise. If you have an existing Solr application, you can create a
schema, then import your data and query using your existing Solr tools. The Wikipedia demo is built and queried
using SolrJ. The query is done using pure Ajax. No DataStax Enterprise API is used for the demo.
DataStax has extended SolrJ to protect internal Solr communication and HTTP access using SSL. You can also
use SolrJ to change the consistency level of the write in the database on the client side.

DSE Graph
DataStax Enterprise (DSE) Graph is a distributed graph database that is optimized for fast data storage and
traversals, zero downtime, and analysis of complex, disparate, and related datasets in real time. It is capable of
scaling to massive datasets and executing both transactional and analytical workloads. DSE Graph incorporates
all of the enterprise-class functionality found in DataStax Enterprise, including advanced security protection,
built-in DSE Analytics and DSE Search functionality, visual management and monitoring, and development tools
including DataStax Studio.
About DataStax Enterprise Graph
What is a graph database?
A graph database is a database that uses graph structures to store data along with the data's relationships.
Common use cases include: fraud prevention, Customer 360, Internet of Things (IoT) predictive maintenance,
and recommendation engine. Graph databases use a data model that is as simple as a whiteboard drawing.
Graph databases employ vertices, edges, and properties as described in Data modeling.


What is DSE Graph?


The architecture of the DSE database can handle petabytes of information and thousands of concurrent users
and operations per second. DSE Graph is built as a component of DataStax Enterprise. DSE Graph provides the
following benefits:
Scalable for large graphs and high volumes of users, events, and operations
DSE Graph can contain billions (10^9) of vertices and edges.

Support for high-volume, concurrent transactions and operational graph processing (OLTP)
The transactional capacity of DSE Graph scales with the size of the cluster and answers complex traversal
queries on huge graphs in milliseconds.

Support for global graph analytics and batch graph processing (OLAP)
Available through the Spark framework.

Integration with DSE Search
Integrates with DSE Search for efficient indexing that supports geographical and numeric range search, as
well as full-text search for vertices and edges in large graphs.

Native support for Apache TinkerPop and Gremlin language
Uses the popular property graph data model exposed by Apache TinkerPop and the graph traversal query
language Gremlin.

Performance tuning options
Numerous graph-level configuration options are available.

Vertex-centric indexes provide optimal querying
Allows optimized deep traversal by reducing search space quickly.

Optimized disk representation
Allows for efficient use of storage and speed of access.

What are the advantages of DSE Graph?


The advantages of DSE Graph over other graph databases include:

• Integrated with the DSE database to take advantage of the DSE database's features

• Dedicated index structures that make queries faster

• Certified for production environments

• Advanced security features

• Integrated with Enterprise Search and Analytics

• Visual management and monitoring with OpsCenter

• Visual development with DataStax Studio

• Graph support in certified DataStax drivers

How is DSE Graph different from other graph databases?


DSE Graph is distributed, highly available, and has a scale-out architecture. The data in a DSE Graph is
automatically partitioned across all the nodes in a cluster. Additionally, DSE Graph has built-in support for OLAP
analytics and search on graph data. All DSE components use advanced security options for sensitive data.
What is Apache TinkerPop?
Apache TinkerPop is an open source project that provides an abstraction framework to interact with DSE Graph
and other graph databases.
What is Gremlin?
Gremlin is the primary interface into DSE Graph. Gremlin is a graph traversal language and virtual machine
developed by Apache TinkerPop. Gremlin is a functional language that naturally supports imperative and
declarative querying.


How do I interact with DSE Graph?


DataStax recommends using the web-based interactive developer tool DataStax Studio to create graph
schemas, insert data, and query data and metadata. Studio provides both tabulated and visual information for
DSE Graph schema and queries, enhancing the exploration of graph relationships.
A more basic way to interact with DSE Graph is the Gremlin console dse gremlin-console. For production,
DataStax supplies a number of drivers for passing Gremlin statements to DSE Graph: Java, Python, Node.js, C#,
and C++.
How can I load and unload DSE Graph data?
Use a variety of methods to load or unload data:

• DSE Graph Loader is a command line utility that supports loading the following formats: CSV, text files,
GraphSON, GraphML, Gryo, and queries from JDBC-compatible databases.

• DataStax Studio and the Gremlin console load data using graph traversals.

• DseGraphFrame, a framework for the Spark API, loads data to DSE Graph directly or with transformations.

Best practices start with data modeling before inserting data. The paradigm shift between relational and graph
databases requires careful analysis of data and data modeling before importing and querying data in a graph
database. DSE Graph data modeling provides information and examples.

What tools come with DSE Graph?


DSE Graph is bundled with a number of tools:

• DataStax Studio, a web-based interactive developer tool with notebooks for running Gremlin commands and
visualizing graphs

• Gremlin Console, a shell for exploring DSE Graph

• DSE OpsCenter, a monitoring and administrative tool

• DSE Graph Loader, a stand-alone data loader/unloader

What hardware or cloud environment do I need to run DSE Graph?


DSE Graph runs on commodity hardware with common specifications like other DataStax Enterprise offerings;
see DataStax's capacity planning recommendations.
DSE Graph Terminology
This terminology is specific to DSE Graph.
adjacency list
A collection of unordered lists used to represent a finite graph. Each list describes the set of neighbors
of a vertex in the graph.
adjacent vertex
A vertex directly attached to another vertex by an edge.
directed graph
A set of vertices and a set of arcs (ordered pairs of vertices). In DSE Graph, the terminology "arcs" is
not used, and edges are directional.
edge
A connection between vertices. Edges can be unordered (no directional orientation) or ordered
(directional). An edge can also be described as an object that has a vertex at its tail and head.
element
An element is a vertex, edge, or property.
global index
An index structure over the entire graph.
graph
A collection of vertices and edges.
graph degree
The largest vertex degree of the graph.


graph partitioning
A process that consists of dividing a graph into components, such that the components are of about the
same size and there are few connections between the components.
graph traversal
An algorithmic walk across the elements of a graph according to the referential structure explicit within
the graph data structure.
incident edge
An edge incident to a particular vertex, meaning that the edge and vertex touch.
index
An index is a data structure that allows for the fast retrieval of elements by a particular key-value pair.
meta-property
A property that describes some attribute of another property.
order
The magnitude of the number of edges to the number of vertices.
partitioned vertex
Used for vertices that have a very large number of edges, a partitioned vertex consists of a portion of a
vertex's data that results from dividing the vertex into smaller components for graph database storage.
(Experimental)
property
A key-value pair that describes some attribute of either a vertex or an edge. Property key is used to
describe the key in the key-value pair. All properties are global in DSE Graph, meaning that a property
can be used for any vertices. For example, "name" can be used for all vertices in a graph.
traversal source
A domain specific language (DSL) that specifies the traversal methods used by a traversal.
undirected graph
A set of vertices and a set of edges (unordered pairs of vertices).
vertex-centric index
A local index structure built per vertex.
vertex
A vertex is the fundamental unit of which graphs are formed. A vertex can also be described as an
object that has incoming and outgoing edges.
vertex degree
The number of edges incident to a vertex.
DSE Graph Operations

DSE Graph Configuration

DSE Graph configuration


Adjusting DSE Graph configuration can create an environment easier to use for development, while protecting
and improving the performance for a production environment. Some configurations affect the interaction of
applications with the graph database, while others affect internal processing within DSE. In addition, securing
DSE Graph has important consequences, and a number of configuration settings can secure cluster operation.
Whether doing development or implementing production, a thorough knowledge of the configuration is vital.
General DSE Graph settings
dse.yaml Graph options
DSE Graph stores cluster-wide options for DSE Graph in dse.yaml under the graph: and gremlin-server:
keys. Most of the options that are common to modify have been discussed in the sections below. Of particular
note, the Graph sandbox is configured in the Gremlin Server options of the dse.yaml file. This feature is
enabled by default and provides protection from malicious attacks within the JVM.
To modify dse.yaml settings, modify the file on each node in the cluster and restart each node. Settings in
dse.yaml are node-level (system) in scope. The dse.yaml files can also be modified using OpsCenter. Another
alternative is to set options per graph, as described in the schema API configuration.


remote.yaml Gremlin console options


The remote.yaml file is the primary configuration file for DSE Graph Gremlin console connections to the
Gremlin Server. Most options are self-explanatory. In particular, be aware that if you are using analytic OLAP
queries with DSE Graph, changes are required in this file.
Replication factor
The replication factor (RF) and system replication factor (system RF) for a graph can affect the performance
of reads and writes in DSE. Just as for the DSE database, these factors control the number of replicas of data
that the distributed graph database will store across multiple nodes.
Two keyspaces are created for each graph; the graph keyspace stores the data, while the graph_system
keyspace stores information vital to DSE Graph operation. The default values set for the replication factor and
system replication factor depend on the number of nodes in each datacenter when the graph is created:

Number of nodes in each datacenter    Graph replication factor    Graph system replication factor
1                                     1                           1
2                                     2                           2
3                                     3                           3
4                                     3                           4
5 or greater                          3                           5

For more information, see the Graph System API: replication factor and system replication factor.
Consistency_mode, datacenter_id, read_consistency, and write_consistency
Consistency level in DSE Graph is controlled for both graph operation and DSE database operations. The
consistency_mode setting configures graph operations, and read_consistency and write_consistency
settings configure the consistency level of DSE database read and write operations within a graph
transaction.
The consistency_mode (default: GLOBAL) is appropriate for user-defined vertex ids. If auto-generated vertex
ids are used, this setting can be changed to DC_LOCAL, with a concurrent change made to the datacenter_id
setting. Both consistency_mode and datacenter_id must be configured on every node in the cluster. The
datacenter_id setting is ignored if consistency_mode is set to GLOBAL.
These options must be set to the same value in the dse.yaml file on every node in a cluster, and will not
be effective if set while the cluster is running.

Gremlin queries execute CQL commands to insert, read, and update graph data via traversals, and so the
DSE database consistency level settings can affect the execution of graph operations. The consistency
level for reads or writes can generally be set per graph with the read_consistency (default: ONE) and
write_consistency (default: LOCAL_QUORUM) settings for user-defined vertex ids. If a search index is
used in a graph traversal, the read_consistency will be set to LOCAL_ONE in a multiple datacenter cluster.
The options are set with the Schema API.
schema_mode
To access data, two configuration items are important: schema_mode and allow_scan.
The schema_mode setting has two choices that identify whether automatic schema creation is allowed or not:

• Development: allows loading graph data before explicitly specifying a graph schema through the Graph
Schema API

• Production (default): requires an explicit graph schema prior to loading graph data

The schema_mode setting has a hard-coded default value of Production, that can be overridden by either:

• including an option in the dse.yaml file: schema_mode: Development

• using a graph-level command: schema.config().option('schema_mode').set('Development')


When exploring data to design your graph application, setting schema_mode: Development can be beneficial
in helping you to discover the graph schema that you may want to use. However, setting schema_mode:
Production is important once development is complete, to prevent random schema creation.
The default settings for schema_mode and allow_scan are set for production, not development, to ensure
out-of-the-box operation conforms to the more restrictive environment.

Three useful commands are available for discovering the current value of these two settings:

• schema.getEffectiveSchemaMode(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.

• schema.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.

• graph.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), graph-
level setting that may have been set, and transaction-level setting that may have been set.
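
For instance, an illustrative Gremlin console exchange (the values returned depend on your configuration):

gremlin> schema.getEffectiveSchemaMode()
==>Production
gremlin> schema.getEffectiveAllowScan()
==>false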

DSE Graph security settings


Graph sandbox and whitelisted/blacklisted code
The DSE Graph sandbox, configured in the dse.yaml file under the gremlin_server key, is enabled by
default. This security feature prevents malicious code execution in the JVM that could harm a DSE instance.
Sandbox rules are defined to both blacklist (disallow execution) and whitelist (allow execution) packages,
superclasses and types. For Java/Groovy code entered in the Gremlin console, only the specified allowed
operations will execute. The default sandbox rules may be overridden in the dse.yaml file. The sandbox rules
are applied in the following order:

1. blacklist_supers, including all classes that implement or extend the listed items

2. blacklist_packages, including all sub-packages

3. whitelist_packages, including all sub-packages

4. whitelist_types, not including sub-classes, but only the specified type

5. whitelist_supers, including all classes that implement or extend the listed items

Any types not specified in the whitelist are blocked by default. If an item is blacklisted, it cannot be placed in
the whitelist unless it is removed from the blacklist; otherwise, an error occurs and the item is blocked.
Two classes are hard-coded as blacklisted and cannot be whitelisted:

• java.lang.System: All methods other than currentTimeMillis and nanoTime are blocked (blacklisted).

• java.lang.Thread: currentThread().isInterrupted (which can return a wrapped thread with toString) and
sleep are allowed methods; all other methods are disallowed.

An example of possible whitelisted and blacklisted items in the gremlin_server section of the dse.yaml file:

gremlin_server:
port: 8182
threadPoolWorker: 2
gremlinPool: 0
scriptEngines:
gremlin-groovy:
config:
# sandbox_enabled: false
sandbox_rules:
whitelist_packages:
- org.apache.tinkerpop.gremlin.process
- java.nio
whitelist_types:
- java.lang.String


- java.lang.Boolean
- com.datastax.bdp.graph.spark.SparkSnapshotBuilderImpl
- com.datastax.dse.graph.api.predicates.Search
whitelist_supers:
- groovy.lang.Script
- java.lang.Number
- java.util.Map
- org.apache.tinkerpop.gremlin.process.computer.GraphComputer
blacklist_packages:
- java.io
- org.apache.tinkerpop.gremlin.structure.io
- org.apache.tinkerpop.gremlin.groovy.jsr223
- java.nio.channels

The Fluent API restricts the allowable operations to secure execution, but uses the sandbox to enable lambda
functions.
Authentication, authorization, and encryption
DSE can authenticate and authorize user access, encrypt stored data, and secure the Gremlin console with
SSL; authorization can be based on Graph vertex labels or whole graphs, as applicable.
DSE Graph security is managed by DSE security. As noted in this topic, you can modify the Graph Sandbox
by customizing the gremlin_server key of the dse.yaml file.
To configure the DSE Graph Gremlin console connection to the Gremlin Server, customize the remote.yaml
file for your environment.
DSE Graph also supports auditing using DSE auditing; for details, refer to Setting up database auditing.
Restrict lambda
Lambda restriction is enabled by default to block arbitrary code execution in Gremlin traversals. Most
applications should not require user-defined lambda functions. If lambda functions are required, disable
lambda restrictions using the Schema API to change the restrict_lambda (default: true) option.
See Apache TinkerPop documentation for more information on lambda functions.
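
A minimal sketch of disabling the restriction, assuming restrict_lambda is set like other per-graph options
through schema.config().option():

schema.config().option('restrict_lambda').set(false)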

DSE Graph traversal performance settings


allow_scan
To access data, two configuration items are important: schema_mode and allow_scan.
The allow_scan setting is a Boolean setting that identifies whether full scans of the entire cluster are allowed
or not:

• TRUE: allows any graph query to do full scans of the cluster, similar to ALLOW FILTERING in CQL
queries. Although useful during development, allowing full scan can result in queries that do costly linear
scans over one or more tables.

• FALSE (default): prevents a query from executing unless it restricts the traversal to a subset of the
cluster's data

The allow_scan setting has a hard-coded default value of FALSE, which can be overridden to TRUE by one
of the following actions:

• including an option in the dse.yaml file: allow_scan: TRUE

• using a graph-level command: schema.config().option('allow_scan').set('TRUE')

• using a transaction-level command graph.tx().config().option('allow_scan', TRUE).open()

When exploring data to design your graph application, setting allow_scan: true allows you to fully explore
and visualize the relationships in small test datasets with very broad queries like g.V(). Be aware, however,
that traversals that depend on full scans will take too long to execute with large production-size datasets;
once development is complete, allow_scan: false is the appropriate setting.


The default settings for schema_mode and allow_scan are set for production, not development, to ensure
out-of-the-box operation conforms to the more restrictive environment.

Three useful commands are available for discovering the current value of these two settings:

• schema.getEffectiveSchemaMode(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.

• schema.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.

• graph.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), graph-
level setting that may have been set, and transaction-level setting that may have been set.

cache
Caching can use additional memory to store intermediary results, and improve the performance of DSE
Graph by shortening the time to complete queries. DSE Graph has two caches:

• adjacency cache: stores the properties of vertices and the properties of those vertices' incident edges

• index cache: stores the results of graph traversals that include a global index, such as a hasLabel() or
has() step

Caching is enabled by default; the Schema API setting cache (default: true) can be used to disable caching.
In addition, both adjacency cache and index cache have settings that can be modified:
Table 20: DSE Graph cache

Cache setting                        | Default | Location            | Description
vertex_cache_size                    | 10000   | Set with Schema API | Maximum size of the transaction-level cache of recently used vertices.
adjacency_cache_clean_rate           | 1024    | dse.yaml            | Number of stale rows per second to clean from each graph's adjacency cache.
adjacency_cache_max_entry_size_in_mb | 0       | dse.yaml            | Maximum entry size in each graph's adjacency cache.
adjacency_cache_size_in_mb           | 128     | dse.yaml            | Amount of RAM to allocate to each graph's adjacency (edge and property) cache.
index_cache_clean_rate               | 1024    | dse.yaml            | Number of stale entries per second to clean from the index adjacency cache.
index_cache_max_entry_size_in_mb     | 0       | dse.yaml            | Maximum entry size in the index adjacency cache. When set to zero, the default is calculated based on the cache size and the number of CPUs.
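
For example, a sketch of tuning the Schema API cache settings for a single graph; the option names are
taken from the table above, and the value 20000 is illustrative only:

// disable caching entirely for this graph
schema.config().option('cache').set(false)
// or raise the transaction-level vertex cache size
schema.config().option('vertex_cache_size').set(20000)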

Timeouts
Timeout settings can cause DSE Graph failures in a variety of ways, both client-side and server-side. On the
client side, commands from the Gremlin console can time out before reaching the Gremlin Server. Issuing the
command :remote config timeout none in the Gremlin console overrides the default maximum timeout of 3
minutes with no time limit. Any request typed into the Gremlin console is sent to the Gremlin Server, and the
console waits for a response before it aborts the request and returns control to the user. If the timeout is
changed to none, the request never times out, which can be useful when complex traversals or large datasets
make the round trip to the server take longer than the default timeout.
On the server-side, the cluster-wide timeout settings, realtime_evaluation_timeout_in_seconds (default:
30 seconds) or analytic_evaluation_timeout_in_minutes (default: 1008 minutes), are the maximum
time to wait for a traversal to evaluate for OLTP or OLAP traversals, respectively. These settings are found in
the dse.yaml file. If the timeout behavior for traversal evaluation needs to be overridden for a particular graph,
evaluation_timeout can be set on a graph-by-graph basis, to override either the OLTP or OLAP traversal


evaluation timeout. If complex traversals are timing out during execution, changing an appropriate timeout
setting should fix the error.
An additional server-side setting that can be adjusted in the dse.yaml file is
schema_agreement_timeout_in_ms (30 seconds), the maximum time to wait for schema versions to agree
across a cluster when making schema changes. If a large schema is submitted to a cluster, especially with
indexes defined, this setting may need adjustment before data is submitted to the graph.
Finally, in the dse.yaml file, system_evaluation_timeout_in_seconds (default: 180 seconds) is defined as
the maximum time to wait for a graph system request to evaluate. Creating or dropping a graph is a system
request affected by this setting, which does not interact with the other timeout options.

Table 21: DSE Graph Timeouts

Timeout                                | Default      | Impact
:remote config timeout none            | 3 minutes    | Lengthen if command transit from the Gremlin console to the Gremlin Server is timing out.
realtime_evaluation_timeout_in_seconds | 30 seconds   | Lengthen if OLTP traversal evaluation is timing out.
analytic_evaluation_timeout_in_minutes | 1008 minutes | Lengthen if OLAP traversal evaluation is timing out.
evaluation_timeout                     | N/A          | Set per graph to override the OLTP or OLAP traversal evaluation timeout.
schema_agreement_timeout_in_ms         | 30 seconds   | Lengthen if a large schema is submitted, especially with indexes.
system_evaluation_timeout_in_seconds   | 180 seconds  | Lengthen if graph system requests are not completing.
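
As an illustration, the server-side timeouts above are set in the dse.yaml file; the values below are examples
only, not recommendations:

# dse.yaml (excerpt) -- example values only
realtime_evaluation_timeout_in_seconds: 60
analytic_evaluation_timeout_in_minutes: 1440
schema_agreement_timeout_in_ms: 60000
system_evaluation_timeout_in_seconds: 300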

external_vertex_verify and internal_vertex_verify


These settings allow a tradeoff between correctness verification and better load performance. The
external_vertex_verify setting (default: true) applies to user-defined vertex ids, and internal_vertex_verify
(default: false) applies to auto-generated vertex ids; both matter when loading large datasets. If you have
a fresh, clean graph with no data yet, and don't need to check whether vertex ids found in your data already
exist in the graph, set the appropriate option to false to speed up data loading with DSE Graph Loader. Of
course, if you already have data and don't want to overwrite it with the newly loaded dataset, use a true
value for the appropriate option.
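
For example, a hedged sketch of turning off verification before bulk-loading user-defined vertex ids into an
empty graph, assuming the option is set through schema.config().option() like the other per-graph settings:

schema.config().option('external_vertex_verify').set(false)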
tx_autostart and max_query_queue
If you are loading large GraphSON files, tx_autostart can automatically start a new transaction once 10,000
elements are reached during loading. Another useful method of avoiding restrictions when loading large files
is to configure max_query_queue in the dse.yaml file to remove restrictions at the node level.
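
For example, enable tx_autostart for a graph before reading a large GraphSON file (the same setting
appears in the io section of the Graph API reference below):

schema.config().option("tx_autostart").set(true)
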
Specifying DSE database and graph settings
Some DSE Graph options are set on a per-graph basis. The settings are modified and read using either
System or Schema API calls in the Gremlin console. These option values are stored in DSE tables and are
not set in the dse.yaml file. See the DSE Graph reference for a complete list of options. Other settings for DSE
Graph are also in the dse.yaml file.

Most per-graph options are set using the Schema API.

• Check all non-default values of configuration settings.

schema.config().describe()

graph.tx_groups.default.write_consistency: ALL


graph.tx_groups.default.read_consistency: QUORUM

Default settings are not displayed with this call.

• Check the values of a specific setting.

schema.config().option('graph.tx_groups.default.write_consistency').get()

ALL

• Set the value of a configuration setting.

schema.config().option('graph.tx_groups.default.write_consistency').set('ALL')

null

• To retrieve all traversal sources that have been set, use the get() command with the traversal source
type option:

schema.config().option('graph.traversal_sources.*.type').get()

REAL_TIME

• Set the maximum time to wait for a traversal to evaluate:

schema.config().option("graph.traversal_sources.g.evaluation_timeout").set('PT2H')

The timeout values can also be entered in seconds or minutes, as appropriate, using set('1500 ms'),
for example.

Setting a timeout value greater than 1095 days (maximum integer) can exceed the limit of a graph
session. Starting a new session and setting the timeout to a lower value can recover access to a hung
session. This caution is applicable for all timeouts: evaluation_timeout, system_evaluation_timeout,
analytic_evaluation_timeout, and realtime_evaluation_timeout.

PT2H

The dse.yaml file has settings realtime_evaluation_timeout_in_seconds and


analytic_evaluation_timeout_in_minutes that determine the timeout value
used depending on whether the query is an OLTP or OLAP query, respectively.
The command shown above using evaluation_timeout will override any system-level
setting for the traversal source g specified. The hierarchy for OLTP traversals
is, in order of override: graph.traversal_sources.g.evaluation_timeout >
realtime_evaluation_timeout_in_seconds > system_evaluation_timeout_in_seconds.
The hierarchy for OLAP traversal timeout overrides is similar to
OLTP: graph.traversal_sources.a.evaluation_timeout >
analytic_evaluation_timeout_in_minutes > system_evaluation_timeout_in_seconds.

Some options must be set using the System API.


• Settings can also be set while creating a new graph. For instance, replication for graph inherits DSE
database defaults, so the replication factor is set to 1 and the class is SimpleStrategy. As with the DSE
database, the replication factor for graph should be set before adding data.

system.graph('gizmo').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
ifNotExists().create()

• Graph also creates a keyspace for storing graph variables in DSE tables. This keyspace holds essential
information, so the replication factor should be set to something higher than one replica to ensure no loss.

gremlin> system.graph('gizmo').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
ifNotExists().create()

• Additional schema settings can be configured at graph creation.

system.graph('food2').
replication("{'class' : 'SimpleStrategy', 'replication_factor' : 1 }").
systemReplication("{'class' : 'SimpleStrategy', 'replication_factor' : 1 }").
option("graph.schema_mode").set("Development").
option("graph.allow_scan").set("false").
option("graph.default_property_key_cardinality").set("multiple").
option("graph.tx_groups.*.write_consistency").set("ALL").
create()

More information can be found in the Schema API reference.

The Graph API is used to set some transaction settings.

• The allow_scan option can be set at either a single graph level or as shown here, for all actions within a
transaction made on a single node. This setting can be useful if a quorum cannot be mustered for writing
the option change to the system table.

graph.tx().config().option("allow_scan", true).open()

null

Configuring DSE Graph Security


DSE Graph security is managed by DSE security. DSE Graph does require some unique configuration, such
as changing the configuration to use the Gremlin console securely or modifying the Graph Sandbox in the
Gremlin Server configuration.
DSE Graph also supports auditing using DSE auditing, see Setting up database auditing.
Backing up and restoring DSE Graph
DataStax OpsCenter is the primary tool for backing up and restoring DSE data; use the OpsCenter Backup
Service. Backing up and restoring DSE Graph data is best accomplished with OpsCenter 6.5.
Importing and exporting DSE Graph data
DseGraphFrames is useful for exporting graph data from one graph to another, especially if the structure of the
graph schema changes on import. This method requires the installation of DSE Analytics.
Using JMX to read and execute operations with DSE Graph metrics
DSE Graph Tools
In addition to the Gremlin console, other tools are available for working with DSE Graph:
DataStax Studio


Web-based notebook-style visualization tool. Currently supports Markdown and Gremlin. Includes a
variety of list and graph functions.

DSE OpsCenter
Visual management and monitoring tool.

DSE Lifecycle Manager


Powerful provisioning and configuration management tool.

Starting the Gremlin console


Gremlin is the query language used to interact with DSE Graph. One method of inputting Gremlin code is to use
the Gremlin console. The Gremlin console is a useful interactive environment for directly inputting Gremlin to
create graph schema, load data, administer graph, and retrieve traversal results. The Gremlin Console is an
interface to the Gremlin Server that can interact with DSE Graph.


• Start the Gremlin console using the dse command and passing the additional command gremlin-console:

$ bin/dse gremlin-console

\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>

Three plugins are activated by default, as shown. The Gremlin Server, tinkerpop.server, is started
so that commands can be issued to DSE Graph. The utilities plugin, tinkerpop.utilities, provides
various functions, helper methods and imports of external classes that are useful in Gremlin console.
TinkerGraph, an in-memory graph that is used as an intermediary for some graph operations is started
with tinkerpop.tinkergraph. The Gremlin console automatically connects to the remote Gremlin
Server.
The Gremlin console packaged with DataStax Enterprise does not allow plugin installation like the
Gremlin console packaged with Apache TinkerPop.

• Gremlin console help can be displayed with the -h flag:

$ bin/dse gremlin-console -h

usage: gremlin.sh [options] [...]


-C, --color Disable use of ANSI colors
-D, --debug Enabled debug Console output
-Q, --quiet Suppress superfluous Console
output
-V, --verbose Enable verbose Console output
-e, --execute=SCRIPT ARG1 ARG2 ... Execute the specified script
(SCRIPT ARG1 ARG2 ...) and
close the console on
completion
-h, --help Display this help message
-i, --interactive=SCRIPT ARG1 ARG2 ... Execute the specified script
and leave the console open on
completion
-l Set the logging level of
components that use standard
logging output independent of
the Console
-v, --version Display the version

Use -V to display all lines when loading a file, to discover which line of code causes an error.

• Run the Gremlin console with the host:port option to specify a specific host and port:

$ bin/dse gremlin-console 127.0.0.1:8182

Any hostname or IP address will work to specify the host.


• Run Gremlin console with the -e flag to execute one or more scripts:

$ bin/dse gremlin-console -e test1.groovy -e test2.groovy

If the scripts run successfully, the command will return with the prompt after execution. If errors occur,
the standard output will show the errors.

• If you prefer to have Gremlin console open at the script completion, run Gremlin console with the -i flag
instead of the -e flag:

$ bin/dse gremlin-console -i test1.groovy -i test2.groovy

If the scripts run successfully, the command will return with the Gremlin console prompt after execution.
If errors occur, the console will show the errors.

• Discover all Gremlin console commands with help. Console commands are not Gremlin language
commands, but rather commands issued to the Gremlin console for shell functionality. The Gremlin console
is based on the Groovy shell.

:help

For information about Groovy, visit:


http://groovy-lang.org

Available commands:
:help (:h ) Display this help message
? (:? ) Alias to: :help
:exit (:x ) Exit the shell
:quit (:q ) Alias to: :exit
import (:i ) Import a class into the namespace
:display (:d ) Display the current buffer
:clear (:c ) Clear the buffer and reset the prompt counter.
:show (:S ) Show variables, classes or imports
:inspect (:n ) Inspect a variable or the last result with the GUI object
browser
:purge (:p ) Purge variables, classes, imports or preferences
:edit (:e ) Edit the current buffer
:load (:l ) Load a file or URL into the buffer
. (:. ) Alias to: :load
:save (:s ) Save the current buffer to a file
:record (:r ) Record the current session to a file
:history (:H ) Display, manage and recall edit-line history
:alias (:a ) Create an alias
:register (:rc ) Registers a new command with the shell
:doc (:D ) Opens a browser window displaying the doc for the argument
:set (:= ) Set (or list) preferences
:uninstall (:- ) Uninstall a Maven library and its dependencies from the
Gremlin Console
:install (:+ ) Install a Maven library and its dependencies into the Gremlin
Console
:plugin (:pin) Manage plugins for the Console
:remote (:rem) Define a remote connection
:submit (:> ) Send a Gremlin script to Gremlin Server

For help on a specific command type:


:help command

The Gremlin Console provides code help via auto-complete functionality, using the <TAB> key to trigger a
list of possible options.


:install and :plugin should not be used with DSE Graph. These commands will result in Gremlin
console errors.

DSE Graph Reference

The graph API


graph commands add data to an existing graph.

addEdge
Synopsis

vertex1.addEdge('edgeLabel', vertex2, [T.id, 'edge_id'], ['key', 'value'] [,...])

Description
Edge data is inserted using addEdge. A previously created edge label must be specified. An edge_id may be
specified, to upsert data for a multiple cardinality edge to prevent creation of a new edge. Property key-value
pairs may be optionally specified.
Examples
Create an edge with an edge label rated between the vertices johnDoe and beefBourguignon with the
properties timestamp, stars, and comment.

johnDoe.addEdge('rated', beefBourguignon, 'timestamp', '2014-01-01T00:00:00.00Z',
  'stars', 5, 'comment', 'Pretty tasty!')

Update an edge with an edge label created between the vertices juliaChild and beefBourguignon, specifying
the edge with an edge id of 2c85fabd-7c49-4b28-91a7-ca72ae53fd39, and a property createDate of
2017-08-22:

juliaChild.addEdge('created', beefBourguignon, T.id,
  java.util.UUID.fromString('2c85fabd-7c49-4b28-91a7-ca72ae53fd39'),
  'createDate', '2017-08-22')

Note that a conversion function must be used to convert a string to the UUID. T.id is a literal that must be
included in the statement.
addVertex
Synopsis

addVertex(label, 'label_name', 'key', 'value', 'key', 'value')

Description
Vertex data is inserted using addVertex. A previously created vertex label must be specified.


Examples
Create a vertex with a vertex label reviewer with the properties location and status.

graph.addVertex(label, 'reviewer', 'location', 'Santa Cruz, CA', 'status', 'Rock Star')

io
Synopsis

io( [gryo() | graphson() | graphml()]).[readGraph | writeGraph] ( file_name )

Description
Graph data is written to a file or read from a file using io. The file to read must be located on a DSE cluster
node, and the written file will be created on the DSE cluster node on which the command is run.
Examples
Write the graph data to a file using the Gryo format:

graph.io(gryo()).writeGraph('/tmp/test.gryo')

Read the graph data from a file using the Gryo format:

graph.io(gryo()).readGraph('/tmp/test.gryo')

This method of reading a graph is not recommended, and will not work with graphs larger than 10,000
vertices or elements. DSE Graph Loader is a better choice in production. Additionally, a schema setting
may need modification for this method to work:

schema.config().option("tx_autostart").set(true)

property
Synopsis

vertex1.property( ['key', 'value'] [,...], [T.id, 'property_id'])

Description
Property data is inserted using property. Property key-value pairs are specified. A property_id may be
specified, to upsert data for a multiple cardinality property to prevent creation of a new property.
Examples
Create a property with values for gender and nickname.

jamieOliver.property('gender', 'M', 'nickname', 'jimmy')

Update the property gender for the vertex juliaChild specifying a property with a property id of
2c85fabd-7c49-4b28-91a7-ca72ae53fd39:

uuid = java.util.UUID.fromString('2c85fabd-7c49-4b28-91a7-ca72ae53fd39')


juliaChild.property('gender', 'F', T.id, uuid)

Note that a conversion function must be used to convert a string to the UUID. T.id is a literal that must be
included in the statement.
tx().config().option()
Synopsis

tx().config().option(option_name, value).open()

Description
Set a configuration option at the transaction level. The change applies only to the transaction opened with
open(), on the node where the command runs.
Examples
Change the value of allow_scan for a transaction. The effect of this change is to allow all commands
executed in the gremlin-console on a particular node to do full graph scans, even if the consistency level for
the cluster is not QUORUM, the value required to change this option in the appropriate system table.

graph.tx().config().option("allow_scan", true).open()

Note that the previous transaction (automatically opened in gremlin-console or Studio) must be committed
before the new configuration option value is set.
The system API
The system commands create, drop, and describe graphs, as well as list existing graphs and check for
existence. Graph and system configuration can also be set and unset with system commands.
create
Synopsis

system.graph('graph_name').create()

Description
Create a new graph. The graph_name specified is used to create two DSE database keyspaces, graph_name
and graph_name_system, and can only contain alphanumeric and underscore characters.
Creating a graph should include setting the replication factor for both the graph_name and
graph_name_system keyspaces. It can also include other options.

Examples
Create a simple new graph.

system.graph('FridgeItems').create()

The resulting list:

==>FridgeItems

is created with the NetworkTopologyStrategy class and replication factor based on the number of datacenter
nodes, since no options were specified.


Create a simple new graph if it doesn't currently exist by modifying with ifNotExists().

system.graph('FridgeItems').ifNotExists().create()

The resulting list:

==>FridgeItems

An example that creates a graph on a cluster with two datacenters of 3 nodes:

system.graph('FridgeItems').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3 }").
ifNotExists().create();

The result:

==>null

shows that the graph was successfully created.


The replication settings can be verified using the cqlsh tool, running the CQL DESCRIBE KEYSPACE command:

DESCRIBE KEYSPACE "FridgeItems";

DESCRIBE KEYSPACE "FridgeItems_system";

with a result:

CREATE KEYSPACE "FridgeItems" WITH replication = {'class': 'NetworkTopologyStrategy',


'dc1: '3', 'dc2' : '3'}
AND durable_writes = true;
CREATE KEYSPACE "FridgeItems_system" WITH replication = {'class':
'NetworkTopologyStrategy', 'dc1: '3','dc2' : '3'}
AND durable_writes = true;

drop
Synopsis

system.graph('graph_name').[ifExists()].drop()

Description
Drop an existing graph using this command. All data and schema will be lost. For better performance, truncate
a graph before dropping it.


Examples
Drop a graph.

system.graph('FridgeItem').drop()

The resulting list:

==>null

Drop an existing graph if it exists.

system.graph('FridgeSensors').ifExists().drop()

The resulting list:

==>null

exists
Synopsis

system.graph('graph_name').exists()

Description
Discover if a particular graph exists using this command.
Examples
Discover if a particular graph exists. The return value is a boolean value.

gremlin> system.graph('FridgeItem').exists()

The resulting list:

==>true

graphs
Synopsis

system.graphs()

Description
Discover what graphs currently exist using this command.


Examples
Discover all graphs that exist in a DSE cluster.

gremlin> system.graphs()

The resulting list:

==>quickstart
==>test

DSE Graph replication factor


Synopsis

system.graph('graph_name').replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }")

Description
Create a new graph and set the graph_name replication configuration using replication() as well as the
graph_name_system configuration using systemReplication().
Both must be set at the time of graph creation, because replication factor and system replication factor
cannot be altered once set for the graph_name and graph_name_system keyspaces.

DSE database settings for replication factor are used, either SimpleStrategy for a single datacenter or
NetworkTopologyStrategy for multiple datacenters.
The default replication strategy for a multi-datacenter graph is NetworkTopologyStrategy, whereas for a
single datacenter, the replication strategy will default to SimpleStrategy. The number of nodes will determine
the default replication factors:
Number of nodes per datacenter | graph_name replication factor  | graph_name_system replication factor
1-3                            | number of nodes per datacenter | number of nodes per datacenter
4                              | 3                              | 4
5 or greater                   | 3                              | 5

Examples
An example that creates a graph on a cluster with two datacenters:

system.graph('food').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
ifNotExists().create();

The result:

==>null

shows that the graph was successfully created.


The replication settings can be verified using the cqlsh tool, running the CQL DESCRIBE KEYSPACE command:

DESCRIBE KEYSPACE food;

DESCRIBE KEYSPACE food_system;

with a result:

CREATE KEYSPACE food WITH replication = {'class': 'NetworkTopologyStrategy',
  'dc1': '3', 'dc2': '2'} AND durable_writes = true;
CREATE KEYSPACE food_system WITH replication = {'class': 'NetworkTopologyStrategy',
  'dc1': '3', 'dc2': '2'} AND durable_writes = true;

DSE Graph systemReplication


Synopsis

system.graph('graph_name').systemReplication("{'class' : 'NetworkTopologyStrategy',
'dc1' : 3, 'dc2' : 2 }")

Description
Create a new graph and set the graph_name replication configuration using replication() as well as the
graph_name_system configuration using systemReplication().
Both must be set at the time of graph creation, because replication factor and system replication factor
cannot be altered once set for the graph_name and graph_name_system keyspaces.

DSE database settings for replication factor are used, either SimpleStrategy for a single datacenter or
NetworkTopologyStrategy for multiple datacenters.
The default replication strategy for a multi-datacenter graph is NetworkTopologyStrategy, whereas for a
single datacenter, the replication strategy will default to SimpleStrategy. The number of nodes will determine
the default replication factors:
Number of nodes per datacenter | graph_name replication factor  | graph_name_system replication factor
1-3                            | number of nodes per datacenter | number of nodes per datacenter
4                              | 3                              | 4
5 or greater                   | 3                              | 5

Examples
An example that creates a graph on a cluster with two datacenters:

system.graph('food').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
ifNotExists().create();

The result:

==>null

shows that the graph was successfully created.


The replication settings can be verified using the cqlsh tool, running the CQL DESCRIBE KEYSPACE command:

DESCRIBE KEYSPACE food;

DESCRIBE KEYSPACE food_system;

with a result:

CREATE KEYSPACE food WITH replication = {'class': 'NetworkTopologyStrategy',
  'dc1': '3', 'dc2': '2'} AND durable_writes = true;
CREATE KEYSPACE food_system WITH replication = {'class': 'NetworkTopologyStrategy',
  'dc1': '3', 'dc2': '2'} AND durable_writes = true;

truncate
Synopsis

system.graph('graph_name').[ifExists()].truncate()

Description
Truncate an existing graph using this command. All data will be removed from the graph.
Examples
Truncate a graph.

system.graph('FridgeItem').truncate()

The resulting list:

==>null

Truncate an existing graph if it exists.

system.graph('FridgeSensors').ifExists().truncate()

The resulting list:

==>null

DSE Management Services


DSE Management Services are a set of services in DataStax Enterprise and OpsCenter that are designed to
automatically handle various administration and maintenance tasks and assist with overall database cluster
management.
DataStax Enterprise Performance Service
The DSE Performance Service automatically collects and organizes performance diagnostic information into a
set of data dictionary tables that can be queried with CQL.
DSE OpsCenter provides multiple performance metrics.


About DSE Performance Service


The DSE Performance Service automatically collects and organizes performance diagnostic information from
DSE, DSE Search, and DSE Analytics into a set of data dictionary tables. These tables are stored in the
dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh, DataStax
Studio, or any application using a DataStax driver.
Use this service to obtain database metrics and optimize performance and fine-tune DSE Search. Examples
include:

• Identify slow queries on a cluster to easily find and tune poorly performing queries.

• View latency metrics for tables on all user (non-system) keyspaces.

• Collect per node and cluster wide lifetime metrics by table and keyspace.

• Obtain recent and lifetime statistics about tables, such as the number of SSTables, read/write latency, and
partition (row) size.

• Track read/write activity on a per-client, per-node level for both recent and long-lived activity to identify
problematic user and table interactions.

• Detect bottlenecks in DSE Search.

• Monitor the resources used in a DSE Analytics cluster.

• Monitor particular DSE Analytics applications.

The OpsCenter Performance Service provides visual monitoring of diagnostics collected through the DSE
Performance Service, displays alerts, and provides recommendations for optimizing cluster performance.
The available diagnostic tables are listed on these pages:

• DSE Performance Service diagnostic table reference

• DSE Search Performance Service diagnostic table reference

Sample output from querying thread pool statistics:

SELECT * FROM dse_perf.thread_pool;

The result lists per-node thread pool statistics (active, pending, blocked, and completed counts for each
pool), equivalent to nodetool tpstats output.

Configuring Performance Service replication strategy


To configure the Performance Service replication strategy, adjust the dse_perf keyspace that stores
performance metrics data. Depending on the specific requirements, adjust the replication factor with a keyspace
command, such as ALTER KEYSPACE, to prevent potential unavailability of metrics data when nodes are
down.
Enabling security
Tables in the dse_perf keyspace that store performance metrics data do not require special handling for user
reads and writes. Because DataStax Enterprise uses internal system APIs to write data to these tables, you do
not have to create a system user to perform the writes when security is enabled.

1. To enforce restrictions, enable DSE Unified Authentication and specify appropriate permissions on the
tables.

2. To prevent users from viewing sensitive information like keyspace, table, and user names that are recorded
in the performance tables, restrict users from reading the tables.
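
For example, a sketch of step 2 using standard CQL permissions, assuming a role named analyst that
should not read the performance tables:

REVOKE SELECT ON KEYSPACE dse_perf FROM analyst;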

Setting the replication factor


By default, DataStax Enterprise writes performance metrics data with consistency level ONE and writes are
performed asynchronously. If you need to increase the replication factor of performance metrics data, use
ALTER KEYSPACE. See How is the consistency level configured?.

1. Set the replication factor depending on your environment:

• SimpleStrategy example:

ALTER KEYSPACE "dse_perf"


WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

• NetworkTopologyStrategy example:

ALTER KEYSPACE "dse_perf"

Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
385
Using DataStax Enterprise advanced functionality

WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

Enable data collection


Collecting slow queries
The node_slow_log table collects information about slow queries on a node, retaining query information for
long-running CQL statements so that you can identify and tune poorly performing queries across the cluster.
You can also use OpsCenter to view, troubleshoot, and trace Slow Queries.
You can enable and disable the service. After the service is disabled, queries that take longer than the
specified threshold are no longer logged. However, disabling the logging does not flush the pending write
queue; a background thread eventually processes everything, so pending inserts into the node_slow_log
table might still be processed after the service has been disabled.

1. By default, collection is enabled for statements whose execution time exceeds a specified threshold.

• To permanently enable collecting information on slow queries, edit the dse.yaml file. Uncomment
and define values for cql_slow_log_options as shown in the following listing. Notice the default
skip_writing_to_db: true setting.

cql_slow_log_options:
enabled: true
threshold: 200.0
minimum_samples: 100
ttl_seconds: 259200
skip_writing_to_db: true
num_slowest_queries: 5

If you keep the default skip_writing_to_db: true setting then the slow query information is
stored in memory, not in the node_slow_log table shown later in this section.
To store the slow query information in the node_slow_log table, set skip_writing_to_db to false
in the dse.yaml file.
If you keep the slow query information in memory, access it through the MBean-managed
Java object named com.datastax.bdp.performance objects.CqlSlowLog using the operation
retrieveRecentSlowestCqlQueries.

• To temporarily change the cqlslowlog settings without changing dse.yaml or restarting DSE, use the
dsetool perf subcommands:


# Disable collecting information on slow queries that exceeded the threshold:

$ dsetool perf cqlslowlog disable

# Keep slow queries in memory only:

$ dsetool perf cqlslowlog skip_writing_to_db

# Write slow queries to the database:

$ dsetool perf cqlslowlog write_to_db

• Set the number of slow queries to keep in memory. For example, 5 queries:

$ dsetool perf cqlslowlog set_num_slowest_queries 5

Retrieve the most recent slow queries:

$ dsetool perf cqlslowlog recent_slowest_queries

• To temporarily change the threshold to collect information on 5% of the slowest queries:

$ dsetool perf cqlslowlog 95.0

After collecting information with this temporary threshold, query the table to view the slow queries it
captured. For example:

$ cqlsh -e "SELECT * FROM dse_perf.node_slow_log;"

2. You can export slow queries using the CQL COPY TO command:

cqlsh -e "COPY dse_perf.node_slow_log ( date, commands, duration )


TO 'slow_queries.csv' WITH HEADER = true;"

Collecting system level diagnostics


The following system level diagnostic tables collect system-wide performance information about a cluster:

• key_cache
Per node key cache metrics. Equivalent to nodetool info.

• net_stats
Per node network information. Equivalent to nodetool netstats.

• thread_pool
Per node thread pool active/blocked/pending/completed statistics by pool. Equivalent to nodetool tpstats.

• thread_pool_messages
Per node counts of dropped messages by message type. Equivalent to nodetool tpstats.

To collect system level data:

1. Edit the dse.yaml file.


2. In the dse.yaml file, set the enabled option for cql_system_info_options to true.

cql_system_info_options:
enabled: true
refresh_rate_ms: 10000

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

Collecting object I/O level diagnostics


The following object I/O level diagnostic tables collect data on object I/O statistics:

• object_io
Per node recent latency metrics by keyspace and table.

• object_read_io_snapshot
Per node recent latency metrics, broken down by keyspace and table, ordered by mean read latency.

• object_write_io_snapshot
Per node recent latency metrics, broken down by keyspace and table, ordered by mean write latency.

To enable the collection of this data:

1. Edit the dse.yaml file.

2. In the dse.yaml file, set the enabled option for resource_level_latency_tracking_options to true.

resource_level_latency_tracking_options:
enabled: true
refresh_rate_ms: 10000

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

Statistics gathered for objects


To identify which objects (keyspace, table, or client) are currently experiencing the highest average latencies,
the Performance Service maintains two latency-ordered tables, which record the mean read/write latencies
and total read/write operations on a per-node, per-table basis:

• object_read_io_snapshot

• object_write_io_snapshot

The two tables are essentially views of the same data, but are ordered differently. Using these tables, you can
identify which data objects on the node currently cause the most write and read latency to users. Because this
is time-sensitive data, if a data object sees no activity for a period, no data is recorded for it in these tables.
In addition to these two tables, the Performance Service also keeps per-object latency information with a
longer retention policy in the object_io table. Again, this table holds mean latency and total count values
for both read and write operations, but it can be queried for statistics on specific data objects (either at the
keyspace or table level). Using this table enables you to pull back statistics for all tables on a particular node,
with the option of restricting results to a given keyspace or specific table.
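
For instance, a hedged example of pulling lifetime latency statistics for one keyspace on one node; the
column names node_ip and keyspace_name are assumptions about the table's schema:

SELECT * FROM dse_perf.object_io
WHERE node_ip = '10.10.1.1' AND keyspace_name = 'cycling';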


Table activity broken down by user is retained in the object_user_read_io_snapshot,


object_user_write_io_snapshot and object_user_io tables. The first two tables are ordered according to their
mean latency values, making it easy for you to quickly identify which clients are currently experiencing the
highest latency on specific data objects. Having identified the hot tables on a node, you can drill down and see
a breakdown of the users accessing those objects. These tables are refreshed periodically to provide the most
up-to-date view of activity, whereas the user_object_io table retains data for a longer period, enabling it to be
queried by node and user with the option of restricting further by keyspace or even table.
Collecting database summary diagnostics
You can enable collecting database summary diagnostics using the DataStax Enterprise Performance Service.
These database summary diagnostic tables collect statistics at a database level:

• node_table_snapshot
Per node lifetime table metrics broken down by keyspace and table.

• table_snapshot
Cluster wide lifetime table metrics broken down by keyspace and table (aggregates node_table_snapshot
from each node in the cluster).

• keyspace_snapshot
Cluster wide lifetime table metrics, aggregated at the keyspace level (rolls up the data in table_snapshot).

To permanently enable the collection of database-level statistics data:

1. Edit the dse.yaml file.

2. In the dse.yaml file, set the enabled option for db_summary_stats_options to true.

# Database summary stats options
db_summary_stats_options:
enabled: true
refresh_rate_ms: 10000

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

To temporarily enable the collection of database-level statistics data:

$ dsetool perf clustersummary enable

To temporarily disable the collection of database-level statistics data:

$ dsetool perf clustersummary disable

Changes made with performance object subcommands do not persist between restarts and are useful
only for short-term diagnostics.

Collecting cluster summary diagnostics


The following cluster summary diagnostic tables collect statistics at a cluster-wide level:

• cluster_snapshot
Aggregates node_snapshot data for the whole cluster.

• dc_snapshot
Aggregates node_snapshot data at the datacenter level.

• node_snapshot
Per node system metrics.

To enable collecting cluster summary diagnostics using the DataStax Enterprise Performance Service:

1. Edit the dse.yaml file.

2. In the dse.yaml file, set the enabled option for cluster_summary_stats_options to true.

# Cluster summary stats options
cluster_summary_stats_options:
enabled: true
refresh_rate_ms: 10000

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

Collecting histogram diagnostics


DSE provides histogram tables for this data:

Histogram         | Details table | Summary table | Keyspace details | Keyspace summary | Global details | Global summary
cell_count        | Y             | Y             | N                | N                | N              | N
partition_size    | Y             | Y             | N                | N                | N              | N
range_latency     | Y             | Y             | Y                | N                | Y              | N
read_latency      | Y             | Y             | Y                | N                | Y              | N
sstables_per_read | Y             | Y             | Y                | N                | N              | N
write_latency     | Y             | Y             | Y                | N                | N              | N

These tables show information similar to the output of the nodetool tablehistograms utility. The major
difference is that nodetool reports recent data (the values for the past fifteen minutes), while the diagnostic
histogram tables contain lifetime data, cumulative since the DSE server was started.

To enable the collection of table histogram data using the DataStax Enterprise Performance Service:

1. Edit the dse.yaml file.

2. In the dse.yaml file, set the enabled option for histogram_data_options to true.

# Column Family Histogram data tables options
histogram_data_options:
enabled: true
refresh_rate_ms: 10000
retention_count: 3

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.


The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

4. To control the number of complete histograms kept in the tables at any one time, change the
retention_count parameter.

Collecting user activity diagnostics


The following diagnostics tables collect user activity:

• object_user_io
Per node, long-lived read/write metrics broken down by keyspace, table, and client connection. Each
row contains mean read/write latencies and operation counts for interactions with a specific table by a
specific client connection during the last sampling period in which it was active. This data has a 10 minute
TTL.

A client connection is uniquely identified by a host and port.

• object_user_read_io_snapshot
Per node recent read/write metrics by client, keyspace, and table. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean read latency.

• object_user_write_io_snapshot
Per node recent read/write metrics by client, keyspace, and table. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean write latency.

• user_io
Per node, long-lived read/write metrics broken down by client connection and aggregated for all
keyspaces and tables. Each row contains mean read/write latencies and operation counts for a specific
connection during the last sampling period in which it was active. This data has a 10 minute TTL.

• user_object_io
Per node, long-lived read/write metrics broken down by client connection, keyspace, and table. Each row
contains mean read/write latencies and operation counts for interactions with a specific table by a specific
client connection during the last sampling period in which it was active. This data has a 10 minute TTL.

object_user_io and user_object_io represent two different views of the same underlying data:

• object_user_io is structured to enable querying by user

• user_object_io is structured for querying by table

• user_object_read_io_snapshot
Per node recent read/write metrics by keyspace, table, and client. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean read latency.

• user_object_write_io_snapshot
Per node recent read/write metrics by keyspace, table, and client. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean write latency.

• user_read_io_snapshot
Per node recent read/write metrics by client. This table contains only data relating to clients that were
active during the most recent sampling period. Ordered by mean read latency.

• user_write_io_snapshot
Per node recent read/write metrics by client. This table contains only data relating to clients that were
active during the most recent sampling period. Ordered by mean write latency.


To enable collecting user activity diagnostics using the DSE Performance Service:

1. Edit the dse.yaml file.

2. In the dse.yaml file, set the enabled option for user_level_latency_tracking_options to true.

# User/Resource latency tracking settings
user_level_latency_tracking_options:
enabled: true
refresh_rate_ms: 10000
top_stats_limit: 100

3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.

4. To limit the number of individual metrics, change the top_stats_limit parameter.


Keeping this limit fairly low reduces the level of system resources required to process the metrics.

Statistics gathered for user activity


User activity data is stored in two main ways: latency-ordered, for quickly identifying the hot spots in the
system, and by user, to retrieve statistics for a particular client connection.
To identify which users are currently experiencing highest average latencies on a given node, you can query
these tables:

• user_read_io_snapshot

• user_write_io_snapshot

These tables record the mean read/write latencies and total read/write counts per user on each node. They
are ordered by their mean latency values, so you can quickly see which users are experiencing the highest
average latencies on a given node. Having identified the users experiencing the highest latency on a node,
you can then drill down to find the hot spots for those clients.
To do this, query the user_object_read_io_snapshot and user_object_write_io_snapshot tables. These
tables store mean read/write latency and total read/write count by table for the specified user. They are
ordered according to the mean latency values, and can therefore quickly show, for a given user, which
tables contribute most to the experienced latencies.
The data in these tables is refreshed periodically (by default every 10 seconds), so querying them always
provides an up-to-date view of the data objects with the highest mean latencies on a given node. Because this
is time-sensitive data, if a user performs no activity for a period, no data is recorded for them in these tables.
The user_object_io table also reports per-node user activity broken down by keyspace/table and retains it
over a longer period (4 hours by default). This allows the Performance Service to query by node and user to
see latency metrics from all tables or restricted to a single keyspace or table. The data in this table is updated
periodically (again every 10 seconds by default).
The user_io table reports aggregate latency metrics for users on a single node. Using this table, you can query
by node and user to see high-level latency statistics across all keyspaces.
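
For example, a sketch of querying aggregate per-user latency on a single node; node_ip is an assumed
column name:

SELECT * FROM dse_perf.user_io WHERE node_ip = '10.10.1.1';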
Collection of search data
Collecting slow search queries
The solr_slow_sub_query_log_options performance object reports distributed sub-queries (query executions
on individual shards) that take longer than a specified period of time.
All objects are disabled by default.

To identify slow search queries using the DataStax Enterprise Performance Service:


1. To permanently enable and configure collecting information on slow search queries, edit the dse.yaml file,
uncomment the solr_slow_sub_query_log_options parameters, and define values for the slow log
settings:

solr_slow_sub_query_log_options:
enabled: true
ttl_seconds: 604800
async_writers: 1
threshold_ms: 100

The default parameter values minimize resource usage.


Options Determines

enabled Whether the object is enabled at start up.

ttl_seconds How many seconds a record survives before it is expired from the performance object.

async_writers For event-driven objects, such as the slow log, determines the number of possible concurrent slow
query recordings. Objects like solr_result_cache_stats are updated in the background.

threshold_ms For the slow log, the level (in milliseconds) at which a sub-query is slow enough to be reported.

2. To temporarily change the running parameters for collecting information on slow Solr queries:
To temporarily enable collecting information:

$ dsetool perf solrslowlog enable

To temporarily disable collecting information:

$ dsetool perf solrslowlog disable

To temporarily change the threshold value in milliseconds:

$ dsetool perf solrslowlog 200

3. You can export slow search queries using the CQL COPY TO command:

cqlsh:dse_perf> COPY solr_slow_sub_query_log ( date, commands, duration )
                TO 'slow_solr_queries.csv' WITH HEADER = true;

Collecting Apache Solr™ performance statistics


When solr_latency_snapshot_options is enabled, the performance service creates the required tables and
schedules the job to periodically update the relevant snapshot from the specified data source.
The following snapshots collect performance statistics:

• Query latency snapshot


Record phase-level cumulative percentile latency statistics for queries over time.

• Update latency snapshot


Record phase-level cumulative percentile latency statistics for updates over time.

• Commit latency snapshot


Record phase-level cumulative percentile latency statistics for commits over time.

• Merge latency snapshot


Record phase-level cumulative percentile latency statistics for index merges over time.


All objects are disabled by default.

1. Edit the dse.yaml file.

2. In the dse.yaml file, under the solr_latency_snapshot_options parameter, change enabled to true
and set the other options as required.

# Solr latency snapshot options


solr_latency_snapshot_options:
enabled: true
ttl_seconds: 604800
refresh_rate_ms: 60000


Table 22: Options


Options Determines

enabled Whether the object is enabled at start up.

ttl_seconds How many seconds a record survives before it is expired from the performance object.

refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.

Collecting cache statistics


The solr_cache_stats_options object records current and cumulative cache statistics.
The following diagnostic tables collect cache statistics:

• Filter cache statistics


Record core-specific filter cache statistics over time.

• Query result cache statistics


Record core-specific query result cache statistics over time.

All objects are disabled by default.

1. Edit the dse.yaml file.

2. In the dse.yaml file, under the solr_cache_stats_options parameter, change enabled to true and set
the other options as required.

# Solr cache statistics options


solr_cache_stats_options:
enabled: true
ttl_seconds: 604800
refresh_rate_ms: 60000


Table 23: Options


Options Determines

refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.

enabled Whether the object is enabled at start up.


ttl_seconds How many seconds a record survives before it is expired from the performance object.

Collecting index statistics


The solr_index_stats_options object records search index statistics over time.

1. To collect index statistics:

• To permanently collect index statistics, edit the dse.yaml file, change enabled to true under
solr_index_stats_options, set the other options as required, and restart DSE for the changes to
take effect.

# Solr index statistics options


solr_index_stats_options:
enabled: true
ttl_seconds: 604800
refresh_rate_ms: 60000

Table 24: solr_index_stats_options


Options Determines

enabled Whether the object is enabled at start up. Default: disabled.

refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.

ttl_seconds How many seconds a record survives before it is expired from the performance object.

• To temporarily enable or disable collecting index statistics, use dsetool perf solrindexstats, as shown in the example after this list.

• To verify index integrity, use .
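For example, assuming solrindexstats accepts the same enable and disable arguments as the solrslowlog command shown earlier:

$ dsetool perf solrindexstats enable

$ dsetool perf solrindexstats disable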

Collecting handler statistics


The solr_update_handler_metrics_options object records core-specific direct update handler statistics over time.
All objects are disabled by default.

1. Edit the dse.yaml file.

2. In the dse.yaml file, uncomment the solr_update_handler_metrics_options parameter and set the
options as required.

# Solr UpdateHandler metrics options


solr_update_handler_metrics_options:
enabled: true
ttl_seconds: 604800
refresh_rate_ms: 60000


Table 25: Options


Options Determines

enabled Whether the object is enabled at start up.

ttl_seconds How many seconds a record survives before it is expired from the performance object.


refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.

Collecting request handler metrics


The solr_request_handler_metrics_options object records core-specific request handler statistics over time.
To enable it, uncomment the solr_request_handler_metrics_options parameter in dse.yaml and set the other
options as required.
The following diagnostic tables collect handler metrics:

• Update handler statistics


Record core-specific direct update handler statistics over time.

• Request handler statistics


Record core-specific request handler statistics over time.

All objects are disabled by default.

1. Edit the dse.yaml file.

2. In the dse.yaml file, under the solr_request_handler_metrics_options parameter, change enabled to


true and set the other options as required.

# Solr request handler metrics options


solr_request_handler_metrics_options:
enabled: true
ttl_seconds: 604800
refresh_rate_ms: 60000


Table 26: Options


Options Determines

enabled Whether the object is enabled at start up.

ttl_seconds How many seconds a record survives before it is expired from the performance object.

refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.

Monitoring Spark with Spark Performance Objects


The Performance Service can collect data associated with the Spark cluster and Spark applications and save it
to tables. This allows you to monitor the metrics of DSE Analytics applications for performance tuning and
identifying bottlenecks. If authorization is enabled in your cluster, you must grant the user who is running
the Spark application SELECT permission on the dse_system.spark_metrics_config table, and MODIFY permission
on the dse_perf.spark_apps_snapshot table.
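For example, assuming an illustrative role named spark_app_user, the grants look like:

GRANT SELECT ON TABLE dse_system.spark_metrics_config TO spark_app_user;
GRANT MODIFY ON TABLE dse_perf.spark_apps_snapshot TO spark_app_user;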

Monitoring Spark cluster information


The Performance Service stores information about DSE Analytics clusters in the
dse_perf.spark_cluster_snapshot table. The cluster performance objects store the available and used
resources in the cluster, including cores, memory, and workers, as well as overall information about all
registered Spark applications, drivers and executors, including the number of applications, the state of each
application, and the host on which the application is running.


To enable collecting Spark cluster information, configure the options in the spark_cluster_info_options
section of dse.yaml.

Table 27: Spark cluster info options


Option Default value Description

enabled false Enables or disables Spark cluster information collection.

refresh_rate_ms 10,000 The time in milliseconds in which the data will be collected and stored.
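A minimal dse.yaml sketch for this section, using the options from Table 27 (the YAML nesting is an assumption based on the other performance objects in this file):

spark_cluster_info_options:
    enabled: true
    refresh_rate_ms: 10000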

The dse_perf.spark_cluster_snapshot table has the following columns:


name
The cluster name.
active_apps
The number of applications active in the cluster.
active_drivers
The number of active drivers in the cluster.
completed_apps
The number of completed applications in the cluster.
completed_drivers
The number of completed drivers in the cluster.
executors
The number of Spark executors in the cluster.
master_address
The host name and port number of the Spark Master node.
master_recovery_state
The state of the master node.
nodes
The number of nodes in the cluster.
total_cores
The total number of cores available on all the nodes in the cluster.
total_memory_mb
The total amount of memory in megabytes (MB) available to the cluster.
used_cores
The total number of cores currently used by the cluster.
used_memory_mb
The total amount of memory in megabytes (MB) used by the cluster.
workers
The total number of Spark Workers in the cluster.
Monitoring Spark application information
Spark application performance information is stored per application and updated whenever a task is finished. It
is stored in the dse_perf.spark_apps_snapshot table.
To enable collecting Spark application information, configure the options in the
spark_application_info_options section of dse.yaml.

Table 28: Spark application information options


Option Default Description

enabled false Enables or disables collecting Spark application information.

refresh_rate_ms 10,000 The time in milliseconds in which the data will be collected and stored.

The driver subsection of spark_application_info_options controls the metrics that are collected by the Spark
Driver.


Table 29: Spark Driver information options


Option Default Description

sink false Enables or disables collecting metrics from the Spark Driver.

connectorSource false Enables or disables collecting Spark Cassandra Connector metrics.

jvmSource false Enables or disables collecting JVM heap and garbage collection metrics from the Spark Driver.

stateSource false Enables or disables collecting application state metrics.

The executor subsection of spark_application_info_options controls the metrics collected by the Spark
executors.

Table 30: Spark executor information options


Option Default Description

sink false Enables or disables collecting Spark executor metrics.

connectorSource false Enables or disables collecting Spark Cassandra Connector metrics from the Spark executors.

jvmSource false Enables or disables collecting JVM heap or garbage collection metrics from the Spark executors.
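A hedged dse.yaml sketch combining the options from Tables 28, 29, and 30 (the nesting of the driver and executor subsections is an assumption):

spark_application_info_options:
    enabled: true
    refresh_rate_ms: 10000
    driver:
        sink: true
        connectorSource: true
        jvmSource: false
        stateSource: true
    executor:
        sink: true
        connectorSource: true
        jvmSource: false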

The dse_perf.spark_apps_snapshot table has the following columns:


application_id
component_id
metric_id
count
metric_type
rate_15_min
rate_1_min
rate_5_min
rate_mean
snapshot_75th_percentile
snapshot_95th_percentile
snapshot_98th_percentile
snapshot_999th_percentile
snapshot_99th_percentile
snapshot_max
snapshot_mean
snapshot_median
snapshot_min
snapshot_stddev
value
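A hedged example query against this table (the application ID is a placeholder; if application_id is not the partition key, ALLOW FILTERING may be required):

SELECT component_id, metric_id, metric_type, value
FROM dse_perf.spark_apps_snapshot
WHERE application_id = 'app-20200918000000-0001';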
DSE Performance Service diagnostic table reference
The Performance Service stores performance diagnostic information in a set of data dictionary tables that can
be queried with CQL. The following types of performance service diagnostic tables are available:

• CQL slow log table

• CQL system info tables


• Data Resource latency tracking tables

• Database summary statistics tables

• Cluster summary statistics tables

• Histogram tables

• User and resource latency tracking tables

Table names that contain _snapshot are not related to nodetool snapshot. These tables are snapshots of
the data in the last few seconds of activity in the system.

CQL slow log table

Table 31: node_slow_log table


[
Queries on a node exceeding the threshold_ms parameter.
]
Column Name Data type Description

node_ip inet Node address.

date timestamp Date of entry (MM/DD/YYYY granularity).

start_time timeuuid Start timestamp of query execution.

commands list<text> CQL statements being executed.

duration bigint Execution time in milliseconds.

parameters map<text, text> Not used at this time.

source_ip inet Client address.

table_names set<text> CQL tables touched.

username text User executing query, if authentication is enabled.
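A hedged example query against this table, assuming node_ip and date form the partition key as in the solr_slow_sub_query_log table shown later in this section:

SELECT start_time, duration, commands, username
FROM dse_perf.node_slow_log
WHERE node_ip = '10.0.0.1' AND date = '2020-09-18';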

CQL system info tables

Table 32: key_cache table


[
Key cache performance statistics.
]
Column Name Data type Description

node_ip inet Node address.

cache_capacity bigint Key cache capacity in bytes.

cache_hits bigint Total number of cache hits since startup.

cache_requests bigint Total number of cache requests since startup.

cache_size bigint Current key cache size in bytes.

hit_rate double Ratio of hits to requests since startup.

Table 33: net_stats table


[
Data flow operations, repair tasks, and more.

]
Column Name Data type Description

node_ip inet Node address.

read_repair_attempted bigint Read repairs attempted since startup.

read_repaired_blocking bigint Number of read repairs performed synchronously since startup.

read_repaired_background bigint Number of read repairs performed asynchronously since startup.

commands_pending int Current number of read repair commands pending.

commands_completed bigint Total read repair commands completed since startup.

responses_pending int Current read repair responses pending count.

responses_completed bigint Current read repairs completed count.

Table 34: thread_pool table


[
Information on thread pool activity.
]
Column Name Data type Description

node_ip inet Node address.

pool_name text Thread pool name.

active bigint Currently active tasks.

all_time_blocked bigint Total blocked tasks since startup.

blocked bigint Currently blocked tasks.

completed bigint Total completed tasks since startup.

pending bigint Currently pending tasks.

Table 35: thread_pool_messages table


[
Information about thread pool messages.
]
Column Name Data type Description

node_ip inet Node address.

message_type text Inter-node message type.

dropped_count int Total count of dropped messages since startup.

Data Resource latency tracking tables

Table 36: object_io table


[
Per node recent latency metrics by keyspace and table.
]
Column Name Data type Description

node_ip inet Node address.


keyspace_name text Keyspace name.

table_name text Table name.

last_activity timestamp End of sampling period in which this object was last active.

memory_only boolean DSE memory only table.

read_latency double Mean value in microseconds for all reads during the last active sampling period
for this object.

total_reads bigint Count during the last active sampling period for this object.

total_writes bigint Count during the last active sampling period for this object.

write_latency double Mean value in microseconds for all writes during the last active sampling period
for this object.

Table 37: user_object_read_io_snapshot table


[
Per-user and per-object read/write metrics, ordered by mean read latency during the last sampling period.
]
Column Name Data type Description

keyspace_name text Keyspace name.

table_name text Table name.

node_ip inet Node address.

user_ip inet User node address.

conn_ip inet Connection node address.

username text User name.

read_latency double In microseconds during the last sampling period.

total_reads bigint Count during the last sampling period.

write_latency double In microseconds during the last sampling period.

total_writes bigint Count during the last sampling period.

latency_index int Ranking by mean read latency during the last sampling period.

read_quantiles boolean DSE memory only table.

Table 38: object_read_io_snapshot table


[
Per node recent latency metrics by keyspace and table, ordered by mean read latency during the last sampling period.
]
Column Name Data type Description

keyspace_name text Keyspace name.

table_name text Table name.

node_ip inet Node address.

user_ip inet User node address.


conn_ip inet Connection node address.

username text User name.

read_latency double In microseconds during the last sampling period.

total_reads bigint Count during the last sampling period.

write_latency double In microseconds during the last sampling period.

total_writes bigint Count during the last sampling period.

latency_index int Ranking by mean read latency during the last sampling period.

read_quantiles boolean DSE memory only table.

Table 39: object_write_io_snapshot table


[
Per node recent latency metrics by keyspace and table. Ordered by mean write latency. Scale of 0 to 99 (0 is
worst).
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean write latency during the last sampling period.

keyspace_name text Keyspace name.

memory_only boolean DSE memory only table.

read_latency double Mean value in microseconds during the active sampling period.

table_name text Table name.

total_reads bigint Count during the last sampling period.

total_writes bigint Count during the last sampling period.

write_latency double Mean value in microseconds during the last sampling period.

Database summary statistics tables

Table 40: node_table_snapshot table


[
Per node table metrics by keyspace and table.
]
Column Name Data type Description

node_ip inet Node address.

keyspace_name text Keyspace name.

table_name text Table name.

bf_false_positive_ratio double Bloom filter false positive ratio since startup.

bf_false_positives bigint Bloom filter false positive count since startup.

compression_ratio double Current compression ratio of SSTables.


droppable_tombstone_ratio double Ratio of tombstones older than gc_grace_seconds against total column count in
all SSTables.

key_cache_hit_rate double Current key cache hit rate.

live_sstable_count bigint Current SSTable count.

max_row_size bigint Maximum partition size in bytes.

mean_read_latency double In microseconds for this table since startup.

mean_row_size bigint Average partition size in bytes.

mean_write_latency double In microseconds for this table since startup.

memtable_columns_count bigint Approximate number of cells for this table currently resident in memtables.

memtable_size bigint Total size in bytes of memtable data.

memtable_switch_count bigint Number of times memtables have been flushed since startup.

min_row_size bigint Minimum partition size in bytes.

total_data_size bigint Data size on disk in bytes.

total_reads bigint Number of reads since startup.

total_writes bigint Number of writes since startup.

unleveled_sstables bigint Current count of SSTables in level 0 (if using leveled compaction).

Table 41: table_snapshot table


[
Cluster-wide lifetime table metrics by keyspace and table. This table aggregates node_table_snapshot from
each node in the cluster.
]
Column Name Data type Description

keyspace_name text Keyspace name.

table_name text Table name.

bf_false_positive_ratio double Bloom filter false positive ratio since startup.

bf_false_positives bigint Bloom filter false positive count since startup.

compression_ratio double Current compression ratio of SSTables.

droppable_tombstone_ratio double Ratio of tombstones older than gc_grace_seconds against total column count in
all SSTables.

key_cache_hit_rate double Current key cache hit rate.

live_sstable_count bigint Current SSTable count.

max_row_size bigint Maximum partition size in bytes.

mean_read_latency double In microseconds for this table since startup.

mean_row_size bigint Average partition size in bytes.

mean_write_latency double In microseconds for this table since startup.

memtable_columns_count bigint Approximate number of cells for this table currently resident in memtables.

memtable_size bigint Total size in bytes of memtable data.


memtable_switch_count bigint Number of times memtables have been flushed since startup.

min_row_size bigint Minimum partition size in bytes.

total_data_size bigint Data size on disk in bytes.

total_reads bigint Number of reads since startup.

total_writes bigint Number of writes since startup.

unleveled_sstables bigint Current count of SSTables in level 0 (if using leveled compaction).

Table 42: keyspace_snapshot table


[
Cluster wide lifetime table metrics, aggregated at the keyspace level (aggregates the data in table_snapshot).
]
Column Name Data type Description

keyspace_name text Keyspace name.

index_count int Number of secondary indexes.

mean_read_latency double For all tables in the keyspace and all nodes in the cluster since startup.

mean_write_latency double For all tables in the keyspace and all nodes in the cluster since startup.

table_count int Number of tables in the keyspace.

total_data_size bigint Total size in bytes of SSTables for all tables and indexes across all nodes in the
cluster.

total_reads bigint For all tables, across all nodes.

total_writes bigint For all tables, across all nodes.

Cluster summary statistics tables

Table 43: node_snapshot table


[
Per node system metrics.
]
Column Name Data type Description

node_ip inet Node address.

cms_collection_count bigint CMS garbage collections since startup.

cms_collection_time bigint Total time spent in CMS garbage collection since startup.

commitlog_pending_tasks bigint Current commit log tasks pending.

commitlog_size bigint Total commit log size in bytes.

compactions_completed bigint Number of compactions completed since startup.

compactions_pending int Number of pending compactions.

completed_mutations bigint Total number of mutations performed since startup.

data_owned float Percentage of total data owned by this node.

datacenter text Datacenter name.


dropped_mutation_ratio double Ratio of dropped to completed mutations since startup.

dropped_mutations bigint Total number of dropped mutations since startup.

flush_sorter_tasks_pending bigint Current number of memtable flush sort tasks pending.

free_space bigint Total free disk space in bytes.

gossip_tasks_pending bigint Current number of gossip tasks pending.

heap_total bigint Total available heap memory in bytes.

heap_used bigint Current heap usage in bytes.

hinted_handoff_pending bigint Current number of hinted handoff tasks pending.

index_data_size bigint Total size in bytes of index column families.

internal_responses_pending bigint Current number of internal response tasks pending.

key_cache_capacity bigint Key cache capacity in bytes.

key_cache_entries bigint Current number of key cache entries.

key_cache_size bigint Current key cache size in bytes.

manual_repair_tasks_pending bigint Current number of manual repair tasks pending.

mean_range_slice_latency double Mean latency in microseconds for range slice operations since startup.

mean_read_latency double Mean latency in microseconds for reads since startup.

mean_write_latency double Mean latency in microseconds for writes since startup.

memtable_post_flushers_pending bigint Current number of memtable post flush tasks pending.

migrations_pending bigint Current number of migration tasks pending.

misc_tasks_pending bigint Current number of misc tasks pending.

parnew_collection_count bigint ParNew garbage collections since startup.

parnew_collection_time bigint Total time spent in ParNew garbage collection since startup.

process_cpu_load double Current CPU load for the DSE process (Linux only).

rack text Rack identifier.

range_slice_timeouts bigint Number of timed out range slice requests since startup.

read_repair_tasks_pending bigint Current number of read repair tasks pending.

read_requests_pending bigint Current read requests pending.

read_timeouts bigint Number of timed out read requests since startup.

replicate_on_write_tasks_pending bigint Current number of counter replicate on write tasks pending.

request_responses_pending bigint Current number of request response tasks pending.

row_cache_capacity bigint Row cache capacity in bytes.

row_cache_entries bigint Current number of row cache entries.

row_cache_size bigint Current row cache size in bytes.

state text Node State (JOINING/LEAVING/MOVING/NORMAL).

storage_capacity bigint Total disk space in bytes.


streams_pending int Current number of pending streams.

table_data_size bigint Total size in bytes of non-index column families.

tokens set<text> Tokens owned by this node.

total_batches_replayed bigint Total number of batchlog entries replayed since startup.

total_node_memory bigint Total available RAM (Linux only).

total_range_slices bigint Total number of range slice operations performed since startup.

total_reads bigint Total number of reads performed since startup.

total_writes bigint Total number of writes performed since startup.

uptime bigint Node uptime in seconds.

write_requests_pending bigint Total number of write tasks pending.

write_timeouts bigint Number of timed out write requests since startup.

Table 44: dc_snapshot table


[
Aggregates node_snapshot data at the datacenter level.
]
Column Name Data type Description

name text Datacenter name.

compactions_completed bigint Total number of compactions completed since startup by all nodes in the data
center.

compactions_pending int Total number of pending compactions on all nodes in the datacenter.

completed_mutations bigint Total number of mutations performed since startup by all nodes in the data
center.

dropped_mutation_ratio double Ratio of dropped to completed mutations since startup across all nodes in the
datacenter.

dropped_mutations bigint Total number of dropped mutations since startup by all nodes in the data center.

flush_sorter_tasks_pending bigint Total number of memtable flush sort tasks pending across all nodes in the
datacenter.

free_space bigint Total free disk space in bytes across all nodes in the datacenter.

gossip_tasks_pending bigint Total number of gossip tasks pending across all nodes in the data center.

hinted_handoff_pending bigint Total number of hinted handoff tasks pending across all nodes in the data center.

index_data_size bigint Total size in bytes of index column families across all nodes in the data center.

internal_responses_pending bigint Number of internal response tasks pending across all nodes in the data center.

key_cache_capacity bigint Total capacity in bytes of key caches across all nodes in the data center.

key_cache_entries bigint Total number of entries in key caches across all nodes in the data center.

key_cache_size bigint Total consumed size in bytes of key caches across all nodes in the data center.

manual_repair_tasks_pending bigint Total number of manual repair tasks pending across all nodes in the data center.

mean_range_slice_latency double Mean latency in microseconds for range slice operations, averaged across all
nodes in the datacenter.


mean_read_latency double Mean latency in microseconds for read operations, averaged across all nodes in
the datacenter.

mean_write_latency double Mean latency in microseconds for write operations, averaged across all nodes in
the datacenter.

memtable_post_flushers_pending bigint Total number of memtable post flush tasks pending across all nodes in the
datacenter.

migrations_pending bigint Total number of migration tasks pending across all nodes in the data center.

misc_tasks_pending bigint Total number of misc tasks pending across all nodes in the datacenter.

node_count int Total number of live nodes in the datacenter.

read_repair_tasks_pending bigint Total number of read repair tasks pending across all nodes in the data center.

read_requests_pending bigint Total read requests pending across all nodes in the datacenter.

replicate_on_write_tasks_pending bigint Total number of counter replicate on write tasks pending across all nodes in the
datacenter.

request_responses_pending bigint Total number of request response tasks pending across all nodes in the data
center.

row_cache_capacity bigint Total capacity in bytes of partition caches across all nodes in the data center.

row_cache_entries bigint Total number of row cache entries across all nodes in the datacenter.

row_cache_size bigint Total consumed size in bytes of row caches across all nodes in the data center.

storage_capacity bigint Total disk space in bytes across all nodes in the datacenter.

streams_pending int Number of pending streams across all nodes in the datacenter.

table_data_size bigint Total size in bytes of non-index column families across all nodes in the data
center.

total_batches_replayed bigint Total number of batchlog entries replayed since startup by all nodes in the
datacenter.

total_range_slices bigint Total number of range slice operations performed since startup by all nodes in
the datacenter.

total_reads bigint Total number of read operations performed since startup by all nodes in the
datacenter.

total_writes bigint Total number of write operations performed since startup by all nodes in the
datacenter.

write_requests_pending bigint Total number of write tasks pending across all nodes in the data center.

Table 45: cluster_snapshot table


[
Aggregates node_snapshot data for the entire cluster.
]
Column Name Data type Description

name text Cluster name.

compactions_completed bigint Total number of compactions completed since startup by all nodes in the cluster.

completed_mutations bigint Total number of mutations performed since startup by all nodes in the cluster.

compactions_pending int Total number of pending compactions on all nodes in the cluster.

datacenters set<text> Datacenter names.


dropped_mutation_ratio double Ratio of dropped to completed mutations since startup across all nodes in the
cluster.

dropped_mutations bigint Total number of dropped mutations since startup by all nodes in the cluster.

flush_sorter_tasks_pending bigint Total number of memtable flush sort tasks pending across all nodes in the
cluster.

free_space bigint Total free disk space in bytes across all nodes in the cluster.

gossip_tasks_pending bigint Total number of gossip tasks pending across all nodes in the cluster.

hinted_handoff_pending bigint Total number of hinted handoff tasks pending across all nodes in the cluster.

index_data_size bigint Total size in bytes of index column families across all nodes in the cluster.

internal_responses_pending bigint Number of internal response tasks pending across all nodes in the cluster.

key_cache_capacity bigint Total capacity in bytes of key caches across all nodes in the cluster.

key_cache_entries bigint Total number of entries in key caches across all nodes in the cluster.

key_cache_size bigint Total consumed size in bytes of key caches across all nodes in the cluster.

keyspace_count int Total number of keyspaces defined in schema.

manual_repair_tasks_pending bigint Total number of manual repair tasks pending across all nodes in the cluster.

mean_range_slice_latency double Mean latency in microseconds for range slice operations, averaged across all
nodes in the cluster.

mean_read_latency double Mean latency in microseconds for read operations, averaged across all nodes in
the cluster.

mean_write_latency double Mean latency in microseconds for write operations, averaged across all nodes in
the cluster.

memtable_post_flushers_pending bigint Total number of memtable post flush tasks pending across all nodes in the
cluster.

migrations_pending bigint Total number of migration tasks pending across all nodes in the cluster.

misc_tasks_pending bigint Total number of misc tasks pending across all nodes in the cluster.

node_count int Total number of live nodes in the cluster.

read_repair_tasks_pending bigint Total number of read repair tasks pending across all nodes in the cluster.

read_requests_pending bigint Total read requests pending across all nodes in the cluster.

replicate_on_write_tasks_pending bigint Total number of counter replicate on write tasks pending across all nodes in the
cluster.

request_responses_pending bigint Total number of request response tasks pending across all nodes in the cluster.

row_cache_capacity bigint Total capacity in bytes of partition caches across all nodes in the cluster.

row_cache_entries bigint Total number of row cache entries across all nodes in the cluster.

row_cache_size bigint Total consumed size in bytes of row caches across all nodes in the cluster.

storage_capacity bigint Total disk space in bytes across all nodes in the cluster.

streams_pending int Number of pending streams across all nodes in the cluster.

table_count int Total number of tables defined in schema.

table_data_size bigint Total size in bytes of non-index column families across all nodes in the cluster.

total_batches_replayed bigint Total number of batchlog entries replayed since startup by all nodes in the
cluster.


total_range_slices bigint Total number of range slice operations performed since startup by all nodes in the
cluster.

total_reads bigint Total number of read operations performed since startup by all nodes in the cluster.

total_writes bigint Total number of write operations performed since startup by all nodes in the cluster.

write_requests_pending bigint Total number of write tasks pending across all nodes in the cluster.

Histogram tables
A histogram measures the distribution of values in a stream of data. These histogram tables use the Metrics
Core library. You must enable the collection of table histogram data using the DataStax Enterprise Performance
Service.
These tables show similar information to the data obtained by the nodetool tablehistograms utility. The major
difference is that the nodetool histograms output is recent data, while the diagnostic tables contain lifetime
data. The data in the diagnostic histogram tables is cumulative since the DSE server was started. In contrast,
the nodetool tablehistograms shows the values for the past fifteen minutes.

Histogram tables provide DSE statistics that can be queried with CQL and are generated with these templates:

• Detailed

• Summary

• Global

DSE provides histogram tables for this data:


Histogram          Details table  Summary table  Keyspace details  Keyspace summary  Global details  Global summary

cell_count         Y              Y              N                 N                 N               N

partition_size     Y              Y              N                 N                 N               N

range_latency      Y              Y              Y                 N                 Y               N

read_latency       Y              Y              Y                 N                 Y               N

sstables_per_read  Y              Y              Y                 N                 N               N

write_latency      Y              Y              Y                 N                 N               N

Table 46: Summary histogram tables


[
Summary histogram tables show the percentiles, with output similar to the nodetool tablehistograms
command output.
]
Column Name Data type Description

node_ip inet Node address

keyspace_name text Keyspace name

table_name text Table name

histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.

p50 bigint The threshold where 50 percent of the operation is recorded 50% from the end,
for the 50th percentile.


p75 bigint The threshold where 75 percent of the operation is recorded 25% from the end,
for the 75th percentile.

p90 bigint The threshold where 90 percent of the operation is recorded 10% from the end,
for the 90th percentile.

p95 bigint The threshold where 95 percent of the operation is recorded 5% from the end, for
the 95th percentile.

p98 bigint The threshold where 98 percent of the operation is recorded 2% from the end, for
the 98th percentile.

p99 bigint The threshold where 99 percent of the operation is recorded 1% from the end, for
the 99th percentile.

min bigint The minimum number of operations.

max bigint The maximum number of operations.

dropped_messages bigint The total number of dropped messages for mutations to this table.
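A hedged example of reading the most recent read-latency percentiles for one table on one node (the table name read_latency_histograms_summary is an assumption; check the dse_perf keyspace for the exact histogram table names in your cluster):

SELECT histogram_id, p50, p95, p99, min, max
FROM dse_perf.read_latency_histograms_summary
WHERE node_ip = '10.0.0.1'
  AND keyspace_name = 'cycling'
  AND table_name = 'events'
LIMIT 1;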

Table 47: Detailed histogram for keyspaces


[
Detailed data for a single keyspace.
]
Column Name Data type Description

node_ip inet Node address

keyspace_name text Keyspace name

table_name text Table name

histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.

bucket_offset bigint The number between the current bucket and the previous bucket.

bucket_count bigint The sum of values being measured that is less than or equal to this offset and
greater than or equal to the previous offset.

Table 48: Detailed table histogram


[
Detailed data for a single table.
]
Column Name Data type Description

node_ip inet Node address

keyspace_name text Keyspace name

table_name text Table name

histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.

bucket_offset bigint The number between the current bucket and the previous bucket.

bucket_count bigint The sum of values being measured that is less than or equal to this offset and
greater than or equal to the previous offset.


Table 49: Global histogram tables


[
Detailed data for all tables and all keyspaces, cumulative since the last time the node was started.
]
Column Name Data type Description

node_ip inet Node address.

keyspace_name text Keyspace name.

table_name text Table name.

histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace, and table are ordered by this field, to enable date-based filtering.

bucket_offset bigint Number of cells in a partition.

bucket_count bigint Number of partitions where the cell count falls in the corresponding bucket.

Table 50: cell_count_histograms table


[
Cell count per partition histogram data.
]
Column Name Data type Description

node_ip inet Node address.

histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.

bucket_offset bigint The number between the current bucket and the previous bucket.

bucket_count bigint The sum of values being measured that is less than or equal to this offset and
greater than or equal to the previous offset.

Table 51: dropped_messages table


[
Dropped messages histogram data in seconds.
]
Column Name Data type Description

node_ip inet Node address.

histogram_id timestamp The timestamp when the histogram record was written.

verb text Where verb denotes the message type that was dropped: MUTATION,
HINT, READ_REPAIR, READ, REQUEST_RESPONSE, BATCH_STORE,
BATCH_REMOVE, RANGE_SLICE, GOSSIP_DIGEST_SYN,
GOSSIP_DIGEST_ACK, GOSSIP_DIGEST_ACK2, DEFINITIONS_UPDATE,
TRUNCATE, SCHEMA_CHECK, REPLICATION_FINISHED,
INTERNAL_RESPONSE, COUNTER_MUTATION, SNAPSHOT,
MIGRATION_REQUEST, GOSSIP_SHUTDOWN, ECHO, REPAIR_MESSAGE,
PAXOS_PREPARE, PAXOS_PROPOSE, PAXOS_COMMIT.

global_count bigint Global metrics are the sum of the internal and the cross-node metrics for
dropped events since the server was started, including dropped mutations.

global_mean_rate double Global metrics for dropped messages, including dropped mutations.

global_1min_rate double Estimated rate of the combined internal and the cross-node metrics for dropped
messages for 1 minute.


global_5min_rate double Estimated rate of the combined internal and the cross-node metrics for dropped messages for 5 minutes.

global_15min_rate double Global metrics for dropped messages for 15 minutes.

table_name text Table name.

internal_count bigint Inside one DSE node, the number of internal messages that were dropped since
the server was started.

internal_mean_rate double Inside one DSE node, the average rate of dropped message events per second.

internal_1min_rate double Inside one DSE node, the average number of messages that were dropped in a
one-minute interval.

internal_5min_rate double Inside one DSE node, the average number of messages that were dropped in a
five-minute interval.

internal_15min_rate double Inside one DSE node, the average number of messages that were dropped in a
15-minute interval.

internal_latency_median double Inside one DSE node, the median of all recorded durations for one second.

internal_latency_p75 double Inside one DSE node, the threshold where 75 percent of the latency is recorded
25% from the end, for the 75th percentile.

internal_latency_p90 double Inside one DSE node, the threshold where 90 percent of the latency is recorded
10% from the end, for the 90th percentile.

internal_latency_p95 double Inside one DSE node, the threshold where 95 percent of the latency is recorded
5% from the end, for the 95th percentile.

internal_latency_p98 double Inside one DSE node, the threshold where 98 percent of the latency is recorded
2% from the end, for the 98th percentile.

internal_latency_p99 double Inside one DSE node, the threshold where 99 percent of the latency is recorded
1% from the end, for the 99th percentile.

internal_latency_min double Inside one DSE node, the minimum number of dropped mutations.

internal_latency_mean double Inside one DSE node, the average number of messages dropped.

internal_latency_max double Inside one DSE node, the maximum number of dropped mutations.

internal_latency_stdev double Inside one DSE node, the standard deviation of dropped mutations.

xnode_count bigint For cross node messages, the number of messages that were dropped since the
server was started.

xnode_mean_rate double For cross node messages, the average number of messages dropped.

xnode_1min_rate double For cross node messages, the average number of messages that were dropped
in a one-minute interval.

xnode_5min_rate double For cross node messages, the average number of messages that were dropped
in a five-minute interval.

xnode_15min_rate double For cross node messages, the average number of messages that were dropped
in a 15-minute interval.

xnode_median double For cross node messages, the median of all recorded durations for one second.

xnode_p75 double For cross node messages, the threshold where 75 percent of the latency is
recorded 25% from the end, for the 75th percentile.

xnode_p90 double For cross node messages, the threshold where 90 percent of the latency is
recorded 10% from the end, for the 90th percentile.

xnode_p95 double For cross node messages, the threshold where 95 percent of the latency is
recorded 5% from the end, for the 95th percentile.


xnode_p98 double For cross node messages, the threshold where 98 percent of the latency is
recorded 2% from the end, for the 98th percentile.

xnode_p99 double For cross node messages, the maximum number dropped messages are
recorded 1% from the end, for the 99th percentile.

xnode_min double For cross node messages, the minimum number of dropped messages per
second.

xnode_mean double For cross node messages, the average number of dropped messages per
second.

xnode_max double For cross node messages, the maximum number of dropped messages per
second.

xnode_stdev double For cross node messages, the standard deviation for dropped messages per
second.

bucket_offset bigint The number between the current bucket and the previous bucket.

bucket_count bigint The sum of values being measured that is less than or equal to this offset and
greater than or equal to the previous offset.

User and resource latency tracking tables

Table 52: user_io table


[
Per node, long-lived read/write metrics by client connection and aggregated for all keyspaces and tables.
]
Column Name Data type Description

node_ip inet Node address.

conn_id text Unique client connection ID.

last_activity timestamp End of sampling period in which this client was last active.

read_latency double In microseconds for the last active sampling period.

total_reads bigint Count during the last active sampling period for this client.

total_writes bigint Count during the last active sampling period for this client.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double In microseconds for the last active sampling period.

Table 53: user_read_io_snapshot table


[
Per node recent read/write metrics by keyspace, table, and client during the most recent sampling period.
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean read latency during the last sampling period.

conn_id text Unique client connection ID.

read_latency double Mean value in microseconds during the last sampling period.


total_reads bigint During the last sampling period.

total_writes bigint During the last sampling period.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last sampling period.

Table 54: user_write_io_snapshot table


[
Per node recent read/write metrics by keyspace, table, and client during the most recent sampling period.
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean write latency during the last sampling period.

conn_id text Unique client connection ID.

read_latency double Mean value in microseconds during the last sampling period.

total_reads bigint During the last sampling period.

total_writes bigint During the last sampling period.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last sampling period.

Table 55: user_object_io table


[
Per node, long-lived read/write metrics by client connection, keyspace and table.
]
Column Name Data type Description

node_ip inet Node address.

conn_id text Unique client connection ID.

keyspace_name text Keyspace name.

table_name text Table name.

last_activity timestamp End of sampling period in which this client was last active against this object.

read_latency double Mean value in microseconds during the last active sampling period for this object/
client.

total_reads bigint During the last active sampling period for this object/client.

total_writes bigint During the last active sampling period for this object/client.

user_ip inet Client origin address.

username text Present if authentication is enabled.


write_latency double Mean value in microseconds during the last active sampling period for this object/
client.

Table 56: user_object_write_io_snapshot table


[
Per node recent read/write metrics by client, keyspace, and table during the most recent sampling period.
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean write latency during the last sampling period.

conn_id text Unique client connection ID.

keyspace_name text Keyspace name.

read_latency double Mean value in microseconds during the last sampling period.

table_name text Table name.

total_reads bigint During the last sampling period.

total_writes bigint During the last sampling period.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last sampling period.

Table 57: object_user_io table


[
Overview of the I/O activity by user for each table.
]
Column Name Data type Description

node_ip inet Node address.

keyspace_name text Keyspace name.

table_name text Table name.

conn_id text Unique client connection ID.

last_activity timestamp End of sampling period in which this client connection was last active against this
object.

read_latency double Mean value in microseconds during the last active sampling period for this object/
client.

total_reads bigint Count during the last active sampling period for this object/client.

total_writes bigint Count during the last active sampling period for this object/client.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last active sampling period for this object/
client.


Table 58: object_user_read_io_snapshot table


[
Per node recent read/write metrics by client, keyspace, and table during the most recent sampling period.
Tracks best-worst latency on a scale of 0 (worst) to 99 (best).
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean read latency during the last sampling period.

conn_id text Unique client connection ID.

keyspace_name text Keyspace name.

read_latency double Mean value in microseconds during the last active sampling period for this object/
client.

table_name text Table name.

total_reads bigint Count during the last active sampling period for this object/client.

total_writes bigint Count during the last active sampling period for this object/client.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last active sampling period for this object/
client.

Table 59: object_user_write_io_snapshot table


[
Per node recent read/write metrics by client, keyspace, and table during the most recent sampling period.
Tracks best-worst latency on a scale of 0 to 99 (0 is worst).
]
Column Name Data type Description

node_ip inet Node address.

latency_index int Ranking by mean write latency during the last sampling period.

conn_id text Unique client connection ID.

keyspace_name text Keyspace name.

read_latency double Mean value in microseconds during the last active sampling period for this object/
client.

table_name text Table name.

total_reads bigint Count during the last active sampling period for this object/client.

total_writes bigint Count during the last active sampling period for this object/client.

user_ip inet Client origin address.

username text Present if authentication is enabled.

write_latency double Mean value in microseconds during the last active sampling period for this object/
client.


Leases table
Acquire, disable, renew, and resolve are the four lease operations. Histogram statistics indicate the rough
distribution of timing: the average amount of time, the time for the worst request out of 100, the absolute worst
request, and the rate at which operations are happening.
Table 60: leases table
[
Lease metrics for the lease subsystem.
]
Column Name Data type Description

acquire_average_latency_ms bigint Average latency, in milliseconds, to acquire lease.

acquire_latency99ms bigint Latency recorded 1% from the end, for the 99th percentile.

acquire_rate15 double The rate at which acquire operations are happening.

disable_average_latency_ms bigint The average amount of time.

disable_latency99ms bigint The time for the worst request out of 100.

disable_max_latency_ms bigint The absolute worst request.

disable_rate15 double The rate at which operations are happening.

monitor inet A machine partially responsible for the lease; there are as many monitors as the replication
factor.

name text The lease name.

renew_average_latency_ms bigint The average amount of time.

renew_latency99ms bigint The time for the worst request out of 100.

renew_max_latency_ms bigint The absolute worst request.

renew_rate15 double The rate at which operations are happening.

resolve_average_latency_ms bigint The average amount of time.

resolve_latency99ms bigint The time for the worst request out of 100.

resolve_max_latency_ms bigint The absolute worst request.

resolve_rate15 double The rate at which operations are happening.

up boolean Whether the lease is held; implies that the service is up.

up_or_down_since timestamp Time of the last change. For example, UP since 10PM or DOWN since 4PM.
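A hedged example of checking lease health with this table:

SELECT name, monitor, up, up_or_down_since
FROM dse_perf.leases;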

DSE Search Performance Service diagnostic table reference

Frequently asked questions about the DSE Search Performance Service


Question: Is it a good idea to leave the search performance objects enabled 24/7?
Answer: It depends on your use cases. If you’re attempting to collect data pertaining to a problem that occurs
sporadically, and you’ve chosen configuration values that don’t introduce a painful amount of performance
overhead, there’s no reason you can’t keep the objects enabled on an ongoing basis.
Question: What kind of performance impact will enabling the search performance objects have?
Answer: Performance overhead, in terms of CPU and memory usage, should be negligible when using DataStax
Enterprise's default configuration values. However, the overhead introduced by enabling the objects varies as
the configuration is modified (described in the following sections). For instance, setting longer TTLs and
shorter refresh intervals leads to higher memory and disk consumption.
Question: Should I enable the search performance objects on every node in my cluster?


Answer: The search performance objects should only be enabled on search nodes, that is, nodes where
indexes reside that can observe search operations. While it is perfectly acceptable to enable the objects
across an entire cluster, enabling them on a single node for observation first is a good way to mitigate risk.
Question: Can I use existing tables with secondary indexes on some columns, and create search indexes on
other columns in the same table?
Answer: Do not mix search indexes with secondary indexes. Attempting to use both indexes on the same table
is not supported.
Slow sub-query log for search
Report distributed sub-queries for search (query executions on individual shards) that take longer than a
specified period of time.
JMX analog
None.
Schema
When slow query logging is enabled, this table is created automatically.

CREATE TABLE IF NOT EXISTS dse_perf.solr_slow_sub_query_log (


core text,
date timestamp,
coordinator_ip inet,
query_id timeuuid,
node_ip inet,
start_time timeuuid,
parameters map<text, text>,
elapsed_millis bigint,
component_prepare_millis map<text, bigint>,
component_process_millis map<text, bigint>,
num_docs_found bigint,
PRIMARY KEY ((core, date), coordinator_ip, query_id, node_ip)
)

Field Type Purpose

core text Name of the search core (keyspace.table) where the slow sub-query was executed.
date timestamp Midnight on the mm/dd/yyyy the slow sub-query started.
coordinator_ip inet Distributed query coordinator IP address.
query_id timeuuid ID of the distributed query to which the slow sub-query belongs.
node_ip inet Node IP address.
start_time timeuuid Timestamp at the start of the slow sub-query.
parameters map<text, text> Solr query parameters.
elapsed_millis bigint How long the slow sub-query took.
component_prepare_millis map<text, bigint> Map of (component name -> time spent in prepare phase).
component_process_millis map<text, bigint> Map of (component name -> time spent in process phase).
num_docs_found bigint Number of documents found by the slow sub-query.

Slow Solr sub-queries recorded on 10/17/2015 for core keyspace.table for coordinator at 127.0.0.1:

SELECT *
FROM solr_slow_sub_query_log
WHERE core = 'keyspace.table' AND date = '2015-10-17' AND coordinator_ip = '127.0.0.1';

Slow Solr sub-queries recorded on 10/17/2015 for core keyspace.table for coordinator at 127.0.0.1 for a
particular distributed query with an ID of 33e56d33-4e63-11e4-9ce5-335a04d08bd4:

SELECT *
FROM solr_slow_sub_query_log
WHERE core = 'keyspace.table'
AND date = '2015-10-17'
AND coordinator_ip = '127.0.0.1'
AND query_id = 33e56d33-4e63-11e4-9ce5-335a04d08bd4;

Collecting slow search queries [Steps to help you identify slow search queries using the DataStax Enterprise
Performance Service.]
Indexing error log
Record errors that occur during document indexing.
Specifically, this log records errors that occur during document validation. A common scenario is where a non-
stored copy field is copied into a field with an incompatible type.
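
Like the slow sub-query log, this table is populated only when the corresponding option block is enabled in
dse.yaml. A hedged sketch, assuming the solr_indexing_error_log_options block (values illustrative; verify
against the dse.yaml reference for your release):

solr_indexing_error_log_options:
    enabled: true
    ttl_seconds: 604800    # retention period for error rows
    async_writers: 1       # server threads writing to the log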
JMX analog
None.
Schema

CREATE TABLE IF NOT EXISTS dse_perf.solr_indexing_errors (
    node_ip inet,
    core text,
    date timestamp,
    time timeuuid,
    document text,
    field_name text,
    field_type text,
    message text,
    PRIMARY KEY ((node_ip, core, date), time)
)
WITH CLUSTERING ORDER BY (time DESC)

Field Type Purpose

node_ip inet Node address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the error occurred.
time timeuuid Timestamp for the time the error occurred.
document text The primary key for the table row corresponding to the document. For example: [foo, bar, baz] for a complex PK, or foo for a single-element PK.
field_name text Name of the field that caused the validation error.
field_type text Type of the field that caused the validation error, such as solr.StrField.
message text Error message.

Indexing validation errors recorded on 10/17/2014 for core keyspace.table at node 127.0.0.1:

SELECT *


FROM solr_indexing_errors
WHERE core = 'keyspace.table' AND date = '2014-10-17' AND node_ip = '127.0.0.1';

Most recent 5 indexing validation errors recorded on 10/17/2014 for core keyspace.table at node 127.0.0.1:

SELECT *
FROM solr_indexing_errors
WHERE core = 'keyspace.table'
AND date = '2014-10-17'
AND node_ip = '127.0.0.1'
ORDER BY time DESC
LIMIT 5;

Query latency snapshot


Record phase-level cumulative percentile latency statistics for queries over time.

All statistics reset upon node restart.


This table is configured with gc_grace_seconds 0 to avoid issues with persistent tombstones as rows
expire; tombstones are removed during compaction no matter how recently they were created.

JMX analog

com.datastax.bdp/search/core/QueryMetrics

See IndexPool MBean.


Schema

CREATE TABLE dse_perf.solr_query_latency_snapshot (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    phase text,
    count bigint,
    latency_percentiles_micros map<text, bigint>,
    PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the snapshot was recorded.
time timestamp Time the snapshot was recorded.
phase text EXECUTE, COORDINATE, RETRIEVE
count bigint Cumulative number of queries recorded.
latency_percentiles_micros map<text, bigint> Cumulative latency percentiles of query: 25%, 50%, 75%, 95%, 99% and 99.9%.


Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_query_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:

SELECT *
FROM solr_query_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;

Collecting Apache Solr performance statistics [Enable the solr_latency_snapshot_options parameter in dse.yaml and set the other options as required.]
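
A hedged dse.yaml sketch for the option named in the cross-reference above (values illustrative; a single
solr_latency_snapshot_options block governs the query, update, commit, and merge snapshot tables described
in this section):

solr_latency_snapshot_options:
    enabled: true
    ttl_seconds: 604800     # retention period for snapshot rows
    refresh_rate_ms: 60000  # interval between snapshot rows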
Update latency snapshot
Record phase-level cumulative percentile latency statistics for updates over time.

All statistics reset upon node restart.


This table is configured with gc_grace_seconds 0 to avoid issues with persistent tombstones as rows
expire; tombstones are removed during compaction no matter how recently they were created.

JMX analog

com.datastax.bdp/search/core/UpdateMetrics

See IndexPool MBean.


Schema

CREATE TABLE dse_perf.solr_update_latency_snapshot (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    phase text,
    count bigint,
    latency_percentiles_micros map<text, bigint>,
    PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the snapshot was recorded.
time timestamp Time the snapshot was recorded.
phase text WRITE, QUEUE, PREPARE, EXECUTE
count bigint Cumulative number of queries recorded.
latency_percentiles_micros map<text, bigint> Cumulative latency percentiles of query: 25%, 50%, 75%, 95%, 99% and 99.9%.

Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_update_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:

SELECT *
FROM solr_update_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;

Collecting Apache Solr performance statistics [Enable the solr_latency_snapshot_options parameter in dse.yaml and set the other options as required.]
Commit latency snapshot
Record phase-level cumulative percentile latency statistics for commits over time.

All statistics reset upon node restart.


This table is configured with gc_grace_seconds 0 to avoid issues with persistent tombstones as rows
expire; tombstones are removed during compaction no matter how recently they were created.

JMX analog

com.datastax.bdp/search/core/CommitMetrics

See Commit metrics MBean.


Schema

CREATE TABLE dse_perf.solr_commit_latency_snapshot (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    phase text,
    count bigint,
    latency_percentiles_micros map<text, bigint>,
    PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the snapshot was recorded.
time timestamp Time the snapshot was recorded.
phase text FLUSH, EXECUTE
count bigint Cumulative number of queries recorded.
latency_percentiles_micros map<text, bigint> Cumulative latency percentiles of query: 25%, 50%, 75%, 95%, 99% and 99.9%.

Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_commit_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:

SELECT *
FROM solr_commit_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;

Collecting Apache Solr performance statistics [Enable the solr_latency_snapshot_options parameter in dse.yaml and set the other options as required.]
Merge latency snapshot
Record phase-level cumulative percentile latency statistics for index merges over time.

All statistics reset upon node restart.


This table is configured with gc_grace_seconds 0 to avoid issues with persistent tombstones as rows
expire; tombstones are removed during compaction no matter how recently they were created.

JMX analog

com.datastax.bdp/search/core/MergeMetrics

See Merge metrics MBean.


Schema

CREATE TABLE dse_perf.solr_merge_latency_snapshot (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    phase text,
    count bigint,
    latency_percentiles_micros map<text, bigint>,
    PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the snapshot was recorded.
time timestamp Time the snapshot was recorded.
phase text INIT, WARM, EXECUTE
count bigint Cumulative number of queries recorded.
latency_percentiles_micros map<text, bigint> Cumulative latency percentiles of query: 25%, 50%, 75%, 95%, 99% and 99.9%.

Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_merge_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:

SELECT *
FROM solr_merge_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;

Collecting Apache Solr performance statistics [Enable the solr_latency_snapshot_options parameter in dse.yaml and set the other options as required.]
Filter cache statistics
Record core-specific filter cache statistics over time.

All statistics reset upon node restart.


This table is configured with gc_grace_seconds 0 to avoid issues with persistent tombstones as rows
expire; tombstones are removed during compaction no matter how recently they were created.

Solr exposes a core’s filter cache statistics through its registered index searcher, but the core may have many
index searchers over its lifetime. To reflect this, statistics are provided for the currently registered searcher as
well as cumulative/lifetime statistics.
If the dseFilterCache hit_ratio declines over time, and this hit_ratio decline corresponds to a higher average
latency from the QueryMetrics.getAverageLatency(EXECUTE, null) MBean, consider increasing the size of
your filter cache in the search index config.
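
For reference, the filter cache is sized in the search index config (solrconfig.xml). A sketch using the DSE
per-segment filter cache element; the water-mark values are illustrative, not recommendations:

<filterCache class="solr.SolrFilterCache"
             highWaterMarkMB="2048"
             lowWaterMarkMB="1024"/>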


JMX analog

solr/core/dseFilterCache/com.datastax.bdp.search.solr.FilterCacheMBean

Schema

CREATE TABLE dse_perf.solr_filter_cache_stats (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    hits bigint,
    inserts bigint,
    evictions bigint,
    hit_ratio float,
    lookups bigint,
    num_entries bigint,
    cumulative_lookups bigint,
    cumulative_hits bigint,
    cumulative_hitratio float,
    cumulative_inserts bigint,
    cumulative_evictions bigint,
    warmup_time bigint,
    PRIMARY KEY ((node_ip, core, date), time)
)
WITH gc_grace_seconds=0

Field Type Purpose


node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
time timestamp The exact time the statistics were recorded.
hits bigint Cache hits for the registered index searcher.
inserts bigint Cache insertions for the registered index searcher.
evictions bigint Cache evictions for the registered index searcher.
hit_ratio float The ratio of cache hits/lookups for the registered index searcher.
lookups bigint Cache lookups for the registered index searcher.
num_entries bigint Number of cache entries for the registered index searcher.
cumulative_lookups bigint Cumulative cache lookups for the core.
cumulative_hits bigint Cumulative cache hits for the core.
cumulative_hitratio float Cumulative ratio of cache hits/lookups for the core.
cumulative_inserts bigint Cumulative cache inserts for the core.
cumulative_evictions bigint Cumulative cache evictions for the core.
warmup_time bigint Warm-up time for the registered index searcher.

Snapshots for cumulative statistics recorded on 10/17/2014 for core keyspace.table on the node
127.0.0.1:

SELECT cumulative_lookups, cumulative_hits, cumulative_hitratio, cumulative_inserts
FROM solr_filter_cache_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_filter_cache_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;

Collecting cache statistics [Enable the solr_cache_stats_options parameter in dse.yaml and set the other
options as required.]
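
A minimal dse.yaml sketch for the option named in the cross-reference above (values illustrative; the same
block covers both the filter cache and query result cache statistics tables):

solr_cache_stats_options:
    enabled: true
    ttl_seconds: 604800     # retention period for statistics rows
    refresh_rate_ms: 60000  # sampling interval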
Query result cache statistics
Record core-specific query result cache statistics over time.
Solr exposes a core's query result cache statistics through its registered index searcher, but the core may have
many index searchers over its lifetime. To reflect this, statistics are provided for the currently registered
searcher as well as cumulative/lifetime statistics.
JMX analog

solr/core/queryResultCache/*

Schema

CREATE TABLE dse_perf.solr_result_cache_stats (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    hits bigint,
    inserts bigint,
    evictions bigint,
    hit_ratio float,
    lookups bigint,
    num_entries bigint,
    cumulative_lookups bigint,
    cumulative_hits bigint,
    cumulative_hitratio float,
    cumulative_inserts bigint,
    cumulative_evictions bigint,
    warmup_time bigint,
    PRIMARY KEY ((node_ip, core, date), time)
)
WITH gc_grace_seconds=0

Field Type Purpose


node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
time timestamp The exact time the statistics were recorded.
hits bigint Cache hits for the registered index searcher.

inserts bigint Cache insertions for the registered index searcher.
evictions bigint Cache evictions for the registered index searcher.
hit_ratio float The ratio of cache hits / lookups for the registered index searcher.
lookups bigint Cache lookups for the registered index searcher.
num_entries bigint Number of cache entries for the registered index searcher.
cumulative_lookups bigint Cumulative cache lookups for the core.
cumulative_hits bigint Cumulative cache hits for the core.
cumulative_hitratio float Cumulative ratio of cache hits/lookups for the core.
cumulative_inserts bigint Cumulative cache inserts for the core.
cumulative_evictions bigint Cumulative cache evictions for the core.
warmup_time bigint Warm-up time for the registered index searcher.

Snapshots for cumulative statistics recorded on 10/17/2014 for core keyspace.table on the node
127.0.0.1:

SELECT cumulative_lookups, cumulative_hits, cumulative_hitratio, cumulative_inserts
FROM solr_result_cache_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_result_cache_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;

Collecting cache statistics [Enable the solr_cache_stats_options parameter in dse.yaml and set the other
options as required.]
Index statistics
Record core-specific index overview statistics over time.
JMX analog

solr/core_name/core/core_name & solr/core_name/Searcher*

Schema

CREATE TABLE dse_perf.solr_index_stats (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    size_in_bytes bigint,
    num_docs int,
    max_doc int,
    docs_pending_deletion int,
    PRIMARY KEY ((node_ip, core, date), time)
)
WITH gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
time timestamp The exact time the statistics were recorded.
size_in_bytes bigint Index size on the file system.
num_docs int The number of documents inserted into the index.
max_doc int The number of documents inserted into the index, plus those marked as removed but not yet physically removed.
docs_pending_deletion int max_doc - num_docs

Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_index_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_index_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;

Update handler statistics


Record core-specific direct update handler statistics over time.

Do not confuse this with Update request handler statistics.

A few fields in this table have both cumulative and non-cumulative versions. The non-cumulative statistics
are zeroed out following rollback or commit, while the cumulative versions persist through those events.
The exception is errors, which is actually cumulative and takes into account a few failure cases that
cumulative_errors does not.

JMX analog

solr/core/updateHandler

Schema

CREATE TABLE dse_perf.solr_update_handler_metrics (
    node_ip inet,
    core text,
    date timestamp,
    time timestamp,
    adds bigint,
    cumulative_adds bigint,
    commits bigint,
    autocommits int,
    autocommit_max_time text,
    autocommit_max_docs int,
    soft_autocommits int,
    soft_autocommit_max_docs int,
    soft_autocommit_max_time text,
    deletes_by_id bigint,
    deletes_by_query bigint,
    cumulative_deletes_by_id bigint,
    cumulative_deletes_by_query bigint,
    expunge_deletes bigint,
    errors bigint,
    cumulative_errors bigint,
    docs_pending bigint,
    optimizes bigint,
    rollbacks bigint,
    PRIMARY KEY ((node_ip, core, date), time)
)
WITH gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
time timestamp Exact time the statistics were recorded.
adds bigint Document add commands since the last commit/rollback.
cumulative_adds bigint Cumulative document additions.
commits bigint Number of explicit commit commands issued.
autocommits int Number of auto-commits executed.
autocommit_max_time text Maximum time between auto-commits.
autocommit_max_docs int Maximum document adds between auto-commits.
soft_autocommits int Number of soft auto-commits executed.
soft_autocommit_max_docs int Maximum document adds between soft auto-commits.
soft_autocommit_max_time text Maximum time between soft auto-commits.
deletes_by_id bigint Currently uncommitted deletions by ID.
deletes_by_query bigint Currently uncommitted deletions by query.
cumulative_deletes_by_id bigint Cumulative document deletions by ID.
cumulative_deletes_by_query bigint Cumulative document deletions by query.
expunge_deletes bigint Number of commit commands issued with expunge deletes.
errors bigint Cumulative errors for add/delete/commit/rollback commands.
cumulative_errors bigint Cumulative errors for add/delete commands.
docs_pending bigint Number of documents pending commit.
optimizes bigint Number of explicit optimize commands issued.
rollbacks bigint Number of rollbacks executed.


Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_update_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_update_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;

Collecting request handler metrics [How to enable the solr_request_handler_metrics_options parameter in dse.yaml and set options.]
Update request handler statistics
Record core-specific update request handler statistics over time.

Do not confuse this with Update handler statistics.

JMX analog

solr/core/update[/ | /csv | /json]

Schema

CREATE TABLE dse_perf.solr_update_request_handler_metrics (
    node_ip inet,
    core text,
    date timestamp,
    handler_name text,
    time timestamp,
    requests bigint,
    errors bigint,
    timeouts bigint,
    total_time_seconds double,
    avg_requests_per_second double,
    five_min_rate_reqs_per_second double,
    fifteen_min_rate_reqs_per_second double,
    PRIMARY KEY ((node_ip, core, date), handler_name, time)
)
WITH CLUSTERING ORDER BY (handler_name ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
handler_name text A handler name specified in the search index config.
time timestamp Exact time the statistics were recorded.
requests bigint Number of requests processed by the handler.
errors bigint Number of errors encountered by the handler.
timeouts bigint Number of responses received with partial results.
total_time_seconds double The sum of all request processing times.
avg_requests_per_second double Average number of requests per second.
five_min_rate_reqs_per_second double Requests per second over the past 5 minutes.
fifteen_min_rate_reqs_per_second double Requests per second over the past 15 minutes.

Snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:

SELECT *
FROM solr_update_request_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for handler “/update/json” recorded on 10/17/2014 for core keyspace.table on the
node 127.0.0.1:

SELECT *
FROM solr_update_request_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND handler_name = '/update/json'
LIMIT 5;

Collecting request handler metrics [How to enable the solr_request_handler_metrics_options parameter in dse.yaml and set options.]
Search request handler statistics
Record core-specific search request handler statistics over time.
JMX analog

solr/core/search

Schema

CREATE TABLE dse_perf.solr_search_request_handler_metrics (
    node_ip inet,
    core text,
    date timestamp,
    handler_name text,
    time timestamp,
    requests bigint,
    errors bigint,
    timeouts bigint,
    total_time_seconds double,
    avg_requests_per_second double,
    five_min_rate_reqs_per_second double,
    fifteen_min_rate_reqs_per_second double,
    PRIMARY KEY ((node_ip, core, date), handler_name, time)
)
WITH CLUSTERING ORDER BY (handler_name ASC, time DESC)
AND gc_grace_seconds=0

Field Type Purpose

node_ip inet Node IP address.
core text Search core name, such as keyspace.table.
date timestamp Midnight on the mm/dd/yyyy the statistics were recorded.
handler_name text A handler name specified in the search index config.
time timestamp Exact time the statistics were recorded.
requests bigint Number of requests processed by the handler.
errors bigint Number of errors encountered by the handler.
timeouts bigint Number of responses received with partial results.
total_time_seconds double The sum of all request processing times.
avg_requests_per_second double Average number of requests per second.
five_min_rate_reqs_per_second double Requests per second over the past 5 minutes.
fifteen_min_rate_reqs_per_second double Requests per second over the past 15 minutes.

Snapshots recorded for all search request handlers on 10/17/2014 for core keyspace.table on the node
127.0.0.1:

SELECT *
FROM solr_search_request_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';

Most recent 5 snapshots for handler “search” recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:

SELECT *
FROM solr_search_request_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND handler_name = 'search'
LIMIT 5;

Collecting request handler metrics [How to enable the solr_request_handler_metrics_options parameter in dse.yaml and set options.]
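
A hedged dse.yaml sketch for the option named in the cross-reference above (values illustrative; the same
block covers the update and search request handler tables):

solr_request_handler_metrics_options:
    enabled: true
    ttl_seconds: 604800     # retention period for metrics rows
    refresh_rate_ms: 60000  # sampling interval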
Best Practice Service
This OpsCenter service scans DataStax Enterprise clusters and automatically detects issues that threaten the
cluster’s security, availability or performance.
See Best Practice Service.
Capacity Service
This OpsCenter service accumulates cluster health and resource utilization metrics to perform historical trend
analysis. This process helps you understand cluster performance over time, and includes powerful forecasting
capabilities to predict future usage and growth.
See Capacity Service.


Repair Service
This OpsCenter service performs repair operations in the background across a DataStax Enterprise cluster with
minimal impact. This process alleviates the potential performance impact of having to periodically run repair on
entire nodes.
See Repair Service.

DSE Advanced Replication


DSE Advanced Replication supports configurable distributed data replication from source clusters to destination
clusters. It is designed to tolerate sporadic connectivity that can occur in constrained environments, such as retail,
oil-and-gas remote sites, and cruise ships.
To learn about replication, see Data distribution overview.

About DSE Advanced Replication



Features

Smartly replicates data from source clusters to destination clusters: Supports replicating data in a spoke-and-hub
configuration from remote locations to central data hubs and repositories. Enterprise customers with remote
clusters are able to establish a cluster presence in each location. In addition, a mesh configuration can replicate
data from any source cluster to another destination cluster within reasonable limits.

Prioritizes data streams: Allows higher priority data streams to be sent from the source cluster to a destination
cluster ahead of lower priority data streams.

Supports ingestion and querying of data at every source: Enables data to be ingested and queried at any source
and sent to any destination that collects and analyzes data from all of the sites.

Solves the problem of periodic downtime: Useful for energy (oil and gas), transportation, telecommunications,
retail (point-of-sale systems), and other vertical markets that might experience periods of network or internet
downtime at remote locations.

Satisfies data sovereignty regulations: Provides configurable streams of selected outbound data, while preventing
data changes to inbound data.

Satisfies data locality regulations: Prevents data from leaving the current geography.

DSE Advanced Replication architecture


DSE Advanced Replication enables configurable replication between clusters, identifying source and destination
clusters with replication channels. Topologies such as hub-and-spoke or mesh networks can differentially push
or pull data depending on operational needs.
A common operational scenario for DSE Advanced Replication is a network of remote sensors with poor network
connection to a centrally located storage and analytics network. The remote edge clusters collect data, but
can experience disconnections from the network and periodically send one-way updates to the central hub
clusters when a connection is available. Some sensors may be deemed more important than others, requiring
prioritization of transmission. All sensors can continue to collect data, and to transmit in a specified manner, or
have collection turned off as needed. Each remote sensor cluster would be designated as a source, while the
central database cluster would be a destination.

Figure 9:

This configuration would also be suitable to a network of microservices clusters that report data to a central
analytics cluster.
Another scenario may include similar remote sites that mainly send data to a centralized location, but must
periodically be updated with information from the centralized location. In this scenario, each remote cluster would
be both a source and a destination, with two channels designated, one upstream and one downstream. A small
Point of Sale (POS) system serves as a possible model for this scenario, with periodic updates to the remote
systems.

Figure 10:

A mesh network can also use advanced replication, with remote clusters receiving updates from either a central
location or another remote cluster.

Figure 11:

Although any cluster, remote or centralized, may serve as a source for an advanced replication channel, a limited
number of destinations can be configured for any one source. In general, consider the flow of replication as many
sources to few destinations, rather than few sources to many destinations.
Traffic between the clusters
Traffic between the source cluster and the destination cluster is managed with permits, priority, and configurable
failover behavior for multi-datacenter operation.
Permits
Traffic between the source cluster and the destination cluster is managed with permits. When a permit cannot be
acquired, the message is postponed and waits in the replication log until it is processed when a permit becomes
available. Permits are global and not per destination.
To manage permits and set the maximum number of messages that can be replicated to all destinations
simultaneously, use dse advrep conf update:

$ dse advrep conf update --permits 1000

The default is 1024.
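
To confirm the permit setting, and the rest of the global configuration, you can list it; a usage sketch (output
varies by release):

$ dse advrep conf list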


Channels with a higher priority take precedence in acquiring permits. Permits are required to transmit data
from a source to a destination.
Priority and FIFO/LIFO enablement
The commit log is flushed from memory to disk, writing the data to the appropriate table. A Capture-Data-Change
(CDC) collection agent additionally filters the data written and creates replication log files on disk. Each channel
source table will have a separate data directory created on disk into which data is appended each time the
commit log is flushed, storing all the messages that are to be replicated to a destination. Several replication log
files may exist per source table at any given time. Each file stores a contiguous time-slice, configurable with the
dse advrep conf update command and the --collection-time-slice-width option (default: 60 seconds). A
CDC transmission agent then sends the messages stored in the replication log files to the destination, where
the data is processed and written to the appropriate database table. The order in which source table data is
transmitted can be altered with the priority option when creating a channel, and the order in which a source
table's replication log files are read can be tuned with the --fifo-enabled and --lifo-enabled options.
The replication log files are processed according to the time and priority of the replication channel. Replication
channel priorities are set per table, and determine how the transmission agent orders the transmission of
replication log files from the source to the destination. The replication log files can be passed to the destination
in either last in, first out (LIFO) or first in, first out (FIFO); FIFO is the default. If the newest messages should be
read first, use LIFO; if the oldest messages should be read first, use FIFO. Once an individual replication log file
is transmitted, the messages it contains are read FIFO. Both options, priority and read order, can be set during
channel creation:

$ dse advrep --host 192.168.3.10 channel create --source-keyspace foo --source-table bar --source-id source1 --source-id-column source_id --destination mydest --destination-keyspace foo --destination-table bar --collection-enabled true --priority 1 --lifo-enabled

This example sets the channel for table foo.bar to the top priority of one, so that the table's replication log files
are transmitted before other tables' replication log files. It also sets the replication log files to be read from
newest to oldest.
Configure automatic failover for hub clusters with multiple datacenters
DSE Advanced Replication uses the DSE Java driver load balancing policy to communicate with the
hub cluster. You can explicitly define the local datacenter for the datacenter-aware round robin policy
(DCAwareRoundRobinPolicy) that is used by the DSE Java driver.
You can enable or disable failover from a local datacenter to a remote datacenter. When multiple datacenter
failover is configured and a local datacenter fails, data replication from the edge to the hub continues using the
remote datacenter. Tune the configuration with these parameters:
driver-local-dc


For destination clusters with multiple datacenters, you can explicitly define the name of the datacenter
that you consider local. Typically, this is the datacenter that is closest to the source cluster. This value is
used only for clusters with multiple datacenters.
driver-used-hosts-per-remote-dc
To use automatic failover for destination clusters with multiple datacenters, you must define
the number of hosts per remote datacenter that the datacenter aware round robin policy
(DCAwareRoundRobinPolicy) considers available.
driver-allow-remote-dcs-for-local-cl
Set to true to enable automatic failover for destination clusters with multiple datacenters. The value of
the driver-consistency-level parameter must be LOCAL_ONE or LOCAL_QUORUM.
To enable automatic failover with a consistency level of LOCAL_QUORUM, use dse advrep destination update:

$ dse advrep destination update --name mydest --driver-allow-remote-dcs-for-local-cl true --driver-consistency-level LOCAL_QUORUM

Destination mydest updated
Updated driver_allow_remote_dcs_for_local_cl from null to true
Updated driver_consistency_level from ONE to LOCAL_QUORUM

DSE Advanced Replication terminology


This terminology is specific to DSE Advanced Replication that supports distributed data replication from a
DataStax Enterprise source cluster to a destination cluster.
collection agent
The process thread that runs on the source cluster that captures the incoming changes and populates
the replication log.
source cluster
A cluster that primarily sends data to one or more destination clusters. DSE Advanced Replication must
be enabled on the source cluster.
source datacenter
A datacenter of a source cluster.
destination cluster
A cluster that generally supports one or more source clusters that replicate data to the destination
cluster. DSE Advanced Replication is not required on the destination cluster.
destination datacenter
A datacenter of a destination cluster.
isolated
The state of a cluster when there is not a live connection between the source cluster and the destination
cluster.
replication agent
The process thread that runs on the source cluster that reads data from the replication log and transmits
that data to the destination cluster.
replication channel
A defined channel of change data between source clusters and destination clusters. A replication
channel is defined by the source cluster, source keyspace, source table name, destination cluster,
destination keyspace, and destination table name.
replication channel priority
The priority order of which replication channel has precedence when limited bandwidth occurs between
the source cluster and the destination cluster.
replication log
The replication log on the source cluster stores data in preparation for transmission to the destination
cluster.
tethered
The state when there is a live connection between the source cluster and the destination cluster.
Getting started with DSE Advanced Replication
To test DSE Advanced Replication, you must set up a source cluster and a destination cluster. These steps set up
one node in each cluster.


Getting started overview:

1. Setting up the destination cluster node

2. Setting up the source cluster

3. Creating sample keyspace and table

4. Configuring replication on the source node

5. Creating the replication channel

6. Starting replication from source to destination

7. Inserting data on the source

8. Testing loss of connectivity

9. Testing replication start and stop

Due to CASSANDRA-11368, list inserts might not be idempotent. Because DSE Advanced
Replication might deliver the same message to the destination more than once, this Cassandra bug might
lead to data inconsistency if lists are used in a column family schema. DataStax recommends using other
collection types, like sets or frozen lists, when ordering is not important.

Setting up the destination cluster node


Prerequisite: If you are using Advanced Replication V1 from DSE 5.0, you must upgrade to DSE 5.1 and
migrate to Advanced Replication V2.

On the destination node:

1. Install DataStax Enterprise.

2. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation
method.

3. Note the public IP address for the destination node.

Setting up the source cluster


Advanced replication can operate in a mixed-version environment. The source cluster requires DataStax
Enterprise 5.1 or later. On the source node:

1. Install DataStax Enterprise 5.1 or later.

2. To enable replication, edit the dse.yaml file.


At the end of the file, uncomment the advanced_replication_options setting and options, and set
enabled: true.

# Advanced Replication configuration settings
advanced_replication_options:
    enabled: true

3. Enable Capture-Data-Change (CDC) in the cassandra.yaml file on a per-node basis for each source:

cdc_enabled: true

Advanced Replication will not start if CDC is not enabled, since CDC logs are used to implement the
feature.

4. Consider increasing the default CDC disk space, depending on the load (the default is 4096 MB or 1/8 of the
total space of the volume where cdc_raw_directory resides):

cdc_total_space_in_mb: 16384

5. Commitlog compression is turned off by default. To avoid problems with advanced replication, this option
should NOT be used; ensure that the option is commented out:

# commitlog_compression:
# - class_name: LZ4Compressor

6. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation
method.

7. Once advanced replication is started on a cluster, the source node will create keyspaces and tables that
need alteration. See Keyspaces for information.

Creating the sample keyspace and table


These steps show you how to create the demonstration keyspace and table.

1. On the source node and the destination node, create the sample keyspace and table:

CREATE KEYSPACE foo
WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 1};

Remember to use escaped quotes around keyspace and table names as command line arguments to
preserve casing: dse advrep create --keyspace \"keyspaceName\" --table \"tableName\"

2. On the source node:

CREATE TABLE foo.bar (
    name TEXT,
    val TEXT,
    scalar INT,
    PRIMARY KEY (name));

3. On the destination node:

CREATE TABLE foo.bar (
    name TEXT,
    val TEXT,
    scalar INT,
    source_id TEXT,
    PRIMARY KEY (name, source_id));

Including the source_id column on the destination node is recommended. If the destination table has a field
in the primary key that uniquely identifies the source from which the data is replicated, the source_id is not
required as part of the primary key. The source_id column is useful for preventing overwrites if two records
with the same primary key are replicated from different sources and you want to keep both records.

Configuring a replication destination on the source node


DSE Advanced Replication stores all of its settings in CQL tables. To configure replication, use the dse advrep
command line tool.
When you configure replication on the source node:

• The source node points to its destination using the public IP address that you saved earlier.

• The source-id value is a unique identifier for all data that comes from this particular source node.

• The source-id unique identifier is written to the source-id-column that was included when the foo.bar
table was created on the destination node.

To configure a replication destination, run this command:

dse advrep --verbose destination create --name mydest --addresses 10.200.182.148 --transmission-enabled true

Destination mydest created

To verify the configuration, run this command:

dse advrep destination list-conf

--------------------------------------------------------------------------------------------
|destination|name                                |value                                    |
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_enabled                  |false                                    |
|mydest     |addresses                           |10.200.182.148                           |
|mydest     |driver_read_timeout                 |15000                                    |
|mydest     |driver_connections_max              |8                                        |
|mydest     |source_id_column                    |source_id                                |
|mydest     |driver_connect_timeout              |15000                                    |
|mydest     |driver_ssl_protocol                 |TLS                                      |
|mydest     |driver_consistency_level            |QUORUM                                   |
|mydest     |driver_used_hosts_per_remote_dc     |0                                        |
|mydest     |driver_allow_remote_dcs_for_local_cl|false                                    |
|mydest     |driver_compression                  |lz4                                      |
|mydest     |driver_connections                  |1                                        |
|mydest     |driver_ssl_cipher_suites            |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,   |
|           |                                    |TLS_RSA_WITH_AES_256_CBC_SHA256,         |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,  |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,    |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,     |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,     |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,    |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,      |
|           |                                    |TLS_RSA_WITH_AES_256_CBC_SHA,            |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,     |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,       |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,        |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,   |
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA256,         |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,  |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,    |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,     |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,     |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,    |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,      |
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA,            |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,     |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,       |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,        |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,        |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, |
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,   |
|           |                                    |TLS_RSA_WITH_AES_256_GCM_SHA384,         |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,  |
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,    |
|           |                                    |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,     |
|           |                                    |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,     |
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,   |
|           |                                    |TLS_RSA_WITH_AES_128_GCM_SHA256,         |
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,  |
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,    |
|           |                                    |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,     |
|           |                                    |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,     |
|           |                                    |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,   |
|           |                                    |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,     |
|           |                                    |SSL_RSA_WITH_3DES_EDE_CBC_SHA,           |
|           |                                    |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,    |
|           |                                    |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,      |
|           |                                    |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,       |
|           |                                    |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,       |
|           |                                    |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,        |
|           |                                    |TLS_ECDHE_RSA_WITH_RC4_128_SHA,          |
|           |                                    |SSL_RSA_WITH_RC4_128_SHA,                |
|           |                                    |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,         |
|           |                                    |TLS_ECDH_RSA_WITH_RC4_128_SHA,           |
|           |                                    |SSL_RSA_WITH_RC4_128_MD5,                |
|           |                                    |TLS_EMPTY_RENEGOTIATION_INFO_SCSV]       |
|mydest     |source_id                           |source1                                  |
|mydest     |transmission_enabled                |true                                     |
--------------------------------------------------------------------------------------------

Creating the replication channel


A replication channel is a defined channel of change data between source clusters and destination clusters. A
replication channel is defined by the source cluster, source keyspace, source table name, destination cluster,
destination keyspace, and destination table name. Source clusters can exist in multi-datacenter clusters, but a
replication channel is configured with only one datacenter as the responsible party.
The keyspace and table names on the destination can be different than on the source, but in this example they
are the same. You can also set the source-id and source-id-column differently from the global setting.


To create the replication channel for our keyspace and table:

dse advrep channel create --source-keyspace foo --source-table bar --source-id source1 --source-id-column source_id --destination mydest --destination-keyspace foo --destination-table bar --collection-enabled true --transmission-enabled true --priority 1

Created channel dc=Cassandra keyspace=foo table=bar to mydest

dse advrep channel status

--------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |1       |foo    |bar       |source1|source_id |mydest|true        |
--------------------------------------------------------------------------------------------------------------

The designated keyspace for a replication channel must have durable writes enabled. If durable_writes =
false, an error message occurs and the channel is not created. If the durable writes setting is changed after
the replication channel is created, the tables do not write to the commit log and CDC does not work. The data
is not ingested through the replication channel; a warning is logged, but the failure is otherwise silent.
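
To confirm or restore durable writes on a channel's keyspace before creating the channel, a statement such as
the following can be used (foo is the keyspace from the running example):

ALTER KEYSPACE foo WITH DURABLE_WRITES = true;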

Starting replication from source to destination


At this point, replication is configured, the replication channel is enabled, and replication has started.

1. On the destination, use cqlsh to verify that no data is present:

SELECT * FROM foo.bar;

 name | source_id | scalar | val
------+-----------+--------+-----

(0 rows)

2. On the source, replication to the destination can be paused or resumed, the latter shown here:

dse advrep channel resume --source-keyspace foo --source-table bar --transmission

Channel dc=Cassandra keyspace=foo table=bar collection to mydest was resumed

Notice that either --transmission or --collection can be specified, to resume transmission from the
source to the destination or to resume collection of data on the source.


3. Review the number of records that are in the replication log. Because no data is inserted yet, the record
count in the replication log is 0:

dse advrep replog count --destination mydest --source-keyspace foo --source-table bar

Inserting data on the source


Insert data on the source for replication to the destination.

1. On the source, insert data using cqlsh:

INSERT INTO foo.bar (name, val, scalar) VALUES ('a', '1', 1);
INSERT INTO foo.bar (name, val, scalar) VALUES ('b', '2', 2);

2. On the destination, verify that the data was replicated:

SELECT * FROM foo.bar;

 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    b |   source1 |      2 |   2

(2 rows)

Checking data on the destination


Check data on the destination.

1. On the destination, verify that the data was replicated:

SELECT * FROM foo.bar;

 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    b |   source1 |      2 |   2

(2 rows)

Testing loss of connectivity


To test loss of connectivity to the destination, stop the DataStax Enterprise process on the destination, and insert
more data on the source. The expected result is for data to be replicated quickly after the destination cluster
resumes.

1. On the destination cluster, stop DataStax Enterprise:

dse cassandra-stop

2. On the source, insert more data:

INSERT INTO foo.bar (name, val, scalar) VALUES ('c', '3', 3);
INSERT INTO foo.bar (name, val, scalar) VALUES ('d', '4', 4);

3. Review the number of records that are in the replication log. The replication log should have 2 entries:

dse advrep replog count --destination mydest --source-keyspace foo --source-table bar

4. On the destination, restart DataStax Enterprise.

dse cassandra

Wait a moment for communication and data replication to resume and for the new records to replicate from
the source to the destination.

SELECT * FROM foo.bar;

 name | source_id | scalar | val
------+-----------+--------+-----
    a |   source1 |      1 |   1
    c |   source1 |      3 |   3
    d |   source1 |      4 |   4
    b |   source1 |      2 |   2

(4 rows)

5. On the source, the replication log count should be back to 0:

dse advrep replog count --destination mydest --source-keyspace foo --source-table bar

Testing replication start and stop


Similar to testing loss of connectivity, you can pause and resume individual replication channels by using the
advrep command line tool. The expected result is that newly inserted data is not saved to the replication log and
will never be sent to the destination.

1. On the source, pause the replication channel:

dse advrep --verbose channel pause --keyspace foo --table bar --collection

2. Insert more data.

3. On the source, resume the replication channel:

dse advrep --verbose channel resume --keyspace foo --table bar --collection

DSE Advanced Replication keyspace overview


Keyspaces and tables are automatically created on the source cluster when DSE Advanced Replication runs for
the first time. Two keyspaces are used, dse_system and dse_advrep. Each keyspace is configured differently.
System keyspaces on the source and destination are not supported for advanced replication.

The dse_system keyspace uses the EverywhereStrategy replication strategy by default; this setting must not be
altered. The dse_advrep keyspace is configured to use the SimpleStrategy replication strategy by default; this
setting must be updated in production environments to avoid data loss. After starting the cluster, alter the
keyspace to use the NetworkTopologyStrategy replication strategy with appropriate settings for the replication
factor and datacenters. For example, use a CQL statement to configure a replication factor of 3 on the DC1
datacenter using NetworkTopologyStrategy:

ALTER KEYSPACE dse_advrep


WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'DC1': '3'};

For most environments using DSE Advanced Replication, a replication factor of 3 is suitable. The strategy must
be configured for any datacenter that serves as an advanced replication source. Run nodetool repair on each
node in the affected datacenters to repair the altered keyspace:

nodetool repair -full dse_advrep

For more information, see Changing keyspace replication strategy.


DSE Advanced Replication data types
DSE data types are supported for most operations in DSE Advanced Replication. The following list shows the
supported data types and operations:

Primitive data types: int, ascii, bigint, blob, boolean, decimal, double, float, inet, text, timestamp,
timeuuid, uuid, varchar, varint
    All types are implemented for insert/update/delete.

Frozen collections: frozen<list<data_type>>, frozen<set<data_type>>, frozen<map<data_type, data_type>>
    All frozen collections are implemented for insert/update/delete. Because the values are immutable
    blocks, the entire column value is replicated.

Tuples: tuple<data_type, data_type, data_type>, frozen<tuple<data_type, data_type, data_type>>
    All tuples are implemented for insert/update/delete. Because the values are immutable blocks, the
    entire column value is replicated.

Frozen user-defined type (UDT): UDT type and frozen UDT type
    All UDTs are implemented for insert/update/delete. Because the values are immutable blocks, the entire
    column value is replicated.

Geometric types: Point, LineString, Polygon
    All geometric types are implemented for insert/update/delete.

The following data types and operations are not supported in DSE Advanced Replication:

Unfrozen updatable collections: list<data_type>, set<data_type>, map<data_type, data_type>
    All unfrozen updatable collections are implemented for insert/delete if the entire column value is
    replicated. Unfrozen collections cannot update values.

Unfrozen updatable user-defined type (UDT)
    All unfrozen updatable UDTs are implemented for insert/delete if the entire column value is
    replicated. Unfrozen UDTs cannot update values.
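
As a hedged illustration (the table and column names are hypothetical), the following CQL sketch mixes a
frozen map, which replicates on insert/update/delete as an immutable block, with an unfrozen list, whose
in-place updates cannot be replicated:

-- Hypothetical schema illustrating frozen vs. unfrozen replication behavior
CREATE TABLE foo.readings (
  name text PRIMARY KEY,
  tags frozen<map<text, text>>,  -- supported: the entire column value replicates
  notes list<text>               -- unfrozen: inserts/deletes replicate, in-place updates do not
);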

Using DSE Advanced Replication


Operations including starting, stopping, and configuring DSE Advanced Replication.

1. Starting DSE Advanced Replication

2. Stopping DSE Advanced Replication

3. Configuring global configuration settings

4. Configuring destination settings

5. Configuring channel settings


6. Security

7. Data insert methods

8. Monitoring operations

Starting DSE Advanced Replication


Prerequisite: If you are using Advanced Replication V1 from DSE 5.0, you must upgrade to DSE 5.1 and
migrate to Advanced Replication V2.

Before you can start and use DSE Advanced Replication, you must create the user keyspaces and tables on the
source cluster and the destination cluster.
On all nodes in the source cluster:

1. Enable replication in the dse.yaml file.


Uncomment all advanced_replication_options entries, set enabled: true, and specify a directory to
hold advanced replication log files with advanced_replication_directory:

# Advanced Replication configuration settings


advanced_replication_options:
enabled: true
advanced_replication_directory: /var/lib/cassandra/advrep

2. Enable Capture-Data-Change (CDC) in the cassandra.yaml file on a per-node basis for each source:

cdc_enabled: true
cdc_raw_directory: /var/lib/cassandra/cdc_raw

Advanced Replication will not start if CDC is not enabled. Either use the default directory or change it to a
preferred location.

3. Consider increasing the default CDC disk space, depending on the load (default: 4096 MB or 1/8 of the total
space where cdc_raw_directory resides):

cdc_total_space_in_mb: 16384

4. Commitlog compression is turned off by default. To avoid problems with advanced replication, this option
should NOT be used:

# commitlog_compression:
# - class_name: LZ4Compressor

5. Do a rolling restart: restart the nodes in the source cluster one at a time while the other nodes continue to
operate online.

Disabling DSE Advanced Replication


When replication is not enabled, data is not written to the replication log. On all nodes in the source cluster:

1. To disable replication, edit the dse.yaml file.


In the advanced_replication_options section, set enabled: false.

# Advanced Replication configuration settings


advanced_replication_options:


enabled: false

2. Do a rolling restart: restart the nodes in the source cluster one at a time while the other nodes continue to
operate online.

3. To clean out the data that was used for DSE Advanced Replication, use cqlsh to remove these tables and the dse_advrep keyspace:

DROP TABLE dse_system.advrep_source_config;


DROP TABLE dse_system.advrep_destination_config;
DROP TABLE dse_system.advrep_repl_channel_config;
DROP KEYSPACE dse_advrep;

Configuring global configuration settings


Global settings apply to the entire source cluster. These global settings are stored in the CQL table
dse_system.advrep_source_config that is automatically created.
Change global settings by using the dse advrep command line tool with this syntax:

dse advrep conf ...
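
For example, a hedged sketch of updating a single global setting (using the permits key from the table
below; verify the exact flag spelling with dse advrep help conf update):

dse advrep conf update --permits 20000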

To view the source node configuration settings:

dse advrep conf list

The result is:

-------------------------------------
|name             |value          |
-------------------------------------
|audit_log_file   |/tmp/myaudit.gz|
|audit_log_enabled|true           |
-------------------------------------

The following list describes the configuration keys, their default values, and whether a restart of the
source node is required for a change to be recognized. The dse advrep command line tool accepts these
configuration keys as command arguments.

permits (default: 30,000; restart required: no)
    Maximum number of messages that can be replicated in parallel over all destinations.

source-id (default: N/A; restart required: no)
    Identifies this source cluster and all inserts from this cluster. The source-id must also exist in the
    primary key on the destination for population of the source-id to occur.

collection-expire-after-write (default: N/A)

collection-time-slice-count (default: 5; restart required: yes)
    The number of files that are open in the ingestor simultaneously.

collection-time-slice-width (default: 60 seconds; restart required: yes)
    The time period in seconds for each ingested data block. Smaller time widths mean more files; larger
    time widths mean larger files but more data to resend on CRC mismatches.

invalid-message-log (default: SYSTEM_LOG; restart required: no)
    Select one of these logging strategies to adopt when an invalid message is discarded:
    SYSTEM_LOG: Log the CQL query and the error message in the system log on the destination.
    CHANNEL_LOG: Store the CQL query and the error message in files in
    /var/lib/cassandra/advrep/invalid_queries on the destination.
    NONE: Perform no logging.
    See Managing invalid messages.

audit-log-enable (default: false; restart required: yes)
    Specifies whether to store the audit log.

audit-log-file (default: /tmp/advrep_rl_audit.log; restart required: yes)
    Specifies the file name prefix template for the audit log file. The file name is appended with .gz if
    compressed using gzip.

audit-log-max-life-span-mins (default: 0; restart required: yes)
    Specifies the maximum lifetime of audit log files. Periodically, when log files are rotated, audit log
    files are purged when they match the audit log file template and have not been written to for more
    than the specified maximum lifespan minutes. To disable purging, set to 0.

audit-log-rotate-time-mins (default: 60; restart required: yes)
    Specifies the time interval to rotate the audit log file. On rotation, the rotated file is appended
    with the log counter .[logcounter], incrementing from [0]. To disable rotation, set to 0.

Configuring destination settings


A destination is a location to which source data will be written. Destinations are stored in the CQL table
dse_system.advrep_destination_config that is automatically created.
Change destination settings by using the dse advrep command line tool with this syntax:

dse advrep destination ...
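
For example, a hedged sketch of defining a destination named mydest (the address is illustrative; the
destination create flags mirror the configuration keys listed later in this section):

dse advrep destination create --name mydest --addresses 10.200.182.251 --transmission-enabled true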

You can verify the destination configuration before you change it. For example:

dse advrep destination list-conf

The result is:

--------------------------------------------------------------------------------------------
|destination|name                                |value
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_enabled                  |false
|mydest     |addresses                           |10.200.182.251
|mydest     |driver_read_timeout                 |15000
|mydest     |driver_connections_max              |8
|mydest     |source_id_column                    |source_id
|mydest     |driver_connect_timeout              |15000
|mydest     |driver_ssl_protocol                 |TLS
|mydest     |driver_consistency_level            |QUORUM
|mydest     |driver_used_hosts_per_remote_dc     |0
|mydest     |driver_allow_remote_dcs_for_local_cl|false
|mydest     |driver_compression                  |lz4
|mydest     |driver_connections                  |1
|mydest     |driver_ssl_cipher_suites            |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,
|           |                                    | TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,
|           |                                    | TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
|           |                                    | TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_RC4_128_SHA, TLS_ECDH_RSA_WITH_RC4_128_SHA,
|           |                                    | SSL_RSA_WITH_RC4_128_MD5, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
|mydest     |source_id                           |source1
|mydest     |transmission_enabled                |true
|llpdest    |driver_ssl_enabled                  |false
|llpdest    |addresses                           |10.200.177.184
|llpdest    |driver_read_timeout                 |15000
|llpdest    |driver_connections_max              |8
|llpdest    |source_id_column                    |source_id
|llpdest    |driver_connect_timeout              |15000
|llpdest    |driver_ssl_protocol                 |TLS
|llpdest    |driver_consistency_level            |ONE
|llpdest    |driver_used_hosts_per_remote_dc     |0
|llpdest    |driver_allow_remote_dcs_for_local_cl|false
|llpdest    |driver_compression                  |lz4
|llpdest    |driver_connections                  |1
|llpdest    |driver_ssl_cipher_suites            |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,
|           |                                    | TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,
|           |                                    | TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
|           |                                    | TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
|           |                                    | TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    | TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
|           |                                    | TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,
|           |                                    | TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,
|           |                                    | TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    | SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
|           |                                    | TLS_ECDHE_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_SHA,
|           |                                    | TLS_ECDH_ECDSA_WITH_RC4_128_SHA, TLS_ECDH_RSA_WITH_RC4_128_SHA,
|           |                                    | SSL_RSA_WITH_RC4_128_MD5, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
|llpdest    |source_id                           |source1
|llpdest    |transmission_enabled                |false
--------------------------------------------------------------------------------------------

The following list describes the configuration keys, their default values, and whether a restart of the
source node is required for a change to be recognized.

separator (default: N/A; restart required: no)
    Field separator.

name (default: N/A; restart required: no)
    Name for the destination (required).

addresses (default: none; restart required: no)
    REQUIRED. A comma-separated list of IP addresses that are used to connect to the destination cluster
    using the DataStax Java driver.

driver-allow-remote-dcs-for-local-cl (default: false; restart required: yes)
    Set to true to enable automatic failover for destination clusters with multiple datacenters. The value
    of the driver-consistency-level parameter must be LOCAL_ONE or LOCAL_QUORUM.

driver-compression (default: lz4; restart required: yes)
    The compression algorithm the DataStax Java driver uses to send data from the source to the
    destination. Supported values are lz4 and snappy.

driver-connect-timeout (default: 15000; restart required: no)
    Time in milliseconds the DataStax Java driver waits to connect to a server.

driver-connections (default: 32; restart required: yes)
    The number of connections the DataStax Java driver creates.

driver-connections-max (default: 256; restart required: yes)
    The maximum number of connections the DataStax Java driver creates.

driver-max-requests-per-connection (default: 1024)
    The maximum number of requests per connection the DataStax Java driver allows.

driver-consistency-level (default: ONE; restart required: no)
    The consistency level used by the DataStax Java driver when executing statements for replicating data
    to the destination. Specify a valid DSE consistency level: ANY, ONE, TWO, THREE, QUORUM, ALL,
    LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, or LOCAL_ONE.

driver-local-dc (default: N/A; restart required: yes)
    For destination clusters with multiple datacenters, explicitly defines the name of the datacenter that
    you consider local. Typically, this is the datacenter that is closest to the source cluster. This
    value is used only for clusters with multiple datacenters.

driver-pwd (default: none; restart required: yes)
    Driver password if the destination requires a user and password to connect. Changing the driver-pwd
    value for a destination reconnects automatically, with a slight delay. By default, driver user names
    and passwords are plain text. DataStax recommends encrypting the driver passwords before you add them
    to the CQL table.

driver-read-timeout (default: 15000; restart required: no)
    Time in milliseconds the DataStax Java driver waits to read responses from a server.

driver-ssl-enabled (default: false; restart required: yes)
    Whether SSL is enabled for connection to the destination.

driver-ssl-disabled
    Disable SSL for connection to the destination.

driver-ssl-keystore-path (default: none; restart required: yes)
    The path to the keystore for connection to DSE when SSL client authentication is enabled.

driver-ssl-keystore-password (default: none; restart required: yes)
    The keystore password for connection to DSE when SSL client authentication is enabled.

driver-ssl-keystore-type (default: none; restart required: yes)
    The keystore type for connection to DSE when SSL client authentication is enabled.

driver-ssl-truststore-path (default: none; restart required: yes)
    The path to the truststore for connection to DSE when SSL is enabled.

driver-ssl-truststore-password (default: none; restart required: yes)
    The truststore password for connection to DSE when SSL is enabled.

driver-ssl-truststore-type (default: none; restart required: yes)
    The truststore type for connection to DSE when SSL is enabled.

driver-ssl-protocol (default: TLS; restart required: yes)
    The SSL protocol for connection to DSE when SSL is enabled.

driver-ssl-cipher-suites (default: none; restart required: yes)
    A comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. Cipher suites
    must be supported by the source machine.

driver-used-hosts-per-remote-dc (default: 0; restart required: yes)
    To use automatic failover for destination clusters with multiple datacenters, define the number of
    hosts per remote datacenter that the datacenter-aware round robin policy (DCAwareRoundRobinPolicy)
    considers available.

driver-user (default: none; restart required: yes)
    Driver username if the destination requires a user and password to connect. Changing the driver-user
    value for a destination reconnects automatically, with a slight delay.

source-id (default: N/A; restart required: no)
    Identifies this source cluster and all inserts from this cluster. The source-id must also exist in the
    primary key on the destination for population of the source-id to occur.

source-id-column (default: source-id; restart required: no)
    The column on remote tables into which the source id is inserted as part of the update. If this column
    is not present on the table that is being updated, the source id value is ignored.

transmission-enabled (default: false; restart required: no)
    Boolean that specifies whether collected data for the table is replicated to the destination.

Configuring channel settings


A replication channel is a defined channel of change data between source clusters and destination clusters.
A replication channel is defined by the source cluster, source keyspace, source table name, destination cluster,
destination keyspace, and destination table name. Replications for each channel (unique keyspace and table)
are stored in the CQL table dse_system.advrep_repl_channel_config that is automatically created.
Change the settings using the dse advrep command line tool with this syntax:

dse advrep channel ...

You can verify the channel configuration before you change it. For example:

dse advrep channel status

The result is:

---------------------------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
---------------------------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |2       |foo    |bar       |source1|source_id |mydest|true        |
---------------------------------------------------------------------------------------------------------------------------------

Properties are continuously read from the metadata, so a restart is not required after configuration changes are
made. The following list describes the configuration settings.

separator
    Field separator.
keyspace
    The keyspace on the source for the table to replicate.
table
    The table name on the source to replicate.
source-id
    Overrides the source-id that is defined in the advrep_conf metadata.
source-id-column
    Overrides the source-id-column that is defined in the advrep_conf metadata.
enabled
    If true, replication starts for this table. If false, no more messages from this table are saved to
    the replication log.
data-center-id
    The datacenter this replication channel applies to. If none is specified, replication happens in all
    datacenters.
destination
    The destination to which data is written.
destination-keyspace
    The keyspace on the destination for the replicated table.
destination-table
    The table name on the destination for the replicated table.
priority
    Messages are replicated by priority in descending order (DESC).
transmission-enabled
    Specifies whether collected data for the table is replicated to the destination.
fifo-order
    Replicate the channel in FIFO order (the default).
lifo-order
    Replicate the channel in LIFO order.

Security
Authentication credentials can be provided in several ways, see Providing credentials from DSE tools.
The user performing replication with DSE Advanced Replication requires table- and keyspace-level
authorization. If the same user access is required, ensure that the authorization is the same on the source
and destination clusters.
Advanced Replication also supports setting row-level permissions on the destination cluster. The user that
connects to the destination cluster, specified with the --driver-user destination setting, must have
permission to write to the specified destination table at the row level replicated from the source,
according to the RLAC restrictions. Row-level access control (RLAC) on the source cluster does not affect
Advanced Replication: because Advanced Replication reads the source data at the raw CDC file layer, it
essentially reads as a superuser and has access to all configured data tables.
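
As a hedged sketch of the destination-side grants (RLAC CQL syntax as documented in the DSE security
guide; the role name and filtered value are illustrative, reusing the foo.bar table from earlier):

RESTRICT ROWS ON foo.bar USING name;
GRANT MODIFY ON 'a' ROWS IN foo.bar TO advrep_writer;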
Advanced Replication supports encrypting the driver passwords. Driver passwords are stored in a CQL table. By
default, driver passwords are plain text. DataStax recommends encrypting the driver passwords before you add
them to the CQL table. Create a global encryption key, called a system_key for SSTable encryption. Each node
in the source cluster must have the same system key. The destination does not require this key.

1. In the dse.yaml file:

• Verify that the config_encryption_active property is false:

config_encryption_active: false

• Enable driver password encryption with the conf_driver_password_encryption_enabled property:

conf_driver_password_encryption_enabled: true

• Define where system keys are stored on disk. The location of the key is specified on the command line
with the -d option or with system_key_directory in dse.yaml. The default filepath is /etc/dse/conf.

• To configure the filename of the generated encryption key, set the config_encryption_key_name option
in dse.yaml. The default name is system_key.

2. Generate a system key:


On-server:

dsetool createsystemkey cipher_algorithm strength system_key_file

Off-server:

dsetool createsystemkey cipher_algorithm strength system_key_file -kmip=kmip_groupname

For example:

dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 system_key_file

where system_key_file is a unique file name for the generated system key file. See createsystemkey.
Result: Configure transparent data encryption (TDE) on a per table basis. You can configure encryption
with or without compression. You can create a global encryption key in the location that is specified
by system_key_directory in the dse.yaml file. This default global encryption key is used when the
system_key_file subproperty is not specified.

3. Copy the returned value.

4. On any node in the source cluster, use the dse command to set the encrypted password in the DSE
Advanced Replication environment:

dse advrep destination --driver-pwd "Sa9xOVaym7bddjXUT/eeOQ==" --driver-user "username"

5. Start DSE.

SSL configuration and ports


For details about SSL configuration with DSE Advanced Replication, refer to Configuring SSL for nodetool,
nodesync, dsetool, and Advanced Replication.
Enabling client encryption will encrypt all traffic on the native_transport_port (default: 9042). If both
encrypted and unencrypted traffic is required, an additional cassandra.yaml setting must be enabled.
The native_transport_port_ssl (default: 9142) sets an additional dedicated port to carry encrypted
transmissions, while native_transport_port carries unencrypted transmissions.
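
A hedged cassandra.yaml sketch of carrying encrypted and unencrypted traffic side by side (both ports
shown are the defaults):

# cassandra.yaml (excerpt)
native_transport_port: 9042      # unencrypted client traffic
native_transport_port_ssl: 9142  # dedicated port for encrypted traffic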

Data insert methods


There are several ways to get data into a DataStax Enterprise cluster. Any normal paths used will result in data
replication using DSE Advanced Replication.
Supported data insert methods:

• CQL insert, including cqlsh and applications that use the standard DSE drivers

• CQL COPY FROM a CSV file (see the sketch after these lists)

• Solr HTTP or CQL

• Spark saveToCassandra

Unsupported data insert methods:

• sstableloader (Cassandra bulk loader)

• OpsCenter restore from backup

• Spark bulkSaveToCassandra
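
For example, a hedged sketch of two supported insert paths in cqlsh, reusing the foo.bar table from
earlier in this guide (the CSV file name is illustrative):

INSERT INTO foo.bar (name, val, scalar) VALUES ('e', '5', 5);

COPY foo.bar (name, val, scalar) FROM 'data.csv' WITH HEADER = true;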


Monitoring operations
Advanced replication can be monitored with JMX metrics. The outgoing replication queue size is a key factor to
watch. See Metrics for more details.
CQL queries in DSE Advanced Replication
This overview describes the CQL queries that DSE Advanced Replication supports, explains the related
replication concepts, and provides best-practice guidelines.
DSE Advanced Replication replicates data from source clusters to destination clusters. Replication takes the
CQL query on the source and then recreates a modified version of the query and runs it on the destination.
DataStax Enterprise supports a restricted list of valid CQL queries to manipulate data. In DSE Advanced
Replication, the same restrictions apply to the generated CQL queries that are used to replicate data into the
destination.
Restrictions apply to the primary key. The primary key consists of two parts: the partition key and the clustering
key. The primary key parts plus the optional field values comprise the database row.
If differences exist between the primary key on the source table and the primary key on the destination table,
restrictions apply for which CQL queries are supported.
Best practices
DataStax recommends the following best practices to ensure seamless replication.
Schema structure on the source table and the destination table

• Maintain an identical primary key (partition keys and clustering keys) format in the same order, with
the same columns.

• Add the optional source_id as the first clustering column.

• Maintain all, or a subset of, the field values.

Although the source_id column can be present in the source table schema, values that are inserted
into that column are ignored. When records are replicated, the configured source-id value is used.
Partition key columns
The following list details support and restrictions for partition keys:

• In the destination table, only an additional optional source_id column is supported in the partition
key. Additional destination table partition key columns are not supported. The source_id can be
either a clustering column or a partition key, but not both.

• Using a subset of source table partition key columns in the destination table might result in
overwriting. There is a many-to-one mapping for row entries.

• Order is irrelevant for replication. All permutations are supported.

• CQL UPDATE queries require that all of the partition key columns are fully restricted. Restrict
partition key columns using = or IN (single column) restrictions.

• CQL DELETE queries require that all of the partition key columns are fully restricted. Restrict
partition key columns using = or IN (single column) restrictions.

Clustering columns
The following list details support and restrictions for clustering columns:

• In the destination table, only an additional optional source_id column is supported as a clustering
column. Additional destination table clustering columns are not supported. The source_id can be either a
clustering column or a partition key, but not both.

• Using a subset of source table clustering columns in the destination table might result in overwriting. There
is a many-to-one mapping for row entries.


• Order is irrelevant for replication when using CQL INSERT and UPDATE queries. All permutations are
supported.

• Order is relevant for replication when using CQL DELETE queries. Permutation support is limited; not all
permutations are supported.

• CQL UPDATE queries require that all of the clustering columns are fully restricted. Restrict partition key
columns using = or IN (single column) restrictions.

• CQL DELETE queries require that the last-specified clustering column be restricted using =/>/>=/</<= (single
or multiple column) or IN (single or multiple column). All of the clustering columns that precede the last-
specified clustering column must also be restricted using = or IN.

• Restricting clustering columns is optional. However, if you do restrict clustering columns, then every
clustering column between the first and last restricted clustering columns (in order) must also be
restricted.

Field values
The following list details support and requirements for field values:

• A subset, or all, of the field values on the source are supported for replication to the destination.

• Fields that are present on the source, but absent on the destination, are not replicated.

• Fields that are present on the destination, but absent on the source, are not populated.

Source ID (source_id)
The source_id identifies the source cluster and all inserts from the source cluster. The following list details
support and requirements for the source_id:

• The source_id configuration key must be present and correct in the metadata.

• The source_id must be the first position in the clustering column, or any of the partition keys.
If not, then the CQL INSERT and UPDATE queries should work, but the CQL DELETE queries with partially
restricted clustering columns might fail.

• The source_id is always restricted in CQL DELETE and UPDATE queries. Certain DELETE statements are not
supported when the clustering key is not fully restricted and the source_id is not the first clustering
column.
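
As a hedged illustration of the recommended layout (reusing the foo.bar columns from earlier in this
guide), a destination-side schema places source_id as the first clustering column:

CREATE TABLE foo.bar (
  name text,
  source_id text,
  val text,
  scalar int,
  PRIMARY KEY ((name), source_id)
);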

DSE Advanced Replication metrics


Collect metrics on each source node to review the current status of that node in the source cluster. A working
source and destination configuration is required to use the metrics feature. See Getting started.
Ensure JMX access
Metrics are stored in the DataStax Enterprise JMX system. JMX access is required.

• For production, DataStax recommends authenticating JMX users, see Configuring JMX authentication.

• Use these steps to enable local JMX access. Localhost access is useful for test and development.

1. On the source node, edit cassandra-env.sh and enable local JMX:

JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost"
LOCAL_JMX=yes

2. On the source node, stop and restart DataStax Enterprise to recognize the local JMX change.

Display metrics on the command line


Use the dse advrep command line tool to display metrics on the command line. Ensure that the source node
meets the command line prerequisites.


1. On the source node:

dse advrep --jmx-port 7199 metrics list

------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |1002 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |1002 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------

---------------------------------------------------------------------------------------------------------------------------------------------
|Group         |Type                 |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate      |FiveMinuteRate         |
---------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded        |1002 |events/second|0.012688461014514603|9.862886141388435E-39|2.964393875E-314   |2.322135514219019E-114 |
|ReplicationLog|MessagesDeleted      |0    |events/second|0.0                 |0.0                  |0.0                |0.0                    |
|ReplicationLog|MessagesAcknowledged |1002 |events/second|0.012688456391385135|9.86403600116801E-39 |2.964393875E-314   |2.3230339468969963E-114|
|ReplicationLog|CommitLogMessagesRead|16873|events/second|0.21366497971804438 |0.20580430240786005  |0.39126032533612265|0.2277227124698431     |
---------------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|
-------------------------------------

Accessing the metrics


Use JMX to access the metrics. Any JMX tool, such as jconsole, can access the MBeans for advanced
replication. Connect using the hostname or IP address and the JMX port listed above (7199), then choose the
MBeans tab and find com.datastax.bdp.advrep.v2.metrics in the left-hand navigation frame.


For example, the attributes for a single metric are displayed under
com.datastax.bdp.advrep.v2.metrics:type=ReplicationLog,name=MessagesAdded.

Performance metrics
Metrics are exposed as JMX MBeans under the com.datastax.bdp.advrep.v2.metrics path and are logically
divided into main groups. Each group refers to an architecture component. Metrics types are:
Counter
A simple incrementing and decrementing 64-bit integer.
Meter
Measures the rate at which a set of events occur.
Histogram
Measures the distribution of values in a stream of data.
Timer
A histogram of the duration of a type of event and a meter of the rate of its occurrence.
Gauge
A gauge is an instantaneous measurement of a value.
Metrics are available for the following groups:

• ReplicationLog

• Transmission

• AdvancedReplicationHub-[destinationId]-metrics

Metrics are also available per table:

• Performance metrics per table

A description of each metric is provided below.

Metrics for DSE 5.0 (V1) are still present; see the DSE 5.0 documentation for those metrics.

ReplicationLog
Metrics for the ReplicationLog group:
MessagesAdded (Meter)
    The number of messages that were added to the replication log, and the rate at which they were added,
    per replica.
MessagesAcknowledged (Meter)
    The number of messages that were acknowledged (and removed) from the replication log. Acknowledgement
    can be 1 or 1+n if errors occur.
MessagesDeleted (Meter)
    The number of messages that were deleted from the replication log, including invalid messages and
    messages that were removed after a channel truncate operation.
MessageAddErrors (Counter)
    The number of errors that occurred when adding a message to the replication log.
CommitLogsToConsume (Counter)
    The number of commit logs containing advanced replication messages that still need to be consumed.
CommitLogMessagesRead (Meter)
    The number of commit log messages added to the replication log. A commit log message is read if it
    pertains to a source table that has collection enabled.
CommitLogMessagesDeleted (Meter)
    The number of commit log messages deleted from the commit log after being added to the replication
    log. Like CommitLogMessagesRead, this metric only pertains to messages in tables that are enabled for
    advanced replication.

Transmission
Metrics for the Transmission group:
AvailablePermits (Gauge)
    The current number of available global permits for transmission.

AdvancedReplicationHub-[destinationName]-metrics
Metrics for the AdvancedReplicationHub-[destinationName]-metrics group are provided automatically by the DSE
Java driver.

A partial list of the per-destination metrics:

Metric name        Metric type
known-hosts        Counter
connected-to       Counter
open-connections   Counter
requests-timer     Timer
connection-errors  Counter
write-timeouts     Counter
read-timeouts      Counter
unavailables       Counter
other-errors       Counter
retries            Counter
ignores            Counter

For details, see the DSE Java driver documentation.


Performance metrics per table
Use JMX to find performance metrics per table, look under the com.datastax.bdp.advrep.v2.metrics tab in
the left-hand navigation frame for Tables, select a table and inspect the metrics:


For example, to access the MessagesReceived metric for the table sensor_readings in the keyspace demo, look
at the following path:

com.datastax.bdp.advrep.v2.metrics:type=Tables,scope=demo.sensor_readings,name=MessagesReceived

The following metrics are provided per table:

MessagesReceived (Counter)
    The number of messages received from the source cluster for this table.
MessagesDelivered (Counter)
    The number of messages for the source table that were replicated to the destination.
MessagesDeleted (Counter)
    The number of messages that were deleted from the replication log, including invalid messages and
    messages that were removed after a channel truncate operation.

Managing invalid messages


During message replication, DSE Advanced Replication attempts to manipulate the message to ensure
successful replication. In some cases, replication might occur with only a subset of the data.
In other cases, replication fails when there are too many differences between the schema on the source cluster
and the schema on the destination cluster. For example, schema incompatibilities occur when a column in the
destination has a different type than the same column in the source, or a table in the source doesn't contain all
the columns that form the primary key of the same table in the destination.
If a message cannot be replicated, a second transmission is tried. If replication still fails after the second
try, the message is discarded and removed from the replication log. The replication log on the source cluster
stores data in preparation for transmission to the destination cluster.
When a message is discarded, the CQL query string and the related error message are logged on the
destination cluster. To define where to store the CQL strings and the error messages that are relevant to the
failed message replication, use one of the following logging strategies:

• SYSTEM_LOG: Log the CQL query and the error message in the system log on the destination.

• CHANNEL_LOG: Store the CQL query and the error message in files in /var/lib/cassandra/advrep/
invalid_queries on the destination. This is the default value.

• NONE: Perform no logging.

For the channel logging strategy, a file is created in the channel log directory on the source node, following
the pattern /var/lib/cassandra/advrep/invalid_queries/<keyspace>/<table>/<destination>/
invalid_queries.log where keyspace, table and destination are:

• keyspace: keyspace name of the invalid query

• table: table name of the invalid query

• destination: destination cluster of the channel

The log file stores the following data that is relevant to the failed message replication:

• time_bucket: an hourly time bucket to prevent the database partition from getting too wide

• id: a time based id (timeuuid)

• cql_string: the CQL query string; it explicitly specifies the original timestamp by including the USING
TIMESTAMP option

• error_msg: the error message

Invalid messages are inserted by time in the log table.


Manage invalid messages using channel logging:

1. To store the CQL query string and error message using a channel log, instead of the default system log
location, specify the invalid_message_log configuration key as CHANNEL_LOG:

$ dse advrep conf update --invalid_message_log CHANNEL_LOG

Manage invalid messages using system logging:

2. To store the CQL query string and error message using a system log, instead of the default channel log
location, specify the invalid_message_log configuration key as SYSTEM_LOG:

$ dse advrep conf update --invalid_message_log SYSTEM_LOG

3. To identify the problem, examine the error messages, the CQL query strings, and the schemas of the data
on the source and the destination.

4. Take appropriate actions to resolve the incompatibility issues.

Managing audit logs


DSE Advanced Replication provides replication audit logging and commands to manage the audit logs with
metadata configuration. Audit logs are stored on the source cluster and are handled by the audit log analyzer
(AuditLogAnalyzer). The audit log analyzer reads the log files, including audit log files in GZIP (.gz) format, that
might be incomplete because they are still being written or they were improperly closed. The audit log analyzer
identifies the list of files which match the template that is defined with the audit_log_file configuration key and
that have exceeded the maximum time interval since they were written to. Purging is based on these criteria.
Global settings apply to the entire source cluster. These global settings are stored in the CQL table
dse_system.advrep_source_config that is automatically created. To define configuration keys to change
global settings, use the dse advrep conf update command. The audit log files are read/write (RW) only for the
file owner, with no permissions for other users.
Time stamps for all writes are in Coordinated Universal Time (UTC).

1. Enable replication audit logging:

$ dse advrep conf update --audit-log-enabled true

2. The default base audit log directory is /var/lib/cassandra/advrep/auditlog. To define a different
directory for storing audit log files:

$ dse advrep conf update --audit-log-file /tmp/auditAdvRep

If the configured audit log file is a relative path, the log files are placed in the default base directory. If
the configured audit log file is an absolute path, that path is used.

3. To compress the audit log output using the gzip file format:

$ dse advrep conf update --audit-log-compression GZIP --audit-log-file /tmp/auditAdvRep/myaudit.gz

The default value is NONE for compression. If .gz is not appended to the audit log filename in the
command, it will be appended to the created files. Compressed audit log files will remain locked until
rotated out; the active file cannot be opened.


4. Specify the time interval to rotate the audit log file. On rotation, the rotated file is appended with the log
counter .[logcounter], incrementing from [0]. To disable rotation, set to 0.

$ dse advrep conf update --audit-log-rotate-mins 120

For example, the compressed file from the last step can be uncompressed after rotating out to /tmp/
auditAdvRep/myaudit.[0].gz.

5. Specify the maximum lifetime of audit log files.


After audit log files are rotated, they are periodically purged when the log files:

• Match the audit log file template

• And have not been written to for more than the specified maximum lifespan minutes

To disable purging, set to 0.

$ dse advrep conf update --audit-log-max-life-span-mins 120

6. Restart the node to enable the changes.


When logging is enabled, log files that would be overwritten are moved to a subdirectory in the log
directory. The subdirectory is named archive_x, where x increments from 0 until an unused directory is
identified and created.

dse advrep commands


A list of commands for DSE Advanced Replication.
About the dse advrep command
The command line tool provides commands and options for configuring and using DSE Advanced Replication.
Synopsis

$ dse advrep [connection_options] [command] [sub_command] [sub_command_options]

The default port for DSE Advanced Replication is 9042.

Table 61: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.


Syntax conventions Description

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Using dse advrep command line help


To view a listing of dse advrep commands:

$ dse advrep help

To view help for a specific command:

$ dse advrep help command [ sub_command ]

Connection options
JMX authentication is supported by some dse commands. Other dse commands authenticate with the user
name and password of the configured user. The connection option short form and long form are comma
separated.
You can provide authentication credentials in several ways, see Credentials for authentication.

General connection options:


--separator field_separator
The field separator for use with the --no-pretty-print command.
--verbose
Print verbose messages for the command.
--no-pretty-print
If not specified, data is printed using tabular output. If specified, data is printed as a comma separated
list unless a separator is specified.
--cipher-suites ssl_cipher_suites
Specify comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. For
example, --cipher-suites c1,c2,c3.
--host hostname
The DSE node hostname or IP address.
--jmx-port jmx_port
The remote JMX agent port number. Default: 7199.
--jmx-pwd jmx_password
The password for authenticating with secure local JMX. If you do not provide a password, you are
prompted to enter one.
--jmx-user jmx_username
The user name for authenticating with secure local JMX.
--kerberos-enabled true | false
Whether Kerberos authentication is enabled for connections to DSE. For example, --kerberos-enabled
true.
--keystore-password keystore_password
Keystore password for connection to DSE when SSL client authentication is enabled.


--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type
for keys generated by the Java keytool binary, but other types are possible, depending on user
environment.
-p password
The password to authenticate for database access. Can use the DSE_PASSWORD environment
variable.
--ssl
Whether SSL is enabled for connection to DSE. --ssl-enabled true is the same as --ssl.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
-t token
Specify a delegation token to log in with. Alternatively, use the DSE_TOKEN environment
variable.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-
path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by
the Java keytool binary, but other types are possible, depending on user environment. For example, --
truststore-type jks2.
-u username
User name of a DSE authentication account. Can use the DSE_USERNAME environment variable.
Examples
This connection example specifies that Kerberos is enabled and lists the replication channels:

$ dse advrep --host ip-10-200-300-138.example.lan --kerberos-enabled=true conf list

To use the server YAML files:

$ dse advrep --use-server-config conf list

To list output without pretty-print with a specified separator:

$ dse advrep --no-pretty-print --separator "|" destination list-conf

The resulting output:

destination|name|value
mydest|addresses|192.168.200.100
mydest|transmission-enabled|true
mydest|driver-ssl-cipher-suites|
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,
mydest|driver-ssl-enabled|false
mydest|driver-ssl-protocol|TLS
mydest|name|mydest
mydest|driver-connect-timeout|15000
mydest|driver-max-requests-per-connection|1024
mydest|driver-connections-max|8
mydest|driver-connections|1
mydest|driver-compression|lz4
mydest|driver-consistency-level|ONE
mydest|driver-allow-remote-dcs-for-local-cl|false


mydest|driver-used-hosts-per-remote-dc|0
mydest|driver-read-timeout|15000
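
The SSL connection options combine in the same way. This sketch assumes SSL is enabled on the
cluster and uses placeholder values for the truststore path and password:

$ dse advrep --ssl --ssl-protocol TLS --truststore-path /path/to/ts --truststore-password mypass conf list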

dse advrep channel create


Creates a replication channel for change data to flow between source clusters and destination clusters.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel create --source-keyspace keyspace_name --source-table source_table_name
--source-id source_id_name --source-id-column source_id_column_name --destination destination
--destination-keyspace destination_keyspace_name --destination-table destination_table_name
[ --fifo-order | --lifo-order ] [ --collection-enabled (true|false) ]
[ --priority channel_priority ] [ --transmission-enabled (true|false) ]

Table 62: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)
The source cluster keyspace to replicate.
--source-table source_table_name (required)
The source table to replicate.
--source-id id
A unique identifier for all data that comes from a particular source node.
--source-id-column source_id
The column that identifies the source id in the destination table.


--destination destination (required)
The destination to which replication is sent. The destination name is user-defined.
--destination-keyspace keyspace_name
The destination keyspace to which replication will be sent.
--destination-table table_name
The destination table to which replication will be sent.
--fifo-order
First in, first out channel (FIFO) replication order. Default.
--lifo-order
Last in, first out (LIFO) channel replication order.
--collection-enabled (true|false)
Whether to enable the source table for replication collection on creation.
--transmission-enabled (true|false)
Whether data collected for the table is replicated to the destination.
--priority channel_priority
The order in which the source table log files are transmitted.
Examples

To create a replication source channel:

$ dse advrep channel create --source-keyspace foo --source-table bar --source-id source1
--source-id-column source_id --destination mydest --destination-keyspace foo
--destination-table bar --collection-enabled true --priority 1

with a result:

Created channel dc=Cassandra keyspace=foo table=bar to mydest

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set the
source-id and source-id-column differently from the global setting.
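
A channel can also be created with collection disabled, so that data flow begins only after a later
update enables it. This is a sketch that follows the same option pattern, not verified output:

$ dse advrep channel create --source-keyspace foo --source-table bar --destination mydest
--destination-keyspace foo --destination-table bar --collection-enabled false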

dse advrep channel update


Updates a replication channel configuration.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To update a channel, specify a new value for one or more options.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel update --source-keyspace keyspace_name --source-table source_table_name
--source-id source_id_name --source-id-column source_id_column_name --destination destination
--destination-keyspace destination_keyspace_name --destination-table destination_table_name
[ --fifo-order | --lifo-order ] [ --collection-enabled (true|false) ]
[ --transmission-enabled (true|false) ] [ --priority channel_priority ]

Table 63: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)
The source cluster keyspace to replicate.
--source-table source_table_name (required)
The source table to replicate.
--source-id id
A unique identifier for all data that comes from a particular source node.
--source-id-column source_id
The column that identifies the source id in the destination table.
--destination destination (required)
The destination to which replication is sent. The destination name is user-defined.
--destination-keyspace keyspace_name
The destination keyspace to which replication will be sent.
--destination-table table_name
The destination table to which replication will be sent.
--fifo-order
First in, first out channel (FIFO) replication order. Default.
--lifo-order
Last in, first out (LIFO) channel replication order.
--collection-enabled (true|false)
Whether to enable the source table for replication collection on creation.
--transmission-enabled (true|false)
Whether data collected for the table is replicated to the destination.
--priority channel_priority
The order in which the source table log files are transmitted.


Examples

To update a replication source channel configuration:

$ dse advrep --verbose channel update --source-keyspace demo --source-table sensor_readings
--destination mydest --lifo-order

with a result as seen using dse advrep channel status:

--------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|
priority|dest ks|dest table |src id |src id col|dest |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|demo |sensor_readings |true |true |LIFO |2 |
demo |sensor_readings |source1|source_id |mydest |true |
--------------------------------------------------------------------------------------------------------------

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set the
source-id and source-id-column differently from the global setting.
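
A single flag can likewise be updated in isolation. For example, to temporarily stop transmitting
collected data for the channel above (a sketch that follows the same option pattern):

$ dse advrep channel update --source-keyspace demo --source-table sensor_readings
--destination mydest --transmission-enabled false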

dse advrep channel delete


Deletes a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To delete a channel, specify the source keyspace and table, the destination, and the datacenter for the channel.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel delete --source-keyspace keyspace_name --source-table source_table_name
--destination destination --data-center-id data_center_id

Table 64: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)
The source cluster keyspace to replicate.
--source-table source_table_name (required)
The source table to replicate.
--destination destination (required)
The destination to which replication is sent. The destination name is user-defined.
--data-center-id data_center_id
The datacenter for this channel.
Examples

To delete a replication channel:

$ dse advrep channel delete --source-keyspace foo --source-table bar --destination mydest
--data-center-id Cassandra

with a result:

Deleted channel dc=Cassandra keyspace=foo table=bar to mydest

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel pause
Pauses replication for a channel that carries change data from a source cluster to a destination cluster.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Pausing stops the collection of data, the transmission of data, or both, between a source cluster and destination cluster.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel pause --source-keyspace keyspace_name --source-table source_table_name
--destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]
--collection --transmission

Table 65: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Pause collection: while paused, no data for the source table is collected.
--transmission
Pause transmission: while paused, no data for the source table is sent to the configured destinations.
Examples

To pause a replication source channel:

$ dse advrep channel pause --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar collection to mydest was paused

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
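
Because --collection and --transmission select what is paused, transmission can be paused while
collection continues, letting the replication log accumulate for later delivery. A sketch using the
same channel, not verified output:

$ dse advrep channel pause --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra --transmission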
dse advrep channel resume
Resumes replication for a channel.


A replication channel is a defined channel of change data between source clusters and destination clusters.
A channel can resume either the collection or transmission of replication between a source cluster and
destination cluster.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel resume --source-keyspace keyspace_name --source-table source_table_name
--destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]
--collection --transmission

Table 66: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Resume collection of data for the source table.
--transmission
Resume transmission of data for the source table to the configured destinations.


Examples

To resume a replication source channel:

$ dse advrep channel resume --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar collection to mydest was resumed

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
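
To resume only transmission after the transmission-only pause sketched earlier (again a sketch
following the synopsis):

$ dse advrep channel resume --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra --transmission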
dse advrep channel status
Prints status of a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel status --data-center-id data_center_id --source-keyspace keyspace_name
--source-table source_table_name --destination destination

Table 67: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destination destination
The destination to which replication is sent. The destination name is user-defined.
--data-center-id data_center_id
The datacenter for this channel.
Examples

To print the status of a replication channel:

$ dse advrep channel status --source-keyspace foo --source-table bar --destination mydest
--data-center-id Cassandra

with a result:

--------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|priority|
dest ks|dest table |src id |src id col|dest |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo |bar |true |true |FIFO |2 |
foo |bar |source1|source_id |mydest|true |
--------------------------------------------------------------------------------------------------------------

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel truncate
Truncates a channel to prevent replicating all messages that are currently in the replication log.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel truncate --source-keyspace keyspace_name --source-table source_table_name
--destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]

Table 68: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
Examples

To truncate a replication channel, preventing replication of all messages currently in the
replication log:

$ dse advrep channel truncate --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar to mydest was truncated

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep conf list
Lists configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.


Synopsis

$ dse advrep conf list

Table 69: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Examples

To list configuration settings:

$ dse advrep conf list

The result:

----------------------------
|name |value |
----------------------------
|audit_log_file |auditLog|
----------------------------
|permits |8 |
----------------------------
|audit_log_enabled|true |
----------------------------

The number of permits is 8, audit logging is enabled, and the audit log file name is auditLog.


dse advrep conf remove


Removes configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis

$ dse advrep conf remove --separator field_separator --audit-log-enabled true|false
--audit-log-compression none|gzip --audit-log-file log_file_name
--audit-log-max-life-span-mins number_of_minutes --audit-log-rotate-mins number_of_minutes
--permits number_of_permits --collection-max-open-files number_of_files
--collection-time-slice-count number_of_files --collection-time-slice-width time_period_in_seconds
--collection-expire-after-write --invalid-message-log

Table 70: Legend


Syntax conventions Description

Italics Variable value. Replace with a user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-max-life-span-mins number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files that are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log


Examples

To remove advanced replication configuration:

$ dse advrep conf remove --permits 8

with a result:

Removed config permits
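
Several settings can be removed in one invocation. For example, to remove the audit log settings
applied in the conf update example (a sketch that mirrors the documented option pattern):

$ dse advrep conf remove --audit-log-enabled true --audit-log-file auditLog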

dse advrep conf update


Updates configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep conf update --audit-log-enabled true|false --audit-log-compression none|gzip
--audit-log-file log_file_name --audit-log-max-life-span-mins number_of_minutes
--audit-log-rotate-mins number_of_minutes --permits number_of_permits
--collection-max-open-files number_of_files --collection-time-slice-count number_of_files
--collection-time-slice-width time_period_in_seconds --collection-expire-after-write
--invalid-message-log

Table 71: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-max-life-span-mins number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files that are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples

To update configuration settings:

$ dse advrep conf update --permits 8 --audit-log-enabled true --audit-log-file auditLog

with a result:

Updated audit_log_file from null to auditLog


Updated permits from null to 8
Updated audit_log_enabled from null to true
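
Collection tuning settings follow the same pattern. As a sketch with hypothetical values, a smaller
time slice width produces more, smaller files:

$ dse advrep conf update --collection-time-slice-width 60 --collection-max-open-files 100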

dse advrep destination create


Creates a replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination create --name destination_name --addresses address_name [ , address_name ]
[ --transmission-enabled (true|false) ] --driver-user user_name --driver-pwd password
--driver-used-hosts-per-remote-dc number_of_hosts --driver-connections number_of_connections
--driver-connections-max number_of_connections --driver-local-dc data_center_name
--driver-allow-remote-dcs-for-local-cl true|false
--driver-consistency-level [ ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE ]
--driver-compression [ snappy|lz4 ] --driver-connect-timeout timeout_in_milliseconds
--driver-read-timeout timeout_in_milliseconds --driver-max-requests-per-connection number_of_requests
--driver-ssl-enabled true|false --driver-ssl-cipher-suites --driver-ssl-protocol
--driver-ssl-keystore-path --driver-ssl-keystore-password --driver-ssl-keystore-type
--driver-ssl-truststore-path --driver-ssl-truststore-password --driver-ssl-truststore-type

Table 72: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.
--transmission-enabled true | false
Whether data collected for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.


--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
To create a replication destination:

$ dse advrep --verbose destination create --name mydest --addresses 10.200.182.148
--transmission-enabled true

with a result:

Destination mydest created
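
A destination can also be created with the driver SSL options enabled. This sketch uses
hypothetical values for the destination name, address, truststore path, and password:

$ dse advrep destination create --name ssldest --addresses 10.200.182.150
--driver-ssl-enabled true --driver-ssl-protocol TLS --driver-ssl-truststore-path /path/to/ts
--driver-ssl-truststore-password mypass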

dse advrep destination update


Updates a replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination update --name destination_name --addresses address_name [ , address_name ]
[ --transmission-enabled true|false ] --driver-user user_name --driver-pwd password
--driver-used-hosts-per-remote-dc number_of_hosts --driver-connections number_of_connections
--driver-connections-max number_of_connections --driver-local-dc data_center_name
--driver-allow-remote-dcs-for-local-cl true|false
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE
--driver-compression snappy|lz4 --driver-connect-timeout timeout_in_milliseconds
--driver-read-timeout timeout_in_milliseconds --driver-max-requests-per-connection number_of_requests
--driver-ssl-enabled true|false --driver-ssl-cipher-suites suite1 [ , suite2, suite3 ]
--driver-ssl-protocol protocol --driver-ssl-keystore-path keystore_path
--driver-ssl-keystore-password keystore_password --driver-ssl-keystore-type keystore_type
--driver-ssl-truststore-path truststore_path --driver-ssl-truststore-password truststore_password
--driver-ssl-truststore-type truststore_type

Table 73: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.
--transmission-enabled true | false
Whether data collected for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.


--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples

To update a replication destination:

$ dse advrep --verbose destination update --name mydest --addresses 10.200.182.148
--driver-consistency-level LOCAL_QUORUM

with a result:

Destination mydest updated


Updated addresses from 10.200.182.148 to 10.200.182.148
Updated driver_consistency_level from ONE to LOCAL_QUORUM
Updated name from mydest to mydest

Notice that any option included in the command is reported as an update, even when the value is unchanged.
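
Driver timeouts can be tuned in the same way. A sketch with hypothetical 30-second values:

$ dse advrep destination update --name mydest --driver-connect-timeout 30000
--driver-read-timeout 30000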


dse advrep destination delete
Deletes a given replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.


Synopsis

$ dse advrep destination delete --name destination_name

Table 74: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
Examples

To delete a replication destination:

$ dse advrep destination delete --name mydest

with a result:

Destination mydest removed

dse advrep destination list


Lists all replication destinations.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.


Synopsis

$ dse advrep destination list

Table 75: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Examples

To list all replication destinations:

$ dse advrep destination list

with a result:

----------------
|name |enabled|
----------------
|mydest|true |
----------------
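
As with other listing commands, the general --no-pretty-print and --separator options produce
machine-readable output; a sketch:

$ dse advrep --no-pretty-print --separator "|" destination list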

dse advrep destination list-conf


Lists all configuration for a given replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.


Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination list-conf --separator field_separator --name destination_name

Table 76: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
Examples

To list the configuration for a replication destination:

$ dse advrep destination list-conf --name mydest

with a result:

KEYS: ---- [addresses, transmission-enabled, driver-ssl-cipher-suites, driver-ssl-enabled,
driver-ssl-protocol, name, driver-connect-timeout, driver-max-requests-per-connection,
driver-connections-max, driver-connections, driver-compression, driver-consistency-level,
driver-allow-remote-dcs-for-local-cl, driver-used-hosts-per-remote-dc, driver-read-timeout]
-------------------------------------------------------------------------------------------


|destination|name |value
|
-------------------------------------------------------------------------------------------
|mydest |addresses |10.200.180.162
|
-------------------------------------------------------------------------------------------
|mydest |transmission-enabled |true
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-cipher-suites |
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,
|
| | |TLS_RSA_WITH_AES_256_CBC_SHA256,
|
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,
|
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,
|
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
|
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
|
| | |
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,|
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
|


| | |
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,|
| | |
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,|
| | |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,
|
| | |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDHE_RSA_WITH_RC4_128_SHA,
|
| | |SSL_RSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDH_RSA_WITH_RC4_128_SHA,
|
| | |SSL_RSA_WITH_RC4_128_MD5,
|
| | |TLS_EMPTY_RENEGOTIATION_INFO_SCSV
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-enabled |false
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-protocol |TLS
|
-------------------------------------------------------------------------------------------
|mydest |name |mydest
|
-------------------------------------------------------------------------------------------

Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
501
Using DataStax Enterprise advanced functionality

|mydest |driver-connect-timeout |15000


|
-------------------------------------------------------------------------------------------
|mydest |driver-max-requests-per-connection |1024
|
-------------------------------------------------------------------------------------------
|mydest |driver-connections-max |8
|
-------------------------------------------------------------------------------------------
|mydest |driver-connections |1
|
-------------------------------------------------------------------------------------------
|mydest |driver-compression |lz4
|
-------------------------------------------------------------------------------------------
|mydest |driver-consistency-level |ONE
|
-------------------------------------------------------------------------------------------
|mydest |driver-allow-remote-dcs-for-local-cl|false
|
-------------------------------------------------------------------------------------------
|mydest |driver-used-hosts-per-remote-dc |0
|
-------------------------------------------------------------------------------------------
|mydest |driver-read-timeout |15000
|
-------------------------------------------------------------------------------------------

dse advrep destination remove-conf


Removes configuration for a destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination remove-conf --name destination_name --addresses address_name
[ , address_name ] [ --transmission-enabled (true|false) ] --driver-user user_name
--driver-pwd password --driver-used-hosts-per-remote-dc --driver-connections
--driver-connections-max --driver-local-dc --driver-allow-remote-dcs-for-local-cl true|false
--driver-consistency-level [ ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE ]
--driver-compression [ snappy|lz4 ] --driver-connect-timeout timeout_in_milliseconds
--driver-read-timeout timeout_in_milliseconds --driver-max-requests-per-connection number_of_requests
--driver-ssl-enabled true|false --driver-ssl-cipher-suites --driver-ssl-protocol
--driver-ssl-keystore-path --driver-ssl-keystore-password --driver-ssl-keystore-type
--driver-ssl-truststore-path --driver-ssl-truststore-password --driver-ssl-truststore-type

Table 77: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.


| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)


The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.
--transmission-enabled true | false
Whether the data collector for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples

To remove configuration for a replication destination:

$ dse advrep --verbose destination remove-conf --transmission-enabled true

with a result:

Removed config transmission-enabled

dse advrep metrics list


Lists advanced replication JMX metrics.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep metrics list --metric-group metric_group --metric-type metric_type

Table 78: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.


<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--metric-group metric_group

The metric group to display, for example Tables or ReplicationLog.
--metric-type metric_type
The metric type to display, for example MessagesAdded.
Examples

To display the JMX metrics:

$ dse advrep --host localhost --port 7199 metrics list

with a result:

------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |3000 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |3000 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------

--------------------------------------------------------------------------------------------------------------
|Group |Type |Count|RateUnit |MeanRate |
FifteenMinuteRate |OneMinuteRate |FiveMinuteRate |
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded |3000 |events/second|0.020790532589851248|
4.569533277209345E-28|2.964393875E-314 |2.3185964029982446E-82|
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesDeleted |0 |events/second|0.0 |0.0
|0.0 |0.0 |
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAcknowledged |3000 |events/second|0.020790529428089743|
4.569533277209345E-28|2.964393875E-314 |2.3185964029982446E-82|
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|CommitLogMessagesRead|30740|events/second|0.21303361656215317 |
0.13538523143065767 |0.01686330377344829|0.11519609320406245 |
--------------------------------------------------------------------------------------------------------------

-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|


-------------------------------------

To display JMX metrics for a particular metric group:

$ dse advrep --host localhost --port 7199 metrics list --metric-group Tables

with a result:

--------------------------------
|Group |Type |Count|
--------------------------------
|Tables|MessagesDelivered|3000 |
--------------------------------
|Tables|MessagesReceived |3000 |
--------------------------------

To display JMX metrics for a particular metric type:

$ dse advrep --host localhost --port 7199 metrics list --metric-type MessagesAdded

with a result:

-----------------------------------------------------------------------------------
|Group |Type |Count|RateUnit |MeanRate
|FifteenMinuteRate |OneMinuteRate |FiveMinuteRate |
-----------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded|3000 |events/second|
0.020827685267120057|6.100068258619765E-28|2.964393875E-314|
5.515866021410421E-82|
-----------------------------------------------------------------------------------

dse advrep replog count


Returns the messages that have not been replicated.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.


Synopsis

$ dse advrep replog count --source-keyspace keyspace_name --source-table source_table_name
  --destination destination --data-center-id data_center_id

Table 79: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)


Define the source cluster keyspace for which to show count.
--source-table source_table_name (required)
Define the source table for which to show count.
--destination destination (required)
Define the destination for which to show count.
--data-center-id data_center_id
Define the data center for which to show the count.


Examples

To verify the record count held in a replication log:

$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar


dse advrep replog analyze-audit-log


Reads the audit log and prints a summary.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep replog analyze-audit-log --file audit_log_filename

Table 80: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--file audit_log_filename
The audit log file to analyze.


Examples

To analyze the data in a replication log:

$ dse advrep replog analyze-audit-log --file auditLog

with a result:

foo, bar : inserts = 1000, insertErrors = 0
foo, bar : reads = 1000, sent = 0, deletes = 1000, readingErrors = 0, deletingErrors = 0

DSE In-Memory
DSE In-Memory is a part of the multiple storage options offered in DataStax Enterprise for optimizing performance
and cost goals. DSE In-Memory provides lightning-fast performance for read-intensive situations. It allows
developers, architects, and administrators to easily choose what parts (some or all) of a database reside fully
in RAM. It is designed for use cases that lend themselves to in-memory computing, while allowing disk-based
workloads to be serviced by DSE Tiered Storage and traditional storage modeling.
DSE In-Memory is suitable for use cases that include primarily read-only workloads with slowly changing data
and/or semi-static datasets, such as a product catalog that is refreshed nightly, but read constantly during the day.
It is not suitable for workloads with heavily changing data or monotonically growing datasets that might exceed the
RAM capacity on the nodes/cluster.
DataStax recommends using OpsCenter to check performance metrics before and after configuring DSE In-
Memory.
Creating or altering tables to use DSE In-Memory
Use CQL directives to create and alter tables to use DSE In-Memory and dse.yaml to limit the size of tables.
Creating a table to use DSE In-Memory
To create a table that uses DSE In-Memory, add a CQL directive to the CREATE TABLE statement. Use the
compaction directive in the statement to specify the MemoryOnlyStrategy class and disable the key and row
caches.

CREATE TABLE customers (
  uid text,
  fname text,
  lname text,
  PRIMARY KEY (uid)
) WITH compaction = { 'class': 'MemoryOnlyStrategy' }
AND caching = { 'keys':'NONE', 'rows_per_partition':'NONE' };

Altering an existing table to use DSE In-Memory


Use the ALTER TABLE statement to change a traditional table to use in-memory storage, or to change an
in-memory table back to a traditional table. For example, run the DESCRIBE command on a table named
employee. If employee is a traditional table, the output of the DESCRIBE command does not include a line
that looks something like:

compaction={'class': 'MemoryOnlyStrategy'}

Alter the employee table to use DSE In-Memory:

ALTER TABLE employee WITH compaction = { 'class': 'MemoryOnlyStrategy' }
AND caching = { 'keys':'NONE', 'rows_per_partition':'NONE' };

After you alter the table, rewrite existing SSTables:

$ nodetool upgradesstables -a <keyspacename> <tablename>

Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2,
which minimizes impact on the cluster. Set to 0 to use all available compaction threads.
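For example, a hypothetical run that rewrites the employee table's SSTables using all available compaction threads; the keyspace name mykeyspace is illustrative:

$ nodetool upgradesstables -a -j 0 mykeyspace employee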
In cqlsh, use the DESCRIBE TABLE command to view table properties:

DESCRIBE TABLE employee;

This output shows that the table uses DSE In-Memory:

CREATE TABLE employee (
  uid text PRIMARY KEY,
  fname text,
  lname text
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"NONE", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.MemoryOnlyStrategy', 'max_threshold': '32'}
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND nodesync = {'enabled' : 'true'}
  AND speculative_retry = '99.0PERCENTILE';

When memtable_flush_period_in_ms=0, the memtable flushes when:

• the flush threshold is met

• the node shuts down

• nodetool flush runs

• the commit logs get full

Limiting the size of tables


Use the max_memory_to_lock_fraction or max_memory_to_lock_mb configuration option in the dse.yaml
file to specify how much system memory to use for all in-memory tables.

max_memory_to_lock_fraction: Specify a fraction of the system memory. The default value of 0.20
specifies to use up to 20% of system memory.

max_memory_to_lock_mb: Specify a maximum amount of memory in megabytes (MB).
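For example, a minimal dse.yaml sketch that caps all in-memory tables at 20% of system memory. Set only one of the two options; the commented line shows the absolute alternative, with an illustrative value:

max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240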

Disabling caching on tables


DataStax recommends disabling caching on tables that use the DSE In-Memory option. If caching is not
disabled, a warning is logged. Set the table caching property to disable both types of caching:

ALTER TABLE customers WITH caching = { 'keys':'NONE', 'rows_per_partition':'NONE' };

Verifying table properties


In cqlsh, use the DESCRIBE command to view table properties:

DESCRIBE TABLE employee;

This output shows that the table uses DSE In-Memory:

CREATE TABLE employee (
  uid text PRIMARY KEY,
  fname text,
  lname text
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"NONE", "rows_per_partition":"NONE"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.MemoryOnlyStrategy', 'max_threshold': '32'}
  AND compression = {}
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND nodesync = {'enabled' : 'true'}
  AND speculative_retry = '99.0PERCENTILE';

Managing memory
Because DataStax Enterprise runs in a distributed environment, you can inadvertently add excessive data that
exceeds the available memory.
When using DSE In-Memory, you must monitor and carefully manage available memory.
You can use OpsCenter to monitor in-memory usage.
DSE In-Memory retains the durability guarantees of the database.
Recommended limits
To prevent exceeding the RAM capacity, DataStax recommends that in-memory objects consume no more than
45% of a node’s free memory.
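For example, on a node with 64 GB of free memory, keep all in-memory tables under roughly 29 GB (0.45 × 64 GB).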
Managing available memory
If the maximum memory capacity is exceeded, the database stops locking data into memory, read
performance degrades, and a warning message is logged.
The warning message looks something like this:

WARN [main] 2015-03-27 09:34:00,050 MemoryOnlyStrategy.java:252 - File
MmappedSegmentedFile(path='/data/ks/test-f590c150b95911e4b66d85e0b6fd73a5/ks-test-ka-94-Data.db',
length=43629650) buffer address: 140394485092352 length: 43629650 could not be locked.
Sizelimit (1048576) reached. After locking size would be: 43630592

Checking available memory


Use the dsetool inmemorystatus command to check the amount of data that is currently in memory. When the
data size exceeds the specified Max Memory to Lock value, or some other problem exists, the Couldn't Lock
column displays its value. The system.log file provides useful information for problem resolution.

$ dsetool inmemorystatus

Max Memory to Lock:                     1MB
Current Total Memory Locked:            0MB
Current Total Memory Not Able To Lock:  46MB
Keyspace      ColumnFamily     Size     Couldn't Lock     Usage
mos_ks        testmemory       0MB      46MB              0%
mos_ks        testmemory2      0MB      0MB               0%
mos_ks        testmemory4      0MB      0MB               0%
mos_ks        testmemory3      0MB      0MB               0%

Backing up and restoring data


The procedures for backing up and restoring data are the same for DSE In-Memory data and on-disk data.
Use snapshots to manage backups and restores.
You can also use the OpsCenter Backup Service.

Always run nodetool cleanup before taking a snapshot for restore. Otherwise, invalid replicas (replicas
that have been superseded by new, valid replicas on newly added nodes) can be copied to the target when
they should not be, and old data shows up on the target.
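A minimal sketch of that sequence, assuming an illustrative keyspace named cycling and a snapshot tag of your choosing:

$ nodetool cleanup cycling
$ nodetool snapshot -t pre_restore cycling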

DSE Multi-Instance
DSE Multi-Instance supports multiple DataStax Enterprise nodes on a single host machine to leverage large
server capabilities and enable the use of existing hardware. This allows you to utilize the price-performance
sweet spot in the contemporary hardware market to ensure that cost saving goals are met without compromising
performance and availability.
About DSE Multi-Instance
Benefits

Simplifies configuration: Administration, installation, and configuration of DataStax Enterprise nodes on a single host machine are more easily managed.

Effectively utilizes larger server resources: Running multiple DataStax Enterprise nodes on a large host machine enables optimal use of RAM, CPU, and so on.

Supports scaling: Simplifies scaling with multiple DataStax Enterprise nodes on a single large host machine.

DSE Multi-Instance architecture


DSE Multi-Instance enables multiple DataStax Enterprise nodes to exist on a single physical server.
To achieve the best performance, DataStax recommends:


• All DSE Multi-Instance nodes on a single physical host share the same database rack to avoid replica
placement problems.
If you are not using the rack feature, you must configure racks manually to ensure that the DSE Multi-
Instance nodes on the same host machine do not encounter replica placement problems.

• Ensure that DSE Multi-Instance nodes do not share a single physical disk.
For example, for two DSE Multi-Instance nodes, do not configure a server with a single disk. Instead,
configure the server with at least two disks so that each node can have its own exclusive storage device.

DataStax Enterprise is installed in a single location on the host machine with:

• Multiple JVMs, each running one DataStax Enterprise node.

• A node-specific set of configuration files for each node, with one directory per service.
See default file locations for package installations.

The following figure shows three DataStax Enterprise nodes on a single host machine.

Figure 16: Three DataStax Enterprise nodes on a single host machine

Adding nodes to DSE Multi-Instance


With package installs, the dse add-node command simplifies adding and configuring nodes on a host machine.
Tarball installs do not support adding more nodes on a single host machine. To install DSE Multi-Instance in
a tarball installation, unpack the tarball in multiple locations on a single host machine. Each tarball installation
becomes a DataStax Enterprise node on the host machine.


On the host machine, the DSE Multi-Instance root directory is /etc/default. This default location is not
configurable. The node type is defined in the /etc/default/dse-nodeId file.
DSE Multi-Instance is supported only for package installations.

These actions occur for each node that is added:

• The node configuration is modified according to the command arguments.

• A script is created so that the node can be started and stopped.

• The run levels are updated to the default values so that the node is started and stopped when the host
machine is booted or halted.

• The /etc/default/dse-nodeId file is created to set the default node type as a transactional node.

• With DSE Multi-Instance, when you run the dse command on a node in the host machine, the node
configuration is read from:

# Package installations: /etc/dse/serverconfig/dse-nodeId

# Tarball installations: the /etc/dse directory is the default configuration location in each location where
you installed DataStax Enterprise.

With DSE Multi-Instance, multiple DataStax Enterprise nodes reside on a single host machine. To segregate
the configuration for each DataStax Enterprise node, node-specific directory structures are used to store
configuration and operational files. For example, in addition to /etc/dse/dse.yaml, the DSE Multi-Instance
dse.yaml files are stored in /etc/dse-nodeId/dse.yaml locations. The server_id option is generated in the
DSE Multi-Instance /etc/dse-nodeId/dse.yaml files to uniquely identify the physical server on which
multiple instances are running.
Directories and their descriptions:

/etc/dse: /etc/dse/dse.yaml is the primary configuration file for DataStax Enterprise.

/etc/dse-node1: /etc/dse-node1/dse.yaml is the configuration file for the DataStax Enterprise node in the dse-node1 directory.

/etc/dse-node2: /etc/dse-node2/dse.yaml is the configuration file for the DataStax Enterprise node in the dse-node2 directory.

For DSE Multi-Instance nodes, two files control the configuration of the node. For example, for the node named
dse-node1:

• /etc/dse/serverconfig/dse-node1 specifies the directories for the configuration files.

• /etc/default/dse-node1 configures the node behavior, including the node type and the number
of retries for the DSE service to start.
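For example, a hypothetical /etc/default/dse-node1 fragment for a pure transactional node; the variable names follow the SOLR_ENABLED and SPARK_ENABLED convention shown in step 5 of the procedure below:

# Transactional node: leave the advanced workloads disabled
SOLR_ENABLED=0
SPARK_ENABLED=0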

For package installations, see directories for DSE Multi-Instance for a comprehensive list of file locations in a
DSE Multi-Instance cluster.

1. Verify that your existing DataStax Enterprise installation has the default node configuration in the /etc/
dse directory. The configuration files for the default node include /etc/dse/dse.yaml and /etc/dse/
cassandra/cassandra.yaml.

2. Give the default cluster a meaningful name. For example, change the default cluster named dse to payroll.

3. Verify that the node binds to working IP addresses.

4. Add DataStax Enterprise nodes to the DSE Multi-Instance cluster.


• For package installations, you can use the dse add-node command. For example, to add a node that
will join the cluster payroll on startup:

$ sudo dse add-node nodeId --cluster payroll --listen-address unused_ip_of_server --rpc-address unused_ip_of_server --seeds ip_of_default_node

• For tarball installations, extract the product.tar.gz file multiple times and configure nodes in each
location.

5. Before starting the new node, set the node type in the /etc/default/dse-nodeId file:

• DSE Search:

SOLR_ENABLED=1

• DSE Analytics:

SPARK_ENABLED=1

6. Continue configuring the node as appropriate.

a. To change default DataStax Enterprise configuration values, edit the configuration files in /etc/dse-nodeId.
Ensure that the JMX port is configured for each node. 7199 is the DSE JMX metrics monitoring
port. DataStax recommends allowing connections only from the local node. Configure SSL and
JMX authentication when allowing connections from other nodes.

b. To change default database configuration values, edit the /etc/dse-nodeId/cassandra/cassandra.yaml file.

See DSE Multi-Instance file locations.

7. After you make configuration changes, start the node.


If the following error appears, look for "DataStax Enterprise times out when starting" and other articles in
the Support Knowledge Center.

WARNING: Timed out while waiting for DSE to start.

8. Verify that the nodes are running and are part of the cluster.
For example, to verify the cluster status from a local node named dse-node1 on a DSE Multi-Instance
cluster:

$ sudo dse dse-node1 dsetool ring

With DSE Multi-Instance, the output includes the Server ID:

Server ID          Address     DC         Rack   Workload   Graph  Status  State   Load      Owns    VNodes                Health [0,1)
42-01-0A-F0-00-02  10.240.0.2  Cassandra  rack1  Cassandra  no     Up      Normal  92.13 KB  46.86%  -9223372036854775808  0.17
42-01-0A-F0-00-02  127.0.0.1   Cassandra  rack1  Cassandra  no     Up      Normal  150.6 KB  53.14%  579561378715200106

Using the standard dsetool ring command provides the status of the default node dse:

$ sudo dsetool ring

When a DSE Multi-Instance server is present in the cluster, the output always includes the Server ID
column, even when you run the command on a server that is a DSE Multi-Instance host machine:

Server ID          Address     DC         Rack   Workload   Graph  Status  State   Load      Owns    VNodes                Health [0,1)
42-01-0A-F0-00-02  10.240.0.2  Cassandra  rack1  Cassandra  no     Up      Normal  92.13 KB  46.86%  -9223372036854775808  0.17
42-01-0A-F0-00-02  127.0.0.1   Cassandra  rack1  Cassandra  no     Up      Normal  150.6 KB  53.14%  579561378715200106

9. To run standard DataStax Enterprise commands for nodes on a DSE Multi-Instance host machine, specify
the node name using this syntax:

sudo dse dse-nodeId subcommand [command_arguments]

The node ID that is specified with the add-node command is automatically prefixed with dse-. In all
instances except for add-node, the command syntax requires the dse- prefix.
For example, with DSE Multi-Instance, the command to start a Spark shell on a node named dse-spark-
node is:

$ sudo dse dse-spark-node spark

In contrast, the command to start a Spark shell without DSE Multi-Instance is:

$ dse spark

DSE Multi-Instance commands


Commands to configure and use multiple DataStax Enterprise nodes on a single host machine.
DSE Multi-Instance commands are supported only on package installations.

• $ sudo dse dse-nodeId subcommand [command_arguments]

$ sudo dsetool dse-nodeId command [command_arguments]

• For example:


To run the dsetool ring command on a node named dse-node1 in a cluster on a DSE Multi-Instance host
machine:

$ sudo dse dse-node1 dsetool ring

To run the dsetool ring command without DSE Multi-Instance:

$ sudo dsetool ring

• dse nodeId [command_arguments]

• dse add-node

• dse list-nodes

• dse remove-node

DSE Tiered Storage


DSE Tiered Storage is part of the multiple storage options offered in DataStax Enterprise for optimizing
performance and cost goals. DSE Tiered Storage automates the smart movement of data across different types
of storage media to improve performance and reduce manual processes. DSE Tiered Storage improves efficiency
of faster and more expensive media, and mitigates the performance impact that slower storage media has on your
most common queries. With DSE Tiered Storage, older data is moved to the slower storage media.
About DSE Tiered Storage
DSE Tiered Storage is beneficial for applications like social media applications that have a lot of time series data,
where recent data is accessed more frequently than older data. DSE Tiered Storage is appropriate when the
most frequently read data is also the most recently written data. Common use cases appropriate for DSE Tiered Storage are
described in the Improve Data Center Cost Efficiency with DSE Tiered Storage blog.
DSE Tiered Storage is not recommended:

• When entire data sets are accessed at approximately the same frequency.

• When the data access frequency is not correlated to data age.

• For use with DSE Search.

Features

Increases productivity: Automates the movement of data between storage media. Eliminates manually moving data.

Improves performance: Data is stored by age, so that frequently accessed data is stored on solid state drives (SSDs) for fastest performance.

Transparent data access: Access to the data in different storage tiers is transparent to users and developers.

Lowers storage costs: Improves datacenter cost efficiency. Automatically stores less frequently accessed historical data on slower, less expensive storage media, such as spinning disks.

Flexible configuration options: Different server configurations are easy to support and configure. Disk layout is configured per node, so you can test adjustments on single nodes before deploying cluster wide.

Compaction strategies: Uses the selected tiering strategy to compact based on partition age and automate moving data by row between storage media.


Performance metrics dashboard: Tiered storage performance metrics for DSE Tiered Storage are available in OpsCenter.

Configuring DSE Tiered Storage


Configuring the data movement between storage media takes place at the node level and the schema level:

• Configure the storage strategies to define storage locations, and the tiers that define the storage locations,
at the node level in the dse.yaml file.
Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.
Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations
are supported.
DataStax recommends local configuration testing before deploying cluster wide.

• Configure the age policy at the schema level.


The only supported data usage policy is partition age. Tier age thresholds are set when a table is created
with the compaction strategy TieredCompactionStrategy.

The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™
issues apply.

1. In the dse.yaml file on each node, uncomment the tiered_storage_options section.

2. For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory
locations for each tier.

a. Define storage tiers in priority order with the fastest storage media in the tier that is listed first.

b. For each tier, define the data directory locations.

Use this format, where config_name is the tiered storage strategy that you reference with the CREATE
TABLE or ALTER TABLE statements. The config_name must be the same across all nodes:

tiered_storage_options:
    config_name:
        tiers:
            - paths:
                - path_to_directory1
            - paths:
                - path_to_directory2

where:

• config_name is the configurable name of the tiered storage configuration strategy. For example:
strategy1.

• tiers is the section that defines the storage tiers, listed in priority order.

• paths is the section of file paths that define the data directories for this tier of the disk configuration.
Typically, list the fastest storage media first. These paths are used only to store data that is
configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml
file.

For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in
priority (the first tier listed has highest priority):

tiered_storage_options:
    strategy1:
        tiers:
            - paths:
                - /mnt1
                - /mnt2
            - paths:
                - /mnt3
                - /mnt4
            - paths:
                - /mnt5
                - /mnt6

3. To apply the tiered storage strategies to selected tables, use CREATE or ALTER table statements.
For example, to apply tiered storage to table ks.tbl:

CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
WITH
COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
'tiering_strategy': 'TimeWindowStorageStrategy',
'config': 'strategy1',
'max_tier_ages': '3600,7200'};

Control the tier timing with the compaction options:

• class
'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to
use tiered storage.

• tiering_strategy
'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS)
to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses
TimeWindowCompactionStrategy (TWCS).

• config
'config': 'strategy1' specifies to use the strategy that is configured in the dse.yaml file, in this case
strategy1.

• max_tier_ages
'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum
age per tier, in seconds, where:

# 3600 restricts the first tier to data that is aged an hour (3600 seconds) or less.

# 7200 restricts the second tier to data that aged two hours (7200 seconds) or less.

# All other data is routed to the data directory locations that are defined for the third tier.

For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for
each time age that is specified for max_tier_ages, plus another tier for older data. However,
DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml
file.

An implicit tier exists that represents the oldest data. For example, for a strategy with two tiers in
dse.yaml:

# 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 would be for data newer than 3600
seconds, tier 1 would be for data between 3600 seconds and 7200 seconds, and tier 2 would be
for data older than 7200 seconds.

# 'max_tier_ages': '3600' uses only the first two tiers.

# 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data
that did not belong in the first two tiers goes to the third tier, whether the data was older than
10800 seconds or not.


The CQL compaction subproperties for TWCS are also supported.
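Tiered storage can also be applied to an existing table. A hypothetical ALTER TABLE statement that moves an illustrative table ks.tbl2 onto the same strategy1 configuration:

ALTER TABLE ks.tbl2
WITH COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
'tiering_strategy': 'TimeWindowStorageStrategy',
'config': 'strategy1',
'max_tier_ages': '3600,7200'};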

Testing DSE Tiered Storage configurations


DataStax recommends local configuration testing before deploying cluster wide, or for storage configurations
that do not map cleanly to the database with CREATE or ALTER table statements. Test adjustments on one or
two nodes before deploying cluster wide. You can add local configuration options to overwrite the tiered storage
settings in the table schema. You cannot overwrite the class or the tiered storage configuration name.
Prerequisites: Complete DSE Tiered Storage configuration steps before you adjust and test the configurations.

1. To overwrite the settings in the table schema in the local dse.yaml file, add a local_options key to an
existing tiered storage configuration.
For example, for this dse.yaml configuration:

tiered_storage_options:
    strategy1:
        tiers:
            - paths:
                - /mnt1
            - paths:
                - /mnt2
            - paths:
                - /mnt3

And this existing table schema with 'max_tier_ages': '3600,7200':

CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
WITH COMPACTION={'class':'TieredCompactionStrategy',
'tiering_strategy': 'TimeWindowStorageStrategy',
'config': 'strategy1',
'max_tier_ages': '3600,7200'};

You can adjust the max_tier_ages value to 7200,10800 on a single node, by adding the local_options
key like this:

tiered_storage_options:
    strategy1:
        local_options:
            max_tier_ages: "7200, 10800"
        tiers:
            - paths:
                - /mnt1
            - paths:
                - /mnt2
            - paths:
                - /mnt3

2. Restart the node.


After the node starts, the tiered storage strategies that use strategy1 now use max_tier_ages values
"7200,10800", instead of "3600,7200" as configured on the table schema.

3. To monitor the tiered storage behavior of individual tables, use the dsetool tieredtablestats command:

$ dsetool tieredtablestats ks.tbl

ks.tbl
Tier 0:
Summary:


max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
size: 4839
SSTables:
/mnt2/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-3-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
rows: 1
size: 4839
Tier 1:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
size: 4839
SSTables:
/mnt3/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-4-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
rows: 1
size: 4839

Chapter 8. DataStax Enterprise tools

DataStax Enterprise Metrics Collector


Available in DSE 6.0.5 and later, DSE Metrics Collector aggregates DataStax Enterprise (DSE) metrics and
integrates with existing monitoring solutions to facilitate problem resolution and remediation.
DSE Metrics Collector is built on collectd, a popular, well-supported, open source metric collection agent. With
over 90 plugins, you can tailor the solution to collect metrics most important to your organization.
When DSE Metrics Collector is enabled, DSE sends metrics and other structured events to DSE Metrics Collector.
Use dsetool insights_config to enable and configure the frequency and type of metrics that are sent to DSE
Metrics Collector. After setting the configuration properties, you can export the aggregated metrics to monitoring
tools like Prometheus, Graphite, and Splunk, which can then be visualized in a dashboard such as Grafana.
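For example, a brief sketch of checking the current configuration and then enabling collection with dsetool; the mode name ENABLED_WITH_LOCAL_STORAGE is an assumption based on the insights_config options in DSE 6.0.5 and later, so verify it against your version:

$ dsetool insights_config --show_config
$ dsetool insights_config --mode ENABLED_WITH_LOCAL_STORAGE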

nodetool
About the nodetool utility
The nodetool utility is a command-line interface for monitoring a cluster and performing routine database
operations. It is typically run from an operational node.
The nodetool utility supports the most important JMX metrics and operations, and includes other useful
commands for cluster administration. Use nodetool commands to view detailed metrics for tables, server metrics,
and compaction statistics.
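For example, two commonly used metric commands; the keyspace and table names are illustrative:

$ nodetool tablestats cycling.comments
$ nodetool compactionstats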
nodetool abortrebuild
Aborts a currently running rebuild operation. Completes processing of active streams, but no new streams are
started.
Synopsis

$ nodetool [connection_options] abortrebuild [-r reason]

Table 81: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.


{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-r, --reason reason


Comment to add to the log file.
Examples

Stop a rebuild operation with a reason comment

$ nodetool abortrebuild -r 'stopping for quarterly maintenance'

nodetool assassinate
Forcefully removes a dead node without re-replicating any data. Use as a last resort when you cannot
successfully use nodetool removenode.


Synopsis

$ nodetool [connection_options] assassinate ip_address

Table 82: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


ip_address
IP address of the node.
Examples

Forcefully remove a node

$ nodetool -u user1 -pw password1 assassinate 192.168.100.2

The node at IP address 192.168.100.2 is forcefully removed. Data is not re-replicated.


nodetool bootstrap
Monitors and manages the bootstrap process on one or more nodes.
Synopsis

$ nodetool [connection_options] bootstrap [resume]

Table 83: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options


-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

resume
Resume an interrupted bootstrap operation.
Examples

Resume the local bootstrap operation

$ nodetool -u user1 -pw pwd1 bootstrap resume

nodetool cfhistograms
This tool has been renamed to nodetool tablehistograms.
nodetool cfstats
This tool has been renamed to nodetool tablestats.
nodetool cleanup
Triggers immediate cleanup of keyspaces that no longer belong to a node.
OpsCenter provides a Cleanup option in the Nodes UI for running cleanup.
DataStax Enterprise does not automatically remove data from nodes that lose part of their partition range to
a newly added node. Run nodetool cleanup on the source node and on neighboring nodes that shared the
same subrange after the new node is up and running. After adding a new node, run this command to prevent the
database from including the old data to rebalance the load on that node. This command temporarily increases
disk space use proportional to the size of the largest SSTable and causes Disk I/O to occur.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies including resurrection
of previously deleted data.

Synopsis

$ nodetool [connection_options] cleanup [-j num_jobs] [--] [keyspace_name table_name [table_name ...]]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs

• num_jobs - Number of SSTables affected simultaneously. Default: 2.

• 0 - Use all available compaction threads.

keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
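Examples

Clean up a keyspace after adding a node

A representative invocation; cycling is a sample keyspace, and -j 2 mirrors the default job count:

$ nodetool cleanup -j 2 -- cycling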


nodetool clearsnapshot
Removes one or all snapshots.

This command permanently deletes the snapshot (backup) copies stored on the node.

Synopsis

$ nodetool [connection_options] clearsnapshot [--all | -t snapshotname] [--] [keyspace_name]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.


-u, --username jmx_username


The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
--all
Removes all snapshots.
keyspace_name
Keyspace name. By default, all keyspaces.
-t snapshotname, --tag snapshotname
The snapshot name. To remove all snapshots, omit the snapshot name.
Examples

To delete all snapshots for a node

$ nodetool -h localhost -p 7199 clearsnapshot --all

To delete snapshot1

$ nodetool clearsnapshot -t snapshot1
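To delete snapshot1 from a single keyspace

The keyspace argument limits the removal to one keyspace; cycling is a sample keyspace:

$ nodetool clearsnapshot -t snapshot1 -- cycling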

nodetool compact
Forces a major compaction on one or more tables or user-defined compaction on given SSTables.
OpsCenter provides a Compact option in the Nodes UI for running compaction.
Major compactions may behave differently depending on which compaction strategy is used for the affected
tables (see the examples at the end of this section):

• SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers a


minor compaction when there are a number of similar sized SSTables on disk as configured by the table
subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace. Also see
STCS compaction subproperties.

• DateTieredCompactionStrategy (DTCS) (deprecated).

• TimeWindowCompactionStrategy (TWCS): This strategy is an alternative for time series data. TWCS
compacts SSTables using a series of time windows. Within each time window, TWCS compacts all
SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of
these SSTables are compacted into a single SSTable. Then the next time window starts and the process
repeats. The duration of the time window is the only setting required. See TWCS compaction subproperties.
For more information about TWCS, see How is data maintained?.

• LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed,


relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are
guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous.
Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously
being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping
SSTables in the next level. This process can improve performance for reads, because the database can
determine which SSTables in each level to check for the existence of row key data. This compaction
strategy is modeled after Google's LevelDB implementation. Also see LCS compaction subproperties.

See How is data maintained? and Configuring compaction.


A major compaction incurs considerably more disk I/O than minor compactions.

Synopsis

$ nodetool [connection_options] compact [-et end_token] [-s] [-st start_token] [--user-defined] [--] [keyspace_name table_name [table_name ...] | sstable_name ...]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.


Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-et, --end-token end_token
The token at which the range ends. Requires start token (-st).
keyspace_name
Keyspace name. By default, all keyspaces.
-s, --split-output
Split the output into multiple files instead of creating a single large file when using
SizeTieredCompactionStrategy (STCS); the resulting files are 50%, 25%, 12.5%, and so on of the total size.
Ignored for DTCS.
sstable_name
The name of the SSTable file. Specify sstable_name or sstable_directory.
-st, --start-token start_token
The token at which the range starts. Requires end token (-et).
table_name
The table name.
--user-defined
Submits listed files for user-defined compaction.
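Examples

These invocations are illustrative sketches; cycling and cyclist_name are the sample keyspace and table used elsewhere in this guide.

Force a major compaction on a single table

$ nodetool compact cycling cyclist_name

Split the output on an STCS table

Rather than writing one large SSTable, -s produces progressively smaller files as described above:

$ nodetool compact -s cycling cyclist_name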
nodetool compactionhistory
Prints the history of compaction.
Synopsis

$ nodetool [connection_options] compactionhistory [-F (json | yaml)]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-F, --format json | yaml


The format for the output. The default is plain text. The following wait latencies (in ms) are included in
the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max.
Examples

To view the compaction history

$ nodetool compactionhistory

The output of compaction history is seven columns wide. The first three
columns show the id, keyspace name, and table name of the compacted
SSTable.

Compaction History:
id                                   keyspace_name table_name
d06f7080-07a5-11e4-9b36-abc3a0ec9088 system        schema_columnfamilies
d198ae40-07a5-11e4-9b36-abc3a0ec9088 libdata       users
0381bc30-07b0-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
74eb69b0-0621-11e4-9b36-abc3a0ec9088 system        local
e35dd980-07ae-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
8d5cf160-07ae-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
ba376020-07af-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
d18cc760-07a5-11e4-9b36-abc3a0ec9088 libdata       libout
64009bf0-07a4-11e4-9b36-abc3a0ec9088 libdata       libout
d04700f0-07a5-11e4-9b36-abc3a0ec9088 system        sstable_activity
c2a97370-07a9-11e4-9b36-abc3a0ec9088 libdata       users
cb928a80-07ae-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
cd8d1540-079e-11e4-9b36-abc3a0ec9088 system        schema_columns
62ced2b0-07a4-11e4-9b36-abc3a0ec9088 system        schema_keyspaces
d19cccf0-07a5-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
640bbf80-07a4-11e4-9b36-abc3a0ec9088 libdata       users
6cd54e60-07ae-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
c29241f0-07a9-11e4-9b36-abc3a0ec9088 libdata       libout
c2a30ad0-07a9-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
e3a6d920-079d-11e4-9b36-abc3a0ec9088 system        schema_keyspaces
62c55cd0-07a4-11e4-9b36-abc3a0ec9088 system        schema_columnfamilies
62b07540-07a4-11e4-9b36-abc3a0ec9088 system        schema_columns
cdd038c0-079e-11e4-9b36-abc3a0ec9088 system        schema_keyspaces
b797af00-07af-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
8c918b10-07ae-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
377d73f0-07ae-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
62b9c410-07a4-11e4-9b36-abc3a0ec9088 system        local
d0566a40-07a5-11e4-9b36-abc3a0ec9088 system        schema_columns
ba637930-07af-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
cdbc1480-079e-11e4-9b36-abc3a0ec9088 system        schema_columnfamilies
e3456f80-07ae-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1
d086f020-07a5-11e4-9b36-abc3a0ec9088 system        schema_keyspaces
d06118a0-07a5-11e4-9b36-abc3a0ec9088 system        local
cdaafd80-079e-11e4-9b36-abc3a0ec9088 system        local
640fde30-07a4-11e4-9b36-abc3a0ec9088 system        compactions_in_progress
37638350-07ae-11e4-9b36-abc3a0ec9088 Keyspace1     Standard1

The four columns to the right of the table name show the timestamp, the size of the SSTable before and
after compaction, and the number of partitions merged. The rows_merged notation is {tables:rows}. For
example, {1:3, 3:1} means 3 rows were taken from one SSTable (1:3) and 1 row was taken from 3 SSTables
(3:1) to make the single SSTable produced by that compaction operation.

. . . compacted_at   bytes_in   bytes_out  rows_merged
. . . 1404936947592  8096       7211       {1:3, 3:1}
. . . 1404936949540  144        144        {1:1}
. . . 1404941328243  1305838191 1305838191 {1:4647111}
. . . 1404770149323  5864       5701       {4:1}
. . . 1404940844824  573        148        {1:1, 2:2}
. . . 1404940700534  576        155        {1:1, 2:2}
. . . 1404941205282  766331398  766331398  {1:2727158}
. . . 1404936949462  8901649    8901649    {1:9315}
. . . 1404936336175  8900821    8900821    {1:9315}
. . . 1404936947327  223        108        {1:3, 2:1}
. . . 1404938642471  144        144        {1:1}
. . . 1404940804904  383020422  383020422  {1:1363062}
. . . 1404933936276  4889       4177       {1:4}
. . . 1404936334171  441        281        {1:3, 2:1}
. . . 1404936949567  379        79         {2:2}
. . . 1404936336248  144        144        {1:1}
. . . 1404940645958  307520780  307520780  {1:1094380}
. . . 1404938642319  8901649    8901649    {1:9315}
. . . 1404938642429  416        165        {1:3, 2:1}
. . . 1404933543858  692        281        {1:3, 2:1}
. . . 1404936334109  7760       7186       {1:3, 2:1}
. . . 1404936333972  4860       4724       {1:2, 2:1}
. . . 1404933936715  441        281        {1:3, 2:1}
. . . 1404941200880  1269180898 1003196133 {1:2623528, 2:946565}
. . . 1404940699201  297639696  297639696  {1:1059216}
. . . 1404940556463  592        148        {1:2, 2:2}
. . . 1404936334033  5760       5680       {2:1}
. . . 1404936947428  8413       5316       {1:2, 3:1}
. . . 1404941205571  429        42         {2:2}
. . . 1404933936584  7994       6789       {1:4}
. . . 1404940844664  306699417  306699417  {1:1091457}
. . . 1404936947746  601        281        {1:3, 3:1}
. . . 1404936947498  5840       5680       {3:1}
. . . 1404933936472  5861       5680       {3:1}
. . . 1404936336275  378        80         {2:2}
. . . 1404940556293  302170540  281000000  {1:924660, 2:75340}
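To emit the history in a machine-readable format instead of plain text, use the -F option described above:

$ nodetool compactionhistory -F json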

nodetool compactionstats
Prints statistics about compactions.
Synopsis

$ nodetool [connection_options] compactionstats [-H]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.


-pwf, --password-file jmx_password_filepath


The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
Examples

The total column shows the total number of uncompressed bytes of SSTables
being compacted. The system log lists the names of the SSTables compacted.

$ nodetool compactionstats

pending tasks: 5
compaction type  keyspace   table      completed  total      unit   progress
Compaction       Keyspace1  Standard1  282310680  302170540  bytes  93.43%
Compaction       Keyspace1  Standard1  58457931   307520780  bytes  19.01%
Active compaction remaining time : 0h00m16s
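To display the same statistics with sizes in human-readable units:

$ nodetool compactionstats -H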

nodetool decommission
Causes a live node to decommission itself, streaming its data to the next node on the ring to replicate
appropriately.
When decommissioning a DSEFS node, you must unmount DSEFS before removing that node.
See Decommissioning a datacenter, Removing a node, and Adding a node and then decommissioning the old
node.
Use nodetool netstats to monitor the progress.

OpsCenter provides an option to Decommission a node.

Decommission does not shut down the node. Shut down the node after decommission is complete.

Synopsis

$ nodetool [connection_options] decommission [-f]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-f, --force
Force decommission of the node even when doing so reduces the number of replicas below the configured
replication factor (RF).
Examples

Decommission a remote node

$ nodetool -h 10.46.123.12 decommission
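Force decommission of a node

The -f flag proceeds even when streaming would leave ranges with fewer replicas than the configured replication factor, so use it with care:

$ nodetool decommission -f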

nodetool describecluster
Prints the name, snitch, partitioner and schema version of a cluster.


Typically used to validate the schema after upgrading. If the output shows more than one schema version,
check for and resolve the schema disagreement.
Synopsis

$ nodetool [connection_options] describecluster [datacenter_name]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.


Command arguments

datacenter_name
The datacenter name.
Example
Get cluster name, snitch, partitioner and schema version

$ nodetool describecluster

Cluster Information:
Name: Test Cluster
Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
d4f18346-f81f-3786-aed4-40e03558b299: [127.0.0.1]

Check for schema disagreement

$ nodetool describecluster

When schema disagreement occurs, the last line of the output includes information about unreachable nodes:

Cluster Information:
Name: Production Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: 1176b7ac-8993-395d-85fd-41b89ef49fbb: [10.202.205.203]

nodetool describering
Shows the token ranges.
Synopsis

$ nodetool [connection_options] describering [--] [keyspace_name [keyspace_name ...]]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
Examples

Get token range information on the cycling keyspace

$ nodetool describering cycling

Schema Version:1b04bd14-0324-3fc8-8bcb-9256d1e15f82
Keyspace: cycling
TokenRange:
TokenRange(start_token:3074457345618258602,
end_token:-9223372036854775808,
endpoints:[127.0.0.1, 127.0.0.2, 127.0.0.3],
rpc_endpoints:[127.0.0.1, 127.0.0.2, 127.0.0.3],


endpoint_details:[EndpointDetails(host:127.0.0.1, datacenter:datacenter1,
rack:rack1),
EndpointDetails(host:127.0.0.2, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.3, datacenter:datacenter1, rack:rack1)])
TokenRange(start_token:-3074457345618258603,
end_token:3074457345618258602,
endpoints:[127.0.0.3, 127.0.0.1, 127.0.0.2],
rpc_endpoints:[127.0.0.3, 127.0.0.1, 127.0.0.2],
endpoint_details:[EndpointDetails(host:127.0.0.3,
datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.1, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.2, datacenter:datacenter1, rack:rack1)])
TokenRange(start_token:-9223372036854775808,
end_token:-3074457345618258603,
endpoints:[127.0.0.2, 127.0.0.3, 127.0.0.1],
rpc_endpoints:[127.0.0.2, 127.0.0.3, 127.0.0.1],
endpoint_details:[EndpointDetails(host:127.0.0.2, datacenter:datacenter1,
rack:rack1),
EndpointDetails(host:127.0.0.3, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.1, datacenter:datacenter1, rack:rack1)])
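Because the keyspace argument is repeatable, one call can return token ranges for several keyspaces (system is a built-in keyspace):

$ nodetool describering cycling system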

nodetool disableautocompaction
Disables autocompaction for a keyspace and one or more tables for the current node or the specified node.
Synopsis

$ nodetool [connection_options] disableautocompaction [--] [keyspace_name table_name [table_name ...]]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by spaces.
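Examples

Disable autocompaction on cyclist_name table in cycling keyspace

A representative invocation against the sample schema, mirroring the enableautocompaction example later in this guide:

$ nodetool disableautocompaction cycling cyclist_name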
nodetool disablebackup
Disables incremental backup.
Synopsis

$ nodetool [connection_options] disablebackup


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Disable incremental backup

$ nodetool disablebackup

nodetool disablebinary
Disables the native transport, the binary protocol that client drivers use to connect to the node.


Synopsis

$ nodetool [connection_options] disablebinary


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


This command takes no arguments.


Examples

Disable native transport

$ nodetool disablebinary

nodetool disablegossip
Disables the gossip protocol, which effectively marks the node as down.
Synopsis

$ nodetool [connection_options] disablegossip


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.


-p, --port jmx_port


The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Disable gossip

$ nodetool disablegossip

nodetool disablehandoff
Disables storing of future hints.
Synopsis

$ nodetool [connection_options] disablehandoff


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Disable storing future hints

$ nodetool disablehandoff

nodetool disablehintsfordc
Turns off hints for a datacenter while continuing to store hints for other datacenters.
Useful when a datacenter is down, or during datacenter failover, when storing hints would put unnecessary
pressure on that datacenter.
Synopsis

$ nodetool [connection_options] disablehintsfordc [--] datacenter_name


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
datacenter_name
The datacenter name.


Examples

Turn off hints for specific datacenter

$ nodetool -u joe -pw P@ssw0rd! disablehintsfordc DC2
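To resume storing hints for that datacenter later, use the companion command nodetool enablehintsfordc:

$ nodetool enablehintsfordc DC2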

nodetool drain
Flushes all memtables from the node to SSTables on disk. DSE stops listening for connections from clients
and other nodes, so you must restart DSE after running nodetool drain. Typically, use this command before
upgrading a node to a new version of DSE.
To simply flush memtables to disk, use nodetool flush.
OpsCenter provides an option for Draining a node.
Synopsis

$ nodetool [connection_options] drain


Definition
The short form and long form parameters are comma-separated.

Connection options


-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Flush memtables from node to SSTables on disk

$ nodetool drain

nodetool enableautocompaction
Enables autocompaction for a keyspace and one or more tables, or all tables.
Synopsis

$ nodetool [connection_options] enableautocompaction [--] [keyspace_name table_name [table_name ...]]


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
Examples

Enable autocompaction on cyclist_name table in cycling keyspace

$ nodetool enableautocompaction cycling cyclist_name
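To re-enable autocompaction for every table in the keyspace, omit the table name:

$ nodetool enableautocompaction cycling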

nodetool enablebackup
Enables incremental backup.


Synopsis

$ nodetool [connection_options] enablebackup


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Enable incremental backup

$ nodetool enablebackup
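
To confirm the result, the companion nodetool statusbackup command reports whether incremental backup is running; the output shown is illustrative:

$ nodetool statusbackup

running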

nodetool enablebinary
Re-enables the native transport that defines the format of the binary messages.
Synopsis

$ nodetool [connection_options] enablebinary

Table 101: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.

-p, --port jmx_port


The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Re-enable native transport

$ nodetool enablebinary
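
To verify that the native transport is running, use the companion nodetool statusbinary command; the output shown is illustrative:

$ nodetool statusbinary

running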

nodetool enablegossip
Re-enables gossip.
Synopsis

$ nodetool [connection_options] enablegossip

Table 102: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Re-enable gossip

$ nodetool enablegossip
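
To verify that gossip is running, use the companion nodetool statusgossip command; the output shown is illustrative:

$ nodetool statusgossip

running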

nodetool enablehandoff
Re-enables storage of future hints on the current node.
Synopsis

$ nodetool [connection_options] enablehandoff

Table 103: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Re-enable future hints storage on the current node

$ nodetool enablehandoff
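
To verify the result, use the companion nodetool statushandoff command; the output shown is illustrative:

$ nodetool statushandoff

Hinted handoff is running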

nodetool enablehintsfordc
Turns hints back on for a datacenter where they were previously disabled with nodetool disablehintsfordc.

Synopsis

$ nodetool [connection_options] enablehintsfordc [--] datacenter_name

Table 104: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
datacenter_name
The datacenter name.
Examples

Turn on hints for DC2

$ nodetool -u elsa -pw P@ssw0rd! enablehintsfordc DC2
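
Hints for a datacenter are typically re-enabled after the condition that prompted disabling them is resolved; a round trip might look like the following, where DC2 is an illustrative datacenter name:

$ nodetool disablehintsfordc DC2
$ nodetool enablehintsfordc DC2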

nodetool failuredetector
Shows the failure detector information for the cluster.
Synopsis

$ nodetool [connection_options] failuredetector

Table 105: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Show failure detection information for cluster

$ nodetool failuredetector

Endpoint, Phi

nodetool flush
Flushes one or more tables from the memtable to SSTables on disk.
OpsCenter provides a flush option in the Nodes UI for flushing tables.
Synopsis

$ nodetool [connection_options] flush [--] [keyspace_name table_name [table_name ...]]

Table 106: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
Examples

Flush the cyclist_name table in the cycling keyspace

$ nodetool flush cycling cyclist_name
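
Because the keyspace and table arguments are optional (see the synopsis above), omitting them widens the scope. For example, to flush every table in the cycling keyspace, and then all keyspaces on the node:

$ nodetool flush cycling
$ nodetool flush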

nodetool garbagecollect
Removes deleted data from one or more tables.

The nodetool garbagecollect command is not the same as the Perform GC option in OpsCenter.

Synopsis

$ nodetool [connection_options] garbagecollect [-g ROW|CELL] [-j job_threads] [--]


[keyspace_name table_name [table_name ...]]

Table 107: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-g, --granularity ROW|CELL
ROW (default) removes deleted partitions and rows.
CELL also removes overwritten or deleted cells.
-j, --jobs num_jobs

• num_jobs - Number of SSTables affected simultaneously. Default: 2.

• 0 - Use all available compaction threads.

keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples

To remove deleted data from all tables and keyspaces at the default
granularity

$ nodetool garbagecollect

To remove deleted data from all tables and keyspaces, including overwritten
or deleted cells

$ nodetool garbagecollect -g CELL
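
To limit the operation to a single table while using all available compaction threads (the -j 0 setting described above), a command along the following lines can be used; the cycling.cyclist_name table is illustrative:

$ nodetool garbagecollect -j 0 cycling cyclist_name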

nodetool gcstats
Prints garbage collection (GC) statistics based on all garbage collection that has run since the last time this
command was run. The statistics identify the interval, the maximum, total, and standard deviation of GC elapsed
time, the disk space reclaimed in megabytes (MB), the number of garbage collections, and direct memory bytes.
Synopsis

$ nodetool [connection_options] gcstats

Table 108: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To print garbage collection statistics

$ nodetool gcstats

Result: the garbage collection statistics since the last time the command was run are returned:

Interval (ms)  Max GC Elapsed (ms)  Total GC Elapsed (ms)  Stdev GC Elapsed (ms)  GC Reclaimed (MB)  Collections  Direct Memory Bytes
1890311355     113                  89238                  7                      5699826457288      2267         -1

nodetool getbatchlogreplaythrottle
Prints the batchlog replay throttle in KB/s. This throttle applies when the batchlog replays hints, and it is
reduced proportionally to the number of nodes in the cluster.
Synopsis

$ nodetool [connection_options] getbatchlogreplaythrottle

Table 109: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.

-pwf, --password-file jmx_password_filepath


The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Print the batchlog replay throttle in KB/s

$ nodetool getbatchlogreplaythrottle

Batchlog replay throttle: 1024 KB/s
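
The throttle can be adjusted at runtime with the companion nodetool setbatchlogreplaythrottle command, which takes a value in KB/s; the 2048 value below is illustrative:

$ nodetool setbatchlogreplaythrottle 2048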

nodetool getcachecapacity
Gets the global key, row, and counter cache capacities in megabytes.
Synopsis

$ nodetool [connection_options] getcachecapacity

Table 110: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To get the global key, row cache, and counter cache capacities:

$ nodetool getcachecapacity

Key cache capacity: 100 MB
Row cache capacity: 0 MB
Counter cache capacity: 50 MB

A value of 0 means that the cache is disabled.
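
The capacities can be changed at runtime with the companion nodetool setcachecapacity command, which takes the key, row, and counter cache capacities in megabytes; the values below are illustrative:

$ nodetool setcachecapacity 100 0 50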


nodetool getcachekeystosave
Gets the global number of keys saved by counter cache, key cache, and row cache.
Synopsis

$ nodetool [connection_options] getcachekeystosave

Table 111: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To get the global number of keys saved by each cache:

$ nodetool getcachekeystosave

Key cache keys to save: 2147483647
Row cache keys to save: 2147483647
Counter cache keys to save: 2147483647
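
These limits can be changed with the companion nodetool setcachekeystosave command, which takes the key, row, and counter cache keys-to-save counts; the values below are illustrative:

$ nodetool setcachekeystosave 100 0 50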

nodetool getcompactionthreshold
Prints the minimum and maximum compaction thresholds (the number of SSTables that triggers a minor compaction) for a given table.
Synopsis

$ nodetool [connection_options] getcompactionthreshold [--] keyspace_name table_name

Table 112: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath

The filepath to the file that stores JMX authentication credentials.


-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
The table name.
Examples

Print compaction thresholds

$ nodetool getcompactionthreshold cycling birthday_list

Current compaction thresholds for cycling/birthday_list:
min = 4, max = 32
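
The thresholds can be changed with the companion nodetool setcompactionthreshold command, which takes the keyspace, table, and minimum and maximum thresholds; the values below (the defaults) are illustrative:

$ nodetool setcompactionthreshold cycling birthday_list 4 32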

nodetool getcompactionthroughput
Prints current compaction throughput in megabytes (MBs) per second.
Synopsis

$ nodetool [connection_options] getcompactionthroughput

Table 113: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To print current compaction throughput for the system

$ nodetool -u username -pw password getcompactionthroughput

Current compaction throughput: 16 MB/s
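
The throughput can be adjusted with the companion nodetool setcompactionthroughput command, which takes a value in MB/s (0 disables throttling); the 64 value below is illustrative:

$ nodetool setcompactionthroughput 64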

nodetool getconcurrentcompactors
Gets the number of concurrent compactors in the system.
Synopsis

$ nodetool [connection_options] getconcurrentcompactors

Table 114: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.

Examples

To get number of concurrent compactors

$ nodetool -u joe -pw P@ssw0rd! getconcurrentcompactors

Current concurrent compactors in the system is:
2
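
The setting can be changed at runtime with the companion nodetool setconcurrentcompactors command; the value below is illustrative:

$ nodetool setconcurrentcompactors 4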

nodetool getconcurrentviewbuilders
Displays the number of concurrent materialized view builders in the system.
Synopsis

$ nodetool [connection_options] getconcurrentviewbuilders

Table 115: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Display the number of concurrent materialized view builders

$ nodetool getconcurrentviewbuilders

Current number of concurrent view builders in the system is:
6
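
The setting can be changed at runtime with the companion nodetool setconcurrentviewbuilders command; the value below is illustrative:

$ nodetool setconcurrentviewbuilders 2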

nodetool getendpoints
Prints the endpoints that own the partition key.
Synopsis

$ nodetool [connection_options] getendpoints [--] keyspace_name table_name partition_key

Table 116: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
partition_key
The partition key for which to print the owning endpoints.
keyspace_name
The keyspace name.
table_name
The table name.
Examples

Print endpoints that own partition key

For example, which nodes own partition key_1, key_2, and key_3?

The partitioner returns a token for the key. DSE will return an endpoint regardless of whether data exists on
the identified node for that token.

$ nodetool -h 127.0.0.1 -p 7100 getendpoints myks mytable key_1

127.0.0.2

$ nodetool -h 127.0.0.1 -p 7100 getendpoints myks mytable key_2

127.0.0.2

For example, consider the following table in the cycling keyspace, which uses a composite partition key of
race_year and race_name.

CREATE TABLE cycling.rank_by_year_and_name (
  race_year int,
  race_name text,
  rank int,
  cyclist_name text,
  PRIMARY KEY ((race_year, race_name), rank)
) WITH CLUSTERING ORDER BY (rank ASC);

INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Benjamin PRADES', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Adam PHELAN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2015, 'Tour of Japan - Stage 4 - Minami > Shinshu', 'Thomas LEBAS', 3);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Ilnur ZAKARIN', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2015, 'Giro d''Italia - Stage 11 - Forli > Imola', 'Carlos BETANCUR', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2014, '4th Tour of Beijing', 'Phillippe GILBERT', 1);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2014, '4th Tour of Beijing', 'Daniel MARTIN', 2);
INSERT INTO cycling.rank_by_year_and_name (race_year, race_name, cyclist_name, rank)
  VALUES (2014, '4th Tour of Beijing', 'Johan Esteban CHAVES', 3);

Given the previous information that was inserted into the table, run nodetool getendpoints and enter a value
from the partition key. For example:

$ nodetool getendpoints cycling rank_by_year_and_name "2014"

10.255.100.150

The resulting output is the IP address of the replica that owns the partition key.

To specify values that comprise the full partition key

$ nodetool getendpoints cycling rank_by_year_and_name "2014:4th Tour of Beijing"

10.255.100.150

nodetool gethintedhandoffthrottlekb
Gets hinted handoff throttle in KB/sec per delivery thread.
Synopsis

$ nodetool [connection_options] gethintedhandoffthrottlekb

Table 117: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.

-p, --port jmx_port


The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get the hinted handoff throttle

$ nodetool gethintedhandoffthrottlekb

Hinted handoff throttle per delivery thread: 1024 KB
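
The throttle can be adjusted with the companion nodetool sethintedhandoffthrottlekb command, which takes a value in KB per second per delivery thread; the 2048 value below is illustrative:

$ nodetool sethintedhandoffthrottlekb 2048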

nodetool getinterdcstreamthroughput
Prints the outbound throttle (throughput cap) for all streaming file transfers between datacenters.
Synopsis

$ nodetool [connection_options] getinterdcstreamthroughput

Table 118: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Print the outbound throttle (throughput cap) for streaming file transfers
between datacenters

$ nodetool getinterdcstreamthroughput

The result shows the default of 200 megabits per second (Mb/s):

Current inter-datacenter stream throughput: 200 Mb/s
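
The cap can be adjusted with the companion nodetool setinterdcstreamthroughput command, which takes a value in megabits per second (0 disables throttling); the 400 value below is illustrative:

$ nodetool setinterdcstreamthroughput 400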

nodetool getlogginglevels
Gets the runtime logging levels.

To change logging levels, use nodetool setlogginglevel. See Configuring logging.

Synopsis

$ nodetool [connection_options] getlogginglevels

Table 119: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get runtime logging levels

$ nodetool getlogginglevels

Logger Name Log Level
ROOT INFO
DroppedAuditEventLogger INFO
SLF4JAuditWriter INFO
com.cryptsoft OFF
com.datastax.bdp.db DEBUG
com.datastax.bdp.search.solr.metrics.SolrMetricsEventListener DEBUG
com.datastax.bdp.util.process.InternalServiceRunner DEBUG
com.datastax.bdp.util.process.ServiceRunner DEBUG
com.datastax.driver.core.NettyUtil ERROR
org.apache.cassandra DEBUG
org.apache.lucene.index INFO
org.apache.solr.core.CassandraSolrConfig WARN
org.apache.solr.core.RequestHandlers WARN
org.apache.solr.core.SolrCore WARN
org.apache.solr.handler.component WARN
org.apache.solr.search.SolrIndexSearcher WARN
org.apache.solr.update WARN
org.apache.spark.rpc ERROR
org.apache.spark.util.logging.FileAppender OFF
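
As noted above, individual logging levels are changed with nodetool setlogginglevel; for example, to set org.apache.cassandra to TRACE (an illustrative class qualifier and level):

$ nodetool setlogginglevel org.apache.cassandra TRACE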

nodetool getmaxhintwindow
Prints the maximum time that the database generates hints for an unresponsive node.
Synopsis

$ nodetool [connection_options] getmaxhintwindow

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get maximum hint window

$ nodetool getmaxhintwindow

Result: the maximum time that the database generates hints for an unresponsive node is 10800000 milliseconds
(3 hours).

Current max hint window: 10800000 ms
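
To change the window, use nodetool setmaxhintwindow with the new value in milliseconds; for example, to reduce it to one hour (value illustrative):

$ nodetool setmaxhintwindow 3600000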

nodetool getseeds
Gets the IP list of the current seed nodes.

Synopsis

$ nodetool [connection_options] getseeds

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To get IP list of current seed node

$ nodetool getseeds

Current list of seed node IPs excluding the current node IP: /10.100.15.1
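
If the seed provider configuration changes, the in-use seed list can be refreshed without a restart using nodetool reloadseeds, as sketched here:

$ nodetool reloadseeds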

nodetool getsstables
Prints the SSTable that owns the partition key.
Synopsis

$ nodetool [connection_options] getsstables [-hf] [--] keyspace_name table_name partition_key

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for a option.
-hf, --hex-format
Specify the key in hexadecimal string format.
partition_key
The partition key for which to find the owning SSTables.
keyspace_name
The keyspace name.
table_name
The table name.

Examples

Get the SSTable that owns the given partition key

$ nodetool getsstables cycling cyclist_name fb372533-eb95-4bb4-8685-6ef61e994caa

The result is:

/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f56fd0aa24a/aa-3-bti-Data.db

The hex string representation of the partition key is useful to resolve errors. For example, find out which SSTable
owns the faulty partition key for this exception:

java.lang.AssertionError: row DecoratedKey(2769066505137675224,
00040000002e00000800000153441a3ef000) received out of order wrt
DecoratedKey(2774747040849866654, 00040000019b0000080000015348847eb200)

When the primary key of the given table is a blob, get the DecoratedKey from the hexadecimal representation of
the partition key:

$ nodetool getsstables -hf cycling stats 00040000002e00000800000153441a3ef000

/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f5cd5459987/aa-2-bti-Data.db

Get the SSTables by specifying the full primary key

$ nodetool getsstables cycling rank_by_year_and_name "2014:4th Tour of Beijing"

/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f5cd5459987/aa-2-bti-Data.db

nodetool getstreamthroughput
Gets the throughput throttle for streaming file transfers.
Synopsis

$ nodetool [connection_options] getstreamthroughput

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

To get the throughput throttle in megabits per second

$ nodetool getstreamthroughput

Current stream throughput: 200 Mb/s
Current streaming connections per host: 1
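
To adjust the cap, use nodetool setstreamthroughput with the new limit in megabits per second, or 0 to disable throttling; a sketch (value illustrative):

$ nodetool setstreamthroughput 400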

nodetool gettimeout
Prints the current timeout values in milliseconds.

To change the timeout, use nodetool settimeout.

Synopsis

$ nodetool [connection_options] gettimeout [--] [timeout_type]

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for a option.
timeout_type
The timeout type: read, range, write, counterwrite, cascontention, truncate, streamingsocket, or misc
(general rpc_timeout_in_ms).
Examples

To get timeouts for all types

$ nodetool gettimeout

Current timeout for type read: 5000 ms
Current timeout for type range: 10000 ms
Current timeout for type write: 2000 ms
Current timeout for type counterwrite: 5000 ms
Current timeout for type cascontention: 1000 ms
Current timeout for type truncate: 60000 ms
Current timeout for type misc: 10000 ms

To get timeout for read requests

$ nodetool gettimeout read

Current timeout for type read: 5000 ms
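
To change a value, use nodetool settimeout with the timeout type and the new value in milliseconds, or 0 to disable the timeout; a sketch with illustrative values:

$ nodetool settimeout read 10000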

nodetool gettraceprobability
Prints the current trace probability value.

To set the trace probability, see nodetool settraceprobability.

Synopsis

$ nodetool [connection_options] gettraceprobability

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.

Examples

Print current trace probability value

$ nodetool gettraceprobability

Current trace probability: 0.10
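
To change the probability, use nodetool settraceprobability with a value between 0 (tracing disabled) and 1 (trace all requests); for example, to trace roughly 10% of requests:

$ nodetool settraceprobability 0.1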

nodetool gossipinfo
Shows the gossip information that is broadcast between nodes in a cluster.
Synopsis

$ nodetool [connection_options] gossipinfo

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Show gossip information for cluster

$ nodetool gossipinfo

localhost/127.0.0.1
generation:1532896921
heartbeat:2038494
STATUS:1611484:NORMAL,8242717283351148695
LOAD:2038483:262546.0
SCHEMA:975284:d4f18346-f81f-3786-aed4-40e03558b299
DC:26:Search
RACK:18:rack1
RELEASE_VERSION:4:4.0.0.602
NATIVE_TRANSPORT_ADDRESS:3:127.0.0.1
X_11_PADDING:11503:
{"dse_version":"6.0.2","workloads":"SearchGraphCassandraAnalytics","workload":"SearchAnalytics","active":"true
A6-6F","graph":true,"health":0.9}
NET_VERSION:1:256
HOST_ID:2:3b8e8192-c1d3-4b01-a792-9673b4e377c1
NATIVE_TRANSPORT_READY:121:true
NATIVE_TRANSPORT_PORT:6:9042
NATIVE_TRANSPORT_PORT_SSL:7:9042
STORAGE_PORT:8:7000
STORAGE_PORT_SSL:9:7001
JMX_PORT:10:7199
TOKENS:1611483:<hidden>

nodetool handoffwindow
Prints current hinted handoff window.

Synopsis

$ nodetool [connection_options] handoffwindow

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Print current hinted handoff window

$ nodetool handoffwindow

The maximum time that the database generates hints for an unresponsive node is 10800000 ms (3 hours).

Hinted handoff window is 10800000
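
Hint delivery itself can be paused and resumed independently of this window, as sketched below:

$ nodetool pausehandoff
$ nodetool resumehandoff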

nodetool help
Provides a synopsis and brief description of each nodetool command.
Synopsis

$ nodetool [connection_options] help [command_name]

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

command_name
Name of nodetool command.
Examples

Print list and brief description of all nodetool commands

$ nodetool help

The most commonly used nodetool commands are:


abortrebuild Abort a currently running rebuild operation. Currently
active streams will finish but no new streams will be started.
assassinate Forcefully remove a dead node without re-replicating any
data. Use as a last resort if you cannot removenode
bootstrap Monitor/manage node's bootstrap process
cleanup Triggers the immediate cleanup of keys no longer
belonging to a node. By default, clean all keyspaces
clearsnapshot Remove the snapshot with the given name from the given
keyspaces. If no snapshotName is specified we will remove all snapshots
compact Force a (major) compaction on one or more tables or user-
defined compaction on given SSTables
compactionhistory Print history of compaction
compactionstats Print statistics on compactions
decommission Decommission the *node I am connecting to*
describecluster Print the name, snitch, partitioner and schema version of
a cluster
describering Shows the token ranges info of a given keyspace
disableautocompaction Disable autocompaction for the given keyspace and table
disablebackup Disable incremental backup
disablebinary Disable native transport (binary protocol)
disablegossip Disable gossip (effectively marking the node down)
disablehandoff Disable storing hinted handoffs
disablehintsfordc Disable hints for a data center
drain Drain the node (stop accepting writes and flush all
tables)
enableautocompaction Enable autocompaction for the given keyspace and table
enablebackup Enable incremental backup
enablebinary Reenable native transport (binary protocol)
enablegossip Reenable gossip
enablehandoff Reenable future hints storing on the current node
enablehintsfordc Enable hints for a data center that was previously
disabled
failuredetector Shows the failure detector information for the cluster
flush Flush one or more tables
garbagecollect Remove deleted data from one or more tables
gcstats Print GC Statistics
getbatchlogreplaythrottle Print batchlog replay throttle in KB/s. This is reduced
proportionally to the number of nodes in the cluster.
getcompactionthreshold Print min and max compaction thresholds for a given table
getcompactionthroughput Print the MB/s throughput cap for compaction in the
system
getconcurrentcompactors Get the number of concurrent compactors in the system.
getconcurrentviewbuilders Get the number of concurrent view builders in the system
getendpoints Print the end points that owns the key
getinterdcstreamthroughput Print the Mb/s throughput cap for inter-datacenter
streaming in the system
getlogginglevels Get the runtime logging levels
getmaxhintwindow Print the max hint window in ms
getseeds Get the currently in use seed node IP list excluding the
node IP
getsstables Print the sstable filenames that own the key
getstreamthroughput Print the Mb/s throughput cap for streaming in the system
gettimeout Print the timeout of the given type in ms
gettraceprobability Print the current trace probability value
gossipinfo Shows the gossip information for the cluster
handoffwindow Print current hinted handoff window
help Display help information
info Print node information (uptime, load, ...)
inmemorystatus Returns a list of the in-memory tables for this node and
the amount of memory each table is using, or information about a single table if the
keyspace and columnfamily are given.
invalidatecountercache Invalidate the counter cache
invalidatekeycache Invalidate the key cache
invalidaterowcache Invalidate the row cache
join Join the ring
listsnapshots Lists all the snapshots along with the size on disk and
true size.
mark_unrepaired Mark all SSTables of a table or keyspace as unrepaired.
Use when no longer running incremental repair on a table or keyspace.
move Move node on the token ring to a new token
netstats Print network information on provided host (connecting
node by default)
nodesyncservice Manage the NodeSync service on the connected node
pausehandoff Pause hints delivery process
proxyhistograms Print statistic histograms for network operations
rangekeysample Shows the sampled keys held across all keyspaces
rebuild Rebuild data by streaming from other nodes (similarly to
bootstrap)
rebuild_index A full rebuild of native secondary indexes for a given
table
refresh Load newly placed SSTables to the system without restart
refreshsizeestimates Refresh system.size_estimates
reloadlocalschema Reload local node schema from system tables
reloadseeds Reload the seed node list from the seed node provider
reloadtriggers Reload trigger classes
relocatesstables Relocates sstables to the correct disk
removenode Show status of current node removal, force completion of
pending removal or remove provided ID
repair Repair one or more tables
repair_admin list and fail incremental repair sessions
replaybatchlog Kick off batchlog replay and wait for finish
resetlocalschema Reset node's local schema and resync
resumehandoff Resume hints delivery process
ring Print information about the token ring
scrub Scrub (rebuild sstables for) one or more tables
sequence Run multiple nodetool commands from a file, resource or
stdin in sequence. Common options (host, port, username, password) are passed to child
commands.
setbatchlogreplaythrottle Set batchlog replay throttle in KB per second, or 0 to
disable throttling. This will be reduced proportionally to the number of nodes in the
cluster.
setcachecapacity Set global key, row, and counter cache capacities (in MB
units)
setcachekeystosave Set number of keys saved by each cache for faster post-
restart warmup. 0 to disable
setcompactionthreshold Set min and max compaction thresholds for a given table
setcompactionthroughput Set the MB/s throughput cap for compaction in the system,
or 0 to disable throttling
setconcurrentcompactors Set number of concurrent compactors in the system.
setconcurrentviewbuilders Set the number of concurrent view builders in the system
sethintedhandoffthrottlekb Set hinted handoff throttle in kb per second, per
delivery thread.
setinterdcstreamthroughput Set the Mb/s throughput cap for inter-datacenter
streaming in the system, or 0 to disable throttling
setlogginglevel Set the log level threshold for a given component or
class. Will reset to the initial configuration if called with no parameters.
setmaxhintwindow Set the specified max hint window in ms
setstreamthroughput Set the Mb/s throughput cap for streaming in the system,
or 0 to disable throttling
settimeout Set the specified timeout in ms, or 0 to disable timeout
settraceprobability Sets the probability for tracing any given request to
value. 0 disables, 1 enables for all requests, 0 is the default
sjk Run commands of 'Swiss Java Knife'. Run 'nodetool sjk --
help' for more information.
snapshot Take a snapshot of specified keyspaces or a snapshot of
the specified table
status Print cluster information (state, load, IDs, ...)
statusautocompaction status of autocompaction of the given keyspace and table
statusbackup Status of incremental backup
statusbinary Status of native transport (binary protocol)
statusgossip Status of gossip
statushandoff Status of storing future hints on the current node
stop Stop compaction
stopdaemon Stop DSE daemon
tablehistograms Print statistic histograms for a given table
tablestats Print statistics on tables
toppartitions Sample and print the most active partitions for a given
column family
tpstats Print usage statistics of thread pools
truncatehints Truncate all hints on the local node, or truncate hints
for the endpoint(s) specified.
upgradesstables Rewrite sstables (for the requested tables) that are not
on the current version (thus upgrading them to said current version)
verify Verify (check data checksum for) one or more tables
version Print DSE DB version
viewbuildstatus Show progress of a materialized view build

Get synopsis and brief description of nodetool netstats

$ nodetool help netstats

NAME
nodetool netstats - Print network information on provided host
(connecting node by default)

SYNOPSIS
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pw <password> | --password <password>)]
[(-u <username> | --username <username>)] netstats

OPTIONS
-h <host>, --host <host>
Node hostname or ip address

-p <port>, --port <port>
Remote jmx agent port number

-pw <password>, --password <password>
Remote jmx agent password

-u <username>, --username <username>
Remote jmx agent username

nodetool info
Provides node information, including the token and on disk storage (load) information, times started (generation),
uptime in seconds, and heap memory usage.
Synopsis

$ nodetool [connection_options] info [-T]

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-T, --tokens
Show all tokens.
Examples

Print node information with token

$ nodetool info -T

Result:

ID : 3b8e8192-c1d3-4b01-a792-9673b4e377c1
Gossip active : true
Native Transport active: true
Load : 255.29 KiB
Generation No : 1532896921
Uptime (seconds) : 1882997
Heap Memory (MB) : 604.32 / 4012.00
Off Heap Memory (MB) : 0.00
Data Center : Search
Rack : rack1
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 100 MiB, 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 1 hits, 2 requests,
0.500 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 7871, size 260.79 MiB, capacity 2.79 GiB, 7871 misses,
14839137 requests, 0.999 recent hit rate, 937.529 microseconds miss latency
Percent Repaired : 100.0%
Token : 8242717283351148695

nodetool inmemorystatus
Returns a list of the in-memory tables and the amount of memory each table is using.

Synopsis

$ nodetool [connection_options] inmemorystatus [--] [keyspace_name table_name [table_name ...]]

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for a option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples

Print information on all in-memory tables

$ nodetool inmemorystatus

Result:

Max Memory to Lock: 3209MB
Current Total Memory Locked: 0MB
Current Total Memory Not Able To Lock: 0MB
No MemoryOnlyStrategy tables found.

Print information on in-memory tables in the cycling keyspace and popular_count table

$ nodetool inmemorystatus cycling popular_count

Result:

nodetool: Keyspace cycling Table birthday_list is not using MemoryOnlyStrategy.

nodetool invalidatecountercache
Resets the global counter cache parameter to save all counter keys. Invalidates the counter_cache_keys_to_save
setting in cassandra.yaml to enable the default behavior of saving all keys.
Synopsis

$ nodetool [connection_options] invalidatecountercache

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Reset counter_cache_keys_to_save to save all keys

$ nodetool -u joe -pw P@ssw0rd! invalidatecountercache

nodetool invalidatekeycache
Clears the key cache. The key cache is present only until nodetool upgradesstables is run.

Synopsis

$ nodetool [connection_options] invalidatekeycache

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Clears the key cache

$ nodetool invalidatekeycache

nodetool invalidaterowcache
Invalidates the row_cache_keys_to_save setting in cassandra.yaml to enable the default behavior of saving all
keys.
Synopsis

$ nodetool [connection_options] invalidaterowcache

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname

The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Resets row_cache_keys_to_save parameter to save all keys

$ nodetool invalidaterowcache

nodetool join
Joins the node to the ring. Valid only when the node was initially started outside the ring with the -Djoin_ring=false
start-up parameter. The joining node must be properly configured with the required cassandra.yaml options for
seed list, initial token, and auto-bootstrapping.
Synopsis

$ nodetool [connection_options] join

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Join node to ring

$ nodetool -u admin1 join
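
Once the join completes, the node should report as part of the ring; one way to confirm is nodetool status, for example:

$ nodetool -u admin1 status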

nodetool listendpointspendinghints
Prints information about hints that the node has for other nodes.
Hint information includes Host ID, Address, Rack, DC, node status, total number of hints and files, and
timestamp of newest and oldest hints.
Synopsis

$ nodetool [connection_options] -h hostname listendpointspendinghints

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of the remote node to get information about hints that the node has for
other nodes.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.

Examples

To print relevant hint information about the local node endpoints

$ nodetool listendpointspendinghints

Host ID                              Address   Rack  DC          Status Total hints Total files Newest                  Oldest
5762b140-3fdf-4057-9ca7-05c070ccc9c3 127.0.0.2 rack1 datacenter1 DOWN   25098       2           2018-09-18 14:05:18,835 2018-09-18 14:05:08,811
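
Hints that are no longer needed for a given endpoint can be discarded with nodetool truncatehints; a sketch, with an illustrative endpoint address:

$ nodetool truncatehints 127.0.0.2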

nodetool leaksdetection
Enables and configures memory leak tracking. Tracking information is provided along with a stack trace in
debug.log and system.log when a leak is detected.
The resources currently tracked are:
CachedReadsBufferPool
The non-blocking i/o (NIO) byte buffers that are used by file chunks stored in the chunk cache. The
chunk cache is also referred to as the file cache.
DirectReadsBufferPool
The NIO byte buffers that are used for transient, short-term operations, such as some scattered file
reads.
ChunkCache
The file chunks in the chunk cache. The chunk cache is also referred to as the file cache.
Memory
Native memory accessed directly with malloc calls and therefore not managed by the JVM. Currently
used for compression metadata, bloom filters and the row cache.
The row cache should be disabled in DSE 6.x and later.

If memtables are using off-heap objects, the following resource can also be tracked:
NativeAllocator
The memory used for memtables when the memtable allocation type is offheap objects.

The leaksdetection parameters can also be set in cassandra.yaml. See Memory leak detection settings.
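
For example, a sketch that enables tracking of the chunk cache with a 1% sampling probability (values illustrative):

$ nodetool leaksdetection --set_sampling_probability 0.01 ChunkCache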

Synopsis

$ nodetool [connection_options] leaksdetection [--set_max_stack_depth number]
  [--set_max_stacks_cache_size_mb number] [--set_num_access_records number]
  [--set_sampling_probability number] [resource]

Table 136: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--set_max_stack_depth number
The depth of the stack traces collected. Changes only the depth of the stack traces that will be collected
from the time the parameter is set. Deeper stacks are more unique, so increasing the depth may require
increasing stacks_cache_size_mb.
Default: 30
--set_max_stacks_cache_size_mb number
Set the size of the cache for call stack traces. Stack traces are used to debug leaked resources, and
use heap memory. Set the amount of heap memory dedicated to each resource by setting the max
stacks cache size in MB.
Default: 32
--set_num_access_records number
Set the average number of stack traces kept when a resource is accessed. Currently only supported for
chunks in the cache.
Default: 0
--set_sampling_probability number
Set the sampling probability. Each resource is tracked with a sampling probability. Set the sampling
probability to 0 to disable tracking and to 1 to enable tracking all the time. A number between 0 and 1
will randomly track a resource. For example, 0.5 will track resources 50% of the time.
Default: 0
Tracking incurs a significant stack trace collection cost for every access and consumes heap space.
It should never be enabled unless directed by a support engineer or consultant.
resource
The resource to which the parameters should be applied. If not specified, the parameters affect all
resources.
Examples

Print the current memory leak detection status

$ nodetool leaksdetection

Result:

Current Status:
CachedReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
DirectReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
ChunkCache/Chunk - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
Memory/Memory - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30

Set the sampling probability to 25 percent on the Memory resource

$ nodetool leaksdetection --set_sampling_probability .25 Memory

Result:

Current Status:
CachedReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
DirectReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
ChunkCache/Chunk - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
Memory/Memory - Sampling probability: 0.250000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
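
Deeper stack traces make leaks easier to pinpoint but consume more heap. A sketch that raises the stack depth and the stacks cache size for only the ChunkCache resource (the values are illustrative, not tuning recommendations):

$ nodetool leaksdetection --set_max_stack_depth 50 --set_max_stacks_cache_size_mb 64 ChunkCache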

nodetool listsnapshots
Lists all snapshots, along with the size on disk and the true size of each.

Synopsis

$ nodetool [connection_options] listsnapshots

Table 137: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

List snapshots

$ nodetool listsnapshots

Snapshot Details:
Snapshot name   Keyspace name   Column family name   True size   Size on disk
1534456548264   cycling         popular_count        4.84 KiB    5.7 KiB
1534456548264   cycling         calendar             5.33 KiB    6.34 KiB
1534456548264   cycling         comments             6.83 KiB    7.79 KiB
1534456548264   cycling         birthday_list        5.22 KiB    6.09 KiB

Total TrueDiskSpaceUsed: 22.21 KiB
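
When secure JMX is enabled, supply credentials with the connection options described above; the username and password file path below are hypothetical:

$ nodetool -u jmxadmin -pwf /etc/dse/jmxremote.password listsnapshots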

nodetool mark_unrepaired
Marks all SSTables of a table or keyspace as unrepaired.

This operation marks all targeted SSTables as unrepaired, potentially creating new compaction tasks. Use
only if you are no longer running incremental repair on this node.

When no table name is specified, marks all tables in the keyspace as unrepaired.

Synopsis

$ nodetool [connection_options] mark_unrepaired [-f] [--] keyspace_name [table_name ...]

Table 138: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-f, --force
Confirms the operation.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples

Mark cycling keyspace unrepaired

$ nodetool mark_unrepaired cycling

Result:

nodetool: WARNING: This operation will mark all SSTables of keyspace cycling as
unrepaired, potentially creating new compaction tasks. Only use this when no longer
running incremental repair on this node. Use --force option to confirm.
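
As the warning indicates, rerun the command with the force flag to confirm and perform the operation:

$ nodetool mark_unrepaired -f cycling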

nodetool move
Moves the node on the token ring to a new token, generally used to shift tokens slightly.
Additional syntax is required to move a node to a negative token:

• Use the preferred double hyphen (--):

$ nodetool move -- -9223372036854775808

• Escape the hyphen with a backslash (\):

$ nodetool move \-9223372036854775808

OpsCenter provides an option in the Nodes UI for Moving a node.


Synopsis

$ nodetool [connection_options] move [--] new_token

Table 139: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
new_token
The new token, a number in the partition range. For the default Murmur3Partitioner: -2^63 to +2^63-1.
Examples

Move node to new token

$ nodetool move 3074457345618258602

Move node to new negative token

$ nodetool move \-9223372036854775808

nodetool netstats
Prints network information about the host.
The output includes the following information:

• JVM settings

• Mode - The operational mode of the node: JOINING, LEAVING, NORMAL, DECOMMISSIONED, CLIENT

• Read repair statistics

• Attempted - The number of successfully completed read repair operations.

• Mismatch (blocking) - The number of read repair operations since server restart that blocked a query.

• Mismatch (background) - The number of read repair operations since server restart performed in the
background.

• Pool name - Information about client read and write requests by thread pool size.

• Active, pending, and completed number of commands and responses

Synopsis

$ nodetool [connection_options] netstats [-H]

Table 140: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.

Examples

Get network information of local node

$ nodetool netstats

The output is:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 1
Mismatch (Background): 1
Pool Name Active Pending Completed Dropped
Large messages n/a 0 0 0
Small messages n/a 0 23295 0
Gossip messages n/a 0 1853117 0
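
To display byte counts in human-readable units, add the -H flag:

$ nodetool netstats -H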

nodetool nodesyncservice
Use the following subcommands to manage the NodeSync service on the connected node.
The NodeSync service automatically starts when a DataStax Enterprise node is started.

The service runs continuous repair for tables that have nodesync set to true. By default, the table option is
set to false (disabled). Use CQL ALTER TABLE to change the NodeSync setting on a specific table or dse
nodesync to change the setting on multiple tables.
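
For example, a minimal sketch that opts a single table in to NodeSync through cqlsh (assuming the cycling.comments table used in the examples below exists):

$ cqlsh -e "ALTER TABLE cycling.comments WITH nodesync = {'enabled': 'true'};"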

nodetool nodesyncservice enable


Starts up the NodeSync service on the connected host. By default, the NodeSync service automatically starts with DataStax Enterprise, but keyspaces and tables must be explicitly opted in.
In OpsCenter, use Lifecycle Manager to enable keyspaces and tables for NodeSync monitoring.
Synopsis

$ nodetool [connection_options] nodesyncservice enable [-t timeoutSec]

Table 141: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-t seconds, --timeout seconds


Time to wait in seconds for the service to start.
Default: 120 (2 minutes).
Examples

Start up the NodeSync service on the local host

1. Run the enable command:

$ nodetool nodesyncservice enable

2. Check the status:

$ nodetool nodesyncservice status --boolean-output

true

Start up the NodeSync service on the host northeast

1. Run the enable command:

$ nodetool -h northeast nodesyncservice enable

2. Check the status:

$ nodetool -h northeast nodesyncservice status

The NodeSync service is running

nodetool nodesyncservice disable


Shuts down the NodeSync service on the connected host. Shut down occurs after the in-progress segment
validations complete, or when the timeout period is reached.
Synopsis

$ nodetool [connection_options] nodesyncservice disable [-f] [-t timeoutSec]

Table 142: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-f, --force
Forces the service to shut down immediately, without completing segment validations that are currently running.
-t seconds, --timeout seconds
Time to wait in seconds for the service to stop.
Default: 120 (2 minutes).
Examples
Shut down the NodeSync service on the local host without waiting for in-progress validations to complete

1. Run the disable command:

$ nodetool nodesyncservice disable -f

2. Check the status:

$ nodetool nodesyncservice status --boolean-output

false

Shut down the NodeSync service on the host northeast using a timeout period of five minutes

1. Run the disable command:

$ nodetool -h northeast nodesyncservice disable -t 300

2. Check the status:

$ nodetool -h northeast nodesyncservice status

The NodeSync service is not running

nodetool nodesyncservice getrate


Returns the configured synchronization rate-limit of the connected node.

Set the rate limit temporarily using nodetool nodesyncservice setrate. To persist the rate limit, use the
rate_in_kb setting in cassandra.yaml.

Synopsis

$ nodetool [connection_options] nodesyncservice getrate

Table 143: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get configured validation rate of local host

$ nodetool nodesyncservice getrate

Current rate limit=1024 KB/s
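
To check the rate on a remote node, such as the host northeast used elsewhere in these examples, add the host connection option:

$ nodetool -h northeast nodesyncservice getrate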

nodetool nodesyncservice ratesimulator


Simulates the rates necessary to achieve the NodeSync deadline based on configurable assumptions. Rate simulations are useful, but in production they are not a substitute for monitoring NodeSync and adjusting the rate.
Do not use this command on a keyspace with RF=1 or on a single node cluster.

Monitor NodeSync status using OpsCenter. See NodeSync metrics.


Synopsis

$ nodetool [connection_options] nodesyncservice ratesimulator
  [--deadline-overrides keyspace_name.table_name:deadline_target_time, ...]
  [-e keyspace_name.table_name, ...] [help] [-i keyspace_name.table_name, ...]
  [--ignore-replication-factor]
  [simulate -ds factor_integer -rs factor_integer -sg factor_integer |
   recommended | recommended_minimum | theoretical_minimum] [-v]

Table 144: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--deadline-overrides keyspace_name.table_name:deadline_target_time, ...
Overrides the configured deadline for some or all of the tables in the simulation.
-ds, --deadline-safety-factor factor_integer
Factor (integer) by which to decrease table deadlines to account for imperfect conditions. Applies only to the simulate sub-command.
-e, --excludes keyspace_name.table_name, ...
A comma-separated list of tables to exclude from the simulation when NodeSync is enabled on the
server-side; this simulates the impact on the rate of disabling NodeSync on those tables.
help
Displays options and usage instructions.
--ignore-replication-factor
Ignores the replication factor for the simulation. Without this option, the default assumes that
NodeSync runs on every node of the cluster (which is highly recommended) and assumes that
validation work is spread among replicas. When NodeSync runs on every node of the cluster, each
node must validate the fraction 1/RF of the data the node owns. This option removes that assumption,
and computes a rate that accounts for all the data the node stores.
-i, --includes keyspace_name.table_name, ...
A comma-separated list of tables to include in the simulation when NodeSync is not enabled server-
side; simulates the impact on the rate of enabling NodeSync on those tables.
-rs, --rate-safety-factor factor_integer
Represents a factor of how much to increase the final rate to account for imperfect conditions. Applies
only to the simulate sub-command.
-sg, --size-growth-factor factor_integer
Represents a factor of how much to increase data sizes to account for data growth. Applies only to the
simulate sub-command.
-v, --verbose
Provides details on how the simulation is carried out. Displays all steps taken by the simulation.
Although this option is useful for understanding the simulations, results can be large or may be
excessive if many tables exist.
Examples

Simulate rates for comments table

$ nodetool nodesyncservice ratesimulator -i cycling.comments

Computed rate: 420kB/s.

Simulate rates with new target times for the comments table

$ nodetool nodesyncservice ratesimulator --deadline-overrides cycling.comments:20h
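
Per the synopsis, multiple overrides can be given as a comma-separated list; a sketch assuming both example tables exist:

$ nodetool nodesyncservice ratesimulator --deadline-overrides cycling.comments:20h,cycling.comments2:10d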

Simulate example

1. In CQL, create tables within a keyspace of RF > 1 and NodeSync enabled. For example:

CREATE KEYSPACE cycling WITH replication = {'class': 'SimpleStrategy',
  'replication_factor': 2};
USE cycling;
CREATE TABLE comments (record_id timeuuid, id uuid, commenter text, comment text,
  created_at timestamp,
  PRIMARY KEY (id, created_at)) WITH nodesync={'enabled': 'true'};
CREATE TABLE comments2 (record_id timeuuid, id uuid, commenter text, comment text,
  created_at timestamp,
  PRIMARY KEY (id, created_at)) WITH nodesync={'enabled': 'true'};

2. Insert data into the tables. For example:

INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-02-14 12:43:20-0800',
  'Raining too hard should have postponed', 'Alex');
INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-02-14 12:43:20.234-0800',
  'Raining too hard should have postponed', 'Alex');
INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-03-21 13:11:09.999-0800',
  'Second rest stop was out of water', 'Alex');
INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), e7ae5cf3-d358-4d99-b900-85902fda9bb0, '2017-04-01 06:33:02.16-0800',
  'LATE RIDERS SHOULD NOT DELAY THE START', 'Alex');
INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), c7fceba0-c141-4207-9494-a29f9809de6f, totimestamp(now()),
  'The gift certificate for winning was the best', 'Amy');
INSERT INTO cycling.comments (record_id, id, created_at, comment, commenter)
  values (now(), c7fceba0-c141-4207-9494-a29f9809de6f, '2017-02-17 12:43:20.234+0400',
  'Glad you ran the race in the rain', 'Amy');
...

3. Run the simulator:

$ nodetool nodesyncservice ratesimulator recommended

Computed rate: 16B/s.

As expected, the computed rate is rather small because very little data was inserted.

4. Run the simulator with the verbose flag to view insights on why that rate was calculated:

$ nodetool nodesyncservice ratesimulator recommended -v

Using parameters:
- Size growing factor: 1.00
- Deadline safety factor: 0.25
- Rate safety factor: 0.10

cycling.comments:
- Deadline target=7.5d, adjusted from 10d for safety.
- Size=1.1MB to validate (2.3MB total (adjusted from 1.1MB for future growth) but
RF=2).
- Added to previous tables, 1.1MB to validate in 7.5d => 2B/s
=> New minimum rate: 2B/s
cycling.comments2:
- Deadline target=7.5d, adjusted from 10d for safety.
- Size=7.1MB to validate (14MB total (adjusted from 7.1MB for future growth) but
RF=2).
- Added to previous tables, 8.3MB to validate in 7.5d => 14B/s
=> New minimum rate: 14B/s

Computed rate: 16B/s, adjusted from 14B/s for safety.

As expected, the computed rate is rather small because very little data was inserted.

nodetool nodesyncservice setrate


Temporarily sets the maximum data validation rate.
To persist the rate limit, use the rate_in_kb setting in cassandra.yaml.

Use the nodetool nodesyncservice ratesimulator to review how the change may impact performance. For
more details, see Setting the NodeSync rate.

Synopsis

$ nodetool [connection_options] nodesyncservice setrate rate_in_kb

Table 145: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

rate_in_kb
The maximum validation rate, in kilobytes per second.


Examples

Configure the validation rate of the local host

$ nodetool nodesyncservice setrate 2048

Verify the setting change

$ nodetool nodesyncservice getrate

Current rate limit=2048 KB/s

nodetool nodesyncservice status


Returns the status of the NodeSync service on the connected node.
Synopsis

$ nodetool [connection_options] nodesyncservice status [-b]

Table 146: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local
machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are
prompted to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-b, --boolean-output
Outputs the NodeSync service status as true or false: true when the service is running, false otherwise. The boolean form is useful for scripts.
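For example, a minimal shell sketch that branches on the boolean output:

if [ "$(nodetool nodesyncservice status -b)" = "true" ]; then
  echo "NodeSync service is running"
else
  echo "NodeSync service is stopped"
fi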
Examples
Show NodeSync status on the local host using the boolean option

$ nodetool nodesyncservice status -b

false

Show NodeSync status on a remote host

$ nodetool -h northeast nodesyncservice status

The NodeSync service is running

nodetool pausehandoff
Pauses the hints delivery process.
Synopsis

$ nodetool [connection_options] pausehandoff

Table 147: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.

Examples

Pause hint delivery process

$ nodetool pausehandoff

nodetool proxyhistograms
Provides a histogram of network operation statistics at the time of the command.
The output of this command shows the full request latency recorded by the coordinator. The output includes the
percentile rank of read and write latency values for inter-node communication. Typically, you use the command
to see if requests encounter a slow node.
Synopsis

$ nodetool [connection_options] proxyhistograms

Table 148: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get network statistics histogram

This example shows the output from nodetool proxyhistograms after running 4,500 insert statements and 45,000 select statements on a three-node ccm cluster on a local computer.

$ nodetool proxyhistograms

proxy histograms
Percentile Read Latency Write Latency Range Latency
(micros) (micros) (micros)
50% 1502.50 375.00 446.00
75% 1714.75 420.00 498.00
95% 31210.25 507.00 800.20
98% 36365.00 577.36 948.40
99% 36365.00 740.60 1024.39
Min 616.00 230.00 311.00
Max 36365.00 55726.00 59247.00

Useful metrics in the output include:

• CAS Read Latency

• CAS Write Latency

• View Write Latency

CAS Read and Write Latency provides data for compare-and-set operations, while View Write Latency provides
data for materialized view write operations.

proxy histograms
Percentile  Read Latency  Write Latency  Range Latency  CAS Read Latency  CAS Write Latency  View Write Latency
            (micros)      (micros)       (micros)       (micros)          (micros)           (micros)
50%              454.83         379.02        1955.67              0.00               0.00                0.00
75%             1358.10         943.13        4055.27              0.00               0.00                0.00
95%             3379.39       12108.97       20924.30              0.00               0.00                0.00
98%             7007.51      155469.30       89970.66              0.00               0.00                0.00
99%             8409.01      155469.30      155469.30              0.00               0.00                0.00
Min               73.46         126.94         126.94              0.00               0.00                0.00
Max            14530.76      155469.30      155469.30              0.00               0.00                0.00

nodetool rangekeysample
Shows the sampled keys held across all keyspaces.
Synopsis

$ nodetool [connection_options] rangekeysample

Table 149: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port

The JMX port number.


-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Show sampled keys across all keyspaces

$ nodetool rangekeysample

RangeKeySample:
356242581507269238
-5568512119108849737
-8044630622444638698
9139769913044883120
9139769913044883120
-9222057613388634431
-9222057613388634431
-8774946291924800999
-8774946291924800999
-7191538117016975626
-7191538117016975626
-4839385740530813564
-4839385740530813564
-2391368834889506351
-2391368834889506351
-257415902412033945
-257415902412033945
2068649272206580393
2068649272206580393
4479264904256751477
4479264904256751477
6874493789974003618
6874493789974003618
-8718305215016653338
-79752896362648430
1139519215559584928
1178565181744072132
-5883607023773259416
-5189327806405140569
2008715943680221220
3066791452337107542

nodetool rebuild
Rebuilds data by streaming from other nodes.
This command operates on multiple nodes in a cluster and streams data only from a single source replica when
rebuilding a token range. Use this command to add a new datacenter to an existing cluster.

If nodetool rebuild is interrupted before completion, restart it by re-entering the command. The process
resumes from the point at which it was interrupted.

Synopsis

$ nodetool [connection_options] rebuild [-c num_connections] [-dc src_dc_names]
  [-ks keyspace_name] [-m mode] [-s source_ip_address]
  [-ts (start_token_1,end_token_1],(start_token_2,end_token_2], ...]
  [-x exclude_source_IPs] [-xdc exclude_dc_names] [--] src_dc_name

Table 150: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.

-u, --username jmx_username


The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-c, --connections-per-host num_connections
Maximum number of connections per host for streaming. Overrides value of
streaming_connections_per_host in cassandra.yaml.
-dc src_dc_names, --dcs src_dc_names
Comma-separated list of datacenters from which to stream.

• src_dc_names - Datacenter names are case-sensitive. For example, dc-a,dc-b. To include a rack
name, separate datacenter and rack name with a colon (:). For example, dc-a:rack1,dc-a:rack2.

• when not set - The default is to pick any datacenter.

-ks, --keyspace keyspace_name, ...


Comma-separated list of one or more keyspaces to include in the rebuild. Do not include keyspaces that are local to the datacenter (including DSEFS keyspaces), or keyspaces created with the SimpleStrategy replication class, such as system and system_schema.
-m, --mode mode

• normal - conventional behavior, streams only ranges that are not already locally available

• refetch - resets locally available ranges, streams all ranges but leaves current data untouched

• reset - resets the locally available ranges, removes all locally present data (like a TRUNCATE),
streams all ranges

• reset-no-snapshot - (like reset) resets the locally available ranges, removes all locally present data
(like a TRUNCATE), streams all ranges but prevents a snapshot even if auto_snapshot is enabled

When not specified, the default is normal.


-s, --sources source_ip_address
Comma-separated list of IP addresses from which to stream.
-ts, --tokens (start_token_1,end_token_1], (start_token_2,end_token_2], ...
Comma-separated list of token ranges, in this format (start_token_1,end_token_1],
(start_token_2,end_token_2],(start_token_n,end_token_n]
-x, --exclude-sources exclude_source_IPs
Comma-separated list of IP addresses to exclude from streaming.
-xdc, --exclude-dcs exclude_dc_name
Comma-separated list of datacenters to exclude from streaming. For example, dc-a,dc-b. To
include a rack name in the list, separate datacenter and rack name with a colon (:). For example, dc-
a:rack1,dc-a:rack2.


Examples

Rebuild from any datacenter

$ nodetool rebuild

Rebuild from DC2

$ nodetool rebuild -dc DC2

Rebuild from DC2 and DC3

$ nodetool rebuild -dc DC2,DC3
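
Rebuild one keyspace from specific source nodes

A minimal sketch, assuming a keyspace named cycling and source replicas at the hypothetical addresses 10.200.1.3 and 10.200.1.4 in DC2:

$ nodetool rebuild -ks cycling -s 10.200.1.3,10.200.1.4 -- DC2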

nodetool rebuild_index
Performs a full rebuild of native secondary indexes for a given table.
Synopsis

$ nodetool [connection_options] rebuild_index [--] keyspace_name table_name index_name
    [index_name ...]

Table 151: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
index_name
One or more index names, separated by a space.
keyspace_name
The keyspace name.
table_name
The table name.
Examples

Rebuild indexes Standard3.IdxName and Standard3.IdxName1 on the cycling keyspace and cyclist_name table

$ nodetool rebuild_index cycling cyclist_name Standard3.IdxName Standard3.IdxName1

nodetool rebuild_view
Performs a rebuild of the specified materialized views for a particular base table on the node on which the
command is run. Use this command to rebuild materialized views after restoring sstables or after restarting a
materialized view build that was previously stopped. If no materialized views are specified, all materialized views
based on the specified table are rebuilt.
The rebuild_view command does not clear existing data in the materialized view.


Synopsis

$ nodetool [connection_options] rebuild_view [--] keyspace_name table_name
    [materialized_view_name ...]

Table 152: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


--
Separates an option from an argument that could be mistaken for an option.
materialized_view_name
One or more materialized view names, separated by a space. If not specified, all materialized views in
the table are rebuilt.
keyspace_name
The keyspace name.
table_name
The table name.
Examples

Rebuild materialized views cyclist_by_age and cyclist_by_birthday_and_age on the cycling keyspace and cyclist_base table

$ nodetool rebuild_view cycling cyclist_base cyclist_by_age cyclist_by_birthday_and_age

nodetool refresh
Loads newly placed SSTables onto the system without a restart.
Synopsis

$ nodetool [connection_options] refresh [--reset-levels] [--] keyspace_name table_name

Table 153: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
--reset-levels
Force all SSTables to level 0.
table_name
The table name.
Examples

Load new SSTables

$ nodetool refresh cycling comments
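
Load new SSTables and reset their levels

A minimal sketch, assuming the same hypothetical cycling keyspace and comments table; --reset-levels forces all loaded SSTables back to level 0, which can be useful with LeveledCompactionStrategy:

$ nodetool refresh --reset-levels -- cycling comments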

nodetool refreshsizeestimates
Refreshes the system.size_estimates table. Use after inserting or truncating huge amounts of data, which can
result in incorrect size estimates.
Synopsis

$ nodetool [connection_options] refreshsizeestimates

Table 154: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Refresh system.size_estimates table

$ nodetool refreshsizeestimates

nodetool reloadseeds
Reloads the seed node list from the seed node provider.
Synopsis

$ nodetool [connection_options] reloadseeds

Table 155: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port


The JMX port number.


-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Reload seed node list from seed node provider

$ nodetool reloadseeds

Updated seed node IP list excluding the current node IP: /10.100.15.1

nodetool reloadtriggers
Reloads trigger classes.
Synopsis

$ nodetool [connection_options] reloadtriggers

Table 156: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Reload trigger classes

$ nodetool reloadtriggers

nodetool relocatesstables
Rewrites SSTables to the correct disk.
Use with JBOD disk storage to manually rewrite the location of SSTables on disk. Useful if you have changed the
replication factor for the cluster or if you added a new disk.
Synopsis

$ nodetool [connection_options] relocatesstables [-j num_jobs] [--] keyspace_name table_name

Table 157: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs

• num_jobs - Number of SSTables affected simultaneously. Default: 2.

• 0 - Use all available compaction threads.

keyspace_name
The keyspace name.
table_name
The table name.


Examples

Relocate SSTables after adding a new disk

$ nodetool relocatesstables cycling birthday_list

If the SSTables are on the correct disk already, no action is taken.
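
Relocate SSTables using all available compaction threads

A minimal sketch using the same hypothetical keyspace and table; -j 0 uses all available compaction threads:

$ nodetool relocatesstables -j 0 -- cycling birthday_list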


nodetool removenode
Shows the status of a current node removal, forces completion of a pending removal, or removes the identified node.
Use when the node is down and nodetool decommission cannot be used. If the cluster does not use vnodes,
adjust the tokens before running this command.

Use this command only for nodes that are down. This command triggers cluster streaming. In large
environments, the additional streaming activity causes more pending gossip tasks in the output of nodetool
tpstats. Nodes can start to appear offline and might need to be restarted to clear up the backlog of pending
gossip tasks.

Synopsis

$ nodetool [connection_options] removenode [--] status | force | host_ID

Table 158: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

force
Force completion of pending removal.
host_ID
Remove the node with the given host ID.
status
Show status of current node removal.
Examples

Determine UUID of the node to remove

$ nodetool removenode status

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.2.101 112.82 KB 256 31.7% 420129fc-0d84-42b0-be41-ef7dd3a8ad06 RAC1
DN 192.168.2.103 91.11 KB 256 33.9% d0844a21-3698-4883-ab66-9e2fd5150edd RAC1
UN 192.168.2.102 124.42 KB 256 32.6% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c RAC1

Remove down node with UUID

$ nodetool removenode d0844a21-3698-4883-ab66-9e2fd5150edd

View status of operation to remove node

$ nodetool removenode status

RemovalStatus: No token removals in process.
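
Force completion of a pending removal

If a removal appears to hang, force it to complete; this sketch assumes a removal is currently in progress:

$ nodetool removenode force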

Confirm node has been removed

$ nodetool status

The removed node no longer shows in the Host ID column.

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 192.168.2.101 112.82 KB 256 37.7% 420129fc-0d84-42b0-be41-ef7dd3a8ad06 RAC1
UN 192.168.2.102 124.42 KB 256 38.3% 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c RAC1

nodetool repair
Repairs tables on one or more nodes in a cluster when all involved replicas are up and accessible.

Tables with NodeSync enabled will be skipped for repair operations run against all or specific keyspaces. For
individual tables, running the repair command will be rejected when NodeSync is enabled.

See Repairing nodes. Before using this command, be sure to have an understanding of how node repair works.

If repair encounters a down replica, an error occurs and the repair process halts. Re-run repair after bringing
all replicas back online.

OpsCenter provides a repair option in the Nodes UI for Running a manual repair.
Synopsis

$ nodetool [connection_options] repair [-dcpar | -seq] [-full | -inc]
    [-hosts ip_address [ip_address ...]] [-local | -dc datacenter_name[,datacenter_name,...]]
    [-pl] [-pr] [-prv] [-pull -hosts local_ip_address [remote_ip_address]]
    [-j job_threads] [-st start_token -et end_token] [-tr]
    [--] [keyspace_name table_name [table_name ...]]

Table 159: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.


-dc datacenter_name, --in-dc datacenter_name


Comma-separated list of datacenters to limit repairs to. Datacenter names are case sensitive.
Decreases network traffic while repairing more nodes than the local option. When this option is not
specified, repair is run cluster-wide on all nodes that contain replicas.
-dcpar, --dc-parallel
Run repairs on all nodes with the same replica data at the same time, recommended for repairs across
datacenters. A single node in each datacenter runs repair, one after another until the repair is complete.
This option combines sequential and parallel repair by simultaneously running a sequential repair in all
datacenters. Use with the -local option only when the datacenter nodes have all the data for all ranges.
-et, --end-token end_token
The token at which the range ends. Requires start token (-st).
-force, --force
Filter out down endpoints.
-full, --full
Issue a full repair.
-hosts, --in-hosts host_name
Repair specific hosts.
-inc, --inc
Issue an incremental repair.
-j, --job-threads num_threads
Number of threads to run repair jobs. Usually this means number of tables to repair concurrently.
Default: 1. Max: 4.
Increasing job threads puts more load on repairing nodes.
keyspace_name
The keyspace name.
-local, --in-local-dc
Repair only against nodes in the same datacenter.
-pl, --pull
Runs a one-way repair directly from another node that has a replica in the same token range. This
option minimizes performance impact when cross-datacenter repairs are required.
-pr, --partitioner-range
Repair only the first range returned by the partitioner.
-prv, --preview
Determine ranges and amount of data to be streamed, but doesn't perform repair.
-seq, --sequential
Perform sequential repair.
-st, --start-token start_token
The token at which the range starts. Requires end token (-et).
table_name
One or more table names, separated by a space.
-tr, --trace
Trace the repair. Traces are logged to system_traces.events.
-vd, --validate
Checks that repaired data is in sync between nodes.
Out-of-sync repaired data indicates that a full repair should be run.

Examples
All nodetool repair command options are optional. When optional command arguments are not specified, the
defaults are:

• Full repair runs on all keyspaces and all tables.

• Repair runs in parallel on all nodes with the same replica data at the same time.

• The number of job threads is 1.

• No tracing. No validation.


Sequential repair of all keyspaces

$ nodetool repair -seq

Partitioner range repair

$ nodetool repair -pr
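
Repair specific hosts

A minimal sketch, assuming hypothetical replica addresses 192.168.2.101 and 192.168.2.102; -hosts limits the repair to the listed endpoints:

$ nodetool repair -hosts 192.168.2.101 192.168.2.102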

Start-point-to-end-point repair of all nodes between two nodes on the ring

$ nodetool repair -st -9223372036854775808 -et -3074457345618258603
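
Preview a full repair

A sketch assuming a hypothetical cycling keyspace; -prv calculates the ranges and amount of data that would be streamed, without performing the repair:

$ nodetool repair -prv -full cycling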

Restrict repair to local datacenter

$ nodetool repair -dc DC1

Results in output:

[2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
[2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges
  for keyspace system_traces (seq=true, full=true)
[2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca
  for range (820981369067266915,822627736366088177] finished
[2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca
  for range (2506042417712465541,2515941262699962473] finished
. . .

The system.log shows repair runs only on IP addresses in DC1.

. . .
INFO [AntiEntropyStage:1] 2014-07-24 22:23:10,708 RepairSession.java:171
  - [repair #16499ef0-1381-11e4-88e3-c972e09793ca] Received merkle tree
  for sessions from /192.168.2.101
INFO [RepairJobTask:1] 2014-07-24 22:23:10,740 RepairJob.java:145
  - [repair #16499ef0-1381-11e4-88e3-c972e09793ca] requesting merkle trees
  for events (to [/192.168.2.103, /192.168.2.101])
. . .

nodetool replaybatchlog
Forces batchlog replay and blocks until batches have been replayed.


Synopsis

$ nodetool [connection_options] replaybatchlog

Table 160: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


This command takes no arguments.


Examples

Force batchlog replay

$ nodetool replaybatchlog

nodetool resetlocalschema
Fixes schema disagreements between nodes by dropping the schema information of the local node and
resynchronizing the schema from another node. When schema information on the local node is dropped, the
system schema tables are truncated. The node temporarily loses metadata about the tables on the node, but
rewrites the information from another node.
Useful when:

• Table schema changes have generated too many tombstones (100,000s).

• One node is out of sync with the cluster.

Synopsis

$ nodetool [connection_options] resetlocalschema

Table 161: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.
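
Examples

Resynchronize the schema of an out-of-sync node

A minimal sketch that targets the out-of-sync node remotely, assuming the hypothetical address 192.168.2.103:

$ nodetool -h 192.168.2.103 resetlocalschema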


nodetool resume
Restarts a node's bootstrap process.
Synopsis

$ nodetool [options] bootstrap resume

Tarball path:

installation_location/resources/cassandra/bin

Table 162: Options


Short Long Description

-h --host Hostname or IP address.

-p --port Port number.

-pwf --password-file Password file path.

-pw --password Password.

-u --username Remote JMX agent username.

-- Separates an option from an argument that could be mistaken for an option.

• For tarball installations, execute the command from the installation_location/bin directory.

• If a username and password for RMI authentication are set explicitly in the cassandra-env.sh file for the
host, then you must specify credentials.

• nodetool bootstrap operates on a single node in the cluster if -h is not used to identify one or more
other nodes. If the node from which you issue the command is the intended target, you do not need the -h
option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.


Description
The nodetool bootstrap resume command restarts bootstrap streaming.
Examples

$ nodetool -u username -pw password bootstrap resume

nodetool resumehandoff
Resumes hints delivery process.
Synopsis

$ nodetool [connection_options] resumehandoff

Table 163: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port


The JMX port number.


-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Resume hints delivery process

$ nodetool resumehandoff

nodetool ring
Provides node status and information about the ring as determined by the node being queried. This information
gives an idea of the load balance and whether any nodes are down. If the cluster is not properly configured,
different nodes may show a different ring; check that the ring appears the same from each node. If you use
virtual nodes (vnodes), use nodetool status for succinct output.

• Address
The node's URL.

• DC (datacenter)
The datacenter containing the node.

• Rack
The rack or, in the case of Amazon EC2, the availability zone of the node.

• Status - Up or Down
Indicates whether the node is functioning or not.

• State - N (normal), L (leaving), J (joining), M (moving)


The state of the node in relation to the cluster.

• Load - updates every 90 seconds


The amount of file system data under the cassandra data directory after excluding all content in the
snapshots subdirectories. Because all SSTable data files are included, any data that is not cleaned up
(such as TTL-expired cells or tombstoned data) is counted.

• Token
The end of the token range up to and including the value listed. For an explanation of token ranges, see
Data distribution overview.

• Owns
The percentage of the data owned by the node per datacenter times the replication factor. For example, a
node can own 33% of the ring, but show 100% if the replication factor is 3.

• Host ID


The network ID of the node.

Synopsis

$ nodetool [connection_options] ring [-r] [--] [keyspace]

Table 164: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
-r, --resolve-ip

Show node domain names instead of IP addresses.
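
Examples

Display ring information for one keyspace

A minimal sketch, assuming a hypothetical cycling keyspace; -r resolves IP addresses to domain names:

$ nodetool ring -r -- cycling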


nodetool scrub
Creates a snapshot and then rebuilds SSTables on a node. If possible use nodetool upgradesstables instead of
scrub.
Scrub automatically discards broken data and removes any tombstoned rows that have exceeded the grace
period of the table. If partition key values do not match the column data type, the partition is considered corrupt
and the process automatically stops.

For LeveledCompactionStrategy (LCS), resets all SSTables back to Level 0 and requires recompaction of all
SSTables.

Synopsis

$ nodetool [connection_options] scrub [-j num_jobs] [-n] [-ns] [-r] [-s] [--] [keyspace_name
table_name [table_name ...]]

Table 165: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.


-p, --port jmx_port


The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs

• num_jobs - Number of SSTables affected simultaneously. Default: 2.

• 0 - Use all available compaction threads.

keyspace_name
The keyspace name.
-n, --no-validate
Do not validate columns using column validator.
-ns, --no-snapshot
Do not snapshot tables before scrubbing. By default (false), scrubbed tables are snapshotted first.
-r, --reinsert-overflowed-ttl
Rewrite rows with overflowed expiration date affected by CASSANDRA-14092 with the maximum
supported expiration date of 2038-01-19T03:14:06+00:00. The rows are rewritten with the original
timestamp incremented by one millisecond to override/supersede any potential tombstone that may
have been generated during compaction of the affected rows.
-s, --skip-corrupted
Skip corrupted partitions even when scrubbing counter tables. Default is false.
table_name
One or more table names, separated by a space.
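
Examples

Scrub one table without taking a snapshot

A minimal sketch, assuming the hypothetical cycling keyspace and comments table; -ns skips the pre-scrub snapshot and -j 0 uses all available compaction threads:

$ nodetool scrub -j 0 -ns -- cycling comments
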
nodetool sequence
Sequentially run multiple nodetool commands from a file, resource, or standard input (StdIn) to reduce overhead.
Faster than running nodetool commands individually from a shell script because the JVM doesn't have to restart
for each command.
Synopsis

$ nodetool [connection_options] sequence [--failonerror] [-i input [input ...]]
    [--stoponerror] [--] [command_name [command_name : ...]]

Table 166: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
command_name
Commands to execute. Separate individual commands using a colon surrounded by whitespace ( : ).
--failonerror
Set this option to true to return an error exit code if a child command fails. By default, an error exit code
is not returned if one or more child commands fail.
-i, --input input
The input to run the command.
--stoponerror
Set this option to true to stop the sequence at the first error. By default, if one child command fails, the
sequence command continues with the remaining commands.


Examples

To run commands in a file

First, create a text file with one command per line.
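
For example, a hypothetical file /my/file/commands might contain:

info
gettimeout read
status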

$ nodetool sequence -i /my/file/commands
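
To stop the sequence at the first failing command, add --stoponerror (same hypothetical file):

$ nodetool sequence --stoponerror -i /my/file/commands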

To run commands on the command line

$ nodetool sequence info : gettimeout read : gettimeout write : status

Each command in the file runs sequentially.

################################################################################
# Executing 4 commands:
# info
# gettimeout read
# gettimeout write
# status
################################################################################
# Network interface ens3 (ens3): /fe80:0:0:0:f816:3eff:fe17:a66f%ens3/64 [null], /10.200.182.118/19 [/10.200.191.255]
# Network interface lo (lo): /0:0:0:0:0:0:0:1%lo/128 [null], /127.0.0.1/8 [null]
################################################################################
# Command: info
# Timestamp: August 31, 2018 8:24:46 PM UTC
# Timestamp (local): August 31, 2018 8:24:46 PM UTC
# Timestamp (millis since epoch): 1535747086687
################################################################################
ID : 3b8e8192-c1d3-4b01-a792-9673b4e377c1
Gossip active : true
Native Transport active: true
Load : 625.97 KiB
Generation No : 1532896921
Uptime (seconds) : 2850186
Heap Memory (MB) : 1903.08 / 4012.00
Off Heap Memory (MB) : 0.01
Data Center : SearchGraphAnalytics
Rack : rack1
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 100 MiB, 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 1 hits, 2 requests,
0.500 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 15972, size 595.42 MiB, capacity 2.79 GiB, 15972 misses,
25462774 requests, 0.999 recent hit rate, 606.208 microseconds miss latency
Percent Repaired : 0.0%
Token : 8242717283351148695
# Command 'info' completed successfully in 331 ms
################################################################################
# Command: gettimeout read
# Timestamp: August 31, 2018 8:24:47 PM UTC
# Timestamp (local): August 31, 2018 8:24:47 PM UTC
# Timestamp (millis since epoch): 1535747087024
################################################################################
Current timeout for type read: 5000 ms


# Command 'gettimeout read' completed successfully in 0 ms


################################################################################
# Command: gettimeout write
# Timestamp: August 31, 2018 8:24:47 PM UTC
# Timestamp (local): August 31, 2018 8:24:47 PM UTC
# Timestamp (millis since epoch): 1535747087025
################################################################################
Current timeout for type write: 2000 ms
# Command 'gettimeout write' completed successfully in 0 ms
################################################################################
# Command: status
# Timestamp: August 31, 2018 8:24:47 PM UTC
# Timestamp (local): August 31, 2018 8:24:47 PM UTC
# Timestamp (millis since epoch): 1535747087026
################################################################################
Datacenter: SearchGraphAnalytics
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token
Rack
UN 127.0.0.1 625.97 KiB ? 3b8e8192-c1d3-4b01-a792-9673b4e377c1
8242717283351148695 rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership
information is meaningless
# Command 'status' completed successfully in 29 ms
################################################################################
# Total duration: 374ms
# Out of 4 commands, 4 completed successfully, 0 failed.
################################################################################

nodetool setbatchlogreplaythrottle
Sets the batchlog replay throttle in KB per second, or 0 to disable throttling. The throttle is reduced
proportionally to the number of nodes in the cluster.
Synopsis

$ nodetool [connection_options] setbatchlogreplaythrottle [--] value_in_kb_per_sec

Table 167: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value_in_kb_per_sec

• value - the batchlog replay throttle in KB per second

• 0 - disables throttling


Examples

Set batchlog replay throttle at 60 KB per second

$ nodetool setbatchlogreplaythrottle 60

Disable batchlog replay throttle

$ nodetool setbatchlogreplaythrottle 0

nodetool setcachecapacity
Sets global key, row, and counter cache capacities in megabytes.

Overrides the configured values of the key_cache_size_in_mb, row_cache_size_in_mb, and counter_cache_size_in_mb parameters in cassandra.yaml.

Synopsis

$ nodetool [connection_options] setcachecapacity [--] key-cache-capacity row-cache-capacity counter-cache-capacity

Table 168: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

counter-cache-capacity
Counter cache capacity in MB units. Corresponds to the counter_cache_size_in_mb parameter in
cassandra.yaml. By default, the database uses the smaller of 2.5% of the heap or 50 MB.

• number - the counter cache capacity in MB

• 0 - disables counter caching

key-cache-capacity
Key cache capacity in MB units. Corresponds to the key_cache_size_in_mb parameter in cassandra.yaml.
row-cache-capacity
Row cache capacity in MB units, corresponds to the row_cache_size_in_mb parameter in
cassandra.yaml. By default, row caching is zero (disabled).
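
Examples

Set the key cache capacity to 100 MB, disable the row cache, and set the counter cache capacity to 50 MB

The values are illustrative; adjust them for your workload:

$ nodetool setcachecapacity 100 0 50
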
nodetool setcachekeystosave
Sets the global number of keys saved by each cache for faster post-restart warmup.
Overrides the configured values of the key_cache_keys_to_save, row_cache_keys_to_save, and counter_cache_keys_to_save parameters in cassandra.yaml.

Synopsis

$ nodetool [connection_options] setcachekeystosave key-cache-keys-to-save row-cache-keys-to-save counter-cache-keys-to-save

Table 169: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

counter-cache-keys-to-save
The number of keys to save from the counter cache. Corresponds to the counter_cache_keys_to_save
parameter in cassandra.yaml.
key-cache-keys-to-save
Corresponds to the key_cache_keys_to_save (deprecated) parameter in cassandra.yaml. Key cache
limiting is disabled by default, meaning all keys will be saved.

• number - the number of keys saved by the cache

• 0 - disables saving keys


row-cache-keys-to-save
Corresponds to the row_cache_keys_to_save parameter in cassandra.yaml. Row cache key saving is
disabled by default.

• number - the number of keys saved by the cache

• 0 - disables saving keys
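
Examples

Save 100 keys from the key cache, 50 from the row cache, and 100 from the counter cache

The values are illustrative:

$ nodetool setcachekeystosave 100 50 100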


nodetool setcompactionthreshold
Sets minimum and maximum compaction thresholds for a table.
SSTables are compacted concurrently to avoid wasting memory or running out of memory when compacting
highly overlapping SSTables.

The max_threshold table property sets an upper bound on the number of SSTables that may be compacted in
a single minor compaction, as described in How is data updated?.

Synopsis

$ nodetool [connection_options] setcompactionthreshold [--] keyspace_name table_name minthreshold maxthreshold

Table 170: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.


Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
maxthreshold
The maximum number of SSTables that may be compacted in a single minor compaction. Resets or
overrides the internal setting of 32.
minthreshold
The minimum number of SSTables of a similar size that must be present before a minor compaction is
scheduled. The default is 4.
table_name
The table name.
Examples

Set minimum and maximum compaction thresholds for a table

$ nodetool setcompactionthreshold cycling comments 6 28
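
Reset compaction thresholds to the defaults

A minimal sketch, assuming the size-tiered compaction defaults of 4 (minimum) and 32 (maximum):

$ nodetool setcompactionthreshold cycling comments 4 32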

nodetool setcompactionthroughput
Sets the throughput capacity for compaction in the system, or disables throttling. Overrides the
compaction_throughput_mb_per_sec setting in cassandra.yaml.
To view the current setting, use nodetool getcompactionthroughput.

Synopsis

$ nodetool [connection_options] setcompactionthroughput [--] value_in_mb

Tarball path:

installation_location/resources/cassandra/bin

Table 171: Options


Short Long Description

-h --host Hostname or IP address.

-p --port Port number.

-pwf --password-file Password file path.

-pw --password Password.

-u --username Remote JMX agent username.

value_in_mb The throughput capacity in megabytes (MB) per second for compaction. To disable throttling, set to 0.

-- Separates an option from an argument that could be mistaken for an option.

Description
Set value_in_mb to 0 to disable throttling.
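
Examples

Set compaction throughput to 16 MB per second (an illustrative value):

$ nodetool setcompactionthroughput 16

Disable compaction throttling:

$ nodetool setcompactionthroughput 0
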
nodetool setconcurrentcompactors
Sets the number of concurrent compactors.
Synopsis

$ nodetool [connection_options] setconcurrentcompactors [--] num_compactors

Table 172: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.


Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
num_compactors
Number of concurrent compactors.
Examples

Set 2 concurrent compactors

$ nodetool setconcurrentcompactors 2

nodetool setconcurrentviewbuilders
Sets the number of simultaneous materialized view builder tasks allowed to run concurrently. When a view
is created, the node ranges are split into (num_processors * 4) builder tasks and submitted to this executor.
Overrides the concurrent_materialized_view_builders setting in cassandra.yaml.
Synopsis

$ nodetool [connection_options] setconcurrentviewbuilders number

Table 173: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

number
The number of concurrent materialized view builder tasks. Must be greater than 0.
Examples

Allow 6 concurrent materialized view builder tasks

$ nodetool setconcurrentviewbuilders 6

nodetool sethintedhandoffthrottlekb
Sets hinted handoff throttle in KB/sec per delivery thread.
When a node detects that a node for which it is holding hints has recovered, hints are sent to that node. This
command sets the maximum sleep interval per delivery thread after delivering each hint. The interval shrinks
proportionally to the number of nodes in the cluster. For example, if there are two nodes in the cluster, each
delivery thread uses the maximum interval; if there are three nodes, each node throttles to half of the maximum
interval, because the two nodes are expected to deliver hints simultaneously.


Synopsis

$ nodetool [connection_options] sethintedhandoffthrottlekb [--] value_in_kb_per_sec

Table 174: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value_in_kb_per_sec

• value - the hinted handoff throttle in KB per second per delivery thread

• 0 - disables throttling

Examples

Set hinted handoff throttle at 64 KB/sec per delivery thread

$ nodetool sethintedhandoffthrottlekb 64

nodetool setinterdcstreamthroughput
Sets the inter-datacenter streaming throughput capacity in megabits per second (Mbps).
Since it is a subset of total throughput, inter_dc_stream_throughput_outbound_megabits_per_sec should be
set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.

Synopsis

$ nodetool [connection_options] setinterdcstreamthroughput [--] value_in_megabits

Table 175: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value_in_megabits

• value - the inter-datacenter streaming throughput capacity in megabits per second

• 0 - disables throttling

Examples

Set inter-datacenter streaming throughput capacity to 64 megabits per second

$ nodetool setinterdcstreamthroughput 64

nodetool setlogginglevel
Sets the log level threshold for a given component or class.

Use this command to set logging levels for services instead of modifying the logback.xml file.

Synopsis

$ nodetool [connection_options] setlogginglevel [--] component | class level

Table 176: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
class
The following values are valid for the log class qualifier:

• org.apache.cassandra

• org.apache.cassandra.db

• org.apache.cassandra.service.StorageProxy

component
The following values are valid for the log components qualifier:

• bootstrap


• compaction

• cql

• repair

• ring

• streaming

level
If class qualifier and level arguments to the command are empty or null, logging levels are reset to the
initial configuration.
The valid values for setting the log level include ALL for logging information at all levels, TRACE
through ERROR, and OFF for no logging. TRACE creates the most verbose log, and ERROR, the least.

• ALL

• TRACE

• DEBUG

• INFO (Default)

• WARN

• ERROR

• OFF

When set to TRACE or DEBUG, output appears only in debug.log. When set to INFO, debug.log is
disabled.

Examples
Set StorageProxy service to debug level

$ nodetool setlogginglevel org.apache.cassandra.service.StorageProxy DEBUG
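
Reset all logging levels to the initial configuration

As noted above, calling the command with no arguments resets logging levels to the initial configuration:

$ nodetool setlogginglevel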

Extended logging for compaction is supported and requires table configuration. The extended compaction logs
are stored in a separate file.

nodetool setmaxhintwindow
Sets the maximum time that the database generates hints for an unresponsive node.
Synopsis

$ nodetool [connection_options] setmaxhintwindow [--] value_in_ms

Table 177: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value_in_ms

• value - the hint window in milliseconds

• 0 - disables hint generation


Examples

Set time that database generates hints for an unresponsive node to 120
milliseconds

$ nodetool setmaxhintwindow 120

nodetool setstreamthroughput
Sets the throughput capacity in megabits per second (Mb/s) for outbound streaming in the system. Overrides
the stream_throughput_outbound_megabits_per_sec setting in cassandra.yaml.
Because inter-datacenter traffic is a subset of total streaming throughput, if
inter_dc_stream_throughput_outbound_megabits_per_sec is set, its value should be less than or equal to
stream_throughput_outbound_megabits_per_sec.

To view the current setting, use nodetool getstreamthroughput.

Synopsis

$ nodetool [connection_options] setstreamthroughput [--] value_in_megabits

Table 178: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.


Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value_in_megabits

• value - the streaming throughput capacity in megabits per second

• 0 - disables throttling

Examples

Set throughput capacity to 64 megabits per second for outbound streaming

$ nodetool setstreamthroughput 64
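
Keep inter-datacenter streaming within the total streaming budget

The values are illustrative; the inter-datacenter value should not exceed the total streaming value:

$ nodetool setstreamthroughput 200
$ nodetool setinterdcstreamthroughput 100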

nodetool settimeout
Temporarily sets the timeout for the given timeout type by overriding the corresponding setting in
cassandra.yaml:

• read - read_request_timeout_in_ms

• range - range_request_timeout_in_ms

• write - write_request_timeout_in_ms

• counterwrite - counter_write_request_timeout_in_ms

• cascontention - cas_contention_timeout_in_ms

• truncate - truncate_request_timeout_in_ms

• misc - general rpc_timeout_in_ms

To persist the setting, change the cassandra.yaml setting.


To discover the current timeouts, use nodetool gettimeout.


Synopsis

$ nodetool [connection_options] settimeout [--] timeout_type timeout_in_ms

Table 179: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


--
Separates an option from an argument that could be mistaken for an option.
timeout_in_ms
Time to wait in milliseconds.
0 - disables the socket streaming timeout (applies to the streamingsocket type).
timeout_type
The timeout type: read, range, write, counterwrite, cascontention, truncate, streamingsocket, or misc
(general rpc_timeout_in_ms).
Examples

Set write timeout for 15 ms

$ nodetool settimeout write 15

Disable truncate timeout

$ nodetool settimeout truncate 0

nodetool settraceprobability
Sets the probability for tracing any given request to value.
Probabilistic tracing identifies which queries are responsible for intermittent query performance problems. You
can trace some or all statements sent to a cluster. Tracing a request usually requires at least 10 rows to be
inserted.
A probability of 1.0 traces everything whereas lesser amounts (for example, 0.10) only sample a certain
percentage of statements. Take care on large and active systems, as system-wide tracing will have a
performance impact. Unless you are under a very light load, tracing all requests (probability 1.0) will probably
overwhelm your system. Start with a small fraction, for example, 0.001 and increase only if necessary.
The trace information is stored in a system_traces keyspace that holds the sessions and events tables that can
be easily queried to answer questions, such as what the most time-consuming query has been since a trace was
started. Query the parameters map and thread column in the system_traces.sessions and system_traces.events
tables for probabilistic tracing information.
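
For example, to list recent trace sessions with cqlsh (a minimal sketch; the column selection and LIMIT are illustrative):

$ cqlsh -e "SELECT session_id, duration, request FROM system_traces.sessions LIMIT 10;"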

To discover the current trace probability setting, use nodetool gettraceprobability.

Synopsis

$ nodetool [connection_options] settraceprobability [--] value

Table 180: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
value

• 0 - disables trace probability. Default.

• number between 0 and 1 - the fraction of requests to trace.

• 1 - enables for all requests.


Examples

Set probability for tracing a request to 60%

$ nodetool settraceprobability 0.6

Enable tracing for all requests

$ nodetool settraceprobability 1

Disable request tracing

$ nodetool settraceprobability 0

nodetool sjk
Runs Swiss Java Knife (SJK) commands to execute, troubleshoot, and monitor the database.

See Using nodetool sjk. To learn more about SJK, see the jvm-tools Github repository.

Synopsis

$ nodetool [connection_options] sjk [--] [args [args ...]]

Table 181: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
args
Arguments passed as-is to Swiss Java Knife (SJK).


Examples

Get the Blacklisted attribute of the EndpointStateTracker MBean

$ nodetool sjk mx -b com.datastax.bdp:type=core,name=EndpointStateTracker -f Blacklisted --get

Set the Blacklisted attribute to true for the EndpointStateTracker MBean

$ nodetool sjk mx -b com.datastax.bdp:type=core,name=EndpointStateTracker -f Blacklisted --set -v true

Get the blacklisted status of node1

$ nodetool sjk mx -b com.datastax.bdp:type=core,name=EndpointStateTracker -mc -op getBlacklistedStatus -a node1

nodetool snapshot
Creates a backup by taking a snapshot of table data. A snapshot is a hardlink to the SSTable files in the data
directory for a schema table at the moment the snapshot is executed.
The snapshot directory path is: data/keyspace_name/table-UID/snapshots/snapshot_name. Data is backed up
into multiple .db files and table schema is saved to schema.cql. The schema.cql file captures the structure of
the table at the time of snapshot because restoring the snapshot requires the table to have the same structure.
See this DataStax Support knowledge base article Manual Backup and Restore, with Point-in-time and table-
level restore.

Always run nodetool cleanup before taking a snapshot for restore. Otherwise, invalid replicas (replicas that
have been superseded by new, valid replicas on newly added nodes) can be copied to the target when they
should not be, causing old data to show up on the target.
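
For example, run cleanup and then take a tagged snapshot on each node (the snapshot tag is illustrative):

$ nodetool cleanup
$ nodetool snapshot -t backup_2018-08-31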

Before upgrading DataStax Enterprise, be sure to create a backup of all keyspaces. See taking a snapshot.

Synopsis

$ nodetool [connection_options] snapshot [--table table_name | -kt keyspace_name.table_name,...] [-sf] [-t snapshotname] [--] [keyspace_name [keyspace_name...]]

Table 182: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
--table, -cf, --column-family table_name
Table name in the specified keyspace.
-kt, --kt-list, -kc, --kc.list keyspace_name.table_name,...
Comma-separated list of keyspace_name.table_name with no spaces after the comma. For example,
cycling.cyclist,basketball.players
-sf, --skip_flush
Do not flush tables before creating the snapshot.
Snapshot will not contain unflushed data.
-t snapshotname, --tag snapshotname
The snapshot name, used as the snapshot directory name. When not specified, the current time in
milliseconds since epoch is used for the directory name. For example, 1489076973698.


Examples

Take snapshot of all keyspaces on the node

$ nodetool snapshot

A message displays with the name of the snapshot directory:

Requested creating snapshot(s) for [all keyspaces] with snapshot name [1489076973698] and
options {skipFlush=false}
Snapshot directory: 1489076973698

Create snapshot of a single keyspace with the snapshot name cycling_2017-3-9

$ nodetool snapshot -t cycling_2017-3-9 cycling

The following output appears:

Requested creating snapshot(s) for [cycling] with snapshot name [cycling_2017-3-9]
Snapshot directory: cycling_2017-3-9

Take snapshot of single keyspace with two tables

The cycling keyspace contains two tables, cyclist_name and upcoming_calendar. The snapshot creates multiple
snapshot directories named cycling_2017-3-9. A number of .db files containing the data are located in these
directories, along with table schema. For example, from the DSE installation directory:

$ ls -1 data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/
cycling_2017-3-9

manifest.json
mc-1-big-CompressionInfo.db
mc-1-big-Data.db
mc-1-big-Digest.crc32
mc-1-big-Filter.db
mc-1-big-Index.db
mc-1-big-Statistics.db
mc-1-big-Summary.db
mc-1-big-TOC.txt
schema.cql

Take snapshot of multiple (mykeyspace and cycling) keyspaces

$ nodetool snapshot mykeyspace cycling

Requested creating snapshot(s) for [mykeyspace, cycling] with snapshot name [1391460334889]
Snapshot directory: 1391460334889

Take snapshot of single table

Take a snapshot of only the cyclist_name table in the cycling keyspace.

$ nodetool snapshot --table cyclist_name cycling

Requested creating snapshot(s) for [cycling] with snapshot name [1391461910600]
Snapshot directory: 1391461910600

The resulting snapshot directory 1391461910600 contains data files and the schema of cyclist_name table in
data/cycling/cyclist_name-a882dca02aaf11e58c7b8b496c707234/snapshots.

Take snapshot of multiple tables in different keyspaces

Take a snapshot the cyclist_name table in the cycling keyspace and the sample_times table in the test
keyspace. For the -kt command argument, list tables in a comma-separated list with no spaces.

$ nodetool snapshot -kt cycling.cyclist_name,test.sample_times

Requested creating snapshot(s) for [cycling.cyclist_name,test.sample_times] with snapshot name [1431045288401]
Snapshot directory: 1431045288401
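
Take snapshot without flushing memtables

Skip the flush before taking the snapshot; unflushed data is not included (the tag name is illustrative):

$ nodetool snapshot -sf -t cycling_unflushed cycling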

nodetool status
Provides information about the cluster, such as the state, load, and IDs.
A frequently used command, nodetool status provides the following information:

• Status - U (up) or D (down)


Indicates whether the node is functioning or not.

• State - N (normal), L (leaving), J (joining), M (moving)


The state of the node in relation to the cluster.

• Address
The node's IP address.

• Load - updates every 90 seconds


The amount of file system data in the data directory, excluding all content in the snapshots subdirectories.
Because all SSTable data files are included, any data that is not cleaned up (such as TTL-expired cell or
tombstoned data) is counted.

• Tokens
The number of tokens set for the node.

• Owns
The percentage of the data owned by the node per datacenter times the replication factor. For example, a
node can own 33% of the ring, but shows 100% if the replication factor is 3. For non-system keyspaces, the
endpoint percentage ownership information is shown.

• Host ID


The network ID of the node.

• Rack
The rack or, in the case of Amazon EC2, the availability zone of the node.

Synopsis

$ nodetool [connection_options] status [-r] [--] [keyspace_name]

Table 183: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.


-u, --username jmx_username


The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
-r, --resolve-ip
Show node domain names instead of IP addresses.
Examples

Get cluster status on all keyspaces

$ nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 47.66 KB 1 ? aaa1b7c1-6049-4a08-ad3e-3697a0e30e10 rack1

Get cluster status on a single keyspace

$ nodetool status mykeyspace

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 47.66 KB 1 33.3% aaa1b7c1-6049-4a08-ad3e-3697a0e30e10 rack1
UN 127.0.0.2 47.67 KB 1 33.3% 1848c369-4306-4874-afdf-5c1e95b8732e rack1
UN 127.0.0.3 47.67 KB 1 33.3% 49578bf1-728f-438d-b1c1-d8dd644b6f7f rack1
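
Get cluster status with hostnames resolved

A minimal sketch of the -r flag described above: the output matches the previous examples, except that the
Address column shows node domain names instead of IP addresses.

$ nodetool status -r mykeyspace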

nodetool statusbackup
Provides status of incremental backup.
Synopsis

$ nodetool [connection_options] statusbackup

Table 184: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get status of incremental backup

$ nodetool -u joe -pw P@ssw0rd! statusbackup

not running

nodetool statusbinary
Provides the status of the native transport that defines the format of the binary message.
Synopsis

$ nodetool [connection_options] statusbinary

Table 185: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options


-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get status of native transport

$ nodetool -u joe -pw P@ssw0rd! statusbinary

running

nodetool statusgossip
Provides status of gossip.
Synopsis

$ nodetool [connection_options] statusgossip

Table 186: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Get status of gossip.

$ nodetool -u joe -pw P@ssw0rd! statusgossip

running

nodetool statushandoff
Provides status of storing future hints.


Synopsis

$ nodetool [connection_options] statushandoff

Table 187: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


This command takes no arguments.


Examples

Get status of storing future hints

$ nodetool -u joe -pw P@ssw0rd! statushandoff

Hinted handoff is running

nodetool stop
Stops all compaction operations of the specified type from continuing to run. Typically run on a node where
compaction has a negative impact on performance. After the compaction stops, the remaining operations in the
queue continue, and the stopped compaction is eventually rescheduled.
Synopsis

$ nodetool [connection_options] stop [-id compaction_id] [--] compaction_type

Table 188: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.


Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-id, --compaction-id compaction_id
Stop a single compaction operation by the specified ID. Use nodetool compactionstats to find the ids of
compaction operations in progress.
compaction_type
Supported compaction types:

• COMPACTION

• VALIDATION

• CLEANUP

• SCRUB

• UPGRADE_SSTABLES

• VERIFY

• INDEX_BUILD

• TOMBSTONE_COMPACTION

• ANTICOMPACTION

• VIEW_BUILD

• INDEX_SUMMARY

• RELOCATE

• GARBAGE_COLLECT
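
Examples

Stop all cleanup compactions

$ nodetool -u joe -pw P@ssw0rd! stop -- CLEANUP

Stop a single compaction operation by ID

A minimal sketch; the compaction ID shown is hypothetical. Use nodetool compactionstats to find the IDs of
operations in progress.

$ nodetool stop -id 8d5b24d0-3e2f-11e8-a2e9-a13c23f4a552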

nodetool stopdaemon
Stops the Cassandra daemon.


Synopsis

$ nodetool [connection_options] stopdaemon

Table 189: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


This command takes no arguments.


Examples

Stop the Cassandra daemon

$ nodetool -u joe -pw P@ssw0rd! stopdaemon

nodetool tablehistograms
Provides current performance statistics for read and write latency on a table during the past fifteen minutes,
useful for initial troubleshooting and tuning.
Synopsis

$ nodetool [connection_options] tablehistograms [--] keyspace_name table_name

Tarball path:

installation_location/resources/cassandra/bin

Table 190: Options


Short Long Description

-h --host Hostname or IP address.

-p --port Port number.

-pwf --password-file Password file path.

-pw --password Password.

-u --username Remote JMX agent username.

keyspace_name Name of keyspace.

table_name Name of table.

-- Separates an option from an argument that could be mistaken for an option.

Description
nodetool tablehistograms shows table performance statistics over the past fifteen minutes, including read/
write latency, partition size, cell count, and number of SSTables. Use this tool to analyze performance, tune
individual tables, and ensure that the percentile latency levels meet the SLA for the data stored in the table.
Example
For example, to get statistics for the DSE Search wiki demo solr table, use this command:

$ nodetool tablehistograms wiki solr

Output:

wiki/solr histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
50%             1.00         126.93        654.95            2759           3
75%             1.00         152.32       1358.10            5722           3
95%             1.00         785.94       5839.59           17084           3
98%             1.00        1629.72      12108.97           29521           3
99%             1.00        2346.80      12108.97           42510           3
Min             1.00          73.46        219.34             104           3
Max             1.00        2346.80      12108.97          219342           3

The output shows the percentile rank of read and write latency values, the partition size, and the cell count for
the table.
nodetool tablestats
Provides statistics about one or more tables. Statistics are updated after SSTables change through compaction
or flushing.
DataStax Enterprise uses the metrics-core library to make the output more informative and easier to
understand.

Table 191: nodetool tablestats output for a single table


Name of statistic Example Brief description Related information
value

Keyspace libdata Name of the keyspace. Keyspace and table

Table libout Name of this table.

SSTable count 3 Number of SSTables containing data for Table statistics


this table.

Space used (live) 9592399 Total number of bytes of disk space used Storing data on disk in SSTables
by all active SSTables belonging to this
table.

Space used (total) 9592399 Total number of bytes of disk space used Same as above.
by SSTables belonging to this table,
including obsolete SSTables waiting for
GC management.

Space used by snapshots 0 Total number of bytes of disk space used About snapshots
(total) by snapshot of this table's data.

Off heap memory used (total) Total number of bytes of off heap memory
used for memtables, Bloom filters, index
summaries and compression metadata for
this table.

SSTable Compression Ratio 0.367… Ratio of size of compressed SSTable data Types of compression options.
to its uncompressed size.

Number of partitions (estimate) 3 The number of partition keys for this table. Not the number of primary keys. This
gives you the estimated number of
partitions in the table.

Memtable cell count 1022550 Number of cells (storage engine rows x How the database reads and writes data
columns) of data in the memtable for this
table.

Memtable data size 32028148 Total number of bytes in the memtable for Total amount of live data stored in the
this table. memtable, excluding any data structure
overhead.


Memtable off heap memory 0 Total number of bytes of off-heap data for The maximum amount is set in
used this memtable, including column related cassandra.yaml by the property
overhead and partitions overwritten. memtable_offheap_space_in_mb.

Memtable switch count 3 Number of times a full memtable for this Increases each time the memtable for
table was swapped for an empty one. a table is flushed to disk. See How
memtables are measured article.

Local read count 11207 Number of requests to read tables in the


keyspace since startup.

Local read latency 0.048 ms Round trip time in milliseconds to How is data read?
complete the most recent request to read
the table.

Local write count 17598 Number of local requests to update the


table since startup.

Local write latency 0.054 ms Round trip time in milliseconds to How are consistent read and write
complete an update to the table. operations handled?

Pending flushes 0 Estimated number of reads, writes, and Monitor this metric to watch for
cluster operations pending for this table. blocked or overloaded memtable flush
writers. The nodetool tpstats tool does
not report on blocked flushwriters.

Percent repaired 100.0 Percentage of data (uncompressed)


marked as repaired across all non-system
tables on a node. Tables with a replication
factor of 1 are excluded.

Bytes repaired 0.000KiB Size of table data repaired on disk.

Bytes unrepaired 0.000KiB Size of table data unrepaired on disk.

Bytes pending repair 0.000KiB Size of table data isolated for an ongoing
incremental repair.

Bloom filter false positives 0 Number of false positives reported by this Tuning bloom filters
table's Bloom filter.

Bloom filter false ratio 0.00000 Fraction of all bloom filter checks resulting
in a false positive from the most recent
read.

Bloom filter space used, bytes 11688 Size in bytes of the bloom filter data for
this table.

Bloom filter off heap memory 8 The number of bytes of offheap memory
used used for Bloom filters for this table.

Index summary off heap 41 The number of bytes of off heap memory
memory used used for index summaries for this table.

Compression metadata off 8 The number of bytes of off heap memory


heap memory used used for compression offset maps for this
table.

Compacted partition minimum 1110 Size in bytes of the smallest compacted


partition for this table

Compacted partition maximum 126934 Size in bytes of the largest compacted


bytes partition for this table.

Compacted partition mean 2730 The average size of compacted partitions


bytes for this table.

Average live cells per slice 0.0 Average number of cells scanned by
(last five minutes) single key queries during the last five
minutes.


Maximum live cells per slice 0.0 Maximum number of cells scanned by
(last five minutes) single key queries during the last five
minutes.

Average tombstones per slice 0.0 Average number of tombstones scanned


(last five minutes) by single key queries during the last five
minutes.

Maximum tombstones per slice 0.0 Maximum number of tombstones scanned


(last five minutes) by single key queries during the last five
minutes.

Dropped mutations 0.0 The number of mutations (INSERT, A high number of dropped mutations can
UPDATE, or DELETE) started on this indicate an overloaded node.
table but not completed.

Synopsis

$ nodetool [connection_options] tablestats [-F json | yaml] [-H] [-i] [--] [keyspace_name.table_name ...]

Table 192: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.


Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-F, --format json | yaml
The format for the output. The default is plain text. The following wait latencies (in ms) are included in
the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max.
-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
-i
Ignore the specified list of tables and display statistics for the remaining tables.
keyspace_name.table_name
Display statistics for an entire keyspace or for specific tables; use a space to separate keyspace.table entries.

• If you do not specify a keyspace or table, statistics are displayed for all keyspaces and tables.

• If you specify only a keyspace, statistics are displayed for all tables in that keyspace.

• If you specify one or more tables, statistics are displayed for only those tables.

Examples

Get table metrics on a single table in default format

$ nodetool tablestats cycling.birthday_list

Total number of tables: 68


----------------
Keyspace : cycling
Read Count: 0
Read Latency: NaN ms
Write Count: 20
Write Latency: 0.05625 ms
Pending Flushes: 0
Table: birthday_list
SSTable count: 0
Space used (live): 0
Space used (total): 0
Space used by snapshots (total): 0
Off heap memory used (total): 0
SSTable Compression Ratio: -1.0
Number of partitions (estimate): 5
Memtable cell count: 6


Memtable data size: 799


Memtable off heap memory used: 0
Memtable switch count: 0
Local read count: 0
Local read latency: NaN ms
Local write count: 6
Local write latency: 0.035 ms
Pending flushes: 0
Percent repaired: 100.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 0.000KiB
Bytes pending repair: 0.000KiB
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 0
Bloom filter off heap memory used: 0
Index summary off heap memory used: 0
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 0
Compacted partition maximum bytes: 0
Compacted partition mean bytes: 0
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): NaN
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
Failed Replication Count: null

Get metrics on two tables in yaml format

$ nodetool tablestats -F yaml cycling.calendar cycling.birthday_list

total_number_of_tables: 68
cycling:
write_latency_ms: 0.05625
tables:
calendar:
average_tombstones_per_slice_last_five_minutes: .NaN
bloom_filter_off_heap_memory_used: '0'
bytes_pending_repair: 0
memtable_switch_count: 0
maximum_tombstones_per_slice_last_five_minutes: 0
memtable_cell_count: 12
memtable_data_size: '854'
average_live_cells_per_slice_last_five_minutes: .NaN
local_read_latency_ms: NaN
local_write_latency_ms: '0.046'
pending_flushes: 0
compacted_partition_minimum_bytes: 0
local_read_count: 0
sstable_compression_ratio: -1.0
dropped_mutations: '0'
bloom_filter_false_positives: 0
off_heap_memory_used_total: '0'
memtable_off_heap_memory_used: '0'
index_summary_off_heap_memory_used: '0'
bloom_filter_space_used: '0'
sstables_in_each_level: []
compacted_partition_maximum_bytes: 0
space_used_total: '0'
local_write_count: 12


compression_metadata_off_heap_memory_used: '0'
number_of_partitions_estimate: 3
bytes_repaired: 0
maximum_live_cells_per_slice_last_five_minutes: 0
space_used_live: '0'
compacted_partition_mean_bytes: 0
bloom_filter_false_ratio: '0.00000'
bytes_unrepaired: 0
percent_repaired: 100.0
space_used_by_snapshots_total: '0'
birthday_list:
average_tombstones_per_slice_last_five_minutes: .NaN
bloom_filter_off_heap_memory_used: '0'
bytes_pending_repair: 0
memtable_switch_count: 0
maximum_tombstones_per_slice_last_five_minutes: 0
memtable_cell_count: 6
memtable_data_size: '799'
average_live_cells_per_slice_last_five_minutes: .NaN
local_read_latency_ms: NaN
local_write_latency_ms: '0.035'
pending_flushes: 0
compacted_partition_minimum_bytes: 0
local_read_count: 0
sstable_compression_ratio: -1.0
dropped_mutations: '0'
bloom_filter_false_positives: 0
off_heap_memory_used_total: '0'
memtable_off_heap_memory_used: '0'
index_summary_off_heap_memory_used: '0'
bloom_filter_space_used: '0'
sstables_in_each_level: []
compacted_partition_maximum_bytes: 0
space_used_total: '0'
local_write_count: 6
compression_metadata_off_heap_memory_used: '0'
number_of_partitions_estimate: 5
bytes_repaired: 0
maximum_live_cells_per_slice_last_five_minutes: 0
space_used_live: '0'
compacted_partition_mean_bytes: 0
bloom_filter_false_ratio: '0.00000'
bytes_unrepaired: 0
percent_repaired: 100.0
space_used_by_snapshots_total: '0'
read_latency_ms: .NaN
pending_flushes: 0
write_count: 20
read_latency: .NaN
read_count: 0

nodetool toppartitions
Samples the activity in a table during the specified duration and reports the most active partitions.


Synopsis

$ nodetool [connection_options] toppartitions [-a samplers] [-k num_partitions] [-s size] [--]
keyspace_name table_name duration

Table 193: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments


--
Separates an option from an argument that could be mistaken for an option.
-a samplers
Comma-separated list of samplers. Default is all.
duration
Duration in milliseconds.
-k num_partitions
Number of top partitions. Default is 10.
keyspace_name
The keyspace name.
-s size
Capacity of stream summary. A value closer to actual cardinality of partitions yields more accurate
results. Default is 256.
table_name
The table name.
Examples

Sample the most active partitions for the table test.users for 1,000
milliseconds.

$ nodetool toppartitions test users 1000

The output of nodetool toppartitions is similar to the following:

WRITES Sampler:
Cardinality: ~2 (256 capacity)
Top 4 partitions:
Partition Count +/-
4b504d39354f37353131 15 14
3738313134394d353530 15 14
4f363735324e324e4d30 15 14
303535324e4b4d504c30 15 14

READS Sampler:
Cardinality: ~3 (256 capacity)
Top 4 partitions:
Partition Count +/-
4d4e30314f374e313730 42 41
4f363735324e324e4d30 42 41
303535324e4b4d504c30 42 41
4e355030324e344d3030 41 40

For each of the samplers used (WRITES and READS in the example), toppartitions reports:

• The cardinality of the sampled operations (that is, the number of unique operations in the sample set)

• The n partitions in the specified table that had the most traffic in the specified time period (where n is the
value of the -k argument, or ten if -k is not explicitly set in the command).
For each Partition, toppartitions reports:
Partition
The partition key
Count
The number of operations of the specified type that occurred during the specified time period.
+/-
The margin of error for the Count statistic


To keep the toppartitions reporting from slowing performance, the database does not keep
an exact count of operations, but uses sampling techniques to create an approximate number.
(This example reports on a sample cluster; a production system might generate millions of
reads or writes in a few seconds.) The +/- figure allows you to judge the accuracy of the
toppartitions reporting.
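
Sample only write activity for the top five partitions

A minimal sketch combining the -a and -k arguments described above: samples only the WRITES sampler and
reports the five most active partitions over 2,000 milliseconds.

$ nodetool toppartitions -a WRITES -k 5 -- test users 2000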

nodetool tpstats
Prints usage statistics of thread pools. The DataStax Enterprise (DSE) database is based on a staged event-
driven architecture (SEDA).
The database separates different tasks into stages connected by a messaging service. Each stage has a queue
and a thread pool. Some stages skip the messaging service and queue tasks immediately on a different stage
when it exists on the same node. A queue can back up if the next stage is too busy, leading to
performance bottlenecks, as described in Monitoring a DataStax Enterprise cluster.
Reports are updated after SSTables change through compaction or flushing.
Report columns
The nodetool tpstats command report includes the following columns:
Active
The number of Active threads.
Pending
The number of Pending requests waiting to be executed by this thread pool.
Completed
The number of tasks Completed by this thread pool.
Blocked
The number of requests that are currently Blocked because the thread pool for the next step in the
service is full.
All-Time Blocked
The total number of All-Time Blocked requests, which are all requests blocked in this thread pool up
to now.
Report rows
The following list describes the task or property associated with each row reported in the nodetool tpstats output.
General metrics
The following report aggregated statistics for tasks on the local node:
BackgroundIoStage
Completes background tasks like submitting hints and deserializing the row cache.
CompactionExecutor
Running compaction.
GossipStage
Distributing node information via Gossip. Out of sync schemas can cause issues. You may have to sync
using nodetool resetlocalschema.
HintsDispatcher
Dispatches a single hints file to a specified node in a batched manner.
InternalResponseStage
Responding to non-client initiated messages, including bootstrapping and schema checking.
MemtableFlushWriter
Writing memtable contents to disk. May back up if the queue overruns the disk I/O capacity, or because of
sorting processes.
nodetool tpstats no longer reports blocked threads in the MemtableFlushWriter pool. Check the
Pending Flushes metric reported by nodetool tablestats.
MemtablePostFlush
Cleaning up after flushing the memtable (discarding commit logs and secondary indexes as needed).
MemtableReclaimMemory
Making unused memory available.
PendingRangeCalculator


Calculating pending ranges for bootstrapping and departed nodes. Reporting by this tool is not useful;
see Developer notes.
PerDiskMemtableFlushWriter_N
Activity for the memtable flush writer of each disk.
ReadRepairStage
Performing read repairs. Usually fast, if there is good connectivity between replicas.
Thread per core (TPC) task metrics
All actions in the TPC loop are labeled and therefore observable. Tasks marked Pendable are throttled, limited
to the value set for tpc_concurrent_requests_limit in cassandra.yaml (128 by default; a configuration sketch
follows the list below). Thread per core messages are prepended with TPC/type, where:

• TPC/N are metrics for the core number (when --cores is specified).

• TPC/other are metrics for tasks executed that are not on TPC threads.

• TPC/all are the aggregate task metrics for all cores.
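
For example, a minimal cassandra.yaml fragment that raises the throttle on Pendable tasks (the value shown is
illustrative, not a recommendation):

# cassandra.yaml
# Maximum number of pendable TPC tasks queued per core (default: 128)
tpc_concurrent_requests_limit: 256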

UNKNOWN
Unknown task.
FRAME_DECODE
Asynchronous frame decoding.
READ_LOCAL
Single-partition read request from a local node generated directly from clients.
READ_REMOTE
Single-partition read request from a remote replica.
READ_TIMEOUT
Signals read timeout errors.
READ_DEFERRED
Single-partition read request that is first scheduled on an event loop (Pendable).
READ_RESPONSE
Single-partition read response.
READ_RANGE_LOCAL
Partition range read request from a local node generated directly from clients.
READ_RANGE_REMOTE
Partition range read request from a remote replica.
READ_RANGE_NODESYNC
Partition range read originating from NodeSync.
READ_RANGE_INTERNAL
Range reads to internal tables.
READ_RANGE_RESPONSE
Partition range read response.
READ_FROM_ITERATOR
Switching thread to read from an iterator.
READ_SECONDARY_INDEX
Switching thread to read from secondary index.
READ_DISK_ASYNC
Waiting for data from disk.
WRITE_LOCAL
Write request from a local node generated directly from clients.
WRITE_REMOTE
Write request from a remote replica.
WRITE_INTERNAL
Writes to internal tables.
WRITE_RESPONSE
Write response.
WRITE_DEFRAGMENT
Write issued to defragment data that required too many SSTables to read (Pendable).
WRITE_MEMTABLE
Switching thread to write in the memtable when not already on the correct thread.
WRITE_POST_COMMITLOG_SEGMENT
Write request is waiting for the commit log segment to be allocated.
WRITE_POST_COMMITLOG_SYNC
Write request is waiting for the commit log to sync to disk.
WRITE_POST_MEMTABLE_FULL
Write request is waiting for space in the memtable.
BATCH_REPLAY
Replaying a batch mutation.
BATCH_STORE
Store a batchlog entry request (Pendable).
BATCH_STORE_RESPONSE
Store a batchlog entry response.
BATCH_REMOVE
Remove a batchlog entry (Pendable).
COUNTER_ACQUIRE_LOCK
Acquiring counter lock.
EXECUTE_STATEMENT
Executing a statement.
CAS
Executing compare-and-set (LWT).
LWT_PREPARE
Preparation phase of light-weight transaction (Pendable).
LWT_PROPOSE
Proposal phase of light-weight transaction (Pendable).
LWT_COMMIT
Commit phase of light-weight transaction (Pendable).
TRUNCATE
Truncate request (Pendable).
NODESYNC_VALIDATION
NodeSync validation of a partition.
AUTHENTICATION
Authentication request.
AUTHORIZATION
Authorization request.
TIMED_UNKNOWN
Unknown timed task.
TIMED_TIMEOUT
Scheduled timeout task.
EVENTLOOP_SPIN
Number of busy spin cycles done by this TPC thread when it has no tasks to perform.
EVENTLOOP_YIELD
Number of Thread.yield() calls done by this TPC thread when it has no tasks to perform.
EVENTLOOP_PARK
Number of LockSupport.park() calls done by this TPC thread when it has no tasks to perform.
HINT_DISPATCH
Hint dispatch request (Pendable).
HINT_RESPONSE
Hint dispatch response.
NETWORK_BACKPRESSURE
Scheduled network backpressure.
Droppable messages
The database generates the messages listed below, but discards them after a timeout. The nodetool tpstats
command reports the number of messages of each type that have been dropped. You can view the messages
themselves using a JMX client.

Message Type Stage Notes

BINARY n/a Deprecated


_TRACE n/a (special) Used for recording traces (nodetool settraceprobability). Has a special executor (1 thread, 1000 queue depth) that throws away messages on insertion instead of during execution.

MUTATION MutationStage If a write message is processed after its timeout (write_request_timeout_in_ms), it has either returned a failure to the client or met its requested consistency level and will rely on hinted handoff and read repair to complete the mutation.

COUNTER_MUTATION MutationStage If a write message is processed after its timeout (write_request_timeout_in_ms), it has either returned a failure to the client or met its requested consistency level and will rely on hinted handoff and read repair to complete the mutation.

READ_REPAIR MutationStage Times out after write_request_timeout_in_ms.

READ ReadStage Times out after read_request_timeout_in_ms. There is no point in servicing reads after that point, since an error has already been returned to the client.

RANGE_SLICE ReadStage Times out after range_request_timeout_in_ms.

PAGED_RANGE ReadStage Times out after request_timeout_in_ms.

REQUEST_RESPONSE RequestResponseStage Times out after request_timeout_in_ms. The response was completed and sent back, but not before the timeout.
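
For reference, a sketch of the cassandra.yaml timeout settings referenced above (the values shown are
illustrative, not recommendations):

# cassandra.yaml
write_request_timeout_in_ms: 2000
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
request_timeout_in_ms: 10000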

Synopsis

$ nodetool [connection_options] tpstats [-C] [-F json | yaml]

Table 194: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.


cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-C, --cores
Include data for each core. The number of cores is determined by the tpc_cores setting.
-F, --format json | yaml
The format for the output. The default is plain text. The following wait latencies (in ms) are included in
the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max.
Examples
Run nodetool tpstats with per-core data (-C)

$ nodetool tpstats -C

Command output is:

Pool Name                           Active  Pending (w/Backpressure)  Delayed  Completed  Blocked  All time blocked
CompactionExecutor                       0                   0 (N/A)      N/A        196        0                 0
GossipStage                              0                   0 (N/A)      N/A       2177        0                 0
InternalResponseStage                    0                   0 (N/A)      N/A          8        0                 0
MemtableFlushWriter                      0                   0 (N/A)      N/A         25        0                 0
MemtablePostFlush                        0                   0 (N/A)      N/A        143        0                 0
MemtableReclaimMemory                    0                   0 (N/A)      N/A         25        0                 0
MigrationStage                           0                   0 (N/A)      N/A          7        0                 0
PendingRangeCalculator                   0                   0 (N/A)      N/A          4        0                 0
PerDiskMemtableFlushWriter_0             0                   0 (N/A)      N/A         25        0                 0
ReadRepairStage                          0                   0 (N/A)      N/A          2        0                 0
TPC/0                                    0                     0 (0)        0       3470      N/A                 0
TPC/0/EVENTLOOP_SPIN                     0                 N/A (N/A)      N/A      49289      N/A               N/A
TPC/0/READ_DISK_ASYNC                    0                 N/A (N/A)      N/A         21      N/A               N/A
TPC/0/READ_INTERNAL                      0                 N/A (N/A)      N/A       1565      N/A               N/A
TPC/0/READ_RANGE_INTERNAL                0                 N/A (N/A)      N/A         14      N/A               N/A
TPC/0/READ_SWITCH_FOR_RESPONSE           0                 N/A (N/A)      N/A       1572      N/A               N/A
TPC/0/TIMED_TIMEOUT                      0                 N/A (N/A)      N/A       5005      N/A               N/A
TPC/0/UNKNOWN                            0                 N/A (N/A)      N/A          1      N/A               N/A
TPC/0/WRITE_INTERNAL                     0                 N/A (N/A)      N/A         33      N/A               N/A
TPC/0/WRITE_SWITCH_FOR_MEMTABLE          0                 N/A (N/A)      N/A        251      N/A               N/A
TPC/0/WRITE_SWITCH_FOR_RESPONSE          0                 N/A (N/A)      N/A         13      N/A               N/A
TPC/all/EVENTLOOP_SPIN                   0                 N/A (N/A)      N/A      49307      N/A               N/A
TPC/all/NODESYNC_VALIDATION              0                 N/A (N/A)      N/A          2      N/A               N/A
TPC/all/READ_DISK_ASYNC                  0                 N/A (N/A)      N/A         21      N/A               N/A
TPC/all/READ_INTERNAL                    0                 N/A (N/A)      N/A       1565      N/A               N/A
TPC/all/READ_RANGE_INTERNAL              0                 N/A (N/A)      N/A         14      N/A               N/A
TPC/all/READ_SWITCH_FOR_RESPONSE         0                 N/A (N/A)      N/A       1572      N/A               N/A
TPC/all/TIMED_TIMEOUT                    0                 N/A (N/A)      N/A       5003      N/A               N/A
TPC/all/UNKNOWN                          0                 N/A (N/A)      N/A          1      N/A               N/A
TPC/all/WRITE_INTERNAL                   0                 N/A (N/A)      N/A         33      N/A               N/A
TPC/all/WRITE_SWITCH_FOR_MEMTABLE        0                 N/A (N/A)      N/A        251      N/A               N/A
TPC/all/WRITE_SWITCH_FOR_RESPONSE        0                 N/A (N/A)      N/A         13      N/A               N/A
TPC/other                                0                     0 (0)        0          2      N/A                 0
TPC/other/NODESYNC_VALIDATION            0                 N/A (N/A)      N/A          2      N/A               N/A

Message type         Dropped  Latency waiting in queue (micros)
                                   50%       95%       99%       Max
RANGE_SLICE                0       N/A       N/A       N/A       N/A
SNAPSHOT                   0       N/A       N/A       N/A       N/A
HINT                       0       N/A       N/A       N/A       N/A
COUNTER_MUTATION           0       N/A       N/A       N/A       N/A
LWT                        0       N/A       N/A       N/A       N/A
BATCH_STORE                0       N/A       N/A       N/A       N/A
VIEW_MUTATION              0       N/A       N/A       N/A       N/A
READ                       0      0.00    917.50    917.50   1048.58
OTHER                      0      0.00   5242.88   5242.88   6291.46
REPAIR                     0       N/A       N/A       N/A       N/A
SCHEMA                     0    917.50   1835.01   1835.01   2097.15
MUTATION                   0      0.00  14680.06  14680.06  16777.22
NODESYNC                   0    917.50  58720.26  58720.26  67108.86
READ_REPAIR                0       N/A       N/A       N/A       N/A
TRUNCATE                   0       N/A       N/A       N/A       N/A
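
Get thread pool statistics in YAML format

A minimal sketch of the -F argument described above; the YAML output (omitted here) reports the same pools
and message types as the plain-text report.

$ nodetool tpstats -F yaml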

nodetool truncatehints
Truncates all hints on the local node or for one or more endpoints.
Synopsis

$ nodetool [connection_options] truncatehints [--] [endpoint ...]

Table 195: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.


' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
endpoint
Endpoint address or addresses. IP address or hostname.
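
Examples

Truncate all hints on the local node

$ nodetool -u joe -pw P@ssw0rd! truncatehints

Truncate hints for a single endpoint

A minimal sketch; the endpoint address shown is hypothetical.

$ nodetool truncatehints -- 10.200.1.51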
nodetool upgradesstables
Rewrites SSTables that are not on the current version of DataStax Enterprise, upgrading them to the current
version. Use this command when upgrading your server or changing compression options.
See sstableupgrade for SSTable compatibility with the current DSE version.
Synopsis

$ nodetool [connection_options] upgradesstables [-a] [-j num_jobs] [--] [keyspace_name table_name [table_name ...]]

Table 196: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.


... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

--
Separates an option from an argument that could be mistaken for an option.
-a, --include-all-sstables
Upgrade target SSTables, including SSTables already on the current DSE version.
-j, --jobs num_jobs

• num_jobs - Number of SSTables affected simultaneously. Default: 2.

• 0 - Use all available compaction threads.

keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.


Examples

Upgrade all SSTables of the cyclist_name table in the cycling keyspace

$ nodetool upgradesstables --include-all-sstables cycling cyclist_name

Force upgrade all SSTables

Force upgrade all SSTables, including SSTables already on the current DSE version.

$ nodetool upgradesstables -a

Force upgrade of target SSTables

Force upgrade the SSTables for the specified keyspace and table, including SSTables already on the current
DSE version.

$ nodetool upgradesstables -a keyspace_name table_name

Upgrade four SSTables simultaneously until all SSTables are upgraded

$ nodetool upgradesstables --include-all-sstables --jobs 4

The number of jobs cannot exceed the concurrent_compactors configured in cassandra.yaml.
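
Upgrade SSTables in one keyspace using the two-hyphen separator

A hedged sketch: the -- separator guards against a keyspace or table name being parsed as an option, reusing the cycling keyspace from the earlier examples:

$ nodetool upgradesstables -j 2 -- cycling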

nodetool verify
Checks the data checksum for one or more specified tables.
Synopsis

$ nodetool [connection_options] verify [-e] [--] keyspace_name table_name [table_name ...]

Table 197: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.


'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

-e, --extended-verify
Verify each cell's data, beyond simply checking SSTable checksums.
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.


Examples

Verify data checksum

$ nodetool -u username -pw password verify cycling cyclist_name
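
Verify data checksum with extended verification

A hedged variant of the example above; the -e flag additionally verifies each cell's data rather than only the SSTable checksums:

$ nodetool -u username -pw password verify -e cycling cyclist_name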

nodetool version
Provides the DSE database version.
Synopsis

$ nodetool [connection_options] version

Table 198: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.


-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

This command takes no arguments.


Examples

Run nodetool version

$ nodetool version

ReleaseVersion: 4.0.0.607

nodetool viewbuildstatus
Shows the progress of a materialized view build.
Synopsis

$ nodetool [connection_options] viewbuildstatus keyspace_name view_name | keyspace_name.view_name

Table 199: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.


cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Connection options

-h, --host hostname


The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.

Command arguments

keyspace_name
The keyspace name.
view_name
The name of the materialized view.
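
Examples

Show the build progress of a materialized view

A minimal sketch; cycling and cyclist_by_age are hypothetical keyspace and view names:

$ nodetool viewbuildstatus cycling cyclist_by_age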

dse commands
The dse commands start the database, connect an external client to a DataStax Enterprise node, and
perform common utility tasks.
About dse commands
The dse commands provide controls for starting and using DataStax Enterprise (DSE).
dse subcommands
Specify one dse subcommand and zero or more optional command arguments.
When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.
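
For example, to start a node with both Spark (-k) and Search (-s) enabled:

$ dse cassandra -k -s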


DSE Multi-Instance commands


To run standard DataStax Enterprise commands for nodes on a DSE Multi-Instance host machine, specify the
node name using this syntax:

$ sudo dse dse-nodeId subcommand [command_arguments]

For details, see DSE Multi-Instance commands.


dse command connection options
Options to authenticate connections to the database and to JMX for dse commands.
Synopsis

$ dse [-f config_file | -u username -p password] [-a jmx_username [-b jmx_password]] command
[options]

Table 200: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Specify how to connect and authenticate to the database for dse commands.
This list shows short form (-f filename) and long form (--config-file=filename):
-f, --config-file config_filename
File path to configuration file that stores credentials. The credentials in this configuration file override the
~/.dserc credentials. If not specified, then use ~/.dserc if it exists.
The configuration file can contain DataStax Enterprise and JMX login credentials. For example:

username=username
password=password
jmx_username=jmx_username
jmx_password=jmx_password

The credentials in the configuration file are stored in clear text. DataStax recommends restricting
access to this file only to the specific user.
-u username
Role to authenticate for database access.
-p, --password password
Password to authenticate for database access.
-a, --jmxusername jmx_username
User name for authenticating with secure local JMX.
-b, --jmxpassword jmx_password
Password for authenticating with secure local JMX. If you do not provide a password, you are prompted
to enter one.
Examples

To authenticate a connection to the database

$ dse -u user1 -p mypassword
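
To authenticate database and JMX connections together

A hedged sketch combining the database and JMX credential options above; all user names and passwords are placeholders:

$ dse -u user1 -p mypassword -a jmxuser1 -b jmxpassword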

To authenticate a connection using a configuration file

$ dse -f configfile
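
To restrict access to the configuration file

Because the credentials are stored in clear text, one way to limit the file to its owner on Linux (assuming the file is named configfile):

$ chmod 600 configfile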

dse add-node
For DSE Multi-Instance, simplifies adding and configuring a node on a host machine. When optional parameters
are absent, the default values remain unchanged.

The user running the command must have permissions for writing to the directories that DSE uses, or use
sudo.

DSE Multi-Instance commands are supported only on package installations.

Synopsis

$ dse add-node -n nodeId [--advrep-directory=advrepdirectory] [--analytics]
  [--cdc-directory=cdcdirectory] [--cluster=clustername] [--commit-directory=commitdirectory]
  [--cpus=number_of_cpus] [--dc=datacenter_placement] [--data-directory=datadirectory]
  [--dsefs] [--dsefs-directory=dsefsdatadirectory] [--graph] [--hadoop-logs=hadooplogsdirectory]
  [--help] [--hints-directory=hintsdirectory] [--jmxport=jmx_port]
  [--listen-address=listen_IP_address] [--logs-directory=alllogsdirectory] [--max-heap-size=heapsize]
  [--native-transport-address=native_transport_IP_address] [--num-tokens=number_of_tokens]
  [--pig-logs=piglogdirectory] [--rack=rack_placement] [--rpc-address=rpc_IP_address]
  [--saved-caches-directory=savedcachesdirectory] [--search] [--seeds=IP_address1,IP_address2,...]
  [--spark-local-directory=sparklocaldirectory] [--spark-log-directory=sparklogdirectory]
  [--spark-worker-cores=number_of_cores] [--spark-worker-directory=sparkworkerdirectory]
  [--spark-worker-memory=memory] [--tomcat-logs=tomcatlogsdirectory] [--unix-group=groupname]
  [--unix-username=username]

Table 201: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

New node configuration options:


-n=nodeId, --node-id=nodeId
Required. For DSE Multi-Instance, the alphanumeric node name for the new node. The specified node
name is automatically prepended with dse- so that the resulting node ID is dse-nodeId. For example, if
you specify node1, the resulting node name is dse-node1.
--advrep-directory=advrepdirectory
Optional. The DSE Advanced Replication data directory.
Default: /var/lib/dse-nodeId/advrep
--analytics
Enable DSE Analytics.
--cdc-directory=cdcdirectory
Optional. The CDC raw data directory.
Default: /var/lib/dse-nodeId/cdc_raw
--cluster=clustername
Optional. The name of the DataStax Enterprise cluster that the new node belongs to. Only non-
whitespace values are supported.
--cpus=number_of_cpus
Optional. The number of cores.
--commit-directory=commitdirectory
Optional. The commit log directory.
Default: /var/lib/dse-nodeId/commitlog
--dc=datacenter_placement

Optional. The data center placement.


--data-directory=datadirectory
Optional. The root directory for storing data.
Default: /var/lib/dse-nodeId/data
--dsefs
Optional. Enable DSEFS.
--dsefs-directory=dsefsdatadirectory
Optional. The DSEFS data directory.
Default: /var/lib/dse-nodeId/dsefs
--graph
Optional. Enable DSE Graph.
--hadoop-logs=hadooplogsdirectory
Optional. The log directory for Hadoop logs.
Default: logs-directory/hadoop
--help
Optional. Send dse add-node option descriptions to standard output.
--hints-directory=hintsdirectory
Optional. The hints directory.
Default: /var/lib/dse-nodeId/hints
--jmxport=jmx_port
Optional. The DSE JMX metrics monitoring port.
--listen-address=listen_IP_address
Optional. The IP address or hostname that DSE binds to when connecting to other nodes.
--logs-directory=alllogsdirectory
Optional. The root directory for all of the logs.
Default: /var/log/dse-nodeId
--max-heap-size=heapsize
Optional. The Java heap size. If you omit a unit, the size is interpreted as megabytes (MB).
--num-tokens=number_of_tokens
Optional. The number of tokens.
--pig-logs=piglogdirectory
Optional. The log directory for Pig logs.
Default: logs-directory/pig
--rack=rack_placement
Optional. The rack placement.
--rpc-address=rpc_IP_address
Optional. The IP address or hostname that DSE binds to for RPC requests.
--saved-caches-directory=savedcachesdirectory
Optional. The saved caches directory.
Default: /var/lib/dse-nodeId/saved_caches
--search
Optional. Enable DSE Search.
--seeds=IP_address1,IP_address2,...
Optional. A comma-separated list of IP addresses of the nodes to be used as seed nodes.
--spark-local-directory=sparklocaldirectory
Optional. The local directory for Spark Worker.
Default: /var/lib/dse-nodeId/spark/rdd
--spark-log-directory=sparklogdirectory
Optional. The log directory for Spark Worker.
Default: /var/log/dse-nodeId/spark/worker
--spark-worker-cores=number_of_cores
Optional. The maximum number of cores used by Spark executors.
--spark-worker-directory=sparkworkerdirectory
Optional. The data directory for Spark Worker.
Default: /var/lib/dse-nodeId/spark/worker
--spark-worker-memory=memory
Optional. The maximum amount of memory used by Spark executors. Specify unit of measure with k
(kilobytes), m (megabytes), g (gigabytes).
--tomcat-logs=tomcatlogsdirectory

Optional. The log directory for Tomcat logs.
Default: logs-directory/tomcat
--unix-group=groupname
Optional. The UNIX group that owns the node configuration.
Default: cassandra
--unix-username=username
Optional. The UNIX user that owns the node configuration.
Default: cassandra
Examples

Add node1

$ dse add-node -n node1

The node dse-node1 is created on the local machine.

Add a node that will join the cluster payroll on startup

$ dse add-node -n payrollnode --cluster payroll --listen-address 192.168.0.0 --rpc-address 192.168.0.1
  --seeds 192.168.0.2

The node dse-payrollnode is created with the specified configuration options.
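
Add a node with Analytics and DSEFS enabled

A hedged sketch using the workload flags above; the node name analytics1 is a placeholder:

$ dse add-node -n analytics1 --analytics --dsefs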


dse advrep

dse advrep commands


A list of commands for DSE Advanced Replication.
About the dse advrep command
The command line tool provides commands and options for configuring and using DSE Advanced Replication.
Synopsis

$ dse advrep [connection_options] [command] [sub_command] [sub_command_options]

The default port for DSE Advanced Replication is 9042.

Table 202: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.


'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Using dse advrep command line help


To view a listing of dse advrep commands:

$ dse advrep help

To view help for a specific command:

$ dse advrep help command [ sub_command ]

Connection options
JMX authentication is supported by some dse commands. Other dse commands authenticate with the user
name and password of the configured user. The connection option short form and long form are comma-separated.
You can provide authentication credentials in several ways; see Credentials for authentication.

General connection options:


--separator field_separator
The field separator for use with the --no-pretty-print command.
--verbose
Print verbose messages for the command.
--no-pretty-print
If not specified, data is printed using tabular output. If specified, data is printed as a comma separated
list unless a separator is specified.
--cipher-suites ssl_cipher_suites
Specify comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. For
example, --cipher-suites c1,c2,c3.
--host hostname
The DSE node hostname or IP address.
--jmx-port jmx_port
The remote JMX agent port number. Default: 7199.
--jmx-pwd jmx_password
The password for authenticating with secure local JMX. If you do not provide a password, you are
prompted to enter one.
--jmx-user jmx_username
The user name for authenticating with secure local JMX.

--kerberos-enabled true | false


Whether Kerberos authentication is enabled for connections to DSE. For example, --kerberos-
enabled true.
--keystore-password keystore_password
Keystore password for connection to DSE when SSL client authentication is enabled.
--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type
for keys generated by the Java keytool binary, but other types are possible, depending on user
environment.
-p password
The password to authenticate for database access. Can use the DSE_PASSWORD environment
variable.
--ssl
Whether SSL is enabled for connection to DSE. --ssl-enabled true is the same as --ssl.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
-t token
Specify a delegation token to use for login; alternatively, the DSE_TOKEN environment variable can
be used.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by
the Java keytool binary, but other types are possible, depending on user environment. For example,
--truststore-type jks2.
-u username
User name of a DSE authentication account. Can use the DSE_USERNAME environment variable.
Examples
This connection example specifies that Kerberos is enabled and lists the replication channels:

$ dse advrep --host ip-10-200-300-138.example.lan --kerberos-enabled=true conf list

To use the server YAML files:

$ dse advrep --use-server-config conf list
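
To authenticate with environment variables instead of the -u and -p flags, a hedged alternative based on the DSE_USERNAME and DSE_PASSWORD variables described above (user1 and mypassword are placeholders):

$ export DSE_USERNAME=user1
$ export DSE_PASSWORD=mypassword
$ dse advrep conf list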

To list output without pretty-print with a specified separator:

$ dse advrep --no-pretty-print --separator "|" destination list-conf

The resulting output:

destination|name|value
mydest|addresses|192.168.200.100
mydest|transmission-enabled|true
mydest|driver-ssl-cipher-suites|
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,
mydest|driver-ssl-enabled|false
mydest|driver-ssl-protocol|TLS
mydest|name|mydest
mydest|driver-connect-timeout|15000
mydest|driver-max-requests-per-connection|1024

mydest|driver-connections-max|8
mydest|driver-connections|1
mydest|driver-compression|lz4
mydest|driver-consistency-level|ONE
mydest|driver-allow-remote-dcs-for-local-cl|false
mydest|driver-used-hosts-per-remote-dc|0
mydest|driver-read-timeout|15000

dse advrep channel create


Creates a replication channel for change data to flow between source clusters and destination clusters.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel create --source-keyspace keyspace_name --source-table source_table_name
  --source-id source_id_name --source-id-column source_id_column_name --destination destination
  --destination-keyspace destination_keyspace_name --destination-table destination_table_name
  [ --fifo-order | --lifo-order ] [ --collection-enabled (true|false) ]
  [ --priority channel_priority ] [ --transmission-enabled (true|false) ]

Table 203: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)


The source cluster keyspace to replicate.
--source-table source_table_name (required)

The source table to replicate.


--source-id id
A unique identifier for all data that comes from a particular source node.
--source-id-column source_id
The column that identifies the source id in the destination table.
--destination destination (required)
The destination where the replication will be sent; the user names the destination.
--destination-keyspace keyspace_name
The destination keyspace to which replication will be sent.
--destination-table table_name
The destination table to which replication will be sent.
--fifo-order
First in, first out (FIFO) channel replication order. Default.
--lifo-order
Last in, first out (LIFO) channel replication order.
--collection-enabled (true|false)
Whether to enable the source table for replication collection on creation.
--transmission-enabled (true|false)
Whether to transmit the data collected for the table to the destination.
--priority channel_priority
The order in which the source table log files are transmitted.
Examples

To create a replication source channel:

$ dse advrep channel create --source-keyspace foo --source-table bar --source-id source1
  --source-id-column source_id --destination mydest --destination-keyspace foo
  --destination-table bar --collection-enabled true --priority 1

with a result:

Created channel dc=Cassandra keyspace=foo table=bar to mydest

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set
the source-id and source-id-column differently from the global setting.
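
To confirm the channel configuration after creation, the dse advrep channel status subcommand (documented below) can be used; a sketch reusing the same hypothetical names:

$ dse advrep channel status --source-keyspace foo --source-table bar --destination mydest --data-center-id Cassandra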
dse advrep channel update
Updates a replication channel configuration.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To update a channel, specify a new value for one or more options.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel update --source-keyspace keyspace_name --source-table source_table_name
  --source-id source_id_name --source-id-column source_id_column_name --destination destination
  --destination-keyspace destination_keyspace_name --destination-table destination_table_name
  [ --fifo-order | --lifo-order ] [ --collection-enabled (true|false) ]
  [ --transmission-enabled (true|false) ] [ --priority channel_priority ]

Table 204: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)


The source cluster keyspace to replicate.
--source-table source_table_name (required)
The source table to replicate.
--source-id id
A unique identifier for all data that comes from a particular source node.
--source-id-column source_id
The column that identifies the source id in the destination table.
--destination destination (required)
The destination where the replication will be sent; the user names the destination.
--destination-keyspace keyspace_name
The destination keyspace to which replication will be sent.
--destination-table table_name
The destination table to which replication will be sent.
--fifo-order
First in, first out (FIFO) channel replication order. Default.
--lifo-order
Last in, first out (LIFO) channel replication order.
--collection-enabled (true|false)
Whether to enable the source table for replication collection on creation.
--transmission-enabled (true|false)
Whether to transmit the data collected for the table to the destination.
--priority channel_priority

The order in which the source table log files are transmitted.
Examples

To update a replication source channel configuration:

$ dse advrep --verbose channel update --source-keyspace demo --source-table sensor_readings
  --destination mydest --lifo-order

with a result as seen using dse advrep channel status:

--------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|
priority|dest ks|dest table |src id |src id col|dest |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|demo |sensor_readings |true |true |LIFO |2 |
demo |sensor_readings |source1|source_id |mydest |true |
--------------------------------------------------------------------------------------------------------------

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set
the source-id and source-id-column differently from the global setting.
dse advrep channel delete
Deletes a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To delete a channel, you must specify source information and the destination and data-center for the channel.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel delete --source-keyspace keyspace_name --source-table source_table_name
  --destination destination --data-center-id data_center_id

Table 205: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.


{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)


The source cluster keyspace to replicate.
--source-table source_table_name (required)
The source table to replicate.
--destination destination (required)
The destination where the replication will be sent; the user names the destination.
--data-center-id data_center_id
The datacenter for this channel.
Examples

To delete a replication channel:

$ dse advrep channel delete --source-keyspace foo --source-table bar --destination mydest
  --data-center-id Cassandra

with a result:

Deleted channel dc=Cassandra keyspace=foo table=bar to mydest

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel pause
Pauses replication on a channel so that change data stops flowing from a source cluster to a destination cluster.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Pause the collection of data or the transmission of data between a source cluster and a destination cluster.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel pause --source-keyspace keyspace_name --source-table source_table_name
  --destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]
  --collection --transmission

Table 206: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Pause collection: no data for the source table is collected.
--transmission
Pause transmission: no data for the source table is sent to the configured destinations.

Examples

To pause a replication source channel:

$ dse advrep channel pause --source-keyspace foo --source-table bar --destinations mydest
  --data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar collection to mydest was paused

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
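
To pause only transmission for a channel, a hedged variant adds the --transmission flag; collection continues while transmission stops:

$ dse advrep channel pause --source-keyspace foo --source-table bar --destinations mydest --data-center-ids Cassandra --transmission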
dse advrep channel resume
Resumes replication for a channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
A channel can resume either the collection or transmission of replication between a source cluster and
destination cluster.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel resume --source-keyspace keyspace_name --source-table source_table_name
  --destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]
  --collection --transmission

Table 207: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.


[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Resume collection of data for the source table.
--transmission
Resume transmission of collected data for the source table to the configured destinations.
Examples

To resume a replication source channel:

$ dse advrep channel resume --source-keyspace foo --source-table bar --destinations mydest
  --data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar collection to mydest was resumed

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel status
Prints status of a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel status --data-center-id data_center_id --source-keyspace keyspace_name
  --source-table source_table_name --destination destination

Table 208: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.


[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destination destination
The destination where the replication will be sent; the user names the destination.
--data-center-id data_center_id
The datacenter for this channel.
Examples

To print the status of a replication channel:

$ dse advrep channel status --source-keyspace foo --source-table bar --destination mydest
  --data-center-id Cassandra

with a result:

--------------------------------------------------------------------------------------------------------------
|dc |keyspace|table |collecting|transmitting|replication order|priority|
dest ks|dest table |src id |src id col|dest |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo |bar |true |true |FIFO |2 |
foo |bar |source1|source_id |mydest|true |
--------------------------------------------------------------------------------------------------------------

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel truncate
Truncates a channel to prevent replicating all messages that are currently in the replication log.

A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep channel truncate --source-keyspace keyspace_name --source-table source_table_name
  --destinations destination [ , destination ] --data-center-ids data_center_id [ , data_center_id ]

Table 209: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.

Examples

To truncate a replication channel to prevent replicating all messages that are currently in the
replication log:

$ dse advrep channel truncate --source-keyspace foo --source-table bar --destinations mydest
  --data-center-ids Cassandra

with a result:

Channel dc=Cassandra keyspace=foo table=bar to mydest was truncated

The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep conf list
Lists configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep conf list

Table 210: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Examples

To list configuration settings:

$ dse advrep conf list

The result:

----------------------------
|name |value |
----------------------------
|audit_log_file |auditLog|
----------------------------
|permits |8 |
----------------------------
|audit_log_enabled|true |
----------------------------

The number of permits is 8, audit logging is enabled, and the audit log file name is auditLog.
dse advrep conf remove
Removes configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis

$ dse advrep conf remove --separator field_separator --audit-log-enabled true|false
  --audit-log-compression none|gzip --audit-log-file log_file_name
  --audit-log-max-life-span-mins number_of_minutes --audit-log-rotate-mins number_of_minutes
  --permits number_of_permits --collection-max-open-files number_of_files
  --collection-time-slice-count number_of_files --collection-time-slice-width time_period_in_seconds
  --collection-expire-after-write --invalid-message-log

Table 211: Legend


Syntax conventions Description

Italics Variable value. Replace with a user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none

--audit-log-file log_file_name
The audit log filename.
--audit-log-max-life-span-mins number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files which are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples

To remove advanced replication configuration:

$ dse advrep conf remove --permits 8

with a result:

Removed config permits

dse advrep conf update


Updates configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep conf update --audit-log-enabled true|false --audit-log-compression none|gzip
  --audit-log-file log_file_name --audit-log-max-life-span-mins number_of_minutes
  --audit-log-rotate-mins number_of_minutes --permits number_of_permits
  --collection-max-open-files number_of_files --collection-time-slice-count number_of_files
  --collection-time-slice-width time_period_in_seconds --collection-expire-after-write
  --invalid-message-log

Table 212: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-max-life-span-mins number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files which are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log

Examples

To update configuration settings:

$ dse advrep conf update --permits 8 --audit-log-enabled true --audit-log-file auditLog

with a result:

Updated audit_log_file from null to auditLog


Updated permits from null to 8
Updated audit_log_enabled from null to true
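To verify the update, list the configuration again; the output should match the conf list example earlier in this section:

$ dse advrep conf list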

dse advrep destination create


Creates a replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination create --name destination_name
  --addresses address_name [ , address_name ] [ --transmission-enabled (true|false) ]
  --driver-user user_name --driver-pwd password
  --driver-used-hosts-per-remote-dc number_of_hosts
  --driver-connections number_of_connections --driver-connections-max number_of_connections
  --driver-local-dc data_center_name --driver-allow-remote-dcs-for-local-cl true|false
  --driver-consistency-level [ ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE ]
  --driver-compression [ snappy|lz4 ] --driver-connect-timeout timeout_in_milliseconds
  --driver-read-timeout timeout_in_milliseconds
  --driver-max-requests-per-connection number_of_requests
  --driver-ssl-enabled true|false --driver-ssl-cipher-suites --driver-ssl-protocol
  --driver-ssl-keystore-path --driver-ssl-keystore-password --driver-ssl-keystore-type
  --driver-ssl-truststore-path --driver-ssl-truststore-password --driver-ssl-truststore-type

Table 213: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.
--transmission-enabled true | false
Whether the data collector for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.

--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
To create a replication destination:

$ dse advrep --verbose destination create --name mydest --addresses 10.200.182.148 --transmission-enabled true

with a result:

Destination mydest created
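A sketch of creating a destination with SSL enabled, using the SSL options listed above (the destination name, address, truststore path, and password are placeholders):

$ dse advrep destination create --name securedest --addresses 10.200.182.149 \
    --driver-ssl-enabled true --driver-ssl-protocol TLS \
    --driver-ssl-truststore-path /path/to/truststore.jks \
    --driver-ssl-truststore-password truststore_password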

dse advrep destination update


Updates a replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination update --name destination_name
  --addresses address_name [ , address_name ] [ --transmission-enabled true|false ]
  --driver-user user_name --driver-pwd password
  --driver-used-hosts-per-remote-dc number_of_hosts
  --driver-connections number_of_connections --driver-connections-max number_of_connections
  --driver-local-dc data_center_name --driver-allow-remote-dcs-for-local-cl true|false
  --driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE
  --driver-compression snappy|lz4 --driver-connect-timeout timeout_in_milliseconds
  --driver-read-timeout timeout_in_milliseconds
  --driver-max-requests-per-connection number_of_requests
  --driver-ssl-enabled true|false --driver-ssl-cipher-suites suite1 [ , suite2, suite3 ]
  --driver-ssl-protocol protocol --driver-ssl-keystore-path keystore_path
  --driver-ssl-keystore-password keystore_password --driver-ssl-keystore-type keystore_type
  --driver-ssl-truststore-path truststore_path --driver-ssl-truststore-password truststore_password
  --driver-ssl-truststore-type truststore_type

Table 214: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.
--transmission-enabled true | false
Whether the data collector for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.

--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples

To update a replication destination:

$ dse advrep --verbose destination update --name mydest --addresses 10.200.182.148 --driver-consistency-level LOCAL_QUORUM

with a result:

Destination mydest updated


Updated addresses from 10.200.182.148 to 10.200.182.148
Updated driver_consistency_level from ONE to LOCAL_QUORUM
Updated name from mydest to mydest

Notice that every option included in the command is reported as an update, even when the value is unchanged.
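For example, to pause replication to a destination without deleting it, disable transmission; a sketch reusing the destination created above:

$ dse advrep destination update --name mydest --transmission-enabled false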


dse advrep destination delete
Deletes a given replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination delete --name destination_name

Table 215: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
Examples

To delete a replication destination:

$ dse advrep destination delete --name mydest

with a result:

Destination mydest removed
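To confirm the removal, list the remaining destinations with the list subcommand described next:

$ dse advrep destination list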

dse advrep destination list


Lists all replication destinations.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination list

Table 216: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Examples

To list all replication destinations:

$ dse advrep destination list

with a result:

----------------
|name |enabled|
----------------
|mydest|true |
----------------

dse advrep destination list-conf


Lists all configuration for a given replication destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination list-conf --separator field_separator --name destination_name

Table 217: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
Examples

To list the configuration for a replication destination:

$ dse advrep destination list-conf --name mydest

with a result:

KEYS: ---- [addresses, transmission-enabled, driver-ssl-enabled, driver-ssl-protocol, name,
driver-connect-timeout, driver-max-requests-per-connection, driver-connections-max,
driver-connections, driver-compression, driver-consistency-level,
driver-allow-remote-dcs-for-local-cl, driver-used-hosts-per-remote-dc, driver-read-timeout]
-------------------------------------------------------------------------------------------
|destination|name |value
|
-------------------------------------------------------------------------------------------
|mydest |addresses |10.200.180.162
|
-------------------------------------------------------------------------------------------
|mydest |transmission-enabled |true
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-cipher-suites |
TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,|
| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,
|
| | |TLS_RSA_WITH_AES_256_CBC_SHA256,
|
| | |
TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384, |
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,
|
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
|
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
|

| | |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
|
| | |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
|
| | |
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,|
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_RSA_WITH_AES_128_CBC_SHA256,
|
| | |
TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, |
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,
|
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
|
| | |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
|
| | |
TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,|
| | |
TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,|
| | |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_RSA_WITH_AES_256_GCM_SHA384,
|
| | |
TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, |
| | |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,
|
| | |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,
|
| | |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_RSA_WITH_AES_128_GCM_SHA256,
|
| | |
TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, |
| | |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,
|
| | |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
|

| | |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,
|
| | |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
|
| | |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,
|
| | |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDHE_RSA_WITH_RC4_128_SHA,
|
| | |SSL_RSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,
|
| | |TLS_ECDH_RSA_WITH_RC4_128_SHA,
|
| | |SSL_RSA_WITH_RC4_128_MD5,
|
| | |TLS_EMPTY_RENEGOTIATION_INFO_SCSV
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-enabled |false
|
-------------------------------------------------------------------------------------------
|mydest |driver-ssl-protocol |TLS
|
-------------------------------------------------------------------------------------------
|mydest |name |mydest
|
-------------------------------------------------------------------------------------------
|mydest |driver-connect-timeout |15000
|
-------------------------------------------------------------------------------------------
|mydest |driver-max-requests-per-connection |1024
|
-------------------------------------------------------------------------------------------
|mydest |driver-connections-max |8
|
-------------------------------------------------------------------------------------------
|mydest |driver-connections |1
|
-------------------------------------------------------------------------------------------
|mydest |driver-compression |lz4
|
-------------------------------------------------------------------------------------------
|mydest |driver-consistency-level |ONE
|
-------------------------------------------------------------------------------------------
|mydest |driver-allow-remote-dcs-for-local-cl|false
|
-------------------------------------------------------------------------------------------
|mydest |driver-used-hosts-per-remote-dc |0
|
-------------------------------------------------------------------------------------------
|mydest |driver-read-timeout |15000
|

-------------------------------------------------------------------------------------------
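For output that is easier to parse in scripts, the synopsis also accepts a field separator; a sketch (the comma is an arbitrary choice):

$ dse advrep destination list-conf --name mydest --separator ","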

dse advrep destination remove-conf


Removes configuration for a destination.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep destination remove-conf --name destination_name
  --addresses address_name [ , address_name ] [ --transmission-enabled (true|false) ]
  --driver-user user_name --driver-pwd password --driver-used-hosts-per-remote-dc
  --driver-connections --driver-connections-max --driver-local-dc
  --driver-allow-remote-dcs-for-local-cl true|false
  --driver-consistency-level [ ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE ]
  --driver-compression [ snappy|lz4 ] --driver-connect-timeout timeout_in_milliseconds
  --driver-read-timeout timeout_in_milliseconds
  --driver-max-requests-per-connection number_of_requests
  --driver-ssl-enabled true|false --driver-ssl-cipher-suites --driver-ssl-protocol
  --driver-ssl-keystore-path --driver-ssl-keystore-password --driver-ssl-keystore-type
  --driver-ssl-truststore-path --driver-ssl-truststore-password --driver-ssl-truststore-type

Table 218: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--name destination_name (required)
The name of the destination.
--addresses address_name [ , address_name ] (required)
The IP addresses of the destinations.


--transmission-enabled true | false
Whether the data collector for the table should be replicated to the destination.
--driver-user user_name
The username for the destination.
--driver-pwd password
The password for the destination.
--driver-used-hosts-per-remote-dc number_of_hosts
The number of hosts per remote datacenter that the datacenter-aware round robin policy considers
available for use.
--driver-connections number_of_connections
The number of connections that the driver creates.
--driver-connections-max number_of_connections
The maximum number of connections that the driver creates.
--driver-local-dc data_center_name
The name of the datacenter that is considered local.
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|
SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-ssl-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-ssl-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-ssl-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-ssl-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-ssl-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-ssl-truststore-type truststore_type
The SSL truststore type to use for driver connections.

Examples

To remove configuration for a replication destination:

$ dse advrep --verbose destination remove-conf --transmission-enabled true

with a result:

Removed config transmission-enabled

dse advrep metrics list


Lists advanced replication JMX metrics.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep metrics list --metric-group metric_group --metric-type metric_type

Table 219: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--metric-group metric_group
The metric group for which to list metrics.
--metric-type metric_type
The metric type for which to list metrics.
Examples

To display the JMX metrics:

$ dse advrep --host localhost --port 7199 metrics list

with a result:

------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |3000 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |3000 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------

--------------------------------------------------------------------------------------------------------------
|Group |Type |Count|RateUnit |MeanRate |
FifteenMinuteRate |OneMinuteRate |FiveMinuteRate |
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded |3000 |events/second|0.020790532589851248|
4.569533277209345E-28|2.964393875E-314 |2.3185964029982446E-82|
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesDeleted |0 |events/second|0.0 |0.0
|0.0 |0.0 |
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAcknowledged |3000 |events/second|0.020790529428089743|
4.569533277209345E-28|2.964393875E-314 |2.3185964029982446E-82|
--------------------------------------------------------------------------------------------------------------
|ReplicationLog|CommitLogMessagesRead|30740|events/second|0.21303361656215317 |
0.13538523143065767 |0.01686330377344829|0.11519609320406245 |
--------------------------------------------------------------------------------------------------------------

-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|
-------------------------------------

To display JMX metrics for a particular metric group:

$ dse advrep --host localhost --port 7199 metrics list --metric-group Tables

with a result:

--------------------------------
|Group |Type |Count|
--------------------------------
|Tables|MessagesDelivered|3000 |
--------------------------------
|Tables|MessagesReceived |3000 |
--------------------------------

To display JMX metrics for a particular metric type:

$ dse advrep --host localhost --port 7199 metrics list --metric-type MessagesAdded

with a result:

-----------------------------------------------------------------------------------
|Group |Type |Count|RateUnit |MeanRate
|FifteenMinuteRate |OneMinuteRate |FiveMinuteRate
|
-----------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded|3000 |events/second|
0.020827685267120057|6.100068258619765E-28|2.964393875E-314|
5.515866021410421E-82|
-----------------------------------------------------------------------------------

dse advrep replog count


Returns the messages that have not been replicated.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep replog count --source-keyspace keyspace_name --source-table source_table_name
  --destination destination --data-center-id data_center_id

Table 220: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--source-keyspace keyspace_name (required)
Define the source cluster keyspace for which to show the count.
--source-table source_table_name (required)
Define the source table for which to show count.
--destination destination (required)
Define the destination for which to show count.
--data-center-id data_center_id
Define the data center for which to show the count.
Examples

To verify the record count held in a replication log:

$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar

The command prints the number of messages that have not yet been replicated for the given source table and destination.

dse advrep replog analyze-audit-log


Reads the audit log and prints a summary.
A replication channel is a defined channel of change data between source clusters and destination clusters.

Command is supported only on nodes configured for DSE Advanced Replication.

Synopsis

$ dse advrep replog analyze-audit-log --file audit_log_filename

Table 221: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

--file audit_log_filename
The audit log file to analyze.
Examples

To analyze the data in a replication log:

$ dse advrep replog analyze-audit-log --file auditLog

with a result:

foo, bar : inserts = 1000, insertErrors = 0
foo, bar : reads = 1000, sent = 0, deletes = 1000, readingErrors = 0, deletingErrors = 0

dse beeline
Starts the Beeline shell.

Command is supported only on nodes with analytics workloads.

Synopsis

$ dse beeline

This command takes no arguments.
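After the shell starts, connect to a running Spark SQL Thrift server with a standard JDBC URL; a sketch assuming the Thrift server is listening on the default local port:

$ dse beeline
beeline> !connect jdbc:hive2://localhost:10000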

dse cassandra
Starts the database in transactional mode. Command options start the database in other modes and enable
advanced features on a node. See Starting DataStax Enterprise.
To change the DSE system properties on start up, see Setting system properties during startup.
Synopsis

$ dse cassandra [-k] [-s] [-g]
  [-Dparameter_name=value]
  [-f] [-h] [-p pidfile]
  [-H JVM_dumpfile]
  [-E JVM_errorfile]

When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.

Table 222: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Options
-k
Start the node in analytics mode. The first time the node starts up the analytics workload type is
configured.
-g
Start the node in graph mode. The first time the node starts up the graph workload type is configured.
-s
Start the node in search mode. The first time the node starts up the search workload type is configured.
-E
Change JVM error file.


-f
Start a real-time transactional node in the foreground.
-h
Display the usage and listing of the dse commands.
-H
Change JVM HeapDumpPath.
-p pidfilepath
Create the pid file. The pid file is typically used by monitoring processes and init scripts. Not compatible
with -f option.
Examples

Start a node in transactional mode

$ dse cassandra

In the foreground, start a node in transactional mode

$ dse cassandra -f

Start a node in DSE Analytics mode

$ dse cassandra -k

Start a node in SearchAnalytics mode

$ dse cassandra -k -s

Ensure there is a space between -k and -s in dse cassandra -k -s.

Start a node in DSE Analytics, DSE Graph, and DSE Search modes

$ dse cassandra -k -g -s

Ensure there is a space between -k, -g, and -s in dse cassandra -k -g -s.

Start a node in DSE Search mode and change the location of the search index
data on the server

$ dse cassandra -s -Ddse.solr.data.dir=filepath

See Managing the location of DSE Search data.

Start a node in transactional mode without joining the ring

$ dse cassandra -Dcassandra.join_ring=false

Start a node in transactional mode to test compaction and compression


strategies

$ dse cassandra -Dcassandra.write_survey=true

Experiment with different strategies and benchmark write performance differences without affecting the
production workload. See Testing compaction and compression.

Start a node in transactional mode and pass the dead node IP address

$ dse cassandra -Dcassandra.replace_address=10.91.176.160

Start a node in transactional mode and create pid.txt

$ dse cassandra -p pid.txt

dse cassandra-stop
Stops the DataStax Enterprise process.
See Stopping a node.
Synopsis

$ dse cassandra-stop -p pid

Table 223: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

pid
DataStax Enterprise (cassandra) process id.
Examples

Stop by process id

$ dse cassandra-stop -p 41234

dse exec
Sets the environment variables required to run third-party tools that integrate with Spark:

• Sets SPARK_HOME to point to the DSE Spark directory

• Sets HADOOP_CONF_DIR to point to the Hadoop configuration directory within DSE

• Sets other environment variables required by DSE Spark

• Executes the given shell command

This command is typically used for third-party tools that integrate with Spark.
Synopsis

$ dse exec command [arguments ...]

Table 224: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Examples
See Using DSE Spark with third party tools and integrations.
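As an illustration, a sketch of launching a third-party tool with the DSE Spark environment variables set (jupyter stands in for any tool that is already installed and on your PATH):

$ dse exec jupyter notebook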
dse fs
Starts the DSE File System (DSEFS). The DSEFS prompt shows the current working directory, which is the
default DSEFS search directory.
See DSEFS (DataStax Enterprise file system).
Synopsis

$ dse fs [--prefer-contact-points -h IP_address1,IP_address2,...]

--prefer-contact-points -h IP_address1,IP_address2,...
Give precedence to the specified hosts, regardless of proximity, when issuing DSEFS commands. As
long as the specified hosts are available, DSEFS will not switch to other DSEFS nodes in the cluster.
Without these options, DSEFS switches to the closest available DSEFS node.
Examples
Start DSEFS

$ dse fs

Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >

DSEFS starts on the closest available DSEFS node.


Start DSEFS with preferred contact points

$ dse fs --prefer-contact-points -h 10.0.0.2,10.0.0.5

Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >

DSEFS starts with precedence to the specified hosts, regardless of proximity.

See DSEFS (DataStax Enterprise file system).


dse gremlin-console
The DSE Gremlin Console automatically connects at startup to the DataStax Enterprise (DSE) server, as
configured in the remote.yaml file. Override the configured host and port from the command line.
Synopsis

$ dse gremlin-console [-u username [-p password]] [hostname[:port]] [options]

Table 225: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

DSE connection parameters


When starting the Gremlin Console with this command, it automatically connects to the host specified in
the remote.yaml file.
-u username
When DataStax Enterprise Authentication is enabled use this option to login to the database.
Set the user name in a file or as an environment variable.
-p password
Optional password for DSE authentication. If omitted when a user name is specified, the password
prompt appears.
Set the password in a file or as an environment variable.
hostname
The hostname of the DataStax Enterprise to which the console connects. Overrides the setting in the
remote.yaml.

port
The port number of the DataStax Enterprise database; default is 9042. Overrides the setting in the
remote.yaml file.
Options
Gremlin console options.
-C, --color
Disable use of ANSI colors.
-D, --debug
Enable debug console output.
-Q, --quiet
Suppress superfluous console output.
-V, --verbose
Enable verbose console output.
-e, --execute=SCRIPT_NAME [ARG1 ARG2 …]
Execute the specified script and close the console on completion.
-h, --help
Display this help message.
-i, --interactive=SCRIPT_NAME [ARG1 ARG2 ... ]
Execute the specified script and leave the console open on completion.
-l
Set the logging level of components that use standard logging output independent of the Console.
-v, --version
Display the version.
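A sketch of starting the console against a specific host with DSE authentication enabled (the user name, host, and port are placeholders; because -p is omitted, the password prompt appears):

$ dse gremlin-console -u admin 10.0.0.1:9042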
dse hadoop fs
Invokes DSEFS operations using the HDFS interface to DSEFS. DseFileSystem partially supports the
Hadoop FileSystem interface.
See Hadoop FileSystem interface implemented by DseFileSystem and DSEFS.
Synopsis

$ dse hadoop fs

Examples

Use Hadoop interface to DSEFS

$ dse hadoop fs

Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >

See DSEFS command line tool.
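Because DseFileSystem only partially implements the Hadoop FileSystem interface, basic shell operations are the safest starting point. A minimal sketch, assuming standard Hadoop fs shell semantics; the path is hypothetical:

$ dse hadoop fs -ls /tmp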


dse list-nodes
Lists the nodes that are configured for the DSE Multi-Instance host machine.
Since the default DataStax Enterprise node is called dse, the dse list-nodes command always returns at least
the dse node, even if nodes were not added with the dse add-node command.

DSE Multi-Instance commands are supported only on package installations.


Synopsis

$ dse list-nodes


This command takes no arguments and lists the nodes that are configured for the DSE Multi-Instance host
machine.
Examples

List the nodes

$ dse list-nodes

dse pyspark
Starts the Spark Python shell.
See the DataFrames documentation for an example of using PySpark, and the PySpark API documentation.


Synopsis

$ dse pyspark


This command takes no arguments.
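
Example

Inside the shell, the preconfigured spark session can read database tables through the Spark Cassandra connector data source. A minimal sketch; the keyspace and table names are hypothetical:

>>> df = spark.read.format("org.apache.spark.sql.cassandra").options(keyspace="ks", table="users").load()
>>> df.show(10)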


dse remove-node
Removes a node that is configured for the DSE Multi-Instance host machine.

The user running the command must have permissions for writing to the directories that DSE uses, or use
sudo.

DSE Multi-Instance commands are supported only on package installations.

Synopsis

$ dse remove-node nodeId [--yes]


nodeId
Required. Because the node name is always prefixed with dse-, the remove-node command accepts
either dse-nodeId or just nodeId.
--yes
Confirms node deletion. Files are deleted and are not recoverable. When not specified, you are
prompted to confirm node deletion.
Examples

Remove the node payrollnode

$ dse remove-node payrollnode

or the equivalent command with the prepended dse-:

$ dse remove-node dse-payrollnode

The prompt for node deletion is displayed:

##############################
#
# WARNING
# You're trying to remove node dse-payrollnode
# This means that all configuration files for dse-payrollnode will be deleted
#
##############################

Do you wish to continue?


1) Yes
2) No


#?

Remove the node dse-payrollnode with explicit confirmation

$ dse remove-node dse-payrollnode --yes

dse spark
Enters the interactive Spark shell, which offers basic auto-completion.

Command is supported only on nodes with analytics workloads.

For details on using Spark with DSE, see:

• Accessing database data from Spark

• BYOS (Bring Your Own Spark)

• Importing graphs using DseGraphFrame

• Starting Spark

Synopsis

$ dse connection_options spark [-framework dse|spark-2.0] [--help] [--verbose]
[--conf name=spark.value|sparkproperties.conf] [--executor-memory mem]
[--jars additional_jars] [--master dse://?appReconnectionTimeoutSeconds=secs]
[--properties-file path_to_properties_file] [--total-executor-cores cores]
[-i app_script_file]


In general, Spark submission arguments (submission_args) are translated into system properties
(-Dname=value) and other JVM parameters such as the classpath. The application arguments (app_args)
are passed directly to the application.
Configure the Spark shell with these arguments:
--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option added to the Spark configuration. Option names are prefixed by spark.

• name=spark.value - a single configuration property

• sparkproperties.conf - a configuration file

--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
-framework dse|spark-2.0
The classpath for the Spark shell. When not set, the default is dse.

• dse - Sets the Spark classpath to the same classpath that is used by the DSE server.

• spark-2.0 - Sets a classpath that is used by the open source Spark (OSS) 2.0 release to
accommodate applications originally written for open source Apache Spark. Uses a BYOS (Bring
Your Own Spark) JAR with shaded references to internal dependencies to eliminate complexity
when porting an app from OSS Spark.
If the code works on DSE, applications do not require the spark-2.0 framework. Full support
in the spark-2.0 framework might require specifying additional dependencies. For example:
hadoop-aws is included on the dse server path but is not present on the OSS Spark-2.0
classpath. In this example, applications that use S3 or other AWS APIs must include their
own aws-sdk on the runtime classpath. This additional runtime classpath is required only for
applications that cannot run on the DSE classpath.

--help
Shows a help message that displays all options except DataStax Enterprise Spark shell options.
-i app_script_file
Spark shell application argument that runs a script from the specified file.
--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
--master dse://?appReconnectionTimeoutSeconds=secs
A custom timeout value when submitting the application, useful for troubleshooting Spark application
failures. The default timeout value is 5 seconds.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the settings
from spark-defaults.conf.
--total-executor-cores cores
The total number of cores the application uses.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.


Examples

Start the Spark shell

$ dse spark

Start the Spark shell with case-sensitivity

DseGraphFrame and Spark SQL are case insensitive by default. Column names that differ only in case will result
in conflicts. The Spark property spark.sql.caseSensitive=true avoids case conflicts.

$ dse spark --conf spark.sql.caseSensitive=true

Set the timeout value to 10 seconds

$ dse spark --master dse://?appReconnectionTimeoutSeconds=10

Useful for troubleshooting, see Detecting Spark application failures.
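
Start the Spark shell with explicit resources

A minimal sketch combining the resource options described above; the values are illustrative only:

$ dse spark --executor-memory 2g --total-executor-cores 4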


dse spark-class
Launches Spark application contained within a class on a cluster.

Command is supported only on nodes with analytics workloads.

Synopsis

$ dse spark-class -options class_name arguments


This command supports the same options as running a class using the java command.
Examples

Run the org.apache.spark.deploy.Client class to kill a particular application:

$ dse spark-class org.apache.spark.deploy.Client kill master_URI driver_ID

dse spark-jobserver
Starts and stops the Spark Jobserver that is bundled with DSE.

Command is supported only on nodes with analytics workloads.

See Spark Jobserver.


Synopsis

$ dse spark-jobserver start [--properties-file path_to_properties_file]
[--executor-memory memory] [--total-executor-cores cores] [--conf name=spark.value]
[--jars path_to_additional_jars] [--verbose] | stop


start
Starts the Spark Jobserver.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.
stop
Stops the Spark Jobserver.
For the dse spark-jobserver start command, apply one or more valid spark-submit options.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the settings
from spark-defaults.conf.
--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
--total-executor-cores cores
The total number of cores the application uses.
--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option added to the Spark configuration. Option names are prefixed by spark.

• name=spark.value - a single configuration property

• sparkproperties.conf - a configuration file

--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
Examples

Start the Spark Jobserver without submit options

dse spark-jobserver start

Start the Spark Jobserver with submit option

dse spark-jobserver start --properties-file spark.conf

See spark-submit options.
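
Start the Spark Jobserver with explicit resources

A minimal sketch combining several of the submit options described above; the values are illustrative only:

dse spark-jobserver start --executor-memory 2g --total-executor-cores 4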


Stop the Spark Jobserver

dse spark-jobserver stop

dse spark-history-server
Starts and stops the Spark history server, the front-end application that displays logging data from all nodes in
the Spark cluster.

Configuration is required for the Spark history server. See Spark history server.

Synopsis

$ dse spark-history-server start [--properties-file properties_file]|stop


start
Starts the Spark history server to load the event logs from Spark jobs that were run with event logging
enabled. The Spark history server can be started from any node in the cluster.
--properties-file properties_file
The properties file to overwrite the default Spark configuration in conf/spark-defaults.conf. The
properties file can include settings like the authentication method and credentials and event log location.
stop
Stops the Spark history server.


Examples

Start the Spark history server on the local node

dse spark-history-server start

The Spark history server is started with the default configuration in conf/spark-defaults.conf.

Start the Spark history server with a properties file

dse spark-history-server start --properties-file sparkproperties.conf

The Spark history server is started with the configuration specified in sparkproperties.conf.
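
As a minimal sketch, such a properties file can point the history server at the event log location. spark.history.fs.logDirectory is a standard Spark property; the DSEFS path here is hypothetical:

spark.history.fs.logDirectory dsefs:///spark/events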
dse spark-sql
Starts the Spark SQL shell in DSE to interactively perform Spark SQL queries.
The Spark SQL shell in DSE automatically creates a Spark session and connects to the Spark SQL Thrift server
to handle the underlying JDBC connections. See Using Spark SQL to query data.
Synopsis

$ dse spark-sql


This command accepts no parameters.


Examples

Start the Spark SQL shell

$ dse spark-sql

The log file is at /home/ubuntu/.spark-sql-shell.log


spark-sql>

At the spark-sql prompt, you can interactively perform Spark SQL queries.
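For example, a simple interactive query; the keyspace and table names are hypothetical:

spark-sql> SELECT * FROM my_keyspace.my_table LIMIT 10;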
dse spark-sql-thriftserver
Starts and stops the Spark SQL Thriftserver, which provides JDBC and ODBC interfaces for client
connections to DSE.
Configuration is required for the Spark SQL Thriftserver. See Using the Spark SQL Thriftserver.
Synopsis

$ dse spark-sql-thriftserver start [--conf spark_prop] [--hiveconf hive_prop]|stop


start
Starts the Spark SQL Thriftserver. The user who runs the command to start the Spark SQL Thriftserver
requires permissions to write to the Spark directories.
--conf spark_prop
Pass in a general Spark configuration setting, like spark.cores.max=4.
--hiveconf hive_prop
Pass in a Hive configuration property, like hive.server2.thrift.port=10001.
stop
Stops the Spark SQL Thriftserver.
Examples

Start the Spark SQL Thriftserver with default Spark and Hive options

$ dse spark-sql-thriftserver start

Start the Spark SQL Thriftserver with a Spark configuration option

$ dse spark-sql-thriftserver start --conf spark.cores.max=4

Start the Spark SQL Thriftserver with a Hive configuration option

$ dse spark-sql-thriftserver start --hiveconf hive.server2.thrift.port=10001
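
Start the Spark SQL Thriftserver with both Spark and Hive configuration options

A minimal sketch combining the two option types shown above; the values are illustrative only:

$ dse spark-sql-thriftserver start --conf spark.cores.max=4 --hiveconf hive.server2.thrift.port=10001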

Stop the Spark SQL Thriftserver

$ dse spark-sql-thriftserver stop

dse spark-submit
Launches applications on a cluster to enable use of Spark cluster managers through a uniform interface. This
command supports the same options as Apache Spark spark-submit.

Command is supported only on nodes with analytics workloads.


Synopsis

$ dse spark-submit --class class_name jar_file other_options |
(--status | --kill) driver_id [--master master_ip_address]


This command supports the same options as Apache Spark spark-submit. Unlike the standard behavior for the
Spark status and kill options, in DSE deployments these options do not require the Spark Master IP address.
--kill driver_id
Kill a Spark application running in the DSE cluster.
--master master_ip_address
The IP address of the Spark Master running in the DSE cluster.
--status driver_id
Get the status of a Spark application running in the DSE cluster.


Examples

Run the HTTP response example program (located in the dse-demos directory)
on two nodes:

$ dse spark-submit --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d 2

To submit an application in cluster mode with the supervise option to restart
in case of failure

$ dse spark-submit --deploy-mode cluster --supervise --class com.datastax.HttpSparkStream target/HttpSparkStream.jar -d $NUM_SPARK_NODES

To submit an application using cluster mode when TLS is enabled

Pass the SSL configuration with standard Spark commands to use secure HTTPS on port 4440.

$ dse spark-submit \
  --conf spark.ssl.ui.enabled=true \
  --conf spark.ssl.ui.keyPassword=keystore_password \
  --conf spark.ssl.ui.keyStore=path_to_keystore \
  myApplication.jar

To set the driver host to a publicly accessible IP address

$ dse spark-submit --conf spark.driver.host=203.0.113.0 myApplication.jar

To get the status of a driver

Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

$ dse spark-submit --status driver-20180726160353-0019

Result when the driver exists:

Driver driver-20180726160353-0019 found: state=<state>, worker=<workerId> (<workerHostPort>)

To kill a driver


Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.

$ dse spark-submit --kill driver-20180726160353-0019
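
To submit a Python application

spark-submit also accepts a Python file as the application. A minimal sketch; the script name and core count are hypothetical:

$ dse spark-submit --total-executor-cores 2 my_analysis.py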

dse SparkR
Starts the R shell configured with DSE Spark to automatically set the Spark session within R. See Using SparkR
with DataStax Enterprise.

Command is supported only on nodes with analytics workloads.

Synopsis

$ dse SparkR


This command accepts no parameters.


Examples

Start the R shell configured with DSE Spark

$ dse SparkR

dse -v
Sends the DataStax Enterprise version number to standard output.
This command does not require authentication.
Synopsis

$ dse -v

Example

Run DSE version

$ dse -v

6.0.7

dse client-tool
About dse client-tool
The dse client-tool command line interface connects an external client to a DataStax Enterprise node and
performs common utility tasks.
Connection options
Connection options specify how to connect and authenticate for all dse client-tool commands:

Short  Long        Description

       --port      Port number.
-p     --password  Password.
-u     --username  Username.
-a                 DSE authorization username if proxy authentication is used.
-t                 Delegation token which can be used to log in. Alternatively, you can use
                   the DSE_TOKEN environment variable.
--                 Separates command parameters from a list of options.

• If a username and password for RMI authentication are set explicitly in the cassandra-env.sh file for the
host, then you must specify credentials.

• The repair and rebuild commands can affect multiple nodes in the cluster.


• Most nodetool commands operate on a single node in the cluster if -h is not used to identify one or more
other nodes. If the node from which you issue the command is the intended target, you do not need the -h
option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.

Example:

nodetool -u username -pw password describering demo_keyspace

Using dse client-tool command line help


To show a listing of the dse client-tool subcommands:

$ dse client-tool help

To show the command line help for a specific dse client-tool subcommand:

$ dse client-tool help subcommand

For example:

$ dse client-tool help configuration

dse client-tool connection options


You must authenticate connections to an external client for dse client-tool commands.
JMX authentication is supported by some dsetool commands. Other dsetool commands authenticate with the
user name and password of the configured user. Each connection option has a short form and a long form,
which are interchangeable.

You can provide authentication credentials in several ways, see Credentials for authentication.
To enable dsetool to use Kerberos authentication, see Using dsetool with Kerberos enabled cluster.

Different sources of configuration properties are used to connect external clients to a DSE node: DSE
configuration in dse.yaml and cassandra.yaml.

The dse client-tool subcommands use DSE Unified Authentication, like the Java and other language
drivers, not JMX authentication like dsetool.

RPC permissions over the native protocol leverage DSE authentication and role-based access abilities. To
configure external client access to DataStax Enterprise commands, see Authorizing remote procedure calls
(RPC).
DSE proxy authentication can be used with dse client-tool, and delegation tokens can be generated for
the proxy-authenticated role. If the role alice is authenticated, and alice uses proxy authorization to the role
bob, alice's delegation token can be used to authenticate as alice and authorize as bob. If bob loses login
permissions, the token can still be used to log in as alice, because the token reflects alice's authentication. If
alice loses authorization permissions for bob, the token cannot be used to log in.

Synopsis

$ dse client-tool [-a proxy_auth_username] [-u username] [-p password] [--port port]
[--host hostname] [--sasl-protocol-name dse_service_principal] [--keystore-path ssl_keystore_path]
[--keystore-password keystore_password] [--keystore-type ssl_keystore_type]
[--truststore-path ssl_truststore_path] [--truststore-password ssl_truststore_password]
[--truststore-type ssl_truststore_type] [--cipher-suites ssl_cipher_suites]
[--kerberos-enabled (true | false)] [--ssl-enabled (true | false)] [--use-server-config]
[-t delegation_token] [--ssl-protocol ssl_protocol] command [options]


--cipher-suites ssl_cipher_suites
Specify a comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. For
example, --cipher-suites c1,c2,c3.
--host hostname
The DSE node hostname or IP address.
--kerberos-enabled true | false
Whether Kerberos authentication is enabled for connections to DSE. For example, --kerberos-enabled
true.
--keystore-password keystore_password
Keystore password for connection to DSE when SSL client authentication is enabled.
--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type for keys
generated by the Java keytool binary, but other types are possible, depending on user environment.
-p password
The password to authenticate for database access. Can use the DSE_PASSWORD environment
variable.
--port port
The native protocol RPC connection port (Thrift).
--sasl-protocol-name dse_service_principal
SASL protocol name, that is, the DSE service principal name.
--ssl
Whether SSL is enabled for connection to DSE. --ssl-enabled true is the same as --ssl.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
-t token
Specify a delegation token to use for login. Alternatively, use the DSE_TOKEN environment variable.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-
path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by
the Java keytool binary, but other types are possible, depending on user environment. For example, --
truststore-type jks2.
-u username
User name of a DSE authentication account. Can use the DSE_USERNAME environment variable.
-a proxy_auth_username
DSE authorization username if proxy authentication is used.
--use-server-config
Read parameters from the server YAML configuration files. This option assumes the node is properly
configured.
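
For example, a minimal sketch of a connection that combines authentication and SSL options; the credentials and truststore path are hypothetical:

$ dse client-tool -u admin -p mypassword --ssl-enabled true --truststore-path /path/to/ts --truststore-password mytrustpass spark master-address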
dse client-tool cassandra
Performs token management and partitioner discovery.

Token management commands require Kerberos authentication mode.

Synopsis

dse connection_options client-tool cassandra


(cancel-token token |
generate-token [username] |
renew-token token |
partitioner)


cancel-token token
Cancel the specified token.
generate-token [username]
Generate delegation token to access Kerberos DSE from non-Kerberos clusters.

• When the username is not specified, the current user is the token renewer. Only DSE processes
can renew a token.

• When the username is specified as the token renewer, that user can renew and cancel the token.

partitioner
Returns the partitioner that is being used by the node.
renew-token token
Renew the specified token.


Examples

Generate token with the current user as the token renewer

dse client-tool cassandra generate-token

Generate token with user AdminAlicia as the token renewer

dse client-tool cassandra generate-token AdminAlicia

Return the current partitioner

dse client-tool cassandra partitioner

Cancel specified token

dse client-tool cassandra cancel-token token

Renew specified token

dse client-tool cassandra renew-token token
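
Use a generated token for subsequent commands

As noted in the connection options, the token can be supplied through the DSE_TOKEN environment variable instead of the -t option. A minimal sketch of that flow:

export DSE_TOKEN=$(dse client-tool cassandra generate-token)
dse client-tool spark master-address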

dse client-tool configuration export


Exports the DataStax Enterprise client configuration from a remote node.
To run Spark commands against a remote cluster, you must copy the exported file from the remote node to the
local client machine.
Synopsis

dse client-tool connection_options configuration export filename


filename
File name for the exported compressed file. For example, dse-config.jar.
Examples

To export the DataStax Enterprise client configuration from the remote node:

dse client-tool configuration export dse-config.jar
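
Then copy the exported file from the remote node to the local client machine, for example with scp; the host name here is hypothetical:

scp user@remote-dse-node:~/dse-config.jar ~/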

dse client-tool configuration byos-export


Exports the DSE node configuration to a Spark-compatible file that can be copied to a node in the external Spark
cluster and used with the Spark shell.
See Generating the BYOS configuration file.
Synopsis

dse client-tool connection_options configuration byos-export
[--default-properties path_to_existing_properties_file]
[--export-credentials]
[--generate-token [--token-renewer username]]
file


--default-properties spark_propfile_path dse_spark_propfile_path
The paths to the default Spark properties file and the DataStax Enterprise Spark properties file;
properties from both files are merged.
--export-credentials
Store current DSE user and password in the generated configuration file.
file
The file name for the generated Spark-compatible file. For example, byos.properties.
--generate-token
Generates digest authentication token to support access to DSE clusters secured with Kerberos from
non-Kerberos clusters.
--set-keystore-password password
The keystore password for connection to the database when SSL client authentication is enabled.
--set-keystore-path path
The path to the SSL keystore when SSL client authentication is enabled. All nodes must store the
keystore in the same location.
--set-keystore-type type
The keystore type when SSL client authentication is enabled. If not specified, the default is JKS.
--set-truststore-password password
Include the specified truststore password in the configuration file.
--set-truststore-path path
Path to SSL truststore on Spark nodes. All nodes must store the truststore in the same location.
--set-truststore-type type
The truststore type when SSL client authentication is enabled. If not specified, the default is JKS.
--token-renewer userid
User with permission to renew or cancel the token. When not specified, only the DSE process can
renew the generated token.
Examples
You can export the DSE node configuration to a Spark-compatible file with various options.


Generate the byos.properties file in your home directory

dse client-tool configuration byos-export ~/byos.properties

Merge the default Spark properties with the DSE Spark properties

dse client-tool configuration byos-export --default-properties /usr/lib/spark/conf/spark-


defaults.conf /home/user1/.dse/byos.conf
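
Generate byos.properties with a delegation token that AdminAlicia can renew

A minimal sketch combining the token options described above; the user name is illustrative:

dse client-tool configuration byos-export --generate-token --token-renewer AdminAlicia ~/byos.properties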

dse client-tool configuration import


Imports a configuration file and generates local configuration files, and optionally a cqlshrc file, with settings
from the imported file so that DSE client applications can remotely access the running DSE cluster.
Run this command on a client node to set up the local DSE installation for integrated client applications.
Synopsis

dse client-tool connection_options configuration import file


[--cqlshrc [file]]
[--force]


--cqlshrc [file]
Generate a cqlshrc file for the DSE client node. file is the path to the cqlshrc file to be generated.
When a file is not specified, the default is the ~/.cassandra/cqlshrc file.
--force
Force an overwrite of existing configuration files. By default, the import command fails if the
configuration files already exist.
--set-keystore-password password
The keystore password for connection to the database when SSL client authentication is enabled.
--set-keystore-path path
The path to the SSL keystore when SSL client authentication is enabled. All nodes must store the
keystore in the same location.
--set-keystore-type type
The keystore type when SSL client authentication is enabled. If not specified, the default is JKS.
--set-truststore-password password
Include the specified truststore password in the configuration file.
--set-truststore-path path
Path to SSL truststore on Spark nodes. All nodes must store the truststore in the same location.
--set-truststore-type type
The truststore type when SSL client authentication is enabled. If not specified, the default is JKS.
Examples
Run the import command on the client node.

Import the configuration file with default values:

dse client-tool configuration import dse-config.jar

Create a local cqlshrc file with the default name:

dse client-tool configuration import dse-config.jar --cqlshrc

Force an overwrite of the existing configuration file:

dse client-tool configuration import dse-config.jar --force
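
Generate the cqlshrc file at a custom path, overwriting any existing configuration files; the path is illustrative:

dse client-tool configuration import dse-config.jar --cqlshrc ~/my_cqlshrc --force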

dse client-tool spark


Perform operations related to integrated Spark.
Synopsis

dse client-tool connection_options spark


(master-address | leader-address | version |
sql-schema (--exclude | --keyspace | --table | --decimal | --all) |
metastore-migrate --from version --to version)


leader-address
Returns the IP address of the currently selected Spark Master for the datacenter.
master-address
Returns the localhost IP address used to configure Spark applications. The address is returned as URI:

dse://ip:port?connection.local_dc=dc_name;connection.host=cs_list_contactpoints;

The connection.host=cs_list_contactpoints option is a comma-separated list of IP addresses of


additional contact points. The additional contact points are up to five randomly selected nodes from the
datacenter.

DSE automatically connects Spark applications to the Spark Master. You do not need to use the IP
address of the current Spark Master in the connection URI.
metastore-migrate --from version --to version
Migrate the Spark SQL metastore from one DSE version to another DSE version.

• --from version - the version to migrate the metastore from

• --to version - the version to migrate the metastore to

version
Returns the version of Spark that is bundled with DataStax Enterprise.
sql-schema (--exclude | --keyspace | --table | --decimal | --all)
Exports the SQL table creation query with these options:

• --table tablename - comma-separated list of tables to include

• --exclude csvlist - comma-separated list of tables to exclude

• --all - includes all keyspaces

• --keyspace csvlist - comma-separated list of keyspaces to include

Examples

View the Spark connection URL for this datacenter:

$ dse client-tool spark master-address

dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63;
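
As a sketch, the returned URI can be passed to an external spark-submit (this assumes an external Spark
installation and that dse is on the PATH; the application jar name is hypothetical):

$ spark-submit --master $(dse client-tool spark master-address) my-app.jar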

View the IP address of the current Spark Master in this datacenter:

$ dse client-tool spark leader-address

10.200.181.62

Generate Spark SQL schema files

You can use the generated schema files with Spark SQL on external Spark clusters.

$ dse client-tool --use-server-config spark sql-schema --all > output.sql
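
The generated file can then be applied on the external cluster, for example with the Spark SQL CLI shipped
with Apache Spark (a sketch; the file path is from the example above):

$ spark-sql -f output.sql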

Migrate Spark metastore

To map custom external tables from DSE 5.0.11 to the DSE 6.0.0 release format of the Hive metastore used by
Spark SQL after upgrading:

$ dse client-tool spark metastore-migrate --from 5.0.11 --to 6.0.0

dse client-tool alwayson-sql


Perform operations related to AlwaysOn SQL.
Synopsis

dse client-tool connection_options alwayson-sql


(status | stop | start | restart | reconfig)

Table 243: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
794
DataStax Enterprise tools

Syntax conventions Description

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

status
Get the AlwaysOn SQL service status of the datacenter. With the --dc datacenter name option, get the
status of the specified datacenter.
The returned status is one of:

• RUNNING: the server is running and ready to accept client requests.

• STOPPED_AUTO_RESTART: the server is being started but is not yet ready to accept client requests.

• STOPPED_MANUAL_RESTART: the server was stopped with either a stop or restart command. If the
server was issued a restart command, the status will be changed to STOPPED_AUTO_RESTART as
the server starts again.

• STARTING: the server is actively starting up but is not yet ready to accept client requests.

stop
Manually stop the AlwaysOn SQL service. With the --dc datacenter name option, manually stop the
service on the specified datacenter.
start
Manually start the AlwaysOn SQL service. With the --dc datacenter name option, manually start the
service on the specified datacenter. The service starts automatically if it has been enabled.
restart
Manually restart a running AlwaysOn SQL service. With the --dc datacenter name option, manually
restart the service on the specified datacenter.
reconfig
Manually reconfigure the AlwaysOn SQL service. With the --dc datacenter name option, manually
reconfigure the service on the specified datacenter. Running this command tells the service to re-read
the configuration options.
The alwayson_sql_options section in dse.yaml, described in detail at AlwaysOn SQL options, has
options for setting the ports, timeout values, log location, and other Spark or Hive configuration settings.
Additional configuration options are located in spark-alwayson-sql.conf.
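
A minimal sketch of the alwayson_sql_options section in dse.yaml (only a few common options are shown
with their usual defaults; treat the exact names and values as assumptions and consult dse.yaml in your
installation):

alwayson_sql_options:
    enabled: true
    thrift_port: 10000
    web_ui_port: 9077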

Examples

Stop a running service:

$ dse client-tool alwayson-sql stop

Start the service on a particular datacenter:

$ dse client-tool alwayson-sql --dc dc-west start

Force the service to stop:

$ dse client-tool alwayson-sql stop

Reread the configuration options for a running service:

$ dse client-tool alwayson-sql reconfig
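
Check the service status in a specific datacenter (the datacenter name and the output shown are illustrative):

$ dse client-tool alwayson-sql --dc dc-west status

RUNNING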

dse nodesync
The NodeSync service provides continuous background repair; it is enabled on a per-table basis.
The nodesync command modifies the CQL nodesync property on one or more tables, and enables NodeSync
tracing and monitoring.
Tables with NodeSync enabled are skipped by repair operations run against all or specific keyspaces. For
individual tables, the repair command is rejected when NodeSync is enabled.
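
The nodesync table property can also be set directly in CQL; for example, to enable NodeSync on a single table:

ALTER TABLE cycling.comments WITH nodesync = {'enabled': 'true'};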

Synopsis

[dse] nodesync
[(-ca cql_Authprovider | --cql-auth-provider cql_Authprovider)]
[(-cp cql_password | --cql-password cql_password)]
[(-cs | --cql-ssl)]
[(-cu cql_username | --cql-username cql_username)]
[(-h cql_host | --host cql_host)]
[help]
[(-jp jmx_password | --jmx-password jmx_password)]
[(-jpf jmx_password_file | --jmx-password-file jmx_password_file)]
[(-js | --jmx-ssl)]
[(-ju jmx_username | --jmx-username jmx_username)]
[(-p cql_port | --port cql_port )]
subcommand [options]

Table 244: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square
brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type
the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation
marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema
and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
Example

Display top-level help

$ nodesync help

usage: nodesync [(-js | --jmx-ssl)] [(-p <cqlPort> | --port <cqlPort>)]


[(-cu <cqlUsername> | --cql-username <cqlUsername>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)] [(-cs | --cql-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] <command> [<args>]

The most commonly used nodesync commands are:


disable Disable nodesync on the specified tables
enable Enable nodesync on the specified tables
help Display help information
tracing Enable/disable tracing for NodeSync
validation Monitor/manage user-triggered validations

See 'nodesync help <command>' for more information on a specific command.

For more command-specific help, see nodesync help.


nodesync disable
Disables NodeSync on one or more target tables by setting the nodesync enabled property to false.
Synopsis

[dse] nodesync main_options disable


[(-k keyspace_name | --keyspace keyspace_name)]
[--quiet]
[(-v | --verbose)]
[--] [(table_list | "*")]

Table 245: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
Disable options
The following options apply to the disable subcommand:
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

Examples

Disable on single table

Set nodesync enabled to false on one table:

$ nodesync disable demo.health_data

No messages returned on success.

Disable on list of tables in different keyspaces

Set nodesync enabled to false on some tables in a keyspace:

$ nodesync disable -v -k demo -- test1 test2 test3

Displays a message for each table that was disabled:

Nodesync disabled for demo.test1


Nodesync disabled for demo.test2
Nodesync disabled for demo.test3

Disable on all tables in a keyspace

Set nodesync enabled to false on all tables in a keyspace:

$ nodesync disable -v -k demo "*"

Displays a message for each table that was disabled:

Nodesync disabled for demo.test2


Nodesync disabled for demo.health_data
Nodesync disabled for demo.test1
Nodesync disabled for demo.test
Nodesync disabled for demo.test3

Disable on list of tables in different keyspaces

Set nodesync enabled to false on specific tables in different keyspaces:

$ nodesync disable -v demo.test demo.test3 cycling.comments cycling.cyclist_races

Displays a message for each table that was disabled:

Nodesync disabled for demo.test


Nodesync disabled for cycling.comments
Nodesync disabled for demo.test3
Nodesync disabled for cycling.cyclist_races

nodesync enable
Sets nodesync enabled to true on target tables.
Default setting is true.

Refer to Configuring SSL for nodetool, nodesync, dsetool, and Advanced Replication for important details
about creating a ~/.cassandra/nodesync-ssl.properties file. It defines properties for NodeSync that are
shared by JMX and CQL. The file must be present on any node where you will run the nodesync command.
Also, the JVM properties for NodeSync should be the same as those set for nodetool, but defined in a
separate file, such as nodesync-jvm.options. The JVM options are described in the topic referenced above.
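
A minimal sketch of ~/.cassandra/nodesync-ssl.properties (the truststore path and password are
placeholders; the exact JVM properties depend on your SSL setup, see the topic referenced above):

-Djavax.net.ssl.trustStore=/path/to/truststore.jks
-Djavax.net.ssl.trustStorePassword=truststore_password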

Synopsis

[dse] nodesync main_options enable


[(-k keyspace_name | --keyspace keyspace_name)]
[--quiet]
[(-v | --verbose)]
[--] [(table_list | "*")]

Table 246: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
Enable options
The following options apply to the enable subcommand:
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

Examples

Enable single table

Set nodesync enabled to true on one table:

$ nodesync enable demo.health_data

No messages returned on success.

Enable multiple tables

Set nodesync enabled to true on two tables:

$ nodesync enable demo.health_data cycling.comments

No messages returned on success.

Enable all tables in a keyspace

Set nodesync enabled to true on all tables in a keyspace:

$ nodesync enable -v -k cycling "*"

A list of tables that are enabled is returned.

Nodesync enabled for cycling.comments


Nodesync enabled for cycling.cyclist_alt_stats
Nodesync enabled for cycling.cyclist_races

nodesync help
Displays usage information for nodesync commands. Use nodesync help command_name to display a synopsis
and brief description for a specific nodesync command.
Synopsis

[dse] nodesync help


[command_name [subcommand_name]]

Help options
command_name
Name of nodesync command.
subcommand_name
Name of nodesync subcommand.
Examples

Display top-level help

$ nodesync help

usage: nodesync [(-js | --jmx-ssl)] [(-p <cqlPort> | --port <cqlPort>)]


[(-cu <cqlUsername> | --cql-username <cqlUsername>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]


[(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)] [(-cs | --cql-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] <command> [<args>]

The most commonly used nodesync commands are:


disable Disable nodesync on the specified tables
enable Enable nodesync on the specified tables
help Display help information
tracing Enable/disable tracing for NodeSync
validation Monitor/manage user-triggered validations

See 'nodesync help <command>' for more information on a specific command.

Display help for specific nodesync command

$ nodesync help validation

NAME
nodesync validation - Monitor/manage user-triggered validations

SYNOPSIS
nodesync validation
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation
cancel [--quiet] [(-v | --verbose)]
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation list
[--quiet] [(-v | --verbose)] [(-a | --all)]
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation
submit [--quiet] [(-v | --verbose)]
[(-r <rateInKB> | --rate <rateInKB>)]

OPTIONS
-ca <cqlAuthProvider>, --cql-auth-provider <cqlAuthProvider>
CQL auth provider class name
-cp <cqlPassword>, --cql-password <cqlPassword>
CQL password
-cs, --cql-ssl
Enable SSL for CQL
-cu <cqlUsername>, --cql-username <cqlUsername>
CQL username
-h <cqlHost>, --host <cqlHost>
CQL contact point address
-jp <jmxPassword>, --jmx-password <jmxPassword>
JMX password
-jpf <jmxPasswordFile>, --jmx-password-file <jmxPasswordFile>
Path to the JMX password file
-js, --jmx-ssl
Enable SSL for JMX
-ju <jmxUsername>, --jmx-username <jmxUsername>
JMX username
-p <cqlPort>, --port <cqlPort>
CQL port number

COMMANDS
With no arguments, Display help information
submit
Submit a forced user validation
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
With --rate option, Rate to be used just for this validation, in KB per second
cancel
Cancel a user-triggered validation
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
list
List user validations. By default, only running validations are
displayed.
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
With --all option, List all either running or finished validations since less then
1 day

Display help for specific nodesync command with a subcommand

$ nodesync help validation submit

NAME
nodesync validation submit - Submit a forced user validation

SYNOPSIS
nodesync
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)] [(-cs | --cql-ssl)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)]
[(-h <cqlHost> | --host <cqlHost>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-js | --jmx-ssl)] [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-p <cqlPort> | --port <cqlPort>)] validation submit [(-q | --quiet)]
[(-r <rateInKB> | --rate <rateInKB>)] [(-v | --verbose)] [--] <table>
[<range>...]

OPTIONS
-ca <cqlAuthProvider>, --cql-auth-provider <cqlAuthProvider>
CQL auth provider class name
-cp <cqlPassword>, --cql-password <cqlPassword>
CQL password
-cs, --cql-ssl
Enable SSL for CQL
-cu <cqlUsername>, --cql-username <cqlUsername>
CQL username
-h <cqlHost>, --host <cqlHost>
CQL contact point address
-jp <jmxPassword>, --jmx-password <jmxPassword>
JMX password
-jpf <jmxPasswordFile>, --jmx-password-file <jmxPasswordFile>
Path to the JMX password file
-js, --jmx-ssl
Enable SSL for JMX
-ju <jmxUsername>, --jmx-username <jmxUsername>
JMX username
-p <cqlPort>, --port <cqlPort>
CQL port number
-q, --quiet
Quiet output; don't print warnings
-r <rateInKB>, --rate <rateInKB>
Rate to be used just for this validation, in KB per second
-v, --verbose
Verbose output
--
This option can be used to separate command-line options from the
list of argument, (useful when arguments might be mistaken for
command-line options
<table> [<range>...]
The qualified table name, optionally followed by token ranges of the
form (x, y]. If no token ranges are specified, then all the tokens
will be validated.

nodesync tracing
Provides detailed transaction information related to internal NodeSync operations by capturing events in the
system_traces keyspace. When tracing is enabled, a session id displays in standard output and an entry with
the high-level details is written to the system_traces.sessions table. More detailed data for each operation is
written to the system_traces.events table.
By default, tracing information is saved for 7 days.
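
For example, the captured events of a session can be inspected with CQL (the session id shown is illustrative):

SELECT activity, source, source_elapsed FROM system_traces.events WHERE session_id = e60dfd70-eb5a-11e7-8bde-b5dcb560a8ef;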

Synopsis

nodesync main_options tracing command

Table 247: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
nodesync tracing disable
Turns off NodeSync tracing.
Synopsis

nodesync main_options tracing disable


[(-n node_list | --nodes node_list)]
[--quiet]
[(-v | --verbose)]

Table 248: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
disable options
-n, --nodes node_list
Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.

Examples

Disable tracing on all nodes

$ nodesync tracing disable

Disable tracing on the local node

$ nodesync tracing disable -n 10.10.0.30 -q

nodesync tracing enable


Enables tracing.
Synopsis

nodesync main_options tracing enable


[(-c | --color)]
[(-f | --follow)]
[(-l level_name | --level level_name)]
[(-n node_list | --nodes node_list)]
[--quiet]
[(-t seconds | --timeout seconds)]
[--tables table_list]
[(-v | --verbose)]

Table 249: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
Enable options
-c, --color
If --follow is used, color each trace event according to the host from which it originates.

-f, --follow
After enabling tracing, continuously show the trace events, showing new events as they come.
Note that this won't exit unless you either manually exit (with Ctrl-c) or use a timeout (--timeout option).
-l <levelStr>, --level <levelStr>
The tracing level: either 'low' or 'high'. If omitted, the 'low' level is used. Note that the 'high' level is
somewhat verbose and should be used with care.
-n, --nodes node_list
Only enable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
-t <timeoutStr>, --timeout <timeoutStr>
Timeout on the tracing; after that amount of time, tracing will be automatically disabled (and if --follow
is used, the command will return). This defaults to seconds, but an 's', 'm' or 'h' suffix can be used for
seconds, minutes, or hours respectively.
--tables <tableStr>
A comma separated list of fully-qualified table names to trace. If omitted, all tables are traced.
-v, --verbose
Verbose output.
Examples

Enable tracing on all nodes

$ nodesync tracing enable

When the CQL host and JMX port are not specified, the local IP and default port are used.

Warning: Do not forget to stop tracing with 'nodesync tracing disable'.
Enabled tracing. Session id is e60dfd70-eb5a-11e7-8bde-b5dcb560a8ef
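
Enable tracing on one node only, following events until a five-minute timeout automatically disables tracing
(the address is illustrative; the command combines the -n, -f, and -t options described above):

$ nodesync tracing enable -n 10.10.0.30 -f -t 5m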

nodesync tracing show


Displays the events of a NodeSync tracing session.
Synopsis

nodesync main_options tracing show


[(-c | --color)]
[(-f | --follow)]
[(-i <traceIdStr> | --id <traceIdStr>)]
[(-n <nodeList> | --nodes <nodeList>)]
[--quiet]
[(-t <timeoutStr> | --timeout <timeoutStr>)]
[(-v | --verbose)]

Table 250: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
show options
-c, --color
If --follow is used, color each trace event according to the host from which it originates.
-f, --follow
After enabling tracing, continuously show the trace events, showing new events as they occur. Will not
exit unless manually terminated using Ctrl-c or in conjunction with the --timeout option.
-l <levelStr>, --level <levelStr>
The tracing level: either 'low' or 'high'. If omitted, the 'low' level is used. Note that the 'high' level is
somewhat verbose and should be used with care.
-i <traceIdStr>, --id <traceIdStr>
Show the events of the tracing session with the specified trace ID.
-t <timeoutStr>, --timeout <timeoutStr>
When --follow is used, automatically exit after the provided amount of time elapses. This defaults to
seconds, but an 's', 'm' or 'h' suffix can be used for seconds, minutes, or hours respectively.
Examples

$ nodesync tracing show -i e60dfd70-eb5a-11e7-8bde-b5dcb560a8ef

Starting NodeSync tracing on /10.200.176.186 (elapsed: 2.7ms)


Adding continuous proposer for demo.health_data (elapsed: 6.9m)
[#-] Skipping (10,-9223372036854775808] of demo.health_data, state updated: was recently
validated by another node (2h ago, previously know: 2h ago) (elapsed: 6.9m)
[#0] Starting validation on (-9223372036854775798,10] of demo.health_data (validated 2h
ago) (elapsed: 6.9m)
[#1] Starting validation on (-9223372036854775808,-9223372036854775798] of
demo.health_data (validated 2.1h ago) (elapsed: 6.9m)
[#0] Completed validation (full_in_sync) in 4ms: validated 0B and repaired 0B (elapsed:
6.9m)
[#2] Starting validation on (10,-9223372036854775808] of demo.health_data (validated 2h
ago) (elapsed: 6.9m)
[#1] Completed validation (full_in_sync) in 9ms: validated 0B and repaired 0B (elapsed:
6.9m)

[#2] Completed validation (full_in_sync) in 2ms: validated 0B and repaired 0B (elapsed: 6.9m)

nodesync tracing status


Displays whether NodeSync tracing is enabled on the specified nodes.
Synopsis

nodesync main_options tracing status


[(-n <nodeList> | --nodes <nodeList>)]
[--quiet]
[(-v | --verbose)]

Table 251: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
nodesync validation
Monitors and manages user-triggered validations.
Synopsis

[dse] nodesync main_options validation


(cancel id |
list |
submit [(-r KB | --rate KB)] [--] table_name [range ...] )
[(--quiet | (-v | --verbose))]

Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:

• Qualified table names: keyspace_name.table_name. For example, cycling.comments.

• Default keyspace -k option with:

# Unqualified table names. For example, -k cycling cyclist_alt_stats comments cyclist_races.

# An asterisk in double quotes to select all tables. For example, -k cycling "*".

-n, --nodes node_list


Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
validation options
cancel id
Cancel the specified user-triggered validation.
list [(-a | --all)]
List user validations. Use -a to list all validations, both running and those that completed within the past day.
Default: Only running validations are displayed.
submit [options] table_name [range]
Submit a forced user validation.
-r KB, --rate KB
Rate to be used just for this validation, in KB per second.
--quiet
Suppress warning and error messages.
--
Separates command-line options from the list of arguments. Use when arguments might be mistaken for
command-line options.
table_name [ token_range ]
Keyspace qualified table name, optionally followed by token ranges in the form (x, y]. If no token ranges
are specified, then all the tokens are validated.
-v | --verbose
Display all messages.
Examples
List all nodesync validations:

$ nodesync validation list --all

Identifier                            Table                  Status      Outcome  Duration  ETA  Progress  Validated  Repaired
1e6255f0-7754-11e9-aad8-579eeacd08f6  cycling.comments       running     ?        0ms       ?    0%        0B         0B
0ac37290-7754-11e9-ab57-0f1d9fa56691  cycling.cyclist_races  successful  success  24ms      -    100%      0B         0B

Possible Outcome values are:


?
Operation still in progress. No outcome available yet.
uncompleted
Some partitions could not be repaired because there were not enough live replicas. Errors may or may
not have occurred. Any errors are written to the console.
failed
Errors occurred, but enough replicas were available for partitions that did not have errors. Any errors
are written to the console.
partial
Enough (but not all) replicas were alive to repair or check that the data was in sync among the live
replicas.
success
All data was in sync or successfully repaired.
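
Submit a forced user validation at a capped rate, then cancel it (the table, rate, and identifier are illustrative):

$ nodesync validation submit -r 1024 cycling.comments

$ nodesync validation cancel 1e6255f0-7754-11e9-aad8-579eeacd08f6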

dsefs commands
The DSEFS functionality supports operations including uploading, downloading, moving, and deleting files,
creating directories, and verifying the DSEFS status.
append
Appends a local file to a remote file.

Refer to files in the local file system by prefixing paths with the file: prefix.

Synopsis

$ append source_filepath destination_filepath

Table 252: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

destination_filepath
Explicit or relative filepath.

• If the destination path ends with a name, the destination entry is given that name.

• If the destination path ends with a slash (/), the original source file name is used.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

source_filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Append local file to remote file

dsefs dsefs://127.0.0.1:5598/ > append file:/home/cal09 dsefs:/data2/cal10
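
The appended file can then be checked with the cat command described below (paths match the example above):

dsefs dsefs://127.0.0.1:5598/ > cat /data2/cal10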

cat
Concatenates files and prints on the standard output.

Synopsis

$ cat filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Print one file in the DSE filesystem to standard output

dsefs file:/home/ > cat calSept

September 2018
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30

Print two files in the DSE filesystem to standard output

dsefs file:/home/ > cat calSept calOct

September 2018
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
October 2019
Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
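
Print all files matching a wildcard pattern

Because filepaths accept wildcard characters, one pattern can print several files at once. A sketch, assuming the calendar files above are in the working directory:

dsefs file:/home/ > cat cal*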

cd
Changes the working directory in DSEFS. The DSEFS shell remembers the last working directory of each file
system separately.

The DSEFS prompt identifies the current working directory in DSEFS:

• dsefs dsefs://127.0.0.1:5598/ > is the default directory

• dsefs dsefs://127.0.0.1:5598/dir2/ > is the current working directory dir2

• dsefs file:/ > is the current directory on the local file system

Synopsis

$ cd filepath


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Change directory in DSEFS

dsefs dsefs://127.0.0.1:5598/ > cd tmp

dsefs dsefs://127.0.0.1:5598/tmp/ >

Change directory to the last working directory on the local file system

dsefs dsefs://127.0.0.1:5598/ > cd file:/

dsefs file:/home/user1/path/to/local/files >

Change directory to the parent directory on the local file system

dsefs file:/home/user1/path/to/local/files > cd ..

dsefs file:/home/user1/path/to/local >

Go back to the last working directory in DSEFS

dsefs file:/home/user1/path/to/local/files > cd dsefs:

dsefs dsefs://127.0.0.1:5598/ >

chgrp
Changes group ownership for files or directories.
Synopsis

$ chgrp [-R] [-v] group_name filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

group_name
Group name.
-R, --recursive
Change group ownership of directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples

Change group ownership of myFile to admin

dsefs dsefs://127.0.0.1:5598/ > chgrp admin file:/home/myFile
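
Recursively change group ownership of a directory

The -R flag applies the change to a directory and everything beneath it, and -v reports each change. A sketch, assuming a hypothetical directory dsefs:/data2/reports exists:

dsefs dsefs://127.0.0.1:5598/ > chgrp -R -v admin dsefs:/data2/reports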

chmod
Changes permission mode for owner, group, and others.

Synopsis

$ chmod [-R] [-v] permission_mode filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

permission_mode
Octal representation of permission mode for owner, group, and others:

• 0 – no permission

• 1 – execute

• 2 – write

• 3 – write and execute

• 4 – read

• 5 – read and execute

• 6 – read and write

• 7 – read, write, and execute

-R, --recursive
Change permissions of directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples

Change permission to make file readable, writable and executable by all users

dsefs dsefs://127.0.0.1:5598/ > chmod 777 file:/home/myFile

Change permission to make file readable, writable, and executable by owner and only executable by group and others

dsefs dsefs://127.0.0.1:5598/ > chmod 711 file:/home/myFile
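
Recursively change permissions on a directory

Combining an octal mode with -R applies it to a directory and its contents. In this sketch, 750 grants the owner read, write, and execute (7), the group read and execute (5), and others no permission (0), assuming a hypothetical directory dsefs:/data2/reports:

dsefs dsefs://127.0.0.1:5598/ > chmod -R 750 dsefs:/data2/reports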

chown
Changes ownership and/or group ownership for files or directories.
Synopsis

$ chown [-R] [-v] [-u username] [-g group_name] filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

-g, --group group_name
New owner group name.
-R, --recursive
Change ownership of directories and their contents recursively.
-u, --user username
Set new owner username.
-v, --verbose
Turn on verbose output.
Examples

Recursively change ownership to admin group for two files

dsefs dsefs://127.0.0.1:5598/ > chown -R -g admin file:/home/myFile file:/data2/myFile2

Change ownership to John Doe

dsefs dsefs://127.0.0.1:5598/ > chown -u jdoe dsefs:/home/myFile
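
Change owner and group in one command

The -u and -g options can be combined to set both at once:

dsefs dsefs://127.0.0.1:5598/ > chown -u jdoe -g admin dsefs:/home/myFile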

cp
Copies a file within a file system or between two file systems. If the destination filepath points to a file system
other than DSEFS, the block size and redundancy options are ignored.

Synopsis

$ cp [-o] [-b size_in_bytes] [-n num_nodes] [--no-force-sync] [--force-sync] source_filepath [source_filepath ...] destination_filepath


Command arguments

-b, --block-size size_in_bytes
Preferred block size in bytes for files. Ignored when the destination path is a file system other than DSEFS.
destination_filepath
Explicit or relative filepath.

• If the destination path ends with a file name, the destination entry is given that name.

• If the destination path ends with a slash (/), the original source file name is used.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

--force-sync
Synchronize files in this directory with the storage device when closed. Files created in the directory
inherit the option.
--no-force-sync
Do not synchronize files in this directory with the storage device when closed. Files created in the
directory inherit the option.
-n, --redundancy-factor num_nodes
Set the number of replicas of file data, similar to the replication factor in the database keyspaces, but
more granular.

• Set this to one number greater than the number of nodes that are allowed to fail before data loss
occurs. For example, set this value to 3 to allow 2 nodes to fail.

• For simple replication, use a value that is equivalent to the replication factor.

• Default value is inherited from the parent directory if set, otherwise it is 3.

-o, --overwrite
If destination file exists, overwrite.
source_filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Copy a file from the source, overwriting the file at the destination

dsefs file:/home/user1/test > cp -o dsefs:archive.tgz another-archive-copy.tgz
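
Copy a local file into DSEFS with an explicit block size and redundancy factor

The -b and -n options take effect only when the destination is DSEFS. A sketch, assuming a hypothetical local file /home/big.dat, that stores the copy with 32-MB blocks and two replicas:

dsefs dsefs://127.0.0.1:5598/ > cp -b 33554432 -n 2 file:/home/big.dat dsefs:/data2/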

df
Reports file system status and disk space usage.
Synopsis

$ df [-h]


Command arguments

-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
Examples

Get file system status and disk space usage

dsefs dsefs://127.0.0.1:5598/ > df

Location                              Status  DC              Rack   Host               Address        Port  Directory            Used  Free         Reserved
144e587c-11b1-4d74-80f7-dc5e0c744aca  up      GraphAnalytics  rack1  node1.example.com  10.200.179.38  5598  /var/lib/dsefs/data  0     29289783296  5368709120
98ca0435-fb36-4344-b5b1-8d776d35c7d6  up      GraphAnalytics  rack1  node2.example.com  10.200.179.39  5598  /var/lib/dsefs/data  0     29302099968  5368709120
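
Get file system status with human-readable sizes

The -h flag prints the same report with sizes such as 234M or 2G instead of raw byte counts:

dsefs dsefs://127.0.0.1:5598/ > df -h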

du
Lists the sizes of files and directories in a specified directory.
Synopsis

$ du [-h] [-s] directories


Command arguments

-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
-s, --summarized
Display only the total size of all files and directories.
directories
The directories to search to calculate the space usage.
Examples

Get disk usage from the root of the DSEFS file system.

$ dse fs "du"

464827 example1
0 tmp/hive
0 tmp
464827 .

Get the disk usage from the example1 directory in human readable form.

$ dse fs "du -h example1"

454K example1

Get the total disk usage in human readable form of all files in DSEFS.

$ dse fs "du -h -s"

454K .

echo
Displays a line of text.
Synopsis

$ echo text_to_display


Command arguments

text_to_display
Text to display.
Examples

Display File copied

dsefs dsefs://127.0.0.1:5598/ > echo File copied

exit
Exits DSEFS command shell.
Synopsis

$ exit


Command arguments

This command takes no arguments.


Examples

Exit DSEFS command shell

dsefs dsefs://127.0.0.1:5598/ > exit

fsck
Performs file system consistency check and repairs file system errors. Only a superuser may run fsck. Run fsck
after running umount, or if you encounter file write errors (for example, timeouts).
Synopsis

$ fsck [-p, --parallelism num_files]


Command arguments

-p, --parallelism num_files
Use throttling to minimize the performance impact of running fsck on clusters. Specify the number of files to repair at one time.
Examples

Check file system and repair errors

dsefs dsefs://127.0.0.1:5598/ > fsck

Use throttling to limit the number of files being repaired at the same time to 8.

$ dse fs fsck -p 8

get
A special case of cp that copies a DSEFS remote file to the local file system. If a relative source path is given, it
is resolved in the last DSEFS working directory, regardless of the current working directory. Similarly, if a relative
destination path is given, it is always resolved in the last local working directory. Filepaths can be absolute and
can point to any file system.
Synopsis

$ get source_filepath destination_filepath


Command arguments

destination_filepath
Explicit or relative filepath.

• If the destination path ends with a file name, the destination entry is given that name.

• If the destination path ends with a slash (/), the original source file name is used.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

source_filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Copy DSEFS remote file to local filesystem

dsefs dsefs://127.0.0.1:5598/ > get archive.tgz local_archive.tgz
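
Copy a DSEFS file into a local directory, keeping its name

When the destination path ends with a slash, the original source file name is used. A sketch, assuming the hypothetical file dsefs:/data2/report.csv exists:

dsefs dsefs://127.0.0.1:5598/ > get dsefs:/data2/report.csv file:/tmp/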

ls
Lists directory contents.
Synopsis

$ ls [-R] [-l] [-h] [-1] [directory_name [directory_name ...]]


Command arguments

directory_name
Directory on DSEFS file system.

• Wildcard characters are supported.

• .. is the parent directory.

-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
-l, --long
Use long listing format.
-R, --recursive
List directories and their contents recursively.
-1, --single-column
List one file per line.

Examples

List directory contents

dsefs dsefs://127.0.0.1:5598/ > ls file:/

bin cdrom dev home lib32 lost+found mnt proc run srv tmp var
initrd.img.old vmlinuz.old
boot data etc lib lib64 media opt root sbin sys usr initrd.img vmlinuz

List directory contents with one file per line

dsefs dsefs://127.0.0.1:5598/ > ls -1 file:/

bin
cdrom
dev
home
lib32
lost+found
mnt
proc
run
srv
tmp
var
initrd.img.old
vmlinuz.old
boot
data
etc
lib
lib64
media
opt
root
sbin
sys
usr
initrd.img
vmlinuz
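
List a directory in long format with human-readable sizes

The -l and -h flags combine the long listing format with sizes such as 234M. A sketch, assuming a DSEFS directory named data2 exists:

dsefs dsefs://127.0.0.1:5598/ > ls -l -h data2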

mkdir
Creates new directory or directories.

Synopsis

$ mkdir [-p] [-b size_in_bytes] [-n num_nodes] [-c encoder_name] [-m permission_mode] [--no-force-sync] [--force-sync] new_directory_name [new_directory_name ...]


Command arguments

-b, --block-size size_in_bytes
Preferred block size in bytes for files in new directory. Default is 64 MB.
-c, --compression-encoder encoder_name
The compression encoder name. DSE ships with the LZ4 compression encoder.
new_directory_name
New directory on DSEFS file system.

• Explicit file system prefixes dsefs: and file: are supported.

--force-sync
Synchronize files in this directory with the storage device when closed. Files created in the directory
inherit the option.
-m, --permission-mode permission_mode
Octal representation of permission mode for owner, group, and others:

• 0 – no permission

• 1 – execute

• 2 – write

• 3 – write and execute

• 4 – read

• 5 – read and execute

• 6 – read and write

• 7 – read, write, and execute

--no-force-sync
Do not synchronize files in this directory with the storage device when closed. Files created in the
directory inherit the option.
-n, --redundancy-factor num_nodes
Set the number of replicas of file data, similar to the replication factor in the database keyspaces, but
more granular.

• Set this to one number greater than the number of nodes that are allowed to fail before data loss
occurs. For example, set this value to 3 to allow 2 nodes to fail.

• For simple replication, use a value that is equivalent to the replication factor.

• Default value is inherited from the parent directory if set, otherwise it is 3.

-p, --parents
If needed, makes parent directories. If parent directories exist, no error.
Examples

Make new directory with 32-MB block size, redundancy factor of 2, and files synchronized on close

dsefs dsefs://127.0.0.1:5598/ > mkdir -b 32000000 -n 2 --force-sync file:new_directory
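
Create nested directories in one step

The -p flag creates any missing parent directories and does not raise an error when they already exist. A sketch, assuming the hypothetical path dsefs:/data2/2018/logs does not exist yet:

dsefs dsefs://127.0.0.1:5598/ > mkdir -p dsefs:/data2/2018/logs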

mv
Moves a file or directory.
Synopsis

$ mv source_filepath [source_filepath ...] destination_filepath


Command arguments

destination_filepath
Explicit or relative filepath.

• If the destination path ends with a file name, the destination entry is given that name.

• If the destination path ends with a slash (/), the original source file name is used.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

source_filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Move a file from the local file system to a destination directory

dsefs dsefs://127.0.0.1:5598/ > mv file:/home/myFile dsefs:/data2/myDirectory
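
Move several files into one directory

Because mv accepts multiple source filepaths, several files can be moved in one command when the destination is a directory. A sketch, assuming the hypothetical source files and target directory exist:

dsefs dsefs://127.0.0.1:5598/ > mv dsefs:/data2/cal09 dsefs:/data2/cal10 dsefs:/archive/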

put
A special case of cp that copies a local file to the DSE filesystem. If a relative source path is given, it is resolved
in the last local working directory, regardless of the current working directory. Similarly, if a relative destination
path is given, it is always resolved in the last DSEFS working directory. As in cp, both paths may be absolute and
are allowed to point to any file system. If the destination path points to a different file system than DSEFS, the
block size and redundancy options are ignored.
Synopsis

$ put [-o] [-b size_in_bytes] [-n num_nodes] [-c encoder_name] [-f frame_size_in_bytes] [-m permission_mode] [--no-force-sync] [--force-sync] source_filepath destination_filepath


Command arguments

-b, --block-size size_in_bytes
Preferred block size in bytes for files. Ignored when the destination path is a file system other than DSEFS.
-c, --compression-encoder encoder_name
The compression encoder name. DSE ships with the LZ4 compression encoder.
destination_filepath
Explicit or relative filepath.

• If the destination path ends with a file name, the destination entry is given that name.

• If the destination path ends with a slash (/), the original source file name is used.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

-f, --compression-frame-size frame_size_in_bytes
Preferred compression frame size in bytes. The frame is the unit of compression; larger frames generally give a better compression ratio. In most cases, the default value is sufficient. The default frame size is 131072 bytes.
--force-sync
Synchronize files in this directory with the storage device when closed. Files created in the directory
inherit the option.
-m, --permission-mode permission_mode
Octal representation of permission mode for owner, group, and others:

• 0 – no permission

• 1 – execute

• 2 – write

• 3 – write and execute

• 4 – read

• 5 – read and execute

• 6 – read and write

• 7 – read, write, and execute

-n, --redundancy-factor num_nodes
Set the number of replicas of file data, similar to the replication factor in the database keyspaces, but more granular.

• Set this to one number greater than the number of nodes that are allowed to fail before data loss
occurs. For example, set this value to 3 to allow 2 nodes to fail.

• For simple replication, use a value that is equivalent to the replication factor.

• Default value is inherited from the parent directory if set, otherwise it is 3.

--no-force-sync
Do not synchronize files in this directory with the storage device when closed. Files created in the
directory inherit the option.
-o, --overwrite
If destination file exists, overwrite.
source_filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Copy local bluefile to remote greenfile

dsefs dsefs://127.0.0.1:5598/ > put file:/bluefile greenfile
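
Copy a local file into DSEFS, overwriting the destination and setting permissions

The put options can be combined: -o overwrites an existing destination file, and -m 640 grants read and write to the owner (6), read to the group (4), and no permission to others (0). A sketch, assuming a hypothetical local file /home/report.csv exists:

dsefs dsefs://127.0.0.1:5598/ > put -o -m 640 file:/home/report.csv dsefs:/data2/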

pwd
Prints full filepath of current working directory.
Synopsis

$ pwd [directory_path]


Command arguments

directory_path
Current working directory.
Examples

Print working directory

dsefs dsefs://127.0.0.1:5598/ > pwd dsefs:/myDirectory

dsefs:/home/user1/new_directory

realpath
Prints the resolved absolute path; all but the last component must exist.
Synopsis

$ realpath [-e] [-m] filepath [filepath ...]


Command arguments

-e, --canonicalize-existing
Resolve the path to its canonical form. All components of the path must exist.
-m, --canonicalize-missing
Resolve the path to its canonical form. No path component needs to exist or be a directory.
filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Print filepath

dsefs dsefs://127.0.0.1:5598/ > realpath file:myDirectory

file:/home/user1/myDirectory
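
Resolve a path that does not exist yet

With -m, no component of the path needs to exist, so the canonical form of a planned path can be printed before creating it. A sketch using a hypothetical path:

dsefs dsefs://127.0.0.1:5598/ > realpath -m new_directory/subdir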

rename
Renames a file or directory without moving it to a different directory.
Synopsis

$ rename filepath new_name


Command arguments

new_name
New file name.
filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Rename myFile to cyclist

dsefs dsefs://127.0.0.1:5598/ > rename file:/home/myFile cyclist

rm
Removes files or directories.
Synopsis

$ rm [-R] [-v] filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

-R, --recursive
Remove directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples

Remove files

dsefs dsefs://127.0.0.1:5598/ > rm file:/home/myFile dsefs:/home/remoteFile

Remove directory

dsefs dsefs://127.0.0.1:5598/ > rm file:/home/
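
Recursively remove a directory and its contents

Combining -R and -v removes a non-empty directory tree and reports each removal. A sketch, assuming a hypothetical scratch directory exists:

dsefs dsefs://127.0.0.1:5598/ > rm -R -v dsefs:/tmp/scratch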

rmdir
Removes empty directory or directories.

Synopsis

$ rmdir filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

Examples

Remove empty directory

dsefs dsefs://127.0.0.1:5598/ > rmdir file:/home/

stat
Displays file or directory status.
Synopsis

$ stat [-v] filepath [filepath ...]


Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.


• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

-v, --verbose
Turn on verbose output.
Examples

Get status of directory

dsefs dsefs://127.0.0.1:5598/ > stat file:new_directory

DIRECTORY file:/home/user1/new_directory:
Owner user1
Group user1
Permission rwxr-xr-x
Created 2017-01-15 13:10:06+0200
Modified 2017-01-15 13:10:06+0200
Accessed 2017-01-15 13:10:06+0200
Size 4096
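
Get verbose status

To include verbose output, add the -v flag. A sketch reusing the sample paths above; the exact fields shown may vary:

dsefs dsefs://127.0.0.1:5598/ > stat -v file:new_directory dsefs:/home/remoteFile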

truncate
Truncates file or files to a specified length.
Truncating a file to 0 bytes retains only its metadata. This is useful for keeping an empty file available to
processes without deleting and recreating the file.
Synopsis

$ truncate [-s size_in_bytes] filepath [filepath ...]


Definition
The short form and long form parameters are comma-separated.

Command arguments

filepath
Explicit or relative filepath.

• Wildcard characters are supported.

• Explicit file system prefixes dsefs: and file: are supported.

• .. is the parent directory.

-s, --size size_in_bytes


Set new file size in bytes.
Examples

Truncate file to 0 bytes

dsefs dsefs://127.0.0.1:5598/ > truncate -s 0 file:/home/myFile
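
Truncate file to a specific size

To truncate to a size other than 0, pass the new size in bytes with -s. A sketch using the same sample file:

dsefs dsefs://127.0.0.1:5598/ > truncate -s 1024 file:/home/myFile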

umount
Unmounts file system storage locations from the file hierarchy. Only a superuser may run umount. After running
umount, run fsck to restore the block replicas that were lost with the unmounted location.

Synopsis

$ umount [-f] location_UUID [location_UUID ...]


Definition
The short form and long form parameters are comma-separated.

Command arguments

-f, --force
Force unmounting, even if location is unavailable.
location_UUID
UUID of location.
Examples

Unmount location from DSEFS

dsefs dsefs://127.0.0.1:5598/ > umount dcd9dd1f-46c8-4b47-b3e3-aa431156021a
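
Force unmount an unavailable location

To force unmounting even when the location is unavailable, and then restore missing block replicas, a sketch (the UUID is illustrative):

dsefs dsefs://127.0.0.1:5598/ > umount -f dcd9dd1f-46c8-4b47-b3e3-aa431156021a
dsefs dsefs://127.0.0.1:5598/ > fsck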

dsetool
About dsetool
dsetool is a command line interface for DSE operations.
Synopsis

$ dsetool [connection_options] command command_args


Using dsetool command line help


To view a listing of dsetool commands:

$ dsetool help

To view help for a specific command:

$ dsetool command help

dsetool commands for DSE Search


Search CQL commands are distributed to the entire data center. The dsetool commands for DSE Search
distribute search index changes to the data center by default, and are node-specific only when the distributed
flag is set to false.
Connection options
Options for connecting to your cluster with the dsetool utility.
Synopsis

$ dsetool [connection_options] command command_args


JMX authentication is used by some dsetool commands. Other dsetool commands authenticate with the
user name and password of the configured user. The connection option short form and long form are comma
separated.

You can provide authentication credentials in several ways, see Credentials for authentication.
To enable dsetool to use Kerberos authentication, see Using dsetool with Kerberos enabled cluster.

Specify how to connect and authenticate the dsetool command.


This list shows short form (-f filename) and long form (--config-file=filename):
-a, --jmxusername jmx_username
User name for authenticating with secure local JMX.
-b, --jmxpassword jmx_password
Password for authenticating with secure local JMX. If you do not provide a password, you are prompted
to enter one.
-c, --cassandra_port dse_port
DSE port number.
--cipher-suites ssl_cipher_suites
Specify comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. For
example, --cipher-suites c1,c2,c3.
-f, --config-file config_filename
File path to configuration file that stores credentials. The credentials in this configuration file override the
~/.dserc credentials. If not specified, then use ~/.dserc if it exists.
The configuration file can contain DataStax Enterprise and JMX login credentials. For example:

username=username
password=password
jmx_username=jmx_username
jmx_password=jmx_password

The credentials in the configuration file are stored in clear text. DataStax recommends restricting
access to this file only to the specific user.


-h, --host IP_address
Connect to the specified hostname or IP address of a remote node instead of the local node.
-j, --jmxport jmx_port
Remote JMX agent port number.
--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-password keystore_password
Keystore password for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type for keys
generated by the Java keytool binary, but other types are possible, depending on user environment.
-l, --username username
Role to authenticate for database access.
-p, --password password
Password to authenticate for database access.
-s, --port solr_port
Solr port.
--ssl true | false
Whether to use SSL for native connections.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
--sslauth true | false
Whether to use SSL client authentication.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by the Java keytool binary, but other types are possible, depending on user environment. For example, --truststore-type jks2.
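
For example, to run a command against a remote node with DSE authentication, combine connection options with the command. A sketch; the host, role, and password are placeholders:

$ dsetool -h 10.10.1.5 -l admin -p mypassword status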
dsetool core_indexing_status
Retrieves the dynamic indexing status of a search index on a DSE Search node and displays the percent
complete, an estimated completion time in milliseconds, and the reindexing reason.

Command is supported only on nodes with DSE Search workloads.

Synopsis

$ dsetool core_indexing_status [keyspace_name.]table_name [--all] [--progress]


Retrieves the dynamic indexing status (INDEXING, FINISHED, or FAILED) of the specified index or indexes.
Also identifies the reindexing reason. The possible reason for a reindexing event is categorized as one of the
following:

• BOOTSTRAP

• NEW_SSTABLES

• USER_REQUEST

Parameters:
[keyspace_name.]table_name
The search index table name is required. The keyspace name is optional. The case of keyspace and
table names is preserved. You must use the correct case for the keyspace and table names.
--all
Retrieve the dynamic indexing status of the specified search index on all nodes.
--progress
Display the percent complete, an estimated completion time in milliseconds, and the reindexing reason.
This option is always assumed to be true; the command always displays the status information.
See Verifying indexing status.
Examples
These examples use the demo keyspace and health_data table.
To view the indexing status for the local node:

$ dsetool core_indexing_status demo.health_data

The results are displayed:

[demo.health_data]: INDEXING, 38% complete, ETA 452303 milliseconds (7 minutes 32 seconds), reason: USER_REQUEST

To view the indexing status for a search index on a specified node:

$ dsetool -h 200.192.10.11 core_indexing_status demo.health_data

To view indexing status of all search indexes in the data center:

$ dsetool core_indexing_status demo.health_data --all

The results are displayed for 3 nodes in the data center:

Address Core Indexing Status


200.192.10.11 FINISHED
200.192.10.12 FINISHED
200.192.10.23 FINISHED

dsetool create_core
Creates the search index table on the local node.
Supports DSE authentication with [-l username -p password].
The CQL command to create a search index is CREATE SEARCH INDEX.

Command is supported only on nodes with DSE Search workloads.


Auto-generated schemas have DocValues enabled by default. See Creating a search index with default values for
details on docValues.
If one or more nodes fail to create the core in distributed operations, an error message identifies the failing
node or nodes. If core creation fails immediately, issue the create command again. If creation fails on only some
nodes, issue a reload on those nodes to load the newly created core.

Synopsis

$ dsetool create_core keyspace_name.table_name [coreOptions=yamlFile |
  coreOptionsInline=key1:value1#key2:value2#...] [distributed=(true|false)]
  [generateResources=(true|false) | schema=path solrconfig=path] [recovery=(true|false)]
  [reindex=(true|false)]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:

• auto_soft_commit_max_time:ms

• default_query_field:field

• distributed:( true | false )

• enable_string_copy_fields:( true | false )

• exclude_columns: col1, col2, col3, ...

• generate_DocValues_for_fields:( * | field1, field2, ... )

• generateResources:(true|false)

See Changing auto-generated search index settings.


include_columns
A comma-separated (CSV) list of columns to include. An empty list includes all columns.
index_merge_factor
How many segments of equal size to build before merging them into a single segment.
index_ram_buffer_size
The index ram buffer size in megabytes (MB).
lenient
Ignore non-supported type columns and continue to generate resources, instead of erroring out when
non-supported type columns are encountered. Default: false
resource_generation_profiles
To minimize index size, specify a CSV list of profiles to apply while generating resources.
Table 281: Resource generation profiles
Profile name Description

spaceSavingAll Applies spaceSavingNoJoin and spaceSavingSlowTriePrecision profiles.

spaceSavingNoJoin Do not index a hidden primary key field. Prevents joins across cores.

spaceSavingSlowTriePrecision Sets trie fields precisionStep to '0', allowing for greater space saving but slower querying.


Using spaceSaving profiles disables auto-generation of DocValues.


For example:

resource_generation_profiles: spaceSavingNoJoin, spaceSavingSlowTriePrecision

rt
Enable live indexing to increase indexing throughput. Enable live indexing on only one search index per
cluster.

rt=true

recovery=(true|false)
Whether to delete and recreate the search index if it is not able to load due to corruption. Valid values:

• true - If search index is unable to load, recover the index by deleting and recreating it.

• false - Default. No recovery.

reindex=(true|false)
Whether to reindex the data when search indexes are auto-generated with generateResources=true.
Reindexing works on a datacenter (DC) level. Reindex only once per search-enabled DC. Repeat the
reindex command on other data centers as required.
Valid values:

• true - Default. Reindexes the data. Accepts reads and keeps the current search index while the
new index is building.

• false - Does not reindex the data. You can check and customize search index resources before
indexing.

schema=path
Path of the UTF-8 encoded search index schema file. Cannot be specified when
generateResources=true.
To ensure that non-indexed fields in the table are retrievable by queries, you must include those
fields in the schema file. For more information, see Solr single-pass CQL queries.
solrconfig=path
Path of the UTF-8 encoded search index configuration file. Cannot be specified when
generateResources=true.

Examples

Automatically generate search index for the health_data table in the demo
keyspace

$ dsetool create_core demo.health_data generateResources=true

Override the default and reindex existing data, specify the reindex=true
option

$ dsetool create_core demo.health_data generateResources=true reindex=true

The generateResources=true option generates resources only if resources do not exist in the solr_resources
table.


Use options in a YAML-formatted file

To turn on live indexing, also known as real-time (RT) indexing, create an rt.yaml file whose contents are rt: true, then pass it with coreOptions:

$ dsetool create_core udt_ks.users generateResources=true reindex=true coreOptions=rt.yaml
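
A coreOptions file can combine several of the settings listed above. A minimal sketch; the file name and values are illustrative:

# myOptions.yaml
rt: true
auto_soft_commit_max_time: 1000
enable_string_copy_fields: false

$ dsetool create_core udt_ks.users generateResources=true reindex=true coreOptions=myOptions.yaml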

Enable encryption with inline options

Set the directoryFactory class to solr.EncryptedFSDirectoryFactory:

$ dsetool create_core keyspace_name.table_name generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"

$ dsetool create_core demo.health_data generateResources=true coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"

dsetool createsystemkey
Creates an encryption/decryption key for transparent data encryption (TDE).
See Transparent data encryption.
Synopsis

$ dsetool createsystemkey [cipher_algorithm[/mode/padding]] [length] [key_name] [-d filepath]
  [-k=kmip_groupname [-t kmip_template] [-n namespace]]


cipher_algorithm[/mode/padding]
DSE supports the following JCE cipher algorithms:

• AES/CBC/PKCS5Padding (valid with length 128, 192, or 256).

• AES/ECB/PKCS5Padding (valid with length 128, 192, or 256)

• DES/CBC/PKCS5Padding (valid with length 56)

• DESede/CBC/PKCS5Padding (valid with length 112 or 168)

• Blowfish/CBC/PKCS5Padding (valid with length 32-448)

• RC2/CBC/PKCS5Padding (valid with length 40-128)

Default value: AES/CBC/PKCS5Padding (with length 128)


-d filepath, --directory filepath
Key file output directory. Enables creating key files before DSE is installed. This option is typically
used by IT automation tools like Ansible. When no directory is specified, keys are saved to the value of
system_key_directory in dse.yaml.
length
Required if cipher_algorithm is specified. Key length is not required for HMAC algorithms. Default value:
128 (with the default cipher algorithm AES/CBC/PKCS5Padding)
key_name
Unique file name for the generated system key file. Encryption key files can have any valid Unix name.
When no name is specified, the default file name is system_key. The default key file name is not
configurable.
-k=kmip_groupname
The name of the KMIP group that is defined in the kmip_hosts section of dse.yaml.
-t kmip_template
The key template on the specified KMIP provider.
-n namespace
Namespace on the specified KMIP provider.
Examples

To create a local key file:

$ dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 system_key2

where system_key2 is the unique file name for the generated key file.

To create an off-server key file:

$ dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 system_key2 -k=group2

where group2 is the key server group defined in the kmip_hosts section of dse.yaml.
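
To create an off-server key file using a key template and namespace defined on the KMIP provider, a sketch (the template and namespace names are placeholders):

$ dsetool createsystemkey 'AES/CBC/PKCS5Padding' 128 -k=group2 -t key_template1 -n namespace1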


To create a local key file in a specific directory:

$ dsetool createsystemkey 'AES/ECB/PKCS5Padding' 128 -d /mydir

See Setting up local encryption keys.


dsetool encryptconfigvalue
Encrypts sensitive configuration information. This command takes no arguments and prompts for the value to
encrypt.
Example

$ dsetool encryptconfigvalue

dsetool get_core_config
Displays the XML for the specified search index config. Supports DSE authentication with [-l username -p
password].

Command is supported only on nodes with DSE Search workloads.

Synopsis

$ dsetool get_core_config keyspace_name.table_name [current=true|false]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
current=true|false
Optionally specify to view the current (active) configuration.

• true - Returns the active live search index config.

• false - Default. Returns the pending (latest uploaded) search index configuration.

Examples
The following examples view the search index config for the demo keyspace and health_data table.
To view the pending (latest uploaded) configuration:

$ dsetool get_core_config demo.health_data

The XML for the auto-generated configuration is displayed:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<config>
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</
abortOnConfigurationError>
<luceneMatchVersion>LUCENE_6_0_0</luceneMatchVersion>
<dseTypeMappingVersion>2</dseTypeMappingVersion>
<directoryFactory class="solr.StandardDirectoryFactory" name="DirectoryFactory"/>
<indexConfig>
<rt>false</rt>
<rtOffheapPostings>true</rtOffheapPostings>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>512</ramBufferSizeMB>
...
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>

To view the active (currently loaded) search index configuration:

$ dsetool get_core_config demo.health_data current=true

To save the XML output to a file:

$ dsetool get_core_config demo.health_data > /Users/maryjoe/Documents/search/health_data_config.xml

The health_data_config.xml file is created.


dsetool get_core_schema
Displays the XML for the pending or active search index schema. Supports DSE authentication with [-l
username -p password].

Command is supported only on nodes with DSE Search workloads.


Synopsis

$ dsetool get_core_schema keyspace_name.table_name [current=true|false]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
current=true|false
Optionally specify to view the current (active) schema.

• true - Returns the current live search index schema.

• false - Default. Returns the latest uploaded search index schema.

Examples
The following examples view the search index schema for the demo keyspace and health_data table.
To save the XML output to a file:

$ dsetool get_core_schema demo.health_data > /Users/maryjoe/Documents/search/health_data_schema.xml

The health_data_schema.xml file is created.


To view the pending (latest uploaded) search index schema:

$ dsetool get_core_schema demo.health_data

To view the active (currently loaded) search index schema:

$ dsetool get_core_schema demo.health_data current=true

The XML for the schema is displayed:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>


<schema name="autoSolrSchema" version="1.5">
<types>
<fieldType class="org.apache.solr.schema.TextField" name="TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldType class="org.apache.solr.schema.TrieIntField" name="TrieIntField"/>
</types>
<fields>
<field indexed="true" multiValued="false" name="grade_completed" stored="true"
type="TextField"/>
...
<field indexed="true" multiValued="false" name="fips" stored="true" type="TextField"/>
</fields>
<uniqueKey>(id,age)</uniqueKey>
</schema>

dsetool help
Provides a listing of dsetool commands and parameters.
Synopsis

$ dsetool help


Typing dsetool or dsetool help provides a listing of dsetool commands and parameters.
To view help for a specific command, type dsetool command help.

dsetool index_checks (experimental)


Optional and experimental. Reads the full index and optionally performs sanity checks. No repairs or fixes occur.
Run only when the index is inactive; no writes are allowed while the index check is running.
Running an index check is time consuming and implies a hard commit.

Command is supported only on nodes with DSE Search workloads.

Synopsis

$ dsetool index_checks keyspace_name.table_name [coreOptions=yamlFilepath |
  coreOptionsInline=options] [--index_checks=true|false] [--index_checks_stop=true|false]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:

• auto_soft_commit_max_time:ms

• default_query_field:field

• distributed:( true | false )

• enable_string_copy_fields:( true | false )

• exclude_columns: col1, col2, col3, ...

• generate_DocValues_for_fields:( * | field1, field2, ... )

• generateResources:(true|false)

See Changing auto-generated search index settings.


--index_checks=true|false
Specify to run the index check.

• true - Runs the index check to verify index integrity. Reads the full index and has performance
impact.

• false - Default. Does not run the index check.

--index_checks_stop=true|false
Specify to stop the index check.

• true - Requests the index check to stop.

• false - Does not stop the index check.

Examples
Ensure that indexing is inactive before doing an index check.

To do an index check:

$ dsetool index_checks demo.health_data

The LUKE handler information is displayed:

LUKE handler info:
------------------
numDocs:0
maxDoc:0
deletedDocs:0
indexHeapUsageBytes:0
version:2
segmentCount:0
current:true
hasDeletions:false
directory:org.apache.lucene.store.MMapDirectory:MMapDirectory@/Users/maryjoe/dse/data/solr.data/demo.health_data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@5c94e0dd
segmentsFile:segments_1
segmentsFileSizeInBytes:71
userData:{}
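
To explicitly run the sanity checks, or to request that a running check stop, use the flags from the synopsis:

$ dsetool index_checks demo.health_data --index_checks=true

$ dsetool index_checks demo.health_data --index_checks_stop=true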

dsetool infer_solr_schema
Automatically infers and proposes a schema that is based on the specified keyspace and table. Search indexes
are not modified. Supports DSE authentication with [-l username -p password].

Command is supported only on nodes with DSE Search workloads.

Synopsis

$ dsetool infer_solr_schema keyspace_name.table_name [coreOptions=yamlFilepath]|


[coreOptionsInline=options]


keyspace_name.table_name


Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:

• auto_soft_commit_max_time:ms

• default_query_field:field

• distributed:( true | false )

• enable_string_copy_fields:( true | false )

• exclude_columns: col1, col2, col3, ...

• generate_DocValues_for_fields:( * | field1, field2, ... )

• generateResources:(true|false)

See Changing auto-generated search index settings.


Examples

To automatically infer and propose a schema for a keyspace and table that contain tuples and UDTs:

$ dsetool infer_solr_schema demo.health_data_udt
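
To exclude specific columns from the proposed schema, pass inline options. A sketch; the column names are illustrative:

$ dsetool infer_solr_schema demo.health_data_udt coreOptionsInline="exclude_columns:notes,comments"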

dsetool inmemorystatus
Provides the memory size, capacity, and percentage for this node and the amount of memory each table is
using. The unit of measurement is MB. Bytes are truncated.
Synopsis

$ dsetool inmemorystatus [keyspace_name.table_name]


[keyspace_name.table_name]
The keyspace name and table name.
Examples

To view the status for all tables:

$ dsetool inmemorystatus

The results for all tables are displayed:

Max Memory to Lock: 3276MB
Current Total Memory Locked: 0MB
Current Total Memory Not Able To Lock: 0MB
No MemoryOnlyStrategy tables found.

To view the status for a specific table:

$ dsetool inmemorystatus demo.health_data

dsetool insights_config
Enables and disables DSE Metrics Collector and configures reporting frequency and storage options. The default
mode enables metrics collection and reporting with local storage on disk.
Run this command only on a single node. The change is propagated to all other nodes in the cluster. Wait at
least 30 seconds for the changes to propagate to all nodes. Restarting DSE is not required.

Synopsis

$ dsetool insights_config --show_config | --mode DISABLED|ENABLED_NO_STORAGE|ENABLED_WITH_LOCAL_STORAGE
  --metric_sampling_interval_in_seconds seconds --config_refresh_interval_in_seconds seconds
  --data_dir_max_size_in_mb dir_size --node_system_info_report_period ISO-8601_duration_string


--show_config
Prints the current configuration for DSE Metrics Collector.
--mode DISABLED|ENABLED_NO_STORAGE|ENABLED_WITH_LOCAL_STORAGE
Enables and disables DSE Metrics Collector and configures storage options:

• DISABLED - disables metrics collection.

• ENABLED_NO_STORAGE - enables metrics collection and starts reporting metrics. Typically used
when collectd is configured to report to a real-time monitoring system.

• ENABLED_WITH_LOCAL_STORAGE - enables metrics collection and reporting with local storage
on disk. The default local data directory is /var/lib/cassandra/insights_data. Default.

Restarting DSE is not required after changing the configuration mode. The configuration mode persists
after DSE is restarted.
--metric_sampling_interval_in_seconds seconds
The frequency that metrics are reported to DSE Metrics Collector.
Default: 30
--config_refresh_interval_in_seconds seconds
How often the DSE Metrics Collector configuration changes are pushed to all nodes in the cluster. If
nodes are down when a change is made, the change will propagate when the node is back up.
Default: 30
--data_dir_max_size_in_mb dir_size
When local storage is enabled, the limit on how much DSE Metrics Collector data is stored on disk.
The maximum size of the data directory cannot exceed 2 GB.
Default: 1024 (1 GB)
--node_system_info_report_period duration
The repeating time interval, in ISO-8601 format, for gathering diagnostic information about the node.
For example, PT1H is 1 hour, PT5M is 5 minutes, and PT200S is 200 seconds.
Default: PT1H (1 hour)
Examples

View the current DSE Metrics Collector configuration

$ dsetool insights_config --show_config

Sample output when metrics collection is disabled:

{
"mode" : "DISABLED",
"config_refresh_interval_in_seconds" : 30,
"metric_sampling_interval_in_seconds" : 30,
"data_dir_max_size_in_mb" : 1024,
"node_system_info_report_period" : "PT1H"
}

Enable metrics collection when collectd is configured to report to a real-time monitoring system

$ dsetool insights_config --mode ENABLED_NO_STORAGE

Enable metrics collection with local storage

$ dsetool insights_config --mode ENABLED_WITH_LOCAL_STORAGE

Configure 1500 MB for the DSE Metrics Collector local data directory

$ dsetool insights_config --data_dir_max_size_in_mb 1500

The maximum size of the local data directory must not exceed 2 GB.

The default directory for local storage is /var/lib/cassandra/insights_data. To change the directory to store
collected metrics, see Configuring data and log directories for DSE Metrics Collector.

Change the node system reporting duration to 1 week


Use an ISO-8601 time duration string.

$ dsetool insights_config --node_system_info_report_period P1W

Disable metrics collection

$ dsetool insights_config --mode DISABLED

Configure the metric sampling interval for 60 seconds

$ dsetool insights_config --metric_sampling_interval_in_seconds 60

Configure 120 seconds for the configuration refresh interval

Push configuration changes to all nodes in the cluster every 2 minutes:

$ dsetool insights_config --config_refresh_interval_in_seconds 120

After you make configuration changes with dsetool insights_config, you must disable and then re-enable DSE
Metrics Collector to read the configuration file again. Wait at least 30 seconds for the changes to propagate to
all nodes.

dsetool insights_filters
Configures filters to include and exclude specific metrics for DSE Metrics Collector.
By default, the following metrics are always excluded:

• Thread Per Core (TPC) metrics at each core level

• Keyspace level metrics

• DSE internal table metrics (except system_auth, paxos, and batchlog metrics)

Use a regular expression (regex) to specify which metrics to include or exclude from the filter. See Filtering
metrics.
Synopsis

$ dsetool insights_filters --show_filters | --remove_all_filters |
  --add (--global | --insights_only) (--allow regex | --deny regex) |
  --remove (--global | --insights_only) (--allow regex | --deny regex)


--show_filters
Prints the current filters for DSE Metrics Collector.
--remove_all_filters
Remove all metrics filters for DSE Metrics Collector.
--add (--global|--insights_only) (--allow regex | --deny regex)
Add a filter that includes (--allow) or excludes (--deny) metrics matching the regular expression, applied
with a scope of --global or --insights_only.
--remove (--global|--insights_only) (--allow regex | --deny regex)
Remove an existing allow or deny filter with the matching scope and regular expression.
--global
Metrics filter scope includes metrics reported locally and insights data files.
--insights_only
Limit metrics filter scope to insights data files only. Appropriate for diagnostic use.


Example filters

Show all active filters

$ dsetool insights_filters --show_filters

Remove all active filters

$ dsetool insights_filters --remove_all_filters

Add a global filter to deny all metrics for a specific keyspace

$ dsetool insights_filters --add --global --deny "org\\.apache\\.cassandra\\.metrics\\.(keyspace_name|table_name).*(keyspace_name).*"

Add a global filter to deny all metrics matching KeyspaceMetrics

$ dsetool insights_filters --add --global --deny .+KeyspaceMetrics.+

Remove a global filter to allow metrics for a specific keyspace that has an
existing deny filter

$ dsetool insights_filters --remove --global --deny "org\\.apache\\.cassandra\\.metrics\\.(keyspace_name|table_name).*(keyspace_name).*"

Add a filter to insights data files that denies garbage collection (GC) metrics

$ dsetool insights_filters --add --insights_only --deny .+gc.+
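
Allow filters use the same syntax. For example, to add an insights-only filter that explicitly allows table metrics (the regular expression here is illustrative, not a required pattern):

$ dsetool insights_filters --add --insights_only --allow ".*TableMetrics.*"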

dsetool list_index_files
Lists all index files for a search index on the local node. The results show file name, encryption, disk usage,
decrypted size, and encryption overhead. An index file is encrypted only when the backing CQL table is
encrypted and the search index uses EncryptedFSDirectoryFactory; otherwise, the index file is unencrypted.

Command is supported only on nodes with DSE Search workloads.


Synopsis

$ dsetool list_index_files keyspace_name.table_name [--index directory]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
--index
The data directory that contains the index files.

• If not specified, the default directory is inferred from the search index name.

• directory - A specified file path to the solr.data directory that contains the search index files.

Examples

To list the index files:

$ dsetool list_index_files demo.health_data

The results show file name, encryption, disk usage, decrypted size, and encryption overhead:


Filename    Encryption  Disk usage  Decrypted size  Encryption overhead
----------  ----------  ----------  --------------  -------------------
segments_1  N/A         7124 bytes  N/A             N/A
write.lock  N/A         3240 bytes  N/A             N/A

To list the index files in a specified directory:

$ dsetool list_index_files demo.health_data --index /My_data_dir

dsetool list_core_properties
Lists the properties and values in the dse-search.properties resource for the search index.
See Load balancing for distributed search queries.

Synopsis

$ dsetool list_core_properties keyspace_name.table_name


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.


Examples
To view properties set in the dse-search.properties resource:

$ dsetool list_core_properties demo.health_data

Example result, assuming the shard shuffling strategy has already been set to RANDOM:

shard.shuffling.strategy=RANDOM

dsetool list_subranges
Lists the subranges of data in a keyspace by dividing a token range into a number of smaller subranges. Useful
when the specified range is contained in the target node's primary range.
Synopsis

$ dsetool list_subranges keyspace_name table_name keys_per_range start_token end_token


keyspace_name table_name
Keyspace table pair.
keys_per_range
The approximate number of rows per subrange.
start_token
The start token of a specified range of tokens.
end_token


The end token of a specified range of tokens.


Example
To run the command:

$ dsetool list_subranges demo health_data 10000 113427455640312821154458202477256070485 0

The subranges are output and can be used as input to the nodetool repair command.

Start Token                               End Token                                 Estimated Size
----------------------------------------  ----------------------------------------  --------------
113427455640312821154458202477256070485   132425442795624521227151664615147681247   11264
132425442795624521227151664615147681247   151409576048389227347257997936583470460   11136
151409576048389227347257997936583470460   0                                         11264
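
For example, the first subrange from the output above can be repaired with the start and end token options of nodetool repair (a sketch; adjust the keyspace and table to your environment):

$ nodetool repair -st 113427455640312821154458202477256070485 -et 132425442795624521227151664615147681247 demo health_data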

dsetool listjt
Lists all Job Tracker nodes grouped by the datacenter that is local to them.

Command is supported only on nodes with analytics workloads.

Synopsis

$ dsetool listjt



This command takes no arguments.


Examples

$ dsetool listjt

dsetool managekmip list


Verifies communication with the specified Key Management Interoperability Protocol (KMIP) server and lists the
encryption/decryption keys on that server.
Synopsis

$ dsetool managekmip list kmip_group_name [namespace=key_namespace]


kmip_group_name
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
namespace=key_namespace
Namespace on the specified KMIP provider.


Examples

Get a list of the available keys and states from the KMIP server:

$ dsetool managekmip list vormetricgroup

The results show that the KMIP group named vormetricgroup has two keys:

Keys on vormetricgroup:
ID      Name                                     Cipher         State        Activation Date               Creation Date  Protect Stop Date  Namespace
02-449  82413ef3-4fa6-4d4d-9dc8-71370d731fe4_0  AES/CBC/PKCS5  Deactivated  Mon Apr 25 20:25:47 UTC 2016  n/a            n/a                n/a
02-540  0eb2277e-0acc-4adb-9241-1dd84dde691c_0  AES            Active       Tue May 31 12:57:59 UTC 2016  n/a
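
To limit the listing to keys in a particular namespace on the KMIP provider (the namespace value below is a hypothetical example):

$ dsetool managekmip list vormetricgroup namespace=dse_keys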

dsetool managekmip expirekey


Expires encryption/decryption keys on a Key Management Interoperability Protocol (KMIP) server. The database
stops using the key for encryption at the specified time but continues to use the expired key to decrypt existing
data. Data re-keying is not required. Use this command to satisfy security policies that require periodically
switching the encryption key.
DataStax recommends following best practices for key management permission policies. See Expiring an
encryption key.
Synopsis

$ dsetool managekmip expirekey kmip_group_name kmip_key_id [date_time]


kmip_group_name
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
date_time
After the specified date_time, new data is no longer encrypted with the key, but existing data can still be
decrypted with it. The format of date_time is YYYY-MM-DD HH:MM:SS:T. For example, use
2016-04-13 20:05:00:0 to expire the encryption key at 8:05 p.m. on 13 April 2016.
Examples

To immediately expire an encryption key:

$ dsetool managekmip expirekey kmipgrouptwo 02-540

Encryption for new data is prevented, but decryption with the key is still allowed. Because the expire date/time is
not specified, the key is expired immediately.

To expire an encryption key at a specific date and time:

$ dsetool managekmip expirekey kmipgrouptwo 02-540 2017-04-13 20:05:00:0

dsetool managekmip revoke


Permanently disables the key on the KMIP server. The database can no longer use the key for encryption but
continues to use it to decrypt existing data. Re-encrypt existing data before completely removing the
key from the KMIP server. Use this command as the first step when replacing a compromised key.
Synopsis

$ dsetool managekmip revoke kmip_group_name kmip_key_id


kmip_group_name
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
Examples

To revoke a key so that it can no longer be used for encryption:

$ dsetool managekmip revoke kmipgrouptwo 02-540

dsetool managekmip destroy


Completely removes the key from the KMIP server. The database can no longer use the key for encryption or
decryption, and existing data that has not been re-encrypted becomes inaccessible.
Use this command only after revoking a key and re-encrypting existing data.

Synopsis

$ dsetool managekmip destroy kmip_group_name kmip_key_id


kmip_group_name
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
Examples

First, revoke the key so that it can no longer be used for encryption:

$ dsetool managekmip revoke kmipgrouptwo 02-540

After you revoke a key, you can destroy it:

$ dsetool managekmip destroy kmipgrouptwo 02-540

dsetool node_health
Retrieves a dynamic score between 0 and 1 that describes the health of a DataStax Enterprise node. Node
health is a score-based representation of how fit a node is to handle search queries. The node health composite
score is based on dropped mutations and uptime. A higher score indicates better node health. Nodes with a
large number of dropped mutations, and nodes that have only recently started, have a lower health score.
See Collecting node health and indexing status scores.
Synopsis

$ dsetool node_health [--all]


--all
Run the operation on all nodes.
Examples

To retrieve the health score of the local node:

$ dsetool node_health

The result displays a number between 0 and 1:

Node Health [0,1]: 0.7

To retrieve the health score of a specified node:

$ dsetool -h 200.192.10.11 node_health

To retrieve the health score of all nodes:

$ dsetool node_health --all

dsetool partitioner
Returns the fully qualified classname of the IPartitioner that is used by the cluster.


Synopsis

$ dsetool partitioner


This command takes no arguments.


Examples

$ dsetool partitioner

The partitioner in use is displayed:

org.apache.cassandra.dht.Murmur3Partitioner

dsetool perf
Temporarily changes the running parameters for the CQL Performance Service. Histogram tables provide DSE
statistics that can be queried with CQL.
Changes made with performance object subcommands do not persist between restarts and are useful only for
short-term diagnostics.
To make these changes permanent, change the CQL Performance Service options in dse.yaml.

See DSE Performance Service diagnostic table reference and Collecting histogram diagnostics.


Synopsis

$ dsetool perf subcommand values


clustersummary enable|disable
Whether to enable the collection of database-level statistics for the cluster.
cqlslowlog enable|disable
Whether to enable the collection of CQL queries that exceed the specified time threshold.
cqlslowlog threshold
The CQL slow log threshold. A value in [0,1] is interpreted as a percentile of the actual request times; a value greater than 1 is an absolute threshold in milliseconds. For percentile thresholds:

• 1.0 logs no queries

• 0.999 logs the slowest 0.1% of queries

• 0.95 logs the slowest 5% of queries

• 0.5 logs the slowest 50% of queries

• 0.0 logs all queries

cqlslowlog skip_writing_to_db
Keeps slow queries in-memory only.
cqlslowlog write_to_db


Writes slow query data to the database. When writing to the database, set the threshold to at least 2000 ms to
prevent a high load on the database.
Temporary equivalent of the cql_slow_log_options.skip_writing_to_db: false setting in dse.yaml.
cqlslowlog set_num_slowest_queries
The number of slow queries to keep in-memory.
cqlslowlog recent_slowest_queries
Retrieves the specified number of the most recent slow queries.
cqlsysteminfo enable|disable
Whether to collect CQL system performance information statistics.
dbsummary enable|disable
Whether to collect database summary statistics.
histograms enable|disable
Whether to collect table histograms that measure the distribution of values in a stream of data.
Histogram tables provide DSE statistics that can be queried with CQL. The data in the diagnostic
histogram tables is cumulative since the DSE server was started.
resourcelatencytracking enable|disable
Whether to collect resource latency tracking statistics.
solrcachestats enable|disable
Whether to collect Solr cache statistics.
solrindexingerrorlog enable|disable
Whether to log Solr indexing errors.
solrindexstats enable|disable
Whether to collect Solr indexing statistics.
solrlatencysnapshots enable|disable
Whether to collect Solr latency snapshots.
solrrequesthandlerstats enable|disable
Whether to collect Solr request handler statistics.
solrslowlog enable|disable|threshold
Whether to log slow Solr sub-queries. A numeric value sets the Solr slow log threshold in milliseconds.
solrupdatehandlerstats enable|disable
Whether to collect Solr update handler statistics.
userlatencytracking enable|disable
Whether to enable user latency tracking.
Examples
These example commands make temporary changes only. Changes made with performance object
subcommands do not persist between restarts and are useful only for short-term diagnostics.
See Collecting database summary diagnostics.

To enable the collection of database-level statistics data:

$ dsetool perf clustersummary enable

To disable the collection of database-level statistics data:

$ dsetool perf clustersummary disable

See Collecting slow queries.


To keep slow queries in-memory only:

$ dsetool perf cqlslowlog skip_writing_to_db

To set the number of slow queries to keep in-memory:

$ dsetool perf cqlslowlog set_num_slowest_queries 5
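
To retrieve the most recent slow queries, for example the three most recent (this sketch assumes recent_slowest_queries takes the count as its argument, mirroring set_num_slowest_queries):

$ dsetool perf cqlslowlog recent_slowest_queries 3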

To write slow queries to the database:

$ dsetool perf cqlslowlog write_to_db

To disable collecting information on slow queries:

$ dsetool perf cqlslowlog disable

To change the threshold to collect information on the slowest 5% of queries:

$ dsetool perf cqlslowlog 0.95
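
To set an absolute threshold instead, pass a value greater than 1, which is interpreted as milliseconds (2000 ms is shown purely as an illustration):

$ dsetool perf cqlslowlog 2000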

To enable collecting information to identify slow search queries:

$ dsetool perf solrslowlog enable

To change the threshold value (in milliseconds) at which a sub-query is slow enough to be reported:

$ dsetool perf solrslowlog 200

dsetool read_resource
Reads the specified search index config or schema. Supports DSE authentication with [-l username -p
password].

Command is supported only on nodes with DSE Search workloads.


Synopsis

$ dsetool read_resource keyspace_name.table_name name=res_filename


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
res_filename
The name of the search index resource file to read.
Examples

To read the resource:

$ dsetool read_resource demo.health_data stopwords.xml

After changing a resource, upload the modified file and then reload the search index.


dsetool rebuild_indexes
Rebuilds secondary indexes on the local node.

DataStax recommends using these commands instead:

• For DSE Search workloads: dsetool reload_core


• For workloads other than DSE Search: nodetool rebuild_index

Synopsis

$ dsetool rebuild_indexes keyspace_name.table_name [index1,index2,...]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
index1,index2,...
Include one or a comma-separated list of secondary indexes to rebuild. If indexes are not specified,
rebuilds all indexes.


Examples

To rebuild all secondary indexes:

$ dsetool rebuild_indexes demo.health_data

To rebuild only the specified secondary indexes:

$ dsetool rebuild_indexes demo.health_data index1,index2

dsetool reload_core
Reloads the search index to recognize changes to schema or configuration. Supports DSE authentication with [-
l username -p password].
To reload the core and prevent reindexing, accept the default values reindex=false and deleteAll=false.

See Reloading the search index for details.


Synopsis

$ dsetool reload_core keyspace_name.table_name
      [coreOptions=yamlFile | coreOptionsInline=key1:value1#key2:value2#...]
      [deleteAll=(true|false)] [distributed=(true|false)] [reindex=(true|false)]
      [schema=path] [solrconfig=path]


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
schema=path
Path of the UTF-8 encoded search index schema file. Cannot be specified when
generateResources=true.
To ensure that non-indexed fields in the table are retrievable by queries, you must include those
fields in the schema file. For more information, see Solr single-pass CQL queries.
solrconfig=path
Path of the UTF-8 encoded search index configuration file. Cannot be specified when
generateResources=true.
distributed=(true|false)
Whether to distribute and apply the operation to all nodes in the local datacenter.

• True applies the operation to all nodes in the local datacenter.

• False applies the operation only to the node it was sent to. False works only when recovery=true.

Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.


reindex=(true|false)
Whether to reindex the data after the reload. Reindexing works on a datacenter (DC) level. Reindex only
once per search-enabled DC. Repeat the reindex command on other datacenters as required.
Valid values:

• true - Reindexes the data. Accepts reads and keeps the current search index while the new index is
building.

• false - Default. Does not reindex the data. You can check and customize search index resources
before indexing.

deleteAll=(true|false)

• true - Deletes the existing index before reindexing; search results return either no data or partial
data while the index is rebuilding.

• false - Default. Keeps the existing index, so the reindex happens in place; search results may be
partially incorrect while the index is updating.

During reindexing, a series of criteria routes sub-queries to the nodes most capable of handling them.
See Shard routing for distributed queries.


Examples

To reload the search index with an inline core option, for example after uploading a changed resource file (shown here with the encrypted directory factory):

$ dsetool reload_core demo.health_data coreOptionsInline="directory_factory_class:solr.EncryptedFSDirectoryFactory"
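
To reload and rebuild the search index in place across the local datacenter (an illustrative combination of the options above; distributing a re-index degrades performance in that datacenter):

$ dsetool reload_core demo.health_data reindex=true deleteAll=false distributed=true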

dsetool ring
Lists the nodes in the ring. For more readable output, use dsetool status.
Synopsis

$ dsetool ring


This command requires no input.


Examples

List nodes in a cluster:

$ dsetool ring

Results:

Address        DC         Rack   Workload   Graph  Status  State   Load        Effective-Ownership  Token                 Health [0,1]
10.101.33.157  Cassandra  rack1  Cassandra  no     Up      Normal  178.75 KiB  50.00%               -9223372036854775808  1.00
10.101.32.188  Cassandra  rack1  Cassandra  no     Up      Normal  188.22 KiB  50.00%               0                     1.00

List the status of a search reindexing in a cluster:

$ dsetool ring

Results:

Address        DC    Rack   Workload  Graph  Status  State    Load        Owns  VNodes                Health [0,1]
10.200.182.8   Solr  rack1  Search    no     Up      Normal   888.77 MiB  ?     -9223372036854775808  1.00
10.200.182.82  Solr  rack1  Search    no     Up      Normal   1.09 GiB    ?     -3074457345618258603  0.90
10.200.182.81  Solr  rack1  Search    no     Up      Joining  446.34 MiB  ?     3074457345618258602   0.00
Joining node 10.200.182.81 is currently indexing following cores:
solr_tests.paging
Note: you must specify a keyspace to get ownership information.

dsetool set_core_property
Sets the properties and values in the dse-search.properties resource for the search index.


See Load balancing for distributed search queries.

Synopsis

$ dsetool set_core_property keyspace_name.table_name
      shard.set.cover.finder=DYNAMIC|STATIC |
      shard.shuffling.strategy=HOST|QUERY|HOST_QUERY|RANDOM|SEED |
      shard.set.cover.finder.inertia=inertia_integer


keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
For shard.set.cover.finder:
DYNAMIC
Use randomization in token range and endpoint selection for load balancing. DYNAMIC is the default.
STATIC
Requires load balanced client. Suitable for 8+ vnodes. The same query on a node uses the same token
ranges and endpoints. Creates fewer token filters, and has better performance than DYNAMIC.
When shard.set.cover.finder=DYNAMIC, values for shard.shuffling.strategy:
HOST
Shards are selected based on the host that received the query.
QUERY
Shards are selected based on the query string.
HOST_QUERY
Shards are selected by host x query.


RANDOM
Suitable only for 8 or fewer vnodes. A different random set of shards is selected with each request
(default).
SEED
Selects the same shard from one query to another.
When shard.set.cover.finder=STATIC, values for shard.set.cover.finder.inertia:
inertia_integer
Increasing the inertia value from the default of 1 may improve performance for clusters with more than 1
vnode and more than 20 nodes. The default is appropriate for most workloads.
Examples
To not use randomization to select token ranges and endpoints:

$ dsetool set_core_property demo.health_data shard.set.cover.finder=STATIC

$ dsetool reload_core demo.health_data reindex=false

To use default randomization to select token ranges and endpoints:

$ dsetool set_core_property demo.health_data shard.set.cover.finder=DYNAMIC

$ dsetool reload_core demo.health_data reindex=false
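
To set the shard shuffling strategy (SEED is shown as one of the valid strategies; choose the strategy that fits your workload):

$ dsetool set_core_property demo.health_data shard.shuffling.strategy=SEED

$ dsetool reload_core demo.health_data reindex=false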

As shown in the examples, after setting a core property value, be sure to reload the search index. While
set_core_property needs to run only once per cluster, reloading the search index must occur per datacenter.
In cqlsh, you can use RELOAD SEARCH INDEX. Example:

RELOAD SEARCH INDEX ON demo.health_data;

You do not need to reindex the specified table unless schema changes were made. Refer to Reloading the
search index.

dsetool sparkmaster cleanup


Drops and recreates the Spark Master recovery table.
Synopsis

$ dsetool sparkmaster cleanup [datacenter]


This command has an optional datacenter argument. If a datacenter is specified, only the Spark Master
recovery data for that datacenter is removed.
Examples

$ dsetool sparkmaster cleanup

$ dsetool sparkmaster cleanup dc1

dsetool sparkworker restart


Manually restarts the Spark Worker on the selected node, without restarting the node.
Synopsis

$ dsetool sparkworker restart


This command accepts no parameters.


Examples

$ dsetool sparkworker restart

dsetool status
Lists the nodes in their ring, including the node type and node health. When the datacenter workloads are the
same type, the workload type is listed. When the datacenter workloads are heterogeneous, the workload type is
shown as mixed.
Synopsis

$ dsetool status


This command accepts no parameters.


Examples

$ dsetool status

dsetool stop_core_reindex
Stops reindexing for the specified search index on the node where the command is run. Optionally, specify a
timeout in minutes; the node waits up to the specified timeout and then gracefully stops the reindexing. The
default timeout is 1 minute.
Synopsis

$ dsetool stop_core_reindex keyspace_name.table_name [timeout_min]


keyspace_name.table_name


Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
timeout_min
The number of minutes to wait to gracefully stop the indexing.
Examples

To stop reindexing after the default 1 minute timeout:

$ dsetool stop_core_reindex demo.health_data

The command returns the following message if successful:

Successfully stopped reindex for core demo.health_data on host 10.200.182.8

To stop reindexing after a 6-minute timeout:

$ dsetool stop_core_reindex demo.health_data 6

dsetool tieredtablestats
Outputs tiered storage information, including SSTables, tiers, timestamps, and sizes. Provides information on
every table that uses tiered storage.
Synopsis

$ dsetool tieredtablestats [keyspace_name.table_name] [-v]

Table 311: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.


cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

keyspace_name.table_name
Optional. The keyspace and table names of a table that uses tiered storage. When omitted, statistics are
reported for every table that uses tiered storage. Keyspace and table names are case-sensitive. Enclose
names that contain uppercase in double quotation marks.
-v
Output statistics for each SSTable, in addition to the tier summaries.
Examples

To monitor all tables using tiered storage:

$ dsetool tieredtablestats

Output of command:

ks.tbl
Tier 0:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
size: 4839
SSTables:
/mnt2/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-3-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
rows: 1
size: 4839
Tier 1:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
size: 4839
SSTables:
/mnt3/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-4-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0


reads_15_min: 0.0
rows: 1
size: 4839

To monitor the health_data table using tiered storage:

$ dsetool tieredtablestats demo.health_data

To monitor the health_data table with output for each SSTable:

$ dsetool tieredtablestats demo.health_data -v

dsetool tsreload
Reloads the truststores without a restart. Specify client or server.
Synopsis

$ dsetool tsreload client|server

Table 312: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

client
Reloads the truststore that is used for encrypted client-to-node communications.


server
Reloads the server truststore that is used for encrypted node-to-node (internode) SSL communications.
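
For example, after replacing the truststore files on disk, reload each store in turn:

$ dsetool tsreload client
$ dsetool tsreload server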
dsetool unload_core
Removes a search index. Supports DSE authentication with [-l username -p password].
To drop a search index from a table and delete all related data for the entire cluster, see DROP SEARCH
INDEX.
The removal of the secondary index from the table schema is always distributed.

Command is supported only on nodes with DSE Search workloads.

Synopsis

$ dsetool unload_core keyspace_name.table_name [deleteDataDir=(true|false)] [deleteResources=(true|false)] [distributed=(true|false)]

Table 313: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

deleteDataDir=( true | false )


Whether to delete index data and any other artifacts in the solr.data directory.
Valid values:

• true - Deletes index data and any other artifacts in the solr.data directory. It does not delete
DataStax Enterprise data.

• false - Default. Does not delete index data or other artifacts.

deleteResources=( true | false )


Whether to delete the config and schema resources associated with the search index.
Valid values:

• true - Deletes index resources.

• false - Default. Does not delete index resources.

distributed=( true | false )
Whether to distribute and apply the operation to all nodes in the local datacenter.
Valid values:

• true - Default. Applies the operation to all nodes in the local datacenter.

• false - Applies the operation only to the node it was sent to. false works only when recovery=true.

Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.
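
For example, to unload the demo.health_data index used in the other examples in this guide and also remove its index data and resources, a minimal invocation built from the synopsis above is:

$ dsetool unload_core demo.health_data deleteDataDir=true deleteResources=true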

dsetool upgrade_index_files
Upgrades all DSE Search index files.
Requirements:

• The remote node that contains the encryption configuration must be running.

• The local node is offline.

• The user that runs this command must have read and write permissions to the directory that contains the
index files.

Synopsis

$ dsetool upgrade_index_files keyspace_name.table_name -h IP_address [-c port] [--backup]
  [--workspace directory] [--index directory]

Table 314: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.


[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
-h IP_address
Required. Node hostname or IP address of the remote node that contains the encryption configuration
that is used for index encryption. The remote node must be running.
-c port
The DSE port on the remote node that contains the encryption configuration.
--backup
Preserves the index files from the current index as a backup after successful upgrade. The preserved
index file backup is moved to the --workspace directory. When not specified, index files from the current
index are deleted.
--workspace directory
The workspace directory for the upgrade process. The upgraded index is created in this directory. When
not specified, the default directory is the same directory that contains the search index files.
--index directory
The data directory that contains the search index files. When not specified, the default directory is
inferred from the search index name.
Examples

To perform offline index encryption:

$ dsetool upgrade_index_files demo.health_data
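
A fuller sketch that also supplies the required remote node and preserves the old index files as a backup; the 10.200.182.8 address here is a placeholder for a running node that holds the encryption configuration:

$ dsetool upgrade_index_files demo.health_data -h 10.200.182.8 --backup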

See Migrating encrypted tables from earlier versions and Encrypting new Search indexes.
dsetool write_resource
Uploads the specified search index config or schema.

Command is supported only on nodes with DSE Search workloads.

Resource files are stored internally in the database. You can configure the maximum resource file size or disable
resource upload with the resource_upload_limit option in dse.yaml.
Supports DSE authentication with [-l username -p password].
Synopsis

$ dsetool write_resource keyspace_name.table_name name=res_filename file=path_to_file_to_upload

Table 315: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.


[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
res_filename
The name of the search index resource file to upload.
file
The file path of the file to upload.
Examples

To upload stopwords.xml from the current directory as a search index resource:

$ dsetool write_resource demo.health_data name=stopwords.xml file=stopwords.xml

To specify the uploaded resource file name and the path to the local file to upload:

$ dsetool write_resource demo.health_data name=ResourceFile.xml file=/myPath1/myPath2/schemaFile.xml

DataStax Enterprise stress tools


cassandra-stress tool
The cassandra-stress tool is a Java-based stress testing utility for basic benchmarking and load testing a
DataStax Enterprise cluster.
Data modeling choices can greatly affect application performance. Significant load testing over several trials is
the best method for discovering issues with a particular data model. The cassandra-stress tool is an effective
tool for populating a cluster and stress testing CQL tables and queries. Use cassandra-stress to:


• Quickly determine how a schema performs.

• Understand how your database scales.

• Optimize your data model and settings.

• Determine production capacity.

The cassandra-stress tool also supports a YAML-based profile for defining specific schemas with various
compaction strategies, cache settings, and types. Sample files are located in the tools directory:

• cqlstress-counter-example.yaml

• cqlstress-example.yaml

• cqlstress-insanity-example.yaml

The YAML file supports user-defined keyspace, tables, and schema. The YAML file can be used to design tests
of reads, writes, and mixed workloads.
When started without a YAML file, cassandra-stress creates a keyspace, keyspace1, and tables, standard1
or counter1, depending on what type of table is being tested. These elements are automatically created the first
time you run a stress test and reused on subsequent runs. You can drop keyspace1 using DROP KEYSPACE.
You cannot change the default keyspace and table names without using a YAML file.
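
For example, to remove the automatically generated keyspace and all of its test data between runs:

DROP KEYSPACE keyspace1;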
Usage:

• Package installations:

$ cassandra-stress command [options]

• Tarball installations:

$ cd install_location/tools && bin/cassandra-stress command [options]

cassandra-stress options

Command Description

counter_read Multiple concurrent reads of counters. The cluster must first be populated by a counter_write test.

counter_write Multiple concurrent updates of counters.

help Display help: cassandra-stress help


Display help for an option: cassandra-stress help [options]. For example: cassandra-stress help -schema

legacy Legacy support mode.

mixed Interleave basic commands with configurable ratio and distribution. The cluster must first be populated by a
write test.

print Inspect the output of a distribution definition.

read Multiple concurrent reads. The cluster must first be populated by a write test.

user Interleave user provided queries with configurable ratio and distribution.

version Print the cassandra-stress version.

write Multiple concurrent writes against the cluster.


Additional sub-options are available for each option in the following table. To get more detailed information on
any of these, enter:

$ cassandra-stress help option

When entering the help command, be sure to precede the option name with a hyphen, as shown.

Cassandra-stress sub-options

Sub-option Description

-col Column details, such as size and count distribution, data generator, names, and comparator.
Usage:

-col names=? [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]


or
-col [n=DIST(?)] [slice] [super=?] [comparator=?] [timestamp=?] [size=DIST(?)]

-errors How to handle errors when encountered during stress testing.


Usage:

-errors [retries=N] [ignore] [skip-read-validation]

• retries=N Number of times to try each operation before failing.

• ignore Do not fail on errors.

• skip-read-validation Skip read validation and message output.


-graph Graph results of cassandra-stress tests. Multiple tests can be graphed together.
Usage:

-graph file=? [revision=?] [title=?] [op=?]

-insert Insert specific options relating to various methods for batching and splitting partition updates.
Usage:

-insert [revisit=DIST(?)] [visits=DIST(?)] partitions=DIST(?) [batchtype=?] select-ratio=DIST(?) row-population-ratio=DIST(?)

-log Where to log progress and the interval to use.


Usage:

-log [level=?] [no-summary] [file=?] [hdrfile=?] [interval=?] [no-settings] [no-progress] [show-queries] [query-log-file=?]

-mode Thrift or CQL with options.


Usage:

-mode thrift [smart] [user=?] [password=?]


or
-mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] [auth-provider=?] [maxPending=?] [connectionsPerHost=?] [protocolVersion=?]
or
-mode simplenative [prepared] cql3 [port=?]

-node Nodes to connect to.


Usage:

-node [datacenter=?] [whitelist] [file=?] []

-pop Population distribution and intra-partition visit order.


Usage:

-pop seq=? [no-wrap] [read-lookback=DIST(?)] [contents=?]


or
-pop [dist=DIST(?)] [contents=?]

-port Specify port for connecting Cassandra nodes. Port can be specified for Cassandra native protocol, Thrift protocol or
a JMX port for retrieving statistics.
Usage:

-port [native=?] [thrift=?] [jmx=?]

-rate Set the rate using the following options:

-rate threads=N [throttle=N] [fixed=N]

where

• threads=N number of clients to run concurrently.

• throttle=N throttle operations per second across all clients to a maximum rate (or less) with no implied
schedule. Default is 0.

• fixed=N expect fixed rate of operations per second across all clients with implied schedule. Default is 0.

OR

-rate [threads>=N] [threads<=N] [auto]

Where

• threads>=N run at least this many clients concurrently. Default is 4.

• threads<=N run at most this many clients concurrently. Default is 1000.

• auto stop increasing threads once throughput saturates.


-schema Replication settings, compression, compaction, and so on.


Usage:

-schema [replication(?)] [keyspace=?] [compaction(?)] [compression=?]

-sendto Specify a server to send the stress command to.


Usage:

-sendto <host>

-tokenrange Token range settings.


Usage:

-tokenrange [no-wrap] [split-factor=?] [savedata=?]

-transport Custom transport factories.


Usage:

-transport [factory=?] [truststore=?] [truststore-password=?] [keystore=?] [keystore-password=?] [ssl-protocol=?] [ssl-alg=?] [store-type=?] [ssl-ciphers=?]

Additional command-line parameters can modify how cassandra-stress runs:


Additional cassandra-stress parameters

Command Description

cl=? Set the consistency level to use during cassandra-stress. Options are ONE, QUORUM, LOCAL_QUORUM,
EACH_QUORUM, ALL, and ANY. Default is LOCAL_ONE.

clustering=DIST(?) Distribution clustering runs of operations of the same kind.

duration=? Specify the time to run, in seconds, minutes or hours.

err<? Specify a standard error of the mean; when this value is reached, cassandra-stress will end. Default is 0.02.

n>? Specify a minimum number of iterations to run before accepting uncertainty convergence.

n<? Specify a maximum number of iterations to run before accepting uncertainty convergence.

n=? Specify the number of operations to run.

no-warmup Do not warmup the process, do a cold start.

ops(?) Specify what operations to run and the number of each. (only with the user option)

profile=? Designate the YAML file to use with cassandra-stress. (only with the user option)

truncate=? Truncate the table created during cassandra-stress. Options are never, once, or always. Default is never.

Example: Simple read and write examples

# Insert (write) one million rows


$ cassandra-stress write n=1000000 -rate threads=50

# Read two hundred thousand rows.


$ cassandra-stress read n=200000 -rate threads=50

# Read rows for a duration of 3 minutes.


$ cassandra-stress read duration=3m -rate threads=50

# Read 200,000 rows without a warmup of 50,000 rows first.


$ cassandra-stress read n=200000 no-warmup -rate threads=50

Example: View schema help

$ cassandra-stress help -schema

replication([strategy=?][factor=?][<option 1..N>=?]): Define the replication strategy and any parameters
  strategy=?  (default=org.apache.cassandra.locator.SimpleStrategy) The replication strategy to use
  factor=?    (default=1) The number of replicas
keyspace=?    (default=keyspace1) The keyspace name to use
compaction([strategy=?][<option 1..N>=?]): Define the compaction strategy and any parameters
  strategy=?  The compaction strategy to use
compression=? Specify the compression to use for SSTable, default:no compression

Example: Populate the database


Generally it is easier to let cassandra-stress create the basic schema and then modify it in CQL:

#Load one row with default schema


$ cassandra-stress write n=1 cl=one -mode native cql3 -log file=create_schema.log

#Modify schema in CQL


$ cqlsh

#Run a real write workload


$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -log file=load_1M_rows.log

Example: Change the replication strategy


Changes the replication strategy to NetworkTopologyStrategy and targets one node named existing.

$ cassandra-stress write n=500000 no-warmup -node existing -schema "replication(strategy=NetworkTopologyStrategy, existing=2)"

Example: Run a mixed workload


When running a mixed workload, you must escape parentheses, greater-than and less-than signs, and
other such things. This example invokes a workload that is one-quarter writes and three-quarters reads.

$ cassandra-stress mixed ratio\(write=1,read=3\) n=100000 cl=ONE -pop dist=UNIFORM\(1..1000000\)
  -schema keyspace="keyspace1" -mode native cql3 -rate threads\>=16 threads\<=256
  -log file=~/mixed_autorate_50r50w_1M.log

Notice the following in this example:

1. The ratio parameter requires backslash-escaped parenthesis.

2. The value of n used in the read phase is different from the value used in the write phase. During the write
phase, n records are written. However, in the read phase, if n is too large, it is inconvenient to read
all the records for simple testing. Generally, n does not need to be large when validating the persistent
storage systems of a cluster.
The -pop dist=UNIFORM\(1..1000000\) portion says that of the n=100,000 operations, select the
keys uniformly distributed between 1 and 1,000,000. Use this when you want to specify more data per
node than what fits in DRAM.

3. In the rate section, the greater-than and less-than signs are escaped. If not escaped, the shell
attempts to use them for IO redirection: the shell tries to read from a non-existent file called =256 and
create a file called =16. The rate section tells cassandra-stress to automatically attempt different
numbers of client threads and not test fewer than 16 or more than 256 client threads.

Example: Standard mixed read/write workload keyspace for a single node

CREATE KEYSPACE "keyspace1" WITH replication = {


'class': 'SimpleStrategy',
'replication_factor': '1'
};
USE "keyspace1";
CREATE TABLE "standard1" (
key blob,
"C0" blob,
"C1" blob,
"C2" blob,
"C3" blob,
"C4" blob,
PRIMARY KEY (key)
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND


comment='' AND
gc_grace_seconds=864000 AND
index_interval=128 AND
replicate_on_write='true' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'class': 'LZ4Compressor'};

Example: Split up a load over multiple cassandra-stress instances on different nodes


This example demonstrates loading into large clusters, where a single cassandra-stress load generator
node cannot saturate the cluster. In this example, $NODES is a variable whose value is a comma delimited
list of IP addresses such as 10.0.0.1, 10.0.0.2, and so on.

#On Node1
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1..1000000 -log file=~/node1_load.log -node $NODES

#On Node2
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1000001..2000000 -log file=~/node2_load.log -node
$NODES

Example: Run cassandra-stress with authentication


The following example shows using the -mode option to supply a username and password:

$ cassandra-stress -mode native cql3 user=cassandra password=cassandra no-warmup


cl=QUORUM

Check the documentation of the transport option for SSL authentication.

Example: Run cassandra-stress with authentication and SSL encryption


The following example shows using the -mode option to supply a username and password, and the
-transport option for SSL parameters:

$ cassandra-stress write n=100k cl=ONE no-warmup -mode native cql3 user=cassandra password=cassandra
  -transport truststore=/usr/local/lib/dsc-cassandra/conf/server-truststore.jks
  truststore-password=truststorePass factory=org.apache.cassandra.thrift.SSLTransportFactory
  keystore=/usr/local/lib/dsc-cassandra/conf/server-keystore.jks keystore-password=myKeyPass

Cassandra authentication and SSL encryption must already be configured before executing
cassandra-stress with these options. The example shown above uses self-signed CA certificates.

Example: Run cassandra-stress using the truncate option


This option must be inserted before the mode option, otherwise the cassandra-stress tool won't apply
truncation as specified.


The following example shows the truncate command:

$ cassandra-stress write n=100000000 cl=QUORUM truncate=always -schema keyspace="keyspace" -rate threads=200 -log file=write_$NOW.log

Example: Use a YAML file to run cassandra-stress


This example uses a YAML file named cqlstress-example.yaml, which contains the keyspace and table
definitions, and a query definition. The keyspace name and definition are the first entries in the YAML file:

keyspace: perftesting

keyspace_definition:

CREATE KEYSPACE perftesting WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3};

The table name and definition are created in the next section using CQL:

table: users

table_definition:

CREATE TABLE users (


username text,
first_name text,
last_name text,
password text,
email text,
last_access timeuuid,
PRIMARY KEY(username)
);

In the extra_definitions section you can add secondary indexes or materialized views to the table:

extra_definitions:
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name2 AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name3 AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);

The population distribution can be defined for any column in the table. This section specifies a uniform
distribution between 10 and 30 characters for username values in generated rows, a uniform clustering
of 20 to 40 startdate values per partition, and a Gaussian distribution between 100 and 500 characters
for description values.

columnspec:
- name: username
size: uniform(10..30)
- name: first_name
size: fixed(16)
- name: last_name


size: uniform(1..32)
- name: password
size: fixed(80) # sha-512
- name: email
size: uniform(16..50)
- name: startdate
cluster: uniform(20..40)
- name: description
size: gaussian(100..500)

After the column specifications, you can add specifications for how each batch runs. In the following code,
the partitions value directs the test to use the column definitions above to insert a fixed number of rows
in the partition in each batch:

insert:
partitions: fixed(10)
batchtype: UNLOGGED

The last section contains a query, read1, that can be run against the defined table.

queries:
read1:
cql: select * from users where username = ? and startdate = ?
fields: samerow # samerow or multirow (select arguments from the same row,
or randomly from all rows in the partition)

The following example shows using the user option and its parameters to run cassandra-stress tests
from cqlstress-example.yaml:

$ cassandra-stress user profile=tools/cqlstress-example.yaml n=1000000 ops\(insert=3,read1=1\) no-warmup cl=QUORUM

Notice that:

• The user option is required for the profile and ops parameters.

• The value for the profile parameter is the path and filename of the .yaml file.

• In this example, n specifies the number of batches that run.

• The values supplied for ops specify which operations run and how many of each. These values
direct the command to insert rows into the database and run the read1 query.
How many times? Each insert or query counts as one batch, and the values in ops determine how
many of each type are run. Since the total number of batches is 1,000,000, and ops says to run three
inserts for each query, the result will be 750,000 inserts and 250,000 of the read1 query.
Use backslash escapes when specifying the ops value.

For more information, see Improved Cassandra 2.1 Stress Tool: Benchmark Any Schema – Part 1.


Example: Use the -graph option


In Cassandra 3.2 and later, the -graph option provides visual feedback for cassandra-stress tests. The
file sub-option must be set to name the resulting HTML file. The title and revision sub-options are
optional, but revision must be used if multiple stress tests are graphed on the same output.

$ cassandra-stress user profile=tools/cqlstress-example.yaml ops\(insert=1\) -graph file=test.html title=test revision=test1

The resulting HTML file contains an interactive graph that can be displayed with a web browser.

Interpreting the output of cassandra-stress


Each line reports data for the interval between the last elapsed time and current elapsed time.

Created keyspaces. Sleeping 1s for propagation.


Sleeping 2s...
Warming up WRITE with 50000 iterations...
Running WRITE with 200 threads for 1000000 iteration
type total ops, op/s, pk/s, row/s, mean, med, .95, .99,
.999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 43148, 42991, 42991, 42991, 4.6, 1.5, 10.9, 106.1,
239.3, 255.4, 1.0, 0.00000, 0, 1, 49, 49, 0, 612
total, 98715, 43857, 43857, 43857, 4.6, 1.7, 8.5, 98.6,
204.6, 264.5, 2.3, 0.00705, 0, 1, 45, 45, 0, 619
total, 157777, 47283, 47283, 47283, 4.1, 1.4, 8.3, 70.6,
251.7, 286.3, 3.5, 0.02393, 0, 1, 59, 59, 0, 611

Results:
op rate : 46751 [WRITE:46751]
partition rate : 46751 [WRITE:46751]
row rate : 46751 [WRITE:46751]
latency mean : 4.3 [WRITE:4.3]
latency median : 1.3 [WRITE:1.3]


latency 95th percentile : 7.2 [WRITE:7.2]


latency 99th percentile : 60.5 [WRITE:60.5]
latency 99.9th percentile : 223.2 [WRITE:223.2]
latency max : 503.1 [WRITE:503.1]
Total partitions : 1000000 [WRITE:1000000]
Total errors : 0 [WRITE:0]
total gc count : 18
total gc mb : 10742
total gc time (s) : 1
avg gc time(ms) : 73
stdev gc time(ms) : 16
Total operation time : 00:00:21

END

Table 316: Output of cassandra-stress


Data Description

total ops Running total number of operations during the run.

op/s Number of operations per second performed during the run.

pk/s Number of partition operations per second performed during the run.

row/s Number of row operations per second performed during the run.

mean Average latency in milliseconds for each operation during that run.

med Median latency in milliseconds for each operation during that run.

.95 95% of the time the latency was less than the number displayed in the column.

.99 99% of the time the latency was less than the number displayed in the column.

.999 99.9% of the time the latency was less than the number displayed in the column.

max Maximum latency in milliseconds.

time Total operation time.

stderr Standard error of the mean. It is a measure of confidence in the average throughput number; the smaller the
number, the more accurate the measure of the cluster's performance.

gc: # Number of garbage collections.

max ms Longest garbage collection in milliseconds.

sum ms Total of garbage collection in milliseconds.

sdv ms Standard deviation in milliseconds.

mb Size of the garbage collection in megabytes.

fs-stress tool
Synopsis

$ fs-stress [options] dsefs_directory listen_address

The default IP address is the listen_address property in the cassandra.yaml file. If not using localhost, specify the
correct IP address.
fs-stress is located in the tools directory of your installation.
The default location of the tools directory depends on the type of installation:


• Package installations: /usr/share/dse/tools

• Tarball installations: installation_location/dse/tools

Table 317: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Description
The fs-stress tool performs stress testing of the DSE File System (DSEFS) layer.

Data Description

progress Total progress of the stress operation.

bytes Total bytes written/read.

curr rate Current rate of bytes being written/read per second.

avg rate Average rate of bytes being written/read per second.

max latency Maximum latency in milliseconds during the current reporting window.
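
As a sketch, a run from a tarball installation might look like the following, where /fs-stress is a placeholder DSEFS directory and 10.200.177.92 stands in for the node's listen_address:

$ cd installation_location/dse/tools
$ bin/fs-stress /fs-stress 10.200.177.92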

SSTable utilities
SSTable utility tools are diagnostic tools for analyzing, using, upgrading, and changing DataStax Enterprise
SSTables.
About SSTable tools
For the following SSTable utility tools, stop DSE before running the command:


• sstabledump

• sstableexpiredblockers

• sstablelevelreset

• sstablemetadata

• sstableofflinerelevel

• sstablerepairedset

• sstablesplit

SSTable tools work offline from the DataStax Enterprise database. If you need to pass a JVM parameter,
specify it in the command line. For example, to change the max heap size:

$ MAX_HEAP=2g sstabletoolname

SSTable tools are located in several locations.


The default location of the SSTable tools depends on the type of installation and the tool:

• Package installations: /usr/bin/

• Tarball installations: installation_location/resources/cassandra/tools/bin or installation_location/


resources/cassandra/bin

Tool Tarball filepath (under installation_location/resources/cassandra/)


sstabledump tools/bin
sstableexpiredblockers tools/bin
sstablelevelreset tools/bin
sstableloader bin
sstablemetadata tools/bin
sstableofflinerelevel tools/bin
sstablerepairedset tools/bin
sstablepartitions tools/bin
sstablescrub bin
sstablesplit tools/bin
sstableupgrade bin
sstableutil bin
sstableverify bin

sstabledowngrade
Downgrades the SSTables in the given table or snapshot to the version of OSS Apache Cassandra™ that is
compatible with the current version of DSE.
The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE versions.

Synopsis

$ sstabledowngrade [--debug] [-h | --help] [[-k | --keep-source] | [--keep-generation]]
  [-b | --backups] [-o | --output-dir output-dir] [--schema schema-file [--schema schema-file2 ...]]
  [--sstable-files sstable] [-t | --throughput rate-limit] [--temp-storage tmp-dir]
  keyspace_name table_name [snapshot_name]

SSTable compatibility
For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and
SSTable compatibility.


Definition
The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
-k, --keep-source
Do not delete the source SSTables. Do not use with the --keep-generation option.
-b, --backups
Rewrite incremental backups for the given table. May not be combined with the snapshot_name
option.
--keep-generation
Keep the SSTable generation. Do not use with the --keep-source option.
-o, --output-dir
Rewritten files are placed in output-dir/keyspace-name/table-name-and-id.
--schema
Allows upgrading and downgrading SSTables using the schema of the table in a CQL file containing
the DDL statements to re-create the schema. Must be a DDL file that allows the recreation of the table
including dropped columns. Repeat the option to specify multiple DDL schema files.
Always use the schema.cql from a snapshot of the table so that the DDL has all of the information
omitted by DESCRIBE TABLE, including dropped columns.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
-t, --throughput
Set to limit the maximum disk read rate in MB/s.
--temp-storage
When used with --schema, specifies location of temporary data. Directory and contents are deleted
when the tool terminates. Directory must not be shared with other tools and must be empty. If not
specified the default directory is /tmp.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
snapshot_name
Snapshot name.

• Rewrites only the specified snapshot.

• Replaces files in the given snapshot and breaks any hard links to live SSTables.

• Required before attempting to restore a snapshot taken in a different DSE version than the one that
is currently running.


Examples

Downgrade the events table in the cycling keyspace

$ sstabledowngrade cycling events

Found 1 sstables to rewrite.


Rewriting TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
events-2118bc7054af11e987feb76774f7ab56/aa-1-bti-Data.db') to BIG/mc.
Rewrite of TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
events-2118bc7054af11e987feb76774f7ab56/aa-1-bti-Data.db') to BIG/mc complete.
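
To rewrite only a named snapshot instead of the live SSTables, pass the snapshot name as the final argument; my_snapshot here is a placeholder for an existing snapshot of the table:

$ sstabledowngrade cycling events my_snapshot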

sstabledump
Dumps contents of given SSTable to standard output in JSON format.
Synopsis

$ sstabledump sstable_filepath [-d] [-e] [-k partition_key] [-l] [-t] [-x partition_key]

Table 318: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.


Command arguments

-d
Display a CQL row per line.
-e
Display a list of partition keys.
-k, --key partition_key
Partition keys to include.
-l
Output JSON lines, by partition.
-t
Print raw timestamps instead of ISO 8601 date strings.
-x, --exclude-key partition_key
Partition key to exclude. Ignored if -y option is given.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Dump contents of SSTable

DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-f4f24621ce3f11e89d32bdcab3a99c6f/aa-1-bti-Data.db

[
{
"partition" : {
"key" : [ "Claudio HEINEN" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 90,
"liveness_info" : { "tstamp" : "2018-10-12T16:58:00.368228Z" },
"cells" : [
{ "name" : "blist_", "deletion_info" : { "marked_deleted" :
"2018-10-12T16:58:00.368227Z", "local_delete_time" : "2018-10-12T16:58:00Z" } },
{ "name" : "blist_", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist_", "path" : [ "blist_age" ], "value" : "23" },


{ "name" : "blist_", "path" : [ "blist_nation" ], "value" : "GERMANY" }


]
}
]
},
{
"partition" : {
"key" : [ "Claudio VANDELLI" ],
"position" : 91
},
"rows" : [
{
"type" : "row",
"position" : 179,
"liveness_info" : { "tstamp" : "2018-10-12T16:58:00.354443Z" },
"cells" : [
{ "name" : "blist_", "deletion_info" : { "marked_deleted" :
"2018-10-12T16:58:00.354442Z", "local_delete_time" : "2018-10-12T16:58:00Z" } },
{ "name" : "blist_", "path" : [ "bday" ], "value" : "27/07/1961" },
{ "name" : "blist_", "path" : [ "blist_age" ], "value" : "54" },
{ "name" : "blist_", "path" : [ "blist_nation" ], "value" : "ITALY" }
]
}
]
},
{
"partition" : {
"key" : [ "Luc HAGENAARS" ],
"position" : 180
},
"rows" : [
{
"type" : "row",
"position" : 275,
"liveness_info" : { "tstamp" : "2018-10-12T16:58:00.374846Z" },
"cells" : [
{ "name" : "blist_", "deletion_info" : { "marked_deleted" :
"2018-10-12T16:58:00.374845Z", "local_delete_time" : "2018-10-12T16:58:00Z" } },
{ "name" : "blist_", "path" : [ "bday" ], "value" : "27/07/1987" },
{ "name" : "blist_", "path" : [ "blist_age" ], "value" : "28" },
{ "name" : "blist_", "path" : [ "blist_nation" ], "value" : "NETHERLANDS" }
]
}
]
}
]

Show a row per line in standard output of the cycling.birthday_list table

DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -d

[Claudio HEINEN]@0 Row[info=[ts=1521498957445075] ]: | , [blist[age]=23


ts=1521498957445075], [blist[bday]=27/07/1992 ts=1521498957445075],
[blist[nation]=GERMANY ts=1521498957445075]
[Claudio VANDELLI]@76 Row[info=[ts=1521498957437559] ]: | , [blist[age]=54
ts=1521498957437559], [blist[bday]=27/07/1961 ts=1521498957437559], [blist[nation]=ITALY
ts=1521498957437559]


[Luc HAGENAARS]@152 Row[info=[ts=1521498957448698] ]: | , [blist[age]=28


ts=1521498957448698], [blist[bday]=27/07/1987 ts=1521498957448698],
[blist[nation]=NETHERLANDS ts=1521498957448698]
[Toine POELS]@232 Row[info=[ts=1521498957451068] ]: | , [blist[age]=52
ts=1521498957451068], [blist[bday]=27/07/1963 ts=1521498957451068],
[blist[nation]=NETHERLANDS ts=1521498957451068]
[Allan DAVIS]@310 Row[info=[ts=1521498957430478] ]: | , [blist[age]=35
ts=1521498957430478], [blist[bday]=27/07/1980 ts=1521498957430478],
[blist[nation]=AUSTRALIA ts=1521498957430478]
[Laurence BOURQUE]@384 Row[info=[ts=1521498957441360] ]: | , [blist[age]=23
ts=1521498957441360], [blist[bday]=27/07/1992 ts=1521498957441360], [blist[nation]=CANADA
ts=1521498957441360]

Display a list of partition keys from the cycling.birthday_list table

DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -e

[ [ "Claudio HEINEN" ], [ "Claudio VANDELLI" ], [ "Luc HAGENAARS" ], [ "Toine POELS" ],


[ "Allan DAVIS" ], [ "Laurence BOURQUE" ] ]

Display all rows in the partition

DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -k "Claudio HEINEN"

[
{
"partition" : {
"key" : [ "Claudio HEINEN" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 75,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.445075Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "23" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "GERMANY" }
]
}
]
}
]

Display all rows except those in the specified partition


DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -x "Claudio HEINEN"

[
{
"partition" : {
"key" : [ "Claudio VANDELLI" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 151,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.437559Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "54" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1961" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "ITALY" }
]
}
]
},
{
"partition" : {
"key" : [ "Luc HAGENAARS" ],
"position" : 152
},
"rows" : [
{
"type" : "row",
"position" : 231,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.448698Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "28" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1987" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "NETHERLANDS" }
]
}
]
},
{
"partition" : {
"key" : [ "Toine POELS" ],
"position" : 232
},
"rows" : [
{
"type" : "row",
"position" : 309,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.451068Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "52" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1963" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "NETHERLANDS" }
]
}
]
},
{
"partition" : {
"key" : [ "Allan DAVIS" ],


"position" : 310
},
"rows" : [
{
"type" : "row",
"position" : 383,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.430478Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "35" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1980" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "AUSTRALIA" }
]
}
]
},
{
"partition" : {
"key" : [ "Laurence BOURQUE" ],
"position" : 384
},
"rows" : [
{
"type" : "row",
"position" : 460,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.441360Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "23" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "CANADA" }
]
}
]
}
]

Display each row in its own JSON map

DataStax Enterprise must be stopped before you run this command.

$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -l

{"partition":{"key":["Claudio HEINEN"],"position":0},"rows":
[{"type":"row","position":75,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.445075Z"},"cells":[{"name":"blist","path":
["age"],"value":"23"},{"name":"blist","path":["bday"],"value":"27/07/1992"},
{"name":"blist","path":["nation"],"value":"GERMANY"}]}]}
{"partition":{"key":["Claudio VANDELLI"],"position":76},"rows":
[{"type":"row","position":151,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.437559Z"},"cells":[{"name":"blist","path":
["age"],"value":"54"},{"name":"blist","path":["bday"],"value":"27/07/1961"},
{"name":"blist","path":["nation"],"value":"ITALY"}]}]}
{"partition":{"key":["Luc HAGENAARS"],"position":152},"rows":
[{"type":"row","position":231,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.448698Z"},"cells":[{"name":"blist","path":
["age"],"value":"28"},{"name":"blist","path":["bday"],"value":"27/07/1987"},
{"name":"blist","path":["nation"],"value":"NETHERLANDS"}]}]}
{"partition":{"key":["Toine POELS"],"position":232},"rows":
[{"type":"row","position":309,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.451068Z"},"cells":[{"name":"blist","path":


["age"],"value":"52"},{"name":"blist","path":["bday"],"value":"27/07/1963"},
{"name":"blist","path":["nation"],"value":"NETHERLANDS"}]}]}
{"partition":{"key":["Allan DAVIS"],"position":310},"rows":
[{"type":"row","position":383,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.430478Z"},"cells":[{"name":"blist","path":
["age"],"value":"35"},{"name":"blist","path":["bday"],"value":"27/07/1980"},
{"name":"blist","path":["nation"],"value":"AUSTRALIA"}]}]}
{"partition":{"key":["Laurence BOURQUE"],"position":384},"rows":
[{"type":"row","position":460,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.441360Z"},"cells":[{"name":"blist","path":
["age"],"value":"23"},{"name":"blist","path":["bday"],"value":"27/07/1992"},
{"name":"blist","path":["nation"],"value":"CANADA"}]}]}

sstableexpiredblockers
Outputs the SSTables that prevent an SSTable from dropping.
By identifying the blocking SSTables, you can take corrective action so the database can drop entire SSTables
during compaction. An SSTable is dropped during compaction when it contains only expired tombstones and is
guaranteed not to cover any data in other SSTables.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstableexpiredblockers [--dry-run] keyspace_name table_name

Table 319: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

The short form and long form parameters are comma-separated.

Command arguments

--dry-run
Test command syntax and environment. Do not execute the command.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Output the blocking SSTables that prevent an SSTable from dropping

DataStax Enterprise must be stopped before you run this command.

$ sstableexpiredblockers cycling cyclist_races

Test the output without executing sstableexpiredblockers

DataStax Enterprise must be stopped before you run this command.

$ sstableexpiredblockers --dry-run cycling cyclist_races

sstablelevelreset
Uses LeveledCompactionStrategy to reset the level to zero on a set of SSTables. If the SSTable is already at
level 0, no change occurs. If the SSTable is releveled, the metadata is rewritten to designate the level at 0.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstablelevelreset [--really-reset] keyspace_name table_name

Table 320: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

The short form and long form parameters are comma-separated.

Command arguments

keyspace_name
Keyspace name. Required.
--really-reset
Acknowledgement that DSE is stopped; required before the command rewrites SSTable metadata.
table_name
Table name. Required.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Reset cyclist_name to level 0

DataStax Enterprise must be stopped before you run this command.

$ sstablelevelreset cycling cyclist_name

Skipped /var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/
aa-2-bti-Data.db since it is already on level 0
Skipped /var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/
aa-3-bti-Data.db since it is already on level 0

sstableloader
Streams a set of SSTable data files from the sstable_directory to a live cluster. The target keyspace and
table are the parent directories of the sstable_directory.
For example, to load an SSTable named Standard1-g-1-Data.db into Keyspace1/Standard1, place the files
Standard1-g-1-Data.db and Standard1-g-1-Index.db in the directory /path/to/Keyspace1/Standard1/.

Synopsis

$ sstableloader [-alg algorithm] [-ap authentication_provider] [-ciphers cipher_suite]
[-cph num_connections_per_host] -d initial_host [-df dse.yaml_path] [-f cassandra.yaml_path]
[-h] [-i node] [-idct throttle_speed] [-ks keystore_path] [-kspw keystore_password]
[--no-progress] [-p native_transport_port] [-prtcl SSL_protocol] [-pw password]
[-sp storage_port] [-ssp ssl_storage_port] [-st store_type] [-t throttle_speed]
[-ts truststore_path] [-tspw truststore_password] [-u username] [-v] sstable_directory

Table 321: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

-alg,--ssl-alg algorithm
Client SSL algorithm. Default: SunX509.
-ap,--auth-provider authentication_provider
Custom AuthProvider class name. Can be combined with -u username and -pw password if the
AuthProvider supports plain text credentials.
-ciphers, --ssl-ciphers cipher-suite
Comma-separated list of encryption suites for Client SSL.
-cph,--connections-per-host num_connections_per_host
Number of concurrent connections per host.
-d, --nodes initial_host
Required. Comma-separated list of hosts to connect to initially for ring information.
-df, --dse-conf-path dse_yaml_path
The dse.yaml filepath.
-f, --conf-path cassandra_yaml_path
The filepath to a cassandra.yaml config file that overrides only the following encryption options
from the cassandra.yaml file read at startup:

• stream_throughput_outbound_megabits_per_sec

• server_encryption_options

• client_encryption_options

-h, --help
Display the usage and listing of the commands.
-i, --ignore node
Comma-separated list of nodes to ignore.
-idct, --inter-dc-throttle throttle_speed
Inter-datacenter throttle speed in Mbits. Default: unlimited.
-ks,--keystore keystore_path
Filepath to keystore for SSL client-to-node encryption. Overrides the client_encryption_options in
cassandra.yaml.
-kspw,--keystore-password keystore_password
Client SSL keystore password. Overrides the client_encryption_options in cassandra.yaml.
--no-progress
Do not display progress.
-p, --port native_transport_port
Port for native connection. Default: 9042.
-prtcl,--ssl-protocol SSL_protocol
Client SSL connection protocol. Overrides the server_encryption_options in cassandra.yaml. Default:
TLS.
-pw,--password password
Cassandra authentication password.
-sp, --storage-port storage_port
Port for internode communication. Default: 7000.
-ssp, --ssl-storage-port ssl_storage_port
Port for TLS internode communication. Default: 7001.
sstable_directory
The absolute path to the SSTable data directory. The data_file_directories property in cassandra.yaml
defines the default directory.
-st, --store-type store_type
Client SSL store type.
-t, --throttle throttle_speed
Throttle speed in Mbits. Default: unlimited.
-ts,--truststore truststore_path
Client SSL filepath to truststore.
-tspw,--truststore-password truststore_password
Client SSL truststore password.
-u,--username username
Cassandra authentication username.
-v,--verbose
Verbose output.
Examples

Package installation

$ sstableloader -d 110.82.155.1 /var/lib/cassandra/data/cycling/
cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/1527686840030

Tarball installation

$ dse-6.0.4/bin/sstableloader -d 110.82.157.1 /var/lib/cassandra/data/cycling/
cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/1527686840030
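
When the target cluster requires authentication and client-to-node SSL, the corresponding options can be
combined in one invocation. The following is a sketch only; the credentials and truststore values are
placeholders, not values from this guide:

$ sstableloader -d 110.82.155.1 -u myuser -pw mypassword \
  -ts /path/to/truststore.jks -tspw truststore_password \
  /var/lib/cassandra/data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/1527686840030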

sstablemetadata
Prints metadata about the given SSTable or SSTables to standard output, including SSTable name, partitioner,
tombstone details, compressor, TTL, token, min and max clustering values, SSTable level, partition size and
statistics, and column information.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstablemetadata sstable_filepath [sstable_filepath ...] [-c] [-g seconds] [-s] [-t time_unit] [-u]

Table 322: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

The short form and long form parameters are comma-separated.

Command arguments

-c, --colors
ANSI color sequence.
-g, --gc_grace_seconds seconds
Time to use when calculating droppable tombstones.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
-s, --scan
Full SSTable scan for additional details. Default: false.
-t, --timestamp_unit time_unit
Time unit that cell timestamps are written with.
-u, --unicode
Use Unicode to draw histograms and progress bars.
Examples
These examples are generated using the cycling keyspace. See Setting up the Cycling keyspace.

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Get information about SSTable

DataStax Enterprise must be stopped before you run this command.

$ sstablemetadata /var/lib/cassandra/data/cycling/birthday_list-
f4f24621ce3f11e89d32bdcab3a99c6f/aa-1-bti-Statistics.db

SSTable: /var/lib/cassandra/data/cycling/birthday_list-f4f24621ce3f11e89d32bdcab3a99c6f/
aa-1-bti
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1539363480354442 (10/12/2018 16:58:00)
Maximum timestamp: 1539363480374846 (10/12/2018 16:58:00)
SSTable min local deletion time: 1539363480 (10/12/2018 16:58:00)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.6884057971014492
TTL min: 0
TTL max: 0
First token: -5189327806405140569 (Claudio HEINEN)
Last token: -428849430723689847 (Luc HAGENAARS)
minClusteringValues: []
maxClusteringValues: []
Estimated droppable tombstones: 0.3333333333333333
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1539277782404,
position=18441844)=CommitLogPosition(segmentId=1539277782404, position=18480562)}
totalColumnsSet: 3
totalRows: 3
Estimated tombstone drop times:
Drop Time | Count (%) Histogram
1539363480 (10/12/2018 16:58:00) | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 1663415872 (09/17/2022 11:57:52)
75th 1663415872 (09/17/2022 11:57:52)
95th 1663415872 (09/17/2022 11:57:52)
98th 1663415872 (09/17/2022 11:57:52)
99th 1663415872 (09/17/2022 11:57:52)
Min 1386179894 (12/04/2013 17:58:14)
Max 1663415872 (09/17/2022 11:57:52)
Partition Size:
Size (bytes) | Count (%) Histogram
103 (103 B) | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 103 (103 B)
75th 103 (103 B)
95th 103 (103 B)
98th 103 (103 B)
99th 103 (103 B)
Min 87 (87 B)
Max 103 (103 B)
Column Count:
Columns | Count (%) Histogram
3 | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 3
75th 3
95th 3
98th 3
99th 3
Min 3
Max 3
Estimated cardinality: 3
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1539363480 (10/12/2018 16:58:00)
EncodingStats minTimestamp: 1539363480354442 (10/12/2018 16:58:00)
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes: []
StaticColumns:
RegularColumns:
blist_:org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.

Get information about an SSTable with Unicode output

DataStax Enterprise must be stopped before you run this command.

$ sstablemetadata /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Rows.db -u

SSTable: /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1539365167498813 (10/12/2018 17:26:07)
Maximum timestamp: 1539365167524231 (10/12/2018 17:26:07)
SSTable min local deletion time: 2147483647 (no tombstones)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0761904761904761
TTL min: 0
TTL max: 0
First token: -798238132730727330 (One-day-races)
Last token: -798238132730727330 (One-day-races)
minClusteringValues: [367]
maxClusteringValues: [198]
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1539277782404,
position=19530606)=CommitLogPosition(segmentId=1539277782404, position=19541152)}
totalColumnsSet: 4
totalRows: 2
Estimated tombstone drop times:
Drop Time | Count (%) Histogram
Percentiles
50th 0
75th 0
95th 0
98th 0
99th 0
Min 0
Max 0
Partition Size:
Size (bytes) | Count (%) Histogram
124 (124 B) | 1 (100) ##############################
Percentiles
50th 124 (124 B)
75th 124 (124 B)
95th 124 (124 B)
98th 124 (124 B)
99th 124 (124 B)
Min 104 (104 B)
Max 124 (124 B)
Column Count:
Columns | Count (%) Histogram
4 | 1 (100) ##############################
Percentiles
50th 4
75th 4
95th 4
98th 4
99th 4
Min 4
Max 4
Estimated cardinality: 1
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1442880000 (09/22/2015 00:00:00)
EncodingStats minTimestamp: 1539365167498813 (10/12/2018 17:26:07)
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes:
[org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.Int32Type)]
StaticColumns:
RegularColumns: id:org.apache.cassandra.db.marshal.UUIDType,
lastname:org.apache.cassandra.db.marshal.UTF8Type
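
To evaluate droppable tombstones against a specific grace period, pass the period in seconds with the
-g option. A minimal sketch, assuming the table uses the default gc_grace_seconds of 864000 (10 days):

$ sstablemetadata -g 864000 /var/lib/cassandra/data/cycling/birthday_list-
f4f24621ce3f11e89d32bdcab3a99c6f/aa-1-bti-Data.db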

sstableofflinerelevel
Creates a decent leveling for the given keyspace and table.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstableofflinerelevel [--dry-run] keyspace_name table_name

Table 323: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

--dry-run
Test command syntax and environment. Do not execute the command.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Relevel calendar table on cycling keyspace

DataStax Enterprise must be stopped before you run this command.

$ sstableofflinerelevel cycling calendar

No sstables to relevel for cycling.calendar
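
To preview the new leveling without rewriting any SSTable metadata, add the --dry-run flag (a sketch):

$ sstableofflinerelevel --dry-run cycling calendar
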

sstablepartitions
Identifies large partitions of SSTables and outputs the partition size in bytes, row count, cell count, and
tombstone count.
Synopsis

$ sstablepartitions [-b] [-c cell_threshold] [-k partition_key] [-m] [-o tombstone_threshold]
[-r] [-t partition_threshold] [-u] [-x partition_key | -y] sstable_filepath | sstable_directory

Table 324: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

The short form and long form parameters are comma-separated.

Command arguments

-b, --backups
Include backups in the data directories (recursive scans).
-c, --min-cells cell_threshold
Partition cell count threshold.
-k, --key partition_key
Partition keys to include.
-m, --csv
Produce CSV machine-readable output instead of JSON formatted output.
-o, --min-tombstones tombstone_threshold
Partition tombstone count threshold.
-r, --recursive
Scan the specified data directories recursively.
sstable_directory
The absolute path to the SSTable data directory. The data_file_directories property in cassandra.yaml
defines the default directory.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
-t, --min-size partition_threshold
Partition size threshold in bytes.
-u, --current-timestamp
Timestamp (seconds since epoch, Unix time) to use as the current time when calculating TTL
expiration.
-x, --exclude-key partition_key
Partition key to exclude. Ignored if -y option is given.
-y, --partitions-only
Only brief partition information. Exclude per-partition detailed row/cell/tombstone information from
process and output.
Examples

Analyze partition statistics for all SSTables of a single table

$ sstablepartitions -r /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/

Processing stresscql.blogposts-7dd6dfc289b511e8a4a329556a9391cc #3 (bti-aa) (6445137 bytes
uncompressed, 5416338 bytes on disk)
Partition size Row count Cell count Tombstone count
p50 124 1 1 1
p75 149 1 1 1
p90 149 2 2 1
p95 179 2 2 1
p99 215 3 3 1
p999 258 4 4 1
min 51 0 0 0
max 8239 179 179 1
count 56696
time 137676

Processing stresscql.blogposts-7dd6dfc289b511e8a4a329556a9391cc #4 (bti-aa) (230134 bytes
uncompressed, 192999 bytes on disk)
Partition size Row count Cell count Tombstone count
p50 124 1 1 1
p75 124 1 1 1
p90 149 1 1 1
p95 149 1 1 1
p99 149 1 1 1
p999 179 2 2 1
min 51 0 0 0
max 446 10 10 1
count 2169
time 3626

The unit of measure for the partition size column is bytes.

Output only partitions with cell count threshold equal to or greater than 10

$ sstablepartitions -c 10 /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db

Processing stresscql.blogposts-7dd6dfc289b511e8a4a329556a9391cc #4 (bti-aa) (230134 bytes
uncompressed, 192999 bytes on disk)
Partition: 'Fwl
Cc xD06iw_]Q|[t[KzCI&
$' (46776c0b4363097815114430361169775f7f5d511b3b08177c5b745b4b1306007a434926091a24)
live, position: 208502, size: 434, rows: 10, cells: 10, tombstones: 0 (row:0, range:0,
complex:0, cell:0, row-TTLd:0, cell-TTLd:0)
Summary of stresscql.blogposts-7dd6dfc289b511e8a4a329556a9391cc #4 (bti-aa):
File: /home/dimitarndimitrov/.ccm/c13529-master/node1/data0/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db
1 partitions match
Keys: Fwl
Cc xD06iw_]Q|[t[KzCI& $
Partition size Row count Cell count Tombstone count
p50 124 1 1 1
p75 124 1 1 1
p90 149 1 1 1
p95 149 1 1 1
p99 149 1 1 1
p999 179 2 2 1
min 51 0 0 0
max 446 10 10 1
count 2169
time 4875

The unit of measure for the partition size column is bytes.

Output CSV machine-readable output

$ sstablepartitions -c 10 -m /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db

key,keyBinary,live,offset,size,rowCount,cellCount,tombstoneCount,rowTombstoneCount,rangeTombstoneCount,complex
"Fwl
Cc xD06iw_]Q|[t[KzCI&
$",46776c0b4363097815114430361169775f7f5d511b3b08177c5b745b4b1306007a434926091a24,true,208502,434,10,10,0,0,0
home/dimitarndimitrov/.ccm/c13529-master/node1/data0/
stresscql/blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-
Data.db,stresscql,blogposts,,,,4,bti,aa
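
To report only partitions at or above a size threshold in bytes, and to skip the per-partition
detail pass, the -t and -y options can be combined. A sketch; the 1 MiB threshold is illustrative:

$ sstablepartitions -t 1048576 -y /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db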

sstablerepairedset
Sets status as repaired or unrepaired on a given set of SSTables and updates the repairedAt field to denote
the time of the repair. This metadata facilitates incremental repairs. Use this tool in the process of migrating an
installation to incremental repair.

DataStax Enterprise must be stopped before you run this command.

Use the following command to list all the *Data.db files in a keyspace:

$ find '/home/user/dse-6.7.2/data/keyspace1/' -iname "*Data.db*"
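
The resulting list can be saved to a file and passed to the -f option described below; a minimal
sketch with illustrative paths:

$ find /var/lib/cassandra/data/cycling/ -iname "*Data.db*" > repairSetSSTables.txt
$ sstablerepairedset --really-set --is-unrepaired -f repairSetSSTables.txt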

Synopsis

$ sstablerepairedset --really-set [--is-repaired | --is-unrepaired] [-f sstable_list_file |
sstable_filepath]

Table 325: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

-f sstable_list_file
The filepath to a file that contains a list of SSTables. For example, a *.txt file.
--is-repaired
Sets repaired status.
--is-unrepaired
Sets unrepaired status.
--really-set
Acknowledgement of potential command impact with DSE stopped.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

Mark SSTable as repaired

DataStax Enterprise must be stopped before you run this command.

$ sstablerepairedset --really-set --is-repaired /var/lib/cassandra/data/cycling/
cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db

There is no command output.

Use file to list SSTables to mark as unrepaired

DataStax Enterprise must be stopped before you run this command.

$ sstablerepairedset --really-set --is-unrepaired -f repairSetSSTables.txt

where the repairSetSSTables.txt file contains a list of SSTables (*Data.db) files, like:

/data/cycling/cyclist_by_country-82246fc065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_birthday-8248246065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_birthday-8248246065ff11e5a4c58b496c707234/ma-2-big-Data.db
/data/cycling/cyclist_by_age-8201305065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_age-8201305065ff11e5a4c58b496c707234/ma-2-big-Data.db

sstablescrub
Scrubs the SSTable for the provided table.
The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while
preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub
cannot. If an SSTable cannot be read due to corruption, it will be left on disk.
If scrubbing results in dropping rows, new SSTables become unrepaired. However, if no bad rows are detected,
the SSTable keeps its original repairedAt field, which denotes the time of the repair.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstablescrub [--debug] [-e arg] [-h] [-j arg] [-m] [-n] [-r] [-s] [-v] keyspace_name
table_name [--sstable-files arg]

Table 326: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-e, --header-fix argument
Check SSTable serialization-headers and repair issues. Takes the following arguments:
validate-only
Validate serialization-headers only. Do not attempt any repairs and do not continue with the
scrub once the validation is complete.
validate
Validate serialization-headers and continue with the scrub once the validation is complete.
(Default)
fix-only
Validate and repair only the serialization-headers. Do not continue with the scrub once
serialization-header validation and repairs are complete.
fix
Validate and repair serialization-headers and perform a normal scrub. Do not repair and do not
continue with the scrub if serialization-header validation encounters errors.
off
Do not perform serialization-header validation checks.
-h, --help
Display the usage and listing of the commands.
-j, --jobs
Number of SSTables to scrub simultaneously. Defaults to the smaller of the number of available
processors and 8.
keyspace_name
Keyspace name. Required.
-m, --manifest-check
Check and repair only the leveled manifest. Do not scrub the SSTables.
-n, --no-validate
Do not validate columns using column validator.
-r, --reinsert-overflowed-ttl
Rewrite rows with overflowed expiration date affected by CASSANDRA-14092 with the maximum
supported expiration date of 2038-01-19T03:14:06+00:00. Rows are rewritten with the original
timestamp incremented by one millisecond to override/supersede any potential tombstone that might
have been generated during compaction of the affected rows. See https://docs.datastax.com/en/dse-
trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html.
-s, --skip-corrupted
Skips corrupt rows in counter tables.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
table_name
Table name. Required.
-v,--verbose
Verbose output.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

DataStax Enterprise must be stopped before you run this command.

Scrub all SSTables for the calendar table

$ sstablescrub cycling calendar

Scrub only particular SSTables for the calendar table

$ sstablescrub cycling calendar --sstable-files /var/lib/cassandra/data/cycling/calendar-
eebb/ac-1-bti-Data.db /var/lib/cassandra/data/cycling/calendar-aacc/ac-2-bti-Data.db
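
To validate and repair only the serialization headers without running a full scrub, the fix-only
argument to -e described above can be used (a sketch):

$ sstablescrub -e fix-only cycling calendar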

sstablesplit
Splits SSTable files into multiple SSTables of a maximum designated size.
For example, if SizeTieredCompactionStrategy was used for a major compaction and produced an
excessively large SSTable, split that SSTable to ensure it can be compacted before the next major
compaction.

DataStax Enterprise must be stopped before you run this command.

Synopsis

$ sstablesplit [--debug] [-h] [--no-snapshot] [-s max_size_in_MB] sstable_filepath
[sstable_filepath ...]

Table 327: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
--no-snapshot
Do not snapshot SSTables before splitting.
-s, --size max_size_in_MB
Maximum size in MB for output SSTables. Default: 50.
sstable_filepath
Filepath to an SSTable.
Examples

Verify DataStax Enterprise is not running

$ nodetool status

Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1

DataStax Enterprise must be stopped before you run this command.

Split SSTables to 10 MB

$ sstablesplit -s 10 /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db

Skipping /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db: it's size (0.000 MB) is less than the
split size (10 MB)
No sstables needed splitting.

sstableupgrade
Upgrades the SSTables in the given table or snapshot to the current version of Cassandra.
Synopsis

$ sstableupgrade [--debug] [-h | --help] [[-k | --keep-source] | [--keep-generation]]
[-b | --backups] [-o | --output-dir output-dir] [--schema schema-file [--schema schema-file2 ...]]
[--sstable-files sstable] [-t | --throughput rate-limit] [--temp-storage tmp-dir]
keyspace_name table_name [snapshot_name]

SSTable compatibility
For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and
SSTable compatibility.
Definition
The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
-k, --keep-source
Do not delete the source SSTables. Do not use with the --keep-generation option.
-b, --backups
Rewrite incremental backups for the given table. May not be combined with the snapshot_name
option.
--keep-generation
Keep the SSTable generation. Do not use with the --keep-source option.
-o, --output-dir
Rewritten files are placed in output-dir/keyspace-name/table-name-and-id.
--schema
Allows upgrading and downgrading SSTables using the schema of the table in a CQL file containing
the DDL statements to re-create the schema. Must be a DDL file that allows the recreation of the table
including dropped columns. Repeat the option to specify multiple DDL schema files.
Always use the schema.cql from a snapshot of the table so that the DDL has all of the information
omitted by DESCRIBE TABLE, including dropped columns.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
-t, --throughput
Set to limit the maximum disk read rate in MB/s.
--temp-storage
When used with --schema, specifies location of temporary data. Directory and contents are deleted
when the tool terminates. Directory must not be shared with other tools and must be empty. If not
specified the default directory is /tmp.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
snapshot_name
Snapshot name.

• Rewrites only the specified snapshot.

• Replaces files in the given snapshot and breaks any hard links to live SSTables.

• Required before attempting to restore a snapshot taken in a different DSE version than the one that
is currently running.

Examples

Upgrade events table in the cycling keyspace

$ sstableupgrade cycling events

Found 0 sstables that need upgrading.

The SSTables are already on the current version, so the command returns immediately and no action is taken.
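
To rewrite only a specific snapshot, append the snapshot name. A sketch; the snapshot name below is a
hypothetical placeholder:

$ sstableupgrade cycling events 1527686840030
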
sstableutil
Lists SSTable files for the provided table.
Synopsis

$ sstableutil [-c] [-d] [-h] [-o] [-t type] [-v] keyspace_name table_name

Table 328: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

-c, --cleanup
Clean up any outstanding transactions.
-d, --debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
keyspace_name
Keyspace name. Required.
-o, --oplog
Include operation logs.
table_name
Table name. Required.
-t, --type type
Type of files:

• all - all final and temporary files

• tmp - only temporary files

• final - only final files

-v,--verbose
Verbose output.
Examples

List SSTable files for the comments table on the cycling keyspace

$ sstableutil cycling comments

Listing files...
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-
CompressionInfo.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Data.db

/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-
Digest.crc32
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-
Filter.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-
Partitions.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Rows.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-
Statistics.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-TOC.txt
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-
CompressionInfo.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Data.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-
Digest.crc32
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-
Filter.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-
Partitions.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Rows.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-
Statistics.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-TOC.txt
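
To restrict the listing to temporary files left by incomplete operations, use the -t option with one of
the documented types (a sketch):

$ sstableutil -t tmp cycling comments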

sstableverify
Verifies the SSTable for the given table.
Synopsis

$ sstableverify [--debug] [-e] [-h] [-v] keyspace_name table_name

Table 329: Legend


Syntax conventions Description

UPPERCASE Literal keyword.

Lowercase Not literal.

Italics Variable value. Replace with a valid option or user-defined value.

[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.

( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.

... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.

'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.

{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.

<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.

cql_statement; End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.

Definition
The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-e, --extended
Extended verification.
-h, --help
Display the usage and listing of the commands.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
-v,--verbose
Verbose output.
Examples

Verify the cyclist_name table on the cycling keyspace

$ sstableverify cycling cyclist_name

Verifying
TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db') (0.151KiB)
Deserializing sstable metadata for TrieIndexSSTableReader(path='/var/lib/cassandra/data/
cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db')
Checking computed hash of TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db')
Verifying TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db') (0.131KiB)
Deserializing sstable metadata for TrieIndexSSTableReader(path='/var/lib/cassandra/data/
cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db')
Checking computed hash of TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/
cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db')
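
To additionally run an extended verification, which checks the data more thoroughly and takes longer,
add the -e flag described above (a sketch):

$ sstableverify -e cycling cyclist_name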

DataStax tools
Tools that are installed separately and used across products. See DataStax tools.

Preflight check tool


The preflight check tool is a collection of tests that can be run on a DataStax Enterprise node to detect and fix
node configurations. The tool can detect and optionally fix many invalid or suboptimal configuration settings, such
as user resource limits, swap, and disk settings.
The preflight check tool is included in the following location based on your installation type:

Installation type Location

Package installation /usr/share/dse/tools/pfc

Tarball installation install_location/tools/pfc

Usage
Run the preflight check without options to run all tests.

$ sudo ./preflight_check options

Table 330: Options


Short Long Description

-h --help Show help and exit.

-f --fix Attempt to fix issues.

--yaml=YAML_LOCATION Location of the cassandra.yaml file.

--devices=DEVICES Comma-separated list of HDDs: /dev/sda,/dev/sdb,...

--disk-duration=DISK_DURATION Time (in seconds) for each disk benchmark test. Set to simulate a normal
load.

--disk-threads=DISK_THREADS Number of threads for each disk benchmark. Set to simulate a normal load.

--ssd=SSD Comma-separated list of SSDs: /dev/sda,/dev/sdb,...

--nossd The node does not have SSDs.
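
For example, to run all checks and attempt to repair anything that can be fixed automatically, a sketch
using the documented fix flag:

$ sudo ./preflight_check -f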

Creating a new test


Complete the following steps to create your own test:

1. Create a new Python file in /checks:

$ cd /checks

$ touch my_test.py

2. Add the new test to the __all__ section of /checks/__init__.py:

__all__ = ['my_test', 'disk', 'blockdev', ...]

3. Add your test to the preflight_check script.

4. Run the preflight check script with the new test:

$ sudo ./preflight_check options

cluster_check and yaml_diff tools


The cluster_check and yaml_diff tools check the differences between cassandra.yaml or dse.yaml files. This
check is particularly useful during upgrades.
Prerequisites:
PyYAML must be installed. To install:

$ pip install pyyaml && pip install termcolor ## Optional. Install for colored output.

These examples check the differences between cassandra.yaml files.

• To check differences between YAML files:

$ cd /usr/share/dse/tools/yamls && ./yaml_diff path/to/cassandra.yaml path/to/cassandra.yaml.new

The Missing Settings section of the report lists both missing and deprecated settings.

• To check the differences between each node's YAML in a datacenter:

For ease of use, configure password-less SSH access from the current node to all other nodes.

$ cd /usr/share/dse/tools/yamls && ./cluster_check /path/to/cassandra.yaml [/path/to/nodelist]

The nodelist parameter is optional since the script checks for the list of IP addresses contained in
nodetool status. The format for the nodelist file is one address per line.
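
A nodelist file is plain text with one address per line; for example, using illustrative addresses:

10.200.177.92
10.200.177.94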

Chapter 9. Operations

Starting and stopping DataStax Enterprise


After you install and configure DataStax Enterprise on one or more nodes, start your cluster beginning with the
seed nodes. In a mixed-workload DataStax Enterprise cluster, you must start the analytics seed node first.
Packaged installations include start-up and stop scripts for running DataStax Enterprise as a service. Binary
tarballs do not.
OpsCenter provides options in the Nodes UI for starting, stopping, and restarting DSE on a node. See Starting
DSE on a node, Stopping DSE on a node, and Restarting DSE on a node.
Starting DataStax Enterprise as a service
Steps for starting the DataStax Enterprise (DSE) service when DataStax Enterprise was installed from RHEL or
Debian packages.
All node types are DataStax Enterprise nodes and run the database.
Considerations for starting a cluster
Be aware of the following when starting a DataStax Enterprise cluster:
Nodes must be segregated by datacenters
Transactional, DSE Search, DSE Analytics, and SearchAnalytics nodes must be in separate
datacenters. For example, in a cluster with both DSE Search and transactional nodes, all DSE Search
nodes must be in one or more search datacenters and all transactional nodes must be in one or more
transactional datacenters.
DSE Graph can be enabled on any node in any datacenter.
Deploying a mixed-workload cluster
When deploying one or more datacenters for each type of node, first determine which nodes to
start as transactional, analytic, DSE Graph only, DSE Graph plus other types, DSE Search, and
SearchAnalytics nodes. Deploy in this order:

1. Analytic seed nodes.

2. Transactional or DSE Graph-only seed nodes.

3. DSE Search seed nodes.

4. SearchAnalytics nodes.

5. Remaining nodes one at a time. See Initializing multiple datacenters per workload type.

DSE Analytics nodes


Before starting DSE Analytics nodes, ensure that the replication factor is configured correctly for the
analytics keyspaces. Every time you add a new datacenter, you must manually increase the replication
factor of the dse_leases keyspace for the new DSE Analytics datacenter.
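
For example, the replication settings for dse_leases can be updated with an ALTER KEYSPACE statement
through cqlsh. This is a sketch only; the datacenter names and replication factors are placeholders for
your topology:

$ cqlsh -e "ALTER KEYSPACE dse_leases WITH REPLICATION = {'class': 'NetworkTopologyStrategy',
  'Analytics': 3, 'AnalyticsNew': 3};"
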
Start up commands
Set the type of node in the /etc/default/dse file. (Start-up scripts are also available in /etc/init.d.)
Command Description

GRAPH_ENABLED=1 Starts the node as a DSE Graph node.

SPARK_ENABLED=1 Starts the node as a Spark node and starts the Spark Master service.

SOLR_ENABLED=1 Starts the node as a DSE Search node.

All node types 0 or not present Starts the node as a transactional-only or BYOS node.

Table 331: Examples


Node type Settings

Spark Analytics node SPARK_ENABLED=1


SOLR_ENABLED=0
GRAPH_ENABLED=0

or

SPARK_ENABLED=1

No entry is the same as disabling it.

Spark Analytics, DSE Graph, and DSE Search node SPARK_ENABLED=1


GRAPH_ENABLED=1
SOLR_ENABLED=1

BYOS (Bring Your Own Spark) Set BYOS nodes as transactional nodes: all settings =0 or not present.
Spark nodes run in a separate Spark cluster from a vendor other than
DataStax.

DSE Graph and BYOS GRAPH_ENABLED=1

SearchAnalytics nodes SPARK_ENABLED=1


SOLR_ENABLED=1
An integrated DSE SearchAnalytics cluster allows analytics jobs to be
performed using CQL queries.

Prerequisites: Be sure to read the Considerations for starting a cluster.


You can also use OpsCenter to start and stop nodes.

1. If DataStax Enterprise is running, stop the node.

2. Set the node type in the /etc/default/dse file. For example, to set a Spark node:

SPARK_ENABLED=1
SOLR_ENABLED=0
GRAPH_ENABLED=0

Alternatively, you can omit the other start-up entries and use only SPARK_ENABLED=1.


3. Start DataStax Enterprise:

$ sudo service dse start

If the following error appears, look for DataStax Enterprise times out when starting and other articles in
the Support Knowledge Center.

WARNING: Timed out while waiting for DSE to start.

4. To verify that the cluster is running:

$ nodetool status

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

The nodetool command shows the node type and the status. For example, a transactional node running
in a normal state (UN) with virtual nodes (vnodes) enabled shows:

Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  82.43 KB  128     ?     40725dc8-7843-43ae-9c98-7c532b1f517e  rack1

For example, a DSE Analytics node running in a normal state (UN) without vnodes enabled shows:

Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns  Host ID                               Token                Rack
UN  172.16.222.136  103.24 KB  ?     3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707  rack1

Starting DataStax Enterprise as a stand-alone process


Steps for starting the DataStax Enterprise (DSE) process when DataStax Enterprise was installed from a tarball.
All node types are DataStax Enterprise nodes and run the database.
Considerations for starting a cluster
Be aware of the following when starting a DataStax Enterprise cluster:
Nodes must be segregated by datacenters
Transactional, DSE Search, DSE Analytics, and SearchAnalytics nodes must be in separate
datacenters. For example, in a cluster with both DSE Search and transactional nodes, all DSE Search
nodes must be in one or more search datacenters, and all transactional nodes must be in one or more
transactional datacenters.
DSE Graph can be enabled on any node in any datacenter.
Deploying a mixed-workload cluster
When deploying one or more datacenters for each type of node, first determine which nodes to
start as transactional, analytic, DSE Graph only, DSE Graph plus other types, DSE Search, and
SearchAnalytics nodes. Deploy in this order:


1. Analytic seed nodes.

2. Transactional or DSE Graph-only seed nodes.

3. DSE Search seed nodes.

4. SearchAnalytics nodes.

5. Remaining nodes one at a time. See Initializing multiple datacenters per workload type.

DSE Analytics nodes


Before starting DSE Analytics nodes, ensure that the replication factor is configured correctly for the
analytics keyspaces. Every time you add a new datacenter, you must manually increase the replication
factor of the dse_leases keyspace for the new DSE Analytics datacenter.
Start up commands
From the installation_location, start the node and set its type with one of the following commands.


Node/datacenter Command

Transactional only bin/dse cassandra

DSE Graph bin/dse cassandra -g

DSE Analytics with Spark bin/dse cassandra -k

DSE Search bin/dse cassandra -s

When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.

Table 332: Starting examples


Node type Settings

From the installation_location:

Spark Analytics, DSE Graph, and DSE Search node bin/dse cassandra -k -g -s

BYOS (Bring Your Own Spark) bin/dse cassandra


Spark nodes run in a separate Spark cluster from a vendor other than
DataStax.

DSE Graph and BYOS bin/dse cassandra -g

SearchAnalytics nodes bin/dse cassandra -k -s


An integrated DSE SearchAnalytics datacenter allows analytics jobs to
be performed using CQL queries.

Prerequisites: Be sure to read the Considerations for starting a cluster.


You can also use OpsCenter to start and stop nodes.

1. If DataStax Enterprise is running, stop the node.

2. From the install directory, start the node. For example, to set a Spark node:
bin/dse cassandra -k


3. To check that your ring is up and running:

$ cd installation_location && bin/nodetool status

where installation_location is the directory where you installed DSE.

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

The nodetool command shows the node type and the status. For example, a transactional node running
in a normal state (UN) with virtual nodes (vnodes) enabled shows:

Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  82.43 KB  128     ?     40725dc8-7843-43ae-9c98-7c532b1f517e  rack1

For example, a DSE Analytics node running in a normal state (UN) without vnodes enabled shows:

Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns  Host ID                               Token                Rack
UN  172.16.222.136  103.24 KB  ?     3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707  rack1

Stopping a DataStax Enterprise node


To speed up the restart process, run nodetool drain before stopping the dse service. This step writes the current
memtables to disk. When you restart the node, the commit log is not read which speeds the restart process. If
you have durable writes set to false, which is unlikely, there is no commit log and you must drain the node to
prevent losing data.

To stop DataStax Enterprise running as a service:

$ nodetool drain

$ sudo service dse stop

To stop DataStax Enterprise running as a stand-alone process:

Running nodetool drain before using the cassandra-stop command to stop a stand-alone process is not
necessary because the cassandra-stop command drains the node before stopping it.


From the installation location:

$ bin/dse cassandra-stop

Use sudo if required.

In the unlikely event that the cassandra-stop command fails because it cannot find the DataStax
Enterprise Java process ID (PID), the output instructs you to find the PID manually and stop the
process using its PID number.

ps auwx | grep dse

Use the PID, in the second column of the output, to stop the database.

bin/dse cassandra-stop -p PID

Adding or removing nodes, datacenters, or clusters


Adding nodes to vnode-enabled cluster
Virtual nodes (vnodes) greatly simplify adding nodes to an existing cluster:

• Calculating tokens and assigning them to each node is no longer required.

• Rebalancing the nodes within a datacenter is no longer necessary because a node joining the datacenter
assumes responsibility for an even portion of the data.

For a detailed explanation about how vnodes work, see Virtual nodes.

When adding multiple nodes to the cluster using the allocation algorithm, ensure that nodes are added one at
a time. If nodes are added concurrently, the algorithm can assign the same tokens to different nodes.

If you do not use vnodes, see Adding single-token nodes to a cluster.

Be sure to use the same version of DataStax Enterprise on all nodes in the cluster, as described in the
installation instructions.

1. Install DataStax Enterprise on the new nodes, but do not start DataStax Enterprise.

If your DataStax Enterprise installation started automatically, you must stop the node and clear the
data.

2. Copy the snitch properties file from another node in the same datacenter to the node you are adding; an example follows this list.

• cassandra-topology.properties file is used by the PropertyFileSnitch.


Add an entry for the new node: IP_address=dc_name:rack_name

• cassandra-rackdc.properties file is used by the GossipingPropertyFileSnitch, the Amazon

EC2 single-region snitch, the Amazon EC2 multi-region snitch, and the Google Cloud
Platform snitch. Adjust the rack number if required.
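For example, a minimal sketch assuming a package installation (configuration files in /etc/dse/cassandra) and a hypothetical existing node at 10.200.175.11:

$ scp user@10.200.175.11:/etc/dse/cassandra/cassandra-rackdc.properties \
      /etc/dse/cassandra/cassandra-rackdc.properties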

3. Set the following properties in the cassandra.yaml file:

• Dynamically allocating tokens based on the keyspace replication factors in the datacenter:

auto_bootstrap: true


cluster_name: 'cluster_name'
listen_address:
endpoint_snitch: snitch_name
num_tokens: 8
allocate_tokens_for_local_replication_factor: RF_number
seed_provider:
- class_name: seedprovider_name
parameters:
- seeds: "IP_address_list"

For RF_number, if the keyspaces in the datacenter have different replication factors (RF), use the
RF of the most data-intensive keyspace; when multiple keyspaces with equal data intensity
exist, use the highest RF. When adding multiple nodes, alternate between the different RFs.

• Randomly assign tokens:

auto_bootstrap: true
cluster_name: 'cluster_name'
listen_address:
endpoint_snitch: snitch_name
num_tokens: 128
seed_provider:
- class_name: seedprovider_name
parameters:
- seeds: "IP_address_list"

Manually add the auto_bootstrap setting if it does not exist in the cassandra.yaml. The other settings
exist in the default cassandra.yaml file; ensure that they are uncommented and set.

Seed nodes cannot bootstrap. Make sure the new node is not listed in the -seeds list. Do not make
all nodes seed nodes. See Internode communications (gossip).

4. Change any other non-default settings you have made to your existing cluster in the cassandra.yaml file
and cassandra-topology.properties or cassandra-rackdc.properties files. Use the diff command to
find and merge any differences between existing and new nodes.
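For example, a sketch of comparing the local file against a copy pulled from an existing node; the paths assume a package installation, and the host name is a placeholder:

$ scp user@existing-node:/etc/dse/cassandra/cassandra.yaml /tmp/existing-cassandra.yaml
$ diff /tmp/existing-cassandra.yaml /etc/dse/cassandra/cassandra.yaml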

5. Start the bootstrap node, see Starting DataStax Enterprise as a service or Starting DataStax Enterprise as a
stand-alone process.

6. Verify that the node is fully bootstrapped using nodetool status. All other nodes must be up (UN) and not in
any other state.

7. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove
the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running
nodetool cleanup on the next node.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.
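A minimal sketch of running cleanup sequentially over the previously existing nodes with ssh; the host names are placeholders, and each cleanup finishes before the next starts:

#!/usr/bin/env bash
# Run nodetool cleanup on each previously existing node, one node at a time.
for host in node1.example.com node2.example.com node3.example.com; do
  echo "Running cleanup on ${host}..."
  ssh "${host}" nodetool cleanup   # blocks until cleanup completes on that node
done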

Adding a datacenter to a cluster


Complete the following steps to add a datacenter to an existing cluster.
Prerequisites:
Complete the prerequisite tasks outlined in Initializing a DataStax Enterprise cluster to prepare the
environment.

If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:


1. If the nodes are behind a firewall, open the required ports for internal/external communication.

2. Decommission each node that will be added to the new datacenter.

3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.

4. Install DSE on each node.

1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

a. Configure client applications to use the DCAwareRoundRobinPolicy.

b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.

c. If using the QUORUM consistency level, change to LOCAL_QUORUM.

d. If using the ONE consistency level, set to LOCAL_ONE.

See the programming instructions for your driver.

2. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.

a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3};

• DSE security: system_auth, dse_security

• DSE performance: dse_perf

• DSE analytics: dse_leases, dsefs

• System resources: system_traces, system_distributed

• OpsCenter (if installed)

• All keyspaces created by users

b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.

DESCRIBE SCHEMA ;

CREATE KEYSPACE dse_perf WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_leases WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dsefs WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_security WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;

3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.

Use the same version of DSE on all nodes in the cluster.

4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.

Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.

a. Configure node properties:

• -seeds: internal_IP_address of each seed node


Include at least one seed node from each datacenter. DataStax recommends more than
one seed node per datacenter, in more than one rack. Do not make all nodes seed
nodes.

• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.

• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.

• endpoint_snitch: snitch
See endpoint_snitch and snitches.

Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.

Snitch Configuration file

GossipingPropertyFileSnitch cassandra-rackdc.properties file

Amazon EC2 single-region snitch cassandra-rackdc.properties file

Amazon EC2 multi-region snitch cassandra-rackdc.properties file

Google Cloud Platform snitch cassandra-rackdc.properties file

PropertyFileSnitch cassandra-topology.properties file

• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.

b. Configure node architecture (all nodes in the datacenter must use the same type):


Virtual node (vnode) allocation algorithm settings

• Set num_tokens to 8 (recommended).

• Set allocate_tokens_for_local_replication_factor to the target replication factor for keyspaces


in the new datacenter. If the keyspace RF varies, alternate the settings to use all the
replication factors.

• Comment out the initial_token property.

DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings

• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.

• Comment out both num_tokens and allocate_tokens_for_local_replication_factor.

5. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties


(PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a
default datacenter name and rack name for unknown nodes.

Migration information: The GossipingPropertyFileSnitch always loads cassandra-


topology.properties when the file is present. Remove the file from each node on any new cluster,
or any cluster migrated from the PropertyFileSnitch.

# Transactional Node IP=Datacenter:Rack


110.82.155.0=DC_Transactional:RAC1
110.82.155.1=DC_Transactional:RAC1
110.54.125.1=DC_Transactional:RAC2
110.54.125.2=DC_Analytics:RAC1
110.54.155.2=DC_Analytics:RAC2
110.82.155.3=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1
110.82.155.4=DC_Search:RAC2

# default for unknown nodes


default=DC1:RAC1

After making any changes in the configuration files, you must restart the node for the changes to
take effect.

6. Make the following changes in the existing datacenters.

a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.

b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.

7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:

• Package installations: Starting DataStax Enterprise as a service


• Tarball installations: Starting DataStax Enterprise as a stand-alone process

8. Start DSE on the remaining nodes, rotating through the racks, until all the nodes are up.

9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.

$ nodetool rebuild -- datacenter_name

You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.

a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.

b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.

11. Check that your cluster is up and running:

$ dsetool status

If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.

The datacenters in the cluster are now replicating with each other.

DC: Cassandra Workload: Cassandra Graph: no
==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID            Rack
UN  110.82.155.0  21.33 KB  256     33.3%  a9fa31c7-f3c0-...  RAC1
UN  110.82.155.1  21.33 KB  256     33.3%  f5bb416c-db51-...  RAC1
UN  110.54.125.1  21.33 KB  256     16.7%  b836748f-c94f-...  RAC1

DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID            Tokens       Rack
UN  110.54.125.2  28.44 KB  13.0%  e2451cdf-f070-...  -922337....  RAC1
UN  110.82.155.2  44.47 KB  16.7%  f9fa427c-a2c5-...  30745512...  RAC1
UN  110.82.155.3  54.33 KB  23.6%  b9fc31c7-3bc0-...  45674488...  RAC1

DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID            Tokens       Rack
UN  110.54.125.3  15.44 KB  50.2%  e2451cdf-f070-...  9243578....  RAC1
UN  110.82.155.4  18.78 KB  49.8%  e2451cdf-f070-...  10000        RAC1

Adding a datacenter to a cluster using a designated datacenter as a data source


Complete the following steps to add a datacenter to an existing cluster using a designated datacenter as a data
source. In this procedure, a new datacenter, DC4, is added to an existing cluster with existing datacenters DC1,
DC2, and DC3.
Prerequisites:
Complete the prerequisite tasks outlined in Initializing a DataStax Enterprise cluster to prepare the
environment.

1. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.

a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3};

• DSE security: system_auth, dse_security

• DSE performance: dse_perf

• DSE analytics: dse_leases, dsefs

• System resources: system_traces, system_distributed

• OpsCenter (if installed)

• All keyspaces created by users

b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.

DESCRIBE SCHEMA ;

CREATE KEYSPACE dse_perf WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_leases WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dsefs WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
...

CREATE KEYSPACE dse_security WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;

2. Stop the OpsCenter Repair Service if it is running in the cluster. See Turning the Repair Service off.

3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.

Use the same version of DSE on all nodes in the cluster.

4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.

Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.

a. Configure node properties:

• -seeds: internal_IP_address of each seed node


Include at least one seed node from each datacenter. DataStax recommends more than
one seed node per datacenter, in more than one rack. Do not make all nodes seed
nodes.

• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.

• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.

• endpoint_snitch: snitch
See endpoint_snitch and snitches.

Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.

Snitch Configuration file

GossipingPropertyFileSnitch cassandra-rackdc.properties file

Amazon EC2 single-region snitch cassandra-rackdc.properties file

Amazon EC2 multi-region snitch cassandra-rackdc.properties file

Google Cloud Platform snitch cassandra-rackdc.properties file

PropertyFileSnitch cassandra-topology.properties file

• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.

b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings

• Set num_tokens to 8 (recommended).


• Set allocate_tokens_for_local_replication_factor to the target replication factor for keyspaces


in the new datacenter. If the keyspace RF varies, alternate the settings to use all the
replication factors.

• Comment out the initial_token property.

DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings

• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.

• Comment out both num_tokens and allocate_tokens_for_local_replication_factor.

5. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties


(PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a
default datacenter name and rack name for unknown nodes.

Migration information: The GossipingPropertyFileSnitch always loads cassandra-


topology.properties when the file is present. Remove the file from each node on any new cluster,
or any cluster migrated from the PropertyFileSnitch.

# Transactional Node IP=Datacenter:Rack


110.82.155.0=DC_Transactional:RAC1
110.82.155.1=DC_Transactional:RAC1
110.54.125.1=DC_Transactional:RAC2
110.54.125.2=DC_Analytics:RAC1
110.54.155.2=DC_Analytics:RAC2
110.82.155.3=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1
110.82.155.4=DC_Search:RAC2

# default for unknown nodes


default=DC1:RAC1

After making any changes in the configuration files, you must restart the node for the changes to
take effect.

6. Make the following changes in the existing datacenters.

a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.

b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.

7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:

• Package installations: Starting DataStax Enterprise as a service

• Tarball installations: Starting DataStax Enterprise as a stand-alone process


8. Install and configure DataStax Agents on each node in the new datacenter if necessary: Installing DataStax
Agents 6.5

9. Run nodetool status to ensure that the new datacenter is up and running.

nodetool status

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  961.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1
Datacenter: DC4
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.114  361.56 KiB  ?     ac43e602-ef09-4d0d-a455-3322f444198c  -9223372036854775688  rack1

10. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.

ALTER KEYSPACE keyspace_name WITH REPLICATION =
  {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};

If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.

11. Run nodetool rebuild on each node in the new datacenter, specifying the corresponding datacenter/rack
from the source datacenter.

$ nodetool rebuild -dc source_datacenter_name:source_datacenter_rack_name

The following commands replicate data from an existing datacenter DC1 to the new datacenter DC2 on
each DC2 node. The rack specifications correspond with the rack specifications in DC1:


On DC2:RACK1 nodes run:

$ nodetool rebuild -dc DC1:RACK1

On DC2:RACK2 nodes run:

$ nodetool rebuild -dc DC1:RACK2

On DC2:RACK3 nodes run:

$ nodetool rebuild -dc DC1:RACK3

a. Use nodetool rebuild -dc on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the source datacenter.

b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.
Rebuild can be safely run in parallel, but has potential performance tradeoffs. The nodes
in the source datacenter will be streaming data, so application performance involving
that datacenter's data will potentially be impacted. Run tests within the environment,
adjusting various levels of parallelism and streaming throttling to strike the optimal balance
of speed and performance.

12. Monitor the rebuild progress for the new datacenter using nodetool netstats and examining the size of
each node.
The nodetool rebuild command issues a JMX call to the DSE node and waits for rebuild to
finish before returning to the command line. Once the JMX call is invoked, the rebuild process will
continue on the server regardless of the nodetool rebuild process (the rebuild will continue to run
if nodetool dies.) There is not typically significant output from the nodetool rebuild command itself.
Instead, rebuild progress should be monitored via nodetool netstats, as well as examining the
data size of each node.
The data load shown in nodetool status will only be updated after a given source node is
done streaming, so it will appear to lag behind bytes reported on disk (e.g. du). If any streaming
errors occur, ERROR messages will be logged to system.log and the rebuild will stop. In the
event of temporary failure, nodetool rebuild can be re-run and skips any ranges that were
already successfully streamed.
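A minimal sketch of such monitoring from a shell; the data directory path assumes a package installation with the default location:

$ watch -n 30 nodetool netstats       ## refresh streaming progress every 30 seconds
$ du -sh /var/lib/cassandra/data      ## on-disk data size, updated as streams land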

13. Adjust stream throttling on the source datacenter as required to balance out network traffic. See nodetool
setstreamthroughput.

14. Confirm that all rebuilds are successful by searching for finished rebuild in the system.log of each node
in the new datacenter.

In rare cases the communication between two streaming nodes may hang, leaving the rebuild
operation alive but with no data streaming. Monitor streaming progress using nodetool netstats,
and, if the streams are not making any progress, restart the node where nodetool rebuild was
executed and re-run nodetool rebuild with the same parameters used originally.
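For example, one way to confirm on each new node, assuming the default package-installation log location of /var/log/cassandra:

$ grep -i 'finished rebuild' /var/log/cassandra/system.log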

15. Start the DataStax Agent on each node in the new datacenter if necessary.

16. Start the OpsCenter Repair Service if necessary. See Turning the Repair Service on.

Replacing a dead node or dead seed node


Steps to replace a node that has died for some reason, such as hardware failure.


The procedure for replacing a dead node is the same for vnodes and single-token nodes. Extra steps are
required for replacing dead seed nodes.

Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing or that have been removed from another cluster merges the older
data into the cluster and may cause data loss or corruption.

1. Run nodetool status to verify that the node is dead (DN).

2. Record the datacenter, address, and rack settings of the dead node; you will use these later.

3. Add the replacement node to the network and record its IP address.

4. If the dead node was a seed node, change the cluster's seed node configuration on each node:

a. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds
list in the seed-provider property.

b. If the cluster needs a new seed node to replace the dead node, add the new node's IP address to
the - seeds list of the other nodes.

Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use
a small seed list (approximately three nodes per datacenter).

5. On an existing node, gather setting information for the new node from the cassandra.yaml file:

• cluster_name

• endpoint_snitch

• Other non-default settings: Use the diff tool to compare current settings with default settings.

6. Gather rack and datacenter information:

• If the cluster uses the PropertyFileSnitch, record the rack and datacenter assignments listed in the
cassandra-topology.properties file, or copy the file to the new node.

• If the cluster uses the GossipingPropertyFileSnitch, Configuring the Amazon EC2 single-region
snitch, Configuring Amazon EC2 multi-region snitch, or Configuring the Google Cloud Platform
snitch, record the rack and datacenter assignments in the dead node's cassandra-rackdc.properties
file.

7. Make sure that the new node meets all prerequisites and then Install DataStax Enterprise on the new node,
but do not start DataStax Enterprise.
Be sure to install the same version of DataStax Enterprise as is installed on the other nodes in the cluster,
as described in the installation instructions.


8. If DataStax Enterprise automatically started on the node, stop the node and clear the data that was added
automatically on startup.

9. Add values to the following properties in the cassandra.yaml file from the information you gathered earlier; a sketch follows this list:

• auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in
the default cassandra.yaml configuration file.)

• cluster_name

• seed list
If the new node is a seed node, make sure it is not listed in its own - seeds list.
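A minimal sketch of the resulting cassandra.yaml fragment; the cluster name and seed addresses are placeholders for the values gathered from the existing cluster:

auto_bootstrap: true
cluster_name: 'ProdCluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Existing seed nodes only; a new seed node must not list itself.
          - seeds: "10.200.175.11,10.200.175.113"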

10. Add the rack and datacenter configuration:

• If the cluster uses the GossipingPropertyFileSnitch, Configuring the Amazon EC2 single-region
snitch, and Configuring Amazon EC2 multi-region snitch or Configuring the Google Cloud Platform
snitch:

a. Add the dead node's rack and datacenter assignments to the cassandra-rackdc.properties file
on the replacement node.
Do not remove the entry for the dead node's IP address yet.

b. Delete the cassandra-topology.properties file.

• If the cluster uses the PropertyFileSnitch:

a. Copy the cassandra-topology.properties file from an existing node, or add the settings to
the local copy.

b. Edit the file to add an entry with the new node's IP address and the dead node's rack and
datacenter assignments.

11. Start the new node with the required options:
Package installations:

a. Add the following option to jvm.options:

-Dcassandra.replace_address_first_boot=address_of_dead_node

b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, add the
consistent_replace option to jvm.options using either QUORUM or LOCAL_QUORUM values to ensure data
consistency on the replacement node. Otherwise, the node may stream from a potentially inconsistent
replica, and reads may return stale data.
For example:

-Ddse.consistent_replace=LOCAL_QUORUM

Other options that control repair during a consistent replace are:

• consistent_replace.parallelism

• consistent_replace.retries

• consistent_replace.whitelist


c. Start the node.

d. After the node bootstraps, remove replace_address_first_boot and consistent_replace (if
specified) from jvm.options.

Tarball installations:

a. Add the following parameter to the start up command line:

$ sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node

b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, in addition to
replace_address_first_boot, add the consistent_replace parameter using either QUORUM or
LOCAL_QUORUM values to ensure data consistency on the replacement node. Otherwise, the node may
stream from a potentially inconsistent replica, and reads may return stale data.
For example:

$ sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node -Ddse.consistent_replace=LOCAL_QUORUM

Other options that control repair during a consistent replace are:

• consistent_replace.parallelism

• consistent_replace.retries

• consistent_replace.whitelist

12. Run nodetool status to verify that the new node has bootstrapped successfully.
Tarball path:

installation_location/resources/cassandra/bin

13. In environments that use the PropertyFileSnitch, wait at least 72 hours and then remove the old node's IP
address from the cassandra-topology.properties file.

This ensures that the old node's information is removed from gossip. If removed from the property file too
soon, problems may result. Use nodetool gossipinfo to check the gossip status. The node is still in
gossip until LEFT status disappears.

The cassandra-rackdc.properties file does not contain IP information; therefore this step is not
required when using other snitches, such as GossipingPropertyFileSnitch.

Replacing a running node


Steps to replace a node with a new node, such as when updating to newer hardware or performing proactive
maintenance.

Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing or that have been removed from another cluster merges the older
data into the cluster and may cause data loss or corruption.

You can replace a running node in two ways:

• Adding a node and then decommissioning the old node


• Replacing a running node

Adding a node and then decommissioning the old node


You must prepare and start the replacement node, integrate it into the cluster, and then decommission the old
node.

Be sure to use the same version of DataStax Enterprise on all nodes in the cluster, as described in the
installation instructions.

1. Prepare and start the replacement node, as described in Adding nodes to an existing cluster.

If not using vnodes, see Adding single-token nodes to a cluster.

2. Confirm that the replacement node is alive:

• Run nodetool ring if not using vnodes.

• Run nodetool status if using vnodes.

Tarball path:

installation_location/resources/cassandra/bin

The status should show:

• nodetool ring: Up

• nodetool status: UN

3. Note the Host ID of the original node; it is used in the next step.

4. Using the Host ID of the original node, decommission the original node from the cluster using the nodetool
decommission command.

5. Run nodetool cleanup on all the other nodes in the same datacenter.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.

Replacing a running node


Using these steps, you can replace a node that is currently running and avoid streaming the data twice or
running cleanup.

If you've written data using a consistency level of ONE, you risk losing data because the node might contain
the only copy of a record. Be absolutely sure that no application uses consistency level ONE.

1. Stop DataStax Enterprise on the node to be replaced.

2. Follow the instructions for replacing a dead node using the old node’s IP address for -
Dcassandra.replace_address.

3. Ensure that consistency level ONE is not used on this node.

To reduce the size of a datacenter by removing nodes, see Removing a node.


Moving a node from one rack to another
A common task is moving a node from one rack to another. For example, when using the GossipingPropertyFileSnitch,
a common error is mistakenly placing a node in the wrong rack. To correct the error, use one of the following
procedures:


• The preferred method is to decommission the node and re-add it to the correct rack and datacenter.
This method takes longer than the alternative method (below) because unneeded data is first removed from
the decommissioned node and then the node gets new data during bootstrapping. The alternative method
does both operations simultaneously.

• An alternative method is to update the node's topology and restart the node. Once the node is up, run a full
repair on the cluster.
This method has risks because until the repair is completed, the node may blindly handle requests for
data the node doesn't yet have. To mitigate this problem with request handling, start the node with
-Dcassandra.join_ring=false, then, after repairing once, fully join the node to the cluster using the JMX
method org.apache.cassandra.db.StorageService.joinRing(). The node will be less likely to be
out of sync with other nodes before it serves any requests. After joining the node to the cluster, repair the
node again, so that any writes missed during the first repair will be captured. A sketch of this flow follows.
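A minimal sketch of the alternative method, shown as tarball-style commands (for package installations, add the flag to jvm.options instead); nodetool join is one way to invoke the joinRing() JMX method:

$ bin/dse cassandra -Dcassandra.join_ring=false   ## start without joining the ring
$ nodetool repair -full                           ## first repair
$ nodetool join                                   ## invokes StorageService.joinRing()
$ nodetool repair -full                           ## capture writes missed during the first repair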

Decommissioning a datacenter
Steps to properly remove a datacenter so no information is lost.
To decommission a DSE datacenter:

1. Make sure no clients are still writing to any nodes in the datacenter.
When not using OpsCenter, the following JMX MBeans provide details on client connections and
pending requests:

• Active connections: org.apache.cassandra.metrics/Client/connectedNativeClients and
org.apache.cassandra.metrics/Client/connectedThriftClients

• Pending requests: org.apache.cassandra.metrics/ClientRequests/viewPendingMutations, or
use nodetool tpstats.

2. Run a full repair with nodetool repair --full to ensure that all data is propagated from the datacenter being
decommissioned.
You can also use the OpsCenter Repair Service.
If using OpsCenter, ensure that the repair has completed; see Checking the repair progress.

3. Shut down the OpsCenter Repair Service if in use.

4. Change all keyspaces so they no longer reference the datacenter being removed.

5. Shut down all nodes in the datacenter.

6. Stop the DataStax Agent on each node if in use.

7. Run nodetool assassinate on every node in the datacenter being removed:

nodetool assassinate remote_IP_address

If the RF (replication factor) on any keyspace has not been properly updated:

a. Note the name of the keyspace that needs to be updated.

b. Remove the datacenter from the keyspace RF (using ALTER KEYSPACE).

c. If the keyspace used SimpleStrategy replication, also run a full repair on the keyspace:

nodetool repair --full keyspace_name

8. Run nodetool status to ensure that the nodes in the datacenter were removed.


Removing DC3 from the cluster:

1. Check the status of the cluster:

nodetool status

Status shows that there are three datacenters with 1 node in each:

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  461.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1

2. Run a full repair:

nodetool repair --full

3. Using JConsole, check the following JMX Beans to make sure there are no active connections:

• org.apache.cassandra.metrics/Client/connectedNativeClients

• org.apache.cassandra.metrics/Client/connectedThriftClients

4. Verify that there are no pending write requests on each node that is being removed (The Pending
column should read 0 or N/A):

nodetool tpstats

Pool Name           Active  Pending (w/Backpressure)  Delayed  Completed...
BackgroundIoStage   0       0 (N/A)                   N/A      640...
CompactionExecutor  0       0 (N/A)                   N/A      1039...
GossipStage         0       0 (N/A)                   N/A      4580...
HintsDispatcher     0       0 (N/A)                   N/A      2...

5. Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has a
RF set for DC3:

alter keyspace cycling WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1':1, 'DC2':2};

6. Shut down the OpsCenter Repair Service if in use.

7. Shut down all nodes in the datacenter.

8. Stop the DataStax Agent on each node if in use.

9. Run nodetool assassinate on each node in the DC3 (datacenter that is being removed):

nodetool assassinate remote_IP_address

10. In a remaining datacenter, verify that DC3 has been removed:

nodetool status

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1

Removing a node
Use these instructions when you want to remove nodes to reduce the size of your cluster, not for replacing a
dead node.
If you are not using Virtual nodes (vnodes), you must rebalance the cluster.

Prerequisites: If the node is a DSEFS node, follow this alternative node removal procedure: Removing a
DSEFS node.
Failure to follow the DSEFS procedure may result in data loss.

• Check whether the node is up or down using nodetool status:


The nodetool command shows the status of the node (UN=up, DN=down).


• If the node is up, run nodetool decommission.


This assigns the ranges that the node was responsible for to other nodes and replicates the data
appropriately.

To avoid excessive data streaming, make node topology changes one at a time.

Use nodetool netstats to monitor the progress.

Decommission does not shut down the node; shut down the node after decommission has completed.

• If the node is down, choose the appropriate option:

# If the cluster uses vnodes, remove the node using the nodetool removenode command.

# If the cluster does not use vnodes, before running the nodetool removenode command, adjust your
tokens to evenly distribute the data across the remaining nodes to avoid creating a hot spot.

• If removenode fails, run nodetool assassinate. (An example sequence follows.)
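A minimal sketch of removing a down node; the Host ID and IP address are hypothetical values taken from nodetool status output:

$ nodetool status                                        # note the Host ID of the down (DN) node
$ nodetool removenode 2ff7d46c-f084-477e-aa53-0f4791c71dbc
$ nodetool removenode status                             # monitor removal progress
$ nodetool assassinate 10.200.175.111                    # last resort if removenode fails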

Changing the IP address of a node


To change the IP address of a node, simply change the IP of the node and then restart DataStax Enterprise.

1. To speed up the restart process, before stopping the dse service, run nodetool drain.

2. Stop DataStax Enterprise.

3. Replace the old IP address in the cassandra.yaml with the new one; a sketch follows this list.

• listen_address

• broadcast_address

• (Optional if already set) native_transport_address
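For example, a minimal sketch of the updated cassandra.yaml entries, assuming the node's address changes to the hypothetical address 10.0.0.50:

listen_address: 10.0.0.50
broadcast_address: 10.0.0.50
# Only if it was already set:
native_transport_address: 10.0.0.50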

4. If the node is a seed node, update the -seeds parameter in the seed_provider list in the cassandra.yaml file
on all nodes.

5. If the endpoint_snitch is PropertyFileSnitch, add an entry for the new IP address in the cassandra-
topology.properties file on all nodes.

Do NOT remove the entry for the old IP address.

6. Update the DNS and the local host IP settings.

7. Start DSE on the local host.

8. If using the PropertyFileSnitch, perform a rolling restart.


Switching snitches
Because snitches determine how the database distributes replicas, the procedure to switch snitches depends on
whether the topology of the cluster changes:

• If data has not been inserted into the cluster, there is no change in the network topology. This means that
you only need to set the snitch; no other steps are necessary.

• If data has been inserted into the cluster, it's possible that the topology has changed and you will need to
perform additional steps.

A change in topology means that there is a change in the datacenters and/or racks where the nodes are placed.
Topology changes may occur when the replicas are placed in different places by the new snitch. Specifically,
the replication strategy places the replicas based on the information provided by the new snitch. The following
examples demonstrate the differences:

• No topology change
Change from five nodes using the DseSimpleSnitch (default) in a single datacenter
To five nodes in one datacenter and 1 rack using a network snitch such as the GossipingPropertyFileSnitch

• Topology changes

# Change from 5 nodes using the DseSimpleSnitch (default) in a single datacenter


To 5 nodes in 2 datacenters using the GossipingPropertyFileSnitch (add a datacenter).
If splitting one datacenter into two, create a new datacenter with new nodes. Alter the keyspace
replication settings for the keyspace that originally existed to reflect that two datacenters now
exist. Once data is replicated to the new datacenter, remove the number of nodes from the original
datacenter that have "moved" to the new datacenter.

# Change from 5 nodes using the DseSimpleSnitch (default) in a single datacenter


To 5 nodes in 1 datacenter and 2 racks using the GossipingPropertyFileSnitch (add rack information).

Steps for switching snitches:

1. Create a properties file with datacenter and rack information.

• cassandra-rackdc.properties
GossipingPropertyFileSnitch, Configuring the Amazon EC2 single-region snitch, and Configuring
Amazon EC2 multi-region snitch only.

• cassandra-topology.properties
All other network snitches.

2. Copy the cassandra-rackdc.properties or cassandra-topology.properties file to the configuration directory on


all the cluster's nodes. They won't be used until the new snitch is enabled.

3. Change the snitch for each node in the cluster in the node's cassandra.yaml file. For example:

endpoint_snitch: GossipingPropertyFileSnitch

4. If the topology has not changed, you can restart each node one at a time.
Any change in the cassandra.yaml file requires a node restart.

5. If the topology of the network has changed, but no datacenters are added:

a. Shut down all the nodes, then restart them.

b. Run a sequential repair and nodetool cleanup on each node.


Failure to run nodetool cleanup after adding a node may result in data inconsistencies
including resurrection of previously deleted data.

6. If the topology of the network has changed and a datacenter is added:

a. Create a new datacenter.

b. Replicate data into new datacenter. Remove nodes from old datacenter.

c. Run a sequential repair and nodetool cleanup on each node.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies
including resurrection of previously deleted data.

DataStax recommends stopping repair operations during topology changes; the Repair
Service does this automatically. Repairs running during a topology change are likely to fail
when they involve moving ranges.

7. If migrating from the PropertyFileSnitch to the GossipingPropertyFileSnitch, remove the cassandra-


topology.properties file from each node on any new cluster after the migration is complete.

Changing keyspace replication strategy


A keyspace is created with a replication strategy. For development, the SimpleStrategy class is acceptable.
For production, you must use NetworkTopologyStrategy. To change the strategy, alter the distribution of nodes
within multiple datacenters by adding a datacenter, and then add data to the new nodes in the new datacenter
and remove nodes from the old datacenter.

1. If necessary, change the snitch to a network-aware setting.

2. Alter the keyspace properties using ALTER KEYSPACE:

• Example 1: Switch the keyspace cycling from SimpleStrategy to NetworkTopologyStrategy for a


single datacenter:

ALTER KEYSPACE cycling WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 3};

• Example 2: Switch the keyspace cycling from SimpleStrategy to NetworkTopologyStrategy for two
datacenters:

ALTER KEYSPACE cycling WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 2 };

Simply altering the keyspace may lead to faulty data replication.


3. Run nodetool repair using the -full option on each node affected by the change.

$ nodetool repair -full keyspace

Tarball path:

installation_location/resources/cassandra/bin

It is possible to restrict the replication of a keyspace to selected datacenters or a single datacenter. To
do this, use the NetworkTopologyStrategy and set the replication factors of the excluded datacenters to 0
(zero):

ALTER KEYSPACE cycling WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 0, 'DC2' : 3, 'DC3' : 0 };

See Modifying the replication factor.

Migrating or renaming a cluster


The information on this page is intended for the following types of scenarios:

• Migrating a cluster, including transitioning an EC2 cluster to Amazon virtual private cloud (VPC), moving a
cluster, or upgrading from an early version cluster to a recent major version.

• Renaming a cluster. You cannot change the name of an existing cluster; you must create a new cluster and
migrate your data to the new cluster.

The following method migrates a cluster without service interruption and ensures that if a problem occurs in the
new cluster, you still have an existing cluster as a fallback.

1. Set up and configure the new cluster as described in Initializing a DataStax Enterprise cluster.

If you're not using vnodes, be sure to configure the token ranges in the new nodes to match the
ranges in the old cluster. See Initializing single-token architecture datacenters.

2. Set up the schema for the new cluster using CQL.

3. Configure your client to write to both clusters.

Depending on how the writes are implemented, code changes may be required. Be sure to use
identical consistency levels.

4. Ensure that the data is flowing to the new nodes so you won't have any gaps when you copy the snapshots
to the new cluster in step 6.

5. Snapshot the old cluster.

6. Copy the data files from your keyspaces to the nodes.

• You can copy the data files to their matching nodes in the new cluster, which is simpler and more
efficient (see the rsync sketch after this list), if:

# You are not using vnodes.

# Both clusters use the same version of DataStax Enterprise (DSE).

# The node ratio is 1:1.

• If the clusters are different sizes or if you are using vnodes, use the sstableloader (sstableloader).
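
For the 1:1 copy described in the first option, a file-level tool such as rsync can transfer each keyspace's data files. A minimal sketch, assuming the default data directory and a hypothetical destination host new-node1 (repeat per keyspace and node pair):

$ rsync -av /var/lib/cassandra/data/cycling/ new-node1:/var/lib/cassandra/data/cycling/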


7. You can either switch to the new cluster all at once or perform an incremental migration.
For example, to perform an incremental migration, you can set your client to designate a percentage of the
reads that go to the new cluster. This allows you to test the new cluster before decommissioning the old
cluster.

8. Ensure that the new cluster is operating properly and then decommission the old cluster. See
Decommissioning a datacenter.

Adding single-token nodes to a cluster


Steps for adding nodes in single-token architecture clusters, not clusters using Virtual nodes.
To add capacity to a cluster, introduce new nodes in stages or by adding an entire datacenter. Use one of the
following methods:

• Add capacity by doubling the cluster size: Adding capacity by doubling (or tripling or quadrupling) the
number of nodes is less complicated when assigning tokens. Using this method, existing nodes keep their
existing token assignments, and the new nodes are assigned tokens that bisect (or trisect) the existing token
ranges.

• Add capacity for a non-uniform number of nodes: When increasing capacity with this method, you must
recalculate tokens for the entire cluster, and assign the new tokens to the existing nodes.

Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.

For DataStax Enterprise clusters, you can use OpsCenter to rebalance a cluster.

1. Calculate the tokens for the nodes based on your expansion strategy using the Token Generating Tool.
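
For example, evenly spaced tokens for the default Murmur3Partitioner can also be computed directly; this sketch assumes Python 3 is available and a hypothetical six-node cluster:

$ python3 -c 'num_tokens = 6
print("\n".join(str((2**64 // num_tokens) * i - 2**63) for i in range(num_tokens)))'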

2. Install DataStax Enterprise and configure DataStax Enterprise on each new node.

3. If DataStax Enterprise starts automatically, stop the node and clear the data.

4. Configure cassandra.yaml on each new node:

• auto_bootstrap: If false, set it to true.


This option is not listed in the default cassandra.yaml configuration file and defaults to true.

• cluster_name

• listen_address/broadcast_address: Usually leave blank. Otherwise, use the IP address or host name
that other nodes use to connect to the new node.

• endpoint_snitch

• initial_token: Set according to your token calculations.


If this property has no value, the database assigns the node a random token range, which results in a
badly unbalanced ring.

• seed_provider: Make sure that the new node lists at least one seed node in the existing cluster.
Seed nodes cannot bootstrap. Make sure the new nodes are not listed in the -seeds list. Do not
make all nodes seed nodes. See Internode communications (gossip).

• Change any other non-default settings in the new nodes to match the existing nodes. Use the diff
command to find and merge any differences between the nodes. A sample cassandra.yaml fragment follows this list.
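
The following is a minimal, hypothetical cassandra.yaml fragment for a new single-token node; every value shown is an example only and must be replaced with settings that match your own cluster:

cluster_name: 'MyCluster'            # must match the existing cluster
auto_bootstrap: true                 # not present by default; defaults to true
initial_token: -9223372036854775808  # from your token calculations
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2"   # existing seed nodes; do not list this node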

5. Depending on the snitch, assign the datacenter and rack names in the cassandra-topology.properties or
cassandra-rackdc.properties for each node.


6. Start DataStax Enterprise on each new node at two-minute intervals with consistent.rangemovement
turned off:

• Package installations: To each bootstrapped node, add the following option to the jvm.options file and
then start DataStax Enterprise:

-Dcassandra.consistent.rangemovement=false

• Tarball installations:

$ bin/cassandra -Dcassandra.consistent.rangemovement=false

The following operations are resource intensive and should be done during low-usage times.

7. After the new nodes are fully bootstrapped, use nodetool move to assign the new initial_token value to
each node that requires one, one node at a time.

8. After all nodes have their new tokens assigned, run nodetool cleanup on each node in the cluster and wait
for cleanup to complete on each node before doing the next node.
This step removes the keys that no longer belong to the previously existing nodes.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.
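
As with repair, you can serialize the cleanup across the cluster with a simple loop; a minimal sketch, assuming passwordless SSH and a hypothetical hosts.txt file, blocking until each node finishes:

while read -r host; do
  echo "Running cleanup on ${host}..."
  ssh "${host}" nodetool cleanup
done < hosts.txt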

Adding a datacenter to a single-token architecture cluster


Steps for adding a datacenter to single-token architecture clusters, not clusters using Virtual nodes.
Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.

1. Ensure that you are using NetworkTopologyStrategy for all keyspaces.

2. For each new node, edit the configuration properties in the cassandra.yaml file:

• Set auto_bootstrap to false.

• Set the initial_token. Be sure to offset the tokens in the new datacenter; see Initializing single-token architecture datacenters.

• Set the cluster name.

• Set any other non-default settings.

• Set the seed lists. Every node in the cluster must have the same list of seeds and include at least
one node from each datacenter. Typically one to three seeds are used per datacenter.

3. Update the relevant properties file on all nodes to include the new nodes. You do not need to restart.

• GossipingPropertyFileSnitch: cassandra-rackdc.properties

• PropertyFileSnitch: cassandra-topology.properties

4. Ensure that your client does not auto-detect the new nodes so that they aren't contacted by the client until
explicitly directed.

5. If using a QUORUM consistency level for reads or writes, check the LOCAL_QUORUM or EACH_QUORUM
consistency level to make sure that the level meets the requirements for multiple datacenters.


6. Start the new nodes.

7. The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that file is present. Remove the file from each node on any new cluster or any cluster migrated from the PropertyFileSnitch.

8. After all nodes are running in the cluster:

a. Change the replication factor for your keyspace for the expanded cluster.

b. Run nodetool rebuild on each node in the new datacenter.
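
For example, to stream data to a new node from an existing datacenter (DC1 is a placeholder for your source datacenter name):

$ nodetool rebuild -- DC1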

Replacing a dead node in a single-token architecture cluster


Steps for replacing nodes in single-token architecture clusters, not vnodes.

Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.

1. Run nodetool status to verify that the node is dead (DN).

2. Record the datacenter, address, and rack settings of the dead node; you will use these later.

3. Record the existing initial_token setting from the dead node's cassandra.yaml.

4. Add the replacement node to the network and record its IP address.

5. If the dead node was a seed node, change the cluster's seed node configuration on each node:

a. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds
list in the seed-provider property.

b. If the cluster needs a new seed node to replace the dead node, add the new node's IP address to
the - seeds list of the other nodes.

Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use
a small seed list (approximately three nodes per datacenter).

6. On an existing node, gather setting information for the new node from the cassandra.yaml file:

• cluster_name

• endpoint_snitch

• Other non-default settings: Use the diff tool to compare current settings with default settings.

7. Gather rack and datacenter information:


• If the cluster uses the PropertyFileSnitch, record the rack and datacenter assignments listed in the
cassandra-topology.properties file, or copy the file to the new node.

• If the cluster uses the GossipingPropertyFileSnitch, Configuring the Amazon EC2 single-region
snitch, Configuring Amazon EC2 multi-region snitch, or Configuring the Google Cloud Platform
snitch, record the rack and datacenter assignments in the dead node's cassandra-rackdc.properties
file.

8. Make sure that the new node meets all prerequisites and then Install DataStax Enterprise on the new node,
but do not start DataStax Enterprise.
Be sure to install the same version of DataStax Enterprise as is installed on the other nodes in the cluster,
as described in the installation instructions.

9. If DataStax Enterprise automatically started on the node, stop and clear the data that was added
automatically on startup.

10. Add values to the following properties in cassandra.yaml file from the information gathered earlier:

• auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in
the default cassandra.yaml configuration file.)

• cluster_name

• initial_token

• seed list
If the new node is a seed node, make sure it is not listed in its own - seeds list.

11. Add the rack and datacenter configuration:

• If the cluster uses the GossipingPropertyFileSnitch, Configuring the Amazon EC2 single-region
snitch, Configuring Amazon EC2 multi-region snitch, or Configuring the Google Cloud Platform
snitch:

a. Add the dead node's rack and datacenter assignments to the cassandra-rackdc.properties file
on the replacement node.
Do not remove the entry for the dead node's IP address yet.

b. Delete the cassandra-topology.properties file.

• If the cluster uses the PropertyFileSnitch:

a. Copy the cassandra-topology.properties file from an existing node, or add the settings to
the local copy.

b. Edit the file to add an entry with the new node's IP address and the dead node's rack and
datacenter assignments.

12. Start the new node with the required options:
Package installations:


a. Add the following option to jvm.options:

-Dcassandra.replace_address_first_boot=address_of_dead_node

b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, add the
consistent_replace option to jvm.options, using either QUORUM or LOCAL_QUORUM, to ensure data
consistency on the replacement node. Otherwise, the node may stream from a potentially inconsistent
replica and reads may return stale data.
For example:

-Ddse.consistent_replace=LOCAL_QUORUM

Other options that control repair during a consistent replace are:

• consistent_replace.parallelism

• consistent_replace.retries

• consistent_replace.whitelist

c. Start the node.

d. After the node bootstraps, remove replace_address_first_boot and consistent_replace (if specified) from jvm.options.

Tarball installations:

a. Add the following parameter to the start up command line:

$ sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node

b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, in addition to
replace_address_first_boot, add the consistent_replace parameter, using either QUORUM or
LOCAL_QUORUM, to ensure data consistency on the replacement node. Otherwise, the node may
stream from a potentially inconsistent replica and reads may return stale data.
For example:

$ sudo bin/dse cassandra -Dcassandra.replace_address_first_boot=address_of_dead_node -Ddse.consistent_replace=LOCAL_QUORUM

Other options that control repair during a consistent replace are:

• consistent_replace.parallelism

• consistent_replace.retries

• consistent_replace.whitelist

13. Run nodetool status to verify that the new node has bootstrapped successfully.


Tarball path:

installation_location/resources/cassandra/bin
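
For example, to check only the replacement node (the IP address is a placeholder); a state of UN (Up/Normal) in the first column indicates a successful bootstrap:

$ nodetool status | grep 10.0.0.5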

14. In environments that use the PropertyFileSnitch, wait at least 72 hours and then remove the old node's IP
address from the cassandra-topology.properties file.

This ensures that the old node's information is removed from gossip. Removing it from the property file too
soon can cause problems. Use nodetool gossipinfo to check the gossip status. The node remains in
gossip until the LEFT status disappears.

The cassandra-rackdc.properties file does not contain IP information; therefore this step is not
required when using other snitches, such as GossipingPropertyFileSnitch.

Backing up and restoring data


DSE OpsCenter provides automated backup and restore functionality, see Backup Service.
About snapshots
DataStax Enterprise backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the
data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is
online.
Using a parallel ssh tool (such as pssh), you can snapshot an entire cluster. This provides an eventually
consistent backup. Although no one node is guaranteed to be consistent with its replica nodes at the time a
snapshot is taken, a restored snapshot resumes consistency using built-in consistency mechanisms.
After a system-wide snapshot is performed, you can enable incremental backups on each node to back up data
that has changed since the last snapshot. Each time a memtable is flushed to disk and an SSTable is created,
a hard link is created in a /backups subdirectory of the data directory (provided JNA is enabled). Compacted
SSTables do not create hard links in /backups because these SSTables do not contain any data that has not
already been linked.
Taking a snapshot
Snapshots are taken per node using the nodetool snapshot command. To take a global snapshot, run the
command with a parallel ssh utility, such as pssh.
A snapshot first flushes all in-memory writes to disk, then makes a hard link of the SSTable files for each
keyspace. You must have enough free disk space on the node to accommodate making snapshots of your data
files. A single snapshot requires little disk space. However, snapshots can cause your disk usage to grow more
quickly over time because a snapshot prevents old obsolete data files from being deleted. After the snapshot is
complete, you can move the backup files to another location if needed, or you can leave them in place.
Restoring from a snapshot requires the table schema.
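
A minimal sketch of a cluster-wide snapshot using pssh, assuming a hypothetical hosts.txt file and snapshot name:

$ pssh -h hosts.txt -i 'nodetool snapshot -t backup_2020-09-18'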


1. Run nodetool cleanup to ensure that invalid replicas are removed.

$ nodetool cleanup cycling

2. Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace. For example:

$ nodetool snapshot -t cycling_2017-3-9 cycling

Tarball path:

installation_location/resources/cassandra/bin

The name of the snapshot directory appears:

Requested creating snapshot(s) for [cycling] with snapshot name [cycling_2017-3-9]
Snapshot directory: cycling_2017-3-9

The snapshot files are created in data/keyspace_name/table_name-UUID/snapshots/snapshot_name directory.

$ ls -1 data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/
cycling_2017-3-9

The data file extension is .db, and the full CQL to create the table is in the schema.cql file.

manifest.json
mc-1-big-CompressionInfo.db
mc-1-big-Data.db
mc-1-big-Digest.crc32
mc-1-big-Filter.db
mc-1-big-Index.db
mc-1-big-Statistics.db
mc-1-big-Summary.db
mc-1-big-TOC.txt
schema.cql

Deleting snapshot files


When taking a snapshot, previous snapshot files are not automatically deleted. You should remove old
snapshots that are no longer needed.
The nodetool clearsnapshot command removes all existing snapshot files from the snapshot directory of each
keyspace. Make clearing old snapshots part of your backup process, before taking a new one.

1. To delete all snapshots for a node, run the nodetool clearsnapshot command. For example:

$ nodetool -h localhost -p 7199 clearsnapshot

Tarball path:

installation_location/resources/cassandra/bin

To delete snapshots on all nodes at once, run the nodetool clearsnapshot command using a parallel ssh
utility.


2. To delete a single snapshot, run the clearsnapshot command with the snapshot name:

$ nodetool clearsnapshot -t <snapshot_name>

The file name and path vary according to the type of snapshot. See nodetool snapshot for details about
snapshot names and paths.

Enabling incremental backups


When incremental backups are enabled (disabled by default), DataStax Enterprise hard-links each memtable-
flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups
offsite without transferring entire snapshots. Also, incremental backups combined with snapshots provide a
dependable, up-to-date backup mechanism. Compacted SSTables do not create hard links in /backups because
these SSTables do not contain any data that has not already been linked. A snapshot at a point in time, plus all
incremental backups and commit logs since that time form a complete backup.
As with snapshots, DataStax Enterprise does not automatically clear incremental backup files. DataStax
recommends setting up a process to clear incremental backup hard links each time a new snapshot is created.

1. Edit the cassandra.yaml configuration file on each node in the cluster and change the value of
incremental_backups to true.

2. Restart each node to recognize the change.
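
You can verify the setting on a running node with nodetool, and, depending on your version, toggle incremental backups at runtime; note that runtime changes are not persisted across restarts:

$ nodetool statusbackup
$ nodetool enablebackup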

Restoring from a snapshot


Restoring a keyspace from a snapshot requires all snapshot files for its tables and, if using incremental
backups, any incremental backup files created after the snapshot was taken. Streamed SSTables (from repair,
decommission, and so on) are also hard-linked and included.

Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the
node being restored.

Restoring from local nodes


This method copies the SSTables from the snapshots directory into the correct data directories.

1. Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist,
recreate it using the schema.cql file.

2. If necessary, TRUNCATE the target table.


You may not need to truncate under certain conditions. For example, if a node lost a disk, you might
restart before restoring so that the node continues to receive new writes before starting the restore
procedure.
Truncating is usually necessary. For example, if there was an accidental deletion of data, the tombstone
from that delete has a later write timestamp than the data in the snapshot. If you restore without
truncating (removing the tombstone), the database continues to shadow the restored data. This behavior
also occurs for other types of overwrites and causes the same problem.

3. Locate the most recent snapshot folder. For example:


/var/lib/cassandra/data/keyspace_name/table_name-UUID/snapshots/snapshot_name

4. Copy the most recent snapshot SSTable directory to the /var/lib/cassandra/data/keyspace/table_name-UUID directory.

5. Run nodetool refresh.
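
For example, to load the copied SSTables for the hypothetical cycling.cyclist_name table without restarting the node:

$ nodetool refresh -- cycling cyclist_name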


Restoring from centralized backups


This method uses sstableloader to restore snapshots.

1. Verify that the SSTable version is compatible with the current version of DSE:

a. Locate the version in the file names.


Use the version number and format in the SSTable file name to determine compatibility and upgrade
requirements. The first two letters of the file name are the version, where the first letter indicates a major
version and the second letter indicates a minor version.
For example, the following SSTable version is aa and the format is bti:

data/cycling/cyclist_expenses-e4f31e122bc511e8891b23da85222d3d/aa-1-bti-Data.db

b. Use the correct DSE version of sstableupgrade to create a compatible version.

For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL,
and SSTable compatibility.

2. Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist,
recreate it using the schema.cql file.

3. If necessary, TRUNCATE the target table.


You may not need to truncate under certain conditions. For example, if a node lost a disk, you might
restart before restoring so that the node continues to receive new writes before starting the restore
procedure.
Truncating is usually necessary. For example, if there was an accidental deletion of data, the tombstone
from that delete has a later write timestamp than the data in the snapshot. If you restore without
truncating (removing the tombstone), the database continues to shadow the restored data. This behavior
also occurs for other types of overwrites and causes the same problem.

4. Restore the most recent snapshot using the sstableloader tool on the backed-up SSTables.
The sstableloader streams the SSTables to the correct nodes. You do not need to remove the commitlogs or
drain or restart the nodes.
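
A minimal sketch, assuming the backed-up SSTables were copied into a path ending in keyspace_name/table_name (the directory layout sstableloader expects) and that 10.0.0.1 is a reachable node in the cluster:

$ sstableloader -d 10.0.0.1 /tmp/restore/cycling/cyclist_name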

Restoring a snapshot into a new cluster


Suppose you want to copy a snapshot of SSTable data files from a three-node DataStax Enterprise cluster with
vnodes enabled (128 tokens) and recover it on another, newly created three-node cluster (128 tokens). The token
ranges will not match because they cannot be exactly the same in the new cluster, so you must assign the
tokens that were used in the old cluster to the nodes of the new cluster.

This procedure assumes you are familiar with restoring a snapshot and configuring and initializing a cluster.

To recover the snapshot on the new cluster:

1. From the old cluster, retrieve the list of tokens associated with each node's IP:

nodetool ring | grep -w ip_address_of_node | awk '{print $NF ","}' | xargs

2. In the cassandra.yaml file for each node in the new cluster, add the list of tokens you obtained in the
previous step to the initial_token parameter using the same num_tokens setting as in the old cluster.

If nodes are assigned to racks, make sure the token allocation and rack assignments in the new
cluster are identical to those of the old.


3. Make any other necessary changes in the new cluster's cassandra.yaml and property files so that the new
nodes match the old cluster settings. Make sure the seed nodes are set for the new cluster.

4. Clear the system table data from each new node:

sudo rm -rf /var/lib/cassandra/data/system/*

This allows the new nodes to use the initial tokens defined in the cassandra.yaml when they restart.

5. Start each node using the specified list of token ranges in new cluster's cassandra.yaml:

initial_token: -9211270970129494930, -9138351317258731895, -8980763462514965928, ...

6. Create schema in the new cluster. All the schemas from the old cluster must be reproduced in the new
cluster.

7. Stop the node. Using nodetool refresh is unsafe because files within the data directory of a running
node can be silently overwritten by identically named just-flushed SSTables from memtable flushes or
compaction. Copying files into the data directory and restarting the node will not work for the same reason.

8. Restore the SSTable files snapshotted from the old cluster onto the new cluster using the same directories,
while noting that the UUID component of target directory names has changed. Without restoration, the new
cluster will not have data to read upon restart.

9. Restart the node.

Recovering from a single disk failure using JBOD


Steps for recovering from a single disk failure in a disk array using JBOD (just a bunch of disks).
DataStax Enterprise might not fail from the loss of one disk in a JBOD array, but some reads and writes may fail
when:

• The operation's consistency level is ALL.

• The data being requested or written is stored on the defective disk.

• The data to be compacted is on the defective disk.

It's possible that you can simply replace the disk, restart DataStax Enterprise, and run nodetool repair.
However, if the disk crash corrupted the system table, you must remove the incomplete data from the other disks in
the array. The procedure for doing this depends on whether the cluster uses vnodes or single-token architecture.

1. Verify that the node has a defective disk, and identify the disk by checking the logs on the affected node.
Disk failures are logged in FILE NOT FOUND entries, which identify the mount point or disk that has
failed.

2. If the node is still running, stop DSE and shut down the node.

3. Replace the defective disk and restart the node.

4. If the node cannot restart:

a. Try restarting DataStax Enterprise without bootstrapping the node:


Package installations:


a. Add the following option to the cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"

b. Start DataStax Enterprise as a service.

c. After the node bootstraps, remove the -Dcassandra.allow_unsafe_replace=true parameter from cassandra-env.sh.

d. Restart DataStax Enterprise as a service.

Tarball installations:

• Start DataStax Enterprise with this option:

$ sudo bin/dse cassandra -Dcassandra.allow_unsafe_replace=true

Tarball path:

installation_location

5. If DataStax Enterprise restarts, run nodetool repair on the node. If not, replace the node.

6. If the repair succeeds, the node is restored to production. Otherwise, go to step 7 or step 8.

7. For a cluster using vnodes:

a. On the affected node, clear the system directory on each functioning drive.
Example for a node with a three disk JBOD array:

-/mnt1/cassandra/data
-/mnt2/cassandra/data
-/mnt3/cassandra/data

If mnt1 has failed:

$ rm -fr /mnt2/cassandra/data/system && rm -fr /mnt3/cassandra/data/system

b. Restart DataStax Enterprise without bootstrapping as described in step 4:

-Dcassandra.allow_unsafe_replace=true

c. Run nodetool repair on the node.


If the repair succeeds, the node is restored to production. If not, replace the dead node.

8. For a cluster using single-token nodes:

a. On one of the cluster's working nodes, run nodetool ring to retrieve the list of the repaired node's
tokens:

$ nodetool ring | grep ip_address_of_node | awk ' {print $NF ","}' | xargs

b. Copy the output of the nodetool ring into a spreadsheet (space-delimited).

c. Edit the output, keeping the list of tokens and deleting the other columns.


d. On the node with the new disk, open the cassandra.yaml file and add the tokens (as a comma-
separated list) to the initial_token property.

e. Change any other non-default settings in the new nodes to match the existing nodes. Use the diff
command to find and merge any differences between the nodes.

f. On the affected node, clear the system directory on each functioning drive.
Example for a node with a three disk JBOD array:

-/mnt1/cassandra/data
-/mnt2/cassandra/data
-/mnt3/cassandra/data

If mnt1 has failed:

$ rm -fr /mnt2/cassandra/data/system && rm -fr /mnt3/cassandra/data/system

g. Restart DataStax Enterprise without bootstrapping as described in step 4:

-Dcassandra.allow_unsafe_replace=true

h. Run nodetool repair on the node.


If the repair succeeds, the node is restored to production. If not, replace the node.

Repairing nodes
For conceptual information about repairing nodes, see Anti-entropy repair.
Manual repair: Anti-entropy repair
A manual repair is run using nodetool repair. This tool provides many options for configuring repair. This page
provides guidance for choosing certain parameters.

Tables with NodeSync enabled will be skipped for repair operations run against all or specific keyspaces. For
individual tables, running the repair command will be rejected when NodeSync is enabled.

Partitioner range (-pr)


Within a cluster, the database stores a particular range of data on multiple nodes. If you run nodetool repair
on one node at a time, the database may repair the same range of data several times (depending on the
replication factor used in the keyspace). If you use the partitioner range option, nodetool repair -pr only
repairs a specified range of data once, rather than repeating the repair operation. This option decreases the
strain on network resources, although nodetool repair -pr still builds Merkle trees for each replica.
You can use the partitioner range option with incremental repair; however it is not recommended because
incremental repair already avoids re-repairing data by marking data as repaired. The most efficient way to run
incremental repair is without the -pr parameter since it can skip anti-compaction by marking whole SSTables as
repaired.
If you use this option, run the repair on every node in the cluster to repair all data. Otherwise, some ranges of
data will not be repaired.

DataStax recommends using the partitioner range parameter when running full repairs during routine
maintenance.
Full repair is run by default.


If running nodetool repair -pr on a downed node that has been recovered, be sure to run the command on
all other nodes in the cluster as well.

Local (-local) vs datacenter (-dc) vs cluster-wide repair


Consider carefully before using nodetool repair across datacenters, instead of within a local datacenter. When
you run repair locally on a node using -local, the command runs only on nodes within the same datacenter
as the node that runs it. Otherwise, the command runs cluster-wide repair processes on all nodes that contain
replicas, even those in different datacenters. For example, if you start nodetool repair over two datacenters,
DC1 and DC2, each with a replication factor of 3, repair builds Merkle trees for 6 nodes. The number of Merkle
trees increases linearly for each additional datacenter. Cluster-wide repair also increases network traffic between
datacenters tremendously, and can cause cluster issues.
If the local option is too limited, use the -dc option to limit repairs to a specific datacenter. This does not repair
replicas on nodes in other datacenters, but it can decrease network traffic while repairing more nodes than the
local options.
The nodetool repair -pr option is good for repairs across multiple datacenters.
Additional guidance for nodetool repair options:

• Does not support the use of -local with the -pr option unless the datacenter nodes have all the data for all
ranges.

• Does not support the use of -local with -inc (incremental repair).

For repairs across datacenters, use the -dcpar option to repair datacenters in parallel.

One-way targeted repair from a remote node (--pull, --hosts, -st, -et)
Runs a repair directly from another node, which has a replica in the same token range. This option minimizes
performance impact when cross-datacenter repairs are required.

nodetool repair --pull -hosts local_ip_address,remote_ip_address keyspace_name

Endpoint range vs Subrange repair (-st, -et)


A repair operation runs on all partition ranges on a node, or endpoint range, unless you use the -st and -et
(or -start-token and -end-token) options to run subrange repairs. When you specify a start token and end token,
nodetool repair works between these tokens, repairing only those partition ranges.
Subrange repair is not a good strategy because it requires generated token ranges. However, if you know which
partition has an error, you can target that partition range precisely for repair. This approach can relieve the
problem known as overstreaming, which ties up resources by sending repairs to a range over and over.
Subrange repair involves more than just the nodetool repair command. A Java describe_splits call to ask
for a split containing 32k partitions can be iterated throughout the entire range incrementally or in parallel to
eliminate the overstreaming behavior. Once the tokens are generated for the split, they are passed to nodetool
repair -st start_token -et end_token. The -local option can be used to repair only within a local datacenter
to reduce cross datacenter transfer.
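
For example, a subrange repair of one generated token range, restricted to the local datacenter (the token values and keyspace name are placeholders):

$ nodetool repair -st -9223372036854775808 -et -3074457345618258603 -local cycling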
Full repair vs incremental repair (-full vs -inc)
Full repair builds a full Merkle tree and compares the node's data against the data on other nodes. For a complete
explanation of full repair, see How does anti-entropy repair work?.
Incremental repair compares all SSTables on the node and makes necessary repairs. An incremental repair
persists data that has already been repaired, and only builds Merkle trees for unrepaired SSTables. Incremental
repair marks the rows in an SSTable as repaired or unrepaired.


Figure 18: Merkle Trees for Incremental Repair versus Full Repair


Incremental repairs work like full repairs, with an initiating node requesting Merkle trees from peer nodes with the
same unrepaired data, and then comparing the Merkle trees to discover mismatches. Once the data has been
reconciled and new SSTables built, the initiating node issues an anti-compaction command. Anti-compaction
is the process of segregating repaired and unrepaired ranges into separate SSTables, unless the SSTable fits
entirely within the repaired range. In the latter case, the SSTable metadata repairedAt is updated to reflect its
repaired status.
Anti-compaction is handled differently, depending on the compaction strategy assigned to the data.

• Size-tiered compaction (STCS) splits repaired and unrepaired data into separate pools for separate
compactions. A major compaction generates two SSTables, one for each pool of data.

• Leveled compaction (LCS) performs size-tiered compaction on unrepaired data. After repair completes,
Cassandra moves data from the set of unrepaired SSTables to L0.

• Date-tiered (DTCS) splits repaired and unrepaired data into separate pools for separate compactions. A
major compaction generates two SSTables, one for each pool of data. DTCS compaction should not use
incremental repair.

Parallel vs Sequential repair (default, -seq, -dcpar)


The default mode runs repair on all nodes with the same replica data at the same time. Sequential (-seq)
runs repair on one node after another. Datacenter parallel (-dcpar) combines sequential and parallel by
simultaneously running a sequential repair in all datacenters; a single node in each datacenter runs repair, one
after another until the repair is complete.
Sequential repair takes a snapshot of each replica. Snapshots are hardlinks to existing SSTables. They are
immutable and require almost no disk space. The snapshots are active while the repair proceeds, then the
database deletes them. When the coordinator node finds discrepancies in the Merkle trees, the coordinator node
makes required repairs from the snapshots. For example, for a table in a keyspace with a Replication factor
RF=3 and replicas A, B and C, the repair command takes a snapshot of each replica immediately and then
repairs each replica from the snapshots sequentially (using snapshot A to repair replica B, then snapshot A to
repair replica C, then snapshot B to repair replica C).
Parallel repair works on nodes A, B, and C all at once. During parallel repair, the dynamic snitch processes
queries for this table using a replica in the snapshot that is not undergoing repair.
Sequential repair is the default in DataStax Enterprise 4.8 and earlier. Parallel repair is the default for DataStax
Enterprise 5.0 and later.
When to run anti-entropy repair
When to run anti-entropy repair is dependent on the characteristics of the cluster. General guidelines are
presented here, and should be tailored to each particular case.
An understanding of how repair works is required to fully understand the information presented on this page,
see Anti-entropy repair.

When is repair needed?


Run repair in these situations:

• Routinely to maintain node health.


Even if deletions never occur, schedule regular repairs. Setting a column to null is a delete.

• When recovering a node after a failure while bringing it back into the cluster.

• To update data on a node that contains infrequently read data and therefore does not get read repair.

• To update data on a downed node.

• When recovering missing data or corrupted SSTables. You must run non-incremental repair.

Guidelines for running routine node repair


• Run full repairs weekly to monthly. Monthly is generally sufficient, but run more frequently if warranted.


Full repair is useful for maintaining data integrity, even if deletions never occur.

• Use the parallel and partitioner range options, unless precluded by the scope of the repair.

• Migrate off incremental repairs and then run a full repair to eliminate anti-compaction. Anti-compaction is the
process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data.
This has compaction strategy implications.
If you are on DataStax Enterprise version 5.1.0-5.1.2, DataStax recommends upgrading to 5.1.3 or later.

• Run repair frequently enough that every node is repaired before reaching the time specified in the
gc_grace_seconds setting. If this requirement is met, deleted data is properly handled in the cluster.

• Schedule routine node repair operations to minimize cluster disruption during low-usage hours and on one
node at a time:

• Increase the time value setting of gc_grace_seconds if data is seldom deleted or overwritten. For these
tables, changing the setting minimizes impact to disk space and provides a longer interval between repair
operations.

• Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput
and setcompactionthreshold) before running a repair.

Guideline for running repair on a downed node


• Do not use partitioner range, -pr.

• Do not use incremental repair, -inc.

Changing repair strategies


Change the method used for routine repairs between incremental and full repair. Repairing SSTables using anti-entropy
repair is required for database maintenance. A full repair of all SSTables on a node takes a lot of time
and is resource-intensive. Incremental repair consumes less time and resources because it skips SSTables that
are already marked as repaired.
Migrating to full repairs
Incremental repairs split the data into repaired and unrepaired SSTables and mark the data state with
metadata. Full repairs keep the data together and use no repair status flag. Before switching from incremental
repairs to full repairs, remove the repaired status:

$ nodetool mark_unrepaired keyspace_name [table_name]

Migrating to incremental repairs


To start using incremental repairs, migrate the SSTables on each node. Incremental repair skips SSTables that
are already marked as repaired. These steps ensure data integrity when changing the repair strategy from
full to incremental.
DataStax recommends using full repairs. Incremental repairs may cause performance issues, see
CASSANDRA-9143.

Prerequisites:
In RHEL and Debian installations, you must install the tools packages before following these steps.

Before starting this procedure, be aware that the first system-wide full repair (step 3) can take a long time, as the
database recompacts all SSTables. To make this process less disruptive, migrate the cluster to incremental
repair one node at a time.

In a terminal:


1. Disable autocompaction on the node:

$ nodetool disableautocompaction

Tarball path: install_directory/bin

Running nodetool disableautocompaction without parameters disables autocompaction for all keyspaces.

2. Before running a full repair (step 3), list the node's SSTables located in /var/lib/cassandra/data. You will
need this list to run the command that sets the repairedAt flag in step 5.
The data directory contains a subdirectory for each keyspace. Each subdirectory contains a set of files
for each SSTable. The name of the file that contains the SSTable data has the following format:

<version_code>-<generation>-<format>-Data.db

3. Run the default full, sequential repair on one node at a time:

$ nodetool repair

Tarball path: install_directory/bin


Running nodetool repair without parameters runs a full sequential repair of all SSTables on the node
and can take a substantial amount of time.

4. Stop the node.

5. Using the list you created in step 2, set the repairedAt flag on each SSTable to repaired, using
sstablerepairedset with --is-repaired.
Unless you set the repairedAt to repaired for each SSTable, the existing SSTables might not be
changed by the repair process and any incremental repair process that runs later will not process these
SSTables.

• To mark a single SSTable:

$ sudo sstablerepairedset --really-set --is-repaired SSTable-example-Data.db

• For batch processing, use a text file of SSTable names:

$ sudo sstablerepairedset --really-set --is-repaired -f SSTable-names.txt

Tarball path:

installation_location/resources/cassandra/tools/bin

The value of the repairedAt flag is the timestamp of the last repair. The sstablerepairedset
command applies the current date/time. To check the value of the repairedAt flag, use:

$ sstablemetadata example-keyspace-SSTable-example-Data.db | grep "Repaired at"

6. Restart the node.

What's next:


After you have migrated all nodes, you can run incremental repairs using nodetool repair with the -inc
option.
Related information:
• https://www.datastax.com/dev/blog/repair-in-cassandra
• https://www.datastax.com/dev/blog/more-efficient-repairs
• https://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1

Monitoring a DataStax Enterprise cluster


DataStax Enterprise (DSE) provides a wealth of metrics regarding your clusters. Understanding the performance
characteristics of a cluster is critical to diagnosing issues and planning capacity.
DataStax recommends using DSE Metrics Collector or DataStax Enterprise OpsCenter to monitor clusters and
view metrics. These tools provide valuable insights into your clusters by visually representing the most useful
metrics in customizable dashboards.
DSE exposes a number of statistics and management operations via Java Management Extensions (JMX). You
can get statistics and metrics using DataStax tools or the Java Console (JConsole), and then add those metrics to
dashboards in external monitoring tools (such as Prometheus) or OpsCenter.
DSE Metrics Collector and external dashboards
When DSE Metrics Collector is enabled, DSE sends metrics and other structured events to DSE Metrics Collector.
Use dsetool insights_config to enable and configure the frequency and type of metrics that are sent to DSE
Metrics Collector. After setting the configuration properties, you can export the aggregated metrics to monitoring
tools like Prometheus, Graphite, and Splunk, which can then be visualized in a dashboard such as Grafana.
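
For example, to enable metrics collection with local storage and then review the active configuration (options may vary by DSE version):

$ dsetool insights_config --mode ENABLED_WITH_LOCAL_STORAGE
$ dsetool insights_config --show_config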
OpsCenter Dashboard and Performance Service
Use OpsCenter to monitor performance metrics in the OpsCenter Dashboard. Real-time and historical
performance metrics are available at different granularities: cluster-wide, per node, per table (column family), or
per storage tier.
In addition to the OpsCenter Dashboard, you can enable the OpsCenter Performance Service, which combines
OpsCenter metrics with CQL-based diagnostic tables populated by the DSE Performance Service. Use the
information generated by the Performance Service to help understand, tune, and optimize cluster performance.

Tuning the database


Tuning Java Virtual Machine
Improve performance or reduce high memory consumption by tuning the Java Virtual Machine (JVM). Operations
on the following components occur in the JVM heap:

• Bloom filters

• Partition summary

• Partition key cache

• Compression offsets

• SSTable index summary

The metadata resides in memory and is proportional to total data. Some of the components grow proportionally
to the size of total memory. The database gathers replicas for a read or for anti-entropy repair and compares the
replicas in heap memory.
Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to
SSTables on disk.

The database uses off-heap memory as follows:


• Page cache. The database uses additional memory as page cache when reading files on disk.

• The Bloom filter and compression offset maps reside off-heap.

• The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap
requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.

DSE advanced features memory use


DataStax Enterprise advanced features use additional memory on nodes where the workload type is enabled.
Search
DSE Search has larger memory requirements than a database-only node. Most search deployments run with
heaps between 24-32 GB using G1 GC. Additional memory usage considerations include:

• Solr stores indexed data in a RAM buffer until it is flushed to index segments on disk; when setting the heap
size, determine the amount of memory required for Solr indexes. Allow enough free RAM, that is, total RAM -
DSE heap size - DSE off-heap object size.

• Multiple concurrent indexers can cause GC thrashing, even with a large heap.

• Indexes larger than the page cache can impact search query performance. Ensure that the index
size does not exceed the page cache size for highly performant search queries.

See DSE Search performance tuning and monitoring.

Analytics
DSE Analytics nodes run Spark in a separate JVM. Therefore, adjustments to the Cassandra JVM do not affect
Spark operations directly. DSE Analytics nodes typically have read-heavy workloads because they run a
significant number of range-read queries. Additional memory usage considerations include:

• Spark executors are the most memory-intensive processes in Spark. These are tuned to use G1 GC by
default. Tune the size of the executor heap in spark-defaults.conf. Consider leaving room for the OS page
cache when tuning the executor heaps.

• Common causes of Spark out-of-memory (OOM) errors are shuffle steps. Try to avoid performing shuffles by
leveraging repartitionByCassandraReplica and joinWithCassandraTable in your RDD jobs.

See Spark JVMs and memory management.

Graph
DSE Graph workloads often include Search, Analytics, or both. Tune the GC for the Search and Analytics
workloads. In addition to the memory needed by the Search and Analytics workloads, Graph queries
utilize memory during execution. This workload is characterized by its short lived objects. Most DSE Graph
deployments with Search enabled are run on systems of >= 128GB RAM with G1 GC heaps of 32 GB.
Changing heap size parameters
By default, DataStax Enterprise (DSE) sets the Java Virtual Machine (JVM) heap size from 1 to 32 GB
depending on the amount of RAM and type of Java installed. The cassandra-env.sh script automatically configures
the min and max size to the same value using the following formula:

max(min(1/2 ram, 1024 megabytes), min(1/4 ram, 32765 megabytes))
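
For example, a machine with 64 GB of RAM defaults to a 16 GB heap: max(min(32 GB, 1024 MB), min(16 GB, 32765 MB)) = max(1024 MB, 16 GB) = 16 GB.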

To adjust the JVM heap size, uncomment and set the following parameters in the jvm.options file:

• Minimum (-Xms)

• Maximum (-Xmx)

• New generation (-Xmn)


• Parallel processing for GC (-XX:+UseParallelGC)

When overriding the default setting, both min and max must be defined in the jvm.options file.

Additionally, for larger machines, increase the max direct memory (-XX:MaxDirectMemorySize), but leave
around 15-20% of memory for the OS and other in-memory structures.
Guidelines and recommendations
Setting the Java heap higher than 32 GB may interfere with the OS page cache. Operating systems that
maintain the OS page cache for frequently accessed data are very good at keeping this data in memory.
Properly tuning the OS page cache usually results in better performance than increasing the row cache. For
production use, follow these guidelines to adjust heap size for your environment:

• Heap size is usually between ¼ and ½ of system memory but not larger than 32 GB.

• Reserve enough memory for the offheap cache and file system cache.

• Enable GC logging when adjusting GC.

• Gradually increase or decrease the parameters. Test each incremental change.

• Enable parallel processing for GC, particularly when using DSE Search.

• The GCInspector class logs information about any garbage collection that takes longer than 200 ms.
Garbage collections that occur frequently and take a moderate length of time (seconds) to complete
indicate excessive garbage collection pressure on the JVM. In addition to adjusting the garbage collection
options, other remedies include adding nodes, and lowering cache sizes.

• For a node using G1, DataStax recommends a MAX_HEAP_SIZE as large as possible, up to 64 GB.

For more tuning tips, see Secret HotSpot option improving GC pauses on large heaps.

Maximum and minimum heap size

The recommended maximum heap size depends on which GC is used:

• G1 for newer computers (8+ cores) with up to 256 GB RAM: 16 GB to 32 GB. See Java performance tuning.

• CMS for newer computers (8+ cores) with up to 256 GB RAM: no more than 16 GB.

• Older computers: typically 8 GB.

New heap size


For CMS, you may also need to adjust new (young) generation heap size. This setting determines the amount
of heap memory allocated to newer objects. The database calculates the default value for this property in
megabytes (MB) as the lesser of:

• 100 times the number of cores

• ¼ of MAX_HEAP_SIZE

1. To enable GC logging, uncomment the loggc parameter in the jvm.options file.

-Xloggc:/var/log/cassandra/gc.log

After restarting Cassandra, the log is created and GC events are recorded.

2. Set the heap sizes in the jvm.options file:


a. Uncomment and set both the min and max heap size. For example to set both the min and max
heap size to 16 GB:

-Xms16G
-Xmx16G

Set the min (-Xms) and max (-Xmx) heap sizes to the same value to avoid stop-the-world
GC pauses during resize, and to lock the heap in memory on startup which prevents any of it
from being swapped out.

b. If using CMS, uncomment and set the new generation heap size to tune the heap for CMS. As a
starting point, set the new parameter to 100 MB per physical CPU core. For example, for a modern
eight-core or greater system:

-Xmn800M

A larger size leads to longer GC pause times. For a smaller new size, GC pauses are shorter
but usually more expensive.

3. On larger machines, increase the max direct memory (-XX:MaxDirectMemorySize), but leave around
15-20% of memory for the OS and other in-memory structures. For example, to set the max direct memory
to 1 MB:

-XX:MaxDirectMemorySize=1M

By default, the size is zero, so the JVM selects the size of the NIO direct-buffer allocations
automatically.

Alternatively, you can set an environment variable called MAX_DIRECT_MEM, instead of setting a size
for -XX:MaxDirectMemorySize in the jvm.options file.

4. Save and close the jvm.options file.

5. Restart Cassandra and run some read heavy or write heavy operations.

6. Check the GC logs.

This method decreases performance for the test node, but generally does not significantly reduce
cluster performance.

If performance does not improve, contact the DataStax Services team for additional help.

Configuring the garbage collector


Garbage collection is performed by a Java process that removes data that is no longer
needed from memory. For the best performance, use either the Garbage-First (G1) or Concurrent Mark Sweep
(CMS) collector. By default, DataStax Enterprise (DSE) uses the Garbage-First (G1) collector.
The primary differences between the collector options are:

• G1 divides the heap into multiple regions, where the number of regions depends primarily on the heap
size and heap region size. The G1 collector dynamically assigns the regions to old generation or new
generation based on the running workload, prioritizing garbage collection in areas of the heap that will yield
the largest free space when collected. Additionally, G1 makes tradeoffs at runtime optimizing for a pause
target (which is configurable using -XX:MaxGCPauseMillis) to provide predictable performance.


• CMS divides the heap into new generation (eden + survivor spaces), old generation, and permanent
generation, and relies on many heuristics and configurable settings to optimize for performance.

G1 advantages
DataStax recommends G1 over CMS for the following reasons:

• G1 supports large heap sizes (24-96 GB) without tuning. DSE systems, especially those with Search,
Analytics, or Graph workloads, have enough RAM to run larger heaps.

• G1 handles dynamic workloads more effectively than CMS. DSE systems typically have multiple
workloads, such as reads, writes, compactions, search indexing, and range reads for analytics, etc.

• CMS will be deprecated in Java 9.

• G1 is easier to configure. The only configuration options are MAX_HEAP_SIZE and -XX:MaxGCPauseMillis.

Changing the Garbage-First MaxGCPauseMillis parameter


A pause occurs when a region of memory is full and the JVM needs to make space to continue. A region can
fill up if the rate at which data is stored in memory exceeds the rate at which it is removed. When tuning the JVM, try
to minimize garbage collection pauses, also known as stop-the-world events. For more details, see Garbage
collection pauses.
During a pause, all operations are suspended. Because a pause affects networking, the node can appear as
down to other nodes in the cluster. SELECT and INSERT statements wait, which increases read and write
latencies. Avoid pauses longer than a second, or multiple pauses within a second.
MaxGCPauseMillis sets the peak pause time expected in the environment. By default, DataStax Enterprise
(DSE) sets the maximum to 500 milliseconds (-XX:MaxGCPauseMillis=500). DataStax recommends staying
between 500-2000 ms. Set the maximum value to the expected peak pause length (not the target pause
length). When adjusting the GC pause, there is always a tradeoff between latency and throughput:

• A longer pause increases both latency and throughput

• A shorter pause decreases both latency and throughput

Setting MaxGCPauseMillis lower than 500 ms to force lower latency collections might not have the intended
effect. When this value is set lower, it causes GC to run more aggressively and less efficiently, which can
steal cycles without yielding considerable benefit.

Set the value for the -XX:MaxGCPauseMillis parameter in the jvm.options file.
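
For example, to raise the expected peak pause to one second, uncomment or add the following line in jvm.options:

-XX:MaxGCPauseMillis=1000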
Using the Concurrent Mark Sweep (CMS) garbage collector
For some deployments that have small heap sizes, the Concurrent Mark Sweep (CMS) garbage collector performs
better than the Garbage-First (G1) collector. CMS requires manual tuning, which is time consuming, requires
expertise, and can result in poor performance when not done methodically or when a workload changes.
Using CMS has the following disadvantages:

• Manual tuning and testing that requires time and expertise.

• Only supports heap sizes up to 14 gigabytes (GB). Allocating more memory to heap can result in
diminishing performance as the garbage collection facility increases the amount of database metadata in
heap memory.

For help configuring CMS, contact the DataStax Services team.

CMS guidelines
Use the following basic recommendations when configuring CMS:

• Only use CMS in fixed workload environments, that is, where the cluster performs the same processes all the
time.

• Use CMS in environments that require the lowest latency possible; G1 incurs some latency due to its runtime
profiling.



• Configure heap size:

# For systems with more than 24 GB of RAM, configure a 14 GB heap and the settings from
CASSANDRA-8150.

# For systems with less than 24 GB of RAM, configure an 8 GB heap and use the default settings.

# For systems that cannot support an 8 GB heap (such systems are not usually fit for production workloads),
use the default settings, which allocate ¼ of the available RAM to the heap.

Note: For more CMS tuning tips, see Secret HotSpot option improving GC pauses on large heaps.

1. Open jvm.options.

2. Comment out all lines in the ### G1 Settings section.

3. Uncomment all lines in the ### CMS Settings section.

4. Restart the database.
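After steps 2 and 3, the two sections of jvm.options look similar to the following sketch; the exact option
lines vary by DSE version:

### G1 Settings
#-XX:+UseG1GC
#-XX:G1RSetUpdatingPauseTimePercent=5

### CMS Settings
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
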

Tuning Bloom filters


DataStax Enterprise uses Bloom filters to determine whether an SSTable has data for a particular partition.
Bloom filters are not used for range scans, but are used for index scans. Bloom filters are probabilistic
sets that allow you to trade memory for accuracy: a higher bloom_filter_fp_chance setting uses less memory,
but results in more disk I/O if the SSTables are highly fragmented.
Bloom filter settings range from 0 to 1.0 (disabled). The default value of bloom_filter_fp_chance depends on
the compaction strategy.
The LeveledCompactionStrategy (LCS) uses a higher default value (0.1) than the
SizeTieredCompactionStrategy (STCS), which has a default of 0.01. Memory savings are nonlinear; going from
0.01 to 0.1 saves about one third of the memory. SSTables using LCS contain relatively smaller ranges of keys
than those using STCS, which facilitates efficient exclusion of the SSTables even without a Bloom filter; however,
adding a small Bloom filter helps when there are many levels in LCS.
The settings you choose depend on the type of workload. For example, to run an analytics application that heavily
scans a particular table, you would want to inhibit the Bloom filter on that table by setting bloom_filter_fp_chance high.
To view the observed Bloom filter false positive rate and the number of SSTables consulted per read, use
tablestats in the nodetool utility.
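For example (the keyspace and table names are illustrative):

$ nodetool tablestats mykeyspace.users

The output includes per-table lines such as Bloom filter false positives, Bloom filter false ratio, and Bloom
filter space used.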
Bloom filters are stored off-heap, so you do not need to include them when determining the -Xmx setting (the maximum
memory size that the heap can reach for the JVM).
To change the bloom filter property on a table, use CQL. For example:

ALTER TABLE addamsFamily WITH bloom_filter_fp_chance = 0.1;

After updating the value of bloom_filter_fp_chance on a table, Bloom filters need to be regenerated in one of
these ways:

• Initiate compaction

• Upgrade the SSTables to compute new bloom filters:


# Force all SSTables to be rewritten

$ nodetool upgradesstables -a

# Force upgrade of target SSTables

$ nodetool upgradesstables -a keyspace_name table_name

If the SSTables are already on the current version, the nodetool upgradesstables command returns
immediately and no action is taken. You must use the -a command argument to force the SSTable
upgrade.

You do not have to restart DataStax Enterprise after regenerating SSTables.


Configuring memtable thresholds
Configuring memtable thresholds can improve write performance.
The database flushes memtables to disk, creating SSTables when the commit log space threshold or the
memtable cleanup threshold has been exceeded. Configure the commit log space threshold per node in the
cassandra.yaml. How you tune memtable thresholds depends on your data and write load. Increase memtable
thresholds under either of these conditions:

• The write load includes a high volume of updates on a smaller set of data.

• A steady stream of continuous writes occurs. This action leads to more efficient compaction.

Allocating memory for memtables reduces the memory available for caching and other internal database
structures, so tune carefully and in small increments.
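As a sketch, the related cassandra.yaml settings look like this (the values are illustrative, not
recommendations):

memtable_heap_space_in_mb: 2048
memtable_cleanup_threshold: 0.11
commitlog_total_space_in_mb: 8192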

Data caching
Configuring data caches
DataStax Enterprise includes integrated caching and distributes cache data around the cluster.
When a node goes down, the client can read from another cached replica of the data. The database architecture
also facilitates troubleshooting because there is no separate caching tier, and cached data matches what is
in the database exactly. The integrated cache alleviates the cold start problem by saving the cache to disk
periodically. The database reads contents back into the cache and distributes the data when it restarts. The
cluster does not start with a cold cache.
The saved key cache files include the ID of the table in the file name. A saved key cache filename for the users
table in the mykeyspace keyspace looks similar to:
mykeyspace-users.users_name_idx-19bd7f80352c11e4aa6a57448213f97f-KeyCache-
b.db2046071785672832311.tmp

About the row cache


Utilizing appropriate OS page cache results in better performance than using row caching. Consult resources
for page caching for the operating system on which DataStax Enterprise is hosted.

Configure the number of rows to cache in a partition by setting the rows_per_partition table option. To cache
rows, if the row key is not already in the cache, the database reads the first portion of the partition and puts the
data in the cache. If the newly cached data does not include all cells configured by the user, the database performs
another read. The actual size of the row cache depends on the workload. Benchmark your
application to determine the best row cache size to configure.
There are two row cache options: the older serializing cache provider and the newer off-heap cache (OHC) provider.
The OHC provider has been benchmarked as performing about 15% better than the older option.
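The provider is selected with the row_cache_class_name setting in cassandra.yaml; for example, the following
line selects the OHC provider:

row_cache_class_name: org.apache.cassandra.cache.OHCProvider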


Using the row cache


Enable a row cache only when the number of reads is much larger than the number of writes (a rule of thumb is at
least 95% reads). Consider using the chunk cache instead of the row cache, because writes to a partition invalidate the
whole partition in the cache.
Disable caching entirely for archive tables, which are infrequently read.

Enabling and configuring caching


Use CQL to enable or disable caching by configuring the caching table property. Set parameters in the
cassandra.yaml file to configure global caching properties:

• Row cache size

• How often DataStax Enterprise saves row caches to disk

Configuring the row_cache_size_in_mb (in the cassandra.yaml configuration file) determines how much space
in memory the database allocates to store rows from the most frequently read partitions of the table.
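For example, the following cassandra.yaml sketch allocates a 200 MB row cache that is never saved to disk
(the values are illustrative):

row_cache_size_in_mb: 200
row_cache_save_period: 0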

1. Set the table caching property that configures the partition key cache and the row cache.

CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text
)
WITH caching = { 'rows_per_partition' : '120' };

Tips for efficient cache use


Some tips for efficient cache use are:

• Store lower-demand data or data with extremely long partitions in a table with minimal or no caching.

• Deploy a large number of transactional nodes under a relatively light load per node.

• Logically separate heavily-read data into discrete tables.

The Tuning the row cache in Cassandra 2.1 blog describes best practices for using the built-in caching
mechanisms and designing an effective data model.

When you query a table, turn on tracing to check that the table actually gets data from the cache rather than
from disk. The first time you read data from a partition, the trace shows this line below the query because the
cache has not been populated yet:

Row cache miss [ReadStage:41]

In subsequent queries for the same partition, look for a line in the trace that looks something like this:

Row cache hit [ReadStage:55]

This output means the data was found in the cache and no disk read occurred. Updates invalidate the cache. If
you query rows in the cache plus uncached rows, request more rows than the global limit allows, or the query
does not grab the beginning of the partition, the trace might include a line that looks something like this:

Ignoring row cache as cached value could not satisfy query [ReadStage:89]

This output indicates that an insufficient cache caused a disk read. Requesting rows not at the beginning of
the partition is a likely cause. Try removing constraints that might cause the query to skip the beginning of the
partition, or place a limit on the query to prevent results from overflowing the cache. To ensure that the query
hits the cache, try increasing the cache size limit, or restructure the table to position frequently accessed rows
at the head of the partition.
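A minimal cqlsh session for checking cache behavior might look like this (the keyspace and table names are
hypothetical):

TRACING ON;
SELECT * FROM mykeyspace.users WHERE userid = 'jsmith';

With tracing on, each query result is followed by its trace, which contains the row cache lines shown above.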
Monitoring and adjusting caching
In the event of high memory consumption, consider tuning data caches.
Make changes to cache options in small, incremental adjustments, then monitor the effects of each change using
nodetool info.
For the row cache and key cache, the output of nodetool info reports the following metrics:

• Cache size in bytes

• Capacity in bytes

• Number of hits

• Number of requests

• Recent hit rate

• Duration in seconds after which the database saves the key cache.

For example, on start-up, the information from nodetool info might look something like this:

ID : 387d15ba-7103-491b-9327-1a691dbb504a
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 65.87 KB
Generation No : 1400189757
Uptime (seconds) : 148760
Heap Memory (MB) : 392.82 / 1996.81
datacenter : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 10, size 728 (bytes), capacity 103809024 (bytes), 93 hits, 102
requests, 0.912 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN
recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 (bytes), capacity 51380224 (bytes), 0 hits, 0
requests, NaN recent hit rate, 7200 save period in seconds
Token : -9223372036854775808

Compacting and compressing


Configuring compaction
As discussed in How is data maintained?, the compaction process merges keys, combines columns, evicts
tombstones, consolidates SSTables, and creates a new index in the merged SSTable.
In the cassandra.yaml file, you configure these global compaction parameters:

• snapshot_before_compaction

• concurrent_compactors

• compaction_throughput_mb_per_sec

The compaction_throughput_mb_per_sec parameter is designed for use with large partitions. The database
throttles compaction to this rate across the entire system.
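As a sketch, these parameters appear in cassandra.yaml as follows (the values are illustrative):

snapshot_before_compaction: false
concurrent_compactors: 2
compaction_throughput_mb_per_sec: 16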
DataStax Enterprise provides a start-up option for testing compaction strategies without affecting the production
workload.


DataStax Enterprise supports the following compaction strategies, which you can configure using CQL:

• LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed,
relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are
guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous.
Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously
being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping
SSTables in the next level. This process can improve performance for reads, because the database can
determine which SSTables in each level to check for the existence of row key data. This compaction
strategy is modeled after Google's LevelDB implementation. Also see LCS compaction subproperties.

• SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers a
minor compaction when there are a number of similar sized SSTables on disk as configured by the table
subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace. Also see
STCS compaction subproperties.

• TimeWindowCompactionStrategy (TWCS): This strategy is an alternative for time series data. TWCS
compacts SSTables using a series of time windows. Within a time window, TWCS compacts all
SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of
these SSTables are compacted into a single SSTable. Then the next time window starts and the process
repeats. The duration of the time window is the only setting required. See TWCS compaction subproperties.
For more information about TWCS, see How is data maintained?.

• DateTieredCompactionStrategy (DTCS) (deprecated).

To configure the compaction strategy property and CQL compaction subproperties, such as the maximum
number of SSTables to compact and minimum SSTable size, use CREATE TABLE or ALTER TABLE.

1. Update a table to set the compaction strategy using the ALTER TABLE statement.

ALTER TABLE users WITH
  compaction = { 'class' : 'LeveledCompactionStrategy' };

2. Change the compaction strategy property to SizeTieredCompactionStrategy and specify the minimum
number of SSTables to trigger a compaction using the CQL min_threshold attribute.

ALTER TABLE users
  WITH compaction =
  { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };
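
For a time series table, a TWCS configuration might look like the following sketch (the table name and the
one-day window are illustrative):

ALTER TABLE sensor_readings
  WITH compaction = { 'class' : 'TimeWindowCompactionStrategy',
                      'compaction_window_unit' : 'DAYS',
                      'compaction_window_size' : 1 };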

You can monitor the results of your configuration using compaction metrics; see Compaction metrics.
What's next: DataStax Enterprise supports extended logging for Compaction. This utility must be configured
as part of the table configuration. The extended compaction logs are stored in a separate file. For details, see
Enabling extended compaction logging.
Compression
Compression maximizes the storage capacity of DataStax Enterprise (DSE) nodes by reducing the volume of
data on disk and disk I/O, particularly for read-dominated workloads. The database quickly finds the location
of rows in the SSTable index and decompresses the relevant row chunks. DSE uses a storage engine that
dramatically reduces disk volume automatically. See Putting some structure in the storage engine.
Write performance is not negatively impacted by compression in DataStax Enterprise as it is in traditional
databases. In traditional relational databases, writes require overwrites to existing data files on disk. The
database has to locate the relevant pages on disk, decompress them, overwrite the relevant data, and finally
recompress. In a relational database, compression is an expensive operation in terms of CPU cycles and disk I/
O. Because SSTable data files are immutable (they are not written to again after they have been flushed to disk),
there is no recompression cycle necessary in order to process writes. SSTables are compressed only once when
they are written to disk. Writes on compressed tables can show up to a 10 percent performance improvement.
In DSE the commit log can also be compressed and write performance can be improved 6-12%. See the
Updates to Cassandra’s Commit Log in 2.2 blog.


When to compress data


Compression is most effective on a table with many rows, where each row contains the same set of columns (or
the same number of columns) as all other rows. For example, a table containing user data such as username,
email and state is a good candidate for compression. The greater the similarity of the data across rows, the
greater the compression ratio and gain in read performance.
A table whose rows contain differing sets of columns is not well-suited for compression.
Depending on the data characteristics of the table, compressing its data can result in:

• 25-33% reduction in data size

• 25-35% performance improvement on reads

• 5-10% performance improvement on writes

After configuring compression on an existing table, subsequently created SSTables are compressed. Existing
SSTables on disk are not compressed immediately. DataStax Enterprise compresses existing SSTables
when the normal database compaction process occurs. You can force existing SSTables to be rewritten and
compressed by using nodetool upgradesstables or nodetool scrub.
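For example, to rewrite the SSTables of a single table with the new compression settings:

$ nodetool upgradesstables -a keyspace_name table_name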
Configuring compression
You configure a table property and subproperties to manage compression. CQL table properties describes the
available options for compression. Compression is enabled by default.

• Disable compression, using CQL to set the compression parameter enabled to false.

CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
)
WITH compression = { 'enabled' : false };

• Enable compression on an existing table, using ALTER TABLE to set the compression algorithm class to
LZ4Compressor, SnappyCompressor, or DeflateCompressor.

ALTER TABLE DogTypes
  WITH compression = { 'class' : 'LZ4Compressor' };

• Change compression on an existing table, using ALTER TABLE and setting the compression algorithm
class to DeflateCompressor.

ALTER TABLE CatTypes
  WITH compression = { 'class' : 'DeflateCompressor', 'chunk_length_in_kb' : 64 };

Tune data compression on a per-table basis using CQL to alter a table.

Testing compaction and compression


Write survey mode is a start-up option for testing new compaction and compression strategies. In write survey
mode, you can test out new compaction and compression strategies on that node and benchmark the write
performance differences, without affecting the production cluster.
Write survey mode adds a node to a database cluster. The node accepts all write traffic as if it were part of the
normal cluster, but the node does not officially join the ring.
You can also use the write survey mode to try out a new product version. The nodes you add in write survey
mode to a cluster must be of the same major release version as other nodes in the cluster. The write survey
mode relies on the streaming subsystem that transfers data between nodes in bulk and differs from one major
release to another.


If you want to see how read performance is affected by modifications, stop the node, bring it up as a standalone
machine, and then benchmark read operations on the node.

1. Start the node using the write_survey option:

• Package installations: Add the following option to cassandra-env.sh file:

JVM_OPTS="$JVM_OPTS -Dcassandra.write_survey=true"

• Tarball installations: Start the node with this option:

$ cd installation_location && sudo bin/cassandra -Dcassandra.write_survey=true

Migrating data to DataStax Enterprise


DataStax Enterprise (DSE) uses several solutions for migrating data from other databases:

• Use DataStax Bulk Loader (dsbulk) to load and unload CSV or JSON data in and out of the DSE database (see the example after this list).

• DSE Graph Loader is a command line utility for loading graph datasets into DSE Graph from various input
sources.

• The CQL COPY command mirrors the file import/export functionality of the PostgreSQL RDBMS.
You can use COPY in the CQL shell to read CSV data into DSE and write CSV data from DSE to a file system.
Typically, an RDBMS has unload utilities for writing table data to a file system.

• The sstableloader provides the ability to bulk load external data into a cluster.

• DSE Analytics can use Apache Spark to connect to a wide variety of data sources and save the data to DSE
using either the older RDD or newer DataFrame method.
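For reference, a minimal dsbulk load invocation looks like the following sketch (the keyspace, table, and file
names are hypothetical):

$ dsbulk load -url export.csv -k mykeyspace -t users -header true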

The DataStax Apache Kafka™ Connector synchronizes records from a Kafka topic with rows in one or more DSE
database tables.
ETL tools
If you need more sophistication applied to a data movement situation than just extract-load, you can use
any number of extract-transform-load (ETL) solutions that support DataStax Enterprise. These tools provide
transformation routines for manipulating source data and then loading the data into a DSE target. The tools offer
features such as visual, point-and-click interfaces, scheduling engines, and more.
Many ETL vendors who support DSE supply community editions of their products that are free and able to solve
many different use cases. Enterprise editions are also available.
You can download ETL tools that work with DSE from Talend, Informatica, and Streamsets.

Collecting node health and indexing status scores


Node health options are always enabled for all nodes. Node health is a score-based representation of how fit a
node is to handle search queries. The node health composite score is based on dropped mutations and uptime. A
dynamic health score between 0 and 1 describes the health of the specified DataStax Enterprise node:

• A higher score indicates better node health. The highest score is 1.

• A lower score applies to nodes that have a large number of dropped mutations and nodes that are just
started.


On DSE Search nodes, the shard selection algorithm takes into account proximity and secondary factors such as
active and indexing statuses. You can examine node health scores and indexing status. The indexing status is
INDEXING, FINISHED, or FAILED.
Replication selection for distributed search queries can be configured to consider node health when multiple
candidates exist for a particular token range. This health-based routing enables a trade-off between index
consistency and query throughput. When the primary concern is performance, do not enable health-based routing.

1. In the dse.yaml file:

a. Customize node health options to increase the node health score from 0 to 1 (full health):

node_health_options:
refresh_rate_ms: 50000
uptime_ramp_up_period_seconds: 10800
dropped_mutation_window_minutes: 30


If a node is repairing after a period of downtime, try increasing the
uptime_ramp_up_period_seconds value to the expected repair time.

b. To enable replication selection for distributed search queries to consider node health, enable health-
based routing:

enable_health_based_routing: true

Health-based routing enables a trade-off between index consistency and query throughput. When
the primary concern is performance, do not enable health-based routing.

2. To retrieve a dynamic health score between 0 and 1 that describes the specified DataStax Enterprise node,
use the dsetool node_health command.
For example:

$ dsetool -h 200.192.10.11 node_health

Node Health [0,1]: 0.7

If you do not specify the IP address, the default is the local DataStax Enterprise node.
Specify dsetool node_health -all to retrieve the node health scores for all nodes.
You can also see node health scores with dsetool status.

3. To retrieve the dynamic indexing status (INDEXING, FINISHED, or FAILED) of the specified core on a node,
use the dsetool core_indexing_status command.


For example:

$ dsetool -h 200.192.10.11 core_indexing_status wiki.solr

wiki.solr: INDEXING

Clearing the data from DataStax Enterprise


Remove all data from any type of installation.
Package installation
To clear the data from the default directories:

1. Stop the service.

2. Run one of the following commands:

$ sudo rm -rf /var/lib/cassandra/* ## Remove all data

$ sudo rm -rf /var/lib/cassandra/data/* ## Remove only the data directories

Tarball installation
To clear all data from the default directories:

1. Stop the DataStax Enterprise process.

2. Remove the data from the installation location:

$ cd installation_location

Run one of the following commands:

$ sudo rm -rf data/* commitlog/* saved_caches/* hints/* ## Remove all data

$ sudo rm -rf data/* ## Remove only the data directories

Chapter 10. Planning
Hardware selection, estimating disk capacity, anti-patterns, cluster testing and more.
