DSE 6.0 Administrator Guide
Latest 6.0 patch: 6.0.13
Updated: 2020-09-18-07:00
© 2020 DataStax, Inc. All rights reserved.
DataStax, Titan, and TitanDB are registered trademarks of DataStax,
Inc. and its subsidiaries in the United States and/or other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks
of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
Contents
Chapter 1. Getting started
  New features
Chapter 4. Configuration
  cassandra.yaml
  dse.yaml
  remote.yaml
  cassandra-rackdc.properties
  cassandra-topology.properties
  Cassandra
  JMX
  TPC
  LDAP
  Kerberos
  NodeSync
Logging configuration
DSE Graph
  Architecture
  Terminology
  Keyspaces
  Data types
  Operations
  CQL queries
  Metrics
nodetool
  abortrebuild
  assassinate
  bootstrap
  cfhistograms
  cfstats
  cleanup
  clearsnapshot
  compact
  compactionhistory
  compactionstats
  decommission
  describecluster
  describering
  disableautocompaction
  disablebackup
  disablebinary
  disablegossip
  disablehandoff
  disablehintsfordc
  drain
  enableautocompaction
  enablebackup
  enablebinary
  enablegossip
  enablehandoff
  enablehintsfordc
  failuredetector
  flush
  garbagecollect
  gcstats
  getbatchlogreplaythrottle
  getcachecapacity
  getcachekeystosave
  getcompactionthreshold
  getcompactionthroughput
  getconcurrentcompactors
  getconcurrentviewbuilders
  getendpoints
  gethintedhandoffthrottlekb
  getinterdcstreamthroughput
  getlogginglevels
  getmaxhintwindow
  getseeds
  getsstables
  getstreamthroughput
  gettimeout
  gettraceprobability
  gossipinfo
  handoffwindow
  help
  info
  inmemorystatus
  invalidatecountercache
  invalidatekeycache
  invalidaterowcache
  join
  listendpointspendinghints
  leaksdetection
  listsnapshots
  mark_unrepaired
  move
  netstats
  nodesyncservice
  pausehandoff
  proxyhistograms
  rangekeysample
  rebuild
  rebuild_index
  rebuild_view
  refresh
  refreshsizeestimates
  reloadseeds
  reloadtriggers
  relocatesstables
  removenode
  repair
  replaybatchlog
  resetlocalschema
  resume
  resumehandoff
  ring
  scrub
  sequence
  setbatchlogreplaythrottle
  setcachecapacity
  setcachekeystosave
  setcompactionthreshold
  setcompactionthroughput
  setconcurrentcompactors
  setconcurrentviewbuilders
  sethintedhandoffthrottlekb
  setinterdcstreamthroughput
  setlogginglevel
  setmaxhintwindow
  setstreamthroughput
  settimeout
  settraceprobability
  sjk
  snapshot
  status
  statusbackup
  statusbinary
  statusgossip
  statushandoff
  stop
  stopdaemon
  tablehistograms
  tablestats
  toppartitions
  tpstats
  truncatehints
  upgradesstables
  verify
  version
  viewbuildstatus
dse commands
  About dse commands
  add-node
  advrep
  beeline
  cassandra
  cassandra-stop
  exec
  fs
  gremlin-console
  list-nodes
  pyspark
  remove-node
  spark
  spark-class
  spark-jobserver
  spark-history-server
  spark-sql
  spark-sql-thriftserver
  spark-submit
  SparkR
  -v
dse client-tool
  cassandra
  configuration export
  configuration byos-export
  spark
  alwayson-sql
nodesync
  disable
  enable
  help
  tracing
  validation
  append
  cat
  cd
  chgrp
  chmod
  chown
  cp
  df
  du
  echo
  exit
  fsck
  get
  ls
  mkdir
  mv
  put
  pwd
  realpath
  rename
  rm
  rmdir
  stat
  truncate
  umount
dsetool
  core_indexing_status
  create_core
  createsystemkey
  encryptconfigvalue
  get_core_config
  get_core_schema
  help
  index_checks
  infer_solr_schema
  inmemorystatus
insights_config...........................................................................................................................................872
insights_filters............................................................................................................................................875
list_index_files........................................................................................................................................... 877
list_core_properties................................................................................................................................... 879
list_subranges........................................................................................................................................... 880
listjt............................................................................................................................................................ 881
managekmip revoke..................................................................................................................................884
managekmip destroy.................................................................................................................................885
node_health...............................................................................................................................................886
partitioner.................................................................................................................................................. 887
perf............................................................................................................................................................ 888
read_resource........................................................................................................................................... 891
rebuild_indexes......................................................................................................................................... 892
reload_core............................................................................................................................................... 894
ring............................................................................................................................................................ 896
set_core_property..................................................................................................................................... 897
sparkmaster cleanup.................................................................................................................................899
stop_core_reindex.....................................................................................................................................902
tieredtablestats.......................................................................................................................................... 903
tsreload......................................................................................................................................................905
unload_core...............................................................................................................................................906
upgrade_index_files.................................................................................................................................. 907
write_resource...........................................................................................................................................908
fs-stress tool..............................................................................................................................................920
sstabledowngrade..................................................................................................................................... 922
sstabledump.............................................................................................................................................. 924
sstableexpiredblockers.............................................................................................................................. 930
sstablelevelreset........................................................................................................................................931
sstableloader............................................................................................................................................. 933
sstablemetadata........................................................................................................................................ 935
sstableofflinerelevel...................................................................................................................................939
sstablepartitions........................................................................................................................................ 941
sstablerepairedset..................................................................................................................................... 944
sstablescrub.............................................................................................................................................. 946
sstablesplit.................................................................................................................................................948
sstableupgrade..........................................................................................................................................950
sstableutil.................................................................................................................................................. 951
sstableverify.............................................................................................................................................. 953
DataStax tools.................................................................................................................................................954
Starting as a service.................................................................................................................................957
Repairing nodes..............................................................................................................................................995
Compression........................................................................................................................................... 1010
• DataStax Enterprise-based applications and clusters differ significantly from relational databases: the data
model is designed around the types of queries the application runs, not around modeling entities and
relationships. Architecture in brief contains key concepts and terminology for understanding the database.
• You can use DSE OpsCenter and Lifecycle Manager for most administrative tasks.
• Save yourself time and frustration by spending a few moments with DataStax Doc and Search tips. These
short topics cover navigation and bookmarking aids that make your journey through the docs more efficient
and productive.
The following are not administrator specific but are presented to give you a fuller picture of the database:
• Cassandra Query Language (CQL) is the query language for DataStax Enterprise.
• DataStax provides drivers in several programming languages for connecting client applications to the
database.
• APIs are available for OpsCenter, DseGraphFrame, DataStax Spark Cassandra Connector, and the drivers.
Plan
The Planning and testing guide contains guidelines for capacity planning and hardware selection in production
environments. Key topics include:
• Estimating RAM
• CPU recommendations
Install
DataStax offers a variety of ways to set up a cluster:
Cloud
On premises
• Docker images
• Binary tarball
For help with choosing an install type, see Which install method should I use?
Secure
DSE Advanced Security provides fine-grained user and access controls to keep applications data protected and
compliant with regulatory standards like PCI, SOX, HIPAA, and the European Union’s General Data Protection
Regulation (GDPR). Key topics include:
The DSE database includes the default role cassandra with password cassandra. This superuser login has
full access to the database. DataStax recommends using the cassandra role only once, during initial Role Based
Access Control (RBAC) setup, to establish your own root account, and then disabling the cassandra role. See
Adding a superuser login.
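As a sketch, that recommended sequence might look like the following CQL (the role name dba and the password are placeholders, not values from this guide):

```sql
-- Run once as the default cassandra superuser.
-- 'dba' and the password are placeholder values.
CREATE ROLE dba WITH SUPERUSER = true AND LOGIN = true
  AND PASSWORD = 'choose-a-strong-password';

-- Reconnect as the new superuser, then lock down the default role:
ALTER ROLE cassandra WITH SUPERUSER = false AND LOGIN = false;
```

See Adding a superuser login for the authoritative procedure.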
Tune
Important topics for optimizing the performance of the database include:
Operations
The most commonly used operations include:
• Tools
Load
The primary tools for getting data into and out of the database are:
• DSE OpsCenter
Troubleshooting
• Troubleshooting guide
Upgrading
Key topics in the Upgrade Guide include:
Advanced Functionality
See Advanced functionality in DataStax Enterprise 6.0.
DSE Management Services automatically handle administration and maintenance tasks and assist with
overall database cluster management.
NodeSync service
Continuous background repair that virtually eliminates manual efforts to run repair operations in a
DataStax cluster.
Advanced Replication
Advanced Replication allows a single cluster to have a primary hub with multiple spokes. This allows
configurable, bi-directional distributed data replication to and from source and destination clusters.
DSE In-Memory
Store and access data exclusively from memory.
DSE Multi-Instance
Run multiple DataStax Enterprise nodes on a single host machine.
DSE Tiered Storage
Automate data movement across different types of storage media.
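For instance, the NodeSync service described above is enabled per table through CQL; a minimal sketch (the keyspace and table names are placeholders):

```sql
-- Turn on continuous background repair for one table.
-- 'cycling.comments' is a placeholder name.
ALTER TABLE cycling.comments WITH nodesync = { 'enabled' : 'true' };
```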
Feature Description
NodeSync: DSE NodeSync removes the need for manual repair operations in DSE's distribution of Cassandra and eliminates
cluster outages that are attributed to manual repair failures. This equates to operational cost savings, reduced support
cycles, and reduced application management pain. NodeSync also makes applications run more predictably, making
capacity planning easier. NodeSync's advantages for operational simplicity extend across the whole data layer,
including database, search, and analytics.
Be sure to read the DSE NodeSync: Operational Simplicity at its Best blog.
Advanced Performance: DSE Advanced Performance delivers numerous performance advantages over open-source Apache Cassandra,
including:
• Thread per core (TPC) and asynchronous architecture: With a coordination-free design, DSE's thread-per-core
architecture provides up to 2x more throughput for read and write operations.
• Storage engine optimizations that provide up to half the latency of open source Cassandra and include optimized
compaction.
• DataStax Bulk Loader: up to 4x faster loads and unloads of data compared to current data loading utilities.
Be sure to read the Introducing DataStax Bulk Loader blog.
• Continuous paging improves DSE Analytics read performance by up to 3x over open source Apache Cassandra
and Apache Spark.
DSE TrafficControl: DSE TrafficControl provides a backpressure mechanism to avoid overloading DSE nodes with client or replica
requests that could make DSE nodes unresponsive or lead to long garbage collections and out-of-memory errors. DSE
TrafficControl is enabled by default and comes pre-tuned to accommodate very different workloads, from simple reads
and writes to the most extreme workloads. It requires no configuration.
Automated Upgrades for patch releases: Part of OpsCenter Lifecycle Manager, the Upgrade Service handles patch upgrades of
DSE clusters at the data center, rack, or node level with up to 60% less manual involvement. The Upgrade Service allows
you to easily clone your existing configuration profile to ensure compatibility with DSE upgrades. Be sure to read the
Taking the Pain Out of Database Upgrades blog.
• AlwaysOn SQL, with advanced security, ensures around-the-clock uptime for analytics queries with the freshest,
most secure insight. It is interoperable with existing business intelligence tools that use ODBC/JDBC and other
Spark-based tools. Be sure to read the Introducing AlwaysOn SQL for DSE Analytics blog.
• Structured Streaming: simple, efficient, and robust streaming of data from Apache Kafka, file systems, or other
sources.
• Enhanced Spark SQL support allows you to execute Spark queries using a variation of the SQL language. Spark
SQL includes APIs for returning Spark Datasets in Scala and Java, and can be used interactively through an SQL
shell or visually through DataStax Studio notebooks.
Be sure to read the What’s New for DataStax Enterprise Analytics 6 blog.
• Better throughput for DSE Graph due to Advanced Performance improvements, resulting in DSE Graph handling
more requests per node.
• Smart Analytics Query Routing: the DSE Graph engine automatically routes a Gremlin OLAP traversal to the
correct implementation (DSE Graph Frames or Gremlin OLAP) for the fastest and best execution.
• Advanced Schema Management provides the ability to remove any graph schema element, not just vertex labels
or properties.
• The Batches in DSE Graph Fluent API adds the ability to execute DSE Graph statements in batches to speed up
writes to DSE Graph.
• TinkerPop 3.3.0: DataStax has added many enhancements to the Apache TinkerPop™ tool suite. These
enhancements provide faster, more robust graph querying and a better developer experience.
• Private Schemas: Control who can see what parts of a table definition, critical for security compliance best
practices.
• Separation of Duties: Create administrator roles who can carry out everyday administrative tasks without having
unnecessary access to data.
• Auditing by Role: Focus your audits on the users you need to scrutinize. You can now elect to audit activity by user
type and increase the signal to noise ratio by removing application tier system accounts from the audit trail.
• Unified Authorization for DSE Analytics: Additional protection for data used for analytics operations.
Be sure to read the Safe data? Check. DataStax Enterprise Advanced Security blog.
DSE Search: Built with a production-certified version of Apache Solr™ 6, DSE Search requires less configuration and provides
improved search data consistency and a more synchronous write path for indexing data, with fewer moving pieces to tune and
monitor. DSE 5.1 introduced index management CQL and cqlsh commands to streamline operations and development. DSE 6.0
adds a wider array of CQL query functionality and indexing support.
Be sure to read the What’s New for Search in DSE 6 blog.
• The Batches in DSE Graph Fluent API adds the ability to execute DSE Graph statements in batches to speed up
writes to DSE Graph.
• The C# and Node.js DataStax drivers, as well as the Java and Python drivers, include Batches in the DSE Graph
Fluent API.
Be sure to read the What’s New With Drivers for DSE 6 blog.
DataStax Studio: Improvements to DataStax Studio that further ease DSE development include:
• Notebook Sharing: Easily collaborate with your colleagues to develop DSE applications using the new import and
export capabilities.
• Spark SQL support: Query and analyze data with Spark SQL using DataStax Studio's visual and intelligent
notebooks, which provide syntax highlighting, auto-code completion and correction, and more.
• Interactive Graphs: explore and configure DSE Graph schemas with a whiteboard-like view that allows you to drag
your vertices and edges.
• Notebook History: provides a historical dated record with descriptions and change events that makes it easy to
track and roll back changes.
Chapter 2. DataStax Enterprise release notes
Release notes for DataStax Enterprise 6.0.
Before you upgrade to a later major version, upgrade to the latest patch release (6.0.13) on your current
version. Be sure to read the relevant upgrade documentation. Upgrades to DSE 6.0 are supported from:
• DSE 5.1
• DSE 5.0
Check the compatibility page for your products. DSE 6.0 product compatibility:
• OpsCenter 6.5
• Studio 6.0
See Upgrading DataStax drivers. You may need to recompile your DataStax driver client application code.
Use DataStax Bulk Loader for loading and unloading data. It loads data into DSE 5.0 or later and unloads
data from any Apache Cassandra™ 2.1 or later data source.
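For example, typical Bulk Loader invocations look like the following sketch (the keyspace, table, and file paths are placeholders):

```
$ dsbulk load -k mykeyspace -t mytable -url export.csv
$ dsbulk unload -k mykeyspace -t mytable -url ./unload_dir
```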
6.0.13 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.13 are listed. Components that are updated for DSE 6.0.12 are indicated with an
asterisk (*).
• Netty 4.1.25.7.dse
DSE 6.0.13 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.
DataStax recommends upgrading all DSE Search nodes to DSE 6.0.13 or later.
• Fixed StackOverflowError thrown during read repairs (only large clusters or clusters with enabled vnodes
are affected). (DB-4350)
• Increased the default direct_reads_size_in_mb value. Previously it was 2M per core + 2M shared; it is now
4M per core + 4M shared. (DB-4348)
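The new default above scales with core count; a quick sketch of the resulting total (the 8-core figure is a hypothetical example, not from the release notes):

```shell
# New default: 4M per TPC core plus 4M shared.
cores=8          # hypothetical node with 8 TPC cores
per_core_mb=4
shared_mb=4
total_mb=$(( per_core_mb * cores + shared_mb ))
echo "direct reads buffer: ${total_mb}M"
```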
• Fixed slow indexing at bootstrap time due to early TPC boundary computation when a node is replaced by a
node with the same IP. (DB-4049)
• Fixed a problem with the treatment of zeroes in the type decimal that could cause assertion errors, or not
being able to find some rows if their key is 0 written using different precisions, or both. (DB-4472)
• Fixed the NullPointerException issue described in CASSANDRA-14200: NPE when dumping an SSTable
with null value for timestamp column. (DB-4512)
• Fixed an issue that caused excessive contention during encryption/decryption operations. The fix results
in an encryption/decryption performance improvement. (DB-4419)
• Fixed an issue to prevent an unbounded number of flushing tasks for memtables that are almost empty.
(DB-4376)
• Global BloomFilterFalseRatio is now calculated in the same way as table BloomFilterFalseRatio. Both
types of metrics now include true negatives; the formula is ratio = falsePositiveCount / (truePositiveCount +
falsePositiveCount + trueNegativeCount). (DB-4439)
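The formula above can be checked with a quick sketch (the counter values are hypothetical, for illustration only):

```shell
# ratio = falsePositiveCount / (truePositiveCount + falsePositiveCount + trueNegativeCount)
fp=10; tp=90; tn=900   # hypothetical counter values
ratio=$(awk -v fp="$fp" -v tp="$tp" -v tn="$tn" \
  'BEGIN { printf "%.4f", fp / (tp + fp + tn) }')
echo "BloomFilterFalseRatio: $ratio"
```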
• Fixed a bug whereby, after a node replacement procedure, the bootstrap indexing in DSE Search happened on
only one TPC core. (DB-4049)
• Systemd units are included for DSE packages for CentOS and compatible OSes. (DSP-7603)
• The server_host option in dse.yaml now handles multiple, comma-separated LDAP server addresses.
(DSP-20833)
• Cassandra tools now work on encrypted SSTables when security is configured. (DSP-20940)
• Recording a slow CQL query to the log will no longer block the thread. (DSP-20894)
• The frequency of range queries performed by the lease manager is now configurable via the
dse.lease.refresh.interval.seconds system property, in addition to JMX and the dsetool command.
(DSP-20696)
• Security updates:
# The jackson-databind library has been upgraded to 2.9.10.4 to address a Jackson databind vulnerability
(CVE-2020-8840) (DSP-20981)
# Fixed some security vulnerabilities for the Solr HTTP REST API when authorization is enabled. Users
without the appropriate permissions can no longer perform search operations. Resources can be deleted when
authorization is enabled, given the correct permissions. (DSP-20749)
# Fixed an issue where the audit logging did not capture search queries. (DSP-21058)
# While there is no change in default behavior, there is a new render_cql_literals option in dse.yaml
under the audit logging section, which is false by default. When enabled, bound variables for logged
statements will be rendered as CQL literals, which means there will be additional quotation marks and
escaping, and values of all complex types (collections, tuples, UDTs) will be in human-readable format.
(DSP-17032)
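Assuming the option sits alongside the other settings in the audit logging section of dse.yaml (audit_logging_options), the fragment might look like this sketch:

```yaml
# dse.yaml fragment (sketch, not a complete configuration)
audit_logging_options:
    enabled: true
    # New option; defaults to false. When true, bound variables for
    # logged statements are rendered as CQL literals.
    render_cql_literals: true
```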
# Fixed LDAP settings to properly handle nested groups so that LDAP enumerates all ancestors of a
user's distinguishedName. Inherited groups are retrieved with the directory_search and members_search types.
Fixed fetching parent groups of a role that is mapped to an LDAP group. See the new dse.yaml options,
all_groups_xxx in ldap_options, to configure optimized retrieval of parent groups, including inherited
ones, in a single roundtrip. (DSP-20107)
# When DSE tries one authentication scheme and finds that the password is invalid, DSE now tries
another scheme, but only if the user has a scheme permission for that other scheme. (DSP-20903)
# Raised the upper bound limit on DSE LDAP caches. The upper limit for
ldap_options.credentials_validity_in_ms has been increased to 864,000,000 ms, which is
10 days. The upper limit for ldap_options.search_validity_in_seconds has been increased to
864,000 seconds, which is also 10 days. (DSP-21072)
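The stated limits both work out to 10 days; a quick check:

```shell
# Upper bounds raised in this release (values from the release note).
ms_limit=864000000   # ldap_options.credentials_validity_in_ms
s_limit=864000       # ldap_options.search_validity_in_seconds
days_from_ms=$(( ms_limit / 1000 / 60 / 60 / 24 ))
days_from_s=$(( s_limit / 60 / 60 / 24 ))
echo "${days_from_ms} days, ${days_from_s} days"
```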
# Fixed an error condition when DSE failed to get the LDAP roles while refreshing a database schema.
(DSP-21075)
6.0.13 DSEFS
• To minimize fsck impact on overloaded clusters, throttling is possible via the -p or --parallelism argument.
• Fixed an issue where an excessive number of connections are created to port 5599 when using DSEFS.
(DSP-21021)
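The fsck throttling option above might be invoked from the DSEFS shell like this sketch (the parallelism value and the prompt shape are illustrative):

```
$ dse fs
dsefs / > fsck --parallelism 2
```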
• Search-related latency metrics now decay over time like other metrics. Named queries (using the query.name
parameter) now have separate latency metrics. New MBean attributes are available for search
latency metrics: TotalLatency (us), Min, Max, Mean, StdDev, DurationUnit, MeanRate, OneMinuteRate,
FiveMinuteRate, FifteenMinuteRate, RateUnit, 98th, 999th. (DSP-19612)
• Fixed some security vulnerabilities for the Solr HTTP REST API when authorization is enabled. Users without
the appropriate permissions can no longer perform search operations. Resources can be deleted when authorization
is enabled, given the correct permissions. (DSP-20749)
• Fixed a bug where a decryption block cache occasionally was not operational (SOLR-14498). (DSP-20987)
• Fixed an issue where the audit logging did not capture search queries. (DSP-21058)
• Fixed a bug where, after several months of uptime, an encrypted index would not accept more writes unless
the core was reloaded. (DSP-21234)
6.0.12 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.12 are listed. Components that are updated for DSE 6.0.11 are indicated with an
asterisk (*).
• Apache Solr™ 6.0.1.1.2716
• Apache Spark™ 2.2.3.13
• Netty 4.1.25.6.dse
DataStax recommends upgrading all DSE Search nodes to DSE 6.0.12 or later.
• The frequency of range queries performed by the lease manager is now configurable via JMX and the dsetool
command. (DSP-20696)
• Added the dse.ldap.retry_interval.ms system property, which sets the time between subsequent retries
when attempting authentication against an LDAP server. (DSP-20298)
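System properties like dse.ldap.retry_interval.ms are typically set as JVM options; a sketch (the 3000 ms value is only an illustration):

```
# jvm.options fragment: retry LDAP authentication every 3 seconds
-Ddse.ldap.retry_interval.ms=3000
```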
• Removed Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)
• Added a new JMX attribute, ConnectionSearchPassword, for the LdapAuthenticator bean, which updates
the LDAP search password without the need to restart DSE. (DSP-18928)
• dsetool ring shows in-progress search index building during bootstrap. (DSP-15281)
• Made the search reference visible in the error message for LDAP connections. (DSP-20578)
• DecayingEstimatedHistogram now decays even when there are no updates so invalid metric values do not
linger. (DSP-20674)
• NodeSync can now be enabled on all system distributed and protected tables. (DB-3241)
• Improved the estimated values of histogram percentiles reported via JMX. In some cases, the percentiles
may go slightly up. (DB-4275)
• Added a --disable-history option to cqlsh that disables saving history to disk for the current execution.
Added a history section to cqlshrc with a boolean disabled parameter that is set to False by default.
(DB-3843)
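These give two ways to turn history off; a sketch of both (the cqlshrc layout is assumed from the release note):

```
$ cqlsh --disable-history

# cqlshrc fragment
[history]
disabled = true
```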
• Improved error messaging for enabled internode SSL encryption in Cassandra Tools test suite. (DB-3957)
• Security updates:
Resolved issues:
• Bug that prevented LIST ROLES and LIST USERS from working with system-keyspace-filtering enabled.
(DB-4221)
• Continuous paging sessions could leak if the continuous result sets on the driver side were not exhausted or
cancelled. (DB-4313)
• Error that caused nodetool viewbuildstatus to return an incorrect error message. (DB-2397)
Resolved issues:
• Internal continuous paging sessions were not closed when a LIMIT clause was added in a SQL query, which
caused a session leak and made it impossible to close the Spark application gracefully because the Java
driver waited indefinitely for orphaned sessions to finish. (DSP-19804)
• Removed Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)
• Security updates:
6.0.12 DSEFS
• The DSEFS local file system implementation returns alphabetically sorted directories and files when using
wildcards and the listing command. (DSP-20057)
• When creating a file through the WebHDFS API, DSEFS does not verify WX permissions of the parent's parent
when the parent exists. (DSP-20355)
Resolved issues:
• DSEFS could not use mixed-case keyspaces (broken by DSP-16825). (DSP-20354)
• Changed classic Graph query so vertices are read from _p tables in Cassandra using SELECT ... WHERE
<vertex primary key columns> statement. The search predicate is applied in memory. (DSP-20230)
• Error messages related to Solr errors contain a better description of the root cause. (DSP-13792)
• The dsetool stop_core_reindex command now mentions the node in the output message. (DSP-17090)
• Improved warnings for search index creation via dsetool or CQL. (DSP-17994)
• Improved guidance with warnings when index rebuild is required for ALTER SEARCH INDEX, RELOAD SEARCH
INDEX, and dsetool reload_core commands. (DSP-19347)
• The suggest request handler now requires SELECT permission. Previously, the suggest request handler returned
a forbidden response when authorization was on, regardless of the user's permissions. (DSP-20697)
• Security update:
6.0.11 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.11 are listed. Components that are updated for DSE 6.0.11 are indicated with an
asterisk (*).
• Netty 4.1.25.6.dse
DataStax recommends upgrading all DSE Search nodes to DSE 6.0.11 or later.
• Fixed a bug to avoid multiple disposals of Solr filter cache DocSet objects. (DSP-15765)
• Improved performance and logging, and added options for using the Solr timeAllowed parameter in all queries.
The Solr timeAllowed option in queries is now enforced by default to prevent long-running shard queries.
(DSP-19781, DSP-19790)
• Added support for the nodesync command to specify different IP addresses for JMX and CQL. (DB-2969)
• Prevent accepting streamed SSTables or loading SSTables when the clustering order does not match.
(DB-3530)
• Dropping and re-adding the same column with incompatible types is not supported. This change prevents
unreadable SSTables. (DB-3586)
Resolved issues:
• Reads against ma and mc SSTables hit more SSTables than necessary due to the bug fixed by
CASSANDRA-14861. (DB-3691)
• Error retrieving expired columns with secondary index on key components. (DB-3764)
• The diff logic used by the secondary index does not always pick the latest schema and results in ERROR
[CoreThread-8] errors on batch writes. (DB-3838)
• Fixed the concurrency factor calculation for distributed range reads, with a maximum of 10 times
the number of cores. The maximum concurrency factor is configurable with the new JVM argument
-Ddse.max_concurrent_range_requests. (DB-3859)
• AIO and DSE Metrics Collector are not available on RHEL/CentOS 6.x because GLIBC_2.14 is not present.
(DSP-18603)
• Using SELECT JSON for empty BLOB values incorrectly returns an empty string instead of the expected 0x.
(DSP-20022)
• RoleManager cache keeps invalid values if the LDAP connectivity is down. (DSP-20098)
• LDAP user login fails due to parsing failure on user DN with parentheses. (DSP-20106)
• New du DSEFS shell command lists the sizes of files and directories in a specific directory. (DSP-19572)
• Improve configuration of available system resources for Spark Workers. You can now set the total memory
and total cores with new environment variables that take precedence over the resource_manager_options
defined in dse.yaml. (DSP-19673)
dse.yaml resource_manager_options    Environment variable
memory_total                         SPARK_WORKER_TOTAL_MEMORY
cores_total                          SPARK_WORKER_TOTAL_CORES
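As an illustration, the Spark Worker environment variables above could be exported before starting the node. The values shown are placeholders, not recommendations; where to persist them (for example, in spark-env.sh) depends on your installation and is an assumption here.

```shell
# Illustrative only: these override the memory_total and cores_total
# values from the resource_manager_options section of dse.yaml.
export SPARK_WORKER_TOTAL_MEMORY=16g
export SPARK_WORKER_TOTAL_CORES=8
```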
• Support for multiple contact points is added for DSEFS implementation of the Hadoop FileSystem.
(DSP-19704)
Provide FileSystem URI with:
dsefs://host0[:port][,host1[:port]]/
Enhancements:
• The Solr timeAllowed option in queries is now enforced by default to prevent long-running shard queries.
This change prevents complex facets and boolean queries from using system resources after the DSE
Search coordinator considers the queries to have timed out. For all queries, the default timeAllowed
value uses the value of the client_request_timeout_seconds setting in dse.yaml. (DSP-19781, DSP-19790)
While using Solr timeAllowed in queries improves performance for long zombie queries, it can cause
increased per-request latency cost in mixed workloads. If the per-request latency cost is too high, use the
-Ddse.timeAllowed.enabled.default search system property to disable timeAllowed in your queries.
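As a minimal sketch of opting out, the property named above could be added to jvm.options. The file path below is a local placeholder; on real installs jvm.options lives in the DSE configuration directory.

```shell
# Append the system property that disables enforced timeAllowed.
# The property name comes from the release note; the path is a placeholder.
OPTS_FILE=./jvm.options
echo "-Ddse.timeAllowed.enabled.default=false" >> "$OPTS_FILE"
grep "timeAllowed" "$OPTS_FILE"
```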
Resolved issues:
• Apply filter cache optimization to remote shard requests when RF=N. (DSP-19800)
• Filter cache warming doesn't warm parent-only filter correctly when RF=N. (DSP-19802)
• Handle paging states serialized with a different version than the session version. (CASSANDRA-15176)
• 6.0.10 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.10 are listed. Components that are updated for DSE 6.0.10 are indicated with an
asterisk (*).
• Netty 4.1.13.13.dse
DSE 6.0.10 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.
• Fixed incorrect handling of frozen type issues to accept all valid CQL statements and reject all invalid CQL
statements. (DB-3084)
• Standalone cqlsh client tool provides an interface for developers to interact with the database and issue
CQL commands without having to install the database software. From DataStax Labs, download the version
of CQLSH that corresponds to your DataStax database version. (DSP-18694)
• New options to select cipher suite and protocol to configure KMIP encryption when connecting to a KMIP
server. (DSP-17294)
• Storing and revoking permissions for the application owner is removed. The application owner is explicitly
assumed to have these permissions. (DSP-19393)
• Fixed an issue where T values are hidden by property keys of the same name in valueMap(). (DSP-19261)
# facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the
-Dsolr.max.facet.limit.size system property.
# This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set
an explicit facet.limit.
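As a hypothetical illustration of setting an explicit facet.limit instead of relying on a negative value — the host, search index name, and facet field below are placeholders, not taken from this guide:

```shell
# Build a Solr HTTP facet query with an explicit, positive facet.limit.
QUERY="http://localhost:8983/solr/ks.tbl/select?q=*:*&facet=true&facet.field=category&facet.limit=100"
echo "$QUERY"
```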
• Improved troubleshooting. A log entry is now created when autocompaction is disabled or enabled for a
table. (DB-1635)
• Enhanced DroppedMessages logging output adds the size percentiles of the dropped messages, their most
common destinations, and the most common tables targeted for read requests or mutations. (DB-1250)
• Reformatted StatusLogger output to reduce details in the INFO level system.log. The detailed output is still
present in the debug.log. (DB-2552)
• For nodetool tpstats -F json and nodetool tpstats -F yaml, wait latencies (in ms) appear in the
output. Although not labeled, the wait latencies are included in the following order: 50%, 75%, 95%, 98%,
99%, Min, and Max. (DB-3401)
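Because the percentiles are unlabeled in the JSON/YAML output, a small sketch can pair the documented order with a sample row. The latency values below are placeholders, not real output.

```shell
# The release note gives the fixed order of the unlabeled wait latencies.
LABELS="50% 75% 95% 98% 99% Min Max"
VALUES="0.05 0.10 0.42 0.88 1.20 0.01 9.63"   # placeholder latencies in ms
set -- $VALUES
for L in $LABELS; do
  echo "$L: $1"
  shift
done
```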
• New resources improve debugging leaked chunks before the cache evicts them and provide more
meaningful call stack and stack trace. (DB-3504)
# RandomAccessReader/RandomAccessReader
# AsyncPartitionReader/FlowSource
# AsyncSSTableScanner/FlowSource
• New options to select cipher suite and protocol to configure KMIP encryption when connecting to a KMIP
server. (DSP-17294)
• Standalone cqlsh client tool provides an interface for developers to interact with the database and issue
CQL commands without having to install the database software. From DataStax Labs, download the version
of CQLSH that corresponds to your DataStax database version. (DSP-18694)
• Upgraded Apache MINA Core library to 2.0.21 to prevent a security issue where Apache MINA Core was
vulnerable to information disclosure. (DSP-19213)
• Update Jackson Databind to 2.9.9.1 for all components except DataStax Bulk Loader. (DSP-19441)
Resolved issues:
• Tarball installs that create two instances on the same physical server with remote JMX access, binding
separate IPs to port 7199, cause a JMX Address already in use (Bind failed) error because
com.sun.management.jmxremote.host is ignored. (DB-2483)
• Incorrect handling of frozen type issues: valid CQL statements are not accepted and invalid CQL statements
are not properly rejected. (DB-3084)
• DSE fails to start with ERROR Attempted serializing to buffer exceeded maximum of 65535 bytes. Improved
error to identify a workaround for commitlog corruption. (DB-3162)
• sstabledowngrade needs write access to the snapshot folder for a different output location. (DB-3231)
• The number of pending compactions reported by nodetool compactionstats was incorrect (off by one) for
Time Window Compaction Strategy (TWCS). (DB-3284)
• When unable to send mutations to replicas due to overloading, hints are mistakenly created against the local
node. (DB-3421)
• When a non-frozen UDT column is dropped and the table is later re-created from the schema that was
created as part of a snapshot, the dropped column record is invalid and may lead to failure loading some
SSTables. (DB-3434)
• An error in a custom provider prevents DSE node startup. With this fix, the node starts up but Insights
is not active. See the DataStax Support Knowledge Base for steps to resolve existing missing or incorrect
keyspace replication problems. (DSP-19521)
Known issues:
• On Oracle Linux 7.x, StorageService.java:4970 exception occurs with DSE package installation.
(DSP-19625)
Workaround: On Oracle Linux 7.x operating systems, install DSE using the binary tarball.
• Storing and revoking permissions for the application owner is removed. Instead of explicitly storing
permission of the application owner to manage and view Spark applications, the application owner is
explicitly assumed to have these permissions. (DSP-19393)
Resolved issues:
• Spark applications incorrectly reported that joins were broken because the DirectJoin output check was too
strict. (DSP-19063)
• Submitting many Spark apps will reach the default tombstone_failure_threshold before the default 90 days
gc_grace_seconds defined for the system_auth.role_permissions table. (DSP-19098)
Workaround with this fix:
1. Manually grant permissions to the user before the user starts Spark jobs:
• Credentials are not masked in the debug level logs for Spark Jobserver and Spark submitted jobs.
(DSP-19490)
• New graph truncate command to remove all data from graph. (DSP-17609)
Resolved issues:
• T values get hidden by property keys of the same name in valueMap(). (DSP-19261)
Enhancements:
• DSE 6.0 search query latency is on par with DSE 5.1. (DSP-18677)
• For token ranges dictated by distribution, filter cache warming occurs when a node is restarted, a search
index is rebuilt, or when node health score is up to 0.9. New per-core metrics for metric type WarmupMetrics
and other improvements. (DSP-8621)
# facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the
-Dsolr.max.facet.limit.size system property.
# This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set
an explicit facet.limit.
Resolved issues:
• A Solr CQL count query incorrectly returns the total data count; it should return the total data count minus
the start offset. (DSP-16153)
• A validation error is not returned when docValues is applied to types that do not allow docValues.
(DSP-16884)
With this fix, the following exception behavior is applied:
# Throw exception when docValues:true is specified for a column and column type does not support
docValues.
# Do not throw exception and ignore docValues:true for columns with types that do not support docValues
if docValues:true is set for *.
• When using live indexing, also known as Real Time (RT) indexing, stale Solr documents contain data that is
updated in the database. This issue happens when a facet query is run against a search index (core) while
inserting or loading data, and the search core is shut down. (DSP-18786)
• When driver uses paging, CQL query fails when using a Solr index to query with a sort on a field that
contains the primary key name in the field: InvalidRequest: Error from server: code=2200 [Invalid
query] message="Cursor functionality requires a sort containing a uniqueKey field tie
breaker". (DSP-19210)
Known issues:
• The count() query with Solr enabled can be inaccurate or inconsistent. (DSP-19401)
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
TinkerPop changes for DSE 6.0.10
DataStax Enterprise (DSE) 6.0.10 includes TinkerPop 3.3.7 with all DataStax enhancements from earlier
versions.
DSE 6.0.9 release notes
9 July 2019
In this section:
• 6.0.9 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.9 Components
• Netty 4.1.13.13.dse
DSE 6.0.9 is compatible with Apache Cassandra™ 3.11 and includes all DataStax enhancements from earlier
versions.
• Fixed possible data loss when using DSE Tiered Storage. (DB-3404)
If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.
• 6.0.8 Components
• 6.0.8 DSEFS
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.8 Components
All components from DSE 6.0.8 are listed. Components that are updated for DSE 6.0.8 are indicated with an
asterisk (*).
• Netty 4.1.13.13.dse *
DSE 6.0.8 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.
• Significant fixes and improvements for native memory, the chunk cache, and async read timeouts.
DSEFS highlights
• Fix handling of path alternatives in DSEFS shell to provide wildcard support for mkdir and ls commands.
(DSP-17768)
• You can now dynamically pass cluster and connection configuration for different graph objects. Fixes the
issue where DseGraphFrame cannot directly copy graph from one cluster to another. (DSP-18605)
• New configurable memory leak tracking: new nodetool leaksdetection command and memory leak detection
settings in cassandra.yaml. (DB-3123)
• Changes to correct uneven distribution of shard requests with the STATIC set cover finder. (DSP-18197)
• New recommended method for case-insensitive text search, faceting, grouping, and sorting with new
LowerCaseStrField Solr field type. This type sets field values as lowercase and stores them as lowercase in
docValues. (DSP-18763)
• The queryExecutorThreads and timeAllowed Solr parameters can be used together. (DSP-18717)
• Avoid interrupting request threads when an internode handshake fails so that the Lucene file channel lock
cannot be interrupted. Fixes LUCENE-8262. (DSP-18211)
# Improved lightweight transactions (LWT) performance. New cassandra.yaml LWT configuration options.
(DB-3018)
# Optimized memory usage for direct reads pool when using a high number of LWTs. (DB-3124)
When not set in cassandra.yaml, the default calculated size of direct_reads_size_in_mb changed from
128 MB to 2 MB per TPC core thread, plus 2 MB shared by non-TPC threads, with a maximum value of
128 MB.
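A quick sketch of the new default calculation described above (the core count is a placeholder):

```shell
# direct_reads_size_in_mb default: 2 MB per TPC core thread plus 2 MB
# shared by non-TPC threads, capped at 128 MB.
TPC_CORES=16
SIZE=$(( TPC_CORES * 2 + 2 ))
if [ "$SIZE" -gt 128 ]; then SIZE=128; fi
echo "${SIZE} MB"
```

For a 16-core node this yields 34 MB, well below the previous fixed 128 MB default.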
• Improved logging identifies which client, keyspace, table, and partition key is rejected when mutation
exceeds size threshold. (DB-1051)
• Enable upgrading and downgrading SSTables using a CQL file that contains DDL statements to recreate the
schema. (DB-2951)
Resolved issues:
• Possible direct memory leak when part of bulk allocation fails. (DB-3125)
• Counters in memtable allocators and buffer pool metrics can be incorrect when out of memory (OOM)
failures occur. (DB-3126)
• Memory leak occurs when a read from disk times out. (DB-3127)
• Bootstrap should fail when the node can't fetch the schema from other nodes in the cluster. (DB-3186)
• Deadlock when replaying schema mutations from commit log during DSE startup. (DB-3190)
• Make the remote host visible in the error message for failed magic number verification. (DSP-18645)
Known issue:
• A warning message is displayed when DSE authentication is enabled, but Spark security is not enabled.
(DSP-17273)
• Spark Cassandra Connector: To improve connection for streaming applications with shorter batch times, the
default value for Keep Alive is increased to 1 hour. (DSP-17393)
Resolved issues:
• Reduce probability of hitting max_concurrent_sessions limit for OLAP workloads with BYOS (Bring Your
Own Spark). (DSP-18280)
For OLAP workloads with BYOS, DataStax recommends increasing the max_concurrent_sessions using
this formula as a guideline:
• Accessing files from Spark through WebHDFS interface fails with message: java.io.IOException:
Content-Length is missing. (DSP-18559)
• Submitting many Spark applications will reach the default tombstone_failure_threshold before the default 90
days gc_grace_seconds defined for the system_auth.role_permissions table. (DSP-19098)
6.0.8 DSEFS
Resolved issues:
• Fix handling of path alternatives in DSEFS shell to provide wildcard support for mkdir and ls commands.
(DSP-17768)
For example, to make several subdirectories with a single command:
• The graph configuration and gremlin_server sections in DSE Graph system-level options are now correctly
commented out at the top level. (DSP-18477)
Resolved issues:
• Time, date, inet, and duration data types are not supported in graph search indexes. (DSP-17694)
• Should prevent sharing Gremlin Groovy closures between scripts that are submitted through session-less
connections, like DSE drivers. (DSP-18146)
• Operations through gremlin-console run with system permissions, but should run with anonymous
permissions. (DSP-18471)
• DseGraphFrame cannot directly copy graph from one cluster to another. You can now dynamically pass
cluster and connection configuration for different graph objects. (DSP-18605)
Workaround for earlier versions:
g.V.write.format("csv").save("dsefs://cluster1/tmp/vertices")
g.E.write.format("csv").save("dsefs://cluster1/tmp/edges")
g.updateVertices(spark.read.format("csv").load("dsefs://cluster1/tmp/vertices"))
g.updateEdges(spark.read.format("csv").load("dsefs://cluster1/tmp/edges"))
• Issue querying a search index when the vertex label is set to cache properties. (DSP-18898)
• UnsatisfiedLinkError when inserting a multi edge with DseGraphFrame in BYOS (Bring Your Own Spark).
(DSP-18916)
• DSE Graph does not use primary key predicate in Search/.has() predicate. (DSP-18993)
• Reject requests from the TPC backpressure queue when requests are on the queue for too long.
(DSP-15875)
• Changes to correct uneven distribution of shard requests with the STATIC set cover finder. (DSP-18197)
A new inertia parameter for dsetool set_core_property supports fine-tuning. The default value of 1 can be
adjusted for environments with more than 10 vnodes.
• New recommended method for case-insensitive text search, faceting, grouping, and sorting with new
LowerCaseStrField custom Solr field type. This type sets field values as lowercase and stores them as
lowercase in docValues. (DSP-18763)
DataStax does not support using the TextField Solr field type with solr.KeywordTokenizer and
solr.LowerCaseFilterFactory to achieve single-token, case-insensitive indexing on a CQL text field.
Resolved issues:
• SASI queries don't work on tables with row level access control (RLAC). (DB-3082)
• Documents might not be removed from the index when a key element has value equal to a Solr reserved
word. (DSP-17419)
• Avoid interrupting request threads when an internode handshake fails so that the Lucene file channel lock
cannot be interrupted. Fixes LUCENE-8262. (DSP-18211)
Workaround for earlier versions: Reload the search core without restarting or reindexing.
• Search should error out, rather than time out, on a Solr query with non-existing field list (fl) fields. (DSP-18218)
• Fixed PartitionStrategy when setting vertex label and having includeMetaProperties configured to
true.
• Fixed bug with EventStrategy in relation to addE() where detachment was not happening properly.
• Fixed bug in detachment of Path where embedded collection objects would prevent that process.
• Quieted "host unavailable" warnings for both the driver and Gremlin Console.
• Implemented EdgeLabelVerificationStrategy.
• Fixed behavior of P for within() and without() in Gremlin Language Variants (GLV) to be consistent with
Java when using variable arguments (varargs).
• Refactored use of commons-lang to use commons-lang3 only. Dependencies may still use commons-lang.
• Added GraphSON serialization support for Duration, Char, ByteBuffer, Byte, BigInteger, and BigDecimal in
gremlin-python.
• Added ProfilingAware interface to allow steps to be notified that profile() was being called.
• Fixed bug where profile() could produce negative timings when group() contained a reducing barrier.
• Improved logic determining the dead or alive state of a Java driver connection.
• Fixed a bug in PartitionStrategy where addE() as a start step was not applying the partition.
• Added Symbol.asyncIterator member to the Traversal class to provide support for await ... of
loops (async iterables).
Bug fixes:
• TINKERPOP-2094 Gremlin Driver Cluster Builder serializer method does not use mimeType as suggested.
• TINKERPOP-2105 Gremlin-Python connection not returned back to the pool on exception from the Gremlin
Server.
Improvements:
• TINKERPOP-1889 JavaScript Gremlin Language Variants (GLV): Use heartbeat to prevent connection
timeout.
• TINKERPOP-2071 gremlin-python: the graphson deserializer for g:Set should return a python set.
• TINKERPOP-2074 Ensure that only NuGet packages for the current version are pushed.
• TINKERPOP-2078 Hide use of EmptyGraph or RemoteGraph behind a more unified method for
TraversalSource construction.
• TINKERPOP-2084 For remote requests in console, display the remote stack trace.
• 6.0.7 Components
6.0.7 DSEFS
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.7 Components
All components from DSE 6.0.7 are listed. Components that are updated for DSE 6.0.7 are indicated with an
asterisk (*).
• Netty 4.1.13.13.dse *
DSE 6.0.7 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.
• Improved user tools for SSTable upgrades (sstableupgrade) and downgrades (sstabledowngrade).
(DB-2950)
• New cassandra.yaml direct_reads_size_in_mb option sets the size of the new buffer pool for direct
transient reads. (DB-2958)
• Remedy deadlock during node startup when calculating disk boundaries. (DB-3028)
• The frame decoding off-heap queue size is configurable and smaller by default. (DB-3047)
• Improved updateEdges and updateVertices usability for single label update. (DSP-18404)
• Operations through gremlin-console run with anonymous instead of system permissions. (DSP-18471)
• The sstableloader downgrade from DSE to OSS Apache Cassandra is supported with the new
sstabledowngrade tool. (DB-2756)
The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE
versions.
• TupleType values with null fields throw an NPE when being made byte-comparable. (DB-2872)
• Support for using sstableloader to stream OSS Cassandra 3.x and DSE 5.x data to DSE 6.0 and later.
(DB-2909)
The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE
versions.
# The buffer pool, and its metrics, are now split into two pools. In cassandra.yaml, the
file_cache_size_in_mb option sets the file cache (or chunk cache), and the new direct_reads_size_in_mb
option sets the pool for all other short-lived read operations. (DB-2958)
To retrieve the buffer pool metrics:
# cassandra-env.sh respects heap and direct memory values set in jvm.options or as environment
variables. (DB-2973)
The precedence for heap and direct memory is:
# Environment variables
# jvm.options
# calculations in cassandra-env.sh
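A sketch of the highest-precedence option, environment variables. The values are illustrative, and MAX_HEAP_SIZE and HEAP_NEWSIZE are assumed from cassandra-env.sh conventions.

```shell
# Environment variables take precedence over jvm.options, which in turn
# takes precedence over the automatic calculations in cassandra-env.sh.
export MAX_HEAP_SIZE="8G"
export HEAP_NEWSIZE="800M"
```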
# AIO is automatically disabled if the chunk cache size is small enough: less than or equal to system RAM / 8.
(DB-2997)
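The threshold can be sketched as follows (the RAM and cache sizes are placeholders):

```shell
# AIO is disabled when the chunk cache (file_cache_size_in_mb) is less
# than or equal to system RAM / 8.
RAM_MB=65536          # e.g. a 64 GB node
FILE_CACHE_MB=4096    # configured chunk cache size
THRESHOLD=$(( RAM_MB / 8 ))
if [ "$FILE_CACHE_MB" -le "$THRESHOLD" ]; then
  echo "AIO disabled"
else
  echo "AIO enabled"
fi
```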
# Limit off-heap frame queues by configurable number of frames and total number of bytes. (DB-3047)
Resolved issues:
• The sstableloader downgrade from DSE to OSS Apache Cassandra is not supported; the new
sstabledowngrade tool is required. (DB-2756)
• nodesync fails when validating MV row with empty partition key. (DB-2823)
• TupleType values with null fields throw an NPE when being made byte-comparable. (DB-2872)
• The memory in use in the buffer pool is not identical to the memory allocated. (DB-2904)
• Offline sstable tools fail with Out of Direct Memory error. (DB-2955)
• DIRECT_MEMORY is being calculated using 25% of total system memory if -Xmx is set in jvm.options.
(DB-2973)
• Netty direct buffers can potentially double the -XX:MaxDirectMemorySize limit. (DB-2993)
• Increased NIO direct memory because the buffers are not cleaned until GC is run. (DB-2996)
• A check of two versions of metadata for a column fails on upgrade from DSE 5.0.x when the type is not of the
same class. Loosened the check from CASSANDRA-13776 to prevent the Trying to compare 2 different types
ERROR on upgrades. (DB-3021)
• Deserialization of dropped UDT columns in SSTables is broken after upgrading from DSE 5.0. (DB-3031)
• Kerberos protocol and QoP parameters are not correctly propagated. (DSP-15455)
• RpcExecutionException does not print the user who is not authorized to perform a certain action.
(DSP-15895)
Known issue:
Resolved issues:
• After client-to-node SSL is enabled, all Spark nodes must also listen on port 7480. (DSP-15744)
• dse client-tool configuration byos-export does not export required Spark properties. (DSP-15938)
• Downloaded Spark JAR files are executable for all users. (DSP-17692)
• Issue with viewing information for completed jobs when authentication is enabled. (DSP-17854)
• Spark Cassandra Connector does not properly cache manually prepared RegularStatements; see
SPARKC-558. (DSP-18075)
• Invalid options show for dse spark-submit command line help. (DSP-18293)
• Spark SQL function concat_ws results in a compilation error when an array column is included in the column
list and when the number of columns to be concatenated exceeds 8. (DSP-18383)
• Improved error messaging for AlwaysOn SQL (AOSS) client tool. (DSP-18409)
• CQL syntax error when single quote is not correctly escaped before including in save cache query to AOSS
cache table. (DSP-18418)
Known issue:
• DSE 6.0.7 is not compatible with Zeppelin in SparkR and PySpark 0.8.1. (DSP-18777)
The Apache Spark™ 2.2.3.4 that is included with DSE 6.0.7 contains the patched protocol and all versions
of DSE are compatible with the Scala interpreter.
However, SparkR and PySpark use a separate channel for communication with Zeppelin. This protocol
was vulnerable to attack from other users on the system and was secured in CVE-2018-11760. SparkR and
PySpark in Zeppelin 0.8.1 fail because Zeppelin does not recognize that Spark 2.2.2 and later contain this
patched protocol, and it attempts to use the old protocol. The Zeppelin patch to recognize this protocol is not
available in a released Zeppelin build.
Solution: Do not upgrade to DSE 6.0.7 if you use SparkR or PySpark. Wait for the Zeppelin release later
than 0.8.1 that will recognize that DSE-packaged Spark can use the secured protocol.
• Submitting many Spark apps can reach the default tombstone_failure_threshold before the default 90-day
gc_grace_seconds defined for the system_auth.role_permissions table elapses. (DSP-19098)
Workaround for use cases where a large number of Spark jobs are submitted:
1. Before the user starts the Spark jobs, manually grant permissions to the user:
3. After this user completes all the Spark jobs, revoke permissions for the user:
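The grant and revoke steps can be sketched together as follows. This is a minimal sketch, not the statements from the original document: the role name spark_user and the analytics resources shown (work pools and submissions) are assumptions for illustration, so adapt them to your deployment and DSE authorization model.

```sql
-- Hypothetical role name; substitute the role that submits the Spark jobs.
-- Granting broad submission rights up front avoids creating (and later
-- tombstoning) per-submission permission rows for every job.
GRANT CREATE ON ANY WORKPOOL TO spark_user;
GRANT DESCRIBE, MODIFY ON ANY SUBMISSION TO spark_user;

-- After the user completes all Spark jobs, revoke the same permissions:
REVOKE CREATE ON ANY WORKPOOL FROM spark_user;
REVOKE DESCRIBE, MODIFY ON ANY SUBMISSION FROM spark_user;
```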
6.0.7 DSEFS
Resolved issues:
• Change dsefs:// default port when the DSEFS setting public_port is changed in dse.yaml. (DSP-17962)
The shortcut dsefs:/// now automatically resolves to broadcastaddress:dsefs.public_port, instead of
incorrectly using broadcastaddress:5598 regardless of the configured port.
• DSEFS WebHDFS API GETFILESTATUS op returns AccessDeniedException for the file even when user
has correct permission. (DSP-18044)
• Problem with changing group ownership of files using the fileSystem.setOwner method. (DSP-18052)
• Vertex and especially edge loading is simplified. idColumn function is no longer required. (DSP-18404)
Resolved issues:
• OLAP traversal duplicates the partition key properties: OLAP g.V().properties() prints 'first' vertex n times
with custom ids. (DSP-15688)
• Edges are inserted with tombstone values set when inserting a recursive edge with multiple cardinality.
(DSP-17377)
Resolved issues:
• Solr HTTP request for CSV output is blank. The CSVResponseWriter returns only stored fields if a field list is
not provided in the URL. (DSP-18029)
To work around this issue, specify a field list with the URL:
/select?q=*%3A*&sort=lst_updt_gdttm+desc&rows=10&fl=field1,field2&wt=csv&indent=true
• Disabled the ScriptEngine global function cache, which can hold on to references to "g", along with some
other minor bug fixes and enhancements.
27 February 2019
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.6 are listed. Components that are updated for DSE 6.0.6 are indicated with an
asterisk (*).
• Netty 4.1.13.12.dse
DSE 6.0.6 is compatible with Apache Cassandra™ 3.11 and includes all production-certified changes from
earlier versions.
DSE 6.0.6 Important bug fix
• DSE 5.0 SSTables with UDTs are corrupted in DSE 5.1, DSE 6.0, and DSE 6.7. (DB-2954,
Cassandra-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), the SSTable serialization headers are fixed
when DSE is started with DSE 6.0.6 or later.
6.0.5 DSEFS
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
All components from DSE 6.0.5 are listed. Components that are updated for DSE 6.0.5 are indicated with an
asterisk (*).
• Netty 4.1.13.12.dse *
DSE 6.0.5 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.
• DSE Metrics Collector aggregates DSE metrics and integrates with existing monitoring solutions to facilitate
problem resolution and remediation. (DSP-17319)
See:
• Fixed resource leak related to streaming operations that affects tiered storage users. Excessive number of
TieredRowWriter threads causing java.lang.OutOfMemoryError. (DB-2463)
• Exception now occurs when user with no permissions returns no rows on restricted table. (DB-2668)
• Upgraded nodes that still have big-format SSTables from DSE 5.x caused errors during read. (DB-2801)
• Fixed an issue where heap memory usage seems higher with default file cache settings. (DB-2865)
• Fixed prepared statement cache issues when using row-level access control (RLAC) permissions. Existing
prepared statements were not correctly invalidated. (DB-2867)
• You use scripts that invoke DSEFS commands and need to handle failures properly.
• You use dse spark-sql-metastore-migrate with DSE Unified Authentication and internal authentication.
(DSP-17632)
• You have DSE 5.0.x with DSEFS client connected to DSE 5.1.x and later DSEFS server. (DSP-17600)
• You get errors for OLAP traversals after dropping schema elements. (DSP-15884)
• You want server side error messages for remote exceptions reported in Gremlin console. (DSP-16375)
• You use Graph OLAP and want secret tokens redacted in log files. (DSP-18074)
• You want to build fuzzy-text search indexes on string properties that form part of a vertex label ID.
(DSP-17386)
# Upgrade Apache Commons Compress to prevent Denial Of Service (DoS) vulnerability present in
Commons Compress 1.16.1, CVE-2018-11771. (DSP-17019)
# Critical memory leak and corruption fixes for encrypted indexes. (DSP-17111)
• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.
# nodetool listendpointspendinghints command prints hint information about the endpoints this node has
hints for. (DB-1674)
# nodetool rebuild_view rebuilds materialized views for local data. Existing view data is not cleared.
(DB-2451)
# Improved messages for nodetool nodesyncservice ratesimulator command include explanation for
single node clusters and when no tables have NodeSync enabled. (DB-2468)
• Direct Memory field output of nodetool gcstats includes all allocated off-heap memory. Metrics for native
memory are added in org.apache.cassandra.metrics.NativeMemoryMetrics.java. (DB-2796)
• Batch replay is interrupted and good batches are skipped when a mutation of an unknown table is found.
(DB-2855)
• New environment variable MAX_DIRECT_MEMORY overrides cassandra.yaml value for how much direct
memory (NIO direct buffers) that the JVM can use. (DB-2919)
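A minimal sketch of using the new variable follows; the 8G value is an arbitrary example, and the place where you export it (login shell, systemd unit, or cassandra-env.sh) depends on your installation.

```shell
# Override the cassandra.yaml direct-memory setting before starting DSE.
# 8G is an illustrative value; size it for your workload.
export MAX_DIRECT_MEMORY=8G
```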
Resolved issues:
• Running the nodetool nodesyncservice enable command reports the error NodeSyncRecord
constructor assertion failed. (DB-2280)
Workaround: Before DSE 6.0.5, a restart of DSE resolves the issue so that you can execute the command
and enable NodeSync without error.
• Rebuild should not fail when a keyspace is not replicated to other datacenters. (DB-2301)
• Repair may skip some ranges due to received range cache. (DB-2432)
• Read and compaction errors with leveled compaction strategy (LCS). (DB-2446)
• Chunk cache can retain data from a previous version of a file, causing restore failures. (DB-2489)
• LineNumberInference is not failure-safe, not finding the source information can break the request. (DB-2568)
• Improved error message when Netty Epoll library cannot be loaded. (DB-2579)
• The nodetool gcstats command output incorrectly reports the GC reclaimed metric in bytes, instead of the
expected MB. (DB-2598)
• Incorrect order of application of nodetool garbagecollect leaves tombstones that should be deleted.
(DB-2658)
• Exception should occur when user with no permissions returns no rows on restricted table. (DB-2668)
• DSE does not start with Unable to gossip with any peers error if cross_node_timeout is true.
(DB-2670)
• Heap memory usage is higher with default file cache settings. (DB-2865)
• Prepared statement cache issues when using row-level access control (RLAC) permissions. Existing
prepared statements are not correctly invalidated. (DB-2867)
• User-defined aggregates (UDAs) that instantiate user-defined types (UDTs) break after restart. (DB-2771)
• Upgraded nodes that still have big-format SSTables from DSE 5.x can cause errors during read. (DB-2801)
Workaround for upgrades from DSE 5.x to DSE versions before 6.0.5 and DSE 6.7.0: Run offline
sstableupgrade before starting the upgraded node.
• Late continuous paging errors can leave unreleased buffers behind. (DB-2862)
• Improve config encryption error reporting for missing system key and unencrypted passwords. (DSP-17480)
• Fix sstableloader error when internode encryption, client_encryption, and config encryption are enabled.
(DSP-17536)
• Improved error handling: only submission-related exceptions from Spark submitted applications are
wrapped in a Dse Spark Submit Bootstrapper Failed to Submit error. (DSP-16359)
• Improved error message for dse client-tool when DSE Analytics is not correctly configured. (DSP-17322)
# Provide a way for clients to determine if AlwaysOn SQL (AOSS) is enabled in DSE. (DSP-17180)
# Improved logging messages with recommended resolutions for AlwaysOn SQL (AOSS). (DSP-17326,
DSP-17533)
# Improved error message for AlwaysOn SQL (AOSS) when the role specified by auth_user does not
exist. (DSP-17358)
# Set default for spark.sql.thriftServer.incrementalCollect to true for AlwaysOn SQL (AOSS). (DSP-17428)
# Structured Streaming support for Bring Your Own Spark (BYOS) Spark 2.3. (DSP-17593)
Resolved issues:
• Race condition allows Spark Executor working directories to be removed before stopping those executors.
(DSP-15769)
• Restore DseGraphFrame support in BYOS and spark-dependencies artifacts. Include graph frames python
library in graphframe.jar. (DSP-16383)
• Search optimizations for search analytics Spark SQL queries are applied to a datacenter that no longer has
search enabled. Queries launched from a search-enabled datacenter cause search optimizations even when
the target datacenter does not have search enabled. (DSP-16465)
• Unable to get available memory before Spark Workers are registered. (DSP-16790)
• Spark shell error Cannot proxy as a super user occurs when AlwaysOn Spark SQL (AOSS) is running
with authentication. (DSP-17200)
• Spark Connector has hard dependencies on dse-core when running Spark Application tests with dse-
connector. (DSP-17232)
• AlwaysOn SQL (AOSS) should attempt to auto start again on datacenter restart, regardless of the previous
status. (DSP-17359)
• AlwaysOn SQL (AOSS) restart hangs for at least 15 minutes if it cannot start, should fail with meaningful
error message. (DSP-17264)
• Submission in client mode does not support specifying remote jars (DSEFS) for main application resource
(main jar) and jars specified with --jars / spark.jars. (DSP-17382)
• Incorrect conversions in DirectJoin Spark SQL operations for timestamps, UDTs, and collections.
(DSP-17444)
• DSE 5.0.x DSEFS client is not able to list files when connected to 5.1.x (and up) DSEFS server.
(DSP-17600)
• dse spark-sql-metastore-migrate does not work with DSE Unified Authentication and internal
authentication. (DSP-17632)
6.0.5 DSEFS
• Add the ability to disable and configure DSEFS internode (node-to-node) authentication. (DSP-17721)
Resolved issues:
• DSEFS throws exceptions and cannot initialize when listen_address is left blank. (DSP-16296)
• Moving a directory under itself causes data loss and orphan data structures. (DSP-17347)
• New tool fixes inconsistencies in graph data that are caused by schema changes, like label delete, or
improper data loading. (DSP-15884)
# Spark: spark.dseGraph("name").cleanUp()
JMX operations are not cluster-aware. Invoke on each node as appropriate to your environment.
Resolved issues:
• DseGraphFrame label drop hangs when there are many edges with both ends on the same label. (DSP-17096)
• A Gremlin query with search predicate containing \u2028 or \u2029 characters fails. (DSP-17227)
• Geo.inside predicate with Polygon no longer works on secondary index if JTS is not installed. (DSP-17284)
• Search indexes on key fields work only with non-tokenized queries. (DSP-17386)
• DseGraphFrame fails to read properties with symbols, like period (.), in names. (DSP-17818)
• Large queries with oversize frames no longer cause buffer corruption on the receiver. (DSP-15664)
• If a client executes a query that results in a shard attempting to send an internode frame larger than the size
specified in frame_length_in_mb, the client receives an error message like this:
Attempted to write a frame of <n> bytes with a maximum frame size of <n> bytes
In earlier versions, the query timed out with no message. Information was provided only as an error in the logs.
• In earlier releases, CQL search queries failed with UTFDataFormatException on very large SELECT clauses
and when tables have a very large number of columns. (DSP-17220)
With this fix, CQL search queries fail with UTFDataFormatException only when SELECT clauses constitute
a string larger than 64 KB of UTF-8 encoded bytes.
• Upgrade Apache Commons Compress to prevent Denial Of Service (DoS) vulnerability present in Commons
Compress 1.16.1, CVE-2018-11771. (DSP-17019)
• Requesting a core reindex with dsetool reload_core or REBUILD SEARCH INDEX no longer builds up a
queue of reindexing tasks on a node. Instead, a single reindexing task handles all reindex requests that
are already submitted to that node. (DSP-17045, DSP-13030)
• The calculated value for maxMergeCount is changed to improve indexing performance. (DSP-17597)
where num_tokens is the number of token ranges to assign to the virtual node (vnode) as configured in
cassandra.yaml.
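The coalesced reindex behavior described above applies however the reindex is requested; the keyspace and table names below are placeholders for illustration.

```sql
-- CQL form; repeated requests to the same node are now handled
-- by a single reindexing task rather than a queue of tasks.
REBUILD SEARCH INDEX ON my_keyspace.my_table;
```

The equivalent command-line form is dsetool reload_core my_keyspace.my_table reindex=true.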
Resolved issues:
• Race condition occurs on bootstrap completion and Solr core fails to initialize during node bootstrap.
(DB-1383, DSP-14823)
Workaround: Restart the node that failed to initialize.
• Internode protocol can send oversize frames causing buffer corruption on the receiver. (DSP-15664)
• CQL search queries fail with UTFDataFormatException on very large SELECT clauses. (DSP-17220)
With this fix, CQL search queries fail with UTFDataFormatException only when SELECT clauses constitute
a string larger than 64 KB of UTF-8 encoded bytes.
• Unexpected search index errors occur when non-ASCII characters, like the U+3000 (ideographic space)
character, are in indexed columns. (DSP-17816, DSP-17961)
• TextField type in search index schema should be case-sensitive if created when using copyField.
(DSP-17817)
• gf.V().id().next() causes data to get mismatched with properties in legacy DseGraphFrame. (DSP-17979)
• Avoid calling iter.next() in a loop when notifying indexers about range tombstones (CASSANDRA-14794)
• DESC order reads can fail to return the last Unfiltered in the partition (CASSANDRA-14766)
• 6.0.4 Components
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.4 Components
All components from DSE 6.0.4 are listed. No components were updated from the previous DSE version.
• Netty 4.1.13.11.dse
DSE 6.0.4 is compatible with Apache Cassandra™ 3.11 and includes all production-certified enhancements from
earlier DSE versions.
General upgrade advice for DSE 6.0.4
DataStax Enterprise 6.0.4 is compatible with Apache Cassandra™ 3.11.
All upgrade advice from previous versions applies. Carefully review the DataStax Enterprise upgrade planning
and upgrade instructions to ensure a smooth upgrade and avoid pitfalls and frustrations.
DSE 6.0.3 release notes
20 September 2018
DataStax recommends installing the latest patch release. Due to DB-2477, DataStax does not recommend
using DSE 6.0.3 for production.
• 6.0.3 Components
6.0.3 DSEFS
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.3 Components
All components from DSE 6.0.3 are listed. Components that are updated for DSE 6.0.3 are indicated with an
asterisk (*).
• Netty 4.1.13.11.dse
DataStax Enterprise 6.0.3 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from earlier DSE versions.
• Deleting a static column and adding it back as a non-static column introduces corruption. (DB-1630)
• NodeSync command line tool only connects over JMX to a single node. (DB-1693)
• Unexpected behavior change when using row-level permissions with modification conditions like IF EXISTS.
(DB-2429)
• Jetty 9.4.1 upgrade addresses security vulnerabilities in Spark dependencies packaged with DSE.
(DSP-16893)
• dse spark-submit kill and status commands optionally support an explicit Spark Master IP address.
(DSP-16910, DSP-16991)
• Fixed problems with temporary and data directories for Spark applications. (DSP-15476, DSP-15880)
• Spark Cassandra Connector method saveToCassandra should not require solr_query column when search
is enabled. (DSP-16427)
• Fully qualified paths with resource URL are correctly resolved in Spark structured streaming checkpointing.
Backport SPARK-20894. (DSP-16972)
DSEFS highlights
Important bug fixes:
• Only superusers are allowed to remove corrupted non-empty directories when authentication is enabled for
DSEFS. Improved error message when performing an operation on a corrupted path. (DSP-16340)
• DSEFS Hadoop layer doesn't properly translate DSEFS exceptions to Hadoop exceptions in some methods.
(DSP-16933)
• Closing DSEFS client before all issued requests are completed causes unexpected message type:
DefaultLastHttpContent error. (DSP-16953)
• Under high loads, DSEFS reports temporary incorrect state for various files/directories. (DSP-17178)
• Aligned query behavior using geo.inside() predicate for polygon search with and without search indexes.
(DSP-16108)
• Fixed bug where deleting a search index that was defined inside a graph fails. (DSP-16765)
• Changed default write consistency level (CL) for Graph to LOCAL_QUORUM. (DSP-17140)
In earlier DSE versions, the default QUORUM write consistency level (CL) was not appropriate for multi-
datacenter production environments.
• Reduce the number of token filters for distributed searches with vnodes. (DSP-14189)
• Avoid unnecessary exception and error creation in the Solr query parser. (DSP-17147)
• Avoid accumulating redundant router state updates during schema disagreement. (DSP-15615)
• A search enabled node could return different exceptions than a non-search enabled node when a keyspace
or table did not exist. (DSP-16834)
• DSE does not start without appropriate Tomcat JAR scanning exclusions. (DSP-16841)
• CQL single-pass queries have incorrect results when query is run with primary key and search index
schema does not contain all columns in selection. (DSP-16895)
• Node health score of 1 is not obtainable. Search node gets stuck at 0.00 node health score after replacing a
node in a cluster. (DSP-17107)
If using DSE Tiered Storage, you must immediately upgrade to at least DSE 5.1.16, DSE 6.0.9, or DSE
6.7.4. Be sure to follow the upgrade instructions.
• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.
• Due to Thread Per Core (TPC) asynchronous request processing architecture, the
index_summary_capacity_in_mb and index_summary_resize_interval_in_minutes settings in
cassandra.yaml are removed. (DB-2390)
• NodeSync waits to start until all nodes in the cluster are upgraded. (DB-2385)
• Improved error handling and logging for TDE encryption key management. (DP-15314)
• DataStax does more extensive testing on OpenJDK 8 due to the end of public updates for Oracle JRE/JDK
8. (DSP-16179)
Resolved issues:
• NodeSync command line tool only connects over JMX to a single node. (DB-1693)
• Move TWCS message "No compaction necessary for bucket size" to Trace level or NoSpam. (DB-2022)
• sstableloader options assume the RPC/native (client) interface is the same as the internode (node-to-node)
interface. (DB-2184)
• NodeSync fails on upgraded nodes while a cluster is in a partially upgraded state. (DB-2385)
• Compaction strategy instantiation errors do not generate meaningful error messages and instead return only
InvocationTargetException. (DB-2404)
• Unexpected behavior change when using row-level permissions with modification conditions like IF EXISTS.
(DB-2429)
• Authentication cache loading can exhaust native threads. The Spark master node is not able to be elected.
(DB-2248)
• Audit events for CREATE ROLE and ALTER ROLE with incorrect spacing expose PASSWORD in plain
text. (DB-2285)
• Timestamps inserted with ISO 8601 format are saved with wrong millisecond value. (DB-2312)
• Error out if not all permissions for GRANT/REVOKE/RESTRICT/UNRESTRICT are applicable for a
resource. (DB-2373)
• BulkLoader class exits without printing the stack trace for throwable error. (DB-2377)
• Using geo types does not work when memtable allocation type is set to offheap_objects. (DSP-16302)
• The -graph option for the cassandra-stress tool failed on generating the target output html in the JAR file.
(DSP-17046)
Known issue:
• Upgraded nodes that still have big-format SSTables from DSE 5.x can cause errors during read. (DB-2801)
Workaround for upgrades from DSE 5.x to DSE versions before 6.0.5 and DSE 6.7.0: Run offline
sstableupgrade before starting the upgraded node.
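The offline workaround above can be sketched as follows; the keyspace and table names are placeholders, and the exact service commands and tool path depend on your installation type (package or tarball).

```shell
# With the node stopped, rewrite old big-format SSTables offline,
# then start the upgraded node.
sudo service dse stop
sstableupgrade my_keyspace my_table   # repeat for each affected table
sudo service dse start
```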
• DSE pyspark libraries are added to PYTHONPATH for dse exec command. Add support for Jupyter
integration. (DSP-16797)
• dse spark-submit kill and status commands optionally support an explicit master address. (DSP-16910,
DSP-16991)
• Jetty 9.4.11 upgrade addresses security vulnerabilities in Spark dependencies packaged with DSE.
(DSP-16893)
Resolved issues:
• Problems with temporary and data directories for Spark applications. (DSP-15476, DSP-15880)
# DSE client applications, like Spark, will not start if the HOME environment variable is not defined, the
user's home directory does not exist, or the current user does not have write permissions.
# Temporary data directory for AOSS is /var/log/spark/rdd, the same as the server-side temporary
data location for Spark. Configurable with SPARK_EXECUTOR_DIRS environment variable in spark-
env.sh.
# If the TMPDIR environment variable is missing, /tmp is used for all DSE apps. If the /tmp directory does
not exist, it is created with 1777 permissions. If directory creation fails, DSE performs a hard stop.
• Improved security isolates Spark applications; prevents run_as runner for Spark from running a malicious
program. (DSP-16093)
• Spark Cassandra Connector method saveToCassandra should not require solr_query column when search
is enabled. (DSP-16427)
• DSE Spark logging does not match OSS Spark logging levels. (DSP-16726)
• Metastore can't handle table with 100+ columns with auto Spark SQL table creation. (DSP-16742)
• DseDirectJoin and reading from Hive Tables does not work in Spark Structured Streaming. (DSP-16856)
• Fully qualified paths with resource URL are resolved in Spark structured streaming checkpointing. Backport
SPARK-20894. (DSP-16972)
• AlwaysOn SQL (AOSS) dsefs directory creation does not wait for all operations to finish before closing
DSEFS client. (DSP-16997)
6.0.3 DSEFS
• Only superusers are able to remove corrupted non-empty directories when authentication is enabled for
DSEFS. (DSP-16340)
Resolved issues:
• In DSEFS shell, listing too many local file system directories in a single session causes a file descriptor leak.
(DSP-16657)
• DSEFS fails to start when there is a table with duration type or other type DSEFS can't understand.
(DSP-16825)
• DSEFS Hadoop layer doesn't properly translate DSEFS exceptions to Hadoop exceptions in some methods.
(DSP-16933)
• Closing DSEFS client before all issued requests are completed causes unexpected message type:
DefaultLastHttpContent error. (DSP-16953)
• Under high loads, DSEFS reports temporary incorrect state for various files/directories. (DSP-17178)
schema.config().option('graph.traversal_sources.g.evaluation_timeout').set(Duration.ofDays(1094))
Known issue:
Resolved issues:
• Align query behavior using geo.inside() predicate for polygon search with and without search indexes.
(DSP-16108)
• Avoid looping indefinitely when a thread making internode requests is interrupted while trying to acquire a
connection. (DSP-16544)
• Deleting a search index that was defined inside a graph fails. (DSP-16765)
• Reduce the number of unique token selections for distributed searches with vnodes. (DSP-14189)
Search load balancing strategies are per search index (per core) and are set with dsetool set_core_property.
• Avoid unnecessary exception and error creation in the Solr query parser. (DSP-17147)
Resolved issues:
• Avoid accumulating redundant router state updates during schema disagreement. (DSP-15615)
• NRT codec is not registered at startup for Solr cores that have switched to RT. (DSP-16663)
• Dropping search index when index build is in progress can interrupt Solr core closure. (DSP-16774)
• Exceptions thrown when search is enabled and table is not found in existing keyspace. (DSP-16834)
• DSE should not start without appropriate Tomcat JAR scanning exclusions. (DSP-16841)
• CQL single-pass queries have incorrect results when query is run with primary key and search index
schema does not contain all columns in selection. (DSP-16895)
Best practice: For optimal single-pass queries, including queries where solr_query is used with a partition
restriction, and queries with partition restrictions and a search predicate, ensure that the columns to
SELECT are not indexed in the search index schema.
Workaround: Since auto-generation indexes all columns by default, ensure that the field is not
indexed but is still returned in a single-pass query. For example, a CREATE SEARCH INDEX statement
can index everything except column c3 while still informing the search index schema about column c3,
for efficient and correct single-pass queries.
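A hedged sketch of such a statement, assuming a hypothetical table ks.tab with columns c1, c2, and c3 (verify the column-option names against the CREATE SEARCH INDEX reference for your DSE version):

```sql
-- Index c1 and c2; declare c3 in the search schema without indexing it,
-- so single-pass queries can still return c3 efficiently and correctly.
CREATE SEARCH INDEX ON ks.tab
  WITH COLUMNS c1, c2, c3 { indexed : false };
```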
• Node health score of 1 is not obtainable. Search node gets stuck at 0.00 node health score after replacing a
node in a cluster. (DSP-17107)
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.2 Components
All components from DSE 6.0.2 are listed. Components that are updated for DSE 6.0.2 are indicated with an
asterisk (*).
• Netty 4.1.13.11.dse
DataStax Enterprise 6.0.2 is compatible with Apache Cassandra™ 3.11 and includes all production-certified
enhancements from earlier DSE versions.
• Fixed issue where CassandraConnectionConf creates excessive database connections and reports too
many HashedWheelTimer instances. (DSP-16365)
DSE Graph
DSE Search
• Schemas with stored=true work because stored=true is ignored. The workaround for 6.0.x upgrades with
schema.xml fields with “indexed=false, stored=true, docValues=true” is no longer required. (DSP-16392)
• Minor bug fixes and error handling improvements. (DSP-16435, DSP-16061, DSP-16078)
• -d option to create local encryption keys without configuring the directory in dse.yaml. (DSP-15380)
Resolved issues:
• Use more precise grep patterns to prevent accidental matches in cassandra-env.sh. (DB-2114)
• For tables using DSE Tiered Storage, nodetool cleanup places cleaned SSTables in the wrong tier.
(DB-2173)
• Support creating system keys before the output directory is configured in dse.yaml. (DSP-15380)
• Improved compatibility with external tables stored in the DSE Metastore in remote systems. (DSP-16561)
• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.
• Apache Hadoop Azure libraries for Hadoop 2.7.1 have been added to the Spark classpath to simplify
integration with Microsoft Azure and Microsoft Azure Blob Storage. (DSP-15943)
# AlwaysOn SQL (AOSS) support for enabling Kerberos and SSL at the same time. (DSP-16087)
# Add 120 seconds wait time so that Spark Master recovery process completes before status check of
AlwaysOn SQL (AOSS) app. (DSP-16249)
# AlwaysOn SQL (AOSS) driver continually runs on a node even when DSE is down. (DSP-16297)
# Improved defaults and errors for AlwaysOn SQL (AOSS) workpool. (DSP-16343)
Resolved issues:
• Need to disable cluster object JMX metrics report to prevent count exceptions spam in Spark driver log.
(DSP-16442)
6.0.2 DSEFS
• DSEFS operations: chown, chgrp, and chmod support recursive (-R) and verbose (-v) flag. (DSP-14238)
# Idle DSEFS internode connections are closed after 120 seconds. Configurable with new dse.yaml
option internode_idle_connection_timeout_ms.
• DSEFS clients close idle connections after 60 seconds, configurable in dse.yaml. (DSP-14284)
# If the second read is issued after a failed read, it is not blocked forever. The stream is automatically
closed on errors, and subsequent reads will fail with IllegalStateException.
# The timeout message includes information about the underlying DataSource object.
# No more reads are issued to the underlying DataSource after it reports hasMoreData = false.
# The read loop has been simplified to properly move to the next buffer if the requested number of bytes
hasn't been delivered yet.
# Empty buffer returned from the DataSource when hasMoreData = true is not treated as an EOF. The
read method validates offset and length arguments.
• Security improvement: DSEFS uses an isolated native memory pool for file data and metadata sent between
nodes. This isolation makes it harder to exploit potential memory management bugs. (DSP-16492)
Resolved issues:
• DSEFS silently fails when TCP port 5599 is not open between nodes. (DSP-16101)
• Vertices and vertex properties created or modified with graphframes respect TTL as defined in the schema.
In earlier versions, vertices and vertex properties had no TTL. Edges created or modified with graphframes
continue to have no TTL. (DSP-15555)
Resolved issues:
• DGF interceptor does not take into account GraphStep parameters with g.V(id) queries. (DSP-16172)
• The clause LIMIT does not work in a graph traversal with search predicate TOKEN, returning only a subset
of expected results. (DSP-16292)
• The node health option uptime_ramp_up_period_seconds default value in dse.yaml is reduced to 3 hours
(10800 seconds). (DSP-15752)
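As a sketch, the corresponding dse.yaml fragment (the section and option names are the standard dse.yaml node health settings; confirm against your installed file):

```yaml
# dse.yaml: node health settings; 10800 seconds = 3 hours is the new default
node_health_options:
    uptime_ramp_up_period_seconds: 10800
```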
• Use monotonically increasing time source for search query execution latency calculation. (DSP-16435)
Resolved issues:
• DataStax Bulk Loader (dsbulk) version 1.1.0 is automatically installed with DataStax Enterprise 6.0.2, and
can also be installed as a standalone tool. See DataStax Bulk Loader 1.1.0 release notes. (DSP-16484)
• Fixed regression issue where the HTTPChannelizer doesn’t instantiate the specified
AuthenticationHandler.
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
6.0.1 Components
All components from DSE 6.0.1 are listed. Components that are updated for DSE 6.0.1 are indicated with an
asterisk (*).
• Netty 4.1.13.11.dse
DSE 6.0.1 is compatible with Apache Cassandra™ 3.11 and adds additional production-certified enhancements.
• Fixed issue where multiple Spark Masters can be started on the same machine. (DSP-15636)
• Improved AlwaysOn SQL (AOSS) startup reliability. (DSP-15871, DSP-15468, DSP-15695, DSP-15839)
• Resolved the missing /tmp directory in DSEFS after fresh cluster installation. (DSP-16058)
• Fixed the HashedWheelTimer leak in Spark Connector that affected BYOS. (DSP-15569)
DSE Search
• Fix for the known issue that prevented using TTL (time-to-live) with DSE Search live indexing (RT indexing).
(DSP-16038, DSP-14216)
• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.
• LDAP tuning parameters allow all LDAP connection pool options to be set. (DSP-15948)
Resolved issues:
• Use the indexed item type as backing table key validator of 2i on collections. (DB-1121)
• Add getConcurrentCompactors to JMX in order to avoid loading DatabaseDescriptor to check its value in
nodetool. (DB-1730)
• Send a final error message when a continuous paging session is cancelled. (DB-1798)
• Apply view batchlog mutation parallel with local view mutations. (DB-1900)
• Use same IO queue depth as Linux scheduler and advise against overriding it. (DB-1909)
• Fix startup error message rejecting COMPACT STORAGE after upgrade. (DB-1916)
• Improve user warnings on startup when libaio package is not installed. (DB-1917)
• Prevent OOM due to OutboundTcpConnection backlog by dropping request messages after the queue
becomes too large. (DB-2001)
• sstableloader does not decrypt passwords using config encryption in DSE. (DSP-13492)
• The Spark Jobserver demo has an incorrect version for the Spark Jobserver API. (DSP-15832)
Workaround: In the demo's gradle.properties file, change the version from 0.6.2 to 0.6.2.238.
• Decreased the number of exceptions logged during master move from node to node. (DSP-14405)
• When querying remote cluster from Spark job, connector does not route requests to data replicas.
(DSP-15202)
• AlwaysOn SQL dependency on JPS is removed. The jps_directory entry in dse.yaml is removed.
(DSP-15468)
• Improved security for Spark JobServer. All uploaded JARs, temporary files, and logs are created under the
current user's home directory: ~/.spark-jobserver. (DSP-15832)
• During misconfigured cluster bootstrap, the AlwaysOn SqlServer does not start due to missing /tmp/hive
directory in DSEFS. (DSP-16058)
Resolved issues:
• A shard request timeout caused an assertion error from Lucene getNumericDocValues in the log.
(DSP-14216)
• In some situations, AlwaysOn SQL cannot start unless DSE node is restarted. (DSP-15871)
• Java driver in Spark Connector uses daemon threads to prevent shutdown hooks from being blocked by
driver thread pools. (DSP-16051)
• dse client-tool spark sql-schema --all exports definitions for solr_admin keyspace. (DSP-16073).
6.0.1 DSEFS
Resolved issues:
• DseGraphFrame performance improvement reduces number of joins for count() and other id only queries.
(DSP-15554)
• Performance improvements for traversal execution with Fluent API and script-based executions.
(DSP-15686)
Resolved issues:
• When using graph frames, cannot upload edges when ids for vertices are complex non-text ids.
(DSP-15614)
• CassandraHiveMetastore is prevented from adding multiple partitions for file-based data sources. Fixes
MSCK REPAIR TABLE command. (DSP-16067)
• Output Solr foreign filter cache warning only on classes other than DSE classes. (DSP-15625)
# Xerces2-j: CVE-2013-4002
# uimaj-core: CVE-2017-15691
Resolved issues:
• Offline sstable tools fail if a DSE Search index is present on a table. (DSP-15628)
• HTTP read on solr_stress doesn't inject random data into placeholders. (DSP-15727)
• Search index TTL Expiration thread loops without effect with live indexing (RT indexing). (DSP-16038)
• Search incorrectly assumes only single-row ORDER BY clauses on first clustering key. (DSP-16064)
DataStax recommends using the latest DataStax Bulk Loader 1.2.0. For details, see DataStax Bulk Loader.
Cassandra enhancements for DSE 6.0.1
DataStax Enterprise 6.0.1 is compatible with Apache Cassandra™ 3.11, includes all DataStax enhancements
from earlier releases, and adds these production-certified changes:
• cassandra-stress throws NPE if insert section isn't specified in user profile (CASSANDRA-14426)
• Don't use guava collections in the non-system keyspace jmx attributes (CASSANDRA-12271)
• Serialize empty buffer as empty string for json output format (CASSANDRA-14245)
• Cassandra not starting when using enhanced startup scripts in windows (CASSANDRA-14418)
• Delay hints store excise by write timeout to avoid race with decommission (CASSANDRA-13740)
• Avoid deadlock when running nodetool refresh before node is fully up (CASSANDRA-14310)
• CqlRecordReader no longer quotes the keyspace when connecting, as the java driver will
(CASSANDRA-10751)
• Bump to Groovy 2.4.15 - resolves a Groovy bug preventing Lambda creation in GLVs in some cases.
(TINKERPOP-1953)
DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and
DSE 6.7.
The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change.
Performance is highly dependent on data access patterns and varies from customer to customer. This
upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.
In response to this scenario:
• DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
• DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services
engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using
DSE 5.1 and plan to upgrade to DSE 6.0 or DSE 6.7 or DSE 6.8, contact DataStax to schedule your
complimentary assessment.
• DataStax continues to investigate performance differences related to DSE Search and DSE Graph that
occur after some upgrades to DSE 6.0 and DSE 6.7. Additional details have been and will continue to be
included in DSE release notes.
DSE 6.0.0: Do not use TTL (time-to-live) with DSE Search live indexing (RT indexing). To use these
features together, upgrade to DSE 6.0.1. (DSP-16038)
6.0.0 Components
• Netty 4.1.13.11.dse
DSE 6.0 is compatible with Apache Cassandra™ 3.11 and adds additional production-certified enhancements.
Experimental features. These features are experimental and are not supported for production:
• SASI indexes.
Known issues:
Workaround: create a directory that matches the keyspace name, and then create symbolic links into that
directory from the snapshot directory, named after the destination table.
• DSE 5.0 SSTables with UDTs will be corrupted after migrating to DSE 5.1, DSE 6.0, and DSE 6.7.
(DB-2954, CASSANDRA-15035)
If the DSE 5.0.x schema contains user-defined types (UDTs), upgrade to at least DSE 5.1.13, DSE
6.0.6, or DSE 6.7.2. The SSTable serialization headers are fixed when DSE is started with the upgraded
versions.
• DSE 6.0 will not start with OpsCenter 6.1 installed. OpsCenter 6.5 is required for managing DSE 6.0
clusters. See DataStax OpsCenter compatibility with DSE. (DSP-15996)
Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading to DSE 6.0, you
must migrate all tables that have COMPACT STORAGE to CQL table format.
Upgrades from DSE 5.0.x or DSE 5.1.x with Thrift-compatible tables require DSE 5.1.6 or later or DSE 5.0.12
or later.
• Allow user-defined functions (UDFs) within GROUP BY clause and allow non-deterministic UDFs within
GROUP BY clause. New CQL keywords (DETERMINISTIC and MONOTONIC). The cassandra.yaml file
enable_user_defined_functions_threads option has no changes to default behavior of true; set to false to
use UDFs in GROUP BY clauses. (DB-672)
• Improved architecture with Thread Per Core (TPC) asynchronous read and write paths. (DB-707)
New DSE start-up parameters:
# -Ddse.io.aio.enable
# -Ddse.io.aio.force
# aggregated_request_timeout_in_ms
# streaming_connections_per_host
# key_cache_* settings are no longer used in new SSTable format, but retained to support existing
SSTable format
# Deprecated options:
Deprecated options Replaced with
rpc_address native_transport_address
rpc_interface native_transport_interface
rpc_interface_prefer_ipv6 native_transport_interface_prefer_ipv6
rpc_port native_transport_port
broadcast_rpc_address native_transport_broadcast_address
rpc_keepalive native_transport_keepalive
# batch_size_warn_threshold_in_kb: 64
# column_index_size_in_kb: 16
# memtable_flush_writers: 4
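Taken together, a cassandra.yaml fragment that migrates off the deprecated rpc_* names might look like this sketch (the address is a placeholder):

```yaml
# Replace deprecated rpc_* options with their native_transport_* equivalents:
native_transport_address: 10.10.1.5      # was rpc_address
native_transport_port: 9042              # was rpc_port
native_transport_keepalive: true         # was rpc_keepalive
```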
• Authentication and authorization improvements. RLAC (setting row-level permissions) speed is improved.
(DB-909)
• JMX exposed metrics for external dropped messages include COUNTER_MUTATION, MUTATION,
VIEW_MUTATION, RANGE_SLICE, READ, READ_REPAIR, LWT, HINTS, TRUNCATE, SNAPSHOT,
SCHEMA, REPAIR, OTHER. (DB-1127)
• After upgrade is complete and all nodes are on DSE 6.0 and the required schema change occurs,
authorization (CassandraAuthorizer) and audit logging (CassandraAuditWriter) enable the use of new
columns. (DB-1597)
• The DataStax Installer is no longer supported. To upgrade from earlier versions that used the DataStax
Installer, see Upgrading to DSE 6.0 from DataStax Installer installations. For new installations, use a
supported installation method. (DSP-13640)
# Database administrators can manage role permissions without having access to the data. (DB-757)
# Filter rows from system keyspaces and system_schema tables based on user permissions. New
system_keyspaces_filtering option in cassandra.yaml returns information based on user access to
keyspaces. (DB-404)
# New metric for replayed batchlogs and trace-level logging include the age of the replayed batchlog.
(DB-1314)
# Decimals with a scale > 100 are no longer converted to a plain string to prevent
DecimalSerializer.toString() being used as an attack vector. (DB-1848)
# Auditing by role: new dse.yaml audit options included_roles and excluded_roles. (DSP-15733)
• libaio package dependency for DataStax Enterprise 6.0 installations on RHEL-based systems using Yum
and on Debian-based systems using APT install. For optimal performance in tarball installations, DataStax
recommends installing the libaio package. (DSP-14228)
• The default number of threads used by performance objects increased from 1 to 4. Upgrade restrictions
apply. (DSP-14515)
• Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, migrate all
tables that have COMPACT STORAGE to CQL table format. DSE 6.0 will not start if COMPACT STORAGE
tables are present. See Upgrading from DSE 5.1.x or Upgrading from DSE 5.0.x. (DSP-14839)
• The minimum supported version of Oracle Java SE Runtime Environment 8 (JDK) is 1.8u151. (DSP-14818)
• sstabledump supports the -l option to output each partition as its own JSON object. (DSP-15079)
• Upgrades to OpsCenter 6.5 or later are required before starting DSE 6.0. DataStax recommends upgrading
to the latest OpsCenter version that supports your DSE version. Check the compatibility page for your
products. (DSP-15996)
Resolved issues:
• Add result set metadata to prepared statement MD5 hash calculation. (DB-608)
system.peers:
dse_version text,
graph boolean,
server_id text,
workload text,
workloads frozen<set<text>>
system.local:
dse_version text,
graph boolean,
server_id text,
workload text,
workloads frozen<set<text>>
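The columns listed above can be inspected directly with a query such as:

```sql
SELECT dse_version, graph, server_id, workload, workloads
FROM system.local;
```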
• Create administrator roles who can carry out everyday administrative tasks without having unnecessary
access to data. (DB-757)
• When repairing Paxos commits, only block on nodes that are being repaired. (DB-761)
• Error in counting iterated SSTables when choosing whether to defrag in timestamp ordered path. (DB-1018)
• Expose ports (storage, native protocol, JMX) in system local and peers tables. (DB-1040)
• Load mapped buffer into physical memory after mlocking it for MemoryOnlyStrategy. (DB-1052)
• Forbid advancing KeyScanningIterator before exhausting or closing the current iterator. (DB-1199)
• New nodetool abortrebuild command stops a currently running rebuild operation. (DB-1234)
• Drop response on view lock acquisition timeout and add ViewLockAcquisitionTimeouts metric. (DB-1522)
• dsetool ring prints ERROR when data_file_directories is removed from cassandra.yaml. (DSP-13547)
• Support for DSE Advanced Replication V1 is removed. For V1 installations, you must first upgrade to DSE
5.1.x and migrate your DSE Advanced Replication to V2, and then upgrade to DSE 6.0. (DSP-13376)
• Enhanced CLI security prevents injection attacks and sanitizes and validates the command line inputs.
(DSP-13682)
Resolved issues:
• Improve logging on unsupported operation failure and remove the failed mutation from replog. (DSP-15043)
• Channel creation fails with NPE when using mixed case destination name. (DSP-15538)
Experimental features. These features are experimental and are not supported for production:
Known issues:
• DSE Analytics: Additional configuration is required when enabling context-per-jvm in the Spark Jobserver.
(DSP-15163)
• Previously deprecated environment variables, including SPARK_CLASSPATH, are removed in Spark 2.2.0.
(DSP-8379)
• AlwaysOn SQL service, a highly available (HA) Spark SQL Thrift server. (DSP-10996)
# The spark_config_settings and hive_config_settings are removed from dse.yaml. The configuration is
provided in the spark-alwayson-sql.conf file in DSEHOME/resources/spark/conf with the same default
contents as DSEHOME/resources/spark/conf/spark-defaults.conf. (DSP-15837)
• Cassandra File System (CFS) is removed. Use DSEFS instead. Before upgrading to DSE 6.0, remove CFS
keyspaces. See the From CFS to DSEFS dev blog post. (DSP-12470)
• Authenticate JDBC users to Spark SQL Thrift Server. Queries that are executed during JDBC session are
run as the user who authenticated through JDBC. (DSP-13395)
• Encryption for data stored on the server and encryption of Spark spill files is supported. (DSP-13841)
• Spark local applications no longer use /var/lib/spark/rdd; instead, configure and use the .sparkdirectory
for processes started by the user. (DSP-14380)
• Input metrics are not thread-safe and are not used properly in CassandraJoinRDD and
CassandraLeftJoinRDD. (DSP-14569)
• AlwaysOn SQL workpool option adds high availability (HA) for the JDBC or ODBC connections for analytics
node. (DSP-14719)
• CFS is removed. Before upgrade, move HiveMetaStore from CFS to DSEFS and update URL references.
(DSP-14831)
• Include SPARK-21494 to use correct app id when authenticating to external service. (DSP-14140)
• Upgrade to DSE 6.0 must be complete on all nodes in the cluster before Spark Worker and Spark Master
will start. (DSP-14735)
# All Spark-related parameters are now camelCase and case-sensitive. The snake_case versions
are automatically translated to the camelCase versions, except when the parameters are used as table
options. In SparkSQL and with spark.read.options(...), the parameters are case-insensitive because of
the internal SQL implementation.
• Use NodeSync (continuous repair) and LOCAL_QUORUM for reading from Spark recovery storage.
(DSP-15219)
Supporting changes:
# Spark Master will not start until LOCAL_QUORUM is achieved for dse_analytics keyspace.
# Spark Master recovery data is first written with LOCAL_QUORUM; if that fails, the write is retried
with LOCAL_ONE. Recovery data is always queried with LOCAL_QUORUM (unlike previous
versions of DSE, which used LOCAL_ONE).
DataStax strongly recommends enabling NodeSync for continuous repair on all tables in the
dse_analytics keyspace. NodeSync is required on the rm_shared_data keyspace that stores Spark
recovery information.
Resolved issues:
• DSE does not work with Spark Crypto based encryption. (DSP-14140)
6.0.0 DSEFS
• Improved authorization security sets the default permission to 755 for directories and 644 for files. New
DSEFS clusters create the root directory / with 755 permission to prevent non-super users from modifying
root content; for example, by using mkdir or put commands. (DSP-13609)
• New tool to move hive metastore from CFS to DSEFS and update references.
Known issues:
• Dropping a property of vertex label with materialized view (MV) indices breaks graph. To drop a property
key for a vertex label that has a materialized view index, additional steps are required to prevent data loss or
cluster errors. See Dropping graph schema. (DSP-15532)
• Secondary indexes used for DSE Graph queries have higher latency in DSE 6.0 than in the previous
version. (DB-1928)
• Backup snapshots taken with OpsCenter 6.1 will not load to DSE 6.0. Use the backup service in OpsCenter
6.5 or later. (DSP-15922)
# Standard vertex IDs are deprecated. Use custom vertex IDs instead. (DSP-13485)
• Schema API changes: all .remove() methods are renamed to .drop() and schema.clear() is renamed to
schema.drop(). Schema API supports removing vertex/edge labels and property keys. Unify use of drop |
remove | clear in the Schema API and use .drop() everywhere. (DSP-8385, DSP-14150)
• Include materialized view (MV) indexes in query optimizer only if the MV was fully built. (DSP-10219)
• Improve Graph OLAP performance by smart routing query to DseGraphFrame engine with
DseGraphFrameInterceptorStrategy. (DSP-13489)
• Graph online analytical processing (OLAP) supports drop() with DseGraphFrame interceptor. Simple queries
can be used in drop operations. (DSP-13998)
• DSE Graph vertices and edges tables are accessible from SparkSQL and automatically mapped to the
dse_graph SparkSQL database. (DSP-12046)
• More Gremlin APIs are supported in DSEGraphFrames: dedup, sort, limit, filter, as()/select(), or().
(DSP-13649)
• Some graph and gremlin_server properties in earlier versions of DSE are no longer required for DSE 6.0.
The default settings from the earlier versions of dse.yaml are preserved. These settings were removed from
dse.yaml.
# adjacency_cache_clean_rate
# adjacency_cache_max_entry_size_in_mb
# adjacency_cache_size_in_mb
# gremlin_server_enabled
# index_cache_clean_rate
# index_cache_max_entry_size_in_mb
# window_size
If these properties exist in the dse.yaml file after upgrading to DSE 6.0, logs display warnings. You can
ignore these warnings or modify dse.yaml so that only the required graph system level and gremlin_server
properties are present. (DSP-14308)
• Spark Jobserver is the DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver
API in DataStax repository. (DSP-14152)
• Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. The
string formatting for vertices with text custom IDs has changed. (DSP-14710)
Supporting changes (DSP-15167):
# In-place upgrades allow existing schemas with invalid edge label names and property key names.
• Invoking toString on a custom vertex ID containing a text property, or on an edge ID that is incident upon a
vertex with a custom vertex ID, now returns a value that encloses the text property value in double quotation
marks and escapes the value's internal double-quotes. This change protects older formats from irresolvable
parsing ambiguity. For example:
// old
{~label=v, x=foo}
{~label=w, x=a"b}
// new
{~label=v, x="foo"}
{~label=w, x="a""b"}
• Support for math()-step (math) to enable scientific calculator functionality within Gremlin. (DSP-14786)
• The GraphQueryThreads JMX attribute has been removed. Thread selection occurs with Thread Per Core
(TPC) asynchronous request processing architecture. (DSP-15222)
Resolved issues:
• Intermittent KryoException: Buffer underflow error when running order by query in OLTP mode.
(DSP-12694)
• DseGraphFrames properties().count() step returns vertex count instead of multi-property count.
(DSP-15049)
• GraphSON parsing error prevents proper type detection under certain conditions. (DSP-14066)
Experimental features. These features are experimental and are not supported for production:
Known issues:
• Search index TTL Expiration thread loops without effect with live indexing (RT indexing). (DSP-16038)
• DSE Search is very IO intensive. Performance is impacted by the Thread Per Core (TPC) asynchronous
read and write paths architecture. (DB-707)
Before using DSE Search in DSE 6.0 and later, review and follow the DataStax recommendations:
# On search nodes, change the tpc_cores value from its default to the number of physical CPUs. Refer
to Tuning TPC cores.
# Disable AIO and set the file_cache_size_in_mb value to 512. Refer to Disabling AIO.
# Locate DSE Cassandra transactional data and Solr-based DSE Search data on separate Solid State
Drives (SSDs). Refer to Set the location of search indexes.
# Plan for sufficient memory resources and disk space to meet operational requirements. Refer to
Capacity planning for DSE Search.
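The first two recommendations translate to a cassandra.yaml fragment like the following sketch (the tpc_cores value assumes a 16-CPU search node; disabling AIO itself is done with the -Ddse.io.aio.enable startup parameter listed under the 6.0.0 changes):

```yaml
# cassandra.yaml sketch for a DSE Search node (per the recommendations above)
tpc_cores: 16                 # set to the number of physical CPUs; 16 is an assumed example
file_cache_size_in_mb: 512    # recommended value when AIO is disabled
```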
• Writes are flushed to disk in segments that use a new Lucene codec that does not exist in earlier versions.
Unique key values are no longer stored as both docValues and Lucene stored fields. The unique key values
are now stored only as docValues in a new codec to store managed fields like Lucene. Downgrades to
versions earlier than DSE 6.0 are not supported. (DSP-8465)
• Document inserts and updates using HTTP are removed. Before upgrading, ensure you are using CQL for
all inserts and updates. (DSP-9725)
• The <dataDir> parameter in the solrconfig.xml file is not supported. Instead, follow the steps in Set the
location of search indexes. (DSP-13199)
• Improved performance by early termination of sorting. Ideal for queries that need only a few results returned,
from a large number of total matches. (DSP-13253)
# The default for CQL text type changed from solr.TextField to solr.StrField.
• Delete by id is removed. Delete by query no longer accepts wildcard queries, including queries that match
all documents (for example, <delete><query>*:*</query></delete>). Instead, use CQL to DELETE by
Primary Key or the TRUNCATE command. (DSP-13436)
# RAM buffer size settings are no longer required in search index config. Global RAM buffer usage in
Lucene is governed by the memtable size limits in cassandra.yaml. RAM buffers are counted toward the
memtable_heap_space_in_mb.
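As a sketch of the CQL replacements for the removed delete operations (keyspace, table, and key names are hypothetical):

```cql
-- Delete a single document's backing row by primary key:
DELETE FROM store.products WHERE product_id = 123;
-- Remove every row, and therefore every indexed document:
TRUNCATE store.products;
```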
• The HTTP API for Solr core management is removed. Instead, use CQL commands for search index
management or dsetool search index commands. (DSP-13530)
• The Tika functionality bundled with Apache Solr is removed. Instead, use the stand-alone Apache Tika
project. (DSP-13892)
# The solrvalidation.log is removed. You can safely remove the appender SolrValidationErrorAppender and
the logger SolrValidationErrorLogger from logback.xml. Indexing errors manifest as:
# failures at the coordinator, if they represent failures that might succeed at some later point in time
using the hint replay mechanism
# messages in the system.log, if the failures are due to non-recoverable indexing validation errors
(for data that is written to the database, but not indexed properly)
• The DSE custom update request processor (URP) implementation is deprecated. Use the field input/output
(FIT) transformer API instead. (DSP-14360)
• The stored flag in search index schemas is deprecated and is no longer added to auto-generated schemas.
If the flag exists in custom schemas, it is ignored. (DSP-14425)
# Indexing is no longer asynchronous. Document updates are written to the Lucene RAM buffer
synchronously with the mutation backing table.
# enable_back_pressure_adaptive_nrt_commit
# max_solr_concurrency_per_core
# solr_indexing_error_log_options
• StallMetrics MBean is removed. Before upgrading to DSE 6.0, change operators that use the MBean.
(DSP-14860)
• Optimize Paging when limit is smaller than the page size. (DSP-15207)
Resolved issues include all bug fixes up to DSE 5.1.8. Additional 6.0.0 fixes:
• For use with DSE 6.0.x, DataStax Studio 6.0.0 is installed as a standalone tool. (DSP-13999, DSP-15623)
• DataStax Bulk Loader (dsbulk) version 1.0.1 is automatically installed with DataStax Enterprise 6.0.0, and
can also be installed as a standalone tool. (DSP-13999, DSP-15623)
• Fix updating base table rows with TTL not removing view entries (CASSANDRA-14071)
• RPM package spec: fix permissions for installed jars and config files (CASSANDRA-14181)
• Gossip thread slows down when using batch commit log (CASSANDRA-12966)
• Avoid reading static row twice from old format sstables (CASSANDRA-13236)
• Upgrade netty version to fix memory leak with client encryption (CASSANDRA-13114)
• Add result set metadata to prepared statement MD5 hash calculation (CASSANDRA-10786)
• Add incremental repair support for --hosts, --force, and subrange repair (CASSANDRA-13818)
• Add additional unit tests for batch behavior, TTLs, Timestamps (CASSANDRA-13846)
• Emit metrics whenever we hit tombstone failures and warn thresholds (CASSANDRA-13771)
• Allow changing log levels via nodetool for related classes (CASSANDRA-12696)
• Reduce memory copies and object creations when acting on ByteBufs (CASSANDRA-13789)
• Don't delete incremental repair sessions if they still have sstables (CASSANDRA-13758)
• Support for migrating legacy users to roles has been dropped (CASSANDRA-13371)
• Don't add localhost to the graph when calculating where to stream from (CASSANDRA-13583)
• Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579)
• Fix incorrect cqlsh results when selecting same columns multiple times (CASSANDRA-13262)
• Change protocol to allow sending key space independent of query string (CASSANDRA-10145)
• Take number of files in L0 in account when estimating remaining compaction tasks (CASSANDRA-13354)
• Skip building views during base table streams on range movements (CASSANDRA-13065)
• Improve error messages for +/- operations on maps and tuples (CASSANDRA-13197)
• Make it possible to monitor an ideal consistency level separate from actual consistency level
(CASSANDRA-13289)
• Use new token allocation for non bootstrap case as well (CASSANDRA-13080)
• Require forceful decommission if number of nodes is less than replication factor (CASSANDRA-12510)
• Nodetool repair can hang forever if we lose the notification for the repair completing/failing
(CASSANDRA-13480)
• Fixed a bug in NumberHelper that led to wrong min/max results if numbers exceeded the Integer limits.
(TINKERPOP-1873)
• Improved error messaging for failed serialization and deserialization of request/response messages.
• Fixed bug in handling of Direction.BOTH in Messenger implementations to pass the message to the
opposite side of the `StarGraph` in VertexPrograms for OLAP traversals. (TINKERPOP-1862)
• Fixed a bug in Gremlin Console which prevented handling of gremlin.sh flags that had an equal sign (=)
between the flag and its arguments. (TINKERPOP-1879)
• Fixed a bug where SparkMessenger was not applying the edgeFunction from MessageScope in
VertexPrograms for OLAP-based traversals. (TINKERPOP-1872)
• TinkerPop drivers prior to 3.2.4 won't authenticate with Kerberos anymore. A long-deprecated option on the
Gremlin Server protocol was removed.
• Can unload data from any Cassandra 2.1 or later data source
Chapter 3. Installing DataStax Enterprise 6.0
Installation information is located in the Installation Guide.
Chapter 4. Configuration
Depending on your environment, some of the following settings might not persist after reboot. Check with your
system administrator to ensure these settings are viable for your environment.
Use the Preflight check tool to run a collection of tests on a DSE node to detect and fix node configurations. The
tool can detect and optionally fix many invalid or suboptimal configuration settings, such as user resource limits,
swap, and disk settings.
Configure the chunk cache
Beginning in DataStax Enterprise (DSE) 6.0, the amount of native memory used by the DSE process has
increased significantly.
The main reason for this increase is the chunk cache (or file cache), which is like an OS page cache. The
following sections provide additional information:
• See Chunk cache history for a historical description of the chunk cache, and how it is calculated in DSE 6.0
and later.
• See Chunk cache differences from OS page cache to understand key differences between the chunk cache
and the OS page cache.
Consider the following recommendations depending on workload type for your cluster.
DSE recommendations
Regarding DSE, consider the following recommendations when choosing the max direct memory and file cache
size:
• Adequate memory for native raw memory (such as bloom filters and off-heap memtables)
For 64 GB servers, the default settings are typically adequate. For larger servers, increase the max direct
memory (-XX:MaxDirectMemorySize), but leave approximately 15-20% of memory for the OS and other in-
memory structures. The file cache size will be set automatically to half of that. This setting is acceptable, but the
size could be increased gradually if the cache hit rate is too low and there is still available memory on the server.
Disabling asynchronous I/O (AIO) and explicitly setting the chunk cache size (file_cache_size_in_mb) improves
performance for most DSE Search workloads. With AIO disabled, SSTables and Lucene segments, as well as
other minor off-heap elements, reside in the OS page cache and are managed by the kernel.
A potentially negative impact of disabling AIO might be measurably higher read latency when DSE goes to disk,
in cases where the dataset is larger than available memory.
To disable AIO and set the chunk cache size, see Disable AIO.
DSE Analytics relies heavily on memory for performance. Because Apache Spark™ effectively manages its own
memory through the Apache Spark application settings, you must determine how much memory the Apache
Spark application receives. Therefore, you must think about how much memory to allocate to the chunk cache
versus how much memory to allocate for Apache Spark applications. Similar to DSE Search, you can disable
AIO and lower the chunk cache size to provide Apache Spark with more memory.
Because DSE Graph heavily relies on several different workloads, it’s important to follow the previous
recommendations for the specific workload. If you use DSE Search or DSE Analytics with DSE Graph, lower the
chunk cache and disable AIO for the best performance. If you use DSE Graph only on top of Apache Cassandra,
increase the chunk cache gradually, leaving 15-20% of memory available for other processes.
There are several differences between the chunk cache and the OS page cache, and a full description is outside
the scope of this information. However, the following differences are relevant to DSE:
• Because the OS page cache is sized dynamically by the operating system, it can grow and shrink depending
on the available server memory. The chunk cache must be sized statically.
If the chunk cache is too small, available server memory goes unused; on servers with large amounts of
memory (50 GB or more), significant memory is wasted. If the chunk cache is too large, the available memory
on the server can drop low enough that the OS kills the DSE process to avoid an out-of-memory condition.
At the time of writing, the size of the chunk cache cannot be changed dynamically; to change the size
of the chunk cache, the DSE process must be restarted.
• Restarting the DSE process will destroy the chunk cache, so each time the process is restarted, the chunk
cache will be cold. The OS page cache only becomes cold after a server restart.
• The memory used by the file cache is part of the DSE process memory, and is therefore seen by the OS as
user memory. However, the OS page cache memory is seen as buffer memory.
• The chunk cache uses mostly NIO direct memory, storing file chunks into NIO byte buffers. However, NIO
does have an on-heap footprint, which DataStax is working to reduce.
The chunk cache is not new to Apache Cassandra, and was originally intended to cache small parts (chunks) of
SSTable files to make read operations faster. However, the default file access mode was memory mapped until
DSE 5.1, so the chunk cache had a secondary role and its size was limited to 512 MB.
The default setting of 512 MB was configured by the file_cache_size_in_mb parameter in cassandra.yaml.
In DSE 6.0 and later, the chunk cache has increased relevance, not just because it replaces the OS page cache
for database read operations, but because it is a central component of the asynchronous thread-per-core (TPC)
architecture.
By default, the chunk cache is configured to use the following portion of the max direct memory:
• One-half (½) of the max direct memory for the DSE process
The max direct memory is calculated as one-half (½) of the system memory minus the JVM heap size:
max direct memory = (system memory - JVM heap size) / 2
You can explicitly configure the max direct memory by setting the JVM MaxDirectMemorySize
(-XX:MaxDirectMemorySize) parameter. See Increasing the max direct memory. Alternatively, you can override
the calculated file cache size by explicitly configuring the file_cache_size_in_mb parameter in cassandra.yaml.
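As a worked example of these defaults, assuming a hypothetical 64 GB server running a 24 GB JVM heap:

```shell
# Default DSE 6.0 memory split (illustrative numbers; halving rules from the text)
SYSTEM_GB=64
HEAP_GB=24
MAX_DIRECT_GB=$(( (SYSTEM_GB - HEAP_GB) / 2 ))  # half of (system - heap) = 20
FILE_CACHE_GB=$(( MAX_DIRECT_GB / 2 ))          # chunk cache: half of max direct = 10
echo "max_direct=${MAX_DIRECT_GB}GB file_cache=${FILE_CACHE_GB}GB"
```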
Install the latest Java Virtual Machine
Configure your operating system to use the latest build of a Technology Compatibility Kit (TCK) Certified
OpenJDK version 8. For example, OpenJDK 8 (1.8.0_151 minimum). Java 9 is not supported.
Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8. This change
is due to the end of public updates for Oracle JRE/JDK 8.
Synchronize clocks
Use Network Time Protocol (NTP) to synchronize the clocks on all nodes and application servers.
Synchronizing clocks is required because DataStax Enterprise (DSE) overwrites a column only if there is
another version whose timestamp is more recent; if clocks are not synchronized across machines in different
locations, a write carrying an artificially later timestamp can silently override newer data.
DSE timestamps are encoded as microseconds since the UNIX Epoch, which does not include timezone
information. The timestamp for all writes in DSE is Universal Time Coordinated (UTC). DataStax recommends
converting to local time only when generating output to be read by humans.
To install NTP on RHEL-based systems:
$ sudo yum install ntpdate
On RHEL 7 and later, chrony is the default network time protocol daemon. The configuration file for chrony is
located in /etc/chrony.conf on these systems.
To verify that the node is synchronized:
$ ntpstat
Run the following command to view all current Linux kernel settings:
$ sudo sysctl -a
TCP settings
During low traffic intervals, a firewall configured with an idle connection timeout can close connections to local
nodes and nodes in other data centers. To prevent connections between nodes from timing out, set the following
network kernel settings:
1. Set the TCP keepalive values:
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10
These values set the TCP keepalive timeout to 60 seconds, with 3 probes and a 10 second gap between
each probe. The settings detect dead TCP connections after 90 seconds (60 + 10 + 10 + 10). The additional
traffic is negligible, and permanently leaving these settings in place is not an issue. See Firewall idle
connection timeout causes nodes to lose communication during low traffic times on Linux.
2. Change the following settings to handle thousands of concurrent connections used by the database:
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=40960
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
Instead of changing the system TCP settings, you can prevent reset connections during streaming by tuning
the streaming_keep_alive_period_in_secs setting in cassandra.yaml.

Set user resource limits
1. Edit the /etc/pam.d/su file and uncomment the following line to enable the pam_limits.so module:
session required pam_limits.so
This change to the PAM configuration file ensures that the system reads the files in the /etc/security/
limits.d directory.
2. If you run DSE as root, some Linux distributions (such as Ubuntu) require setting the limits for the root user
explicitly instead of using cassandra_user:
RHEL-based systems
3. Configure the following settings for the <cassandra_user> in the configuration file.
4. On all systems, set the maximum number of memory map areas:
vm.max_map_count = 1048575
5. Reboot the server or run the following command to make all changes take effect:
$ sudo sysctl -p
6. To confirm the user limits are applied to the DSE process, run the following command, where pid is the
process ID of the currently running DSE process:
$ cat /proc/pid/limits
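Step 3 above refers to a limits configuration file. A sketch, assuming an /etc/security/limits.d/cassandra.conf file and a user named cassandra; confirm the exact values against the recommended production settings for your DSE version:

```
# /etc/security/limits.d/cassandra.conf (illustrative)
cassandra - memlock unlimited
cassandra - nofile 1048576
cassandra - nproc 32768
cassandra - as unlimited
```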
Do not use governors that lower the CPU frequency. To ensure optimal performance, reconfigure all CPUs to
use the performance governor, which locks the frequency at maximum.
The performance governor will not switch frequencies, which means that power savings will be bypassed to
always run at maximum throughput. On most systems, run the following command to set the governor:
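A sketch of one common approach, assuming the standard cpufreq sysfs interface; set_performance_governor is a hypothetical helper and must be run as root on a real system:

```shell
# Write "performance" to every CPU's scaling_governor file under the given
# base directory (defaults to the real sysfs tree).
set_performance_governor() {
  base="${1:-/sys/devices/system/cpu}"
  for gov in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$gov" ] && echo performance > "$gov"
  done
}
```

Called with no argument, the helper targets /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor.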
If the cpufreq directory does not exist on your system, refer to one of the following pages based on your
operating system:
For more information, see High server load and latency when CPU frequency scaling is enabled in the DataStax
Help Center.
Disable zone_reclaim_mode on NUMA systems
The Linux kernel can be inconsistent in enabling/disabling zone_reclaim_mode, which can result in odd
performance problems.
To ensure that zone_reclaim_mode is disabled:
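A sketch, assuming the usual procfs knob; disable_zone_reclaim is a hypothetical helper and must run as root on a real system:

```shell
# Write 0 to the zone_reclaim_mode knob (path defaults to the kernel's
# procfs entry; pass another path to dry-run).
disable_zone_reclaim() {
  echo 0 > "${1:-/proc/sys/vm/zone_reclaim_mode}"
}
```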
For more information, see Peculiar Linux kernel performance problem on NUMA systems.
Disable swap
Failure to disable swap entirely can severely lower performance. Because the database has multiple replicas
and transparent failover, it is preferable for a replica to be killed immediately when memory is low rather than
go into swap. This allows traffic to be immediately redirected to a functioning replica instead of continuing to
hit the replica that has high latency due to swapping. If your system has a lot of DRAM, swapping still lowers
performance significantly because the OS swaps out executable code so that more DRAM is available for
caching disks.
If you insist on using swap, you can set vm.swappiness=1, which allows the kernel to swap out only the
absolute least used parts.
To disable swap permanently, remove all swap file entries from /etc/fstab.
For more information, see Nodes seem to freeze after some period of time.
Complete the optimization settings for either SSDs or spinning disks, depending on your storage type. Do not
complete both procedures.
Optimize SSDs
Complete the following steps to ensure the best settings for SSDs.
2. Apply the same rotational flag setting for any block devices created from SSD storage, such as mdarrays.
$ lsblk
4. Set the IO scheduler to either deadline or noop for each of the listed devices:
For example:
where device_name is the name of the device you want to apply settings for.
• The deadline scheduler optimizes requests to minimize IO latency. If in doubt, use the deadline
scheduler.
• The noop scheduler is the right choice when the target block device is an array of SSDs behind a high-
end IO controller that performs IO optimization.
5. Set the nr_requests value to indicate the maximum number of read and write requests that can be queued:
The recommended readahead setting for RAID on SSDs is the same as that for SSDs that are not being
used in a RAID installation.
touch /var/lock/subsys/local
echo 0 > /sys/class/block/sda/queue/rotational
echo 8 > /sys/class/block/sda/queue/read_ahead_kb
Heap size is usually between ¼ and ½ of system memory. Do not devote all memory to the heap because it is
also used for the off-heap cache and the file system cache.
See Tuning Java Virtual Machine for more information on tuning the Java Virtual Machine (JVM).
If you want to use Concurrent-Mark-Sweep (CMS) garbage collection, contact the DataStax Services team for
configuration help. Tuning Java resources provides details on circumstances where CMS is recommended,
though using CMS requires time, expertise, and repeated testing to achieve optimal results.
The easiest way to determine the optimum heap size for your environment is:
1. Set the MAX_HEAP_SIZE in the jvm.options file to a high arbitrary value on a single node.
3. Use the value for setting the heap size in the cluster.
This method decreases performance for the test node, but generally does not significantly reduce cluster
performance.
If you don't see improved performance, contact the DataStax Services team for additional help in tuning the JVM.
Check Java Hugepages settings
Many modern Linux distributions ship with the Transparent Hugepages feature enabled by default. When Linux
uses Transparent Hugepages, the kernel tries to allocate memory in large chunks (usually 2MB), rather than 4K.
This allocation can improve performance by reducing the number of pages the CPU must track. However, some
applications still allocate memory based on 4K pages, which can cause noticeable performance problems when
Linux tries to defragment 2MB pages.
For more information, see the Cassandra Java Huge Pages blog and this RedHat bug report.
To solve this problem, disable defrag for Transparent Hugepages:
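A sketch, assuming the typical sysfs path for the THP defrag setting; thp_defrag_never is a hypothetical helper and must run as root on a real system:

```shell
# Write "never" to the Transparent Hugepages defrag knob (path defaults to
# the common sysfs location; pass another path to dry-run).
thp_defrag_never() {
  echo never > "${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
}
```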
For more information, including a temporary fix, see No DSE processing but high CPU usage.
After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.
Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two
spaces. Adhere to the YAML syntax and retain the spacing.
• Default values that are not defined are shown as Default: none.
Organization
The configuration properties are grouped into the following sections:
• Quick start
The minimal properties needed for configuring a cluster.
• Default directories
If you have changed any of the default directories during installation, set these properties to the new
locations. Make sure you have root access.
• Commonly used
Properties most frequently used when configuring DataStax Enterprise.
• Performance tuning
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O,
CPU, reads, and writes.
• Advanced
Properties for advanced users or properties that are less commonly used.
• Security
• Continuous paging options
Properties that configure memory, threads, and duration when pushing pages continuously to the client.
cluster_name
The name of the cluster. This setting prevents nodes in one logical cluster from joining another. All
nodes in a cluster must have the same value.
Default: 'Test Cluster'
listen_address
The IP address or hostname that the database binds to for connecting this node to other nodes.
Default: localhost
listen_interface
The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a
single address. IP aliasing is not supported.
Set listen_address or listen_interface, not both.
Default: commented out (wlan0)
listen_interface_prefer_ipv6
Use IPv4 or IPv6 when interface is specified by name.
When only a single address is used, that address is selected without regard to this setting.
Default: commented out (false)
Default directories
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
cdc_raw_directory: /var/lib/cassandra/cdc_raw
hints_directory: /var/lib/cassandra/hints
saved_caches_directory: /var/lib/cassandra/saved_caches
If you have changed any of the default directories during installation, set these properties to the new locations.
Make sure you have root access.
data_file_directories
The directory where table data is stored on disk. The database distributes data evenly across the
configured locations, subject to the granularity of the configured compaction strategy. If not set, the directory
is $DSE_HOME/data/data.
For production, DataStax recommends RAID 0 and SSDs.
Default: - /var/lib/cassandra/data
commitlog_directory
The directory where the commit log is stored. If not set, the directory is $DSE_HOME/data/commitlog.
For optimal write performance, place the commit log on a separate disk partition, or ideally on a
separate physical device from the data file directories. Because the commit log is append only, a hard
disk drive (HDD) is acceptable.
DataStax recommends explicitly setting the location of the DSE Metrics Collector data directory.
When the DSE Metrics Collector is enabled and when the insights_options data dir is not explicitly
set in dse.yaml, the default location of the DSE Metrics Collector data directory is the same directory
as the commitlog directory.
Default: /var/lib/cassandra/commitlog
cdc_raw_directory
The directory where the change data capture (CDC) commit log segments are stored on flush. DataStax
recommends a physical device that is separate from the data directories. If not set, the directory is
$DSE_HOME/data/cdc_raw. See Change Data Capture (CDC) logging.
Default: /var/lib/cassandra/cdc_raw
hints_directory
The directory in which hints are stored. If not set, the directory is $CASSANDRA_HOME/data/hints.
Default: /var/lib/cassandra/hints
saved_caches_directory
The directory location where table key and row caches are stored. If not set, the directory is
$DSE_HOME/data/saved_caches.
Default: /var/lib/cassandra/saved_caches
Commonly used properties
Properties most frequently used when configuring DataStax Enterprise.
Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.
commit_failure_policy: stop
prepared_statements_cache_size_mb:
# disk_optimization_strategy: ssd
disk_failure_policy: stop
endpoint_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
seed_provider:
- org.apache.cassandra.locator.SimpleSeedProvider
- seeds: "127.0.0.1"
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
enable_user_defined_functions_threads: true
commit_failure_policy
• die - Shut down the node and kill the JVM, so the node can be replaced.
• stop - Shut down the node, leaving the node effectively dead, available for inspection using JMX.
• stop_commit - Shut down the commit log, letting writes collect but continuing to service reads.
Default: stop
prepared_statements_cache_size_mb
Maximum size of the native protocol prepared statement cache. Change this value only if there are
more prepared statements than fit in the cache.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.
Specifying a value that is too large results in long running GCs and possibly out-of-memory errors.
Keep the value at a small fraction of the heap.
Constantly re-preparing statements is a performance penalty. When not set, the default is automatically
calculated to heap / 256 or 10 MB, whichever is greater.
Default: calculated
disk_optimization_strategy
The strategy for optimizing disk reads: ssd (for solid state disks) or spinning (for rotational disks).
disk_failure_policy
The policy for how the database responds to disk failures:
• die - Shut down gossip and client transports, and kill the JVM for any file system errors or single
SSTable errors, so the node can be replaced.
• stop_paranoid - Shut down the node, even for single SSTable errors.
• stop - Shut down the node, leaving the node effectively dead, but available for inspection using
JMX.
• best_effort - Stop using the failed disk and respond to requests based on the remaining available
SSTables. This setting allows obsolete data at consistency level of ONE.
• ignore - Ignore fatal errors and let requests fail; all file system errors are logged but otherwise
ignored.
endpoint_snitch
The snitch to use for locating nodes and routing requests:
• DseSimpleSnitch
Appropriate only for development deployments. Proximity is determined by DSE workload, which
places transactional, analytics, and search nodes into their separate datacenters. Does not
recognize datacenter or rack information.
• GossipingPropertyFileSnitch
Recommended for production. Reads rack and datacenter for the local node in cassandra-
rackdc.properties file and propagates these values to other nodes via gossip. For migration from
the PropertyFileSnitch, uses the cassandra-topology.properties file if it is present.
• PropertyFileSnitch
Determines proximity by rack and datacenter that are explicitly configured in cassandra-
topology.properties file.
• Ec2Snitch
For EC2 deployments in a single region. Loads region and availability zone information from the
Amazon EC2 API. The region is treated as the datacenter, the availability zone is treated as the
rack, and uses only private IP addresses. For this reason, Ec2Snitch does not work across multiple
regions.
• Ec2MultiRegionSnitch
Uses the public IP as the broadcast_address to allow cross-region connectivity. This means you
must also set seed addresses to the public IP and open the storage_port or ssl_storage_port
on the public IP firewall. For intra-region traffic, the database switches to the private IP after
establishing a connection.
• RackInferringSnitch
Proximity is determined by rack and datacenter, which are assumed to correspond to the 3rd and
2nd octet of each node's IP address, respectively. Best used as an example for writing a custom
snitch class (unless this happens to match your deployment conventions).
• GoogleCloudSnitch
Use for deployments on Google Cloud Platform across one or more regions. The region is
treated as a datacenter and the availability zones are treated as racks within the datacenter. All
communication occurs over private IP addresses within the same logical network.
• CloudstackSnitch
Use the CloudstackSnitch for Apache Cloudstack environments.
See Snitches.
Default: com.datastax.bdp.snitch.DseSimpleSnitch
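For example, with GossipingPropertyFileSnitch each node declares its own location in cassandra-rackdc.properties (datacenter and rack names illustrative):

```
# cassandra-rackdc.properties on a node in datacenter DC1, rack RAC1
dc=DC1
rack=RAC1
```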
seed_provider
The addresses of hosts that are designated as contact points in the cluster. A joining node contacts one
of the nodes in the -seeds list to learn the topology of the ring.
Use only seed provider implementations bundled with DSE.
• class_name - The class that handles the seed logic. It can be customized, but this is typically not
required.
Default: org.apache.cassandra.locator.SimpleSeedProvider
• - seeds - A comma delimited list of addresses that are used by gossip for bootstrapping new nodes
joining a cluster. If your cluster includes multiple nodes, you must change the list from the default
value to the IP address of one of the nodes.
Default: "127.0.0.1"
Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a
small seed list (approximately three nodes per datacenter).
See Initializing a single datacenter per workload type and Initializing multiple datacenters per
workload type.
Default: org.apache.cassandra.locator.SimpleSeedProvider
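A multi-node cluster therefore typically lists a small number of seed addresses, following the standard cassandra.yaml structure (IP addresses illustrative):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.10.1.5,10.10.1.6,10.20.1.5"
```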
enable_user_defined_functions
Enables user defined functions (UDFs). UDFs present a security risk, since they are executed on the
server side. UDFs are executed in a sandbox to contain the execution of malicious code.
• true - Enabled. Supports Java as the code language. Detects endless loops and unintended memory
leaks.
• false - Disabled.
enable_user_defined_functions_threads
• true - Enabled. Only one instance of a function can run at one time. Asynchronous execution
prevents UDFs from running too long or forever and destabilizing the cluster.
• false - Disabled. Allows multiple instances of the same function to run simultaneously. Required to
use UDFs within GROUP BY clauses.
Disabling asynchronous UDF execution implicitly disables the security manager. You must
monitor the read timeouts for UDFs that run too long or forever, which can cause the cluster to
destabilize.
Default: true
Common compaction settings
compaction_throughput_mb_per_sec: 16
compaction_large_partition_warning_threshold_mb: 100
compaction_throughput_mb_per_sec
The MB per second to throttle compaction for the entire system. The faster the database inserts data,
the faster the system must compact in order to keep the SSTable count down.
memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 2048
memtable_heap_space_in_mb
The amount of on-heap memory allocated for memtables. The database uses the total of this amount
and the value of memtable_offheap_space_in_mb to set a threshold for automatic memtable flush.
See memtable_cleanup_threshold and Tuning the Java heap.
Default: calculated 1/4 of heap size (2048)
memtable_offheap_space_in_mb
The amount of off-heap memory allocated for memtables. The database uses the total of this amount
and the value of memtable_heap_space_in_mb to set a threshold for automatic memtable flush.
See memtable_cleanup_threshold and Tuning the Java heap.
Default: calculated 1/4 of heap size (2048)
Common automatic backup settings
incremental_backups: false
snapshot_before_compaction: false
incremental_backups
Enables incremental backups.
• true - Enabled. The database creates a hard link to each SSTable flushed or streamed locally
in a backups subdirectory of the keyspace data. Incremental backups enable storing backups off
site without transferring entire snapshots.
The database does not automatically clear incremental backup files. DataStax recommends
setting up a process to clear incremental backup hard links each time a new snapshot is
created.
• false - Disabled.
Default: false
snapshot_before_compaction
Enables snapshots of the data before each compaction. Use this option carefully: the database does
not clean up older snapshots automatically.
Default: false
Commit log settings
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
# commitlog_sync_group_window_in_ms: 1000
# commitlog_sync_batch_window_in_ms: 2 //deprecated
commitlog_segment_size_in_mb: 32
# commitlog_total_space_in_mb: 8192
# commitlog_compression:
# - class_name: LZ4Compressor
# parameters:
# -
commitlog_sync
Commit log synchronization method:
• periodic - Send ACK signal for writes immediately. Commit log is synced every
commitlog_sync_period_in_ms.
• group - Send ACK signal for writes after the commit log has been flushed to disk. Wait up to
commitlog_sync_group_window_in_ms between flushes.
• batch - Send ACK signal for writes after the commit log has been flushed to disk. Each incoming
write triggers the flush task.
Default: periodic
commitlog_sync_period_in_ms
Use with commitlog_sync: periodic. Time interval between syncing the commit log to disk. Periodic
syncs are acknowledged immediately.
Default: 10000
commitlog_sync_group_window_in_ms
Use with commitlog_sync: group. The time that the database waits between flushing the commit log
to disk. DataStax recommends using group instead of batch.
Default: commented out (1000)
commitlog_sync_batch_window_in_ms
Deprecated. Use with commitlog_sync: batch. The maximum length of time that queries may be
batched together.
Default: commented out (2)
commitlog_segment_size_in_mb
The size of an individual commitlog file segment. A commitlog segment may be archived, deleted, or
recycled after all its data has been flushed to SSTables. This data can potentially include commitlog
segments from every table in the system. The default size is usually suitable, but for commitlog
archiving you might want a finer granularity; 8 or 16 MB is reasonable.
If you set max_mutation_size_in_kb explicitly, then you must set commitlog_segment_size_in_mb to:
2 * max_mutation_size_in_kb / 1024
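As a quick sanity check of that relationship (the helper name here is ours, not a DSE API), the shipped defaults are mutually consistent:

```python
def required_segment_size_mb(max_mutation_size_in_kb: int) -> int:
    # A commitlog segment must be able to hold two of the largest mutations:
    # segment size (MB) = 2 * max mutation size (KB) / 1024.
    return 2 * max_mutation_size_in_kb // 1024

# The default 16384 KB (16 MB) maximum mutation size implies the
# default 32 MB commitlog segment size.
print(required_segment_size_mb(16384))  # → 32
```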
Default: 32
max_mutation_size_in_kb
The maximum size of a mutation before the mutation is rejected. Before increasing the commitlog
segment size, investigate why the mutations are larger than expected. Look for underlying issues with
access patterns and data model, because increasing the commitlog segment size is a limited fix.
When not set, the default is calculated as (commitlog_segment_size_in_mb * 1024) / 2.
Default: calculated
commitlog_total_space_in_mb
Disk usage threshold for commit logs before triggering the database flushing memtables to disk. If the
total space used by all commit logs exceeds this threshold, the database flushes memtables to disk for
the oldest commitlog segments to reclaim disk space by removing those log segments from the commit
log. This flushing reduces the amount of data to replay on start-up, and prevents infrequently updated
tables from keeping commitlog segments indefinitely. If the commitlog_total_space_in_mb is small,
the result is more flush activity on less-active tables.
See Configuring memtable thresholds.
Default for 64-bit JVMs: calculated (8192 or 25% of the total space of the commit log
volume, whichever is smaller)
Default for 32-bit JVMs: calculated (32 or 25% of the total space of the commit log volume,
whichever is smaller)
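The calculated default can be sketched as follows (helper name is ours; "volume" is the disk volume holding the commit log):

```python
def default_commitlog_total_space_mb(volume_total_mb: int, jvm_64bit: bool = True) -> int:
    # Smaller of a fixed cap (8192 MB on 64-bit JVMs, 32 MB on 32-bit)
    # and 25% of the commit log volume.
    cap_mb = 8192 if jvm_64bit else 32
    return min(cap_mb, volume_total_mb // 4)

print(default_commitlog_total_space_mb(100_000))  # → 8192 (cap wins on a large disk)
print(default_commitlog_total_space_mb(16_000))   # → 4000 (25% of a small volume)
```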
commitlog_compression
The compressor to use if commit log is compressed. To make changes, uncomment the
commitlog_compression section and these options:
# commitlog_compression:
# - class_name: LZ4Compressor
# parameters:
# -
When not set, the commit log is not compressed.
Default: commented out
Lightweight transactions (LWT) settings
concurrent_lw_transactions
Maximum number of permitted concurrent lightweight transactions (LWT).
• A higher number might improve throughput if non-contending LWTs are in heavy use, but will use
more memory and might be less successful with contention.
• When not set, the default value is 8x the number of TPC cores. This default value is appropriate for
most environments.
Change-data-capture (CDC) settings
cdc_enabled: false
cdc_total_space_in_mb: 4096
cdc_free_space_check_interval_ms: 250
cdc_enabled
Enables change-data-capture (CDC) functionality on a per-node basis.
• true - reject mutations that include a CDC-enabled table when the space limit threshold in
cdc_raw_directory is reached
• false - Disabled.
Default: false
cdc_total_space_in_mb
Total space to use for change-data-capture (CDC) logs on disk. If space allocated for CDC exceeds
this value, the database throws WriteTimeoutException on mutations that include CDC-enabled tables.
A CDCCompactor (a consumer) is responsible for parsing the raw CDC logs and deleting them when
parsing is completed.
Default: calculated (4096 or 1/8th of the total space of the drive where the cdc_raw_directory resides)
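The calculated default follows the same smaller-of pattern (a sketch; the helper name is ours):

```python
def default_cdc_total_space_mb(drive_total_mb: int) -> int:
    # Smaller of 4096 MB and 1/8 of the drive holding cdc_raw_directory.
    return min(4096, drive_total_mb // 8)

print(default_cdc_total_space_mb(1_000_000))  # → 4096
print(default_cdc_total_space_mb(16_384))     # → 2048
```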
cdc_free_space_check_interval_ms
Interval between checks for new available space for CDC-tracked tables when the
cdc_total_space_in_mb threshold is reached and the CDCCompactor is running behind or experiencing
back pressure. When not set, the default is 250.
Default: commented out (250)
Compaction settings
# concurrent_compactors: 1
# concurrent_validations: 0
concurrent_materialized_view_builders: 2
sstable_preemptive_open_interval_in_mb: 50
# pick_level_on_streaming: false
See also compaction_throughput_mb_per_sec in the common compaction settings section and Configuring
compaction.
concurrent_compactors
The number of concurrent compaction processes allowed to run simultaneously on a node, not
including validation compactions for anti-entropy repair. Simultaneous compactions help preserve
read performance in a mixed read-write workload by limiting the number of small SSTables that
accumulate during a single long-running compaction. If your data directories are backed by SSDs,
increase this value to the number of cores. If compaction is running too slowly or too fast, adjust
compaction_throughput_mb_per_sec first.
Increasing concurrent compactors leads to more use of available disk space for compaction,
because concurrent compactions happen in parallel, especially for STCS. Ensure that adequate disk
space is available before increasing this configuration.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.
Default: calculated (the smaller of the number of disks or the number of cores, with a
minimum of 2 and a maximum of 8 per CPU core)
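A sketch of the stated rule (our helper, not a DSE API):

```python
def default_concurrent_compactors(num_data_disks: int, num_cores: int) -> int:
    # Smaller of disk count and core count, clamped to the range [2, 8].
    return min(8, max(2, min(num_data_disks, num_cores)))

print(default_concurrent_compactors(1, 16))   # → 2
print(default_concurrent_compactors(12, 16))  # → 8
```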
concurrent_validations
Number of simultaneous repair validations to allow. When not set, the default is unbounded. Values less
than one are interpreted as unbounded.
Default: commented out (0) unbounded
concurrent_materialized_view_builders
Number of simultaneous materialized view builder tasks allowed to run concurrently. When a view
is created, the node ranges are split into (num_processors * 4) builder tasks and submitted to this
executor.
Default: 2
sstable_preemptive_open_interval_in_mb
The size of the SSTables to trigger preemptive opens. The compaction process opens SSTables before
they are completely written and uses them in place of the prior SSTables for any range previously
written. This process helps to smoothly transfer reads between the SSTables by reducing cache churn
and keeps hot rows hot.
A low value has a negative performance impact and will eventually cause heap pressure and GC
activity. The optimal value depends on hardware and workload.
Default: 50
pick_level_on_streaming
The compaction level for streamed-in SSTables.
• true - streamed-in SSTables of tables using LeveledCompactionStrategy (LCS) are placed in the
same level as on the source node. For operational tasks like nodetool refresh or replacing a node,
true improves performance for compaction work.
• false - streamed-in SSTables are placed in level 0.
Default: commented out (false)
Memtable settings
memtable_allocation_type: heap_buffers
# memtable_cleanup_threshold: 0.34
memtable_flush_writers: 4
memtable_allocation_type
The method the database uses to allocate and manage memtable memory.
Default: heap_buffers
memtable_cleanup_threshold
Ratio used for automatic memtable flush.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax
recommends contacting the DataStax Services team before changing this value.
When not set, the calculated default is 1/(memtable_flush_writers + 1)
Default: commented out (0.34)
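The calculated default is easy to verify (a sketch; the function name is ours):

```python
def memtable_cleanup_threshold(memtable_flush_writers: int) -> float:
    # Calculated default: 1 / (memtable_flush_writers + 1).
    return 1 / (memtable_flush_writers + 1)

# With the SSD default of 4 flush writers the threshold is 0.2; the
# commented-out 0.34 corresponds to 2 flush writers (1/3, rounded).
print(memtable_cleanup_threshold(4))  # → 0.2
```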
memtable_flush_writers
The number of memtable flush writer threads per disk, and the total number of memtables that can
be flushed concurrently. Flushing is generally a combination of compute- and I/O-bound work.
Memtable flushing is more CPU efficient than memtable ingest, so a single flush thread can keep up
with the ingest rate of a server on a single fast disk until the server temporarily becomes I/O bound
under contention, typically with compaction. Generally, the default value is appropriate and does not
need adjusting for SSDs. However, for HDDs the recommended value is 2.
Default for SSDs: 4
Cache and index settings
column_index_size_in_kb: 16
# file_cache_size_in_mb: 4096
# direct_reads_size_in_mb: 128
column_index_size_in_kb
Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek
time. Lower density nodes might benefit from decreasing this value to 4, 2, or 1.
Default: 16
file_cache_size_in_mb
DSE 6.0.0-6.0.6: Maximum memory for buffer pooling and SSTable chunk cache. 32 MB is reserved
for pooling buffers; the remaining memory is the cache for holding recent or frequently used index
pages and uncompressed SSTable chunks. This pool is allocated off heap and is in addition to the
memory allocated for heap. Memory is allocated only when needed.
DSE 6.0.7 and later: The buffer pool is split into two pools; this setting defines the maximum memory
for file buffers stored in the file cache, also known as the chunk cache. Memory is allocated only
when needed but is not released. The other buffer pool is direct_reads_size_in_mb.
See Tuning Java Virtual Machine.
Default: calculated (0.5 of -XX:MaxDirectMemorySize)
direct_reads_size_in_mb
DSE 6.0.7 and later: The buffer pool is split into two pools; this setting defines the buffer pool for
transient read operations. A buffer is typically used by a read operation and then returned to this pool
when the operation is finished so that it can be reused by other operations. The other buffer pool is
file_cache_size_in_mb. When not set, the default is calculated as 2 MB per TPC core thread, plus
2 MB shared by non-TPC threads, with a maximum value of 128 MB.
Default: calculated
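That calculation can be sketched as (helper name ours):

```python
def default_direct_reads_size_mb(tpc_cores: int) -> int:
    # 2 MB per TPC core thread plus 2 MB shared by non-TPC threads,
    # capped at 128 MB.
    return min(128, 2 * tpc_cores + 2)

print(default_direct_reads_size_mb(8))    # → 18
print(default_direct_reads_size_mb(100))  # → 128
```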
Streaming settings
# stream_throughput_outbound_megabits_per_sec: 200
# inter_dc_stream_throughput_outbound_megabits_per_sec: 200
# streaming_keep_alive_period_in_secs: 300
# streaming_connections_per_host: 1
stream_throughput_outbound_megabits_per_sec
Throttle for the throughput of all outbound streaming file transfers on a node. The database does
mostly sequential I/O when streaming data during bootstrap or repair, which can saturate the network
connection and degrade client (RPC) performance. When not set, the value is 200 Mbps.
Default: commented out (200)
inter_dc_stream_throughput_outbound_megabits_per_sec
Throttle for all streaming file transfers between datacenters, and for network stream traffic as configured
with stream_throughput_outbound_megabits_per_sec. When not set, the value is 200 Mbps.
Should be set to a value less than or equal to stream_throughput_outbound_megabits_per_sec
since it is a subset of total throughput.
Default: commented out (200)
streaming_keep_alive_period_in_secs
Interval to send keep-alive messages to prevent reset connections during streaming. The stream
session fails when a keep-alive message is not received for 2 keep-alive cycles. When not set, the
default is 300 seconds (5 minutes) so that a stalled stream times out in 10 minutes.
Default: commented out (300)
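The stall window follows directly from the two-cycle rule (a sketch; names are ours):

```python
def stalled_stream_timeout_secs(keep_alive_period_secs: int = 300) -> int:
    # A stream session fails after 2 missed keep-alive cycles.
    return 2 * keep_alive_period_secs

print(stalled_stream_timeout_secs())  # → 600 (10 minutes, as stated above)
```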
streaming_connections_per_host
Maximum number of connections per host for streaming. Increase this value when you notice that joins
are CPU-bound, rather than network-bound. For example, a few nodes with large files. When not set,
the default is 1.
Default: commented out (1)
Fsync settings
trickle_fsync: true
trickle_fsync_interval_in_kb: 10240
trickle_fsync
When set to true, causes fsync to force the operating system to flush the dirty buffers at the set
interval trickle_fsync_interval_in_kb. Enable this parameter to prevent sudden dirty buffer flushing from
impacting read latencies. Recommended for use with SSDs, but not with HDDs.
Default: true
trickle_fsync_interval_in_kb
The size of the fsync in kilobytes.
Default: 10240
max_value_size_in_mb
The maximum size of any value in SSTables. SSTables are marked as corrupted when the threshold is
exceeded.
Default: 256
Thread Per Core (TPC) parameters
# tpc_cores:
# tpc_io_cores:
io_global_queue_depth: 128
tpc_cores
The number of concurrent CoreThreads. The CoreThreads are the main workers in a DSE 6.x node,
and process various asynchronous tasks from their queue. If not set, the default is the number of cores
(processors on the machine) minus one. Note that configuring tpc_cores affects the default value for
tpc_io_cores.
To achieve optimal throughput and latency, for a given workload, set tpc_cores to half the number
of CPUs (minimum) to double the number of CPUs (maximum). In cases where there are a large
number of incoming client connections, increasing tpc_cores to more than the default usually results in
CoreThreads receiving more CPU time.
DSE Search workloads only: set tpc_cores to the number of physical CPUs. See Tuning search
for maximum indexing throughput.
Default: commented out; defaults to the number of cores minus one.
tpc_io_cores
The subset of tpc_cores that process asynchronous IO tasks. (That is, disk reads.) Must be smaller or
equal to tpc_cores. Lower this value to decrease parallel disk IO requests.
Default: commented out; by default, calculated as min(io_global_queue_depth/4, tpc_cores)
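The calculated default can be sketched as (helper name ours):

```python
def default_tpc_io_cores(tpc_cores: int, io_global_queue_depth: int = 128) -> int:
    # min(io_global_queue_depth / 4, tpc_cores)
    return min(io_global_queue_depth // 4, tpc_cores)

print(default_tpc_io_cores(7))   # → 7 (an 8-core machine: cores minus one)
print(default_tpc_io_cores(64))  # → 32 (queue depth / 4 wins)
```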
io_global_queue_depth
Global IO queue depth used for reads when AIO is enabled, which is the default for SSDs. Set to the
optimal queue depth for your disk setup, as found with the fio tool.
Default: 128
NodeSync parameters
nodesync:
rate_in_kb: 1024
rate_in_kb
The maximum kilobytes per second for data validation on the local node. The optimum validation rate
for each node may vary.
Default: 1024
Advanced properties
Properties for advanced users or properties that are less commonly used.
Advanced initialization properties
batch_size_warn_threshold_in_kb: 64
batch_size_fail_threshold_in_kb: 640
unlogged_batch_across_partitions_warn_threshold: 10
# broadcast_address: 1.2.3.4
# listen_on_broadcast_address: false
# initial_token:
# num_tokens: 128
# allocate_tokens_for_local_replication_factor: 3
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
auto_bootstrap
This setting has been removed from the default configuration.
• true - causes new (non-seed) nodes to migrate the right data to themselves automatically
Default: true
broadcast_address
The IP address that the node broadcasts to other nodes in the cluster.
• If this node uses multiple physical network interfaces, set a unique IP address for
broadcast_address
• If this node is on a network that automatically routes between public and private networks,
like Amazon EC2 does, also set listen_on_broadcast_address to true
See listen_address.
Default: commented out (same as listen_address)
listen_on_broadcast_address
Enables the node to listen on both listen_address and broadcast_address.
Default: commented out (false)
initial_token
The token to start the contiguous range. Set this property for single-node-per-token architecture, in
which a node owns exactly one contiguous range in the ring space. Setting this property overrides
num_tokens.
If your installation is not using vnodes, or this node's num_tokens is set to 1 or is commented out,
always set an initial_token value when setting up a production cluster for the first time and when
adding capacity. See Generating tokens.
Use this parameter with num_tokens (vnodes) only in special cases such as Restoring from a
snapshot.
Default: commented out (disabled)
num_tokens
Define virtual node (vnode) token architecture.
All other nodes in the datacenter must have the same token architecture.
• a number between 2 and 128 - the number of token ranges to assign to this virtual node (vnode). A
higher value increases the probability that the data and workload are evenly distributed.
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured for
your environment.
Using vnodes can impact performance for your cluster. DataStax recommends testing the
configuration before enabling vnodes in production environments.
When the token number varies between nodes in a datacenter, the vnode logic assigns a
proportional number of ranges relative to other nodes in the datacenter. In general, if all nodes
have equal hardware capability, each node should have the same num_tokens value.
Default: 1 (disabled)
To migrate an existing cluster from single node per token range to vnodes, see Enabling virtual nodes
on an existing production cluster.
allocate_tokens_for_local_replication_factor
• RF of keyspaces in datacenter - triggers the recommended algorithmic allocation for the RF and
num_tokens for this node.
The allocation algorithm optimizes the workload balance using the target keyspace replication
factor. DataStax recommends setting the number of tokens to 8 to distribute the workload with
~10% variance between nodes. The allocation algorithm attempts to choose tokens in a way that
optimizes replicated load over the nodes in the datacenter for the specified RF. The load assigned
to each node is close to proportional to the number of vnodes.
The allocation algorithm is supported only for the Murmur3Partitioner and RandomPartitioner
partitioners. The Murmur3Partitioner is the default partitioning strategy for new clusters and the
right choice for new clusters in almost all cases.
• commented out - uses the random selection algorithm to assign token ranges randomly.
Over time, loads in a datacenter using the random selection algorithm become unevenly
distributed. DataStax recommends using only the allocation algorithm.
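Putting the two recommendations together, a vnode-enabled node in a datacenter whose keyspaces use a replication factor of 3 might set (illustrative values):

```yaml
num_tokens: 8
allocate_tokens_for_local_replication_factor: 3
```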
partitioner
The class that distributes rows (by partition key) across all nodes in the cluster. Any IPartitioner
may be used, including your own, as long as it is on the class path. For new clusters, use the default
partitioner.
DataStax Enterprise provides the following partitioners for backward compatibility:
• RandomPartitioner
• ByteOrderedPartitioner (deprecated)
• OrderPreservingPartitioner (deprecated)
See Partitioners.
Default: org.apache.cassandra.dht.Murmur3Partitioner
tracetype_query_ttl
TTL for different trace types used during logging of the query process.
Default: 86400 (1 day)
tracetype_repair_ttl
TTL for different trace types used during logging of the repair process.
Default: 604800 (7 days)
Advanced automatic backup setting
auto_snapshot: true
auto_snapshot
Enables snapshots of the data before truncating a keyspace or dropping a table. To prevent data loss,
DataStax strongly advises using the default setting. If you set auto_snapshot to false, you lose data on
truncation or drop.
Default: true
Global row properties
column_index_cache_size_in_kb: 2
# row_cache_class_name: org.apache.cassandra.cache.OHCProvider
row_cache_size_in_mb: 0
row_cache_save_period: 0
# row_cache_keys_to_save: 100
When creating or modifying tables, you can enable or disable the row cache for that table by setting the caching
parameter. Other row cache tuning and configuration options are set at the global (node) level. The database
uses these settings to automatically distribute memory for each table on the node based on the overall workload
and specific table usage. You can also configure the save periods for these caches globally.
column_index_cache_size_in_kb
(Only applies to BIG format SSTables) Threshold for the total size of all index entries for a partition that
the database stores in the partition key cache. If the total size of all index entries for a partition exceeds
this amount, the database stops putting entries for this partition into the partition key cache.
Default: 2
row_cache_class_name
The classname of the row cache provider to use.
Default: commented out (org.apache.cassandra.cache.OHCProvider)
row_cache_size_in_mb
Maximum size of the row cache in memory. The row cache is space-intensive because it contains the
entire row, so use it only for hot rows or static rows.
Default: 0 (disabled)
row_cache_save_period
The number of seconds that rows are kept in cache. Caches are saved to saved_caches_directory. This
setting has limited use as described in row_cache_size_in_mb.
Default: 0 (disabled)
row_cache_keys_to_save
The number of keys from the row cache to save. When not set, all keys are saved.
Default: commented out (100)
Counter caches properties
counter_cache_size_in_mb:
counter_cache_save_period: 7200
# counter_cache_keys_to_save: 100
The counter cache helps to reduce counter lock contention for hot counter cells. With RF = 1, a counter
cache hit causes the database to skip the read-before-write entirely. With RF > 1, a counter cache hit
still helps to reduce the duration of the lock hold, helping with hot counter cell updates, but does not
allow skipping the read entirely. Only the local (clock, count) tuple of a counter cell is kept in memory,
not the whole counter, so it is relatively cheap.
If you reduce the counter cache size, the database may not load the hottest keys on start-up.
counter_cache_size_in_mb
When no value is set, the database uses the smaller of 2.5% of the heap or 50 MB. If your system
performs counter deletes and relies on low gc_grace_seconds, disable the counter cache by setting
this value to 0.
Default: calculated
counter_cache_save_period
The time, in seconds, after which the database saves the counter cache (keys only). The database
saves caches to saved_caches_directory.
Default: 7200 (2 hours)
counter_cache_keys_to_save
Number of keys from the counter cache to save. When not set, the database saves all keys.
Default: commented out (disabled, saves all keys)
Tombstone settings
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
When executing a scan, within or across a partition, the database must keep tombstones in memory to
return them to the coordinator. The coordinator uses tombstones to ensure that other replicas know about
the deleted rows. Workloads that generate numerous tombstones may cause performance problems and
exhaust the server heap. Adjust these thresholds only if you understand the impact and want to scan more
tombstones.
You can adjust these thresholds at runtime using the StorageServiceMBean.
See the DataStax Developer Blog post Cassandra anti-patterns: Queues and queue-like datasets.
tombstone_warn_threshold
The database issues a warning if a query scans more than this number of tombstones.
Default: 1000
tombstone_failure_threshold
The database aborts a query if it scans more than this number of tombstones.
Default: 100000
Network timeout settings
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
aggregated_request_timeout_in_ms: 120000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
# cross_dc_rtt_in_ms: 0
read_request_timeout_in_ms
Default: 5000. How long the coordinator waits for read operations to complete before timing it out.
range_request_timeout_in_ms
Default: 10000. How long the coordinator waits for sequential or index scans to complete before timing
it out.
aggregated_request_timeout_in_ms
How long the coordinator waits for aggregated queries, such as those using SELECT COUNT(*) or
MIN(x), to complete. Lowest acceptable value is 10 ms.
Default: 120000 (2 minutes)
write_request_timeout_in_ms
How long the coordinator waits for write requests to complete with at least one node in the local
datacenter. Lowest acceptable value is 10 ms.
See Hinted handoff: repair during write path.
Default: 2000 (2 seconds)
counter_write_request_timeout_in_ms
How long the coordinator waits for counter writes to complete before timing it out.
Default: 5000 (5 seconds)
cas_contention_timeout_in_ms
How long the coordinator continues to retry a CAS (compare and set) operation that contends with other
proposals for the same row. If the coordinator cannot complete the operation within this timespan, it
aborts the operation.
Default: 1000 (1 second)
truncate_request_timeout_in_ms
How long the coordinator waits for a truncate (the removal of all data from a table) to complete before
timing it out. The long default value allows the database to take a snapshot before removing the data. If
auto_snapshot is disabled (not recommended), you can reduce this time.
Default: 60000 (1 minute)
request_timeout_in_ms
The default timeout value for other miscellaneous operations. Lowest acceptable value is 10 ms.
See Hinted handoff: repair during write path.
Default: 10000
cross_dc_rtt_in_ms
How much to increase the cross-datacenter timeout (write_request_timeout_in_ms +
cross_dc_rtt_in_ms) for requests that involve only nodes in a remote datacenter. This setting is
intended to reduce hint pressure.
DataStax recommends using LOCAL_* consistency levels (CL) for read and write requests in multi-
datacenter deployments to avoid timeouts that may occur when remote nodes are chosen to satisfy
the CL, such as QUORUM.
Default: commented out (0)
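The effective timeout for remote-datacenter-only requests is simply the sum (a sketch; the function name is ours):

```python
def remote_dc_write_timeout_ms(write_request_timeout_in_ms: int = 2000,
                               cross_dc_rtt_in_ms: int = 0) -> int:
    # Requests that involve only nodes in a remote datacenter get the
    # base write timeout extended by cross_dc_rtt_in_ms.
    return write_request_timeout_in_ms + cross_dc_rtt_in_ms

print(remote_dc_write_timeout_ms(2000, 500))  # → 2500
```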
slow_query_log_timeout_in_ms
Default: 500. How long before a node logs slow queries. Select queries that exceed this value generate
an aggregated log message to identify slow queries. To disable, set to 0.
Inter-node settings
storage_port: 7000
cross_node_timeout: false
# internode_send_buff_size_in_bytes:
# internode_recv_buff_size_in_bytes:
internode_compression: dc
inter_dc_tcp_nodelay: false
storage_port
The port for inter-node communication. Follow security best practices: do not expose this port to the
internet, and apply firewall rules.
See Securing DataStax Enterprise ports.
Default: 7000
cross_node_timeout
Enables operation timeout information exchange between nodes to accurately measure request
timeouts. If this property is disabled, the replica assumes any requests are forwarded to it instantly by
the coordinator. During overload conditions this means extra time is required for processing already-
timed-out requests.
Before enabling this property make sure NTP (network time protocol) is installed and the times are
synchronized among the nodes.
Default: false
internode_send_buff_size_in_bytes
The sending socket buffer size, in bytes, for inter-node calls. The buffer size is limited by these kernel
settings:
• /proc/sys/net/core/wmem_max
• /proc/sys/net/ipv4/tcp_wmem
See TCP settings.
internode_recv_buff_size_in_bytes
The receiving socket buffer size, in bytes, for inter-node calls. The buffer size is limited by these
kernel settings:
• /proc/sys/net/core/rmem_max
• /proc/sys/net/ipv4/tcp_rmem
internode_compression
Controls whether traffic between nodes is compressed:
• all - Compresses all traffic.
• dc - Compresses traffic between datacenters only.
• none - No compression.
Default: dc
inter_dc_tcp_nodelay
Enables tcp_nodelay for inter-datacenter communication. When disabled, the network sends larger,
but fewer, network packets. This reduces overhead from the TCP protocol itself. However, disabling
inter_dc_tcp_nodelay may increase latency by blocking cross datacenter responses.
Default: false
Native transport (CQL Binary Protocol)
start_native_transport: true
native_transport_port: 9042
# native_transport_port_ssl: 9142
# native_transport_max_frame_size_in_mb: 256
# native_transport_max_concurrent_connections: -1
# native_transport_max_concurrent_connections_per_ip: -1
native_transport_address: localhost
# native_transport_interface: eth0
# native_transport_interface_prefer_ipv6: false
# native_transport_broadcast_address: 1.2.3.4
native_transport_keepalive: true
start_native_transport
Enables or disables the native transport server.
Default: true
native_transport_port
The port where the CQL native transport listens for clients. For security reasons, do not expose this port
to the internet. Firewall it if needed.
Default: 9042
native_transport_max_frame_size_in_mb
The maximum allowed size of a frame. Frames (requests) larger than this are rejected as invalid.
Default: 256
native_transport_max_concurrent_connections
The maximum number of concurrent client connections.
Default: -1 (unlimited)
native_transport_max_concurrent_connections_per_ip
The maximum number of concurrent client connections per source IP address.
Default: -1 (unlimited)
native_transport_address
The address to bind the native transport server to. When left blank, uses the configured hostname of
the node. Unlike listen_address, this value can be set to 0.0.0.0, but you must then set
native_transport_broadcast_address to a value other than 0.0.0.0.
Set native_transport_address OR native_transport_interface, not both.
Default: localhost
native_transport_interface
The network interface to bind the native transport server to. IP aliasing is not supported.
Set native_transport_address OR native_transport_interface, not both.
Default: commented out (eth0)
native_transport_interface_prefer_ipv6
Use IPv4 or IPv6 when interface is specified by name.
When only a single address is used, that address is selected without regard to this setting.
Default: commented out (false)
native_transport_broadcast_address
Native transport address to broadcast to drivers and other DSE nodes. This cannot be set to 0.0.0.0.
Default: commented out
# gc_log_threshold_in_ms: 200
# gc_warn_threshold_in_ms: 1000
# otc_coalescing_strategy: DISABLED
# otc_coalescing_window_us: 200
# otc_coalescing_enough_coalesced_messages: 8
gc_log_threshold_in_ms
Any GC pause longer than this interval is logged at the INFO level. Adjust to minimize logging.
Default: commented out (200)
gc_warn_threshold_in_ms
Threshold for GC pause. Any GC pause longer than this interval is logged at the WARN level. By
default, the database logs any GC pause greater than 200 ms at the INFO level.
• FIXED
• MOVINGAVERAGE
• TIMEHORIZON
• DISABLED
otc_coalescing_window_us
• For the FIXED strategy - the amount of time after the first message is received before it is sent with any accompanying messages.
• For the MOVINGAVERAGE strategy - the maximum wait time and the interval at which messages must arrive on average to enable coalescing.
Default: commented out (200)
The percentage of time that gossip messages are sent to a seed node during each round of gossip.
Decreases the time to propagate gossip changes across the cluster.
Default: 1.0 (100%)
Backpressure settings
back_pressure_enabled: false
back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
        - high_ratio: 0.90
          factor: 5
          flow: FAST
back_pressure_enabled
Enables the coordinator to apply the specified back pressure strategy to each mutation that is sent to
replicas.
Default: false
back_pressure_strategy
To add new strategies, implement org.apache.cassandra.net.BackpressureStrategy and provide a
public constructor that accepts a Map<String, Object>.
Use only strategy implementations bundled with DSE.
class_name
The default class_name uses the ratio between incoming mutation responses and outgoing mutation
requests.
Default: org.apache.cassandra.net.RateBasedBackPressure
high_ratio
When the ratio of incoming mutation responses to outgoing mutation requests is below this value, outgoing mutations are rate limited according to the incoming rate decreased by the factor (described below). When the ratio is above this value, the rate limiting is increased by the factor.
Default: 0.90
factor
A number between 1 and 10. When the ratio is below high_ratio, outgoing mutations are rate limited according to the incoming rate decreased by the given factor; when the ratio is above high_ratio, the rate limiting is increased by the given factor.
Default: 5
flow
The flow speed used to apply rate limiting:
• FAST - rate limited to the speed of the fastest replica.
• SLOW - rate limited to the speed of the slowest replica.
Default: FAST
dynamic_snitch_badness_threshold
The performance threshold for dynamically routing client requests away from a poorly performing
node. Specifically, it controls how much worse a poorly performing node has to be before the dynamic
snitch prefers other replicas. A value of 0.2 means the database continues to prefer the static snitch
values until the node response time is 20% worse than the best performing node. Until the threshold is
reached, incoming requests are statically routed to the closest replica as determined by the snitch.
Default: 0.1
dynamic_snitch_reset_interval_in_ms
Time interval after which the database resets all node scores. This allows a bad node to recover.
Default: 600000
dynamic_snitch_update_interval_in_ms
The time interval, in milliseconds, between the calculation of node scores. Because score calculation is
CPU intensive, be careful when reducing this interval.
Default: 100
hinted_handoff_enabled: true
# hinted_handoff_disabled_datacenters:
# - DC1
# - DC2
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /var/lib/cassandra/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
#hints_compression:
# - class_name: LZ4Compressor
# parameters:
# -
batchlog_replay_throttle_in_kb: 1024
# batchlog_endpoint_strategy: random_remote
hinted_handoff_enabled
Enables or disables hinted handoff. A hint indicates that the write needs to be replayed to an unavailable node. The database writes the hint to a hints file on the coordinator node.
• true - globally enable hinted handoff, except for datacenters specified in hinted_handoff_disabled_datacenters.
• false - globally disable hinted handoff.
Default: true
hinted_handoff_disabled_datacenters
A blacklist of datacenters that will not perform hinted handoffs. To disable hinted handoff on a certain
datacenter, add its name to this list.
Default: commented out
max_hint_window_in_ms
Maximum amount of time during which the database generates hints for an unresponsive node.
After this interval, the database does not generate any new hints for the node until it is back up and
responsive. If the node goes down again, the database starts a new interval. This setting can prevent a
sudden demand for resources when a node is brought back online and the rest of the cluster attempts
to replay a large volume of hinted writes.
See About failure detection and recovery.
Default: 10800000 (3 hours)
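For illustration only, a hedged cassandra.yaml fragment extending the hint window; the 6-hour value is an example, not a recommendation:

```yaml
# Sketch: allow hints to accumulate for up to 6 hours of node downtime.
max_hint_window_in_ms: 21600000   # 6 hours (example value)
```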
hinted_handoff_throttle_in_kb
Maximum amount of traffic per delivery thread, in kilobytes per second. This rate reduces proportionally to the number of nodes in the cluster. For example, if there are two nodes in the cluster, each delivery thread uses the maximum rate. If there are three, each node throttles to half of the maximum, since two nodes are expected to deliver hints simultaneously.
When applying this limit, the calculated hint transmission rate is based on the uncompressed hint
size, even if internode_compression or hints_compression is enabled.
Default: 1024
hints_flush_period_in_ms
The time, in milliseconds, to wait before flushing hints from internal buffers to disk.
Default: 10000
max_hints_delivery_threads
Number of threads the database uses to deliver hints. In multiple datacenter deployments, consider
increasing this number because cross datacenter handoff is generally slower.
Default: 2
max_hints_file_size_in_mb
Maximum size, in megabytes, of a single hints file.
Default: 128
batchlog_endpoint_strategy
The strategy used to select the endpoints that store the batchlog. Valid values:
• random_remote - Default, purely random. Prevents the local rack, if possible. Same behavior as earlier releases.
• dynamic_remote - Same as random_remote, but selects the fastest endpoints using the dynamic snitch, excluding the local rack if possible. Falls back to random_remote if the dynamic snitch is not enabled.
• dynamic - Mostly the same as dynamic_remote, except that the local rack is not excluded, which offers a lower availability guarantee than random_remote or dynamic_remote. Falls back to random_remote if the dynamic snitch is not enabled.
Default: random_remote
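As a sketch, the commented default can be switched to a snitch-aware strategy; dynamic_remote is assumed here and requires the dynamic snitch to be enabled:

```yaml
# Sketch: choose batchlog endpoints using dynamic snitch scores,
# avoiding the local rack when possible; falls back to random_remote
# when the dynamic snitch is disabled.
batchlog_endpoint_strategy: dynamic_remote
```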
Security properties
DSE Advanced Security fortifies DataStax Enterprise (DSE) databases against potential harm due to deliberate
attack or user error. Configuration properties include authentication and authorization, permissions, roles,
encryption of data in-flight and at-rest, and data auditing. DSE Unified Authentication provides authentication,
authorization, and role management. Enabling DSE Unified Authentication requires additional configuration in
dse.yaml; see Configuring DSE Unified Authentication.
authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
# internode_authenticator: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
role_manager: com.datastax.bdp.cassandra.auth.DseRoleManager
system_keyspaces_filtering: false
roles_validity_in_ms: 120000
# roles_update_interval_in_ms: 120000
permissions_validity_in_ms: 120000
# permissions_update_interval_in_ms: 120000
authenticator
The authentication backend. The only supported authenticator is DseAuthenticator for external
authentication with multiple authentication schemes such as Kerberos, LDAP, and internal
authentication. Authenticators other than DseAuthenticator are deprecated and not supported. Some
security features might not work correctly if other authenticators are used. See authentication_options in
dse.yaml.
Use only authentication implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseAuthenticator
internode_authenticator
Internode authentication backend to enable secure connections from peer nodes.
Use only authentication implementations bundled with DSE.
Default: org.apache.cassandra.auth.AllowAllInternodeAuthenticator
authorizer
The authorization backend. DseAuthorizer supports enhanced permission management of DSE-specific resources. Authorizers other than DseAuthorizer are deprecated and not supported. Some security features might not work correctly if other authorizers are used. See Authorization options in dse.yaml.
Use only authorization implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseAuthorizer
system_keyspaces_filtering
Enables system keyspace filtering so that users can access and view only schema information
for rows in the system and system_schema keyspaces to which they have access. When
system_keyspaces_filtering is set to true:
• Data in the following tables of the system keyspace is filtered based on the role's DESCRIBE privileges for keyspaces; only rows for permitted keyspaces are displayed in:
  ◦ size_estimates
  ◦ sstable_activity
  ◦ built_indexes
  ◦ built_views
  ◦ available_ranges
  ◦ view_builds_in_progress
• Data in all tables in the system_schema keyspace is filtered based on a role's DESCRIBE privileges for keyspaces stored in the system_schema tables.
• Read operations against other tables in the system keyspace are denied.
Security requirements and user permissions apply. Enable this feature only after appropriate user permissions are granted. You must grant the DESCRIBE permission to a role for any keyspaces stored in the system keyspaces. If you do not grant the permission, an error states that the keyspace is not found.
See Controlling access to keyspaces and tables and Configuring the security keyspaces replication
factors.
Default: false
role_manager
The DSE Role Manager supports LDAP roles and internal roles supported by the
CassandraRoleManager. Role options are stored in the dse_security keyspace. When using the DSE
Role Manager, increase the replication factor of the dse_security keyspace. Role managers other than
DseRoleManager are deprecated and not supported. Some security features might not work correctly if
other role managers are used.
Use only role manager implementations bundled with DSE.
Default: com.datastax.bdp.cassandra.auth.DseRoleManager
roles_validity_in_ms
Validity period for roles cache in milliseconds. Determines how long to cache the list of roles assigned
to the user; users may have several roles, either through direct assignment or inheritance (a role that
has been granted to another role). Adjust this setting based on the complexity of your role hierarchy,
tolerance for role changes, the number of nodes in your environment, and activity level of the cluster.
Fetching permissions can be an expensive operation, so this setting allows flexibility. Granted roles
are cached for authenticated sessions in AuthenticatedUser. After the specified time elapses, role
validity is rechecked. Automatically disabled when internal authentication is not enabled with the DseAuthenticator.
• milliseconds - how long to cache the list of roles assigned to the user
REVOKE does not automatically invalidate cached permissions. Permissions are invalidated the next
time they are refreshed.
Default: 120000 (2 minutes)
permissions_update_interval_in_ms
Sets refresh interval for the standard authentication cache and the row-level access control
(RLAC) cache. After this interval, cache entries become eligible for refresh. On next access,
the database schedules an async reload and returns the old value until the reload completes. If permissions_validity_in_ms is non-zero, permissions_update_interval_in_ms must also be non-zero. When not set, the default is the same value as permissions_validity_in_ms.
Default: commented out (2000)
permissions_cache_max_entries
The maximum number of entries that are held by the standard authentication cache and row-level
access control (RLAC) cache. With the default value of 1000, the RLAC permissions cache can have
up to 1000 entries in it, and the standard authentication cache can have up to 1000 entries. This single
option applies to both caches. To size the permissions cache for use with Setting up Row Level Access
Control (RLAC), use this formula:
If this option is not present in cassandra.yaml, manually enter it to use a value other than 1000. See
Enabling DSE Unified Authentication.
Default: not set (1000)
Inter-node encryption options
Node-to-node (internode) encryption protects data that is transferred between nodes in a cluster using SSL.
server_encryption_options:
    internode_encryption: none
    keystore: resources/dse/conf/.keystore
    keystore_password: cassandra
    truststore: resources/dse/conf/.truststore
    truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
    # require_client_auth: false
    # require_endpoint_verification: false
server_encryption_options
Inter-node encryption options. If enabled, you must also generate keys and provide the appropriate keystore and truststore locations and passwords. No custom encryption options are supported.
The passwords used in these options must match the passwords used when generating the keystore
and truststore. For instructions on generating these files, see Creating a Keystore to Use with JSSE.
• none - No encryption.
• all - Encrypt all inter-node communication.
• dc - Encrypt traffic between datacenters only.
• rack - Encrypt traffic between racks only.
Default: none
keystore
Relative path from DSE installation directory or absolute path to the Java keystore (JKS) suitable for
use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer
(SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to
encrypt outgoing messages.
Default: resources/dse/conf/.keystore
keystore_password
Password for the keystore. This must match the password used when generating the keystore and
truststore.
Default: cassandra
truststore
Relative path from DSE installation directory or absolute path to truststore containing the trusted
certificate for authenticating remote servers.
Default: resources/dse/conf/.truststore
truststore_password
Password for the truststore.
Default: cassandra
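Putting these options together, a minimal sketch enabling encryption for all inter-node traffic; the paths and passwords below are placeholders, and the keystore and truststore must be generated first as described above:

```yaml
server_encryption_options:
    internode_encryption: all                       # encrypt all node-to-node traffic
    keystore: /etc/dse/ssl/server-keystore.jks      # placeholder path
    keystore_password: changeit                     # placeholder password
    truststore: /etc/dse/ssl/server-truststore.jks  # placeholder path
    truststore_password: changeit                   # placeholder password
```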
protocol
Default: commented out (TLS)
algorithm
Default: commented out (SunX509)
store_type
Valid types are JKS, JCEKS, and PKCS12.
PKCS11 is not supported.
Default: commented out (JKS)
truststore_type
Valid types are JKS, JCEKS, and PKCS12.
PKCS11 is not supported. Also, due to an OpenSSL issue, you cannot use a PKCS12 truststore that
was generated via OpenSSL. For example, a truststore generated via the following command will not
work with DSE:
However, truststores generated via Java's keytool and then converted to PKCS12 work with DSE.
cipher_suites
The cipher suites enabled for encrypted communication. Valid values:
• TLS_RSA_WITH_AES_128_CBC_SHA
• TLS_RSA_WITH_AES_256_CBC_SHA
• TLS_DHE_RSA_WITH_AES_128_CBC_SHA
• TLS_DHE_RSA_WITH_AES_256_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
client_encryption_options:
    enabled: false
    # If enabled and optional is set to true, encrypted and unencrypted connections over
    # native transport are handled.
    optional: false
    keystore: resources/dse/conf/.keystore
    keystore_password: cassandra
    # require_client_auth: false
    # Set truststore and truststore_password if require_client_auth is true
    # truststore: resources/dse/conf/.truststore
    # truststore_password: cassandra
    # More advanced defaults below:
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
client_encryption_options
Options for client-to-node encryption. If you enable encryption, you must also generate keys and provide the appropriate keystore and truststore locations and passwords. No custom encryption options are supported for DataStax Enterprise.
Settings:
enabled
Whether to enable client-to-node encryption.
Default: false
optional
When optional is set to true, both encrypted and unencrypted connections over the native transport are allowed. This is a necessary transition state that makes it possible to enable client-to-node encryption on live clusters without causing an outage for existing unencrypted clients. Typically, once existing clients are migrated to encrypted connections, set optional to false to enforce native transport encryption.
Default: false
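A minimal sketch of the two-phase rollout described above, with placeholder keystore values: first accept both encrypted and unencrypted clients, then enforce encryption once all clients have migrated:

```yaml
# Phase 1 (transition): accept encrypted and unencrypted clients.
client_encryption_options:
    enabled: true
    optional: true
    keystore: resources/dse/conf/.keystore
    keystore_password: cassandra    # placeholder password

# Phase 2 (after all clients use TLS): enforce encryption by setting
#     optional: false
```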
keystore
Relative path from DSE installation directory or absolute path to the Java keystore (JKS) suitable for
use with Java Secure Socket Extension (JSSE), which is the Java version of the Secure Sockets Layer
(SSL), and Transport Layer Security (TLS) protocols. The keystore contains the private key used to
encrypt outgoing messages.
Default: resources/dse/conf/.keystore
keystore_password
Password for the keystore.
Default: cassandra
require_client_auth
Whether to enable certificate authentication for client-to-node encryption. When not set, the default is
false.
When set to true, client certificates must be present on all nodes in the cluster.
Default: commented out (false)
truststore
Relative path from DSE installation directory or absolute path to truststore containing the trusted
certificate for authenticating remote servers.
Default: resources/dse/conf/.truststore
truststore_password
Password for the truststore. This must match the password used when generating the keystore and truststore.
The truststore path and password are required only when require_client_auth is set to true.
Default: cassandra
protocol
Default: commented out (TLS)
algorithm
Default: commented out (SunX509)
store_type
Valid types are JKS, JCEKS, and PKCS12. For file-based keystores, use PKCS12.
PKCS11 is not supported.
Default: commented out (JKS)
truststore_type
However, truststores generated via Java's keytool and then converted to PKCS12 work with DSE.
cipher_suites
The cipher suites enabled for encrypted communication. Valid values:
• TLS_RSA_WITH_AES_128_CBC_SHA
• TLS_RSA_WITH_AES_256_CBC_SHA
• TLS_DHE_RSA_WITH_AES_128_CBC_SHA
• TLS_DHE_RSA_WITH_AES_256_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
transparent_data_encryption_options:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    # CBC IV length for AES must be 16 bytes, the default size
    # iv_length: 16
    key_provider:
        - class_name: org.apache.cassandra.security.JKSKeyProvider
          parameters:
            - keystore: conf/.keystore
              keystore_password: cassandra
              store_type: JCEKS
              key_password: cassandra
transparent_data_encryption_options
DataStax Enterprise supports this option only for backward compatibility. When using DSE, configure
data encryption options in the dse.yaml; see Transparent data encryption.
TDE properties:
• cipher - the cipher specification, composed of algorithm/mode/padding:
  ◦ AES
  ◦ CBC
  ◦ PKCS5Padding
• key_alias: testing:1
• iv_length: 16
  iv_length is commented out in the default cassandra.yaml file. Uncomment only if cipher is set to AES. The value must be 16 (bytes).
• key_provider:
  ◦ class_name: org.apache.cassandra.security.JKSKeyProvider
  ◦ parameters:
    ▪ keystore: conf/.keystore
    ▪ keystore_password: cassandra
    ▪ store_type: JCEKS
    ▪ key_password: cassandra
SSL Ports
ssl_storage_port: 7001
native_transport_port_ssl: 9142
ssl_storage_port
The SSL port for encrypted communication. Unused unless enabled in encryption_options. Follow security best practices: do not expose this port to the internet, and apply firewall rules as needed.
Default: 7001
native_transport_port_ssl
Dedicated SSL port where the CQL native transport listens for clients with encrypted communication.
For security reasons, do not expose this port to the internet. Firewall it if needed.
Default: 9142
Continuous paging options
continuous_paging:
    max_concurrent_sessions: 60
    max_session_pages: 4
    max_page_size_mb: 8
    max_local_query_time_ms: 5000
    client_timeout_sec: 600
    cancel_timeout_sec: 5
    paused_check_interval_ms: 1
continuous_paging
Options to tune continuous paging, which, when requested, pushes pages to the client continuously:
Guidance
• Because memtables and SSTables are used by the continuous paging query, you can define the
maximum period of time during which memtables cannot be flushed and compacted SSTables
cannot be deleted.
• If fewer threads exist than sessions, a session cannot execute until another one is swapped out.
• Distributed queries (CL > ONE or non-local data) are swapped out after every page, while local
queries at CL = ONE are swapped out after max_local_query_time_ms.
max_concurrent_sessions
The maximum number of concurrent sessions. Additional sessions are rejected with an unavailable
error.
Default: 60
max_session_pages
The maximum number of pages that can be buffered for each session. If the client is not reading from
the socket, the producer thread is blocked after it has prepared max_session_pages.
Default: 4
max_page_size_mb
The maximum size of a page, in MB. If an individual CQL row is larger than this value, the page can be
larger than this value.
Default: 8
max_local_query_time_ms
The maximum time for a local continuous query to run. When this threshold is exceeded, the
session is swapped out and rescheduled. Swapping and rescheduling ensures the release of
resources that prevent the memtables from flushing and ensures fairness when max_threads <
max_concurrent_sessions. Adjust when high write workloads exist on tables that have continuous
paging requests.
Default: 5000
client_timeout_sec
How long the server will wait, in seconds, for clients to request more pages if the client is not reading
and the server queue is full.
Default: 600
cancel_timeout_sec
How long to wait, in seconds, before checking whether a paused session can be resumed. Continuous paging sessions are paused because of backpressure, or when the client has not requested more pages with backpressure updates.
Default: 5
paused_check_interval_ms
How long to wait, in milliseconds, before checking whether a continuous paging session can be resumed, when that session is paused because of backpressure.
Default: 1
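For illustration, a hedged tuning fragment; the values below are examples for an analytics-heavy workload, not recommendations:

```yaml
continuous_paging:
    max_concurrent_sessions: 120    # example: allow more concurrent sessions
    max_session_pages: 8            # example: buffer more pages per session
    max_local_query_time_ms: 2500   # example: swap sessions out sooner under write load
```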
# phi_convict_threshold: 8
phi_convict_threshold
The sensitivity of the failure detector on an exponential scale. Generally, this setting does not need
adjusting.
See About failure detection and recovery.
When not set, the internal value is 8.
Default: commented out (8)
Memory leak detection settings
#leaks_detection_params:
# sampling_probability: 0
# max_stacks_cache_size_mb: 32
# num_access_records: 0
# max_stack_depth: 30
sampling_probability
The probability of tracking accesses to the specified resource. For the resources tracked, see nodetool leaksdetection.
• A number between 0 and 1 - the percentage of time to randomly track a resource. For example,
0.5 will track resources 50% of the time.
Tracking incurs a significant stack trace collection cost for every access and consumes heap space.
Enable tracking only when directed by DataStax Support.
Default: commented out (0)
max_stacks_cache_size_mb
The size, in MB, of the cache for call stack traces. Stack traces are used to debug leaked resources and consume heap memory; this setting caps the amount of heap dedicated to stack traces for each resource.
Default: commented out (32)
num_access_records
Set the average number of stack traces kept when a resource is accessed. Currently only supported for
chunks in the cache.
Default: commented out (0)
max_stack_depth
The depth of the stack traces collected. Changes only the depth of stack traces collected from the time the parameter is set. Deeper stacks are more unique, so increasing the depth may require increasing max_stacks_cache_size_mb.
Default: commented out (30)
dse.yaml configuration file
The dse.yaml file is the primary configuration file for security, DSE Search, DSE Graph, and DSE Analytics.
After changing properties in the dse.yaml file, you must restart the node for the changes to take effect.
The cassandra.yaml file is the primary configuration file for the DataStax Enterprise database.
Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least
two spaces. Adhere to the YAML syntax and retain the spacing. For example, no spaces before the parent
node_health_options entry, and at least two spaces before the child settings:
node_health_options:
refresh_rate_ms: 50000
uptime_ramp_up_period_seconds: 10800
dropped_mutation_window_minutes: 30
Organization
The DataStax Enterprise configuration properties are grouped into the following sections:
• DSE In-Memory
• Node health
• Health-based routing
• Lease metrics
• Audit logging
• audit_logging_options
• Inter-node messaging
• DSE Multi-Instance
• Authentication options
• Authorization options
• Kerberos options
• LDAP options
Authentication options
Authentication options for the DSE Authenticator, which allows you to use multiple schemes for authentication in a DataStax Enterprise cluster. Additional authenticator configuration is required in cassandra.yaml.
Internal and LDAP schemes can also be used for role management; see role_management_options.
# authentication_options:
# enabled: false
# default_scheme: internal
# other_schemes:
# - ldap
# - kerberos
# scheme_permissions: false
# transitional_mode: disabled
# allow_digest_with_kerberos: true
# plain_text_without_ssl: warn
authentication_options
Options for the DseAuthenticator to authenticate users when the authenticator option in
cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator. Authenticators other than
DseAuthenticator are not supported.
enabled
Enables user authentication.
• true - The DseAuthenticator authenticates users.
• false - The DseAuthenticator does not authenticate users and allows all connections.
Default: commented out (false)
scheme_permissions
Whether roles are required to have permission on an authentication scheme before they can use it.
• true - Use multiple schemes for authentication. Every role requires permissions to a scheme in order to be assigned.
• false - Do not use multiple schemes for authentication. Prevents unintentional role assignment that might occur if user or group names overlap in the authentication service.
Default: commented out (false)
allow_digest_with_kerberos
Controls whether DIGEST-MD5 authentication is also allowed with Kerberos. The DIGEST-MD5 mechanism is not directly associated with an authentication scheme, but is used by Kerberos to pass credentials between nodes and jobs.
• true - DIGEST-MD5 authentication is also allowed with Kerberos. In analytics clusters, set to true to
use Hadoop inter-node authentication with Hadoop and Spark jobs.
Analytics nodes require true to use internode authentication with Hadoop and Spark jobs. When not set,
the default is true.
Default: commented out (true)
plain_text_without_ssl
Controls how the DseAuthenticator responds to plain text authentication requests over unencrypted client connections. Set to one of the following values:
• block - Block the request with an authentication error.
• warn - Log a warning but allow the request.
• allow - Allow the request without any warning.
Default: commented out (warn)
transitional_mode
Controls the use of transitional mode, which allows clients to connect during an authentication rollout. Set to one of the following values:
• disabled - Transitional mode is disabled. All connections must provide valid credentials and map to a login-enabled role.
• permissive - Only super users are authenticated and logged in. All other authentication attempts
are logged in as the anonymous user.
• normal - Allow all connections that provide credentials. Maps all authenticated users to their role
AND maps all other connections to anonymous.
• strict - Allow only authenticated connections that map to a login-enabled role OR connections that
provide a blank username and password as anonymous.
Credentials are required for all connections after authentication is enabled; use a blank username and password to log in with the anonymous role in transitional mode.
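The scheme and transitional options above can be combined as in this sketch; the LDAP scheme and the transitional_mode value are illustrative:

```yaml
# dse.yaml sketch: internal authentication with LDAP as a secondary scheme,
# using transitional mode during the credential rollout.
authentication_options:
    enabled: true
    default_scheme: internal
    other_schemes:
        - ldap
    scheme_permissions: false
    transitional_mode: normal   # set back to 'disabled' after migration
```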
#role_management_options:
# mode: internal
# stats: false
role_management_options
Options for the DSE Role Manager. To enable the role manager, set role_manager: com.datastax.bdp.cassandra.auth.DseRoleManager in cassandra.yaml.
When scheme_permissions is enabled, all roles must have permission to execute on the authentication
scheme, see Binding a role to an authentication scheme.
mode
Set to one of the following values:
• internal - Scheme that manages roles per individual user in the internal database. Allows nesting
roles for permission management.
• ldap - Scheme that assigns roles by looking up the user name in LDAP and mapping the group
attribute (ldap_options) to an internal role name. To configure an LDAP scheme, complete the
steps in Defining an LDAP scheme.
Internal role management allows nesting roles for permission management; when using LDAP mode, role nesting is disabled, and GRANT role_name TO role_name results in an error.
Default: commented out (internal)
stats
Set to true to enable logging of DSE role creation and modification events in the dse_security.role_stats system table. All nodes must have the stats option enabled and must be restarted for the functionality to take effect.
To query role events, query the dse_security.role_stats table.
#authorization_options:
# enabled: false
# transitional_mode: disabled
# allow_row_level_security: false
authorization_options
Options for the DSE Authorizer.
enabled
Whether to use the DSE Authorizer for role-based access control (RBAC).
• true - Use the DSE Authorizer for RBAC.
• false - Do not use the DSE Authorizer.
Default: commented out (false)
transitional_mode
Controls the use of transitional mode during authorization setup. Set to one of the following values:
• strict - Permissions can be passed to resources, and are enforced on authenticated users.
Permissions are not enforced against anonymous users.
kerberos_options:
    keytab: resources/dse/conf/dse.keytab
    service_principal: dse/_HOST@REALM
    http_principal: HTTP/_HOST@REALM
    qop: auth
kerberos_options
Options to configure security for a DataStax Enterprise cluster using Kerberos.
keytab
The file path of dse.keytab.
service_principal
The service_principal that the DataStax Enterprise process runs under must use the form dse_user/_HOST@REALM, where:
• dse_user is the name of the user that starts the DataStax Enterprise process.
• REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
http_principal
The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat
web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI security mechanism
(Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be
uppercase.
qop
A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each
connection. The client can have multiple QOP values, while the server can have only a single QOP
value. The valid values are:
• auth - Authentication only.
• auth-int - Authentication plus integrity protection.
• auth-conf - Authentication plus integrity protection and encryption of all transmitted data.
Encryption using auth-conf is separate and independent of whether encryption is done using
SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax
recommends choosing only one method and using it for both encryption and authentication.
LDAP options
Define LDAP options to authenticate users against an external LDAP service and/or for role management using LDAP group lookup.
# ldap_options:
# server_host:
# server_port: 389
# hostname_verification: false
# search_dn:
# search_password:
# use_ssl: false
# use_tls: false
# truststore_path:
# truststore_password:
# truststore_type: jks
# user_search_base:
# user_search_filter: (uid={0})
# user_memberof_attribute: memberof
# group_search_type: directory_search
# group_search_base:
# group_search_filter: (uniquemember={0})
# group_name_attribute: cn
# credentials_validity_in_ms: 0
# search_validity_in_seconds: 0
# connection_pool:
# max_active: 8
# max_idle: 8
Microsoft Active Directory (AD) example, for both authentication and role management:
ldap_options:
server_host: win2012ad_server.mycompany.lan
server_port: 389
search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
search_password: lookup_user_password
use_ssl: false
use_tls: false
truststore_path:
truststore_password:
truststore_type: jks
#group_search_type: directory_search
group_search_type: memberof_search
#group_search_base:
#group_search_filter:
group_name_attribute: cn
user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
user_search_filter: (sAMAccountName={0})
user_memberof_attribute: memberOf
connection_pool:
max_active: 8
max_idle: 8
ldap_options
Options to configure LDAP security. When not set, LDAP authentication is not used.
Default: commented out
server_host
A comma-separated list of LDAP server hosts.
Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same
host (localhost) is appropriate only in single node test or development environments.
Default: none
server_port
The port on which the LDAP server listens. Port 636 is typically used for encrypted connections; the default SSL port for LDAP is 636. For encrypted connections, a valid truststore with the correct path specified in truststore_path must exist. The truststore must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the hostname of the LDAP server.
Default: 389
search_dn
Distinguished name (DN) of an account with read access to the user_search_base and
group_search_base. For example:
• OpenLDAP: uid=lookup,ou=users,dc=springsource,dc=com
Do not create or use an LDAP account or group called cassandra. The DSE database comes with a
default login role, cassandra, that has access to all database objects and uses the consistency level
QUORUM.
When not set, an anonymous bind is used for the search on the LDAP server.
Default: commented out
search_password
The password of the search_dn account.
Default: commented out
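For comparison with the Active Directory example above, a hypothetical OpenLDAP-style configuration over an encrypted connection might look like the following. All host names, DNs, and passwords are placeholders.

```yaml
ldap_options:
  server_host: ldap1.example.com, ldap2.example.com  # listed in failover order
  server_port: 636             # default LDAPS port for SSL-encrypted connections
  use_ssl: true
  truststore_path: /etc/dse/conf/ldap-truststore.jks
  truststore_password: truststore_password
  truststore_type: jks
  search_dn: uid=lookup,ou=users,dc=example,dc=com
  search_password: lookup_user_password
  user_search_base: ou=users,dc=example,dc=com
  user_search_filter: (uid={0})
  group_search_type: directory_search
  group_search_base: ou=groups,dc=example,dc=com
  group_search_filter: (uniquemember={0})
  group_name_attribute: cn
```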
use_ssl
Whether to use an SSL-encrypted connection. When set to true, set server_port to the LDAP SSL port for the server (typically port 636).
Default: false
use_tls
Whether to enable TLS connections to the LDAP server. When set to true, set server_port to the TLS port of the LDAP server.
Default: false
user_search_base
Distinguished name (DN) of the object to start the recursive search for user entries for authentication and role management memberof searches. For your LDAP domain, set the ou and dc elements. Typically set to ou=users,dc=domain,dc=top_level_domain. For example, to search all users in example.com, use ou=users,dc=example,dc=com.
group_search_type
Defines how group membership is determined.
• directory_search - Search for group membership using the group_search_base and group_search_filter.
• memberof_search - Recursively search for user entries using the user_search_base and user_search_filter. Get groups from the user attribute defined in user_memberof_attribute. The directory server must have memberof support.
Default: directory_search
credentials_validity_in_ms
A duration period in milliseconds. Enables a credentials cache and improves performance by reducing the number of requests that are sent to the internal or LDAP server. See Defining an LDAP scheme.
Default: 0 (disabled)
search_validity_in_seconds
A duration period in seconds. Enables a search cache and improves performance by reducing the number of requests that are sent to the internal or LDAP server.
Default: 0 (disabled)
system_info_encryption:
enabled: false
cipher_algorithm: AES
secret_key_strength: 128
chunk_length_kb: 64
key_provider: KmipKeyProviderFactory
kmip_host: kmip_host_name
DataStax recommends using a remote encryption key from a KMIP provider when using Transparent Data
Encryption (TDE) features. Use a local encryption key only if a KMIP server is not available.
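The example above requests the key from a KMIP provider. When no KMIP server is available, a local-key variant simply omits the KMIP-specific options; this is a sketch, and the key file itself is created separately (see Setting up local encryption keys).

```yaml
system_info_encryption:
  enabled: true
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  # key_provider and kmip_host omitted: a local key from
  # system_key_directory is used instead of a KMIP key
```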
system_info_encryption
Options to set encryption settings for system resources that might contain sensitive information,
including the system.batchlog and system.paxos tables, hint files, and the database commit log.
enabled
Whether to enable encryption of system resources. See Encrypting system resources.
The system_trace keyspace is NOT encrypted by enabling the system_info_encryption
section. In environments that also have tracing enabled, manually configure encryption with
compression on the system_trace keyspace. See Transparent data encryption.
Default: false
cipher_algorithm
The name of the JCE cipher algorithm used to encrypt system resources.
Table 11: Supported cipher algorithm names
cipher_algorithm    secret_key_strength
AES                 128, 192, or 256
DES                 56
Blowfish            32-448
RC2                 40-128
Default: AES
secret_key_strength
Length of key to use for the system resources. See Supported cipher algorithms names.
DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an
existing key does not match, the KMIP server automatically generates a new key.
Default: 128
chunk_length_kb
Optional. The size of SSTable chunks, in KB, when data from the system.batchlog or system.paxos tables is written to disk.
To encrypt existing data, run nodetool upgradesstables -a system batchlog paxos on all
nodes in the cluster.
Default: 64
key_provider
KMIP key provider to enable encrypting sensitive system data with a KMIP key. Comment out if using a
local encryption key.
Default: commented out (KmipKeyProviderFactory)
kmip_host
The KMIP key server host. Set to the kmip_group_name that defines the KMIP host in kmip_hosts
section. DSE requests a key from the KMIP host and uses the key generated by the KMIP provider.
Default: commented out
Encrypted configuration properties settings
Settings for using encrypted passwords in sensitive configuration file properties.
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (key_filename | KMIP_key_URL )
system_key_directory
Path to the directory where local encryption/decryption key files are stored, also called system keys.
Distribute the system keys to all nodes in the cluster. Ensure that the DSE account is the folder owner
and has read/write/execute (700) permissions.
See Setting up local encryption keys.
This directory is not used for KMIP keys.
Default: /etc/dse/conf
config_encryption_active
Whether to enable encryption on sensitive data stored in tables and in configuration files.
Default: false
config_encryption_key_name
Set to the local encryption key filename or KMIP key URL to use for configuration file property value
decryption.
Use the dsetool encryptconfigvalue command to generate encrypted values for the configuration file
properties.
Default: system_key. The default name is not configurable.
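As a worked example of the three settings above, a node that decrypts configuration properties with the default local key file might use the following; the paths and key name shown are the documented defaults, and the key file is assumed to have been created with dsetool createsystemkey.

```yaml
system_key_directory: /etc/dse/conf     # directory holding the system_key file
config_encryption_active: true          # decrypt encrypted property values at startup
config_encryption_key_name: system_key  # default local key file name
```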
kmip_hosts:
your_kmip_groupname:
hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
keystore_path: pathto/kmip/keystore.jks
keystore_type: jks
keystore_password: password
truststore_path: pathto/kmip/truststore.jks
truststore_type: jks
truststore_password: password
key_cache_millis: 300000
timeout: 1000
protocol: protocol
cipher_suites: supported_cipher
kmip_hosts
Connection settings for key servers that support the KMIP protocol.
kmip_groupname
A user-defined name for a group of options to configure a KMIP server or servers, key settings, and
certificates. Configure options for a kmip_groupname section for each KMIP key server or group of
KMIP key servers. Using separate key server configuration settings allows use of different key servers
to encrypt table data, and eliminates the need to enter key server configuration information in DDL
statements and other configurations. Multiple KMIP hosts are supported.
Default: commented out
hosts
A comma-separated list of KMIP hosts (host[:port]) using the FQDN (Fully Qualified Domain Name). DSE
queries the hosts in the listed order, so add KMIP hosts in the intended failover sequence.
For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries
kmip1.yourdomain.com and then kmip2.yourdomain.com.
keystore_path
The path to a Java keystore created from the KMIP agent PEM files.
Default: commented out (/etc/dse/conf/KMIP_keystore.jks)
keystore_type
The type of keystore.
Default: commented out (jks)
keystore_password
The password to access the keystore.
Default: commented out (password)
truststore_path
The path to a Java truststore that was created using the KMIP root certificate.
Default: commented out (/etc/dse/conf/KMIP_truststore.jks)
truststore_type
The type of truststore.
Default: commented out (jks)
truststore_password
The password to access the truststore.
Default: commented out (password)
key_cache_millis
Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the
encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes
for changes, like revocation, to propagate to the DataStax Enterprise node. DataStax Enterprise uses
concurrent encryption, so multiple threads fetch the secret key from the KMIP key server at the same
time. DataStax recommends using the default value.
Default: commented out (300000)
timeout
Socket timeout in milliseconds.
Default: commented out (1000)
protocol
The TLS protocol version to use for KMIP connections. When not specified, the JVM default is used. Example: TLSv1.2
cipher_suites
The cipher suites to enable for KMIP connections. When not specified, the JVM default is used. Examples:
• TLS_RSA_WITH_AES_128_CBC_SHA
• TLS_RSA_WITH_AES_256_CBC_SHA
• TLS_DHE_RSA_WITH_AES_128_CBC_SHA
• TLS_DHE_RSA_WITH_AES_256_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
See cipher_algorithm.
DSE Search index encryption settings
# solr_encryption_options:
# decryption_cache_offheap_allocation: true
# decryption_cache_size_in_mb: 256
solr_encryption_options
Settings to tune encryption of search indexes.
decryption_cache_offheap_allocation
Whether to allocate the shared DSE Search decryption cache off the JVM heap.
• true - allocate the shared decryption cache off the JVM heap
• false - allocate the shared decryption cache on the JVM heap
Default: commented out (true)
decryption_cache_size_in_mb
The maximum size of the shared DSE Search decryption cache, in megabytes (MB).
Default: commented out (256)
# max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240
max_memory_to_lock_fraction
The fraction of system memory to use. The default value of 0.20 specifies to use up to 20% of system
memory. This value is ignored if max_memory_to_lock_mb is set to a non-zero value. To specify a
fraction, use this setting instead of max_memory_to_lock_mb.
Default: commented out (0.20)
max_memory_to_lock_mb
The maximum amount of memory to use, in megabytes (MB). When set to a non-zero value, this setting
takes precedence over max_memory_to_lock_fraction.
Default: commented out (10240)
node_health_options:
refresh_rate_ms: 50000
uptime_ramp_up_period_seconds: 10800
dropped_mutation_window_minutes: 30
node_health_options
Node health options are always enabled.
refresh_rate_ms
How frequently, in milliseconds, to recalculate node health scores.
Default: 60000
uptime_ramp_up_period_seconds
The amount of continuous uptime required for the node's uptime score to advance the node health
score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a
composite score based on dropped mutations and uptime.
If a node is repairing after a period of downtime, you might want to increase the uptime period to the
expected repair time.
Default: commented out (10800, or 3 hours)
dropped_mutation_window_minutes
The historic time window over which the rate of dropped mutations affects the node health score.
Default: 30
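For example, on a cluster where post-outage repairs typically take about a day, the ramp-up period can be raised so that a recently restarted node is not reported fully healthy until repair has had time to complete. These values are illustrative, not recommendations.

```yaml
node_health_options:
  refresh_rate_ms: 60000
  uptime_ramp_up_period_seconds: 86400  # 1 day, matching the expected repair time
  dropped_mutation_window_minutes: 30
```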
Health-based routing
enable_health_based_routing: true
enable_health_based_routing
Whether to consider node health for replication selection for distributed DSE Search queries. Health-
based routing enables a trade-off between index consistency and query throughput.
• true - consider node health when multiple candidates exist for a particular token range.
• false - ignore node health for replication selection. When the primary concern is performance, do
not enable health-based routing.
Default: true
Lease metrics
lease_metrics_options:
enabled: false
ttl_seconds: 604800
lease_metrics_options
Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and
Spark Master nodes.
enabled
Enables (true) or disables (false) log entries related to lease holders. Most of the time you do not want
to enable logging.
Default: false
ttl_seconds
Defines the time, in seconds, to persist the log of lease holder changes. Logging of lease holder
changes is always on and has very low overhead.
Default: 604800 (7 days)
ttl_index_rebuild_options:
fixed_rate_period: 300
initial_delay: 20
max_docs_per_batch: 4096
thread_pool_size: 1
ttl_index_rebuild_options
Section of options to control the schedulers in charge of querying for and removing expired records, and
the execution of the checks.
fixed_rate_period
The time interval, in seconds, between checks for expired data.
Default: 300
initial_delay
The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
max_docs_per_batch
The maximum number of documents that the TTL rebuild thread checks and deletes per batch. All
documents determined to be expired are deleted from the index during each check; to avoid memory
pressure, their unique keys are retrieved and deletes are issued in batches.
Default: 4096
thread_pool_size
The maximum number of cores that can execute TTL cleanup concurrently. Set the thread_pool_size
to manage system resource consumption and prevent many search cores from executing simultaneous
TTL deletes.
Default: 1
Reindexing of bootstrapped data
async_bootstrap_reindex: false
async_bootstrap_reindex
For DSE Search, whether to reindex bootstrapped data asynchronously.
• If enabled, the node joins the ring immediately after bootstrap, and reindexing occurs
asynchronously. Because the node does not wait for post-bootstrap reindexing, it is not marked
down. Use the dsetool ring command to check the status of the reindexing.
• If disabled, the node joins the ring after reindexing the bootstrapped data.
Default: false
cql_solr_query_paging: off
cql_solr_query_paging
The paging mode for CQL Solr queries.
• driver - Respects driver paging settings. Uses Solr pagination (cursors) only when the driver uses
pagination. Enabled automatically for DSE SearchAnalytics workloads.
• off - Paging is off. Ignores driver paging settings for CQL queries and uses normal Solr paging.
Default: off
cql_solr_query_row_timeout: 10000
cql_solr_query_row_timeout
The maximum time in milliseconds to wait for each row to be read from the database during CQL Solr
queries.
Default: commented out (10000, or 10 seconds)
DSE Search resource upload limit
solr_resource_upload_limit_mb: 10
solr_resource_upload_limit_mb
Option to disable or configure the maximum file size of the search index config or schema. Resource
files can be uploaded, but the search index config and schema are stored internally in the database
after upload.
• upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file
(search index config or schema).
Default: 10
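For example, the limit can be raised for a large schema, or resource uploads can be blocked entirely; the zero-disables behavior is an assumption based on the "disable or configure" wording above.

```yaml
solr_resource_upload_limit_mb: 20   # allow resource files up to 20 MB
# solr_resource_upload_limit_mb: 0  # assumption: 0 disables resource file uploads
```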
Shard transport options
shard_transport_options:
netty_client_request_timeout: 60000
shard_transport_options
Fault tolerance option for inter-node communication between DSE Search nodes.
netty_client_request_timeout
The internal timeout for all search queries, to prevent long-running queries during distributed queries.
The client request timeout is the maximum cumulative time, in milliseconds, that a distributed search
request waits idly for shard responses.
Default: 60000
# back_pressure_threshold_per_core: 1024
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
# ram_buffer_heap_space_in_mb: 1024
# ram_buffer_offheap_space_in_mb: 1024
back_pressure_threshold_per_core
The maximum number of queued partitions during search index rebuilding and reindexing. This
maximum safeguards against excessive heap use by the indexing queue. If set lower than the number
of thread-per-core (TPC) threads, not all TPC threads can be actively indexing.
Default: commented out (1024)
flush_max_time_per_core
The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at
DSE Search commit time or at flush time. Expert level knowledge is required to change this value.
Always set the value reasonably high to ensure flushing completes successfully to fully sync DSE
Search indexes with the database data. If the configured value is exceeded, index updates are only
partially committed and the commit log is not truncated which can undermine data durability.
When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely
manner. Live indexing increases the time to flush asynchronous index updates.
Default: commented out (5)
load_max_time_per_core
The maximum time, in minutes, to wait for each DSE Search index to load on startup or create/reload
operations. This advanced option should be changed only if exceptions happen during search index
loading. When not set, the default is 5 minutes.
Default: commented out (5)
enable_index_disk_failure_policy
Whether to apply the configured disk failure policy if IOExceptions occur during index update
operations.
• true - apply the configured Cassandra disk failure policy to index write failures
ram_buffer_heap_space_in_mb
Global Lucene RAM buffer usage threshold for heap memory that forces a segment flush. Setting this
value too low might induce a state of constant flushing during periods of ongoing write activity. For NRT,
forced segment flushes also de-schedule pending auto-soft commits to avoid potentially flushing too
many small segments. When not set, the default is 1024.
Default: commented out (1024)
ram_buffer_offheap_space_in_mb
Global Lucene RAM buffer usage threshold for off-heap memory that forces a segment flush. Setting
this value too low might induce a state of constant flushing during periods of ongoing write activity. For NRT, forced segment
flushes also de-schedule pending auto-soft commits to avoid potentially flushing too many small
segments. When not set, the default is 1024.
Default: commented out (1024)
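Uncommented, the advanced indexing settings above might be tuned together on a write-heavy search node. The values below are illustrative only, not recommendations.

```yaml
back_pressure_threshold_per_core: 2048  # deeper indexing queue per core
flush_max_time_per_core: 10             # allow more minutes for index flushes
load_max_time_per_core: 10              # allow more minutes for index loading
ram_buffer_heap_space_in_mb: 2048       # larger heap RAM buffer before forced flush
ram_buffer_offheap_space_in_mb: 2048    # larger off-heap RAM buffer
```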
Performance Service options
# performance_core_threads: 4
# performance_max_threads: 32
# performance_queue_capacity: 32000
performance_core_threads
Number of background threads used by the performance service under normal conditions.
Default: 4
performance_max_threads
Maximum number of background threads used by the performance service.
Default: 32
performance_queue_capacity
The number of queued tasks held in the backlog when all performance_max_threads threads are busy.
Default: 32000
Performance Service options
These settings are used by the Performance Service to configure collection of performance metrics on
transactional nodes. Performance metrics are stored in the dse_perf keyspace and can be queried with CQL
using any CQL-based utility, such as cqlsh or any application using a CQL driver. To temporarily make changes
for diagnostics and testing, use the dsetool perf subcommands.
graph_events
Graph event information.
graph_events:
ttl_seconds: 600
ttl_seconds
The TTL, in seconds.
Default: 600
cql_slow_log_options
Options to configure reporting of CQL queries that take longer than a specified period of time.
# cql_slow_log_options:
# enabled: true
# threshold: 200.0
# minimum_samples: 100
# ttl_seconds: 259200
# skip_writing_to_db: true
# num_slowest_queries: 5
threshold
The threshold for reporting slow queries.
• A value greater than 1 is expressed in time and logs queries that take longer than the specified
number of milliseconds.
• A value of 0 to 1 is expressed as a percentile and logs queries that exceed this percentile.
Default: 200.0
skip_writing_to_db
Whether to keep slow queries only in memory.
• true - keep slow queries in memory only; do not write them to the database
• false - write slow queries to the database; the threshold must be >= 2000 ms to prevent a high load
on the database
Default: true
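Putting the options above together, a configuration that records slow CQL queries to the database might look like the following; note the threshold floor required when queries are written rather than kept only in memory.

```yaml
cql_slow_log_options:
  enabled: true
  threshold: 2000.0          # milliseconds; must be >= 2000 ms when writing to the database
  minimum_samples: 100
  ttl_seconds: 259200
  skip_writing_to_db: false  # persist slow queries instead of keeping them only in memory
  num_slowest_queries: 5
```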
cql_system_info_options
Options to configure collection of system-wide performance information about a cluster.
cql_system_info_options:
enabled: false
refresh_rate_ms: 10000
enabled
Whether to collect system-wide performance information about a cluster.
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
resource_level_latency_tracking_options
Options to configure collection of resource-level (keyspace and table) latency tracking information.
resource_level_latency_tracking_options:
enabled: false
refresh_rate_ms: 10000
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
db_summary_stats_options
Options to configure collection of summary statistics at the database level.
db_summary_stats_options:
enabled: false
refresh_rate_ms: 10000
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
cluster_summary_stats_options
Options to configure collection of statistics at a cluster-wide level.
cluster_summary_stats_options:
enabled: false
refresh_rate_ms: 10000
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
spark_cluster_info_options
Options to configure collection of data associated with Spark cluster and Spark applications.
spark_cluster_info_options:
enabled: false
refresh_rate_ms: 10000
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
histogram_data_options
Histogram data for the dropped mutation metrics is stored in the dropped_messages table in the
dse_perf keyspace.
histogram_data_options:
enabled: false
refresh_rate_ms: 10000
retention_count: 3
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
retention_count
Default: 3
user_level_latency_tracking_options
User-resource latency tracking settings.
user_level_latency_tracking_options:
enabled: false
refresh_rate_ms: 10000
top_stats_limit: 100
quantiles: false
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
top_stats_limit
Limit the number of individual metrics.
Default: 100
quantiles
Default: false
DSE Search Performance Service options
These settings are used by the DataStax Enterprise Performance Service.
solr_slow_sub_query_log_options:
enabled: false
ttl_seconds: 604800
threshold_ms: 3000
async_writers: 1
solr_update_handler_metrics_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000
solr_request_handler_metrics_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000
solr_index_stats_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000
solr_cache_stats_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000
solr_latency_snapshot_options:
enabled: false
ttl_seconds: 604800
refresh_rate_ms: 60000
solr_slow_sub_query_log_options
See Collecting slow search queries.
enabled
Default: false
ttl_seconds
The time, in seconds, to persist the collected data.
Default: 604800 (7 days)
async_writers
The number of server threads dedicated to writing in the log. More than one server thread might
degrade performance.
Default: 1
threshold_ms
The threshold, in milliseconds, above which a search sub-query is reported as slow.
Default: 3000
solr_update_handler_metrics_options
Options to collect search index direct update handler statistics over time.
See Collecting handler statistics.
enabled
Default: false
ttl_seconds
The time, in seconds, to persist the collected data.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_request_handler_metrics_options
Options to collect search index request handler statistics over time.
See Collecting handler statistics.
solr_index_stats_options
Options to record search index statistics over time.
See Collecting index statistics.
enabled
Default: false
ttl_seconds
The time, in seconds, to persist the collected data.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_cache_stats_options
See Collecting cache statistics.
enabled
Default: false
ttl_seconds
The time, in seconds, to persist the collected data.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
solr_latency_snapshot_options
See Collecting Apache Solr performance statistics.
enabled
Default: false
ttl_seconds
The time, in seconds, to persist the collected data.
Default: 604800 (7 days)
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 60000 (1 minute)
Spark Performance Service options
See Monitoring Spark application information.
spark_application_info_options:
enabled: false
refresh_rate_ms: 10000
driver:
sink: false
connectorSource: false
jvmSource: false
stateSource: false
executor:
sink: false
connectorSource: false
jvmSource: false
spark_application_info_options
Options to configure collection of Spark application metrics.
enabled
Default: false
refresh_rate_ms
The length of the sampling period in milliseconds; the frequency to update the statistics.
Default: 10000 (10 seconds)
driver
Options to configure collection of metrics at the Spark Driver.
connectorSource
Whether to collect Spark Cassandra Connector metrics at the Spark Driver.
Default: false
jvmSource
Whether to collect JVM heap and garbage collection (GC) metrics from the Spark Driver.
Default: false
stateSource
Whether to collect application state metrics at the Spark Driver.
Default: false
executor
Options to configure collection of metrics at Spark executors.
Default: false
connectorSource
Whether to collect Spark Cassandra Connector metrics at Spark executors.
Default: false
jvmSource
Whether to collect JVM heap and GC metrics at Spark executors.
Default: false
DSE Analytics options
• Spark
spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false
spark_daemon_readiness_assertion_interval: 1000
resource_manager_options:
worker_options:
cores_total: 0.7
memory_total: 0.6
workpools:
- name: alwayson_sql
cores: 0.25
memory: 0.25
spark_ui_options:
encryption: inherit
encryption_options:
enabled: false
keystore: .keystore
keystore_password: cassandra
require_client_auth: false
truststore: .truststore
truststore_password: cassandra
# Advanced settings
# protocol: TLS
# algorithm: SunX509
# store_type: JKS
# cipher_suites:
 [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
spark_shared_secret_bit_length
The length of a shared secret used to authenticate Spark components and encrypt the connections
between them. This value is not the strength of the cipher for encrypting connections.
Default: 256
spark_security_enabled
In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security
is enabled regardless of this setting.
Enables Spark security based on shared secret infrastructure. Enables mutual authentication and
optional encryption between DSE Spark Master and Workers, and of communication channels, except
the web UI.
Default: false
spark_security_encryption_enabled
In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security
is enabled regardless of this setting.
Enables encryption between DSE Spark Master and Workers, and of communication channels,
except the web UI. Uses DIGEST-MD5 SASL-based encryption mechanism. Requires
spark_security_enabled: true.
Configure encryption between the Spark processes and DSE with client-to-node encryption in
cassandra.yaml.
spark_daemon_readiness_assertion_interval
Time interval, in milliseconds, between subsequent retries by the Spark plugin for Spark Master and
Worker readiness to start.
Default: 1000
resource_manager_options
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-
automatic fashion. You can define the total amount of physical resources available to Spark Workers,
and optionally add named work pools with specific resources dedicated to them.
worker_options
The amount of system resources that are made available to the Spark Worker.
cores_total
The number of total system cores available to Spark. If the option is not specified, the default value 0.7
is used.
For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_CORES environment variable takes precedence
over this setting.
This setting can be the exact number of cores or a decimal fraction of the total system cores. When the
value is expressed as a decimal, it is interpreted as the fraction of total system cores made available to
the Spark Worker.
The lowest value that you can assign to Spark Worker cores is 1 core. If the results are lower, no
exception is thrown and the values are automatically limited.
Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available
cores will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and
one core will be reserved.
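The decimal-versus-explicit distinction above can be illustrated in a single worker_options fragment; the explicit values in the comments are examples only.

```yaml
resource_manager_options:
  worker_options:
    # decimal values are fractions of total system resources:
    cores_total: 0.7     # 70% of system cores
    memory_total: 0.6    # 60% of system memory
    # explicit values reserve exact amounts:
    # cores_total: 2     # exactly 2 cores
    # memory_total: 16G  # exactly 16 GB (standard suffixes M and G)
```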
memory_total
The amount of total system memory available to Spark. This setting can be the exact amount of
memory or a decimal of the total system memory. When the value is an absolute value, you can use
standard suffixes like M for megabyte and G for gigabyte.
When the value is expressed as a decimal, it is interpreted as the fraction of total system memory made available to the Spark Worker.
The lowest values that you can assign to Spark Worker memory is 64 MB. If the results are lower, no
exception is thrown and the values are automatically limited.
If the option is not specified, the default value 0.6 is used.
For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_MEMORY environment variable takes
precedence over this setting.
workpools
Named work pools that can use a portion of the total resources defined under worker_options. A
default work pool named default is used if no work pools are defined in this section. If work pools are
defined, the resources allocated to the work pools are taken from the total amount, with the remaining
resources available to the default work pool. The total amount of resources defined in the workpools
section must not exceed the resources available to Spark in worker_options.
A work pool named alwayson_sql is created by default for AlwaysOn SQL. By default, it is configured
to use 25% of the resources available to Spark.
name
The name of the work pool.
cores
The number of system cores to use in this work pool expressed as either an absolute value or a decimal
value. This option follows the same rules as cores_total.
memory
The amount of memory to use in this work pool expressed as either an absolute value or a decimal
value. This option follows the same rules as memory_total.
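A hypothetical workpools layout that splits the Worker's resources between AlwaysOn SQL and a dedicated analytics pool could look like the following. The analytics_pool name and the fractions are examples, and the pool totals must not exceed the worker_options totals.

```yaml
resource_manager_options:
  worker_options:
    cores_total: 0.7
    memory_total: 0.6
    workpools:
      - name: alwayson_sql    # pool used by AlwaysOn SQL
        cores: 0.25
        memory: 0.25
      - name: analytics_pool  # hypothetical named pool
        cores: 0.25
        memory: 0.25
    # remaining resources go to the implicit "default" work pool
```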
spark_ui_options
Specify the source for SSL settings for Spark Master and Spark Worker UIs. The spark_ui_options
apply only to Spark daemon UIs, and do not apply to user applications even when the user applications
are run in cluster mode.
encryption
• inherit - inherit the SSL settings from the client encryption options.
Default: inherit
encryption_options
Set encryption options for HTTPS of Spark Master and Worker UI. The spark_encryption_options are
not valid for DSE 5.1 and later.
enabled
Whether to enable Spark encryption for Spark client-to-Spark cluster and Spark internode
communication.
Default: false
keystore
The keystore for Spark encryption keys.
The relative file path is the base Spark configuration directory that is defined by the SPARK_CONF_DIR
environment variable. The default Spark configuration directory is resources/spark/conf.
Default: resources/dse/conf/.ui-keystore
keystore_password
The password to access the key store.
Default: cassandra
require_client_auth
Whether to require truststore for client authentication. When not set, the default is false.
Default: commented out (false)
truststore
cipher_suites
The enabled cipher suites:
• TLS_RSA_WITH_AES_128_CBC_SHA
• TLS_RSA_WITH_AES_256_CBC_SHA
• TLS_DHE_RSA_WITH_AES_128_CBC_SHA
• TLS_DHE_RSA_WITH_AES_256_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
• TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
spark_process_runner:
    runner_type: default
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
spark_process_runner
Options to configure how Spark driver and executor processes are created and managed.
runner_type
• default - Run Spark processes as the DSE service user.
• run_as - Use the run_as_runner_options options. See Running Spark processes as separate
users.
run_as_runner_options
The slot users for separating Spark processes users from the DSE service user. See Running Spark
processes as separate users.
Default: slot1, slot2
AlwaysOn SQL options
Properties to enable and configure AlwaysOn SQL.
# thrift_port: 10000
# web_ui_port: 9077
# reserve_port_wait_time_ms: 100
# alwayson_sql_status_check_wait_time_ms: 500
# workpool: alwayson_sql
# log_dsefs_dir: /spark/log/alwayson_sql
# auth_user: alwayson_sql
# runner_max_errors: 10
alwayson_sql_options
The AlwaysOn SQL options enable and configure the server on this node.
enabled
Whether to enable AlwaysOn SQL for this node. The node must be an analytics node. When not set,
the default is false.
Default: commented out (false)
thrift_port
The Thrift port on which AlwaysOn SQL listens.
Default: commented out (10000)
web_ui_port
The port on which the AlwaysOn SQL web UI is available.
Default: commented out (9077)
reserve_port_wait_time_ms
The wait time in milliseconds to reserve the thrift_port if it is not available.
Default: commented out (100)
alwayson_sql_status_check_wait_time_ms
The time in milliseconds to wait for a health check status of the AlwaysOn SQL server.
Default: commented out (500)
workpool
The work pool name used by AlwaysOn SQL.
Default: commented out (alwayson_sql)
log_dsefs_dir
Location in DSEFS of the AlwaysOn SQL log files.
Default: commented out (/spark/log/alwayson_sql)
auth_user
The role to use for internal communication by AlwaysOn SQL if authentication is enabled. Custom roles
must be created with login=true.
Default: commented out (alwayson_sql)
runner_max_errors
The maximum number of errors that can occur during AlwaysOn SQL service runner thread runs before
stopping the service. A service stop requires a manual restart.
Default: commented out (10)
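For example, to turn AlwaysOn SQL on for an analytics node with otherwise default settings, uncomment and set the following (a sketch; the ports shown are the defaults from above):

```yaml
alwayson_sql_options:
    enabled: true
    thrift_port: 10000
    web_ui_port: 9077
    workpool: alwayson_sql
```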
DSE File System (DSEFS) options
Properties to enable and configure the DSE File System (DSEFS).
DSEFS replaced the Cassandra File System (CFS). DSE versions 6.0 and later do not support CFS.
dsefs_options:
    enabled:
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    public_port: 5598
    private_port: 5599
    data_directories:
        - dir: /var/lib/dsefs/data
          storage_weight: 1.0
          min_free_space: 5368709120
# service_startup_timeout_ms: 30000
# service_close_timeout_ms: 600000
# server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
# compression_frame_max_size: 1048576
# query_cache_size: 2048
# query_cache_expire_after_ms: 2000
# gossip_options:
# round_delay_ms: 2000
# startup_delay_ms: 5000
# shutdown_delay_ms: 10000
# rest_options:
# request_timeout_ms: 330000
# connection_open_timeout_ms: 55000
# client_close_timeout_ms: 60000
# server_request_timeout_ms: 300000
# idle_connection_timeout_ms: 60000
# internode_idle_connection_timeout_ms: 120000
# core_max_concurrent_connections_per_host: 8
# transaction_options:
# transaction_timeout_ms: 3000
# conflict_retry_delay_ms: 200
# conflict_retry_count: 40
# execution_retry_delay_ms: 1000
# execution_retry_count: 3
# block_allocator_options:
# overflow_margin_mb: 1024
# overflow_factor: 1.05
dsefs_options
Enable and configure options for DSEFS.
enabled
Whether to enable DSEFS.
• true - DSEFS starts on this node regardless of the workload type.
• false - DSEFS does not start on this node.
• blank or commented out (#) - DSEFS starts only if the node is configured to run analytics
workloads.
- dir
Mandatory attribute to identify the set of directories. DataStax recommends segregating these data
directories on physical devices that are different from the devices that are used for DataStax Enterprise.
Using multiple directories on JBOD improves performance and capacity.
Default: commented out (/var/lib/dsefs/data)
storage_weight
The weighting factor for this location specifies how much data to place in this directory, relative to other
directories in the cluster. This soft constraint determines how DSEFS distributes the data. For example,
a directory with a value of 3.0 receives about three times more data than a directory with a value of 1.0.
Default: commented out (1.0)
min_free_space
The amount of reserved space, in bytes, that is not used for storing file data blocks. You can use a unit
of measure suffix to specify other size units, for example 1 TB, 10 GB, or 5000 MB.
Default: commented out (5368709120)
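A sketch of a multi-directory layout that weights a larger device more heavily (the paths and weights are illustrative):

```yaml
dsefs_options:
    enabled: true
    data_directories:
        - dir: /mnt/disk1/dsefs/data
          storage_weight: 3.0        # receives about 3x the data of disk2
          min_free_space: 5368709120
        - dir: /mnt/disk2/dsefs/data
          storage_weight: 1.0
          min_free_space: 5368709120
```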
Advanced properties for DSEFS
service_startup_timeout_ms
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to bootstrap.
Default: commented out (30000)
service_close_timeout_ms
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to close.
Default: commented out (600000)
server_close_timeout_ms
Wait time, in milliseconds, that the DSEFS server waits during shutdown before closing all pending
connections.
Default: commented out (2147483647)
compression_frame_max_size
The maximum accepted size of a compression frame defined during file upload.
Default: commented out (1048576)
query_cache_size
Maximum number of elements in a single DSEFS Server query cache.
Default: commented out (2048)
query_cache_expire_after_ms
The time to retain the DSEFS Server query cache element in cache. The cache element expires when
this time is exceeded.
Default: commented out (2000)
gossip options
Options to configure DSEFS gossip rounds.
round_delay_ms
The delay, in milliseconds, between gossip rounds.
Default: commented out (2000)
startup_delay_ms
The delay time, in milliseconds, between registering the location and reading back all other locations
from the database.
Default: commented out (5000)
shutdown_delay_ms
The delay time, in milliseconds, between announcing shutdown and shutting down the node.
Default: commented out (10000)
rest_options
Options to configure DSEFS rest times.
request_timeout_ms
The time, in milliseconds, that the client waits for a response that corresponds to a given request.
Default: commented out (330000)
connection_open_timeout_ms
The time, in milliseconds, that the client waits to establish a new connection.
Default: commented out (55000)
client_close_timeout_ms
The time, in milliseconds, that the client waits for pending transfer to complete before closing a
connection.
Default: commented out (60000)
server_request_timeout_ms
The time, in milliseconds, to wait for the server rest call to complete.
Default: commented out (300000)
idle_connection_timeout_ms
The time, in milliseconds, for RestClient to wait before closing an idle connection. If RestClient cannot
close the connection after the timeout, the connection is closed after 2 * idle_connection_timeout_ms.
Default: commented out (60000)
overflow_margin_mb
# insights_options:
# data_dir: /var/lib/cassandra/insights_data
# log_dir: /var/log/cassandra/
insights_options
Options for DSE Metrics Collector.
data_dir
Directory to store collected metrics. When not set, the default directory is /var/lib/cassandra/
insights_data.
When data_dir is not set, the default location of the /insights_data directory is the same location
as the /commitlog directory, as defined with the commitlog_directory property in cassandra.yaml.
log_dir
Directory to store logs for collected metrics. The log file is dse-collectd.log. The file with the collectd
PID is dse-collectd.pid. When not set, the default directory is /var/log/cassandra/.
Audit database activities
Track database activity using the audit log feature. To get the maximum information from data auditing, turn on
data auditing on every node.
See Setting up database auditing.
audit_logging_options
Options to enable and configure database activity logging.
enabled
Whether to enable database activity auditing.
Default: false
logger
The logger to use for recording events:
• SLF4JAuditWriter - Logs audit events to the SLF4J logger.
• CassandraAuditWriter - Logs audit events to a table.
Configure logging level, sensitive data masking, and log file name/location in the logback.xml file.
Default: SLF4JAuditWriter
included_categories
Comma-separated list of event categories that are captured. The category names are:
• UNKNOWN - Events where the category and type are both UNKNOWN.
retention_time: 0
cassandra_audit_writer_options:
    mode: sync
    batch_size: 50
    flush_time: 250
    queue_size: 30000
    write_consistency: QUORUM
    # dropped_event_log: /var/log/cassandra/dropped_audit_events.log
# day_partition_millis: 3600000
retention_time
The amount of time, in hours, audit events are retained by supporting loggers. Only the
CassandraAuditWriter supports retention time.
mode
The mode the audit writer runs in:
• sync - A query is not executed until the audit event is successfully written.
• async - Audit events are queued for writing to the audit table, but are not necessarily logged before
the query executes. A pool of writer threads consumes the audit events from the queue, and writes
them to the audit table in batch queries.
While async substantially improves performance under load, if there is a failure between when
a query is executed, and its audit event is written to the table, the audit table might be missing
entries for queries that were executed.
Default: sync
batch_size
Available only when mode: async. Must be greater than 0.
The maximum number of events the writer dequeues before writing them out to the table. If
warnings in the logs reveal that batches are too large, decrease this value or increase the value of
batch_size_warn_threshold_in_kb in cassandra.yaml.
Default: 50
flush_time
Available only when mode: async.
The maximum amount of time, in milliseconds, that an event waits in the queue before a writer removes
it and writes it out. This flush time prevents events from waiting too long before being written to the
table when query volume is low.
Default: 500
queue_size
The size of the queue feeding the asynchronous audit log writer threads. When there are more events
being produced than the writers can write out, the queue fills up, and newer queries are blocked until
there is space on the queue. If a value of 0 is used, the queue size is unbounded, which can lead to
resource exhaustion under heavy query load.
Default: 30000
write_consistency
The consistency level that is used to write audit events.
Default: QUORUM
dropped_event_log
The directory to store the log file that reports dropped events. When not set, the default is /var/log/
cassandra/dropped_audit_events.log.
Default: commented out (/var/log/cassandra/dropped_audit_events.log)
day_partition_millis
The interval, in milliseconds, between changing nodes to spread audit log information across multiple
nodes. For example, to change the target node every 12 hours, specify 43200000 milliseconds. When
not set, the default is 3600000 (1 hour).
Default: commented out (3600000) (1 hour)
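Putting the writer options together, a sketch of an asynchronous audit configuration (the values are illustrative; retention requires the CassandraAuditWriter):

```yaml
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter
    retention_time: 720            # keep audit events for 30 days
    cassandra_audit_writer_options:
        mode: async                # queue events; do not block queries
        batch_size: 50
        queue_size: 30000
        write_consistency: QUORUM
```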
# tiered_storage_options:
# strategy1:
# tiers:
# - paths:
# - /mnt1
# - /mnt2
# - paths: [ /mnt3, /mnt4 ]
# - paths: [ /mnt5, /mnt6 ]
#
# local_options:
# k1: v1
# k2: v2
#
# 'another strategy':
# tiers: [ paths: [ /mnt1 ] ]
tiered_storage_options
Options to configure the smart movement of data across different types of storage media so that data
is matched to the most suitable drive type, according to the performance and cost characteristics it
requires.
strategy1
The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1
is the configurable name of the tiered storage configuration strategy.
tiers
Each unnamed entry in this section defines a storage tier by listing the file paths to its data directories
in priority order.
local_options
Local configuration options overwrite the tiered storage settings for the table schema in the local
dse.yaml file. See Testing DSE Tiered Storage configurations.
- paths
The list of file paths that define the data directories for this tier of the disk configuration. Typically,
list the fastest storage media first. These paths are used only to store data that is configured to use
tiered storage. These paths are independent of any settings in the cassandra.yaml file.
- /filepath
The file paths that define the data directories for this tier of the disk configuration.
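For example, a two-tier strategy that lists the fastest media first (the mount points are illustrative):

```yaml
tiered_storage_options:
    strategy1:
        tiers:
            - paths: [ /mnt/nvme1, /mnt/nvme2 ]   # hot tier: fastest media first
            - paths: [ /mnt/hdd1, /mnt/hdd2 ]     # cold tier
```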
DSE Advanced Replication configuration settings
DSE Advanced Replication configuration options to replicate data from remote clusters to central data hubs.
# advanced_replication_options:
# enabled: false
# conf_driver_password_encryption_enabled: false
# advanced_replication_directory: /var/lib/cassandra/advrep
# security_base_path: /base/path/to/advrep/security/files/
advanced_replication_options
Options to enable and configure DSE Advanced Replication.
enabled
Whether to enable an edge node to collect data in the replication log.
internode_messaging_options:
port: 8609
# frame_length_in_mb: 256
# server_acceptor_threads: 8
# server_worker_threads: 16
# client_max_connections: 100
# client_worker_threads: 16
# handshake_timeout_seconds: 10
# client_request_timeout_seconds: 60
internode_messaging_options
Configuration options for inter-node messaging.
port
The mandatory port for the inter-node messaging service.
Default: 8609
frame_length_in_mb
Maximum message frame length. When not set, the default is 256.
Default: commented out (256)
server_acceptor_threads
The number of server acceptor threads. When not set, the default is the number of available
processors.
Default: commented out
server_worker_threads
The number of server worker threads. When not set, the default is the number of available processors *
8.
Default: commented out
client_max_connections
The maximum number of client connections. When not set, the default is 100.
Default: commented out (100)
client_worker_threads
The number of client worker threads. When not set, the default is the number of available processors *
8.
Default: commented out
handshake_timeout_seconds
Timeout for communication handshake process. When not set, the default is 10.
Default: commented out (10)
client_request_timeout_seconds
Timeout for non-query search requests like core creation and distributed deletes. When not set, the
default is 60.
Default: commented out (60)
DSE Multi-Instance server_id
server_id
In DSE Multi-Instance /etc/dse-nodeId/dse.yaml files, the server_id option is generated to uniquely
identify the physical server on which multiple instances are running. The server_id default value is the
media access control address (MAC address) of the physical server. You can change server_id when
the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.
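For example, to override the generated value in one instance's file (the identifier shown is illustrative):

```yaml
# /etc/dse-node1/dse.yaml
server_id: 00-50-56-8a-1c-2d
```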
DSE Graph options
# graph:
# analytic_evaluation_timeout_in_minutes: 10080
# realtime_evaluation_timeout_in_seconds: 30
# schema_agreement_timeout_in_ms: 10000
# system_evaluation_timeout_in_seconds: 180
# index_cache_size_in_mb: 128
# max_query_queue: 10000
# max_query_threads (no explicit default)
# max_query_params: 16
graph
These graph options are system-level configuration options and options that are shared between graph
instances.
Option names and values expressed in the ISO 8601 format used in earlier DSE 5.0 releases are still
valid, but the ISO 8601 format is deprecated.
analytic_evaluation_timeout_in_minutes
Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate. When not set, the default is
10080 (168 hours).
Default: commented out (10080)
realtime_evaluation_timeout_in_seconds
Maximum time to wait for an OLTP real-time traversal to evaluate. When not set, the default is 30
seconds.
Default: commented out (30)
schema_agreement_timeout_in_ms
Maximum time to wait for the database to agree on schema versions before timing out. When not set,
the default is 10000 (10 seconds).
Default: commented out (10000)
system_evaluation_timeout_in_seconds
Maximum time to wait for a graph system-based request to execute, like creating a new graph. When
not set, the default is 180 (3 minutes).
Default: commented out (180)
schema_mode
Controls the way that the schemas are handled.
• Production = Schema must be created before data insertion. Schema cannot be changed after
data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to
TRUE.
• Development = No schema is required to write data to a graph. Schema can be changed after data
is inserted. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.
When not set, the default is Production. To use Development, manually add this option.
Default: not present
index_cache_size_in_mb
The amount of RAM to allocate to the index cache. When not set, the default is 128.
Default: commented out (128)
max_query_queue
The maximum number of CQL queries that can be queued as a result of Gremlin requests. Incoming
queries are rejected if the queue size exceeds this setting. When not set, the default is 10000.
Default: commented out (10000)
max_query_threads
The maximum number of threads to use for queries to the database. When this option is not set, the
default is calculated automatically; see gremlinPool.
Default: calculated
max_query_params
The maximum number of parameters that can be passed on a graph query request for TinkerPop
drivers and drivers using the Cassandra native protocol. Passing very large numbers of parameters
on requests is an anti-pattern, because the script evaluation time increases proportionally. DataStax
recommends reducing the number of parameters to speed up script compilation times. Before you
increase this value, consider alternate methods for parameterizing scripts, like passing a single map. If
the graph query request requires many arguments, pass a list.
Default: commented out (16)
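A sketch that overrides a few of the graph defaults in dse.yaml (the values are illustrative):

```yaml
graph:
    realtime_evaluation_timeout_in_seconds: 60
    schema_mode: Development
    max_query_params: 32
```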
DSE Graph Gremlin Server options
The Gremlin Server is configured using Apache TinkerPop specifications.
# gremlin_server:
# port: 8182
# threadPoolWorker: 2
# gremlinPool: 0
# scriptEngines:
# gremlin-groovy:
# config:
# sandbox_enabled: false
# sandbox_rules:
# whitelist_packages:
# - package.name
# whitelist_types:
# - fully.qualified.type.name
# whitelist_supers:
# - fully.qualified.class.name
# blacklist_packages:
# - package.name
# blacklist_supers:
# - fully.qualified.class.name
gremlin_server
The top-level configurations in Gremlin Server.
port
The available communications port for Gremlin Server. When not set, the default is 8182.
node_health_options:
    refresh_rate_ms: 50000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30

hosts: [localhost]
port: 8182
serializer: { className:
    org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0,
    config: { ioRegistries:
    [org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
hosts
Identifies a host or hosts running a DSE node that is running Gremlin Server. You may need to use the
native_transport_address value set in cassandra.yaml.
Default: [localhost]
You can also connect to the Spark Master node for the datacenter by either running the console from
the Spark Master or specifying the Spark Master in the hosts field in the remote.yaml file.
port
Identifies a port on a DSE node running Gremlin Server. The port value needs to match the port value
specified for gremlin_server: in the dse.yaml file.
Default: 8182
serializer
Specifies the class and configuration for the serializer used to pass information between the Gremlin
console and the Gremlin Server.
Default: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0,
config: { ioRegistries:
[org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0] }}
DSE Graph Gremlin connectionPool options
The connectionPool settings specify a number of options that will be passed between the Gremlin console and
the Gremlin Server.
connectionPool: {
enableSsl: false,
maxContentLength: 65536000,
maxInProcessPerConnection: 4,
maxSimultaneousUsagePerConnection: 16,
maxSize: 8,
maxWaitForConnection: 3000,
maxWaitForSessionClose: 3000,
minInProcessPerConnection: 1,
minSimultaneousUsagePerConnection: 8,
minSize: 2,
reconnectInterval: 1000,
resultIterationBatchSize: 64,
# trustCertChainFile: /etc/dse/graph/gremlin-console/conf/mycert.pem
# Note: trustCertChainFile deprecated as of TinkerPop 3.2.10; instead use trustStore.
trustStore: /full/path/to/jsse/truststore/file
}
enableSsl
Determines if SSL should be enabled. If enabled on the server, SSL must be enabled on the client.
To configure the Gremlin console to use SSL when SSL is enabled on the Gremlin Server, edit the
connectionPool section of remote.yaml:
# Java Secure Socket Extension (JSSE) truststore file via the trustStore parameter
Example:
hosts: [localhost]
username: Cassandra_username
password: Cassandra_password
port: 8182
...
connectionPool: {
    enableSsl: true,
    trustStore: /full/path/to/JSSE/truststore/file,
    ...
}
resultIterationBatchSize
The override value for the size of the result batches to be returned from the server.
Default: 64
trustCertChainFile
The location of the public certificate from the DSE truststore file, in PEM format. Also set enableSsl:
true.
If you are using the deprecated trustCertChainFile in your version of remote.yaml, here are
the details. Depending on how you created the DSE truststore file, you may already have the
PEM format certificate file from the root Certificate Authority. If so, specify the PEM file with this
trustCertChainFile option. If not, export the public certificate from the DSE truststore (CER format)
and convert it to PEM format. Then specify the PEM file with this option. Example:
$ pwd
/etc/dse/graph/gremlin-console/conf
In this example, the connectionPool section of remote.yaml should then include the following options
(assuming you are aware that trustCertChainFile is deprecated, as noted above).
connectionPool: {
enableSsl: true,
trustCertChainFile: /etc/dse/graph/gremlin-console/conf/mycert.pem,
...
}
Default: Unspecified
trustStore
The location of the Java Secure Socket Extension (JSSE) truststore file. Trusted certificates for verifying
the remote client's certificate. Similar to setting the JSSE property javax.net.ssl.trustStore. If
this value is not provided in remote.yaml and if SSL is enabled (via enableSsl: true), the default
TrustManager is used.
Default: Unspecified
DSE Graph Gremlin AuthProperties options
Security considerations for authentication between the Gremlin console and the Gremlin server require additional
options in the remote.yaml file.
# jaasEntry:
# protocol:
# username: xxx
# password: xxx
jaasEntry
Sets the AuthProperties.Property.JAAS_ENTRY properties for authentication to Gremlin Server.
Default: commented out (no value)
protocol
Sets the AuthProperties.Property.PROTOCOL properties for authentication to Gremlin Server.
Default: commented out (no value)
username
The username to submit on requests that require authentication.
Default: commented out (xxx)
password
The password to submit on requests that require authentication.
Default: commented out (xxx)
cassandra-rackdc.properties file
The GossipingPropertyFileSnitch, Ec2Snitch, and Ec2MultiRegionSnitch use the cassandra-rackdc.properties
configuration file to determine which datacenters and racks nodes belong to. They inform the database about the
network topology to route requests efficiently and distribute replicas evenly. Settings for this file depend on the
type of snitch:
• GossipingPropertyFileSnitch
This page also includes instructions for migrating from the PropertyFileSnitch to the GossipingPropertyFileSnitch.
GossipingPropertyFileSnitch
This snitch is recommended for production. It uses rack and datacenter information for the local node defined in
the cassandra-rackdc.properties file and propagates this information to other nodes via gossip.
To configure a node to use GossipingPropertyFileSnitch, edit the cassandra-rackdc.properties file as follows:
• Define the datacenter and rack that include this node. The default settings:
dc=DC1
rack=RAC1
Datacenter and rack names are case-sensitive. For examples, see Initializing a single datacenter per
workload type and Initializing multiple datacenters per workload type.
• To save bandwidth, add the prefer_local=true option. This option tells DataStax Enterprise to use the
local IP address when communication is not across different datacenters.
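A complete cassandra-rackdc.properties for such a node might look like this (the dc and rack names follow the defaults above):

```properties
dc=DC1
rack=RAC1
prefer_local=true
```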
cassandra-topology.properties file
The PropertyFileSnitch uses the cassandra-topology.properties file for datacenter and rack names and
to determine the network topology, so that requests are routed efficiently and the database can
distribute replicas evenly.
The GossipingPropertyFileSnitch snitch is recommended for production. See Migrating from the
PropertyFileSnitch to the GossipingPropertyFileSnitch.
PropertyFileSnitch
This snitch determines proximity by rack and datacenter. It uses the network details located in the
cassandra-topology.properties file. When using this snitch, you can define your datacenter names to be whatever
you want. Make sure that the datacenter names correlate to the name of your datacenters in the keyspace
definition. Every node in the cluster should be described in the cassandra-topology.properties file, and this
file should be exactly the same on every node in the cluster.
Setting datacenters and rack names
If you had non-uniform IPs and two physical datacenters with two racks in each, and a third logical datacenter for
replicating analytics data, the cassandra-topology.properties file might look like this:
# datacenter One
175.56.12.105=DC1:RAC1
175.50.13.200=DC1:RAC1
175.54.35.197=DC1:RAC1
120.53.24.101=DC1:RAC2
120.55.16.200=DC1:RAC2
120.57.102.103=DC1:RAC2
# datacenter Two
110.56.12.120=DC2:RAC1
110.50.13.201=DC2:RAC1
110.54.35.184=DC2:RAC1
50.33.23.120=DC2:RAC2
50.45.14.220=DC2:RAC2
50.17.10.203=DC2:RAC2
172.106.12.120=DC3:RAC1
172.106.12.121=DC3:RAC1
172.106.12.122=DC3:RAC1
• node0
dc_suffix=_1_cassandra
• node1
dc_suffix=_1_cassandra
• node2
dc_suffix=_1_cassandra
• node3
dc_suffix=_1_cassandra
• node4
dc_suffix=_1_analytics
• node5
dc_suffix=_1_search
This results in three us-east datacenters:
us-east_1_cassandra
us-east_1_analytics
us-east_1_search
The datacenter naming convention in this example is based on the workload. You can use other conventions,
such as DC1, DC2 or 100, 200.
1. In the cassandra.yaml, set the listen_address to the private IP address of the node, and the
broadcast_address to the public IP address of the node.
This allows DataStax Enterprise nodes in one EC2 region to bind to nodes in another region, thus enabling
multiple datacenter support. For intra-region traffic, DataStax Enterprise switches to the private IP after
establishing a connection.
2. Set the addresses of the seed nodes in the cassandra.yaml file to that of the public IP. Private IP are not
routable between networks. For example:
To find the public IP address, from each of the seed nodes in EC2:
$ curl http://instance-data/latest/meta-data/public-ipv4
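The seed list in cassandra.yaml would then use the public addresses returned by that query (the addresses below are illustrative documentation IPs):

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "203.0.113.10,203.0.113.20"
```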
In the us-east region:
• node0
dc_suffix=_1_transactional
• node1
dc_suffix=_1_transactional
• node2
dc_suffix=_2_transactional
• node3
dc_suffix=_2_transactional
• node4
dc_suffix=_1_analytics
• node5
dc_suffix=_1_search
This results in four us-east datacenters:
us-east_1_transactional
us-east_2_transactional
us-east_1_analytics
us-east_1_search

In the us-west region:
• node0
dc_suffix=_1_transactional
• node1
dc_suffix=_1_transactional
• node2
dc_suffix=_2_transactional
• node3
dc_suffix=_2_transactional
• node4
dc_suffix=_1_analytics
• node5
dc_suffix=_1_search
This results in four us-west datacenters:
us-west_1_transactional
us-west_2_transactional
us-west_1_analytics
us-west_1_search
Node dc_suffix
node0 dc_suffix=_a_transactional
node1 dc_suffix=_a_transactional
node2 dc_suffix=_a_transactional
node3 dc_suffix=_a_transactional
node4 dc_suffix=_a_analytics
node5 dc_suffix=_a_search
Synopsis
Change startup parameters using the following syntax:
• Command line:
• jvm.options file:
-Dparameter_name=value
• cassandra-env.sh file:
JVM_OPTS="$JVM_OPTS -Dparameter_name=value"
Only pass the parameter to the start-up operation once. If the same switch is passed to the start operation
multiple times, for example from both the jvm.options file and on the command line, DSE may fail to start or
may use the wrong parameter.
Startup examples
Starting a node without joining the ring:
• Command line:
• jvm.options:
-Dcassandra.join_ring=false
• Command line:
• jvm.options:
-Dcassandra.replace_address=10.91.176.160
• Command line:
dse -Ddse.ldap.retry_interval.ms=20
• jvm.options:
-Ddse.ldap.retry_interval.ms=20
-Ddse.consistent_replace
The consistency level to use during a consistent node replace. The value for consistent replace should
match the value for application read consistency.
Default: ONE
-Ddse.consistent_replace.parallelism
Specify how many ranges are repaired simultaneously during a consistent replace. The higher
the parallelism, the more resources are consumed cluster-wide, which may affect overall cluster
performance. Used only in conjunction with -Ddse.consistent_replace.
Default: 2
-Ddse.consistent_replace.retries
Specify how many times a failed repair will be retried during a replace. If all retries fail, the replace fails.
Used only in conjunction with -Ddse.consistent_replace.
Default: 3
-Ddse.consistent_replace.whitelist
Specify keyspaces and tables on which to perform a consistent replace. The keyspaces and tables
can be specified as: “ks1, ks2.cf1”. The default is blank, in which case all keyspaces and tables are
replaced. Used only in conjunction with -Ddse.consistent_replace.
Default: blank (not set)
-Dcassandra.disable_auth_caches_remote_configuration
Set to true to disable remote configuration (via JMX) of the authentication caches, for example the
caches used for credentials, permissions, and roles. When disabled, these cache options can only be set
(persistently) in cassandra.yaml, and a restart is required for new values to take effect.
Default: false.
-Dcassandra.expiration_date_overflow_policy
Set the policy (REJECT or CAP) for any TTL (time to live) expiration timestamp that exceeds the
maximum value supported by the storage engine, 2038-01-19T03:14:06+00:00. Due to the Year 2038
problem, the database storage engine cannot encode TTL expiration timestamps later than this date.
• REJECT: Reject any request with an expiration timestamp later than 2038-01-19T03:14:06+00:00.
• CAP: Allow such requests, capping the inserted expiration timestamp at 2038-01-19T03:14:06+00:00.
Default: REJECT.
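The cutoff and the CAP behavior both follow from the 32-bit expiration field. A short Python sketch (illustrative only, not DSE code; the function name and the assumption that the topmost 32-bit value is reserved internally are mine) shows where the limit comes from:

```python
from datetime import datetime, timezone

# The storage engine stores expiration times as a signed 32-bit count of
# seconds since the Unix epoch; assuming the very top value is reserved
# internally, the last encodable expiration is 2**31 - 2 seconds.
MAX_EXPIRATION = 2**31 - 2  # 2147483646

limit = datetime.fromtimestamp(MAX_EXPIRATION, tz=timezone.utc)
print(limit.isoformat())  # 2038-01-19T03:14:06+00:00

def apply_overflow_policy(expiration_ts: int, policy: str = "REJECT") -> int:
    """Sketch of the two policies for an out-of-range expiration timestamp."""
    if expiration_ts <= MAX_EXPIRATION:
        return expiration_ts          # in range: store as-is
    if policy == "CAP":
        return MAX_EXPIRATION         # clamp to 2038-01-19T03:14:06+00:00
    raise ValueError("expiration timestamp exceeds the storage engine maximum")
```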
-Dcassandra.force_default_indexing_page_size
Set to true to disable dynamic calculation of the page size used when indexing an entire partition
during initial index build or a rebuild. Fixes the page size to the default of 10000 rows per page.
Default: false.
-Dcassandra.ignore_dc
Set to true to ignore the datacenter name change on startup. Applies only when using
DseSimpleSnitch.
Default: false.
-Dcassandra.initial_token
Use when DSE is not using virtual nodes (vnodes). Set to the initial partitioner token for the node on its
first startup.
Default: blank (not set).
Vnodes automatically select tokens.
-Dcassandra.join_ring
Set to false to prevent the node from joining a ring on startup.
Add the node to the ring afterwards using nodetool join and a JMX call.
Default: true.
-Dcassandra.load_ring_state
Set to false to clear all gossip state for the node on restart.
Default: true.
-Dcassandra.metricsReporterConfigFile
Enables pluggable metrics reporter and configures it from the specified file.
Default: blank (not set).
-Dcassandra.native_transport_port
Set to the port number on which the CQL native transport listens for clients.
Default: 9042.
-Dcassandra.native_transport_startup_delay_seconds
Set to the number of seconds to delay the native transport server start up.
Default: 0 (no delay).
-Dcassandra.partitioner
Set to the partitioner name.
Default: org.apache.cassandra.dht.Murmur3Partitioner.
-Dcassandra.partition_sstables_by_token_range
Set to false to disable JBOD SSTable partitioning by token range to multiple data_file_directories.
Advanced setting that should only be used with guidance from DataStax Support.
Default: true.
-Dcassandra.printHeapHistogramOnOutOfMemoryError
Set to true to print a heap histogram when the JVM throws an OutOfMemoryError.
Default: false.
-Dcassandra.replace_address
Set to the listen_address or the broadcast_address of the dead node when replacing it with a new node.
The new node must be in the same state as a node before bootstrapping, with no data in its data directory.
The broadcast_address defaults to the listen_address, except when the ring uses the
Amazon EC2 multi-region snitch (see Configuring Amazon EC2 multi-region snitch).
-Dcassandra.replace_address_first_boot
Same as -Dcassandra.replace_address but only runs the first time the Cassandra node boots.
This property is preferred over -Dcassandra.replace_address since it has no effect on subsequent
boots if it is not removed from jvm.options or cassandra-env.sh.
-Dcassandra.replayList
Allows restoring specific tables from an archived commit log.
-Dcassandra.ring_delay_ms
Set to the number of milliseconds the node waits to hear from other nodes before formally joining the
ring.
Default: 30000.
-Dcassandra.ssl_storage_port
Sets the SSL port for encrypted communication.
Default: 7001.
-Dcassandra.start_native_transport
Enables or disables the native transport server. See start_native_transport in cassandra.yaml.
Default: true.
-Dcassandra.storage_port
Sets the port for inter-node communication.
Default: 7000.
-Dcassandra.write_survey
Set to true to enable a tool for testing new compaction and compression strategies. write_survey
allows you to experiment with different strategies and benchmark write performance differences without
affecting the production workload. See Testing compaction and compression.
Default: false.
Java Management Extension system properties
DataStax Enterprise exposes metrics and management operations via Java Management Extensions (JMX).
JConsole and the nodetool utility are JMX-compliant management tools.
-Dcom.sun.management.jmxremote.port
Sets the port number on which the database listens for JMX connections.
By default, you can interact with DataStax Enterprise using JMX on port 7199 without authentication.
Default: 7199
-Dcom.sun.management.jmxremote.ssl
Change to true to enable SSL for JMX.
Default: false
-Dcom.sun.management.jmxremote.authenticate
Set to true to enable remote authentication for JMX.
Default: false
-Djava.rmi.server.hostname
Sets the interface hostname or IP that JMX should use to connect. Uncomment and set if you are
having trouble connecting.
Search system properties
DataStax Enterprise (DSE) Search system properties.
-Ddse.search.client.timeout.secs
Set the timeout in seconds for native driver search core management calls using the dsetool search-specific commands.
Default: 600 (10 minutes).
-Ddse.search.query.threads
Sets the number of Search queries that can execute in parallel. Consider increasing this value or
reducing client/driver requests per connection if EnqueuedRequestCount does not stabilize near zero.
Default: The default is two times the number of CPUs (including hyperthreading).
-Ddse.timeAllowed.enabled.default
The Solr timeAllowed option is enforced by default to prevent long-running shard queries (such as
complex facets and Boolean queries) from using system resources after they have timed out from the
DSE Search coordinator.
DSE Search checks the timeout per segment instead of during document or terms iteration. The
system property solr.timeAllowed.docsPerSample has been removed.
By default for all queries, the timeAllowed value is the same as the
internode_messaging_options.client_request_timeout_seconds setting in dse.yaml. For more
details, see Limiting queries by time.
Using the Solr timeAllowed parameter can add latency. If you find the cost for queries is
too high in your environment, set the -Ddse.timeAllowed.enabled.default property
to false at DSE startup time, or set timeAllowed.enable to false in the query.
Default: true.
-Ddse.solr.data.dir
Set the path to store DSE Search data. See Set the location of search indexes.
-Dsolr.offheap.enable
The DSE Search per-segment filter cache is kept off-heap, using native memory to reduce on-heap
memory consumption and garbage collection overhead. The off-heap filter cache is enabled by
default; to disable it, pass this JVM system property set to false at startup.
Default: true
Threads per core system properties
Tune TPC using the Netty system parameters.
-Ddse.io.aio.enable
Set to false to have all read operations use the AsynchronousFileChannel regardless of the
operating system or disk type.
The default setting true allows dynamic switching of libraries for read operations as follows:
• AsynchronousFileChannel for read operations on hard disk drives and all non-Linux operating
systems
• LibAIO for read operations on solid-state drives (SSDs) on Linux
Use this advanced setting only with guidance from DataStax Support.
Default: true
-Ddse.io.aio.force
Set to true to force all read operations to use LibAIO regardless of the disk type or operating system.
Use this advanced setting only with guidance from DataStax Support.
Default: false
-Dnetty.eventloop.busy_extra_spins=N
Set to the number of extra iterations that the epoll event loops perform when their queues are empty
before moving on to the next backoff stage. Increasing the value reduces latency at the cost of higher
CPU usage when the loops are idle.
Default: 10
-Dnetty.epoll_check_interval_nanos
Sets the granularity, in nanoseconds, for calling epoll select, which is a system call. Setting the value
too low hurts performance by making too many system calls; setting it too high hurts performance by
delaying the discovery of new events.
Default: 2000
-Dnetty.schedule_check_interval_nanos
Set the granularity, in nanoseconds, for checking whether scheduled events are ready to execute.
Values below 1 nanosecond are not useful, and too high a value delays scheduled tasks.
Default: 1000
LDAP system properties for DataStax Enterprise Authentication
-Ddse.ldap.connection.timeout.ms
The number of milliseconds before the connection times out.
Default:
-Ddse.ldap.retry_interval.ms
Allows you to set the time in milliseconds between subsequent retries when authenticating via an LDAP
server.
Default: 10
-Ddse.ldap.pool.min.idle
Provides finer control over the connection pool for the DataStax Enterprise LDAP authentication connector.
The min idle setting determines the minimum number of connections allowed in the pool before the evictor
thread creates new connections. This setting has no effect if the evictor thread is not configured to run.
Default:
-Ddse.ldap.pool.exhausted.action
Determines what the pool does when it is full. It can be one of:
Default: block
-Ddse.ldap.pool.max.wait
When dse.ldap.pool.exhausted.action is block, sets the number of milliseconds that the
pool blocks before throwing an exception.
Default:
-Ddse.ldap.pool.test.borrow
Tests a connection when it is borrowed from the pool.
Default:
-Ddse.ldap.pool.test.return
Tests a connection returned to the pool.
Default:
-Ddse.ldap.pool.test.idle
Tests any connections in the eviction loop that are not being evicted. Only works if the time between
eviction runs is greater than 0ms.
Default:
-Ddse.ldap.pool.time.between.evictions
Determines the time in milliseconds between eviction runs. When combined with
dse.ldap.pool.test.idle, this becomes a basic keep-alive for connections.
Default:
-Ddse.ldap.pool.num.tests.per.eviction
Number of connections in the pool that are tested each eviction run. If this is set to the same value as
max active (the pool size), all connections are tested on each eviction run.
Default:
-Ddse.ldap.pool.min.evictable.idle.time.ms
Determines the minimum time in ms (milliseconds) that a connection can sit in the pool before it
becomes available for eviction.
Default:
-Ddse.ldap.pool.soft.min.evictable.idle.time.ms
Determines the minimum time in milliseconds that a connection can sit in the pool before it
becomes available for eviction, with the proviso that the number of connections does not fall below
dse.ldap.pool.min.idle.
Default:
Kerberos system properties
-Ddse.sasl.protocol
Kerberos principal name, user@realm. For example, dse_admin@EXAMPLE.COM.
-Djava.security.auth.login.config
The path to the JAAS configuration file for DseClient.
NodeSync system parameters
-Ddse.nodesync.controller_update_interval_sec
Set the frequency, in seconds, at which the NodeSync auto-tuning process executes.
Default: 300 (5 minutes).
-Ddse.nodesync.log_reporter_interval_sec
Set the frequency, in seconds, of the short INFO progress report.
Default: 600 (10 minutes).
-Ddse.nodesync.min_validation_interval_sec
Set to the minimum number of seconds between validations of the same segment, mostly to avoid busy
spinning on new/empty clusters.
Default: 300 (5 minutes).
-Ddse.nodesync.min_warn_interval_sec
Set to the minimum number of seconds between logged warnings, to avoid logging warnings too often.
Default: 36000 (10 hours).
-Ddse.nodesync.rate_checker_interval_sec
Set the frequency, in seconds, at which the configured rate is compared against the tables and their
deadlines. A warning is logged if the rate is considered too low.
Default: 1800 (30 minutes).
-Ddse.nodesync.segment_lock_timeout_sec
Set the time-to-live (TTL), in seconds, on locks inserted in the status table.
Default: 600 (10 minutes).
-Ddse.nodesync.segment_size_target_bytes
Set to the targeted maximum size for segments in bytes.
Default: 209715200 (200 MB).
-Ddse.nodesync.size_checker_interval_sec
Set the frequency, in seconds, at which to check whether the depth used for a table should be updated
due to data size changes.
Default: 7200 (2 hours).
2. Answer the questions below to determine the appropriate compaction strategy for each table.
Does your table process time-series data?
If the answer is yes, use TWCS (TimeWindowCompactionStrategy). If the answer is no, read the
following questions.
Does your table handle more reads than writes, or more writes than reads?
LCS (LeveledCompactionStrategy) is appropriate if there are twice or more reads than writes, especially
randomized reads. If the reads and writes are approximately equal, the performance penalty from LCS
may not be worth the benefit. Be aware that LCS can be overwhelmed by a high number of writes. One
advantage of LCS is that it keeps related data in a small set of SSTables.
Does the data in your table change often?
If your data is immutable or there are few upserts, use STCS (SizeTieredCompactionStrategy), which
does not have the write performance penalty of LCS.
Do you require predictable levels of read and write activity?
LCS keeps the SSTables within predictable sizes and numbers. For example, if your table's read and
write ratio is small, and the read activity is expected to conform to a Service Level Agreement (SLA), it
may be worth the LCS write performance penalty to keep read rates and latency at predictable levels.
And, you may be able to overcome the LCS write penalty by adding more nodes.
Will your table be populated by a batch process?
For batched reads and writes, STCS performs better than LCS. The batch process causes little or no
fragmentation, so the benefits of LCS are not realized; batch processes can overwhelm tables that use
LCS.
Does your system have limited disk space?
LCS handles disk space more efficiently than STCS: LCS requires about 10% headroom in addition to
the space occupied by the data. In some cases, STCS and DTCS (DateTieredCompactionStrategy) require
as much as 50% more headroom than the data space. (DTCS is deprecated.)
Is your system reaching its limits for input and output?
LCS is significantly more input and output intensive than DTCS or STCS. Switching to LCS may
introduce extra input and output load that offsets the advantages.
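As a rough summary, the decision flow in the questions above can be sketched as a small helper. The strategy names are the only hard facts here; the function, its parameters, and the ordering of checks are a simplification of the guidance, not an official rule:

```python
def suggest_compaction(time_series: bool, reads_per_write: float,
                       mostly_immutable: bool, batch_loaded: bool) -> str:
    """Illustrative sketch of the compaction-strategy questionnaire."""
    if time_series:
        return "TimeWindowCompactionStrategy"   # time-series data -> TWCS
    if batch_loaded or mostly_immutable:
        return "SizeTieredCompactionStrategy"   # batch loads / immutable data -> STCS
    if reads_per_write >= 2.0:
        return "LeveledCompactionStrategy"      # twice or more reads than writes -> LCS
    return "SizeTieredCompactionStrategy"       # otherwise, LCS penalty not worth it
```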
Configuring and running compaction
Set the table compaction strategy in the CREATE TABLE or ALTER TABLE statement parameters. See
table_options.
You can start compaction manually using the nodetool compact command.
Testing compaction strategies
To test the compaction strategy:
• Create a three-node cluster using one of the compaction strategies, then stress test the cluster using
the cassandra-stress utility and measure the results.
• Set up a node on your existing cluster and enable the write survey mode option on the node to analyze live
data.
NodeSync service
About NodeSync
NodeSync is an easy-to-use continuous background repair with low overhead. It provides consistent
performance and virtually eliminates the manual effort of running repair operations in a DataStax cluster.
For write-heavy workloads, where more than 20% of the operations are writes, you may notice CPU
consumption overhead associated with NodeSync. If that is the case in your environment, DataStax
recommends using nodetool repair instead of enabling NodeSync. See nodetool repair.
NodeSync service
By default, each node runs the NodeSync service. The service is idle unless it has something to validate.
NodeSync is enabled on a per-table basis. The service continuously validates local data ranges for
NodeSync-enabled tables and repairs any inconsistency found. The local data ranges are split into small segments, which
act as validation save points. Segments are prioritized in order to try to meet the per-table deadline target.
Segments
A segment is a small local token range of a table. NodeSync recursively splits local ranges in half a certain
number of times (depth) to create segments. The depth is calculated using the total table size, assuming equal
distribution of data. Typically segments cover no more than 200 MB. The token ranges can be no smaller than a
single partition, so very large partitions can result in segments larger than the configured size.
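That depth calculation can be sketched as follows, under the stated assumption of evenly distributed data (the function name, the rounding, and the exact 200 MB target are illustrative, not DSE internals):

```python
import math

SEGMENT_TARGET = 200 * 1024 * 1024  # typical per-segment target of 200 MB

def segment_depth(local_range_bytes: int, target: int = SEGMENT_TARGET) -> int:
    """Smallest number of halvings so that, assuming evenly distributed data,
    each of the 2**depth segments covers at most `target` bytes."""
    if local_range_bytes <= target:
        return 0
    return math.ceil(math.log2(local_range_bytes / target))

# A 1.6 GB local range needs depth 3: 2**3 = 8 segments of 200 MB each.
print(segment_depth(1600 * 1024 * 1024))  # 3
```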
Validation process and status
After a segment is selected for validation, NodeSync reads the entirety of the data it covers from all replicas
(using paging), checks for inconsistencies, and repairs them if needed. When a node validates a segment, it “locks”
it in a system table to avoid work duplication by other nodes. The lock is not race-free; a small amount of
duplicated work is possible, a trade-off that avoids the complexity and cost of true distributed locking.
Segment validation is saved on completion in the system_distributed.nodesync_status table, which is used
internally for resuming on failure, prioritization, segment locking, and by tools. It is not meant to be read directly.
• successful: All replicas responded and all inconsistencies (if any) were properly repaired.
• unsuccessful: Either some replicas did not respond or repairs on inconsistent replicas failed.
• partial_in_sync: Not all replicas responded, but all that responded were in sync.
• partial_repaired: Not all replicas responded, and some that responded were repaired.
Limitations
• For debugging/tuning, understanding of traditional repair will be mostly unhelpful, since NodeSync depends
on the read repair path
• No special optimizations for remote DC - may perform poorly on particularly bad WAN links
• NodeSync only makes internal adjustments to try to hit the configured rate - operators must ensure this
configured throughput is sufficient to meet the gc_grace_seconds commitment and can be achieved by the
hardware
Tables with NodeSync enabled will be skipped for repair operations run against all or specific keyspaces. For
individual tables, running the repair command will be rejected when NodeSync is enabled.
On the next restart of DataStax Enterprise (DSE), the NodeSync service will start up.
Data only needs to be validated if the table is in more than one datacenter or is in a datacenter where the
keyspace has a replication factor of 2 or more.
nodesync={'enabled': 'true'};
NodeSync records warnings to the system.log, if it detects any of the following conditions:
• rate_in_kb is too low to validate all tables within their deadline, even under ideal circumstances.
• rate_in_kb cannot be sustained by the node (too high for the node load/hardware).
1. Check the rate_in_kb setting within the nodesync section in the cassandra.yaml file.
The configured rate is different from the effective rate, which can be found in the NodeSync Service
metrics.
• Failures - When a node fails, it does not participate in NodeSync validation while it is offline.
• Temporary overloads - During periods of overload, such as unexpected events, nodes cannot achieve
the configured rate.
• Data size variation - The rate required to repair all tables within a fixed amount of time directly depends on
the size of the data to validate, which is typically a moving target.
All these factors can impact the overall NodeSync rate, so build safety margins into the configured
rate. The NodeSyncServiceRate simulator helps to set the rate.
Setting the NodeSync deadline
Each table with NodeSync enabled has a deadline_target_sec property: the target for the maximum time
between two validations of the same data. As long as the deadline is met, all parts of the ring (for the table) are
validated at least that often.
The deadline (deadline_target_sec) relates to the grace period (gc_grace_seconds): the deadline should
always be less than or equal to the grace period. As long as the deadline is met, no data is resurrected due to
tombstone purging.
The deadline defaults to whichever is longer: the grace period or four days. This is typically an acceptable
default, unless the table has a grace period of zero. For testing, the deadline value can be set lower than the
grace period; this lets you verify over a few weeks, without taking any risk, whether a lower gc_grace value is
realistic before changing it.
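The default described above is a simple maximum; a one-line sketch (the function name is mine, not a DSE API):

```python
FOUR_DAYS_SEC = 4 * 24 * 3600  # 345600 seconds

def default_deadline_target_sec(gc_grace_seconds: int) -> int:
    """The deadline defaults to whichever is longer: gc_grace_seconds or four days."""
    return max(gc_grace_seconds, FOUR_DAYS_SEC)

print(default_deadline_target_sec(864000))  # a 10-day gc_grace wins: 864000
print(default_deadline_target_sec(0))       # zero gc_grace falls back to 345600
```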
NodeSync prioritizes segments in order to try to meet the deadline. The next segment to validate at any given
time is the one closest to missing its deadline. For example, if table 1 has half the deadline of table 2, table
1 validates approximately twice as often as table 2.
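The "closest to missing its deadline" rule amounts to picking the segment with the earliest due time. This is a hypothetical simplification of NodeSync's internal state, sketched for intuition only:

```python
import time

def next_segment_index(segments, now=None):
    """Pick the segment closest to missing its deadline.

    `segments` is a hypothetical list of (last_validation_ts, deadline_target_sec)
    pairs; the segment whose "due" time (last validation + deadline) is earliest
    is validated next.
    """
    now = time.time() if now is None else now
    return min(range(len(segments)),
               key=lambda i: (segments[i][0] + segments[i][1]) - now)

# Due times: 100+50=150, 100+20=120, 120+10=130 -> index 1 is the most urgent.
print(next_segment_index([(100, 50), (100, 20), (120, 10)], now=130))  # 1
```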
Use OpsCenter to get a graphical representation of the NodeSync validation status. See Viewing NodeSync
Status.
The syntax to change the per-table nodesync property:
This is an advanced tool. Usually, it is better to let NodeSync prioritize segments on its own.
• cassandra-topology.properties (PropertyFileSnitch)
1. In the cassandra.yaml file , set the listen_address to the private IP address of the node, and the
broadcast_address to the public address of the node.
This allows nodes to bind to nodes in another network or region, thus enabling multiple datacenter support.
For intra-network or region traffic, DSE switches to the private IP after establishing a connection.
2. Set the addresses of the seed nodes in the cassandra.yaml file to the public IPs. Private IPs are not
routable between networks. For example:
Be sure to enable encryption and authentication when using public IPs. See Configuring SSL for node-to-node
connections. Another option is to use a custom VPN to have local, inter-region/datacenter IPs.
listen_on_broadcast_address: true
In non-EC2 environments, the public address to private address routing is not automatically enabled. Enabling
listen_on_broadcast_address allows DSE to listen on both listen_address and broadcast_address with
two network interfaces.
Configuring the snitch for multiple networks
External communication between the datacenters can only happen when using the broadcast_address (public IP).
The GossipingPropertyFileSnitch is recommended for production. The cassandra-rackdc.properties file defines
the datacenters used by this snitch. Enable the option prefer_local to ensure that traffic to broadcast_address
will re-route to listen_address.
For each node in the network, specify its datacenter in cassandra-rackdc.properties file.
In the example below, each datacenter is named for its workload. You can use other naming conventions,
such as DC1, DC2 or 100, 200. (Datacenter names are case-sensitive.)
Each node uses the same settings in Network A and Network B:
• node0: dc=DC_A_transactional, rack=RAC1
• node1: dc=DC_A_transactional, rack=RAC1
• node2: dc=DC_B_transactional, rack=RAC1
• node3: dc=DC_B_transactional, rack=RAC1
• node4: dc=DC_A_analytics, rack=RAC1
• node5: dc=DC_A_search, rack=RAC1
In cloud deployments, the region name is treated as the datacenter name and availability zones are treated as
racks within a datacenter. For example, if a node is in the us-east-1 region, us-east is the datacenter name and 1
is the rack location. (Racks are important for distributing replicas, but not for datacenter naming.)
In the example below, each DataStax Enterprise datacenter is named for its workload. You can use other
naming conventions, such as DC1, DC2 or 100, 200. (Datacenter names are case-sensitive.)
For each node, specify its datacenter in the cassandra-rackdc.properties. The dc_suffix option defines the
datacenters used by the snitch. Any other lines are ignored.
The nodes in each region use the same dc_suffix values:
• node0: dc_suffix=_1_transactional
• node1: dc_suffix=_1_transactional
• node2: dc_suffix=_2_transactional
• node3: dc_suffix=_2_transactional
• node4: dc_suffix=_1_analytics
• node5: dc_suffix=_1_search
This results in four datacenters in each region:
• us-east: us-east_1_transactional, us-east_2_transactional, us-east_1_analytics, us-east_1_search
• us-west: us-west_1_transactional, us-west_2_transactional, us-west_1_analytics, us-west_1_search
Property Description
cluster_name Name of the cluster that this node is joining. Must be the same for every node in the
cluster.
listen_address The IP address or hostname that the database binds to for connecting this node to other
nodes.
listen_interface Use this option instead of listen_address to specify the network interface by name, rather
than by address/hostname.
(Optional) broadcast_address The public IP address this node uses to broadcast to other nodes outside the network
or across regions in multiple-region EC2 deployments. If this property is commented
out, the node uses the same IP address or hostname as listen_address. A node
does not need a separate broadcast_address in a single-node or single-datacenter
installation, or in an EC2-based network that supports automatic switching between
private and public communication. It is necessary to set a separate listen_address and
broadcast_address on a node with multiple physical network interfaces or other topologies
where not all nodes have access to other nodes by their private IP addresses. For specific
configurations, see the instructions for listen_address. The default is the listen_address.
seed_provider The -seeds list is a comma-delimited list of hosts (IP addresses) that gossip uses to learn the
topology of the ring. Every node should have the same list of seeds.
Making every node a seed node is not recommended because of increased
maintenance and reduced gossip performance. Gossip optimization is not critical, but
it is recommended to use a small seed list (approximately three nodes per datacenter).
storage_port The inter-node communication port (default is 7000). Must be the same for every node in
the cluster.
initial_token For legacy clusters. Set this property for single-node-per-token architecture, in which a
node owns exactly one contiguous range in the ring space.
num_tokens For new clusters. The number of tokens randomly assigned to this node in a cluster that
uses virtual nodes (vnodes).
Base the size of the directory on the value of the Java -Xmx option.
3. On the line after the comment, set the CASSANDRA_HEAPDUMP_DIR to the desired path:
DataStax Enterprise requires the same token architecture on all nodes in a datacenter: the nodes must all
use vnodes, or all use single-token architecture. Across the entire cluster, datacenter architecture can vary. For
example, a single cluster with:
Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on
performance.
• The allocation algorithm distributes the token ranges proportionately using the num_tokens setting.
All systems in the datacenter should have the same num_tokens setting unless performance
varies between systems. To distribute more of the workload to higher-performance
hardware, increase the number of tokens for those systems.
The allocation algorithm efficiently balances the workload using fewer tokens; when systems are added
to a datacenter, the algorithm maintains the balance. Using a higher number of tokens more evenly
distributes the workload, but also significantly increases token management overhead.
Set the number of vnode tokens based on the workload distribution requirements of the datacenter:
Table 12: Allocation algorithm workload distribution variance
Replication factor    4 vnodes (tokens)    8 vnodes (tokens)    64 vnodes (tokens)    128 vnodes (tokens)
Enabling vnodes
In the cassandra.yaml file:
To upgrade existing clusters to vnodes, see Enabling virtual nodes on an existing production cluster.
Disabling vnodes
If you do not use vnodes, you must make sure that each node is responsible for roughly an equal amount of
data. To do so, assign each node an initial_token value and calculate the tokens for each datacenter as
described in Generating tokens.
b. Uncomment the initial_token and set it to 1 or to the value of a generated token for a multi-node cluster.
DataStax recommends not using vnodes with DSE Search. However, if you decide to use vnodes with DSE
Search, do not use more than 8 vnodes, and ensure that the allocate_tokens_for_local_replication_factor
option in cassandra.yaml is correctly configured for your environment.
2. Once the new datacenter with vnodes enabled is up, switch your clients to use the new datacenter.
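For reference, the evenly spaced Murmur3Partitioner tokens described in Generating tokens can be computed with a short script. This is an illustrative sketch of the standard even-spacing formula, not a DSE tool:

```python
def generate_tokens(node_count: int) -> list:
    """Evenly spaced Murmur3Partitioner tokens over [-2**63, 2**63)."""
    return [i * (2**64 // node_count) - 2**63 for i in range(node_count)]

print(generate_tokens(4))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```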
Logging configuration
Changing logging locations
Logging locations are set at installation. Generally, the default log location is /var/log; for example,
/var/log/cassandra and /var/log/tomcat.
For details, see Default file locations for package installations and Default file locations for tarball installations.
You can also change logging locations with OpsCenter Configuration Profiles.
• To generate all logs in the same location, add CASSANDRA_LOG_DIR to the dse-env.sh file:
export CASSANDRA_LOG_DIR="/your/log/location"
• For finer-grained control, edit the logback.xml file and replace ${cassandra.logdir} with the path.
2. To change the Tomcat server log locations for DSE Search, edit one of these files:
export TOMCAT_LOGS="/your/log/location"
Configuring logging
Logging functionality uses Simple Logging Facade for Java (SLF4J) with a logback backend. Logs are written
to the system.log and debug.log in the logging directory. You can configure logging programmatically or
manually. Manual ways to configure logging are:
Logback looks for the logback-test.xml file first, and then for the logback.xml file.
The following example details the XML configuration of the logback.xml file:
<configuration scan="true">
<jmxConfigurator />
<appender name="SYSTEMLOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<file>${cassandra.logdir}/system.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/system.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>20</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>20MB</maxFileSize>
</triggeringPolicy>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
</encoder>
</appender>
<if condition='isDefined("dse.console.useColors")'>
<then>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<withJansi>true</withJansi>
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<encoder>
<pattern>%highlight(%-5level) [%thread] %green(%date{ISO8601})
%yellow(%X{service}) %F:%L - %msg%n</pattern>
</encoder>
</appender>
</then>
</if>
<if condition='isNull("dse.console.useColors")'>
<then>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<filter class="ch.qos.logback.classic.filter.ThresholdFilter">
<level>INFO</level>
</filter>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
</encoder>
</appender>
</then>
</if>
<include file="${SPARK_SERVER_LOGBACK_CONF_FILE}"/>
<include file="${GREMLIN_SERVER_LOGBACK_CONF_FILE}"/>
<!-- Uncomment the LogbackMetrics appender and the corresponding appender-ref in the
root to activate
<appender name="LogbackMetrics"
class="com.codahale.metrics.logback.InstrumentedAppender" />
-->
<root level="${logback.root.level:-INFO}">
<appender-ref ref="SYSTEMLOG" />
<appender-ref ref="STDOUT" />
<!-- Comment out the ASYNCDEBUGLOG appender to disable debug.log -->
<appender-ref ref="ASYNCDEBUGLOG" />
<!-- Uncomment LogbackMetrics and its associated appender to enable metric collecting for
logs. -->
<!-- <appender-ref ref="LogbackMetrics" /> -->
<appender-ref ref="SparkMasterFileAppender" />
<appender-ref ref="SparkWorkerFileAppender" />
<!--audit log-->
<appender name="SLF4JAuditWriterAppender"
class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>${cassandra.logdir}/audit/audit.log</file>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/audit/audit.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>200MB</maxFileSize>
</triggeringPolicy>
</appender>
<appender name="DroppedAuditEventAppender"
class="ch.qos.logback.core.rolling.RollingFileAppender" prudent="false">
<file>${cassandra.logdir}/audit/dropped-events.log</file>
<encoder>
<pattern>%-5level [%thread] %date{ISO8601} %X{service} %F:%L - %msg%n</pattern>
<immediateFlush>true</immediateFlush>
</encoder>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${cassandra.logdir}/audit/dropped-events.log.%i.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>5</maxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>200MB</maxFileSize>
</triggeringPolicy>
</appender>
</configuration>
The appender configurations specify where to print the log and its configuration. Each appender is identified
by its name attribute, and the default appenders are described as follows.
SYSTEMLOG
Directs logs and ensures that WARN and ERROR messages are written synchronously to the
/var/log/cassandra/system.log file.
DEBUGLOG | ASYNCDEBUGLOG
Generates the /var/log/cassandra/debug.log file, which contains an asynchronous log of events
written to the system.log file, plus production logging information useful for debugging issues.
STDOUT
Directs logs to the console in a human-readable format.
LogbackMetrics
Records the rate of logged events by their logging level.
SLF4JAuditWriterAppender | DroppedAuditEventAppender
Used by the audit logging functionality. See Setting up database auditing for more information.
The following logging functionality is configurable:
• Rolling policy
Log levels
The valid values for setting the log level include ALL for logging information at all levels, TRACE through
ERROR, and OFF for no logging. TRACE creates the most verbose log, and ERROR, the least.
• ALL
• TRACE
• DEBUG
• INFO (Default)
• WARN
• ERROR
• OFF
When set to TRACE or DEBUG, output appears only in the debug.log. When set to INFO, the debug.log is
disabled.
Increasing logging levels can generate heavy logging output on a moderately trafficked cluster.
Use the nodetool getlogginglevels command to see the current logging configuration.
bin/nodetool getlogginglevels
Logger Name Log Level
ROOT INFO
com.thinkaurelius.thrift ERROR
To permanently add debug logging to a class using the logback framework, use nodetool setlogginglevel to
confirm the component or class name before setting it in the logback.xml file in installation_location/conf.
Modify the file to include the following line, or similar, at the end of the file:
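For illustration, a per-class logger entry takes the following form (the class name here is an example only; substitute the component or class you confirmed with nodetool setlogginglevel):

```xml
<logger name="org.apache.cassandra.gms.FailureDetector" level="DEBUG"/>
```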
Command: archive_command=
Command: restore_command=
Parameter %from: fully qualified path of the archived commitlog segment from the restore_directories.
Command: restore_directories=
Format: restore_directories=restore_directory_location
Command: restore_point_in_time=
Restore stops when the first client-supplied timestamp is greater than the restore point timestamp.
Because the order in which the database receives mutations does not strictly follow the timestamp order,
this can leave some mutations unrecovered.
1. Enable CDC logging and configure CDC directories and space in cassandra.yaml.
For example, to enable CDC logging with default values:
cdc_enabled: true
cdc_total_space_in_mb: 4096
cdc_free_space_check_interval_ms: 250
cdc_raw_directory: /var/lib/cassandra/cdc_raw
2. To enable CDC logging for a database table, create or alter the table with the cdc table property.
For example, to enable CDC logging on the cycling table:
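A minimal sketch of the statement (the keyspace and table names are assumed for illustration):

```cql
ALTER TABLE cycling.cyclist_name WITH cdc = true;
```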
Chapter 5. Initializing a DataStax Enterprise cluster
Complete the following tasks before initializing a DSE cluster.
• Establish a firm understanding of how the database works. Be sure to read at least Understanding the
database architecture and Data replication.
• Ensure the environment is suitable for the use case and workload.
• Determine the snitch and replication strategy. The GossipingPropertyFileSnitch and NetworkTopologyStrategy
are recommended for production environments.
• Determine which nodes are seed nodes. Do not make all nodes seed nodes.
Seed nodes are not required for DSE Search datacenters; see Internode communications (gossip).
• Review and make appropriate changes to other property files, such as cassandra-rackdc.properties.
• Set virtual nodes correctly for the type of datacenter. DataStax recommends using 8 vnodes (tokens). See
Virtual nodes for more information.
Initializing datacenters
In most circumstances, each workload type, such as search, analytics, and transactional, should be organized
into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can
be combined in SearchAnalytics nodes when there is not a large demand for analytics, or when analytics queries
must use a DSE Search index. Generally, combining transactional (OLTP) and analytics (OLAP) workloads
results in decreased performance.
When you create a keyspace using CQL, DataStax Enterprise automatically creates a virtual datacenter for the
cluster, even a one-node cluster. Assign nodes that run the same type of workload to the same datacenter. The
separate virtual datacenters for different types of nodes segregate workloads that run DSE Search from those
nodes that run other workload types.
Single datacenter per workload type
Single-datacenter deployments are useful when the cluster uses one physical datacenter.
Multiple datacenters per workload type
Multiple-datacenter deployments are worth considering when the cluster spans several physical datacenters.
The following scenarios describe some benefits of using multiple, physical datacenters:
• Isolating replicas from external infrastructure failures, such as networking between datacenters and power
outages.
• Diversifying assets between public cloud providers and on-premise managed datacenters.
• Preventing the slow down of a real-time analytics cluster by a development cluster running analytics jobs on
live data.
• Using virtual datacenters in the physical datacenter to ensure that reads from a specific datacenter are local to
the requests, especially when using a consistency level greater than ONE. This strategy ensures lower latency
because it avoids reads from one node in New York and another read from a node in Los Angeles.
In contrast, a multiple datacenter cluster has more than one datacenter for each type of workload.
The eight-node cluster spans two racks across three datacenters. Applications in each datacenter will use a
default consistency level of LOCAL_QUORUM. One node per rack will serve as a seed node.
Prerequisites:
To prepare the environment, complete the prerequisite tasks outlined in Initializing a DataStax Enterprise
cluster.
If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:
1. If the nodes are behind a firewall, open the required ports for internal/external communication.
3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.
1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.
2. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.
a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.
b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.
DESCRIBE SCHEMA;
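For example, the following statement switches a keyspace to NetworkTopologyStrategy (the keyspace name, datacenter name, and replication factor are placeholders for your environment):

```cql
ALTER KEYSPACE system_auth
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```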
3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.
4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.
Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.
• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.
• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.
• endpoint_snitch: snitch
See endpoint_snitch and snitches.
Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.
• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.
b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.
After making any changes in the configuration files, you must restart the node for the changes to
take effect.
a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.
b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.
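For example, with the recommended GossipingPropertyFileSnitch, each new node declares its datacenter and rack in cassandra-rackdc.properties (the names below are illustrative):

```properties
dc=DC2
rack=RAC1
```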
7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:
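The start commands depend on the installation type; for example (a sketch, verify the service name for your installation):

```shell
# Package installations:
$ sudo service dse start

# Tarball installations:
$ installation_location/bin/dse cassandra
```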
8. Continue starting DSE, rotating through the racks, until all the nodes are up.
9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.
You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.
a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.
b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.
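For example, to stream data from an existing datacenter named DC1 (the datacenter name is a placeholder for one of your existing datacenters):

```shell
$ nodetool rebuild -- DC1
```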
$ dsetool status
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
12. Complete steps 3 through 11 to add the third datacenter (DC3) to the cluster.
The datacenters in the cluster are now replicating with each other.
DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.2 28.44 KB 13.0% e2451cdf-f070- ... -922337.... RAC1
UN 110.82.155.2 44.47 KB 16.7% f9fa427c-a2c5- ... 30745512... RAC2
UN 110.82.155.3 54.33 KB 23.6% b9fc31c7-3bc0- ... 45674488... RAC1
DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.3 15.44 KB 50.2% e2451cdf-f070- ... 9243578.... RAC1
The ten-node cluster spans two racks across five datacenters. Applications in each datacenter will use a default
consistency level of LOCAL_QUORUM. One node per rack will serve as a seed node.
Prerequisites:
Complete the prerequisite tasks outlined in Initializing a DataStax Enterprise cluster to prepare the
environment.
If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:
1. If the nodes are behind a firewall, open the required ports for internal/external communication.
3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.
1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.
2. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.
a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.
b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.
DESCRIBE SCHEMA;
3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.
4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.
Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.
• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.
• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.
• endpoint_snitch: snitch
See endpoint_snitch and snitches.
Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.
• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.
b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings
• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.
After making any changes in the configuration files, you must restart the node for the changes to
take effect.
a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.
b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.
7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:
8. Continue starting DSE, rotating through the racks, until all the nodes are up.
9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.
You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.
a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.
b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.
$ dsetool status
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
The datacenters in the cluster are now replicating with each other.
DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.2 28.44 KB 50.2% e2451cdf-f070- ... -922337.... RAC1
UN 110.82.155.2 44.47 KB 49.8% f9fa427c-a2c5- ... 30745512... RAC2
DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.54.125.3 15.44 KB 50.2% e2451cdf-f070- ... 9243578.... RAC1
UN 110.82.155.4 18.78 KB 49.8% e2451cdf-f070- ... 10000 RAC2
DC: Analytics2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Tokens Rack
UN 110.82.155.3 54.33 KB 50.2% b9fc31c7-3bc0- ... 45674488... RAC1
UN 110.55.120.2 54.33 KB 49.8% b8gd45e4-3bc0- ... 45674488... RAC2
What's next:
• A seed node is used to bootstrap the gossip process for new nodes joining a cluster.
• To learn the topology of the ring, a joining node contacts one of the nodes in the -seeds list in
cassandra.yaml.
• The first time you bring up a node in a new cluster, only one node is the seed node.
• The seeds list is a comma delimited list of addresses. Since this example cluster includes 5 nodes, you must
change the list from the default value "127.0.0.1" to the IP address of one of the nodes.
• After all nodes are added, all nodes in the datacenter must be configured to use the same seed nodes.
Making every node a seed node is not recommended because of increased maintenance and reduced gossip
performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately
three nodes per datacenter).
This single datacenter example has 5 nodes, where nodeA, nodeB, and nodeC are seed nodes.
nodeA 110.82.155.0 # seed
nodeB 110.82.155.1 # seed
nodeC 110.54.125.1 # seed
nodeD 110.54.125.2
nodeE 110.54.155.2
1. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.
2. For nodeA, nodeB, and nodeC, configure only nodeA as seed node:
a. In cassandra.yaml:
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
- seeds: 110.82.155.0
3. Start the seed nodes one at a time nodeA, nodeB, and then nodeC.
4. For nodeA, nodeB, and nodeC, change cassandra.yaml to configure nodeA, nodeB, and nodeC as seed
nodes:
a. In cassandra.yaml:
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
- seeds: 110.82.155.0, 110.82.155.1, 110.54.125.1
You do not need to restart nodeA, nodeB, or nodeC after changing the seed node entry in
cassandra.yaml; the nodes will reread the seed nodes.
5. For nodeD and nodeE, change cassandra.yaml to configure nodeA, nodeB, and nodeC as seed nodes:
a. In cassandra.yaml:
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
- seeds: 110.82.155.0, 110.82.155.1, 110.54.125.1
# Comment out the listen_address property. If the node is properly configured (host name, name
resolution, and so on), the database uses InetAddress.getLocalHost() to get the local address from
the system.
• Node in a multi-node installation: set the listen_address property to the node's IP address or hostname,
or set listen_interface.
• Node with two physical network interfaces in a multi-datacenter installation or cluster deployed
across multiple Amazon EC2 regions using the Ec2MultiRegionSnitch:
1. Set listen_address to this node's private IP or hostname, or set listen_interface (for communication
within the local datacenter).
4. If this node is a seed node, add the node's public IP address or hostname to the seeds list.
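The two-interface EC2 case above can be sketched in cassandra.yaml as follows (the addresses are placeholders; broadcast_address is the companion setting that carries the public IP for cross-region traffic):

```yaml
listen_address: 10.0.1.5          # private IP, intra-region traffic
broadcast_address: 203.0.113.7    # public IP, cross-region traffic
endpoint_snitch: Ec2MultiRegionSnitch
```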
These steps provide information about setting up a cluster having one or more datacenters.
• node1 10.176.43.66
• node2 10.168.247.41
• node4 10.169.61.170
• node5 10.169.30.138
2. Calculate the token assignments as described in Calculating tokens for single-token architecture nodes.
The following tables list tokens for a 6 node cluster with a single datacenter or two datacenters.
node0 0
node1 21267647932558653966460912964485513216
node2 42535295865117307932921825928971026432
node3 63802943797675961899382738893456539648
node4 85070591730234615865843651857942052864
node5 106338239662793269832304564822427566080
node0 0 NA DC1
3. If the nodes are behind a firewall, open the required ports for internal/external communication.
4. If DataStax Enterprise is running, stop the node and clear the data:
• Tarball installations:
From the installation location, stop the database:
$ bin/dse cassandra-stop
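Clearing the data then removes the old application directories; for example, on a package installation (a sketch: the paths are the package-install defaults, verify them for your environment before deleting):

```shell
$ sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
```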
5. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.
Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.
• initial_token: token_value_from_calculation
• num_tokens: 1
• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.
• auto_bootstrap: false
Add the bootstrap setting only when initializing a new cluster with no data.
• endpoint_snitch: snitch
See endpoint_snitch and snitches.
Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.
• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.
6. Set the properties in the dse.yaml file as required by your use case.
110.82.155.4=DC_Search:RAC2
After making any changes in the configuration files, you must restart the node for the changes to take
effect.
8. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time,
and then start the rest of the nodes:
$ dsetool status
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
Datacenter: Cassandra
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 110.82.155.0 21.33 KB 256 33.3% a9fa31c7-f3c0-... RAC1
UN 110.82.155.1 21.33 KB 256 33.3% f5bb416c-db51-... RAC1
UN 110.82.155.2 21.33 KB 256 16.7% b836748f-c94f-... RAC1
Usage:
• Package installations:
• Tarball installations:
$ installation_location/resources/cassandra/tools/bin/token-generator num_of_nodes_in_dc ... [options]
Options:
• -h, --help
• --ringoffset offset: Offset token values. Use when adding or replacing dead nodes or datacenters.
• --test: Displays various ring arrangements and generates an HTML file showing these arrangements.
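For example, to generate tokens for one datacenter of three nodes and another of two nodes:

```shell
$ installation_location/resources/cassandra/tools/bin/token-generator 3 2
```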
2. Assign the tokens to nodes on alternating racks in the cassandra-rackdc.properties or the cassandra-
topology.properties file.
2. After calculating the tokens, assign the tokens so that the nodes in each datacenter are evenly dispersed
around the ring.
(Figure: token positions for Datacenter 1 and Datacenter 2 around the ring.)
The results show the generated token values for the Murmur3Partitioner for one datacenter with 3 nodes
and one datacenter with 2 nodes with an offset:
DC #1:
Node #1: 6148914691236517105
Node #2: 12297829382473034310
Node #3: 18446744073709551516
DC #2:
Node #1: 9144875253562394637
Node #2: 18368247290417170445
The offset value applies to the first node; the tokens for all other nodes are calculated for even distribution
from that offset.
The tokens without the offset are:
2. After calculating the tokens, assign the tokens so that the nodes in each datacenter are evenly dispersed
around the ring and alternate the rack assignments.
Chapter 6. Security
For securing DataStax Enterprise 6.0, see the DataStax Security Guide.
Chapter 7. Using DataStax Enterprise advanced
functionality
Information on using the DSE Analytics, DSEFS, DSE Search, DSE Graph, DSE Advanced Replication, DSE
In-Memory, DSE Multi-Instance, DSE Tiered Storage, and DSE Performance services.
DSE Analytics
DataStax Enterprise (DSE) integrates real-time and batch operational analytics capabilities with an enhanced
version of Apache Spark™. With DSE Analytics you can easily generate ad-hoc reports, target customers with
personalization, and process real-time streams of data. The analytics toolset lets you write code once and then
use it for both real-time and batch workloads.
About DSE Analytics
DSE Analytics jobs can use the DataStax Enterprise File System (DSEFS) to handle the large data sets typical
of analytic processing. DSEFS replaces CFS (Cassandra File System).
DSE Analytics features
No single point of failure
DSE Analytics supports a peer-to-peer, distributed cluster for running Spark jobs. Being peers, any
node in the cluster can load data files, and any analytics node can assume the responsibilities of Spark
Master.
Spark Master management
DSE Analytics provides automatic Spark Master management.
Analytics without ETL
Using DSE Analytics, you run Spark jobs directly against data in the database. You can run real-time
and analytics workloads at the same time without one workload affecting the performance of the
other. Start some cluster nodes as analytics nodes and others as pure transactional real-time nodes;
data is automatically replicated between them.
DataStax Enterprise file system (DSEFS)
DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system
within DataStax Enterprise. It is designed for use cases that need to leverage a distributed file system
for data ingestion, data staging, and state management for Spark Streaming applications (such
as checkpointing or write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment
complexity and single point of failure typical of HDFS. DSEFS is HDFS-compatible and is designed to
work in place of HDFS in Spark and other systems.
DSE Analytics Solo
DSE Analytics Solo datacenters are devoted entirely to DSE Analytics processing, for deployments that
require separation of analytics jobs from transactional data.
Integrated security
DSE Analytics uses the advanced security features of DSE, simplifying configuration and deployment.
AlwaysOn SQL
AlwaysOn SQL is a highly-available service that provides JDBC and ODBC interfaces to applications
accessing DSE Analytics data.
Enabling DSE Analytics
To enable DSE Analytics, follow the architecture guidelines for choosing a workload type for the datacenters in the
cluster.
DSE Analytics uses the following keyspaces:
• dse_analytics
• dse_leases
• dsefs
• "HiveMetaStore"
All analytics keyspaces are initially created with the SimpleStrategy replication strategy and a replication
factor (RF) of 1. Each of these must be updated in production environments to avoid data loss. After starting
the cluster, alter each keyspace to use the NetworkTopologyStrategy replication strategy with appropriate
settings for the replication factor and datacenters. For most environments using DSE Analytics, a suitable
replication factor is either 3 or the cluster size, whichever is smaller.
For example, use a CQL statement to configure the dse_leases keyspace for a replication factor of 3 in both
DC1 and DC2 datacenters using NetworkTopologyStrategy:
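The statement probably resembled the following CQL; verify the exact datacenter names with dsetool status before running it:

```sql
ALTER KEYSPACE dse_leases
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
```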
Only replicate DSE Analytics keyspaces to other DSE Analytics datacenters. DSEFS does not support
replication to other datacenters, and the dsefs keyspace only contains metadata, not the data stored in
DSEFS. Each DSE Analytics datacenter should have its own DSEFS instance.
The datacenter name used is case-sensitive. If needed, use the dsetool status command to confirm the exact
datacenter spelling.
After adjusting the replication factor, run nodetool repair on each node in the affected datacenters.
For example, to repair the altered dse_leases keyspace:
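A sketch of the command; nodetool repair accepts the keyspace name as an argument:

```bash
nodetool repair dse_leases
```

Run it on every node in each affected datacenter.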
Repeat the above steps for each of the analytics keyspaces listed above. For more information see Changing
keyspace replication strategy.
DSE Analytics and Search integration
An integrated DSE SearchAnalytics cluster allows analytics jobs to be performed using CQL queries. This
integration allows finer-grained control over the types of queries that are used in analytics workloads, and
improves performance by reducing the amount of data that is processed. However, a DSE SearchAnalytics
cluster does not provide workload isolation and there are no detailed guidelines for provisioning and performance
in production environments.
Nodes that are started in SearchAnalytics mode allow you to create analytics queries that use DSE Search
indexes. These queries return RDDs that are used by Spark jobs to analyze the returned data.
The following code shows how to use a DSE Search query from the DSE Spark console.
sc.cassandraTable("wiki", "solr").where("solr_query = 'title:natio*'").take(10)
The keyspace, table, and query values here are illustrative.
For a detailed example, see Running the Wikipedia demo with SearchAnalytics.
Configuring a DSE SearchAnalytics cluster
1. Create DSE SearchAnalytics nodes in a mixed-workload cluster, as described in Initializing a single
datacenter per workload type.
The name of the datacenter is set to SearchAnalytics when using the DseSimpleSnitch. Do not modify
existing search or analytics nodes that use DseSimpleSnitch to be SearchAnalytics nodes. If you use
another snitch, such as GossipingPropertyFileSnitch, you can have a mixed workload within a datacenter.
2. Perform load testing to ensure your hardware has enough CPU and memory for the additional resource
overhead that is required by Spark and Solr.
SearchAnalytics nodes always use driver paging settings. See Using pagination (cursors) with CQL Solr
queries.
SearchAnalytics nodes might consume more resources than search or analytics nodes. Resource
requirements of the nodes greatly depend on the type of query patterns you are using.
When in auto mode, the predicate push down performs a COUNT operation against the search indexes both
with and without the predicate filters applied. Push down is used if the number of records with the
predicate filter is less than the result of the following formula:
To create a temporary table in Spark SQL with Solr predicate push down enabled:
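The example likely resembled the following Spark SQL statement; the keyspace and table names here are hypothetical:

```sql
CREATE TEMPORARY TABLE temp_table
USING org.apache.spark.sql.cassandra
OPTIONS (
  table "my_table",        -- hypothetical table name
  keyspace "my_keyspace",  -- hypothetical keyspace name
  pushdown "true");
```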
Traditional DSE Analytics deployments have both the DataStax database process and the Spark process
running on the same machine. This allows for simple deployment of analytic processing when the analysis is not
as intensive, or the database is not as heavily used.
DSE Analytics Solo allows customers to deploy DSE Analytics processing on segregated hardware
configurations in a different datacenter from the transactional DSE nodes. This ensures consistent behavior of
both engines in a configuration that does not compete for compute resources. This configuration is good for
processing-intensive analytic workloads.
DSE Analytics Solo allows the flexibility to have more nodes dedicated to data processing than are used for
database transactions. This is particularly good for situations where the processing needs far exceed the
transactional resource needs. For example, suppose you have a Spark Streaming job that will analyze and
filter 99.9% of the incoming data, storing only a few records after analysis. The resources required by the
transactional datacenter are much smaller than the resources required to analyze the data.
DSE Analytics Solo is more elastic in terms of scaling up, or down, the analytic processing in the cluster. This is
particularly useful when you need extra analytics processing, such as end of the day or end of the quarter surges
in analytics jobs. Since a DSE Analytics Solo node does not store database data, when new nodes are added to
a cluster there is very little data moved across the network to the new nodes. In an analytics and transactional
collocated environment, adding a node means moving transactional data between the existing nodes and the
new nodes.
For information on creating a DSE Analytics Solo datacenter, see Creating a DSE Analytics Solo datacenter.
Analyzing data using Spark
Spark is the default mode when you start an analytics node in a packaged installation.
About Spark
Apache Spark is a framework for analyzing large data sets across a cluster, and is enabled when you start an
Analytics node. Spark runs locally on each node and executes in memory when possible. Spark uses multiple
threads instead of multiple processes to achieve parallelism on a single node, avoiding the memory overhead of
several JVMs.
Apache Spark integration with DataStax Enterprise includes:
• AlwaysOn SQL
• Spark streaming
• SparkR integration
Spark architecture
The software components for a single DataStax Enterprise analytics node are:
• Spark Worker
• The database
A Spark Master acts purely as a resource manager for Spark applications. Spark Workers launch executors that
are responsible for executing part of the job that is submitted to the Spark Master. Each application has its own
set of executors. Spark architecture is described in the Apache documentation.
DSE Spark nodes use a different resource manager than standalone Spark nodes. The DSE Resource
Manager simplifies integration between Spark and DSE. In a DSE Spark cluster, client applications use the
CQL protocol to connect to any DSE node, and that node redirects the request to the Spark Master.
The communication between the Spark client application (or driver) and the Spark Master is secured the same
way as connections to DSE, which means that plain password authentication as well as Kerberos authentication
is supported, with or without SSL encryption. Encryption and authentication can be configured per application,
rather than per cluster. Authentication and encryption between the Spark Master and Worker nodes can be
enabled or disabled regardless of the application settings.
Spark supports multiple applications. A single application can spawn multiple jobs and the jobs run in parallel.
An application reserves some resources on every node and these resources are not freed until the application
finishes. For example, every session of the Spark shell is an application that reserves resources. By default, the
scheduler tries to allocate the application across the highest number of different nodes. For example, if the application
declares that it needs four cores and there are ten servers, each offering two cores, the application most likely
gets four executors, each on a different node, each consuming a single core. However, the application can
also get two executors on two different nodes, each consuming two cores. You can configure the application
scheduler. Spark Workers and Spark Master are part of the main DSE process. Workers spawn executor JVM
processes which do the actual work for a Spark application (or driver). Spark executors use native integration to
access data in local transactional nodes through the Open Source Spark-Cassandra Connector. The memory
settings for the executor JVMs are set by the user submitting the driver to DSE.
In a deployment, one node in each Analytics datacenter runs the Spark Master, and Spark Workers run on each
of the nodes. The Spark Master has automatic high availability.
As you run Spark, you can access data in the Hadoop Distributed File System (HDFS), or the DataStax
Enterprise File System (DSEFS) by using the URL for the respective file system.
Highly available Spark Master
The Spark Master High Availability mechanism uses a special table in the dse_analytics keyspace to
store information required to recover Spark workers and the application. Reads to the recovery data in
dse_analytics are always performed using the LOCAL_QUORUM consistency level. Writes are attempted
first using LOCAL_QUORUM, and if that fails, the write is retried using LOCAL_ONE. Unlike the high availability
mechanism mentioned in Spark documentation, DataStax Enterprise does not use ZooKeeper.
If the original Spark Master fails, the reserve Spark Master automatically takes over. To find the current Spark Master,
run:
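A sketch of one way to do this, assuming the dse client-tool subcommand available in DSE 6.0:

```bash
dse client-tool spark master-address
```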
The Spark Master will not start until LOCAL_QUORUM is attainable for the dse_analytics keyspace.
Unsupported features
The following Spark features and APIs are not supported:
By default, DSEFS is required to execute Spark applications, and DSEFS should not be disabled when Spark is
enabled on a DSE node. If there is a strong reason not to use DSEFS as the default file system, reconfigure
Spark to use a different file system. For example, to use a local file system, set the following properties in
spark-daemon-defaults.conf:
spark.hadoop.fs.defaultFS=file:///
spark.hadoop.hive.metastore.warehouse.dir=file:///tmp/warehouse
How you start Spark depends on the installation and if you want to run in Spark mode or SearchAnalytics
mode:
Package installations:
To start the Spark trackers on a cluster of analytics nodes, edit the /etc/default/dse file to set
SPARK_ENABLED to 1.
When you start DataStax Enterprise as a service, the node is launched as a Spark node. You can
enable additional components.
Tarball installations:
To start the Spark trackers on a cluster of analytics nodes, use the -k option:
$ installation_location/bin/dse cassandra -k
Nodes started with -k are automatically assigned to the default Analytics datacenter if you do not
configure a datacenter in the snitch property file.
You can enable additional components by combining startup options. For example:
To start a node in SearchAnalytics mode, use the -k and -s options.
$ installation_location/bin/dse cassandra -k -s
Starting the node with the Spark option starts a node that is designated as the master, as shown by the
Analytics(SM) workload in the output of the dsetool ring command:
$ dsetool ring
Address         DC         Rack   Workload       Graph  Status  State   Load       Owns  Token                  Health [0,1]
10.200.175.149  Analytics  rack1  Analytics(SM)  no     Up      Normal  185 KiB    ?     -9223372036854775808   0.90
10.200.175.148  Analytics  rack1  Analytics(SW)  no     Up      Normal  194.5 KiB  ?     0                      0.90
Note: you must specify a keyspace to get ownership information.
Launching Spark
After starting a Spark node, use dse commands to launch Spark.
Usage:
Package installations: dse spark
Tarball installations: installation_location/bin/dse spark
You can use Cassandra-specific properties to start Spark. Spark binds to the listen_address that is specified
in cassandra.yaml.
DataStax Enterprise supports these commands for launching Spark on the DataStax Enterprise command line:
dse spark
Enters interactive Spark shell, offers basic auto-completion.
Package installations: dse spark
Tarball installations: installation_location/bin/dse spark
dse spark-submit
Launches applications on a cluster like spark-submit. Using this interface you can use Spark cluster
managers without the need for separate configurations for each application. The syntax for package
installations is:
For example, if you write a class that defines an option named d, enter the command as follows:
The JAR file can be located in a DSEFS directory. If the DSEFS cluster is secured, provide
authentication credentials as described in DSEFS authentication.
The dse spark-submit command supports the same options as Apache Spark's spark-submit. For
example, to submit an application in cluster mode with the supervise option so that it restarts in case of
failure:
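A hedged sketch of such a submission; the class and JAR names are hypothetical:

```bash
dse spark-submit --deploy-mode cluster --supervise \
  --class com.example.MyApp /path/to/myapp.jar
```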
The directory in which you run the dse Spark commands must be writable by the current user.
To provide authentication credentials, set these environment variables:
export DSE_USERNAME=user
export DSE_PASSWORD=secret
These environment variables are supported for all Spark and dse client-tool commands.
DataStax recommends using the environment variables instead of passing user credentials on the
command line.
You can provide authentication credentials in several ways, see Credentials for authentication.
Specifying Spark URLs
You do not need to specify the Spark Master address when starting Spark jobs with DSE. If you connect to any
Spark node in a datacenter, DSE will automatically discover the Master address and connect the client to the
Master.
Specify the URL for any Spark node using the following format:
By default the URL is dse://?, which is equivalent to dse://localhost:9042. Any parameters you set in the
URL will override the configuration read from DSE's Spark configuration settings.
You can specify the work pool in which the application will run by adding workpool=name as
a URL parameter. For example, dse://1.1.1.1:123?workpool=workpool2.
Valid parameters are CassandraConnectorConf settings with the spark.cassandra. prefix stripped. For
example, you can set the spark.cassandra.connection.local_dc option to dc2 by specifying dse://?
connection.local_dc=dc2.
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
For a full list of ports used by DSE, see Securing DataStax Enterprise ports.
1. Export the DataStax Enterprise client configuration from the remote node to the client node:
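Assuming the dse client-tool configuration subcommands, the step might look like this (the archive name is illustrative):

```bash
# On the remote DSE node:
dse client-tool configuration export dse-client-config.jar
# Copy the archive to the client node, then:
dse client-tool configuration import dse-client-config.jar
```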
To set the driver host to a publicly accessible IP address, pass in the spark.driver.host option.
Kerberos authentication is not supported in the Spark web UI. If authentication is enabled and either LDAP
or Internal authentication is not available, the Spark web UI will not be accessible. If this occurs, disable
authentication for the Spark web UI only by removing the spark.ui.filters setting in spark-daemon-
defaults.conf located in the Spark configuration directory.
DSE SSL encryption and authentication only apply to the Spark Master and Worker UIs, not the Spark Driver
UI. To use encryption and authentication with the Driver UI, refer to the Spark security documentation.
The UI includes information on the number of cores and amount of memory available to Spark in total and in
each work pool, and similar information for each Spark worker. The applications list the associated work pool.
See the Spark documentation for information on using the Spark web UI.
Authorization in the Spark web UI
When authorization is enabled and an authenticated user accesses the web UI, what they can see and do
is controlled by their permissions. This allows administrators to control who has permission to view specific
application logs, view the executors for the application, kill the application, and list all applications. Viewing and
modifying applications can be configured per datacenter, work pool, or application.
See Using authorization with Spark for details on granting permissions.
Displaying fully qualified domain names in the web UI
To display fully qualified domain names (FQDNs) in the Spark web UI, set the SPARK_PUBLIC_DNS variable in
spark-env.sh on each Analytics node.
Set SPARK_PUBLIC_DNS to the FQDN of the node if you have SSL enabled for the web UI.
Redirecting to the fully qualified domain name of the master
Set the SPARK_LOCAL_IP or SPARK_LOCAL_HOSTNAME in the spark-env.sh file on each node to the fully qualified
domain name (FQDN) of the node to force any redirects to the web UI using the FQDN of the Spark master.
This is useful when enabling SSL in the web UI.
If the tool is run on a server that is not part of the DSE cluster, see Running Spark commands against a
remote cluster.
Jupyter integration
Download and install Jupyter notebook on a DSE node.
To launch Jupyter notebook:
A Jupyter notebook starts with the correct Python path. You must create a context to work with DSE. In
contrast to Livy and Zeppelin integrations, the Jupyter integration does not start an interpreter that creates a
context.
Livy integration
Download and install Livy on a DSE node. By default, Livy runs Spark in local mode. Before starting Livy,
create a configuration file by copying conf/livy.conf.template to conf/livy.conf, then uncomment or add
the following two properties:
livy.spark.master = dse:///
livy.repl.enable-hive-context = true
To launch Livy:
RStudio integration
Download and install R on all DSE Analytics nodes, install RStudio desktop on one of the nodes, then run
RStudio:
These instructions are for RStudio desktop, not RStudio Server. In multiuser environments, we recommend
using AlwaysOn SQL and JDBC connections rather than SparkR.
Zeppelin integration
Download and install Zeppelin on a DSE node. To launch Zeppelin server:
By default Zeppelin runs Spark in local mode. Update the master property to dse:/// in the Spark session in
the Interpreters configuration page. No configuration file changes are required to run Zeppelin.
Configuring Spark
Configuring Spark for DataStax Enterprise includes:
Configuring Spark nodes
Modify the settings for Spark node security, performance, and logging.
To manage Spark performance and operations:
The temporary directory for shuffle data, RDDs, and other ephemeral Spark data can be configured for both
the locally running driver and for the Spark server processes managed by DSE (Spark Master, Workers,
shuffle service, executor and driver running in cluster mode).
For the locally running Spark driver, the SPARK_LOCAL_DIRS environment variable can be customized in the
user environment or in spark-env.sh. By default, it is set to the system temporary directory. For example,
on Ubuntu it is /tmp/. If there's no system temporary directory, then SPARK_LOCAL_DIRS is set to a .spark
directory in the user's home directory.
For all other Spark server processes, the SPARK_EXECUTOR_DIRS environment variable can be customized in
the user environment or in spark-env.sh. By default it is set to /var/lib/spark/rdd.
The default SPARK_LOCAL_DIRS and SPARK_EXECUTOR_DIRS environment variable values differ from non-
DSE Spark.
To configure worker cleanup, modify the SPARK_WORKER_OPTS environment variable and add the cleanup
properties. The SPARK_WORKER_OPTS environment variable can be set in the user environment or in
spark-env.sh. For example, the following enables worker cleanup, sets the cleanup interval to 30 minutes
(1800 seconds), and retains application worker directories for 7 days (604800 seconds).
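A sketch of the missing example, using Spark's standard standalone worker cleanup properties:

```shell
# Standard Spark standalone worker cleanup properties (spark.worker.cleanup.*),
# appended to SPARK_WORKER_OPTS in the user environment or spark-env.sh.
export SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS \
 -Dspark.worker.cleanup.enabled=true \
 -Dspark.worker.cleanup.interval=1800 \
 -Dspark.worker.cleanup.appDataTtl=604800"
```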
In multiple-datacenter clusters, use a virtual datacenter to isolate Spark jobs. Spark jobs consume
resources that can affect latency and throughput.
DataStax Enterprise supports the use of virtual nodes (vnodes) with Spark.
Secure Spark nodes
Client-to-node SSL
Ensure that the truststore entries in cassandra.yaml are present as described in Client-to-node
encryption, even when client authentication is not enabled.
Enabling security and authentication
Security is enabled using the spark_security_enabled option in dse.yaml. Enabling it
turns on authentication between the Spark Master and Worker nodes, and allows you to
enable encryption. To encrypt Spark connections for all components except the web UI, enable
spark_security_encryption_enabled. The length of the shared secret used to secure Spark
components is set using the spark_shared_secret_bit_length option, with a default value of 256
bits. These options are described in DSE Analytics options. For production clusters, enable both
authentication and encryption; doing so does not significantly affect performance.
Authentication and Spark applications
If authentication is enabled, users need to be authenticated in order to submit an application.
Authorization and Spark applications
If DSE authorization is enabled, users need permission to submit an application. Additionally, the
user submitting the application automatically receives permission to manage the application, and this
permission can optionally be extended to other users.
Database credentials for the Spark SQL Thrift server
In the hive-site.xml file, configure authentication credentials for the Spark SQL Thrift server. Ensure
that you use the hive-site.xml file in the Spark directory:
periodically. For security reasons, the user who is authenticated with the token should not be able to
renew it. Therefore, delegation tokens have two associated users: token owner and token renewer.
The token renewer is none, so that only a DSE internal process can renew it. When the application is
submitted, DSE automatically renews the delegation tokens that are associated with the Spark application.
When the application is unregistered (finished), the delegation token renewal is stopped and the
token is cancelled.
To set Kerberos options, see Defining a Kerberos scheme.
Configure Spark memory and cores
Spark memory options affect different components of the Spark ecosystem:
Spark History server and the Spark Thrift server memory
The SPARK_DAEMON_MEMORY option configures the memory that is used by the Spark SQL
Thrift server and history-server. Add or change this setting in the spark-env.sh file on nodes that run
these server applications.
Spark Worker memory
The memory_total option in the resource_manager_options.worker_options section of dse.yaml
configures the total system memory that you can assign to all executors that are run by the work
pools on the particular node. The default work pool will use all of this memory if no other work pools
are defined. If you define additional work pools, you can set the total amount of memory by setting the
memory option in the work pool definition.
Application executor memory
You can configure the amount of memory that each executor can consume for the application. Spark
uses a 512MB default. Use either the spark.executor.memory option, described in "Spark Available
Properties", or the --executor-memory mem argument to the dse spark command.
Application memory
You can configure additional Java options that are applied by the worker when spawning an executor for
the application. Use the spark.executor.extraJavaOptions property, described in Spark 1.6.2 Available
Properties. For example: spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value
-Dnumbers="one two three"
Core management
You can manage the number of cores by configuring these options.
• Application cores
In the Spark configuration object of your application, you configure the number of application cores that
the application requests from the cluster using either the spark.cores.max configuration property or the
--total-executor-cores cores argument to the dse spark command.
See the Spark documentation for details about memory and core allocation.
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in a semi-automatic
fashion. The resource_manager_options.worker_options section in the dse.yaml file has options to
configure the proportion of system resources that are made available to Spark Workers and any defined
work pools, or explicit resource settings. When decimal values are specified, the available
resources are calculated in the following way:
• Spark Worker memory = memory_total * (total system memory - memory assigned to DSE)
This calculation is used for any decimal values. If the setting is not specified, the default value 0.7 is used. If
the value does not contain a decimal place, the setting is the explicit number of cores or amount of memory
reserved by DSE for Spark.
Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available cores
will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and one core will
be reserved.
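The distinction between decimal and explicit values can be sketched as follows; the helper function and numbers are illustrative, not a DSE API:

```python
def spark_resource(setting, available):
    """Decimal settings are a fraction of what is available; whole numbers
    are explicit amounts (cores, or MB of memory)."""
    if isinstance(setting, float):      # e.g. cores_total: 0.7 or 1.0
        return int(round(setting * available))
    return setting                      # e.g. cores_total: 1 -> exactly 1 core

# 61440 MB usable after subtracting the memory assigned to DSE, memory_total: 0.7
assert spark_resource(0.7, 61440) == 43008  # MB given to Spark Workers
assert spark_resource(1.0, 8) == 8          # 1.0 -> 100% of 8 available cores
assert spark_resource(1, 8) == 1            # 1 -> exactly one core
```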
The lowest values you can assign to a named work pool's memory and cores are 64 MB and 1 core,
respectively. If the calculated results are lower, no exception is thrown and the values are automatically
raised to these minimums.
The following example shows a work pool named workpool1 with 1 core and 512 MB of RAM assigned to it.
The remaining resources calculated from the values in worker_options are assigned to the default work
pool.
resource_manager_options:
worker_options:
cores_total: 0.7
memory_total: 0.7
workpools:
- name: workpool1
cores: 1
memory: 512M
# Uncomment the following line to make this snitch prefer the internal ip when possible,
as the Ec2MultiRegionSnitch does.
prefer_local=true
This tells the cluster to communicate only on private IP addresses within the datacenter rather than the public
routable IP addresses.
Configuring the number of retries to retrieve Spark configuration
When Spark fetches configuration settings from DSE, it will not fail immediately if it cannot retrieve the
configuration data, but will retry 5 times by default, with increasing delay between retries. The number of
retries can be set in the Spark configuration, by modifying the spark.dse.configuration.fetch.retries
configuration property when calling the dse spark command, or in spark-defaults.conf.
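For example, to raise the retry count to 10, a sketch of the spark-defaults.conf entry:

```
spark.dse.configuration.fetch.retries 10
```

The same property can be passed as --conf spark.dse.configuration.fetch.retries=10 to the dse spark command.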
Disabling continuous paging
Continuous paging streams bulk amounts of records from DSE to the DataStax Java Driver
used by DSE Spark. By default, continuous paging in queries is enabled. To disable it, set the
spark.dse.continuous_paging_enabled setting to false when starting the Spark SQL shell or in spark-
defaults.conf. For example:
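The example presumably resembled the following invocation of the Spark SQL shell:

```bash
dse spark-sql --conf spark.dse.continuous_paging_enabled=false
```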
Using continuous paging can potentially improve performance up to 3 times, though the improvement
will depend on the data and the queries. Some factors that impact the performance improvement are the
number of executor JVMs per node and the number of columns included in the query. Greater performance
gains were observed with fewer executor JVMs per node and more columns selected.
2. Set the SPARK_MASTER_WEBUI_PORT variable to the new port number. For example, to set it to port 7082:
export SPARK_MASTER_WEBUI_PORT=7082
To add the Graphite JARs to Spark in a package installation, copy them to the Spark lib directory:
spark.network.crypto.enabled true
spark.dseShuffle.noSasl.port 7437
    The port number on which a shuffle service for unsecured applications is started. Bound to the
    listen_address in cassandra.yaml.
By default, Spark executor logs, which capture the majority of your Spark application output, are
redirected to standard output. The output is managed by Spark Workers. Configure logging by adding
spark.executor.logs.rolling.* properties to the spark-daemon-defaults.conf file.
spark.executor.logs.rolling.maxRetainedFiles 3
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 50000
Additional Spark properties that affect the master and driver can be added to spark-daemon-defaults.conf.
For example, to enable Spark's commons-crypto encryption library:
spark.network.crypto.enabled true
dse://10.200.181.62:9042?connection.local_dc=Analytics;connection.host=10.200.181.63

$ dsetool ring
• Query the dse_leases.leases table to list all the masters from each data center with Analytics nodes:
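For example, from cqlsh:

```
SELECT * FROM dse_leases.leases;
```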
Ensure that the replication factor is configured correctly for the dse_leases keyspace
If the dse_leases keyspace is not properly replicated, the Spark Master might not be elected.
Every time you add a new datacenter, you must manually increase the replication factor of the dse_leases
keyspace for the new DSE Analytics datacenter. If DataStax Enterprise or Spark security options are
enabled on the cluster, you must also increase the replication factor for the dse_security keyspace across
all logical datacenters.
The initial node in a multi-datacenter cluster has a replication factor of 1 for the dse_leases keyspace. For new
datacenters, the first node is created with the dse_leases keyspace with a replication factor of 1 for that
datacenter. However, any datacenters that you add have a replication factor of 0 and require configuration
before you start DSE Analytics nodes. You must change the replication factor of the dse_leases keyspace for
multiple analytics datacenters. See Setting the replication factor for analytics keyspaces.
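A sketch of the replication change, assuming two analytics datacenters named DC1 and DC2 and a replication factor of 3 (adjust the names and factors for your cluster):

```
ALTER KEYSPACE dse_leases
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
```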
Monitoring the lease subsystem
All changes to lease holders are recorded in the dse_leases.logs table. Most of the time, you do not want to
enable logging.
1. To turn on logging, ensure that the lease_metrics_options is enabled in the dse.yaml file:
lease_metrics_options:
    enabled: true
    ttl_seconds: 604800
 name              | dc  | monitor       | at                              | new_holder    | old_holder
-------------------+-----+---------------+---------------------------------+---------------+------------
 Leader/master/6.0 | dc1 | 10.200.180.44 | 2018-05-17 00:45:02.971000+0000 | 10.200.180.44 |
 Leader/master/6.0 | dc1 | 10.200.180.49 | 2018-05-17 02:37:07.381000+0000 | 10.200.180.49 |
3. When lease_metrics_options is enabled, you can examine the acquire, renew, resolve, and disable
operations. Most of the time, these operations should complete in 100 ms or less:
4. If the log warnings and errors do not contain relevant information, edit the logback.xml file and add:
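As a hypothetical illustration (the exact logger name varies by DSE version, so substitute the lease subsystem logger recommended for your release):

```
<!-- Hypothetical logger name; replace with the lease subsystem logger for your DSE version -->
<logger name="com.datastax.bdp.plugin.DseLeasePlugin" level="DEBUG"/>
```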
Troubleshooting
Perform these lease holder troubleshooting activities before you contact DataStax Support.
Verify the workload status
Run the dsetool ring command:
$ dsetool ring
If the replication factor is inadequate or if the replicas are down, the output of the dsetool ring
command contains a warning:
Address         DC                   Rack   Workload             Graph  Status  State   Load        Owns  Token                 Health [0,1]
10.200.178.232  SearchGraphAnalytics rack1  SearchAnalytics      yes    Up      Normal  153.04 KiB  ?     -9223372036854775808  0.00
10.200.178.230  SearchGraphAnalytics rack1  SearchAnalytics(SM)  yes    Up      Normal  92.98 KiB   ?     0                     0.000
If the automatic Job Tracker or Spark Master election fails, verify that an appropriate replication factor
is set for the dse_leases keyspace.
Use cqlsh commands to verify the replication factor of the analytics keyspaces
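For example, a sketch that inspects the replication settings through the system schema:

```
SELECT keyspace_name, replication
  FROM system_schema.keyspaces
  WHERE keyspace_name = 'dse_leases';
```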
• SPARK_WORKER_DIR/worker-n/application_id/executor_id/stderr
• SPARK_WORKER_DIR/worker-n/application_id/executor_id/stdout
2. If you want to enable rolling logging for Spark executors, add the following options to spark-daemon-
defaults.conf.
Enable rolling logging with 3 log files retained before deletion. The log files are broken up by size with a
maximum size of 50,000 bytes.
spark.executor.logs.rolling.maxRetainedFiles 3
spark.executor.logs.rolling.strategy size
spark.executor.logs.rolling.maxSize 50000
The default location of the Spark configuration files depends on the type of installation:
When user credentials are specified in plain text on the dse command line, like dse -u username
-p password, the credentials are present in the logs of Spark workers when the driver is run in
cluster mode.
The Spark Master, Spark Worker, executor, and driver logs might include sensitive information.
Sensitive information includes passwords and digest authentication tokens for Kerberos mode
that are passed in the command line or Spark configuration. DataStax recommends using
only safe communication channels like VPN and SSH to access the Spark user interface.
You can provide authentication credentials in several ways; see Credentials for authentication.
• All simultaneously running applications deployed by a single DSE service user will be run as a single OS
user.
• Applications deployed by different DSE service users will be run by different OS users.
• All applications will be run as a different OS user than the DSE service user.
This allows you to prevent an application from accessing DSE server private files, and prevent one application
from accessing the private files of another application.
How the run_as process runner works
DSE uses sudo to run Spark applications components (drivers and executors) as specific OS users. DSE
doesn't link a DSE service user with a particular OS user. Instead, a configurable number of spare user
accounts or slots are used. When a request to run an executor or a driver is received, DSE finds an unused
slot, and locks it for that application. Until the application is finished, all of that application's processes run as
that slot user. When the application completes, the slot user will be released and will be available to other
applications.
Since the number of slots is limited, a single slot is shared among all the simultaneously running applications
run by the same DSE service user. Such a slot is released once all the applications of that user are removed.
When there are not enough slots to run an application, an error is logged and DSE will try to run the executor or
driver on a different node. DSE does not limit the number of slots you can configure. If you need to run more
applications simultaneously, create more slot users.
Slot assignment is done on a per-node basis. Executors of a single application may run as different slot users
on different DSE nodes. When DSE is run on a fat node, different DSE instances running within the same OS
should be configured with different sets of slot users. If they use the same slot users, a single OS user may run
the applications of two different DSE service users.
When a slot is released, all directories which are normally managed by Spark for the application are removed.
If the application doesn't finish, but all executors are done on a node, and a slot user is about to be released,
all the application files are modified so that their ownership is changed to the DSE service user with owner-
only permission. When a new executor for this application is run on this node, the application files are
reassigned back to the slot user assigned to that application.
Configuring the run_as process runner
The administrator needs to prepare slot users in the OS before configuring DSE. The run_as process runner
requires:
• Each slot user has its own primary group, whose name is the same as the name of the slot user. This is
typically the default behavior of the OS. For example, the slot1 user's primary group is slot1.
• The DSE service user is a member of each slot's primary group. For example, if the DSE service user is
cassandra, the cassandra user is a member of the slot1 group.
• The DSE service user is a member of a group with the same name as the service user. For example, if
the DSE service user is cassandra, the cassandra user is a member of the cassandra group.
• sudo is configured so that the DSE service user can execute any command as any slot user without
providing a password.
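A sketch of a sudoers entry (for example in /etc/sudoers.d/dse) that grants the cassandra service user passwordless access to two slot users; the user and slot names are illustrative:

```
# Allow the DSE service user to run any command as the slot users without a password
cassandra ALL = (slot1,slot2) NOPASSWD: ALL
```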
Override the umask setting to 007 for slot users so that files created by sub-processes will not be accessible by
anyone else by default, and DSE configuration files are not visible to slot users.
You may further secure the DSE server environment by modifying the OS's limits.conf file to set exact disk
space quotas for each slot user.
After adding the slot users and groups and configuring the OS, modify the dse.yaml file. In the
spark_process_runner section enable the run_as process runner and set the list of slot users on each node.
spark_process_runner:
    # Allowed options are: default, run_as
    runner_type: run_as
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
3. Make sure the DSE service user is a member of a group with the same name as the service user. For
example, if the DSE service user is cassandra:
$ groups cassandra
cassandra : cassandra
4. Log out and back in again to make the group changes take effect.
6. Modify dse.yaml to enable the run_as process runner and add the new runners.
# Configure the way the driver and executor processes are created and managed.
spark_process_runner:
    # Allowed options are: default, run_as
    runner_type: run_as
    # The run_as runner uses sudo to start Spark drivers and executors. A set of
    # predefined fake users, called slots, is used for this purpose. All drivers and
    # executors owned by some DSE user are run as some slot user x. At the same time,
    # drivers and executors of any other DSE user use different slots.
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
2. On each node in the cluster, edit the spark-defaults.conf file to enable event logging and specify the
directory for event logs:
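A sketch of the relevant spark-defaults.conf entries, assuming events are written to the DSEFS path used elsewhere in this guide:

```
spark.eventLog.enabled true
spark.eventLog.dir dsefs:///spark/events
```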
3. Start the Spark history server on one of the nodes in the cluster:
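Assuming the command name below (verify it against your DSE version):

```
$ dse spark-history-server start
```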
The Spark history server is a front-end application that displays logging data from all nodes in the
Spark cluster. It can be started from any node in the cluster.
If you've enabled authentication, set the authentication method and credentials in a properties file and
pass it to the dse command. For example, for basic authentication:
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username=role name
spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password=password
If you set the event log location in spark-defaults.conf, set the spark.history.fs.logDirectory
property in your properties file.
spark.history.fs.logDirectory=dsefs:///spark/events
If you specify a properties file, none of the configuration in spark-defaults.conf is used. The
properties file should contain all the required configuration properties.
The history server is started and can be viewed by opening a browser to http://node_hostname:18080.
The Spark Master web UI does not show the historical logs. To work around this known issue,
access the history from port 18080.
4. When event logging is enabled, the default behavior is for all logs to be saved, which causes the storage
to grow over time. To enable automated cleanup, edit spark-defaults.conf and set the following
options:
spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.interval 1d
spark.history.fs.cleaner.maxAge 7d
For these settings, automated cleanup is enabled, the cleanup is performed daily, and logs older than
seven days are deleted.
You pass settings for Spark, Spark Shell, and other DataStax Enterprise Spark built-in applications using the
intermediate application spark-submit, described in Spark documentation.
Configuring the Spark shell
Pass Spark configuration arguments using the following syntax:
[--help] [--verbose]
[--conf name=spark.value|sparkproperties.conf]
[--executor-memory memory]
[--jars additional-jars]
[--master dse://?appReconnectionTimeoutSeconds=secs]
[--properties-file path_to_properties_file]
[--total-executor-cores cores]
--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option added to the Spark configuration, prefixed by spark.
• name=spark.value - a single configuration property
• sparkproperties.conf - a configuration file
--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
--help
Shows a help message that displays all options except DataStax Enterprise Spark shell options.
--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the
settings from spark-defaults.conf.
--total-executor-cores cores
The total number of cores the application uses.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.
Spark shell application arguments:
-i app_script_file
Spark shell application argument that runs a script from the specified file.
Configuring Spark applications
You pass the Spark submission arguments using the following syntax:
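A sketch of the general form; the class name, JAR, and application arguments below are placeholders:

```
$ dse spark-submit --class com.example.MyApp [submission arguments] myapp.jar [application arguments]
```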
--files files
A comma-separated list of files that are distributed among the executors and available for the
application.
In general, Spark submission arguments are translated into system properties -Dname=value and other VM
parameters like classpath. The application arguments are passed directly to the application.
Property list
When you run dse spark-submit on a node in your Analytics cluster, all the following properties are set
automatically, and the Spark Master is automatically detected. Only set the following properties if you need to
override the automatically managed properties.
spark.cassandra.connection.native.port
Default = 9042. Port for native client protocol connections.
spark.cassandra.connection.rpc.port
Default = 9160. Port for thrift connections.
spark.cassandra.connection.host
The host name or IP address to which the Thrift RPC service and native transport is bound.
The native_transport_address property in the cassandra.yaml, which is localhost by default,
determines the default value of this property.
You can explicitly set the Spark Master address using the --master master address parameter to dse spark-
submit.
Read properties
spark.cassandra.input.split.size
Default = 100000. Approximate number of rows in a single Spark partition. The higher the value, the
fewer Spark tasks are created. Increasing the value too much may limit the parallelism level.
spark.cassandra.input.fetch.size_in_rows
Default = 1000. Number of rows being fetched per round-trip to the database. Increasing this value
increases memory consumption. Decreasing the value increases the number of round-trips. In earlier
releases, this property was spark.cassandra.input.page.row.size.
spark.cassandra.input.consistency.level
Default = LOCAL_ONE. Consistency level to use when reading.
Write properties
You can set the following properties in SparkConf to fine tune the saving process.
spark.cassandra.output.batch.size.bytes
Default = 1024. Maximum total size of a single batch in bytes.
spark.cassandra.output.consistency.level
Default = LOCAL_QUORUM. Consistency level to use when writing.
spark.cassandra.output.concurrent.writes
• Make sure all keyspaces in the DC1 datacenter use NetworkTopologyStrategy. If necessary, alter the
keyspace.
• Add nodes to a new datacenter named DC2, then enable Analytics on those nodes.
• Configure the dse_leases and dse_analytics keyspaces to replicate to both DC1 and DC2. For example:
• When submitting Spark applications specify the --master URL with the name or IP address of a node in
the DC2 datacenter, and set the spark.cassandra.connection.local_dc configuration option to DC1.
Accessing an external DSE transactional cluster from a DSE Analytics Solo cluster
To access an external DSE transactional cluster, explicitly set the connection to the transactional cluster when
creating RDDs or Datasets within the application.
In the following examples, the external DSE transactional cluster has a node running on 10.10.0.2.
To create an RDD from the transactional cluster's data:
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import org.apache.spark.SparkContext
val rddFromTransactionalCluster = {
// Sets connectorToTransactionalCluster as default connection for everything in this
code block
implicit val c = connectorToTransactionalCluster
// get the data from the test.words table
sc.cassandraTable("test","words")
}
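The connectorToTransactionalCluster value referenced above is not defined in this excerpt; a sketch of how it might be created with the Spark Cassandra Connector, pointing at the transactional cluster node:

```scala
import com.datastax.spark.connector.cql.CassandraConnector

// Build a connector whose contact point is the external transactional cluster
val connectorToTransactionalCluster = CassandraConnector(
  sc.getConf.set("spark.cassandra.connection.host", "10.10.0.2"))
```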
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf
val df = spark
.read
.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> "words", "keyspace" -> "test"))
.load()
When you submit the application to the DSE Analytics Solo cluster, it will retrieve the data from the external
DSE transactional cluster.
Spark JVMs and memory management
Spark jobs running on DataStax Enterprise are divided among several different JVM processes, each with
different memory requirements.
DataStax Enterprise and Spark Master JVMs
The Spark Master runs in the same process as DataStax Enterprise, but its memory usage is negligible. The
only way Spark could cause an OutOfMemoryError in DataStax Enterprise is indirectly by executing queries
that fill the client request queue. For example, if it ran a query with a high limit and paging was disabled or it
used a very large batch to update or insert data in a table. This is controlled by MAX_HEAP_SIZE in cassandra-
env.sh. If you see an OutOfMemoryError in system.log, you should treat it as a standard OutOfMemoryError
and follow the usual troubleshooting steps.
Spark executor JVMs
The Spark executor is where Spark performs transformations and actions on the RDDs and is usually
where a Spark-related OutOfMemoryError would occur. An OutOfMemoryError in an executor will show
up in the stderr log for the currently executing application (usually in /var/lib/spark). There are several
configuration settings that control executor memory and they interact in complicated ways.
• spark.executor.memory is a system property that controls how much executor memory a specific
application gets. It must be less than or equal to the calculated value of memory_total. It can be specified
in the constructor for the SparkContext in the driver application, or via --conf spark.executor.memory
or --executor-memory command line options when submitting the job using spark-submit.
• SPARK_DRIVER_MEMORY in spark-env.sh
Spark Streaming applications require synchronized clocks to operate correctly. See Synchronize clocks.
The following Scala example demonstrates how to connect to a text input stream at a particular IP address
and port, count the words in the stream, and save the results to the database.
import org.apache.spark.streaming._
2. Create a new StreamingContext object based on an existing SparkConf configuration object, specifying
the interval in which streaming data will be divided into batches by passing in a batch duration.
Spark allows you to specify the batch duration in milliseconds, seconds, and minutes.
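For example, a sketch that creates a context with a one-second batch duration from the shell's Spark context:

```scala
// Batches the stream into one-second intervals
val ssc = new StreamingContext(sc, Seconds(1))
```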
3. Import the database-specific functions for StreamingContext, DStream, and RDD objects.
import com.datastax.spark.connector.streaming._
4. Create the DStream object that will connect to the IP and port of the service providing the data stream.
5. Count the words in each batch and save the data to the table.
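A sketch of steps 4 and 5, assuming the stream source runs on port 9999 of the local host and the target table streaming_test.words has word and count columns:

```scala
// Step 4: connect to the text stream
val lines = ssc.socketTextStream("localhost", 9999)

// Step 5: count words in each batch and save to the database
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.saveToCassandra("streaming_test", "words", SomeColumns("word", "count"))
```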
ssc.start()
ssc.awaitTermination()
In the following example, you start a service using the nc utility that repeats strings, then consume the
output of that service using Spark Streaming.
Using cqlsh, start by creating a target keyspace and table for streaming to write into.
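A sketch of the target schema, assuming a single-datacenter test cluster:

```
CREATE KEYSPACE IF NOT EXISTS streaming_test
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS streaming_test.words (word TEXT PRIMARY KEY, count INT);
```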
$ nc -lk 9999
one two two three three three four four four four someword
$ dse spark
import org.apache.spark.streaming._
import com.datastax.spark.connector.streaming._
Using cqlsh, connect to the streaming_test keyspace and run a query to show the results.
$ cqlsh -k streaming_test
word | count
---------+-------
three | 3
one | 1
two | 2
four | 4
someword | 1
What's next:
Run the http_receiver demo. See the Spark Streaming Programming Guide for more information, API
documentation, and examples on Spark Streaming.
Creating a Spark Structured Streaming sink using DSE
Spark Structured Streaming is a high-level API for streaming applications. DSE supports Structured
Streaming for storing data into DSE.
The following Scala example shows how to store data from a streaming source to DSE using the
cassandraFormat method.
This example sets the OutputMode to Update, described in the Spark API documentation.
The cassandraFormat method is equivalent to calling the format method with
org.apache.spark.sql.cassandra and setting the keyspace and table options.
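A sketch of such a sink, assuming a streaming DataFrame streamingDF whose columns match the target table:

```scala
import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.streaming.OutputMode

// Write each micro-batch to streaming_test.words, updating changed rows
val query = streamingDF.writeStream
  .cassandraFormat("words", "streaming_test")
  .outputMode(OutputMode.Update())
  .start()
```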
Any tables you create or destroy, and any table data you delete, in a Spark SQL session will not be
reflected in the underlying DSE database, but only in that session's metastore.
$ dse spark-sql
The Spark SQL shell in DSE automatically creates a Spark session and connects to the Spark SQL Thrift
server to handle the underlying JDBC connections.
If the schema changes in the underlying database table during a Spark SQL session (for example, a column
was added using CQL), drop the table and then refresh the metastore to continue querying the table with the
correct schema.
Queries to a table whose schema has been modified cause a runtime exception.
Spark SQL limitations
• You cannot load data from one file system to a table in a different file system.
CREATE TABLE IF NOT EXISTS test (id INT, color STRING) PARTITIONED BY (ds STRING);
LOAD DATA INPATH 'hdfs2://localhost/colors.txt' OVERWRITE INTO TABLE test PARTITION
(ds ='2008-08-15');
The first line creates a table on the default file system. The second line attempts to load data into that
table from a path on a different file system, and will fail.
$ dse spark
2. Use the sql method to pass in the query, storing the result in a variable.
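For example (the keyspace and table names here are illustrative, chosen to match the columns in the output):

```scala
val results = spark.sql("SELECT id, description FROM test.things")
```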
results.show()
+--------------------+-----------+
| id|description|
+--------------------+-----------+
|de2d0de1-4d70-11e...| thing|
|db7e4191-4d70-11e...| another|
|d576ad50-4d70-11e...|yet another|
+--------------------+-----------+
After the Spark session instance is created, you can use it to create a DataFrame instance from the query.
Queries are executed by calling the SparkSession.sql method.
employees.collect();
If you have properties that are spelled the same but with different capitalizations (for example, id and Id),
start Spark SQL with the --conf spark.sql.caseSensitive=true option.
Prerequisites:
Start your cluster with both Graph and Spark enabled.
$ dse spark-sql
USE dse_graph;
SELECT * FROM gods_vertices where name = 'Zeus';
Vertices are identified by id columns. Edge tables have src and dst columns that identify the from
and to vertices, respectively. A join can be used to traverse the graph. For example to find all vertex
ids that are reached by the out edges:
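For example, assuming an edge table named gods_edges (the edge table name is illustrative):

```
SELECT e.dst FROM gods_vertices v JOIN gods_edges e ON v.id = e.src;
```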
What's next: The same steps work from the Spark shell using spark.sql() to run the query statements, or
using the JDBC/ODBC driver and the Spark SQL Thrift Server.
Using Spark predicate push down in Spark SQL queries
Spark predicate push down to database allows for better optimized Spark queries. A predicate is a condition
on a query that returns true or false, typically located in the WHERE clause. A predicate push down filters
the data in the database query, reducing the number of entries retrieved from the database and improving
query performance. By default the Spark Dataset API will automatically push down valid WHERE clauses to the
database.
You can also use predicate push down on DSE Search indices within SearchAnalytics data centers.
Restrictions on column filters
Partition key columns can be pushed down as long as:
Clustering key columns can be pushed down with the following rules:
• Only the last predicate in the filter can be a non equivalence predicate.
• If there is more than one predicate for a column, the predicates cannot be equivalence predicates.
INSERT INTO words (user, word, count ) VALUES ( 'Zebra', 'zed', 100 );
Then create a Spark Dataset in the Spark console using that table and look for PushedFilters in the output
after issuing the EXPLAIN command:
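A sketch, assuming the words table lives in a keyspace named test:

```scala
import org.apache.spark.sql.cassandra._

val dataset = spark.read.cassandraFormat("words", "test").load()
dataset.explain()
```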
== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation [user#0,word#1,count#2]
ReadSchema: struct<user:string,word:string,count:int>
Because this query doesn't filter on columns capable of being pushed down, there are no PushedFilters in
the physical plan.
Adding a filter, however, does change the physical plan to include PushedFilters:
== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[user#0,word#1,count#2] PushedFilters: [*GreaterThan(word,ham)], ReadSchema:
struct<user:string,word:string,count:int>
The PushedFilters section of the physical plan includes the GreaterThan push down filter. The asterisk
indicates that the push down filter will be handled only at the datasource level.
Troubleshooting predicate push down
When creating Spark SQL queries that use comparison operators, making sure the predicates are pushed
down to the database correctly is critical to retrieving the correct data with the best performance.
For example, given a CQL table with the following schema:
Suppose you want to write a query that selects all entries where the birthday is earlier than a given date:
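For example (the table name is illustrative; the columns match the schema shown in the plan below):

```
SELECT * FROM users WHERE birthday < '2001-1-1';
```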
== Physical Plan ==
*Filter (cast(birthday#1 as string) < 2001-1-1)
+- *Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[year#0,birthday#1,userid#2,likes#3,name#4] ReadSchema:
struct<year:int,birthday:timestamp,userid:string,likes:string,name:string>
Time taken: 0.72 seconds, Fetched 1 row(s)
Note that the Filter directive is treating the birthday column, a CQL TIMESTAMP, as a string. The query
optimizer looks at this comparison and needs to make the types match before generating a predicate. In
this case the optimizer decides to cast the birthday column as a string to match the string '2001-1-1',
but cast functions cannot be pushed down. The predicate isn't pushed down, and it doesn't appear in
PushedFilters. A full table scan will be performed at the database layer, with the results returned to Spark
for further processing.
To push down the correct predicate for this query, use the cast method to specify that the predicate is
comparing the birthday column to a TIMESTAMP, so the types match and the optimizer can generate the
correct predicate.
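For example (again with an illustrative table name):

```
SELECT * FROM users WHERE birthday < CAST('2001-01-01' AS TIMESTAMP);
```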
== Physical Plan ==
*Scan org.apache.spark.sql.cassandra.CassandraSourceRelation
[year#0,birthday#1,userid#2,likes#3,name#4]
PushedFilters: [*LessThan(birthday,2001-01-01 00:00:00.0)],
ReadSchema: struct<year:int,birthday:timestamp,userid:string,likes:string,name:string>
Time taken: 0.034 seconds, Fetched 1 row(s)
Note the PushedFilters indicating that the LessThan predicate will be pushed down for the column data in
birthday. This should speed up the query as a full table scan will be avoided.
SELECT statement
FROM statement
[JOIN | INNER JOIN | LEFT JOIN | LEFT SEMI JOIN | LEFT OUTER JOIN | RIGHT JOIN | RIGHT
OUTER JOIN | FULL JOIN | FULL OUTER JOIN]
ON join condition
SELECT statement 1
[UNION | UNION ALL | UNION DISTINCT | INTERSECT | EXCEPT]
SELECT statement 2
Select queries run on new columns return '', or empty results, instead of None.
You can remove a table from the cache using an UNCACHE TABLE query.
UPPER
LOWER
REGEXP
ORDER
OUTER
RIGHT
SELECT
SEMI
STRING
SUM
TABLE
TIMESTAMP
TRUE
UNCACHE
UNION
WHERE
INTERSECT
EXCEPT
SUBSTR
SUBSTRING
SQRT
ABS
Inserting data into tables with static columns using Spark SQL
Static columns are mapped to different columns in Spark SQL and require special handling. Spark SQL Thrift
servers use Hive. When you run an insert query, you must pass data to those columns.
To work around the different columns, set cql3.output.query in the insertion Hive table properties to
limit the columns that are being inserted. In Spark SQL, alter the external table to configure the prepared
statement as the value of the Hive CQL output query. For example, this prepared statement takes values that
are inserted into columns a and b in mytable and maps these values to columns b and a, respectively, for
insertion into the new row.
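A sketch of the table property change, with illustrative keyspace and table names:

```
ALTER TABLE mytable SET TBLPROPERTIES
  ('cql3.output.query' = 'update mykeyspace.mytable set b = ? where a = ?');
```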
$ bin/dse spark
2. Use the provided HiveContext instance sqlContext to create a new query in HiveQL by calling the sql
method on the sqlContext object.
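For example (keyspace and table names illustrative):

```scala
val results = sqlContext.sql("SELECT * FROM mykeyspace.mytable")
results.show()
```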
$ dse pyspark
table1 = spark.read.format("org.apache.spark.sql.cassandra")
.options(table="kv", keyspace="ks")
.load()
table1.write.format("org.apache.spark.sql.cassandra")
.options(table="othertable", keyspace = "ks")
.save(mode ="append")
Using the DSE Spark console, the following Scala example shows how to create a DataFrame object from
one table and save it to another.
$ dse spark
The write operation uses one of the helper methods, cassandraFormat, included in the Spark Cassandra
Connector. This is a simplified way of setting the format and options for a standard DataFrame operation. The
following command is equivalent to the write operation using cassandraFormat:
table1.write.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "othertable", "keyspace" -> "test"))
.save()
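For comparison, a sketch of the shorthand form using the helper:

```scala
table1.write.cassandraFormat("othertable", "test").save()
```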
another node. Both AlwaysOn SQL and the Spark SQL Thriftserver provide JDBC and ODBC interfaces to
DSE, and share many configuration settings.
1. If you are using Kerberos authentication, in the hive-site.xml file, configure your authentication
credentials for the Spark SQL Thrift server.
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>thriftserver/_HOST@EXAMPLE.COM</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/dse/dse.keytab</value>
</property>
Ensure that you use the hive-site.xml file in the Spark directory:
3. Start the server by entering the dse spark-sql-thriftserver start command as a user with
permissions to write to the Spark directories.
To override the default settings for the server, pass in the configuration property using the --hiveconf
option. See the HiveServer2 documentation for a complete list of configuration properties.
By default, the server listens on port 10000 on the localhost interface of the node from which it was
started. You can configure the server to start on a specific port. For example, to start the server on port
10001, use the --hiveconf hive.server2.thrift.port=10001 option.
You can configure the port and bind address permanently in resources/spark/conf/spark-env.sh:
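For example, the standard HiveServer2 environment variables could be set there; a sketch, with the values as assumptions:

```shell
# resources/spark/conf/spark-env.sh
export HIVE_SERVER2_THRIFT_PORT=10001
export HIVE_SERVER2_THRIFT_BIND_HOST=10.10.10.1
```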
You can specify general Spark configuration settings by using the --conf option.
4. Use DataFrames to read and write large volumes of data. For example, to create the table_a_cass_df
table that uses a DataFrame while referencing table_a:
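A sketch of the statement, run through the Thrift server; the keyspace name ks is an assumption:

```sql
CREATE TABLE table_a_cass_df
  USING org.apache.spark.sql.cassandra
  OPTIONS (table "table_a", keyspace "ks");
```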
With DataFrames, compatibility issues exist with UUID and Inet types when inserting data with the
JDBC driver.
5. Use the Spark Cassandra Connector tuning parameters to optimize reads and writes.
What's next:
You can now connect your application to the server at the URI jdbc:hive2://hostname:port number using
the Simba JDBC driver, the Simba ODBC driver, or dse beeline.
Starting SparkR
Start the SparkR shell using the dse command to automatically set the Spark session within R.
$ dse sparkR
Lifecycle Manager allows you to enable and configure AlwaysOn SQL in managed clusters.
When AlwaysOn SQL is enabled within an Analytics datacenter, all nodes within the datacenter must have
AlwaysOn SQL enabled. Use dsetool ring to find which nodes in the datacenter are Analytics nodes.
AlwaysOn SQL is not supported when using DSE Multi-Instance or other deployments with multiple DSE
instances on the same server.
The dse client-tool alwayson-sql command controls the server. The command works on the local
datacenter unless you specify the datacenter with the --dc option:
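For example, a sketch with an assumed datacenter name:

```shell
$ dse client-tool alwayson-sql --dc dc2 restart
```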
• reserve_port_wait_time_ms
• alwayson_sql_status_check_wait_time_ms
• log_dsefs_dir
• runner_max_errors
Changing other options requires a restart, except for the enabled option. Enabling or disabling AlwaysOn
SQL requires restarting DSE.
The spark-alwayson-sql.conf file contains Spark and Hive settings as properties. When AlwaysOn SQL is
started, spark-alwayson-sql.conf is scanned for Spark properties, similar to other Spark applications started
with dse spark-submit. Properties that begin with spark.hive are submitted as properties using --hiveconf,
removing the spark. prefix.
For example, if spark-alwayson-sql.conf has the following setting:
spark.hive.server2.table.type.mapping CLASSIC
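Following the rule above, AlwaysOn SQL strips the spark. prefix and submits the property to the Hive server as:

```shell
--hiveconf hive.server2.table.type.mapping=CLASSIC
```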
Under spark.master, set the Spark URI to connect to the DSE Analytics Solo datacenter.
spark.master=dse://?connection.local_dc=dc1
spark.cassandra.connection.local_dc=dc0
To start the server on a specific datacenter, specify the datacenter name with the --dc option:
You can also view the status in a web browser by going to http://<node name or IP address>:<AlwaysOn SQL
web UI port>. By default, the port is 9077. For example, if 10.10.10.1 is the IP address of an Analytics node with
AlwaysOn SQL enabled, navigate to http://10.10.10.1:9077.
The returned status is one of:
• STOPPED_AUTO_RESTART: the server is stopped and will be automatically restarted.
• STOPPED_MANUAL_RESTART: the server was stopped with either a stop or restart command. If the server
was issued a restart command, the status will be changed to STOPPED_AUTO_RESTART as the server
starts again.
• STARTING: the server is actively starting up but is not yet ready to accept client requests.
The temporary cache table is only valid for the session in which it was created, and will not be recreated on
server restart.
Create a permanent cache table using the CREATE CACHE TABLE directive and a SELECT query:
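A sketch of such a statement; the keyspace and table names are assumptions:

```sql
CREATE CACHE TABLE mytable_cached AS SELECT * FROM ks.mytable;
```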
The table cache can be destroyed using the UNCACHE TABLE and CLEAR CACHE directives.
CLEAR CACHE;
Issuing DROP TABLE will remove all metadata including the table cache.
Enabling SSL for AlwaysOn SQL
Communication between the driver and AlwaysOn SQL can be encrypted using SSL.
The following instructions give an example of how to set up SSL with a self-signed keystore and truststore.
2. If the SSL keystore and truststore used for AlwaysOn SQL differ from the keystore and truststore
configured in cassandra.yaml, add the required settings to enable SSL to the hive-site.xml configuration
file.
By default the SSL settings in cassandra.yaml will be used with AlwaysOn SQL.
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hostname</value>
</property>
<property>
<name>hive.server2.use.SSL</name>
<value>true</value>
</property>
<property>
<name>hive.server2.keystore.path</name>
<value>path to keystore/keystore.jks</value>
</property>
<property>
<name>hive.server2.keystore.password</name>
<value>keystore password</value>
</property>
Changes in the hive-site.xml configuration file only require a restart of the AlwaysOn SQL service,
not DSE.
$ dse beeline
jdbc:spark://hostname:10000/default;SSL=1;SSLTrustStore=path to truststore/
truststore.jks;SSLTrustStorePwd=truststore password
DSE supports multiple authentication mechanisms, but AlwaysOn SQL only supports one mechanism per
datacenter.
AlwaysOn SQL supports DSE proxy authentication. The user who executes the queries is the user who
authenticated using JDBC. If AlwaysOn SQL was started by user Amy, and then Bob begins a JDBC session,
the queries are executed by Amy on behalf of Bob. Amy must have permissions to execute these queries on
behalf of Bob.
To enable authentication in AlwaysOn SQL alwayson_sql_options, follow these steps.
1. Create the auth_user role specified in AlwaysOn SQL options and grant the following permissions to the
role.
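A minimal sketch of the role creation in CQL, using the role name alwayson_sql from the GRANT examples below; the password and the exact permission grants depend on your deployment:

```sql
CREATE ROLE alwayson_sql WITH LOGIN = true AND PASSWORD = 'chosen_password';
```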
If you use Kerberos, set up a role that matches the full Kerberos principal name for each user.
4. Allow the AlwaysOn SQL role (auth_user) to execute commands with the user role.
For internal roles:
GRANT PROXY.EXECUTE
ON ROLE 'user_name'
TO alwayson_sql;
GRANT PROXY.EXECUTE
ON ROLE 'user_name/example.com@EXAMPLE.COM'
TO alwayson_sql;
• If Kerberos authentication is to be used, Kerberos does not need to be enabled in DSE. AlwaysOn
SQL must have its own service principal and keytab.
• The user must have login permissions in DSE in order to login through JDBC to AlwaysOn SQL.
This example shows how to enable Kerberos authentication. Modify the Kerberos domain and path to the
keytab file.
<!-- Start of: configuration for authenticating JDBC users with Kerberos -->
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hiveserver2/_HOST@KERBEROS DOMAIN</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>path to hiveserver2.keytab</value>
</property>
<!-- End of: configuration for authenticating JDBC users with Kerberos -->
7. Modify the owner of the /spark and /tmp/hive directories in DSEFS so the new role can write to the log
and temp files.
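A sketch using the DSEFS shell, assuming the role is named alwayson_sql and that recursive chown is available:

```shell
$ dse fs 'chown -R alwayson_sql /spark' 'chown -R alwayson_sql /tmp/hive'
```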
$ dse beeline
3. Connect to the server using the JDBC URI for your server.
This will generate the byos.properties file in your home directory. See dse client-tool for more
information on its options.
What's next:
The byos.properties file can be copied to a node in the external Spark cluster and used with the Spark shell,
as described in Connecting to DataStax Enterprise using the Spark shell on an external Spark cluster.
Connecting to DataStax Enterprise using the Spark shell on an external Spark cluster
Use the generated byos.properties configuration file and the byos-version.jar from a DataStax Enterprise
node to connect to the DataStax Enterprise cluster from the Spark shell on an external Spark cluster.
Prerequisites:
You must generate the byos.properties on a node in your DataStax Enterprise cluster.
1. Copy the byos.properties file you previously generated from the DataStax Enterprise node to the local
Spark node.
$ scp user@dsenode1.example.com:~/byos.properties .
If you are using Kerberos authentication, specify the --generate-token and --token-renewer
<username> options when generating byos.properties, as described in dse client-tool configuration
byos-export.
2. Copy the byos-version.jar file from the clients directory from a node in your DataStax Enterprise cluster
to the local Spark node.
The byos-version.jar file location depends on the type of installation.
$ scp user@dsenode1.example.com:/usr/share/dse/clients/dse-byos_2.11-6.0.2.jar
byos-6.0.jar
4. If you are using Kerberos authentication, set up a CRON job or other task scheduler to periodically call
dse client-tool cassandra renew-token <token> where <token> is the encoded token string in
byos.properties.
5. Start the Spark shell using the byos.properties and byos-version.jar file.
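For example, from the Spark node, with the jar name from step 2:

```shell
$ spark-shell --jars byos-6.0.jar --properties-file byos.properties
```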
2. Login as the hive user on the Spark SQL Thrift Server host.
4. Merge the existing Spark SQL Thrift Server configuration properties with the generated BYOS
configuration file into a new file.
$ cat /usr/hdp/current/spark-thriftserver/conf/spark-thrift-sparkconf.conf \
    byos.properties > custom-sparkconf.conf
5. Start Spark SQL Thrift Server with the custom configuration file and byos-version.jar.
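A sketch, assuming an HDP-style layout as in step 4; the script path may differ in your installation:

```shell
$ /usr/hdp/current/spark-thriftserver/sbin/start-thriftserver.sh \
    --properties-file custom-sparkconf.conf --jars byos-6.0.jar
```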
$ beeline -u 'jdbc:hive2://hostname:port/default;principal=hive/_HOST@REALM'
What's next:
Generated SQL schema files can be passed to beeline with the -f option to generate a mapping for DSE
tables so both Hadoop and DataStax Enterprise tables will be available through the service for queries.
Using the Spark Jobserver
DataStax Enterprise includes a bundled copy of the open-source Spark Jobserver, an optional component
for submitting and managing Spark jobs, Spark contexts, and JARs on DSE Analytics clusters. Refer to the
Components in the release notes to find the version of the Spark Jobserver included in this version of DSE.
Valid spark-submit options are supported and can be applied to the Spark Jobserver. To use the Jobserver:
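The Jobserver is started and stopped with the bundled dse command; spark-submit options can be appended to the start command. For example:

```shell
$ dse spark-jobserver start   # accepts spark-submit options, e.g. --conf
$ dse spark-jobserver stop
```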
The default location of the Spark Jobserver depends on the type of installation:
All the uploaded JARs, temporary files, and log files are created in the user's $HOME/.spark-jobserver
directory, first created when starting Spark Jobserver.
Beneficial use cases for the Spark Jobserver include sharing cached data, repeated queries of cached data,
and faster job starts.
Running multiple SparkContext instances in a single JVM is not recommended, so avoid creating a new
SparkContext for each submitted job within a single Spark Jobserver instance.
We recommend one of the following two Spark Jobserver usage patterns.
• Context per JVM: each job has its own SparkContext in a separate JVM.
By default, the H2 database is used for storing Spark Jobserver related metadata. In this setup, using
Context per JVM requires additional configuration. See the Spark Jobserver docs for details.
In Context per JVM mode, job results must not contain instances of classes that are not present in the
Spark Jobserver classpath. Problems with returning types unknown to the server can be recognized by the
following log line:
For an example of how to create and submit an application through the Spark Jobserver, see the spark-
jobserver demo included with DSE.
The default location of the demos directory depends on the type of installation:
spray.can.server {
ssl-encryption = on
keystore = "path to keystore"
keystorePW = "keystore password"
}
The default location of the Spark Jobserver depends on the type of installation:
• File data blocks are stored locally on each node and are replicated onto multiple nodes.
The redundancy factor is set at the DSEFS directory or file level, which is more granular than the
replication factor that is set at the keyspace level in the database.
For performance on production clusters, store the DSEFS data on physical devices that are separate from
the database. For development and testing you may store DSEFS data on the same physical device as the
database.
Deployment overview
• The DSEFS server runs in the same JVM as DataStax Enterprise. Similar to the database, there is no
master node. All nodes running DSEFS are equal.
• A single DSEFS cannot span multiple datacenters. To deploy DSEFS in multiple datacenters, you can
create a separate instance of DSEFS for each datacenter.
• You can use different keyspaces to configure multiple DSEFS file systems in a single datacenter.
• For optimal performance, locate the local DSEFS data on a different physical drive than the database.
• Encryption is not supported. Use operating system access controls to protect the local DSEFS data
directories. Other limitations apply.
• DSEFS uses the LOCAL_QUORUM consistency level to store file metadata. DSEFS will always try to write
each data block to replicated node locations, and even if a write fails, it will retry to another node before
acknowledging the write. DSEFS writes are very similar to the ALL consistency level, but with additional
failover to provide high-availability. DSEFS reads are similar to the ONE consistency level.
Enabling DSEFS
DSEFS is automatically enabled on analytics nodes, and disabled on non-analytics nodes. You can enable the
DSEFS service on any node in a DataStax Enterprise cluster. Nodes within the same datacenter with DSEFS
enabled will join together to behave as a DSEFS cluster.
On each node:
1. In the dse.yaml file, set the properties for the DSE File System options:
dsefs_options:
enabled:
keyspace_name: dsefs
work_dir: /var/lib/dsefs
public_port: 5598
private_port: 5599
data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
min_free_space: 5368709120
a. Enable DSEFS:
enabled: true
If enabled is blank or commented out, DSEFS starts only if the node is configured to run analytics
workloads.
keyspace_name: dsefs
You can optionally configure multiple DSEFS file systems in a single datacenter.
c. Define the work directory for storing the DSEFS metadata for the local node. The work directory
should not be shared with other DSEFS nodes:
work_dir: /var/lib/dsefs
public_port: 5598
DataStax recommends that all nodes in the cluster have the same value. Firewalls must open
this port to trusted clients. The service on this port is bound to the native_transport_address.
private_port: 5599
Do not open this port in firewalls; this private port must not be visible from outside the
cluster.
f. Set the data directories where the file data blocks are stored locally on each node.
data_directories:
- dir: /var/lib/dsefs/data
If you use the default /var/lib/dsefs/data data directory, verify that the directory exists and
that you have root access. Otherwise, you can define your own directory location, change the
ownership of the directory, or both:
Ensure that the data directory is writeable by the DataStax Enterprise user. Put the data
directories on different physical devices than the database. Using multiple data directories on
JBOD improves performance and capacity.
g. For each data directory, set the weighting factor to specify how much data to place in this directory,
relative to other directories in the cluster. This soft constraint determines how DSEFS distributes
the data. For example, a directory with a value of 3.0 receives about three times more data than a
directory with a value of 1.0.
data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
h. For each data directory, define the reserved space, in bytes, to not use for storing file data blocks.
See min_free_space.
data_directories:
- dir: /var/lib/dsefs/data
storage_weight: 1.0
min_free_space: 5368709120
4. With guidance from DataStax Support, you can tune advanced DSEFS properties:
# service_startup_timeout_ms: 30000
# service_close_timeout_ms: 600000
# server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
# compression_frame_max_size: 1048576
# query_cache_size: 2048
# query_cache_expire_after_ms: 2000
# gossip_options:
# round_delay_ms: 2000
# startup_delay_ms: 5000
# shutdown_delay_ms: 10000
# rest_options:
# request_timeout_ms: 330000
# connection_open_timeout_ms: 55000
# client_close_timeout_ms: 60000
# server_request_timeout_ms: 300000
# idle_connection_timeout_ms: 60000
# internode_idle_connection_timeout_ms: 120000
# core_max_concurrent_connections_per_host: 8
# transaction_options:
# transaction_timeout_ms: 3000
# conflict_retry_delay_ms: 200
# conflict_retry_count: 40
# execution_retry_delay_ms: 1000
# execution_retry_count: 3
# block_allocator_options:
# overflow_margin_mb: 1024
# overflow_factor: 1.05
Disabling DSEFS
To disable DSEFS and remove metadata and data:
1. Remove all directories and files from the DSEFS file system:
$ dse fs rm -r filepath
3. Verify that all DSEFS data directories where the file data blocks are stored locally on each node are empty.
These data directories are configured in dse.yaml. Your directories are probably different from this
default data_directories value:
data_directories:
- dir: /var/lib/dsefs/data
Do not delete the data_directories before removing the dsefs keyspace tables, or removing the
node from the cluster.
Configuring DSEFS
You must configure data replication. You can optionally configure multiple DSEFS file systems in a datacenter,
and perform other functions, including setting the Kafka log retention.
DSEFS does not span datacenters. Create a separate DSEFS instance in each datacenter, as described in the
steps below.
DSEFS limitations
Know these limitations when you configure and tune DSEFS. The following functionality and features are not
supported:
• Encryption.
Use operating system access controls to protect the local DSEFS data directories.
• File system consistency checks (fsck) and file repair have only limited support. Running fsck will re-
replicate blocks that were under-replicated because a node was taken out of a cluster.
• File repair.
• Checksum.
• Automatic backups.
• Multi-datacenter replication.
• Snapshots.
a. Globally: set replication for the metadata in the dsefs keyspace that is stored in the database.
For example, use a CQL statement to configure a replication factor of 3 on the Analytics
datacenter using NetworkTopologyStrategy:
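A sketch of that CQL statement, assuming the datacenter is named Analytics:

```sql
ALTER KEYSPACE dsefs WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy', 'Analytics': 3
};
```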
Datacenter names are case-sensitive. Verify the case of the datacenter name using a utility such as
dsetool status.
c. Locally: set the redundancy factor on a specific DSEFS file or directory where the data blocks are
stored.
When a redundancy factor is not specified, it is inherited from the parent directory. The default
redundancy factor is 3.
2. If you have multiple Analytics datacenters, you must configure each DSEFS file system to replicate within
its own datacenter:
a. In the dse.yaml file, specify a separate DSEFS keyspace for each logical datacenter.
For example, on a cluster with logical datacenters DC1 and DC2.
On each node in DC1:
dsefs_options:
    ...
    keyspace_name: dsefs1
On each node in DC2:
dsefs_options:
    ...
    keyspace_name: dsefs2
For example, in a cluster with multiple datacenters, the keyspace names dsefs1 and dsefs2 define
separate file systems in each datacenter.
3. When bouncing a streaming application, verify the Kafka log configuration (especially
log.retention.check.interval.ms and the log.retention.bytes retention policy). Ensure the Kafka log
retention policy is robust enough to handle the length of time expected to bring the application and its
consumers back up.
For example, if the log retention policy is too conservative and deletes or rolls the logs very
frequently to save disk space, users are likely to encounter issues when attempting to recover from
a checkpoint that references offsets that are no longer retained by the Kafka logs.
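Illustrative broker settings; the values are assumptions to be tuned for your environment:

```properties
# Kafka server.properties
log.retention.check.interval.ms=300000   # how often retention is evaluated
log.retention.bytes=1073741824           # per-partition size cap
log.retention.hours=168                  # time-based retention
```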
$ dse fs
For example, to list the file system status and disk space usage in human-readable format:
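A sketch, assuming the df subcommand supports the -h flag for human-readable sizes:

```shell
dsefs / > df -h
```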
Optional command arguments are enclosed in square brackets, for example [dse_auth_credentials] and [-R].
Variable values are italicized, for example directory and [subcommand].
Working with the local file system in the DSEFS shell
You can refer to files in the local file system by prefixing paths with file:. For example, the following
command lists the files in the system root directory:
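For example:

```shell
dsefs / > ls file:/
```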
If you need to perform many subsequent operations on the local file system, first change the current working
directory to file: or any local file system path:
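For example, the local path here is an assumption:

```shell
dsefs / > cd file:/tmp
```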
The DSEFS shell remembers the last working directory of each file system separately. To go back to the previous
DSEFS directory, enter:
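For example, from a local working directory:

```shell
file:/tmp > cd dsefs:
```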
To refer to a path relative to the last working directory of the file system, prefix a relative path with either dsefs:
or file:. The following session will create a directory new_directory in the directory /home/user1:
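A sketch of such a session, assuming the last DSEFS working directory was /home/user1:

```shell
dsefs /home/user1 > cd file:/tmp
file:/tmp > mkdir dsefs:new_directory
```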
To copy a file between two different file systems, you can also use the cp command with explicit file system
prefixes in the paths:
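For example, with assumed paths:

```shell
dsefs / > cp file:/tmp/local_file dsefs:/uploads/remote_copy
```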
Authentication
For dse dse_auth_credentials you can provide user credentials in several ways, see Providing credentials from
DSE tools. For authentication with DSEFS, see DSEFS authentication.
Wildcard support
Some DSEFS commands support wildcard pattern expansion in the path argument. Path arguments containing
wildcards are expanded before method invocation into a set of paths matching the wildcard pattern, then the
given method is invoked for each expanded path.
For example in the following directory tree:
dirA
|--dirB
|--file1
|--file2
The stat dirA/* command would be transparently translated into three invocations: stat dirA/dirB,
stat dirA/file1, and stat dirA/file2.
• * matches any file system entry (file or directory) name, as in the example of stat dirA/*.
• ? matches any single character in the file system entry name. For example stat dirA/dir? matches
dirA/dirB.
• [] matches any characters enclosed within the brackets. For example stat dirA/file[0123] matches
dirA/file1 and dirA/file2.
• {} matches any sequence of characters enclosed within the brackets and separated with ,. For example
stat dirA/{dirB,file2} matches dirA/dirB and dirA/file2.
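The expansion step can be sketched in Python; this is illustrative, not DSEFS code, and the stdlib fnmatch module covers the *, ?, and [] patterns above (not the {} form):

```python
from fnmatch import fnmatch

def expand(pattern, entries):
    """Expand a wildcard pattern against a listing of paths,
    mimicking how the DSEFS shell turns one wildcard argument
    into one method invocation per matching path."""
    return [entry for entry in entries if fnmatch(entry, pattern)]

entries = ["dirA/dirB", "dirA/file1", "dirA/file2"]
print(expand("dirA/*", entries))           # every entry under dirA
print(expand("dirA/file[0123]", entries))  # the two files
print(expand("dirA/dir?", entries))        # dirA/dirB
```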
Forcing synchronization
Before confirming a file write, DSEFS by default forces all blocks of the file to be written to the storage
devices. This behavior can be controlled with the --no-force-sync and --force-sync flags when creating files
or directories in the DSEFS shell with the mkdir, put, and cp commands. The force/no-force behavior is inherited
from the parent directory if not specified. For example, if a directory is created with --no-force-sync, then all
files are created with --no-force-sync unless --force-sync is explicitly set during file creation.
Turning off forced synchronization improves latency and performance at a cost of durability. For example,
if a power loss occurs before writing the data to the storage device, you may lose data. Turn off forced
synchronization only if you have a reliable backup power supply in your datacenter and failure of all replicas is
unlikely, or if you can afford losing file data.
The Hadoop SYNC_BLOCK flag has the same effect as --force-sync in DSEFS. The Hadoop LAZY_PERSIST
flag has the same effect as --no-force-sync in DSEFS.
Removing a DSEFS node
When removing a node running DSEFS from a DSE cluster, additional steps are needed to ensure the
integrity of the DSEFS data set.
Make sure the replication factor for the cluster is greater than ONE before continuing.
1. From a node in the same datacenter as the node to be removed, start the DSEFS shell.
$ dse fs
dsefs > df
3. Find the node to be removed in the list and note the UUID value for it under the Location column.
4. If the node is up, unmount it from DSEFS with the command umount UUID.
5. If the node is not up (for example, after a hardware failure), force unmount it from DSEFS with the
command umount -f UUID.
6. Run a file system check with the fsck command to make sure all blocks are replicated.
If data was written to a DSEFS node, more nodes were added to the cluster, and the original node was
removed without running fsck, the data in the original node may be permanently lost.
$ dse fs
dsefs > df
3. Find the directory to be removed in the list and note the UUID value for it under the Location column.
5. Run a file system check with the fsck command to make sure all blocks are replicated.
If the file system check results in an IOException, make sure all the nodes in the cluster are running.
Examples
Using the DSEFS shell, these commands put the local bluefile to the remote DSEFS greenfile:
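A sketch of the put command; the local path is an assumption:

```shell
dsefs / > put file:/tmp/bluefile greenfile
```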
dsefs / > ls -l
Type Permission Owner Group Length Modified Name
Using the dse command, these commands create the test2 directory and upload the local README.md file to the
new DSEFS directory.
You can use two or more dse commands in a single command line. This is faster because the JVM is launched,
and the DSEFS connection opened and closed, only once. For example:
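A sketch combining the two operations above in one invocation:

```shell
$ dse fs 'mkdir /test2' 'put README.md /test2/README.md'
```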
The following example shows how to use the --no-force-sync flag on a directory, and how to check the state
of the --force-sync flag using stat. These commands are run from within the DSEFS shell.
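A sketch of such a session; the directory name is an assumption:

```shell
dsefs / > mkdir --no-force-sync nosync_dir
dsefs / > stat nosync_dir
```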
DSEFS compression
DSEFS is able to compress files to save storage space and bandwidth. Compression is performed by DSE
during upload upon a user’s explicit request. Decompression is transparent. Data is always uncompressed by
the server before it is returned to the client.
Compression is performed within block boundaries. The unit of compression—the chunk of data that gets
compressed individually—is called a frame and its size can be specified during file upload.
Encoders
DSEFS is shipped with the lz4 encoder which works out of the box.
Compression
To compress files, use the -c or --compression-encoder parameter of the put or cp command. The parameter
specifies the compression encoder to use for the file that is about to be uploaded.
The frame size can optionally be set with the -f, --compression-frame-size option.
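A sketch combining both options; the encoder name comes from the Encoders section, and the path and frame size are assumptions:

```shell
dsefs / > put -c lz4 -f 1048576 file:/tmp/bigfile /bigfile
```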
The maximum frame size in bytes is set in the compression_frame_max_size option in dse.yaml. If a user
sets the frame size to a value greater than compression_frame_max_size when using put -f, an error is
thrown and the command fails. Modify the compression_frame_max_size setting based on the available
memory of the node.
Files that are compressed can be appended to in the same way as uncompressed files. If the file is compressed,
the appended data is transparently compressed with the encoder specified for the initial put operation.
Directories can have a default compression encoder specified during directory creation with the mkdir
command. Files newly added with the put command inherit the default compression encoder from the containing
directory. You can override the default compression encoder with the -c parameter during put operations.
Decompression
Decompression is performed automatically for all commands that transport data to the client. There is no need
for additional configuration to retrieve the original, decompressed file content.
Storage space
Enabling compression creates a distinction between the logical and physical file size.
The logical size is the size of a file before uploading it to DSEFS, where it is then compressed. The logical size
is shown by the stat command under Size.
The physical size is the actual size of the data stored on the storage device. The physical size is shown by the
df command, and by the stat -v command for each block separately, under the Compressed length column.
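The logical/physical distinction can be illustrated with frame-based compression in Python, with zlib standing in for DSEFS's lz4 encoder and the sizes purely illustrative:

```python
import zlib

FRAME_SIZE = 65536  # unit of compression, analogous to a DSEFS frame

def compress_frames(data, frame_size=FRAME_SIZE):
    """Compress data one frame at a time, as DSEFS compresses
    within block boundaries; returns the compressed frames."""
    return [zlib.compress(data[i:i + frame_size])
            for i in range(0, len(data), frame_size)]

data = b"x" * 200_000                      # logical size: 200000 bytes
frames = compress_frames(data)
physical = sum(len(f) for f in frames)     # physical (compressed) size
restored = b"".join(zlib.decompress(f) for f in frames)
print(len(data), physical, restored == data)
```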
Limitations
Truncating compressed files is not possible.
DSEFS authentication
DSEFS works with secured DataStax Enterprise clusters.
For related SSL details, see Enabling SSL encryption for DSEFS.
• Set with the dse spark-submit command using one of the credential options described in Providing
credentials on command line.
• Programmatically set the user credentials in the Spark configuration object before the SparkContext is
created:
conf.set("spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.username",
<user>)
conf.set("spark.hadoop.com.datastax.bdp.fs.client.authentication.basic.password",
<pass>)
If a Kerberos authentication token is in use, you do not need to set any properties in the context object. If
you need to explicitly set the token, set the spark.hadoop.cassandra.auth.token property.
• When running the Spark Shell, where the SparkContext is created at startup, set the properties in the
Hadoop configuration object:
sc.hadoopConfiguration.set("com.datastax.bdp.fs.client.authentication.basic.username", <user>)
sc.hadoopConfiguration.set("com.datastax.bdp.fs.client.authentication.basic.password", <pass>)
• When running a Spark application or the Spark Shell, provide the properties in the Hadoop core-default.xml
configuration file:
<property>
<name>com.datastax.bdp.fs.client.authentication.basic.username</name>
<value>username</value>
</property>
<property>
<name>com.datastax.bdp.fs.client.authentication.basic.password</name>
<value>password</value>
</property>
Optional: If you want to use this method but do not have privileges to write to core-default.xml, copy
the file to another location and set the environment variable to point to it:
export HADOOP2_CONF_DIR=path
DSEFS shell
Providing authentication credentials while using the DSEFS shell works the same way as in other DSE tools. The DSEFS
shell supports several authentication methods, listed below in priority order. When more than one method
can be used, the one with the higher priority is chosen. For example, when the DSE_TOKEN environment variable
is set and the DSEFS shell is also given a username and password through environment variables in the
$HOME/.dserc file, the username and password are used for authentication because they have higher priority.
1. Using a username and password, for example set as environment variables in the $HOME/.dserc file.
2. Using a Kerberos delegation token. See dse client-tool cassandra for further information.
3. Using a cached Kerberos ticket after authenticating using a tool like kinit.
$ kinit username
DSEFS authorization
DSEFS authorization verifies user and group permissions on files and directories stored in DSEFS.
DSEFS authorization is disabled by default. It requires no configuration; it is enabled automatically along with
DSE authorization.
For related SSL details, see Enabling SSL encryption for DSEFS.
In secured clusters with DSEFS authentication enabled, all newly created files and directories are created with
the owner set to the authenticated user's username and the group set to the authenticated user's primary role. See
the CQL roles documentation for detailed information on user roles. File and directory permissions can be specified
during creation as a parameter for the put and mkdir commands. Use help put or help mkdir for
details.
To change the owner or group of an existing file or directory, use the chown or chgrp command. Use help
chown or help chgrp for details.
By default, DSEFS creates directories with rwxr-xr-x (octal 755) permissions and files with rw-r--r-- (octal
644). To change the permissions of an existing file or directory, use the chmod command. Use help
chmod for details.
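The octal notation above maps to the permission strings as in this illustrative sketch (not DSEFS code):

```python
# Illustrative only: how octal modes such as 755 and 644 map to the
# rwxr-xr-x / rw-r--r-- strings shown in DSEFS listings.
def mode_to_string(mode: int) -> str:
    flags = "rwx"
    out = []
    for shift in (6, 3, 0):                    # owner, group, other triplets
        bits = (mode >> shift) & 0b111
        out.append("".join(f if bits & (4 >> i) else "-"
                           for i, f in enumerate(flags)))
    return "".join(out)

assert mode_to_string(0o755) == "rwxr-xr-x"    # default for directories
assert mode_to_string(0o644) == "rw-r--r--"    # default for files
```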
DSEFS superusers
A DSEFS user is a superuser if and only if the user is a database superuser. Superusers are allowed to
read and write every file and directory stored in DSEFS. Only superusers are allowed to execute DSEFS
maintenance operations like fsck and umount.
DSEFS users
User access is verified against:
• Owner permissions if the file or directory owner name is equal to the authenticated user’s username.
• Group permissions if the file or directory group belongs to the authenticated user’s groups. Groups are
mapped from the database's user role names.
Each DSEFS command requires its own set of permissions. For a given path a/b/c, c is the leaf and a/b is the
parent path. The following table shows the permissions that must be present for a given operation to succeed. R
indicates read, W indicates write, and X indicates execute privileges.
Operation | Path checked | Permissions
cd a/b/c | a/b/c | X
rm a/b/c | a/b | X WX
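A sketch (not DSEFS source) of how owner/group/other permission checks are typically evaluated, matching the rules described above; names and groups are hypothetical:

```python
# Permission bits, as in the octal triplets used by chmod.
R, W, X = 4, 2, 1

def allowed(mode, owner, group, user, user_groups, wanted):
    """Check the `wanted` permission bits (e.g. R | X) for `user`."""
    if owner == user:
        triplet = (mode >> 6) & 0b111          # owner bits
    elif group in user_groups:
        triplet = (mode >> 3) & 0b111          # group bits
    else:
        triplet = mode & 0b111                 # other bits
    return triplet & wanted == wanted

# rwxr-xr-x directory owned by alice, group admins
assert allowed(0o755, "alice", "admins", "alice", set(), R | W | X)
assert allowed(0o755, "alice", "admins", "bob", {"admins"}, R | X)
assert not allowed(0o755, "alice", "admins", "bob", {"admins"}, W)
```

Superusers bypass these checks entirely, as noted in the section above.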
If the DSE cluster has authentication enabled, use the curl --location-trusted parameter when the
WebHDFS noredirect parameter is false (the default value).
The README.md has instructions on building and running the demo applications.
Hadoop FileSystem interface implemented by DseFileSystem
The DseFileSystem class partially implements the Hadoop FileSystem interface. The following methods are
supported; methods marked (default) use the inherited Hadoop FileSystem default implementation:
• getURI()
• makeQualified(Path) (default)
• addDelegationTokens(String, Credentials)
• collectDelegationTokens(...)
• getServerDefaults(Path) (default)
• resolvePath(Path) (default)
• createNewFile (default)
• getReplication(Path)
• rename
• delete(Path)
• delete(Path, boolean)
• deleteOnExit(Path) (default)
• cancelDeleteOnExit(Path) (default)
• exists(Path)
• isDirectory(Path)
• isFile(Path)
• getLength(Path)
• getContentSummary(Path) (default)
• globStatus (default)
• listLocatedStatus (default)
• listStatusIterator (default)
• listFiles (default)
• getHomeDirectory() (default)
• getWorkingDirectory()
• setWorkingDirectory()
• mkdirs
• copyFromLocalFile (default)
• moveFromLocalFile (default)
• copyToLocalFile (default)
• moveToLocalFile (default)
• startLocalOutput (default)
• close
• getBlockSize
• getFileStatus(Path)
• setPermission
• setOwner
readOnly
Returns true if the location is in read-only mode.
status
One of the following values: up, down, unavailable:
• If the location is up, the location is fully operational and this node will attempt to read or write from
it.
• If the location is down, the location is on a node that has been gracefully shut down by the
administrator and no reads or writes will be attempted.
• If the location is unavailable, this node has problems communicating with that location, and the
real status is unknown. This node will check the status periodically.
storageWeight
How much data will be stored in this location relative to other locations. This is a static value
configured in dse.yaml.
BlockStore
BlockStore metrics report how fast and how much data is being read and written by the data layer of the DSEFS
node. They are reported only for the locations managed by the node to which you connect with JMX. To get
metrics for all the locations in the cluster, connect individually to each node running
DSEFS.
blocksDeleted
How many blocks are deleted, in blocks per second.
blocksRead
Read accesses in blocks per second.
blocksWritten
Writes in blocks per second.
bytesDeleted
How fast data is removed, in bytes per second.
bytesRead
How fast data is being read, in bytes per second.
bytesWritten
How fast data is written, in bytes per second.
readErrors
The total count and rate of read errors (rate in errors per second).
writeErrors
The total count and rate of write errors (rate in errors per second).
directory
The path to the storage directory of this location.
freeSpace
How much space is left on the device in bytes.
usedSpace
Estimated amount of space used by this location in bytes.
RestServer
RestServer reports metrics related to the communication layer of DSEFS, separately for internode traffic and
clients. Each set of these metrics is identified by a scope of the form: listen address:listen port. By default
port 5598 is used for clients, and port 5599 is for internode communication.
connectionCount
The current number of open inbound connections.
connectionRate
The total rate and count of connections since the server was started.
requestRate
The total rate and count of all requests. Use deleteRate, getRate, postRate, or putRate to obtain
the rate for a specific request type (DELETE, GET, POST, or PUT).
downloadBytesRate
Throughput in bytes per second of the transfer from server to client.
uploadBytesRate
Throughput in bytes per second of the transfer from client to server.
responseTime
The time that elapses from receiving the full request body to the moment the server starts sending out
the response.
uploadTime
The time it takes to read the request body from the client.
downloadTime
The time that it takes to send the response body to the client.
errors
A counter which is increased every time the service handling the request throws an unexpected error.
errors is not increased by errors handled by the service logic. For example, file not found errors do
not increment errors.
CassandraClient
CassandraClient reports metrics related to the communication layer between DSEFS and the database.
responseTime
Tracks the response times of database queries.
errors
A counter increased by query execution errors (for example, timeout errors).
DSE Search
DSE Search allows you to quickly find data and provide a modern search experience for your users, helping you
create features like product catalogs, document repositories, ad-hoc reporting engines, and more.
Because DataStax Enterprise is a cohesive data management platform, other workloads such as DSE Graph,
DSE Analytics, and DSE Analytics and Search integration can take full advantage of the indexing and query
capabilities of DSE Search.
About DSE Search
DSE Search is part of DataStax Enterprise (DSE). DSE Search allows you to find data and create features like
product catalogs, document repositories, and ad-hoc reports. See DSE Search architecture.
DSE Analytics and Search integration and DSE Analytics can use the indexing and query capabilities of DSE
Search. DSE Search manages search indexes with a persistent store.
The benefits of running enterprise search functions through DataStax Enterprise and DSE Search include:
• Add search capacity just like you add capacity in the DSE database.
• Set up replication for DSE Search nodes the same way as other nodes by creating a keyspace or changing
the replication factor of a keyspace to optimize performance.
• DSE Search has two indexing modes: Near-real-time (NRT) and live indexing, also called real-time (RT)
indexing. Configure and tune DSE Search for maximum indexing throughput.
• TDE encryption of DSE Search data, including search indexes and commit logs. See Encrypting Search
indexes.
• Local node (optional) management of search indexing resources with dsetool commands.
• Read/write to any DSE Search node and automatically index stored data.
• Fault-tolerant queries, efficient deep paging, and advanced search node resiliency.
• Native CQL queries that leverage search indexes for an array of CQL query functionality and indexing
support.
• Using CQL, DSE Search supports partial document updates that enable you to modify existing information
and maintain a lower transaction cost.
• Supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).
• DSE Search is built with a production-certified version of Apache Solr™. Although DSE Search uses some Solr tools
and APIs, the implementation does not guarantee that all Solr tools and APIs work as expected. Be sure to
review the unsupported features for DSE Search.
See the DataStax blog post What’s New for Search in DSE 6. Highlights include:
• Simplified indexing pipeline and back-pressure that reduces the frequency of dropped mutations and
requires less configuration. (Soft commit is still required for update visibility.)
• Native CQL queries can use search indexes for additional CQL query functionality and index support.
Search queries do not require a solr_query clause, and some queries that previously required ALLOW
FILTERING no longer have that limitation because search indexes are used automatically.
• Default search index configuration provides functionality similar to the ANSI SQL LIKE operator, and
requires less processing to generate the data and less index data for the search.
• Disabled the ability to perform writes and deletes using the Solr HTTP interface.
• Default index behavior from Cassandra is overridden to improve the performance of post-repair index
building.
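As a sketch of the native-CQL behavior described above (keyspace, table, and column names here are hypothetical, and a search index is assumed to exist on the table):

```sql
-- Previously this kind of query required solr_query or ALLOW FILTERING;
-- with a search index on ks.products, the index is used automatically,
-- including LIKE-style matching against the default index configuration.
SELECT * FROM ks.products WHERE description LIKE '%wireless%';
```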
Figure 5:
See the following table for a mapping between database and DSE Search concepts.
Table 18: Relationship between the database and DSE Search concepts
Database | Search (single-node environment)
Row | Document
Column | Field
Node | n/a
Partition | n/a
Keyspace | n/a
• Each document in a search index is unique and contains a set of fields that adhere to a user-defined
schema.
• The schema lists the field types and defines how they should be indexed.
• A shard is indexed data for a subset of the data on the local node.
• The keyspace is a prefix for the name of the search index and has no counterpart in Solr.
• Search queries are routed to enough nodes to cover all token ranges.
# The query is sent to all token ranges to get all possible results.
# The search engine considers the token ranges that each node is responsible for, taking into account
the replication factor (RF), and computes the minimum number of nodes that is required to query all
ranges.
• On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to
route sub-queries to the nodes most capable of handling them. The shard routing is token aware, but is not
limited unless the search query specifies a specific token range.
• With replication, a node or search index contains more than one partition (shard) of table (collection) data.
Unless the replication factor equals the number of cluster nodes, the node or search index contains only a
portion of the data of the table or collection.
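A hypothetical sketch of the idea behind this routing: choose a small set of replicas whose token ranges together cover the whole ring. Real DSE Search routing also weighs other criteria (node health, load); this only shows range coverage under replication:

```python
def cover_ranges(all_ranges, replicas):
    """replicas: dict node -> set of token ranges it owns (incl. replication).
    Greedily pick nodes until every range is covered."""
    needed, chosen = set(all_ranges), []
    while needed:
        # take the node covering the most still-uncovered ranges
        node = max(replicas, key=lambda n: len(replicas[n] & needed))
        gained = replicas[node] & needed
        if not gained:
            raise ValueError("ranges not fully replicated")
        chosen.append(node)
        needed -= gained
    return chosen

ranges = {"r1", "r2", "r3", "r4"}
replicas = {"n1": {"r1", "r2"}, "n2": {"r2", "r3"}, "n3": {"r3", "r4", "r1"}}
nodes = cover_ranges(ranges, replicas)

assert set().union(*(replicas[n] for n in nodes)) == ranges
assert len(nodes) == 2    # with RF > 1, fewer nodes than ranges can suffice
```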
2. A thread in the Thread Per Core (TPC) architecture processes the mutation.
4. A Lucene document is built from the latest full row in the backing table.
Note:
# Commits occur when the RAM buffer is full, or a soft commit or hard commit is performed.
# Hard commits are triggered by a memtable flush on the base Cassandra table.
• Part of the flush process ensures that for a given document identifier, only one live document exists.
Therefore, any duplicate older documents are deleted.
• Lucene merges segments periodically in a similar way that Cassandra performs compaction.
Feature | DSE Search | OSS Solr | Notes
Indexes real-time data | yes | no | Ingests real-time data and automatically indexes the data.
Provides an intuitive way to update data | yes | no | CQL for loading and updating data.
Supports data distribution | yes | yes [1] | Transparently distributes real-time, analytics, and search data to multiple nodes in a cluster.
Balances loads on nodes/shards | yes | no | Unlike Solr and SolrCloud, DSE Search loads can be efficiently rebalanced.
Spans indexes over multiple datacenters | yes | no | A DSE cluster can have more than one datacenter for different types of nodes.
Makes durable updates to data | yes | no | All updates are durable and written to the commit log.
Automatically reindexes search data | yes | no | OSS Solr requires the client to reingest everything to reindex data.
Upgrades of Apache Lucene® preserve data | yes | no | DataStax integrates Lucene upgrades periodically, and data is preserved when you upgrade DSE.
Supports timeAllowed queries with deep paging | yes | no | OSS Solr does not support using timeAllowed queries with deep paging.
• If timeAllowed is exceeded and the additional shards.tolerant parameter is set to true, the application
returns the partial results collected so far.
When partial results are returned, the CQL custom payload contains the DSESearch.isPartialResults key.
DSE Search does not support:
• Continuous paging.
• Static columns
• Counter columns
• Super columns
• PER PARTITION clause is not supported for DSE Search solr_query queries.
• Indexing frozen maps is not supported. However, indexing frozen sets and lists of native and user-defined
(tuple/UDT) element types is supported.
• Using DSE Search with newly created COMPACT STORAGE tables is deprecated.
• Solr schema fields that are both dynamic and multiValued only for CQL-based search indexes.
• The deprecated replaceFields request parameters on document updates for CQL-based search indexes.
Instead, use the suggested procedure for inserting/updating data.
• Block joins based on the Lucene BlockJoinQuery in search indexes and CQL tables.
• Schemaless mode.
• Partial schema updates through the REST API after search indexes are changed.
For example, you cannot use the REST API to add a single new field to a schema; instead,
you must change the schema.xml file, upload it again, and reload the core (the same applies to copy fields).
• The SolrCloud CloudSolrServer feature of SolrJ for endpoint discovery and round-robin load balancing.
• DSE Search does not support the duration Cassandra data type.
• RealTime Get.
• The Tika functionality that is bundled with Apache Solr is deprecated. Instead, use the stand-alone Apache
Tika project.
• Highlighting.
• ClassicSimilarityFactory class.
• The DSE custom URP implementation is deprecated. Use the field input/output (FIT) transformer API
instead.
• JBOD mode.
• CQL Solr queries do not support native functions or column aliases as selectors.
• The 2.1 billion records limitation, per index on each node, as described in Lucene limitations.
# Names with both leading and trailing underscores (for example, _version_) are reserved.
Non-compliant field names are not supported by all components, and backward compatibility is not
guaranteed.
• Limitations and known Apache Solr issues apply to DSE Search queries. For example: incorrect SORT
results for tokenized text fields.
• DataStax recommends CQL CREATE SEARCH INDEX and ALTER SEARCH INDEX CONFIG
commands.
3. Optionally view the XML of the pending search index. For example:
<ramBufferSizeMB>512</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</indexConfig>
<jmx/>
<updateHandler class="solr.DirectUpdateHandler2">
<autoSoftCommit>
<maxTime>10000</maxTime>
</autoSoftCommit>
</updateHandler>
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048"
lowWaterMarkMB="1024"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>16</maxWarmingSearchers>
</query>
<requestDispatcher handleSelect="true">
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000"/>
<httpCaching never304="true"/>
</requestDispatcher>
<requestHandler class="solr.SearchHandler" default="true" name="search">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler
class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler"
name="solr_query">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler class="solr.UpdateRequestHandler" name="/update"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
<requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field"
startup="lazy"/>
<requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document"
startup="lazy"/>
<requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
<requestHandler class="solr.PingRequestHandler" name="/admin/ping">
<lst name="invariants">
<str name="qt">search</str>
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
For CQL index management, use configuration element shortcuts with CQL commands.
Configuration elements are listed alphabetically by shortcut. The XML element is shown with the element start
tag. An ellipsis indicates that other elements or attributes are not shown.
autoCommitTime
Defines the time interval between updates to the search index with the most recent data after an
INSERT, UPDATE, or DELETE. By default, changes are automatically committed every 10000
milliseconds. To change the time interval between updates:
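The command elided here would look something like the following, using the autoCommitTime shortcut (ks.tbl is a placeholder for your indexed table):

```sql
ALTER SEARCH INDEX CONFIG ON ks.tbl SET autoCommitTime = 30000;
```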
The resulting XML shows the maximum time between updates is 30000 milliseconds:
<updateHandler class="solr.DirectUpdateHandler2">
<autoSoftCommit>
<maxTime>30000</maxTime>
</autoSoftCommit>
</updateHandler>
<directoryFactory class="solr.EncryptedFSDirectoryFactory"
name="DirectoryFactory"/>
Even though additional properties are available to tune encryption, DataStax recommends using the
default settings.
filterCacheLowWaterMark
Default is 1024 MB. See below.
filterCacheHighWaterMark
Default is 2048 MB.
The DSE Search configurable filter cache reliably bounds filter cache memory usage for a search
index. This implementation contrasts with the default Solr implementation, which bounds
filter cache usage per segment. SolrFilterCache works by evicting cache entries after the
configured per-search-index (per-core) high watermark is reached, and stops evicting after the
configured low watermark is reached.
SolrFilterCache defaults to off-heap allocation. In general, the larger the index, the larger the filter cache
should be. A good default is 1 to 2 GB. If the index is 1 billion docs per node, set 4 to 5 GB.
1. To change cache eviction for a large index, set the low and high values one at a time:
<query>
...
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="5000"
lowWaterMarkMB="2000"/>
...
</query>
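Using the shortcuts above, the change could be applied one value at a time with CQL along these lines (ks.tbl is a placeholder; when raising both values, set the high watermark first so it never falls below the low watermark):

```sql
ALTER SEARCH INDEX CONFIG ON ks.tbl SET filterCacheHighWaterMark = 5000;
ALTER SEARCH INDEX CONFIG ON ks.tbl SET filterCacheLowWaterMark = 2000;
```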
mergeFactor
When a new segment causes the number of lowest-level segments to exceed the merge factor value,
then those segments are merged together to form a single large segment. When the merge factor is
10, each merge results in the creation of a single segment that is about ten times larger than each of
its ten constituents. When there are 10 of these larger segments, then they in turn are merged into an
even larger single segment. Default is 10.
<indexConfig>
...
<mergeFactor>10</mergeFactor>
...
</indexConfig>
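The segment growth described above can be modeled with a toy sketch. This is not Lucene's actual merge policy, just an illustration of the mergeFactor cascade:

```python
def segments_after(flushes, merge_factor=10):
    """Toy model: each flush creates one lowest-level segment; whenever a
    level accumulates `merge_factor` segments, they merge into a single
    segment one level up (and merges can cascade further upward)."""
    levels = [0]                                 # segment count per level
    for _ in range(flushes):
        levels[0] += 1
        i = 0
        while levels[i] == merge_factor:         # cascade merges upward
            levels[i] = 0
            if i + 1 == len(levels):
                levels.append(0)
            levels[i + 1] += 1
            i += 1
    return sum(levels)

assert segments_after(9) == 9      # below the merge factor: nothing merges
assert segments_after(10) == 1     # ten small segments merge into one
assert segments_after(100) == 1    # ten larger segments merge again
```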
mergeMaxThreadCount
Must configure with mergeMaxMergeCount. The number of concurrent merges that Lucene can
perform for the search index. The default mergeScheduler settings are set automatically. Do not
adjust this setting.
Default: ½ the number of tpc_cores
mergeMaxMergeCount
Must configure with mergeMaxThreadCount. The number of pending merges (active and in the
backlog) that can accumulate before segment merging starts to block/throttle incoming writes. The
default mergeScheduler settings are set automatically. Do not adjust this setting.
Default: 2x the mergeMaxThreadCount
ramBufferSize
The index RAM buffer size in megabytes (MB). The RAM buffer holds uncommitted documents. A
larger RAM buffer reduces flushes, and segments are larger when flushed. Fewer flushes reduce I/O pressure,
which is ideal for write-heavy workloads.
For example, adjust the ramBufferSize when you configure live indexing:
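A sketch of such a change using the ramBufferSize shortcut (ks.tbl is a placeholder for your indexed table; 2048 is an illustrative value):

```sql
ALTER SEARCH INDEX CONFIG ON ks.tbl SET ramBufferSize = 2048;
```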
Default: 512
realtime
Enables live indexing to increase indexing throughput. Enable live indexing on only one node per
cluster. Live indexing, also called real-time (RT) indexing, supports searching directly against the
Lucene RAM buffer and more frequent, cheaper soft-commits, which provide earlier visibility to newly
indexed data.
Live indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent
NRT setup. See Tune RT indexing.
Configuration elements without shortcuts
To specify configuration elements that do not have shortcuts, you can specify the XML path to the setting and
separate child elements using a period.
deleteApplicationStrategy
Controls how to retrieve deleted documents when deletes are being applied. Seek exact is the safe
default most people should choose, but for a little extra performance you can try seekceiling.
Valid case-insensitive values are:
• seekexact
Uses bloom filters to avoid reading from most segments. Use when memory is limited and the
unique key field data does not fit into memory.
• seekceiling
More performant when documents are deleted/inserted into the database with sequential keys,
because this strategy can stop reading from segments when it is known that terms can no longer
appear.
Default: seekexact
mergePolicyFactory
The AutoExpungeDeletesTieredMergePolicy custom merge policy is based on TieredMergePolicy.
This policy cleans up the large segments by merging them when deletes reach the percentage
threshold. A single auto expunge merge occurs at a time. Use for large indexes that are not merging
the largest segments due to deletes. To determine whether this merge setting is appropriate for your
workflow, view the segments on the Solr Segment Info screen.
When set, the XML is described as:
<indexConfig>
<mergePolicyFactory
class="org.apache.solr.index.AutoExpungeDeletesTieredMergePolicyFactory">
<int name="maxMergedSegmentMB">1005</int>
<int name="forceMergeDeletesPctAllowed">25</int>
<bool name="mergeSingleSegments">true</bool>
</mergePolicyFactory>
</indexConfig>
3. Set the percentage threshold for deleting from the large segments:
If mergeFactor is in the existing index config, you must drop it from the search index before you alter
the table to support automatic removal of deletes:
parallelDeleteTasks
Regulates how many tasks are created to apply deletes in parallel during soft/hard commits.
Supported for RT and NRT indexing. Specify a number greater than 0.
Leave parallelDeleteTasks at the default value, except when write load causes issues in a
mixed read/write workload. If occasional write spikes negatively impact your read performance,
set this value lower.
Default: the number of available processors
Search index schema
Search index schema reference information to use for creating and altering a search index schema:
• DataStax recommends CQL CREATE SEARCH INDEX and ALTER SEARCH INDEX SCHEMA
commands.
The schema defines the relationship between data in a table and a search index. See Creating a search index
with default values and Quick Start for CQL index management for details and examples.
A sample search index schema XML:
Sample XML
• dsetool create_core
• dsetool core_indexing_status
• dsetool get_core_config
• dsetool get_core_schema
• dsetool infer_solr_schema
• dsetool list_index_files
• dsetool read_resource
• dsetool rebuild_indexes
• dsetool reload_core
• dsetool stop_core_reindex
• dsetool unload_core
• dsetool upgrade_index_files
• dsetool write_resource
• Performance in cassandra.yaml
• Performance in dse.yaml
data_file_directories
The directory where table data is stored on disk. The database distributes data evenly across the
configured locations, subject to the granularity of the configured compaction strategy. If not set, the
default directory is $DSE_HOME/data/data.
For production, DataStax recommends RAID 0 and SSDs.
Default: - /var/lib/cassandra/data
Scheduler settings in dse.yaml
Configuration options to control the scheduling and execution of indexing checks.
ttl_index_rebuild_options
Section of options to control the schedulers in charge of querying for and removing expired records,
and the execution of the checks.
fix_rate_period
Time interval to check for expired data in seconds.
Default: 300
initial_delay
The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
max_docs_per_batch
The maximum number of documents the TTL rebuild thread checks and deletes per batch. All
documents determined to be expired are deleted from the index during each check; to avoid memory
pressure, their unique keys are retrieved and deletes are issued in batches.
Default: 4096
thread_pool_size
The maximum number of cores that can execute TTL cleanup concurrently. Set the thread_pool_size
to manage system resource consumption and prevent many search cores from executing
simultaneous TTL deletes.
Default: 1
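Assembled from the defaults listed above, the corresponding dse.yaml section looks like this sketch:

```yaml
# dse.yaml sketch built from the documented defaults
ttl_index_rebuild_options:
    fix_rate_period: 300        # seconds between checks for expired data
    initial_delay: 20           # seconds to delay the first TTL check
    max_docs_per_batch: 4096    # documents checked/deleted per batch
    thread_pool_size: 1         # cores that may run TTL cleanup concurrently
```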
Indexing settings in dse.yaml
solr_resource_upload_limit_mb
Option to disable or configure the maximum file size of the search index config or schema. Resource
files can be uploaded, but the search index config and schema are stored internally in the database
after upload.
• upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file
(search index config or schema).
Default: 10
flush_max_time_per_core
The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at
DSE Search commit time or at flush time. Expert level knowledge is required to change this value.
Always set the value reasonably high to ensure that flushing completes successfully and DSE
Search indexes stay fully in sync with the database data. If the configured value is exceeded, index updates are only
partially committed and the commit log is not truncated, which can undermine data durability.
When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely
manner. Live indexing increases the time to flush asynchronous index updates.
Default: commented out (5)
load_max_time_per_core
The maximum time, in minutes, to wait for each DSE Search index to load on startup or create/reload
operations. This advanced option should be changed only if exceptions happen during search index
loading. When not set, the default is 5 minutes.
Default: commented out (5)
enable_index_disk_failure_policy
Whether to apply the configured disk failure policy if IOExceptions occur during index update
operations.
Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
324
Using DataStax Enterprise advanced functionality
• true - apply the configured Cassandra disk failure policy to index write failures
• false - do not apply the disk failure policy to index write failures
• If enabled, the node joins the ring immediately after bootstrap and reindexing occurs
asynchronously. Do not wait for post-bootstrap reindexing so that the node is not marked down.
The dsetool ring command can be used to check the status of the reindexing.
• If disabled, the node joins the ring after reindexing the bootstrapped data.
Safety thresholds
Configure safety thresholds and fault tolerance for DSE Search with options in dse.yaml and cassandra.yaml.
Safety thresholds in cassandra.yaml
Configuration options include:
read_request_timeout_in_ms
Default: 5000. How long the coordinator waits for read operations to complete before timing them out.
Security in dse.yaml
Security options for DSE Search. See DSE Search security checklist.
solr_encryption_options
Settings to tune encryption of search indexes.
decryption_cache_offheap_allocation
Whether to allocate shared DSE Search decryption cache off JVM heap.
• true - allocate shared DSE Search decryption cache off JVM heap
• false - do not allocate shared DSE Search decryption cache off JVM heap
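A sketch of how the nested option appears in dse.yaml (the value shown is illustrative):

```yaml
# dse.yaml - search index encryption (sketch)
solr_encryption_options:
    decryption_cache_offheap_allocation: true
```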
Timeout behavior during distributed queries. This internal timeout applies to all search queries
to prevent long-running queries. The client request timeout is the maximum cumulative time (in
milliseconds) that a distributed search request waits idly for shard responses.
Default: 60000 (1 minute)
Query options in dse.yaml
Options for CQL Solr queries.
cql_solr_query_paging
• driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the
driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.
• off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging
unless:
<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler"
name="solr_query">
<lst name="defaults">
<int name="rows">10</int>
</lst>
</requestHandler>
<requestHandler class="solr.UpdateRequestHandler" name="/update"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/csv" startup="lazy"/>
<requestHandler class="solr.UpdateRequestHandler" name="/update/json" startup="lazy"/>
<requestHandler class="solr.FieldAnalysisRequestHandler" name="/analysis/field"
startup="lazy"/>
<requestHandler class="solr.DocumentAnalysisRequestHandler" name="/analysis/document"
startup="lazy"/>
<requestHandler class="solr.admin.AdminHandlers" name="/admin/"/>
<requestHandler class="solr.PingRequestHandler" name="/admin/ping">
<lst name="invariants">
<str name="qt">search</str>
<str name="q">solrpingquery</str>
</lst>
<lst name="defaults">
<str name="echoParams">all</str>
</lst>
</requestHandler>
<requestHandler class="solr.DumpRequestHandler" name="/debug/dump">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="echoHandler">true</str>
</lst>
</requestHandler>
<admin>
<defaultQuery>*:*</defaultQuery>
</admin>
</config>
<uniqueKey>(id,age)</uniqueKey>
</schema>
• dsetool
View the pending (uploaded) or active (in use) schema or config.
# dsetool get_core_config
# dsetool get_core_schema
• Solr Admin
View only the last uploaded (pending) resource.
Sample schema
The following example from Querying CQL collections uses a simple primary key. The schema version attribute
is the Solr version number for the schema syntax and semantics. In this example, version="1.5".
DSE Search indexes the id, quotes, name, and title fields.
Mapping CQL primary keys and Solr unique keys
DSE Search supports CQL tables using simple or compound primary keys.
If the table has a compound primary key or a multi-column (composite) partition key, the unique
key value is enclosed in parentheses. The schema for this kind of table requires a different syntax than the
simple primary key:
• List each compound primary key column that appears in the CQL table in the schema as a field, just like
any other column.
• Declare the unique key using the key columns enclosed in parentheses.
• Order the keys in the uniqueKey element as the keys are ordered in the CQL table.
• When using composite partition keys, do not include the extra set of parentheses in the uniqueKey.
Partition key: Simple CQL primary key
CQL syntax: CREATE TABLE ( . . . a type PRIMARY KEY, . . . );
(a is both the partition key and the primary key)
Solr uniqueKey syntax: <uniqueKey>a</uniqueKey>
Parentheses are not required for a single key.
Partition key: Compound primary key
CQL syntax: CREATE TABLE ( . . . PRIMARY KEY ( a, b, c ) );
(a is the partition key and a b c is the primary key)
Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>
Partition key: Composite partition key
CQL syntax: CREATE TABLE ( . . . PRIMARY KEY ( ( a, b ), c ) );
(a b is the partition key and a b c is the primary key)
Solr uniqueKey syntax: <uniqueKey>(a, b, c)</uniqueKey>
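As a concrete sketch of the compound-key mapping above (the table and column names are hypothetical):

```sql
-- CQL table: region is the partition key; year and order_id are
-- clustering columns, so the full primary key is (region, year, order_id)
CREATE TABLE demo.sales (
    region text,
    year int,
    order_id uuid,
    amount decimal,
    PRIMARY KEY (region, year, order_id)
);

-- Matching schema declarations: each key column is listed as a field,
-- and the uniqueKey lists the key columns, in CQL order, in parentheses:
--   <field name="region" type="StrField" indexed="true" stored="true"/>
--   <field name="year" type="TrieIntField" indexed="true" stored="true"/>
--   <field name="order_id" type="UUIDField" indexed="true" stored="true"/>
--   <uniqueKey>(region, year, order_id)</uniqueKey>
```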
• False applies the operation only to the node it was sent to. False works only when recovery=true.
Default: true
Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.
enable_tokenized_text_copy_fields:( true | false )
Whether to generate tokenized text.
Default: false
exclude_columns: col1, col2, col3, ...
A comma-separated (CSV) list of columns to exclude.
generate_DocValues_for_fields:( * | field1, field2, ... )
The fields to automatically configure DocValues in the generated search index schema. Specify '*' to
add all possible fields:
generate_DocValues_for_fields: '*'
Due to SOLR-7264, setting docValues to true on a boolean field in the Solr schema does not work. A
workaround for boolean docValues is to use 0 and 1 with a TrieIntField.
generateResources=true | false
Whether to automatically generate search index resources based on the existing CQL table metadata.
Cannot be used with schema= and solrconfig=.
Valid values:
• true - Automatically generate search index schema and configuration resources if resources do
not already exist.
spaceSavingNoJoin - Do not index a hidden primary key field. Prevents joins across cores.
spaceSavingSlowTriePrecision - Sets trie fields precisionStep to '0', allowing for greater space saving but slower querying.
rt=true
Whether to enable live indexing to increase indexing throughput. Enable live indexing on only one
search index per cluster. For example:
rt: true
default_query_field: name
auto_soft_commit_max_time: 1000
generate_DocValues_for_fields: '*'
enable_string_copy_fields: false
Use the dsetool command to generate the search index with these options to customize the config and schema
generation. Use coreOptions to specify the config.yaml file:
You can verify that DSE Search created the solrconfig and schema by reading core resources using dsetool.
Enable encryption for a new search index
Specify the class for directoryFactory to solr.EncryptedFSDirectoryFactory with coreOptionsInline:
• Converts the data into lowercase and correctly stores the lowercase data in docValues.
You cannot apply LowerCaseStrField to a table's primary key. You also cannot use any analyzers with
LowerCaseStrField.
The command creates a search index with birthplace using the LowerCaseStrField field type. The field type
is added automatically.
To view the elements in the generated schema XML, you can use a cqlsh or dsetool command.
Examples:
Output:
To add a new field to an existing index schema with the LowerCaseStrField field type, you can:
• Use an ALTER SEARCH INDEX SCHEMA command in cqlsh.
• Or display the current schema with dsetool get_core_schema; edit the XML manually; and use
dsetool write_resource to update the schema by specifying your edited schema XML. Refer to dsetool
get_core_schema and dsetool write_resource.
For example, in cqlsh, the following command adds the LowerCaseStrField field type to the new field
medicalNotes if it does not exist:
No matter which command you choose, using cqlsh or dsetool, be sure to RELOAD and REBUILD the search
index in each datacenter in the cluster.
Output:
</schema>
There is a workaround to apply LowerCaseStrField to primary key columns. To do so, use the copyField
declaration to copy the primary key field data to the new field that's defined as type LowerCaseStrField.
Example:
The search query is case insensitive. All queries are converted to lowercase and return the same result. For
example, searches for the following values return the same result:
• name
• Name
• NAME
cd installation_location &&
bin/dse cassandra -s -Ddse.solr.data.dir=My_data_dir
solr_data_dir: My_data_dir
solr_data_dir: /var/lib/cassandra/solr.data
com.datastax.bdp.search.solr.transport.protocols.admin.ReindexRequestProcessor
com.datastax.bdp.search.solr.transport.protocols.admin.CoreAdminRequestProcessor
com.datastax.bdp.search.solr.core.SolrCoreResourceManager
com.datastax.bdp.search.solr.core.CassandraResourceLoader
org.apache.solr.core.SolrCore
Indexing
com.datastax.bdp.search.solr.log.EncryptedCommitLog
com.datastax.bdp.search.solr.metrics.SolrMetricsEventListener
com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex
org.apache.lucene.store.crypto.EncryptedFSDirectory
org.apache.lucene.index.IndexWriter
org.apache.lucene.index.DocumentsWriter
org.apache.lucene.store.crypto.ThreadLocalIndexEncryptionConfiguration
org.apache.lucene.index.AutoExpungeDeletesTieredMergePolicy
Queries
com.datastax.bdp.search.solr.transport.protocols.query.ShardRequestProcessor
com.datastax.bdp.search.solr.metrics.QueryMetrics
com.datastax.bdp.search.solr.auth.DseHttpRequestAuthenticatorFactory
com.datastax.bdp.search.solr.handler.shard.modern.ModernShardHandler
com.datastax.bdp.search.solr.dht.ShardRouter
com.datastax.bdp.search.solr.transport.protocols.query.RowsRequestProcessor
com.datastax.bdp.search.solr.transport.protocols.update.AbstractUpdateCommandProcessor
org.apache.solr.search.SolrFilterCache
org.apache.solr.search.SolrIndexSearcher
org.apache.solr.handler.component.SearchHandler
org.apache.solr.core.SolrCore
org.apache.solr.handler.component.QueryComponent
/var/log/cassandra/solrvalidation.log
For example, if a node that is not running DSE Search puts a string in a date field, an exception is logged for
that column when the data is replicated to the search node.
Enabling multi-threaded queries
Multi-threaded queries are useful for a low indexing volume with longer-running queries.
Multi-threaded queries can shift the load of a query onto the CPU instead of disk reads and writes.
Benchmarking is recommended; multi-threaded queries do not always improve performance.
Use the CQL index management commands to set the number of queryExecutorThreads for the search index
config:
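For example, assuming the wiki.solr index used elsewhere in this guide (the option shortcut mirrors the queryExecutorThreads element shown below):

```sql
ALTER SEARCH INDEX CONFIG ON wiki.solr
    SET queryExecutorThreads = 4;
```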
2. To view the pending search index config in XML format, use this CQL shell command:
<config>
...
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
<filterCache class="solr.SolrFilterCache" highWaterMarkMB="2048"
lowWaterMarkMB="1024"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>true</useColdSearcher>
<maxWarmingSearchers>16</maxWarmingSearchers>
...
</query>
...
<queryExecutorThreads>4</queryExecutorThreads>
</config>
...
3. Use the RELOAD SEARCH INDEX command to apply the pending changes to the search index:
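For example (index name assumed):

```sql
RELOAD SEARCH INDEX ON wiki.solr;
```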
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
<str name="field">suggest</str>
<str name="storeDir">suggest</str>
<str name="buildOnCommit">true</str>
<float name="threshold">0.0</float>
</lst>
</searchComponent>
shard.set.cover.finder
The shard set cover finder calculates a set cover for a query and determines how one node is selected over
others for reading the search data.
The value can be one of:
• STATIC
Use: shard.set.cover.finder=STATIC
Results:
• Faster.
• For a given index, a particular coordinator accesses the same token ranges from the respective shards.
• Creates fewer token filters.
• DYNAMIC
Use: shard.set.cover.finder=DYNAMIC
Results:
• DYNAMIC is the default in DSE 6.0.
• There is no fixed distribution of shard requests for a given coordinator. For two queries, there may be two different sets of shard requests.
• Creates a large number of unique token filters, because different queries may yield shard requests that access different sets of token ranges. This scenario is often a problem, especially with vnodes, because there is a much greater number of possible combinations.
shard.shuffling.strategy
When shard.set.cover.finder=DYNAMIC, you can change the shard shuffling strategy to one of these values:
• HOST - Shards are selected based on the host that received the query.
• RANDOM - A different random set of shards is selected with each request (default).
shard.set.cover.finder.inertia
When shard.set.cover.finder=STATIC, you can change the shard cover finder inertia value. Increasing the
inertia value from the default of 1 may improve performance for clusters with more than 1 vnode and more than
20 nodes. The default is appropriate for most workloads.
Changing core properties
Changing core properties is an advanced operation that sets properties in the dse-search.properties
resource for the search index.
These example commands show how to change core properties for the demo keyspace and the health_data
table.
2. When shard.set.cover.finder=DYNAMIC, you can change the shard shuffling strategy:
Result:
shard.set.cover.finder=STATIC
Log files show the loaded DSE search properties. The dsetool list_core_properties command shows
only the state of the properties in the dse-search.properties resource.
2. Open the exclude.hosts file, and add the list of nodes to be excluded. Separate each name with a
newline character.
3. Update the list of routing endpoints on each node by calling the JMX operation refreshEndpoints() on
the com.datastax.bdp:type=ShardRouter mbean.
For the steps to accomplish this task, refer to Set the location of search indexes.
In addition, plan for sufficient memory resources and disk space to meet operational requirements. Refer to
Capacity planning for DSE Search.
iostat -x -c -d -t 1 600
IOwait measures the time over a given period that a CPU (or all CPUs) spent idle because all runnable tasks
were waiting for an IO operation to complete. While each environment is unique, a general guideline is to
check whether iowait is above 5% more than 5% of the time. If that scenario occurs, try upgrading to faster
SSD devices or tuning the machine to use less IO, and test again. Again, it is important to locate the search data
on dedicated SSDs, separate from the transactional data.
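As a rough way to apply the 5% guideline to captured iostat output, a small parser can scan the avg-cpu samples. This is an illustrative sketch, not DSE tooling; the sample text and threshold are assumptions:

```python
def max_iowait(iostat_output):
    """Return the highest %iowait seen across avg-cpu samples
    in `iostat -x -c` output."""
    lines = iostat_output.splitlines()
    worst = 0.0
    for i, line in enumerate(lines):
        if line.strip().startswith("avg-cpu"):
            # Header row: %user %nice %system %iowait %steal %idle
            header = line.split()[1:]
            values = lines[i + 1].split()
            worst = max(worst, float(dict(zip(header, values))["%iowait"]))
    return worst

sample = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.12    0.00    1.04    7.85    0.00   87.99
"""
# Flag the node if iowait exceeds the 5% guideline in this sample
print(max_iowait(sample) > 5.0)  # True
```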
Disable AIO
All DSE Search index updates first perform a read-before-write against the partition or row being indexed. This
functionality means DSE uses the core database's internal read path, which in turn uses the asynchronous I/O
(AIO) chunk cache apparatus.
If you are experiencing poor performance during search indexing, or during read or write queries of frequently
used datasets, DataStax recommends that you try the following steps. Starting in your development
environment:
1. Disable AIO.
To disable AIO, pass -Ddse.io.aio.enabled=false to DSE at startup. Once enforced, SSTables and Lucene
segments, as well as other minor off-heap elements, will reside in the OS page cache and will be managed by
the kernel.
Disabling AIO will generate a WARN entry in system.log. Example:
• Near-real-time (NRT) indexing is the default indexing mode for Apache Solr™ and Apache Lucene®.
• Live indexing, also called real-time (RT) indexing, supports searching directly against the Lucene RAM
buffer and more frequent, cheaper soft-commits, which provide earlier visibility to newly indexed data.
However, RT indexing requires a larger RAM buffer and more memory usage than an otherwise equivalent
NRT setup.
• Increase the soft commit time, which is set to 10 seconds (10000 ms) by default. For example, increase the
time to 60 seconds and then reload the search index:
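The soft commit interval is the maxTime child of the autoSoftCommit element in the search index config; a sketch of the 60-second setting:

```xml
<!-- search index config: raise soft commit from 10000 ms to 60000 ms -->
<autoSoftCommit>
    <maxTime>60000</maxTime>
</autoSoftCommit>
```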
A disadvantage of increasing the autoSoftCommit attribute is that newly updated rows take longer than the
usual 10000 ms to appear in search results.
Tune RT indexing
Live indexing reduces the time for docs to be searchable.
2. To configure live indexing, set the autoCommitTime to a value between 100-1000 ms:
Test with tuning values of 100-1000 ms. An optimal setting in this range depends on your hardware and
environment. For live indexing (RT), this refresh interval saturates at 1000 ms. A value higher than 1000
ms is not recognized.
4. If you change the heap, restart DSE to use live indexing with the changed heap size.
• ram_buffer_heap_space_in_mb: 1024
• ram_buffer_offheap_space_in_mb: 1024
Because NRT does not use offheap, these settings apply only to RT.
Adjust these settings to configure how much global memory all Solr cores use to accumulate updates before
flushing segments. Setting this value too low can induce a state of constant flushing during periods of ongoing
write activity. For NRT, these forced segment flushes will also de-schedule pending auto-soft commits to avoid
potentially flushing too many small segments.
To work around timeouts:
1. Run with a replication factor greater than 1 to ensure that replicas are always available.
3. After restarting the node, issue several match-all queries (for example, q=*:*) to warm up the filters.
4. If you're using the Java Driver, create an ad-hoc session with only the node to warm up in the white list.
Issuing many queries increases the chances that all token ranges are used.
After the uptime ramp-up period, the node starts to be hit by distributed queries. The filters are warmed up
already and timeouts should not occur.
Table compression can optimize reads
Search nodes typically engage in read-dominated tasks, so maximizing storage capacity of nodes, reducing the
volume of data on disk, and limiting disk I/O can improve performance. You can configure data compression on
a per-table basis to optimize performance of read-dominated tasks.
You can implement custom compression classes using the
org.apache.cassandra.io.compress.ICompressor interface. See CQL table properties.
By default, the parallel row resolver uses up to x threads to execute parallel reads, where x is the number of
CPUs. Each thread sequentially reads a batch of rows equal to the total requested rows divided by the number
of CPUs:
Rows read = Total requested rows / Number of CPUs
You can change the batch size per request by specifying the cassandra.readBatchSize HTTP request
parameter. Smaller batches use more parallelism, while larger batches use less.
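The batching arithmetic can be sketched as a toy model. This is illustrative Python, not DSE code; the readBatchSize parameter itself is set on the HTTP request:

```python
import os

def read_batches(total_rows, batch_size=None, cpus=None):
    """Split total_rows into the per-thread batches the parallel row
    resolver would read. By default each of the `cpus` threads reads
    total_rows / cpus rows; a smaller explicit batch size produces
    more, smaller batches and therefore more parallelism."""
    cpus = cpus or os.cpu_count() or 1
    if batch_size is None:
        batch_size = max(1, -(-total_rows // cpus))  # ceiling division
    batches = []
    remaining = total_rows
    while remaining > 0:
        step = min(batch_size, remaining)
        batches.append(step)
        remaining -= step
    return batches

print(read_batches(100, cpus=4))          # [25, 25, 25, 25]
print(read_batches(100, batch_size=10))   # ten batches of 10
```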
Changing the stack size and memtable space
Some Solr users have reported that increasing the stack size improves performance under Tomcat. To
increase the stack size, uncomment and modify the default -Xss256k setting in the cassandra-env.sh file.
Also, decreasing the memtable space to make room for Solr caches might improve performance. Modify the
• token_long
Used for filtering over token fields during query routing.
• ttl_long
Used for searching for expiring documents.
1. In the fieldType definition, set the class attribute of token_long and ttl_long to solr.TrieLongField.
2. Set the precisionStep attribute from the default 8 to another number. Choose this number based on an
understanding of its impact. Usually, a smaller precision step increases the index size and range query
speed, while a larger precision step reduces index size, but potentially reduces range query speed.
The following snippet of the schema.xml shows an example of the required configuration of both field types:
DataStax Enterprise ignores one or both of these field type definitions and uses the default precision step if you
make any of these mistakes:
• The field type is defined using a name other than token_long or ttl_long.
• The precision step value is not a number. DataStax Enterprise logs a warning.
The definition of a fieldType alone sets up the special field. You do not need to use token_long or ttl_long
types as fields in the <fields> tag.
Improving read performance
You can increase DSE Search read performance by increasing the number of replicas. You define a strategy
class, the names of datacenters, and the number of replicas. For example, you can add replicas using the
NetworkTopologyStrategy replica placement strategy.
1. Examine your datacenter and nodes. The following example shows two datacenters with one node in each
datacenter, which is a suboptimal configuration.
Datacenter: DC1
==========
Address Rack Status State Load Owns
Token
Datacenter: DC2
==========
Address Rack Status State Load Owns
Token
The datacenter names, DC1 and DC2 in this example, must match the datacenter name configured for
your snitch.
2. Start CQL on the command line and create a keyspace that specifies the number of replicas.
To improve read performance, increase the number of replicas in the datacenters. For example, at least
three replicas in DC1 and three in DC2.
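For example, a keyspace with three replicas in each datacenter (the keyspace name is illustrative):

```sql
CREATE KEYSPACE wiki
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'DC1': 3, 'DC2': 3};
```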
These two activities compete for resources, so proper resource allocation is critical to maximize efficiency for
initial data load.
Recommendations
• For maximum throughput, store the search index data and DataStax Enterprise (Cassandra) data on
separate physical disks.
If you are unable to use separate disks, DataStax recommends that SSDs have a minimum of 500 MB/s
read/write speeds (bandwidth).
2. Use the CQL CREATE SEARCH INDEX command to create search indexes.
4. Load data into the database using best practices for data loading. For example, load data with the driver
with the consistency level at LOCAL_ONE (CL.LOCAL_ONE) and a sufficiently high write timeout.
After data loading is completed, there might be lag time because indexing is asynchronous.
5. Verify the indexing QueueSize with the IndexPool MBean. After the index queue size has receded, run this
CQL query to verify that the number of records is as expected:
• If dropped mutations exist in the nodetool tpstats output for some nodes, and OpsCenter repair service is
not enabled, run manual repair on those nodes.
• If dropped mutations do not exist, check the system.log and the Solr validation log for indexing errors.
1. Is the node active? Preference is given to active nodes.
3. Node health rank, an exponentially increasing number between 0 and 1, describes node health; if
all the previous criteria are equal, a node with a better score is chosen first. This node health rank value is
exposed as a JMX metric under ShardRouter.
Node health rank is a comparison of uptime and dropped mutations:
where:
• drop_rate = the rate of dropped mutations per minute over a sliding window of configurable length.
To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.
A high dropped-mutation rate indicates an overloaded node, for example, from database insertions and
updates.
• uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent
downtime.
The ShardRouter MBean, not present in open source Solr, provides information about how DSE search routes
queries.
Deleting a search index
To delete a search index:
1. Launch cqlsh and execute the CQL command to drop the search index:
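For example, for the wiki.solr index whose files are checked in the next step:

```sql
DROP SEARCH INDEX ON wiki.solr;
```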
2. Exit cqlsh and verify that the files are deleted from the file system:
ls /var/lib/cassandra/data/solr.data/wiki.solr/index
INDEXING / REINDEXING -
INFO SolrSecondaryIndex plugin initializer. 2013-08-26 19:25:43,347
SolrSecondaryIndex.java (line 403) Reindexing 439171 keys for core wiki.solr
FINISH INDEXING -
INFO Thread-38 2013-08-26 19:38:10,701 SecondaryIndexManager.java (line 156) Index build
of wiki.solr complete
1. Drain the node to ensure that the search indexes are in sync with their backing tables. This command
forces a memtable flush that forces a Solr hard commit:
$ nodetool drain
3. Manually back up your data directories. The default location for index files is /var/lib/cassandra/data/
solr.data.
1. Use the DataStax Enterprise restore steps with indexing enabled and let the data write as data is coming
in.
Use the OpsCenter Backup Service.
Using custom resources is not supported by the CQL CREATE SEARCH INDEX command.
Index resources are stored internally in the database, not in the file system. The schema and configuration
resources are persisted in the solr_admin.solr_resources database table.
For example:
Solr interfaces
core table SELECT Query core and all remaining admin query
operations on core
Permissions are inherited. Granting permissions on a keyspace allows users with that role to access all
tables in the keyspace.
Examples
To grant permission to read resources:
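A sketch of such a grant, assuming a hypothetical search_reader role; as noted above, the schema and config resources are persisted in the solr_admin.solr_resources table:

```sql
GRANT SELECT ON TABLE solr_admin.solr_resources TO search_reader;
```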
• The default native_transport_address value localhost enables Tomcat to listen only on
localhost.
To change the IP address for client connections to DSE Search using the HTTP and Solr Admin
interfaces, either change native_transport_address in the cassandra.yaml file or create a
Tomcat connector.
Create a Tomcat connector:
<Connector
port="PORT"
protocol="HTTP/1.1"
address="IP_ADDRESS"
connectionTimeout="20000"
redirectPort="8443"
/>
2. For advanced users only: In the Tomcat server.xml file, specify a client connection port other than the
default port 8983. However, when specifying a non-default connection port, the automatic SSL connection
configuration performed by DataStax Enterprise is not done. You must provide the valid connector
configuration, including keystore path and password. See the DataStax Support article Configuring the
DSE Solr HTTP/HTTPS port.
When the plugin JAR file is not in the directory that is defined by the <lib> property, attempts to deploy custom
Solr libraries in DataStax Enterprise fail with java.lang.ClassNotFoundException and an error in the
system.log like this:
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
...
Workaround
Using the class in this example with the JAR file name com.boogle.search.CustomQParserPlugin-1.0.jar,
follow these steps to get the custom plugin working on all DSE Search nodes.
2. Place custom code or Solr contrib modules in the Solr library directories.
3. Deploy the JAR file on all DSE Search nodes in the cluster in the appropriate lib/ directory.
For example, package installations: /usr/share/dse/solr/lib/
com.boogle.search.CustomQParserPlugin-1.0.jar
DSE custom URP provided similar functionality to the Solr URP chain, but appeared as a plugin to Solr. The
classic URP is invoked when updating a document using HTTP and the custom URP is invoked when updating
a table using DSE. If both classic and custom URPs are configured, the classic version is executed first. The
custom URP chain and the FIT API work with CQL and HTTP updates.
Examples are provided for using the field input/output transformer API and the deprecated custom URP.
Field input/output (FIT) transformer API
Use the field input/output transformer API as an option to the input/output transformer support in Apache
Solr™. An Introduction to DSE Field Transformers provides details on the transformer classes.
DSE Search includes the released version of a plugin API for Solr updates and a plugin to the
CassandraDocumentReader. The plugin API transforms data from the secondary indexing API before data
is submitted. The plugin to the CassandraDocumentReader transforms the results data from the database to
DSE Search.
Using the API, applications can tweak a Solr Document before it is mapped and indexed according to the
schema.xml. The API is a counterpart to the input/output transformer support in Solr.
The field input transformer (FIT) requires:
• name="dse"
1. Define the plugin in the top level <config> element in the solrconfig.xml for a table (search core).
<config>
...
<fieldInputTransformer name="dse" class="
com.datastax.bdp.cassandra.index.solr.functional.
BinaryFieldInputTransformer">
</fieldInputTransformer>
2. Write a transformer class, similar to the following reference implementation, that transforms the data as needed.
3. Export the class to a JAR file, and place the JAR in the Solr library directory on each node, as described above.
package com.datastax.bdp.search.solr.functional;

import java.io.IOException;
import org.apache.commons.codec.binary.Hex;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.document.Document;
import org.apache.solr.core.SolrCore;
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import com.datastax.bdp.search.solr.FieldOutputTransformer;

// Decode the hex-encoded, compressed field value and split it into its parts.
@Override
public void addFieldToDocument(SolrCore core,
                               IndexSchema schema,
                               String key,
                               Document doc,
                               SchemaField fieldInfo,
                               String fieldValue,
                               DocumentHelper helper)
    throws IOException
{
    try
    {
        byte[] raw = Hex.decodeHex(fieldValue.toCharArray());
        byte[] decomp = DSP1493Test.decompress(raw);
        String str = new String(decomp, "UTF-8");
        String[] arr = StringUtils.split(str, ",");
        String binary_name = arr[0];
        String binary_type = arr[1];
        String binary_title = arr[2];
        // ... remainder of the method elided in the original listing

package com.datastax.bdp.search.solr.functional;

import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.StoredFieldVisitor;
import com.datastax.bdp.search.solr.FieldOutputTransformer;

// ... elided in the original listing; the decoded parts are handed to the
// stored-field visitor:
        visitor.stringField(binary_name_fi, binary_name);
        visitor.stringField(binary_type_fi, binary_type);
        visitor.stringField(binary_title_fi, binary_title);
    }
}
The DSE custom URP implementation is deprecated, and a custom URP is almost always unnecessary.
Instead, DataStax recommends using the field input/output (FIT) transformer API.
Using the API, applications can tweak a search document before it is mapped and indexed according to the
index schema.
The field input transformer (FIT) requires a trailing Z for date field values.
1. Define the custom URP chain in the solrconfig.xml for the search core. For example:
<dseUpdateRequestProcessorChain name="dse">
  <processor
    class="com.datastax.bdp.search.solr.functional.DSEUpdateRequestProcessorFactoryExample">
  </processor>
</dseUpdateRequestProcessorChain>
2. Write a class to use the custom URP that extends the Solr UpdateRequestProcessor. For example:
package com.datastax.bdp.search.solr.functional;

import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.CommitUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import com.datastax.bdp.search.solr.handler.update.CassandraAddUpdateCommand;
import com.datastax.bdp.search.solr.handler.update.CassandraCommitUpdateCommand;

// ... class body elided in the original listing; the processor ends by
// delegating to the next processor in the chain:
        super.processAdd(cmd);
    }
3. Export the class to a JAR, and place the JAR in this location:
package com.datastax.bdp.search.solr.functional;

import com.datastax.bdp.search.solr.handler.update.DSEUpdateProcessorFactory;
import org.apache.solr.core.SolrCore;
import org.apache.solr.update.processor.UpdateRequestProcessor;

// ... factory class body elided in the original listing; the factory
// identifies itself by returning its own class:
    {
        return getClass();
    }
1. Implement a custom field type class, similar to the following reference implementation.
2. Export the class to a JAR, and place the JAR in this location:
Reference implementation
Here is an example of a custom field type class:
package com.datastax.bdp.search.solr.functional;

import com.datastax.bdp.search.solr.CustomFieldType;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexableField;
import org.apache.solr.schema.FieldType;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.StrField;
import org.apache.solr.schema.TrieField;

@Override
public FieldType getStoredFieldType()
{
    return new StrField();
}

@Override
public boolean multiValuedFieldCache()
{
    return true;
}

@Override
public List<IndexableField> createFields(SchemaField sf, Object value)
{
    // Index one value per whitespace-separated token of the raw field value.
    String[] values = ((String) value).split(" ");
    List<IndexableField> fields = new ArrayList<IndexableField>();
    for (String v : values)
    {
        fields.add(createField(sf, v));
    }
    return fields;
}

@Override
public String toInternal(String value)
{
    return value;
}

@Override
public String toExternal(IndexableField f)
{
    return f.stringValue();
}
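In isolation, the tokenizing behavior of createFields above amounts to splitting the raw value on spaces, so each token becomes a separately indexed value. A minimal standalone sketch of that behavior (plain Java, no Solr dependencies; class and method names are illustrative):

```java
// Standalone illustration of the createFields tokenization: the custom field
// type indexes one value per whitespace-separated token of the raw field.
public class MultiValueSplit {
    static String[] tokenize(String value) {
        return value.split(" ");
    }

    public static void main(String[] args) {
        // "red green blue" is indexed as three separate values.
        String[] tokens = tokenize("red green blue");
        System.out.println(tokens.length);
    }
}
```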
Deleting by query
Delete by query no longer accepts wildcard queries, including queries that match all documents (for example,
<delete><query>*:*</query></delete>). Instead, use the CQL TRUNCATE command.
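For example, to remove all rows from a table, and with them the corresponding search index documents, a TRUNCATE along these lines can be used (keyspace and table names are hypothetical):

```
TRUNCATE demo.health_data;
```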
Delete by query supports deleting data based on search criteria. After you issue a delete by query, documents
start getting deleted immediately and deletions continue until all documents are removed. For example, you can
delete the data that you inserted using this command:
Setting the &allowPartialDeletes parameter to false (the default) prevents deletes when a node is down. Setting
&allowPartialDeletes to true causes the delete to fail only when a node is down and the delete cannot meet a
consistency level of QUORUM. Delete-by-queries using *:* are an exception to these rules: these queries issue a
truncate, which requires all nodes to be up in order to succeed.
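As a sketch, assuming a search core named demo.health_data on a local node, a delete-by-query with this parameter can be posted to the standard Solr update handler (host, core name, and query are hypothetical):

```shell
curl -X POST "http://localhost:8983/solr/demo.health_data/update?allowPartialDeletes=true" \
  -H "Content-Type: text/xml" \
  --data-binary "<delete><query>status:expired</query></delete>"
```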
Best practices
DataStax recommends that delete-by-query operations touch only columns that are never updated, such as the
elements of a compound primary key, which cannot change.
Delete by query problem example
The following workflow demonstrates why not following this best practice is problematic:
• When a search coordinator receives a delete-by-query request, the request is forwarded to every node in
the search datacenter.
• At each search node, the query is run locally to identify the candidates for deletion, and then database
deletes are issued at the LOCAL_ONE consistency level for each of those candidates.
• When those database deletes are applied at the appropriate nodes across the cluster, the records are
deleted from the search index.
For example, in a certificates table, each certificate has a date of issue that is a timestamp. When a certificate
is renewed, the new issue date is written to the row, and that write is propagated to all replicas. In this example,
assume that one replica misses the write. If you run a periodic delete-by-query that removes all of the certificates
with issue dates older than a specified date, unintended consequences occur when the replica that missed
the "certificate renewal" write matches the delete query. The certificate is deleted across the entire
cluster, in all datacenters, making the delete unrecoverable.
The following example obtains the segment information for the Solr GeoNames collection. The example
specifies the IP address and port, and specifies that the output is returned in JSON format and indented.
$ curl "http://127.0.0.1:8983/solr/solr.geonames/admin/segments?wt=json&indent=true"
The following output shows the segment information, which is truncated for brevity:
{
"responseHeader":{
"status":0,
"QTime":3},
"segments":{
"_0":{
"name":"_0",
"delCount":5256,
"sizeInBytes":1843747,
"size":6439,
"sizeMB":1.7583341598510742,
"delRatio":0.816275819226588,
"age":"2017-06-15T15:21:09.730Z",
"source":"flush"},
"_1":{
"name":"_1",
"delCount":5351,
"sizeInBytes":1881895,
"size":6554,
"sizeMB":1.7947149276733398,
"delRatio":0.816447970704913,
"age":"2017-06-15T15:21:09.786Z",
"source":"flush"},
"_3":{
"name":"_3",
"delCount":5553,
"sizeInBytes":1952348,
"size":6850,
"sizeMB":1.8619041442871094,
"delRatio":0.8106569343065694,
"age":"2017-06-15T15:21:09.790Z",
"source":"flush"},
...
The following table describes the segment properties in the previous output:
Property Description
delRatio Delete ratio: the ratio of the segment's deleted-document count to the total number of documents in the segment
source Segment source; flush sends the recent index changes to stable storage
For more information, see the Apache Solr online reference guide located at https://lucene.apache.org/solr/
guide.
HTTP API SolrJ and other Solr clients
Apache Solr™ clients work with DataStax Enterprise. If you have an existing Solr application, you can create a
schema, then import your data and query using your existing Solr tools. The Wikipedia demo is built and queried
using SolrJ. The query is done using pure Ajax. No DataStax Enterprise API is used for the demo.
DataStax has extended SolrJ to protect internal Solr communication and HTTP access using SSL. You can also
use SolrJ to change the consistency level of the write in the database on the client side.
DSE Graph
DataStax Enterprise (DSE) Graph is a distributed graph database that is optimized for fast data storage and
traversals, zero downtime, and analysis of complex, disparate, and related datasets in real time. It is capable of
scaling to massive datasets and executing both transactional and analytical workloads. DSE Graph incorporates
all of the enterprise-class functionality found in DataStax Enterprise, including advanced security protection,
built-in DSE Analytics and DSE Search functionality, visual management and monitoring, and development tools
including DataStax Studio.
What is a graph database?
A graph database is a database that uses graph structures to store data along with the data's relationships.
Common use cases include fraud prevention, Customer 360, Internet of Things (IoT) predictive maintenance,
and recommendation engines. Graph databases use a data model that is as simple as a whiteboard drawing.
Graph databases employ vertices, edges, and properties as described in Data modeling.
• Support for high-volume, concurrent transactions and operational graph processing (OLTP): The
transactional capacity of DSE Graph scales with the size of the cluster and answers complex traversal
queries on huge graphs in milliseconds.
• Integration with DSE Search: Integrates with DSE Search for efficient indexing that supports geographical
and numeric range search, as well as full-text search, for vertices and edges in large graphs.
• Native support for Apache TinkerPop and the Gremlin language: Uses the popular property graph data
model exposed by Apache TinkerPop and the graph traversal query language Gremlin.
• Vertex-centric indexes provide optimal querying: Allows optimized deep traversal by reducing the search
space quickly.
• Optimized disk representation: Allows for efficient use of storage and speed of access.
• Integrated with the DSE database to take advantage of the DSE database's features
• DSE Graph Loader is a command line utility that supports loading the following formats: CSV, text files,
GraphSON, GraphML, Gryo, and queries from JDBC-compatible databases.
• DataStax Studio and the Gremlin console load data using graph traversals.
• DseGraphFrame, a framework for the Spark API, loads data to DSE Graph directly or with transformations.
Best practices start with data modeling before inserting data. The paradigm shift between relational and graph
databases requires careful analysis of data and data modeling before importing and querying data in a graph
database. DSE Graph data modeling provides information and examples.
• DataStax Studio, a web-based interactive developer tool with notebooks for running Gremlin commands and
visualizing graphs
graph partitioning
A process that consists of dividing a graph into components, such that the components are of about the
same size and there are few connections between the components.
graph traversal
An algorithmic walk across the elements of a graph according to the referential structure explicit within
the graph data structure.
incident edge
An edge incident to a particular vertex, meaning that the edge and vertex touch.
index
An index is a data structure that allows for the fast retrieval of elements by a particular key-value pair.
meta-property
A property that describes some attribute of another property.
order
The number of vertices in a graph.
partitioned vertex
Used for vertices that have a very large number of edges, a partitioned vertex consists of a portion of a
vertex's data that results from dividing the vertex into smaller components for graph database storage.
(Experimental)
property
A key-value pair that describes some attribute of either a vertex or an edge. Property key is used to
describe the key in the key-value pair. All properties are global in DSE Graph, meaning that a property
can be used for any vertex. For example, "name" can be used for all vertices in a graph.
traversal source
A domain specific language (DSL) that specifies the traversal methods used by a traversal.
undirected graph
A set of vertices and a set of edges (unordered pairs of vertices).
vertex-centric index
A local index structure built per vertex.
vertex
A vertex is the fundamental unit of which graphs are formed. A vertex can also be described as an
object that has incoming and outgoing edges.
vertex degree
The number of edges incident to a vertex.
DSE Graph Operations
Number of nodes   Replication factor   System replication factor
2                 2                    2
3                 3                    3
4                 3                    4
5 or greater      3                    5
For more information, see the Graph System API: replication factor and system replication factor.
consistency_mode, datacenter_id, read_consistency, and write_consistency
Consistency level in DSE Graph is controlled for both graph operation and DSE database operations. The
consistency_mode setting configures graph operations, and read_consistency and write_consistency
settings configure the consistency level of DSE database read and write operations within a graph
transaction.
The consistency_mode (default: GLOBAL) is appropriate for user-defined vertex ids. If auto-generated vertex
ids are used, this setting can be changed to DC_LOCAL, with a concurrent change made to the datacenter_id
setting. Both consistency_mode and datacenter_id must be configured on every node in the cluster. The
datacenter_id setting is ignored if consistency_mode is set to GLOBAL.
These options must be set to the same value in the dse.yaml file on every node in a cluster, and will not
be effective if set while the cluster is running.
Gremlin queries execute CQL commands to insert, read, and update graph data via traversals, and so the
DSE database consistency level settings can affect the execution of graph operations. The consistency
level for reads or writes can generally be set per graph with the read_consistency (default: ONE) and
write_consistency (default: LOCAL_QUORUM) settings for user-defined vertex ids. If a search index is
used in a graph traversal, the read_consistency will be set to LOCAL_ONE in a multiple datacenter cluster.
The options are set with the Schema API.
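As a sketch, these per-graph options can be adjusted through the Schema API's option mechanism (the consistency values shown are illustrative):

```groovy
// Raise the read consistency for the default transaction group of this graph.
schema.config().option('graph.tx_groups.default.read_consistency').set('LOCAL_QUORUM')
// Inspect the current write consistency.
schema.config().option('graph.tx_groups.default.write_consistency').get()
```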
schema_mode
To access data, two configuration items are important: schema_mode and allow_scan.
The schema_mode setting has two choices that identify whether automatic schema creation is allowed or not:
• Development: allows loading graph data before explicitly specifying a graph schema through the Graph
Schema API
• Production (default): requires an explicit graph schema prior to loading graph data
The schema_mode setting has a hard-coded default value of Production, which can be overridden by either:
When exploring data to design your graph application, setting schema_mode: Development can be beneficial
in helping you to discover the graph schema that you may want to use. However, setting schema_mode:
Production is important once development is complete, to prevent random schema creation.
The default settings for schema_mode and allow_scan are set for production, not development, to ensure
out-of-the-box operation conforms to the more restrictive environment.
Three useful commands are available for discovering the current value of these two settings:
• schema.getEffectiveSchemaMode(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.
• schema.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), and
graph-level setting that may have been set.
• graph.getEffectiveAllowScan(): Checks the hard-coded value, dse.yaml value (if specified), graph-
level setting that may have been set, and transaction-level setting that may have been set.
1. blacklist_supers, including all classes that implement or extend the listed items
5. whitelist_supers, including all classes that implement or extend the listed items
Any types not specified in the whitelist are blocked by default. If an item is blacklisted, it cannot be placed in
the whitelist unless it is removed from the blacklist; otherwise, an error occurs and the item is blocked.
Two classes are hard-coded as blacklisted and cannot be whitelisted:
• java.lang.System: All methods other than currentTimeMillis and nanoTime are blocked (blacklisted).
An example of possible whitelisted and blacklisted items in the gremlin_server section of the dse.yaml file:
gremlin_server:
port: 8182
threadPoolWorker: 2
gremlinPool: 0
scriptEngines:
gremlin-groovy:
config:
# sandbox_enabled: false
sandbox_rules:
whitelist_packages:
- org.apache.tinkerpop.gremlin.process
- java.nio
whitelist_types:
- java.lang.String
- java.lang.Boolean
- com.datastax.bdp.graph.spark.SparkSnapshotBuilderImpl
- com.datastax.dse.graph.api.predicates.Search
whitelist_supers:
- groovy.lang.Script
- java.lang.Number
- java.util.Map
- org.apache.tinkerpop.gremlin.process.computer.GraphComputer
blacklist_packages:
- java.io
- org.apache.tinkerpop.gremlin.structure.io
- org.apache.tinkerpop.gremlin.groovy.jsr223
- java.nio.channels
The Fluent API restricts the allowable operations to secure execution, but uses the sandbox to enable lambda
functions.
Authentication, authorization, and encryption
DSE can authenticate and authorize user access, encrypt stored data, and secure the Gremlin console with
SSL, scoped to graphs or to Graph vertex labels, as applicable.
DSE Graph security is managed by DSE security. As noted in this topic, you can modify the Graph Sandbox
by customizing the gremlin-server: key of the dse.yaml file.
To configure the DSE Graph Gremlin console connection to the Gremlin Server, customize the remote.yaml
file for your environment.
DSE Graph also supports auditing using DSE auditing; for details, refer to Setting up database auditing.
Restrict lambda
Lambda restriction is enabled by default to block arbitrary code execution in Gremlin traversals. Most
applications should not require user-defined lambda functions. If lambda functions are required, disable
lambda restrictions using the Schema API to change the restrict_lambda (default: true) option.
See Apache TinkerPop documentation for more information on lambda functions.
allow_scan
• TRUE: allows any graph query to do full scans of the cluster, similar to ALLOW FILTERING in CQL
queries. Although useful during development, allowing full scan can result in queries that do costly linear
scans over one or more tables.
• FALSE (default): will not execute a query if restrictions to a subset of the entire cluster’s data are not
included
The allow_scan setting has a hard-coded default value of FALSE, which can be overridden to TRUE
by one of the following actions:
When exploring data to design your graph application, setting allow_scan: true allows you to fully explore
and visualize the relationships in small test datasets with very broad queries like g.V(). Be aware, however,
that traversals that depend on full scans take too long to execute with large production-size datasets; once
development is complete, allow_scan: false is the appropriate setting.
cache
Caching can use additional memory to store intermediary results, and improve the performance of DSE
Graph by shortening the time to complete queries. DSE Graph has two caches:
• adjacency cache: stores the properties of vertices and the properties of those vertices' incident edges
• index cache: stores the results of graph traversals that include a global index, such as a hasLabel() or
has() step
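For instance, a traversal such as the following can be served from the index cache, because the hasLabel() and has() steps are answered from a global index (the label and property shown are hypothetical):

```groovy
// Filters on a global index; repeated runs may be served from the index cache.
g.V().hasLabel('reviewer').has('status', 'active')
```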
Caching is enabled by default; the Schema API setting cache (default: true) can be used to disable caching.
In addition, both adjacency cache and index cache have settings that can be modified:
Table 20: DSE Graph cache
Cache setting Default Location Description
vertex_cache_size 10000l Set with Schema API. Maximum size of transaction-level cache
of recently-used vertices.
Timeouts
Timeout settings can cause failure of DSE Graph in a variety of ways, both client-side and server-side. On the
client-side, commands from the Gremlin console can time out before reaching the Gremlin server. Issuing the
command :remote config timeout none in the Gremlin console allows the default maximum timeout of 3
minutes to be overridden with no time limit. Any request typed into the Gremlin console is sent to the Gremlin
Server, and the console waits for a response before it aborts the request and returns control to the user. If the
timeout is changed to none, the request will never timeout. This can be useful if the time to send a request to
the server and get a return is taking longer than the default timeout, for complex traversals or large datasets.
On the server-side, the cluster-wide timeout settings, realtime_evaluation_timeout_in_seconds (default:
30 seconds) or analytic_evaluation_timeout_in_minutes (default: 1008 minutes), are the maximum
time to wait for a traversal to evaluate for OLTP or OLAP traversals, respectively. These settings are found in
the dse.yaml file. If the timeout behavior for traversal evaluation needs to be overridden for a particular graph,
evaluation_timeout can be set on a graph-by-graph basis, to override either the OLTP or OLAP traversal
evaluation timeout. If complex traversals are timing out during execution, changing an appropriate timeout
setting should fix the error.
An additional server-side setting that can be adjusted in the dse.yaml file is
schema_agreement_timeout_in_ms (30 seconds), the maximum time to wait for schema versions to agree
across a cluster when making schema changes. If a large schema is submitted to a cluster, especially with
indexes defined, this setting may need adjustment before data is submitted to the graph.
Finally, in the dse.yaml file, system_evaluation_timeout_in_seconds (default: 180 seconds) is defined as
the maximum time to wait for a graph system request to evaluate. Creating or dropping a graph is a system
request affected by this setting, which does not interact with the other timeout options.
Setting                                  Default       Description
:remote config timeout none              3 minutes     Lengthen if command transit from the Gremlin console to the Gremlin Server is timing out.
analytic_evaluation_timeout_in_minutes   1008 minutes  Lengthen if OLAP traversal evaluation is timing out.
system_evaluation_timeout_in_seconds     180 seconds   Lengthen if graph system requests are not completing.
schema.config().describe()
graph.tx_groups.default.write_consistency: ALL
graph.tx_groups.default.read_consistency: QUORUM
schema.config().option('graph.tx_groups.default.write_consistency').get()
ALL
schema.config().option('graph.tx_groups.default.write_consistency').set('ALL')
null
• To retrieve all traversal sources that have been set, use the get() command with the traversal source
type option:
schema.config().option('graph.traversal_sources.*.type').get()
REAL_TIME
schema.config().option("graph.traversal_sources.g.evaluation_timeout").set('PT2H')
The timeout values can also be entered in seconds or minutes, as appropriate, using set('1500
ms'), for example.
Setting a timeout value greater than 1095 days (the maximum integer) can exceed the limit of a graph
session. Starting a new session and setting the timeout to a lower value can recover access to a hung
session. This caution applies to all timeouts: evaluation_timeout, system_evaluation_timeout,
analytic_evaluation_timeout, and realtime_evaluation_timeout.
PT2H
• Settings can also be set while creating a new graph. For instance, replication for graph inherits DSE
database defaults, so the replication factor is set to 1 and the class is SimpleStrategy. As with the DSE
database, the replication factor for graph should be set before adding data.
system.graph('gizmo').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
ifNotExists().create()
• Graph also creates a keyspace for storing graph variables in DSE tables. This keyspace holds essential
information, so the replication factor should be set to something higher than one replica to ensure no loss.
gremlin> system.graph('gizmo').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3 }").
ifNotExists().create()
system.graph('food2').
replication("{'class' : 'SimpleStrategy', 'replication_factor' : 1 }").
systemReplication("{'class' : 'SimpleStrategy', 'replication_factor' : 1 }").
option("graph.schema_mode").set("Development").
option("graph.allow_scan").set("false").
option("graph.default_property_key_cardinality").set("multiple").
option("graph.tx_groups.*.write_consistency").set("ALL").
create()
• The allow_scan option can be set at either a single graph level or as shown here, for all actions within a
transaction made on a single node. This setting can be useful if a quorum cannot be mustered for writing
the option change to the system table.
graph.tx().config().option("allow_scan", true).open()
null
Web-based notebook-style visualization tool. Currently supports Markdown and Gremlin. Includes a
variety of list and graph functions.
DSE OpsCenter
Visual management and monitoring tool.
• Start the Gremlin console using the dse command and passing the additional command gremlin-console:
$ bin/dse gremlin-console
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
Three plugins are activated by default, as shown. The Gremlin Server plugin, tinkerpop.server, is started
so that commands can be issued to DSE Graph. The utilities plugin, tinkerpop.utilities, provides
various functions, helper methods, and imports of external classes that are useful in the Gremlin console.
TinkerGraph, an in-memory graph that is used as an intermediary for some graph operations, is started
with tinkerpop.tinkergraph. The Gremlin console automatically connects to the remote Gremlin
Server.
The Gremlin console packaged with DataStax Enterprise does not allow plugin installation like the
Gremlin console packaged with Apache TinkerPop.
$ bin/dse gremlin-console -h
Use -V to display all lines when loading a file, to discover which line of code causes an error.
• Run the Gremlin console with the host:port option to specify a specific host and port:
• Run Gremlin console with the -e flag to execute one or more scripts:
If the scripts run successfully, the command will return with the prompt after execution. If errors occur,
the standard output will show the errors.
• If you prefer to have Gremlin console open at the script completion, run Gremlin console with the -i flag
instead of the -e flag:
If the scripts run successfully, the command will return with the Gremlin console prompt after execution.
If errors occur, the console will show the errors.
• Discover all Gremlin console commands with help. Console commands are not Gremlin language
commands, but rather commands issued to the Gremlin console for shell functionality. The Gremlin console
is based on the Groovy shell.
:help
Available commands:
:help (:h ) Display this help message
? (:? ) Alias to: :help
:exit (:x ) Exit the shell
:quit (:q ) Alias to: :exit
import (:i ) Import a class into the namespace
:display (:d ) Display the current buffer
:clear (:c ) Clear the buffer and reset the prompt counter.
:show (:S ) Show variables, classes or imports
:inspect (:n ) Inspect a variable or the last result with the GUI object
browser
:purge (:p ) Purge variables, classes, imports or preferences
:edit (:e ) Edit the current buffer
:load (:l ) Load a file or URL into the buffer
. (:. ) Alias to: :load
:save (:s ) Save the current buffer to a file
:record (:r ) Record the current session to a file
:history (:H ) Display, manage and recall edit-line history
:alias (:a ) Create an alias
:register (:rc ) Registers a new command with the shell
:doc (:D ) Opens a browser window displaying the doc for the argument
:set (:= ) Set (or list) preferences
:uninstall (:- ) Uninstall a Maven library and its dependencies from the
Gremlin Console
:install (:+ ) Install a Maven library and its dependencies into the Gremlin
Console
:plugin (:pin) Manage plugins for the Console
:remote (:rem) Define a remote connection
:submit (:> ) Send a Gremlin script to Gremlin Server
The Gremlin Console provides code help via auto-complete functionality, using the <TAB> key to trigger a
list of possible options.
:install and :plugin should not be used with DSE Graph. These commands will result in Gremlin
console errors.
addEdge
Synopsis
Description
Edge data is inserted using addEdge. A previously created edge label must be specified. An edge_id may be
specified to upsert data for an edge with multiple cardinality, preventing the creation of a new edge. Property
key-value pairs may optionally be specified.
Examples
Create an edge with an edge label rated between the vertices johnDoe and beefBourguignon with the
properties timestamp, stars, and comment.
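A sketch of such a statement, assuming johnDoe and beefBourguignon are variables already bound to the corresponding vertices (the property values are illustrative):

```groovy
johnDoe.addEdge('rated', beefBourguignon,
    'timestamp', '2017-09-01T00:00:00.00Z',
    'stars', 5,
    'comment', 'Pretty tasty!')
```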
Update an edge with an edge label created between the vertices juliaChild and beefBourguignon, specifying
the edge with an edge id of 2c85fabd-7c49-4b28-91a7-ca72ae53fd39, and a property createDate of
2017-08-22:
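A sketch of the update, assuming juliaChild and beefBourguignon are bound vertex variables:

```groovy
// Convert the string to a UUID before using it as the edge id
uuid = java.util.UUID.fromString('2c85fabd-7c49-4b28-91a7-ca72ae53fd39')
juliaChild.addEdge('created', beefBourguignon, T.id, uuid, 'createDate', '2017-08-22')
```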
Note that a conversion function must be used to convert a string to the UUID. T.id is a literal that must be
included in the statement.
addVertex
Synopsis
Description
Vertex data is inserted using addVertex. A previously created vertex label must be specified.
Examples
Create a vertex with a vertex label reviewer with the properties location and status.
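A sketch of such a statement (the property values are illustrative):

```groovy
graph.addVertex(label, 'reviewer', 'location', 'Bayonne, France', 'status', 'active')
```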
io
Synopsis
Description
Graph data is written to a file or read from a file using io. The file to read must be located on a DSE cluster
node, and the written file will be created on the DSE cluster node on which the command is run.
Examples
Write the graph data to a file using the Gryo format:
graph.io(gryo()).writeGraph('/tmp/test.gryo')
Read the graph data from a file using the Gryo format:
graph.io(gryo()).readGraph('/tmp/test.gryo')
This method of reading a graph is not recommended and will not work with graphs larger than 10,000
vertices or edges. DSE Graph Loader is a better choice in production. Additionally, a schema setting
may need modification for this method to work:
schema.config().option("tx_autostart").set(true)
property
Synopsis
Description
Property data is inserted using property. Property key-value pairs are specified. A property_id may be
specified to upsert data for a property with multiple cardinality, preventing the creation of a new property.
Examples
Create a property with values for gender and nickname.
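A sketch, assuming juliaChild is a variable already bound to an existing vertex (the values are illustrative):

```groovy
juliaChild.property('gender', 'F')
juliaChild.property('nickname', 'Julia')
```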
Update the property gender for the vertex juliaChild specifying a property with a property id of
2c85fabd-7c49-4b28-91a7-ca72ae53fd39:
uuid = java.util.UUID.fromString('2c85fabd-7c49-4b28-91a7-ca72ae53fd39')
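A sketch of the full statement; the placement of T.id and the UUID after the property value is an assumption, following the addEdge pattern:

```groovy
juliaChild.property('gender', 'F', T.id, uuid)
```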
Note that a conversion function must be used to convert a string to the UUID. T.id is a literal that must be
included in the statement.
tx().config().option()
Synopsis
graph.tx().config().option(option_name, value).open()
Description
Set a configuration option for a single transaction, overriding the current graph setting until the transaction is
committed.
Examples
Change the value of allow_scan for a transaction. The effect of this change is to allow all commands
executed in the Gremlin console on a particular node to do full graph scans, even if the cluster consistency
level is not QUORUM, the level required to change this option in the appropriate system table.
graph.tx().config().option("allow_scan", true).open()
Note that the previous transaction (automatically opened in gremlin-console or Studio) must be committed
before the new configuration option value is set.
The system API
The system commands create, drop, and describe graphs, as well as list existing graphs and check for
existence. Graph and system configuration can also be set and unset with system commands.
create
Synopsis
system.graph('graph_name').create()
Description
Create a new graph. The specified graph_name, which may contain only alphanumeric and underscore
characters, is used to create two DSE database keyspaces: graph_name and graph_name_system.
Creating a graph should include setting the replication factor for both the graph_name and
graph_name_system keyspaces. It can also include other options.
Examples
Create a simple new graph.
system.graph('FridgeItems').create()
==>FridgeItems
The graph FridgeItems is created with the NetworkTopologyStrategy class and a replication factor based on
the number of datacenter nodes, since no options were specified.
Create a simple new graph if it doesn't currently exist by modifying with ifNotExists().
system.graph('FridgeItems').ifNotExists().create()
==>FridgeItems
Create a new graph, specifying replication for both the graph and its system keyspace:
system.graph('FridgeItems').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3 }").
ifNotExists().create();
The result:
==>null
drop
Synopsis
system.graph('graph_name').[ifExists()].drop()
Description
Drop an existing graph using this command. All data and schema will be lost. For better performance, truncate
a graph before dropping it.
Examples
Drop a graph.
system.graph('FridgeItem').drop()
==>null
Drop a graph only if it exists:
system.graph('FridgeSensors').ifExists().drop()
==>null
exists
Synopsis
system.graph('graph_name').exists()
Description
Discover if a particular graph exists using this command.
Examples
Discover if a particular graph exists. The return value is a boolean value.
gremlin> system.graph('FridgeItem').exists()
==>true
graphs
Synopsis
system.graphs()
Description
Discover what graphs currently exist using this command.
Examples
Discover all graphs that exist in a DSE cluster.
gremlin> system.graphs()
==>quickstart
==>test
replication
Synopsis
system.graph('graph_name').replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }")
Description
Create a new graph and set the graph_name replication configuration using replication() as well as the
graph_name_system configuration using systemReplication().
Both must be set at the time of graph creation, because replication factor and system replication factor
cannot be altered once set for the graph_name and graph_name_system keyspaces.
DSE database settings for replication factor are used, either SimpleStrategy for a single datacenter or
NetworkTopologyStrategy for multiple datacenters.
The default replication strategy for a multi-datacenter graph is NetworkTopologyStrategy, whereas for a
single datacenter, the replication strategy will default to SimpleStrategy. The number of nodes will determine
the default replication factors:
number of nodes per datacenter    graph_name replication factor    graph_name_system replication factor
4                                 3                                4
5 or greater                      3                                5
Examples
An example that creates a graph on a cluster with two datacenters:
system.graph('food').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
ifNotExists().create();
The result:
==>null
The replication settings can be verified using the cqlsh tool by running the CQL DESCRIBE KEYSPACE
command.
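For example, for the food graph created above (output omitted):

```sql
DESCRIBE KEYSPACE food;
DESCRIBE KEYSPACE food_system;
```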
systemReplication
Synopsis
system.graph('graph_name').systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }")
Description
Create a new graph and set the graph_name replication configuration using replication() as well as the
graph_name_system configuration using systemReplication().
Both must be set at the time of graph creation, because replication factor and system replication factor
cannot be altered once set for the graph_name and graph_name_system keyspaces.
DSE database settings for replication factor are used, either SimpleStrategy for a single datacenter or
NetworkTopologyStrategy for multiple datacenters.
The default replication strategy for a multi-datacenter graph is NetworkTopologyStrategy, whereas for a
single datacenter, the replication strategy will default to SimpleStrategy. The number of nodes will determine
the default replication factors:
number of nodes per datacenter    graph_name replication factor    graph_name_system replication factor
4                                 3                                4
5 or greater                      3                                5
Examples
An example that creates a graph on a cluster with two datacenters:
system.graph('food').
replication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
systemReplication("{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2 }").
ifNotExists().create();
The result:
==>null
The replication settings can be verified using the cqlsh tool by running the CQL DESCRIBE KEYSPACE
command.
truncate
Synopsis
system.graph('graph_name').[ifExists()].truncate()
Description
Truncate an existing graph using this command. All data will be removed from the graph.
Examples
Truncate a graph.
system.graph('FridgeItem').truncate()
==>null
Truncate a graph only if it exists:
system.graph('FridgeSensors').ifExists().truncate()
==>null
• Identify slow queries on a cluster to easily find and tune poorly performing queries.
• Collect per node and cluster wide lifetime metrics by table and keyspace.
• Obtain recent and lifetime statistics about tables, such as the number of SSTables, read/write latency, and
partition (row) size.
• Track read/write activity on a per-client, per-node level for both recent and long-lived activity to identify
problematic user and table interactions.
The OpsCenter Performance Service provides visual monitoring of diagnostics collected through the DSE
Performance Service, displays alerts, and provides recommendations for optimizing cluster performance.
The available diagnostic tables are listed on these pages:
1. To enforce restrictions, enable DSE Unified Authentication and specify appropriate permissions on the
tables.
2. To prevent users from viewing sensitive information like keyspace, table, and user names that are recorded
in the performance tables, restrict users from reading the tables.
• SimpleStrategy example:
• NetworkTopologyStrategy example:
1. By default, collection is enabled for statements whose execution time exceeds a specified
threshold.
• To permanently enable collecting information on slow queries, edit the dse.yaml file. Uncomment
and define values for cql_slow_log_options as shown in the following listing. Notice the default
skip_writing_to_db: true setting.
cql_slow_log_options:
enabled: true
threshold: 200.0
minimum_samples: 100
ttl_seconds: 259200
skip_writing_to_db: true
num_slowest_queries: 5
If you keep the default skip_writing_to_db: true setting, the slow query information is
stored in memory, not in the node_slow_log table shown later in this section.
To store the slow query information in the node_slow_log table, set skip_writing_to_db to false
in the dse.yaml file.
If you keep the slow query information in memory, access it through the MBean-managed
Java object com.datastax.bdp.performance.objects.CqlSlowLog
using the operation retrieveRecentSlowestCqlQueries.
• To temporarily change the cqlslowlog settings without changing dse.yaml or restarting DSE, use the
dsetool perf subcommands:
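For example, a sketch of two common subcommands (verify the exact forms with dsetool help perf):

```shell
# Enable slow query collection until the next restart
dsetool perf cqlslowlog enable

# Temporarily lower the reporting threshold to 100 ms
dsetool perf cqlslowlog 100.0
```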
• Set the number of slow queries to keep in memory. For example, 5 queries:
After you collect information using this temporarily set threshold, you can run a script to view queries
that took longer with this threshold than the previously set threshold. For example:
2. You can export slow queries using the CQL COPY TO command:
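For example, assuming slow queries are being written to the node_slow_log table (the dse_perf keyspace name is an assumption; adjust it to where the performance tables live in your cluster):

```sql
COPY dse_perf.node_slow_log TO 'slow_queries.csv' WITH HEADER = true;
```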
• key_cache
Per node key cache metrics. Equivalent to nodetool info.
• net_stats
Per node network information. Equivalent to nodetool netstats.
• thread_pool
Per node thread pool active/blocked/pending/completed statistics by pool. Equivalent to nodetool tpstats.
• thread_pool_messages
Per node counts of dropped messages by message type. Equivalent to nodetool tpstats.
2. In the dse.yaml file, set the enabled option for cql_system_info_options to true.
cql_system_info_options:
enabled: true
refresh_rate_ms: 10000
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
• object_io
Per node recent latency metrics by keyspace and table.
• object_read_io_snapshot
Per node recent latency metrics, broken down by keyspace and table and orders data by mean read
latency.
• object_write_io_snapshot
Per node recent latency metrics, broken down by keyspace and table and orders data by mean write latency.
2. In the dse.yaml file, set the enabled option for resource_level_latency_tracking_options to true.
resource_level_latency_tracking_options:
enabled: true
refresh_rate_ms: 10000
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
• object_read_io_snapshot
• object_write_io_snapshot
The two tables are essentially views of the same data, but are ordered differently. Using these tables, you can
identify which data objects on the node currently cause the most write and read latency to users. Because this
is time-sensitive data, if a data object sees no activity for a period, no data is recorded for it in these
tables.
In addition to these two tables, the Performance Service also keeps per-object latency information with a
longer retention policy in the object_io table. Again, this table holds mean latency and total count values
for both read and write operations, but it can be queried for statistics on specific data objects (either at the
keyspace or table level). Using this table enables you to pull back statistics for all tables on a particular node,
with the option of restricting results to a given keyspace or specific table.
• node_table_snapshot
Per node lifetime table metrics broken down by keyspace and table.
• table_snapshot
Cluster wide lifetime table metrics broken down by keyspace and table (aggregates node_table_snapshot
from each node in the cluster).
• keyspace_snapshot
Cluster wide lifetime table metrics, aggregated at the keyspace level (rolls up the data in table_snapshot).
2. In the dse.yaml file, set the enabled option for db_summary_stats_options to true.
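Following the pattern of the other performance objects, the relevant dse.yaml fragment looks like this (the refresh rate shown is the common default):

```yaml
db_summary_stats_options:
    enabled: true
    refresh_rate_ms: 10000
```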
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
Changes made with performance object subcommands do not persist between restarts and are useful
only for short-term diagnostics.
• node_snapshot
Per node system metrics.
• dc_snapshot
Aggregates node_snapshot data at the datacenter level.
• cluster_snapshot
Aggregates node_snapshot data for the whole cluster.
To enable collecting cluster summary diagnostics using the DataStax Enterprise Performance Service:
2. In the dse.yaml file, set the enabled option for cluster_summary_stats_options to true.
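Following the pattern of the other performance objects, the relevant dse.yaml fragment looks like this (the refresh rate shown is the common default):

```yaml
cluster_summary_stats_options:
    enabled: true
    refresh_rate_ms: 10000
```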
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
cell_count Y Y N N N N
partition_size Y Y N N N N
range_latency Y Y Y N Y N
read_latency Y Y Y N Y N
sstables_per_read Y Y Y N N N
write_latency Y Y Y N N N
These tables show similar information to the data obtained by the nodetool tablehistograms utility. The
major difference is retention: the data in the diagnostic histogram tables is cumulative since the DSE server
was started, while nodetool tablehistograms shows the values for the past fifteen minutes.
To enable the collection of table histogram data using the DataStax Enterprise Performance Service:
2. In the dse.yaml file, set the enabled option for histogram_data_options to true.
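Following the pattern of the other performance objects, the relevant dse.yaml fragment looks like this (the refresh rate and retention count shown are assumed defaults):

```yaml
histogram_data_options:
    enabled: true
    refresh_rate_ms: 10000
    retention_count: 3
```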
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
4. To control the number of complete histograms kept in the tables at any one time, change the
retention_count parameter.
• object_user_io
Per node, long-lived read/write metrics broken down by keyspace, table, and client connection. Each
row contains mean read/write latencies and operation counts for interactions with a specific table by a
specific client connection during the last sampling period in which it was active. This data has a 10 minute
TTL.
• object_user_read_io_snapshot
Per node recent read/write metrics by client, keyspace, and table. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean read latency.
• object_user_write_io_snapshot
Per node recent read/write metrics by client, keyspace, and table. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean write latency.
• user_io
Per node, long-lived read/write metrics broken down by client connection and aggregated for all
keyspaces and tables. Each row contains mean read/write latencies and operation counts for a specific
connection during the last sampling period in which it was active. This data has a 10 minute TTL.
• user_object_io
Per node, long-lived read/write metrics broken down by client connection, keyspace, and table. Each row
contains mean read/write latencies and operation counts for interactions with a specific table by a specific
client connection during the last sampling period in which it was active. This data has a 10 minute TTL.
object_user_io and user_object_io represent two different views of the same underlying data:
• user_object_read_io_snapshot
Per node recent read/write metrics by keyspace, table, and client. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean read latency.
• user_object_write_io_snapshot
Per node recent read/write metrics by keyspace, table, and client. This table contains only data relating to
clients that were active during the most recent sampling period. Ordered by mean write latency.
• user_read_io_snapshot
Per node recent read/write metrics by client. This table contains only data relating to clients that were
active during the most recent sampling period. Ordered by mean read latency.
• user_write_io_snapshot
Per node recent read/write metrics by client. This table contains only data relating to clients that were
active during the most recent sampling period. Ordered by mean write latency.
To enable collecting user activity diagnostics using the DSE Performance Service:
2. In the dse.yaml file, set the enabled option for user_level_latency_tracking_options to true.
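Following the pattern of the other performance objects, the relevant dse.yaml fragment looks like this (the refresh rate shown is the common default; other options may be available under this section):

```yaml
user_level_latency_tracking_options:
    enabled: true
    refresh_rate_ms: 10000
```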
3. (Optional) To control how often the statistics are refreshed, increase or decrease the refresh_rate_ms
option in dse.yaml.
The refresh_rate_ms specifies the length of the sampling period, that is, the frequency with which
this data is updated.
• user_read_io_snapshot
• user_write_io_snapshot
These tables record the mean read/write latencies and total read/write counts per user on each node. They
are ordered by their mean latency values, so you can quickly see which users are experiencing the highest
average latencies on a given node. Having identified the users experiencing the highest latency on a node,
you can then drill down to find the hot spots for those clients.
To do this, query the user_object_read_io_snapshot and user_object_write_io_snapshot tables. These tables
store mean read/write latency and total read/write count by table for the specified user. They are ordered
by mean latency values, and can therefore quickly show, for a given user, which tables are contributing most
to the experienced latencies.
The data in these tables is refreshed periodically (by default every 10 seconds), so querying them always
provides an up-to-date view of the data objects with the highest mean latencies on a given node. Because this
is time-sensitive data, if a user performs no activity for a period, no data is recorded for them in these tables.
The user_object_io table also reports per-node user activity broken down by keyspace/table and retains it
over a longer period (4 hours by default). This allows the Performance Service to query by node and user to
see latency metrics from all tables or restricted to a single keyspace or table. The data in this table is updated
periodically (again every 10 seconds by default).
The user_io table reports aggregate latency metrics for users on a single node. Using this table, you can query
by node and user to see high-level latency statistics across all keyspaces.
Collection of search data
Collecting slow search queries
The solr_slow_sub_query_log_options performance object reports distributed sub-queries (query executions
on individual shards) that take longer than a specified period of time.
All objects are disabled by default.
To identify slow search queries using the DataStax Enterprise Performance Service:
1. To permanently enable and configure collecting information on slow search queries, edit the dse.yaml file
and uncomment the solr_slow_sub_query_log_options parameters, and define values for the CQL
slow log settings:
solr_slow_sub_query_log_options:
enabled: true
ttl_seconds: 604800
async_writers: 1
threshold_ms: 100
ttl_seconds      How many seconds a record survives before it is expired from the performance object.
async_writers    For event-driven objects, such as the slow log, determines the number of possible concurrent
                 slow query recordings. Objects like solr_result_cache_stats are updated in the background.
threshold_ms     For the slow log, the level (in milliseconds) at which a sub-query is slow enough to be reported.
2. To temporarily change the running parameters for collecting information on slow Solr queries:
To temporarily enable collecting information:
3. You can export slow search queries using the CQL COPY TO command:
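For example (the dse_perf.solr_slow_sub_query_log table name is an assumption; adjust it to your cluster):

```sql
COPY dse_perf.solr_slow_sub_query_log TO 'slow_search_queries.csv' WITH HEADER = true;
```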
2. In the dse.yaml file, under the solr_latency_snapshot_options parameter, change enabled to true
and set the other options as required.
ttl_seconds How many seconds a record survives before it is expired from the performance object.
refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.
2. In the dse.yaml file, under the solr_cache_stats_options parameter, change enabled to true and set
the other options as required.
refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.
Options Determines
ttl_seconds How many seconds a record survives before it is expired from the performance object.
• To permanently collect index statistics, edit the dse.yaml file, change enabled to true under
solr_index_stats_options, set the other options as required, and restart DSE to recognize the
changes.
refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.
ttl_seconds How many seconds a record survives before it is expired from the performance object.
• To temporarily enable or disable collecting index statistics, use dsetool perf solrindexstats.
2. In the dse.yaml file, uncomment the solr_update_handler_metrics_options parameter and set the
options as required.
ttl_seconds How many seconds a record survives before it is expired from the performance object.
Options Determines
refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.
ttl_seconds How many seconds a record survives before it is expired from the performance object.
refresh_rate_ms Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats.
To enable collecting Spark cluster information, configure the options in the spark_cluster_info_options
section of dse.yaml.
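Following the pattern of the other performance objects, the relevant dse.yaml fragment looks like this (the refresh rate shown is the common default):

```yaml
spark_cluster_info_options:
    enabled: true
    refresh_rate_ms: 10000
```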
The driver subsection of spark_application_info_options controls the metrics that are collected by the Spark
Driver.
The executor subsection of spark_application_info_options controls the metrics collected by the Spark
executors.
• Histogram tables
Table names that contain _snapshot are not related to nodetool snapshot. These tables are snapshots of
the data in the last few seconds of activity in the system.
Column Name Data type Description
last_activity timestamp End of sampling period in which this object was last active.
read_latency double Mean value in microseconds for all reads during the last active sampling period
for this object.
total_reads bigint Read count during the last active sampling period for this object.
total_writes bigint Write count during the last active sampling period for this object.
write_latency double Mean value in microseconds for all writes during the last active sampling period
for this object.
latency_index int Ranking by mean read latency during the last sampling period.
latency_index int Ranking by mean read latency during the last sampling period.
latency_index int Ranking by mean write latency during the last sampling period.
read_latency double Mean value in microseconds during the active sampling period.
write_latency double Mean value in microseconds during the last sampling period.
droppable_tombstone_ratio double Ratio of tombstones older than gc_grace_seconds against total column count in
all SSTables.
memtable_columns_count bigint Approximate number of cells for this table currently resident in memtables.
memtable_switch_count bigint Number of times memtables have been flushed since startup.
unleveled_sstables bigint Current count of SSTables in level 0 (if using leveled compaction).
droppable_tombstone_ratio double Ratio of tombstones older than gc_grace_seconds against total column count in
all SSTables.
memtable_columns_count bigint Approximate number of cells for this table currently resident in memtables.
memtable_switch_count bigint Number of times memtables have been flushed since startup.
unleveled_sstables bigint Current count of SSTables in level 0 (if using leveled compaction).
mean_read_latency double For all tables in the keyspace and all nodes in the cluster since startup.
mean_write_latency double For all tables in the keyspace and all nodes in the cluster since startup.
total_data_size bigint Total size in bytes of SSTables for all tables and indexes across all nodes in the
cluster.
cms_collection_time bigint Total time spent in CMS garbage collection since startup.
mean_range_slice_latency double Mean latency in microseconds for range slice operations since startup.
parnew_collection_time bigint Total time spent in ParNew garbage collection since startup.
process_cpu_load double Current CPU load for the DSE process (Linux only).
range_slice_timeouts bigint Number of timed out range slice requests since startup.
read_timeouts bigint Number of timed out read requests since startup.
total_range_slices bigint Total number of range slice operations performed since startup.
write_timeouts bigint Number of timed out write requests since startup.
compactions_completed bigint Total number of compactions completed since startup by all nodes in the data
center.
compactions_pending int Total number of pending compactions on all nodes in the datacenter.
completed_mutations bigint Total number of mutations performed since startup by all nodes in the data
center.
dropped_mutation_ratio double Ratio of dropped to completed mutations since startup across all nodes in the
datacenter.
dropped_mutations bigint Total number of dropped mutations since startup by all nodes in the data center.
flush_sorter_tasks_pending bigint Total number of memtable flush sort tasks pending across all nodes in the
datacenter.
free_space bigint Total free disk space in bytes across all nodes in the datacenter.
gossip_tasks_pending bigint Total number of gossip tasks pending across all nodes in the data center.
hinted_handoff_pending bigint Total number of hinted handoff tasks pending across all nodes in the data center.
index_data_size bigint Total size in bytes of index column families across all nodes in the data center.
internal_responses_pending bigint Number of internal response tasks pending across all nodes in the data center.
key_cache_capacity bigint Total capacity in bytes of key caches across all nodes in the data center.
key_cache_entries bigint Total number of entries in key caches across all nodes in the data center.
key_cache_size bigint Total consumed size in bytes of key caches across all nodes in the data center.
manual_repair_tasks_pending bigint Total number of manual repair tasks pending across all nodes in the data center.
mean_range_slice_latency double Mean latency in microseconds for range slice operations, averaged across all
nodes in the datacenter.
mean_read_latency double Mean latency in microseconds for read operations, averaged across all nodes in
the datacenter.
mean_write_latency double Mean latency in microseconds for write operations, averaged across all nodes in
the datacenter.
memtable_post_flushers_pending bigint Total number of memtable post flush tasks pending across all nodes in the
datacenter.
migrations_pending bigint Total number of migration tasks pending across all nodes in the data center.
misc_tasks_pending bigint Total number of misc tasks pending across all nodes in the datacenter.
read_repair_tasks_pending bigint Total number of read repair tasks pending across all nodes in the data center.
read_requests_pending bigint Total read requests pending across all nodes in the datacenter.
replicate_on_write_tasks_pending bigint Total number of counter replicate on write tasks pending across all nodes in the
datacenter.
request_responses_pending bigint Total number of request response tasks pending across all nodes in the data
center.
row_cache_capacity bigint Total capacity in bytes of partition caches across all nodes in the data center.
row_cache_entries bigint Total number of row cache entries across all nodes in the datacenter.
row_cache_size bigint Total consumed size in bytes of row caches across all nodes in the data center.
storage_capacity bigint Total disk space in bytes across all nodes in the datacenter.
streams_pending int Number of pending streams across all nodes in the datacenter.
table_data_size bigint Total size in bytes of non-index column families across all nodes in the data
center.
total_batches_replayed bigint Total number of batchlog entries replayed since startup by all nodes in the
datacenter.
total_range_slices bigint Total number of range slice operations performed since startup by all nodes in
the datacenter.
total_reads bigint Total number of read operations performed since startup by all nodes in the
datacenter.
total_writes bigint Total number of write operations performed since startup by all nodes in the
datacenter.
write_requests_pending bigint Total number of write tasks pending across all nodes in the data center.
compactions_completed bigint Total number of compactions completed since startup by all nodes in the cluster.
completed_mutations bigint Total number of mutations performed since startup by all nodes in the cluster.
compactions_pending int Total number of pending compactions on all nodes in the cluster.
dropped_mutation_ratio double Ratio of dropped to completed mutations since startup across all nodes in the
cluster.
dropped_mutations bigint Total number of dropped mutations since startup by all nodes in the cluster.
flush_sorter_tasks_pending bigint Total number of memtable flush sort tasks pending across all nodes in the
cluster.
free_space bigint Total free disk space in bytes across all nodes in the cluster.
gossip_tasks_pending bigint Total number of gossip tasks pending across all nodes in the cluster.
hinted_handoff_pending bigint Total number of hinted handoff tasks pending across all nodes in the cluster.
index_data_size bigint Total size in bytes of index column families across all nodes in the cluster.
internal_responses_pending bigint Number of internal response tasks pending across all nodes in the cluster.
key_cache_capacity bigint Total capacity in bytes of key caches across all nodes in the cluster.
key_cache_entries bigint Total number of entries in key caches across all nodes in the cluster.
key_cache_size bigint Total consumed size in bytes of key caches across all nodes in the cluster.
manual_repair_tasks_pending bigint Total number of manual repair tasks pending across all nodes in the cluster.
mean_range_slice_latency double Mean latency in microseconds for range slice operations, averaged across all
nodes in the cluster.
mean_read_latency double Mean latency in microseconds for read operations, averaged across all nodes in
the cluster.
mean_write_latency double Mean latency in microseconds for write operations, averaged across all nodes in
the cluster.
memtable_post_flushers_pending bigint Total number of memtable post flush tasks pending across all nodes in the
cluster.
migrations_pending bigint Total number of migration tasks pending across all nodes in the cluster.
misc_tasks_pending bigint Total number of misc tasks pending across all nodes in the cluster.
read_repair_tasks_pending bigint Total number of read repair tasks pending across all nodes in the cluster.
read_requests_pending bigint Total read requests pending across all nodes in the cluster.
replicate_on_write_tasks_pending bigint Total number of counter replicate on write tasks pending across all nodes in the
cluster.
request_responses_pending bigint Total number of request response tasks pending across all nodes in the cluster.
row_cache_capacity bigint Total capacity in bytes of partition caches across all nodes in the cluster.
row_cache_entries bigint Total number of row cache entries across all nodes in the cluster.
row_cache_size bigint Total consumed size in bytes of row caches across all nodes in the cluster.
storage_capacity bigint Total disk space in bytes across all nodes in the cluster.
streams_pending int Number of pending streams across all nodes in the cluster.
table_data_size bigint Total size in bytes of non-index column families across all nodes in the cluster.
total_batches_replayed bigint Total number of batchlog entries replayed since startup by all nodes in the
cluster.
total_range_slices bigint Total number of range slice operations performed since startup by all nodes in the cluster.
total_reads bigint Total number of read operations performed since startup by all nodes in the cluster.
total_writes bigint Total number of write operations performed since startup by all nodes in the cluster.
write_requests_pending bigint Total number of write tasks pending across all nodes in the cluster.
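These datacenter- and cluster-level aggregates can be queried with ordinary CQL once the Performance Service is collecting them. A minimal sketch, assuming the aggregates are exposed through snapshot tables in the dse_perf keyspace; the table name cluster_snapshot below is an assumption for illustration, so confirm the exact names on your cluster with DESCRIBE TABLES on dse_perf:

SELECT mean_read_latency, mean_write_latency, total_reads, total_writes, dropped_mutation_ratio
FROM dse_perf.cluster_snapshot;

Comparing the same columns in the per-datacenter table against the cluster-wide row helps isolate a single lagging datacenter.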
Histogram tables
A histogram measures the distribution of values in a stream of data. These histogram tables use the Metrics
Core library. You must enable the collection of table histogram data using the DataStax Enterprise Performance
Service.
These tables show information similar to the output of the nodetool tablehistograms utility. The major difference is the time window: the data in the diagnostic histogram tables is cumulative since the DSE server was started, while nodetool tablehistograms shows values for only the past fifteen minutes.
Histogram tables provide DSE statistics that can be queried with CQL and are generated with these templates:
• Detailed
• Summary
• Global
cell_count Y Y N N N N
partition_size Y Y N N N N
range_latency Y Y Y N Y N
read_latency Y Y Y N Y N
sstables_per_read Y Y Y N N N
write_latency Y Y Y N N N
histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.
p50 bigint The latency value at or below which 50 percent of recorded operations fall (the 50th percentile, or median).
p75 bigint The latency value at or below which 75 percent of recorded operations fall (the 75th percentile).
p90 bigint The latency value at or below which 90 percent of recorded operations fall (the 90th percentile).
p95 bigint The latency value at or below which 95 percent of recorded operations fall (the 95th percentile).
p98 bigint The latency value at or below which 98 percent of recorded operations fall (the 98th percentile).
p99 bigint The latency value at or below which 99 percent of recorded operations fall (the 99th percentile).
dropped_messages bigint The total number of dropped messages for mutations to this table.
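A summary-histogram row can be read back with a plain CQL query, filtering by the grouping columns and histogram_id described above. A sketch, assuming a summary table name such as read_latency_histograms_summary in the dse_perf keyspace and node_ip/keyspace_name/table_name as the partition columns (all of these names are assumptions; confirm them with DESCRIBE TABLES on dse_perf):

SELECT histogram_id, p50, p95, p99, dropped_messages
FROM dse_perf.read_latency_histograms_summary
WHERE node_ip = '127.0.0.1'
AND keyspace_name = 'keyspace'
AND table_name = 'table'
LIMIT 5;

Because rows are ordered by histogram_id, a LIMIT on the query returns the most recent snapshots first when the clustering order is descending.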
histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.
bucket_offset bigint The upper bound of this histogram bucket; together with the previous row's offset it defines the bucket's range.
bucket_count bigint The number of measured values that are less than or equal to this offset and greater than the previous offset.
histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.
bucket_offset bigint The upper bound of this histogram bucket; together with the previous row's offset it defines the bucket's range.
bucket_count bigint The number of measured values that are less than or equal to this offset and greater than the previous offset.
histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace, and table are ordered by this field, to enable date-based filtering.
bucket_count bigint Number of partitions where the cell count falls in the corresponding bucket.
histogram_id timestamp Groups rows by the specific histogram they belong to. Rows for the same node,
keyspace & table are ordered by this field, to enable date-based filtering.
bucket_offset bigint The upper bound of this histogram bucket; together with the previous row's offset it defines the bucket's range.
bucket_count bigint The number of measured values that are less than or equal to this offset and greater than the previous offset.
histogram_id timestamp The timestamp when the histogram record was written.
verb text Where verb denotes the message type that was dropped: MUTATION,
HINT, READ_REPAIR, READ, REQUEST_RESPONSE, BATCH_STORE,
BATCH_REMOVE, RANGE_SLICE, GOSSIP_DIGEST_SYN,
GOSSIP_DIGEST_ACK, GOSSIP_DIGEST_ACK2, DEFINITIONS_UPDATE,
TRUNCATE, SCHEMA_CHECK, REPLICATION_FINISHED,
INTERNAL_RESPONSE, COUNTER_MUTATION, SNAPSHOT,
MIGRATION_REQUEST, GOSSIP_SHUTDOWN, ECHO, REPAIR_MESSAGE,
PAXOS_PREPARE, PAXOS_PROPOSE, PAXOS_COMMIT.
global_count bigint Global metrics are the sum of the internal and the cross-node metrics for
dropped events since the server was started, including dropped mutations.
global_mean_rate double Global mean rate of dropped messages, combining the internal and cross-node metrics, including dropped mutations.
global_1min_rate double Estimated rate of the combined internal and the cross-node metrics for dropped
messages for 1 minute.
global_5min_rate double Estimated rate of the combined internal and cross-node metrics for dropped messages for 5 minutes.
internal_count bigint Inside one DSE node, the number of internal messages that were dropped since
the server was started.
internal_mean_rate double Inside one DSE node, the average rate of dropped message events per second.
internal_1min_rate double Inside one DSE node, the average number of messages that were dropped in a
one-minute interval.
internal_5min_rate double Inside one DSE node, the average number of messages that were dropped in a
five-minute interval.
internal_15min_rate double Inside one DSE node, the average number of messages that were dropped in a
15-minute interval.
internal_latency_median double Inside one DSE node, the median of all recorded durations for one second.
internal_latency_p75 double Inside one DSE node, the latency value at or below which 75 percent of recorded durations fall (the 75th percentile).
internal_latency_p90 double Inside one DSE node, the latency value at or below which 90 percent of recorded durations fall (the 90th percentile).
internal_latency_p95 double Inside one DSE node, the latency value at or below which 95 percent of recorded durations fall (the 95th percentile).
internal_latency_p98 double Inside one DSE node, the latency value at or below which 98 percent of recorded durations fall (the 98th percentile).
internal_latency_p99 double Inside one DSE node, the latency value at or below which 99 percent of recorded durations fall (the 99th percentile).
internal_latency_min double Inside one DSE node, the minimum recorded latency for dropped-message events.
internal_latency_mean double Inside one DSE node, the mean recorded latency for dropped-message events.
internal_latency_max double Inside one DSE node, the maximum recorded latency for dropped-message events.
internal_latency_stdev double Inside one DSE node, the standard deviation of recorded latencies for dropped-message events.
xnode_count bigint For cross node messages, the number of messages that were dropped since the
server was started.
xnode_mean_rate double For cross node messages, the mean rate of dropped messages per second.
xnode_1min_rate double For cross node messages, the average number of messages that were dropped
in a one-minute interval.
xnode_5min_rate double For cross node messages, the average number of messages that were dropped
in a five-minute interval.
xnode_15min_rate double For cross node messages, the average number of messages that were dropped
in a 15-minute interval.
xnode_median double For cross node messages, the median of all recorded durations for one second.
xnode_p75 double For cross node messages, the latency value at or below which 75 percent of recorded durations fall (the 75th percentile).
xnode_p90 double For cross node messages, the latency value at or below which 90 percent of recorded durations fall (the 90th percentile).
xnode_p95 double For cross node messages, the latency value at or below which 95 percent of recorded durations fall (the 95th percentile).
xnode_p98 double For cross node messages, the latency value at or below which 98 percent of recorded durations fall (the 98th percentile).
xnode_p99 double For cross node messages, the latency value at or below which 99 percent of recorded durations fall (the 99th percentile).
xnode_min double For cross node messages, the minimum recorded duration.
xnode_mean double For cross node messages, the mean recorded duration.
xnode_max double For cross node messages, the maximum recorded duration.
xnode_stdev double For cross node messages, the standard deviation of recorded durations.
bucket_offset bigint The upper bound of this histogram bucket; together with the previous row's offset it defines the bucket's range.
bucket_count bigint The number of measured values that are less than or equal to this offset and greater than the previous offset.
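The dropped-message counters and rates above lend themselves to a quick per-verb check. A sketch, assuming the data is exposed in a dse_perf table named dropped_messages with node_ip and verb as filter columns (both are illustrative assumptions; verify against your dse_perf keyspace):

SELECT verb, global_count, internal_count, xnode_count, global_1min_rate
FROM dse_perf.dropped_messages
WHERE node_ip = '127.0.0.1';

A sustained nonzero xnode_count alongside a low internal_count points at network pressure between nodes rather than local overload.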
last_activity timestamp End of sampling period in which this client was last active.
total_reads bigint Count during the last active sampling period for this client.
total_writes bigint Count during the last active sampling period for this client.
latency_index int Ranking by mean read latency during the last sampling period.
read_latency double Mean value in microseconds during the last sampling period.
write_latency double Mean value in microseconds during the last sampling period.
latency_index int Ranking by mean write latency during the last sampling period.
read_latency double Mean value in microseconds during the last sampling period.
write_latency double Mean value in microseconds during the last sampling period.
last_activity timestamp End of sampling period in which this client was last active against this object.
read_latency double Mean value in microseconds during the last active sampling period for this object/
client.
total_reads bigint During the last active sampling period for this object/client.
total_writes bigint During the last active sampling period for this object/client.
write_latency double Mean value in microseconds during the last active sampling period for this object/
client.
latency_index int Ranking by mean write latency during the last sampling period.
read_latency double Mean value in microseconds during the last sampling period.
write_latency double Mean value in microseconds during the last sampling period.
last_activity timestamp End of sampling period in which this client connection was last active against this
object.
read_latency double Mean value in microseconds during the last active sampling period for this object/
client.
total_reads bigint Count during the last active sampling period for this object/client.
total_writes bigint Count during the last active sampling period for this object/client.
write_latency double Mean value in microseconds during the last active sampling period for this object/
client.
latency_index int Ranking by mean read latency during the last sampling period.
read_latency double Mean value in microseconds during the last active sampling period for this object/
client.
total_reads bigint Count during the last active sampling period for this object/client.
total_writes bigint Count during the last active sampling period for this object/client.
write_latency double Mean value in microseconds during the last active sampling period for this object/
client.
latency_index int Ranking by mean write latency during the last sampling period.
read_latency double Mean value in microseconds during the last active sampling period for this object/
client.
total_reads bigint Count during the last active sampling period for this object/client.
total_writes bigint Count during the last active sampling period for this object/client.
write_latency double Mean value in microseconds during the last active sampling period for this object/
client.
Leases table
Acquire, disable, renew, and resolve are the four lease operations. Histogram statistics indicate the rough
distribution of timing: the average amount of time, the time for the worst request out of 100, the absolute worst
request, and the rate at which operations are happening.
Table 60: leases table
[
Lease metrics for the lease subsystem.
]
Column Name Data type Description
acquire_latency99ms bigint The time for the worst acquire request out of 100 (the 99th percentile, in milliseconds).
disable_latency99ms bigint The time for the worst request out of 100.
monitor inet A machine partially responsible for the lease; there are replication-factor (RF) monitors per lease.
renew_latency99ms bigint The time for the worst request out of 100.
resolve_latency99ms bigint The time for the worst request out of 100.
up boolean Whether the lease is held, which implies that the service is up.
up_or_down_since timestamp Time of the last change. For example, UP since 10PM or DOWN since 4PM.
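Lease health can be checked directly from CQL. A sketch, assuming the leases table resides in the dse_perf keyspace (an assumption; confirm the keyspace and table name locally):

SELECT monitor, up, up_or_down_since, acquire_latency99ms, renew_latency99ms
FROM dse_perf.leases;

A lease with up = false and an old up_or_down_since timestamp indicates the backing service has been down since that time.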
Answer: The search performance objects should only be enabled on search nodes, that is, nodes where
indexes reside that can observe search operations. While it is perfectly acceptable to enable the objects
across an entire cluster, enabling them on a single node for observation first is a good way to mitigate risk.
Question: Can I use existing tables with secondary indexes on some columns, and create search indexes on
other columns in the same table?
Answer: Do not mix search indexes with secondary indexes. Attempting to use both indexes on the same table
is not supported.
Slow sub-query log for search
Report distributed sub-queries for search (query executions on individual shards) that take longer than a
specified period of time.
JMX analog
None.
Schema
When slow query logging is enabled, this table is created automatically.
Slow Solr sub-queries recorded on 10/17/2015 for core keyspace.table for coordinator at 127.0.0.1:
SELECT *
FROM solr_slow_sub_query_log
WHERE core = 'keyspace.table'
AND date = '2015-10-17'
AND coordinator_ip = '127.0.0.1';
Slow Solr sub-queries recorded on 10/17/2015 for core keyspace.table for coordinator at 127.0.0.1 for a
particular distributed query with an ID of 33e56d33-4e63-11e4-9ce5-335a04d08bd4 :
SELECT *
FROM solr_slow_sub_query_log
WHERE core = 'keyspace.table'
AND date = '2015-10-17'
AND coordinator_ip = '127.0.0.1'
AND query_id = 33e56d33-4e63-11e4-9ce5-335a04d08bd4;
Collecting slow search queries [Steps to help you identify slow search queries using the DataStax Enterprise
Performance Service.]
Indexing error log
Record errors that occur during document indexing.
Specifically, this log records errors that occur during document validation. A common scenario is where a non-
stored copy field is copied into a field with an incompatible type.
JMX Analog
None.
Schema
Indexing validation errors recorded on 10/17/2014 for core keyspace.table at node 127.0.0.1:
SELECT *
FROM solr_indexing_errors
WHERE core = 'keyspace.table' AND date = '2014-10-17' AND node_ip = '127.0.0.1';
Most recent 5 indexing validation errors recorded on 10/17/2014 for core keyspace.table at node 127.0.0.1:
SELECT *
FROM solr_indexing_errors
WHERE core = 'keyspace.table'
AND date = '2014-10-17'
AND node_ip = '127.0.0.1'
ORDER BY time DESC
LIMIT 5;
JMX Analog
com.datastax.bdp/search/core/QueryMetrics
SELECT *
FROM solr_query_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:
SELECT *
FROM solr_query_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;
JMX analog
com.datastax.bdp/search/core/UpdateMetrics
SELECT *
FROM solr_update_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:
SELECT *
FROM solr_update_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;
JMX Analog
com.datastax.bdp/search/core/CommitMetrics
CREATE TABLE solr_commit_latency_snapshot (
node_ip inet,
core text,
date timestamp,
time timestamp,
phase text,
count bigint,
latency_percentiles_micros map<text, bigint>,
PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0
SELECT *
FROM solr_commit_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:
SELECT *
FROM solr_commit_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;
JMX analog
com.datastax.bdp/search/core/MergeMetrics
CREATE TABLE solr_merge_latency_snapshot (
node_ip inet,
core text,
date timestamp,
time timestamp,
phase text,
count bigint,
latency_percentiles_micros map<text, bigint>,
PRIMARY KEY ((node_ip, core, date), phase, time)
)
WITH CLUSTERING ORDER BY (phase ASC, time DESC)
AND gc_grace_seconds=0
SELECT *
FROM solr_merge_latency_snapshot
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for the EXECUTE phase recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_merge_latency_snapshot
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND phase = 'EXECUTE'
LIMIT 5;
Solr exposes a core’s filter cache statistics through its registered index searcher, but the core may have many
index searchers over its lifetime. To reflect this, statistics are provided for the currently registered searcher as
well as cumulative/lifetime statistics.
If the dseFilterCache hit_ratio declines over time, and this hit_ratio decline corresponds to a higher average
latency from the QueryMetrics.getAverageLatency(EXECUTE, null) MBean, consider increasing the size of
your filter cache in the search index config.
JMX analog
solr/core/dseFilterCache/com.datastax.bdp.search.solr.FilterCacheMBean
Schema
Snapshots for cumulative statistics recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_filter_cache_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_filter_cache_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;
Collecting cache statistics [Enable the solr_cache_stats_options parameter in dse.yaml and set the other
options as required.]
Query result cache statistics
Record core-specific query result cache statistics over time.
The core's result cache statistics are exposed through its registered index searcher, but the core may have many index searchers over its lifetime. To reflect this, statistics for the currently registered searcher are provided along with cumulative/lifetime statistics.
JMX analog
solr/core/queryResultCache/*
Schema
Snapshots for cumulative statistics recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_result_cache_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_result_cache_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;
Collecting cache statistics [Enable the solr_cache_stats_options parameter in dse.yaml and set the other
options as required.]
Index statistics
Record core-specific index overview statistics over time.
JMX analog
Schema
SELECT *
FROM solr_index_stats
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_index_stats
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;
A few fields in this table have both cumulative and non-cumulative versions. The non-cumulative statistics
are zeroed out following rollback or commit, while the cumulative versions persist through those events.
The exception is errors, which is actually cumulative and takes into account a few failure cases that
cumulative_errors does not.
JMX analog
solr/core/updateHandler
Schema
CREATE TABLE solr_update_handler_metrics (
node_ip inet,
core text,
date timestamp,
time timestamp,
adds bigint,
cumulative_adds bigint,
commits bigint,
autocommits int,
autocommit_max_time text,
autocommit_max_docs int,
soft_autocommits int,
soft_autocommit_max_docs int,
soft_autocommit_max_time text,
deletes_by_id bigint,
deletes_by_query bigint,
cumulative_deletes_by_id bigint,
cumulative_deletes_by_query bigint,
expunge_deletes bigint,
errors bigint,
cumulative_errors bigint,
docs_pending bigint,
optimizes bigint,
rollbacks bigint,
PRIMARY KEY ((node_ip, core, date), time)
)
WITH gc_grace_seconds=0
SELECT *
FROM solr_update_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots recorded on 10/17/2014 for core keyspace.table on the node 127.0.0.1:
SELECT *
FROM solr_update_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
ORDER BY time DESC
LIMIT 5;
JMX analog
Schema
SELECT *
FROM solr_update_request_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for handler “search” recorded on 10/17/2014 for core keyspace.table on the
node 127.0.0.1:
SELECT *
FROM solr_search_request_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND handler_name = 'search'
LIMIT 5;
solr/core/search
Schema
Snapshots recorded for all update handlers on 10/17/2014 for core keyspace.table on the node
127.0.0.1:
SELECT *
FROM solr_update_request_handler_metrics
WHERE node_ip = '127.0.0.1' AND core = 'keyspace.table' AND date = '2014-10-17';
Most recent 5 snapshots for handler “/update/json” recorded on 10/17/2014 for core keyspace.table on
the node 127.0.0.1:
SELECT *
FROM solr_update_request_handler_metrics
WHERE node_ip = '127.0.0.1'
AND core = 'keyspace.table'
AND date = '2014-10-17'
AND handler_name = '/update/json'
LIMIT 5;
Repair Service
This OpsCenter service performs repair operations in the background across a DataStax Enterprise cluster with
minimal impact. This process alleviates the potential performance impact of having to periodically run repair on
entire nodes.
See Repair Service.
Features
Smartly replicates data from source clusters to destination clusters: Supports replicating data in a spoke-and-hub configuration from remote locations to central data hubs and repositories. Enterprise customers with remote clusters are able to establish a cluster presence in each location. In addition, a mesh configuration can replicate data from any source cluster to another destination cluster within reasonable limits.
Prioritizes data streams: Allows higher priority data streams to be sent from the source cluster to a destination cluster ahead of lower priority data streams.
Supports ingestion and querying of data at every source: DSE Advanced Replication enables data to be ingested and queried at any source and sent to any destination that collects and analyzes data from all of the sites.
Solves the problem of periodic downtime: Useful for energy (oil and gas), transportation, telecommunications, retail (point-of-sale systems), and other vertical markets that might experience periods of network or internet downtime at remote locations.
Satisfies data sovereignty regulations: Provides configurable streams of selected outbound data, while preventing data changes to inbound data.
Satisfies data locality regulations: Prevents data from leaving the current geography.
Figure 9:
This configuration would also be suitable for a network of microservices clusters that report data to a central
analytics cluster.
Another scenario may include similar remote sites that mainly send data to a centralized location, but must
periodically be updated with information from the centralized location. In this scenario, each remote cluster would
be both a source and a destination, with two channels designated, one upstream and one downstream. A small
Point of Sale (POS) system serves as a possible model for this scenario, with periodic updates to the remote
systems.
Figure 10:
A mesh network can also use advanced replication, with remote clusters receiving updates from either a central
location or another remote cluster.
Figure 11:
Although any cluster, remote or centralized, may serve as a source for an advanced replication channel, a limited
number of destinations can be configured for any one source. In general, consider the flow of replication as many
sources to few destinations, rather than few sources to many destinations.
Traffic between the clusters
Traffic between the source cluster and the destination cluster is managed with permits, priority, and configurable
failover behavior for multi-datacenter operation.
Permits
Traffic between the source cluster and the destination cluster is managed with permits. When a permit cannot be
acquired, the message is postponed and waits in the replication log until it is processed when a permit becomes
available. Permits are global and not per destination.
To manage permits and set the maximum number of messages that can be replicated to all destinations
simultaneously, use dse advrep conf:
This example sets the channel for table foo.bar to the top priority of one, so that the table's replication log files
will be transmitted before other table's replication log files. It also sets the replication log files to be read from
newest to oldest.
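The permit cap described above can be adjusted with dse advrep conf update. A sketch, assuming the flag name mirrors the permits configuration key documented later in this section:

```bash
# Raise the global permit cap so up to 2,000 messages can be
# replicated to all destinations simultaneously (sketch; the
# --permits flag name is assumed from the "permits" config key).
dse advrep conf update --permits 2000

# Review the current configuration values.
dse advrep conf list
```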
Configure automatic failover for hub clusters with multiple datacenters
DSE Advanced Replication uses the DSE Java driver load balancing policy to communicate with the
hub cluster. You can explicitly define the local datacenter for the datacenter-aware round robin policy
(DCAwareRoundRobinPolicy) that is used by the DSE Java driver.
You can enable or disable failover from a local datacenter to a remote datacenter. When multiple datacenter
failover is configured and a local datacenter fails, data replication from the edge to the hub continues using the
remote datacenter. Tune the configuration with these parameters:
driver-local-dc
For destination clusters with multiple datacenters, you can explicitly define the name of the datacenter
that you consider local. Typically, this is the datacenter that is closest to the source cluster. This value is
used only for clusters with multiple datacenters.
driver-used-hosts-per-remote-dc
To use automatic failover for destination clusters with multiple datacenters, you must define
the number of hosts per remote datacenter that the datacenter aware round robin policy
(DCAwareRoundRobinPolicy) considers available.
driver-allow-remote-dcs-for-local-cl
Set to true to enable automatic failover for destination clusters with multiple datacenters. The value of
the driver-consistency-level parameter must be LOCAL_ONE or LOCAL_QUORUM.
To enable automatic failover with a consistency level of LOCAL_QUORUM, use dse advrep destination update:
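A sketch of such a command, using the failover parameters described above (the destination name mydest and datacenter name DC1 are illustrative):

```bash
# Enable multi-datacenter failover for destination "mydest" (sketch;
# parameter names are taken from the failover settings above).
dse advrep destination update --name mydest \
  --driver-consistency-level LOCAL_QUORUM \
  --driver-allow-remote-dcs-for-local-cl true \
  --driver-local-dc DC1 \
  --driver-used-hosts-per-remote-dc 2
```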
Due to Cassandra issue CASSANDRA-11368, list inserts might not be idempotent. Because DSE Advanced
Replication might deliver the same message to the destination more than once, this Cassandra bug might
lead to data inconsistency if lists are used in a column family schema. DataStax recommends using other
collection types, like sets or frozen lists, when ordering is not important.
2. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation
method.
enabled: true
3. Enable Change Data Capture (CDC) in the cassandra.yaml file on a per-node basis for each source:
cdc_enabled: true
Advanced Replication will not start if CDC is not enabled, since CDC logs are used to implement the
feature.
4. Consider increasing the default CDC disk space, depending on the load (default: 4096 MB or 1/8 of the total
space where cdc_raw_directory resides):
cdc_total_space_in_mb: 16384
5. Commitlog compression is turned off by default. To avoid problems with advanced replication, this option
should NOT be used; ensure that the option is commented out:
# commitlog_compression:
# - class_name: LZ4Compressor
6. Start DataStax Enterprise as a transactional node with the command that is appropriate for the installation
method.
7. Once advanced replication is started on a cluster, the source node will create keyspaces and tables that
need alteration. See Keyspaces for information.
1. On the source node and the destination node, create the sample keyspace and table:
Remember to use escaped quotes around keyspace and table names as command line arguments to
preserve casing: dse advrep create --keyspace \"keyspaceName\" --table \"tableName\"
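For instance, a minimal sketch of the sample schema used in the rest of this walkthrough (column types are assumed from the later INSERT examples; the datacenter name DC1 is illustrative; the destination table adds the source_id column discussed below):

```sql
-- Source cluster: sample keyspace and table (sketch).
-- durable_writes must remain true for a replication channel to work.
CREATE KEYSPACE foo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
  AND durable_writes = true;

CREATE TABLE foo.bar (
  name   text,
  val    text,
  scalar int,
  PRIMARY KEY (name)
);

-- Destination cluster: the same table plus a source_id column, so
-- rows replicated from different sources do not overwrite each other.
CREATE TABLE foo.bar (
  name      text,
  source_id text,
  val       text,
  scalar    int,
  PRIMARY KEY (name, source_id)
);
```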
The source_id column is recommended as a column to include on the destination node. If the
destination table has a field in the primary key that uniquely determines the source from which the data is
replicated, the source_id is not required as part of the primary key. The source_id column is useful for
preventing overwrites if two records with the same primary key get replicated from different sources, and
you want to keep both records.
• The source node points to its destination using the public IP address that you saved earlier.
• The source-id value is a unique identifier for all data that comes from this particular source node.
• The source-id unique identifier is written to the source-id-column that was included when the foo.bar
table was created on the destination node.
--------------------------------------------------------------------------------------------
|destination|name                                |value
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_enabled                  |false
|mydest     |addresses                           |10.200.182.148
|mydest     |driver_read_timeout                 |15000
|mydest     |driver_connections_max              |8
|mydest     |source_id_column                    |source_id
|mydest     |driver_connect_timeout              |15000
|mydest     |driver_ssl_protocol                 |TLS
|mydest     |driver_consistency_level            |QUORUM
|mydest     |driver_used_hosts_per_remote_dc     |0
|mydest     |driver_allow_remote_dcs_for_local_cl|false
|mydest     |driver_compression                  |lz4
|mydest     |driver_connections                  |1
|mydest     |driver_ssl_cipher_suites            |[TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384, TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,
|           |                                    |TLS_RSA_WITH_AES_256_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384, TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA, TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_DSS_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384, TLS_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |SSL_RSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA, TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
|           |                                    |TLS_ECDHE_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_SHA,
|           |                                    |TLS_ECDH_ECDSA_WITH_RC4_128_SHA, TLS_ECDH_RSA_WITH_RC4_128_SHA,
|           |                                    |SSL_RSA_WITH_RC4_128_MD5, TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
|mydest     |source_id                           |source1
|mydest     |transmission_enabled                |true
--------------------------------------------------------------------------------------------
dse advrep channel create --source-keyspace foo --source-table bar --source-id source1 \
  --source-id-column source_id --destination mydest --destination-keyspace foo \
  --destination-table bar --collection-enabled true --transmission-enabled true --priority 1
--------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |1       |foo    |bar       |source1|source_id |mydest|true        |
--------------------------------------------------------------------------------------------------------------
The designated keyspace for a replication channel must have durable writes enabled. If durable_writes =
false, an error message is displayed and the channel is not created. If the durable writes setting is
changed after the replication channel is created, the tables will not write to the commit log and CDC will not
work. The data will not be ingested through the replication channel; a warning is logged, but otherwise the
failure is silent.
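Durable writes can be checked from cqlsh before creating the channel; a quick sketch for the sample keyspace:

```sql
-- Confirm that durable writes are enabled for the channel's keyspace.
SELECT keyspace_name, durable_writes
FROM system_schema.keyspaces
WHERE keyspace_name = 'foo';
```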
2. On the source, replication to the destination can be paused or resumed, the latter shown here:
Notice that either --transmission or --collection can be specified, to resume transmission from the
source to the destination or to resume collection of data on the source.
3. Review the number of records that are in the replication log. Because no data is inserted yet, the record
count in the replication log is 0:
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
INSERT INTO foo.bar (name, val, scalar) VALUES ('a', '1', 1);
INSERT INTO foo.bar (name, val, scalar) VALUES ('b', '2', 2);
dse cassandra-stop
INSERT INTO foo.bar (name, val, scalar) VALUES ('c', '3', 3);
INSERT INTO foo.bar (name, val, scalar) VALUES ('d', '4', 4);
3. Review the number of records that are in the replication log. The replication log should have 2 entries:
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
dse cassandra
Wait a moment for communication to resume and for the new records to replicate from the source to the
destination.
dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
dse advrep --verbose channel pause --keyspace foo --table bar --collection
dse advrep --verbose channel resume --keyspace foo --table bar --collection
The dse_system keyspace uses the EverywhereStrategy replication strategy by default; this setting must not be
altered. The dse_advrep keyspace is configured to use the SimpleStrategy replication strategy by default and
this setting must be updated in production environments to avoid data loss. After starting the cluster, alter the
keyspace to use the NetworkTopologyStrategy replication strategy with appropriate settings for the replication
factor and datacenters. For example, use a CQL statement to configure a replication factor of 3 on the DC1
datacenter using NetworkTopologyStrategy:
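A sketch of such a statement, assuming a datacenter named DC1:

```sql
-- Switch dse_advrep to NetworkTopologyStrategy with RF 3 in DC1.
ALTER KEYSPACE dse_advrep
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
```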
For most environments using DSE Advanced Replication, a replication factor of 3 is suitable. The strategy must
be configured for any datacenters which are serving as an advanced replication source.
Run nodetool repair on each node of the affected datacenters to repair the altered keyspace.
Primitive data types: int, ascii, bigint, blob, boolean, decimal, double, float, inet, text, timestamp, timeuuid, uuid, varchar, varint
    All types are implemented for insert/update/delete.
Frozen collections: frozen<list<data_type>>, frozen<set<data_type>>, frozen<map<data_type, data_type>>
    All frozen collections are implemented for insert/update/delete. Because values are immutable blocks, the entire column value is replicated.
Tuples: tuple<data_type, data_type, data_type>, frozen<tuple<data_type, data_type, data_type>>
    All tuples are implemented for insert/update/delete. Because values are immutable blocks, the entire column value is replicated.
Frozen user-defined type (UDT): UDT type and frozen UDT type
    All UDTs are implemented for insert/update/delete. Because values are immutable blocks, the entire column value is replicated.
Geometric types: Point, LineString, Polygon
    All geometric types are implemented for insert/update/delete.
The following data types and operations are not supported in DSE Advanced Replication:
Unfrozen updatable collections: list<data_type>, set<data_type>, map<data_type, data_type>
    All unfrozen updatable collections are implemented for insert/delete if the entire column value is replicated. Unfrozen collections cannot update values.
Unfrozen updatable user-defined type (UDT)
    All unfrozen updatable UDTs are implemented for insert/delete if the entire column value is replicated. Unfrozen UDTs cannot update values.
Before you can start and use DSE Advanced Replication, you must create the user keyspaces and tables on the
source cluster and the destination cluster.
On all nodes in the source cluster:
2. Enable Change Data Capture (CDC) in the cassandra.yaml file on a per-node basis for each source:
cdc_enabled: true
cdc_raw_directory: /var/lib/cassandra/cdc_raw
Advanced Replication will not start if CDC is not enabled. Either use the default directory or change it to a
preferred location.
3. Consider increasing the default CDC disk space, depending on the load (default: 4096 MB or 1/8 of the total
space where cdc_raw_directory resides):
cdc_total_space_in_mb: 16384
4. Commitlog compression is turned off by default. To avoid problems with advanced replication, this option
should NOT be used:
# commitlog_compression:
# - class_name: LZ4Compressor
5. Do a rolling restart: restart the nodes in the source cluster one at a time while the other nodes continue to
operate online.
enabled: false
2. Do a rolling restart: restart the nodes in the source cluster one at a time while the other nodes continue to
operate online.
3. To clean out the data that was used for DSE Advanced Replication, use cqlsh to remove these keyspaces:
-----------------------------------
|name |value |
-----------------------------------
|audit_log_file |/tmp/myaudit.gz|
-----------------------------------
|audit_log_enabled|true |
-----------------------------------
The following table describes the configuration keys, their default values, and identifies when a restart of the
source node is required for the change to be recognized.
The dse advrep command line tool accepts these configuration keys as command arguments.
permits
    Default: 30,000. Restart required: no. Maximum number of messages that can be replicated in parallel over all destinations.
source-id
    Default: N/A. Restart required: no. Identifies this source cluster and all inserts from this cluster. The source-id must also exist in the primary key on the destination for population of the source-id to occur.
collection-expire-after-write
    Default: N/A.
collection-time-slice-count
    Default: 5. Restart required: yes. The number of files which are open in the ingestor simultaneously.
collection-time-slice-width
    Default: 60 seconds. Restart required: yes. The time period in seconds for each data block ingested. Smaller time widths mean more files; larger time widths mean larger files but more data to resend on CRC mismatches.
invalid-message-log
    Default: SYSTEM_LOG. Restart required: no. Select one of these logging strategies to adopt when an invalid message is discarded:
    SYSTEM_LOG: Log the CQL query and the error message in the system log on the destination.
    CHANNEL_LOG: Store the CQL query and the error message in files in /var/lib/cassandra/advrep/invalid_queries on the destination.
    NONE: Perform no logging.
    See Managing invalid messages.
audit-log-file
    Default: /tmp/advrep_rl_audit.log. Restart required: yes. Specifies the file name prefix template for the audit log file. The file name is appended with .gz if compressed using gzip.
audit-log-max-life-span-mins
    Default: 0. Restart required: yes. Specifies the maximum lifetime of audit log files. Periodically, when log files are rotated, audit log files are purged when they have not been written to for more than the specified maximum lifespan in minutes.
audit-log-rotate-time-mins
    Default: 60. Restart required: yes. Specifies the time interval to rotate the audit log file. On rotation, the rotated file is appended with the log counter .[logcounter], incrementing from [0]. To disable rotation, set to 0.
You can verify the channel configuration before you change it. For example:
--------------------------------------------------------------------------------------------
|destination|name                                |value
--------------------------------------------------------------------------------------------
|mydest     |driver_ssl_enabled                  |false
|mydest     |addresses                           |10.200.182.251
|mydest     |driver_read_timeout                 |15000
|mydest     |driver_connections_max              |8
|mydest     |source_id_column                    |source_id
|mydest     |driver_ssl_cipher_suites            |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, TLS_ECDHE_RSA_WITH_RC4_128_SHA,
|           |                                    |SSL_RSA_WITH_RC4_128_SHA, TLS_ECDH_ECDSA_WITH_RC4_128_SHA,
|           |                                    |TLS_ECDH_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_MD5,
|           |                                    |TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
|mydest     |source_id                           |source1
|mydest     |transmission_enabled                |true
|llpdest    |driver_ssl_enabled                  |false
|llpdest    |addresses                           |10.200.177.184
|llpdest    |driver_read_timeout                 |15000
|llpdest    |driver_connections_max              |8
|llpdest    |driver_ssl_cipher_suites            |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA, TLS_DHE_RSA_WITH_AES_256_CBC_SHA,
|           |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256, TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256, TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_RSA_WITH_AES_128_CBC_SHA, TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA, TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
|           |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,
|           |                                    |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
|           |                                    |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256, TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA, TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,
|           |                                    |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA, SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,
|           |                                    |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA, TLS_ECDHE_RSA_WITH_RC4_128_SHA,
|           |                                    |SSL_RSA_WITH_RC4_128_SHA, TLS_ECDH_ECDSA_WITH_RC4_128_SHA,
|           |                                    |TLS_ECDH_RSA_WITH_RC4_128_SHA, SSL_RSA_WITH_RC4_128_MD5,
|           |                                    |TLS_EMPTY_RENEGOTIATION_INFO_SCSV]
|llpdest    |source_id                           |source1
|llpdest    |transmission_enabled                |false
--------------------------------------------------------------------------------------------
The following table describes the configuration keys, their default values, and identifies when a restart of the
source node is required for the change to be recognized.
addresses (default: none; restart required: No)
    REQUIRED. A comma-separated list of IP addresses that are used to connect to the destination cluster using the DataStax Java driver.
driver-allow-remote-dcs-for-local-cl (default: false; restart required: Yes)
    Set to true to enable automatic failover for destination clusters with multiple datacenters. The value of the driver-consistency-level parameter must be LOCAL_ONE or LOCAL_QUORUM.
driver-compression (default: lz4; restart required: Yes)
    The compression algorithm the DataStax Java driver uses to send data from the source to the destination. Supported values are lz4 and snappy.
driver-connect-timeout (default: 15000; restart required: No)
    Time in milliseconds the DataStax Java driver waits to connect to a server.
driver-connections (default: 32; restart required: Yes)
    The number of connections the DataStax Java driver creates.
driver-connections-max (default: 256; restart required: Yes)
    The maximum number of connections the DataStax Java driver creates.
driver-max-requests-per-connection (default: 1024)
    The maximum number of requests per connection the DataStax Java driver creates.
driver-consistency-level (default: ONE; restart required: No)
    The consistency level used by the DataStax Java driver when executing statements for replicating data to the destination. Specify a valid DSE consistency level: ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL, LOCAL_SERIAL, or LOCAL_ONE.
driver-local-dc (default: N/A; restart required: Yes)
    For destination clusters with multiple datacenters, explicitly defines the name of the datacenter that you consider local. Typically, this is the datacenter that is closest to the source cluster. This value is used only for clusters with multiple datacenters.
driver-pwd (default: none; restart required: Yes)
    Driver password if the destination requires a user name and password to connect. After you change the driver-pwd value, the connection to the destination is re-established automatically, with a slight delay.
    By default, driver user names and passwords are plain text. DataStax recommends encrypting the driver passwords before you add them to the CQL table.
driver-read-timeout (default: 15000; restart required: No)
    Time in milliseconds the DataStax Java driver waits to read responses from a server.
driver-ssl-enabled (default: false; restart required: Yes)
    Whether SSL is enabled for the connection to the destination.
driver-ssl-keystore-path (default: none; restart required: Yes)
    The path to the keystore for connection to DSE when SSL client authentication is enabled.
driver-ssl-keystore-password (default: none; restart required: Yes)
    The keystore password for connection to DSE when SSL client authentication is enabled.
driver-ssl-keystore-type (default: none; restart required: Yes)
    The keystore type for connection to DSE when SSL client authentication is enabled.
driver-ssl-truststore-path (default: none; restart required: Yes)
    The path to the truststore for connection to DSE when SSL is enabled.
driver-ssl-truststore-password (default: none; restart required: Yes)
    The truststore password for connection to DSE when SSL is enabled.
driver-ssl-truststore-type (default: none; restart required: Yes)
    The truststore type for connection to DSE when SSL is enabled.
driver-ssl-protocol (default: TLS; restart required: Yes)
    The SSL protocol for connection to DSE when SSL is enabled.
driver-ssl-cipher-suites (default: none; restart required: Yes)
    A comma-separated list of SSL cipher suites for connection to DSE when SSL is enabled. Cipher suites must be supported by the source machine.
driver-used-hosts-per-remote-dc (default: 0; restart required: Yes)
    To use automatic failover for destination clusters with multiple datacenters, you must define the number of hosts per remote datacenter that the datacenter-aware round robin policy (DCAwareRoundRobinPolicy) considers available.
driver-user (default: none; restart required: Yes)
    Driver user name if the destination requires a user name and password to connect. After you change the driver-user value, the connection to the destination is re-established automatically, with a slight delay.
source-id (default: N/A; restart required: No)
    Identifies this source cluster and all inserts from this cluster. The source-id must also exist in the primary key on the destination for population of the source-id to occur.
source-id-column (default: source-id; restart required: No)
    The column on remote tables in which to insert the source id as part of the update. If this column is not present on the table that is being updated, the source id value is ignored.
transmission-enabled (default: false; restart required: No)
    Boolean; specifies whether collected data for the table is replicated to the destination.
You can verify the channel configuration before you change it. For example:
--------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |2       |foo    |bar       |source1|source_id |mydest|true        |
--------------------------------------------------------------------------------------------------------------
Properties are continuously read from the metadata, so a restart is not required after configuration changes are
made. The following table describes the configuration settings.
Column name          Description
source-id            Placeholder to override the source-id that is defined in the advrep_conf metadata.
enabled              If true, replication starts for this table. If false, no more messages from this table are saved to the replication log.
data-center-id       The datacenter this replication channel is meant for. If none is specified, replication happens in all datacenters.
destination-table    The table name on the destination for the replicated table.
transmission-enabled Specifies whether the collected data for the table is replicated to the destination.
Security
Authentication credentials can be provided in several ways; see Providing credentials from DSE tools.
The user performing replication with DSE Advanced Replication requires table-level and keyspace-level
authorization. If the same user access is required, ensure that the authorization is the same on the source
and destination clusters.
Advanced Replication also supports setting row-level permissions on the destination cluster. The user that
connects to the destination cluster must have permission to write to the specified destination table at the row
level replicated from the source, according to the RLAC restrictions. This user is specified with the --driver-
user destination setting. Row-level access control (RLAC) on the source cluster does not impact Advanced
Replication: because Advanced Replication reads the source data at the raw CDC file layer, it essentially reads
as a superuser and has access to all configured data tables.
Advanced Replication supports encrypting the driver passwords. Driver passwords are stored in a CQL table. By
default, driver passwords are plain text. DataStax recommends encrypting the driver passwords before you add
them to the CQL table. Create a global encryption key, called a system_key for SSTable encryption. Each node
in the source cluster must have the same system key. The destination does not require this key.
config_encryption_active: false
conf_driver_password_encryption_enabled: true
• Define where system keys are stored on disk. The location of the key is specified on the command line
with the -d option or with system_key_directory in dse.yaml. The default filepath is /etc/dse/conf.
• To configure the filename of the generated encryption key, set the config_encryption_key_name option
in dse.yaml. The default name is system_key.
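Assuming the dse.yaml options named above, the relevant fragment might look like the following (illustrative values; adjust the path to your environment):

```yaml
# dse.yaml on each node of the source cluster
system_key_directory: /etc/dse/conf       # where system keys are stored on disk
config_encryption_key_name: system_key    # filename of the generated encryption key
```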
On-server:
Off-server:
For example:
where system_key_file is a unique file name for the generated system key file. See createsystemkey.
Result: Configure transparent data encryption (TDE) on a per table basis. You can configure encryption
with or without compression. You can create a global encryption key in the location that is specified
by system_key_directory in the dse.yaml file. This default global encryption key is used when the
system_key_file subproperty is not specified.
4. On any node in the source cluster, use the dse command to set the encrypted password in the DSE
Advanced Replication environment:
5. Start dse.
• CQL insert, including cqlsh and applications that use the standard DSE drivers
• Spark saveToCassandra
• Spark bulkSaveToCassandra
Monitoring operations
Advanced replication can be monitored with JMX metrics. The outgoing replication queue size is a key factor to
watch. See Metrics for more details.
CQL queries in DSE Advanced Replication
This overview describes the CQL queries that DSE Advanced Replication supports, along with replication
concepts and best-practice guidelines.
DSE Advanced Replication replicates data from source clusters to destination clusters. Replication takes the
CQL query on the source and then recreates a modified version of the query and runs it on the destination.
DataStax Enterprise supports a restricted list of valid CQL queries to manipulate data. In DSE Advanced
Replication, the same restrictions apply to the generated CQL queries that are used to replicate data into the
destination.
Restrictions apply to the primary key. The primary key consists of two parts: the partition key and the clustering
key. The primary key parts plus the optional field values comprise the database row.
If differences exist between the primary key on the source table and the primary key on the destination table,
restrictions apply for which CQL queries are supported.
Best practices
DataStax recommends the following best practices to ensure seamless replication.
Schema structure on the source table and the destination table
• Maintain an identical primary key (partition keys and clustering keys) format in the same order, with
the same columns.
Although the source_id column can be present in the source table schema, values that are inserted
into that column are ignored. When records are replicated, the configured source-id value is used.
Partition key columns
The following list details support and restrictions for partition keys:
• In the destination table, only an additional optional source_id column is supported in the partition
key. Additional destination table partition key columns are not supported. The source_id can be
either a clustering column or a partition key, but not both.
• Using a subset of source table partition key columns in the destination table might result in
overwriting. There is a many-to-one mapping for row entries.
• CQL UPDATE queries require that all of the partition key columns are fully restricted. Restrict
partition key columns using = or IN (single column) restrictions.
• CQL DELETE queries require that all of the partition key columns are fully restricted. Restrict
partition key columns using = or IN (single column) restrictions.
Clustering columns
The following list details support and restrictions for clustering columns:
• In the destination table, only an additional optional source_id column is supported among the clustering
columns. Additional destination table clustering columns are not supported. The source_id can be either a
clustering column or a partition key, but not both.
• Using a subset of source table clustering columns in the destination table might result in overwriting. There
is a many-to-one mapping for row entries.
• Order is irrelevant for replication when using CQL INSERT and UPDATE queries. All permutations are
supported.
• Order is relevant for replication when using CQL DELETE queries. Permutation support is limited; not all
permutations are supported.
• CQL UPDATE queries require that all of the clustering columns are fully restricted. Restrict clustering
columns using = or IN (single column) restrictions.
• CQL DELETE queries require that the last-specified clustering column be restricted using =/>/>=/</<= (single
or multiple column) or IN (single or multiple column). All of the clustering columns that precede the last-
specified clustering column must also be restricted using = or IN.
• Restricting clustering columns is optional. However, if you do restrict clustering columns, then all of
the clustering columns that you restrict between the first and last (in order) clustering columns must be
restricted.
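The DELETE rules above can be encoded as a small predicate: the restricted clustering columns must form a contiguous prefix of the clustering order, every restricted column before the last must use = or IN, and the last restricted column may also use a range operator. The helper below is a hypothetical sketch for reasoning about these rules, not DSE code:

```python
# Sketch of the clustering-column restriction rule for replicated CQL DELETEs.
EQ_OPS = {"=", "IN"}
RANGE_OPS = {"<", "<=", ">", ">="}

def delete_restrictions_ok(clustering_cols, restrictions):
    """clustering_cols: ordered column names; restrictions: column -> operator."""
    restricted = [c for c in clustering_cols if c in restrictions]
    if not restricted:               # restricting clustering columns is optional
        return True
    # Must be a contiguous prefix of the clustering order (no gaps).
    if restricted != clustering_cols[:len(restricted)]:
        return False
    *head, last = restricted
    # Columns before the last restricted one must use = or IN.
    if any(restrictions[c] not in EQ_OPS for c in head):
        return False
    # The last restricted column may also use a range operator.
    return restrictions[last] in EQ_OPS | RANGE_OPS

# e.g. DELETE ... WHERE pk = ? AND c1 = ? AND c2 < ?  -> allowed
assert delete_restrictions_ok(["c1", "c2"], {"c1": "=", "c2": "<"})
# range on a non-final restricted column -> not allowed
assert not delete_restrictions_ok(["c1", "c2"], {"c1": "<", "c2": "="})
```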
Field values
The following list details support and requirements for field values:
• A subset, or all, of the field values on the source are supported for replication to the destination.
• Fields that are present on the source, but absent on the destination, are not replicated.
• Fields that are present on the destination, but absent on the source, are not populated.
Source ID (source_id)
The source_id identifies the source cluster and all inserts from the source cluster. The following list details
support and requirements for the source_id:
• The source_id configuration key must be present and correct in the metadata.
• The source_id must be the first position in the clustering column, or any of the partition keys.
If not, then the CQL INSERT and UPDATE queries should work, but the CQL DELETE queries with partially
restricted clustering columns might fail.
• The source_id is always restricted in CQL DELETE and UPDATE queries. Certain DELETE statements are
not supported where the clustering key is not fully restricted and the source_id is not the first clustering
column.
• For production, DataStax recommends authenticating JMX users, see Configuring JMX authentication.
• Use these steps to enable local JMX access. Localhost access is useful for test and development.
1. On the source node, set the following in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost"
LOCAL_JMX=yes
2. On the source node, stop and restart DataStax Enterprise to recognize the local JMX change.
------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |1002 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |1002 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------
-----------------------------------------------------------------------------------------------------------
|Group         |Type                 |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate      |FiveMinuteRate         |
-----------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded        |1002 |events/second|0.012688461014514603|9.862886141388435E-39|2.964393875E-314   |2.322135514219019E-114 |
|ReplicationLog|MessagesDeleted      |0    |events/second|0.0                 |0.0                  |0.0                |0.0                    |
|ReplicationLog|MessagesAcknowledged |1002 |events/second|0.012688456391385135|9.86403600116801E-39 |2.964393875E-314   |2.3230339468969963E-114|
|ReplicationLog|CommitLogMessagesRead|16873|events/second|0.21366497971804438 |0.20580430240786005  |0.39126032533612265|0.2277227124698431     |
-----------------------------------------------------------------------------------------------------------
-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|
-------------------------------------
Choose the MBeans tab and find com.datastax.bdp.advrep.v2.metrics in the left-hand navigation frame:
Performance metrics
Metrics are exposed as JMX MBeans under the com.datastax.bdp.advrep.v2.metrics path and are logically
divided into main groups. Each group refers to an architecture component. Metrics types are:
Counter
A simple incrementing and decrementing 64-bit integer.
Meter
Measures the rate at which a set of events occur.
Histogram
Measures the distribution of values in a stream of data.
Timer
A histogram of the duration of a type of event and a meter of the rate of its occurrence.
Gauge
A gauge is an instantaneous measurement of a value.
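As a rough illustration of these metric types (DSE exposes them through a standard metrics library; the classes below are simplified conceptual sketches, not DSE's implementation):

```python
# Conceptual sketches of the Counter, Gauge, and Meter metric types.
import time

class Counter:
    """A simple incrementing and decrementing 64-bit integer."""
    def __init__(self):
        self.count = 0
    def inc(self, n=1):
        self.count += n
    def dec(self, n=1):
        self.count -= n

class Gauge:
    """An instantaneous measurement supplied by a callable."""
    def __init__(self, fn):
        self.fn = fn
    @property
    def value(self):
        return self.fn()

class Meter:
    """Measures the rate at which events occur (mean rate only, for brevity)."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.start = clock()
        self.count = 0
    def mark(self, n=1):
        self.count += n
    def mean_rate(self):
        elapsed = self.clock() - self.start
        return self.count / elapsed if elapsed > 0 else 0.0

delivered = Counter()          # e.g. Tables / MessagesDelivered
delivered.inc(1002)
permits = Gauge(lambda: 30000) # e.g. Transmission / AvailablePermits
```

A Histogram tracks the distribution of observed values, and a Timer combines a Histogram of durations with a Meter of their rate.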
Metrics are available for the following groups:
• ReplicationLog
• Transmission
• AdvancedReplicationHub-[destinationId]-metrics
ReplicationLog
Metrics for the ReplicationLog group:
Metric name Description Metric type
Transmission
Metrics for the Transmission group:
Metric name Description Metric type
AdvancedReplicationHub-[destinationName]-metrics
Metrics for the AdvancedReplicationHub-[destinationName]-metrics group are provided automatically by the DSE
Java driver.
-------------------------------
|Metric name      |Metric type|
-------------------------------
|known-hosts      |Counter    |
|connected-to     |Counter    |
|open-connections |Counter    |
|requests-timer   |Timer      |
|connection-errors|Counter    |
|write-timeouts   |Counter    |
|read-timeouts    |Counter    |
|unavailables     |Counter    |
|other-errors     |Counter    |
|retries          |Counter    |
|ignores          |Counter    |
-------------------------------
For example, to access the MessagesReceived metric for the table sensor_readings in the keyspace demo look
at the following path:
com.datastax.bdp.advrep.v2.metrics:type=Tables,scope=demo.sensor_readings,name=MessagesReceived
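The MBean name follows a fixed type/scope/name pattern, so it can be assembled mechanically for any keyspace, table, and metric. The helper below is hypothetical, shown only to make the pattern explicit:

```python
# Build the JMX ObjectName string for a per-table Advanced Replication metric.
def table_metric_name(keyspace, table, metric):
    return ("com.datastax.bdp.advrep.v2.metrics:"
            f"type=Tables,scope={keyspace}.{table},name={metric}")

name = table_metric_name("demo", "sensor_readings", "MessagesReceived")
# name == "com.datastax.bdp.advrep.v2.metrics:type=Tables,scope=demo.sensor_readings,name=MessagesReceived"
```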
• SYSTEM_LOG: Log the CQL query and the error message in the system log on the destination.
• CHANNEL_LOG: Store the CQL query and the error message in files in /var/lib/cassandra/advrep/
invalid_queries on the destination. This is the default value.
For the channel logging strategy, a file is created in the channel log directory on the source node, following
the pattern /var/lib/cassandra/advrep/invalid_queries/<keyspace>/<table>/<destination>/
invalid_queries.log, where keyspace, table, and destination identify the channel's source keyspace, source
table, and destination.
The log file stores the following data that is relevant to the failed message replication:
• time_bucket: an hourly time bucket to prevent the database partition from getting too wide
• cql_string: the CQL query string, explicitly specifies the original timestamp by including the USING
TIMESTAMP option.
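For illustration, an hourly time bucket can be derived by truncating the failure timestamp to the hour, so all failures within the same hour share one partition. The exact bucket format DSE writes is not shown in this guide, so the ISO-style string below is an assumption of this sketch:

```python
# Sketch: compute an hourly time bucket for partitioning an invalid-queries log.
from datetime import datetime, timezone

def hourly_bucket(ts: datetime) -> str:
    """Truncate a timestamp to the hour (UTC) to bound partition width."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:00")

bucket = hourly_bucket(datetime(2020, 9, 18, 14, 37, 22, tzinfo=timezone.utc))
# bucket == "2020-09-18T14:00"
```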
1. To store the CQL query string and error message using a channel log, instead of the default system log
location, specify the invalid_message_log configuration key as CHANNEL_LOG:
2. To store the CQL query string and error message using a system log, instead of the default channel log
location, specify the invalid_message_log configuration key as SYSTEM_LOG:
3. To identify the problem, examine the error messages, the CQL query strings, and the schemas of the data
on the source and the destination.
If the configured audit log file is a relative path, then the log files are placed in the default base directory. If
the configured audit log file is an absolute path, then that path is used.
3. To compress the audit log output using the gzip file format:
The default value is NONE for compression. If .gz is not appended to the audit log filename in the
command, it will be appended to the created files. Compressed audit log files will remain locked until
rotated out; the active file cannot be opened.
4. Specify the time interval to rotate the audit log file. On rotation, the rotated file is appended with the log
counter .[logcounter], incrementing from [0]. To disable rotation, set to 0.
For example, the compressed file from the last step can be uncompressed after rotating out to /tmp/
auditAdvRep/myaudit.[0].gz.
• And have not been written to for more than the specified maximum lifespan minutes
[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.
... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.
'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.
{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.
<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.
[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.
Connection options
JMX authentication is supported by some dse commands. Other dse commands authenticate with the user
name and password of the configured user. For each connection option, the short form and long form are
separated by a comma.
You can provide authentication credentials in several ways, see Credentials for authentication.
--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type
for keys generated by the Java keytool binary, but other types are possible, depending on user
environment.
-p password
The password to authenticate for database access. Can use the DSE_PASSWORD environment
variable.
--ssl
Whether SSL is enabled for connection to DSE. --ssl-enabled true is the same as --ssl.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
-t token
Delegation token to use for login. Alternatively, the DSE_TOKEN environment variable can be
used.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-
path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by
the Java keytool binary, but other types are possible, depending on user environment. For example, --
truststore-type jks2.
-u username
User name of a DSE authentication account. Can use the DSE_USERNAME environment variable.
Examples
This connection example specifies that Kerberos is enabled and lists the replication channels:
destination|name|value
mydest|addresses|192.168.200.100
mydest|transmission-enabled|true
mydest|driver-ssl-cipher-suites|
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,
mydest|driver-ssl-enabled|false
mydest|driver-ssl-protocol|TLS
mydest|name|mydest
mydest|driver-connect-timeout|15000
mydest|driver-max-requests-per-connection|1024
mydest|driver-connections-max|8
mydest|driver-connections|1
mydest|driver-compression|lz4
mydest|driver-consistency-level|ONE
mydest|driver-allow-remote-dcs-for-local-cl|false
mydest|driver-used-hosts-per-remote-dc|0
mydest|driver-read-timeout|15000
Synopsis
with a result:
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set the
source-id and source-id-column differently from the global setting.
Synopsis
Examples
$
--------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table          |collecting|transmitting|replication order|priority|dest ks|dest table     |src id |src id col|dest  |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|demo    |sensor_readings|true      |true        |LIFO             |2       |demo   |sensor_readings|source1|source_id |mydest|true        |
--------------------------------------------------------------------------------------------------------------
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set the
source-id and source-id-column differently from the global setting.
Synopsis
$ dse advrep channel delete --source-keyspace foo --source-table bar --destination mydest
--data-center-id Cassandra
with a result:
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel pause
Pauses replication on a channel that carries change data from a source cluster to a destination cluster.
A replication channel is a defined channel of change data between source clusters and destination clusters.
You can pause either the collection of data or the transmission of data between a source cluster and a
destination cluster.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which the replicated data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Pause collection: no data for the source table is collected.
--transmission
Pause transmission: no data for the source table is sent to the configured destinations.
Examples
$ dse advrep channel pause --source-keyspace foo --source-table bar --destinations mydest
--data-center-ids Cassandra
with a result:
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel resume
Resumes replication for a channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
A channel can resume either the collection or transmission of replication between a source cluster and
destination cluster.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which the replicated data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Resume collection of data for the source table.
--transmission
Resume transmission of data for the source table to the configured destinations.
Examples
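The original example command did not survive extraction. By analogy with the pause example above (mydest and Cassandra are placeholder values), a resume invocation might look like:
$ dse advrep channel resume --source-keyspace foo --source-table bar --destinations mydest --data-center-ids Cassandra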
with a result:
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel status
Prints the status of a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destination destination
The destination to which the replicated data is sent. The destination name is user-defined.
--data-center-id data_center_id
The datacenter for this channel.
Examples
$ dse advrep channel status --source-keyspace foo --source-table bar --destination mydest
--data-center-id Cassandra
with a result:
--------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
--------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |2       |foo    |bar       |source1|source_id |mydest|true        |
--------------------------------------------------------------------------------------------------------------
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel truncate
Truncates a channel, discarding all messages currently in the replication log so that they are not replicated.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which the replicated data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
Examples
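The original example command did not survive extraction. By analogy with the pause example above (mydest and Cassandra are placeholder values), a truncate invocation might look like:
$ dse advrep channel truncate --source-keyspace foo --source-table bar --destinations mydest --data-center-ids Cassandra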
with a result:
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep conf list
Lists configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
Examples
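The example command was lost in extraction; the listing below is the output of running the command this section documents:
$ dse advrep conf list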
The result:
----------------------------
|name |value |
----------------------------
|audit_log_file |auditLog|
----------------------------
|permits |8 |
----------------------------
|audit_log_enabled|true |
----------------------------
The number of permits is 8, audit logging is enabled, and the audit log file name is auditLog.
--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
The audit log compression algorithm. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-rotate-max number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files which are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples
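The example command and its output were lost in extraction. Assuming these options belong to dse advrep conf update (the subcommand name is inferred from the surrounding conf sections; the option values shown are placeholders taken from the conf list output above), an invocation might look like:
$ dse advrep conf update --permits 8 --audit-log-enabled true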
with a result:
Synopsis
--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
The audit log compression algorithm. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-rotate-max number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
Number of open files kept.
--collection-time-slice-count number_of_files
The number of files which are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples
with a result:
Synopsis
--driver-consistency-level ANY|ONE|TWO|THREE|QUORUM|ALL|LOCAL_QUORUM|EACH_QUORUM|SERIAL|LOCAL_SERIAL|LOCAL_ONE
The consistency level for the destination.
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
To update a replication destination:
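The example command was lost in extraction. Assuming the subcommand is dse advrep destination update (inferred from the lead-in above; mydest and the option value are placeholders), an invocation might look like:
$ dse advrep destination update --name mydest --driver-ssl-enabled true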
with a result:
Synopsis
--driver-compression snappy|lz4
The compression algorithm for data files.
--driver-connect-timeout timeout_in_milliseconds
The timeout for the driver connection.
--driver-read-timeout timeout_in_milliseconds
The timeout for the driver reads.
--driver-max-requests-per-connection number_of_requests
The maximum number of requests per connection.
--driver-ssl-enabled true|false
Enable or disable SSL connection for the destination.
--driver-ssl-cipher-suites suite1[ , suite2, suite3 ]
Comma-separated list of SSL cipher suites to use for driver connections.
--driver-ssl-protocol protocol
The SSL protocol to use for driver connections.
--driver-keystore-path keystore_path
The SSL keystore path to use for driver connections.
--driver-keystore-password keystore_password
The SSL keystore password to use for driver connections.
--driver-keystore-type keystore_type
The SSL keystore type to use for driver connections.
--driver-truststore-path truststore_path
The SSL truststore path to use for driver connections.
--driver-truststore-password truststore_password
The SSL truststore password to use for driver connections.
--driver-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
with a result:
Synopsis
with a result:
Synopsis
Examples
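The example command was lost in extraction. The output below appears to be a listing of replication destinations, for which an invocation (subcommand name inferred, not confirmed by this section) might be:
$ dse advrep destination list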
with a result:
----------------
|name |enabled|
----------------
|mydest|true |
----------------
Synopsis
with a result:
-------------------------------------------------------------------------------------------
|destination|name                     |value                                              |
-------------------------------------------------------------------------------------------
|mydest     |addresses                |10.200.180.162                                     |
-------------------------------------------------------------------------------------------
|mydest     |transmission-enabled     |true                                               |
-------------------------------------------------------------------------------------------
|mydest     |driver-ssl-cipher-suites |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384,           |
|           |                         |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384,             |
|           |                         |TLS_RSA_WITH_AES_256_CBC_SHA256,                   |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384,            |
|           |                         |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384,              |
|           |                         |TLS_DHE_RSA_WITH_AES_256_CBC_SHA256,               |
|           |                         |TLS_DHE_DSS_WITH_AES_256_CBC_SHA256,               |
|           |                         |TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,              |
|           |                         |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,                |
|           |                         |TLS_RSA_WITH_AES_256_CBC_SHA,                      |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,               |
|           |                         |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,                 |
|           |                         |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,                  |
|           |                         |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,                  |
|           |                         |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,           |
|           |                         |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,             |
|           |                         |TLS_RSA_WITH_AES_128_CBC_SHA256,                   |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,            |
|           |                         |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,              |
|           |                         |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,               |
|           |                         |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,               |
|           |                         |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,              |
|           |                         |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,                |
|           |                         |TLS_RSA_WITH_AES_128_CBC_SHA,                      |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,               |
|           |                         |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,                 |
|           |                         |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,                  |
|           |                         |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,                  |
|           |                         |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,           |
|           |                         |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,           |
|           |                         |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,             |
|           |                         |TLS_RSA_WITH_AES_256_GCM_SHA384,                   |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,            |
|           |                         |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,              |
|           |                         |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,               |
|           |                         |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,               |
|           |                         |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,             |
|           |                         |TLS_RSA_WITH_AES_128_GCM_SHA256,                   |
|           |                         |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,            |
|           |                         |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,              |
|           |                         |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,               |
|           |                         |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,               |
|           |                         |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,             |
|           |                         |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,               |
|           |                         |SSL_RSA_WITH_3DES_EDE_CBC_SHA,                     |
|           |                         |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,              |
|           |                         |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,                |
|           |                         |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,                 |
|           |                         |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,                 |
|           |                         |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,                  |
|           |                         |TLS_ECDHE_RSA_WITH_RC4_128_SHA,                    |
|           |                         |SSL_RSA_WITH_RC4_128_SHA,                          |
|           |                         |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,                   |
|           |                         |TLS_ECDH_RSA_WITH_RC4_128_SHA,                     |
|           |                         |SSL_RSA_WITH_RC4_128_MD5,                          |
|           |                         |TLS_EMPTY_RENEGOTIATION_INFO_SCSV                  |
-------------------------------------------------------------------------------------------
|mydest     |driver-ssl-enabled       |false                                              |
-------------------------------------------------------------------------------------------
|mydest     |driver-ssl-protocol      |TLS                                                |
-------------------------------------------------------------------------------------------
|mydest     |name                     |mydest                                             |
-------------------------------------------------------------------------------------------
Synopsis
with a result:
Synopsis
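The example command was lost in extraction. The metrics tables below suggest a metrics listing, for which an invocation (subcommand name inferred, not confirmed by this section) might be:
$ dse advrep metrics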
with a result:
------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |3000 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |3000 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
|Group         |Type                 |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate      |FiveMinuteRate        |
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded        |3000 |events/second|0.020790532589851248|4.569533277209345E-28|2.964393875E-314   |2.3185964029982446E-82|
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesDeleted      |0    |events/second|0.0                 |0.0                  |0.0                |0.0                   |
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAcknowledged |3000 |events/second|0.020790529428089743|4.569533277209345E-28|2.964393875E-314   |2.3185964029982446E-82|
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|CommitLogMessagesRead|30740|events/second|0.21303361656215317 |0.13538523143065767  |0.01686330377344829|0.11519609320406245   |
------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|
-------------------------------------
with a result:
--------------------------------
|Group |Type |Count|
--------------------------------
|Tables|MessagesDelivered|3000 |
--------------------------------
|Tables|MessagesReceived |3000 |
--------------------------------
with a result:
------------------------------------------------------------------------------------------------------------------------------------
|Group         |Type         |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate   |FiveMinuteRate       |
------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded|3000 |events/second|0.020827685267120057|6.100068258619765E-28|2.964393875E-314|5.515866021410421E-82|
------------------------------------------------------------------------------------------------------------------------------------
Synopsis
Examples
$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
Synopsis
--file audit_log_filename
The audit log file to create.
DSE In-Memory
DSE In-Memory is one of multiple storage options offered in DataStax Enterprise for balancing performance
and cost goals. DSE In-Memory provides fast performance for read-intensive situations. It allows
developers, architects, and administrators to choose which parts (some or all) of a database reside fully
in RAM. It is designed for use cases that lend themselves to in-memory computing, while allowing disk-based
workloads to be serviced by DSE Tiered Storage and traditional storage modeling.
DSE In-Memory is suitable for use cases that include primarily read-only workloads with slowly changing data
and/or semi-static datasets, such as a product catalog that is refreshed nightly, but read constantly during the day.
It is not suitable for workloads with heavily changing data or monotonically growing datasets that might exceed the
RAM capacity on the nodes/cluster.
DataStax recommends using OpsCenter to check performance metrics before and after configuring DSE In-
Memory.
Creating or altering tables to use DSE In-Memory
Use CQL directives to create and alter tables to use DSE In-Memory and dse.yaml to limit the size of tables.
Creating a table to use DSE In-Memory
To create a table that uses DSE In-Memory, add a CQL directive to the CREATE TABLE statement. Use the
compaction directive in the statement to specify the MemoryOnlyStrategy class and disable the key and row
caches.
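A minimal sketch of such a statement (the keyspace, table, and columns are hypothetical; the compaction class and caching settings follow the pattern this section describes):

```sql
-- Hypothetical catalog table stored fully in RAM via DSE In-Memory.
-- MemoryOnlyStrategy keeps the table memory-resident; the key and row
-- caches are disabled because the data is already in memory.
CREATE TABLE ks.product_catalog (
  product_id text PRIMARY KEY,
  name text,
  price decimal
)
WITH compaction = { 'class': 'MemoryOnlyStrategy' }
AND caching = { 'keys': 'NONE', 'rows_per_partition': 'NONE' };
```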
Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2,
which minimizes impact on the cluster. Set to 0 to use all available compaction threads.
In cqlsh, use the DESCRIBE TABLE command to view table properties:
• on shutdown
• on nodetool flush
Managing memory
Because DataStax Enterprise runs in a distributed environment, you can inadvertently add more data than the
available memory can hold.
When using DSE In-Memory, you must monitor and carefully manage available memory.
You can use OpsCenter to monitor in-memory usage.
DSE In-Memory retains the durability guarantees of the database.
Recommended limits
To prevent exceeding the RAM capacity, DataStax recommends that in-memory objects consume no more than
45% of a node’s free memory.
Managing available memory
If the maximum memory capacity is exceeded, the database stops locking some of the data into memory, read
performance degrades, and a warning message is displayed.
To view the status of in-memory tables:
$ dsetool inmemorystatus
Always run nodetool cleanup before taking a snapshot for restore. Otherwise, invalid replicas (replicas
that have been superseded by new, valid replicas on newly added nodes) can be copied to the target when
they should not be, resulting in old data showing up on the target.
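For example, a cleanup-then-snapshot sequence might look like this (the snapshot tag and keyspace name are hypothetical):

```shell
$ nodetool cleanup
$ nodetool snapshot -t pre_restore mykeyspace
```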
DSE Multi-Instance
DSE Multi-Instance supports multiple DataStax Enterprise nodes on a single host machine to leverage large
server capabilities and enable the use of existing hardware. This allows you to utilize the price-performance
sweet spot in the contemporary hardware market to ensure that cost saving goals are met without compromising
performance and availability.
About DSE Multi-Instance
Benefits
Simplifies configuration: Administration, installation, and configuration of DataStax Enterprise nodes on a single host machine are more easily managed.
Effectively utilizes larger server resources: Running multiple DataStax Enterprise nodes on a large host machine enables optimal use of RAM, CPU, and so on.
Supports scaling: Simplifies scaling with multiple DataStax Enterprise nodes on a single large host machine.
• All DSE Multi-Instance nodes on a single physical host share the same database rack to avoid replica
placement problems.
If you are not using the rack feature, you must configure racks manually to ensure that the DSE Multi-
Instance nodes on the same host machine do not encounter replica placement problems.
• Ensure that DSE Multi-Instance nodes do not share a single physical disk.
For example, do not configure a server that hosts two DSE Multi-Instance nodes with a single disk.
Instead, configure the server with at least two disks so that each node can have its own exclusive storage device.
• Multiple JVMs.
• Each DataStax Enterprise node has a node-specific set of configuration files, with one directory per service.
See default file locations for package installations.
The following image shows three DataStax Enterprise nodes on a single host machine.
Figure 16: Three DataStax Enterprise nodes on a single host machine
On the host machine, the DSE Multi-Instance root directory is /etc/default. This default location is not
configurable. The node type is defined in the /etc/default/dse-nodeId file.
DSE Multi-Instance is supported only for package installations.
• The run levels are updated to the default values so that the node is started and stopped when the host
machine is booted or halted.
• The /etc/default/dse-nodeId file is created to set the default node type as a transactional node.
• With DSE Multi-Instance, when you run the dse command on a node in the host machine, the node
configuration is read from:
# Tarball installations: the /etc/dse directory is the default configuration location in each location where
you installed DataStax Enterprise.
With DSE Multi-Instance, multiple DataStax Enterprise nodes reside on a single host machine. To segregate
the configuration for each DataStax Enterprise node, node-specific directory structures are used to store
configuration and operational files. For example, in addition to /etc/dse/dse.yaml, the DSE Multi-Instance
dse.yaml files are stored in /etc/dse-nodeId/dse.yaml locations. The server_id option is generated in
DSE Multi-Instance /etc/dse-nodeId/dse.yaml files to uniquely identify the physical server on which multiple
instances are running and is unique for each database instance.
Directories Description
/etc/dse-node1 /etc/dse-node1/dse.yaml is the configuration file for the DataStax Enterprise node in the dse-node1
directory
/etc/dse-node2 /etc/dse-node2/dse.yaml is the configuration file for the DataStax Enterprise node in the dse-node2
directory
For DSE Multi-Instance nodes, two files control the configuration of the node. For example, for the node named
dse-node1:
• /etc/default/dse-node1 configures the node behavior, including the node type and the number of retries
for the DSE service to start.
For package installations, see directories for DSE Multi-Instance for a comprehensive list of file locations in a
DSE Multi-Instance cluster.
1. Verify that your existing DataStax Enterprise installation has the default node configuration in the /etc/
dse directory. The configuration files for the default node include /etc/dse/dse.yaml and /etc/dse/
cassandra/cassandra.yaml.
2. Give the default cluster a meaningful name. For example, change the default cluster named dse to payroll.
• For package installations, you can use the dse add-node command. For example, to add a node that
will join the cluster payroll on startup:
• For tarball installations, extract the product.tar.gz file multiple times and configure nodes in each
location.
5. Before starting the new node, set the node type in the /etc/default/dse-nodeId file:
• DSE Search:
SOLR_ENABLED=1
• DSE Analytics:
SPARK_ENABLED=1
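As a sketch, an /etc/default/dse-node1 fragment that runs the node as a DSE Analytics node might contain (only the variables mentioned above are shown):

```shell
# /etc/default/dse-node1: node type flags (1 = enabled, 0 = disabled)
SOLR_ENABLED=0
SPARK_ENABLED=1
```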
a. To change default DataStax Enterprise configuration values, edit the configuration files in /etc/dse-
nodeId.
Ensure that the JMX port is configured for each node. 7199 is the DSE JMX metrics monitoring
port. DataStax recommends allowing connections only from the local node. Configure SSL and
JMX authentication when allowing connections from other nodes.
8. Verify that the nodes are running and are part of the cluster.
For example, to verify the cluster status from a local node named dse-node1 on a DSE Multi-Instance
cluster:
Using the standard dsetool ring command provides the status of the default node dse:
When a DSE Multi-Instance server is present in the cluster, the output always includes the Server ID
column, even when you run the command on a server that is a DSE Multi-Instance host machine:
9. To run standard DataStax Enterprise commands for nodes on a DSE Multi-Instance host machine, specify
the node name using this syntax:
The node ID that is specified with the add-node command is automatically prefixed with dse-. In all
instances except for add-node, the command syntax requires the dse- prefix.
For example, with DSE Multi-Instance, the command to start a Spark shell on a node named dse-spark-
node is:
In contrast, the command to start a Spark shell without DSE Multi-Instance is:
$ dse spark
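Assuming the node-name form described in step 9 (dse, then the node name, then the command), the DSE Multi-Instance equivalent might look like:

```shell
$ dse dse-spark-node spark
```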
• For example:
To run the dsetool ring command on a node named dse-node1 in a cluster on a DSE Multi-Instance host
machine:
• dse add-node
• dse list-nodes
• dse remove-node
• When entire data sets are accessed at approximately the same frequency.
Features
Increases productivity: Automates the movement of data between storage media. Eliminates manually moving data.
Improves performance: Data is stored by age, so that frequently accessed data is stored on solid state drives (SSDs) for fastest performance.
Transparent data access: Access to the data in different storage tiers is transparent to users and developers.
Lowers storage costs: Improves datacenter cost efficiency. Automatically stores less frequently accessed historical data on slower, less expensive storage media, such as spinning disks.
Flexible configuration options: Different server configurations are easy to support and configure. Disk layout is configured per node, so you can test adjustments on single nodes before deploying cluster wide.
Compaction strategies: Uses the selected tiering strategy to compact based on partition age and automate moving data by row between storage media.
Performance metrics dashboard: Tiered storage performance metrics for DSE Tiered Storage are available in OpsCenter.
• Configure the storage strategies, and the tiers that define their storage locations, at the node level in
the dse.yaml file.
Use OpsCenter Lifecycle Manager to run a Configure job to push the configuration to applicable nodes.
Multiple configurations can be defined for different use cases. Multiple heterogeneous disk configurations
are supported.
DataStax recommends local configuration testing before deploying cluster wide.
The data sets used by DSE Tiered Storage can be very large. Search limitations and known Apache Solr™
issues apply.
2. For each tiered storage strategy, define the configuration name, the storage tiers, and the data directory
locations for each tier.
a. Define storage tiers in priority order, with the fastest storage media in the first tier listed.
Use this format, where config_name is the tiered storage strategy that you reference with the CREATE
TABLE or ALTER TABLE statements. The config_name must be the same across all nodes:
tiered_storage_options:
config_name:
tiers:
- paths:
- path_to_directory1
- paths:
- path_to_directory2
where:
• config_name is the configurable name of the tiered storage configuration strategy. For example:
strategy1.
• tiers is the section that defines the storage tiers; the order in which tiers are listed defines their priority.
• paths is the section of file paths that define the data directories for this tier of the disk configuration.
Typically list the fastest storage media first. These paths are used only to store data that is
configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml
file.
For example, the tiered storage configuration named strategy1 has three different storage tiers ordered in
priority (the first tier listed has highest priority):
tiered_storage_options:
strategy1:
tiers:
- paths:
- /mnt1
- /mnt2
- paths:
- /mnt3
- /mnt4
- paths:
- /mnt5
- /mnt6
3. To apply the tiered storage strategies to selected tables, use CREATE or ALTER table statements.
For example, to apply tiered storage to table ks.tbl:
CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
WITH
COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
'tiering_strategy': 'TimeWindowStorageStrategy',
'config': 'strategy1',
'max_tier_ages': '3600,7200'};
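To apply the same tiered storage strategy to an existing table, the equivalent ALTER TABLE statement (reusing the options from the CREATE TABLE example above) is:

```sql
-- Same options as the CREATE TABLE example; applies strategy1 to an existing table.
ALTER TABLE ks.tbl
WITH COMPACTION={'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy',
  'tiering_strategy': 'TimeWindowStorageStrategy',
  'config': 'strategy1',
  'max_tier_ages': '3600,7200'};
```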
• class
'class':'org.apache.cassandra.db.compaction.TieredCompactionStrategy' configures a table to
use tiered storage.
• tiering_strategy
'tiering_strategy': 'TimeWindowStorageStrategy' uses TimeWindowStorageStrategy (TWSS)
to determine which tier to move the data to. TWSS is a DSE Tiered Storage strategy that uses
TimeWindowCompactionStrategy (TWCS).
• config
'config': 'strategy1' specifies to use the strategy that is configured in the dse.yaml file, in this case
strategy1.
• max_tier_ages
'max_tier_ages': '3600,7200' uses the values in a comma-separated list to define the maximum
age per tier, in seconds, where:
# 3600 restricts the first tier to data aged one hour (3600 seconds) or less.
# 7200 restricts the second tier to data aged two hours (7200 seconds) or less.
# All other data is routed to the data directory locations that are defined for the third tier.
For TimeWindowStorageStrategy (TWSS), DataStax recommends that one tier be defined for
each time age that is specified for max_tier_ages, plus another tier for older data. However,
DataStax Enterprise uses only the tiers that are configured in the table schema and the dse.yaml
file.
An implicit tier exists that represents the oldest data. For example, for a strategy with two tiers in
dse.yaml:
# 'max_tier_ages': '3600,7200' uses three tiers. Tier 0 would be for data newer than 3600
seconds, tier 1 would be for data between 3600 seconds and 7200 seconds, and tier 2 would be
for data older than 7200 seconds.
# 'max_tier_ages': '3600,7200,10800' uses all three tiers, but ignores the last value. Any data
that did not belong in the first two tiers goes to the third tier, whether the data was older than
10800 seconds or not.
1. To override the table schema settings on a single node, add a local_options key to an existing tiered
storage configuration in the local dse.yaml file.
For example, for this dse.yaml configuration:
tiered_storage_options:
strategy1:
tiers:
- paths:
- /mnt1
- paths:
- /mnt2
- paths:
- /mnt3
CREATE TABLE ks.tbl (k INT, c INT, v INT, PRIMARY KEY (k, c))
WITH COMPACTION={'class':'TieredCompactionStrategy',
'tiering_strategy': 'TimeWindowStorageStrategy',
'config': 'strategy1',
'max_tier_ages': '3600,7200'};
You can adjust the max_tier_ages value to 7200,10800 on a single node, by adding the local_options
key like this:
tiered_storage_options:
strategy1:
local_options:
max_tier_ages: "7200, 10800"
tiers:
- paths:
- /mnt1
- paths:
- /mnt2
- paths:
- /mnt3
3. To monitor the tiered storage behavior of individual tables, use the dsetool tieredtablestats command:
$ dsetool tieredtablestats ks.tbl
Tier 0:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
size: 4839
SSTables:
/mnt2/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-3-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
rows: 1
size: 4839
Tier 1:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
size: 4839
SSTables:
/mnt3/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-4-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
rows: 1
size: 4839
Chapter 8. DataStax Enterprise tools
nodetool
About the nodetool utility
The nodetool utility is a command-line interface for monitoring a cluster and performing routine database
operations. It is typically run from an operational node.
The nodetool utility supports the most important JMX metrics and operations, and includes other useful
commands for cluster administration. Use nodetool commands to view detailed metrics for tables, server metrics,
and compaction statistics.
nodetool abortrebuild
Aborts a currently running rebuild operation. Completes processing of active streams, but no new streams are
started.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
nodetool assassinate
Forcefully removes a dead node without re-replicating any data. Use as a last resort when you cannot
successfully use nodetool removenode.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
ip_address
IP address of the node.
Examples
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
resume
Restart the operation.
Examples
nodetool cfhistograms
This tool has been renamed to nodetool tablehistograms
nodetool cfstats
This tool has been renamed to nodetool tablestats
nodetool cleanup
Triggers immediate cleanup of keyspaces that no longer belong to a node.
OpsCenter provides a Cleanup option in the Nodes UI for running cleanup.
DataStax Enterprise does not automatically remove data from nodes that lose part of their partition range to
a newly added node. Run nodetool cleanup on the source node and on neighboring nodes that shared the
same subrange after the new node is up and running. After adding a new node, run this command to prevent the
database from including the old data to rebalance the load on that node. This command temporarily increases
disk space use proportional to the size of the largest SSTable and causes Disk I/O to occur.
Failure to run nodetool cleanup after adding a node may result in data inconsistencies including resurrection
of previously deleted data.
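For example, to clean up a single keyspace while limiting the number of simultaneous compaction jobs (the keyspace name is hypothetical; -j is the --jobs argument described below):

```shell
$ nodetool cleanup -j 2 -- mykeyspace
```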
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs
keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
nodetool clearsnapshot
Removes one or all snapshots.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
--all
Removes all snapshots.
keyspace_name
Keyspace name. By default, all keyspaces.
-t snapshotname, --tag snapshotname
The snapshot name. To remove all snapshots, omit the snapshot name.
Examples
To delete snapshot1:
$ nodetool clearsnapshot -t snapshot1
nodetool compact
Forces a major compaction on one or more tables or user-defined compaction on given SSTables.
OpsCenter provides a Compact option in the Nodes UI for running compaction.
Major compactions may behave differently depending on which compaction strategy is used for the affected tables:
• TimeWindowCompactionStrategy (TWCS) This strategy is an alternative for time series data. TWCS
compacts SSTables using a series of time windows. Within a time window, TWCS compacts all
SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of
these SSTables are compacted into a single SSTable. Then the next time window starts and the process
repeats. The duration of the time window is the only setting required. See TWCS compaction subproperties.
For more information about TWCS, see How is data maintained?.
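The window assignment above can be sketched as follows (an illustrative model only, not the DSE/Cassandra implementation):

```python
# Illustrative model of TWCS bucketing: each SSTable falls into the fixed
# window containing its timestamp; SSTables sharing a closed window are
# compacted into one SSTable.
def window_start(ts_seconds, window_seconds):
    """Start of the fixed-size time window containing a timestamp."""
    return ts_seconds - (ts_seconds % window_seconds)

# With one-hour windows, data written 30 and 59 minutes into the same hour
# shares a window; data written in the next hour does not.
HOUR = 3600
print(window_start(9000, HOUR))   # 7200
print(window_start(10740, HOUR))  # 7200
print(window_start(10810, HOUR))  # 10800
```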
A major compaction incurs considerably more disk I/O than minor compactions.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-et, --end-token end_token
The token at which the range ends. Requires start token (-st).
keyspace_name
Keyspace name. By default, all keyspaces.
-s, --split-output
Do not create a single large file. When using SizeTieredCompactionStrategy (STCS), split the output into files
that are 50%, 25%, 12.5%, and so on, of the total size. Ignored for DTCS.
sstable_name
The name of the SSTable file. Specify sstable_name or sstable_directory.
-st, --start-token start_token
The token at which the range starts. Requires end token (-et).
table_name
The table name.
--user-defined
Submits listed files for user-defined compaction.
nodetool compactionhistory
Prints the history of compaction.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool compactionhistory
The output of compaction history is seven columns wide. The first three
columns show the id, keyspace name, and table name of the compacted
SSTable.
Compaction History:
id                                   keyspace_name table_name
d06f7080-07a5-11e4-9b36-abc3a0ec9088 system schema_columnfamilies
d198ae40-07a5-11e4-9b36-abc3a0ec9088 libdata users
0381bc30-07b0-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
74eb69b0-0621-11e4-9b36-abc3a0ec9088 system local
e35dd980-07ae-11e4-9b36-abc3a0ec9088 system compactions_in_progress
8d5cf160-07ae-11e4-9b36-abc3a0ec9088 system compactions_in_progress
ba376020-07af-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
d18cc760-07a5-11e4-9b36-abc3a0ec9088 libdata libout
64009bf0-07a4-11e4-9b36-abc3a0ec9088 libdata libout
d04700f0-07a5-11e4-9b36-abc3a0ec9088 system sstable_activity
c2a97370-07a9-11e4-9b36-abc3a0ec9088 libdata users
cb928a80-07ae-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
cd8d1540-079e-11e4-9b36-abc3a0ec9088 system schema_columns
62ced2b0-07a4-11e4-9b36-abc3a0ec9088 system schema_keyspaces
d19cccf0-07a5-11e4-9b36-abc3a0ec9088 system compactions_in_progress
640bbf80-07a4-11e4-9b36-abc3a0ec9088 libdata users
6cd54e60-07ae-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
c29241f0-07a9-11e4-9b36-abc3a0ec9088 libdata libout
c2a30ad0-07a9-11e4-9b36-abc3a0ec9088 system compactions_in_progress
e3a6d920-079d-11e4-9b36-abc3a0ec9088 system schema_keyspaces
62c55cd0-07a4-11e4-9b36-abc3a0ec9088 system schema_columnfamilies
62b07540-07a4-11e4-9b36-abc3a0ec9088 system schema_columns
cdd038c0-079e-11e4-9b36-abc3a0ec9088 system schema_keyspaces
b797af00-07af-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
8c918b10-07ae-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
377d73f0-07ae-11e4-9b36-abc3a0ec9088 system compactions_in_progress
62b9c410-07a4-11e4-9b36-abc3a0ec9088 system local
d0566a40-07a5-11e4-9b36-abc3a0ec9088 system schema_columns
ba637930-07af-11e4-9b36-abc3a0ec9088 system compactions_in_progress
cdbc1480-079e-11e4-9b36-abc3a0ec9088 system schema_columnfamilies
e3456f80-07ae-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
d086f020-07a5-11e4-9b36-abc3a0ec9088 system schema_keyspaces
d06118a0-07a5-11e4-9b36-abc3a0ec9088 system local
cdaafd80-079e-11e4-9b36-abc3a0ec9088 system local
640fde30-07a4-11e4-9b36-abc3a0ec9088 system compactions_in_progress
37638350-07ae-11e4-9b36-abc3a0ec9088 Keyspace1 Standard1
The four columns to the right of the table name show the timestamp, size
of the SSTable before and after compaction, and the number of partitions
merged. The notation means {tables:rows}. For example: {1:3, 3:1} means 3
rows were taken from one SSTable (1:3) and 1 row taken from 3 SSTables (3:1)
to make the one SSTable in that compaction operation.
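The notation can be unpacked programmatically; the following is a small sketch (hypothetical, not a tool shipped with DSE) that parses the {tables:rows} map:

```python
import re

# Parse the rows_merged notation printed by nodetool compactionhistory,
# e.g. "{1:3, 3:1}": each key is a count of source SSTables and each value
# the number of rows taken from them.
def parse_rows_merged(notation):
    return {int(tables): int(rows)
            for tables, rows in re.findall(r"(\d+):(\d+)", notation)}

print(parse_rows_merged("{1:3, 3:1}"))  # {1: 3, 3: 1}
```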
. . . compacted_at    bytes_in   bytes_out  rows_merged
. . . 1404936947592   8096       7211       {1:3, 3:1}
. . . 1404936949540   144        144        {1:1}
. . . 1404941328243   1305838191 1305838191 {1:4647111}
. . . 1404770149323   5864       5701       {4:1}
. . . 1404940844824   573        148        {1:1, 2:2}
. . . 1404940700534   576        155        {1:1, 2:2}
. . . 1404941205282   766331398  766331398  {1:2727158}
. . . 1404936949462   8901649    8901649    {1:9315}
. . . 1404936336175   8900821    8900821    {1:9315}
. . . 1404936947327   223        108        {1:3, 2:1}
. . . 1404938642471   144        144        {1:1}
. . . 1404940804904   383020422  383020422  {1:1363062}
. . . 1404933936276   4889       4177       {1:4}
. . . 1404936334171   441        281        {1:3, 2:1}
. . . 1404936949567   379        79         {2:2}
. . . 1404936336248   144        144        {1:1}
. . . 1404940645958   307520780  307520780  {1:1094380}
. . . 1404938642319   8901649    8901649    {1:9315}
. . . 1404938642429   416        165        {1:3, 2:1}
. . . 1404933543858   692        281        {1:3, 2:1}
. . . 1404936334109   7760       7186       {1:3, 2:1}
. . . 1404936333972   4860       4724       {1:2, 2:1}
. . . 1404933936715   441        281        {1:3, 2:1}
. . . 1404941200880   1269180898 1003196133 {1:2623528, 2:946565}
. . . 1404940699201   297639696  297639696  {1:1059216}
. . . 1404940556463   592        148        {1:2, 2:2}
. . . 1404936334033   5760       5680       {2:1}
. . . 1404936947428   8413       5316       {1:2, 3:1}
. . . 1404941205571   429        42         {2:2}
. . . 1404933936584   7994       6789       {1:4}
. . . 1404940844664   306699417  306699417  {1:1091457}
. . . 1404936947746   601        281        {1:3, 3:1}
. . . 1404936947498   5840       5680       {3:1}
. . . 1404933936472   5861       5680       {3:1}
. . . 1404936336275   378        80         {2:2}
. . . 1404940556293   302170540  281000000  {1:924660, 2:75340}
nodetool compactionstats
Prints statistics about compactions.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
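As a rough sketch of that conversion (assumptions: 1024-based units and two decimal places; nodetool's exact rounding may differ):

```python
# Sketch of -H / --human-readable style formatting using binary
# (1024-based) units: KiB, MiB, GiB, TiB.
def human_readable(n):
    if n < 1024:
        return f"{n} bytes"
    for unit in ("KiB", "MiB", "GiB"):
        n /= 1024
        if n < 1024:
            return f"{n:.2f} {unit}"
    return f"{n / 1024:.2f} TiB"

print(human_readable(302170540))  # 288.17 MiB
```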
Examples
The total column shows the total number of uncompressed bytes of SSTables
being compacted. The system log lists the names of the SSTables compacted.
$ nodetool compactionstats
pending tasks: 5
compaction type keyspace  table     completed  total      unit  progress
Compaction      Keyspace1 Standard1 282310680  302170540  bytes 93.43%
Compaction      Keyspace1 Standard1 58457931   307520780  bytes 19.01%
Active compaction remaining time : 0h00m16s
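The progress column is derived directly from the completed and total columns; as a sketch reproducing the percentages above:

```python
# progress = completed / total, shown as a percentage with two decimals.
def progress(completed, total):
    return f"{completed / total * 100:.2f}%"

print(progress(282310680, 302170540))  # 93.43%
print(progress(58457931, 307520780))   # 19.01%
```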
nodetool decommission
Causes a live node to decommission itself, streaming its data to the next node on the ring to replicate
appropriately.
When decommissioning a DSEFS node, you must unmount DSEFS before removing that node.
See Decommissioning a datacenter, Removing a node, and Adding a node and then decommissioning the old
node.
Use nodetool netstats to monitor the progress.
Decommission does not shut down the node. Shut down the node after decommission is complete.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
-f, --force
Force decommission of the node even when it reduces the number of replicas to below the configured replication factor (RF).
Examples
$ nodetool decommission
nodetool describecluster
Prints the name, snitch, partitioner and schema version of a cluster.
Typically used to validate the schema after upgrading. If nodes report different schema versions, check for and resolve the schema disagreements.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
datacenter_name
The datacenter name.
Example
Get cluster name, snitch, partitioner and schema version
$ nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
d4f18346-f81f-3786-aed4-40e03558b299: [127.0.0.1]
$ nodetool describecluster
When schema disagreement occurs, the last line of the output includes information about unreachable nodes:
Cluster Information:
Name: Production Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
UNREACHABLE: 1176b7ac-8993-395d-85fd-41b89ef49fbb: [10.202.205.203]
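A hypothetical check (not a DSE tool) for the agreement rule above: the cluster agrees on its schema when the Schema versions section lists exactly one version and no UNREACHABLE entry.

```python
# A cluster agrees on schema when describecluster reports exactly one
# schema version and that version is not marked UNREACHABLE.
def schema_agrees(schema_versions_output):
    lines = [line.strip() for line in schema_versions_output.splitlines()]
    versions = [line for line in lines if "[" in line]
    return len(versions) == 1 and not versions[0].startswith("UNREACHABLE")

agreed = "Schema versions:\n d4f18346-f81f-3786-aed4-40e03558b299: [127.0.0.1]"
disagreed = ("Schema versions:\n"
             " UNREACHABLE: 1176b7ac-8993-395d-85fd-41b89ef49fbb: [10.202.205.203]")
print(schema_agrees(agreed))     # True
print(schema_agrees(disagreed))  # False
```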
nodetool describering
Shows the token ranges.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
Examples
$ nodetool describering cycling
Schema Version:1b04bd14-0324-3fc8-8bcb-9256d1e15f82
Keyspace: cycling
TokenRange:
TokenRange(start_token:3074457345618258602,
end_token:-9223372036854775808,
endpoints:[127.0.0.1, 127.0.0.2, 127.0.0.3],
rpc_endpoints:[127.0.0.1, 127.0.0.2, 127.0.0.3],
endpoint_details:[EndpointDetails(host:127.0.0.1, datacenter:datacenter1,
rack:rack1),
EndpointDetails(host:127.0.0.2, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.3, datacenter:datacenter1, rack:rack1)])
TokenRange(start_token:-3074457345618258603,
end_token:3074457345618258602,
endpoints:[127.0.0.3, 127.0.0.1, 127.0.0.2],
rpc_endpoints:[127.0.0.3, 127.0.0.1, 127.0.0.2],
endpoint_details:[EndpointDetails(host:127.0.0.3,
datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.1, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.2, datacenter:datacenter1, rack:rack1)])
TokenRange(start_token:-9223372036854775808,
end_token:-3074457345618258603,
endpoints:[127.0.0.2, 127.0.0.3, 127.0.0.1],
rpc_endpoints:[127.0.0.2, 127.0.0.3, 127.0.0.1],
endpoint_details:[EndpointDetails(host:127.0.0.2, datacenter:datacenter1,
rack:rack1),
EndpointDetails(host:127.0.0.3, datacenter:datacenter1, rack:rack1),
EndpointDetails(host:127.0.0.1, datacenter:datacenter1, rack:rack1)])
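A sketch of token-range membership, assuming the usual convention that a range owns tokens in (start_token, end_token] and wraps past the ring's minimum token; this mirrors the ranges in the example output but is not DSE code:

```python
# Does a token fall inside a (start, end] token range, allowing the range
# to wrap around the ring?
def in_range(token, start, end):
    if start < end:
        return start < token <= end
    return token > start or token <= end  # range wraps around the ring

# Using ranges from the example output:
print(in_range(0, -3074457345618258603, 3074457345618258602))  # True
print(in_range(5000000000000000000,
               3074457345618258602, -9223372036854775808))     # True
```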
nodetool disableautocompaction
Disables autocompaction for a keyspace and one or more tables for the current node or the specified node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by spaces.
nodetool disablebackup
Disables incremental backup.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool disablebackup
nodetool disablebinary
Disables the native transport (the binary protocol used for client connections).
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool disablebinary
nodetool disablegossip
Disables the gossip protocol, which effectively marks the node as down.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Disable gossip
$ nodetool disablegossip
nodetool disablehandoff
Disables storing of future hints.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool disablehandoff
nodetool disablehintsfordc
Turns off hints for a datacenter, but continues storing hints for other datacenters.
Useful when a datacenter is down, or during datacenter failover, when hints would put unnecessary pressure
on that datacenter.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
datacenter_name
The datacenter name.
Examples
To disable hints for a datacenter named DC2:
$ nodetool disablehintsfordc DC2
nodetool drain
Flushes all memtables from the node to SSTables on disk. DSE stops listening for connections from the client
and other nodes. You need to restart DSE after running nodetool drain. Typically, use this command before
upgrading a node to a new version of DSE.
To flush memtables to disk while leaving the node running, use nodetool flush.
OpsCenter provides an option for Draining a node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool drain
nodetool enableautocompaction
Enables automatic compaction for one or more tables in a keyspace, or for all tables.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
Examples
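The Examples heading above has no content in this extract. A sketch with hypothetical keyspace and table names (printed rather than executed, since a live node is required):

```shell
# Hypothetical keyspace and table names; substitute your own.
KEYSPACE="cycling"
TABLES="events riders"
# With no table names, autocompaction is enabled for every table
# in the keyspace; with no arguments at all, for all keyspaces.
echo "nodetool enableautocompaction $KEYSPACE $TABLES"
```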
nodetool enablebackup
Enables incremental backup.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool enablebackup
nodetool enablebinary
Re-enables the native transport, the binary protocol that client drivers use to connect to the node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool enablebinary
nodetool enablegossip
Re-enables gossip.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Re-enable gossip
$ nodetool enablegossip
nodetool enablehandoff
Re-enables storing of future hints on the current node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool enablehandoff
nodetool enablehintsfordc
Turns on hints for a datacenter that was previously disabled with nodetool disablehintsfordc.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
datacenter_name
The datacenter name.
Examples
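The Examples heading above is empty in this extract. A sketch of the disable/enable pair around a failover, with a hypothetical datacenter DC2 (commands printed, not executed):

```shell
DC="DC2"  # hypothetical datacenter name
echo "nodetool disablehintsfordc -- $DC"  # while DC2 is down
echo "nodetool enablehintsfordc -- $DC"   # once DC2 is back online
```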
nodetool failuredetector
Shows the failure detector information for the cluster.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool failuredetector
Endpoint, Phi
nodetool flush
Flushes one or more tables from the memtable to SSTables on disk.
OpsCenter provides a flush option in the Nodes UI for Flushing tables.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
Keyspace name. By default, all keyspaces.
table_name
The table name.
Examples
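The Examples heading above has no content in this extract. A sketch with hypothetical names (printed rather than executed):

```shell
# Flush two hypothetical tables in the cycling keyspace.
# With no arguments, nodetool flush flushes all keyspaces.
echo "nodetool flush cycling events riders"
```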
nodetool garbagecollect
Removes deleted data from one or more tables.
The nodetool garbagecollect command is not the same as the Perform GC option in OpsCenter.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-g, --granularity ROW|CELL
ROW (default) removes deleted partitions and rows.
CELL also removes overwritten or deleted cells.
-j, --jobs num_jobs
The number of SSTables to process simultaneously. Set to 0 to use all available compaction threads.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples
To remove deleted data from all tables and keyspaces at the default granularity:
$ nodetool garbagecollect
To remove deleted data from all tables and keyspaces, including overwritten or deleted cells:
$ nodetool garbagecollect -g CELL
nodetool gcstats
Prints garbage collection statistics, computed over all garbage collections that have run since the last time this command was run. The statistics include the interval, GC elapsed time (total and standard deviation), the disk space reclaimed in megabytes (MB), the number of garbage collections, and direct memory bytes.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool gcstats
Result: the garbage collection statistics since the last time the command was run are returned.
nodetool getbatchlogreplaythrottle
Prints the batchlog replay throttle in KB per second. Batchlog replay is also used to replay hints. The throttle is reduced proportionally to the number of nodes in the cluster.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getbatchlogreplaythrottle
nodetool getcachecapacity
Gets the global key, row, and counter cache capacities in megabytes.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
To get the global key, row, and counter cache capacities:
$ nodetool getcachecapacity
nodetool getcachekeystosave
Gets the number of keys saved by each cache.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getcachekeystosave
nodetool getcompactionthreshold
Prints the minimum and maximum compaction thresholds (the number of SSTables that trigger a minor compaction) for a given table.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
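The Examples heading above is empty in this extract. A sketch with hypothetical keyspace and table names (printed rather than executed):

```shell
# Print the thresholds for a hypothetical table cycling.events.
echo "nodetool getcompactionthreshold cycling events"
```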
nodetool getcompactionthroughput
Prints the current compaction throughput in megabytes (MB) per second.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
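No example invocation survives in this extract. The command takes no arguments; a sketch (printed rather than executed):

```shell
# Print the current compaction throughput cap.
echo "nodetool getcompactionthroughput"
```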
nodetool getconcurrentcompactors
Gets the number of concurrent compactors in the system.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
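The Examples heading above is empty in this extract. The command takes no arguments; a sketch (printed rather than executed):

```shell
# Print the configured number of concurrent compactors.
echo "nodetool getconcurrentcompactors"
```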
nodetool getconcurrentviewbuilders
Displays the number of concurrent materialized view builders in the system.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getconcurrentviewbuilders
nodetool getendpoints
Prints the endpoints that own the partition key.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
key
Partition key whose owning endpoints you want to list.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
For example, which nodes own partition key_1, key_2, and key_3?
The partitioner returns a token for the key. DSE will return an endpoint regardless of whether data exists on
the identified node for that token.
127.0.0.2
127.0.0.2
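The token-to-endpoint lookup that getendpoints performs can be sketched as a consistent-hash ring. This is an illustrative sketch only: the hash function and node tokens below are placeholders, not Cassandra's actual Murmur3Partitioner or a real ring.

```python
import bisect
import hashlib

def token_for(key: str) -> int:
    # Placeholder hash standing in for the partitioner; Cassandra's default
    # Murmur3Partitioner maps each key to a signed 64-bit token.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def owner_of_token(ring: dict[int, str], t: int) -> str:
    # The owner is the node with the smallest token >= t, wrapping around to
    # the lowest token. Every token therefore maps to an endpoint, whether or
    # not any data is actually stored for it.
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, t)
    return ring[tokens[i % len(tokens)]]

def owner(ring: dict[int, str], key: str) -> str:
    return owner_of_token(ring, token_for(key))

# Hypothetical three-node ring (token -> endpoint).
ring = {-6_000_000_000_000_000_000: "127.0.0.1",
        0: "127.0.0.2",
        6_000_000_000_000_000_000: "127.0.0.3"}
print(owner(ring, "key_1"))
```

With replication factor greater than one, the real lookup continues around the ring to collect additional replicas; the sketch returns only the first owner.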
For example, consider the following table, which uses a primary key of race_year and race_name. This table is
created in the cycling keyspace.
Given the previous information that was inserted into the table, run nodetool getendpoints and enter a value
from the partition key. For example:
10.255.100.150
The resulting output is the IP address of the replica that owns the partition key.
10.255.100.150
nodetool gethintedhandoffthrottlekb
Gets hinted handoff throttle in KB/sec per delivery thread.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool gethintedhandoffthrottlekb
nodetool getinterdcstreamthroughput
Prints the outbound throttle (throughput cap) for all streaming file transfers between datacenters.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Print the outbound throttle (throughput cap) for streaming file transfers
between datacenters
$ nodetool getinterdcstreamthroughput
nodetool getlogginglevels
Gets the runtime logging levels.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getlogginglevels
nodetool getmaxhintwindow
Prints the maximum time that the database generates hints for an unresponsive node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getmaxhintwindow
Result: the maximum time that the database generates hints for an unresponsive node is 10800000 milliseconds
(3 hours).
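This window is controlled by the max_hint_window_in_ms setting in cassandra.yaml; the fragment below shows the setting at the value reported above.

```yaml
# cassandra.yaml -- hints stop being collected for a down node once it has
# been unreachable longer than this window (10800000 ms = 3 hours)
max_hint_window_in_ms: 10800000
```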
nodetool getseeds
Gets the IP addresses of the current seed nodes.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getseeds
Current list of seed node IPs excluding the current node IP: /10.100.15.1
nodetool getsstables
Prints the SSTable that owns the partition key.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-hf, --hex-format
Specify the key in hexadecimal string format.
key
Partition key whose SSTables you want to list.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f56fd0aa24a/aa-3-bti-Data.db
The hex string representation of the partition key is useful to resolve errors. For example, find out which SSTable
owns the faulty partition key for this exception:
When the primary key of the given table is a blob, get the DecoratedKey from the hexadecimal representation of
the partition key:
/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f5cd5459987/aa-2-bti-Data.db
/var/lib/cassandra/data/cycling/comments-b6239e719c0411e8a6f11f5cd5459987/aa-2-bti-Data.db
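The hexadecimal form that -hf expects can be produced by hex-encoding the key's bytes. A minimal sketch, assuming the key is available as raw bytes; composite partition keys are serialized differently, so verify the encoding before relying on this for multi-column keys.

```python
def key_to_hex(key: bytes) -> str:
    # Hex string representation of a single-column partition key's bytes,
    # e.g. for use with `nodetool getsstables -hf`.
    return key.hex()

def hex_to_key(h: str) -> bytes:
    # Reverse direction: recover the raw key bytes from the hex string.
    return bytes.fromhex(h)

print(key_to_hex(b"key_1"))  # 6b65795f31
```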
nodetool getstreamthroughput
Gets the throughput throttle for streaming file transfers.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool getstreamthroughput
nodetool gettimeout
Prints the current timeout values in milliseconds.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
timeout_type
The timeout type: read, range, write, counterwrite, cascontention, truncate, streamingsocket, or misc
(general rpc_timeout_in_ms).
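Each timeout type reported by gettimeout corresponds to a setting in cassandra.yaml. The mapping below is an assumption drawn from the standard Cassandra configuration, not stated by this guide; confirm the setting names against your cassandra.yaml before editing them.

```python
# Assumed mapping of gettimeout types to cassandra.yaml settings.
TIMEOUT_SETTINGS = {
    "read": "read_request_timeout_in_ms",
    "range": "range_request_timeout_in_ms",
    "write": "write_request_timeout_in_ms",
    "counterwrite": "counter_write_request_timeout_in_ms",
    "cascontention": "cas_contention_timeout_in_ms",
    "truncate": "truncate_request_timeout_in_ms",
    "streamingsocket": "streaming_socket_timeout_in_ms",
    "misc": "request_timeout_in_ms",  # the general rpc_timeout_in_ms
}

for timeout_type, setting in TIMEOUT_SETTINGS.items():
    print(f"{timeout_type:16} -> {setting}")
```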
Examples
$ nodetool gettimeout
nodetool gettraceprobability
Prints the current trace probability value.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool gettraceprobability
nodetool gossipinfo
Shows the gossip information that nodes in a cluster broadcast to each other via the gossip protocol.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool gossipinfo
localhost/127.0.0.1
generation:1532896921
heartbeat:2038494
STATUS:1611484:NORMAL,8242717283351148695
LOAD:2038483:262546.0
SCHEMA:975284:d4f18346-f81f-3786-aed4-40e03558b299
DC:26:Search
RACK:18:rack1
RELEASE_VERSION:4:4.0.0.602
NATIVE_TRANSPORT_ADDRESS:3:127.0.0.1
X_11_PADDING:11503:
{"dse_version":"6.0.2","workloads":"SearchGraphCassandraAnalytics","workload":"SearchAnalytics","active":"true
A6-6F","graph":true,"health":0.9}
NET_VERSION:1:256
HOST_ID:2:3b8e8192-c1d3-4b01-a792-9673b4e377c1
NATIVE_TRANSPORT_READY:121:true
NATIVE_TRANSPORT_PORT:6:9042
NATIVE_TRANSPORT_PORT_SSL:7:9042
STORAGE_PORT:8:7000
STORAGE_PORT_SSL:9:7001
JMX_PORT:10:7199
TOKENS:1611483:<hidden>
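The application states in this output follow a NAME:version:value layout, where the version is gossip's internal change counter (generation and heartbeat carry only a single number). A rough parser sketch, assuming IPv4 endpoint lines like the localhost/127.0.0.1 line above; IPv6 endpoints would contain colons and break this heuristic.

```python
def parse_gossipinfo(text: str) -> dict[str, dict[str, str]]:
    # Parse `nodetool gossipinfo` output into {endpoint: {state: value}}.
    # Endpoint lines contain no colon (IPv4 only); state lines are
    # NAME:version:value.
    nodes: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            current = line
            nodes[current] = {}
        elif current is not None:
            name, _, rest = line.partition(":")
            version, sep, value = rest.partition(":")
            # generation/heartbeat have no version field: keep the number
            nodes[current][name] = value if sep else version
    return nodes

sample = """localhost/127.0.0.1
generation:1532896921
STATUS:1611484:NORMAL,8242717283351148695
RACK:18:rack1"""
print(parse_gossipinfo(sample))
```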
nodetool handoffwindow
Prints current hinted handoff window.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool handoffwindow
The maximum time that the database generates hints for an unresponsive node is 10800000 ms (3 hours).
nodetool help
Provides a synopsis and brief description of each nodetool command.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
command_name
Name of nodetool command.
Examples
$ nodetool help
NAME
nodetool netstats - Print network information on provided host
(connecting node by default)
SYNOPSIS
nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)]
[(-pw <password> | --password <password>)]
[(-u <username> | --username <username>)] netstats
OPTIONS
-h <host>, --host <host>
Node hostname or ip address
nodetool info
Provides node information, including the token and on disk storage (load) information, times started (generation),
uptime in seconds, and heap memory usage.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
-T, --tokens
Show all tokens.
Examples
$ nodetool info -T
Result:
ID : 3b8e8192-c1d3-4b01-a792-9673b4e377c1
Gossip active : true
Native Transport active: true
Load : 255.29 KiB
Generation No : 1532896921
Uptime (seconds) : 1882997
Heap Memory (MB) : 604.32 / 4012.00
Off Heap Memory (MB) : 0.00
Data Center : Search
Rack : rack1
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 100 MiB, 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 1 hits, 2 requests,
0.500 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 7871, size 260.79 MiB, capacity 2.79 GiB, 7871 misses,
14839137 requests, 0.999 recent hit rate, 937.529 microseconds miss latency
Percent Repaired : 100.0%
Token : 8242717283351148695
nodetool inmemorystatus
Returns a list of the in-memory tables and the amount of memory each table is using.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples
$ nodetool inmemorystatus
Result:
nodetool invalidatecountercache
Resets the global counter cache parameter to save all counter keys. Invalidates the counter_cache_keys_to_save
setting in cassandra.yaml, restoring the default behavior of saving all keys.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
nodetool invalidatekeycache
Clears the key cache. The key cache is present only until nodetool upgradesstables is run.
Synopsis
$ nodetool invalidatekeycache
nodetool invalidaterowcache
Invalidates the row_cache_keys_to_save setting in cassandra.yaml, restoring the default behavior of saving all keys.
Synopsis
Connection options
-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.
Command arguments
$ nodetool invalidaterowcache
nodetool join
Joins the node to the ring. Valid only when the node was initially not started in the ring with the -Djoin_ring=false
start-up parameter. The joining node must be properly configured with the required cassandra.yaml options for
seed list, initial token, and auto-bootstrapping.
Synopsis
nodetool listendpointspendinghints
Prints information about hints that the node has for other nodes.
Hint information includes Host ID, Address, Rack, DC, node status, total number of hints and files, and
timestamp of newest and oldest hints.
Synopsis
Examples
$ nodetool listendpointspendinghints
nodetool leaksdetection
Enables and configures memory leak tracking. Tracking information is provided along with a stack trace in
debug.log and system.log when a leak is detected.
The resources currently tracked are:
CachedReadsBufferPool
The non-blocking I/O (NIO) byte buffers that are used by file chunks stored in the chunk cache. The chunk cache is also referred to as the file cache.
DirectReadsBufferPool
The NIO byte buffers that are used for transient, short-term operations, such as some scattered file
reads.
ChunkCache
The file chunks in the chunk cache. The chunk cache is also referred to as the file cache.
Memory
Native memory accessed directly with malloc calls and therefore not managed by the JVM. Currently used for compression metadata, Bloom filters, and the row cache.
The row cache should be disabled in DSE 6.x and later.
If memtables are using off-heap objects, the following resource can also be tracked:
NativeAllocator
The memory used for memtables when the memtable allocation type is offheap_objects.
The leaksdetection parameters can also be set in cassandra.yaml. See Memory leak detection settings.
Synopsis
Command arguments
--set_max_stack_depth number
The depth of the stack traces collected. Changes only the depth of the stack traces that will be collected
from the time the parameter is set. Deeper stacks are more unique, so increasing the depth may require
increasing stacks_cache_size_mb.
Default: 30
--set_max_stacks_cache_size_mb number
Set the size of the cache for call stack traces. Stack traces are used to debug leaked resources, and
use heap memory. Set the amount of heap memory dedicated to each resource by setting the max
stacks cache size in MB.
Default: 32
--set_num_access_records number
Set the average number of stack traces kept when a resource is accessed. Currently only supported for
chunks in the cache.
Default: 0
--set_sampling_probability number
Set the sampling probability. Each resource is tracked with a sampling probability. Set the sampling
probability to 0 to disable tracking and to 1 to enable tracking all the time. A number between 0 and 1
will randomly track a resource. For example, 0.5 will track resources 50% of the time.
Default: 0
Tracking incurs a significant stack trace collection cost for every access and consumes heap space.
It should never be enabled unless directed by a support engineer or consultant.
resource
The resource to which the parameters should be applied. If not specified, the parameters affect all
resources.
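The sampling behavior described for --set_sampling_probability amounts to a Bernoulli trial on each resource access. The helper below is a hypothetical illustration of that semantics, not DSE's internal implementation:

```python
import random

def should_track(sampling_probability: float, rng: random.Random) -> bool:
    # 0 never tracks, 1 always tracks; values in between track
    # that fraction of accesses at random.
    return rng.random() < sampling_probability

rng = random.Random(42)
tracked = sum(should_track(0.5, rng) for _ in range(10_000))
print(tracked)  # roughly half of the 10,000 accesses
```

This is why a probability of 0.5 tracks resources about 50% of the time: each access draws independently against the configured threshold.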
Examples
$ nodetool leaksdetection
Result:
Current Status:
CachedReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
DirectReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
ChunkCache/Chunk - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
Memory/Memory - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
Result:
Current Status:
CachedReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
DirectReadsBufferPool/ByteBuffer - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
ChunkCache/Chunk - Sampling probability: 0.000000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
Memory/Memory - Sampling probability: 0.100000, Max stacks cache size
MB: 32, num. access records: 0, max stack depth: 30
nodetool listsnapshots
Lists all the snapshots, along with the size on disk and the true size.
Synopsis
List snapshots
$ nodetool listsnapshots
Snapshot Details:
Snapshot name Keyspace name Column family name True size Size on disk
nodetool mark_unrepaired
Marks all SSTables of a table or keyspace as unrepaired.
This operation marks all targeted SSTables as unrepaired, potentially creating new compaction tasks. Use
only if you are no longer running incremental repair on this node.
When no table name is specified, marks all tables in the keyspace as unrepaired.
Synopsis
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-f, --force
Confirms the operation.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples
Result:
nodetool: WARNING: This operation will mark all SSTables of keyspace cycling as
unrepaired, potentially creating new compaction tasks. Only use this when no longer
running incremental repair on this node. Use --force option to confirm.
nodetool move
Moves the node on the token ring to a new token; generally used to shift tokens slightly.
Additional syntax is required to move a node to a negative token:
Connection options
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
token
The new token. A number in the partition range. For Murmur3Partitioner (default): -2^63 to +2^63-1.
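The Murmur3Partitioner bounds correspond to the signed 64-bit integer range. A quick sketch of a range check (the helper is illustrative, not part of nodetool):

```python
# Murmur3Partitioner token bounds: the signed 64-bit integer range.
MIN_TOKEN = -2**63      # -9223372036854775808
MAX_TOKEN = 2**63 - 1   #  9223372036854775807

def in_partition_range(token: int) -> bool:
    """Check whether a proposed token falls inside the partitioner range."""
    return MIN_TOKEN <= token <= MAX_TOKEN

print(in_partition_range(-9223372036854775808))  # True
print(in_partition_range(2**63))                 # False
```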
Examples
nodetool netstats
Prints network information about the host.
The output includes the following information:
• JVM settings
• Mode - The operational mode of the node: JOINING, LEAVING, NORMAL, DECOMMISSIONED, CLIENT
• Mismatch (blocking) - The number of read repair operations since server restart that blocked a query.
• Mismatch (background) - The number of read repair operations since server restart performed in the
background.
• Pool name - Information about client read and write requests by thread pool size.
Synopsis
Connection options
Command arguments
-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
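The -H units are binary (powers of 1024). A rough sketch of that conversion, assuming a simple fixed-point format rather than nodetool's exact output formatting:

```python
def human_readable(num_bytes: float) -> str:
    # Binary (IEC) units: each step up is a factor of 1024.
    for unit in ("bytes", "KiB", "MiB", "GiB"):
        if abs(num_bytes) < 1024:
            return f"{num_bytes:.2f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.2f} TiB"

print(human_readable(23295))    # 22.75 KiB
print(human_readable(1853117))  # 1.77 MiB
```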
Examples
$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 1
Mismatch (Background): 1
Pool Name Active Pending Completed Dropped
Large messages n/a 0 0 0
Small messages n/a 0 23295 0
Gossip messages n/a 0 1853117 0
nodetool nodesyncservice
Use the following subcommands to manage the NodeSync service on the connected node.
The NodeSync service automatically starts when a DataStax Enterprise node is started.
The service runs continuous repair for tables that have nodesync set to true. By default, the table option is
set to false (disabled). Use CQL ALTER TABLE to change the NodeSync setting on a specific table or dse
nodesync to change the setting on multiple tables.
Command arguments
-f, --force
Forces the service to shut down immediately without completing segment validations that are currently running.
-t seconds, --timeout seconds
Time to wait in seconds for the service to start.
Default: 120 (2 minutes).
Examples
Shut down the NodeSync service on the local host without waiting for in-progress validations to complete
Shut down NodeSync service on host northeast using a timeout period of five minutes
Set the rate limit temporarily using nodetool nodesyncservice setrate. To persist the rate limit, use the
rate_in_kb setting in cassandra.yaml.
Synopsis
Command arguments
--deadline-overrides
Allows overriding the configured deadline for some or all of the tables in the simulation.
-ds, --deadline-safety-factor
Specifies a factor (integer) by which to decrease table deadlines to account for imperfect conditions. Applies only to the simulate sub-command.
-e, --excludes keyspace_name.table_name, ...
A comma-separated list of tables to exclude from the simulation when NodeSync is enabled server-side; simulates the impact on the rate of disabling NodeSync on those tables.
help
Displays options and usage instructions.
--ignore-replication-factor
Ignores the replication factor for the simulation. Without this option, the default assumes that
NodeSync runs on every node of the cluster (which is highly recommended) and assumes that
validation work is spread among replicas. When NodeSync runs on every node of the cluster, each
node must validate the fraction 1/RF of the data the node owns. This option removes that assumption,
and computes a rate that accounts for all the data the node stores.
-i, --includes keyspace_name.table_name, ...
A comma-separated list of tables to include in the simulation when NodeSync is not enabled server-side; simulates the impact on the rate of enabling NodeSync on those tables.
-rs, --rate-safety-factor factor_integer
Represents a factor of how much to increase the final rate to account for imperfect conditions. Applies
only to the simulate sub-command.
-sg, --size-growth-factor factor_integer
Represents a factor of how much to increase data sizes to account for data growth. Applies only to the
simulate sub-command.
-v, --verbose
Provides details on how the simulation is carried out. Displays all steps taken by the simulation.
Although this option is useful for understanding the simulations, results can be large or may be
excessive if many tables exist.
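The replication-factor assumption behind --ignore-replication-factor can be sketched numerically. The helper below is hypothetical, not part of DSE; it only illustrates the 1/RF split described above:

```python
def bytes_to_validate(owned_bytes: int, rf: int, ignore_rf: bool = False) -> float:
    # With NodeSync running on every node, validation work is spread
    # among the RF replicas, so each node validates 1/RF of the data
    # it owns; --ignore-replication-factor drops that assumption and
    # charges the node for all the data it stores.
    return owned_bytes if ignore_rf else owned_bytes / rf

print(bytes_to_validate(9_000_000, 3))                  # 3000000.0
print(bytes_to_validate(9_000_000, 3, ignore_rf=True))  # 9000000
```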
Examples
Simulate rates with new target times for the comments table
Simulate example
1. In CQL, create tables within a keyspace of RF > 1 and NodeSync enabled. For example:
As expected, the computed rate is rather small because very little data was inserted.
4. Run the simulator with the verbose flag to view insights on why that rate was calculated:
Using parameters:
- Size growing factor: 1.00
- Deadline safety factor: 0.25
- Rate safety factor: 0.10
cycling.comments:
- Deadline target=7.5d, adjusted from 10d for safety.
- Size=1.1MB to validate (2.3MB total (adjusted from 1.1MB for future growth) but RF=2).
- Added to previous tables, 1.1MB to validate in 7.5d => 2B/s
=> New minimum rate: 2B/s
cycling.comments2:
- Deadline target=7.5d, adjusted from 10d for safety.
- Size=7.1MB to validate (14MB total (adjusted from 7.1MB for future growth) but RF=2).
- Added to previous tables, 8.3MB to validate in 7.5d => 14B/s
=> New minimum rate: 14B/s
As expected, the computed rate is rather small because very little data was inserted.
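The rate arithmetic above can be replayed with a short shell snippet (MB is taken as 10^6 bytes here, and the +0.5 rounding is illustrative, not necessarily the simulator's exact rule):

```shell
# Recheck the simulator's numbers from the verbose output above.
deadline_s=$(( 10 * 86400 * 3 / 4 ))   # 10d target reduced by the 0.25 deadline safety factor = 648000s
rate() { awk -v b="$1" -v s="$deadline_s" 'BEGIN { printf "%d", b / s * 1.1 + 0.5 }'; }  # +10% rate safety
echo "cycling.comments  (1.1MB cumulative): $(rate 1100000) B/s"
echo "cycling.comments2 (8.3MB cumulative): $(rate 8300000) B/s"
```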
Use the nodetool nodesyncservice ratesimulator to review how the change may impact performance. For
more details, see Setting the NodeSync rate.
Synopsis
[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string' Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve uppercase.
{ key:value } Map collection. Braces ( { } ) enclose map collections or key-value pairs. A colon separates the key and the value.
<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' Search CQL only: single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type' Search CQL only: identifies the entity and literal value to overwrite the XML element in the schema and solrconfig files.
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
-b, --boolean-output
Output NodeSync service status as true or false.
True when service is running and false otherwise. Output is useful for scripts.
Examples
Show NodeSync status on the local host using the boolean option:
$ nodetool nodesyncservice status -b
false
nodetool pausehandoff
Pauses the hints delivery process.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool pausehandoff
nodetool proxyhistograms
Provides a histogram of network operation statistics at the time of the command.
The output of this command shows the full request latency recorded by the coordinator. The output includes the
percentile rank of read and write latency values for inter-node communication. Typically, you use the command
to see if requests encounter a slow node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
This example shows the output from nodetool proxyhistograms after running 4,500 insert statements and 45,000 select statements on a three-node ccm cluster on a local computer.
$ nodetool proxyhistograms
proxy histograms
Percentile Read Latency Write Latency Range Latency
(micros) (micros) (micros)
50% 1502.50 375.00 446.00
75% 1714.75 420.00 498.00
95% 31210.25 507.00 800.20
98% 36365.00 577.36 948.40
99% 36365.00 740.60 1024.39
Min 616.00 230.00 311.00
Max 36365.00 55726.00 59247.00
CAS Read and Write Latency provides data for compare-and-set operations, while View Write Latency provides
data for materialized view write operations.
proxy histograms
Percentile  Read Latency  Write Latency  Range Latency  CAS Read Latency  CAS Write Latency  View Write Latency
              (micros)       (micros)       (micros)          (micros)           (micros)            (micros)
50%             454.83        379.02        1955.67            0.00               0.00                0.00
75%            1358.10        943.13        4055.27            0.00               0.00                0.00
95%            3379.39      12108.97       20924.30            0.00               0.00                0.00
nodetool rangekeysample
Shows the sampled keys held across all keyspaces.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool rangekeysample
RangeKeySample:
356242581507269238
-5568512119108849737
-8044630622444638698
9139769913044883120
9139769913044883120
-9222057613388634431
-9222057613388634431
-8774946291924800999
-8774946291924800999
-7191538117016975626
-7191538117016975626
-4839385740530813564
-4839385740530813564
-2391368834889506351
-2391368834889506351
-257415902412033945
-257415902412033945
2068649272206580393
2068649272206580393
4479264904256751477
4479264904256751477
6874493789974003618
6874493789974003618
-8718305215016653338
-79752896362648430
1139519215559584928
1178565181744072132
-5883607023773259416
-5189327806405140569
2008715943680221220
3066791452337107542
nodetool rebuild
Rebuilds data by streaming from other nodes.
This command operates on multiple nodes in a cluster and streams data only from a single source replica when
rebuilding a token range. Use this command to add a new datacenter to an existing cluster.
If nodetool rebuild is interrupted before completion, restart it by re-entering the command. The process
resumes from the point at which it was interrupted.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-c, --connections-per-host num_connections
Maximum number of connections per host for streaming. Overrides value of
streaming_connections_per_host in cassandra.yaml.
-dc src_dc_names, --dcs src_dc_names
Comma-separated list of datacenters from which to stream.
• src_dc_names - Datacenter names are case-sensitive. For example, dc-a,dc-b. To include a rack
name, separate datacenter and rack name with a colon (:). For example, dc-a:rack1,dc-a:rack2.
-m, --mode mode_name
The rebuild mode. Possible values:
• normal - conventional behavior; streams only ranges that are not already locally available
• refetch - resets the locally available ranges and streams all ranges, but leaves current data untouched
• reset - resets the locally available ranges, removes all locally present data (like a TRUNCATE), and streams all ranges
• reset-no-snapshot - like reset, but prevents a snapshot even if auto_snapshot is enabled
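For instance, to stream only from specific racks of a source datacenter (datacenter and rack names are illustrative):

```
$ nodetool rebuild -dc dc-a:rack1,dc-a:rack2
```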
Examples
$ nodetool rebuild
nodetool rebuild_index
Performs a full rebuild of native secondary indexes for a given table.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
index_name
One or more index names, separated by a space.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
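A hypothetical invocation (keyspace, table, and index names are illustrative):

```
$ nodetool rebuild_index cycling comments comments_idx
```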
nodetool rebuild_view
Performs a rebuild of the specified materialized views for a particular base table on the node on which the
command is run. Use this command to rebuild materialized views after restoring sstables or after restarting a
materialized view build that was previously stopped. If no materialized views are specified, all materialized views
based on the specified table are rebuilt.
The rebuild_view command does not clear existing data in the materialized view.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
materialized_view_name
One or more materialized view names, separated by a space. If not specified, all materialized views in
the table are rebuilt.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
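A hypothetical invocation, assuming the arguments are given as keyspace, table, then view names (all names illustrative):

```
$ nodetool rebuild_view cycling comments comments_by_author
```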
nodetool refresh
Loads newly placed SSTables onto the system without a restart.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
--reset-levels
Forces all SSTables to level 0.
table_name
The table name.
Examples
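A hypothetical invocation (keyspace and table names are illustrative):

```
$ nodetool refresh -- cycling comments
```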
nodetool refreshsizeestimates
Refreshes the system.size_estimates table. Use after inserting or truncating large amounts of data, which can leave the size estimates inaccurate.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool refreshsizeestimates
nodetool reloadseeds
Reloads the seed node list from the seed node provider.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool reloadseeds
Updated seed node IP list excluding the current node IP: /10.100.15.1
nodetool reloadtriggers
Reloads trigger classes.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool reloadtriggers
nodetool relocatesstables
Rewrites SSTables to the correct disk.
Use with JBOD disk storage to rewrite SSTables to the correct location on disk. This is useful if you have changed the replication factor for the cluster or added a new disk.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs
Number of SSTables to relocate simultaneously. A value of 0 uses all available compaction threads.
keyspace_name
The keyspace name.
table_name
The table name.
Examples
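A hypothetical invocation that relocates the SSTables of a single table (keyspace and table names are illustrative):

```
$ nodetool relocatesstables cycling comments
```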
nodetool removenode
Removes a down node from the cluster.
Run this command only on nodes that are down. This command triggers cluster streaming. In large environments, the additional streaming activity causes more pending gossip tasks in the output of nodetool tpstats. Nodes can start to appear offline and might need to be restarted to clear the backlog of pending gossip tasks.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
force
Force completion of pending removal.
ID
Remove provided ID.
status
Show status of current node removal.
Examples
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID
Rack
UN 192.168.2.101 112.82 KB 256 31.7%
420129fc-0d84-42b0-be41-ef7dd3a8ad06 RAC1
DN 192.168.2.103 91.11 KB 256 33.9%
d0844a21-3698-4883-ab66-9e2fd5150edd RAC1
$ nodetool removenode d0844a21-3698-4883-ab66-9e2fd5150edd
$ nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID
Rack
UN 192.168.2.101 112.82 KB 256 37.7%
420129fc-0d84-42b0-be41-ef7dd3a8ad06 RAC1
UN 192.168.2.102 124.42 KB 256 38.3%
8d5ed9f4-7764-4dbd-bad8-43fddce94b7c RAC1
nodetool repair
Repairs tables on one or more nodes in a cluster when all involved replicas are up and accessible.
Tables with NodeSync enabled are skipped by repair operations that run against all or specific keyspaces. When repair is run directly on an individual table that has NodeSync enabled, the command is rejected.
See Repairing nodes. Before using this command, be sure you understand how node repair works.
If repair encounters a down replica, an error occurs and the repair process halts. Re-run repair after bringing
all replicas back online.
OpsCenter provides a repair option in the Nodes UI for Running a manual repair.
Synopsis
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
Examples
All nodetool repair command options are optional. When optional command arguments are not specified, the
defaults are:
• Repair runs in parallel on all nodes with the same replica data at the same time.
• No tracing. No validation.
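As a sketch, typical invocations look like the following (the keyspace name is a placeholder):

```shell
# Repair all keyspaces on the node, using the defaults described above
nodetool repair

# Repair only the node's primary token ranges for one keyspace
# ("cycling" is a hypothetical keyspace name)
nodetool repair -pr cycling
```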
Results in output:
. . .
INFO [AntiEntropyStage:1] 2014-07-24 22:23:10,708 RepairSession.java:171
 - [repair #16499ef0-1381-11e4-88e3-c972e09793ca] Received merkle tree for sessions from /192.168.2.101
INFO [RepairJobTask:1] 2014-07-24 22:23:10,740 RepairJob.java:145
 - [repair #16499ef0-1381-11e4-88e3-c972e09793ca] requesting merkle trees for events (to [/192.168.2.103, /192.168.2.101])
. . .
nodetool replaybatchlog
Forces batchlog replay and blocks until batches have been replayed.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool replaybatchlog
nodetool resetlocalschema
Fixes schema disagreements between nodes by dropping the schema information on the local node and
resynchronizing the schema from another node. Dropping the local schema truncates the system schema
tables; the node temporarily loses metadata about its tables, then rewrites the information fetched from
another node.
Useful when:
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Tarball path:
installation_location/resources/cassandra/bin
• For tarball installations, execute the command from the installation_location/bin directory.
• If a username and password for RMI authentication are set explicitly in the cassandra-env.sh file for the
host, then you must specify credentials.
• nodetool bootstrap operates on a single node in the cluster if -h is not used to identify one or more
other nodes. If the node from which you issue the command is the intended target, you do not need the -h
option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.
Description
The nodetool bootstrap resume command restarts bootstrap streaming.
Examples
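A minimal invocation, run on the node whose bootstrap streaming was interrupted:

```shell
# Restart bootstrap streaming on the local node
nodetool bootstrap resume
```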
nodetool resumehandoff
Resumes hints delivery process.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
$ nodetool resumehandoff
nodetool ring
Provides node status and information about the ring as determined by the node being queried. This information
gives a sense of the load balance and whether any nodes are down. If the cluster is not properly configured,
different nodes may show a different ring; check that each node reports the same ring. If you use virtual nodes
(vnodes), use nodetool status instead for more succinct output.
• Address
The node's URL.
• DC (datacenter)
The datacenter containing the node.
• Rack
The rack or, in the case of Amazon EC2, the availability zone of the node.
• Status - Up or Down
Indicates whether the node is functioning or not.
• Token
The end of the token range up to and including the value listed. For an explanation of token ranges, see
Data distribution overview.
• Owns
The percentage of the data owned by the node per datacenter times the replication factor. For example, a
node can own 33% of the ring, but show 100% if the replication factor is 3.
• Host ID
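A sketch of that ownership arithmetic with hypothetical numbers: in a four-node cluster where each node owns 25% of the ring and the replication factor is 2, each node reports 50% effective ownership:

```shell
# Effective ownership = ring share * replication factor (hypothetical values)
ring_share=25   # percent of the ring owned by the node
rf=2            # replication factor
echo "$((ring_share * rf))%"   # prints 50%
```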
Synopsis
Definition
The short form and long form parameters are comma-separated.
-h, --host hostname
The hostname or IP address of a remote node or nodes. When omitted, the default is the local machine.
-p, --port jmx_port
The JMX port number.
-pw, --password jmxpassword
The JMX password for authenticating with secure JMX. If a password is not provided, you are prompted
to enter one.
-pwf, --password-file jmx_password_filepath
The filepath to the file that stores JMX authentication credentials.
-u, --username jmx_username
The user name for authenticating with secure JMX.
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
-r, --resolve-ip
nodetool scrub
Rebuilds SSTables for one or more tables. For LeveledCompactionStrategy (LCS), scrub resets all SSTables
back to Level 0, which requires recompaction of all SSTables.
Synopsis
$ nodetool [connection_options] scrub [-j num_jobs] [-n] [-ns] [-r] [-s] [--] [keyspace_name table_name [table_name ...]]
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-j, --jobs num_jobs
The number of SSTables to scrub simultaneously. Set to 0 to use all available compaction threads.
keyspace_name
The keyspace name.
-n, --no-validate
Do not validate columns using column validator.
-ns, --no-snapshot
Skip the pre-scrub snapshot. By default (false), scrubbed tables are snapshotted before the scrub begins.
-r, --reinsert-overflowed-ttl
Rewrite rows with overflowed expiration date affected by CASSANDRA-14092 with the maximum
supported expiration date of 2038-01-19T03:14:06+00:00. The rows are rewritten with the original
timestamp incremented by one millisecond to override/supersede any potential tombstone that may
have been generated during compaction of the affected rows.
-s, --skip-corrupted
Skip corrupted partitions even when scrubbing counter tables. Default is false.
table_name
One or more table names, separated by a space.
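A sketch of an invocation (keyspace and table names are placeholders):

```shell
# Scrub one table, skipping the pre-scrub snapshot
nodetool scrub -ns cycling cyclist_name
```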
nodetool sequence
Sequentially run multiple nodetool commands from a file, resource, or standard input (StdIn) to reduce overhead.
Faster than running nodetool commands individually from a shell script because the JVM doesn't have to restart
for each command.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
command_name
Commands to execute. Separate individual commands using a colon surrounded by whitespace ( : ).
--failonerror
Set this option to true to return an error exit code if a child command fails. By default, an error exit code
is not returned if one or more child commands fail.
-i, --input input
The input to run the command.
--stoponerror
Set to true to stop the sequence when a command fails. By default, if one child command fails, the
sequence continues with the remaining commands.
Examples
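The run shown below executes four commands. A sketch of the invocation, following the colon-separated form described above (exact shell quoting may vary):

```shell
# Run four nodetool commands in a single JVM invocation; individual
# commands are separated by a colon surrounded by whitespace
nodetool sequence info : "gettimeout read" : "gettimeout write" : status
```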
################################################################################
# Executing 4 commands:
# info
# gettimeout read
# gettimeout write
# status
################################################################################
# Network interface ens3 (ens3): /fe80:0:0:0:f816:3eff:fe17:a66f%ens3/64
[null], /10.200.182.118/19 [/10.200.191.255]
# Network interface lo (lo): /0:0:0:0:0:0:0:1%lo/128 [null], /127.0.0.1/8 [null]
################################################################################
# Command: info
# Timestamp: August 31, 2018 8:24:46 PM UTC
# Timestamp (local): August 31, 2018 8:24:46 PM UTC
# Timestamp (millis since epoch): 1535747086687
################################################################################
ID : 3b8e8192-c1d3-4b01-a792-9673b4e377c1
Gossip active : true
Native Transport active: true
Load : 625.97 KiB
Generation No : 1532896921
Uptime (seconds) : 2850186
Heap Memory (MB) : 1903.08 / 4012.00
Off Heap Memory (MB) : 0.01
Data Center : SearchGraphAnalytics
Rack : rack1
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 100 MiB, 0 hits, 0 requests,
NaN recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests,
NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 1 hits, 2 requests,
0.500 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 15972, size 595.42 MiB, capacity 2.79 GiB, 15972 misses,
25462774 requests, 0.999 recent hit rate, 606.208 microseconds miss latency
Percent Repaired : 0.0%
Token : 8242717283351148695
# Command 'info' completed successfully in 331 ms
################################################################################
# Command: gettimeout read
# Timestamp: August 31, 2018 8:24:47 PM UTC
# Timestamp (local): August 31, 2018 8:24:47 PM UTC
# Timestamp (millis since epoch): 1535747087024
################################################################################
Current timeout for type read: 5000 ms
Note: Non-system keyspaces don't have the same replication settings, effective ownership
information is meaningless
# Command 'status' completed successfully in 29 ms
################################################################################
# Total duration: 374ms
# Out of 4 commands, 4 completed successfully, 0 failed.
################################################################################
nodetool setbatchlogreplaythrottle
Sets the batchlog replay throttle in KB per second; set to 0 to disable throttling. The throttle is reduced
proportionally to the number of nodes in the cluster.
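The proportional reduction amounts to dividing the configured throttle by the node count; a sketch with hypothetical values:

```shell
# Cluster-wide throttle divided evenly across the nodes (hypothetical values)
throttle_kb=60   # batchlog replay throttle, in KB per second
nodes=3          # nodes in the cluster
echo "$((throttle_kb / nodes)) KB/s per node"   # prints 20 KB/s per node
```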
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value_in_kb_per_sec
• value - the batchlog replay throttle in KB per second
• 0 - disables throttling
Examples
$ nodetool setbatchlogreplaythrottle 60
$ nodetool setbatchlogreplaythrottle 0
nodetool setcachecapacity
Sets global key, row, and counter cache capacities in megabytes.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
counter-cache-capacity
Corresponds to the counter_cache_size_in_mb parameter in cassandra.yaml. By default, the database
uses the smaller of 2.5% of the heap or 50 MB.
key-cache-capacity
Key cache capacity in MB units.
row-cache-capacity
Row cache capacity in MB units, corresponds to the row_cache_size_in_mb parameter in
cassandra.yaml. By default, row caching is zero (disabled).
nodetool setcachekeystosave
Sets the global number of keys saved by each cache for faster post-restart warmup.
Overrides the configured value of the row_cache_keys_to_save parameter in cassandra.yaml.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
counter-cache-keys-to-save
The number of keys saved by the counter cache. Corresponds to the counter_cache_keys_to_save
parameter in cassandra.yaml.
key-cache-keys-to-save
Corresponds to the key_cache_keys_to_save (deprecated) parameter in cassandra.yaml. Key cache
limiting is disabled by default, meaning all keys will be saved.
row-cache-keys-to-save
Corresponds to the row_cache_keys_to_save parameter in cassandra.yaml.
nodetool setcompactionthreshold
Sets the minimum and maximum compaction thresholds for a table.
The max_threshold table property sets an upper bound on the number of SSTables that may be compacted in
a single minor compaction, as described in How is data updated?.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
maxthreshold
The number of SSTables of similar size that must be present before a minor compaction is scheduled.
Overrides the internal default of 32.
minthreshold
The minimum number of SSTables that must be present before a minor compaction is scheduled.
table_name
The table name.
Examples
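A sketch of an invocation (keyspace and table names are placeholders):

```shell
# Require at least 4 and at most 32 similar-size SSTables
# before a minor compaction is scheduled for one table
nodetool setcompactionthreshold cycling cyclist_name 4 32
```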
nodetool setcompactionthroughput
Sets the throughput capacity for compaction in the system, or disables throttling. Overwrites the
compaction_throughput_mb_per_sec setting in cassandra.yaml.
To view the current setting, use nodetool getcompactionthroughput.
Synopsis
Tarball path:
installation_location/resources/cassandra/bin
value_in_mb
The throughput capacity in megabytes (MB) per second for compaction. To disable throttling, set to 0.
Description
Set value_in_mb to 0 to disable throttling.
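For example (the throughput value is arbitrary):

```shell
# Throttle compaction to 16 MB per second
nodetool setcompactionthroughput 16

# Disable compaction throttling
nodetool setcompactionthroughput 0
```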
nodetool setconcurrentcompactors
Sets number of concurrent compactors.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
num_compactors
Number of concurrent compactors.
Examples
$ nodetool setconcurrentcompactors 2
nodetool setconcurrentviewbuilders
Sets the number of simultaneous materialized view builder tasks allowed to run concurrently. When a view
is created, the node ranges are split into (num_processors * 4) builder tasks and submitted to this executor.
Overrides the concurrent_materialized_view_builders setting in cassandra.yaml.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
number
The number of concurrent materialized view builder tasks. Must be greater than 0.
Examples
$ nodetool setconcurrentviewbuilders 6
nodetool sethintedhandoffthrottlekb
Sets hinted handoff throttle in KB/sec per delivery thread.
When a node detects that a node for which it is holding hints has recovered, hints are sent to that node. This
command sets the maximum sleep interval per delivery thread after delivering each hint. The interval shrinks
proportionally to the number of nodes in the cluster. For example, if there are two nodes in the cluster, each
delivery thread uses the maximum interval; if there are three nodes, each node throttles to half of the maximum
interval, because the two nodes are expected to deliver hints simultaneously.
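The shrinking interval works out to dividing the configured throttle among the other nodes; a sketch with hypothetical values:

```shell
# Per-thread rate when hints are delivered by the other nodes (hypothetical values)
throttle_kb=64   # hinted handoff throttle, in KB per second
nodes=3          # nodes in the cluster
echo "$((throttle_kb / (nodes - 1))) KB/s per delivery thread"   # prints 32 KB/s per delivery thread
```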
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value_in_kb_per_sec
• value - the hinted handoff throttle, in KB per second per delivery thread
• 0 - disables throttling
Examples
$ nodetool sethintedhandoffthrottlekb 64
nodetool setinterdcstreamthroughput
Sets the throughput capacity for inter-datacenter streaming, in megabits per second (Mbps).
Since it is a subset of total throughput, inter_dc_stream_throughput_outbound_megabits_per_sec should be
set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value_in_megabits
• 0 - disables throttling
Examples
$ nodetool setinterdcstreamthroughput 64
nodetool setlogginglevel
Sets the log level threshold for a given component or class.
Use this command to set logging levels for services instead of modifying the logback.xml file.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
class
The following values are valid for the log class qualifier:
• org.apache.cassandra
• org.apache.cassandra.db
• org.apache.cassandra.service.StorageProxy
component
The following values are valid for the log components qualifier:
• bootstrap
• compaction
• cql
• repair
• ring
• streaming
level
If the class qualifier and level arguments are empty or null, logging levels are reset to the
initial configuration.
The valid values for setting the log level include ALL for logging information at all levels, TRACE
through ERROR, and OFF for no logging. TRACE creates the most verbose log, and ERROR, the least.
• ALL
• TRACE
• DEBUG
• INFO (Default)
• WARN
• ERROR
• OFF
When set to TRACE or DEBUG, output appears only in debug.log. When set to INFO, debug.log
is disabled.
Examples
Set the StorageProxy service to debug level:
$ nodetool setlogginglevel org.apache.cassandra.service.StorageProxy DEBUG
Extended logging for compaction is supported and requires table configuration. The extended compaction logs
are stored in a separate file.
nodetool setmaxhintwindow
Sets the maximum time that the database generates hints for an unresponsive node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value_in_ms
• value - the hint window, in milliseconds
• 0 - disables hint generation
Examples
Set the time that the database generates hints for an unresponsive node to 120 milliseconds:
$ nodetool setmaxhintwindow 120
nodetool setstreamthroughput
Sets the throughput capacity in megabits per second (Mb/s) for outbound streaming in the system. Overrides
the stream_throughput_outbound_megabits_per_sec setting in cassandra.yaml.
If inter_dc_stream_throughput_outbound_megabits_per_sec is set, since it is a subset of total throughput, its
value should be less than or equal to stream_throughput_outbound_megabits_per_sec.
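The constraint above can be expressed as a simple validation check. This is an illustrative sketch; the function name is an assumption, not a DSE API.

```python
def valid_stream_settings(inter_dc_mbps: int, total_mbps: int) -> bool:
    """Inter-datacenter streaming draws from the total streaming budget,
    so its cap must be non-negative and must not exceed the overall cap.
    """
    return 0 <= inter_dc_mbps <= total_mbps
```

For example, an inter-DC cap of 64 Mbps against a total cap of 200 Mbps is valid, while 300 against 200 is not.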
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value_in_megabits
• 0 - disables throttling
Examples
$ nodetool setstreamthroughput 64
nodetool settimeout
Temporarily sets the timeout for the given timeout type by overriding the corresponding setting in
cassandra.yaml:
• read - read_request_timeout_in_ms
• range - range_request_timeout_in_ms
• write - write_request_timeout_in_ms
• counterwrite - counter_write_request_timeout_in_ms
• cascontention - cas_contention_timeout_in_ms
• truncate - truncate_request_timeout_in_ms
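The type-to-setting mapping above can be restated as a lookup table. This table is illustrative only, restating the list for quick reference; it is not part of the nodetool tooling.

```python
# Per-request timeout types accepted by nodetool settimeout, mapped to
# the cassandra.yaml settings they temporarily override.
TIMEOUT_SETTINGS = {
    "read": "read_request_timeout_in_ms",
    "range": "range_request_timeout_in_ms",
    "write": "write_request_timeout_in_ms",
    "counterwrite": "counter_write_request_timeout_in_ms",
    "cascontention": "cas_contention_timeout_in_ms",
    "truncate": "truncate_request_timeout_in_ms",
}
```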
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
timeout_in_ms
Time to wait in milliseconds.
0 - Disables socket streaming timeout.
timeout_type
The timeout type: read, range, write, counterwrite, cascontention, truncate, streamingsocket, or misc
(general rpc_timeout_in_ms).
Examples
nodetool settraceprobability
Sets the probability for tracing any given request to value.
Probabilistic tracing identifies which queries are responsible for intermittent query performance problems. You
can trace some or all statements sent to a cluster. Tracing a request usually requires at least 10 rows to be
inserted.
A probability of 1.0 traces everything, whereas smaller values (for example, 0.10) sample only that
fraction of statements. Take care on large and active systems, because system-wide tracing has a
performance impact. Unless you are under a very light load, tracing all requests (probability 1.0) will probably
overwhelm your system. Start with a small fraction, for example 0.001, and increase only if necessary.
The trace information is stored in a system_traces keyspace that holds the sessions and events tables that can
be easily queried to answer questions, such as what the most time-consuming query has been since a trace was
started. Query the parameters map and thread column in the system_traces.sessions and system_traces.events
tables for probabilistic tracing information.
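The sampling decision described above can be sketched as follows. The helper below is illustrative, not part of the database; it only demonstrates what a trace probability means per request.

```python
import random

def should_trace(probability: float, rng: random.Random) -> bool:
    """Decide whether to trace a single request.

    probability=1.0 traces every request; probability=0.001 samples
    roughly one request in a thousand.
    """
    return rng.random() < probability
```

For example, with probability 0.1, roughly one request in ten is traced over a large sample.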
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
value
The trace probability, a value between 0 (tracing disabled) and 1.0 (trace all requests).
Examples
$ nodetool settraceprobability 1
$ nodetool settraceprobability 0
nodetool sjk
Runs Swiss Java Knife (SJK) commands to execute, troubleshoot, and monitor the database.
See Using nodetool sjk. To learn more about SJK, see the jvm-tools GitHub repository.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
args
Arguments passed as-is to Swiss Java Knife.
Examples
nodetool snapshot
Creates a backup by taking a snapshot of table data. A snapshot is a set of hard links to the SSTable files in the
data directory for a table at the moment the snapshot is taken.
The snapshot directory path is: data/keyspace_name/table-UID/snapshots/snapshot_name. Data is backed up
into multiple .db files, and the table schema is saved to schema.cql. The schema.cql file captures the structure of
the table at the time of the snapshot, because restoring the snapshot requires the table to have the same structure.
See this DataStax Support knowledge base article Manual Backup and Restore, with Point-in-time and table-
level restore.
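The directory layout described above can be sketched as a small path builder. The function name and parameters are illustrative assumptions; only the path structure comes from the documentation.

```python
from pathlib import PurePosixPath

def snapshot_dir(data_dir: str, keyspace: str, table: str,
                 table_uid: str, snapshot_name: str) -> PurePosixPath:
    """Build a snapshot directory path following the layout
    data/keyspace_name/table-UID/snapshots/snapshot_name."""
    return (PurePosixPath(data_dir) / keyspace
            / f"{table}-{table_uid}" / "snapshots" / snapshot_name)
```

For example, the cyclist_name snapshot shown later in this section resolves to data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/1489076973698.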
Always run nodetool cleanup before taking a snapshot for restore. Otherwise invalid replicas, that is, replicas
that have been superseded by new, valid replicas on newly added nodes, can get copied to the target when
they should not, resulting in old data showing up on the target.
Before upgrading DataStax Enterprise, be sure to create a back up of all keyspaces. See taking a snapshot.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
--table, -cf, --column-family table_name
Table name in the specified keyspace.
-kt, --kt-list, -kc, --kc.list keyspace_name.table_name,...
Comma-separated list of keyspace_name.table_name with no spaces after the comma. For example,
cycling.cyclist,basketball.players
-sf, --skip_flush
Do not flush tables before creating the snapshot.
Snapshot will not contain unflushed data.
-t snapshotname, --tag snapshotname
The snapshot name, used for the snapshot directory. When not specified, the current timestamp is used for
the directory name. For example, 1489076973698.
Examples
$ nodetool snapshot
Requested creating snapshot(s) for [all keyspaces] with snapshot name [1489076973698] and
options {skipFlush=false}
Snapshot directory: 1489076973698
The cycling keyspace contains two tables, cyclist_name and upcoming_calendar. The snapshot creates multiple
snapshot directories named cycling_2017-3-9. A number of .db files containing the data are located in these
directories, along with the table schema. For example, from the DSE installation directory:
$ ls -1 data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/
cycling_2017-3-9
manifest.json
mc-1-big-CompressionInfo.db
mc-1-big-Data.db
mc-1-big-Digest.crc32
mc-1-big-Filter.db
mc-1-big-Index.db
mc-1-big-Statistics.db
mc-1-big-Summary.db
mc-1-big-TOC.txt
schema.cql
The resulting snapshot directory 1391461910600 contains data files and the schema of cyclist_name table in
data/cycling/cyclist_name-a882dca02aaf11e58c7b8b496c707234/snapshots.
Take a snapshot of the cyclist_name table in the cycling keyspace and the sample_times table in the test
keyspace. For the -kt command argument, list the tables in a comma-separated list with no spaces:
$ nodetool snapshot -kt cycling.cyclist_name,test.sample_times
nodetool status
Provides information about the cluster, such as the state, load, and IDs.
A frequently used command, nodetool status provides the following information:
• Address
The node's IP address.
• Tokens
The number of tokens set for the node.
• Owns
The percentage of the data owned by the node per datacenter times the replication factor. For example, a
node can own 33% of the ring, but shows 100% if the replication factor is 3. For non-system keyspaces, the
endpoint percentage ownership information is shown.
• Host ID
The network ID of the node.
• Rack
The rack or, in the case of Amazon EC2, the availability zone of the node.
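The effective-ownership arithmetic described under Owns can be sketched as follows. This helper is illustrative, not part of nodetool; it only reproduces the "ring fraction times replication factor, capped at 100%" rule.

```python
def owns_percent(ring_fraction: float, replication_factor: int) -> float:
    """Effective ownership shown by nodetool status: the fraction of
    the ring the node owns times the replication factor, capped at 100%."""
    return min(ring_fraction * replication_factor, 1.0) * 100.0
```

For example, a node owning a third of the ring shows 100% ownership when the replication factor is 3.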
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
-r, --resolve-ip
Show node domain names instead of IP addresses.
Examples
$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 47.66 KB 1 ? aaa1b7c1-6049-4a08-ad3e-3697a0e30e10 rack1
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 47.66 KB 1 33.3% aaa1b7c1-6049-4a08-ad3e-3697a0e30e10 rack1
UN 127.0.0.2 47.67 KB 1 33.3% 1848c369-4306-4874-afdf-5c1e95b8732e rack1
UN 127.0.0.3 47.67 KB 1 33.3% 49578bf1-728f-438d-b1c1-d8dd644b6f7f rack1
nodetool statusbackup
Provides status of incremental backup.
Synopsis
Connection options
Command arguments
Examples
$ nodetool statusbackup
not running
nodetool statusbinary
Provides the status of the native transport that defines the format of the binary message.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool statusbinary
running
nodetool statusgossip
Provides status of gossip.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool statusgossip
running
nodetool statushandoff
Provides the status of storing future hints on the current node.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
nodetool stop
Stops all compaction operations from continuing to run, typically run on a node where compaction has a negative
impact on performance. After the compaction stops, the remaining operations in the queue are continued.
Eventually, the compaction is restarted.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-id, --compaction-id compaction_id
Stop a single compaction operation by the specified ID. Use nodetool compactionstats to find the ids of
compaction operations in progress.
compaction_type
Supported compaction types:
• COMPACTION
• VALIDATION
• CLEANUP
• SCRUB
• UPGRADE_SSTABLES
• VERIFY
• INDEX_BUILD
• TOMBSTONE_COMPACTION
• ANTICOMPACTION
• VIEW_BUILD
• INDEX_SUMMARY
• RELOCATE
• GARBAGE_COLLECT
nodetool stopdaemon
Stops the Cassandra daemon.
Synopsis
$ nodetool [connection_options] stopdaemon

Definition
Connection options
Command arguments
nodetool tablehistograms
Provides initial troubleshooting and performance metrics: current performance statistics for read and write
latency on a table during the past fifteen minutes.
Synopsis
$ nodetool [connection_options] tablehistograms [--] keyspace_name table_name
Tarball path:
installation_location/resources/cassandra/bin
Description
nodetool tablehistograms shows table performance statistics over the past fifteen minutes, including read/
write latency, partition size, cell count, and number of SSTables. Use this tool to analyze performance, tune
individual tables, and ensure that percentile latency levels meet the SLA for the data stored in the table.
Example
For example, to get statistics for the DSE Search wiki demo solr table, use this command:
$ nodetool tablehistograms wiki solr
Output:
wiki/solr histograms
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                      (micros)       (micros)      (bytes)
The output shows the percentile rank of read and write latency values, the partition size, and the cell count for
the table.
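As an illustration of how a percentile rank relates to raw latency samples, here is a minimal sketch using the nearest-rank method. This is a conceptual model only; the server computes its percentiles from internal histograms, not from a sorted sample, and the latency values below are made up.

```python
def nearest_rank_percentile(samples, p):
    """Return the value at the p-th percentile using the
    nearest-rank method on a sorted copy of the samples."""
    ordered = sorted(samples)
    # Rank is ceil(p/100 * N), converted to a 0-based index.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical read latencies in microseconds for one table.
latencies = [35, 42, 48, 51, 60, 72, 95, 120, 310, 940]
for p in (50, 75, 95, 99):
    print(f"{p}th percentile: {nearest_rank_percentile(latencies, p)} micros")
```

Note how a single slow outlier (940 micros here) dominates the high percentiles while leaving the median untouched, which is why SLAs are usually stated against 95th or 99th percentile latency rather than the average.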
nodetool tablestats
Provides statistics about one or more tables. Statistics are updated after SSTables change through compaction
or flushing.
DataStax Enterprise uses the metrics-core library to make the output more informative and easier to
understand.
Space used (live)
Example: 9592399. Total number of bytes of disk space used by all active SSTables belonging to this table. See: Storing data on disk in SSTables.
Space used (total)
Example: 9592399. Total number of bytes of disk space used by SSTables belonging to this table, including obsolete SSTables waiting for GC management. See: Storing data on disk in SSTables.
Space used by snapshots (total)
Example: 0. Total number of bytes of disk space used by snapshots of this table's data. See: About snapshots.
Off heap memory used (total)
Total number of bytes of off-heap memory used for memtables, Bloom filters, index summaries, and compression metadata for this table.
SSTable Compression Ratio
Example: 0.367… Ratio of the size of compressed SSTable data to its uncompressed size. See: Types of compression options.
Number of partitions (estimate)
Example: 3. The number of partition keys for this table; not the number of primary keys. This gives you the estimated number of partitions in the table.
Memtable cell count
Example: 1022550. Number of cells (storage engine rows x columns) of data in the memtable for this table. See: How the database reads and writes data.
Memtable data size
Example: 32028148. Total number of bytes in the memtable for this table; the total amount of live data stored in the memtable, excluding any data structure overhead.
Memtable off heap memory used
Example: 0. Total number of bytes of off-heap data for this memtable, including column-related overhead and partitions overwritten. The maximum amount is set in cassandra.yaml by the memtable_offheap_space_in_mb property.
Memtable switch count
Example: 3. Number of times a full memtable for this table was swapped for an empty one. Increases each time the memtable for a table is flushed to disk. See: How memtables are measured.
Local read latency
Example: 0.048 ms. Round-trip time in milliseconds to complete the most recent request to read the table. See: How is data read?
Local write latency
Example: 0.054 ms. Round-trip time in milliseconds to complete an update to the table. See: How are consistent read and write operations handled?
Pending flushes
Example: 0. Estimated number of reads, writes, and cluster operations pending for this table. Monitor this metric to watch for blocked or overloaded memtable flush writers; the nodetool tpstats tool does not report on blocked flush writers.
Bytes pending repair
Example: 0.000KiB. Size of table data isolated for an ongoing incremental repair.
Bloom filter false positives
Example: 0. Number of false positives reported by this table's Bloom filter. See: Tuning bloom filters.
Bloom filter false ratio
Example: 0.00000. Fraction of all Bloom filter checks resulting in a false positive from the most recent read.
Bloom filter space used, bytes
Example: 11688. Size in bytes of the Bloom filter data for this table.
Bloom filter off heap memory used
Example: 8. The number of bytes of off-heap memory used for Bloom filters for this table.
Index summary off heap memory used
Example: 41. The number of bytes of off-heap memory used for index summaries for this table.
Average live cells per slice (last five minutes)
Example: 0.0. Average number of cells scanned by single-key queries during the last five minutes.
Maximum live cells per slice (last five minutes)
Example: 0.0. Maximum number of cells scanned by single-key queries during the last five minutes.
Dropped mutations
Example: 0.0. The number of mutations (INSERT, UPDATE, or DELETE) started on this table but not completed. A high number of dropped mutations can indicate an overloaded node.
Synopsis
$ nodetool [connection_options] tablestats [-F json | yaml] [-H] [-i] [--] [keyspace [tables]]

Definition
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-F, --format json | yaml
The format for the output. The default is plain text. The following wait latencies (in ms) are included in
the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max.
-H, --human-readable
Display bytes in human readable form: KiB (kibibyte), MiB (mebibyte), GiB (gibibyte), TiB (tebibyte).
-i
Ignore list of tables and display remaining tables.
keyspace [tables]
Report on an entire keyspace or on specified tables; use a space to separate table names.
• If you do not specify a keyspace or table, statistics are reported for all keyspaces and tables.
• If you specify only a keyspace, statistics are reported for all tables in that keyspace.
• If you specify one or more tables, statistics are reported for those tables.
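When scripting against this command, machine-readable output (-F json) is easier to consume than the default plain text. The fragment below is a hypothetical sketch: the field names mirror the YAML sample shown in this section, but the exact JSON shape should be verified against your own nodetool output.

```python
import json

# Hypothetical fragment of `nodetool tablestats -F json` output. The field
# names mirror the YAML sample in this section; verify the real shape
# against your own cluster before relying on it.
raw = """
{
  "cycling": {
    "write_latency_ms": 0.05625,
    "tables": {
      "calendar": {"local_write_count": 12, "pending_flushes": 0},
      "birthday_list": {"local_write_count": 6, "pending_flushes": 0}
    }
  }
}
"""

stats = json.loads(raw)
for keyspace, ks in stats.items():
    # Walk each table's statistics and surface one metric per table.
    for table, t in ks["tables"].items():
        print(f"{keyspace}.{table}: writes={t['local_write_count']}")
```

A loop like this is a convenient basis for exporting per-table metrics to a monitoring system instead of scraping the plain-text report.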
Examples
total_number_of_tables: 68
cycling:
write_latency_ms: 0.05625
tables:
calendar:
average_tombstones_per_slice_last_five_minutes: .NaN
bloom_filter_off_heap_memory_used: '0'
bytes_pending_repair: 0
memtable_switch_count: 0
maximum_tombstones_per_slice_last_five_minutes: 0
memtable_cell_count: 12
memtable_data_size: '854'
average_live_cells_per_slice_last_five_minutes: .NaN
local_read_latency_ms: NaN
local_write_latency_ms: '0.046'
pending_flushes: 0
compacted_partition_minimum_bytes: 0
local_read_count: 0
sstable_compression_ratio: -1.0
dropped_mutations: '0'
bloom_filter_false_positives: 0
off_heap_memory_used_total: '0'
memtable_off_heap_memory_used: '0'
index_summary_off_heap_memory_used: '0'
bloom_filter_space_used: '0'
sstables_in_each_level: []
compacted_partition_maximum_bytes: 0
space_used_total: '0'
local_write_count: 12
compression_metadata_off_heap_memory_used: '0'
number_of_partitions_estimate: 3
bytes_repaired: 0
maximum_live_cells_per_slice_last_five_minutes: 0
space_used_live: '0'
compacted_partition_mean_bytes: 0
bloom_filter_false_ratio: '0.00000'
bytes_unrepaired: 0
percent_repaired: 100.0
space_used_by_snapshots_total: '0'
birthday_list:
average_tombstones_per_slice_last_five_minutes: .NaN
bloom_filter_off_heap_memory_used: '0'
bytes_pending_repair: 0
memtable_switch_count: 0
maximum_tombstones_per_slice_last_five_minutes: 0
memtable_cell_count: 6
memtable_data_size: '799'
average_live_cells_per_slice_last_five_minutes: .NaN
local_read_latency_ms: NaN
local_write_latency_ms: '0.035'
pending_flushes: 0
compacted_partition_minimum_bytes: 0
local_read_count: 0
sstable_compression_ratio: -1.0
dropped_mutations: '0'
bloom_filter_false_positives: 0
off_heap_memory_used_total: '0'
memtable_off_heap_memory_used: '0'
index_summary_off_heap_memory_used: '0'
bloom_filter_space_used: '0'
sstables_in_each_level: []
compacted_partition_maximum_bytes: 0
space_used_total: '0'
local_write_count: 6
compression_metadata_off_heap_memory_used: '0'
number_of_partitions_estimate: 5
bytes_repaired: 0
maximum_live_cells_per_slice_last_five_minutes: 0
space_used_live: '0'
compacted_partition_mean_bytes: 0
bloom_filter_false_ratio: '0.00000'
bytes_unrepaired: 0
percent_repaired: 100.0
space_used_by_snapshots_total: '0'
read_latency_ms: .NaN
pending_flushes: 0
write_count: 20
read_latency: .NaN
read_count: 0
nodetool toppartitions
Samples the activity in a table during the specified duration and reports the most active partitions.
Synopsis
$ nodetool [connection_options] toppartitions [-a samplers] [-k num_partitions] [-s size] [--]
keyspace_name table_name duration
Definition
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-a samplers
Comma-separated list of samplers. Default is all.
duration
Duration in milliseconds.
-k num_partitions
Number of top partitions. Default is 10.
keyspace_name
The keyspace name.
-s size
Capacity of the stream summary; a value closer to the actual cardinality of partitions yields more accurate
results. Default is 256.
table_name
The table name.
Examples
Sample the most active partitions for the table test.users for 1,000 milliseconds:
$ nodetool toppartitions test users 1000
WRITES Sampler:
Cardinality: ~2 (256 capacity)
Top 4 partitions:
Partition Count +/-
4b504d39354f37353131 15 14
3738313134394d353530 15 14
4f363735324e324e4d30 15 14
303535324e4b4d504c30 15 14
READS Sampler:
Cardinality: ~3 (256 capacity)
Top 4 partitions:
Partition Count +/-
4d4e30314f374e313730 42 41
4f363735324e324e4d30 42 41
303535324e4b4d504c30 42 41
4e355030324e344d3030 41 40
For each of the samplers used (WRITES and READS in the example), toppartitions reports:
• The cardinality of the sampled operations (that is, the number of unique operations in the sample set)
• The n partitions in the specified table that had the most traffic in the specified time period (where n is the
value of the -k argument, or ten if -k is not explicitly set in the command).
For each Partition, toppartitions reports:
Partition
The partition key
Count
The number of operations of the specified type that occurred during the specified time period.
+/-
The margin of error for the Count statistic
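The partition keys in the sample output are printed as hex-encoded key bytes. For a simple text partition key they decode back to the original string; composite or non-text keys will not decode this cleanly, so treat this as a convenience for text keys only:

```python
# Partition keys in toppartitions output are hex-encoded key bytes.
# For a text partition key, the bytes decode directly to the original
# string; composite or binary keys need format-specific handling.
def decode_partition_key(hex_key: str) -> str:
    return bytes.fromhex(hex_key).decode("utf-8")

# One of the keys from the sample output above.
print(decode_partition_key("4b504d39354f37353131"))  # -> KPM95O7511
```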
To keep the toppartitions reporting from slowing performance, the database does not keep
an exact count of operations, but uses sampling techniques to create an approximate number.
(This example reports on a sample cluster; a production system might generate millions of
reads or writes in a few seconds.) The +/- figure allows you to judge the accuracy of the
toppartitions reporting.
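The -s option is described as the capacity of a stream summary, which suggests a counter-based top-k sketch. Assuming behavior along the lines of the classic Space-Saving algorithm (an assumption for illustration, not a statement about the exact DSE implementation), the approximate Count and its +/- error bound arise like this:

```python
class SpaceSaving:
    """Minimal Space-Saving sketch: tracks at most `capacity` keys and
    reports approximate counts with a per-key error bound, analogous to
    the Count and +/- columns in the toppartitions report."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # key -> (count, error)

    def offer(self, key):
        if key in self.counts:
            count, err = self.counts[key]
            self.counts[key] = (count + 1, err)
        elif len(self.counts) < self.capacity:
            self.counts[key] = (1, 0)
        else:
            # Evict the key with the smallest count; the newcomer
            # inherits that count, which becomes its error bound.
            victim = min(self.counts, key=lambda k: self.counts[k][0])
            min_count, _ = self.counts.pop(victim)
            self.counts[key] = (min_count + 1, min_count)

    def top(self, k):
        return sorted(self.counts.items(),
                      key=lambda kv: kv[1][0], reverse=True)[:k]

sketch = SpaceSaving(capacity=2)
for key in ["a", "a", "a", "b", "c", "a"]:
    sketch.offer(key)
print(sketch.top(2))
```

With a capacity well below the true number of distinct partitions, evictions inflate both the counts and the error bounds, which is why the documentation recommends setting -s close to the actual cardinality.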
nodetool tpstats
Prints usage statistics of thread pools. The DataStax Enterprise (DSE) database is based on a staged event-
driven architecture (SEDA).
The database separates different tasks into stages connected by a messaging service. Each stage has a queue
and a thread pool. Some stages skip the messaging service and queue tasks immediately on a different stage
when it exists on the same node. A queue can back up if the next stage is too busy, leading to performance
bottlenecks, as described in Monitoring a DataStax Enterprise cluster.
Reports are updated after SSTables change through compaction or flushing.
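The staged design described above can be modeled as a chain of queues feeding thread pools. This toy sketch (illustrative only; the stage names, sizes, and work functions are made up) shows the hand-off pattern: when a downstream stage's queue fills up, submissions block, which is what the Blocked column in the tpstats report counts.

```python
import queue
import threading

class Stage:
    """A toy SEDA-style stage: a bounded queue feeding worker threads.
    A full queue in the next stage makes the hand-off block, the
    condition the Blocked column in tpstats reports."""
    def __init__(self, name, work, next_stage=None, workers=2, maxsize=100):
        self.name = name
        self.work = work
        self.next_stage = next_stage
        self.queue = queue.Queue(maxsize=maxsize)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, task):
        self.queue.put(task)  # blocks while this stage's queue is full

    def _run(self):
        while True:
            task = self.queue.get()
            result = self.work(task)
            if self.next_stage is not None:
                self.next_stage.submit(result)  # hand off to next stage
            self.queue.task_done()

results = []
sink = Stage("sink", results.append, workers=1)
doubler = Stage("double", lambda x: x * 2, next_stage=sink)
for i in range(5):
    doubler.submit(i)
doubler.queue.join()  # wait for the first stage to drain
sink.queue.join()     # then for the second
print(sorted(results))
```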
Report columns
The nodetool tpstats command report includes the following columns:
Active
The number of Active threads.
Pending
The number of Pending requests waiting to be executed by this thread pool.
Completed
The number of tasks Completed by this thread pool.
Blocked
The number of requests that are currently Blocked because the thread pool for the next step in the
service is full.
All-Time Blocked
The total number of All-Time Blocked requests, which are all requests blocked in this thread pool up
to now.
Report rows
The following list describes the task or property associated with the task reported in the nodetool tpstats output.
General metrics
The following report aggregated statistics for tasks on the local node:
BackgroundIoStage
Completes background tasks like submitting hints and deserializing the row cache.
CompactionExecutor
Running compaction.
GossipStage
Distributing node information via Gossip. Out of sync schemas can cause issues. You may have to sync
using nodetool resetlocalschema.
HintsDispatcher
Dispatches a single hints file to a specified node in a batched manner.
InternalResponseStage
Responding to non-client initiated messages, including bootstrapping and schema checking.
MemtableFlushWriter
Writing memtable contents to disk. May back up if the queue overruns the disk I/O capacity, or because of
sorting processes.
nodetool tpstats no longer reports blocked threads in the MemtableFlushWriter pool. Check the
Pending Flushes metric reported by nodetool tablestats.
MemtablePostFlush
Cleaning up after flushing the memtable (discarding commit logs and secondary indexes as needed).
MemtableReclaimMemory
Making unused memory available.
PendingRangeCalculator
Calculating pending ranges per bootstrapping and departed nodes. Reporting by this tool is not useful;
see Developer notes.
PerDiskMemtableFlushWriter_N
Activity for the memtable flush writer of each disk.
ReadRepairStage
Performing read repairs. Usually fast, if there is good connectivity between replicas.
Thread per core (TPC) task metrics
All actions in the TPC loop are labeled and therefore observable. Tasks marked Pendable are throttled, limited
to the value set for tpc_concurrent_requests_limit in the cassandra.yaml (by default, 128). Thread per core
messages are prepended with TPC/type, where:
• TPC/N are metrics for the core number (when --cores is specified).
• TPC/other are metrics for tasks executed that are not on TPC threads.
UNKNOWN
Unknown task.
FRAME_DECODE
Asynchronous frame decoding.
READ_LOCAL
Single-partition read request from a local node generated directly from clients.
READ_REMOTE
Single-partition read request from a remote replica.
READ_TIMEOUT
Signals read timeout errors.
READ_DEFERRED
Single-partition read request that will be first scheduled on an event loop (Pendable)
READ_RESPONSE
Single-partition read response.
READ_RANGE_LOCAL
Partition range read request from a local node generated directly from clients.
READ_RANGE_REMOTE
Partition range read request from a remote replica.
READ_RANGE_NODESYNC
Partition range read originating from NodeSync.
READ_RANGE_INTERNAL
Range reads to internal tables.
READ_RANGE_RESPONSE
Partition range read response.
READ_FROM_ITERATOR
Switching thread to read from an iterator.
READ_SECONDARY_INDEX
Switching thread to read from secondary index.
READ_DISK_ASYNC
Waiting for data from disk.
WRITE_LOCAL
Write request from a local node generated directly from clients.
WRITE_REMOTE
Write request from a remote replica
WRITE_INTERNAL
Writes to internal tables.
WRITE_RESPONSE
Write response
WRITE_DEFRAGMENT
Write issued to defragment data that required too many sstables to read (Pendable)
WRITE_MEMTABLE
Switching thread to write in memtable when not already on the correct thread
WRITE_POST_COMMITLOG_SEGMENT
Write request is waiting for the commit log segment to be allocated
WRITE_POST_COMMITLOG_SYNC
Write request is waiting for commit log to sync to disk
WRITE_POST_MEMTABLE_FULL
Write request is waiting for space in memtable
BATCH_REPLAY
Replaying a batch mutation
BATCH_STORE
Store a batchlog entry request (Pendable)
BATCH_STORE_RESPONSE
Store a batchlog entry response
BATCH_REMOVE
Remove a batchlog entry (Pendable)
COUNTER_ACQUIRE_LOCK
Acquiring counter lock.
EXECUTE_STATEMENT
Executing a statement.
CAS
Executing compare-and-set (LWT).
LWT_PREPARE
Preparation phase of light-weight transaction (Pendable).
LWT_PROPOSE
Proposal phase of light-weight transaction (Pendable).
LWT_COMMIT
Commit phase of light-weight transaction (Pendable).
TRUNCATE
Truncate request (Pendable).
NODESYNC_VALIDATION
NodeSync validation of a partition.
AUTHENTICATION
Authentication request.
AUTHORIZATION
Authorization request.
TIMED_UNKNOWN
Unknown timed task.
TIMED_TIMEOUT
Scheduled timeout task.
EVENTLOOP_SPIN
Number of busy spin cycles done by this TPC thread when it has no tasks to perform.
EVENTLOOP_YIELD
Number of Thread.yield() calls done by this TPC thread when it has no tasks to perform.
EVENTLOOP_PARK
Number of LockSupport.park() calls done by this TPC thread when it has no tasks to perform.
HINT_DISPATCH
Hint dispatch request (Pendable).
HINT_RESPONSE
Hint dispatch response.
NETWORK_BACKPRESSURE
Scheduled network backpressure.
Droppable messages
The database generates the messages listed below, but discards them after a timeout. The nodetool tpstats
command reports the number of messages of each type that have been dropped. You can view the messages
themselves using a JMX client.
Synopsis
$ nodetool [connection_options] tpstats [-C] [-F json | yaml]

Definition
Connection options
Command arguments
-C, --cores
Include data for each core. The number of cores is determined by the tpc_cores property.
-F, --format json | yaml
The format for the output. The default is plain text. The following wait latencies (in ms) are included in
the following order: 50%, 75%, 95%, 98%, 99%, Min, and Max.
Examples
Run nodetool tpstats with per-core output on the host labcluster:
$ nodetool tpstats -C
MigrationStage                      0   0 (N/A)    N/A   7      0    0
PendingRangeCalculator              0   0 (N/A)    N/A   4      0    0
PerDiskMemtableFlushWriter_0        0   0 (N/A)    N/A   25     0    0
ReadRepairStage                     0   0 (N/A)    N/A   2      0    0
TPC/0                               0   0 (0)      0     3470   N/A  0
TPC/0/EVENTLOOP_SPIN                0   N/A (N/A)  N/A   49289  N/A  N/A
TPC/0/READ_DISK_ASYNC               0   N/A (N/A)  N/A   21     N/A  N/A
TPC/0/READ_INTERNAL                 0   N/A (N/A)  N/A   1565   N/A  N/A
TPC/0/READ_RANGE_INTERNAL           0   N/A (N/A)  N/A   14     N/A  N/A
TPC/0/READ_SWITCH_FOR_RESPONSE      0   N/A (N/A)  N/A   1572   N/A  N/A
TPC/0/TIMED_TIMEOUT                 0   N/A (N/A)  N/A   5005   N/A  N/A
TPC/0/UNKNOWN                       0   N/A (N/A)  N/A   1      N/A  N/A
TPC/0/WRITE_INTERNAL                0   N/A (N/A)  N/A   33     N/A  N/A
TPC/0/WRITE_SWITCH_FOR_MEMTABLE     0   N/A (N/A)  N/A   251    N/A  N/A
TPC/0/WRITE_SWITCH_FOR_RESPONSE     0   N/A (N/A)  N/A   13     N/A  N/A
TPC/all/EVENTLOOP_SPIN              0   N/A (N/A)  N/A   49307  N/A  N/A
TPC/all/NODESYNC_VALIDATION         0   N/A (N/A)  N/A   2      N/A  N/A
TPC/all/READ_DISK_ASYNC             0   N/A (N/A)  N/A   21     N/A  N/A
TPC/all/READ_INTERNAL               0   N/A (N/A)  N/A   1565   N/A  N/A
TPC/all/READ_RANGE_INTERNAL         0   N/A (N/A)  N/A   14     N/A  N/A
TPC/all/READ_SWITCH_FOR_RESPONSE    0   N/A (N/A)  N/A   1572   N/A  N/A
TPC/all/TIMED_TIMEOUT               0   N/A (N/A)  N/A   5003   N/A  N/A
TPC/all/UNKNOWN                     0   N/A (N/A)  N/A   1      N/A  N/A
TPC/all/WRITE_INTERNAL              0   N/A (N/A)  N/A   33     N/A  N/A
TPC/all/WRITE_SWITCH_FOR_MEMTABLE   0   N/A (N/A)  N/A   251    N/A  N/A
TPC/all/WRITE_SWITCH_FOR_RESPONSE   0   N/A (N/A)  N/A   13     N/A  N/A
TPC/other                           0   0 (0)      0     2      N/A  0
TPC/other/NODESYNC_VALIDATION       0   N/A (N/A)  N/A   2      N/A  N/A
nodetool truncatehints
Truncates all hints on the local node or for one or more endpoints.
Synopsis
$ nodetool [connection_options] truncatehints [--] [endpoint ...]

Definition
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
endpoint
Endpoint address or addresses. IP address or hostname.
nodetool upgradesstables
Rewrites SSTables that are not on the current version of DataStax Enterprise, upgrading them to the current
version. Use this command when upgrading your server or changing compression options.
See sstableupgrade for SSTable compatibility with the current DSE version.
Synopsis
$ nodetool [connection_options] upgradesstables [-a] [-j num_jobs] [--] [keyspace_name table_name ...]

Definition
Connection options
Command arguments
--
Separates an option from an argument that could be mistaken for an option.
-a, --include-all-sstables
Upgrade target SSTables, including SSTables already on the current DSE version.
-j, --jobs num_jobs
Number of SSTables to upgrade simultaneously. Set to 0 to use all available compaction threads.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples
Upgrade all SSTables in the cycling keyspace and the cyclist_name table:
$ nodetool upgradesstables cycling cyclist_name
Force upgrade all SSTables, including SSTables already on the current DSE version.
$ nodetool upgradesstables -a
Force upgrade the SSTables for the specified keyspace and table, including SSTables already on the current
DSE version:
$ nodetool upgradesstables -a cycling cyclist_name
nodetool verify
Checks the data checksum for one or more specified tables.
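Conceptually, checksum verification compares a stored per-chunk checksum against one recomputed from the data on disk. This sketch illustrates the idea with CRC32 over fixed-size blocks; it is a simplified model for intuition, not the actual SSTable on-disk format:

```python
import zlib

def checksum_blocks(data: bytes, block_size: int = 4):
    """Split data into fixed-size blocks and record a CRC32 per block,
    the same basic idea as per-chunk SSTable checksums."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

def verify_blocks(data: bytes, expected, block_size: int = 4):
    """Return the indices of blocks whose checksum no longer matches."""
    actual = checksum_blocks(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

original = b"hello world, stored data"
crcs = checksum_blocks(original)             # recorded at write time
corrupted = b"hellX world, stored data"      # single-byte bit rot
print(verify_blocks(corrupted, crcs))        # index of the damaged block
```

Per-block checksums localize corruption: only the damaged chunk is flagged, rather than invalidating the whole file.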
Synopsis
$ nodetool [connection_options] verify [-e] [--] [keyspace_name table_name ...]

Definition
Connection options
Command arguments
-e, --extended-verify
Verify each cell's data, beyond simply checking SSTable checksums.
--
Separates an option from an argument that could be mistaken for an option.
keyspace_name
The keyspace name.
table_name
One or more table names, separated by a space.
Examples
$ nodetool verify -e cycling cyclist_name
nodetool version
Provides the DSE database version.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
Examples
$ nodetool version
ReleaseVersion: 4.0.0.607
nodetool viewbuildstatus
Shows the progress of a materialized view build.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Connection options
Command arguments
keyspace_name
Keyspace name. By default, all keyspaces.
view
The name of the view.
dse commands
The dse commands start the database, connect an external client to a DataStax Enterprise node, and perform common utility tasks.
About dse commands
The dse commands provide controls for starting and using DataStax Enterprise (DSE).
dse subcommands
Specify one dse subcommand and zero or more optional command arguments.
When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.
$ dse [-f config_file | -u username -p password] [-a jmx_username [-b jmx_password]] command
[options]
Specify how to connect and authenticate to the database for dse commands.
This list shows short form (-f filename) and long form (--config-file=filename):
-f, --config-file config_filename
File path to the configuration file that stores credentials. The credentials in this configuration file override the ~/.dserc credentials. If not specified, ~/.dserc is used if it exists.
The configuration file can contain DataStax Enterprise and JMX login credentials. For example:
username=username
password=password
jmx_username=jmx_username
jmx_password=jmx_password
The credentials in the configuration file are stored in clear text. DataStax recommends restricting
access to this file only to the specific user.
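As a minimal sketch of that recommendation (the file name and credential values below are placeholders, not from this guide), you can create the credentials file and restrict it to the owning user:

```shell
# Hypothetical credentials file for `dse -f`; all values are placeholders.
cat > dse-credentials.conf <<'EOF'
username=admin
password=example_password
jmx_username=jmx_admin
jmx_password=example_jmx_password
EOF

# Restrict read/write access to the file owner only.
chmod 600 dse-credentials.conf
```

The file could then be passed on the command line, for example as dse -f dse-credentials.conf.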
-u username
Role to authenticate for database access.
-p, --password password
Password to authenticate for database access.
-a, --jmxusername jmx_username
User name for authenticating with secure local JMX.
-b, --jmxpassword jmx_password
Password for authenticating with secure local JMX. If you do not provide a password, you are prompted
to enter one.
Examples
$ dse -f configfile
dse add-node
For DSE Multi-Instance, simplifies adding and configuring a node on a host machine. When optional parameters
are absent, the default values remain unchanged.
The user running the command must have write permission for the directories that DSE uses, or must run the command with sudo.
Synopsis
Add node1
Connection options
JMX authentication is supported by some dse commands. Other dse commands authenticate with the user name and password of the configured user. The connection option short form and long form are comma-separated.
You can provide authentication credentials in several ways; see Credentials for authentication.
destination|name|value
mydest|addresses|192.168.200.100
mydest|transmission-enabled|true
mydest|driver-ssl-cipher-suites|TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,
mydest|driver-ssl-enabled|false
mydest|driver-ssl-protocol|TLS
mydest|name|mydest
mydest|driver-connect-timeout|15000
mydest|driver-max-requests-per-connection|1024
mydest|driver-connections-max|8
mydest|driver-connections|1
mydest|driver-compression|lz4
mydest|driver-consistency-level|ONE
mydest|driver-allow-remote-dcs-for-local-cl|false
mydest|driver-used-hosts-per-remote-dc|0
mydest|driver-read-timeout|15000
Synopsis
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set
the source-id and source-id-column differently from the global setting.
dse advrep channel update
Updates a replication channel configuration.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To update a channel, specify a new value for one or more options.
Synopsis
The order in which the source table log files are transmitted.
Examples
$
------------------------------------------------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table          |collecting|transmitting|replication order|priority|dest ks|dest table     |src id |src id col|dest  |dest enabled|
------------------------------------------------------------------------------------------------------------------------------------------------------
|Cassandra|demo    |sensor_readings|true      |true        |LIFO             |2       |demo   |sensor_readings|source1|source_id |mydest|true        |
------------------------------------------------------------------------------------------------------------------------------------------------------
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same. You can also set
the source-id and source-id-column differently from the global setting.
dse advrep channel delete
Deletes a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
To delete a channel, you must specify source information and the destination and data-center for the channel.
Synopsis
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel pause
Pauses replication on a channel of change data between a source cluster and a destination cluster.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Pause collection of data or transmission of data between a source cluster and destination cluster.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Pause collection: no data for the source table is collected.
--transmission
Pause transmission: no data for the source table is sent to the configured destinations.
Examples
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel resume
Resumes replication for a channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
A channel can resume either the collection or transmission of replication between a source cluster and
destination cluster.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
--collection
Resume collection of data for the source table.
--transmission
Resume transmission of data for the source table to the configured destinations.
Examples
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel status
Prints status of a replication channel.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destination destination
The destination to which replication is sent. The destination name is user-defined.
--data-center-id data_center_id
The datacenter for this channel.
Examples
with a result:
------------------------------------------------------------------------------------------------------------------------------------
|dc       |keyspace|table|collecting|transmitting|replication order|priority|dest ks|dest table|src id |src id col|dest  |dest enabled|
------------------------------------------------------------------------------------------------------------------------------------
|Cassandra|foo     |bar  |true      |true        |FIFO             |2       |foo    |bar       |source1|source_id |mydest|true        |
------------------------------------------------------------------------------------------------------------------------------------
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep channel truncate
Truncates a channel to prevent replication of the messages currently in the replication log.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
--source-keyspace keyspace_name
The source cluster keyspace to replicate.
--source-table source_table_name
The source table to replicate.
--destinations destination [ , destination ]
The destinations to which replication data is sent.
--data-center-ids data_center_id [ , data_center_id ]
The datacenters for this channel, which must exist.
Examples
The source datacenter will be the datacenter in which the command is run. The keyspace and table names on
the destination can be different than on the source, but in this example they are the same.
dse advrep conf list
Lists configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
Examples
The result:
----------------------------
|name |value |
----------------------------
|audit_log_file |auditLog|
----------------------------
|permits |8 |
----------------------------
|audit_log_enabled|true |
----------------------------
The number of permits is 8, audit logging is enabled, and the audit log file name is auditLog.
dse advrep conf remove
Removes configuration settings for advanced replication.
A replication channel is a defined channel of change data between source clusters and destination clusters.
Synopsis
[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.
[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.
--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-rotate-max number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
The number of open files to keep.
--collection-time-slice-count number_of_files
The number of files that are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples
with a result:
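The example command and its output were lost in conversion. A hedged sketch, assuming a previously set audit log file and that `conf remove` accepts the option flags listed above as the settings to clear, might be:

```shell
# Remove a previously set configuration value; the option named here is
# illustrative -- any option listed above can be removed the same way.
dse advrep conf remove --audit-log-file
```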
Synopsis
[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.
... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.
'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.
{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates
the key and the value.
<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.
[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ).
This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.
--audit-log-enabled true|false
Enable or disable audit logging.
--audit-log-compression none|gzip
Enable audit log compression. Default: none
--audit-log-file log_file_name
The audit log filename.
--audit-log-rotate-max number_of_minutes
The maximum number of minutes for the audit log lifespan.
--audit-log-rotate-mins number_of_minutes
The number of minutes before the audit log will rotate.
--permits number_of_permits
Maximum number of messages that can be replicated in parallel over all destinations. Default: 1024
--collection-max-open-files number_of_files
The number of open files to keep.
--collection-time-slice-count number_of_files
The number of files that are open in the ingestor simultaneously.
--collection-time-slice-width time_period_in_seconds
The time period in seconds for each data block ingested. Smaller time widths mean more files,
whereas larger time widths mean larger files, but more data to resend on CRC mismatches.
--collection-expire-after-write
Whether the collection expires after the write occurs.
--invalid-message-log none|system_log|channel_log
Specify where error information is stored for messages that could not be replicated. Default:
channel_log
Examples
with a result:
Synopsis
--driver-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
To update a replication destination:
with a result:
Synopsis
--driver-truststore-type truststore_type
The SSL truststore type to use for driver connections.
Examples
with a result:
Synopsis
with a result:
Synopsis
Examples
with a result:
----------------
|name |enabled|
----------------
|mydest|true |
----------------
Synopsis
with a result:
-----------------------------------------------------------------------------------------
|name   |setting                             |value                                    |
-----------------------------------------------------------------------------------------
|mydest |driver-ssl-cipher-suites            |TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,      |
|       |                                    |TLS_RSA_WITH_AES_256_CBC_SHA,            |
|       |                                    |TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA,     |
|       |                                    |TLS_ECDH_RSA_WITH_AES_256_CBC_SHA,       |
|       |                                    |TLS_DHE_RSA_WITH_AES_256_CBC_SHA,        |
|       |                                    |TLS_DHE_DSS_WITH_AES_256_CBC_SHA,        |
|       |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256, |
|       |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,   |
|       |                                    |TLS_RSA_WITH_AES_128_CBC_SHA256,         |
|       |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256,  |
|       |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256,    |
|       |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA256,     |
|       |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA256,     |
|       |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,    |
|       |                                    |TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,      |
|       |                                    |TLS_RSA_WITH_AES_128_CBC_SHA,            |
|       |                                    |TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA,     |
|       |                                    |TLS_ECDH_RSA_WITH_AES_128_CBC_SHA,       |
|       |                                    |TLS_DHE_RSA_WITH_AES_128_CBC_SHA,        |
|       |                                    |TLS_DHE_DSS_WITH_AES_128_CBC_SHA,        |
|       |                                    |TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, |
|       |                                    |TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, |
|       |                                    |TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,   |
|       |                                    |TLS_RSA_WITH_AES_256_GCM_SHA384,         |
|       |                                    |TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384,  |
|       |                                    |TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384,    |
|       |                                    |TLS_DHE_RSA_WITH_AES_256_GCM_SHA384,     |
|       |                                    |TLS_DHE_DSS_WITH_AES_256_GCM_SHA384,     |
|       |                                    |TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,   |
|       |                                    |TLS_RSA_WITH_AES_128_GCM_SHA256,         |
|       |                                    |TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256,  |
|       |                                    |TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256,    |
|       |                                    |TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,     |
|       |                                    |TLS_DHE_DSS_WITH_AES_128_GCM_SHA256,     |
|       |                                    |TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA,   |
|       |                                    |TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,     |
|       |                                    |SSL_RSA_WITH_3DES_EDE_CBC_SHA,           |
|       |                                    |TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA,    |
|       |                                    |TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA,      |
|       |                                    |SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA,       |
|       |                                    |SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA,       |
|       |                                    |TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,        |
|       |                                    |TLS_ECDHE_RSA_WITH_RC4_128_SHA,          |
|       |                                    |SSL_RSA_WITH_RC4_128_SHA,                |
|       |                                    |TLS_ECDH_ECDSA_WITH_RC4_128_SHA,         |
|       |                                    |TLS_ECDH_RSA_WITH_RC4_128_SHA,           |
|       |                                    |SSL_RSA_WITH_RC4_128_MD5,                |
|       |                                    |TLS_EMPTY_RENEGOTIATION_INFO_SCSV        |
-----------------------------------------------------------------------------------------
|mydest |driver-ssl-enabled                  |false                                    |
-----------------------------------------------------------------------------------------
|mydest |driver-ssl-protocol                 |TLS                                      |
-----------------------------------------------------------------------------------------
|mydest |name                                |mydest                                   |
-----------------------------------------------------------------------------------------
|mydest |driver-connect-timeout              |15000                                    |
-----------------------------------------------------------------------------------------
|mydest |driver-max-requests-per-connection  |1024                                     |
-----------------------------------------------------------------------------------------
|mydest |driver-connections-max              |8                                        |
-----------------------------------------------------------------------------------------
|mydest |driver-connections                  |1                                        |
-----------------------------------------------------------------------------------------
|mydest |driver-compression                  |lz4                                      |
-----------------------------------------------------------------------------------------
|mydest |driver-consistency-level            |ONE                                      |
-----------------------------------------------------------------------------------------
|mydest |driver-allow-remote-dcs-for-local-cl|false                                    |
-----------------------------------------------------------------------------------------
|mydest |driver-used-hosts-per-remote-dc     |0                                        |
-----------------------------------------------------------------------------------------
|mydest |driver-read-timeout                 |15000                                    |
-----------------------------------------------------------------------------------------
Synopsis
Examples
with a result:
Synopsis
--metric-type metric_type
The metric type for which to show counts.
Examples
with a result:
------------------------------------------
|Group |Type |Count|
------------------------------------------
|Tables |MessagesDelivered |3000 |
------------------------------------------
|ReplicationLog|CommitLogsToConsume|1 |
------------------------------------------
|Tables |MessagesReceived |3000 |
------------------------------------------
|ReplicationLog|MessageAddErrors |0 |
------------------------------------------
|ReplicationLog|CommitLogsDeleted |0 |
------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------
|Group         |Type                 |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate      |FiveMinuteRate        |
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded        |3000 |events/second|0.020790532589851248|4.569533277209345E-28|2.964393875E-314   |2.3185964029982446E-82|
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesDeleted      |0    |events/second|0.0                 |0.0                  |0.0                |0.0                   |
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAcknowledged |3000 |events/second|0.020790529428089743|4.569533277209345E-28|2.964393875E-314   |2.3185964029982446E-82|
------------------------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|CommitLogMessagesRead|30740|events/second|0.21303361656215317 |0.13538523143065767  |0.01686330377344829|0.11519609320406245   |
------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------
|Group |Type |Value|
-------------------------------------
|Transmission|AvailablePermits|30000|
-------------------------------------
with a result:
--------------------------------
|Group |Type             |Count|
--------------------------------
|Tables|MessagesDelivered|3000 |
--------------------------------
|Tables|MessagesReceived |3000 |
--------------------------------
with a result:
--------------------------------------------------------------------------------------------------------------------------------
|Group         |Type         |Count|RateUnit     |MeanRate            |FifteenMinuteRate    |OneMinuteRate   |FiveMinuteRate       |
--------------------------------------------------------------------------------------------------------------------------------
|ReplicationLog|MessagesAdded|3000 |events/second|0.020827685267120057|6.100068258619765E-28|2.964393875E-314|5.515866021410421E-82|
--------------------------------------------------------------------------------------------------------------------------------
Synopsis
$ dse advrep replog count --destination mydest --source-keyspace foo --source-table bar
with a result:
Synopsis
--file audit_log_filename
The audit log file to create.
Examples
with a result:
dse beeline
Starts the Beeline shell.
Synopsis
$ dse beeline
dse cassandra
Starts the database in transactional mode. Command options start the database in other modes and enable
advanced features on a node. See Starting DataStax Enterprise.
To change the DSE system properties on start up, see Setting system properties during startup.
Synopsis
When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.
Options
-k
Start the node in analytics mode. The first time the node starts up the analytics workload type is
configured.
-g
Start the node in graph mode. The first time the node starts up the graph workload type is configured.
-s
Start the node in search mode. The first time the node starts up the search workload type is configured.
-E
$ dse cassandra
$ dse cassandra -f
$ dse cassandra -k
$ dse cassandra -k -s
Start a node in DSE Analytics, DSE Graph, and DSE Search modes
$ dse cassandra -k -g -s
Ensure there is a space between -k, -g, and -s in dse cassandra -k -g -s.
Start a node in DSE Search mode and change the location of the search index
data on the server
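The command for this example was lost in conversion. A sketch, assuming the `dse.solr.data.dir` system property documented for DSE Search (the path shown is illustrative):

```shell
# Start in search mode and point the search index data at another location.
dse cassandra -s -Ddse.solr.data.dir=/var/lib/dse-search-data
```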
Experiment with different strategies and benchmark write performance differences without affecting the
production workload. See Testing compaction and compression.
Start a node in transactional mode and pass the dead node IP address
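The example itself is missing here. Assuming the standard `cassandra.replace_address` startup property (the IP address is illustrative), the invocation would look like:

```shell
# Replace a dead node: start in transactional mode and pass the
# dead node's IP address.
dse cassandra -Dcassandra.replace_address=10.101.32.45
```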
dse cassandra-stop
Stops the DataStax Enterprise process.
See Stopping a node.
Synopsis
$ dse cassandra-stop -p pid
pid
DataStax Enterprise (cassandra) process id.
Examples
Stop by process id
$ dse cassandra-stop -p 41234
dse exec
Sets the environment variables required to run third-party tools that integrate with Spark.
Synopsis
Examples
See Using DSE Spark with third party tools and integrations.
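For illustration, a hypothetical launch of a Spark-integrated tool through `dse exec` might look like the following (the tool name is an assumption, not part of this guide; any Spark-integrated tool can follow `dse exec`):

```shell
# Run a third-party tool with the DSE Spark environment variables set.
dse exec jupyter notebook
```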
dse fs
Starts the DSE File System (DSEFS). The DSEFS prompt shows the current working directory, which is the
default DSEFS search directory.
See DSEFS (DataStax Enterprise file system).
Synopsis
--prefer-contact-points -h IP_address1,IP_address2,...
Give precedence to the specified hosts, regardless of proximity, when issuing DSEFS commands. As
long as the specified hosts are available, DSEFS will not switch to other DSEFS nodes in the cluster.
Without these options, DSEFS switches to the closest available DSEFS node.
Examples
Start DSEFS
$ dse fs
Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >
$ dse fs 10.0.0.2,10.0.0.5
Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >
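As a sketch, a single DSEFS operation can also be run non-interactively, assuming the shell accepts a command argument as described in the DSEFS documentation (the path is illustrative):

```shell
# Run one DSEFS command and exit instead of entering the interactive shell.
dse fs "ls /"
```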
Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
765
DataStax Enterprise tools
[ ] Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the
square brackets.
( ) Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
| Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not
type the vertical bar.
... Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as
required.
'Literal string' Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single
quotation marks to preserve upper case.
{ key:value } Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the
key and the value.
<datatype1,datatype2> Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple.
Separate the data types with a comma.
[ -- ] Separate the command line options from the command arguments with two hyphens ( -- ). This
syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> ' Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type' Search CQL only: Identify the entity and literal value to overwrite the XML element in the
schema and solrconfig files.
port
Port number for the DataStax Enterprise database connection; the default is 9042. Overrides the
setting in remote.yaml.
Options
Gremlin console options.
-C, --color
Disable use of ANSI colors.
-D, --debug
Enable debug console output.
-Q, --quiet
Suppress superfluous console output.
-V, --verbose
Enable verbose console output.
-e, --execute=SCRIPT_NAME [ARG1 ARG2 …]
Execute the specified script and close the console on completion.
-h, --help
Display this help message.
-i, --interactive=SCRIPT_NAME [ARG1 ARG2 ... ]
Execute the specified script and leave the console open on completion.
-l
Set the logging level of components that use standard logging output independent of the Console.
-v, --version
Display the version.
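The options above can be combined on a single invocation. A minimal sketch, assuming the enclosing command is dse gremlin-console and that mytraversal.groovy is a hypothetical script file:

```shell
# Run a hypothetical script with superfluous console output suppressed,
# closing the console on completion (-e); use -i instead to keep it open.
dse gremlin-console -Q -e mytraversal.groovy
```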
dse hadoop fs
Invokes DSEFS operations using the HDFS interface to DSEFS. DseFileSystem has partial support for the
Hadoop FileSystem interface.
See Hadoop FileSystem interface implemented by DseFileSystem and DSEFS.
Synopsis
$ dse hadoop fs
Examples
$ dse hadoop fs
Connected to DataStax Enterprise File System 6.0.2 at DSE cluster Test Cluster
Type help to get the list of available commands.
dsefs dsefs://127.0.0.1:5598/ >
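Because DseFileSystem partially supports the Hadoop FileSystem interface, familiar Hadoop-style operations can be issued through this command. A minimal sketch, assuming a directory listing is among the supported operations:

```shell
# List the DSEFS root directory through the Hadoop FileSystem interface.
dse hadoop fs -ls /
```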
Synopsis
$ dse list-nodes
This command takes no arguments and lists the nodes that are configured for the DSE Multi-Instance host
machine.
Examples
$ dse list-nodes
dse pyspark
Starts the Spark Python shell.
See the DataFrames documentation for an example of using PySpark, and the PySpark API documentation.
Synopsis
$ dse pyspark
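A minimal sketch of a session; the Spark session is created automatically when the shell starts, and the DataFrame contents here are illustrative:

```shell
# Start the Spark Python shell, then build and display a small DataFrame
# at the >>> prompt using the automatically created spark session.
dse pyspark
>>> spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"]).show()
```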
The user running the command must have permissions for writing to the directories that DSE uses, or use
sudo.
Synopsis
nodeId
Required. Because the node name is always prefixed with dse-, the remove-node command works whether
you specify dse-nodeId or just nodeId.
--yes
Confirms node deletion. Files are deleted and are not recoverable. When not specified, you are
prompted to confirm node deletion.
Examples
##############################
#
# WARNING
# You're trying to remove node dse-payrollnode
# This means that all configuration files for dse-payrollnode will be deleted
#
##############################
#?
dse spark
Enters the interactive Spark shell, which offers basic auto-completion.
• Starting Spark
Synopsis
In general, Spark submission arguments (--submission_args) are translated into system properties
(-Dname=value) and other JVM parameters, such as the classpath. The application arguments (-app_args)
are passed directly to the application.
Configure the Spark shell with these arguments:
--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option to add to the Spark configuration, prefixed by spark.
• name=spark.value
• sparkproperties.conf - a configuration file
--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
--framework dse|spark-2.0
The classpath for the Spark shell. When not set, the default is dse.
• dse - Sets the Spark classpath to the same classpath that is used by the DSE server.
• spark-2.0 - Sets a classpath that is used by the open source Spark (OSS) 2.0 release to
accommodate applications originally written for open source Apache Spark. Uses a BYOS (Bring
Your Own Spark) JAR with shaded references to internal dependencies to eliminate complexity
when porting an app from OSS Spark.
If the code works on DSE, applications do not require the spark-2.0 framework. Full support
in the spark-2.0 framework might require specifying additional dependencies. For example,
hadoop-aws is included on the DSE server classpath but is not present on the OSS Spark 2.0
classpath. In this example, applications that use S3 or other AWS APIs must include their
own aws-sdk on the runtime classpath. This additional runtime classpath is required only for
applications that cannot run on the DSE classpath.
--help
Shows a help message that displays all options except DataStax Enterprise Spark shell options.
-i app_script_file
Spark shell application argument that runs a script from the specified file.
--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
--master dse://?appReconnectionTimeoutSeconds=secs
A custom timeout value when submitting the application, useful for troubleshooting Spark application
failures. The default timeout value is 5 seconds.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the settings
from spark-defaults.conf.
--total-executor-cores cores
The total number of cores the application uses.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.
Examples
$ dse spark
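A sketch combining the submission arguments described above; the core and memory values are illustrative:

```shell
# Start the Spark shell capped at 2 total executor cores,
# with 1 GB of memory per executor (JVM format, g suffix).
dse spark --total-executor-cores 2 --executor-memory 1g
```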
DseGraphFrame and Spark SQL are case insensitive by default, so column names that differ only in case result
in conflicts. Set the Spark property spark.sql.caseSensitive=true to avoid case conflicts.
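A sketch of passing that property at shell startup, using the --conf option described above:

```shell
# Enable case-sensitive column resolution for this Spark shell session.
dse spark --conf spark.sql.caseSensitive=true
```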
Synopsis
This command supports the same options as running a class using the java command.
Examples
dse spark-jobserver
Starts and stops the Spark Jobserver that is bundled with DSE.
start
Starts the Spark Jobserver.
--verbose
Displays which arguments are recognized as Spark configuration options and which arguments are
forwarded to the Spark shell.
stop
Stops the Spark Jobserver.
For the dse spark-jobserver start command, apply one or more valid spark-submit options.
--properties-file path_to_properties_file
The location of the properties file that has the configuration settings. By default, Spark loads the settings
from spark-defaults.conf.
--executor-memory mem
The amount of memory that each executor can consume for the application. Spark uses a 512 MB
default. Specify the memory argument in JVM format using the k, m, or g suffix.
--total-executor-cores cores
The total number of cores the application uses.
--conf name=spark.value|sparkproperties.conf
An arbitrary Spark option to the Spark configuration prefixed by spark.
• name-spark.value
• sparkproperties.conf - a configuration
--jars path_to_additional_jars
A comma-separated list of paths to additional JAR files.
Examples
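A minimal sketch; the memory value is illustrative:

```shell
# Start the Spark Jobserver, passing a spark-submit style option, then stop it.
dse spark-jobserver start --executor-memory 1g
dse spark-jobserver stop
```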
dse spark-history-server
Starts and stops the Spark history server, the front-end application that displays logging data from all nodes in
the Spark cluster.
Configuration is required for the Spark history server. See Spark history server.
Synopsis
start
Starts the Spark history server to load the event logs from Spark jobs that were run with event logging
enabled. The Spark history server can be started from any node in the cluster.
--properties-file properties_file
The properties file to overwrite the default Spark configuration in conf/spark-defaults.conf. The
properties file can include settings like the authentication method and credentials and event log location.
stop
Stops the Spark history server.
Examples
The Spark history server is started with the default configuration in conf/spark-defaults.conf.
The Spark history server is started with the configuration specified in sparkproperties.conf.
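The two scenarios above can be sketched as:

```shell
# Start with the default configuration in conf/spark-defaults.conf.
dse spark-history-server start

# Start with the configuration specified in sparkproperties.conf.
dse spark-history-server start --properties-file sparkproperties.conf
```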
dse spark-sql
Starts the Spark SQL shell in DSE to interactively perform Spark SQL queries.
The Spark SQL shell in DSE automatically creates a Spark session and connects to the Spark SQL Thrift server
to handle the underlying JDBC connections. See Using Spark SQL to query data.
Synopsis
$ dse spark-sql
$ dse spark-sql
At the spark-sql prompt, you can interactively perform Spark SQL queries.
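A minimal sketch of an interactive session; my_keyspace.my_table is a hypothetical table name:

```shell
# Start the shell, then issue a Spark SQL query at the spark-sql prompt.
dse spark-sql
spark-sql> SELECT * FROM my_keyspace.my_table LIMIT 10;
```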
dse spark-sql-thriftserver
Starts and stops the Spark SQL Thriftserver, which provides JDBC and ODBC interfaces for
client connections to DSE.
Configuration is required for the Spark SQL Thriftserver. See Using the Spark SQL Thriftserver.
Synopsis
start
Starts the Spark SQL Thriftserver. The user who runs the command to start the Spark SQL Thriftserver
requires permissions to write to the Spark directories.
--conf spark_prop
Pass in general Spark configuration settings, like spark.cores.max=4.
--hiveconf hive_property
Pass in a Hive configuration property, like hive.server2.thrift.port=10001.
stop
Stops the Spark SQL Thriftserver.
Examples
Start the Spark SQL Thriftserver with default Spark and Hive options
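A minimal sketch of the scenario above:

```shell
# Start the Spark SQL Thriftserver with the default Spark and Hive options.
dse spark-sql-thriftserver start
```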
dse spark-submit
Launches applications on a cluster to enable use of Spark cluster managers through a uniform interface. This
command supports the same options as Apache Spark spark-submit.
Synopsis
This command supports the same options as Apache Spark spark-submit. Unlike the standard behavior for the
Spark status and kill options, in DSE deployments these options do not require the Spark Master IP address.
--kill driver_id
Kill a Spark application running in the DSE cluster.
--master master_ip_address
The IP address of the Spark Master running in the DSE cluster.
--status driver_id
Get the status of a Spark application running in the DSE cluster.
Examples
Run the HTTP response example program (located in the dse-demos directory)
on two nodes:
Pass the SSL configuration with standard Spark commands to use secure HTTPS on port 4440.
Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.
To kill a driver
Unlike the Apache Spark option, you do not have to specify the Spark Master IP address.
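A sketch of the kill scenario, with driver_id as a placeholder for the application's driver ID:

```shell
# Kill a Spark application in the DSE cluster; no Master IP address is needed.
dse spark-submit --kill driver_id
```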
dse SparkR
Starts the R shell configured with DSE Spark to automatically set the Spark session within R. See Using SparkR
with DataStax Enterprise.
Synopsis
$ dse SparkR
Examples
$ dse sparkR
dse -v
Sends the DataStax Enterprise version number to standard output.
This command does not require authentication.
Synopsis
$ dse -v
Example
$ dse -v
6.0.7
dse client-tool
About dse client-tool
The dse client-tool command line interface connects an external client to a DataStax Enterprise node and
performs common utility tasks.
Connection options
Connection options specify how to connect and authenticate for all dse client-tool commands:
Short Long Description
-p --password Password.
-u --username Username.
-t Delegation token that can be used to log in. Alternatively, you can use
the DSE_TOKEN environment variable.
• If a username and password for RMI authentication are set explicitly in the cassandra-env.sh file for the
host, then you must specify credentials.
• The repair and rebuild commands can affect multiple nodes in the cluster.
• Most nodetool commands operate on a single node in the cluster if -h is not used to identify one or more
other nodes. If the node from which you issue the command is the intended target, you do not need the -h
option to identify the target; otherwise, for remote invocation, identify the target node, or nodes, using -h.
Example:
To show the command line help for a specific dse client-tool subcommand:
For example:
You can provide authentication credentials in several ways, see Credentials for authentication.
To enable dsetool to use Kerberos authentication, see Using dsetool with Kerberos enabled cluster.
Different sources of configuration properties are used to connect external clients to a DSE node: DSE
configuration in dse.yaml and cassandra.yaml.
The dse client-tool subcommands use DSE Unified Authentication, like the Java and other language drivers, not JMX
authentication like dsetool.
RPC permissions over the native protocol leverage DSE authentication and role-based access abilities. To
configure external client access to DataStax Enterprise commands, see Authorizing remote procedure calls
(RPC).
DSE proxy authentication can be used with dse client-tool, and delegation tokens can be generated for
the proxy-authenticated role. If the role alice is authenticated, and alice uses proxy authorization for the role
bob, alice's delegation token can be used to authenticate as alice and authorize as bob. If bob loses login
permissions, the token can still be used to log in as alice, because the token reflects alice's authentication. If
alice loses authorization permissions for bob, the token cannot be used to log in.
Synopsis
$ dse client-tool [-a proxy_auth_username] [-u username] [-p password] [--port port] [--host
hostname] [--sasl-protocol-name dse_service_principal] [--keystore-path ssl_keystore_path]
[--keystore-password keystore_password] [--keystore-type ssl_keystore_type] [--truststore-
path ssl_truststore_path] [--truststore-password ssl_truststore_password] [--truststore-type
ssl_truststore_type] [--cipher-suites ssl_cipher_suites] [--kerberos-enabled (true | false)]
--cipher-suites ssl_cipher_suites
Specify a comma-separated list of SSL cipher suites for the connection to DSE when SSL is enabled. For
example, --cipher-suites c1,c2,c3.
--host hostname
The DSE node hostname or IP address.
--kerberos-enabled true | false
Whether Kerberos authentication is enabled for connections to DSE. For example, --kerberos-enabled
true.
--keystore-password keystore_password
Keystore password for connection to DSE when SSL client authentication is enabled.
--keystore-path ssl_keystore_path
Path to the keystore for connection to DSE when SSL client authentication is enabled.
--keystore-type ssl_keystore_type
Keystore type for connection to DSE when SSL client authentication is enabled. JKS is the type for keys
generated by the Java keytool binary, but other types are possible, depending on user environment.
-p password
The password to authenticate for database access. Can use the DSE_PASSWORD environment
variable.
--port port
The native protocol RPC connection port (Thrift).
--sasl-protocol-name dse_service_principal
SASL protocol name, that is, the DSE service principal name.
--ssl
Whether SSL is enabled for connection to DSE. --ssl-enabled true is the same as --ssl.
--ssl-protocol ssl_protocol
SSL protocol for connection to DSE when SSL is enabled. For example, --ssl-protocol ssl4.
-t token
Specify a delegation token to use for login. Alternatively, set the DSE_TOKEN environment
variable.
--truststore-password ssl_truststore_password
Truststore password to use for connection to DSE when SSL is enabled.
--truststore-path ssl_truststore_path
Path to the truststore to use for connection to DSE when SSL is enabled. For example, --truststore-
path /path/to/ts.
--truststore-type ssl_truststore_type
Truststore type for connection to DSE when SSL is enabled. JKS is the type for keys generated by
the Java keytool binary, but other types are possible, depending on user environment. For example, --
truststore-type jks2.
-u username
User name of a DSE authentication account. Can use the DSE_USERNAME environment variable.
-a proxy_auth_username
DSE authorization username if proxy authentication is used.
--use-server-config
Read parameters from server yaml configuration files. It assumes this node is properly configured.
dse client-tool cassandra
Performs token management and partitioner discovery.
Synopsis
cancel-token token
Cancel the specified token.
generate-token [username]
Generate a delegation token to access Kerberos-protected DSE from non-Kerberos clusters.
• When the username is not specified, the current user is the token renewer. Only DSE processes
can renew a token.
• When the username is specified as the token renewer, that user can renew and cancel the token.
partitioner
Returns the partitioner that is being used by the node.
renew-token token
Renew the specified token.
Examples
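A sketch of typical invocations of the subcommands above (the credentials and the renewer username are illustrative, not from this guide):

```shell
# Print the partitioner in use on this node
$ dse client-tool cassandra partitioner

# Generate a delegation token, naming user jane as the token renewer
$ dse client-tool -u admin -p mypassword cassandra generate-token jane
```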
file
File name for the exported compressed file. For example, dse-config.jar.
Examples
To export the DataStax Enterprise client configuration from the remote node:
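A sketch of the export, assuming the configuration export subcommand takes the output file name as its argument (file name from the description above):

```shell
# Export the client configuration into a compressed file
$ dse client-tool configuration export dse-config.jar
```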
Merge the default Spark properties with the DSE Spark properties
--cqlshrc
leader-address
Returns the IP address of the currently selected Spark Master for the datacenter.
master-address
Returns the localhost IP address used to configure Spark applications. The address is returned as a URI:
dse://ip:port?connection.local_dc=dc_name;connection.host=cs_list_contactpoints;
DSE automatically connects Spark applications to the Spark Master. You do not need to use the IP
address of the current Spark Master in the connection URI.
metastore-migrate --from_version --to_version
Migrate Spark SQL metastore from one DSE version to another DSE version.
version
Returns the version of Spark that is bundled with DataStax Enterprise.
sql-schema (--exclude | --keyspace | --table | --decimal | --all)
Exports the SQL table creation query with the options shown in the synopsis.
Examples
You can use the generated schema files with Spark SQL on external Spark clusters.
To map custom external tables from DSE 5.0.11 to the DSE 6.0.0 release format of the Hive metastore used by
Spark SQL after upgrading:
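Following the metastore-migrate synopsis above, the migration might be invoked as shown below (flag spellings follow the synopsis; version values are from the example scenario):

```shell
# Rewrite Hive metastore entries from the 5.0.11 format to the 6.0.0 format
$ dse client-tool spark metastore-migrate --from_version 5.0.11 --to_version 6.0.0
```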
status
Get the AlwaysOn SQL service status of the datacenter. With the --dc datacenter name option, get the
status of the specified datacenter.
The returned status is one of:
• STOPPED_AUTO_RESTART: the server is stopped and will be restarted automatically.
• STOPPED_MANUAL_RESTART: the server was stopped with either a stop or restart command. If the
server was issued a restart command, the status changes to STOPPED_AUTO_RESTART as the
server starts again.
• STARTING: the server is actively starting up but is not yet ready to accept client requests.
stop
Manually stop the AlwaysOn SQL service. With the --dc datacenter name option, manually stop the
service on the specified datacenter.
start
Manually start the AlwaysOn SQL service. With the --dc datacenter name option, manually start the
service on the specified datacenter. The service starts automatically if it has been enabled.
restart
Manually restart a running AlwaysOn SQL service. With the --dc datacenter name option, manually
restart the service on the specified datacenter.
reconfig
Manually reconfigure the AlwaysOn SQL service. With the --dc datacenter name option, manually
reconfigure the service on the specified datacenter. Running this command tells the service to re-read
its configuration options.
The alwayson_sql_options section in dse.yaml, described in detail at AlwaysOn SQL options, has
options for setting the ports, timeout values, log location, and other Spark or Hive configuration settings.
Additional configuration options are located in spark-alwayson-sql.conf.
Examples
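A sketch of typical invocations (the datacenter name DC2 is illustrative; option placement follows the descriptions above):

```shell
# Get the AlwaysOn SQL service status for the local datacenter
$ dse client-tool alwayson-sql status

# Manually restart the service in a specific datacenter
$ dse client-tool alwayson-sql --dc DC2 restart
```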
dse nodesync
The NodeSync service provides continuous background repair and is enabled on a per-table basis.
The dse nodesync command modifies the CQL nodesync property on one or more tables and enables
NodeSync tracing and monitoring.
Tables with NodeSync enabled are skipped by repair operations run against all or specific keyspaces. For
individual tables, the repair command is rejected when NodeSync is enabled.
Synopsis
[dse] nodesync
[(-ca cql_Authprovider | --cql-auth-provider cql_Authprovider)]
[(-cp cql_password | --cql-password cql_password)]
[(-cs | --cql-ssl)]
[(-cu cql_username | --cql-username cql_username)]
[(-h cql_host | --host cql_host)]
[help]
[(-jp jmx_password | --jmx-password jmx_password)]
[(-jpf jmx_password_file | --jmx-password-file jmx_password_file)]
[(-js | --jmx-ssl)]
[(-ju jmx_username | --jmx-username jmx_username)]
[(-p cql_port | --port cql_port )]
subcommand [options]
Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_Authprovider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:
• An asterisk in double quotes to select all tables. For example, -k cycling "*".
$ nodesync help
nodesync enable
Sets nodesync enabled to true on target tables.
Default setting is true.
Refer to Configuring SSL for nodetool, nodesync, dsetool, and Advanced Replication for important details
about creating a ~/.cassandra/nodesync-ssl.properties file. It defines properties for NodeSync that are
shared by JMX and CQL. The file must be present on any node where you will run the nodesync command.
Also, the JVM properties for NodeSync should be the same as those set for nodetool, but defined in a
separate file, such as nodesync-jvm.options. The JVM options are described in the topic referenced above.
Synopsis
[(-v | --verbose)]
[--] [(table_list | "*")]
Examples
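A sketch per the enable synopsis above (keyspace and table names are illustrative):

```shell
# Enable NodeSync on two tables in the cycling keyspace
$ nodesync enable -k cycling cyclist_name comments

# Enable NodeSync on every table in the keyspace
$ nodesync enable -k cycling "*"
```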
nodesync help
Displays usage information for nodesync commands. Use nodesync help command_name to display a
synopsis and brief description for a specific nodesync command.
Synopsis
Validation options
command_name
Name of nodesync command.
subcommand_name
Name of nodesync subcommand.
Examples
$ nodesync help
NAME
nodesync validation - Monitor/manage user-triggered validations
SYNOPSIS
nodesync validation
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation
cancel [--quiet] [(-v | --verbose)]
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation list
[--quiet] [(-v | --verbose)] [(-a | --all)]
nodesync [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)]
[(-p <cqlPort> | --port <cqlPort>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)] [(-js | --jmx-ssl)]
[(-h <cqlHost> | --host <cqlHost>)] [(-cs | --cql-ssl)] validation
submit [--quiet] [(-v | --verbose)]
[(-r <rateInKB> | --rate <rateInKB>)]
OPTIONS
-ca <cqlAuthProvider>, --cql-auth-provider <cqlAuthProvider>
CQL auth provider class name
-cp <cqlPassword>, --cql-password <cqlPassword>
CQL password
-cs, --cql-ssl
Enable SSL for CQL
-cu <cqlUsername>, --cql-username <cqlUsername>
CQL username
COMMANDS
With no arguments, Display help information
submit
Submit a forced user validation
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
With --rate option, Rate to be used just for this validation, in KB per second
cancel
Cancel a user-triggered validation
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
list
List user validations. By default, only running validations are
displayed.
With --quiet option, Quiet output; don't print warnings
With --verbose option, Verbose output
With --all option, List all either running or finished validations since less then
1 day
NAME
nodesync validation submit - Submit a forced user validation
SYNOPSIS
nodesync
[(-ca <cqlAuthProvider> | --cql-auth-provider <cqlAuthProvider>)]
[(-cp <cqlPassword> | --cql-password <cqlPassword>)] [(-cs | --cql-ssl)]
[(-cu <cqlUsername> | --cql-username <cqlUsername>)]
[(-h <cqlHost> | --host <cqlHost>)]
[(-jp <jmxPassword> | --jmx-password <jmxPassword>)]
[(-jpf <jmxPasswordFile> | --jmx-password-file <jmxPasswordFile>)]
[(-js | --jmx-ssl)] [(-ju <jmxUsername> | --jmx-username <jmxUsername>)]
[(-p <cqlPort> | --port <cqlPort>)] validation submit [(-q | --quiet)]
[(-r <rateInKB> | --rate <rateInKB>)] [(-v | --verbose)] [--] <table>
[<range>...]
OPTIONS
-ca <cqlAuthProvider>, --cql-auth-provider <cqlAuthProvider>
CQL auth provider class name
-cp <cqlPassword>, --cql-password <cqlPassword>
CQL password
-cs, --cql-ssl
Enable SSL for CQL
-cu <cqlUsername>, --cql-username <cqlUsername>
CQL username
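Per the validation submit synopsis above, a forced validation of a single table might look like the following (the table name and rate are illustrative):

```shell
# Submit a forced validation of one table, throttled to 1024 KB per second
$ nodesync validation submit -r 1024 cycling.comments
```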
nodesync tracing
Provides detailed transaction information related to internal NodeSync operations by capturing events in the
system_traces keyspace. When tracing is enabled, a session ID is displayed in standard output and an entry
with the high-level details is written to the system_traces.sessions table. More detailed data for each
operation is written to the system_traces.events table.
By default, tracing information is saved for 7 days.
Synopsis
Examples
Main options
The following options apply to all nodesync commands.
-ca, --cql-auth-provider cql_auth_provider
CQL auth provider class name.
-cp, --cql-password cql_password
CQL password.
-cs | --cql-ssl
Use SSL for CQL connection.
-cu, --cql-username cql_username
CQL username.
-h, --host cql_host
Connect to the specified remote CQL host.
help
Displays options and usage instructions. Use nodesync help subcommand for more information on a
specific command.
-jp, --jmx-password jmx_password
JMX password.
-jpf, --jmx-password-file jmx_password_file
Path to JMX password file.
-js | --jmx-ssl
Use SSL for JMX.
-ju, --jmx-username jmx_username
JMX username.
-p, --port cql_port
Connection port for CQL.
-k, --keyspace keyspace_name
Specify a default keyspace for unqualified table names or wildcards in the table_list.
--quiet
Suppress warning and error messages.
-v | --verbose
Display all messages.
--
Separates table list from the rest of the command.
table_list
Target tables using any of the following methods:
• An asterisk in double quotes to select all tables. For example: -k cycling "*".
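The role of the `--` separator above can be sketched with a toy argument loop (this is only an illustration, not nodesync's actual parser): everything after `--` is treated as an operand, even if it could otherwise be mistaken for an option.

```shell
# Toy sketch of `--` handling (not nodesync's real parser).
parse() {
  opts=""; tables=""
  seen_sep=0
  for arg in "$@"; do
    # After `--`, everything is an operand, never an option.
    if [ "$seen_sep" -eq 1 ]; then tables="$tables $arg"; continue; fi
    case "$arg" in
      --) seen_sep=1 ;;
      -*) opts="$opts $arg" ;;
      *)  tables="$tables $arg" ;;
    esac
  done
  echo "options:$opts tables:$tables"
}
parse --quiet -- cycling.comments
```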
If --follow is used, color each trace event according to the host it originates from.
-f, --follow
After enabling tracing, continuously show the trace events, displaying new events as they arrive.
Note that this does not exit unless you exit manually (with Ctrl-C) or set a timeout (--timeout option).
-l <levelStr>, --level <levelStr>
The tracing level: either 'low' or 'high'. If omitted, the 'low' level is used. Note that the 'high' level is
somewhat verbose and should be used with care.
-n, --nodes node_list
Only disable tracing on the listed nodes. Specify the host name or IP address in a comma separated
list.
Default: all nodes.
--quiet
Suppresses messages from displaying on stdout.
-t <timeoutStr>, --timeout <timeoutStr>
Timeout on the tracing; after that amount of time, tracing is automatically disabled (and, if --follow
is used, the command returns). The value defaults to seconds, but an 's', 'm', or 'h' suffix can be used for
seconds, minutes, or hours respectively.
--tables <tableStr>
A comma-separated list of fully-qualified table names to trace. If omitted, all tables are traced.
-v, --verbose
Verbose output.
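The -t/--timeout suffix rule can be sketched as follows (a toy converter for illustration, not DSE's actual parsing code): a bare number is taken as seconds, and an 's', 'm', or 'h' suffix scales the value.

```shell
# Toy sketch of the --timeout suffix rule (not DSE's parser):
# bare numbers mean seconds; 's', 'm', 'h' scale accordingly.
to_seconds() {
  v="$1"
  case "$v" in
    *h) echo $(( ${v%h} * 3600 )) ;;
    *m) echo $(( ${v%m} * 60 )) ;;
    *s) echo "${v%s}" ;;
    *)  echo "$v" ;;
  esac
}
to_seconds 2m   # 120
```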
Examples
When the CQL host and JMX port are not specified, the local IP and default port are used.
table_name [ token_range ]
Keyspace qualified table name, optionally followed by token ranges in the form (x, y). If no token ranges
are specified, then all the tokens are validated.
-v | --verbose
Display all messages.
Examples
List all nodesync validations:
$ nodesync validation list --all
Identifier                            Table                  Status      Outcome  Duration  ETA  Progress  Validated  Repaired
1e6255f0-7754-11e9-aad8-579eeacd08f6  cycling.comments       running     ?        0ms       ?    0%        0B         0B
0ac37290-7754-11e9-ab57-0f1d9fa56691  cycling.cyclist_races  successful  success  24ms      -    100%      0B         0B
dsefs commands
The DSEFS functionality supports operations including uploading, downloading, moving, and deleting files,
creating directories, and verifying the DSEFS status.
append
Appends a local file to a remote file.
Refer to files in the local file system by prefixing paths with file:.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
destination_filepath
Explicit or relative filepath.
• If the destination path ends with a name, the destination entry is given that name.
• If the destination path ends with a slash (/), the original source file name is used.
source_filepath
Explicit or relative filepath.
Examples
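The destination-path rule above can be sketched with a hypothetical helper (not part of DSEFS): a trailing slash keeps the source file name, while any other destination names the new entry itself.

```shell
# Hypothetical helper sketching the append/cp destination rule.
resolve_dest() {
  src="$1"; dst="$2"
  case "$dst" in
    */) echo "${dst}$(basename "$src")" ;;  # trailing slash: keep source name
    *)  echo "$dst" ;;                      # otherwise: destination names the entry
  esac
}
resolve_dest file:/tmp/data.csv /uploads/          # /uploads/data.csv
resolve_dest file:/tmp/data.csv /uploads/new.csv   # /uploads/new.csv
```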
cat
Concatenates files and prints on the standard output.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
filepath
Explicit or relative filepath.
Examples
September 2018
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
September 2018
Su Mo Tu We Th Fr Sa
1
2 3 4 5 6 7 8
9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30
October 2019
Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
cd
Changes the working directory in DSEFS. The DSEFS shell remembers the last working directory of each file
system separately.
• The prompt dsefs file:/ > indicates that the current directory is on the local file system.
Synopsis
$ cd filepath
Definition
The short form and long form parameters are comma-separated.
Command arguments
filepath
Explicit or relative filepath.
Examples
Change directory to the last working directory on the local file system
dsefs file:/home/user1/path/to/local/files
chgrp
Changes group ownership for files or directories.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
filepath
Explicit or relative filepath.
group_name
Group name.
-R, --recursive
Apply the group change to directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples
chmod
Changes permission mode for owner, group, and others.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
filepath
Explicit or relative filepath.
permission_mode
Octal representation of permission mode for owner, group, and others:
• 0 – no permission
• 1 – execute
• 2 – write
• 4 – read
-R, --recursive
Apply the permission change to directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples
Change permission to make file readable, writable and executable by all users
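The octal digits above combine additively: each digit is the sum of read (4), write (2), and execute (1). A quick arithmetic sketch (the chmod invocation in the comment uses a hypothetical file path):

```shell
# Each octal digit adds read (4) + write (2) + execute (1).
all=$(( 4 + 2 + 1 ))        # rwx = 7
read_exec=$(( 4 + 1 ))      # r-x = 5
echo "${all}${all}${all}"              # owner/group/others all rwx: 777
echo "${all}${read_exec}${read_exec}"  # owner rwx, group/others r-x: 755
# Applied in the DSEFS shell (hypothetical path):
#   chmod 777 /data/notes.txt
```

Mode 777, for example, makes a file readable, writable, and executable by all users.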
chown
Changes ownership and/or group ownership for files or directories.
Synopsis
$ chown [-R] [-v] [-u username] [-g group_name] filepath [filepath ...]
Definition
The short form and long form parameters are comma-separated.
Command arguments
filepath
Explicit or relative filepath.
cp
Copies a file within a file system or between two file systems. If the destination filepath points to a file system
other than DSEFS, the block size and redundancy options are ignored.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
• If the destination path ends with a name, the destination entry is given that name.
• If the destination path ends with a slash (/), the original source file name is used.
--force-sync
Synchronize files in this directory with the storage device when closed. Files created in the directory
inherit the option.
--no-force-sync
Do not synchronize files in this directory with the storage device when closed. Files created in the
directory inherit the option.
-n, --redundancy-factor num_nodes
Set the number of replicas of file data, similar to the replication factor in the database keyspaces, but
more granular.
• Set this to one more than the number of node failures to tolerate before data loss occurs. For
example, set this value to 3 to tolerate 2 node failures.
• For simple replication, use a value that is equivalent to the replication factor.
-o, --overwrite
If the destination file exists, overwrite it.
source_filepath
Explicit or relative filepath.
Examples
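The redundancy arithmetic above reduces to a one-liner; a sketch for illustration (not a DSEFS API):

```shell
# With redundancy factor N, DSEFS keeps N replicas of file data,
# so up to N - 1 nodes can fail before data loss occurs.
redundancy_factor=3
tolerated_failures=$(( redundancy_factor - 1 ))
echo "$tolerated_failures"   # 2
```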
df
Reports file system status and disk space usage.
Synopsis
$ df [-h]
Definition
The short form and long form parameters are comma-separated.
Command arguments
-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
Examples
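The -h/--human-readable scaling can be sketched with a toy formatter (not DSE's implementation): sizes are divided by powers of 1024 into K, M, or G units.

```shell
# Toy sketch of -h/--human-readable scaling (not DSE's code).
# Integer division truncates; DSE may round instead.
human() {
  b="$1"
  if [ "$b" -ge 1073741824 ]; then
    echo "$(( b / 1073741824 ))G"
  elif [ "$b" -ge 1048576 ]; then
    echo "$(( b / 1048576 ))M"
  elif [ "$b" -ge 1024 ]; then
    echo "$(( b / 1024 ))K"
  else
    echo "${b}B"
  fi
}
human 464827   # 453K (truncated; a rounding formatter would print 454K)
```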
du
Lists the sizes of the files and directories in a specific directory.
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
-s, --summarized
Display only the total size of all files and directories.
directories
The directories to search to calculate the space usage.
Examples
Get disk usage from the root of the DSEFS file system.
$ dse fs "du"
464827 example1
0 tmp/hive
0 tmp
464827 .
Get the disk usage from the example1 directory in human-readable form.
$ dse fs "du -h example1"
454K example1
Get the total disk usage of all files in DSEFS in human-readable form.
$ dse fs "du -h -s"
454K .
echo
Displays a line of text.
Synopsis
$ echo text_to_display
Definition
The short form and long form parameters are comma-separated.
Command arguments
text_to_display
Text to display.
Examples
exit
Exits DSEFS command shell.
Synopsis
$ exit
Definition
The short form and long form parameters are comma-separated.
Command arguments
fsck
Performs file system consistency check and repairs file system errors. Only a superuser may run fsck. Run fsck
after running umount, or if you encounter file write errors (for example, timeouts).
Synopsis
Definition
The short form and long form parameters are comma-separated.
Command arguments
Use throttling to limit the number of files being repaired at the same time to 8:
$ dse fs fsck -p 8
get
A special case of cp that copies a DSEFS remote file to the local file system. If a relative source path is given, it
is resolved in the last DSEFS working directory, regardless of the current working directory. Similarly, if a relative
destination path is given, it is always resolved in the last local working directory. Filepaths can be absolute and
can point to any file system.
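The resolution rule above can be sketched in plain Python: an absolute path is used as-is, while a relative path is joined against the remembered working directory for its file system. This is an illustration of the documented rule, not DSEFS code; the function and the example directories are invented.

```python
import posixpath

def resolve(path, working_dir):
    # Absolute paths are used as-is; relative paths are resolved
    # against the remembered working directory for that file system.
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(working_dir, path))

# For `get`, a relative source resolves in the last DSEFS working
# directory, and a relative destination in the last local one.
dsefs_cwd = "/data/logs"   # hypothetical last DSEFS working directory
local_cwd = "/home/user1"  # hypothetical last local working directory

source = resolve("2020/app.log", dsefs_cwd)          # /data/logs/2020/app.log
destination = resolve("backups/app.log", local_cwd)  # /home/user1/backups/app.log
```

The same rule, with the roles of the two working directories swapped, applies to `put`.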
Synopsis
Command arguments
destination_filepath
Explicit or relative filepath.
• If the destination path ends with a name, the destination entry is given that name.
• If the destination path ends with a slash (/), the original source file name is used.
source_filepath
Explicit or relative filepath.
Examples
ls
Lists directory contents.
Synopsis
Command arguments
directory_name
Directory on DSEFS file system.
-h, --human-readable
Display human-readable sizes. For example, 1.25k, 234M, or 2G.
-l, --long
Use long listing format.
-R, --recursive
List directories and their contents recursively.
-1, --single-column
List one file per line.
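The sizes that -h produces (1.25k, 234M, 2G) follow the usual divide-by-1024 convention. A minimal sketch of that formatting, assuming base-1024 units and at most two decimal places; the exact rounding DSEFS applies may differ:

```python
def human_readable(num_bytes: float) -> str:
    # Divide by 1024 until the value fits the current unit,
    # then trim trailing zeros: 1280 -> "1.25k", 2147483648 -> "2G".
    for suffix in ("", "k", "M", "G", "T"):
        if num_bytes < 1024 or suffix == "T":
            text = f"{num_bytes:.2f}".rstrip("0").rstrip(".")
            return text + suffix
        num_bytes /= 1024

print(human_readable(1280))             # 1.25k
print(human_readable(234 * 1024 ** 2))  # 234M
```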
Examples
A plain listing:
bin cdrom dev home lib32 lost+found mnt proc run srv tmp var
initrd.img.old vmlinuz.old
boot data etc lib lib64 media opt root sbin sys usr initrd.img vmlinuz
The same contents with --single-column, one file per line:
bin
cdrom
dev
home
lib32
lost+found
mnt
proc
run
srv
tmp
var
initrd.img.old
vmlinuz.old
boot
data
etc
lib
lib64
media
opt
root
sbin
sys
usr
initrd.img
vmlinuz
mkdir
Creates new directory or directories.
Synopsis
$ mkdir [-p] [-b size_in_bytes] [-n num_nodes] [-c encoder_name] [-m permission_mode] [--no-force-sync] [--force-sync] new_directory_name [new_directory_name ...]
Command arguments
--force-sync
Synchronize files in this directory with the storage device when closed. Files created in the directory
inherit the option.
-m, --permission-mode permission_mode
Permission mode for the directory, expressed as octal digits. Each digit is the sum of:
• 0 – no permission
• 1 – execute
• 2 – write
• 4 – read
--no-force-sync
Do not synchronize files in this directory with the storage device when closed. Files created in the
directory inherit the option.
-n, --redundancy-factor num_nodes
Set the number of replicas of the file data, similar to the replication factor in database keyspaces, but more granular.
• Set this to one more than the number of node failures to tolerate before data loss occurs. For example, set this value to 3 to allow 2 nodes to fail.
• For simple replication, use a value equal to the replication factor.
-p, --parents
Make parent directories as needed; no error if they already exist.
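The numeric arguments above combine simply: each octal digit of the permission mode is a sum of the read (4), write (2), and execute (1) bits, and a redundancy factor of n tolerates n - 1 failed nodes. A quick illustration in plain arithmetic, not DSEFS code:

```python
# Each permission digit sums the bits listed above: read=4, write=2, execute=1.
READ, WRITE, EXECUTE = 4, 2, 1

owner = READ + WRITE + EXECUTE   # 7 -> rwx
group = READ + EXECUTE           # 5 -> r-x
other = READ                     # 4 -> r--
mode = f"{owner}{group}{other}"  # "754", e.g. -m 754

# A redundancy factor of n keeps n replicas, so n - 1 nodes may fail
# before data is lost: -n 3 tolerates 2 failures.
redundancy_factor = 3
tolerated_failures = redundancy_factor - 1
```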
Examples
Make a new directory with a 32-MB block size, a redundancy factor of 2, and files synchronized on close
mv
Moves a file or directory.
Synopsis
Command arguments
destination_filepath
Explicit or relative filepath.
• If the destination path ends with a name, the destination entry is given that name.
• If the destination path ends with a slash (/), the original source file name is used.
source_filepath
Explicit or relative filepath.
Examples
put
A special case of cp that copies a local file to the DSE filesystem. If a relative source path is given, it is resolved
in the last local working directory, regardless of the current working directory. Similarly, if a relative destination
path is given, it is always resolved in the last DSEFS working directory. As in cp, both paths may be absolute and
are allowed to point to any file system. If the destination path points to a different file system than DSEFS, the
block size and redundancy options are ignored.
Synopsis
$ put [-o] [-b size_in_bytes] [-n num_nodes] [-c encoder_name] [-f frame_size_in_bytes] [-m permission_mode] [--no-force-sync] [--force-sync] source_filepath destination_filepath
Command arguments
destination_filepath
Explicit or relative filepath.
• If the destination path ends with a name, the destination entry is given that name.
• If the destination path ends with a slash (/), the original source file name is used.
-m, --permission-mode permission_mode
Permission mode for the file, expressed as octal digits. Each digit is the sum of:
• 0 – no permission
• 1 – execute
• 2 – write
• 4 – read
-n, --redundancy-factor num_nodes
Set the number of replicas of the file data.
• Set this to one more than the number of node failures to tolerate before data loss occurs. For example, set this value to 3 to allow 2 nodes to fail.
• For simple replication, use a value equal to the replication factor.
--no-force-sync
Do not synchronize the file with the storage device when closed.
-o, --overwrite
If the destination file exists, overwrite it.
source_filepath
Explicit or relative filepath.
Examples
pwd
Prints full filepath of current working directory.
Synopsis
$ pwd [directory_path]
Command arguments
directory_path
Current working directory.
Examples
dsefs:/home/user1/new_directory
realpath
Prints the resolved absolute path; by default, all but the last path component must exist.
Synopsis
Command arguments
-e, --canonicalize-existing
Resolve the path to canonical form. All components of the path must exist.
-m, --canonicalize-missing
Resolve the path to canonical form. No path component is required to exist or be a directory.
filepath
Explicit or relative filepath.
Examples
Print filepath
file:/home/user1/myDirectory
rename
Renames a file or directory without moving it to a different directory.
Synopsis
Command arguments
new_name
New file name.
filepath
Explicit or relative filepath.
Examples
rm
Removes files or directories.
Synopsis
Command arguments
filepath
Explicit or relative filepath.
-R, --recursive
Remove directories and their contents recursively.
-v, --verbose
Turn on verbose output.
Examples
Remove files
Remove directory
rmdir
Removes empty directory or directories.
Synopsis
Command arguments
filepath
Explicit or relative filepath.
Examples
stat
Displays file or directory status.
Synopsis
Command arguments
filepath
Explicit or relative filepath.
-v, --verbose
Turn on verbose output.
Examples
DIRECTORY file:/home/user1/new_directory:
Owner user1
Group user1
Permission rwxr-xr-x
Created 2017-01-15 13:10:06+0200
Modified 2017-01-15 13:10:06+0200
Accessed 2017-01-15 13:10:06+0200
Size 4096
truncate
Truncates file or files to a specified length.
To retain only metadata, set the file size to 0 bytes. This is also useful for keeping an empty file in place for a process, without deleting and recreating the file.
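The zero-length case described above is the same effect os.truncate has on a local file. A small sketch of emptying a file in place rather than deleting and recreating it, using standard Python rather than the DSEFS shell command:

```python
import os
import tempfile

# Create a file with some content, then truncate it to 0 bytes:
# the entry (and its metadata) stays in place, only the data is dropped.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("log line 1\nlog line 2\n")

os.truncate(path, 0)
size = os.path.getsize(path)  # 0
os.remove(path)
```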
Synopsis
Command arguments
filepath
Explicit or relative filepath.
umount
Unmounts file system storage locations from file hierarchy. Only a superuser may run umount. After running
umount, run fsck to add missing block replicas taken away by the unmounted location.
Synopsis
Command arguments
-f, --force
Force unmounting, even if location is unavailable.
location_UUID
UUID of location.
Examples
dsetool
About dsetool
dsetool is a command line interface for DSE operations.
Synopsis
$ dsetool help
JMX authentication is used by some dsetool commands. Other dsetool commands authenticate with the
user name and password of the configured user. The connection option short form and long form are comma
separated.
You can provide authentication credentials in several ways; see Credentials for authentication.
To enable dsetool to use Kerberos authentication, see Using dsetool with Kerberos enabled cluster.
username=username
password=password
jmx_username=jmx_username
jmx_password=jmx_password
The credentials in the configuration file are stored in clear text. DataStax recommends restricting
access to this file only to the specific user.
Synopsis
Retrieves the dynamic indexing status (INDEXING, FINISHED, or FAILED) of the specified index or indexes.
Also identifies the reindexing reason. The possible reason for a reindexing event is categorized as one of the
following:
• BOOTSTRAP
• NEW_SSTABLES
• USER_REQUEST
Parameters:
[keyspace_name.]table_name
The table name is required; the keyspace name is optional. Keyspace and table names are case-sensitive,
so you must specify them with the correct case.
--all
Retrieve the dynamic indexing status of the specified search index on all nodes.
--progress
Display the percent complete, an estimated time to completion in milliseconds, and the reindexing reason.
This option is assumed true and is effectively ignored; the command always displays the status information.
See Verifying indexing status.
Examples
These examples use the demo keyspace and health_data table.
To view the indexing status for the local node:
reason: USER_REQUEST
dsetool create_core
Creates the search index table on the local node.
Supports DSE authentication with [-l username -p password].
The CQL command to create a search index is CREATE SEARCH INDEX.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:
• auto_soft_commit_max_time:ms
• default_query_field:field
• generateResources:(true|false)
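The #-separated key:value syntax can be previewed with a short parsing sketch (the option values below are illustrative, not recommendations):

```shell
# Split a coreOptionsInline string into its key/value pairs.
opts='auto_soft_commit_max_time:1000#default_query_field:body#generateResources:true'
echo "$opts" | tr '#' '\n' | while IFS=: read -r key value; do
  printf '%s = %s\n' "$key" "$value"
done
```

Each pair is separated by #, and the first colon in a pair separates the key from its value.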
spaceSavingNoJoin
Do not index a hidden primary key field. Prevents joins across cores.
spaceSavingSlowTriePrecision
Sets the trie field precisionStep to '0', allowing for greater space saving but slower querying.
rt
Enable live indexing to increase indexing throughput. Enable live indexing on only one search index per
cluster.
rt=true
recovery=(true|false)
Whether to delete and recreate the search index if it cannot load due to corruption. Valid values:
• true - If the search index is unable to load, recover the index by deleting and recreating it.
• false - Default. Does not attempt recovery.
reindex=(true|false)
Whether to reindex the data when search indexes are auto-generated with generateResources=true.
Reindexing works on a datacenter (DC) level. Reindex only once per search-enabled DC. Repeat the
reindex command on other data centers as required.
Valid values:
• true - Default. Reindexes the data. Accepts reads and keeps the current search index while the
new index is building.
• false - Does not reindex the data. You can check and customize search index resources before
indexing.
schema=path
Path of the UTF-8 encoded search index schema file. Cannot be specified when
generateResources=true.
To ensure that non-indexed fields in the table are retrievable by queries, you must include those
fields in the schema file. For more information, see Solr single-pass CQL queries.
solrconfig=path
Path of the UTF-8 encoded search index configuration file. Cannot be specified when
generateResources=true.
Examples
Automatically generate a search index for the health_data table in the demo
keyspace
Override the default and reindex existing data by specifying the reindex=true
option
The generateResources=true option generates resources only if resources do not exist in the solr_resources
table.
To turn on live indexing, also known as real-time (RT) indexing, create an rt.yaml file whose contents are rt: true:
dsetool createsystemkey
Creates an encryption/decryption key for transparent data encryption (TDE).
See Transparent data encryption.
Synopsis
cipher_algorithm[/mode/padding]
DSE supports the following JCE cipher algorithms:
where system_key2 is the unique file name for the generated key file.
where group2 is the key server group defined in the kmip_hosts section of dse.yaml.
$ dsetool encryptconfigvalue
dsetool get_core_config
Displays the XML for the specified search index config. Supports DSE authentication with [-l username -p
password].
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
current=true|false
Optionally specify which configuration to view. Valid values:
• true - Returns the current (active) search index configuration.
• false - Default. Returns the pending (latest uploaded) search index configuration.
Examples
The following examples view the search index config for the demo keyspace and health_data table.
To view the pending (latest uploaded) configuration:
dsetool get_core_schema
Displays the XML for the specified search index schema.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
current=true|false
Optionally specify which schema to view: true returns the current (active) schema; false (the default)
returns the pending (latest uploaded) schema.
Examples
The following examples view the search index schema for the demo keyspace and health_data table.
To save the XML output to a file:
dsetool help
Provides a listing of dsetool commands and parameters.
Synopsis
$ dsetool help
Typing dsetool or dsetool help provides a listing of dsetool commands and parameters.
Help is not available on a single command.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:
• auto_soft_commit_max_time:ms
• default_query_field:field
• generateResources:(true|false)
• true - Runs the index check to verify index integrity. Reads the full index and has a performance
impact.
--index_checks_stop=true|false
Specify true to stop a running index check.
Examples
Ensure that indexing is inactive before doing an index check.
To do an index check:
maxDoc:0
deletedDocs:0
indexHeapUsageBytes:0
version:2
segmentCount:0
current:true
hasDeletions:false
directory:org.apache.lucene.store.MMapDirectory:MMapDirectory@/
Users/maryjoe/dse/data/solr.data/demo.health_data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@5c94e0dd
segmentsFile:segments_1
segmentsFileSizeInBytes:71
userData:{}
dsetool infer_solr_schema
Automatically infers and proposes a schema that is based on the specified keyspace and table. Search indexes
are not modified. Supports DSE authentication with [-l username -p password].
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
coreOptions=yamlFilepath
When auto-generation is on with generateResources=true, the file path to a customized YAML-
formatted file of options. See Changing auto-generated search index settings.
coreOptionsInline=key1:value1#key2:value2#...
Use this key-value pair syntax key1:value1#key2:value2# to specify values for these settings:
• auto_soft_commit_max_time:ms
• default_query_field:field
• generateResources:(true|false)
dsetool inmemorystatus
Provides the memory size, capacity, and percentage for this node and the amount of memory each table is
using. The unit of measurement is MB. Bytes are truncated.
Synopsis
[keyspace_name.table_name]
The keyspace name and table name.
Examples
$ dsetool inmemorystatus
dsetool insights_config
Enables and disables DSE Metrics Collector and configures reporting frequency and storage options. The default
mode enables metrics collection and reporting with local storage on disk.
Run this command only on a single node. The change is propagated to all other nodes in the cluster. Wait at
least 30 seconds for the changes to propagate to all nodes. Restarting DSE is not required.
Synopsis
--show_config
Prints the current configuration for DSE Metrics Collector.
--mode DISABLED|ENABLED_NO_STORAGE|ENABLED_WITH_LOCAL_STORAGE
Enables and disables DSE Metrics Collector and configures storage options:
• DISABLED - disables metrics collection and reporting.
• ENABLED_NO_STORAGE - enables metrics collection and starts reporting metrics. Typically used
when collectd is configured to report to a real-time monitoring system.
• ENABLED_WITH_LOCAL_STORAGE - Default. Enables metrics collection and reporting, and stores the
collected metrics locally on disk.
Restarting DSE is not required after changing the configuration mode. The configuration mode persists
after DSE is restarted.
--metric_sampling_interval_in_seconds seconds
The frequency that metrics are reported to DSE Metrics Collector.
Default: 30
--config_refresh_interval_in_seconds seconds
How often the DSE Metrics Collector configuration changes are pushed to all nodes in the cluster. If
nodes are down when a change is made, the change will propagate when the node is back up.
Default: 30
--data_dir_max_size_in_mb mb
When local storage is enabled, the limit on how much DSE Metrics Collector data will be stored on disk.
The maximum size of the data directory cannot exceed 2 GB.
Default: 1024 (1 GB)
--node_system_info_report_period duration
The repeating time interval, in ISO-8601 format, for gathering diagnostic information about the node.
For example, PT1H is 1 hour, PT5M is 5 minutes, and PT200S is 200 seconds.
Default: PT1H (1 hour)
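A quick sketch of how these simple durations map to seconds; it handles only the single-unit PTnH/PTnM/PTnS forms shown above, not full ISO-8601 durations:

```shell
# Convert a single-unit ISO-8601 duration (PTnH, PTnM, or PTnS) to seconds.
iso_to_seconds() {
  v=${1#PT}
  case $1 in
    PT*H) echo $(( ${v%H} * 3600 )) ;;
    PT*M) echo $(( ${v%M} * 60 )) ;;
    PT*S) echo "${v%S}" ;;
    *)    echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}
iso_to_seconds PT1H    # 3600
iso_to_seconds PT5M    # 300
iso_to_seconds PT200S  # 200
```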
Examples
{
"mode" : "DISABLED",
"config_refresh_interval_in_seconds" : 30,
"metric_sampling_interval_in_seconds" : 30,
"data_dir_max_size_in_mb" : 1024,
"node_system_info_report_period" : "PT1H"
}
Configure 1500 MB for the DSE Metrics Collector local data directory
The maximum size of the local data directory must not exceed 2 GB.
The default directory for local storage is /var/lib/cassandra/insights_data. To change the directory to store
collected metrics, see Configuring data and log directories for DSE Metrics Collector.
After you make configuration changes with dsetool insights_config, you must disable and then re-enable DSE
Metrics Collector to read the configuration file again. Wait at least 30 seconds for the changes to propagate to
all nodes.
dsetool insights_filters
Configures filters to include and exclude specific metrics for DSE Metrics Collector.
By default, the following metrics are always excluded:
• DSE internal table metrics (except system_auth, paxos, and batchlog metrics)
Use a regular expression (regex) to specify which metrics to include or exclude from the filter. See Filtering
metrics.
Synopsis
--show_filters
Prints the current filters for DSE Metrics Collector.
--remove_all_filters
Remove all metrics filters for DSE Metrics Collector.
--add --global|--insights_only regex
Include metrics that match this regular expression and apply the filter with scope of --global or --
insights_only.
--deny --global|--insights_only regex
Exclude metrics that match this regular expression and apply the filter with scope of --global or --
insights_only.
--global
Metrics filter scope includes metrics reported locally and insights data files.
--insights_only
Limit metrics filter scope to insights data files only. Appropriate for diagnostic use.
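Before applying a filter, you can preview what a regex would match with grep. A sketch; the metric names below are made up for illustration and are not real DSE metric identifiers:

```shell
# Hypothetical metric names, one per line.
metrics='org.apache.cassandra.metrics.Table.ReadLatency.demo.health_data
org.apache.cassandra.metrics.Table.WriteLatency.demo.health_data
org.apache.cassandra.metrics.Keyspace.ReadLatency.system'

# A deny filter regex such as '\.system$' would exclude:
echo "$metrics" | grep -E '\.system$'

# ...and leave the demo keyspace metrics unaffected:
echo "$metrics" | grep -Ev '\.system$'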
Remove a global filter to allow metrics for a specific keyspace that has an
existing deny filter
Add a filter to insights data files that denies grace period metrics
dsetool list_index_files
Lists all index files for a search index on the local node. The results show file name, encryption status, disk usage,
decrypted size, and encryption overhead. An index file is encrypted only when the backing CQL table is
encrypted and the search index uses EncryptedFSDirectoryFactory; otherwise, the index file is not encrypted.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
--index
The data directory that contains the index files.
• If not specified, the default directory is inferred from the search index name.
• directory - A specified file path to the solr.data directory that contains the search index files.
Examples
The results show file name, encryption, disk usage, decrypted size, and encryption overhead:
dsetool list_core_properties
Lists the properties and values in the dse-search.properties resource for the search index.
See Load balancing for distributed search queries.
Synopsis
$ dsetool list_core_properties
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
Examples
To view properties set in the dse-search.properties resource:
Example result, assuming the shard shuffling strategy has already been set to RANDOM:
shard.shuffling.strategy=RANDOM
dsetool list_subranges
Lists the subranges of data in a keyspace by dividing a token range into a number of smaller subranges. Useful
when the specified range is contained in the target node's primary range.
Synopsis
keyspace_name table_name
Keyspace table pair.
keys_per_range
The approximate number of rows per subrange.
start_token
The start token of a specified range of tokens.
end_token
The end token of a specified range of tokens.
The subranges are output and can be used as input to the nodetool repair command.
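As a loose illustration (not DSE source code), the idea of dividing one token range into evenly sized subranges can be sketched in Python. The token bounds and subrange count below are hypothetical inputs; DSE actually derives the number of subranges from the keys_per_range estimate.

```python
# Illustrative sketch only: splitting one contiguous token range into
# evenly sized subranges, conceptually what dsetool list_subranges does.

def split_token_range(start_token: int, end_token: int, num_subranges: int):
    """Split (start_token, end_token] into num_subranges contiguous pieces."""
    span = end_token - start_token
    step = span // num_subranges
    bounds = [start_token + i * step for i in range(num_subranges)] + [end_token]
    return list(zip(bounds[:-1], bounds[1:]))

# Example: split the hypothetical range (0, 3000] into three subranges.
print(split_token_range(0, 3000, 3))  # [(0, 1000), (1000, 2000), (2000, 3000)]
```

Each emitted (start, end] pair could then be passed to nodetool repair as a start/end token pair.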
dsetool listjt
Lists all Job Tracker nodes grouped by the datacenter that is local to them.
Synopsis
$ dsetool listjt
$ dsetool listjt
kmip_groupname
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
namespace=key_namespace
Namespace on the specified KMIP provider.
Examples
Get a list of the available keys and states from the KMIP server:
The results show that the KMIP server named vormetricgroup has two keys:

Keys on vormetricgroup:
ID      Name                                     Cipher         State        Activation Date               Creation Date  Protect Stop Date  Namespace
02-449  82413ef3-4fa6-4d4d-9dc8-71370d731fe4_0  AES/CBC/PKCS5  Deactivated  Mon Apr 25 20:25:47 UTC 2016  n/a            n/a                n/a
02-540  0eb2277e-0acc-4adb-9241-1dd84dde691c_0  AES            Active       Tue May 31 12:57:59 UTC 2016  n/a
kmip_groupname
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
date_time
After the specified date_time, new data is no longer encrypted with the key; existing data can still be
decrypted with the key after this expiration date/time. The date_time format is YYYY-MM-DD HH:MM:SS:T. For
example, use 2016-04-13 20:05:00:0 to expire the encryption key at 8:05 p.m. on 13 April 2016.
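The YYYY-MM-DD HH:MM:SS:T format (T being tenths of a second) can be parsed with the Python standard library. This is only an illustrative sketch: strptime has no tenths-of-a-second directive, so the tenths field is split off and added separately.

```python
# Illustrative sketch: parsing the YYYY-MM-DD HH:MM:SS:T expiration format.
from datetime import datetime, timedelta

def parse_expire(ts: str) -> datetime:
    main, tenths = ts.rsplit(":", 1)  # peel off the trailing tenths field
    base = datetime.strptime(main, "%Y-%m-%d %H:%M:%S")
    return base + timedelta(milliseconds=100 * int(tenths))

expiry = parse_expire("2016-04-13 20:05:00:0")
print(expiry)  # 2016-04-13 20:05:00
```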
Examples
Encryption for new data is prevented, but decryption with the key is still allowed. Because the expire date/time is
not specified, the key is expired immediately.
kmip_groupname
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
Examples
Synopsis
kmip_groupname
The user-defined name of the KMIP group that is configured in the kmip_hosts section of dse.yaml.
kmip_key_id
The key id on the KMIP provider.
Examples
dsetool node_health
Retrieves a dynamic score between 0 and 1 that describes the health of a DataStax Enterprise node. Node
health is a score-based representation of how fit a node is to handle search queries. The composite score is
based on dropped mutations and uptime; a higher score indicates better node health. Nodes with a large
number of dropped mutations, and nodes that have only recently started, have a lower health score.
See Collecting node health and indexing status scores.
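DataStax does not publish the exact node-health formula, so the following Python sketch is purely hypothetical. It only illustrates the stated behavior: the score rises with uptime, falls with dropped mutations, and stays within [0, 1]. The uptime_ramp and drop_penalty parameters are invented for the illustration.

```python
# Hypothetical sketch, NOT the DSE formula: a composite health score that
# increases with uptime and decreases with dropped mutations, bounded in [0, 1].

def node_health(uptime_seconds: float, dropped_mutations: int,
                uptime_ramp: float = 3600.0, drop_penalty: float = 0.01) -> float:
    uptime_score = uptime_seconds / (uptime_seconds + uptime_ramp)
    drop_score = 1.0 / (1.0 + drop_penalty * dropped_mutations)
    return uptime_score * drop_score

# A freshly started node scores low; a long-running clean node scores high.
print(round(node_health(60, 0), 2))
print(round(node_health(7 * 86400, 0), 2))
```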
Synopsis
--all
Run the operation on all nodes.
Examples
$ dsetool node_health
dsetool partitioner
Returns the fully qualified classname of the IPartitioner that is used by the cluster.
$ dsetool partitioner
org.apache.cassandra.dht.Murmur3Partitioner
dsetool perf
Temporarily changes the running parameters for the CQL Performance Service. Histogram tables provide DSE
statistics that can be queried with CQL.
Changes made with performance object subcommands do not persist between restarts and are useful only for
short-term diagnostics.
To make these changes permanent, change the CQL Performance Service options in dse.yaml.
See DSE Performance Service diagnostic table reference and Collecting histogram diagnostics.
clustersummary enable|disable
Whether to enable the collection of database-level statistics for the cluster.
cqlslowlog enable|disable
Whether to enable the collection of CQL queries that exceed the specified time threshold.
cqlslowlog threshold
The CQL slow log threshold, expressed as a percentile of the actual request times.
cqlslowlog skip_writing_to_db
Keeps slow queries in-memory only.
cqlslowlog write_to_db
Writes data to the database. When data is written to the database, the threshold must be >= 2000 ms to
prevent a high load on the database.
Temporary equivalent of the cql_slow_log_options.skip_writing_to_db: false setting in dse.yaml.
cqlslowlog set_num_slowest_queries
The number of slow queries to keep in-memory.
cqlslowlog recent_slowest_queries
The specified number of the most recent slow queries to retrieve.
cqlsysteminfo enable|disable
Whether to collect CQL system performance information statistics.
dbsummary enable|disable
Whether to collect database summary statistics.
histograms enable|disable
Whether to collect table histograms that measure the distribution of values in a stream of data.
Histogram tables provide DSE statistics that can be queried with CQL. The data in the diagnostic
histogram tables is cumulative since the DSE server was started.
resourcelatencytracking enable|disable
Whether to collect resource latency tracking statistics.
solrcachestats enable|disable
Whether to collect Solr cache statistics.
solrindexingerrorlog enable|disable
Whether to log Solr indexing errors.
solrindexstats enable|disable
Whether to collect Solr indexing statistics.
solrlatencysnapshots enable|disable
Whether to collect Solr latency snapshots.
solrrequesthandlerstats enable|disable
Whether to collect Solr request handler statistics.
solrslowlog threshold enable|disable
Whether to log the Solr slow sub-query log and set the Solr slow log threshold in milliseconds.
solrupdatehandlerstats enable|disable
Whether to collect Solr update handler statistics.
userlatencytracking enable|disable
Whether to enable user latency tracking.
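As a hypothetical sketch of the retention behavior that cqlslowlog set_num_slowest_queries describes (not DSE's implementation), a min-heap keyed on duration can keep only the N slowest queries in memory, evicting the fastest retained entry once the cap is reached:

```python
# Illustrative sketch: an in-memory buffer of the N slowest queries.
import heapq

class SlowQueryLog:
    def __init__(self, num_slowest: int):
        self.num_slowest = num_slowest
        self._heap = []  # min-heap of (duration_ms, query); root is fastest kept

    def record(self, duration_ms: float, query: str) -> None:
        if len(self._heap) < self.num_slowest:
            heapq.heappush(self._heap, (duration_ms, query))
        elif duration_ms > self._heap[0][0]:
            # Evict the fastest retained query in favor of this slower one.
            heapq.heapreplace(self._heap, (duration_ms, query))

    def slowest(self):
        """Return retained queries, slowest first."""
        return sorted(self._heap, reverse=True)

log = SlowQueryLog(num_slowest=2)
for ms, q in [(120, "q1"), (2500, "q2"), (800, "q3")]:
    log.record(ms, q)
print(log.slowest())  # [(2500, 'q2'), (800, 'q3')]
```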
Examples
These example commands make temporary changes only. Changes made with performance object
subcommands do not persist between restarts and are useful only for short-term diagnostics.
See Collecting database summary diagnostics.
dsetool read_resource
Reads the specified search index config or schema. Supports DSE authentication with [-l username -p
password].
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
res_filename
The name of the search index resource file to read.
Examples
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
index1,index2,...
Include one or a comma-separated list of secondary indexes to rebuild. If indexes are not specified,
rebuilds all indexes.
Examples
dsetool reload_core
Reloads the search index to recognize changes to schema or configuration. Supports DSE authentication with [-
l username -p password].
To reload the core and prevent reindexing, accept the default values reindex=false and deleteAll=false.
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
schema=path
Path of the UTF-8 encoded search index schema file. Cannot be specified when
generateResources=true.
To ensure that non-indexed fields in the table are retrievable by queries, you must include those
fields in the schema file. For more information, see Solr single-pass CQL queries.
solrconfig=path
Path of the UTF-8 encoded search index configuration file. Cannot be specified when
generateResources=true.
distributed=( true | false )
Whether to distribute and apply the operation to all nodes in the local datacenter.
• true - Default. Distributes and applies the operation to all nodes in the local datacenter.
• false - Applies the operation only to the node it was sent to. false works only when recovery=true.
reindex=( true | false )
• true - Reindexes the data. Accepts reads and keeps the current search index while the
new index is building.
• false - Default. Does not reindex the data. You can check and customize search index resources before
indexing.
deleteAll=( true | false )
• true - Deletes the existing index before reindexing; search results return either no data or
partial data while the index is rebuilding.
• false - Default. Does not delete the existing index, so the reindex happens in place; search results
may return partially incorrect results while the index is updating.
During reindexing, a series of criteria routes sub-queries to the nodes most capable of handling them.
See Shard routing for distributed queries.
Examples
dsetool ring
Lists the nodes in the ring. For more readable output, use dsetool status.
Synopsis
$ dsetool ring
Examples
$ dsetool ring
Results:

0
10.101.33.157  Cassandra  rack1  Cassandra  no  Up  Normal  178.75 KiB  50.00%  -9223372036854775808  1.00
10.101.32.188  Cassandra  rack1  Cassandra  no  Up  Normal  188.22 KiB  50.00%  0                     1.00
dsetool set_core_property
Sets the properties and values in the dse-search.properties resource for the search index.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
For shard.set.cover.finder:
DYNAMIC
Use randomization in token range and endpoint selection for load balancing. DYNAMIC is the default.
STATIC
Requires load balanced client. Suitable for 8+ vnodes. The same query on a node uses the same token
ranges and endpoints. Creates fewer token filters, and has better performance than DYNAMIC.
When shard.set.cover.finder=DYNAMIC, values for shard.shuffling.strategy:
HOST
Shards are selected based on the host that received the query.
QUERY
Shards are selected based on the query string.
HOST_QUERY
Shards are selected by host x query.
Page
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13
898
DataStax Enterprise tools
RANDOM
Suitable only for 8 or fewer vnodes. A different random set of shards is selected with each
request (default).
SEED
Selects the same shard from one query to another.
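As a hypothetical illustration (not DSE code) of how the HOST, QUERY, HOST_QUERY, and SEED strategies can each derive a deterministic shard ordering from a different input, while RANDOM reorders on every request:

```python
# Illustrative sketch only: deterministic vs. random shard-shuffling styles.
import hashlib
import random

def shuffle_shards(shards, strategy, host="", query=""):
    """Return a shard ordering according to a shuffling strategy."""
    shards = list(shards)
    if strategy == "RANDOM":
        random.shuffle(shards)  # a different order on every request
        return shards
    # Deterministic strategies seed a PRNG from a strategy-specific input.
    seed_input = {"HOST": host,
                  "QUERY": query,
                  "HOST_QUERY": host + query,
                  "SEED": "fixed-seed"}[strategy]
    seed = int(hashlib.md5(seed_input.encode()).hexdigest(), 16)
    random.Random(seed).shuffle(shards)  # stable for the same input
    return shards

shards = ["s1", "s2", "s3", "s4"]
# HOST: the same receiving host always yields the same ordering.
assert shuffle_shards(shards, "HOST", host="10.0.0.1") == \
       shuffle_shards(shards, "HOST", host="10.0.0.1")
```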
When shard.set.cover.finder=STATIC, values for shard.set.cover.finder.inertia:
inertia_integer
Increasing the inertia value from the default of 1 may improve performance for clusters with more than 1
vnode and more than 20 nodes. The default is appropriate for most workloads.
Examples
To disable randomization when selecting token ranges and endpoints:
As shown in the examples, after setting the core property value, be sure to reload the search index. While you
can run set_core_property once per cluster, the search index must be reloaded in each datacenter. In cqlsh,
you can use RELOAD SEARCH INDEX. Example:
You do not need to reindex the specified table unless schema changes were made. Refer to Reloading the
search index.
This command takes an optional datacenter argument. If a datacenter is specified, the command removes the
recovery data for only that datacenter.
Examples
dsetool status
Lists the nodes in the ring, including the node type and node health. When the datacenter workloads are all of the
same type, that workload type is listed. When the datacenter workloads are heterogeneous, the workload type is
shown as mixed.
Synopsis
$ dsetool status
$ dsetool status
dsetool stop_core_reindex
Stops reindexing for the specified search index on the node where the command is run. Optionally, specify a
timeout in minutes; the core waits up to the specified timeout for in-progress indexing to finish, then gracefully
stops the indexing. The default timeout is 1 minute.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
timeout_min
The number of minutes to wait to gracefully stop the indexing.
Examples
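The example itself is elided in this extract; a minimal sketch of an invocation, assuming a hypothetical search index demo.health_data and a 3-minute timeout (both values are placeholders, not from this guide):

```shell
# Wait up to 3 minutes for a graceful stop of reindexing on this node:
$ dsetool stop_core_reindex demo.health_data 3
```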
dsetool tieredtablestats
Outputs tiered storage information, including SSTables, tiers, timestamps, and sizes. Provides information on
every table that uses tiered storage.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
-v
Output statistics for each SSTable, in addition to the tier summaries.
Examples
$ dsetool tieredtablestats
Output of command:
ks.tbl
Tier 0:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
size: 4839
SSTables:
/mnt2/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-3-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168678515945
min_timestamp: 1449168678515945
reads_120_min: 5.2188117172945374E-5
reads_15_min: 4.415612774014863E-7
rows: 1
size: 4839
Tier 1:
Summary:
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
size: 4839
SSTables:
/mnt3/ks/tbl-257cecf1988311e58be1ff4e6f1f6740/ma-4-big-Data.db:
estimated_keys: 256
level: 0
max_data_age: 1449178580284
max_timestamp: 1449168749912092
min_timestamp: 1449168749912092
reads_120_min: 0.0
reads_15_min: 0.0
rows: 1
size: 4839
dsetool tsreload
Reloads the truststores without a restart. Specify client or server.
Synopsis
client
Reloads the truststore that is used for encrypted client-to-node communications.
server
Reloads the server truststore that is used for encrypted node-to-node (internode) SSL communications.
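A sketch of the two forms of the command, one per truststore:

```shell
$ dsetool tsreload client   # reload the client-to-node truststore
$ dsetool tsreload server   # reload the node-to-node (internode) truststore
```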
dsetool unload_core
Removes a search index. Supports DSE authentication with [-l username -p password].
To drop a search index from a table and delete all related data for the entire cluster, see DROP SEARCH
INDEX.
The removal of the secondary index from the table schema is always distributed.
Synopsis
Whether to delete the config and schema resources associated with the search index.
Valid values:
• true - Deletes index data and any other artifacts in the solr.data directory. It does not delete
DataStax Enterprise data.
distributed=true | false
Whether to distribute and apply the operation to all nodes in the local datacenter.
• False applies the operation only to the node it was sent to. False works only when recovery=true.
Default: true
Distributing a re-index to an entire datacenter degrades performance severely in that datacenter.
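A sketch of an invocation, assuming a hypothetical index ks.tbl and using the distributed option described above:

```shell
# Remove the search index on ks.tbl (placeholder names) on every node
# in the local datacenter:
$ dsetool unload_core ks.tbl distributed=true
```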
dsetool upgrade_index_files
Upgrades all DSE Search index files.
Requirements:
• The remote node that contains the encryption configuration must be running.
• The user that runs this command must have read and write permissions to the directory that contains the
index files.
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
-h IP_address
Required. Node hostname or IP address of the remote node that contains the encryption configuration
that is used for index encryption. The remote node must be running.
-c port
The DSE port on the remote node that contains the encryption configuration.
--backup
Preserves the index files from the current index as a backup after successful upgrade. The preserved
index file backup is moved to the --workspace directory. When not specified, index files from the current
index are deleted.
--workspace directory
The workspace directory for the upgrade process. The upgraded index is created in this directory. When
not specified, the default directory is the same directory that contains the search index files.
--index directory
The data directory that contains the search index files. When not specified, the default directory is
inferred from the search index name.
Examples
See Migrating encrypted tables from earlier versions and Encrypting new Search indexes.
dsetool write_resource
Uploads the specified search index config or schema.
Resource files are stored internally in the database. You can configure the maximum resource file size or disable
resource upload with the resource_upload_limit option in dse.yaml.
Supports DSE authentication with [-l username -p password].
Synopsis
keyspace_name.table_name
Required. The keyspace and table names of the search index. Keyspace and table names are case-
sensitive. Enclose names that contain uppercase in double quotation marks.
res_filename
The name of the search index resource file to upload.
file
The file path of the file to upload.
Examples
To specify the uploaded resource file and the path to the resource file:
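(The original example is elided in this extract. A sketch, assuming the resource is addressed with name= and the local file with file= — the parameter spellings and all names here are assumptions, not taken from this guide:)

```shell
# Upload a local schema file as the search index schema resource
# (ks.tbl, schema.xml, and the path are placeholders):
$ dsetool write_resource ks.tbl name=schema.xml file=/tmp/schema.xml
```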
The cassandra-stress tool also supports a YAML-based profile for defining specific schemas with various
compaction strategies, cache settings, and types. Sample files are located in the tools directory:
• cqlstress-counter-example.yaml
• cqlstress-example.yaml
• cqlstress-insanity-example.yaml
The YAML file supports user-defined keyspace, tables, and schema. The YAML file can be used to design tests
of reads, writes, and mixed workloads.
When started without a YAML file, cassandra-stress creates a keyspace, keyspace1, and tables, standard1
or counter1, depending on what type of table is being tested. These elements are automatically created the first
time you run a stress test and reused on subsequent runs. You can drop keyspace1 using DROP KEYSPACE.
You cannot change the default keyspace and table names without using a YAML file.
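Because these default elements persist and are reused between runs, you can reset them as noted above; a sketch (requires a reachable cluster):

```shell
$ cqlsh -e "DROP KEYSPACE keyspace1;"
```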
Usage:
• Package installations:
• Tarball installations:
cassandra-stress options
Command Description
counter_read Multiple concurrent reads of counters. The cluster must first be populated by a counter_write test.
mixed Interleave basic commands with configurable ratio and distribution. The cluster must first be populated by a
write test.
read Multiple concurrent reads. The cluster must first be populated by a write test.
user Interleave user provided queries with configurable ratio and distribution.
Additional sub-options are available for each option in the following table. To get more detailed information on
any of these, enter:
When entering the help command, be sure to precede the option name with a hyphen, as shown.
Cassandra-stress sub-options
Sub-option Description
-col Column details, such as size and count distribution, data generator, names, and comparator.
Usage:
Sub-option Description
-graph Graph results of cassandra-stress tests. Multiple tests can be graphed together.
Usage:
-insert Insert specific options relating to various methods for batching and splitting partition updates.
Usage:
-port Specify the port for connecting to Cassandra nodes. A port can be specified for the Cassandra native protocol, the
Thrift protocol, or a JMX port for retrieving statistics.
Usage:
where:
• throttle=N throttles operations per second across all clients to a maximum rate (or less), with no implied
schedule. Default is 0.
• fixed=N expects a fixed rate of operations per second across all clients, with an implied schedule. Default is 0.
Sub-option Description
-sendto <host>
Command Description
cl=? Set the consistency level to use during cassandra-stress. Options are ONE, QUORUM, LOCAL_QUORUM,
EACH_QUORUM, ALL, and ANY. Default is LOCAL_ONE.
err<? Specify a standard error of the mean; when this value is reached, cassandra-stress will end. Default is 0.02.
n>? Specify a minimum number of iterations to run before accepting uncertainty convergence.
n<? Specify a maximum number of iterations to run before accepting uncertainty convergence.
ops(?) Specify what operations to run and the number of each. (only with the user option)
profile=? Designate the YAML file to use with cassandra-stress. (only with the user option)
truncate=? Truncate the table created during cassandra-stress. Options are never, once, or always. Default is never.
2. The value of n used in the read phase is different from the value used in the write phase. During the write
phase, n records are written. However, in the read phase, if n is too large, it is inconvenient to read
all the records for simple testing. Generally, n does not need to be large when validating the persistent
storage systems of a cluster.
The -pop dist=UNIFORM\(1..1000000\) portion says that for the n=100,000 operations, the
keys are selected uniformly distributed between 1 and 1,000,000. Use this when you want to specify more data per
node than what fits in DRAM.
3. In the rate section, the greater-than and less-than signs are escaped. If not escaped, the shell
attempts to use them for IO redirection: the shell tries to read from a non-existent file called =256 and
to create a file called =16. The rate section tells cassandra-stress to automatically attempt different
numbers of client threads, testing no fewer than 16 and no more than 256 client threads.
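The escaping behavior in item 3 can be seen without invoking cassandra-stress at all: unescaped, the shell consumes > and < as redirection; escaped, the literal strings survive as arguments. A minimal demonstration:

```shell
# Backslashes keep the shell from treating > and < as IO redirection,
# so cassandra-stress would receive the literal thread bounds:
printf '%s %s\n' threads\>=16 threads\<=256
# prints: threads>=16 threads<=256
```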
comment='' AND
gc_grace_seconds=864000 AND
index_interval=128 AND
replicate_on_write='true' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'class': 'LZ4Compressor'};
#On Node1
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1..1000000 -log file=~/node1_load.log -node $NODES
#On Node2
$ cassandra-stress write n=1000000 cl=one -mode native cql3 -schema
keyspace="keyspace1" -pop seq=1000001..2000000 -log file=~/node2_load.log -node
$NODES
Cassandra authentication and SSL encryption must already be configured before executing
cassandra-stress with these options. The example shown above uses self-signed CA certificates.
keyspace: perftesting
keyspace_definition:
The table name and definition are created in the next section using CQL:
table: users
table_definition:
In the extra_definitions section you can add secondary indexes or materialized views to the table:
extra_definitions:
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name2 AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);
- CREATE MATERIALIZED VIEW perftesting.users_by_first_name3 AS SELECT * FROM
perftesting.users WHERE first_name IS NOT NULL and username IS NOT NULL PRIMARY KEY
(first_name, username);
The population distribution can be defined for any column in the table. Over the entire Cassandra
cluster, this section specifies a uniform distribution between 10 and 30 characters for username
values in generated rows, a uniform distribution between 20 and 40 for the clustering of generated
startdate values, and a Gaussian distribution between 100 and 500 characters for description
values.
columnspec:
- name: username
size: uniform(10..30)
- name: first_name
size: fixed(16)
- name: last_name
size: uniform(1..32)
- name: password
size: fixed(80) # sha-512
- name: email
size: uniform(16..50)
- name: startdate
cluster: uniform(20..40)
- name: description
size: gaussian(100..500)
After the column specifications, you can add specifications for how each batch runs. In the following code,
the partitions value directs the test to use the column definitions above to insert a fixed number of rows
in the partition in each batch:
insert:
partitions: fixed(10)
batchtype: UNLOGGED
The last section contains a query, read1, that can be run against the defined table.
queries:
read1:
cql: select * from users where username = ? and startdate = ?
fields: samerow # samerow or multirow (select arguments from the same row,
or randomly from all rows in the partition)
The following example shows using the user option and its parameters to run cassandra-stress tests
from cqlstress-example.yaml:
Notice that:
• The user option is required for the profile and ops parameters.
• The value for the profile parameter is the path and filename of the .yaml file.
• The values supplied for ops specifies which operations run and how many of each. These values
direct the command to insert rows into the database and run the read1 query.
How many times? Each insert or query counts as one batch, and the values in ops determine how
many of each type are run. Since the total number of batches is 1,000,000, and ops says to run three
inserts for each query, the result will be 750,000 inserts and 250,000 of the read1 query.
Use escaping backslashes when specifying the ops value.
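The 3:1 split described above works out as follows; a quick shell check of the arithmetic (1,000,000 total batches, three inserts for every read1 query):

```shell
total=1000000
# ops\(insert=3,read1=1\) means 3 of every 4 batches are inserts:
echo $(( total * 3 / 4 ))   # inserts       -> 750000
echo $(( total * 1 / 4 ))   # read1 queries -> 250000
```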
For more information, see Improved Cassandra 2.1 Stress Tool: Benchmark Any Schema – Part 1.
Results:
op rate : 46751 [WRITE:46751]
partition rate : 46751 [WRITE:46751]
row rate : 46751 [WRITE:46751]
latency mean : 4.3 [WRITE:4.3]
latency median : 1.3 [WRITE:1.3]
END
pk/s Number of partition operations per second performed during the run.
row/s Number of row operations per second performed during the run.
mean Average latency in milliseconds for each operation during that run.
med Median latency in milliseconds for each operation during that run.
.95 95% of the time the latency was less than the number displayed in the column.
.99 99% of the time the latency was less than the number displayed in the column.
.999 99.9% of the time the latency was less than the number displayed in the column.
stderr Standard error of the mean. It is a measure of confidence in the average throughput number; the smaller the
number, the more accurate the measure of the cluster's performance.
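As a rough sketch of how the stderr figure relates to per-interval op rates (the sample values below are made up, not from a real run): SEM = stddev / sqrt(n), so a smaller SEM means a steadier throughput estimate.

```shell
# Standard error of the mean for four hypothetical per-interval op rates:
out=$(printf '46751\n46500\n47002\n46811\n' | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    var  = (sumsq - n * mean * mean) / (n - 1)   # sample variance
    printf "mean=%.0f sem=%.1f", mean, sqrt(var / n)
  }')
echo "$out"
```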
fs-stress tool
Synopsis
The default IP address is the listen_address property in the cassandra.yaml file. If not using localhost, specify the
correct IP address.
fs-stress is located in the tools directory of your installation.
The default location of the tools directory depends on the type of installation:
Description
The fs-stress tool performs stress testing of the DSE File System (DSEFS) layer.
Data Description
max latency Maximum latency in milliseconds during the current reporting window.
SSTable utilities
SSTable utility tools are diagnostic tools for analyzing, using, upgrading, and changing DataStax Enterprise
SSTables.
About SSTable tools
For the following SSTable utility tools, stop DSE before running the command:
• sstabledump
• sstableexpiredblockers
• sstablelevelreset
• sstablemetadata
• sstableofflinerelevel
• sstablerepairedset
• sstablesplit
SSTable tools work offline from the DataStax Enterprise database. If you need to pass a JVM parameter,
specify it in the command line. For example, to change the max heap size:
$ MAX_HEAP=2g sstabletoolname
sstabledowngrade
Downgrades the SSTables in the given table or snapshot to the version of OSS Apache Cassandra™ that is
compatible with the current version of DSE.
The sstabledowngrade command cannot be used to downgrade system tables or downgrade DSE versions.
Synopsis
SSTable compatibility
For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and
SSTable compatibility.
Definition
The short form and long form parameters are comma-separated.
Command arguments
--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
-k, --keep-source
Do not delete the source SSTables. Do not use with the --keep-generation option.
-b, --backups
Rewrite incremental backups for the given table. May not be combined with the snapshot_name
option.
--keep-generation
Keep the SSTable generation. Do not use with the --keep-source option.
-o, --output-dir
Rewritten files are placed in output-dir/keyspace-name/table-name-and-id.
--schema
Allows upgrading and downgrading SSTables using the schema of the table in a CQL file containing
the DDL statements to re-create the schema. Must be a DDL file that allows the recreation of the table
including dropped columns. Repeat the option to specify multiple DDL schema files.
Always use the schema.cql from a snapshot of the table so that the DDL has all of the information
omitted by DESCRIBE TABLE, including dropped columns.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
-t, --throughput
Set to limit the maximum disk read rate in MB/s.
--temp-storage
When used with --schema, specifies location of temporary data. Directory and contents are deleted
when the tool terminates. Directory must not be shared with other tools and must be empty. If not
specified the default directory is /tmp.
keyspace_name
Keyspace name. Required. Overrides the client_encryption_options in cassandra.yaml.
table_name
Table name. Required.
snapshot_name
Snapshot name.
• Replaces files in the given snapshot and breaks any hard links to live SSTables.
• Required before attempting to restore a snapshot taken in a different DSE version than the one that
is currently running.
Examples
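The example itself is elided in this extract; a sketch of a typical invocation per the argument list above, keeping the source SSTables (the keyspace and table names are hypothetical):

```shell
# Downgrade the SSTables of cycling.comments, keeping the originals:
$ sstabledowngrade --keep-source cycling comments
```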
sstabledump
Dumps contents of given SSTable to standard output in JSON format.
Synopsis
$ sstabledump sstable_filepath [-d] [-e] [-k partition_key] [-l] [-t] [-x partition_key]
Definition
The short form and long form parameters are comma-separated.
Command arguments
-d
Display a CQL row per line.
-e
Display a list of partition keys.
-k, --key partition_key
Partition keys to include.
-l
Output one JSON document per partition, each on its own line (JSON Lines format).
-t
Print raw timestamps instead of ISO 8601 date strings.
-x, --exclude-key partition_key
Partition key to exclude.
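The raw timestamps printed with -t are microseconds since the Unix epoch. A minimal conversion sketch, assuming python3 is on the path; the sample value is illustrative:

```shell
# Convert a raw cell timestamp from sstabledump -t (microseconds since
# the Unix epoch) back into an ISO 8601 string.
python3 - <<'PY'
import datetime
us = 1539363480368228                      # sample raw timestamp
secs, micros = divmod(us, 1_000_000)
dt = datetime.datetime.utcfromtimestamp(secs).replace(microsecond=micros)
print(dt.isoformat() + "Z")                # 2018-10-12T16:58:00.368228Z
PY
```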
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
f4f24621ce3f11e89d32bdcab3a99c6f/aa-1-bti-Data.db
[
{
"partition" : {
"key" : [ "Claudio HEINEN" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 90,
"liveness_info" : { "tstamp" : "2018-10-12T16:58:00.368228Z" },
"cells" : [
{ "name" : "blist_", "deletion_info" : { "marked_deleted" :
"2018-10-12T16:58:00.368227Z", "local_delete_time" : "2018-10-12T16:58:00Z" } },
{ "name" : "blist_", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist_", "path" : [ "blist_age" ], "value" : "23" },
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -d
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -e
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -k "Claudio HEINEN"
[
{
"partition" : {
"key" : [ "Claudio HEINEN" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 75,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.445075Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "23" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "GERMANY" }
]
}
]
}
]
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -x "Claudio HEINEN"
[
{
"partition" : {
"key" : [ "Claudio VANDELLI" ],
"position" : 0
},
"rows" : [
{
"type" : "row",
"position" : 151,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.437559Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "54" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1961" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "ITALY" }
]
}
]
},
{
"partition" : {
"key" : [ "Luc HAGENAARS" ],
"position" : 152
},
"rows" : [
{
"type" : "row",
"position" : 231,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.448698Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "28" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1987" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "NETHERLANDS" }
]
}
]
},
{
"partition" : {
"key" : [ "Toine POELS" ],
"position" : 232
},
"rows" : [
{
"type" : "row",
"position" : 309,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.451068Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "52" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1963" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "NETHERLANDS" }
]
}
]
},
{
"partition" : {
"key" : [ "Allan DAVIS" ],
"position" : 310
},
"rows" : [
{
"type" : "row",
"position" : 383,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.430478Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "35" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1980" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "AUSTRALIA" }
]
}
]
},
{
"partition" : {
"key" : [ "Laurence BOURQUE" ],
"position" : 384
},
"rows" : [
{
"type" : "row",
"position" : 460,
"liveness_info" : { "tstamp" : "2018-03-19T22:35:57.441360Z" },
"cells" : [
{ "name" : "blist", "path" : [ "age" ], "value" : "23" },
{ "name" : "blist", "path" : [ "bday" ], "value" : "27/07/1992" },
{ "name" : "blist", "path" : [ "nation" ], "value" : "CANADA" }
]
}
]
}
]
$ sstabledump /var/lib/cassandra/data/cycling/birthday_list-
e439b9222bc511e8891b23da85222d3d/aa-2-bti-Data.db -l
{"partition":{"key":["Claudio HEINEN"],"position":0},"rows":
[{"type":"row","position":75,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.445075Z"},"cells":[{"name":"blist","path":
["age"],"value":"23"},{"name":"blist","path":["bday"],"value":"27/07/1992"},
{"name":"blist","path":["nation"],"value":"GERMANY"}]}]}
{"partition":{"key":["Claudio VANDELLI"],"position":76},"rows":
[{"type":"row","position":151,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.437559Z"},"cells":[{"name":"blist","path":
["age"],"value":"54"},{"name":"blist","path":["bday"],"value":"27/07/1961"},
{"name":"blist","path":["nation"],"value":"ITALY"}]}]}
{"partition":{"key":["Luc HAGENAARS"],"position":152},"rows":
[{"type":"row","position":231,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.448698Z"},"cells":[{"name":"blist","path":
["age"],"value":"28"},{"name":"blist","path":["bday"],"value":"27/07/1987"},
{"name":"blist","path":["nation"],"value":"NETHERLANDS"}]}]}
{"partition":{"key":["Toine POELS"],"position":232},"rows":
[{"type":"row","position":309,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.451068Z"},"cells":[{"name":"blist","path":
["age"],"value":"52"},{"name":"blist","path":["bday"],"value":"27/07/1963"},
{"name":"blist","path":["nation"],"value":"NETHERLANDS"}]}]}
{"partition":{"key":["Allan DAVIS"],"position":310},"rows":
[{"type":"row","position":383,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.430478Z"},"cells":[{"name":"blist","path":
["age"],"value":"35"},{"name":"blist","path":["bday"],"value":"27/07/1980"},
{"name":"blist","path":["nation"],"value":"AUSTRALIA"}]}]}
{"partition":{"key":["Laurence BOURQUE"],"position":384},"rows":
[{"type":"row","position":460,"liveness_info":
{"tstamp":"2018-03-19T22:35:57.441360Z"},"cells":[{"name":"blist","path":
["age"],"value":"23"},{"name":"blist","path":["bday"],"value":"27/07/1992"},
{"name":"blist","path":["nation"],"value":"CANADA"}]}]}
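Because -l emits one self-contained JSON document per partition per line, the output is easy to post-process with line-oriented tools. A sketch that extracts the partition keys from a saved dump; the file name and its contents below are trimmed, illustrative samples:

```shell
# Extract partition keys from saved JSON Lines output of sstabledump -l.
# The dump file here is a trimmed, illustrative sample.
DUMP=$(mktemp)
cat > "$DUMP" <<'EOF'
{"partition":{"key":["Claudio HEINEN"],"position":0},"rows":[]}
{"partition":{"key":["Luc HAGENAARS"],"position":152},"rows":[]}
EOF
python3 - "$DUMP" <<'PY'
import json, sys
with open(sys.argv[1]) as f:
    for line in f:
        # Each line is a complete JSON document for one partition.
        print(json.loads(line)["partition"]["key"][0])
PY
```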
sstableexpiredblockers
Outputs the SSTables that prevent another SSTable from being dropped.
By identifying the blocking SSTables, you can take corrective action so that the database can drop entire
SSTables during compaction. An SSTable is dropped during compaction when it contains only expired
tombstones and is guaranteed not to cover any data in other SSTables.
Synopsis
$ sstableexpiredblockers [--dry-run] keyspace_name table_name
Command arguments
--dry-run
Test command syntax and environment. Do not execute the command.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1
sstablelevelreset
Resets the level to zero on a given set of SSTables that use LeveledCompactionStrategy. If an SSTable is
already at level 0, no change occurs. If an SSTable is at a higher level, its metadata is rewritten to set the level to 0.
Synopsis
$ sstablelevelreset --really-reset keyspace_name table_name
Command arguments
keyspace_name
Keyspace name. Required.
--really-reset
Acknowledges the potential impact of the command and confirms that DSE is stopped.
table_name
Table name. Required.
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1
Skipped /var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/
aa-2-bti-Data.db since it is already on level 0
Skipped /var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/
aa-3-bti-Data.db since it is already on level 0
sstableloader
Streams a set of SSTable data files from the sstable_directory to a live cluster. The target keyspace and
table are taken from the parent directories of the sstable_directory.
For example, to load an SSTable named Standard1-g-1-Data.db into Keyspace1/Standard1, have the files
Standard1-g-1-Data.db and Standard1-g-1-Index.db in directory /path/to/Keyspace1/Standard1/.
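The staging step can be sketched as follows. All paths and host addresses are illustrative, and the final sstableloader call is commented out because it requires a live cluster:

```shell
# Stage SSTable files so that the parent directories name the target
# keyspace (Keyspace1) and table (Standard1). Paths are illustrative.
STAGE=$(mktemp -d)
mkdir -p "$STAGE/Keyspace1/Standard1"
touch "$STAGE/Keyspace1/Standard1/Standard1-g-1-Data.db" \
      "$STAGE/Keyspace1/Standard1/Standard1-g-1-Index.db"
ls "$STAGE/Keyspace1/Standard1"
# Stream the staged files to a live cluster (hypothetical host):
# sstableloader -d 10.200.177.92 "$STAGE/Keyspace1/Standard1"
```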
Synopsis
$ sstableloader [options] sstable_directory
Definition
The short form and long form parameters are comma-separated.
Command arguments
-alg,--ssl-alg algorithm
Client SSL algorithm. Default: SunX509.
-ap,--auth-provider authentication_provider
Custom AuthProvider class name. Can be combined with -u username and -pw password if the
AuthProvider supports plain text credentials.
-ciphers, --ssl-ciphers cipher-suite
Comma-separated list of encryption suites for Client SSL.
-cph,--connections-per-host num_connections_per_host
Number of concurrent connections per host.
-d, --nodes initial_host
Required. Comma-separated list of hosts to connect to initially for ring information.
-df, --dse-conf-path dse_yaml_path
The dse.yaml filepath.
-f, --conf-path cassandra_yaml_path
The filepath to a cassandra.yaml config file, used to override only the following options that were set in
the cassandra.yaml file read at startup:
• stream_throughput_outbound_megabits_per_sec
• server_encryption_options
• client_encryption_options
-h, --help
Display the usage and listing of the commands.
-i, --ignore node
Comma-separated list of nodes to ignore.
-idct, --inter-dc-throttle throttle_speed
Inter-datacenter throttle speed in Mbits. Default: unlimited.
-ks,--keystore keystore_path
Filepath to keystore for SSL client-to-node encryption. Overrides the client_encryption_options in
cassandra.yaml.
-kspw,--keystore-password keystore_password
Client SSL keystore password. Overrides the client_encryption_options in cassandra.yaml.
--no-progress
Do not display progress.
sstablemetadata
Prints metadata about the given SSTable or SSTables to standard output, including the SSTable name,
partitioner, tombstone details, compressor, TTL, tokens, min and max clustering values, SSTable level,
partition size and statistics, and column information.
Synopsis
$ sstablemetadata [-c] [-g gc_grace_seconds] [-s] [-t time_unit] [-u] sstable_filepath ...
Command arguments
-c, --colors
Use ANSI color sequences in the output.
-g, --gc_grace_seconds seconds
The grace period, in seconds, to use when calculating droppable tombstones.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
-s, --scan
Full SSTable scan for additional details. Default: false.
-t, --timestamp_unit time_unit
Time unit that cell timestamps are written with.
-u, --unicode
Use Unicode to draw histograms and progress bars.
Examples
These examples are generated using the cycling keyspace. See Setting up the Cycling keyspace.
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1
$ sstablemetadata /var/lib/cassandra/data/cycling/birthday_list-
f4f24621ce3f11e89d32bdcab3a99c6f/aa-1-bti-Data.db
SSTable: /var/lib/cassandra/data/cycling/birthday_list-f4f24621ce3f11e89d32bdcab3a99c6f/
aa-1-bti
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1539363480354442 (10/12/2018 16:58:00)
Maximum timestamp: 1539363480374846 (10/12/2018 16:58:00)
SSTable min local deletion time: 1539363480 (10/12/2018 16:58:00)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.6884057971014492
TTL min: 0
TTL max: 0
First token: -5189327806405140569 (Claudio HEINEN)
Last token: -428849430723689847 (Luc HAGENAARS)
minClusteringValues: []
maxClusteringValues: []
Estimated droppable tombstones: 0.3333333333333333
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1539277782404,
position=18441844)=CommitLogPosition(segmentId=1539277782404, position=18480562)}
totalColumnsSet: 3
totalRows: 3
Estimated tombstone drop times:
Drop Time | Count (%) Histogram
1539363480 (10/12/2018 16:58:00) | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 1663415872 (09/17/2022 11:57:52)
75th 1663415872 (09/17/2022 11:57:52)
95th 1663415872 (09/17/2022 11:57:52)
98th 1663415872 (09/17/2022 11:57:52)
99th 1663415872 (09/17/2022 11:57:52)
Min 1386179894 (12/04/2013 17:58:14)
Max 1663415872 (09/17/2022 11:57:52)
Partition Size:
Size (bytes) | Count (%) Histogram
103 (103 B) | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 103 (103 B)
75th 103 (103 B)
95th 103 (103 B)
98th 103 (103 B)
99th 103 (103 B)
Min 87 (87 B)
Max 103 (103 B)
Column Count:
Columns | Count (%) Histogram
3 | 3 (100) OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Percentiles
50th 3
75th 3
95th 3
98th 3
99th 3
Min 3
Max 3
Estimated cardinality: 3
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1539363480 (10/12/2018 16:58:00)
EncodingStats minTimestamp: 1539363480354442 (10/12/2018 16:58:00)
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes: []
StaticColumns:
RegularColumns:
blist_:org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.
$ sstablemetadata /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db -u
SSTable: /var/lib/cassandra/data/cycling/cyclist_category-
e1f76e21ce4311e8949e33016bf887c0/aa-1-bti
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1539365167498813 (10/12/2018 17:26:07)
Maximum timestamp: 1539365167524231 (10/12/2018 17:26:07)
SSTable min local deletion time: 2147483647 (no tombstones)
SSTable max local deletion time: 2147483647 (no tombstones)
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0761904761904761
TTL min: 0
TTL max: 0
First token: -798238132730727330 (One-day-races)
Last token: -798238132730727330 (One-day-races)
minClusteringValues: [367]
maxClusteringValues: [198]
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0
Pending repair: --
Replay positions covered: {CommitLogPosition(segmentId=1539277782404,
position=19530606)=CommitLogPosition(segmentId=1539277782404, position=19541152)}
totalColumnsSet: 4
totalRows: 2
Estimated tombstone drop times:
Drop Time | Count (%) Histogram
Percentiles
50th 0
75th 0
95th 0
98th 0
99th 0
Min 0
Max 0
Partition Size:
Size (bytes) | Count (%) Histogram
124 (124 B) | 1 (100) ##############################
Percentiles
50th 124 (124 B)
75th 124 (124 B)
95th 124 (124 B)
98th 124 (124 B)
99th 124 (124 B)
Min 104 (104 B)
Max 124 (124 B)
Column Count:
Columns | Count (%) Histogram
4 | 1 (100) ##############################
Percentiles
50th 4
75th 4
95th 4
98th 4
99th 4
Min 4
Max 4
Estimated cardinality: 1
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1442880000 (09/22/2015 00:00:00)
EncodingStats minTimestamp: 1539365167498813 (10/12/2018 17:26:07)
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes:
[org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.Int32Type)]
StaticColumns:
RegularColumns: id:org.apache.cassandra.db.marshal.UUIDType,
lastname:org.apache.cassandra.db.marshal.UTF8Type
sstableofflinerelevel
Creates a reasonable leveling for the given keyspace and table.
Synopsis
$ sstableofflinerelevel [--dry-run] keyspace_name table_name
Definition
The short form and long form parameters are comma-separated.
Command arguments
--dry-run
Test command syntax and environment. Do not execute the command.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
sstablepartitions
Identifies large partitions in SSTables and outputs the partition size in bytes, row count, cell count, and
tombstone count.
Synopsis
$ sstablepartitions [-b] [-c cell_threshold] [-k partition_key] [-m] [-o tombstone_threshold] [-r]
[-t partition_threshold] [-u timestamp] [-x partition_key] [-y] (sstable_directory | sstable_filepath) ...
Command arguments
-b, --backups
Include backups in the data directories (recursive scans).
-c, --min-cells cell_threshold
Partition cell count threshold.
-k, --key partition_key
Partition keys to include.
-m, --csv
Produce CSV machine-readable output instead of JSON formatted output.
-o, --min-tombstones tombstone_threshold
Partition tombstone count threshold.
-r, --recursive
Scan the given data directories recursively.
sstable_directory
The absolute path to the SSTable data directory. The data_file_directories property in cassandra.yaml
defines the default directory.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
-t, --min-size partition_threshold
Partition size threshold in bytes.
-u, --current-timestamp timestamp
The timestamp (seconds since epoch) to use as the current time when calculating TTL
expiration.
-x, --exclude-key partition_key
Partition key to exclude. Ignored if -y option is given.
-y, --partitions-only
Only brief partition information. Exclude per-partition detailed row/cell/tombstone information from
process and output.
Examples
$ sstablepartitions -r /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/
p90 149 1 1 1
p95 149 1 1 1
p99 149 1 1 1
p999 179 2 2 1
min 51 0 0 0
max 446 10 10 1
count 2169
time 3626
Output only partitions with a cell count equal to or greater than 10:
$ sstablepartitions -c 10 /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db
$ sstablepartitions -c 10 -m /var/lib/cassandra/data/stresscql/
blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-Data.db
key,keyBinary,live,offset,size,rowCount,cellCount,tombstoneCount,rowTombstoneCount,rangeTombstoneCount,complex
"Fwl
Cc xD06iw_]Q|[t[KzCI&
$",46776c0b4363097815114430361169775f7f5d511b3b08177c5b745b4b1306007a434926091a24,true,208502,434,10,10,0,0,0
home/dimitarndimitrov/.ccm/c13529-master/node1/data0/
stresscql/blogposts-7dd6dfc289b511e8a4a329556a9391cc/aa-4-bti-
Data.db,stresscql,blogposts,,,,4,bti,aa
sstablerepairedset
Sets status as repaired or unrepaired on a given set of SSTables and updates the repairedAt field to denote
the time of the repair. This metadata facilitates incremental repairs. Use this tool in the process of migrating an
installation to incremental repair.
Use the following command to list all the *Data.db files in a keyspace:
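A case-insensitive find over the keyspace's data directory produces such a list. The sketch below uses a temporary stand-in for /var/lib/cassandra/data/keyspace_name so that it is self-contained; adjust the path for a real node:

```shell
# Build a list of *Data.db files for use with sstablerepairedset -f.
# DATA_DIR stands in for /var/lib/cassandra/data/keyspace_name.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/cyclist_by_age-8201305065ff11e5a4c58b496c707234"
touch "$DATA_DIR/cyclist_by_age-8201305065ff11e5a4c58b496c707234/ma-1-big-Data.db"
find "$DATA_DIR" -iname '*Data.db' > "$DATA_DIR/repairSetSSTables.txt"
cat "$DATA_DIR/repairSetSSTables.txt"
```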
Synopsis
$ sstablerepairedset [--really-set] (--is-repaired | --is-unrepaired) (-f sstable_list_file | sstable_filepath ...)
Definition
The short form and long form parameters are comma-separated.
Command arguments
-f sstable_list_file
The filepath to a file that contains a list of SSTables. For example, a *.txt file.
--is-repaired
Sets repaired status.
--is-unrepaired
Sets unrepaired status.
--really-set
Acknowledgement of potential command impact with DSE stopped.
sstable_filepath
The explicit or relative filepath to the SSTable data file ending in Data.db.
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID
Rack
UN 10.200.177.92 265.04 KiB 1 ? 980cab6a-2e5d-44c6-b897-0733dde580ac
rack1
DN 10.200.177.94 426.21 KiB 1 ? 7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3
rack1
where the repairSetSSTables.txt file contains a list of SSTable (*Data.db) files, like:
/data/cycling/cyclist_by_country-82246fc065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_birthday-8248246065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_birthday-8248246065ff11e5a4c58b496c707234/ma-2-big-Data.db
/data/cycling/cyclist_by_age-8201305065ff11e5a4c58b496c707234/ma-1-big-Data.db
/data/cycling/cyclist_by_age-8201305065ff11e5a4c58b496c707234/ma-2-big-Data.db
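A list file like repairSetSSTables.txt can also be generated instead of written by hand. A minimal sketch, assuming a standard data directory layout; the helper name is illustrative, not part of DSE:

```shell
# Hypothetical helper: collect every *-Data.db file under a data directory
# into a list file suitable for the -f option.
make_sstable_list() {
  data_dir="$1"
  out="$2"
  # Each matching SSTable data file becomes one line in the list file.
  find "$data_dir" -type f -name '*-Data.db' | sort > "$out"
}
```

For example, `make_sstable_list /data/cycling repairSetSSTables.txt` writes one Data.db path per line, matching the format shown above.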
sstablescrub
Scrubs the SSTable for the provided table.
The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while
preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub
cannot. If an SSTable cannot be read due to corruption, it will be left on disk.
If scrubbing results in dropping rows, new SSTables become unrepaired. However, if no bad rows are detected,
the SSTable keeps its original repairedAt field, which denotes the time of the repair.
Synopsis
$ sstablescrub [--debug] [-e arg] [-h] [-j arg] [-m] [-n] [-r] [-s] [-v] keyspace_name
table_name [--sstable-files arg]
Definition
The short form and long form parameters are comma-separated.
Command arguments
--debug
Display stack traces.
-e, --header-fix argument
Check SSTable serialization-headers and repair issues. Takes the following arguments:
validate-only
Validate serialization-headers only. Do not attempt any repairs and do not continue with the
scrub once the validation is complete.
validate
Validate serialization-headers and continue with the scrub once the validation is complete.
(Default)
fix-only
Validate and repair only the serialization-headers. Do not continue with the scrub once
serialization-header validation and repairs are complete.
fix
Validate and repair serialization-headers and perform a normal scrub. Do not repair and do not
continue with the scrub if serialization-header validation encounters errors.
off
Do not perform serialization-header validation checks.
-h, --help
Display the usage and listing of the commands.
-j, --jobs
Number of SSTables to scrub simultaneously. Defaults to the smaller of the number of
available processors and 8.
keyspace_name
Keyspace name. Required.
-m, --manifest-check
Check and repair only the leveled manifest. Do not scrub the SSTables.
-n, --no-validate
Do not validate columns using column validator.
-r, --reinsert-overflowed-ttl
Rewrite rows with an overflowed expiration date affected by CASSANDRA-14092, using the maximum
supported expiration date of 2038-01-19T03:14:06+00:00. Rows are rewritten with the original
timestamp incremented by one millisecond to override/supersede any potential tombstone that might
have been generated during compaction of the affected rows. See
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html.
-s, --skip-corrupted
Skips corrupt rows in counter tables.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
table_name
Table name. Required.
-v, --verbose
Verbose output.
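The default for the -j option above (the smaller of the available processor count and 8) can be sketched as:

```shell
# Sketch of the documented -j default: min(available processors, 8).
default_scrub_jobs() {
  cpus=$(getconf _NPROCESSORS_ONLN)
  if [ "$cpus" -lt 8 ]; then
    echo "$cpus"
  else
    echo 8
  fi
}
```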
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns  Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1       ?     980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1       ?     7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
sstablesplit
Splits SSTable files into multiple SSTables of a maximum designated size.
For example, if SizeTieredCompactionStrategy was used for a major compaction and produced an excessively
large SSTable, split that SSTable so that normal compaction can occur before the next major compaction.
Synopsis
$ sstablesplit [--debug] [-h] [--no-snapshot] [-s max_size_in_MB] sstable_filepath
Definition
The short form and long form parameters are comma-separated.
Command arguments
--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
--no-snapshot
Do not snapshot SSTables before splitting.
-s, --size max_size_in_MB
Maximum size in MB for output SSTables. Default: 50.
sstable_filepath
Filepath to an SSTable.
Examples
$ nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns  Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1       ?     980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1       ?     7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
Split SSTables to 10 MB
$ sstablesplit -s 10 /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db
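As a rough sketch of what the -s option controls: the number of output SSTables is approximately the input size divided by the maximum size, rounded up. This is an approximation for planning, not the tool's exact algorithm:

```shell
# Approximate output count for a split: ceil(input_size_mb / max_size_mb).
split_count() {
  size_mb="$1"
  max_mb="$2"
  echo $(( (size_mb + max_mb - 1) / max_mb ))
}
```

For the 10 MB example above, a 95 MB SSTable would yield about 10 output files.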
sstableupgrade
Upgrades the SSTables in the given table or snapshot to the current version of Cassandra.
Synopsis
$ sstableupgrade [options] keyspace_name table_name [snapshot_name]
SSTable compatibility
For details on SSTable versions and compatibility, see DataStax Enterprise, Apache Cassandra, CQL, and
SSTable compatibility.
Definition
The short form and long form parameters are comma-separated.
Command arguments
--debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
-k, --keep-source
Do not delete the source SSTables. Do not use with the --keep-generation option.
-b, --backups
Rewrite incremental backups for the given table. May not be combined with the snapshot_name
option.
--keep-generation
Keep the SSTable generation. Do not use with the --keep-source option.
-o, --output-dir
Rewritten files are placed in output-dir/keyspace-name/table-name-and-id.
--schema
Allows upgrading and downgrading SSTables using the schema of the table in a CQL file containing
the DDL statements to re-create the schema. Must be a DDL file that allows the recreation of the table
including dropped columns. Repeat the option to specify multiple DDL schema files.
Always use the schema.cql from a snapshot of the table so that the DDL has all of the information
omitted by DESCRIBE TABLE, including dropped columns.
--sstable-files
Instead of processing all SSTables in the default data directories, process only the tables specified
via this option. If a single SSTable file, only that SSTable is processed. If a directory is specified, all
SSTables within that directory are processed. Snapshots and backups are not supported with this
option.
-t, --throughput
Set to limit the maximum disk read rate in MB/s.
--temp-storage
When used with --schema, specifies the location of temporary data. The directory and its contents
are deleted when the tool terminates. The directory must not be shared with other tools and must be
empty. If not specified, the default directory is /tmp.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
snapshot_name
Snapshot name.
• Replaces files in the given snapshot and breaks any hard links to live SSTables.
• Required before attempting to restore a snapshot taken in a different DSE version than the one that
is currently running.
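The schema.cql recommended above ships inside each snapshot directory. A sketch for locating those files for use with --schema, assuming the standard .../table/snapshots/snapshot_name/schema.cql layout; the helper name is illustrative:

```shell
# Hypothetical helper: list every schema.cql stored with a snapshot
# under a data directory, for use with the --schema option.
find_snapshot_schemas() {
  data_dir="$1"
  find "$data_dir" -type f -path '*/snapshots/*/schema.cql'
}
```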
Examples
The SSTables are already on the current version, so the command returns immediately and no action is taken.
sstableutil
Lists SSTable files for the provided table.
Synopsis
$ sstableutil [-c] [-d] [-h] [-o] [-t type] [-v] keyspace_name table_name
Definition
The short form and long form parameters are comma-separated.
Command arguments
-c, --cleanup
Clean up any outstanding transactions.
-d, --debug
Display stack traces.
-h, --help
Display the usage and listing of the commands.
keyspace_name
Keyspace name. Required.
-o, --oplog
Include operation logs.
table_name
Table name. Required.
-t, --type type
Type of files to list: all (all files), tmp (temporary files only), or final (final files only).
-v, --verbose
Verbose output.
Examples
$ sstableutil cycling comments
Listing files...
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-CompressionInfo.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Data.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Digest.crc32
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Filter.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Partitions.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Rows.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-Statistics.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-1-bti-TOC.txt
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-CompressionInfo.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Data.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Digest.crc32
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Filter.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Partitions.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Rows.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-Statistics.db
/var/lib/cassandra/data/cycling/comments-eae06ce2ce4211e8949e33016bf887c0/aa-2-bti-TOC.txt
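Each generation in the listing above carries the same set of components (Data.db, Statistics.db, and so on). A sketch that extracts the distinct generation numbers from saved sstableutil output, assuming the aa-generation-bti naming shown above; the helper is illustrative:

```shell
# Pull the distinct SSTable generation numbers out of an sstableutil listing.
list_generations() {
  # The generation is the number between the prefix and the "-bti-" marker.
  sed -n 's/.*-\([0-9][0-9]*\)-bti-.*/\1/p' "$1" | sort -un
}
```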
sstableverify
Verifies the SSTable for the given table.
Synopsis
$ sstableverify [--debug] [-e] [-h] [-v] keyspace_name table_name
Definition
The short form and long form parameters are comma-separated.
Command arguments
--debug
Display stack traces.
-e, --extended
Extended verification.
-h, --help
Display the usage and listing of the commands.
keyspace_name
Keyspace name. Required.
table_name
Table name. Required.
-v, --verbose
Verbose output.
Examples
$ sstableverify -v cycling cyclist_name
Verifying TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db') (0.151KiB)
Deserializing sstable metadata for TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db')
Checking computed hash of TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-2-bti-Data.db')
Verifying TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db') (0.131KiB)
Deserializing sstable metadata for TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db')
Checking computed hash of TrieIndexSSTableReader(path='/var/lib/cassandra/data/cycling/cyclist_name-4157ef22ce4411e8949e33016bf887c0/aa-3-bti-Data.db')
DataStax tools
Tools that are installed separately and used across products. See DataStax tools.
Usage
Run the preflight check without options to run all tests.
--disk-duration=DISK_DURATION
Time (in seconds) for each disk benchmark test. Set to simulate a normal load.
--disk-threads=DISK_THREADS
Number of threads for each disk benchmark. Set to simulate a normal load.
$ cd /checks
$ touch my_test.py
$ pip install pyyaml && pip install termcolor ## Optional. Install for colored output.
The Missing Settings section of the report lists both missing and deprecated settings.
The nodelist parameter is optional since the script checks for the list of IP addresses contained in
nodetool status. The format for the nodelist file is one address per line.
Chapter 9. Operations
4. SearchAnalytics nodes.
5. Remaining nodes one at a time. See Initializing multiple datacenters per workload type.
Command                      Description
SPARK_ENABLED=1              Starts the node as a Spark node and starts the Spark Master service.
BYOS (Bring Your Own Spark)  Set BYOS nodes as transactional nodes: All_NODE_TYPES=0 or not present.
                             BYOS Spark nodes run in a separate Spark cluster from a vendor other than DataStax.
2. Set the node type in the /etc/default/dse file. For example, to set a Spark node:
SPARK_ENABLED=1
SOLR_ENABLED=0
GRAPH_ENABLED=0
Alternatively, you can omit the other startup entries and use only SPARK_ENABLED=1.
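A sketch of step 2 as a script, flipping the entries with sed. The assumption that the entries already exist in the file, and taking the file path as an argument rather than editing /etc/default/dse directly, are illustrative choices:

```shell
# Enable Spark and disable Search/Graph in an /etc/default/dse-style file.
set_spark_node() {
  conf="$1"
  # .bak keeps a backup copy of the original file next to it.
  sed -i.bak \
    -e 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' \
    -e 's/^SOLR_ENABLED=.*/SOLR_ENABLED=0/' \
    -e 's/^GRAPH_ENABLED=.*/GRAPH_ENABLED=0/' "$conf"
}
```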
If the following error appears, look for DataStax Enterprise times out when starting and other articles in
the Support Knowledge Center.
$ nodetool status
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
The nodetool command shows the node type and status. A transactional node running in a
normal state (UN) with virtual nodes (vnodes) enabled shows:
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  82.43 KB  128     ?     40725dc8-7843-43ae-9c98-7c532b1f517e  rack1
For example, a running node in a normal state (UN) with DSE Analytics without vnodes enabled shows:
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns  Host ID                               Token                Rack
UN  172.16.222.136  103.24 KB  ?     3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707  rack1
4. SearchAnalytics nodes.
5. Remaining nodes one at a time. See Initializing multiple datacenters per workload type.
When multiple flags are used, list them separately on the command line. For example, ensure there is a space
between -k and -s in dse cassandra -k -s.
Spark Analytics, DSE Graph, and DSE Search node: bin/dse cassandra -k -g -s
2. From the install directory, start the node. For example, to set a Spark node:
bin/dse cassandra -k
where the installation directory is the directory where you installed DSE.
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
The nodetool command shows the node type and status. A transactional node running in a
normal state (UN) with virtual nodes (vnodes) enabled shows:
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  82.43 KB  128     ?     40725dc8-7843-43ae-9c98-7c532b1f517e  rack1
For example, a running node in a normal state (UN) with DSE Analytics without vnodes enabled shows:
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns  Host ID                               Token                Rack
UN  172.16.222.136  103.24 KB  ?     3c1d0657-0990-4f78-a3c0-3e0c37fc3a06  1647352612226902707  rack1
$ nodetool drain
Running nodetool drain before using the cassandra-stop command to stop a stand-alone process is not
necessary because the cassandra-stop command drains the node before stopping it.
$ bin/dse cassandra-stop
In the unlikely event that the cassandra-stop command fails because it cannot find the DataStax
Enterprise Java process ID (PID), the output instructs you to find the PID manually and stop the
process using its PID number.
Use the PID, in the second column of the output, to stop the database.
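As the text notes, the PID is in the second column of the process listing. A sketch of extracting it; identifying the DSE Java process by the com.datastax.bdp.DseModule class name is an assumption for illustration, not something the guide states:

```shell
# Read ps -ef style output on stdin and print the PID (second column)
# of the first line that looks like the DSE Java process.
dse_pid_from_ps() {
  awk '/com\.datastax\.bdp\.DseModule/ { print $2; exit }'
}
# e.g. ps -ef | dse_pid_from_ps, then stop the process with: kill <pid>
```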
• Rebalancing the nodes within a datacenter is no longer necessary because a node joining the datacenter
assumes responsibility for an even portion of the data.
For a detailed explanation about how vnodes work, see Virtual nodes.
When adding multiple nodes to the cluster using the allocation algorithm, ensure that nodes are added one at
a time. If nodes are added concurrently, the algorithm can assign the same tokens to different nodes.
Be sure to use the same version of DataStax Enterprise on all nodes in the cluster, as described in the
installation instructions.
1. Install DataStax Enterprise on the new nodes, but do not start DataStax Enterprise.
If your DataStax Enterprise installation started automatically, you must stop the node and clear the
data.
2. Copy the snitch properties file from another node in the same datacenter to the node you are adding.
• Dynamically allocating tokens based on the keyspace replication factors in the datacenter:
auto_bootstrap: true
cluster_name: 'cluster_name'
listen_address:
endpoint_snitch: snitch_name
num_tokens: 8
allocate_tokens_for_local_replication_factor: RF_number
seed_provider:
- class_name: seedprovider_name
parameters:
- seeds: "IP_address_list"
For RF_number: if the keyspaces in the datacenter have different replication factors (RF), use the
RF of the most data-intensive keyspace; when multiple keyspaces with equal data intensity exist,
use the highest RF. When adding multiple nodes, alternate between the different RFs.
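The RF choice described above amounts to taking the highest replication factor among the candidate keyspaces. A trivial sketch, with the RFs supplied as arguments:

```shell
# Pick the value for allocate_tokens_for_local_replication_factor:
# the highest RF among the keyspaces passed as arguments.
pick_rf() {
  printf '%s\n' "$@" | sort -n | tail -1
}
```

For example, `pick_rf 3 2 5` prints 5.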
auto_bootstrap: true
cluster_name: 'cluster_name'
listen_address:
endpoint_snitch: snitch_name
num_tokens: 128
seed_provider:
- class_name: seedprovider_name
parameters:
- seeds: "IP_address_list"
Manually add the auto_bootstrap setting if it does not exist in cassandra.yaml. The other settings
exist in the default cassandra.yaml file; ensure that you uncomment and set them.
Seed nodes cannot bootstrap. Make sure the new node is not listed in the -seeds list. Do not make
all nodes seed nodes. See Internode communications (gossip).
4. Change any other non-default settings you have made to your existing cluster in the cassandra.yaml file
and cassandra-topology.properties or cassandra-rackdc.properties files. Use the diff command to
find and merge any differences between existing and new nodes.
5. Start the bootstrap node, see Starting DataStax Enterprise as a service or Starting DataStax Enterprise as a
stand-alone process.
6. Verify that the node is fully bootstrapped using nodetool status. All other nodes must be up (UN) and not in
any other state.
7. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove
the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running
nodetool cleanup on the next node.
Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.
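The one-node-at-a-time requirement in step 7 can be scripted: because nodetool cleanup blocks until it finishes, a plain serial loop is enough. A sketch; the host names and the NODETOOL override are illustrative:

```shell
# Run "nodetool cleanup" on each previously existing node in turn,
# waiting for each invocation to complete before starting the next.
cleanup_serially() {
  for host in "$@"; do
    "${NODETOOL:-nodetool}" -h "$host" cleanup
  done
}
# e.g. cleanup_serially node1.example.com node2.example.com
```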
If the new datacenter uses existing nodes from another datacenter or cluster, complete the following steps to
ensure that old data will not interfere with the new cluster:
1. If the nodes are behind a firewall, open the required ports for internal/external communication.
3. Clear the data from DataStax Enterprise (DSE) to completely remove application directories.
1. Complete the following steps to prevent client applications from prematurely connecting to the new
datacenter, and to ensure that the consistency level for reads or writes does not query the new datacenter:
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
b. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter,
which might not have any data.
2. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.
a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.
b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.
DESCRIBE SCHEMA;
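Step a can be scripted per keyspace. A sketch that builds the ALTER KEYSPACE statement; the keyspace, datacenter, and RF values are placeholders, and the result is meant to be piped to cqlsh:

```shell
# Build an ALTER KEYSPACE statement switching a keyspace to
# NetworkTopologyStrategy with the given datacenter name and RF.
alter_ks_stmt() {
  printf "ALTER KEYSPACE %s WITH replication = {'class': 'NetworkTopologyStrategy', '%s': %s};" \
    "$1" "$2" "$3"
}
# e.g. alter_ks_stmt my_keyspace DC1 3 | cqlsh
```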
3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.
4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.
Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.
• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.
• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.
• endpoint_snitch: snitch
See endpoint_snitch and snitches.
Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.
• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.
b. Configure node architecture (all nodes in the datacenter must use the same type):
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings
• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.
After making any changes in the configuration files, you must restart the node for the changes to
take effect.
a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.
b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.
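A sketch of step a, appending the new datacenter's seed addresses to the -seeds entry in a cassandra.yaml-style file. The addresses, the helper, and the exact quoting format it expects are illustrative:

```shell
# Append extra seed addresses to an existing - seeds: "..." line.
add_seeds() {
  conf="$1"
  extra="$2"
  # .bak keeps a backup of the original file; \1 preserves the current list.
  sed -i.bak "s/- seeds: \"\(.*\)\"/- seeds: \"\1,$extra\"/" "$conf"
}
```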
7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:
8. Rotate starting DSE through the racks until all the nodes are up.
9. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This
step replicates the data to the new datacenter in the cluster.
You must specify an existing datacenter in the command line, or the new nodes will appear to rebuild
successfully, but might not contain all anticipated data.
Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing
datacenters are not completely in-sync.
a. Use nodetool rebuild on one or more nodes at the same time. Run on one node at a time to
reduce the impact on the existing cluster.
b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.
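Combining the guidance above, step 10 run one node at a time can be sketched as a loop; the host names, the source datacenter, and the NODETOOL override are placeholders:

```shell
# Run "nodetool rebuild -- <source-dc>" on each new-datacenter node in turn,
# always naming an existing datacenter to rebuild from.
rebuild_new_dc() {
  src_dc="$1"
  shift
  for host in "$@"; do
    "${NODETOOL:-nodetool}" -h "$host" rebuild -- "$src_dc"
  done
}
# e.g. rebuild_new_dc DC1 newdc-node1 newdc-node2
```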
$ dsetool status
If DSE has problems starting, look for starting DSE troubleshooting and other articles in the Support
Knowledge Center.
The datacenters in the cluster are now replicating with each other.
DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID             Tokens       Rack
UN  110.54.125.2  28.44 KB  13.0%  e2451cdf-f070- ...  -922337....  RAC1
UN  110.82.155.2  44.47 KB  16.7%  f9fa427c-a2c5- ...  30745512...  RAC1
DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID             Tokens       Rack
UN  110.54.125.3  15.44 KB  50.2%  e2451cdf-f070- ...  9243578....  RAC1
UN  110.82.155.4  18.78 KB  49.8%  e2451cdf-f070- ...  10000        RAC1
1. Configure every keyspace using SimpleStrategy to use the NetworkTopologyStrategy replication strategy,
including (but not restricted to) the following keyspaces.
If SimpleStrategy was used previously, this step is required to configure NetworkTopologyStrategy.
a. Use ALTER KEYSPACE to change the keyspace replication strategy to NetworkTopologyStrategy for
the following keyspaces.
b. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any
existing keyspaces use the NetworkTopologyStrategy replication strategy.
DESCRIBE SCHEMA;
2. Stop the OpsCenter Repair Service if it is running in the cluster. See Turning the Repair Service off.
3. In the new datacenter, install DSE on each new node. Do not start the service or restart the node.
4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in
the cluster.
Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml
configuration files.
• auto_bootstrap: true
This setting has been removed from the default configuration, but, if present, should be set
to true.
• listen_address: empty
If not set, DSE asks the system for the local address, which is associated with its host name.
In some cases, DSE does not produce the correct address, which requires specifying the
listen_address.
• endpoint_snitch: snitch
See endpoint_snitch and snitches.
Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-
datacenter deployments (or single-zone deployments in public clouds), and does not
recognize datacenter or rack information.
• If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade
Guide for removed settings.
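For example, when using the GossipingPropertyFileSnitch, each new node declares its own placement in cassandra-rackdc.properties (the datacenter and rack names here are illustrative):

```
dc=DC2
rack=RAC1
```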
b. Configure node architecture (all nodes in the datacenter must use the same type):
Virtual node (vnode) allocation algorithm settings
DataStax recommends not using vnodes with DSE Search. However, if you decide
to use vnodes with DSE Search, do not use more than 8 vnodes and ensure that the
allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured
for your environment.
For more information, refer to Virtual node (vnode) configuration.
Single-token architecture settings
• Generate the initial token for each node and set this value for the initial_token property.
See Adding or replacing single-token nodes for more information.
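For illustration, evenly spaced Murmur3Partitioner tokens can be computed as follows. This is a minimal sketch of the same calculation the Token Generating Tool performs, assuming the default Murmur3Partitioner and a single datacenter:

```python
def generate_tokens(node_count: int) -> list[int]:
    """Evenly space initial_token values across the Murmur3Partitioner
    token range, which spans -2**63 .. 2**63 - 1."""
    range_size = 2**64  # total number of Murmur3 tokens
    return [(i * range_size // node_count) - 2**63 for i in range(node_count)]

# Tokens for a 4-node single-token datacenter:
print(generate_tokens(4))
# [-9223372036854775808, -4611686018427387904, 0, 4611686018427387904]
```

Assign each value to one node's initial_token property in cassandra.yaml.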
After making any changes in the configuration files, you must restart the node for the changes to
take effect.
a. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the
seed nodes in the new datacenter.
b. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in
the cluster. If changing snitches, see Switching snitches.
7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a
time, and then start the rest of the nodes:
8. Install and configure DataStax Agents on each node in the new datacenter if necessary: Installing DataStax
Agents 6.5
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.11 474.23 KiB ? 7297d21e-a04e-4bb1-91d9-8149b03fb60a -9223372036854775808 rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.113 518.36 KiB ? 2ff7d46c-f084-477e-aa53-0f4791c71dbc -9223372036854775798 rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.111 961.56 KiB ? ac43e602-ef09-4d0d-a455-3311f444198c -9223372036854775788 rack1
Datacenter: DC4
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.114 361.56 KiB ? ac43e602-ef09-4d0d-a455-3322f444198c -9223372036854775688 rack1
10. After all nodes are running in the cluster and the client applications are datacenter aware, use cqlsh to alter
the keyspaces to add the desired replication in the new datacenter.
If client applications, including DSE Search and DSE Analytics, are not properly configured, they
might connect to the new datacenter before it is online. Incorrect configuration results in connection
exceptions, timeouts, and/or inconsistent data.
11. Run nodetool rebuild on each node in the new datacenter, specifying the corresponding datacenter/rack
from the source datacenter.
The following commands replicate data from an existing datacenter DC1 to the new datacenter DC2 on
each DC2 node. The rack specifications correspond with the rack specifications in DC1:
a. Use nodetool rebuild -dc on one node at a time to reduce the impact on the source
datacenter.
b. Alternatively, run the command on multiple nodes simultaneously when the cluster can handle the
extra I/O and network pressure.
Rebuild can safely run in parallel, but there are potential performance tradeoffs. The nodes
in the source datacenter will be streaming data, so application performance involving
that datacenter's data may be impacted. Run tests within the environment,
adjusting the level of parallelism and stream throttling to strike the optimal balance
of speed and performance.
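The commands referenced above can be sketched as follows, run on each DC2 node. The datacenter and rack names are illustrative, and the DC1:RACK1 form for restricting source racks is an assumption to verify against the nodetool rebuild help output for your DSE version:

```shell
# On each node in DC2, stream data only from datacenter DC1
$ nodetool rebuild -dc DC1

# Or, restricting stream sources to a specific rack in DC1
$ nodetool rebuild -dc DC1:RACK1
```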
12. Monitor the rebuild progress for the new datacenter using nodetool netstats and examining the size of
each node.
The nodetool rebuild command issues a JMX call to the DSE node and waits for the rebuild to
finish before returning to the command line. Once the JMX call is invoked, the rebuild process
continues on the server regardless of the state of the nodetool process (the rebuild keeps running
even if nodetool dies). There is typically no significant output from the nodetool rebuild command itself.
Instead, monitor rebuild progress using nodetool netstats and by examining the
data size of each node.
The data load shown in nodetool status will only be updated after a given source node is
done streaming, so it will appear to lag behind bytes reported on disk (e.g. du). If any streaming
errors occur, ERROR messages will be logged to system.log and the rebuild will stop. In the
event of temporary failure, nodetool rebuild can be re-run and skips any ranges that were
already successfully streamed.
13. Adjust stream throttling on the source datacenter as required to balance out network traffic. See nodetool
setstreamthroughput.
14. Confirm that all rebuilds are successful by searching for finished rebuild in the system.log of each node
in the new datacenter.
In rare cases the communication between two streaming nodes may hang, leaving the rebuild
operation alive but with no data streaming. Monitor streaming progress using nodetool netstats,
and, if the streams are not making any progress, restart the node where nodetool rebuild was
executed and re-run nodetool rebuild with the same parameters used originally.
15. Start the DataStax Agent on each node in the new datacenter if necessary.
16. Start the OpsCenter Repair Service if necessary. See Turning the Repair Service on.
The procedure for replacing a dead node is the same for vnodes and single-token nodes. Extra steps are
required for replacing dead seed nodes.
Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or nodes that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.
2. Record the datacenter, address, and rack settings of the dead node; you will use these later.
3. Add the replacement node to the network and record its IP address.
4. If the dead node was a seed node, change the cluster's seed node configuration on each node:
a. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds
list in the seed_provider property.
b. If the cluster needs a new seed node to replace the dead node, add the new node's IP address to
the - seeds list of the other nodes.
Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use
a small seed list (approximately three nodes per datacenter).
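For example, a small seed list in cassandra.yaml might look like the following. The addresses are illustrative, and SimpleSeedProvider is the stock default provider class:

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Approximately three seeds per datacenter, comma-separated
      - seeds: "10.200.175.11,10.200.175.113,10.200.175.111"
```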
5. On an existing node, gather setting information for the new node from the cassandra.yaml file:
• cluster_name
• endpoint_snitch
• Other non-default settings: Use the diff tool to compare current settings with default settings.
• If the cluster uses the PropertyFileSnitch, record the rack and data assignments listed in the
cassandra-topology.properties file, or copy the file to the new node.
• If the cluster uses the GossipingPropertyFileSnitch, the Amazon EC2 single-region
snitch, the Amazon EC2 multi-region snitch, or the Google Cloud Platform
snitch, record the rack and datacenter assignments in the dead node's cassandra-rackdc.properties
file.
7. Make sure that the new node meets all prerequisites and then Install DataStax Enterprise on the new node,
but do not start DataStax Enterprise.
Be sure to install the same version of DataStax Enterprise as is installed on the other nodes in the cluster,
as described in the installation instructions.
8. If DataStax Enterprise automatically started on the node, stop and clear the data that was added
automatically on startup.
9. Add values to the following properties in cassandra.yaml file from the information you gathered earlier:
• auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in
the default cassandra.yaml configuration file.)
• cluster_name
• seed list
If the new node is a seed node, make sure it is not listed in its own - seeds list.
• If the cluster uses the GossipingPropertyFileSnitch, the Amazon EC2 single-region
snitch, the Amazon EC2 multi-region snitch, or the Google Cloud Platform
snitch:
a. Add the dead node's rack and datacenter assignments to the cassandra-rackdc.properties file
on the replacement node.
Do not remove the entry for the dead node's IP address yet.
a. Copy the cassandra-topology.properties file from an existing node, or add the settings to
the local copy.
b. Edit the file to add an entry with the new node's IP address and the dead node's rack and
datacenter assignments.
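For example, an entry in cassandra-topology.properties mapping the replacement node's address to the dead node's placement (the address, datacenter, and rack values are illustrative):

```
# node_ip=datacenter:rack
10.200.175.120=DC1:RAC1
```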
11. Start the new node with the required options:
Package installations:
-Dcassandra.replace_address_first_boot=address_of_dead_node
b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, add the
consistent_replace option to jvm.options with either the QUORUM or LOCAL_QUORUM value to ensure
data consistency on the replacement node. Otherwise, the node may stream from a potentially
inconsistent replica, and reads may return stale data.
For example:
-Ddse.consistent_replace=LOCAL_QUORUM
• consistent_replace.parallelism
• consistent_replace.retries
• consistent_replace.whitelist
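Taken together, the jvm.options entries for a package installation of the replacement node might look like the following (the dead node's address is illustrative):

```
-Dcassandra.replace_address_first_boot=10.200.175.120
-Ddse.consistent_replace=LOCAL_QUORUM
```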
Tarball installations:
b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, in addition to
replace_address_first_boot, add the consistent_replace parameter with either the QUORUM or
LOCAL_QUORUM value to ensure data consistency on the replacement node. Otherwise, the node may
stream from a potentially inconsistent replica, and reads may return stale data.
For example:
• consistent_replace.parallelism
• consistent_replace.retries
• consistent_replace.whitelist
12. Run nodetool status to verify that the new node has bootstrapped successfully.
Tarball path:
installation_location/resources/cassandra/bin
13. In environments that use the PropertyFileSnitch, wait at least 72 hours and then remove the old node's IP
address from the cassandra-topology.properties file.
This ensures that the old node's information is removed from gossip. If it is removed from the property file too
soon, problems may result. Use nodetool gossipinfo to check the gossip status. The node remains in
gossip until the LEFT status disappears.
The cassandra-rackdc.properties file does not contain IP information; therefore this step is not
required when using other snitches, such as GossipingPropertyFileSnitch.
Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or nodes that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.
Be sure to use the same version of DataStax Enterprise on all nodes in the cluster, as described in the
installation instructions.
1. Prepare and start the replacement node, as described in Adding nodes to an existing cluster.
Tarball path:
installation_location/resources/cassandra/bin
• nodetool ring: Up
• nodetool status: UN
3. Note the Host ID of the original node; it is used in the next step.
4. Using the Host ID of the original node, decommission the original node from the cluster using the nodetool
decommission command.
5. Run nodetool cleanup on all the other nodes in the same datacenter.
Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.
If you've written data using a consistency level of ONE, you risk losing data because the node might contain
the only copy of a record. Be absolutely sure that no application uses consistency level ONE.
2. Follow the instructions for replacing a dead node using the old node’s IP address for -
Dcassandra.replace_address.
• The preferred method is to decommission the node and re-add it to the correct rack and datacenter.
This method takes longer than the alternative method (below) because unneeded data is first removed from
the decommissioned node and then the node gets new data during bootstrapping. The alternative method
does both operations simultaneously.
• An alternative method is to update the node's topology and restart the node. Once the node is up, run a full
repair on the cluster.
This method has risks: until the repair is completed, the node may blindly handle requests for
data the node does not yet have. To mitigate this problem with request handling, start the node with -
Dcassandra.join_ring=false, run the first repair, and then fully join the node to the cluster using the JMX
method org.apache.cassandra.db.StorageService.joinRing(). The node is then less likely to be
out of sync with other nodes before it serves any requests. After joining the node to the cluster, repair the
node again so that any writes missed during the first repair are captured.
Decommissioning a datacenter
Steps to properly remove a datacenter so no information is lost.
To decommission a DSE datacenter:
1. Make sure no clients are still writing to any nodes in the datacenter.
When not using OpsCenter, the following JMX MBeans provide details on client connections and
pending requests:
2. Run a full repair with nodetool repair --full to ensure that all data is propagated from the datacenter being
decommissioned.
You can also use the OpsCenter Repair Service.
If using OpsCenter, ensure that the repair has completed; see Checking the repair progress.
4. Change all keyspaces so they no longer reference the datacenter being removed.
If the replication factor (RF) on any keyspace has not been properly updated:
c. If the keyspace used SimpleStrategy, also run a full repair on the keyspace:
8. Run nodetool status to ensure that the nodes in the datacenter were removed.
nodetool status
Status shows that there are three datacenters with one node in each:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.11 474.23 KiB ? 7297d21e-a04e-4bb1-91d9-8149b03fb60a -9223372036854775808 rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.113 518.36 KiB ? 2ff7d46c-f084-477e-aa53-0f4791c71dbc -9223372036854775798 rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.111 461.56 KiB ? ac43e602-ef09-4d0d-a455-3311f444198c -9223372036854775788 rack1
3. Using JConsole, check the following JMX Beans to make sure there are no active connections:
• org.apache.cassandra.metrics/Client/connectedNativeClients
• org.apache.cassandra.metrics/Client/connectedThriftClients
4. Verify that there are no pending write requests on each node that is being removed (The Pending
column should read 0 or N/A):
nodetool tpstats
HintsDispatcher 0 0 (N/A) N/A 2...
5. Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an
RF set for DC3:
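For example, assuming a keyspace named cycling previously replicated to DC1, DC2, and DC3 (the keyspace name and replication factors are illustrative):

```sql
-- Drop DC3 from the replication map; keep the remaining datacenters
ALTER KEYSPACE cycling
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
```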
9. Run nodetool assassinate on each node in DC3 (the datacenter being removed):
10. In a remaining datacenter, verify that DC3 has been removed:
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.11 503.54 KiB ? 7297d21e-a04e-4bb1-91d9-8149b03fb60a -9223372036854775808 rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID Token Rack
UN 10.200.175.113 522.47 KiB ? 2ff7d46c-f084-477e-aa53-0f4791c71dbc -9223372036854775798 rack1
Removing a node
Use these instructions when you want to remove nodes to reduce the size of your cluster, not for replacing a
dead node.
If you are not using Virtual nodes (vnodes), you must rebalance the cluster.
Prerequisites: If the node is a DSEFS node, follow this alternative node removal procedure: Removing a
DSEFS node.
Failure to follow the DSEFS procedure may result in data loss.
To avoid excessive data streaming, make node topology changes one at a time.
Decommission does not shut down the node; shut down the node after decommission has completed.
• If the cluster uses vnodes, remove the node using the nodetool removenode command.
• If the cluster does not use vnodes, before running the nodetool removenode command, adjust your
tokens to evenly distribute the data across the remaining nodes to avoid creating a hot spot.
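For example, removenode takes the Host ID shown by nodetool status (the ID below is illustrative):

```shell
# Remove the node that owns this Host ID from the ring
$ nodetool removenode 7297d21e-a04e-4bb1-91d9-8149b03fb60a
```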
1. To speed up the restart process, before stopping the dse service, run nodetool drain.
3. Replace the old IP address in the cassandra.yaml with the new one.
• listen_address
• broadcast_address
4. If the node is a seed node, update the -seeds parameter in the seed_provider list in the cassandra.yaml file on all
nodes.
5. If the endpoint_snitch is PropertyFileSnitch, add an entry for the new IP address in the cassandra-
topology.properties file on all nodes.
Switching snitches
Because snitches determine how the database distributes replicas, the procedure to switch snitches depends on
whether the topology of the cluster changes:
• If data has not been inserted into the cluster, there is no change in the network topology. This means that
you only need to set the snitch; no other steps are necessary.
• If data has been inserted into the cluster, it's possible that the topology has changed and you will need to
perform additional steps.
A change in topology means that there is a change in the datacenters and/or racks where the nodes are placed.
Topology changes may occur when the replicas are placed in different places by the new snitch. Specifically,
the replication strategy places the replicas based on the information provided by the new snitch. The following
examples demonstrate the differences:
• No topology change
Changing from five nodes using the DseSimpleSnitch (default) in a single datacenter
to five nodes in one datacenter and one rack using a network snitch such as the GossipingPropertyFileSnitch.
• Topology changes
• cassandra-rackdc.properties
Used by the GossipingPropertyFileSnitch, the Amazon EC2 single-region snitch, and
the Amazon EC2 multi-region snitch only.
• cassandra-topology.properties
Used by all other network snitches.
3. Change the snitch for each node in the cluster in the node's cassandra.yaml file. For example:
endpoint_snitch: GossipingPropertyFileSnitch
4. If the topology has not changed, you can restart each node one at a time.
Any change in the cassandra.yaml file requires a node restart.
5. If the topology of the network has changed, but no datacenters are added:
Failure to run nodetool cleanup after adding a node may result in data inconsistencies
including resurrection of previously deleted data.
b. Replicate data into new datacenter. Remove nodes from old datacenter.
Failure to run nodetool cleanup after adding a node may result in data inconsistencies
including resurrection of previously deleted data.
DataStax recommends stopping repair operations during topology changes; the Repair
Service does this automatically. Repairs that run during a topology change are likely to fail
when the change involves moving ranges.
• Example 2: Switch the keyspace cycling from SimpleStrategy to NetworkTopologyStrategy for two
datacenters:
3. Run nodetool repair using the -full option on each node affected by the change.
Tarball path:
installation_location/resources/cassandra/bin
• Migrating a cluster, including transitioning an EC2 cluster to Amazon virtual private cloud (VPC), moving a
cluster, or upgrading from an early version cluster to a recent major version.
• Renaming a cluster. You cannot change the name of an existing cluster; you must create a new cluster and
migrate your data to the new cluster.
The following method migrates a cluster without service interruption and ensures that if a problem occurs in the
new cluster, you still have an existing cluster as a fallback.
1. Set up and configure the new cluster as described in Initializing a DataStax Enterprise cluster.
If you're not using vnodes, be sure to configure the token ranges in the new nodes to match the
ranges in the old cluster. See Initializing single-token architecture datacenters.
Depending on how the writes are implemented, code changes may be required. Be sure to use
identical consistency levels.
4. Ensure that the data is flowing to the new nodes so you won't have any gaps when you copy the snapshots
to the new cluster in step 6.
• You can copy the data files to their matching nodes in the new cluster, which is simpler and more
efficient, if:
• If the clusters are different sizes or if you are using vnodes, use sstableloader.
7. You can either switch to the new cluster all at once or perform an incremental migration.
For example, to perform an incremental migration, you can set your client to designate a percentage of the
reads that go to the new cluster. This allows you to test the new cluster before decommissioning the old
cluster.
8. Ensure that the new cluster is operating properly and then decommission the old cluster. See
Decommissioning a datacenter.
• Add capacity by doubling the cluster size: Adding capacity by doubling (or tripling or quadrupling) the
number of nodes is less complicated when assigning tokens. Using this method, existing nodes keep their
existing token assignments, and the new nodes are assigned tokens that bisect (or trisect) the existing token
ranges.
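The bisection described above can be sketched as follows, assuming the existing nodes' tokens are known and the default Murmur3Partitioner is in use (the input tokens are illustrative):

```python
def bisect_tokens(existing: list[int]) -> list[int]:
    """Return new initial_token values that bisect each existing token
    range, doubling a single-token ring without moving existing tokens."""
    ring = sorted(existing)
    total = 2**64  # size of the Murmur3 token space
    new_tokens = []
    for i, t in enumerate(ring):
        nxt = ring[(i + 1) % len(ring)]
        # width of the range (t, nxt], wrapping around the ring
        width = (nxt - t) % total
        # midpoint, wrapped back into the Murmur3 range -2**63 .. 2**63 - 1
        midpoint = (t + width // 2 + 2**63) % total - 2**63
        new_tokens.append(midpoint)
    return new_tokens

# Doubling a 2-node ring with tokens -2**63 and 0:
print(bisect_tokens([-2**63, 0]))
# [-4611686018427387904, 4611686018427387904]
```

Assign each returned value as the initial_token of one new node; the existing nodes keep their tokens.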
• Add capacity for a non-uniform number of nodes: When increasing capacity with this method, you must
recalculate tokens for the entire cluster, and assign the new tokens to the existing nodes.
Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or nodes that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.
For DataStax Enterprise clusters, you can use OpsCenter to rebalance a cluster.
1. Calculate the tokens for the nodes based on your expansion strategy using the Token Generating Tool.
2. Install DataStax Enterprise and configure DataStax Enterprise on each new node.
3. If DataStax Enterprise starts automatically, stop the node and clear the data.
• cluster_name
• listen_address/broadcast_address: Usually leave blank. Otherwise, use the IP address or host name
that other nodes use to connect to the new node.
• endpoint_snitch
• seed_provider: Make sure that the new node lists at least one seed node in the existing cluster.
Seed nodes cannot bootstrap. Make sure the new nodes are not listed in the -seeds list. Do not
make all nodes seed nodes. See Internode communications (gossip).
• Change any other non-default settings in the new nodes to match the existing nodes. Use the diff
command to find and merge any differences between the nodes.
5. Depending on the snitch, assign the datacenter and rack names in the cassandra-topology.properties or
cassandra-rackdc.properties for each node.
6. Start DataStax Enterprise on each new node at two-minute intervals with consistent.rangemovement
turned off:
• Package installations: To each bootstrapped node, add the following option to the jvm.options file and
then start DataStax Enterprise:
-Dcassandra.consistent.rangemovement=false
• Tarball installations:
$ bin/cassandra -Dcassandra.consistent.rangemovement=false
The following operations are resource intensive and should be done during low-usage times.
7. After the new nodes are fully bootstrapped, use nodetool move to assign the new initial_token value to
each node that requires one, one node at a time.
8. After all nodes have their new tokens assigned, run nodetool cleanup on each node in the cluster and wait
for cleanup to complete on each node before doing the next node.
This step removes the keys that no longer belong to the previously existing nodes.
Failure to run nodetool cleanup after adding a node may result in data inconsistencies including
resurrection of previously deleted data.
2. For each new node, edit the configuration properties in the cassandra.yaml file:
• Set the initial_token. Be sure to offset the tokens in the new datacenter; see Initializing single-
token architecture datacenters.
• Set the seed lists. Every node in the cluster must have the same list of seeds and include at least
one node from each datacenter. Typically one to three seeds are used per datacenter.
3. Update the relevant properties file on all nodes to include the new nodes. You do not need to restart.
• GossipingPropertyFileSnitch: cassandra-rackdc.properties
• PropertyFileSnitch: cassandra-topology.properties
4. Ensure that your client does not auto-detect the new nodes so that they aren't contacted by the client until
explicitly directed.
5. If using a QUORUM consistency level for reads or writes, check the LOCAL_QUORUM or EACH_QUORUM
consistency level to make sure that the level meets the requirements for multiple datacenters.
a. Change the replication factor for your keyspace for the expanded cluster.
Only add new nodes to the cluster. A new node is a system that DataStax Enterprise has never started. The
node must have absolutely NO PREVIOUS DATA in the data directory, saved_caches, commitlog, and hints.
Adding nodes previously used for testing, or nodes that have been removed from another cluster, merges the older
data into the cluster and may cause data loss or corruption.
2. Record the datacenter, address, and rack settings of the dead node; you will use these later.
3. Record the existing initial_token setting from the dead node's cassandra.yaml.
4. Add the replacement node to the network and record its IP address.
5. If the dead node was a seed node, change the cluster's seed node configuration on each node:
a. In the cassandra.yaml file for each node, remove the IP address of the dead node from the - seeds
list in the seed_provider property.
b. If the cluster needs a new seed node to replace the dead node, add the new node's IP address to
the - seeds list of the other nodes.
Making every node a seed node is not recommended because of increased maintenance and
reduced gossip performance. Gossip optimization is not critical, but it is recommended to use
a small seed list (approximately three nodes per datacenter).
6. On an existing node, gather setting information for the new node from the cassandra.yaml file:
• cluster_name
• endpoint_snitch
• Other non-default settings: Use the diff tool to compare current settings with default settings.
• If the cluster uses the PropertyFileSnitch, record the rack and data assignments listed in the
cassandra-topology.properties file, or copy the file to the new node.
• If the cluster uses the GossipingPropertyFileSnitch, the Amazon EC2 single-region
snitch, the Amazon EC2 multi-region snitch, or the Google Cloud Platform
snitch, record the rack and datacenter assignments in the dead node's cassandra-rackdc.properties
file.
8. Make sure that the new node meets all prerequisites and then Install DataStax Enterprise on the new node,
but do not start DataStax Enterprise.
Be sure to install the same version of DataStax Enterprise as is installed on the other nodes in the cluster,
as described in the installation instructions.
9. If DataStax Enterprise automatically started on the node, stop and clear the data that was added
automatically on startup.
10. Add values to the following properties in cassandra.yaml file from the information gathered earlier:
• auto_bootstrap: If this setting exists and is set to false, set it to true. (This setting is not included in
the default cassandra.yaml configuration file.)
• cluster_name
• initial_token
• seed list
If the new node is a seed node, make sure it is not listed in its own - seeds list.
• If the cluster uses the GossipingPropertyFileSnitch, the Amazon EC2 single-region
snitch, the Amazon EC2 multi-region snitch, or the Google Cloud Platform
snitch:
a. Add the dead node's rack and datacenter assignments to the cassandra-rackdc.properties file
on the replacement node.
Do not remove the entry for the dead node's IP address yet.
a. Copy the cassandra-topology.properties file from an existing node, or add the settings to
the local copy.
b. Edit the file to add an entry with the new node's IP address and the dead node's rack and
datacenter assignments.
12. Start the new node with the required options:
Package installations:
-Dcassandra.replace_address_first_boot=address_of_dead_node
b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, add the
consistent_replace option to jvm.options with either the QUORUM or LOCAL_QUORUM value to ensure
data consistency on the replacement node. Otherwise, the node may stream from a potentially
inconsistent replica, and reads may return stale data.
For example:
-Ddse.consistent_replace=LOCAL_QUORUM
• consistent_replace.parallelism
• consistent_replace.retries
• consistent_replace.whitelist
Tarball installations:
b. If applications expect QUORUM or LOCAL_QUORUM consistency levels from the cluster, in addition to
replace_address_first_boot, add the consistent_replace parameter with either the QUORUM or
LOCAL_QUORUM value to ensure data consistency on the replacement node. Otherwise, the node may
stream from a potentially inconsistent replica, and reads may return stale data.
For example:
-Ddse.consistent_replace=LOCAL_QUORUM
• consistent_replace.parallelism
• consistent_replace.retries
• consistent_replace.whitelist
13. Run nodetool status to verify that the new node has bootstrapped successfully.
Tarball path:
installation_location/resources/cassandra/bin
14. In environments that use the PropertyFileSnitch, wait at least 72 hours and then remove the old node's IP
address from the cassandra-topology.properties file.
This ensures that the old node's information is removed from gossip. If it is removed from the properties
file too soon, problems may result. Use nodetool gossipinfo to check the gossip status. The node is still in
gossip until its LEFT status disappears.
The cassandra-rackdc.properties file does not contain IP information; therefore this step is not
required when using other snitches, such as GossipingPropertyFileSnitch.
2. Run the nodetool snapshot command, specifying the hostname, JMX port, and keyspace. For example:
Tarball path:
installation_location/resources/cassandra/bin
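A hedged sketch of the snapshot command for the cycling keyspace shown below; the hostname, JMX port, and snapshot tag are illustrative values:

```
$ nodetool -h localhost -p 7199 snapshot -t cycling_2017-3-9 cycling
```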
$ ls -1 data/cycling/cyclist_name-9e516080f30811e689e40725f37c761d/snapshots/
cycling_2017-3-9
The data file extension is .db, and the full CQL to create the table is in the schema.cql file.
manifest.json
mc-1-big-CompressionInfo.db
mc-1-big-Data.db
mc-1-big-Digest.crc32
mc-1-big-Filter.db
mc-1-big-Index.db
mc-1-big-Statistics.db
mc-1-big-Summary.db
mc-1-big-TOC.txt
schema.cql
1. To delete all snapshots for a node, run the nodetool clearsnapshot command. For example:
Tarball path:
installation_location/resources/cassandra/bin
To delete snapshots on all nodes at once, run the nodetool clearsnapshot command using a parallel ssh
utility.
2. To delete a single snapshot, run the clearsnapshot command with the snapshot name:
The file name and path vary according to the type of snapshot. See nodetool snapshot for details about
snapshot names and paths.
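An illustrative example, assuming the cycling keyspace and snapshot tag used earlier in this section:

```
$ nodetool clearsnapshot -t cycling_2017-3-9 cycling
```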
1. Edit the cassandra.yaml configuration file on each node in the cluster and change the value of
incremental_backups to true.
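The change is a single line in cassandra.yaml:

```
# cassandra.yaml
incremental_backups: true
```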
Restoring from snapshots and incremental backups temporarily causes intensive CPU and I/O activity on the
node being restored.
1. Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist,
recreate it using the schema.cql file.
1. Verify that the SSTable version is compatible with the current version of DSE:
data/cycling/cyclist_expenses-e4f31e122bc511e8891b23da85222d3d/aa-1-bti-Data.db
2. Make sure the table schema exists and is the same as when the snapshot was created.
The nodetool snapshot command creates a table schema in the output directory. If the table does not exist,
recreate it using the schema.cql file.
4. Restore the most recent snapshot using the sstableloader tool on the backed-up SSTables.
The sstableloader streams the SSTables to the correct nodes. You do not need to remove the commitlogs or
drain or restart the nodes.
This procedure assumes you are familiar with restoring a snapshot and configuring and initializing a cluster.
1. From the old cluster, retrieve the list of tokens associated with each node's IP:
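One way to extract a comma-separated token list for a node; the IP address is a placeholder, and the pattern assumes nodetool ring prints the token as the last field of each matching row:

```
$ nodetool ring | grep ip_address_of_node | awk '{print $NF ","}' | xargs
```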
2. In the cassandra.yaml file for each node in the new cluster, add the list of tokens you obtained in the
previous step to the initial_token parameter using the same num_tokens setting as in the old cluster.
If nodes are assigned to racks, make sure the token allocation and rack assignments in the new
cluster are identical to those of the old.
3. Make any other necessary changes in the new cluster's cassandra.yaml and property files so that the new
nodes match the old cluster settings. Make sure the seed nodes are set for the new cluster.
This allows the new nodes to use the initial tokens defined in the cassandra.yaml when they restart.
5. Start each node using the specified list of token ranges in the new cluster's cassandra.yaml:
6. Create the schema in the new cluster. All schemas from the old cluster must be reproduced in the new
cluster.
7. Stop the node. Using nodetool refresh is unsafe because files within the data directory of a running
node can be silently overwritten by identically named just-flushed SSTables from memtable flushes or
compaction. Copying files into the data directory and restarting the node will not work for the same reason.
8. Restore the SSTable files snapshotted from the old cluster onto the new cluster using the same directories,
while noting that the UUID component of target directory names has changed. Without restoration, the new
cluster will not have data to read upon restart.
It is possible that you can simply replace the disk, restart DataStax Enterprise, and run nodetool repair.
However, if the disk crash corrupted the system table, you must remove the incomplete data from the other disks in
the array. The procedure depends on whether the cluster uses vnodes or single-token architecture.
1. Verify that the node has a defective disk, and identify the disk, by checking the logs on the affected node.
Disk failures are logged in FILE NOT FOUND entries, which identify the mount point or disk that has
failed.
2. If the node is still running, stop DSE and shut down the node.
JVM_OPTS="$JVM_OPTS -Dcassandra.allow_unsafe_replace=true"
Tarball installations:
Tarball path:
installation_location
5. If DataStax Enterprise restarts, run nodetool repair on the node. If not, replace the node.
a. On the affected node, clear the system directory on each functioning drive.
Example for a node with a three disk JBOD array:
-/mnt1/cassandra/data
-/mnt2/cassandra/data
-/mnt3/cassandra/data
-Dcassandra.allow_unsafe_replace=true
a. On one of the cluster's working nodes, run nodetool ring to retrieve the list of the repaired node's
tokens:
$ nodetool ring | grep ip_address_of_node | awk ' {print $NF ","}' | xargs
c. Edit the output, keeping the list of tokens and deleting the other columns.
d. On the node with the new disk, open the cassandra.yaml file and add the tokens (as a comma-
separated list) to the initial_token property.
e. Change any other non-default settings in the new nodes to match the existing nodes. Use the diff
command to find and merge any differences between the nodes.
If the repair succeeds, the node is restored to production. If not, replace the node.
f. On the affected node, clear the system directory on each functioning drive.
Example for a node with a three disk JBOD array:
-/mnt1/cassandra/data
-/mnt2/cassandra/data
-/mnt3/cassandra/data
-Dcassandra.allow_unsafe_replace=true
Repairing nodes
For conceptual information about repairing nodes, see Anti-entropy repair.
Manual repair: Anti-entropy repair
A manual repair is run using nodetool repair. This tool provides many options for configuring repair. This page
provides guidance for choosing certain parameters.
Tables with NodeSync enabled will be skipped for repair operations run against all or specific keyspaces. For
individual tables, running the repair command will be rejected when NodeSync is enabled.
DataStax recommends using the partitioner range parameter when running full repairs during routine
maintenance.
Full repair is run by default.
If running nodetool repair -pr on a downed node that has been recovered, be sure to run the command on
all other nodes in the cluster as well.
• The -pr option does not support use of -local unless the datacenter's nodes have all the data for all
ranges.
• The -pr option does not support use of -local with -inc (incremental repair).
For repairs across datacenters, use the -dcpar option to repair datacenters in parallel.
One-way targeted repair from a remote node (--pull, --hosts, -st, -et)
Runs a repair directly from another node, which has a replica in the same token range. This option minimizes
performance impact when cross-datacenter repairs are required.
Figure 18: Merkle Trees for Incremental Repair versus Full Repair
Incremental repairs work like full repairs, with an initiating node requesting Merkle trees from peer nodes with the
same unrepaired data, and then comparing the Merkle trees to discover mismatches. Once the data has been
reconciled and new SSTables built, the initiating node issues an anti-compaction command. Anti-compaction
is the process of segregating repaired and unrepaired ranges into separate SSTables, unless the SSTable fits
entirely within the repaired range. In the latter case, the SSTable metadata repairedAt is updated to reflect its
repaired status.
Anti-compaction is handled differently, depending on the compaction strategy assigned to the data.
• Size-tiered compaction (STCS) splits repaired and unrepaired data into separate pools for separate
compactions. A major compaction generates two SSTables, one for each pool of data.
• Leveled compaction (LCS) performs size-tiered compaction on unrepaired data. After repair completes,
Cassandra moves data from the set of unrepaired SSTables to L0.
• Date-tiered (DTCS) splits repaired and unrepaired data into separate pools for separate compactions. A
major compaction generates two SSTables, one for each pool of data. DTCS compaction should not use
incremental repair.
• When recovering a node after a failure while bringing it back into the cluster.
• To update data on a node that contains infrequently read data and therefore does not get read repair.
• When recovering missing data or corrupted SSTables. You must run non-incremental repair.
Full repair is useful for maintaining data integrity, even if deletions never occur.
• Use the parallel and partitioner range options, unless precluded by the scope of the repair.
• Migrate off incremental repairs and then run a full repair to eliminate anti-compaction. Anti-compaction is the
process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data.
This has compaction strategy implications.
If you are on DataStax Enterprise version 5.1.0-5.1.2, DataStax recommends upgrading to 5.1.3 or later.
• Run repair frequently enough that every node is repaired before reaching the time specified in the
gc_grace_seconds setting. If this requirement is met, deleted data is properly handled in the cluster.
• Schedule routine node repair operations to minimize cluster disruption during low-usage hours and on one
node at a time:
• Increase the time value setting of gc_grace_seconds if data is seldom deleted or overwritten. For these
tables, changing the setting minimizes impact to disk space and provides a longer interval between repair
operations.
• Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput
and setcompactionthreshold) before running a repair.
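Illustrative throttle commands run against a live node before a repair; the throughput value, keyspace, table, and thresholds are example values, not recommendations:

```
$ nodetool setcompactionthroughput 16
$ nodetool setcompactionthreshold cycling cyclist_name 4 32
```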
Prerequisites:
In RHEL and Debian installations, you must install the tools packages before following these steps.
Before starting this procedure, be aware that the first system-wide full repair (step 3) can take a long time, as the
database recompacts all SSTables. To make this process less disruptive, migrate the cluster to incremental
repair one node at a time.
In a terminal:
$ nodetool disableautocompaction
2. Before running a full repair (step 3), list the node's SSTables located in /var/lib/cassandra/data. You will
need this list to run the command that sets the repairedAt flag in step 5.
The data directory contains a subdirectory for each keyspace. Each subdirectory contains a set of files
for each SSTable. The name of the file that contains the SSTable data has the following format:
<version_code>-<generation>-<format>-Data.db
$ nodetool repair
5. Using the list you created in step 2, set the repairedAt flag on each SSTable by running
sstablerepairedset with the --is-repaired option.
Unless you set repairedAt to repaired for each SSTable, the existing SSTables might not be
changed by the repair process, and any incremental repair process that runs later will not process these
SSTables.
Tarball path:
installation_location/resources/cassandra/tools/bin
The value of the repairedAt flag is the timestamp of the last repair. The sstablerepairedset
command applies the current date/time. To check the value of the repairedAt flag, use:
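A hedged example of setting and then checking the flag on one SSTable; the file name is illustrative, taken from the snapshot listing earlier in this section:

```
$ sstablerepairedset --really-set --is-repaired mc-1-big-Data.db
$ sstablemetadata mc-1-big-Data.db | grep "Repaired at"
```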
What's next:
After you have migrated all nodes, you can run incremental repairs using nodetool repair with the -inc
option.
https://www.datastax.com/dev/blog/repair-in-cassandra
https://www.datastax.com/dev/blog/more-efficient-repairs
https://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
• Bloom filters
• Partition summary
• Compression offsets
The metadata resides in memory and is proportional to total data. Some of the components grow proportionally
to the size of total data. The database gathers replicas for a read or for anti-entropy repair and compares the
replicas in heap memory.
Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to
SSTables on disk.
• Page cache. The database uses additional memory as page cache when reading files on disk.
• The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap
requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.
• Solr stores indexed data in a RAM buffer until it is flushed to index segments on disk. When setting the heap
size, determine the amount of memory required for Solr indexes. Allow enough free RAM, that is, total RAM minus
the DSE heap size minus the DSE off-heap object size.
• Multiple concurrent indexers can cause GC thrashing, even with a large heap.
• Indexes larger than the page cache size can impact search query performance. Ensure that the index
size does not exceed the page cache size for highly performant search queries.
Analytics
DSE Analytics nodes run Spark in a separate JVM, so adjustments to the Cassandra JVM do not affect
Spark operations directly. DSE Analytics nodes typically have read-heavy workloads because they run a
significant number of range-read queries. Additional memory usage considerations include:
• Spark executors are the most memory intensive processes in Spark. These are tuned to use G1 GC by
default. Tune the size of the executor heap in spark.defaults.conf. Consider leaving room for OS page
cache when tuning the executor heaps.
• Shuffle steps are a common cause of Spark out-of-memory (OOM) errors. Try to avoid shuffles by leveraging
RepartitionByCassandraReplica and JoinWithCassandraTable in your RDD jobs.
Graph
DSE Graph workloads often include Search, Analytics, or both. Tune the GC for the Search and Analytics
workloads. In addition to the memory needed by the Search and Analytics workloads, Graph queries
utilize memory during execution. This workload is characterized by short-lived objects. Most DSE Graph
deployments with Search enabled run on systems with at least 128 GB of RAM and G1 GC heaps of 32 GB.
Changing heap size parameters
By default, DataStax Enterprise (DSE) sets the Java Virtual Machine (JVM) heap size between 1 and 32 GB,
depending on the amount of RAM and the type of Java installed. The cassandra-env.sh file automatically configures
the min and max size to the same value using the following formula:
To adjust the JVM heap size, uncomment and set the following parameters in the jvm.options file:
• Minimum (-Xms)
• Maximum (-Xmx)
When overriding the default setting, both min and max must be defined in the jvm.options file.
Additionally, for larger machines, increase the max direct memory (-XX:MaxDirectMemorySize), but leave
around 15-20% of memory for the OS and other in-memory structures.
Guidelines and recommendations
Setting the Java heap higher than 32 GB may interfere with the OS page cache. Operating systems that
maintain the OS page cache for frequently accessed data are very good at keeping this data in memory.
Properly tuning the OS page cache usually results in better performance than increasing the row cache. For
production use, follow these guidelines to adjust heap size for your environment:
• Heap size is usually between ¼ and ½ of system memory but not larger than 32 GB.
• Reserve enough memory for the offheap cache and file system cache.
• Enable parallel processing for GC, particularly when using DSE Search.
• The GCInspector class logs information about any garbage collection that takes longer than 200 ms.
Garbage collections that occur frequently and take a moderate length of time (seconds) to complete
indicate excessive garbage collection pressure on the JVM. In addition to adjusting the garbage collection
options, other remedies include adding nodes, and lowering cache sizes.
• For a node using G1, DataStax recommends a MAX_HEAP_SIZE as large as possible, up to 64 GB.
For more tuning tips, see Secret HotSpot option improving GC pauses on large heaps.
• CMS for newer computers (8+ cores) with up to 256 GB RAM: no more than 16 GB heap, with a new
generation of ¼ of MAX_HEAP_SIZE.
-Xloggc:/var/log/cassandra/gc.log
After restarting Cassandra, the log is created and GC events are recorded.
a. Uncomment and set both the min and max heap size. For example, to set both the min and max
heap size to 16 GB:
-Xms16G
-Xmx16G
Set the min (-Xms) and max (-Xmx) heap sizes to the same value to avoid stop-the-world
GC pauses during resize, and to lock the heap in memory on startup which prevents any of it
from being swapped out.
b. If using CMS, uncomment and set the new generation heap size to tune the heap for CMS. As a
starting point, set the new parameter to 100 MB per physical CPU core. For example, for a modern
eight-core or greater system:
-Xmn800M
A larger size leads to longer GC pause times. For a smaller new size, GC pauses are shorter
but usually more expensive.
3. On larger machines, increase the max direct memory (-XX:MaxDirectMemorySize), but leave around
15-20% of memory for the OS and other in-memory structures. For example, to set the max direct memory
to 1 MB:
-XX:MaxDirectMemorySize=1M
By default, the size is zero, so the JVM selects the size of the NIO direct-buffer allocations
automatically.
Alternatively, you can set an environment variable called MAX_DIRECT_MEM, instead of setting a size
for -XX:MaxDirectMemorySize in the jvm.options file.
5. Restart Cassandra and run some read-heavy or write-heavy operations.
This method decreases performance for the test node, but generally does not significantly reduce
cluster performance.
If performance does not improve, contact the DataStax Services team for additional help.
• G1 divides the heap into multiple regions, where the number of regions depends primarily on the heap
size and heap region size. The G1 collector dynamically assigns the regions to old generation or new
generation based on the running workload, prioritizing garbage collection in areas of the heap that will yield
the largest free space when collected. Additionally, G1 makes tradeoffs at runtime optimizing for a pause
target (which is configurable using -XX:MaxGCPauseMillis) to provide predictable performance.
• CMS divides the heap into new generation (eden + survivor spaces), old generation, and permanent
generation, and relies on many heuristics and configurable settings to optimize for performance.
G1 advantages
DataStax recommends G1 over CMS for the following reasons:
• G1 supports large heap sizes (24-96 GB) without tuning. DSE systems, especially those with Search,
Analytics, or Graph workloads, have enough RAM to run larger heaps.
• G1 handles dynamic workloads more effectively than CMS. DSE systems typically have multiple
workloads, such as reads, writes, compactions, search indexing, and range reads for analytics, etc.
• G1 is easier to configure. The only configuration options are MAX_HEAP_SIZE and -XX:MaxGCPauseMillis.
Setting MaxGCPauseMillis lower than 500 ms to force lower latency collections might not have the intended
effect. When this value is set lower, it causes GC to run more aggressively and less efficiently, which can
steal cycles without yielding considerable benefit.
Set the value for the -XX:MaxGCPauseMillis parameter in the jvm.options file.
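An illustrative jvm.options fragment; the 500 ms value reflects the lower bound discussed above and is not a recommendation for every workload:

```
# jvm.options
-XX:MaxGCPauseMillis=500
```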
Using the Continuous Mark Sweep (CMS) garbage collector
For some deployments with small heap sizes, Continuous Mark Sweep (CMS) performs better than the
Garbage-First (G1) garbage collector. CMS requires manual tuning, which is time-consuming and requires
expertise, and it can result in poor performance when not done methodically or when a workload changes.
Using CMS has the following disadvantages:
• Only supports heap sizes up to 14 gigabytes (GB). Allocating more memory to heap can result in
diminishing performance as the garbage collection facility increases the amount of database metadata in
heap memory.
CMS guidelines
Use the following basic recommendations when configuring CMS:
• Only use CMS in fixed-workload environments, that is, clusters that perform the same processes all the
time.
• For systems with more than 24 GB of RAM, configure a 14 GB heap and the settings from
CASSANDRA-8150.
• For systems with less than 24 GB of RAM, configure an 8 GB heap and use the default settings.
• For systems that cannot support an 8 GB heap (which are not usually fit for production workloads), use
the default settings. This allocates ¼ of the available RAM to the heap.
Note: For more CMS tuning tips, see Secret HotSpot option improving GC pauses on large heaps.
1. Open jvm.options.
After updating the value of bloom_filter_fp_chance on a table, Bloom filters need to be regenerated in one of
these ways:
• Initiate compaction
$ nodetool upgradesstables -a
If the SSTables are already on the current version, the nodetool upgradesstables command returns
immediately and no action is taken. You must use the -a command argument to force the SSTable
upgrade.
• The write load includes a high volume of updates on a smaller set of data.
• A steady stream of continuous writes occurs. This action leads to more efficient compaction.
Allocating memory for memtables reduces the memory available for caching and other internal database
structures, so tune carefully and in small increments.
Data caching
Configuring data caches
DataStax Enterprise includes integrated caching and distributes cache data around the cluster.
When a node goes down, the client can read from another cached replica of the data. The database architecture
also facilitates troubleshooting because there is no separate caching tier, and cached data matches what is
in the database exactly. The integrated cache alleviates the cold start problem by saving the cache to disk
periodically. The database reads contents back into the cache and distributes the data when it restarts. The
cluster does not start with a cold cache.
The saved key cache files include the ID of the table in the file name. A saved key cache filename for the users
table in the mykeyspace keyspace looks similar to:
mykeyspace-users.users_name_idx-19bd7f80352c11e4aa6a57448213f97f-KeyCache-
b.db2046071785672832311.tmp
Configure the number of rows to cache in a partition by setting the rows_per_partition table option. To cache
rows, if the row key is not already in the cache, the database reads the first portion of the partition, and puts the
data in the cache. If the newly cached data does not include all the cells configured by the user, the database performs
another read. The actual size of the row cache depends on the workload. Benchmark your
application to determine the best row cache size to configure.
There are two row cache options, the old serializing cache provider and a new off-heap cache (OHC) provider.
The new OHC provider has been benchmarked as performing about 15% better than the older option.
Configuring the row_cache_size_in_mb (in the cassandra.yaml configuration file) determines how much space
in memory the database allocates to store rows from the most frequently read partitions of the table.
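An illustrative cassandra.yaml fragment; the size shown is an example value, not a recommendation:

```
# cassandra.yaml
row_cache_size_in_mb: 200
```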
1. Set the table caching property that configures the partition key cache and the row cache.
• Store lower-demand data or data with extremely long partitions in a table with minimal or no caching.
• Deploy a large number of transactional nodes under a relatively light load per node.
The Tuning the row cache in Cassandra 2.1 blog post describes best practices for using the built-in caching
mechanisms and designing an effective data model.
When you query a table, turn on tracing to check that the table actually gets data from the cache rather than
from disk. The first time you read data from a partition, the trace shows this line below the query because the
cache has not been populated yet:
In subsequent queries for the same partition, look for a line in the trace that looks something like this:
This output means the data was found in the cache and no disk read occurred. Updates invalidate the cache. If
you query rows in the cache plus uncached rows, request more rows than the global limit allows, or the query
does not grab the beginning of the partition, the trace might include a line that looks something like this:
Ignoring row cache as cached value could not satisfy query [ReadStage:89]
This output indicates that an insufficient cache caused a disk read. Requesting rows not at the beginning of
the partition is a likely cause. Try removing constraints that might cause the query to skip the beginning of the
partition, or place a limit on the query to prevent results from overflowing the cache. To ensure that the query
hits the cache, try increasing the cache size limit, or restructure the table to position frequently accessed rows
at the head of the partition.
Monitoring and adjusting caching
In the event of high memory consumption, consider tuning data caches.
Make changes to cache options in small, incremental adjustments, then monitor the effects of each change using
nodetool info.
The cassandra.yaml file provides options for adjusting row cache and key cache settings:
• Capacity in bytes
• Number of hits
• Number of requests
• Duration in seconds after which the database saves the key cache.
For example, on start-up, the information from nodetool info might look something like this:
ID : 387d15ba-7103-491b-9327-1a691dbb504a
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 65.87 KB
Generation No : 1400189757
Uptime (seconds) : 148760
Heap Memory (MB) : 392.82 / 1996.81
datacenter : datacenter1
Rack : rack1
Exceptions : 0
Key Cache : entries 10, size 728 (bytes), capacity 103809024 (bytes), 93 hits, 102
requests, 0.912 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN
recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 (bytes), capacity 51380224 (bytes), 0 hits, 0
requests, NaN recent hit rate, 7200 save period in seconds
Token : -9223372036854775808
• snapshot_before_compaction
• concurrent_compactors
• compaction_throughput_mb_per_sec
The compaction_throughput_mb_per_sec parameter is designed for use with large partitions. The database
throttles compaction to this rate across the entire system.
DataStax Enterprise provides a start-up option for testing compaction strategies without affecting the production
workload.
DataStax Enterprise supports the following compaction strategies, which you can configure using CQL:
• TimeWindowCompactionStrategy (TWCS) This strategy is an alternative for time series data. TWCS
compacts SSTables using a series of time windows. Within a time window, TWCS compacts all
SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of
these SSTables are compacted into a single SSTable. Then the next time window starts and the process
repeats. The duration of the time window is the only setting required. See TWCS compaction subproperties.
For more information about TWCS, see How is data maintained?.
To configure the compaction strategy property and CQL compaction subproperties, such as the maximum
number of SSTables to compact and minimum SSTable size, use CREATE TABLE or ALTER TABLE.
1. Update a table to set the compaction strategy using the ALTER TABLE statement.
2. Change the compaction strategy property to SizeTieredCompactionStrategy and specify the minimum
number of SSTables to trigger a compaction using the CQL min_threshold attribute.
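A hedged sketch of steps 1-2 using cqlsh; the keyspace, table, and min_threshold values are illustrative:

```
$ cqlsh -e "ALTER TABLE cycling.cyclist_name
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'min_threshold': 6 };"
```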
You can monitor the results of your configuration using compaction metrics, see Compaction metrics.
What's next: DataStax Enterprise supports extended logging for Compaction. This utility must be configured
as part of the table configuration. The extended compaction logs are stored in a separate file. For details, see
Enabling extended compaction logging.
Compression
Compression maximizes the storage capacity of DataStax Enterprise (DSE) nodes by reducing the volume of
data on disk and disk I/O, particularly for read-dominated workloads. The database quickly finds the location
of rows in the SSTable index and decompresses the relevant row chunks. DSE uses a storage engine that
dramatically reduces disk volume automatically. See Putting some structure in the storage engine.
Write performance is not negatively impacted by compression in DataStax Enterprise as it is in traditional
databases. In traditional relational databases, writes require overwrites to existing data files on disk. The
database has to locate the relevant pages on disk, decompress them, overwrite the relevant data, and finally
recompress. In a relational database, compression is an expensive operation in terms of CPU cycles and disk I/
O. Because SSTable data files are immutable (they are not written to again after they have been flushed to disk),
there is no recompression cycle necessary in order to process writes. SSTables are compressed only once when
they are written to disk. Writes on compressed tables can show up to a 10 percent performance improvement.
In DSE, the commit log can also be compressed, improving write performance by 6-12%. See the
Updates to Cassandra’s Commit Log in 2.2 blog.
DSE 6.0 Administrator Guide Earlier DSE version Latest 6.0 patch: 6.0.13

Operations
After configuring compression on an existing table, subsequently created SSTables are compressed. Existing
SSTables on disk are not compressed immediately. DataStax Enterprise compresses existing SSTables
when the normal database compaction process occurs. You can force existing SSTables to be rewritten and
compressed by using nodetool upgradesstables or nodetool scrub.
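To force the rewrite described above, either command works; the keyspace and table names below are placeholders:

```shell
# Rewrite (and recompress) all SSTables for one table, including
# those already at the current SSTable version
nodetool upgradesstables -a cycling events

# Alternatively, scrub rewrites SSTables while also checking for corruption
nodetool scrub cycling events
```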
Configuring compression
You configure a table property and subproperties to manage compression. CQL table properties describes the
available options for compression. Compression is enabled by default.
• Disable compression by using CQL to set the compression parameter enabled to false.
• Enable compression on an existing table by using ALTER TABLE to set the compression algorithm class to
LZ4Compressor, SnappyCompressor, or DeflateCompressor.
• Change compression on an existing table by using ALTER TABLE to set the compression algorithm class to
DeflateCompressor.
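The bullet points above can be sketched in CQL; the table name is hypothetical, and chunk_length_in_kb is an optional subproperty shown only for illustration:

```cql
-- Enable or change compression on an existing table
ALTER TABLE cycling.events
   WITH compression = {
      'class': 'LZ4Compressor',
      'chunk_length_in_kb': 64 };

-- Disable compression entirely
ALTER TABLE cycling.events
   WITH compression = { 'enabled': 'false' };
```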
If you want to see how read performance is affected by modifications, stop the node, add the following JVM option (for example, in cassandra-env.sh), bring the node up as a standalone machine, and then benchmark read operations on the node.
JVM_OPTS="$JVM_OPTS -Dcassandra.write_survey=true"
• Use DataStax Bulk Loader (dsbulk) to load and unload CSV or JSON data in and out of the DSE database.
• DSE Graph Loader is a command line utility for loading graph datasets into DSE Graph from various input
sources.
• The CQL COPY TO command mirrors what the PostgreSQL RDBMS uses for file export and import.
You can use COPY in the CQL shell to read CSV data into DSE and write CSV data from DSE to a file system.
Typically, an RDBMS has unload utilities for writing table data to a file system.
• The sstableloader provides the ability to bulk load external data into a cluster.
• DSE Analytics can use Apache Spark to connect to a wide variety of data sources and save the data to DSE
using either the older RDD or newer DataFrame method.
• The DataStax Apache Kafka™ Connector synchronizes records from a Kafka topic with rows in one or more DSE
database tables.
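As a sketch of the COPY round trip described above (the keyspace, table, and column names are hypothetical):

```cql
-- Export table data to CSV, then reload it from the same file
COPY cycling.cyclist_name (id, firstname, lastname)
   TO 'cyclists.csv' WITH HEADER = true;
COPY cycling.cyclist_name (id, firstname, lastname)
   FROM 'cyclists.csv' WITH HEADER = true;
```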
ETL tools
If you need more sophistication applied to a data movement situation than just extract-load, you can use
any number of extract-transform-load (ETL) solutions that support DataStax Enterprise. These tools provide
transformation routines for manipulating source data and then loading the data into a DSE target. The tools offer
features such as visual, point-and-click interfaces, scheduling engines, and more.
Many ETL vendors that support DSE supply free community editions of their products that can address
many different use cases. Enterprise editions are also available.
You can download ETL tools that work with DSE from Talend, Informatica, and Streamsets.
• A lower score applies to nodes that have a large number of dropped mutations and to nodes that have just
started.
On DSE Search nodes, the shard selection algorithm takes into account node proximity and secondary factors such as
active and indexing statuses. You can examine node health scores and indexing status. The indexing status is
INDEXING, FINISHED, or FAILED.
Replication selection for distributed search queries can be configured to consider node health when multiple
candidates exist for a particular token range. This health-based routing enables a trade-off between index
consistency and query throughput. When the primary concern is performance, do not enable health-based routing.
a. Customize node health options, which govern how the node health score ramps up from 0 to 1 (full health):

node_health_options:
    refresh_rate_ms: 50000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30

See node_health_options in dse.yaml. Node health options are always enabled.
b. To enable replication selection for distributed search queries to consider node health, enable health-
based routing:
enable_health_based_routing: true
2. To retrieve a dynamic health score between 0 and 1 that describes the specified DataStax Enterprise node,
use the dsetool node_health command.
If you do not specify the IP address, the default is the local DataStax Enterprise node.
Specify dsetool node_health -all to retrieve the node health scores for all nodes.
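The command forms described above might look like this in practice (output omitted, since the exact format varies by DSE version):

```shell
# Health score for the local node
dsetool node_health

# Health scores for every node in the cluster
dsetool node_health -all
```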
You can also see node health scores with dsetool status.
3. To retrieve the dynamic indexing status (INDEXING, FINISHED, or FAILED) of the specified core on a node,
use the dsetool core_indexing_status command.
For example:
wiki.solr: INDEXING
Tarball installation
To clear all data from the default directories:
$ cd installation_location
Chapter 10. Planning
Hardware selection, estimating disk capacity, anti-patterns, cluster testing, and more.