Search Head Clustering
Basics To Best Practices
Bharath Aleti | Product Manager, Splunk
Manu Jose | Sr. Software Engineer, Splunk
September 2017 | Washington, DC
Forward-Looking Statements
During the course of this presentation, we may make forward-looking statements regarding future events or
the expected performance of the company. We caution you that such statements reflect our current
expectations and estimates based on factors currently known to us and that actual events or results could
differ materially. For important factors that may cause actual results to differ from those contained in our
forward-looking statements, please review our filings with the SEC.
The forward-looking statements made in this presentation are being made as of the time and date of its live
presentation. If reviewed after its live presentation, this presentation may not contain current or accurate
information. We do not assume any obligation to update any forward-looking statements we may make. In
addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only and shall not be incorporated into any contract
or other commitment. Splunk undertakes no obligation either to develop the features or functionality
described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in
the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. © 2017 Splunk Inc. All rights reserved.
Agenda
▶︎ What is Search Head Clustering?
▶︎ Clustering Internals
▶︎ Distributed Scheduling
▶︎ Configuration Management
▶︎ Bundle Replication
▶︎ What’s New in SHC
Search Head Clustering Overview
What is Search Head Clustering?
Search Head Clustering
The ability to group search heads into a cluster in order to provide
highly available and scalable search services
Business Benefits of SHC
▶︎ Horizontal scaling
▶︎ Consistent user experience
▶︎ Always-on search services
▶︎ Easy to add / manage premium content (apps)
Clustering Internals
How does SHC work?
SHC – How Does It Work?
1. Group search heads into a cluster (horizontal scaling)
2. Captain gets elected dynamically (no single point of failure)
3. User-created reports and dashboards are automatically replicated to the other
search heads (consistent configuration)
Search Head Cluster Bring Up
1. Bootstrap the captain
2. Bring up the members
3. Captain establishes authority
4. Members join/register
5. Scale or shrink the cluster via the CLI
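The bring-up steps map onto the Splunk CLI roughly as follows; host names, ports, and the secret are illustrative, so verify the flags against the documentation for your Splunk version:

```shell
# Initialize clustering on every member, then restart it
splunk init shcluster-config -mgmt_uri https://sh1.example.com:8089 \
    -replication_port 34567 -secret <shared_secret> -shcluster_label shcluster1
splunk restart

# Bootstrap the first captain from the initial member list
splunk bootstrap shcluster-captain \
    -servers_list "https://sh1.example.com:8089,https://sh2.example.com:8089,https://sh3.example.com:8089"

# Scale: run on the instance being added, pointing at any existing member
splunk add shcluster-member -current_member_uri https://sh1.example.com:8089

# Shrink: remove a dead node (run on the captain)
splunk remove shcluster-member -mgmt_uri https://sh4.example.com:8089
```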
Dynamic Captain & Auto Failover
▶︎ Raft consensus protocol from Stanford
• Diego Ongaro & John Ousterhout
▶︎ SHC uses Raft for leader election (LE) and artifact state
▶︎ Auto failover: on captain failure, the members elect a new captain, which fixes up
running jobs, alerts, and search artifacts, and rebalances the search load
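Raft-style leader election requires a strict majority of the configured members, which is why the best practice of running at least 3 members matters. A minimal sketch of the quorum rule (not Splunk code):

```python
def has_quorum(up_members: int, cluster_size: int) -> bool:
    """Raft-style leader election requires votes from a strict
    majority of the configured cluster, not just of the live nodes."""
    return up_members > cluster_size // 2

# A 3-member cluster tolerates one failure; a 2-member cluster tolerates none.
```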
Controlling Captaincy
▶︎ Captain switching should be extremely rare
▶︎ Repair problems by transferring captaincy, without restarts
▶︎ A rolling restart initiated from the captain keeps that node as captain after the restarts
▶︎ Captain preference can be set per member
▶︎ Disaster recovery using static captaincy
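Captaincy can be controlled from the CLI; the URIs below are illustrative, and the flags should be checked against the docs for your version:

```shell
# Transfer captaincy to a specific member without any restarts
splunk transfer shcluster-captain -mgmt_uri https://sh2.example.com:8089

# Rolling restart, initiated from the captain (captaincy is preserved)
splunk rolling-restart shcluster-members

# Disaster recovery: convert to a static captain (dynamic election disabled)
splunk edit shcluster-config -mode captain \
    -captain_uri https://sh1.example.com:8089 -election false
```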
Best Practices
▶︎ Add only fresh instances; if a node is re-purposed, run "splunk clean all" first
▶︎ High availability requires a minimum of 3 members
▶︎ Run all search heads on homogeneous hardware and at the same version
▶︎ Number of instances >= replication_factor
▶︎ To remove a dead node, an admin must run "splunk remove shcluster-member" on
the captain
▶︎ In multi-site clusters, place a majority of the nodes at one site
Distributed Scheduling
How are jobs scheduled in SHC?
Job Scheduling Orchestration
▶︎ Captain is the job scheduler
• Eliminates the need for a dedicated job server
▶︎ The captain's load balancer distributes jobs to members round-robin or via a
load-based heuristic
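The two distribution strategies can be sketched as follows; the member names and the load metric are illustrative, not Splunk's internal representation:

```python
import itertools

def round_robin(members):
    """Hand out jobs by cycling through members in a fixed order."""
    return itertools.cycle(members)

def least_loaded(running_searches):
    """Load-based heuristic: pick the member with the fewest running searches."""
    return min(running_searches, key=running_searches.get)

# Round-robin hands sh1, sh2, sh3, sh1, ... in turn;
# the load-based heuristic picks whichever member is least busy right now.
scheduler = round_robin(["sh1", "sh2", "sh3"])
```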
Job Scheduling
▶︎ Auto-failover: the new captain becomes the scheduler
▶︎ captain_is_adhoc_searchhead knob to reduce captain load
▶︎ Captain updates report acceleration (RA) and data model (DM) summaries on the indexers
▶︎ Scheduler limits are honored across the cluster
▶︎ Real-time scheduled searches run as one instance across the cluster
▶︎ Centralized user quota management
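The captain_is_adhoc_searchhead knob is a server.conf setting; a minimal illustrative stanza (confirm the setting name and default in the spec file for your version):

```ini
# server.conf (illustrative); removes the captain from the ad hoc search pool
[shclustering]
captain_is_adhoc_searchhead = true
```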
High Availability Of Search Results
▶︎ Artifacts are replicated across the SH members
▶︎ Ad hoc search artifacts are not replicated
▶︎ At least replication_factor nodes must be in the UP state to enforce the
replication policy
▶︎ Replicated directories start with "rsa_<sid>" in the dispatch directory
▶︎ Captain orchestrates reaping of search artifacts from the dispatch directory
of all members
▶︎ An artifact is served, in order of availability, from (1) the member itself, (2) the
search-originating node, (3) the captain
Centralized Cluster State
▶︎ Captain maintains a global view of alerts and suppressions and pushes the
list to all members
▶︎ Captain registers all ad hoc searches run in the cluster
▶︎ Captain orchestrates reaping of search artifact replicas
▶︎ GET /services/search/jobs requests on any member are proxied to the captain to
return the complete job list
Configuration Management
How are dynamic changes to SHC kept consistent?
Configuration Files
▶︎ Goals
• Consistent user experience across all search heads
• Changes made on one member are reflected on all members
▶︎ Types of Configuration Files
• custom user content
• reports
• dashboards
• search-time knowledge
• field extractions
• event types
• macros
• system configurations
• inputs, forwarding, authentication
Configuration Changes
▶︎ Users customize search and UI configurations via UI/CLI/REST
• save a report
• add a panel to a dashboard
• create a field extraction
▶︎ Administrators modify system configurations
• configure forwarding
• deploy centralized authentication (e.g. LDAP)
• install an entirely new app or hand-edited configuration
Search And UI Configurations
▶︎ Goal: Eventual Consistency
▶︎ Changes to search and UI configurations are replicated across the search
head cluster automatically
Conf Replication - Workflow
(diagram: a change to my_dashboard.xml on one member is replicated to all other cluster members)
Conf Replication – Progress Check
▶︎ Captain keeps track of the conf replication progress of each SHC member
▶︎ Example status (console output):
• https://localhost:11089: baseline 18bc830e3087301900bdf2a30dc1a67bf8, op 318ced, Tue Jul 19 15:32:56 2016
• https://localhost:8089: baseline 18bc830e3087301900bdf2a30dc1a67bf8, op 318ced, Tue Jul 19 15:32:52 2016
• https://localhost:8189: baseline dc4a991d168ae746f27979212253d6fb95, op 9fc92c, Fri Jul 1 13:51:05 2016 (out of sync)
• https://localhost:9089: CaptainDummyOpId, Tue Jul 19 15:32:09 2016 (captain)
Bundle Replication
How are system-wide changes kept consistent?
System Configurations
▶︎ Recall: only changes to search and UI configurations are replicated across the
search head cluster automatically
▶︎ Changes to system configurations are not replicated automatically because of
their high potential impact
▶︎ How are system configurations kept consistent, then?
Configuration Deployment
▶︎ Deployer: a single, well-controlled instance outside of the cluster
▶︎ Configurations should be tested on dev/QA instances prior to deployment
Bundle Push
1. Initial push: all apps and configuration under /etc/shcluster on the deployer are
shipped to the search heads.
2. Subsequent pushes: only updated apps and updated user configuration are sent
(e.g. /etc/shcluster/app1: no changes; /etc/shcluster/app2: updated;
/etc/shcluster/user: updated).
3. App configuration is propagated to all SHC members (members A, B, and C each
receive /etc/app2).
4. User configuration is sent to the captain and then replicated to the remaining
SHC members (/etc/user).
5. Periodically, the captain checks for new bundles and propagates them to the
indexers; each indexer (Idx1 to Idx4) tracks the knowledge bundle (KB) checksums
it holds (cksum1, cksum2).
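The push itself is a single command run on the deployer; the target can be any SHC member, and the URI and credentials here are illustrative:

```shell
# On the deployer: push everything staged under $SPLUNK_HOME/etc/shcluster
splunk apply shcluster-bundle -target https://sh1.example.com:8089 -auth admin:changeme
```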
Bundle Replication
1. Each bundle push to the indexers includes a knowledge bundle (KB) checksum.
2. Each search head periodically contacts the cluster master (CM) to grab generation
and peer-set information, and asynchronously tracks the latest common knowledge
bundle across the peers (e.g. Idx1: cksum2, Idx2: cksum2, Idx3: cksum3).
3. The captain delegates a scheduled search to a member (e.g. SH B).
4. SH B determines the latest KB shared across the peers (cksum2).
5. If the indexers do not have a common bundle:
• Best-effort search uses the common bundle across the largest subset of indexers
and excludes the other indexers
• Otherwise, a synchronous bundle replication is kicked off prior to the search
6. The search request is issued with the common bundle checksum (cksum2).
7. The indexers run the search using the knowledge bundle (cksum2) named in the
search request.
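The best-effort selection in step 5 can be sketched as follows; the indexer names and checksums are illustrative, not Splunk's actual data structures:

```python
from collections import Counter

def pick_search_bundle(latest_bundle):
    """Best-effort search: given each indexer's latest knowledge-bundle
    checksum, use the checksum shared by the largest subset of indexers
    and exclude the rest from the search."""
    counts = Counter(latest_bundle.values())
    best, _ = counts.most_common(1)[0]
    searchable = sorted(i for i, c in latest_bundle.items() if c == best)
    excluded = sorted(set(latest_bundle) - set(searchable))
    return best, searchable, excluded
```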
SH->SHC Migration
▶︎ Single search head: app configuration lives in /etc/app1/default/dashboard1.xml
and /etc/app1/local/dashboard2.xml
▶︎ Copy it to the deployer staging area: /etc/shcluster/app1/default/dashboard1 and
/etc/shcluster/app1/local/dashboard2
▶︎ A bundle push from the deployer propagates /etc/app1 to the captain and all SHC
members, with both dashboards merged under /etc/app1/default
▶︎ Deployer merges default and local app configuration during migration
▶︎ Post migration, users cannot perform certain operations on app settings like
delete, move or unshare since default settings are immutable by a user
▶︎ Tip: exclude default apps (e.g. search) during migration to avoid overwriting them.
Migrate any custom settings in default apps by moving them to a new app
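A migration sketch under the assumption that app1 is the only app being carried over; all paths and credentials are illustrative:

```shell
# On the deployer: stage the app copied from the standalone search head
cp -r /path/from/old_sh/etc/apps/app1 $SPLUNK_HOME/etc/shcluster/apps/

# Push it to the cluster; default and local are merged during deployment
splunk apply shcluster-bundle -target https://sh1.example.com:8089 -auth admin:changeme
```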
Recent Additions
What’s New in SHC?
SHC Health Checker
Goal: Improve diagnosability with actionable information
▶︎ High level cluster health assessment
▶︎ Display node status
• Captain/member
• Heartbeat status
• Uptime
• Local unpublished conf changes
▶︎ Determine conf replication baseline consistency
▶︎ Expose search concurrency limits (running/capacity)
Conf Replication - Health Check
Resilient Conf Replication
▶︎ Higher resiliency to ensure continuous replication of knowledge objects across
the SHC members
• Conf replication failures when JSON string exceeds 512KB
• Long file path (>255 characters) leading to snapshot creation failure
• Large lookup files may block configuration push from the members
• Accelerated baseline match using bloom filters to find the common baseline
▶︎ Intelligent captain selection
• Prevent out-of-sync SHC member from becoming captain
Bundle Push/Replication Improvements
▶︎ Delta bundle push to indexers on lookup deletes at runtime
• Trigger delta bundle replication when conf objects are deleted
▶︎ Deployer directs the first bundle push to the captain node
• Pushing to the captain enables a faster bundle push down to the indexers
▶︎ Replicate option for lookup replication across SHC members
• replicate = true|false in transforms.conf
• true: lookup table is replicated to the indexers
• false: lookup table is replicated only within the SHC, not to the indexers
• Avoids the limitation of not replicating outputcsv (used to capture search results):
use outputlookup to create a new csv file and replicate it to the SHs and indexers as needed
• Target use case is ES tracker tables, which are replicated only to SHC members
▶︎ Support MV fields in outputlookup
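The replicate option is set per lookup in transforms.conf; the stanza and file names here are illustrative:

```ini
# transforms.conf
[my_lookup]
filename = my_lookup.csv
# false: replicate only within the SHC, not to the indexers
# (e.g. for SHC-local state such as ES tracker tables)
replicate = false
```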
SHC Manager UI
▶︎ New SHC UI available from any of the SHC members
▶︎ Enabled only in SHC environments
▶︎ Enables admins to run cluster operations (rolling restart, captain transfer)
▶︎ More functionality to come in upcoming releases
Key Takeaways
1. SHC provides always-on search services and a consistent user experience
2. Enable SHC for horizontal scalability
3. Recent additions: SHC health check (6.5), increased conf replication
resiliency (6.6), SHC manager UI (6.6)
Thank You
Don't forget to rate this session in the
.conf2017 mobile app