Splunk PDF
I too have used both quite extensively, and for a time I too would have gone with Splunk as my preferred option; however, that is no longer the case, and the reasons are explained thoroughly below. I have used both Splunk and ArcSight in analyst, admin, content creation, and engineering roles:
Content Creation:
Splunk's unstructured approach to data ingestion is marvelous for truly unique data sources (home-brew applications, or really lazy system admins). Yet this presents challenges when it comes to correlation, which is explained further down. The ability to create simple rules, simple reports, and simple interactive dashboards is better in Splunk than in ArcSight, which is a great confidence booster. However, it's the ability to correlate, or rather the simplicity with which ESM can create an advanced correlation rule (advanced for Splunk, not so much for ESM), where ESM truly shines.
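To give a feel for what that means in practice, here is a rough sketch of a brute-force-style correlation in SPL; the index, sourcetype, and field names are made up and would need to match your own data:

index=security sourcetype=auth (action=failure OR action=success)
| stats count(eval(action="failure")) as failures, count(eval(action="success")) as successes by src, user
| where failures > 5 AND successes > 0

In ESM, the equivalent logic is assembled in the rule editor rather than typed out as a query.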
While ESM does employ the Java-based console, and while it may look like it's straight out of 1997, the same task that in Splunk would require almost two paragraphs' worth of a query can be accomplished in ESM in about 7-to-8 clicks of a mouse. Don't even get me started on active lists and session lists, along with the ability to have rules dynamically update them. The usage of filters in ESM is likewise exponentially more powerful than in Splunk. I could create a search macro in Splunk (akin to an ESM filter), but having to remember its name and view a totally separate web page, when I could more simply select "Filters" from my Navigator pane in ESM, only adds to the time wasted creating content in Splunk. Plus, these filters in ESM can be used in interactive dashboards, reports, trends, etc.
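For comparison, here is a rough sketch of what a Splunk search macro looks like in macros.conf; the macro name and search string are made up for illustration:

# macros.conf (hypothetical zero-argument macro)
[failed_logins]
definition = index=security sourcetype=auth action=failure

You then have to invoke it by name in backticks, e.g. `failed_logins` | stats count by user, which is exactly the remember-the-name overhead I mean.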
Analyst:
Free-text, "Google-like" search capabilities actually exist in both Splunk and ESM, albeit in ESM via the web interface (aka Command Center) and Logger (which is more similar to Splunk than ESM is). Splunk does pull ahead here, but IMHO not far beyond ArcSight. Also, in ESM, if an item is of interest, I can simply right-click and investigate, or leverage an integration command to conduct a whole host of activities from a single pane of glass. Since Logger is actually the more Splunk-like product, I would be doing you all a great disservice by not mentioning the speed of super-indexed fields within Logger. I used to not be such a big Logger fan, but if I were stuck in the trenches of a cyberwar, I would go with Logger hands down, simply for the super-indexed search.
Case management is built into ESM, so tracking the progress of an investigation is straightforward. Splunk gets ZERO points here. The right-click investigate capability really allows an analyst to follow the breadcrumbs during either the initial triage stages or hunting. The integration commands open up worlds of possibilities: PCAP retrieval, blocking of IPs, kicking off Cuckoo; hell, you can even have an integration command to brew coffee!
Admin:
Both Splunk and ArcSight offer great administration capabilities, and this category would be evenly split if it weren't for one huge fact: you can delete logs in Splunk. Granted, it takes deliberate reconfiguration, but this ability alone makes Splunk unworthy of being a log management solution. The most important aspect of any log management solution is its admissibility in a court of law if necessary. On that count, Splunk fails.
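To be clear about the mechanism: Splunk ships a delete command, runnable only by users with the can_delete role; the index and host below are made up:

index=web_logs host=compromised-host | delete

This marks the matching events as unsearchable rather than reclaiming disk space, but either way the record is gone from search, which is exactly the admissibility problem.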
Engineering:
Finally, the title bout. Splunk does pull ahead in simplicity (a byproduct of its unstructured data approach): there are fewer moving parts and thus fewer things to go wrong. However, with ArcSight SmartConnectors you are able to successfully parse over 400 sources natively. Even if a SmartConnector hasn't been designed for your log source, you still have FlexConnectors that are able to ingest, parse, and normalize XML, CSV, JSON, syslog, regex-defined, and database sources; once that information is ingested and structured, searching that data (especially on super-indexed fields in Logger) is at least 3-5 times faster than in Splunk. ArcSight also has a product known as ArcMC, which is used to maintain, upgrade, and administer not just all of these Smart/FlexConnectors but Logger as well. Want to send logs to a new destination in structured CEF format? Just add it. In fact, this is the ONLY way we were able to get Splunk to mirror a fraction of ESM's correlation capabilities. Now, if I need ArcSight to make Splunk work effectively, why the hell am I going to pay for both?!
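For anyone who has not run into it, CEF is a pipe-delimited header (version, vendor, product, device version, event class ID, name, severity) followed by key=value extensions; here is a made-up example event:

CEF:0|ArcSight|Logger|6.0|100|Login failed|5|src=10.0.1.5 dst=10.0.1.9 suser=jdoe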
Which brings me to cost. After looking at the licensing structure for both, ArcSight pulls ahead; this is due to the duplicitous nature of Splunk's licensing. I think we can agree that licensing should be based on what you use on average, right? Why should I pay for my peak times? What happens if there is a DDoS and my peak data ingest rate skyrockets? What if my log collection grows to a metric fμck ton? Then I am going to be paying out the wazoo for Splunk. With ArcSight, at least I can filter and aggregate on my SmartConnectors BEFORE the data is ingested. Now I can streamline my data feeds, get rid of what I don't want, and aggregate items to use less space and create higher-fidelity alerts. Oh, and I can manage all of that from ArcMC. Just sayin'…
What is Splunk?
Splunk is 'Google' for our machine-generated data. It's a software engine that can be used for searching, visualizing, monitoring, and reporting on our enterprise data. Splunk takes valuable machine data and turns it into powerful operational intelligence by providing real-time insights into our data through charts, alerts, reports, etc.
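As a minimal sketch of that idea (the index and field names here are hypothetical, not Splunk defaults), a search that charts server errors over time per host could look like:

index=web_logs status>=500 | timechart count by host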
A Splunk heavy forwarder generally works as a remote collector, intermediate forwarder, and possible data filter; since it parses data, it is not recommended for production systems.
Which types of Splunk licenses are there?
• Enterprise license
• Free license
• Forwarder license
• Beta license
• Licenses for search heads (for distributed search)
• Licenses for cluster members (for index replication)
What is a summary index?
A summary index is where Splunk stores the results of scheduled searches over time. If we plan to run a variety of summary index reports, we may need to create additional summary indexes.
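For illustration, a search can write its results into a summary index with the collect command; in this hypothetical sketch, my_summary would be an index we created beforehand:

index=web_logs status>=500 | stats count by host | collect index=my_summary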
Can you write down a general regular expression for extracting the IP
address from logs?
There are multiple ways in which we can extract the IP address from logs. Below are a
few examples:
rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
OR
rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
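As a hypothetical end-to-end usage (the index name is illustrative), we can pipe the extraction into stats to count events per address:

index=firewall_logs
| rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
| stats count by ip_address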
When should we use the transaction command?
The transaction command is most useful in cases like the following (a short sketch follows the list):
• When the unique ID (from one or more fields) alone is not sufficient to discriminate between two transactions. This is the case when the identifier is reused, for example, in web sessions identified by a cookie/client IP. Here, the time span or pauses are also used to segment the data into transactions.
• When an identifier is reused, say in DHCP logs, and a particular message identifies the beginning or end of a transaction.
• When it is desirable to see the raw text of the events combined rather than an analysis of the constituent fields of the events.
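As a sketch of the first case (the session field name and limits are hypothetical), web events can be grouped into sessions like this:

index=web_logs
| transaction JSESSIONID maxspan=30m maxpause=5m
| table JSESSIONID duration eventcount

Here, transaction adds the duration and eventcount fields to each grouped event.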
What are the different types of Splunk buckets?
• Hot: A hot bucket contains newly indexed data. It is open for writing. There can be one or more hot buckets per index.
• Warm: A warm bucket consists of data rolled out from a hot bucket. There are many warm buckets.
• Cold: A cold bucket contains data rolled out from a warm bucket. There are many cold buckets.
• Frozen: A frozen bucket comprises data rolled out from a cold bucket. The indexer deletes frozen data by default, but we can archive it instead. Archived data can later be thawed (data in a frozen bucket is not searchable).
By default, indexed data lives under $SPLUNK_HOME/var/lib/splunk/defaultdb/db; we should see the hot-db there, along with any warm buckets we have. By default, Splunk sets the maximum bucket size to 10 GB on 64-bit systems and 750 MB on 32-bit systems.
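These sizes and locations can be tuned per index in indexes.conf; a hypothetical sketch:

# indexes.conf (the index name and paths are illustrative)
[web_logs]
homePath = $SPLUNK_DB/web_logs/db
coldPath = $SPLUNK_DB/web_logs/colddb
thawedPath = $SPLUNK_DB/web_logs/thaweddb
# auto_high_volume caps hot/warm buckets at 10 GB on 64-bit systems
maxDataSize = auto_high_volume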
What is the difference between the stats and eventstats commands (a short example follows the list)?
• The stats command generates summary statistics of all the existing fields in the search results and saves them as values in new fields.
• The eventstats command is similar to stats, except that the aggregation results are added inline to each event, and only if the aggregation is pertinent to that event. It computes the requested statistics, as stats does, but aggregates them back onto the original raw data.
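A hypothetical side-by-side (index and field names are illustrative):

index=web_logs | stats avg(bytes) as avg_bytes by host

index=web_logs | eventstats avg(bytes) as avg_bytes by host | where bytes > avg_bytes

The first search collapses the results to one row per host; the second keeps every event, annotates it with avg_bytes, and can therefore compare each event's own bytes value against the average.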
How do we reset the Splunk admin password?
The procedure depends on the Splunk version. If we are on version 7.1 or later, we will follow these steps:
• Stop Splunk Enterprise.
• Create a file named user-seed.conf in the following directory:
$SPLUNK_HOME/etc/system/local/
• In the file, add the following stanza (here, in place of 'NEW_PASSWORD', we will add our own new password):
[user_info]
USERNAME = admin
PASSWORD = NEW_PASSWORD
• After that, we can just restart Splunk Enterprise and use the new password to log in.
Now, if we are using a version prior to 7.1, we will follow the below steps:
• Stop Splunk Enterprise.
• Rename the passwd file, found at $SPLUNK_HOME/etc/passwd, to passwd.bk.
• Restart Splunk Enterprise and log in using the default credentials (admin/changeme); we will then be prompted to set a new password.
Note: In case we have created other users earlier and know their login details, copy and paste their credentials from the passwd.bk file into the passwd file and restart Splunk.
What is the fishbucket?
The fishbucket contains seek pointers and CRCs for the files we are indexing, so 'splunkd' can tell whether it has already read them. We can access it through the GUI by searching for:
index=_thefishbucket
How can we exclude some events from being indexed by Splunk?
One way, assuming we want to keep only events containing 'login' from a particular source and discard the rest, is to route everything else to the nullQueue:
• In props.conf:
[source::/var/log/foo]
# index processor
TRANSFORMS-set = setnull,setparsing
• In transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = login
DEST_KEY = queue
FORMAT = indexQueue
Because the transforms are applied in the order listed, setnull first routes every event to the nullQueue, and setparsing then routes the events matching 'login' back to the indexQueue, so only those are indexed.