Splunk, What’s in My Data?!

An absurdly brief document on discovering interesting things

David Carasso

In each of the examples below, replace “…” with whatever search you want to pull events from. If you want to understand a particular source, replace “…” with “source=<path to source>”. If you want to understand fatal errors, “…” might be replaced with the keyword “fatal”.
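
For instance, taking the Log_Level search from the Field Overview section and restricting it to fatal errors, the substitution might look like this (a sketch only; it assumes your events carry Log_Level and Component fields):

fatal | top Log_Level by Component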

Event Overview

Show 5 examples from each cluster, from most common cluster to least:

…| cluster labelonly=t showcount=t

| sort -cluster_count, cluster_label, _time

| dedup 5 cluster_label

Group events by most common first 5 punctuation characters:

…| rex field=punct "(?P<smallpunct>.{5})"

| eval smallpunct= "*" + smallpunct

| stats first(_raw) as example count by smallpunct

| sort -count

Field Overview

Show patterns of co-occurring fields. 1.0 means two fields always co-occur:

…| fields - date* source* time*

| correlate

Use 'top' to look at the most common Log_Level values for each Component:

… | top Log_Level by Component

Alternatively, we can build a contingency table to more conveniently see the patterns:

… | contingency Log_Level Component maxcols=5

The associate command can be used to automatically deduce conclusions:

… | associate Log_Level Component

Transaction Bursts

One of the most useful ways to see transactions, or higher-level events, is to look for pauses in your events, as real-world events often happen in bursts. Here we see the most common pauses between events:

…| delta _time as timedelta

| top timedelta

Now let's group events into transactions, splitting when there's a pause of more than 2 seconds, and then filtering out all the transactions of size 1:

…| transaction maxpause=2s

| search eventcount > 1

Incidentally, transactions of size one are also interesting, as they're often longer-running, single, higher-level commands:

…| transaction maxpause=2s

| search eventcount = 1

Anomalies

Very often you want to find “problems” in your IT data, but you don’t know what to look for.

Look for events that do not cluster into large groups, smallest clusters first:

…| cluster showcount=true

| sort cluster_count

Find unexpected events by looking for field values that are rare or far from the mean, measured in standard deviations. The delay field is used here:

… | anomalousvalue delay action=filter pthresh=0.02

Find unexpected values based on past context:

… | anomalies

Once you have searches that find unexpected events, you can set alerts for them. You can also combine events together into ‘transactions’, and look for anomalies in groups of events.
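
As a rough sketch of that last idea, you might group events with the same 2-second pause used in the Transaction Bursts section, then look for transactions whose length is unusual. This relies on the duration field that the transaction command adds to each transaction; the thresholds are simply carried over from the examples above and would need tuning for your data:

…| transaction maxpause=2s

| anomalousvalue duration action=filter pthresh=0.02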

Graphing

One of the most obvious methods of discovering what your data is saying is to simply graph your data. The problem is that most people don’t graph enough. Graph lots and lots of fields against other fields. Delay by time, delay by IP, max(delay) by IP, min(delay) by referrer, count(referrer) by day of week, time of day, delay by state, etc.
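
A few of these might be sketched as follows. The field names are assumptions: delay is the example field used in the Anomalies section, while clientip and referrer stand in for whatever your IP and referrer fields are actually called. date_wday is the day-of-week field Splunk normally extracts from the event timestamp.

Delay by time:

…| timechart avg(delay)

max(delay) by IP:

…| chart max(delay) by clientip

count(referrer) by day of week:

…| chart count(referrer) by date_wday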

Eventually you’ll tease out little nuggets of knowledge.