Splunk for SQL Users      BY CARASSO

A quick concept mapping to get you started

Splunk is a powerful, scalable search engine and data store. If you are familiar with SQL this quick comparison might be helpful to jump-starting you into using Splunk. First off, Splunk is not a database, in the normative sense -- relational databases require that all table columns be defined up-front and they don't automatically scale by just plugging in new hardware -- but there are analogs to many of the concepts in the database world. It's not a perfect mapping, but it should be enough to get you started.

In database terms, Splunk is a distributed, non-relational, semi-structured database with an implicit time dimension.

Concepts

DB ConceptSplunk ConceptNotes
SQL query Splunk search A Splunk search retrieves indexed data and can perform transforming and reporting operations. Results from one search can be "piped" (i.e., transferred) from command to command, to filter, modify, reorder, and group your results.
table / view search results Search results can be thought of as a database view, a dynamically generated table of rows, with columns.
index index All values and fields are indexed in Splunk, so there is no need to manually add, update, drop, or even think about indexing columns. Everything can be quickly retrieved automatically.
row result / event A result in Splunk is a list of field (i.e., column) values, corresponding to a table row. An event is a result that has a timestamp and raw text. Typically in event is a record from a log file, such as:
    173.26.34.223 - - [01/Jul/2009:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953
column field Fields in Splunk are dynamically returned from a search, meaning that one search might return a set of fields, while another search might return another set. After teaching Splunk how to extract out more fields from the raw underlying data, the same search will return more fields that it previously did. Fields in Splunk are not tied to a datatype.
database / schema index / app In Splunk, an index is a collection of data, somewhat like a database has a collection of tables. Domain knowledge of that data, how to extract it, what reports to run, etc, are stored in a Splunk app.

From SQL to Splunk

Note:
SQL Feature SQL Example Splunk Example
SELECT * SELECT *
FROM mytable
source=mytable
WHERE SELECT *
FROM mytable
WHERE mycolumn=5
source=mytable mycolumn=5
SELECT SELECT mycolumn1, mycolumn2
FROM mytable
source=mytable
| FIELDS mycolumn1, mycolumn2
AND / OR SELECT *
FROM mytable
WHERE (mycolumn1="true" OR mycolumn2="red") AND mycolumn3="blue"
source=mytable
AND (mycolumn1="true" OR mycolumn2="red")
AND mycolumn3="blue"
AS (alias) SELECT mycolumn AS column_alias
FROM mytable
source=mytable
| RENAME mycolumn as column_alias
| FIELDS column_alias
BETWEEN SELECT *
FROM mytable
WHERE mycolumn
BETWEEN 1 AND 5
source=mytable mycolumn<=1 mycolumn>=5
GROUP BY SELECT mycolumn, avg(mycolumn)
FROM mytable
WHERE mycolumn=value
GROUP BY mycolumn
source=mytable mycolumn=value
| STATS avg(mycolumn) BY mycolumn
| FIELDS mycolumn, avg(mycolumn)
HAVING SELECT mycolumn, avg(mycolumn)
FROM mytable
WHERE mycolumn=value
GROUP BY mycolumn
HAVING avg(mycolumn)=value
source=mytable mycolumn=value
| STATS avg(mycolumn) BY mycolumn
| SEARCH avg(mycolumn)=value
| FIELDS mycolumn, avg(mycolumn)
LIKE SELECT *
FROM mytable
WHERE mycolumn LIKE "%some text%"
source=mytable mycolumn="*some text*"

Note: The most common search usage in Splunk is actually something that is nearly impossible in SQL -- to search all fields for a substring. The following search will return all rows that contain "some text" anywhere.


    source=mytable "some text"
ORDER BY SELECT *
FROM mytable
ORDER BY mycolumn desc
source=mytable
| SORT -mycolumn
SELECT DISTINCT SELECT DISTINCT mycolumn1, mycolumn2
FROM mytable
source=mytable
| DEDUP mycolumn1
| FIELDS mycolumn1, mycolumn2
SELECT TOP SELECT TOP 5 mycolumn1, mycolumn2
FROM mytable
source=mytable
| TOP mycolumn1, mycolumn2
INNER JOIN SELECT *
FROM mytable1
INNER JOIN mytable2
ON mytable1.mycolumn=mytable2.mycolumn
source=mytable1
| JOIN type=inner mycolumn [ SEARCH source=mytable2 ]

Note: Joins in Splunk can be achieved as above, or by two other methods:
  • with the lookup operator to look on additional values given others:
    ... | LOOKUP myvaluelookup mycolumn OUTPUT myoutputcolumn
    source=mytable1
    [
      SEARCH source=mytable2 mycolumn2=myvalue
      | FIELDS mycolumn2
    ]
LEFT (OUTER) JOIN SELECT *
FROM mytable1
LEFT JOIN mytable2
ON mytable1.mycolumn=mytable2.mycolumn
source=mytable1
| JOIN type=left mycolumn [ SEARCH source=mytable2 ]
SELECT INTO SELECT *
INTO new_mytable IN mydb2
FROM old_mytable
source=old_mytable
| EVAL source=new_mytable
| COLLECT index=mydb2

Note: COLLECT is typically used to store expensively calculated fields back into Splunk so that future access is much faster. This current example is atypical but shown for comparison with SQL's command. source will be renamed orig_source
TRUNCATE TABLE TRUNCATE TABLE mytable source=mytable
| DELETE
INSERT INTO INSERT INTO mytable
VALUES (value1, value2, value3,....)
Note: see SELECT INTO. Individual records are not added via the search language, but can be added via the API if need be.
UNION SELECT mycolumn
FROM mytable1
UNION
SELECT mycolumn FROM mytable2
source=mytable1
| APPEND [ SEARCH source=mytable2]
| DEDUP mycolumn
UNION ALL SELECT *
FROM mytable1
UNION ALL
SELECT * FROM mytable2
source=mytable1
| APPEND [ SEARCH source=mytable2]
DELETE DELETE FROM mytable
WHERE mycolumn=5
source=mytable1 mycolumn=5
| DELETE
UPDATE UPDATE mytable
SET column1=value, column2=value,...
WHERE some_column=some_value
Note: There are a few things to think about when updating records in Splunk. First, you can just add the new values into Splunk (see INSERT INTO) and not worry about deleting the old values, because Splunk always returns the most recent results first. Second, on retrieval, you can always de-duplicate the results to ensure only the latest values are used (see SELECT DISTINCT). Finally, you can actually delete the old records (see DELETE).

Further Reading