Splunk for SQL Users BY CARASSO

A quick concept mapping to get you started

Splunk is a powerful, scalable search engine and data store. If you are familiar with SQL this quick comparison might be helpful to jump-starting you into using Splunk. First off, Splunk is not a database, in the normative sense -- relational databases require that all table columns be defined up-front and they don't automatically scale by just plugging in new hardware -- but there are analogs to many of the concepts in the database world. It's not a perfect mapping, but it should be enough to get you started.

In database terms, Splunk is a distributed, non-relational, semi-structured database with an implicit time dimension.

Concepts

DB Concept	Splunk Concept	Notes
SQL query	Splunk search	A Splunk search retrieves indexed data and can perform transforming and reporting operations. Results from one search can be "piped" (i.e., transferred) from command to command, to filter, modify, reorder, and group your results.
table / view	search results	Search results can be thought of as a database view, a dynamically generated table of rows, with columns.
index	index	All values and fields are indexed in Splunk, so there is no need to manually add, update, drop, or even think about indexing columns. Everything can be quickly retrieved automatically.
row	result / event	A result in Splunk is a list of field (i.e., column) values, corresponding to a table row. An event is a result that has a timestamp and raw text. Typically in event is a record from a log file, such as: `173.26.34.223 - - [01/Jul/2009:12:05:27 -0700] "GET /trade/app?action=logout HTTP/1.1" 200 2953`
column	field	Fields in Splunk are dynamically returned from a search, meaning that one search might return a set of fields, while another search might return another set. After teaching Splunk how to extract out more fields from the raw underlying data, the same search will return more fields that it previously did. Fields in Splunk are not tied to a datatype.
database / schema	index / app	In Splunk, an index is a collection of data, somewhat like a database has a collection of tables. Domain knowledge of that data, how to extract it, what reports to run, etc, are stored in a Splunk app.

From SQL to Splunk

Note:

In the below examples, we use the value of the Splunk field "source" as a proxy for "table". In Splunk, "source" is the name of the file, stream, or other input from which a particular piece of data originates, for example /var/log/messages or UDP:514. In truth, we could use any field, but "source" feels most appropriate for these examples.

When translating from any language to another, often the translation is longer because of idioms in the original language. Needless to say, some of the Splunk search examples shown below could be more concise, but for parallelism and clarity, the table and field names are kept the same from the sql. In addition, searches rarely need the FIELDS command to filter out columns as the user interface provides a more convenient method; and you never have to use "AND" in boolean searches, as they are implied between terms.

SQL Feature	SQL Example	Splunk Example
SELECT *	SELECT * FROM mytable	source=mytable
WHERE	SELECT * FROM mytable WHERE mycolumn=5	source=mytable mycolumn=5
SELECT	SELECT mycolumn1, mycolumn2 FROM mytable	source=mytable \| FIELDS mycolumn1, mycolumn2
AND / OR	SELECT * FROM mytable WHERE (mycolumn1="true" OR mycolumn2="red") AND mycolumn3="blue"	source=mytable AND (mycolumn1="true" OR mycolumn2="red") AND mycolumn3="blue"
AS (alias)	SELECT mycolumn AS column_alias FROM mytable	source=mytable \| RENAME mycolumn as column_alias \| FIELDS column_alias
BETWEEN	SELECT * FROM mytable WHERE mycolumn BETWEEN 1 AND 5	source=mytable mycolumn<=1 mycolumn>=5
GROUP BY	SELECT mycolumn, avg(mycolumn) FROM mytable WHERE mycolumn=value GROUP BY mycolumn	source=mytable mycolumn=value \| STATS avg(mycolumn) BY mycolumn \| FIELDS mycolumn, avg(mycolumn)
HAVING	SELECT mycolumn, avg(mycolumn) FROM mytable WHERE mycolumn=value GROUP BY mycolumn HAVING avg(mycolumn)=value	source=mytable mycolumn=value \| STATS avg(mycolumn) BY mycolumn \| SEARCH avg(mycolumn)=value \| FIELDS mycolumn, avg(mycolumn)
LIKE	SELECT * FROM mytable WHERE mycolumn LIKE "%some text%"	source=mytable mycolumn="some text" Note: The most common search usage in Splunk is actually something that is nearly impossible in SQL -- to search all fields for a substring. The following search will return all rows that contain "some text" anywhere. source=mytable "some text"
ORDER BY	SELECT * FROM mytable ORDER BY mycolumn desc	source=mytable \| SORT -mycolumn
SELECT DISTINCT	SELECT DISTINCT mycolumn1, mycolumn2 FROM mytable	source=mytable \| DEDUP mycolumn1 \| FIELDS mycolumn1, mycolumn2
SELECT TOP	SELECT TOP 5 mycolumn1, mycolumn2 FROM mytable	source=mytable \| TOP mycolumn1, mycolumn2
INNER JOIN	SELECT * FROM mytable1 INNER JOIN mytable2 ON mytable1.mycolumn=mytable2.mycolumn	source=mytable1 \| JOIN type=inner mycolumn [ SEARCH source=mytable2 ] Note: Joins in Splunk can be achieved as above, or by two other methods: with the lookup operator to look on additional values given others: ... \| LOOKUP myvaluelookup mycolumn OUTPUT myoutputcolumn with subsearches, searches that run other searches: source=mytable1 [ SEARCH source=mytable2 mycolumn2=myvalue \| FIELDS mycolumn2 ]
LEFT (OUTER) JOIN	SELECT * FROM mytable1 LEFT JOIN mytable2 ON mytable1.mycolumn=mytable2.mycolumn	source=mytable1 \| JOIN type=left mycolumn [ SEARCH source=mytable2 ]
SELECT INTO	SELECT * INTO new_mytable IN mydb2 FROM old_mytable	source=old_mytable \| EVAL source=new_mytable \| COLLECT index=mydb2 Note: COLLECT is typically used to store expensively calculated fields back into Splunk so that future access is much faster. This current example is atypical but shown for comparison with SQL's command. source will be renamed orig_source
TRUNCATE TABLE	TRUNCATE TABLE mytable	source=mytable \| DELETE
INSERT INTO	INSERT INTO mytable VALUES (value1, value2, value3,....)	Note: see SELECT INTO. Individual records are not added via the search language, but can be added via the API if need be.
UNION	SELECT mycolumn FROM mytable1 UNION SELECT mycolumn FROM mytable2	source=mytable1 \| APPEND [ SEARCH source=mytable2] \| DEDUP mycolumn
UNION ALL	SELECT * FROM mytable1 UNION ALL SELECT * FROM mytable2	source=mytable1 \| APPEND [ SEARCH source=mytable2]
DELETE	DELETE FROM mytable WHERE mycolumn=5	source=mytable1 mycolumn=5 \| DELETE
UPDATE	UPDATE mytable SET column1=value, column2=value,... WHERE some_column=some_value	Note: There are a few things to think about when updating records in Splunk. First, you can just add the new values into Splunk (see INSERT INTO) and not worry about deleting the old values, because Splunk always returns the most recent results first. Second, on retrieval, you can always de-duplicate the results to ensure only the latest values are used (see SELECT DISTINCT). Finally, you can actually delete the old records (see DELETE).

Splunk for SQL Users BY CARASSO

A quick concept mapping to get you started

Concepts

From SQL to Splunk

Further Reading