Simple Analytics with Elasticsearch and Kibana

I’ve recently replaced Splunk Storm with Elasticsearch and Kibana to analyze usage data for my game Wordismic.

Splunk is great, but Elasticsearch is fully open source and works just as well for this use case, so here I’ll show you how to get started.

The Data

The data for this example comes from a social game, but a similar approach can work for many applications. Whenever a player performs an interesting action, such as inviting a friend to play or finishing a game, the servers log an event in a simple JSON format like:

{"event":"player_challenged_friend","timestamp":"2014-12-01T17:16:39.200","player_id":"39cdf2bd4970416a81984389452a450c","opponent_id":"6630666ce8cc47989acb4fd51a22bbae","game_id":"9396479ff22742bcb33c5f64de021df9"}

or

{"event":"player_finished_game","timestamp":"2014-12-01T17:23:15.391","player_id":"dbedc7ce9bc9429ab4780fd8452eb31f","game_id":"8ff8293a4d7740418513d23a8ec176c4","board_id":"487","score":"294"}

The interesting thing to note is that the only two fields guaranteed to be present are timestamp and event. The other fields vary depending on the event type, so the data is (for the most part) schemaless.

The details of how exactly events are logged and collected are not important here; the point is that we have some data stored in flat files (rotated every day, hour, or other time period) where each line is a JSON object as shown above.

(It would also be possible to log events directly to Elasticsearch, but storing data in flat files first provides decoupling, making it easier to switch tools later – which is exactly what I did when replacing Splunk Storm with Elasticsearch.)

Importing into Elasticsearch

Assuming Elasticsearch (I’m using v1.4.2) has been set up and is running locally, first of all we need to create an index in which to store our events. This can be done with a POST request to the API, for example from the command line:

curl -X POST --data-binary @- localhost:9200/wordismic <<EOF
{
  "mappings": {
    "event": {
      "_timestamp": {
        "enabled": true,
        "path": "timestamp"
      }
    }
  }
}
EOF

The meaning of this request is: create a new index called wordismic and enable automatic timestamp indexing for documents of type event using their timestamp field.
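To double-check that the mapping was applied, we can read it back (assuming the same local endpoint; the response should show the _timestamp settings for the event type):

```shell
# Fetch the mappings stored for the wordismic index.
curl 'localhost:9200/wordismic/_mapping?pretty'
```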

Now we want to load a file containing multiple events. This can be achieved using the Bulk API, but requires some preprocessing on our files. Basically, the Bulk API expects each line of data to be preceded by another line containing the action (index in our case) to be performed.

So we need to convert

{"event":"player_challenged_friend","timestamp":"2014-12-01T17:16:39.200","player_id":"39cdf2bd4970416a81984389452a450c","opponent_id":"6630666ce8cc47989acb4fd51a22bbae","game_id":"9396479ff22742bcb33c5f64de021df9"}
{"event":"player_finished_game","timestamp":"2014-12-01T17:23:15.391","player_id":"dbedc7ce9bc9429ab4780fd8452eb31f","game_id":"8ff8293a4d7740418513d23a8ec176c4","board_id":"487","score":"294"}
...

into

{"index":{"_index":"wordismic","_type":"event"}}
{"event":"player_challenged_friend","timestamp":"2014-12-01T17:16:39.200","player_id":"39cdf2bd4970416a81984389452a450c","opponent_id":"6630666ce8cc47989acb4fd51a22bbae","game_id":"9396479ff22742bcb33c5f64de021df9"}
{"index":{"_index":"wordismic","_type":"event"}}
{"event":"player_finished_game","timestamp":"2014-12-01T17:23:15.391","player_id":"dbedc7ce9bc9429ab4780fd8452eb31f","game_id":"8ff8293a4d7740418513d23a8ec176c4","board_id":"487","score":"294"}
...

and this simple shell script can take care of that (note the read -r and the quoted "$LINE", so that backslashes and whitespace in the JSON survive intact):

while IFS= read -r LINE; do
  echo '{"index":{"_index":"wordismic","_type":"event"}}'
  echo "$LINE"
done <events.json >events-bulk.json

Redirecting the output once, after the loop, also means that re-running the script overwrites events-bulk.json instead of appending to it.
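Equivalently, the whole conversion can be done in a single awk pass, which is also noticeably faster on large files (just an alternative sketch; the loop above works fine too):

```shell
# Emit the bulk "index" action line before every event line
# read from events.json, writing the result to events-bulk.json.
awk '{
  print "{\"index\":{\"_index\":\"wordismic\",\"_type\":\"event\"}}"
  print
}' events.json > events-bulk.json
```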

Once we have the file in the right format we can finally import it into Elasticsearch with

curl -X POST --data-binary @events-bulk.json localhost:9200/_bulk
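As a quick sanity check, we can then ask Elasticsearch how many documents the index holds (again assuming the default local endpoint):

```shell
# The returned count should match the number of lines in events.json.
curl 'localhost:9200/wordismic/event/_count?pretty'
```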

Visualizing with Kibana

Kibana (I’m using v3.1.2) is essentially a web-based UI for Elasticsearch. The initial setup is straightforward, but we also need to add the following lines to config/elasticsearch.yml and restart Elasticsearch, otherwise Kibana – which runs entirely in the browser and talks to Elasticsearch directly – won’t be able to connect to it:

http.cors.enabled: true
http.cors.allow-origin: http://localhost:8080

The first time you connect to Kibana on http://localhost:8080 it displays a welcome screen with some useful instructions. Clicking on Sample Dashboard will create an initial dashboard that we can then customize for our needs.

Clicking on the cog icon in the top right corner brings up the Dashboard Settings where we can enter our own title in the General tab, change the Default Index to wordismic in the Index tab, and the Time Field name to timestamp in the Timepicker tab. It’s then a good idea to save this Dashboard with a custom name, and maybe set it as the default one with Save as Home.

At this stage we can already start running some useful queries. Entering event:player_finished_game in the query string, for example, shows how many games have been played overall; changing the Time filter in the top bar to Last 7d or Last 24h then limits the count to the selected time period.
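The same kind of query can also be run directly against Elasticsearch, which is handy for scripting; the q parameter accepts the same Lucene query-string syntax Kibana uses (endpoint assumed local as before):

```shell
# Count player_finished_game events without going through Kibana.
curl 'localhost:9200/wordismic/event/_count?q=event:player_finished_game&pretty'
```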

However, it would be much nicer to have some sort of graph, so that when we look at the last 30 days we also see a histogram plotting the event count for each day. That can easily be done by adding a panel of type histogram, again configuring its Time Field to match the data (i.e. timestamp), and then resizing the panel and its row to make it more visible. Just remember to save the dashboard again to preserve your customizations.

After rearranging the panels a little bit, here’s what it looks like:

[Screenshot: the customized Kibana dashboard for Wordismic]

(Hey! Looks like fewer people than usual were playing games on 2014-12-25. I wonder what’s so special about that day…)

As you can see, it’s not difficult to set up your own analytics tool based on Elasticsearch. Once you get started you can analyze your data with flexible queries, the Kibana UI is easily customizable and you have full control over it.
