We’re doing some pretty exciting stuff in the useMango™ team right now, starting as far to the left as you can, shift-wise, and setting up a full Docker+AWS+Elasticsearch+Reporting+Jenkins+CI+CD stack. Yes, that’s a lot of moving parts but having all those wheels in motion will allow us to:

  • deploy faster
  • debug faster
  • test faster
  • ship faster
  • and finally provide value to *you* faster

However, learning all this, and above all getting it all to work, is a bit of a challenge. Not necessarily because any one part of the system is hard to understand or undocumented, but because everything has its pitfalls. I’ll now try to present the ones we fell into when setting this up, in the hope that it might aid someone else in their quest for high quality rapid development (aka being too lazy to manually move zip files).

Quick overview

Just to explain what we are talking about here: the stack described (Filebeat -> Kibana) is basically meant to visualize logs as graphs. It pulls logs from any server, parses them, indexes them and then presents them:

  • Filebeat; Reads log files and pushes any updates to a target based on configuration
  • Logstash; Accepts events, filters/modifies them and pushes to a new source
  • Elasticsearch; Stores and indexes the logged events
  • Kibana; Pulls data from Elasticsearch and applies searches and visualizations on them

Filebeat

Setting up Filebeat is supposedly easy enough: you just download the zip file, unpack it and run the install PowerShell script, or do an apt-get. The problem comes when you want to configure it. Filebeat doesn’t really tell you much when it doesn’t work, so here are my suggestions:

  • Run filebeat in interactive mode to start with, not as a service (see the example command after this list)
  • Make use of the -configtest flag to make sure your config compiles at all
  • Make sure you log the output at a low enough level to see what’s actually happening (most logging is at info or debug level)
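
For reference, this is roughly what that looks like on the command line. A sketch assuming a Filebeat 1.x-style install on Windows; the install path is a placeholder:

    # first check that the config parses at all
    cd C:\filebeat
    .\filebeat.exe -c filebeat.yml -configtest

    # then run interactively (not as a service) with debug output enabled
    .\filebeat.exe -e -c filebeat.yml -d "*"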

A few other observations I made along the way:

  • Windows paths do not support environment variables; they need to be expanded manually, e.g. %temp%\logs should be written as C:\users\<username>\AppData\local\temp\logs
  • The hosts parameter only wants <hostname>:<port>, no protocol or URI
  • The logging.files.path config value is a relative path
  • Be sure to set logging.level to debug to begin with (illustrated in the sketch below)
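
Putting those points together, a minimal filebeat.yml along those lines could look something like the sketch below. The paths and hostname are placeholders; the full setup is in the gist that follows:

    filebeat:
      prospectors:
        - input_type: log
          paths:
            # environment variables like %temp% have to be expanded by hand
            - C:\users\<username>\AppData\local\temp\logs\*.log

    output:
      logstash:
        # hostname:port only, no protocol or URI
        hosts: ["logstash.internal:5044"]

    logging:
      # debug level to begin with, dial it down once things work
      level: debug
      to_files: true
      files:
        # note: this path is relative
        path: logs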

Here is an example of how I set it up:

https://gist.github.com/ddikman/118a7481d3a4df13e941

Setup instructions for filebeat

Logstash

You’ll probably want to set up Logstash before trying to get Filebeat to work. As mentioned above, Logstash is a kind of filter/proxy between your service and the Elasticsearch server. It specifies input sources (such as listening for HTTP or Filebeat events), filters to apply to the incoming events, and outputs to send the processed events to.

Logstash is easily hosted using Docker. In my case I had a configuration file in a folder on our server, which I made available to the Docker container through a shared volume and then referenced in the docker run command:

https://gist.github.com/ddikman/1f4a5dd7f18e7861a7ff
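
In case the gist is unavailable, the command was roughly along these lines; the host path and the beats port are placeholders, and the official logstash image from Docker Hub is assumed:

    # mount the host folder holding logstash.conf into the container
    # and point logstash at it; 5044 is the beats input port
    docker run -d --name logstash \
      -p 5044:5044 \
      -v /opt/logstash/conf:/config-dir \
      logstash \
      logstash -f /config-dir/logstash.conf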

The nice thing with the above is that you really don’t have to bother with anything else: Docker just pulls the latest image from Docker Hub and it’s up and running. That leaves the configuration, of course, in which I noted:

  • The input config is simple enough; when used with Filebeat it’s a beats input instead of a tcp one. Remember to forward the port in Docker and make it available in AWS/on the host
  • The filter can be tricky to get right and I used a lot of docker logs --tail=50 <container name> to track down something that would parse properly
  • The Grok Constructor is almost a must before you can get your grok match running correctly; it lets you test your grok pattern on some actual log lines as well
  • Remember to overwrite the message field unless you want the original message kept in the event logged to Elasticsearch
  • The date plugin allows you to overwrite @timestamp; you’ll most likely want this, as it will index your event by the time it was logged instead of when it was sent to Elasticsearch. I believe it uses Joda-Time for parsing the date you give it, and I had some issues since it didn’t support ZZZ; just a single Z worked fine though
  • With the Elasticsearch output, the same thing goes as with the Filebeat config: <hostname>:<port>, no URL or protocol. Use the index setting to specify your own index instead of getting everything logged under the default logstash index (a stripped-down config sketch follows after this list)
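
Here is that stripped-down sketch; the grok pattern, index name and hostname are made up for illustration:

    input {
      # beats input instead of tcp when the events come from filebeat
      beats {
        port => 5044
      }
    }

    filter {
      grok {
        # made-up pattern; build your own with the grok constructor
        match => { "message" => "%{TIMESTAMP_ISO8601:logtime} %{LOGLEVEL:level} %{GREEDYDATA:message}" }
        # replace the original message instead of keeping it alongside
        overwrite => [ "message" ]
      }
      date {
        # use the logged time as @timestamp (joda-time style pattern)
        match => [ "logtime", "yyyy-MM-dd'T'HH:mm:ss,SSS" ]
      }
    }

    output {
      elasticsearch {
        # hostname:port only, no protocol, plus a custom index name
        hosts => ["elasticsearch.internal:9200"]
        index => "usemango-%{+YYYY.MM.dd}"
      }
    }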

Here’s our parse configuration:

https://gist.github.com/ddikman/2935135e95c258b01f2e

Elasticsearch & Kibana

I won’t go into details on this since both AWS and Elasticsearch themselves offer cloud hosted solutions for this package. The only thing I would like to mention is that it can sometimes be difficult to find your new index or verify that data has come in, since you configure the index pattern in Kibana without seeing which indices there are to choose from.

On an Elasticsearch host: open up the Kopf plugin that comes bundled; it will show you which indices have been created. If your index isn’t there, something is wrong in the event posting.

In AWS: you can find the indices under a tab in the Elasticsearch cluster view
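
If you can reach the cluster over plain HTTP, Elasticsearch’s cat API is another quick way to check (the hostname here is a placeholder):

    # lists all indices along with document counts and sizes
    curl http://elasticsearch.internal:9200/_cat/indices?v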


That’s it for this time. Although we haven’t run it much yet, this stack really seems powerful in terms of possibilities: you can basically graph anything in one place given the right configuration. My only concern is that the Logstash configuration will have to change whenever you want to add an additional data source; a UI/web interface/API to control this would be handy.