Mining Logs with Splunk
There’s a lot of information floating around in a network, information that’s being written into log files every minute of every day. Those log files contain everything you need to know about how the systems you’re managing are running, and how they’re responding to the demands of their users. It doesn’t matter whether those machines are desktops or servers: they’re all collecting log files – log files which are rarely, if ever, read.
Sure, we often look at them after there’s been a problem. Logs make excellent diagnostic tools, with plenty in them to help you find out what went wrong and when. Unfortunately, those logs are often in many different places and many different formats. There are binary logs, text logs, databases, syslog logs, Microsoft logs, Apache format logs; the list goes on. Problems rarely happen in isolation, so you end up in Notepad or vi, searching through megabytes of data across several files for the one clue that can help fix a problem.
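To see why mixed formats make manual work painful, here is a minimal Python sketch that puts an Apache-style timestamp and a BSD syslog timestamp on the same timeline. The function names and sample timestamps are invented for illustration, and the sketch assumes the syslog machine's clock is in UTC; it says nothing about how Splunk itself normalises time.

```python
from datetime import datetime, timezone

def parse_apache_time(stamp):
    """Parse an Apache-style timestamp such as 10/Oct/2009:13:55:36 +0100."""
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")

def parse_syslog_time(stamp, year, tz=timezone.utc):
    """Parse a BSD syslog timestamp such as 'Oct 10 12:55:40'.

    The format carries no year or time zone, so the caller supplies both.
    """
    parsed = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return parsed.replace(tzinfo=tz)

# Hypothetical events from two different logs.
web_hit = parse_apache_time("10/Oct/2009:13:55:36 +0100")
auth_failure = parse_syslog_time("Oct 10 12:55:40", year=2009)

# Once both are timezone-aware, they can be compared directly.
gap_seconds = (auth_failure - web_hit).total_seconds()
```

The two events look an hour apart in the raw text; once both are converted they turn out to be seconds apart, which is exactly the sort of detail that is easy to miss by eye.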
Things get even worse when we’re using logs to track the trail of malware or an intruder into a network. These days the footprints of a black hat are light, slow, well-spaced touches that barely register in one log file, let alone the logs of a distributed set of Web applications. It’s nearly impossible to extract what you need to find an intruder when you have to do all your searches by hand (or using a copy of grep).
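A rough idea of what that by-hand correlation involves can be sketched in Python. This is an illustration of the manual approach rather than anything Splunk does internally; the log names, the sample lines and the `sources_seen_in_multiple_logs` helper are all hypothetical.

```python
import re
from collections import defaultdict

# Matches dotted-quad IPv4 addresses; deliberately loose, for illustration only.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def sources_seen_in_multiple_logs(logs):
    """Map each IP address to the set of log names it appears in,
    keeping only addresses that show up in more than one log."""
    seen = defaultdict(set)
    for name, lines in logs.items():
        for line in lines:
            for ip in IP_PATTERN.findall(line):
                seen[ip].add(name)
    return {ip: names for ip, names in seen.items() if len(names) > 1}

# Hypothetical sample data standing in for three separate log files.
sample_logs = {
    "web": ['203.0.113.7 - - [10/Oct/2009:13:55:36 +0000] "GET /login HTTP/1.1" 401 512'],
    "ssh": ["Oct 10 14:20:01 host sshd[311]: Failed password for root from 203.0.113.7"],
    "mail": ["Oct 10 15:02:44 host postfix[90]: connect from 198.51.100.4"],
}

suspects = sources_seen_in_multiple_logs(sample_logs)
```

A single hit in each log is unremarkable on its own; it is the same address turning up in both the Web and SSH logs that makes it worth a second look.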
One answer to the problem is Splunk, a product that’s advertised as a tool for IT search. It’s one of those ‘does what it says on the tin’ products, an easy-to-install and easy-to-run IT log file search engine that can also plug into system-level event management tools. The latest version builds on earlier releases, with tools for delivering reports to staff at all levels in an organisation. You might want detailed server information, but your clients might only want to know that their ecommerce system is running inside its SLA, or that email is being delivered on time. Splunk’s reports and searches can be saved and reused, and you can even build them into recurring reports that keep you updated.
Installation is quick and easy, and there are versions for most common operating systems, including 32- and 64-bit versions for Windows, Linux, FreeBSD and Solaris, as well as installers for OS X and AIX. Download your choice of installer from www.splunk.com/download?r=header. Each install is a standalone Splunk system, able to search and index the log files on one server or desktop machine. There aren’t many differences between the free version and the licensed alternatives; the main one is the number of simultaneous searches you’re able to run.
Installing Splunk 4
The latest release, Splunk 4, introduces a new concept: Apps. Splunk apps aren’t really applications; they’re collections of searches, queries and reports that deliver specific answers. There’s one for managing Windows, which pulls together WMI queries with Windows log files to give you a view of what your machines are doing. There are similar tools for working with UNIX and Linux, as well as apps for Cisco networks and F5 appliances. It’s also easy enough to build your own apps, so you can have one that reports on client Web servers, another for file and print servers and so on. Splunk apps can deliver reports directly to users, so they can see if their systems are meeting SLAs. Windows installs include the Windows app.
Once installed on a Windows server, Splunk 4 starts its own Web server on port 8000, and opens a browser to show its UI. You can use any Web browser, as long as it supports Flash. Many of the graphing and visual reporting components in Splunk are written in Flash, giving it a clean and well-designed feel. The Windows Splunk install includes the Windows app, along with a getting started tool that helps you build your own searches and reports, and then package them up for use with other Splunk installations. The Getting Started app is also where you can define your first set of data sources, including the log files you want to index and search, as well as live system data providers, like WMI (the Windows Management Instrumentation APIs used by Microsoft’s management tools), and specific network ports to monitor for diagnostic information. Splunk can also be used as a tripwire, checking key system files for changes.
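The tripwire idea is simple enough to sketch in Python: record a digest of each file you care about, then compare later. This is a minimal illustration of the general technique, not a description of how Splunk implements its file change monitoring; the function names are hypothetical and the watched file is a throwaway stand-in for a real system file.

```python
import hashlib
import os
import tempfile

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(baseline):
    """Return the paths whose current digest differs from the baseline
    {path: digest} mapping."""
    return [path for path, digest in baseline.items()
            if file_digest(path) != digest]

# Demonstration on a temporary file standing in for a key system file.
with tempfile.TemporaryDirectory() as folder:
    watched = os.path.join(folder, "hosts")
    with open(watched, "w") as handle:
        handle.write("127.0.0.1 localhost\n")

    baseline = {watched: file_digest(watched)}
    unchanged_result = changed_files(baseline)

    # Simulate tampering by appending a line.
    with open(watched, "a") as handle:
        handle.write("203.0.113.7 intranet\n")
    changed_result = changed_files(baseline)
```

A real tripwire would also need to cope with files that have been deleted or whose permissions have changed, which this sketch leaves out.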
The search app gives you quick access to the indexes you’ve built, and lets you write and test queries. Queries can be built into dashboards, and these can be shared between your team members – or even with a client. It’s easy enough to add new data sources: click the Manager link, and then navigate to Data Inputs. Pick the sources you want to index, whether they’re files, network connections, or WMI collections. There’s even support for monitoring the Windows Registry, or working with your own monitoring scripts.
The Splunk Manager
It’s worth spending a lot of time in the Splunk Manager, as you can use it to set up email alerts for specific problems, as well as to handle searches across several Splunk installs, so you can compare the status of different customers if you’re worried about bugs or spreading malware. You can also use the Manager to build new reports and alerts from searches you’ve already created. Clone a search and you can use the copy to create more focused searches and different report types. You can also build macros around searches, and design your own navigation menus, something you’ll need to do if you’re providing reporting apps to your clients. Splunk’s tools also let you control its users, giving them user names and passwords and, most importantly, managing their access to features and reports.
Splunk isn’t only for one server – or one desktop. It’s at its best when it’s part of a network of Splunk instances, with one or more machines working with data from several servers, analysing and reporting on the elements that make up a business process or handle key functions for a client’s business. The latest release also includes tools for multi-tenancy, so you can bring in results from several different sites and customers (with the appropriate licences).
Forwarding and Receiving
Splunk gives you two options for forwarders: a standard forwarder with the full Splunk Web interface, which delivers what the Splunk folk call ‘cooked’ fully processed data, and a light forwarder with no Web interface, and only ‘seared’ partially processed data. We’d recommend using the light forwarder in most cases, when you don’t need to drill down into logs on a managed machine (for Web servers, databases, and the like). Leave the Web interface running on servers that need more monitoring (like main domain servers and core file servers), as it comes in handy when you’re working on site.
First, choose the system you want to use as a report and analysis client, and set it up as a receiver. Use the management screens to enable the receiver function, simply by giving the machine a listening port. The suggested 9997 port works well in most networks, but make sure that it doesn’t conflict with any other applications. You also need to make sure that the port is open in any firewalls you’re using, as Splunk does not automatically open the ports for you.
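Before pointing forwarders at the receiver, it can save time to confirm that the listening port really is reachable through any firewalls. The Python sketch below is one way to do that from a machine that will act as a forwarder; `port_is_reachable` is a hypothetical helper, and the demonstration uses a temporary local listener instead of a real Splunk receiver.

```python
import socket

def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demonstration: open a temporary local listener,
# check it, close it, then check again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_result = port_is_reachable("127.0.0.1", demo_port)
listener.close()
closed_result = port_is_reachable("127.0.0.1", demo_port, timeout=1.0)
```

In practice you would call something like `port_is_reachable("receiver.example.com", 9997)`, where the host name is a placeholder for your own receiver. A successful connection only shows that something is listening on the port, not that it is Splunk.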
To set up a forwarder, use the administration screens to add the fully qualified name of the receiver and the port number you specified earlier. Forwarders can deliver log data to more than one receiver, so you can give your clients their own report servers, at the same time as delivering logs to your own machines for deeper, proactive management searches. Forwarders can also be receivers, so for more complex architectures you could have a Splunk server that dealt with ecommerce functions, another dealing with back office systems, and a third collating information from those Splunk instances, giving you an overview for performance reports and places to drill down into specific issues.
Once you’ve installed Splunk, you’ll wonder how you ever did without it. Log files that would have rolled over and been deleted are now useful diagnostic tools. You can monitor historical and live data in the same application, and proactively respond to issues before your users even know that anything is wrong. The key to success with Splunk is quite simple: use it. Logs that aren’t analysed or used are simply data that’s taking up disk space. It’s only once you start to use them that they become information – and only then can you use them to add value to your clients’ networks.
Put logs through the Sawmill
Splunk isn’t the only log file analysis and search tool on the market. One alternative is Sawmill, which can analyse many different log file formats. Originally intended for Web site analysis, Sawmill is now suitable for many more purposes, including handling regulatory compliance. Available in Lite, Professional and Enterprise versions, Sawmill can handle over 800 different log file formats.
If you’re only working with Windows systems, then Microsoft System Center Operations Manager 2007 (and the SME-focused System Center Essentials 2007) are worth considering. As well as providing deep analytics of your Windows logs, SCOM will proactively manage servers and desktops, pushing out updates and handling automatic troubleshooting through management packs. Some of Microsoft’s hardware and software partners also provide their own management packs, giving you a lot more control than you might expect.
One of the most common log file formats is the good old UNIX syslog, and many devices and appliances can automatically ship their syslog files to a syslog server somewhere on a network. Of course you can use Splunk for this, but if you’re just gathering up log files for later analysis, then the Kiwi Syslog Server is well worth a look. It’ll gather logs from all over a network, and includes tools to build rules that can alert you if any problems occur.
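If you do end up reading raw syslog traffic, the number in angle brackets at the start of each message is worth understanding. In the BSD syslog convention it encodes facility and severity as facility × 8 + severity. The Python sketch below decodes it; `decode_priority` is a hypothetical helper, and the sample line follows the style of the example in RFC 3164.

```python
import re

# Severity names from the BSD syslog convention; list index = severity number.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)

def decode_priority(message):
    """Split a syslog line into (facility number, severity name, remainder)."""
    match = PRI_PATTERN.match(message)
    if match is None:
        raise ValueError("no <PRI> prefix found")
    priority = int(match.group(1))
    facility, severity = divmod(priority, 8)  # PRI = facility * 8 + severity
    return facility, SEVERITIES[severity], match.group(2)

line = "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8"
facility, severity, text = decode_priority(line)
```

Here a priority of 34 breaks down into facility 4 (security and authorisation messages) and severity 2 (critical), the kind of message an alerting rule would want to catch.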
Microsoft Log Parser 2.2
If you want to look at logs on a Windows Server, Microsoft’s free Log Parser is an excellent way to quickly work your way through megabytes of text information.
Splunk Windows Installation guide
Part of Splunk’s hefty library of webcasts, this installation guide will get you started in no time
Splunk: free licence for 500MB/day, enterprise licences POA
Sawmill: Lite starts at £75, Professional at £145, and Enterprise at £500
Kiwi Syslog Server