Why I dislike Third Party Database Monitoring Tools

I have to admit I am not a fan of the various 3rd-party monitoring tools (after using a few of them over the past 10 years), mostly for the following three reasons:

1. Lack of operational productivity. What I see across various products is the same pattern: an alert is received –> a GUI is opened –> a few clicks to drill down to the “root cause”.

In an environment with a few hundred SQL Server instances, how can you expect a DBA to make multiple clicks just to figure out ONE issue on ONE instance each time?

2. Big TCO (Total Cost of Ownership), as most tools are priced by the number of SQL Server instances being monitored. When you have a few hundred SQL Server instances to monitor, the cost is usually prohibitive, and it can make a DBA feel bad that s/he cannot contribute to the company’s bottom line.

3. Compromised ability to design a customized / flexible monitoring framework.

All current DBA monitoring tools are “closed” systems, i.e. they generate data and then consume that data internally. For example, they can collect the CPU usage of a session and then present the data via a GUI or in various reports (in PDF, HTML, etc. formats). However, there is no easy way for me to consume the raw data directly. Say I run the same query on two different servers with different numbers of CPUs, and I want to compare the CPU usage by assigning a factor to one server’s data to level the playing field; there is no way for me to do this with the current monitoring tools.
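
For example, here is roughly the kind of thing I want to be able to do with the raw numbers, sketched in PowerShell (the CPU values, core counts and scaling factor below are all made up for illustration; the point is simply that I decide how to adjust and compare the data):

    # Hypothetical "leveling" of CPU usage between two servers of different sizes:
    # apply a factor to one server's raw CPU number before comparing.
    $cpuServerA = 12000      # assumed CPU ms for the query on server A (8 cores)
    $cpuServerB = 30000      # assumed CPU ms for the same query on server B (32 cores)
    $factor     = 8 / 32     # assumed factor to scale server B down to server A's size

    [pscustomobject]@{
        ServerA_CpuMs          = $cpuServerA
        ServerB_CpuMs_Adjusted = [math]::Round($cpuServerB * $factor)
    }

Even a trivial adjustment like this is not possible with the current tools, simply because the raw data is not exposed to me in a consumable form.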

In my observation, if a company has one or two capable DBAs, it usually will not buy any 3rd-party monitoring tools, because in theory all the functions of those tools can be built in-house, and more often than not you do not need all the functions of a monitoring tool (just as you do not use every feature of Microsoft Office; you probably only use 20% of the total features).

But I have to say all the monitoring tools have their merits and value; it is just the philosophy behind these tools that I do not agree with. To me, all these tools put unnecessary focus on the GUI, the fancy reports, the multiple drill-down clicks to the “so-called” root of the issue, and the complex configurations for different alerts. All of this makes the tools more suitable for reporting to managers what went wrong than for helping DBAs add extra value to the company. Capable, responsible DBAs are more willing to create their own customized solutions than to rely on a tool that meets only 20% of their targets while charging 100% of the price.

If I were an architect for a monitoring tool, I’d design the tool with only one goal:

Put productivity as the #1 priority for the tool’s existence.

With this principle in mind, I would provide a complete library of monitoring functions and let the DBA (or better, the DBAA, i.e. DataBase Administration Architect) develop a customized monitoring framework using that library. I would get rid of the GUI and just provide a command-line interface (CLI).

I’d argue a GUI is never a productive interface for repetitive work (like database performance monitoring / analysis). If you look at Microsoft in the server administration field, MS once won big praise for its GUI-based administration (vs. the traditional Unix-style server administration). However, in the last couple of years MS has started to promote PowerShell as both a tool and an interface to manage servers. PowerShell is in essence a command-line tool, which is efficient and scales easily to big, complex environments. Why did MS adopt this approach? To me, it is quite possible that MS realized a GUI is not efficient when you need to manage servers at large scale, and that is why a CLI is preferred.

I’d like to use an example here to illustrate why I think a CLI is better. I guess every SQL Server DBA knows about the famous sp_whoisactive tool created by Adam Machanic. To me, this is a perfect tool that I can utilize in my overall monitoring framework, especially during troubleshooting. I can use it in almost all scenarios; here are a few:

1. When my main monitoring thread finds that tempdb data space usage is above the threshold, it can call sp_whoisactive (with customized parameters) and dump the info to a table; another notification thread will then read from the table and send a well-formatted HTML email to me (or I can let the notification thread dump the info into an SSRS report). A rough sketch of this step follows this list.

2. When my main monitoring thread finds high CPU / long blocking / long-running sessions / high log space usage / …, sp_whoisactive is called and the same subsequent steps are followed.
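
As a rough sketch of the “dump the info to a table” step in scenario 1: assuming sp_whoisactive is installed on the monitored instance, a logging table dbo.WhoIsActiveLog already exists with a schema matching sp_whoisactive’s output, and the SqlServer PowerShell module’s Invoke-Sqlcmd is available, the capture step might look like the following (the server, database and table names are made up):

    # Hypothetical capture step fired by the main monitoring thread once a threshold
    # (e.g. tempdb data space usage) is crossed: snapshot the current activity with
    # sp_whoisactive into a logging table that the notification thread reads later.
    $server = 'SQLPROD01'      # assumed monitored instance
    $logDb  = 'DBA_Monitor'    # assumed database holding the logging table

    $capture = "EXEC dbo.sp_WhoIsActive @get_plans = 1, @find_block_leaders = 1, @destination_table = 'dbo.WhoIsActiveLog';"

    Invoke-Sqlcmd -ServerInstance $server -Database $logDb -Query $capture

The notification thread is then just another scheduled job that reads dbo.WhoIsActiveLog and formats the new rows into an HTML email.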

Of course, for a full-scale monitoring library, we can add lots of features with different sets of functions (which can be implemented via CLR), such as disk space monitoring, memory/CPU monitoring, and much of the other stuff that current GUI tools provide.

So in short, I want a 3rd-party tool to be only a fact/data provider whose consumers are decided and created by DBAs. Put more abstractly, I’d argue that if you want a tool DBAs will like, the tool should help the DBA add more value to his or her own work with his or her own imagination (not replace the DBA’s imagination), just like sp_whoisactive, which is such a basic building block that I can easily embed it into my various monitoring frameworks without compromising my freedom to design the framework as a whole.

PS: My opinion may be biased, as I have not researched all monitoring tools, so I’d like to hear whether any monitoring tool offers CLI functions. If so, please let me know and I will be more than glad to test / review it.


6 Responses to Why I dislike Third Party Database Monitoring Tools

  1. way0utwest says:

    You make some good points and I agree with some of them, but I don’t think sweeping statements like “all monitoring tools are ‘closed’ systems” are accurate. There are quite a few monitoring tools that store their data in a SQL Server database, and that data can easily be reported on or consumed by other processes. Even some of the large-scale tools, like Patrol or Unicenter, have files and data you can import.

    The GUI is a problem, but typically you don’t have an issue across hundreds of instances. If you do, you have a fundamental problem. Drilling in to find information on one issue is usually done on one instance.

    There are comparison problems across instances and across multiple counters, but that’s an interface design issue, or a lack of using the proper tools (queries) instead of a GUI to find issues.

    Disclosure: I work for Red Gate Software, maker of SQL Monitor.

  2. jeff_yao says:

    Thanks Steve for your comment. I know most of the monitoring tools (for SQL Server) store their data in SQL Server databases, but the problem is that there is little (or no) documentation to help me interpret what is stored there, let alone any APIs to allow me to consume the data stored in the database(s). That’s why I call those tools “closed” systems, i.e. the data is mainly for the tools themselves instead of for DBAs.

    One more thing: I tend to disagree with the statement “but typically you don’t have an issue across hundreds of instances. If you do, you have a fundamental problem.” It really depends on what you define as an issue. Say I want to be alerted if any running SQL statement goes beyond 1 second (or less), or if a blocking chain lasts more than 1 second. Then over the span of a week, you may well get that type of alert across all servers. But in short, my point is: do not build a tool that requires a DBA to use multiple clicks to find the “root cause”. If it does, the tool’s scalability is very poor. From an operational efficiency perspective, one less click means one more second of efficiency.

  3. way0utwest says:

    I think you’re still not thinking this through. If you get alerts on blocking (though 1sec seems rather low to me) across multiple servers, there is no “root cause” across multiple instances, or there very rarely is. If that’s the case, you still need to go examine data for each instance, which means whether you run queries or “click”, you are digging through one instance. Almost no tool or query is going to just figure everything out for you and get you there in one query or execution. Even well written scripts usually require you to perform multiple executions or steps to find a root cause.

  4. jeff_yao says:

    OK, say I am the architect of a monitoring tool, and say we have got alerts for a blocking issue across N servers (let’s say the blocking threshold is x seconds). Here is my design of a CLI command to get all the blocking info across these servers; I’ll use a pseudo-PowerShell format:

    Get-BlockingInfo -ComputerName server1, server2, …, serverN -Latest 1

    Interpretation:
    Get-BlockingInfo returns the blocking info collected in the central database for the monitored servers
    -ComputerName Server_List: the list of monitored servers you want the blocking info for
    -Latest M: the latest M occurrences (if any) of the blocking info

    I can add a few more parameters to this Get-BlockingInfo, such as a -Path parameter to dump the info to the given path, and a -Format parameter to indicate whether the info will be in HTML / XML / CSV / PDF, etc.

    I guess this type of CLI command is much simpler than multiple “clicks” via GUI, and it should not be difficult to implement either, because we have the central collection database with all the raw data there.
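
    To make it concrete, a bare-bones implementation of such a function might look like the sketch below (everything here is hypothetical: the central server name, the DBA_Central database and the dbo.BlockingInfo table with ServerName and CollectTime columns are all made up, and it assumes the SqlServer module’s Invoke-Sqlcmd):

        # Hypothetical Get-BlockingInfo: read the already-collected blocking data for the
        # requested servers from an assumed central collection database and return the
        # latest N occurrences per server.
        function Get-BlockingInfo {
            param (
                [Parameter(Mandatory)] [string[]] $ComputerName,
                [int] $Latest = 1
            )

            foreach ($server in $ComputerName) {
                $query = "SELECT TOP ($Latest) * FROM dbo.BlockingInfo WHERE ServerName = '$server' ORDER BY CollectTime DESC;"
                Invoke-Sqlcmd -ServerInstance 'CentralMonitorServer' -Database 'DBA_Central' -Query $query
            }
        }

        # Usage, matching the pseudo command above:
        Get-BlockingInfo -ComputerName server1, server2 -Latest 1

    The -Path and -Format parameters mentioned above would just be thin wrappers on top of this: take the returned rows, convert them (ConvertTo-Html, Export-Csv, etc.) and write the result to the given path.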

    My preferred design principle for any tool is that the architecture itself should go through a logical stress test before it even goes into the coding phase.

    I hope I make myself clear. :-)

  5. In my observation, even a company that has 1-2 good DBAs will still want/need to purchase some tools in order to effectively monitor their environment.

    I worked in a mid-size shop, with 181 instances of SQL Server (and a few Sybase) and over 3,000 databases for two FTE DBAs and two contractors. We needed a combination of tools in order to get the job done.

    No, we could not afford to buy a 3rd party tool at $2k/instance. So we used OpsMgr in order to help with some base items. We also leveraged OpsMgr to help with pieces of a home grown solution.

    For the mission critical servers, however, we bought 3rd party tools. The reason was because most tools were more robust than anything we were going to build. When we added up the cost of trying to build a complete solution ourselves it was just cheaper for us to buy rather than build.

    Not every shop is the same. When we started and we only had six instances we never would have bought a tool even though that was when it was most affordable. We hit a tipping point at about 100 instances. That’s when we needed to start thinking about having a blend of home grown and purchased tools.

    HTH

    • jxhyao says:

      @SQLRockStart, I cannot disagree with your opinions, because in my environment we do have the Microsoft SCOM monitoring tool and it helps in some ways. But for a pure database monitoring tool, at least as of now, I have not seen one that makes my eyes go wide and that I cannot wait to get my hands on; on the contrary, after some testing, I’d rather build my own. :-)
