Database Administration KPIs – 1/2

 

As DBAs, we may all have asked this question: how should my work be evaluated objectively? Or, put another way, what criteria would I want others (my manager, clients, peers, etc.) to use when evaluating my work?

In the DBA world, we love "no news is good news," and so, to me, the best DBA (or team) is one who is never seen in any news or spotlight yet holds an irreplaceable position in an organization's IT hierarchy.

Though this qualitative description draws nods from every DBA I have worked with, it is of little use to non-DBA stakeholders such as HR or senior management, because an intangible description cannot serve as the basis for performance evaluation, which must rest on facts: quantitative, verifiable facts.

So I think a set of quantitative KPIs will help here. But designing these KPIs is not easy, because most DBA work depends on external factors (such as network, hardware, and business policies) controlled by others, and we do not want KPIs that fail to reflect the DBA's own efforts.

After some struggle and thought, I have come up with the following KPIs, which I believe are usable in most cases:

Avg. Database Availability Time (longer is better): the average time a SQL Server service stays running between two unplanned downtime windows. (Planned downtime can be excluded here.)

Strictly speaking, this KPI is not entirely the responsibility of the DBA team, since any unexpected network or hardware issue can make the database unavailable. But there are many items a DBA (or team) can control, such as a database log file growing out of control, tempdb filling up the disk, or a bad CLR stored procedure causing a mini-dump and a SQL Server service restart. So this KPI really aims to measure DBA efforts to maintain the internal health of the SQL Server system and to avoid events that could make it unavailable. At a minimum, this KPI asks the DBA to turn the unplanned into the planned, for example by sending early warnings to stakeholders; if something is unavoidable, we can at least minimize the shock to end users and help them make a decent "exit".
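To make this KPI concrete, here is a minimal sketch of how it could be computed, assuming unplanned downtime windows are already logged somewhere as start/end timestamp pairs (the data and function names below are invented purely for illustration):

```python
from datetime import datetime

# Hypothetical log of unplanned downtime windows: (start, end) pairs
# in chronological order. Planned maintenance is filtered out up front.
unplanned_downtime = [
    (datetime(2013, 1, 5, 2, 0), datetime(2013, 1, 5, 2, 30)),
    (datetime(2013, 2, 20, 14, 0), datetime(2013, 2, 20, 14, 45)),
    (datetime(2013, 4, 1, 9, 0), datetime(2013, 4, 1, 9, 20)),
]

def avg_availability_hours(windows):
    """Average running time (hours) between consecutive unplanned outages."""
    if len(windows) < 2:
        return None  # not enough outages to measure even one interval
    gaps = [
        (windows[i + 1][0] - windows[i][1]).total_seconds() / 3600.0
        for i in range(len(windows) - 1)
    ]
    return sum(gaps) / len(gaps)

print(round(avg_availability_hours(unplanned_downtime), 1))
```

Note that the interval is measured from the end of one outage to the start of the next, so the downtime itself is not counted as availability.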

 

Avg. Database Usability Accidents (lower is better): the number of accidents that leave users unable to use the database system.

Again, this may not be entirely the responsibility of the DBA, since such accidents can come from many things, like a bad query from a third-party application or faulty hardware. This KPI mainly addresses database accidents that, with early intervention, could have been prevented from impacting database usability from the end user's perspective: say, long-running reporting queries blocking a user's regular OLTP application session, or user session timeouts caused by continual log file growth due to a tiny file-growth setting (like 1 KB), etc.
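One possible way to tally this KPI is to tag each logged accident with whether early DBA intervention could have prevented it, and average only the preventable ones per month. A small sketch (all names and data below are hypothetical):

```python
from collections import Counter

# Hypothetical incident log: (month, cause, dba_preventable)
incidents = [
    ("2013-01", "reporting query blocking OLTP sessions",  True),
    ("2013-01", "SAN controller failure",                  False),
    ("2013-02", "log file growth (1 KB autogrow setting)", True),
    ("2013-03", "third-party app issuing a bad query",     False),
]

# Only DBA-preventable accidents count toward the KPI.
preventable = Counter(m for m, _, flag in incidents if flag)
months = {m for m, _, _ in incidents}
avg_per_month = sum(preventable.values()) / len(months)
print(avg_per_month)
```

The preventable/not-preventable flag is exactly where the "not entirely the DBA's responsibility" caveat gets encoded.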

 

Avg. Daily Workload per Person (higher is better): the average number of DBA service requests fulfilled per DBA per day.

We can use the raw number of service requests as a starting point, but we should categorize the common DBA service tasks (such as refreshing an environment, installing a SQL Server instance, or deploying a script) and assign a payload factor to each category, so that we have a meaningful workload indicator. I think four categories, i.e. tiny, small, medium, and big, are a good starting point.
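A rough sketch of such a weighted workload calculation; the payload factors, team size, and request data below are purely illustrative assumptions:

```python
# Hypothetical payload factors for the four proposed task categories.
PAYLOAD = {"tiny": 0.5, "small": 1.0, "medium": 3.0, "big": 5.0}

# One work week of fulfilled requests for a two-person team (invented data):
requests = [
    ("grant read-only access",        "tiny"),
    ("deploy a script",               "small"),
    ("refresh a test environment",    "medium"),
    ("install a SQL Server instance", "big"),
    ("deploy a script",               "small"),
]

def avg_daily_workload(requests, dbas, days):
    """Weighted workload units fulfilled per DBA per day."""
    total = sum(PAYLOAD[category] for _, category in requests)
    return total / (dbas * days)

print(avg_daily_workload(requests, dbas=2, days=5))
```

The payload factors themselves would need to be calibrated per team; the point is only that a weighted sum is more meaningful than a flat request count.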

 

Avg. Service Fulfillment Time (shorter is better): this KPI should be calculated as completion time per workload unit. A workload unit can be thought of as one DBA work hour, which may include the actual work time plus the necessary documentation and communication time. For example, a SQL Server instance setup task might be worth 5 units, while granting a user read-only access might count as 0.5 units. This KPI indicates to stakeholders how efficiently and responsively a DBA works.
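Using unit values like those above, the calculation could look like this sketch (the task data is invented for illustration):

```python
# Hypothetical completed tasks: (workload units, elapsed hours from
# request to completion, including documentation/communication time).
completed = [
    (5.0, 8.0),   # SQL Server instance setup
    (0.5, 1.0),   # grant read-only access
    (3.0, 4.5),   # environment refresh
]

def avg_fulfillment_time_per_unit(tasks):
    """Average elapsed hours spent per workload unit."""
    units = sum(u for u, _ in tasks)
    hours = sum(h for _, h in tasks)
    return hours / units

print(round(avg_fulfillment_time_per_unit(completed), 2))
```

A ratio near 1.0 would mean tasks complete in roughly the hours their unit values predict; higher values flag slower-than-expected fulfillment.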

DBA Duty Volume: information about the number of servers, instances, and databases, and the total database sizes.

This KPI is a good indicator of the DBA's responsibility scope, i.e. the number of database administration "objects" and their sizes.

Although these quantitative KPIs are useful to non-DBAs, I still feel there should be some qualitative KPIs that are useful to DBAs themselves. In other words, these qualitative KPIs would be used by DBAs to evaluate DBAs. For example, if I were asked to evaluate another company's DBA work, I could use these qualitative KPIs as a starting point for my evaluation. (I will come up with a list of qualitative KPIs in my next post.)

So far, I have not worked in an environment where a DBA team is evaluated against a set of KPIs, and it would be very interesting to hear stories from anyone who has such experience: how to maintain (i.e. collect, calculate, and report) these KPIs, what the pros and cons are, etc.

This entry was posted in Operation Productivity.

7 Responses to Database Administration KPIs – 1/2

  1. Matthew says:

    DBA Duty Volume, as you have called it, is not really a KPI in itself. The number and size of databases/servers is not something you want DBAs trying to increase, any more than office size is a KPI for CEOs.

    • jxhyao says:

      I guess it really depends. In most environments, the number of servers/databases is not controlled by DBAs but by business applications. As for database size, we can use the actual used space of the databases.
      But no matter what, more servers/databases and larger databases will no doubt mean more effort and challenges for DBAs than otherwise.

      • abutaareq says:

        More servers, more databases, or more users does not matter to a DBA if he or she is well organized, has automated daily activities, and has planned for failures. There are DBAs who just can't handle two clusters, and there are DBAs who can maintain anything given to them.

        As a DBA, most of the challenges come from business users, developers who design bad systems, poorly architected data structures, and quality assurance (QA) with no quality at all.

        In that scenario, you cannot judge a DBA. Any thoughts?

      • jxhyao says:

        Thanks for your comment, abutaareq. I probably cannot agree that "more servers, more databases or more users does not matter to a DBA if ...". The truth is that more servers and more databases will absolutely bring new challenges. Yes, automation is one solution to these challenges, but coming up with sophisticated, well-functioning automation is itself a challenge. Your automation strategy for 10 servers vs. 1000 servers will probably be quite different. To use a more detailed example: for 10 servers, I can probably put a job on each server to read the SQL error log and e-mail me every hour; say there is a 10% chance a given server sends me an e-mail, which amounts to about one e-mail per hour. But for 1000 servers, I cannot do the same, as I will not be able to read and digest 1000 × 10% = 100 e-mails per hour. Consider also that those 1000 servers may run different SQL Server versions: on SQL Server 2008+, I may tend to monitor with Extended Events, while on SQL Server 2000/2005, I have to rely on traces, etc. It is a totally different experience. I guess any DBA who has had to manage 200+ instances knows the pain of handling a big volume of duties.

        I do agree to some extent that "most of the challenges come from business users, developers ...", but this is the real world. If everything were perfectly designed and engineered, there probably would not be much demand for a DBA's services. :-)

      • abutaareq says:

        I agree with your point as well; it is case by case. If the 1000 servers are all the same, they are easy to maintain; if they are different, it will be a jungle. Your example is correct: if you receive that many e-mails, you will spend the whole day looking at them. On the other hand, we should plan something to reduce those e-mails.

        I disagree with the point that we would not have jobs... In the real world, people do not create work for us on purpose; our job is to keep the database for data, and data only. All the problems start when they do not follow that rule.

        Good posting, keep it up.

  2. Thulasi says:

    In addition to that, depending on the number of servers and their size, the maintenance work will be greater, even with everything automated. Assume we have automated the nightly SQL backups and one fails: when it fails on one server, checking it is easy, but checking the same thing across 100 servers takes much more time. In the same way, we need precise monitoring in place when our database sizes grow heavily, because the housekeeping jobs might take more time due to other activities on the system, and we need to observe them from time to time. So we should also consider the number of servers and the amount of work on each server.

  3. CK says:

    Great article and posts ... I have been asked to produce a set of quantitative KPIs for my DBA team. My workplace is a large insurance company, so the DBA team only looks after and manages databases. This appeared easy when I wrote down a list, until I realised that each KPI either can't be measured easily, doesn't fall fully within the DBA realm, is more qualitative than quantitative, is relevant only to a DBA audience, or is influenced by application-related factors outside of the DBA's control. Considering my KPIs need to be aligned with the business's KPIs, I could only come up with 'Database Availability' and 'change requests completed within 3 days' .. help
