As a DBA, we may all have this question: How should my work be evaluated objectively? or in another way, what criteria would I hope to be used by others (my manager / clients / peers etc) to evaluate my work?
In DBA world, we love “no news is good news” and as such, to me, the best DBA (/team) should be a person (/team) who is never seen in any news/spotlight but holds an irreplaceable position in an organization’s IT hierarchy.
Though this qualitative description gets nodding from all DBAs I have worked with, it can hardly be useful to non-DBA stakeholders, like HR/senior management, because intangible description cannot be used for performance evaluation, which must be based on facts, the quantitative and verifiable facts.
So I think a set of quantitative KPIs will help out here. But to design these KPIs seems not easy because most DBA work has dependencies on external factors (such as network/hardware/business policies etc) controlled by others, and we do not want the KPIs to be unable to reflect DBA’s own and sole efforts.
With some struggle and thoughts, I have come up with the following KPIs which I believe are usable in most cases:
Avg. Database Availability Time (longer, better): The average time a sql server service is running between two unplanned downtime windows. (Planned downtime can be excluded here)
This KPI, strictly speaking, is not necessarily the full responsibility of DBA team as any unexpected network/hardware issue can cause the database unavailable. But there are many items that a DBA (/team) can control, such as: one db log file growing out of control, or tempdb growing out of disk or bad CLR stored proc causing a mini-dump and sql service restart etc. So this KPI actually aims to indicate DBA efforts to maintain the internal health of the sql server system, and to avoid any events that may cause the system’s unavailability. At the bottom line, this KPI will request the DBA to make the unplanned to be planned, for example. send early warnings to stakeholders, if something is unavoidable, at least we can minimize the shocks to end users and help them to make a decent “exit”.
Avg. Database Usability Accidents (lower, better): The accidents that cause the users unable to use the database system.
Still this may not be the full responsibility of DBA as these accidents can come from lots of things like bad query of 3rd party applications or bad hardware etc. This KPI mainly addresses database accidents that, if intervened early, can be prevented to impact the database usability from end-user perspective, let’s say long-running reporting queries blocking user’s regular OLTP application session, or user session timeout due to continued log file growth because of the small database file growth configuration (like 1 KB), etc.
Avg. Daily Workload/Person(higher, better): Avg daily dba service requests fulfilled per DBA
We can use # of service requests as a starting point, but initially, we could categorize the common DBA service tasks (such as refresh an environment, install a sql server instance, deploy a script etc) and assign a payload factor to each category, so we can have a meaningful workload indicator. I think four categories, i.e. tiny, small, medium and big, should be a good starting point.
Avg. Service Fulfillment Time (shorter, better): this KPI should be calculated as workload unit (a unit of workload can be considered as one DBA work hour, this may includes the real work time plus the necessary documentation / communication time) completion time, for example a SQL Server instance setup task is worth 5 units, while granting a user with read-only access may count for 0.5 unit. so this KPI can indicate to stakeholders how efficiently /responsively a DBA works.
DBA Duty Volume: information about # of servers/instances/dbs/total db sizes.
This KPI is a good indicator for DBA responsibility scope, i.e. the number of database administration “objects” and their sizes.
Although these quantitative KPIs are useful to non-DBAs, I still feel there should be some qualitative KPIs that are useful to DBAs. In another word, these qualitative KPIs are used by DBAs to evaluate DBAs. For example, as a DBA, if I were asked to evaluate another company’s DBA work, I could use these qualitative KPIs as a starting point to do my evaluation. (I will come up with a list of quantitative KPIs in my next post.)
So far, I have not worked in an environment that a DBA team needs to be evaluated against a set of KPIs, and it would be very interesting to hear stories from anyone who has such experience, things like how to maintain (i.e. collect, calculate, report) these KPIs, whether there are any pros and cons etc.