In my last post, I discussed about the quantitative KPIs which, to me, are useful for “outsiders”, i.e. non-DBAs, to evaluate DBA work performance from business perspective. However, I believer there should be some qualitative KPIs that can be best used by “insiders” to evaluate the DBA work from a more technical and sophisticated point of view. So here is the list of these KPIs:
1. Database Administration Infrastructure Index: We know that in real world, we need transportation infrastructure of both good quality and sufficient quantity to boost economy and production. The transportation infrastructure is to facilitate the quick and efficient goods exchange to revitalize economy. The same theory applies to database administration, i.e. DBAs need an infrastructure to quickly /efficiently collect / consume various information to do a top-tier database administration work. Here, the database administration infrastructure may include a set of hardware and software equipment/tools that composes a database administration framework. One quick example is that we often need a re-occurrence of an issue in a production for further trouble-shooting, but to do so, we can either wait the issue to re-appear in the production, or we can try to “trigger” the same issue in a dedicated DBA lab if there is such a lab, which of course includes both hardware and software tools. To me, this DBA lab can be considered as part of a database administration infrastructure, and I always dream that I can have a top-class dedicated lab where I can test how far my dream can fly with various administration ideas. I am still dreaming these days. (after all, where there is a dream, there is a way)
2. Database Administration Scalability Index: This index is to evaluate the DBA capability and capacity in handling increased DBA duty volume. With a strong scalability index, it means when duty volume grows, the actual DBA workload should be of little or no change as shown below
Between point A and B, with duty volume more than doubled, the DBA workload remains almost unchanged. To achieve this goal, there must be lots of automation plus a comprehensive administration platform on a robust administration infrastructure.
One example is: if we can automate the restore of a database from one production to one QA environment, we should be able to automate multi-database restores from multi-production environments to multi-QA environments. If we can further add some parallel restoring mechanism in the design, I’d say this db restore practice is very scalable.
3. Database Administration Maturity Index: This index tries to address the following questions:
(i) Are there any established rules/policies for a DBA to follow when dealing with administration issues? These rules/policies are not necessarily to be followed during manual work, they can be actually embedded into the script logic, workflow logic etc.
Take a random restore for example, before a restore, do we need to check the target server’s disk space to see whether there is sufficient space for a restore? I guess we may all encounter the following scenario: I need to restore a database by restoring from a full backup file and then from a differential backup, and this database is big, so it may take me 8 hrs to finish the restore from a full backup file, and when I do the restore from the diff backup file, the restore fails because in the diff backup file, the log file grows from 10GB to 18 GB and thus outgrows my 15 GB log disk space.
If there is proper pre-restore step followed, this embarrassing scenario can be avoided in the first place.
(ii) Is there a continuous improvement process for DBA daily practices? If there is, what are the evaluation criteria for the result of this process?
Many times, people just do what they are accustomed to and seldom think about what can be done to change the practice to be better. Or in extreme cases, “continuous improvement” is not even encouraged. I remember long, long ago, I was working in a company that required DBA to manually check 4 production servers against a 2-paged (about 20+ items) checklist document everyday. What I felt very bad was I have to manually check all the sql server logs and windows event log (application/system), job history etc. It can easily take 3 hrs to do this check. Imagine you have to do it every day in the morning, when interrupted by emails / meetings / phones, the process can easily last for a whole day. The worst thing was senior management did not even encourage any automation in this area for various reasons.
(iii) Does the DBA(/team) have the requirements/resources to improve the technical knowledge/skills on a regular base? And what are the criteria to verify the improvement? Top tier administration needs top tier DBAs, and as such, a mature database administration environment should encourage/facilitate DBAs to continuously grow/challenge their knowledge/skills.
(iv) Is the db administration corporate knowledge being taken care of? Each business environment may have different business practices/requirements when managing back-end db systems, this can vary from sql service installation to common user problems. All these are precious corporate knowledge and should be shared/maintained properly so the knowledge will not be lost due to one DBA’s absence.
(v) How is the collaboration and work relationships between DBA team and other stakeholder teams or among members within the DBA team? A mature DBA team should have a strong bond among the members to work for the reputation and goods of the team, and can collaborate well with other teams, like network / system admin /application support teams etc.
In a business world, it is almost impossible that we can achieve something without interaction with others. One example is performance issue, when users complains about sudden slowness of an application, it is hard to say whether the issue is caused by the db system or something else, like storage, network, application itself or even the user self (I once saw a case that the slowness was due to the bad data input from the end user). So a healthy and positive work relationship will avoid the unnecessary blames or finger pointing to others, and quicken the whole work.
To me, a good designed KPI, whether quantitative or qualitative, should be verifiable via both DBA work quantity and quality and cannot be abused for selfish purpose. So for example, assuming we have a KPI called “Avg DBA daily work time”, this KPI aims to reflect how many hours a DBA need to handle the workload daily, however, this KPI can be easily manipulated, and we cannot logically link more workload with long work time either, also long work time does not necessarily mean quality work. As such, this is not a proper KPI candidate.
I hope our DBA community can collaborate to come up with a set of KPIs to be used in a broader way. I think one practical use is to certify against these KPIs those companies with remote dba outsourcing services, just like currently, lots of companies are getting ISO 9000 certified. If there is such certification, I believe it will be very popular not only for those companies trying to get certified but also other companies, who can use the KPIs as yardsticks for internal assessment and evaluations.