RELIABILITY, VALIDITY AND THE SUBWAY SYSTEM
by Ed Colet
In order to do any sort of useful data mining, one would like to have clean, high-quality data to mine. The importance of clean data is obvious -- without it, subsequent analysis and interpretation of results can easily become questionable. In this column I discuss a real-world example of how a flawed measurement process can result in "unclean" data. Two very useful concepts from statistical measurement theory are relevant here: reliability and validity (which correspond roughly to consistency and accuracy, respectively).
A recent story in a local newspaper reported that since the 1970s, the Transportation Authority (TA) of a major city has been using a flawed method for recording and calculating the number of miles that subway trains travel between mechanical failures. This distance, known as the mean distance between failures, has been computed on the assumption that trains always start their run at the first station on the line, end at the last stop, and then make a return trip. In reality, trains often start or end their runs somewhere in the middle of the line and/or don't make return trips. The recorded distances between breakdowns were therefore overestimates by 2-10%, or 1000-4500 miles. In response to the report, TA officials argued that more accurate reporting is unnecessary and costly, and that any possible advantage would be negated by the inability to compare performance from year to year.
Making measurements: validity and reliability
Formally, validity refers to the usefulness of inferences drawn from measurements for a given purpose under a prescribed set of conditions. In other words, it refers to the extent to which one is measuring what one thinks one is measuring. The quantitative index of validity is the correlation between one's measure and a separate measure of the same thing. Although there are several types of validity (internal, external, construct, etc.), the essential notion is that the observed measurement is an accurate (i.e. true) one. An example can clarify this.
If one were to measure an actual distance of 5.4 inches with a ruler and the observed measure is indeed 5.4 inches, then the process of measurement and the measuring instrument are valid. The correlation between one's ruler and an independent standard of length is appropriately high. In the case of the TA, their process does not yield valid (accurate) measures of the distances that subway cars travel.
Related to validity is the notion of reliability. Formally, reliability refers to the consistency of one's relative measures over repeated administrations of the measurement process. The quantitative measure of reliability is the correlation of a measure with itself (i.e. repeated administrations of the process). For example, imagine that one has a ruler again. But this time the ruler is defective (short by an inch). All measurements taken will be inaccurate (invalid), but they will be reliable because they are internally consistent. To further clarify, imagine that one is to measure a fixed distance AB. This time, one's ruler is made of rubber and dynamically stretches and shrinks. One measure reports 6.5 inches, a second measurement yields 7.2 inches, and a third measure results in yet another value. This process is not reliable and sources of error can be attributed to the instrument (the rubber ruler stretches) and/or the person making the measurement.
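The two rulers can be simulated to make the distinction concrete. The sketch below is only an illustration under stated assumptions: the 200 object lengths, the one-inch bias, and the 30% stretch range are invented, and reliability is estimated as the correlation between two administrations over a set of different objects (a correlation cannot be computed from repeated readings of a single fixed distance).

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical true lengths (inches) of 200 objects.
true_lengths = [rng.uniform(2.0, 12.0) for _ in range(200)]

BIAS = 1.0  # assumed: the defective ruler is off by exactly one inch every time


def defective_ruler(length):
    return length + BIAS


def rubber_ruler(length):
    # Assumed: each use stretches or shrinks the reading by up to 30%.
    return length * rng.uniform(0.7, 1.3)


# Two administrations of each measurement process over the same objects.
defective_1 = [defective_ruler(x) for x in true_lengths]
defective_2 = [defective_ruler(x) for x in true_lengths]
rubber_1 = [rubber_ruler(x) for x in true_lengths]
rubber_2 = [rubber_ruler(x) for x in true_lengths]

defective_reliability = pearson(defective_1, defective_2)
rubber_reliability = pearson(rubber_1, rubber_2)
defective_mean_error = sum(
    m - t for m, t in zip(defective_1, true_lengths)
) / len(true_lengths)

print(f"defective ruler: reliability {defective_reliability:.3f}, "
      f"mean error {defective_mean_error:.2f} in")
print(f"rubber ruler:    reliability {rubber_reliability:.3f}")
```

The defective ruler repeats itself exactly, so its two administrations correlate perfectly even though every reading is an inch off. The rubber ruler's administrations disagree with each other, which pulls its correlation below 1.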
It's possible to have reliability without validity, and if so all is not lost. For example, if a set of measured distances is based on a ruler that is off by an inch then the set of measurements is reliable but invalid. Having reliability without validity does allow one to make comparisons between measurements by capitalizing on the internal consistency inherent in reliable measurements.
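A small numeric sketch shows why comparisons survive a constant error. The object names, lengths, and the one-inch offset are hypothetical values chosen for illustration.

```python
from math import isclose

OFFSET = 1.0  # assumed constant error of the ruler, in inches

true_lengths = {"A": 3.0, "B": 5.4, "C": 8.0}  # hypothetical objects
measured = {name: length + OFFSET for name, length in true_lengths.items()}

# Every individual reading is wrong by OFFSET...
errors = {name: measured[name] - true_lengths[name] for name in true_lengths}

# ...but differences between objects and their rank order are unaffected.
true_gap = true_lengths["C"] - true_lengths["A"]
measured_gap = measured["C"] - measured["A"]
same_order = (
    sorted(true_lengths, key=true_lengths.get)
    == sorted(measured, key=measured.get)
)

print(errors)
print(true_gap, measured_gap, same_order)
```

Because the same offset is added to every reading, it cancels out whenever two readings are subtracted or ranked against each other.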
In the situation at the TA, there is a clear case of knowing that a data collection and measurement process is imperfect. But rather than correct a flawed procedure, or adjust subsequent analyses to "correct" for the error, it was decided to continue as before in order to compare performance from year to year. In choosing this last option, the officials at the TA are implicitly acknowledging that validity may be lacking, but they are also erroneously assuming that reliability is present. Imagine that two different trains on different lines both travel an actual distance of 5 miles, but the distances recorded are the complete track lengths of 10 and 7 miles respectively. In other words, this is analogous to having a measuring instrument that should report 5 miles but instead reports 10 and 7 miles -- a situation like our rubber ruler. As a result, one concludes that the process is not reliable. Alternatively, one could also claim that because the distance traveled is recorded as the fixed length of the train's entire route, the TA is technically not measuring any distance at all. If so, it is formally not possible to calculate a reliability index, and we conclude again that the process is not reliable.
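The two-train example can be written out as a quick check. The mileage figures are the ones from the example above; the line names and the record layout are made up for illustration.

```python
# Figures from the two-train example; the line names are hypothetical.
runs = [
    {"line": "Line 1", "actual_miles": 5.0, "recorded_miles": 10.0},
    {"line": "Line 2", "actual_miles": 5.0, "recorded_miles": 7.0},
]

for run in runs:
    run["error_miles"] = run["recorded_miles"] - run["actual_miles"]
    run["ratio"] = run["recorded_miles"] / run["actual_miles"]
    print(f'{run["line"]}: recorded {run["recorded_miles"]} mi for an actual '
          f'{run["actual_miles"]} mi '
          f'(error {run["error_miles"]} mi, ratio {run["ratio"]:.1f})')

errors = [run["error_miles"] for run in runs]

# A constant bias (like the ruler that is off by an inch) would produce
# the same error on every run. Here the error depends on the line.
constant_bias = max(errors) == min(errors)
print("constant bias:", constant_bias)
```

The same actual distance produces different recorded values, so the error cannot be removed by subtracting a single known offset.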
The ideal situation is to have measurement procedures that produce both reliable and valid data, since this would allow one to conduct a richer set of analyses. At the TA, both reliability and validity are lacking, and lost along with them are opportunities for discovering potentially significant patterns.
For example: To what extent are mechanical failures associated with distance traveled? Because the overestimates range from 2-10%, it's not possible to know the actual distance a train traveled (although I suppose that one could apply the full 10% correction and get a lower bound on the actual distance). Also, one may not be able to determine if mechanical failures are associated with particular sub-routes traveled. Consequently, it may not be possible to accurately determine if mechanical failures are associated with factors such as rush-hour periods, passenger load, etc.
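As a rough sketch of what a correction factor can and cannot recover, the reported 2-10% range can be turned into an interval for the actual distance. This assumes that a recorded value equals the actual value times (1 + e), with e somewhere between 2% and 10%; the 50,000-mile record and the function name are hypothetical.

```python
MIN_OVERESTIMATE = 0.02  # 2%, low end of the reported range
MAX_OVERESTIMATE = 0.10  # 10%, high end of the reported range


def actual_distance_bounds(recorded_miles):
    """Return (lowest, highest) actual distance consistent with a record.

    Assumes recorded = actual * (1 + e) with e between 2% and 10%.
    """
    lowest = recorded_miles / (1 + MAX_OVERESTIMATE)
    highest = recorded_miles / (1 + MIN_OVERESTIMATE)
    return lowest, highest


recorded = 50_000.0  # hypothetical recorded miles between failures
lowest, highest = actual_distance_bounds(recorded)
print(f"recorded {recorded:.0f} mi -> actual between "
      f"{lowest:.0f} and {highest:.0f} mi")
```

Under that assumption, the largest correction gives the smallest plausible actual distance and the smallest correction gives the largest. The result is an interval, not a value, so any analysis that needs the true distance for a specific train is still out of reach.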
These and other potentially interesting patterns, like the clean data one would want, may simply not be available.
---
For more information, see http://www.virtualgold.com