Sorry for the late response. I got intrigued by the nature of your example data, and I got side-tracked hoping to offer some "deep" insight. Not!

Originally Posted by Duarte_RV
I'm trying two options: trimmed mean and median. Not sure which one is better to show the trend in this case.
Considering the Poisson-like or exponential-like distribution of your example data, the median might indeed be more representative of the central tendency than the mean.
But that really depends on how you want to use the statistic. The benefit of the median is also its weakness: it is less sensitive to changes in the data. For example, MEDIAN({1,1,10,1000,1000}) and MEDIAN({8,8,10,11,11}) both return 10.
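By contrast, the mean does respond to that change, using the same two data sets:
=AVERAGE({1,1,10,1000,1000}) returns 402.4
=AVERAGE({8,8,10,11,11}) returns 9.6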
In any case, since much of your example data are zero values, which are invalid, be sure to at least exclude zeros in any calculations.
For example, array-enter formulas of the following form (press ctrl+shift+Enter instead of just Enter):
=TRIMMEAN(IF(data>0,data), 0.5%)
=MEDIAN(IF(data>0,data))
Or you might normally-enter formulas of the following form (press Enter as usual):
=AVERAGEIF(data,">0")
=SUMIF(data,">0") / COUNTIF(data,">0")

Originally Posted by Duarte_RV
I know that a trimmed mean doesn't remove outliers per se. The thing is that this dataset is about ambulance response times, and there were times when they didn't start counting (so the time value is 0) or that they didn't stop (so the time is too long). I thought that for this case the trimmed mean was useful. But if you know a better method, I'm more than happy to change!
Actually, the sample data that you posted demonstrate the weakness of TRIMMEAN.
More than 54% of the data are zero. Since TRIMMEAN excludes half of the specified percentage from each end of the sorted data, trimming away the bottom 54% alone would require TRIMMEAN(data,108%), which obviously we cannot write (the percentage must be less than 100%).
Any reasonable use of TRIMMEAN would ignore only some of the zeros, not most of them. So your trimmed mean would still be skewed to the left (too low).
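To illustrate with round numbers (suppose 100 data points, 54 of them zero, matching your proportions):
=TRIMMEAN(data, 40%)
excludes 40 points in total: the 20 lowest and the 20 highest. The 20 lowest are all zeros, but 34 zeros remain among the 60 values that are averaged, so the result is still dragged too low.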
Based on your knowledge of the process, the ideal way to exclude "true" outliers (mistakes) is to determine arbitrary lower and upper boundaries that are clearly the result of procedural error.
For example, you mentioned failing to start the response timer immediately. Clearly, zero reflects that mistake; perhaps 12 sec (the next-smallest value) does, too. And you mentioned failing to stop the response timer appropriately. Perhaps 8198 sec (about 2.28 hr), the example max, reflects that mistake; perhaps something lower does, too, such as 3600 sec (1 hr).
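For example, if you settle on boundaries like those (12 sec and 3600 sec are just illustrations; pick cutoffs that fit your knowledge of the process), you might normally-enter:
=AVERAGEIFS(data, data, ">12", data, "<=3600")
Or array-enter (press ctrl+shift+Enter instead of just Enter):
=MEDIAN(IF(data>12, IF(data<=3600, data)))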
Alternatively, we can try to identify potential outliers statistically based on the interquartile range (IQR), the middle 50%.
For some background, read http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
To use this methodology, we might calculate the following:
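For example (one possible setup; the choice of cells X1:X5 is arbitrary):
X1: =QUARTILE(IF(data>0,data), 1)
X2: =QUARTILE(IF(data>0,data), 3)
X3: =X2 - X1
X4: =X1 - 1.7*X3
X5: =X2 + 1.7*X3
X1 and X2 are Q1 and Q3 of the nonzero data, X3 is the IQR, and X4 and X5 are the lower and upper outlier limits.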
Aside.... I prefer 1.7 instead of 1.5 because Q3 + IQR*1.7 is close to 3 sd, which is a common outlier limit for a normal distribution.
Array-enter the formulas in X1 and X2 (press ctrl+shift+Enter instead of just Enter).
Note that we always exclude zero because the predominance of zero data values (in the example) would skew any blind statistical calculation.
Then we might calculate one of the following for a measure of central tendency:
=AVERAGEIFS(data, data, ">0", data, ">=" & X4, data, "<=" & X5)
or
=MEDIAN(IF(data>0,IF(data>=X4,IF(data<=X5,data))))
Array-enter the MEDIAN formula (press ctrl+shift+Enter instead of just Enter).
Note that I always test data>0 as well as data>=X4 because X4 might be less than zero. That is the case for your example data.