In the study of measurement error, we sometimes find that the within-subject variation is not uniform but is proportional to the magnitude of the measurement. It is natural to estimate it in terms of the ratio within-subject standard deviation/mean, which we call the within-subject coefficient of variation.

In our *British Medical Journal* Statistics Note on the subject, Measurement error proportional to the mean, Doug Altman and I described how to calculate this using a logarithmic method. We take logarithms of the data and then find the within-subject standard deviation. We take the antilog of this and subtract one to get the coefficient of variation.

Alvine Bissery, statistician at the Centre d'Investigations Cliniques, Hôpital européen Georges Pompidou, Paris, pointed out that some authors suggest a more direct approach. We find the coefficient of variation for each subject separately, square these, find their mean, and take the square root of this mean. We can call this the root mean square approach. She asked what difference there is between these two methods.

In practice, there is very little difference between these two ways of estimating within-subject coefficient of variation. They give very similar estimates.
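The two estimators can be written out as follows. This is only a sketch in notation that is assumed here rather than taken from the Statistics Note: $n$ subjects, $\bar{x}_i$ and $s_i$ the mean and standard deviation of the measurements on subject $i$, and $s_{i,\log}$ the standard deviation of the log-transformed measurements on subject $i$.

```latex
\[
\mathrm{CV}_{\mathrm{rms}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\frac{s_i^{2}}{\bar{x}_i^{2}}},
\qquad
\mathrm{CV}_{\log} = \exp\!\left(\sqrt{\frac{1}{n}\sum_{i=1}^{n} s_{i,\log}^{2}}\right) - 1 .
\]
```

When the variation is small, $\exp(s) - 1$ is close to $s$ and the standard deviation of the logs is close to the CV on the natural scale, which is one way to see why the two estimates tend to agree.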

This simulation, done in Stata, shows what happens. (The function invnorm(uniform()) gives a standard Normal random variable.)

. clear

Set sample size to 100.

. set obs 100
obs was 0, now 100

We generate true values for the variable whose measurement we are simulating.

. gen t=6+invnorm(uniform())

We generate measurements x and y, with error proportional to the true value.

. gen x = t + invnorm(uniform())*t/20

. gen y = t + invnorm(uniform())*t/20
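For readers without Stata, the same data-generating step can be sketched in Python using only the standard library. The function name `simulate_pairs`, its parameters, and the seed are assumptions made for this illustration. The random numbers will differ from those in the Stata session, so the estimates will not match the figures on this page exactly.

```python
import random

def simulate_pairs(n=100, true_mean=6.0, error_fraction=1 / 20, seed=1):
    """Simulate n subjects, each measured twice, with error proportional to the true value."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    pairs = []
    for _ in range(n):
        t = true_mean + rng.gauss(0, 1)               # true value, as in: gen t=6+invnorm(uniform())
        x = t + rng.gauss(0, 1) * t * error_fraction  # first measurement
        y = t + rng.gauss(0, 1) * t * error_fraction  # second measurement
        pairs.append((x, y))
    return pairs

pairs = simulate_pairs()
print(len(pairs), pairs[0])
```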

The simulated data look like this, shown as a scatter plot with the line of equality:

Calculate the within-subject variance for the natural scale values. (Within-subject variance is given by difference squared over 2 when we have a pair of observations on each subject.)

. gen s2 = (x-y)^2/2
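The "difference squared over 2" shortcut is just the usual sample variance applied to two observations: with mean m = (x+y)/2, the two squared deviations are each ((x-y)/2)^2, and their sum divided by 2-1 is (x-y)^2/2. A small Python check, with made-up values and a hypothetical helper name `pair_variance`, illustrates this:

```python
import statistics

def pair_variance(x, y):
    """Within-subject variance for two observations on one subject."""
    return (x - y) ** 2 / 2

x, y = 5.8, 6.3  # made-up pair of measurements
# statistics.variance uses the n-1 denominator, i.e. the usual sample variance
print(pair_variance(x, y), statistics.variance([x, y]))
```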

Calculate subject mean and s squared / mean squared, i.e. CV squared.

. gen m=(x+y)/2

. gen s2m2=s2/m^2

Calculate mean of s squared / mean squared.

. sum s2m2

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
    s2m2 |     100    .0021519    .0030943   4.47e-07   .0166771

The within-subject CV is the square root of the mean of s squared / mean squared:

. disp sqrt(.0021519)
.04638858

Hence the within-subject CV is estimated to be 0.046 or 4.6%.
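The root mean square calculation above can be collected into one small Python function. This is a sketch for paired data only; the name `rms_cv` and the demonstration values are assumptions for illustration.

```python
import math

def rms_cv(pairs):
    """Root mean square within-subject CV from (x, y) pairs, one pair per subject."""
    squared_cvs = []
    for x, y in pairs:
        s2 = (x - y) ** 2 / 2            # within-subject variance for the pair
        m = (x + y) / 2                  # subject mean
        squared_cvs.append(s2 / m ** 2)  # squared CV for this subject
    return math.sqrt(sum(squared_cvs) / len(squared_cvs))

demo = [(10.0, 11.0), (20.0, 19.0), (5.0, 5.5)]  # made-up pairs
print(rms_cv(demo))
```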

Now the log method. First we log transform.

. gen lx=log(x)

. gen ly=log(y)

Calculate the within-subject variance for the log values.

. gen s2l = (lx-ly)^2/2

. sum s2l

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
     s2l |     100    .0021566     .003106   4.46e-07   .0167704

The within-subject standard deviation on the log scale is the square root of the mean within-subject variance. The CV is the antilog (exponent since we are using natural logarithms) minus one.

. disp exp(sqrt(.0021566))-1
.04753439

Hence the within-subject CV is estimated to be 0.048 or 4.8%. Compare this with the direct estimate, which was 4.6%. The two estimates are almost the same.
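The log method can be sketched in the same way. Again this assumes exactly two positive measurements per subject, and the function name `log_cv` and the demonstration values are hypothetical.

```python
import math

def log_cv(pairs):
    """Within-subject CV by the log method from (x, y) pairs of positive measurements."""
    # within-subject variance of the natural logs for each pair
    log_vars = [(math.log(x) - math.log(y)) ** 2 / 2 for x, y in pairs]
    s_w = math.sqrt(sum(log_vars) / len(log_vars))  # within-subject SD on the log scale
    return math.exp(s_w) - 1                        # antilog minus one

demo = [(10.0, 11.0), (20.0, 19.0), (5.0, 5.5)]  # made-up pairs
print(log_cv(demo))
```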

If we average the CV estimated for each subject, rather than their squares, we do not get the same answer.

Calculate subject CV and find the mean.

. gen cv=sqrt(s2)/m

. sum cv

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
      cv |     100    .0361173    .0292567   .0006682   .1291399

This gives us the within-subject CV estimate = 0.036 or 3.6%. This is considerably smaller than the estimates by the root mean square method or the log method. The mean CV is not such a good estimate and we should avoid it.
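One way to see why the mean CV comes out smaller: the mean of a set of non-negative numbers can never exceed their root mean square. If the paired differences are roughly Normal, the mean absolute difference is about sqrt(2/pi), or roughly 0.80, times the root mean square difference, which is broadly in line with 3.6% against 4.6% here. The Python sketch below, with hypothetical function names and made-up data, compares the two summaries of the same per-subject CVs.

```python
import math

def subject_cvs(pairs):
    """Per-subject CV, sqrt(s2)/m, for (x, y) pairs."""
    return [math.sqrt((x - y) ** 2 / 2) / ((x + y) / 2) for x, y in pairs]

def mean_cv(pairs):
    """Simple average of the per-subject CVs (the approach to avoid)."""
    cvs = subject_cvs(pairs)
    return sum(cvs) / len(cvs)

def rms_cv(pairs):
    """Root mean square of the per-subject CVs."""
    cvs = subject_cvs(pairs)
    return math.sqrt(sum(c * c for c in cvs) / len(cvs))

demo = [(10.0, 11.0), (20.0, 19.0), (5.0, 5.5), (8.0, 8.1)]  # made-up pairs
print(mean_cv(demo), rms_cv(demo))
```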

Sometimes researchers estimate the within-subject CV using the mean and within-subject standard deviation for the whole data set. They estimate the within-subject standard deviation in the usual way, as if it were a constant. They then divide this by the mean of all the observations to give a CV. This appears to be a completely wrong approach, as it estimates a single value for a varying quantity. However, it often works remarkably well, though why it does I do not know. It works in this simulation:

. sum x y s2

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
       x |     100    6.097301    1.012154    3.62283   8.696612
       y |     100    6.081827    1.000043   3.759932   8.447584
      s2 |     100    .0823188    .1212132   .0000193    .605556

The within-subject standard deviation is the square root of the mean of s2 and the overall mean is the average of the X mean and the Y mean. Hence the estimate of the within-subject CV is:

. disp sqrt(.0823188)/( (6.097301 + 6.081827)/2)
.04711545

So this method gives the estimated within-subject CV as 0.047 or 4.7%. This can be compared to the estimates by the root mean square and log methods, which were 4.6% and 4.8%. Why this should be I do not know, but it works. I do not know whether it would work in all cases, so I do not recommend it.
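For comparison, the whole-data-set calculation can be sketched in Python as below. The function name `pooled_cv` is an assumption for this illustration, and the sketch is included only to make the arithmetic explicit, not to recommend the method.

```python
import math

def pooled_cv(pairs):
    """Overall within-subject SD divided by the overall mean, for (x, y) pairs."""
    n = len(pairs)
    mean_s2 = sum((x - y) ** 2 / 2 for x, y in pairs) / n   # mean within-subject variance
    overall_mean = sum(x + y for x, y in pairs) / (2 * n)   # average of the x mean and the y mean
    return math.sqrt(mean_s2) / overall_mean

demo = [(10.0, 11.0), (20.0, 19.0), (5.0, 5.5)]  # made-up pairs
print(pooled_cv(demo))
```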

We can find confidence intervals quite easily for estimates by either the root mean square method or the log method. For the root mean square method, this is very direct. We have the mean of the squared CV, so we use the usual confidence interval for a mean on this, then take the square root.

. sum s2m2

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+-----------------------------------------------------
    s2m2 |     100    .0021519    .0030943   4.47e-07   .0166771

The standard error is the standard deviation of the squared CVs divided by the square root of the sample size.

. disp .0030943/sqrt(100)
.00030943

The 95% confidence interval for the squared CV can be found by the mean minus or plus 1.96 standard errors. If the sample is small we should use the t distribution here. However, the squared CVs are unlikely to be Normal, so the CI will still be very approximate.

. disp .0021519 - 1.96*.00030943
.00154542

. disp .0021519 + 1.96*.00030943
.00275838

The square roots of these limits give the 95% confidence interval for the CV.

. disp sqrt(.00154542)
.03931183

. disp sqrt(.00275838)
.05252028

Hence the 95% confidence interval for the within-subject CV by the root mean square method is 0.039 to 0.053, or 3.9% to 5.3%.
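The interval calculation for the root mean square method can be sketched in Python as follows. The function name `rms_cv_ci` is hypothetical. Truncating a negative lower limit at zero before taking the square root is an assumption added here, not part of the method described above. The multiplier 1.96 is the large-sample value and would be replaced by a t quantile for small samples.

```python
import math
import statistics

def rms_cv_ci(pairs, z=1.96):
    """Approximate CI for the root mean square within-subject CV from (x, y) pairs."""
    sq = [((x - y) ** 2 / 2) / ((x + y) / 2) ** 2 for x, y in pairs]  # squared CVs
    mean_sq = statistics.mean(sq)
    se = statistics.stdev(sq) / math.sqrt(len(sq))  # sample SD (n-1 denominator) over root n
    lower = max(mean_sq - z * se, 0.0)  # assumption: truncate at zero so the root exists
    upper = mean_sq + z * se
    return math.sqrt(lower), math.sqrt(mean_sq), math.sqrt(upper)

demo = [(10.0, 11.0), (20.0, 19.0), (5.0, 5.5), (8.0, 8.1)]  # made-up pairs
print(rms_cv_ci(demo))
```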

For the log method, we can find a confidence interval for the within-subject standard deviation on the log scale. The standard error is $s_w/\sqrt{2n(m-1)}$, where $s_w$ is the within-subject standard deviation, $n$ is the number of subjects, and $m$ is the number of observations per subject.

In the simulation, $s_w = \sqrt{0.0021566} = 0.0464392$, $n = 100$, and $m = 2$.

Hence the standard error is $0.0464392/\sqrt{2 \times 100 \times (2-1)} = 0.0032837$.

The 95% confidence interval is 0.0464392 - 1.96*0.0032837 = 0.0400031 to 0.0464392 + 1.96*0.0032837 = 0.0528753.

Finally, we antilog these limits and subtract one to give confidence limits for the CV: exp(0.0400031)-1 = 0.040814 and exp(0.0528753)-1 = 0.05429817, so the 95% confidence interval for the within-subject CV is 0.041 to 0.054, or 4.1% to 5.4%. This interval is slightly narrower than the root mean square interval, but very similar.
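The log-method interval can be sketched in Python from the within-subject standard deviation on the log scale, the number of subjects, and the number of observations per subject. The function name `log_cv_ci` is hypothetical; the inputs in the usage lines are the values from the simulation above.

```python
import math

def log_cv_ci(s_w, n, m, z=1.96):
    """Approximate CI for the within-subject CV by the log method.

    s_w: within-subject SD on the natural log scale
    n:   number of subjects
    m:   number of observations per subject
    """
    se = s_w / math.sqrt(2 * n * (m - 1))   # standard error of s_w
    lower = math.exp(s_w - z * se) - 1      # antilog each limit, then subtract one
    upper = math.exp(s_w + z * se) - 1
    return lower, upper

lower, upper = log_cv_ci(0.0464392, n=100, m=2)
print(round(lower, 4), round(upper, 4))  # 0.0408 0.0543
```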

I would conclude that either the root mean square method or the log method can be used.

Thanks to Garry Anderson for pointing out an error on this page.

Martin Bland


This page maintained by Martin Bland.

Last updated: 16 October, 2006.
