Heterogeneity in the Usability Evaluation Process

Accompanying Website

Here I make available a statistical program to check for heterogeneity in the response matrix of usability evaluations. You can use this test to check whether the sensitivity of your participants or the visibility of defects differed in the study. The latter is critical because it slows down the discovery process. Estimates based on Virzi's formula are then inappropriate: they tend toward harmful overestimation of the proportion of defects found.
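
To illustrate why heterogeneous visibility matters, here is a minimal sketch in R. It assumes Virzi's formula in its usual form, 1 - (1 - p)^n, for the expected proportion of defects found after n sessions. The variable names and the mixture of visibilities (0.3, 0.4 and 0.5, averaging 0.4) are chosen for illustration only.

```r
# Expected proportion of defects found after a given number of sessions.
sessions <- 1:20

# Homogeneous case: every defect has the same visibility p = 0.4.
pHom <- 0.4
foundHom <- 1 - (1 - pHom)^sessions

# Heterogeneous case (illustrative assumption): 100 defects whose
# visibilities differ but have the same mean of 0.4.
pMix <- c(rep(0.3, 25), rep(0.4, 50), rep(0.5, 25))
foundMix <- sapply(sessions, function(s) mean(1 - (1 - pMix)^s))

# The gap shows how far the formula with the mean p runs ahead of
# the heterogeneous process.
comparison <- cbind(sessions, foundHom, foundMix, gap = foundHom - foundMix)
print(round(comparison[c(1, 3, 5, 10), ], 4))
```

With a single session both curves agree. From the second session on, the formula applied to the mean visibility lies above the heterogeneous curve, which is the overestimation described above.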

The program is written for the R statistical computing environment


First of all, the R statistical computing environment needs to be installed. It is available for Windows, Linux and macOS. Follow these instructions.

It is recommended to install an editor or development environment for convenient use of R. For Windows users, Tinn-R is a nice tool. For Linux users, RKWard is recommended.

The Code

Open a new R script and copy the following code into your editor. You can also download it here. Then highlight the code and run it.

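## A test of whether a vector of margin sums can be the result of a Binomial process.
## The variance of the observed margin sums is compared to the variance of random
## Binomial data with fixed p, using a Monte Carlo (MC) procedure.
## The result is the probability that a simulated variance equals or exceeds the observed variance.
## ms: vector of margin sums (not to be confused with a frequency distribution of margin sums)
## n: the number of times the Bernoulli experiment was conducted
## nruns: the number of MC runs
## NOTE: the function header, the initialisation of simVar, the computation of probExpr and
## the return value are missing from this listing. The name overdispTest, the default for
## nruns and the shape of the returned list are assumptions.
overdispTest <- function(ms, n, nruns = 1000) {
	k <- length(ms) 	# the number of margin sums
	obsProb <- sum(ms) / (n * k) 	# the mean probability
	obsVar <- var(ms) 	# the observed variance
	# The Monte Carlo procedure
	simVar <- c()
	for (i in (1:nruns)) simVar <- append(simVar, var(rbinom(k, n, obsProb)))
	freq <- sum(simVar >= obsVar) 	# runs whose variance equals or exceeds the observed one
	probExpr <- freq / nruns
	cat("The probability that the variance observed in this margin sum happened under a Binomial experiment is ")
	cat(probExpr); cat("\n")
	list(freq = freq, prob = probExpr)
}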


Test it

Now you can try the test. First, let's generate a purely Binomial vector of margin sums. Assume there have been 20 testing sessions on 100 defects that were all equally visible (p=0.4).

msbin = rbinom(100,20,0.4)

Run the overdispersion test against the above data, with msbin as the vector of margin sums and n = 20 sessions.


You get a message telling you that the probability for this variance is about 0.5 or a similar value. Consequently, you cannot conclude that this data set is overdispersed, which is as expected.

Now try a data set that is heterogeneous. Let's assume there are again 100 defects, but 25 of them are hard to discover (p=0.3), 50 are moderate (p=0.4) and another 25 are easily discovered (p=0.5). Also, you want to be sure this time and choose 10,000 MC runs (this is still pretty fast on recent computers).

msmix = c(rbinom(25,20,0.3),rbinom(50,20,0.4),rbinom(25,20,0.5))
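
Both runs can be sketched in one self-contained block. The function name overdispTest, its argument order and its return value (a list with freq and prob) are assumptions made for illustration. The block restates the test in a compact, vectorised form only so that it runs on its own.

```r
# Hypothetical helper: name, default for nruns and return value are assumptions.
overdispTest <- function(ms, n, nruns = 1000) {
  k <- length(ms)                  # number of margin sums
  obsProb <- sum(ms) / (n * k)     # mean probability
  # Variances of nruns simulated purely Binomial data sets
  simVar <- replicate(nruns, var(rbinom(k, n, obsProb)))
  freq <- sum(simVar >= var(ms))   # runs at least as variable as the data
  list(freq = freq, prob = freq / nruns)
}

set.seed(1)  # only to make the sketch reproducible

# Homogeneous case: 100 defects, 20 sessions, p = 0.4 throughout
msbin <- rbinom(100, 20, 0.4)
resBin <- overdispTest(msbin, 20)
print(resBin$prob)

# Heterogeneous case with 10,000 MC runs
msmix <- c(rbinom(25, 20, 0.3), rbinom(50, 20, 0.4), rbinom(25, 20, 0.5))
resMix <- overdispTest(msmix, 20, nruns = 10000)
print(resMix$freq)
print(resMix$prob)
```

The exact numbers depend on the random draws, so they will differ between runs unless the seed is fixed.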

As expected, you get quite small values for the probability. If $freq says 40, for example, this means that in only 40 of 10,000 simulated samples the variance was equal to or larger than the observed variance. This can be interpreted in terms of an alpha level.
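
The arithmetic behind that interpretation can be sketched as follows. The alpha level of 0.05 is a conventional choice assumed here, not something prescribed by the test, and the variable names are illustrative.

```r
# Turning the count of extreme MC runs into a decision.
freq <- 40        # runs with variance equal to or larger than the observed one
nruns <- 10000    # total number of MC runs
alpha <- 0.05     # assumed significance level

pValue <- freq / nruns   # 40 / 10000 = 0.004
print(pValue)

# TRUE: the observed variance is unlikely under a purely Binomial process
print(pValue < alpha)
```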

Have fun! If you encounter any problems, feel free to contact the author.