Introducing IRT for Measuring Usability Inspection Processes

Accompanying Simulation Programs

The simulation programs accompanying our CHI2008 paper are available here.

Written for the R statistical computing environment featuring the eRm package

Please do not distribute the files found on this page, and keep the website address confidential until the CHI2008 publication process is finished!


First of all, the R statistical computing environment needs to be installed. It is available for Windows, Linux and MacOS. Follow these instructions.

It is recommended to install an editor or development environment for convenient use of R. For Windows users, Tinn-R is a nice tool. For Linux users, Rkward is recommended.

For analysis of Rasch models the eRm package has to be installed. This is achieved on the R command line with

install.packages("eRm", dependencies = TRUE)

Usually a dialog appears in which a CRAN mirror has to be selected. Linux users may need some additional packages from their distribution, such as GCC and the accompanying Fortran compiler. Windows users should be fine.

The scenario scripts rely on two library scripts of our own, InspectionRealm.R and RaschMeasurement.R, which provide the functions used. First download:

Put them in a dedicated folder on your disk. Remember the path!

Now download the three scenario scripts and put them in the same folder:

Finally, open each of the scenario files in your favorite editor or your fresh new R GUI (but not Word or Wordpad) and change the following lines at the top of the script to match the path of the library scripts on your disk.

source("~/Dissertation/Material/Statistics/Libraries/InspectionRealm.R")
source("~/Dissertation/Material/Statistics/Libraries/RaschMeasurement.R")

Now you should be ready to run each script with R. Use the GUI tools suggested above to open and run the scripts, or on the R command line type:


Note that some simulations can take quite long, up to several minutes.

Have fun! If you encounter any problems, feel free to contact the author.

Scenario 1: Simulating the Inspection Process

View the complete formatted source code here

The most basic application of IRT is to use the Rasch formula for modelling the core inspection process. This demonstrates the utility of the function for investigating the behavior of complex inspection processes, accounting for both impact factors -- difficulty and ability.
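For reference, here is a minimal sketch of the Rasch formula in R. It assumes the standard parameterisation, in which the detection probability depends only on the difference between inspector ability theta and defect difficulty epsilon. The helper name rasch_prob is ours, chosen for illustration; it is not taken from the library scripts.

```r
## Detection probability under the Rasch model (illustrative helper, not from the library scripts)
## theta: inspector ability, epsilon: defect difficulty, both on the logit scale
rasch_prob <- function(theta, epsilon) {
	## plogis(x) is the logistic function exp(x) / (1 + exp(x))
	plogis(theta - epsilon)
}

## Ability equal to difficulty gives a detection probability of exactly 0.5
p_equal <- rasch_prob(0, 0)
## A harder defect lowers the probability, a more able inspector raises it
p_hard <- rasch_prob(0, 1)
p_able <- rasch_prob(1, 0)
```

Because only the difference theta - epsilon enters the formula, a shift of one unit in ability has the same effect as a shift of one unit in difficulty in the opposite direction.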

With the Rasch formula the impact of the variance of defect detectability and inspector ability on process variability can be analyzed. For epsilon we choose the distribution N(0,1) and for theta a normal distribution with mu=-1.1.

## Mean of the defect detectability distribution (can be a vector)
defmean<-0
## Variance of the defect detectability distribution (can be a vector)
defvar<-1
## Mean of the inspector ability distribution (can be a vector)
abilmean<- -1.1
## Variance of the inspector ability distribution (can be a vector)
abilvar<-c(0.5,1,2)
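As a quick plausibility check of these settings, the average detection probability can be approximated by Monte Carlo sampling. This is only a sketch under two assumptions of ours: the detection probability is plogis(theta - epsilon), and the variance settings are converted to standard deviations before being passed to rnorm(). The sample size n is arbitrary.

```r
set.seed(1)
n <- 100000   ## arbitrary number of sampled inspector-defect pairs

## Defect parameters epsilon ~ N(0,1)
epsilon <- rnorm(n, mean = 0, sd = sqrt(1))

## Average detection probability for each of the three ability variances
avg_p <- sapply(c(0.5, 1, 2), function(av) {
	## rnorm() expects a standard deviation, hence sqrt(av)
	theta <- rnorm(n, mean = -1.1, sd = sqrt(av))
	mean(plogis(theta - epsilon))
})
names(avg_p) <- c("var=0.5", "var=1", "var=2")
print(round(avg_p, 2))
```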

These settings yield an average detection probability of approx. 0.3, which is typical for inspections. In order to investigate the impact of variance in inspector ability, a simulation is run with three different values for the variance of inspector ability theta. This is done in the next piece of code. As you may guess from the multiple nested loops, it is capable of running quite general simulations, for example also manipulating the means of both distributions:

for (dm in defmean){
	for (dv in defvar){
		for (am in abilmean){
			for (av in abilvar){
				for (ps in procsize){
					## Run several inspection processes with the parameters now set (i.e. a fixed group of inspectors and a fixed set of defects)
					for (r in c(1:nruns)){
						## simulate single instances of an inspection process
						## write out the mean thoroughness of individual inspectors
						## write out the group result
					}
					## put everything in a result vector (parameters and result)
				}
			}
		}
	}
}

The gathered data is put into the dataframe result, with the following parameters:

  • preset defect mean
  • preset defect variance
  • preset ability mean
  • preset ability variance
  • inspector group size
  • measured thoroughness mean (individual)
  • measured thoroughness variance (individual)
  • measured thoroughness mean (group)
  • measured thoroughness variance (group)
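The following self-contained sketch shows how such a result table could be produced. It is not the scenario script itself. The values of procsize, ndefects and nruns are assumptions of ours. expand.grid() replaces the nested loops for brevity, and the variances are taken over the simulated runs of one setting, which is our reading of the list above.

```r
set.seed(42)
## Distribution settings as in the text; the last three values are assumed
defmean <- 0; defvar <- 1
abilmean <- -1.1; abilvar <- c(0.5, 1, 2)
procsize <- c(3, 5)   ## assumed inspector group sizes
ndefects <- 30        ## assumed number of defects per process
nruns <- 50           ## assumed number of simulated processes per setting

## One row per parameter combination (stands in for the nested for loops)
grid <- expand.grid(dm = defmean, dv = defvar, am = abilmean, av = abilvar, ps = procsize)

simulate_setting <- function(dm, dv, am, av, ps) {
	indiv <- numeric(nruns)
	group <- numeric(nruns)
	for (r in seq_len(nruns)) {
		theta <- rnorm(ps, am, sqrt(av))            ## inspector abilities
		epsilon <- rnorm(ndefects, dm, sqrt(dv))    ## defect difficulties
		lambda <- plogis(outer(theta, epsilon, "-")) ## detection probabilities
		## One Bernoulli draw per inspector-defect pair
		resp <- matrix(rbinom(length(lambda), 1, lambda), nrow = ps)
		indiv[r] <- mean(resp)                ## mean individual thoroughness
		group[r] <- mean(colSums(resp) > 0)   ## share of defects found by the group
	}
	c(indiv_mean = mean(indiv), indiv_var = var(indiv),
	  group_mean = mean(group), group_var = var(group))
}

## Parameters and measured outcomes side by side, one row per setting
outcomes <- t(mapply(simulate_setting, grid$dm, grid$dv, grid$am, grid$av, grid$ps))
result <- cbind(grid, outcomes)
print(result)
```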

In the final two code blocks two graphics are output (have a look at the paper)

  1. The variance of group outcome with three different variances in the inspector population (as a pdf)
  2. The mean of group outcome with three different variances in the inspector population

Note that the data sets are hard coded and must be adapted for different kinds of simulations.

Scenario 2: Testing Inspectors' Abilities

View the complete formatted source code here

A prerequisite for several further applications of Rasch measurement in inspection research and practice is a test of inspectors' skills. In the following scenario a diagnostic instrument for assessing inspectors' ability is set up, which, by the way, is close to the original purpose of the IRT approach.

First, a test has to be established: a sample of participants (n=30) is asked to fully inspect a sample application with previously known usability defects (n=30, verified via falsification testing). For demonstration purposes this test is simulated as in the previous scenario, yielding a 30x30 response matrix.

## Setting up the inspectors

## setting up the project
project1<-data.frame(ID=c(1:30),Difficulty=rnorm(30,0,1))
## Computing individual detection probabilities (inspector x defect)
lm1<-LambdaMatrix(inspectors[c(1:30),],project1)
## Computes a response matrix (random process)
rm1<-ResponseMatrix(lm1)
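The library functions LambdaMatrix and ResponseMatrix are not shown on this page. The sketch below uses hypothetical stand-ins (lambda_matrix_sketch, response_matrix_sketch) to illustrate what the two steps amount to under the Rasch model. The layout of the inspectors data frame, including the Ability column and its N(0,1) distribution, is our assumption.

```r
set.seed(7)
## Hypothetical stand-in for LambdaMatrix(): rows are inspectors, columns are defects
lambda_matrix_sketch <- function(inspectors, project) {
	plogis(outer(inspectors$Ability, project$Difficulty, "-"))
}
## Hypothetical stand-in for ResponseMatrix(): one Bernoulli draw per inspector-defect pair
response_matrix_sketch <- function(lambda) {
	matrix(rbinom(length(lambda), 1, lambda), nrow = nrow(lambda))
}

## Assumed layout of the inspectors data frame
inspectors <- data.frame(ID = 1:30, Ability = rnorm(30, 0, 1))
project1 <- data.frame(ID = 1:30, Difficulty = rnorm(30, 0, 1))

lm1 <- lambda_matrix_sketch(inspectors[1:30, ], project1)
rm1 <- response_matrix_sketch(lm1)
print(dim(rm1))
```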

From this matrix the defect difficulty parameters are estimated with the CML method.

## Estimating the item parameters

The model is then checked with the LR test, which yields, for instance, an LR value of 20.25 (df=22) and p=.57. The null hypothesis that the values of epsilon are equal across subgroups can be retained, which means the Rasch model holds.

## Estimating the item parameters
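With the eRm package, estimation and model test could look roughly like the sketch below. It is not the scenario script. The data are simulated here with 200 inspectors instead of 30 to keep this small example numerically stable, and the variable names are ours. RM() fits the Rasch model by CML and LRtest() performs Andersen's likelihood ratio test.

```r
if (requireNamespace("eRm", quietly = TRUE)) {
	set.seed(11)
	## Simulated response matrix: 200 inspectors x 30 defects (assumed N(0,1) parameters)
	theta <- rnorm(200)
	epsilon <- rnorm(30)
	lambda <- plogis(outer(theta, epsilon, "-"))
	rm_sim <- matrix(rbinom(length(lambda), 1, lambda), nrow = 200)

	## Fit the Rasch model by conditional maximum likelihood
	fit <- eRm::RM(rm_sim)
	## eRm reports item easiness parameters; difficulty is the negative
	difficulty <- -fit$betapar

	## Andersen's LR test, splitting the persons at the median raw score
	lr <- eRm::LRtest(fit, splitcr = "median")
	print(lr)
}
```

A large p-value in the printed result means that the item parameters do not differ noticeably between the two score groups.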

Note: If you want to view the results of the LR-test, on the R command-line just invoke:


As an application, consider a team of five inspectors (for example in a usability consulting agency) to be tested. They take the test by conducting an inspection of the test application.

## Testing five newly employed inspectors
## Simulating the test data

Again, raw scores are computed and the person parameters are simply obtained from the test score table.

## Test scores (sum of correct responses)
testscores<-apply(testdata, 1, sum)
## Looking up the person parameters
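The lookup works because the raw score is a sufficient statistic in the Rasch model: each score corresponds to exactly one person parameter. The sketch below builds such a score table by maximum likelihood from known item difficulties. The difficulties and test data are simulated stand-ins, and the scenario script may obtain its table differently, for example from eRm.

```r
set.seed(3)
## Assumed item difficulties standing in for the calibrated defect parameters
difficulty <- rnorm(30, 0, 1)
k <- length(difficulty)

## Person parameter for raw score r: the theta at which the expected score equals r
theta_for_score <- function(r) {
	uniroot(function(theta) sum(plogis(theta - difficulty)) - r,
	        interval = c(-10, 10), tol = 1e-8)$root
}
## Scores 0 and k have no finite estimate, so the table covers 1..(k-1)
score_table <- sapply(1:(k - 1), theta_for_score)

## Hypothetical test data for five inspectors (0/1 responses)
testdata <- matrix(rbinom(5 * k, 1, 0.3), nrow = 5)
testscores <- apply(testdata, 1, sum)

## Look up the person parameters; extreme scores stay NA
personpars <- rep(NA_real_, length(testscores))
ok <- testscores >= 1 & testscores <= k - 1
personpars[ok] <- score_table[testscores[ok]]
print(data.frame(score = testscores, theta = round(personpars, 2)))
```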

The output figure shows the test response matrix and the person parameters compared to the distribution of the calibration sample.

Scenario 3: Predicting the Inspection Process

The outcome of an inspection process can be regarded as a three-fold random experiment: a sample of inspectors is chosen from the population, a sample of defects is chosen from the defect population, and each pair undergoes a Bernoulli process. Accordingly, all three sources contribute to the indeterminacy of the process outcome. In this final demonstration we show that the process is predicted more accurately if the individual abilities in the sample of inspectors are assessed a priori.

This is achieved by testing the inspectors as described in the previous scenario. Of course, the individual defect parameters cannot be estimated in advance. Thus, we choose the mean of the defect parameters from the test construction scenario. For the prediction we first define a general function, which computes the expected group outcome, given an ability vector and a fixed mean for the difficulty:

	for (theta in inspectors){

To demonstrate the increased accuracy of prediction, we prepared a simulation of the inspection process with the samples from the test construction scenario above and compared the accuracy of prediction between two conditions. In the first condition, the prediction relies only on the mean values of the parameters, and the expected outcome with n inspectors is computed with the homogeneous prediction formula ("Virzi predictor").
Note: Do not be confused by our use of the Rasch outcome function here. The Virzi predictor is just a trivial case, where we have a mean theta instead of a vector.

	## Predicted outcome from mean group theta and Virzi formula

Accordingly, the Rasch predictor:

	## Compute the predicted outcome, given the distribution of defects and estimated inspector abilities
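The two predictors can be illustrated with a small sketch. The function name expected_group_outcome and the ability values are hypothetical. The function returns the probability that a defect of mean difficulty is found by at least one inspector, which is how we read "expected group outcome" here.

```r
## Expected share of defects found by at least one inspector,
## given abilities theta and a single (mean) defect difficulty
expected_group_outcome <- function(theta, eps_mean) {
	1 - prod(1 - plogis(theta - eps_mean))
}

## Hypothetical estimated abilities of five inspectors
theta_hat <- c(-2.0, -1.5, -1.0, -0.5, 0.5)
eps_mean <- 0

## Virzi predictor: every inspector is assigned the mean ability
virzi_pred <- expected_group_outcome(rep(mean(theta_hat), length(theta_hat)), eps_mean)
## Rasch predictor: the individual abilities are used
rasch_pred <- expected_group_outcome(theta_hat, eps_mean)

print(c(virzi = virzi_pred, rasch = rasch_pred))
```

For a perfectly homogeneous group both predictors coincide; they drift apart as the abilities in the group become more heterogeneous.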

Finally, we actually let them conduct the inspection process (the series of Bernoulli experiments with differing Lambda)...


... and gather the following data points for each run of the simulation: