Outputs

The outputs of the assessment deliver value by conveying information and helping people decide what to do.

Think about the audience for the outputs and when the outputs are produced.

In some cases, the outputs are shown to the respondents as soon as they have finished the assessment. For example, a pre-sales assessment of website issues would highlight the identified issues at the end of the assessment, in the hope that the respondent would then consider the offered services.

In other cases, the respondents are not shown the outputs. In a typical customer satisfaction survey, only the survey owners see the full results. They may choose to publish a summary of the results, but this is aimed at a wider audience than just the respondents.

There are three main classes of outputs: scores, findings and categorisation.

Scores

It is easy to apply a scoring scheme to a set of questions with graded answers, to measure the "goodness" of each evaluand.

The simplest scoring scheme is to attach a score to each answer, and then to take an average across all answers.

If questions vary in their importance, different questions can be given different weights.
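
As an illustration only (not Metrici's implementation), a weighted average over graded answers might be calculated as in the following Python sketch; the question identifiers, scores and weights are invented.

    # Hypothetical answer scores (0 to 100) and question weights.
    answers = {"q1": 100, "q2": 75, "q3": 50}
    weights = {"q1": 1.0, "q2": 2.0, "q3": 1.0}   # q2 is twice as important

    def weighted_average(answers, weights):
        """Weighted average of answer scores; equal weights give a simple average."""
        total_weight = sum(weights[q] for q in answers)
        return sum(answers[q] * weights[q] for q in answers) / total_weight

    print(weighted_average(answers, weights))   # 75.0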

The questions can be divided into groups, each group scored separately, and then a total score calculated. This allows different aspects of goodness to be examined. Weights can be used within the groups and between the groups, as necessary. Groups can follow the same structure used to present the questions, or can be completely separate. If separate groups are used, the same question can be in more than one group, and can have different weightings in different groups.
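
Building on the sketch above, group scores and a weighted total might be assembled along the following lines; the group membership and weights are invented, and note that q2 appears in both groups.

    answers = {"q1": 100, "q2": 75, "q3": 50}

    # Invented groups; the same question can appear in more than one group,
    # with a different weight in each.
    groups = {
        "usability":     {"q1": 1.0, "q2": 1.0},
        "accessibility": {"q2": 1.0, "q3": 2.0},
    }
    group_weights = {"usability": 1.0, "accessibility": 3.0}

    def group_score(answers, question_weights):
        """Weighted average of the answers that belong to one group."""
        total = sum(question_weights.values())
        return sum(answers[q] * w for q, w in question_weights.items()) / total

    group_scores = {g: group_score(answers, qw) for g, qw in groups.items()}
    total_score = (sum(group_scores[g] * group_weights[g] for g in groups)
                   / sum(group_weights.values()))

    print(group_scores)   # {'usability': 87.5, 'accessibility': 58.33...}
    print(total_score)    # 65.625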

Sometimes it is useful to highlight negative or positive situations. For example, in an assessment of IT security controls, scoring three controls as perfect (score 100) and one as non-existent (score 0) suggests a worse situation than scoring four controls as acceptable (score 75), even though the average score is the same. In these situations, a power mean can be used. This calculates an overall score which is biased towards either low or high scores. The calculation of weighted power means is covered in the development guide section on Scoring. Power means are better than ad hoc calculations such as "take the average of the bottom three scores" because they are more mathematically defensible and easier to calculate across a range of conditions.
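
As a sketch of the idea (the definitive formulas are in the Scoring section of the development guide), a weighted power mean raises each score to a power p, averages, and takes the p-th root; p < 1 biases the result towards low scores and p > 1 towards high scores, with p = 1 giving the ordinary weighted average. The scores below are the IT security example from the text.

    controls_a = [100, 100, 100, 0]   # three perfect controls, one non-existent
    controls_b = [75, 75, 75, 75]     # four acceptable controls
    weights = [1, 1, 1, 1]            # equal weights in this example

    def weighted_power_mean(scores, weights, p):
        """Weighted power mean; p < 1 emphasises low scores, p > 1 high scores.
        p <= 0 needs special handling (e.g. geometric mean at p = 0), not shown here."""
        total_weight = sum(weights)
        return (sum(w * s ** p for s, w in zip(scores, weights)) / total_weight) ** (1 / p)

    print(weighted_power_mean(controls_a, weights, 1))    # 75.0  - plain average hides the gap
    print(weighted_power_mean(controls_a, weights, 0.5))  # 56.25 - penalises the missing control
    print(weighted_power_mean(controls_b, weights, 0.5))  # ~75.0 - uniform scores barely change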

Where the same evaluand is assessed multiple times, an average score can be taken. Similarly, an average score can be taken across multiple evaluands of the same type.

Within the scoring algorithm, you need to consider how to deal with questions that have not been answered. If it is reasonable not to know the answer (as in an opinion survey), then scores should be calculated ignoring the unanswered questions. If a single definitive assessment is being made (for example, evaluating a project), then unanswered questions should score 0. Where unanswered questions score 0, a coverage figure should be shown to indicate how many questions have been answered. (See Scoring for a definition of coverage.)
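
The two policies might look like the following sketch, with a coverage figure reported when unanswered questions score 0; the data is invented and the precise definition of coverage is in Scoring.

    answers = {"q1": 80, "q2": None, "q3": 60, "q4": None}   # None = not answered

    def score_ignoring_unanswered(answers):
        """Opinion-survey style: unanswered questions are left out of the average."""
        given = [s for s in answers.values() if s is not None]
        return sum(given) / len(given)

    def score_unanswered_as_zero(answers):
        """Definitive-assessment style: unanswered questions score 0.
        Also returns coverage, the proportion of questions that were answered."""
        given = [s for s in answers.values() if s is not None]
        return sum(given) / len(answers), len(given) / len(answers)

    print(score_ignoring_unanswered(answers))   # 70.0
    print(score_unanswered_as_zero(answers))    # (35.0, 0.5)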

Within Metrici, scoring schemes can use any maximum score, though the scoring scheme needs to be consistent across all the questions in an assessment. It is generally easiest to have a range of 0 to 100. Even if the final reports show a different scale (e.g. a 1 to 5 rating), it is generally easiest to perform all the calculations on a 0 to 100 basis and then scale the final results.
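
For example, if the final report uses a 1 to 5 rating, the 0 to 100 score can be scaled at the last step; the mapping below is one plausible choice, not a Metrici convention.

    def to_rating(score):
        """Map a 0-100 score onto a 1-5 rating at the reporting stage."""
        return round(1 + (score / 100) * 4, 1)

    print(to_rating(0))     # 1.0
    print(to_rating(75))    # 4.0
    print(to_rating(100))   # 5.0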

Scores can be shown graphically. In most cases, simple bar charts are the easiest to interpret.

Findings

A finding is a conclusion, inference or recommendation made as a result of an assessment.

At their simplest, there can be a 1:1 correspondence between questions and findings: if the respondent didn't give the right answer, tell them what the right answer was.

Other than in online exams, 1:1 findings quickly become tedious, and it is usually better to derive a smaller set of more insightful findings from the answers. One or more of the following strategies can be used:

  • Group scores and threshold rules. Where group scores are used, the score can be used to trigger one of a number of findings. For example, in a web content assessment, if the score for the accessibility group is less than 50 you might show a finding which explains about accessibility and how to improve it.
  • Poor Man's Rules (PMRs). PMRs are a simple way of deriving findings from combinations of answers. A rule is created which names the finding to be triggered and lists the answers that contribute to it, each with a weight. If the total weight of the given answers is greater than or equal to one, the finding is triggered. If a rule lists two answers each with a weight of 1, then either answer will trigger the finding (an OR condition). If both answers have a weight of 0.5, then both answers must be given to trigger the finding (an AND condition). Negative weights can be used to give NOT conditions, and additional features allow nested rules; a sketch of the basic triggering logic follows this list. See Scoring for more details.
  • Expert system rules. These allow full if-then conditions that are evaluated across all answers. These are useful for situations that become too complex for PMRs.
  • Scripts. Any other strategy can be implemented in scripts.
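
The following is a minimal sketch of the basic PMR triggering logic described above, not Metrici's implementation; the rule structure, answer identifiers and findings are invented, and nested rules and negative weights are not covered.

    # Answers actually given by the respondent (invented identifiers).
    given_answers = {"images-without-alt-text", "no-privacy-policy"}

    # Each rule names a finding and the weight of each contributing answer.
    rules = [
        {"finding": "Improve accessibility",
         "weights": {"images-without-alt-text": 1.0, "low-contrast-text": 1.0}},   # OR
        {"finding": "Review legal compliance",
         "weights": {"no-privacy-policy": 0.5, "no-cookie-notice": 0.5}},          # AND
    ]

    def triggered_findings(rules, given_answers):
        """A finding is triggered when the total weight of its given answers is >= 1."""
        findings = []
        for rule in rules:
            total = sum(w for answer, w in rule["weights"].items() if answer in given_answers)
            if total >= 1:
                findings.append(rule["finding"])
        return findings

    print(triggered_findings(rules, given_answers))   # ['Improve accessibility']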

Where findings are produced, it can be useful to show the questions that triggered them. Where the questions allow for textual input as well as gradings, the textual input provides a very powerful justification for the findings because it is in the respondents' own words.

Categorisation

Evaluands can be categorised using score levels, findings, or simply the answers themselves.

For a single assessment, categories can be useful, for example giving projects a "red", "amber" or "green" status.

Across multiple assessments, categories are very useful for subdividing the evaluands for management attention. Evaluands can have multiple categories, which is particularly useful in complex assessments, for example to find important projects that are not well run.
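
As a simple sketch of score-based categorisation, a red/amber/green status could be assigned with illustrative thresholds like these:

    def rag_status(score):
        """Categorise by score level; the 40 and 70 thresholds are illustrative only."""
        if score < 40:
            return "red"
        if score < 70:
            return "amber"
        return "green"

    projects = {"Website refresh": 35, "CRM migration": 65, "Data warehouse": 85}
    print({name: rag_status(score) for name, score in projects.items()})
    # {'Website refresh': 'red', 'CRM migration': 'amber', 'Data warehouse': 'green'}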


Whatever outputs are produced, it is important that the audience can understand them and can see the relationship between the questions and the outputs. It is better to provide a simple and defensible set of outputs than to provide complex, nuanced outputs that people cannot understand.