Scoring

Metrici is often used for building assessment and evaluation solutions. This section describes general strategies for interpreting single assessments.

It covers:

Assessment structure - how an individual assessment questionnaire is structured
Weighting and scoring - conventions for configuring weights and scores
Weighted mean - the basic recipe for calculating scores across a group of questions
Completion and coverage - measuring completeness
Weighted power mean - skew results to give due emphasis to bad (or good) answers
Group scores - scoring groups of questions
Findings - inferences and recommendations based on an assessment
Threshold rules - set recommendations based on scores
Poor man's rules - a simple convention of associating rules with assessments

Assessment structure

An assessment questionnaire is just like any other structure in Metrici:

Each assessment questionnaire is a node type.
Each response is an instance of the questionnaire.
Each question is a member type of the questionnaire.
Each grade is a target on the target list of the question.

Weighting and scoring

Defining a weighting and scoring scheme

The member type list on the questionnaire allows a weight to be associated with a question.

The target list on a question allows a score to be associated with a grade.

Using this information, it is possible to calculate a weighted mean across the answers in a response.

Maximum score

It is possible to calculate a maximum score for an individual question:

If a member type has a cardinality of 1, the maximum score for the member type is the highest score of any grade, i.e. the highest scale in the target list.
If a member type can repeat, the maximum score for the member type is the total of all the grade scores.
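
The two rules above can be sketched in Python as follows. This is an illustration only, not the implementation of Calculate Maximum Score; the function name and its parameters are hypothetical.

```python
def maximum_score(grade_scores, repeats):
    """Return the maximum score for one question (member type).

    grade_scores: scores of the grades on the question's target list.
    repeats: True if the member type can repeat (hypothetical flag).
    """
    if not grade_scores:
        # Assumption: a question with no scored grades has a maximum of 0.
        return 0
    if repeats:
        # Every grade could be selected, so the maximum is the total.
        return sum(grade_scores)
    # Only one grade can be selected, so the maximum is the highest score.
    return max(grade_scores)
```

For grades scored 0, 50 and 100, this gives 100 for a question with a cardinality of 1 and 150 for a question that can repeat.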

This calculation is implemented in Calculate Maximum Score (library.parts.CalculateMaximumScore), which writes the maximum score to Maximum Score (library.parts.MaximumScore).

When considering a whole assessment, there are two strategies for deciding the assessment maximum score:

Use the highest maximum score found for any question. This works if every question is on the same scale (e.g. 0 to 100). Care is required for questions with a cardinality greater than 1.
Set a maximum score, and then scale the score of each question to this maximum. This is a better general strategy because it allows questions with different scoring schemes to be combined.

Weighted mean

A weighted mean can be calculated across all or some of the questions in a response.

The general algorithm is:

Total score = 0
Total weight = 0
For each member type
  If member type weight > 0
    Member type score = 0
    For each member
      Add the score associated with the target in the member type's target list to the member type score
    Next
    Look up member type maximum score
    Member type score = assessment maximum score * member type score / member type maximum score
    Total score = total score + member type score * member type weight
    Total weight = total weight + member type weight
  End If
Next
Weighted mean = total score / total weight
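
As a minimal sketch, the same algorithm might look like this in Python. The dictionary keys (weight, max_score, scores) are hypothetical names for illustration and are not part of Metrici. The guards against a zero maximum score and a zero total weight are additions that the algorithm above does not specify.

```python
def weighted_mean(questions, assessment_max=100.0):
    """Weighted mean of question scores, scaled to assessment_max.

    questions: list of dicts with hypothetical keys
      "weight"    - weight of the question (member type)
      "max_score" - maximum score of the question
      "scores"    - scores of the grades selected in the response
    """
    total_score = 0.0
    total_weight = 0.0
    for q in questions:
        # Skip unweighted questions; also skip a zero maximum (added guard).
        if q["weight"] <= 0 or q["max_score"] <= 0:
            continue
        raw = sum(q["scores"])
        # Scale the question score to the assessment maximum.
        scaled = assessment_max * raw / q["max_score"]
        total_score += scaled * q["weight"]
        total_weight += q["weight"]
    if total_weight == 0:
        return None  # nothing to score (added guard)
    return total_score / total_weight


example = [
    {"weight": 2, "max_score": 10, "scores": [5]},    # scales to 50
    {"weight": 1, "max_score": 100, "scores": [80]},  # already 0 to 100
    {"weight": 0, "max_score": 100, "scores": [10]},  # ignored, no weight
]
print(weighted_mean(example))  # 60.0
```

Note that, as in the algorithm above, a weighted question with no answer contributes a score of 0 and still counts towards the total weight.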

Completion and coverage

Completion and coverage measure how many questions have been answered.

Completion measures the percentage of questions that have been answered.
Coverage measures the percentage of questions with weights that have been answered, weighted according to the weights.

Coverage is relatively simple to calculate at the same time as score. The coverage is the total weight of the questions that have been answered and have at least one target, as a percentage of the total weight of all the questions.
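
A minimal Python sketch of the coverage calculation is shown below. The keys weight and targets are hypothetical, and an answer is treated as present when its list of targets is not empty.

```python
def coverage(questions):
    """Percentage of total question weight that has been answered.

    questions: list of dicts with hypothetical keys
      "weight"  - weight of the question
      "targets" - targets selected in the response (empty if unanswered)
    """
    weighted = [q for q in questions if q["weight"] > 0]
    total_weight = sum(q["weight"] for q in weighted)
    if total_weight == 0:
        return None  # no weighted questions, so coverage is undefined
    answered_weight = sum(q["weight"] for q in weighted if q["targets"])
    return 100.0 * answered_weight / total_weight
```

For example, if a question with a weight of 3 is answered and a question with a weight of 1 is not, coverage is 75%.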

Completion is, perhaps surprisingly, much more difficult to calculate. Within a questionnaire, some member types do not represent questions, and some questions do not need to be answered.

The Question library provides a number of types to calculate completion.

The Question type (library.question.QuestionType) provides a set of member types that define a completion rule:
- Completion rule (library.question.CompletionIndicator) indicates whether the question is optional or mandatory. The default is that questions which do not repeat are mandatory and those that repeat are optional.
- Text completion (library.question.ValueCompletionIndicator) indicates whether, if the question allows a value, the value is required for the answer to be considered complete.
- Text completion length (library.question.ValueCompletionLength) indicates the amount of text that must be entered for the value to be considered answered.
- Link completion (library.question.TargetCompletionIndicator) indicates whether, if the question has a target, a target is required for the answer to be considered complete.
- Derive Completion Rule (library.question.DeriveCompletionRule) uses the above information to derive Completion Rule (library.question.CompletionRule), which holds a JSON structure that describes the conditions under which a question should be considered complete. (Note that there are two members called completion rule: one (local reference CompletionIndicator) indicates whether the question needs to be answered, the other (local reference CompletionRule) describes the full rule in JSON.)
When added to a questionnaire, Completion (library.question.Completion) calculates and holds completion for all questions with a populated Completion Rule (library.question.CompletionRule). This derives a single member the scale of which is the percentage completion and the value of which lists questions that need to be completed.
Completion Column Script (library.question.CompletionColumnScript) uses the Completion member to produce a column which shows completion as a bar with a pop-up showing the completion text.

Completion is generally shown to the user who fills in the questionnaire, and coverage is shown when analysing scores.

Weighted power mean

Most assessments are aimed at helping organisations decide how to act by identifying situations that demand attention. An assessment with some low scores and some high scores may come out similar to an assessment with only medium scores, and thus hide the areas requiring attention from management.

Where this is an issue, an alternative approach can be used, known as the weighted power mean. In this, each score is raised to a power, the weighted mean of the results is taken, and that mean is then raised to the inverse of the power. Depending on the power used, this has the effect of skewing the mean to the low scoring end (a power below 1) or the high scoring end (a power above 1).

In the algorithm below, the power is represented by P.

Total score = 0
Total weight = 0
For each member type
  If member type weight > 0
    Member type score = 0
    For each member
      Add the score associated with the target in the member type's target list to the member type score
    Next
    Look up member type maximum score
    Member type score = assessment maximum score * member type score / member type maximum score
    Total score = total score + ((member type score raised to the power of P) * member type weight)
    Total weight = total weight + member type weight
  End If
Next
Weighted power mean = (total score / total weight) raised to the power of ( 1 / P )

For a typical assessment with a score range of 0 to 100, a power of 0.2 to 0.5 provides a suitable skew towards the low end. A power of 1 gives the same value as a simple weighted mean.
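
The effect of the power can be seen in a short Python sketch. It assumes the scores have already been scaled to the assessment maximum, that scores are not negative, and that the power is greater than 0. The function name and the (score, weight) pair format are hypothetical.

```python
def weighted_power_mean(scores_and_weights, power):
    """Weighted power mean of (score, weight) pairs.

    Assumes non-negative scores already scaled to the assessment
    maximum, and power > 0.
    """
    total = 0.0
    total_weight = 0.0
    for score, weight in scores_and_weights:
        if weight > 0:
            # Raise each score to the power before weighting it.
            total += (score ** power) * weight
            total_weight += weight
    if total_weight == 0:
        return None  # nothing to score
    # Undo the power on the weighted mean of the powered scores.
    return (total / total_weight) ** (1.0 / power)


mixed = [(25, 1), (100, 1)]       # one low score, one high score
medium = [(62.5, 1), (62.5, 1)]   # two medium scores
```

With a power of 1, both examples give 62.5. With a power of 0.5, the medium example still gives 62.5, but the mixed example drops to about 56.25, so the low score is no longer hidden by the high one.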

Group scores

Scores can be calculated across the whole assessment or across groups of member types.

Different solutions use different strategies for identifying the groups of member types.

The simplest strategy is based on using member type groups and the conventions used in the core library for member type lists.

The core library types have two member type lists:

An original member type list, of type library.core.MemberTypeList
A derived member type list, of type system.MEMBER_TYPE_LIST

The derived member type list is the "official" list of member types used by the underlying engine. It is derived from the original member type list.

The original member type list may contain member type groups (identified by the Member Type Group tag (library.tags.MemberTypeGroup)). The groups for the analysis can therefore be found by scanning the original member type list for targets with the tag Member Type Group. Any groups with a total weighting of 0 should be discarded.

By default, groups do not modify question weightings, i.e. the weighting for a group is simply the sum of the weightings of the questions within it. Some solutions (e.g. those based on the Advisor Library, library.advisor) use a hierarchical scoring scheme, in which the groups are given weights that are then shared out to the questions within them in proportion to those questions' weights.
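
The hierarchical scheme can be illustrated with a small Python sketch. This is not the Advisor Library's implementation; the function name and the shape of the data are hypothetical.

```python
def share_group_weight(group_weight, question_weights):
    """Share a group's weight among its questions.

    question_weights: dict of question name -> original weight.
    Returns a dict of question name -> effective weight, in proportion
    to the original weights.
    """
    total = sum(question_weights.values())
    if total == 0:
        # Assumption: a group of unweighted questions shares out nothing.
        return {name: 0.0 for name in question_weights}
    return {
        name: group_weight * weight / total
        for name, weight in question_weights.items()
    }
```

For example, a group weight of 10 shared between two questions weighted 1 and 3 gives effective weights of 2.5 and 7.5.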

Sometimes the groups used to create the questionnaire are not suitable groups for analysis. In this case, additional groups can be created. Sometimes these groups are known as "dimensions" because they measure different dimensions of the overall assessment.

Findings

A finding is a conclusion, inference or recommendation made as a result of an assessment.

By convention:

All selected grades and inferred findings are written to the Finding List (library.advisor.FindingList) member type. The target of the finding list points to the finding nodes.
Text associated with the finding on the finding list provides further context or justification of the finding, for example echoing the textual part of the questions that derived the finding.
Findings may have a priority, which is specified as the scale of the Priority (library.advisor.Priority) member.
The priority of a finding can be adjusted up or down according to whether other findings have also been found. This is specified by the Priority Adjustment (library.advisor.PriorityAdjustment) members of the finding.
Findings with a calculated priority of between 1 (high) and 5 (low) are considered relevant to show the user.
The questions that triggered the finding are typically recorded, to help the user understand the findings.

Most solutions use the type Finding (library.advisor.FindingType) for findings. If this is used as a grade type, then simply selecting the grade will trigger the finding.

Different solutions use different strategies to create the findings. The most sophisticated is the expert system used by the Advisor Library (library.advisor). However, this is relatively complicated and most solutions will get at least as good results more easily by using a combination of threshold rules and poor man's rules, which are explained in the next sections.

Threshold rules

Threshold rules associate findings with group scores. The node type Threshold Rule (library.parts.ThresholdRuleType) is a drop-in replacement for the member type group which allows a series of scores and findings to be associated with the group. After sorting the thresholds in descending order, the group score triggers the first finding in the list that it has reached or exceeded.

For example, given the thresholds and findings:

80	Performance good
60	Performance adequate
40	Performance weak
20	Performance very poor
0	Performance unacceptable

A score of 35 would trigger the "Performance very poor" finding.
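
The lookup can be sketched in Python as follows. The function name and the list of (threshold, finding) pairs are hypothetical and only illustrate the rule described above.

```python
def threshold_finding(score, thresholds):
    """Return the finding for the highest threshold the score has reached.

    thresholds: list of (threshold, finding) pairs in any order.
    Returns None if the score is below every threshold.
    """
    # Sort thresholds in descending order and take the first one reached.
    for minimum, finding in sorted(thresholds, key=lambda t: t[0], reverse=True):
        if score >= minimum:
            return finding
    return None


performance = [
    (80, "Performance good"),
    (60, "Performance adequate"),
    (40, "Performance weak"),
    (20, "Performance very poor"),
    (0, "Performance unacceptable"),
]
print(threshold_finding(35, performance))  # Performance very poor
```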

Poor man's rules

The standard Grade node type (library.parts.GradeType) allows a grade to list one or more findings.

By default, any listed findings will be triggered when the grade is selected.

Each finding is associated with a "contribution" number, which can be any positive or negative number, though 0 is interpreted the same as 1.

If contributions are used, then a finding must have a total contribution of 1 before it is triggered. For example, grades on two different questions could each refer to the same finding with a contribution of 0.5. The finding is only triggered if both grades are set.

Contributions can be used to simulate AND, OR and NOT conditions. In this way they can be considered "poor man's rules", though in practice they can be used to simulate a more complex rule processor.

The list of findings may list other grades, which themselves have finding lists, which are processed recursively. This allows complex conditions to be built up.
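
The contribution logic can be sketched in Python as below. This is a simplified illustration, not the logic of the library components. It assumes a finding is triggered once its total contribution reaches 1 or more, and it omits the recursive processing of grades listed within finding lists. All names are hypothetical.

```python
from collections import defaultdict


def triggered_findings(selected_grades, grade_findings):
    """Return the findings triggered by the selected grades.

    grade_findings: dict of grade -> list of (finding, contribution).
    Assumption: a finding is triggered when its total contribution >= 1.
    """
    totals = defaultdict(float)
    for grade in selected_grades:
        for finding, contribution in grade_findings.get(grade, []):
            # A contribution of 0 is interpreted the same as 1.
            totals[finding] += 1.0 if contribution == 0 else contribution
    return sorted(f for f, total in totals.items() if total >= 1.0)


rules = {
    "q1.no": [("F_both", 0.5)],                  # AND: needs q1.no and q2.no
    "q2.no": [("F_both", 0.5)],
    "q3.yes": [("F_any", 1)],                    # OR: either q3.yes or q4.yes
    "q4.yes": [("F_any", 0), ("F_both", -1)],    # NOT: q4.yes cancels F_both
}
```

Here F_both is triggered only when both q1.no and q2.no are selected, F_any is triggered by either q3.yes or q4.yes, and selecting q4.yes cancels F_both through its negative contribution.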

There are two components that can be used to apply poor man's rules:

Apply Poor Man's Rules (library.parts.ApplyPoorMansRules) is a member type that can be added to a questionnaire. It reads through all the grades on all the questions and adds them to the findings list, and then applies the poor man's rules logic to add further findings to the list. This component is suitable when you are using poor man's rules to trigger other processing.
Recommendations Table (library.parts.PoorMansRecommendationTable) is a more sophisticated component that can be used as a member type or a dynamic table. It looks at grades and applies poor man's rules, and also:
- Calculates the priority of each finding, i.e. applies the priorities and priority adjustments, and filters out findings with a priority > 5.
- Sorts the findings into ascending priority sequence (i.e. most urgent ones first).
- Lists the related member types.
This is more suitable for calculating recommendations based on poor man's rules to show to the user.
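
The priority handling can be illustrated with a Python sketch. It is not the component's implementation. In particular, it assumes that each priority adjustment names another finding and an amount to add to the priority when that other finding is also present; the data layout and names are hypothetical.

```python
def prioritise(findings, found):
    """Return (priority, finding) pairs, most urgent first.

    findings: dict of finding name -> {"priority": number,
              "adjustments": {other finding name: amount to add}}
    found: set of finding names that have been triggered.
    """
    results = []
    for name in found:
        spec = findings[name]
        priority = spec["priority"]
        # Assumption: an adjustment applies when the other finding is present.
        for other, delta in spec.get("adjustments", {}).items():
            if other in found:
                priority += delta
        # Filter out findings with a priority > 5.
        if priority <= 5:
            results.append((priority, name))
    # Ascending priority: 1 is the most urgent.
    return sorted(results)


catalogue = {
    "A": {"priority": 3, "adjustments": {"B": -2}},
    "B": {"priority": 5},
    "C": {"priority": 6, "adjustments": {"A": -1}},
    "D": {"priority": 7},
}
```

With all four findings triggered, A is raised to priority 1 because B is present, C is brought into range at 5 because A is present, and D is filtered out.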