Improving assessment of clinical competence
19 Mar 2024

ACER is proud to have presented and exhibited at Ottawa 2024 – the premier conference transforming healthcare through excellence in assessment and evaluation – held in Melbourne from 24-28 February.
Researchers at the Australian Council for Educational Research (ACER) delivered a diverse array of insightful presentations, papers and workshops aimed at improving the assessment of clinical competence.
The sessions delved into the complexities of modern medical education with depth and insight, from pioneering studies that explored borderline regression standard setting to thought-provoking symposiums tackling the co-existence of programmatic assessment and certifying examinations. Other sessions shared the transformative journey from unstructured ’Vivas’ (oral examinations) to Objective Structured Clinical Examinations (OSCEs), explored the nuances of levels-based marking schemes, and unpacked the intricacies of interpreting item analysis data for written examinations. Presenters also shared their experiences in reforming radiopharmaceutical science training through a programmatic approach to assessment, highlighting the tangible benefits of collaborative efforts with esteemed medical institutions.
Learn more about each of the presentations.
“Which way were you leaning?” The impact of two borderline categories in borderline regression standard setting.
Jacob Pearce (ACER), Vernon Mogol, Kristy Osborne (co-author, ACER), Gabes Lau, Barry Soans and Anne Rogan (RANZCR)
Borderline regression standard setting is considered best practice for determining pass marks in Objective Structured Clinical Examinations (OSCEs). Candidates receive question-based marks for each station, and examiners also provide a global rating of candidate performance. The impact of the choice of category labels on the validity of the standard setting process is under-researched. A 6-point categorical scale was applied during borderline regression for the Royal Australian and New Zealand College of Radiologists (RANZCR) OSCE, known as the OSCER. We hypothesised that two borderline categories (“borderline pass” and “borderline fail”) would be more helpful to examiners than the single borderline category used in the previous RANZCR Viva Examinations. When examiners are pressed on a borderline rating, they can often tell you which way they are leaning. Separating the borderline into two categories worked well in practice, and examiners found the scale straightforward to apply. The data demonstrated an empirical difference between the two borderline categories and provided more nuanced assessment evidence for review by the expert panel. The precise wording used in categorical rating scales does affect standard setting outcomes, but the more important factor is how examiners conceptualise the minimally competent candidate and appreciate the differences between the levels of candidate performance captured in the rating scale.
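For readers unfamiliar with the mechanics, the sketch below shows the basic borderline regression calculation for a single station: candidates’ question-based station scores are regressed on examiners’ global ratings, and the pass mark is taken as the predicted score at the borderline point of the rating scale. This is a minimal illustration, not the authors’ or RANZCR’s implementation; the 6-point scale labels (3 = “borderline fail”, 4 = “borderline pass”) and the choice of 3.5 as the cut point are illustrative assumptions only.

```python
# Minimal sketch of borderline regression standard setting for one OSCE station.
# Assumes a 6-point global rating scale with 3 = "borderline fail" and
# 4 = "borderline pass"; the cut point of 3.5 is an illustrative policy choice.
import numpy as np

def borderline_regression_cut_score(station_scores, global_ratings,
                                    borderline_point=3.5):
    """Regress station scores on examiners' global ratings and return the
    predicted score at the borderline point as the station pass mark."""
    x = np.asarray(global_ratings, dtype=float)   # global ratings (1-6)
    y = np.asarray(station_scores, dtype=float)   # question-based station marks
    slope, intercept = np.polyfit(x, y, deg=1)    # ordinary least squares line
    return slope * borderline_point + intercept

# Hypothetical data for illustration only (not real examination results)
ratings = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6]
scores = [9, 11, 12, 14, 15, 17, 18, 20, 21, 22]
print(round(borderline_regression_cut_score(scores, ratings), 1))
```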
"Double think or brave new world”: can programmatic assessment and certifying examinations exist in the same educational reality?
Tim Wilkinson (University of Otago), Mike Tweed (University of Otago and RACP), Rola Ajjawi (Deakin University), Jacob Pearce (ACER), Imogene Rothnie (RACP, ANZAHPE, AES)
This symposium explored the inherent tension between programmatic approaches to assessment, which emphasise the importance of student learning over time, and high-stakes examinations, which measure proficiency at a point in time. This is a complex balance for medical schools and specialist training colleges, which are tasked with training medical professionals while also ensuring standards are met in the interests of public safety. A range of experts in clinical competence and postgraduate medical assessment debated several dimensions of this challenge. Drawing on scholarly work, applied practice and stakeholder perspectives, the panel debated whether and how these divergent assessment paradigms could co-exist in curricula, and what we gain or lose by merging them or selecting one over the other.
From unstructured Viva to OSCER: Transforming the assessment of clinical competence in radiology training
Gabriel Lau, Barry Soans, Mark Phillips, Anne Rogan, Brendan Grabau, Nicole Groombridge (RANZCR) and Jacob Pearce (co-author, ACER)
The Royal Australian and New Zealand College of Radiologists (RANZCR) commissioned a full-scale assessment review in 2015. One finding was that the Viva Fellowship Examination was no longer fit for purpose: the format lacked standardisation, there was no formal standard setting, it was prone to examiner variation and potential bias, and it was somewhat contrived and not reflective of current clinical practice. A new Objective Structured Clinical Examination in Radiology (OSCER) was designed, and after much planning and development, the first OSCER was administered in June 2023. The new OSCER enhances reliability by drawing on many stations and data points. It is highly standardised through careful blueprinting, case selection, specific questions, and clear marking guides with rubrics, and the same cases are now shown to all candidates on the same day. The potential for examiner variation and bias is minimised, and examiners are routinely trained and engage in multiple calibration processes. The oral interaction format still allows for deep probing of knowledge, understanding and higher-order thinking, with real-time clarification. Borderline regression standard setting has also been implemented. Most importantly, the examination is now less contrived and more reflective of clinical practice. The redesign from Viva to OSCER was part of a “whole of assessment program” review, to ensure the value and purpose of all assessments and examinations in the training program. The OSCERs are now RANZCR’s capstone assessment of clinical reasoning and thinking, and of the integration of the relevant knowledge and ability of a clinical radiologist.
Supporting marker judgement and validity with levels-based marking schemes
Neville Chiavaroli (ACER) and Jacob Pearce (ACER)
This paper explored alternative approaches to developing marking schemes (also known as rubrics) for short-answer questions. The authors outlined the importance of marking scheme design in written assessments, noting that the quality and structure of marking schemes are key elements in the overall validity of such assessments. While so-called ‘points-based’ schemes, in which marks are allocated for each correct statement, are common in medicine and health professions courses, many question types require a more structured and nuanced approach, such as those assessing deep understanding and higher-order reasoning. Such marking schemes are known as ‘levels-based’, and are structured to support markers to focus on and reward the overall quality of the response, in accordance with the key concepts reflected in the questions. Although more challenging to write than points-based schemes, levels-based schemes are essential if student responses to higher-level questions are to be assessed appropriately and validly. The authors also emphasised the importance of marker training and presented examples of levels-based guides for clinical contexts.
Using and interpreting item analysis data for written examinations
Neville Chiavaroli (ACER) and Clare Ozolins (ACER)
This workshop focussed on using and interpreting item analysis data, one of the key forms of validity evidence for multiple-choice assessments. Such data are an essential part of quality assurance for high-stakes assessments and of making defensible decisions based on examination results, yet they are often neglected due to a lack of awareness of their value, uncertainty about their interpretation, or a lack of direct access to the results. The workshop outlined the nature of these statistics, how they are interpreted and applied to multiple-choice assessments to evaluate the quality of questions, and how they can be calculated using simple and widely available software. It included several practical activities interpreting authentic item data, and a demonstration of how to calculate the key statistics from sample data using an Excel spreadsheet. The presenters also showed examples of more sophisticated question analyses which are possible with advanced test analysis software.
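As a concrete illustration of the kind of statistics covered, the sketch below computes two classical indices commonly reported for dichotomously scored multiple-choice items: item facility (the proportion of candidates answering correctly) and corrected item-total discrimination. It is a minimal sketch only; the function name, the use of numpy and the sample responses are assumptions for illustration, not the presenters’ materials or spreadsheet.

```python
# Minimal sketch of classical item analysis for dichotomously scored items:
# item facility (proportion correct) and corrected item-total discrimination.
import numpy as np

def item_analysis(responses):
    """responses: 2D array (candidates x items) of 0/1 scored answers.
    Returns facility and corrected item-total discrimination per item."""
    r = np.asarray(responses, dtype=float)
    facility = r.mean(axis=0)          # proportion of candidates correct per item
    totals = r.sum(axis=1)             # each candidate's total score
    discrimination = []
    for i in range(r.shape[1]):
        rest = totals - r[:, i]        # total score excluding the item itself
        # correlation of item score with rest-of-test score (point-biserial)
        discrimination.append(np.corrcoef(r[:, i], rest)[0, 1])
    return facility, np.array(discrimination)

# Hypothetical scored responses for 6 candidates on 3 items (illustration only)
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [1, 0, 1]]
fac, disc = item_analysis(data)
print(fac.round(2), disc.round(2))
```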
Reforming radiopharmaceutical science training through a programmatic approach to assessment
Kristy Osborne (ACER), Jacob Pearce (ACER) and Jennifer Guille (ACPSEM)
ACER worked with the Australasian College of Physical Scientists and Engineers in Medicine (ACPSEM) to reform its 3-year training program in radiopharmaceutical science. The aim of the program is to train registrars to design, manufacture and analyse radiopharmaceuticals for use in nuclear medicine. We supported ACPSEM to declutter the curriculum, implement clear checkpoints to ensure timely feedback to registrars, and use a suite of assessment types best suited to the learning outcomes. These assessment types included entrustment scale ratings for workplace-based assessment, presentations, formal and reflective reports, short-answer questions, and annotated records. The new program launched at the end of 2023 and has received positive feedback from registrars, supervisors and assessors. Overall, those surveyed agreed that the new curriculum is less repetitive, that good feedback mechanisms are now in place, and that the new assessments are better suited to the program. As one registrar stated, ‘The new Curriculum has been well revised, it is much clearer and some unnecessarily long assessments have been effectively condensed.’