
THE CASE STUDY WITH DELOITTE

Client

Deloitte, through a contract with St. Charles Consulting


Challenge

Deloitte provides continuing education, with the requisite CPE credit, to its professionals. Its programs are accredited by the AACSB and involve large-scale, high-stakes assessments. As such, the item and test analysis must be comprehensive, precise, and psychometrically rigorous. Additionally, annual reports must be provided for continuous quality improvement of the assessments associated with the courses. Previous psychometric analysis for these reports relied only on Classical Test Theory (CTT) approaches. Although high reliability is desired for consistent measurement of learner knowledge and ability, relying on CTT alone can produce unstable statistical estimates, because CTT statistics are sample- and test-dependent. Modern test theory using item response theory (IRT) is a more appropriate and rigorous approach.
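As a minimal illustration of that sample dependence (simulated data only, not drawn from Deloitte's assessments), the classical difficulty of the same set of items shifts when it is estimated from examinee groups of different ability, whereas IRT models the item and the examinee separately:

    # Simulated illustration of CTT sample dependence: the same items,
    # administered to a lower-ability and a higher-ability group, yield
    # different classical difficulty estimates (proportion correct).
    set.seed(42)
    b <- rnorm(20)                              # item difficulties on the IRT scale
    gen <- function(theta) {                    # simple Rasch-type response generator
      p <- plogis(outer(theta, b, "-"))         # P(correct) = logistic(theta - b)
      matrix(rbinom(length(p), 1, p), nrow = length(theta))
    }
    low_group  <- gen(rnorm(500, mean = -1))    # lower-ability sample
    high_group <- gen(rnorm(500, mean =  1))    # higher-ability sample

    round(colMeans(low_group)[1:5], 2)          # classical p-values from sample 1
    round(colMeans(high_group)[1:5], 2)         # same items, noticeably "easier" in sample 2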


Solution

In this project we have evaluated over 100 learning assessments, used to measure learning outcomes of continuing education courses for accounting professionals, applying both CTT and IRT to optimize those assessments. This allows us to quantify item performance on each final assessment using psychometric metrics. We have helped identify opportunities for improvement while ensuring compliance with ongoing board certification requirements. These quantitative analyses are followed by expert qualitative analyses that recommend next steps for achieving more reliable and valid measurement of knowledge of the learning objectives.
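The sketch below shows the kind of classical item analysis that this quantification typically starts from, using the CTT and psych packages listed under Tools Used. The responses object here is a hypothetical simulated 0/1 item-response matrix, not client data:

    # Classical (CTT) item analysis sketch; rows = examinees, columns = items.
    library(CTT)
    library(psych)

    set.seed(7)
    theta     <- rnorm(200)                     # simulated examinee ability
    responses <- as.data.frame(
      (plogis(outer(theta, rnorm(10), "-")) > matrix(runif(200 * 10), 200)) * 1
    )

    ctt_results <- itemAnalysis(responses)
    ctt_results$alpha            # coefficient alpha for the assessment
    ctt_results$itemReport       # per-item difficulty, point-biserial, alpha-if-deleted

    psych::alpha(responses)      # cross-check of reliability and item statistics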

Impact

The quantitative analyses are used to identify problematic items that may be improved to more validly measure learner ability with respect to the intended learning objectives. Using the results of the qualitative analysis, item writers can further adapt future assessments to improve reliability and better ascertain learner proficiency. Items identified as problematic can also point to opportunities to clarify the educational content presented to learners.

Methodology


We used IRT, CTT, DIF analyses, and modern data visualizations to quantify test item performance and identify opportunities for assessment improvement. This covered the full scope of each item, from the question stem to the response options, with visualization of item characteristic curves to ascertain how items performed across examinees. Once items were flagged as problematic, a qualitative review was conducted using a standardized approach to give item writers recommendations on how to write higher-quality items.
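A hedged sketch of that quantitative workflow follows, using the mirt and difR packages listed under Tools Used. The simulated scored matrix, the cohort grouping variable, and the flagging threshold are illustrative stand-ins rather than the project's actual data or criteria:

    # IRT and DIF screening sketch on simulated dichotomous responses.
    library(mirt)
    library(difR)

    set.seed(1)
    theta  <- rnorm(400)
    scored <- as.data.frame((plogis(outer(theta, rnorm(15), "-")) > matrix(runif(400 * 15), 400)) * 1)
    group  <- rep(c("cohortA", "cohortB"), each = 200)

    mod <- mirt(scored, model = 1, itemtype = "2PL", verbose = FALSE)

    coef(mod, IRTpars = TRUE, simplify = TRUE)$items   # discrimination (a) and difficulty (b) per item
    itemfit(mod)                                       # item-level fit statistics
    plot(mod, type = "trace")                          # item characteristic curves

    # Mantel-Haenszel DIF screen between the two cohorts
    difMH(Data = scored, group = group, focal.name = "cohortB")

    # Flag weakly discriminating items for the qualitative review
    pars <- coef(mod, IRTpars = TRUE, simplify = TRUE)$items
    rownames(pars)[pars[, "a"] < 0.5]

Items surfaced by the fit statistics, the DIF screen, or the final flagging step would then move on to the standardized qualitative review described above.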

Results


This is an ongoing, multi-year project. In year one we evaluated the test and assessment process and wrote the initial base code so that results are accurate and repeatable. In year two we repeated the analysis and worked to semi-automate the process due to the large number of assessments. We have also begun work on qualitative psychometric aspects of item writing, including rigorous evaluation of the assessment constructs in the context of the course goals and objectives; this ensures the items measure what is intended and helps increase statistical validity and reliability. In the next several years, we will continue the automation using machine learning and artificial intelligence. We will also explore predictive validity to ensure the assessments are useful for important outcomes.
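A minimal sketch of what that semi-automation can look like follows; the folder name, file layout, and summary columns are hypothetical:

    # Semi-automated pipeline sketch: the same CTT/IRT summary run over every
    # scored assessment export in a folder (one CSV of 0/1 item scores each).
    library(CTT)
    library(mirt)

    assessment_files <- list.files("scored_assessments", pattern = "\\.csv$", full.names = TRUE)

    summaries <- lapply(assessment_files, function(path) {
      scored <- read.csv(path)                              # rows = examinees, columns = items
      ctt    <- itemAnalysis(scored)
      irt    <- mirt(scored, model = 1, itemtype = "2PL", verbose = FALSE)
      a_pars <- coef(irt, IRTpars = TRUE, simplify = TRUE)$items[, "a"]
      data.frame(assessment = basename(path),
                 n_items    = ncol(scored),
                 alpha      = ctt$alpha,
                 mean_a     = mean(a_pars))
    })

    do.call(rbind, summaries)   # one summary row per assessment for the annual report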


Tools Used: Python for data extraction, cleaning, and pre-processing; R for statistical analysis (tidyverse, dplyr, CTT, difR, psych, car, flextable, knitr); FlexMIRT and R (mirt, ggmirt) for IRT analysis.

Learn More About Our Research Firm or Program Evaluation Services

To Learn More About Our Services or To Schedule a Free Consultation, Contact Us Today!
