Dr. Steven Rigatti | Jan 12, 2024 11:06:00 AM | 3 min read

Interpretable Machine Learning and the Clock Drawing Test

Dr. Steven Rigatti, Consulting Medical Director to CRL, discusses a new digital version of the Clock Drawing Test, created with the goal of achieving better test accuracy and reliability.


The Clock Drawing Test has been used in the insurance industry for many years, as a way to indicate the potential presence of dementing illness in proposed insureds, particularly those applying for long-term care coverage. In the classic version of the test, the subject is asked to draw an analog clock face and place the hands to indicate a particular time (usually 11:10) – this is the command clock – and then to copy a provided clock face which shows the same time – this is the copy clock.

This test purports to evaluate many cognitive domains, including executive function (planning), verbal working memory, comprehension, visuospatial skills, numerical knowledge and on-demand motor execution (praxis). It is commonly used as a screening test for dementia, either as a stand-alone test or as part of a larger battery such as the Montreal Cognitive Assessment (MoCA). The scoring of this test has traditionally relied upon categorical hierarchies of various features, assigning points to each. In the original Rouleau scoring criteria, points were available for correct construction of the face (2 points), numbers (4 points) and hands (4 points), for a maximum of 10. Though useful, some of the descriptions of the features are vague and subject to interpretation by the rater (for example, “Slight error in the placement of the hands” vs. “Major error in the placement of the hands”). In the time since the test’s development, some 30 scoring systems have been developed and evaluated in various contexts.
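The 2/4/4 point split described above can be sketched as simple arithmetic. This is a minimal illustration only: the component names and the `total_score` helper are assumptions made for the example, and the rater's judgment calls (the vague part of the rubric) are taken as given inputs.

```python
from typing import Dict

# Maximum points per component, as stated in the text (face 2, numbers 4, hands 4).
MAX_POINTS: Dict[str, int] = {"face": 2, "numbers": 4, "hands": 4}


def total_score(component_scores: Dict[str, int]) -> int:
    """Sum the rater-assigned points, checking each is within its allowed range."""
    total = 0
    for name, max_points in MAX_POINTS.items():
        points = component_scores[name]
        if not 0 <= points <= max_points:
            raise ValueError(f"{name} must be between 0 and {max_points}, got {points}")
        total += points
    return total


# Hypothetical rating: intact face, one number error, major hand-placement error.
example = {"face": 2, "numbers": 3, "hands": 1}
print(total_score(example))  # 2 + 3 + 1 = 6 out of a possible 10
```

Two raters who disagree on whether a hand error is “slight” or “major” would enter different values for `hands`, which is exactly where the reliability problem arises.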

Recently, a group of researchers at MIT and Boston University created a digital version of the Clock Drawing Test in an attempt to achieve better test accuracy and reliability [1]. To do so, they used a digital pen and “smart” paper printed with a faint grid that is visible to the pen’s camera. The entire process of drawing the clock is therefore captured and can be mined for features that are predictive of dementia. These include not just traditional factors but also time-dependent factors and fine details, such as the form of the written digits, the timing of pen strokes, and the speed and order of the drawing process. Not surprisingly, when the full feature space is used with state-of-the-art machine learning algorithms such as random forests, gradient-boosted trees, and neural nets, accuracy, as measured by the area under the receiver operating characteristic curve (AUC), improved from about 0.7 to over 0.9. However, these algorithms are rather opaque (“black box”) and do not suggest which features played the most important roles in any given decision.
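For readers less familiar with the AUC figures quoted above: the AUC equals the probability that a randomly chosen affected subject receives a higher model score than a randomly chosen unaffected subject, with ties counted as half. The sketch below computes it directly from that definition. The `auc` function name and the tiny data set are made up for illustration and have nothing to do with the study's data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = condition present, 0 = condition absent.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up scores for four subjects: two with the condition, two without.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a move from about 0.7 to over 0.9 is a substantial gain.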

The researchers, in conjunction with clinicians, desired a more interpretable model and therefore employed a technique known as SLIM (supersparse linear integer model). This algorithm draws on a more limited set of features, chosen from operationalized versions of the 30 existing clinical scoring systems. It assigns each selected feature an integer number of points for its presence or absence, and selects no more than 10 features in total. The resulting scoring system achieved an AUC of 0.78, not quite as good as the full machine-learning system but a clear improvement over previously published systems. It also has the advantage of being easily explicable (see below).
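To make the idea of an integer scoring system concrete, here is a minimal sketch of how such a model is applied once it has been fitted. Everything specific in it is hypothetical: the feature names, the point values, and the threshold are invented for illustration and are not the features or weights from the published study. The sketch also covers only the scoring step, not the optimization that SLIM uses to choose the features and points.

```python
from typing import Dict, Iterable

# Hypothetical features and integer points (NOT taken from the study).
# A fitted model of this kind uses at most 10 features.
POINTS: Dict[str, int] = {
    "digit_missing": 3,
    "hand_missing": 4,
    "digits_outside_face": 2,
    "long_pause_before_hands": 1,
    "face_closed": -2,
}
THRESHOLD = 3  # hypothetical cutoff: totals at or above this are flagged


def score(observed: Iterable[str]) -> int:
    """Add up the integer points for every feature observed in the drawing."""
    observed_set = set(observed)
    unknown = observed_set - set(POINTS)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return sum(POINTS[name] for name in observed_set)


def is_flagged(observed: Iterable[str]) -> bool:
    """Flag the drawing for follow-up when the total reaches the threshold."""
    return score(observed) >= THRESHOLD


drawing = ["digit_missing", "face_closed"]
print(score(drawing), is_flagged(drawing))  # 3 - 2 = 1, below the cutoff of 3
```

Because the total is a short sum of whole numbers, a clinician can check by hand why a given drawing was or was not flagged, which is the interpretability advantage the text describes.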

This study shows that when handled well, and in collaboration with subject matter experts, advanced machine learning can be used to optimize model performance while preserving human-readable insights into the effect that various inputs have on the model outputs. The disadvantages are that some accuracy is sacrificed for the clarity achieved, and that integer models can be extremely resource-heavy to develop and fit.


About the Author

Dr. Steven J. Rigatti is a consulting medical director with Clinical Reference Laboratory, with 12 years’ experience in the life insurance industry. He is the current chair of the Mortality Committee of the American Academy of Life Insurance Medicine.


[1] Souillard-Mandar, W. et al. Interpretable Machine Learning Models for the Digital Clock Drawing Test. 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, USA.