Damian Sendler: There is strong evidence that most psychiatric disorders originate in childhood and that childhood adversity increases the risk of developing psychiatric disorders in adulthood [1]. Neuropsychiatric disorders account for 45% of the burden and disability in young people aged 10–24 years and are strongly associated with risk-taking behaviours and significant psychosocial impairment [2–5]. For many psychiatric disorders, the length of time an individual suffers from an untreated illness is a significant predictor of poorer outcomes [6]. Thus, early detection and appropriate intervention are essential to reduce the overall burden and disability associated with neuropsychiatric disorders [7].
Many patients with a psychiatric disorder never seek help from a mental health professional or delay doing so, which contributes to the long duration of untreated illness [8]. Most children and adolescents who experience behavioral or emotional problems at school will first see a school counselor or a general practitioner (e.g., a pediatrician or nurse practitioner). These mostly non-mental health professionals need screening instruments to detect whether a child requires a general psychological evaluation (caseness) or, when a specific mental disorder is suspected (e.g., ADHD, psychosis), a targeted evaluation. Mental health professionals, on the other hand, may need screeners when specialized, complex, or lengthy assessments are being considered, such as for psychosis risk or autism [7, 9, 10].
Screeners are frequently used to detect disorders in many fields of medicine [5, 9, 11]. In the field of mental health, however, the low positive predictive value (i.e., the low accuracy of positive results) and the lack of age-appropriateness of many screening instruments have widely discredited them [9]. While screening for some mental illnesses can indeed be difficult, the most serious issue is that reports on new screening instruments frequently lack a sufficient evaluation of the psychometric properties needed to judge their usefulness. This may contribute to the poor reputation that psychiatric screening tests have earned.
In most cases, evaluating the applicability of a screener requires data on its reliability and validity as well as norms for the targeted population.
A screener’s reliability is determined by how precisely it measures, not by whether the targeted construct is assessed. Three types of reliability can be distinguished: (1) To have good test–retest reliability, a screener must measure consistently over time (note: low test–retest reliability may be seen if the screener measures a fluctuating state rather than a trait condition, or if the condition itself has changed). (2) To ensure internal consistency, all items of the screener or of its subscales must measure the same concept(s). (3) If the screener is an interview, the rate of agreement between different raters is evaluated (inter-rater reliability) [12, 13].
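As a rough illustration of the three reliability types, the following sketch computes one commonly used coefficient for each: a Pearson correlation for test–retest reliability, Cronbach's alpha for internal consistency, and Cohen's kappa for inter-rater agreement. The choice of these particular coefficients is an assumption made here, not something the text prescribes, and all function names and data are hypothetical.

```python
from collections import Counter
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation, used here as a simple test-retest coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def cronbach_alpha(items):
    """Internal consistency. `items` holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical data, invented for illustration only.
time_1 = [10, 12, 14, 16, 18]            # screener totals at first testing
time_2 = [11, 12, 15, 15, 19]            # same respondents, retested
items = [[1, 2, 3, 4, 5], [2, 3, 3, 5, 6], [1, 2, 4, 4, 5]]
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]  # 1 = screen positive
rater_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]

print(f"test-retest r    = {pearson_r(time_1, time_2):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")
print(f"Cohen's kappa    = {cohen_kappa(rater_a, rater_b):.2f}")
```

With real data, a low test–retest coefficient would still need to be interpreted in light of the caveat above, since it may reflect a fluctuating state or a genuine change in the condition.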
The degree to which a screener accurately measures what it claims to measure is referred to as validity. For screening instruments, three main aspects of validity are commonly required. Criterion validity is an important consideration in clinical diagnostics because it shows how closely a screener’s result matches a specific criterion [12, 13]. Depending on the time between the screening and the criterion assessment, two aspects of criterion validity can be distinguished: (a) the degree to which the screening can identify individuals who currently have any or a specific mental disorder (concurrent validity; this requires nearly simultaneous screening and criterion assessment in the test construction phase, while in practice some time may pass between screening and formal assessment); and (b) the degree to which the screening can identify individuals who will develop any or a specific mental disorder in the future (predictive validity). A test’s construct validity is evaluated when the emphasis is on the score rather than on the test’s outcome, and when the measure of interest is less well defined than, for example, a formal diagnosis and pertains to a construct that is not directly assessable (such as intelligence or personality characteristics).
One aspect of construct validity is convergent validity, the degree to which screener scores match those of a gold standard assessment of the same construct (such as the HAWIK in the assessment of IQ). High agreement between the screener’s results and those of an established measure of the same construct is a positive sign of construct validity. Its counterpart is discriminant validity, which is high when screener scores are not correlated with measures of other constructs. For example, the scores of an ADHD screener should not correlate with scores of scales assessing emotional or behavioral disorders. Content validity, finally, demands that the screening instrument measure all important aspects of the target condition, such as inattention but also hyperactivity and impulsivity when ADHD in general, and not just the inattentive subtype, is targeted.
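The contrast between agreement with a measure of the same construct and independence from measures of other constructs can be checked with simple correlations. The sketch below is only illustrative: the helper names and all scores are hypothetical, and Pearson's r is assumed as the correlation measure.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def validity_correlations(screener, same_construct, other_construct):
    """Correlate screener scores with an established measure of the same
    construct and with a measure of a different construct."""
    return {
        "same_construct": pearson_r(screener, same_construct),
        "other_construct": pearson_r(screener, other_construct),
    }


# Hypothetical scores for five children, invented for illustration only.
adhd_screener = [2, 4, 6, 8, 10]
established_adhd_scale = [1, 3, 5, 7, 9]   # same construct
anxiety_scale = [3, 1, 4, 1, 3]            # different construct

result = validity_correlations(adhd_screener, established_adhd_scale, anxiety_scale)
for label, r in result.items():
    print(f"{label}: r = {r:.2f}")
```

A pattern with a high correlation for the same construct and a correlation near zero for the other construct is what the text describes as desirable; real data will rarely be this clean.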
Taken together, a screener must consistently (reliability) produce accurate (validity) scores and results. Yet instruments are frequently described first, or solely, in terms of their reliability. Because validation data are lacking, it is difficult to determine the clinical utility of many screening instruments [12, 13]. The concurrent (or predictive) validity of a diagnostic screening instrument should be demonstrated by (1) ruling in most or all patients with the target condition (diagnosis) while (2) excluding a significant number of those without it.
To rule in most or all patients with the target condition, screeners should generally have a sensitivity close to 100% and a negative diagnostic likelihood ratio (LR) ≤ 0.1, which indicates a ‘large and often conclusive’ change from the pre-screening to the post-screening probability of the absence of illness risk [14]. The positive predictive value is greatest in settings where the prevalence of the condition is highest, i.e., it is greater in clinical settings than in community settings. To exclude a significant number of patients who do not have the target condition, screeners must have high specificity and a positive diagnostic LR ≥ 5, which indicates at least a moderate increase from the pre-screening to the post-screening risk probability [14]. Although diagnostic likelihood ratios are more easily interpreted (cutoffs for “good” concurrent or predictive validity exist), they are rarely reported in studies evaluating screening instruments, e.g., for psychosis risk [16].
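The quantities named above follow directly from a 2×2 table that compares screener results with a reference diagnosis. The sketch below uses the standard definitions of sensitivity, specificity, likelihood ratios, and positive predictive value; the class name, function names, and all counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from comparing a screener against a reference diagnosis."""
    true_pos: int
    false_pos: int
    false_neg: int
    true_neg: int


def sensitivity(t):
    return t.true_pos / (t.true_pos + t.false_neg)


def specificity(t):
    return t.true_neg / (t.true_neg + t.false_pos)


def likelihood_ratios(t):
    """Return (positive LR, negative LR); infinite if the denominator is zero."""
    sens, spec = sensitivity(t), specificity(t)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return lr_pos, lr_neg


def positive_predictive_value(sens, spec, prevalence):
    """Share of screen-positives who truly have the condition (Bayes' rule)."""
    true_pos_share = sens * prevalence
    false_pos_share = (1 - spec) * (1 - prevalence)
    return true_pos_share / (true_pos_share + false_pos_share)


# Hypothetical validation sample, invented for illustration only.
table = TwoByTwo(true_pos=95, false_pos=90, false_neg=5, true_neg=810)
sens, spec = sensitivity(table), specificity(table)
lr_pos, lr_neg = likelihood_ratios(table)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")

# The same screener applied in settings with different prevalence.
for prevalence in (0.02, 0.10, 0.40):
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}")
```

Holding sensitivity and specificity fixed, the positive predictive value rises with prevalence, which is why the same instrument tends to look better in clinical samples than in low-prevalence settings. The likelihood ratios do not depend on prevalence in this calculation, so they can be compared against the reference values of 0.1 and 5 regardless of setting.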
A screener’s accuracy should not depend heavily on confounding conditions, such as co-occurring emotional or behavioral disorders; rather, the screener should have good content validity and good convergent or criterion validity, i.e., it should measure the target condition [13, 14]. When a clinical interview serves as the reference, for example, not only should the final screener result (e.g., determined by a cutoff score) match the interview result, but each screener item should also correlate highly with its interview counterpart (both are aspects of convergent validity in dimensional assessments, or of criterion validity when symptoms are assessed) [13]. In addition, the screening tool should cover all aspects of the target condition, not just the most obvious ones (content validity). Screening instruments are rarely evaluated on the basis of these considerations.
Finally, norms or cutoffs should be provided so that, for clinical purposes and the evaluation of a patient’s mental state, an individual’s performance can be compared with that of a comparable group. To improve the population fit, these norms should be tailored to the overall goal (e.g., screening for psychiatric caseness in the general population versus screening for a specific condition in a clinical population) or to different groups (e.g., separate norms for age groups, gender, and/or other potentially influential sociodemographic characteristics).
Psychometric properties are often overlooked in studies of screening instruments. To begin with, it is important to define what the screening is for and where it will be used (e.g., in the general population or schools, in primary care, or in mental health services), as well as the expected developmental stage of the recipients. As a general rule, most screeners are not useful for every purpose (e.g., for both caseness and a specific disorder). Reliability, validity, and the cutoffs (e.g., of diagnostic likelihood ratios) that distinguish a useful from a useless screening instrument should therefore be studied in appropriate populations with adequate sample sizes.
It may be difficult to develop effective screening instruments for every scenario and condition in children and adolescents with mental health problems, and many current studies of potential screening instruments are methodologically inadequate. Nevertheless, research on screening instruments is necessary to improve the comprehensive and early detection of mental health conditions in children and adolescents, especially as resources become increasingly tight.