Findings
- It has been strongly asserted in both juvenile and adult risk assessment contexts that actuarial assessment has the capacity to predict risk more accurately than clinical assessment; however, this contention is not universally accepted and many have noted that both assessment models have strengths and weaknesses. Despite this ongoing debate, it is generally recognized that the exercise of unaided professional judgment by mental health practitioners is not a reliable or accurate means for assessing the potential for future dangerous behavior.
- The goals of a comprehensive risk assessment process extend beyond the assessment of risk alone.
- Much of the literature on risk factors for juvenile sexual offending is theoretical and descriptive rather than the result of statistical research. Given these problems, it is not surprising that findings regarding risk factors vary considerably and are inconsistent across different studies. Despite these problems, the empirical research indicates that it is the presence and interaction of multiple risk factors, rather than the presence of any single risk factor alone, that is most important in understanding risk.
- Although there is a developing research base, the empirical evidence concerning the validity of commonly identified risk factors for juvenile sexual offending remains weak and inconsistent.
- Although some empirical support for the predictive validity of the J-SOAP-II, ERASOR and JSORRAT-II assessment tools can be found in the literature, the instruments do not perform in a manner that suggests or proves their ability to accurately predict juvenile sexual recidivism.
- Despite the apparent importance of protective factors, few of the instruments commonly used with juveniles incorporate protective factors and those that do either have no empirical support or are in development and have not yet been empirically validated.
Introduction
The assessment of sexual recidivism risk for juveniles who commit sexual offenses serves several purposes. The overall purpose is to estimate the risk of future sexual offending so that the most effective steps can be taken to reduce, contain or eliminate that risk. Hence, risk assessment essentially serves as an investigative tool that helps inform and guide various intervention, treatment and legal processes. (For more information on treatment, see Chapter 5, "Effectiveness of Treatment for Juveniles Who Sexually Offend," in the Juvenile section.)
A risk assessment can be administered at different points once a juvenile is identified by authorities as the perpetrator of a sexual offense. An assessment can be administered during the intake screening process to inform and guide authorities as to the appropriate course of action. In the event of a referral to the court, an assessment can be administered prior to or during adjudication (or trial, when transfer to the adult criminal court occurs) to provide the court, its officers and other professionals with risk information that can be used in legal proceedings, as well as in decision-making regarding supervision or treatment. Finally, assessments may be administered at the postadjudication level to provide the court, its officers, correctional authorities and/or treatment professionals with information about risk that can be used in dispositional or sentencing hearings, as well as in decisions regarding institutional placement, community supervision or treatment. Of note, the point in the process at which an assessment is administered, as well as the purpose of the evaluation, may have significant impact on the risk evaluation. Within the context of treatment, risk assessment is typically used to set a baseline assignment of risk and to then periodically re-evaluate risk during treatment. In addition, the risk assessment process can be used to determine the type and intensity of treatment needed and to help define targets for treatment and case management. The risk assessment process can thus work hand in hand with the three principles of risk, need and responsivity that have increasingly come to hallmark individualized treatment planning and case management (Looman & Abracen, 2013; Yates, 2013).
Regardless of the purpose of risk assessment or the point at which it occurs, assessing risk involves making predictions about the likelihood of future behavior, which is an inherently difficult task under any circumstances. The process of risk assessment for juveniles who sexually offend is further complicated by the relatively low base rates of sexual recidivism found among juveniles. Given these low base rates, the process of risk prediction can potentially result in type I errors, or false positive findings, in which risk is overestimated and low or lower-risk juveniles are incorrectly determined to be high-risk (Caldwell, 2013; Craig, Browne & Stringer, 2004; Wollert, 2006). Juvenile risk assessment is complicated even further by the ongoing development and maturation of youth. In short, juveniles vary and change over time in their physical development; cognitive, neurological and personality development; formation of attitudes and acquisition of information; and emotional and behavioral maturity (Rich, 2009; Steinberg, 2009, 2010; Steinberg & Scott, 2003; Zimring, 2004). Accordingly, risk assessment models and tools must account for these developmental factors in order to accurately estimate risk. Hence, Stockdale, Olver and Wong (2013) note that adolescent risk assessment instruments must be capable of capturing changes in risk that result from developmental changes, with or without treatment.
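The base-rate problem described above can be made concrete with a short worked calculation. The sketch below applies Bayes' rule to estimate what share of juveniles flagged as high-risk would actually go on to reoffend. The base rates, sensitivity and specificity used here are purely hypothetical illustrations; they are not drawn from any study or instrument cited in this chapter, and the function name is an assumption made for the example.

```python
def positive_predictive_value(base_rate, sensitivity, specificity):
    """Share of juveniles flagged as high-risk who actually reoffend (Bayes' rule).

    base_rate:   proportion of the population that sexually reoffends
    sensitivity: proportion of true recidivists the assessment flags as high-risk
    specificity: proportion of non-recidivists the assessment does not flag
    """
    true_positives = base_rate * sensitivity
    false_positives = (1.0 - base_rate) * (1.0 - specificity)
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no cases are flagged under these parameters")
    return true_positives / flagged


# Hypothetical values chosen only to illustrate the effect of the base rate.
ppv_low_base_rate = positive_predictive_value(
    base_rate=0.05, sensitivity=0.75, specificity=0.75
)
ppv_high_base_rate = positive_predictive_value(
    base_rate=0.30, sensitivity=0.75, specificity=0.75
)

print(f"PPV at a 5% base rate:  {ppv_low_base_rate:.1%}")
print(f"PPV at a 30% base rate: {ppv_high_base_rate:.1%}")
```

Under these assumed values, the same assessment accuracy yields a far smaller share of correct high-risk classifications when the base rate is low. Most juveniles flagged as high-risk in the low base rate scenario would be false positives, which is the overestimation problem the paragraph above describes.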
Whereas the process of juvenile risk assessment was once largely driven by adult risk assessment research and instrumentation, the field of juvenile risk assessment has largely developed in its own right over the past 15 or so years, and continues to do so. Like adult risk assessment, juvenile assessment has traditionally focused on the identification and assessment of factors within the individual that increase (and possibly predict) risk for sexual recidivism. Risk assessment for sexual recidivism, for both juveniles and adults, has also traditionally focused on static risk factors that reflect historical behaviors and experiences related to sexual offending. Static risk factors are those that have previously occurred and will remain unaltered over time. Contemporary risk assessment, however, also includes a focus on dynamic risk factors. Dynamic risk factors are those associated with current behaviors, thoughts, feelings, attitudes, situations, interactions and relationships. So named because they are fluid and sometimes relational or situational, dynamic risk factors may thus change over time, particularly through some form of treatment. Dynamic risk factors are sometimes referred to as criminogenic needs because they contribute directly or indirectly to criminal behavior. Although the measurement and evaluation of one or both types of risk factors (static and dynamic) is central to the risk assessment process, focusing on dynamic risk factors is particularly important when treatment is provided because criminogenic needs provide targets for rehabilitative interventions (Beggs & Grace, 2011; Olver & Wong, 2009; Pedersen, Rasmussen & Elsass, 2010).
In addition, both juvenile and adult risk assessment can be used to identify and assess not only risk factors but also protective factors: the elements, strengths, supports and circumstances that mitigate risk for sexual recidivism. Protective factors are increasingly considered in contemporary juvenile risk assessment.
Given the importance of risk assessment in sex offender management and treatment, this chapter reviews the literature on the assessment of risk for sexual recidivism for juveniles who commit sexual offenses. It summarizes what is scientifically known about risk assessment for juveniles who sexually offend and presents key, up-to-date research findings on the defining features and predictive accuracy of commonly used assessment instruments.
When reading this chapter, it is important to keep the following in mind. First, while it is possible to describe the historical context and current state of juvenile risk assessment, there is ongoing controversy in the field about the best model to employ in risk assessment and the capacity of various models and instruments to accurately predict risk for sexual recidivism. Both of these issues will be discussed in detail. Second, although research on female juveniles who commit sexual offenses and preadolescent children who engage in sexually abusive and sexually troubled behavior is emerging, the existing knowledge base concerning juvenile risk assessment is primarily based on studies of adolescent males who commit sexual offenses. Little has changed in this regard, despite increased research on female juveniles, and no risk assessment instrument exists that is specifically designed to assess risk in the adolescent female population. The same continues to be true with respect to both understanding and estimating risk in children with sexual behavior problems, and the research stream for this population has not picked up in any noticeable manner. Accordingly, although much of the information in this review is pertinent to both males and females, and to adolescents and pre-adolescents, the reader must bear in mind that the research cited and discussed in this chapter is most directly relevant to male adolescents who commit sexual offenses.
Risk Assessment Process
Juvenile sexual offending takes place within an environment of developmental, social and contextual circumstances that differ for each young person, and we recognize the heterogeneity of each individual despite shared features and commonalities.
Juvenile risk assessment, therefore, focuses not only on adolescents who commit sexual offenses, but also on the systems within which they live, learn and function and on which they depend for structure, guidance and nurturance. In short, risk assessments of juveniles who sexually offend place behavior and risk factors in the context of the social environment, as well as the context of child and adolescent development. In fact, unlike adult risk assessment instruments, the most widely used juvenile risk assessment instruments set what are essentially time limits (or expiration dates) for any individual's assessed risk level or score, either requiring reassessment of risk within a specified time period (such as every six months [2]) or noting that the risk estimate is limited to sexual recidivism prior to age 18 [3]. Developmental considerations are important not only when estimating the risk of sexual recidivism, but also when identifying the very risk factors that are to be used as the foundation for the risk assessment process itself (Quinsey, Skilling, Lalumière & Craig, 2004). For instance, in their study of 1,396 juvenile offenders, van der Put and colleagues (2011) found that the effect of both static and dynamic risk factors on recidivism varied by the age of the adolescent.
Models of Risk Assessment
Evaluation should include a wide range of individual, social, interactional and contextual factors.
Currently, two general models are used in juvenile risk assessment: the actuarial model and the clinical model. In both models, the assessment process attempts to identify and evaluate the likely effects of risk factors believed to be associated with sexual recidivism. In the actuarial model (also known as statistical or mechanical assessment), risk determination is based entirely on a statistical comparison between the personal characteristics and past behavior of the juvenile and those of known recidivists. The assessment of static risk factors is a distinguishing feature of the actuarial model, although clinical models exist as well. Clinical risk assessment, in which the evaluator attempts to develop an understanding of the juvenile and of the presence and likely effect of defined risk factors, is based primarily on observation and professional judgment rather than statistical analysis. In contemporary applications of the clinical model, a structured risk assessment instrument is used to guide clinical judgment. Hence, this approach is considered a structured or anchored clinical risk assessment (Rettenberger, Boer & Eher, 2011), increasingly referred to as structured professional judgment. Unlike actuarial assessment, clinical risk assessment typically evaluates both static and dynamic risk factors and, increasingly, protective factors that may decrease the risk of sexual reoffense.
Actuarial and Clinical Judgments of Risk
It has been strongly asserted in both juvenile and adult risk assessment contexts that actuarial assessment has the capacity to predict risk more accurately than clinical assessment (Hanson & Thornton, 2000; Harris & Rice, 2007; Meehl, 1996; Quinsey et al., 1998; Steadman et al., 2000). In addition, some researchers have argued that the two methods of assessment — actuarial and clinical — are essentially incompatible (Grove & Lloyd, 2006; Harris & Rice, 2007). In fact, Quinsey and colleagues (2006) have argued for strict adherence to the actuarial model and the elimination of clinical judgment from the risk assessment process altogether. These positions, however, are not universally agreed upon, and there is strong disagreement with the assertion that actuarial risk assessment has greater predictive power than clinical assessment (Boer et al., 1997; Hanson & Morton-Bourgon, 2007; Hart, Michie & Cooke, 2007; Litwack, 2001).
Sjöstedt and Grann (2002), for example, have argued that there are problems associated with strict pro-actuarial positions, and other researchers have suggested that actuarial instruments should be used to support, rather than replace, clinical judgment (Grubin, 2011; Monahan et al., 2001). Moreover, Sjöstedt and Grann (2002) and Pedersen, Rasmussen and Elsass (2010) reported strong predictive validity for structured clinical risk assessment, and Hart and colleagues (2003) have argued that guidelines for structured professional judgment help improve the consistency, transparency and usefulness of decision-making. Further, Rettenberger, Boer and Eher (2011) and Rich (2011) have argued that actuarial assessment does not provide information about risk or possible risk management strategies that are highly personalized for the individual being assessed; hence, it fails to meet the practical and ethical issues and requirements relevant to any individual case.
Despite the ongoing debate, it is important to recognize that the exercise of unaided professional judgment by mental health practitioners is not considered a reliable or accurate means for assessing the potential for future dangerous behavior (Ægisdóttir et al., 2006; Hanson & Thornton, 2000; Monahan & Steadman, 1994; Steadman et al., 2000; Webster et al., 1997). Further, it is clear that the actuarial and clinical assessment models both have strengths and weaknesses. Campbell (2004) writes that neither actuarial nor clinical risk assessment instruments stand up to rigorous scientific scrutiny, noting that all current actuarial and clinical risk assessment instruments are insufficiently standardized, lack inter-rater reliability [4], lack adequate operational manuals and generally fail to satisfy significant scientific standards. Similarly, Grisso (2000) and Hart and colleagues (2003) have argued that such instruments have not yet achieved the level of psychometric rigor needed to meet publication standards. Sixteen years or so later, little has changed, despite advances in both adult and juvenile risk assessment.
Development of Risk Assessment Instruments
Bonta (1996) and others have characterized the evolution of risk assessment methods as occurring in distinct stages (Andrews, Bonta & Wormith, 2006; Bonta & Andrews, 2007; Hannah-Moffat & Maurutto, 2003; and Schwalbe, 2008). First-generation methods primarily involved unstructured clinical judgment, whereas second-generation methods involved statistically derived and static actuarial assessments of risk. Third-generation methods, which are increasingly common in sexual risk assessments of adult offenders, incorporate both the actuarial base of a static assessment and the dynamic factors of a clinical assessment. Fourth-generation methods integrate an even wider range of dynamic factors, incorporating factors relevant to treatment interventions, case management and monitoring. Third- and fourth-generation methods not only recognize the utility of both static and dynamic risk factors, but also that "there is no reason to think that one type is superior to another when it comes to the predicting recidivism" (Bonta, 2002, p. 367). In fact, when dynamic measures are part of the assessment process, the predictive accuracy of risk assessment can exceed that which may be achievable with only static risk factors (Allan et al., 2007). McGrath and Thompson (2012) report that although static and dynamic risk factors both predicted sexual recidivism in juveniles who commit sexual offenses, a combination of static and dynamic factors resulted in a significant improvement in prediction.
While the characterizations and propositions described above are largely drawn from the literature on risk assessment for adult sexual offenders, they are equally relevant to risk assessment for juveniles who commit sexual offenses. Thus far, nearly all juvenile sexual risk assessment instruments are clinical instruments, with a single actuarial instrument as the only exception. Moreover, these ideas and principles are essential for understanding the groundwork upon which juvenile risk assessment is built.
Focus and Breadth of Juvenile Risk Assessment
Epps (1997) describes the goal of juvenile risk assessment as synthesizing psychosocial, statistical, factual and environmental information in a manner that allows defensible decisions to be made about matters of management, treatment and placement. Within this context, Will (1999) describes three broad purposes for juvenile risk assessment: i) the assessment of risk for re-offense, ii) the development of a clinical formulation upon which treatment can be based and iii) the assessment of the juvenile's motivation to accept and engage in treatment. Notably, these three goals closely approximate the principles of risk, need and responsivity that have been increasingly central in practice. Graham, Richardson and Bhate (1997) describe six overarching and interactive goals for juvenile risk assessment:
- Identifying troubled patterns of thoughts, feelings and behavior.
- Recognizing and understanding learned experiences and processes contributing to the development and maintenance of juvenile sexually abusive behavior.
- Identifying situational contexts and correlates of sexually abusive behavior.
- Evaluating the probability of sexual recidivism.
- Assessing the juvenile's motivation to engage in treatment aimed at emotional and behavioral regulation.
- Gathering the information required to develop interventions and treatment.
In short, the goals of a comprehensive risk assessment process extend beyond the assessment of risk alone. To this end, Prentky, Righthand and Lamade (2016) describe juvenile risk assessment as informing the treatment planning process with respect to risk-relevant needs and interventions designed to support prosocial rehabilitation. Similarly, Viljoen, Brodersen, Shaffer and McMahon (2016) have stated the "goal of risk assessment is to identify youths' needs in order to assist in planning individualized risk management or risk reduction efforts" (p. 519).
Risk Factors for Juvenile Sexual Offending
An extensive literature has developed that has identified and discussed risk factors for juvenile sexual offending [5]. Although definitive conclusions regarding the risk factors that are most pertinent to the prediction of sexual recidivism have yet to be made, similar risk factors appear in the most frequently used juvenile risk assessment instruments, and can be grouped into one of 10 categories (Rich, 2009):
- Sexual beliefs, attitudes and drive
- History of sexual offending behavior
- History of personal victimization
- History of general antisocial behavior
- Social relationships and connection
- Personal characteristics
- General psychosocial functioning
- Family relationships and functioning
- General environmental conditions
- Response to prior/current treatment
However, much of the literature on risk factors for juvenile sexual offending remains theoretical and descriptive, rather than the result of reliably replicated statistical research. It also is characterized by a number of methodological problems and other limitations (Spice et al., 2013). Spice and colleagues (2013) noted that early studies on juvenile sexual recidivism were often based on follow-up periods of less than three years, and that early, as well as more contemporary, studies often employed small sample sizes. They also noted that risk factors examined vary widely from one study to another. Similarly, McCann and Lussier (2008) maintained that the risk factors examined in many studies were selected by researchers based on their own clinical experience, the literature on adult sexual recidivism and, until recently, a lack of theoretical understanding regarding sexual offending behavior among juveniles.
Additionally, risk factors for juvenile sexual and nonsexual offending are significantly influenced by developmental processes in children and adolescents, and are not necessarily stable or uniform during adolescence (Kim & Duwe, 2016; Quinsey et al., 2004; Sampson & Laub, 2003). Further, the more recent research of Leroux, Pullman, Motayne and Seto (2016) illustrates that different types of risk factors appear to be at play for adolescents who sexually offend, depending on the age of their victims (child victims, peer/adult victims, or a mix of both). Given these problems, it is not surprising that findings regarding risk factors vary considerably and are inconsistent across different studies (Spice et al., 2013).
Interactive Effect of Multiple Risk Factors
Despite the problems outlined above, the empirical research indicates that it is the presence and interaction of multiple risk factors, rather than the presence of any single risk factor alone, that is most important in understanding risk. Thus, all risk assessment instruments — regardless of whether they are used with adults or juveniles, or whether they are actuarial or clinical — include multiple risk factor items, and all risk assessment processes are concerned not only with the presence of different risk factors, but also with the interactive and amplifying effects of multiple risk factors. Simply put, no single risk factor, even one with relatively high predictive strength, is alone capable of predicting recidivism accurately (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2005, 2007; Roberts, Doren & Thornton, 2002).
Empirical Basis of Risk Factors for Juvenile Sexual Recidivism
The problem of the low base rate for juvenile sexual recidivism complicates the process of determining which individual risk factors are likely to be most important in juvenile risk assessment. In fact, many of the risk factors included in juvenile risk assessment instruments used today have face validity (an intuitive and perhaps common sense appeal that appears to reflect aspects of risk), but very little proven predictive validity. In any case, as Prentky et al. (2016) note, risk factors are rarely demonstrated to be causal, and are instead, in most cases, correlational. That is, establishing causality requires empirical evidence that the presence or absence of the risk factor results in changes in the base rate of offending behavior (Prentky et al., 2016). This is, at best, a difficult task.
Indeed, Worling and Långström (2003, 2006) contend that most risk factors commonly associated with juvenile sexual offending lack empirical validation. Describing 21 commonly cited risk factors, Worling and Långström (2006) argue that only five factors — deviant sexual arousal, prior convicted sexual offenses, multiple victims, social isolation and incomplete sexual offender treatment — are empirically supported through at least two published, independent research studies, and that only two other factors — problematic parent-child relationships and attitudes supportive of sexually abusive behavior — have empirical support in at least one study, and thus can be considered "promising" risk factors. The remaining 14 factors they describe as either third-tier "possible" risk factors based on general clinical support or fourth-tier "unlikely" risk factors that either lack empirical support or are contradicted by empirically derived evidence.
However, Worling and Långström's (2006) typology of empirically supported risk factors is incomplete, has not been replicated and is only weakly supported by later research. Although evidence supporting some elements of their typology is found in later studies, it is also true that later studies have found evidence for factors not supported in their four-tier typology, as well as producing some evidence for still more risk factors. Indeed, the literature is mixed and inconsistent.
For instance, supporting Worling and Långström's (2006) typology, in a meta-analysis of 18 studies and more than 3,100 juveniles, McCann and Lussier (2008) found support for deviant sexual interest as a risk factor, as did Seto and Lalumière (2010) in their meta-analysis of 59 studies [6]. Seto and Lalumière additionally identified social isolation as a significant risk factor for juvenile sexual recidivism, a factor also identified by Gunby and Woodhams (2010) and van der Put and colleagues (2013). Leroux et al. (2016) also described social isolation and lack of peer intimacy as a risk factor, although specifically in the case of adolescents who sexually abused children, rather than those who sexually abused peers or adults. Similarly, Miner and colleagues (2016) identified social isolation, as well as the adolescent's experience of masculine inadequacy, as a risk factor, or predictor, for adolescents who sexually abuse children (but not peers or adults).
However, in direct contrast to the Worling and Långström (2006) typology, Epperson and colleagues (2006), Mallie et al. (2011) and Carpentier and Proulx (2011) found empirical support for a history of sexual victimization as a risk factor for juvenile sexual recidivism, and Epperson and colleagues (2006) also found empirical evidence for a history of nonsexual offending as a risk factor, both of which were identified by Worling and Långström as unlikely risk factors for sexual recidivism. In their analysis of data from the National Longitudinal Study of Adolescent Health, Casey, Beadnell and Lindhorst (2009) also found childhood sexual victimization to be a significant predictor of later sexually coercive behavior, as was a history of adolescent delinquency. Similarly, Leroux et al. (2016) implicated general delinquency as a risk factor, but primarily for adolescent sexual offenders with peer or adult victims.
Knight and Sims-Knight (2003, 2004) and Knight, Ronis and Zakireh (2009) found support for hypersexuality/sexual deviance, impulsivity/antisocial behavior, arrogant/deceitful personality, violent behavior/fantasies and history of victimization as risk factors. However, only one of these factors (sexual deviance) was included among Worling and Långström's (2006) empirically supported risk factors.
In addition to identifying deviant sexual interests as a risk factor, McCann and Lussier (2008) found having a stranger victim to be predictive of sexual recidivism, along with several risk factors described by Worling and Långström (2006) as empirically unsupported or unlikely. These included a history of prior nonsexual offenses, the use of threats or weapons, having a male victim and having a child victim. In addition, McCann and Lussier found that older age upon intake for treatment was associated with increased likelihood of reoffending. Nevertheless, they noted that even the risk factors found to be the best predictors of sexual recidivism in their study had a relatively small effect size and were based on findings derived from analyses involving small sample sizes. In an earlier meta-analysis, Heilbrun, Lee and Cottle (2005) [7] concluded that younger age at first offense, prior noncontact sexual offenses and having an acquaintance victim (rather than a stranger victim) were associated with sexual recidivism. However, in their study of 193 juveniles who commit sexual offenses, Spice and colleagues (2013) found that only opportunity to reoffend was significantly associated with sexual recidivism, although a number of risk and protective factors were linked to nonsexual recidivism.
Finally, Worling, Bookalam and Litteljohn (2012) identified obsessive sexual interests and/or preoccupation, antisocial interpersonal orientation, lack of intimate peer relationships/social isolation, interpersonal aggression and problematic parent-child relationships/parental rejection as risk factors for juvenile sexual recidivism, only two of which were identified as empirically supported or promising risk factors in Worling and Långström's (2006) earlier typology. In his continuing research, Långström (2011) has described sexual offense in a public area, sexual offense involving a stranger victim, two or more sexual offenses and two or more victims as risk factors for juvenile sexual recidivism. However, only one of these appears in Worling and Långström's earlier typology.
As the findings presented above demonstrate, research on the risk factors for sexual recidivism has produced inconsistent and sometimes contradictory results. Indeed, as Spice and colleagues (2013) observe, it is clear that the research literature regarding risk factors for sexual recidivism among sexually abusive youth is disconnected and varied, with little to unify it. Whether the disparate findings are an artifact of the methodological variations found across studies, a reflection of real-world risk factor dynamics or some combination of the two remains unknown at this time. Spice and colleagues (2013) and McCann and Lussier (2008) have voiced concerns about the idiosyncratic nature of individual studies as well as the lack of consistency across studies in terms of research designs, samples, hypotheses and statistical procedures. However, Rich (2009) argues that risk factors for sexual recidivism may operate differently in different people, and at different points in child and adolescent development. For instance, in their study of 1,396 juvenile offenders, van der Put and colleagues (2011) found that the effect of both static and dynamic risk factors on recidivism varied by the age of the adolescent. Thus, risk factors may exert different influences on the propensity to reoffend depending on a number of personal and contextual factors, including the juvenile's age, development and social settings, and the myriad interaction effects different risk factors have in different circumstances and at different points in time. Casey, Beadnell and Lindhorst (2009) similarly noted how difficult it is to clearly implicate in sexually coercive behavior any one risk factor in the absence of other potential risk factors, again highlighting the role multiple risk factors play in contributing to juvenile sexual recidivism.
Both Seto and Lalumière (2010) and van der Put and colleagues (2013) describe further subtlety in understanding and identifying risk factors for juvenile sexual recidivism. Each set of authors recognizes prior childhood sexual victimization as a risk factor for later juvenile sexually abusive behavior. However, Seto and Lalumière describe childhood sexual abuse as a risk factor for the onset of juvenile sexually abusive behavior, but not for sexual reoffense. Similarly, in their study of 625 sexually abusive youth, van der Put and colleagues found that a history of childhood sexual abuse was not a risk factor for recidivism, although they reported significant differences in the incidence of prior sexual victimization among different types or groups of sexually abusive youth, reflecting both heterogeneity within the population and the multifaceted nature of risk factors.
Risk Factors for Sexual Recidivism: Summary and Conclusions
Despite a developing research base, the empirical evidence concerning the validity of commonly identified risk factors for juvenile sexual offending remains weak and inconsistent. As a result, the knowledge base regarding risk factors for juvenile sexual recidivism is speculative and provisional at this point in time, but it is evolving. The inability of research to thus far produce trustworthy and definitive evidence regarding juvenile risk factors for sexual recidivism may reflect problems with the research undertaken to date. However, it is also likely that complex interactions among different risk factors are at play at different times in the development of children and adolescents and that these dynamics are exceptionally difficult to disentangle and document empirically. Indeed, Krahé & Vanwesenbeeck (2016) write that juvenile sexually abusive behavior results from the interaction of multiple risk factors and variables found at different levels, ranging from the macro social level to the level of the individual youth. Furthermore, similarities found between risk factors that place juveniles at risk for sexual offending and those that place juveniles at risk for many other problem behaviors, including general delinquency, complicate matters even further. Far more research is needed to identify, understand and construct both static and dynamic risk variables linked specifically to juvenile sexual recidivism.
Juvenile Risk Assessment Instruments
Most studies designed to assess the accuracy and validity of juvenile risk assessment instruments have focused on the overall structure and predictive accuracy of the most widely used instruments, rather than the individual risk factors within them. Since many, if not most, of the risk factors used in these instruments have not been empirically validated, it is not surprising that instrument validation studies have produced weak or inconsistent results. Nevertheless, there is some empirical support for the capacity of risk assessment instruments to identify statistically valid risk factors, as well as for the predictive validity of various instruments. However, it is not currently possible to definitively assert that any such instrument is empirically validated in terms of its capacity to accurately predict juvenile sexual recidivism. Based on the current, and still developing, research literature, the best we can say about current juvenile sexual risk assessment instruments is that they are partially validated, meaning that validation of their ability to accurately estimate sexual recidivism is weak, with evidence — often only mild to moderate — provided by some research studies but not others. Kim and Duwe (2016), for instance, describe limitations in the capacity of existing risk assessment tools to reliably predict juvenile sexual recidivism. Similarly, Prentky et al. (2016) describe the inability of research to conclude that any current risk assessment instrument is able to accurately estimate or predict juvenile sexual recidivism.
Validation Studies of the Most Commonly Used Instruments
Although there are a number of juvenile sexual risk assessment instruments in use today, the two most commonly used instruments in North America are the Juvenile Sex Offender Assessment Protocol-II (J-SOAP-II) and the Estimate of Risk of Adolescent Sexual Offense Recidivism (ERASOR), both of which are structured and empirically informed instruments designed for clinical assessment. The only actuarial assessment instrument currently available for use with juveniles who commit sexual offenses is the Juvenile Sexual Offense Recidivism Risk Assessment Tool-II (JSORRAT-II), but it is not used as extensively as either J-SOAP-II or ERASOR. Unlike J-SOAP-II and ERASOR — both of which are structured clinical instruments — the JSORRAT-II is a static assessment instrument; that is, it includes only static risk factors.
The JSORRAT-II has been validated by its designers for use only in Utah (where it was initially developed) and Iowa. It is also available for use in Georgia and California, where validation studies have been under way for some time. Although it has not been validated in California, the instrument has nonetheless been selected by the State Authorized Risk Assessment Tool for Sex Offenders Committee (www.saratso.org) as the required instrument for assessing male juveniles who commit sexual offenses (California Penal Code, §§ 290.03-290.08). However, the instrument's authors have relaxed restrictions on the use of the JSORRAT-II outside of these four states, so that it may now be used there as a measure of relative risk (in which individual risk is compared against the total sample), rather than absolute risk (the statistical probability of a sexual reoffense).
Inter-Rater Reliability
The J-SOAP-II, ERASOR and JSORRAT-II have each generally been reported to have good inter-rater reliability (Caldwell, Ziemke & Vitacco, 2008; Knight, Ronis & Zakireh, 2009; Martinez, Flores & Rosenfeld, 2007; Parks & Bard, 2006; Viljoen et al., 2008). For example, in a study of both ERASOR and J-SOAP-II, Rajlic and Gretton (2010) found strong inter-rater reliability for both instruments, with an intraclass correlation coefficient of 0.78 for the total risk assignment of ERASOR and 0.94 for the J-SOAP-II total score.[8]
However, Vitacco and colleagues (2009) report an absence of well-designed and executed inter-rater reliability studies in the juvenile risk assessment field overall, pointing out the need for these studies across populations of juveniles in different treatment or supervision settings as well as for research that examines the potential for allegiance bias. Although their study focused on three sexual risk assessment instruments commonly used with adults, Murrie and colleagues (2009) found that assessed risk levels varied depending on whether the assessment instrument was administered by an evaluator retained by the defense or the prosecution. This suggests that assessed risk scores used in legal proceedings may be influenced by the allegiance of the evaluator. Boccaccini and colleagues (2012) also found that subjective factors influenced assessment outcomes in their study of an actuarial instrument used with adults, even though high inter-rater reliability values were reported for the instrument.
Predictive Validity
Drawing firm conclusions about the predictive validity of juvenile risk assessment instruments is difficult for several reasons. First, relatively few validation studies of juvenile risk assessment instruments have been undertaken to date, and research that has examined the predictive validity of juvenile instruments has produced inconsistent findings. Second, there is very little consistency across validation studies in terms of the recidivism definition employed, the time period studied, the selection of the sample/cohort, the study design itself and the ways in which statistics are applied[9] and/or interpreted. In addition, some research has reviewed multiple instruments, some of which are neither intended nor designed to measure risk for sexual recidivism, while other research has reviewed and evaluated only a single instrument. Sometimes, but not always, the research has also reviewed the capacity of juvenile sexual risk instruments to accurately predict nonsexual recidivism, although none of the juvenile risk assessment instruments currently available for use in the field is designed for that purpose. Notwithstanding these problems, research findings concerning the predictive validity of the J-SOAP-II, ERASOR and JSORRAT-II are presented in turn in the sections below.
Studies that have examined the predictive validity of each instrument are shown in Table 1. However, despite continued research on the psychometric properties of juvenile sexual risk assessment instruments, as well as continued interest in the process and application of juvenile sexual risk assessment in general, little new or recent research has been undertaken with respect to the predictive validity of these instruments.
| Study Authors | J-SOAP-II | ERASOR | JSORRAT-II | Other Instrument |
|---|---|---|---|---|
| Aebi et al. (2011) | X | | | |
| Caldwell & Dickinson (2009) | X | | | |
| Caldwell, Ziemke & Vitacco (2008) | X | | | |
| Chu et al. (2012) | X | X | | |
| Epperson et al. (2006) | | | X | |
| Epperson & Ralston (2009) | | | X | |
| Fanniff & Letourneau (2012) | X | | | |
| Hiscox, Witt & Haran (2007) | | | | Juvenile Risk Assessment Scale |
| Martinez, Flores & Rosenfeld (2007) | X | | | |
| Parks & Bard (2006) | X | | | |
| Prentky et al. (2010) | X | | | |
| Powers-Sawyer & Miner (2009) | X | | | |
| Rajlic & Gretton (2010) | X | X | | |
| Ralston & Epperson (2012) | | | X | |
| Viljoen et al. (2008) | X | | | |
| Viljoen et al. (2009) | | X | | |
| Worling, Bookalam & Litteljohn (2012) | | X | | |
| Viljoen, Mordell & Beneteau (2012) | X | X | X | |
Statistics Used in Assessing Predictive Validity: Area Under the Curve
Statistics used to evaluate the predictive validity of risk assessment instruments vary by study, but area under the curve values (derived from receiver operating characteristic analysis) have increasingly been used for this purpose. Area under the curve analysis is used in part to address the problem of false positives, which can occur with low base rates (such as those for the sexual recidivism of juveniles), because it is unaffected by base rate variations (Craig, Browne & Stringer, 2004; Wollert, 2006).
Whereas an area under the curve value of 1.0 represents 100 percent predictive validity, or discrimination, an area under the curve of 0.50 represents only a 50 percent chance of accuracy, that is, a level of predictive accuracy that is no better than chance and lacks discriminative value. However, there is no clear standard or benchmark for what area under the curve values between these two points represent, other than the fact that any area under the curve value of 0.51 or higher indicates a probability greater than chance, with a value of 0.64 meeting a commonly accepted standard of statistical significance. Nevertheless, statistical significance in this case denotes only that the result is unlikely to be due to chance, and should not be equated with strong predictive validity. Indeed, Singh, Desmarais and Van Dorn (2013) report a lack of consistency in the description and interpretation of area under the curve performance indicators across studies, suggesting a need for standardized guidelines for risk assessment predictive validity studies, although "there is no such agreement" (Singh, 2014, p. 181). In fact, although there is some disagreement regarding what the area under the curve value actually represents with respect to weak, mild, moderate or strong predictive validity, Kim and Duwe (2016) suggest that the bar is often set low in describing the predictive validity of risk assessment instruments across the field. Caution and critical appraisal are thus required in considering the meaning of an area under the curve value when it comes to evaluating the predictive validity of risk assessment instruments.
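One way to make the area under the curve concrete is to read it as the probability that a randomly chosen recidivist received a higher risk score than a randomly chosen nonrecidivist. The sketch below is illustrative only: the function name `auc_from_scores` and the scores themselves are invented for the example and are not drawn from any study or instrument discussed in this chapter.

```python
from itertools import product


def auc_from_scores(recidivist_scores, nonrecidivist_scores):
    """Share of (recidivist, nonrecidivist) pairs in which the recidivist
    has the higher score; tied pairs count as half."""
    pairs = list(product(recidivist_scores, nonrecidivist_scores))
    if not pairs:
        raise ValueError("both groups must contain at least one score")
    correct = sum(1.0 if r > n else 0.5 if r == n else 0.0 for r, n in pairs)
    return correct / len(pairs)


# Hypothetical scores, invented for illustration only.
recidivists = [7, 5, 9]
nonrecidivists = [3, 5, 6, 2]

print(auc_from_scores(recidivists, nonrecidivists))       # 0.875
# Ten times as many nonrecidivists (a lower base rate), same score distribution.
print(auc_from_scores(recidivists, nonrecidivists * 10))  # 0.875
```

In this toy example, repeating the nonrecidivist group tenfold lowers the base rate but leaves the value unchanged, because the statistic depends only on how the two groups' scores are ordered relative to each other.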
However, drawing on the general statistical literature (for instance, Fan, Upadhye & Worster, 2006; Streiner & Cairney, 2007; Zhu, Zeng & Wang, 2010), area under the curve values between 0.65 and 0.70 are generally considered to show weak to mild predictive accuracy, values between 0.71 and 0.80 mild to moderate predictive accuracy and values above 0.80 moderate to strong predictive accuracy. Area under the curve values between 0.50 and 0.60 suggest that predictive accuracy is no better or little better than chance. Values between 0.61 and 0.64/0.65 offer weak evidence of predictive accuracy, as these values fall below the threshold that demonstrates any meaningful level of predictive validity, despite reaching a level of statistical significance. Nevertheless, Kim and Duwe (2016) describe predictive validation as a matter of degree in the field of risk assessment, in which an instrument may be considered to show predictive validity as long as it meets or exceeds, even slightly, a level of statistical significance.
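The interpretive ranges described in this paragraph can be written down as a simple lookup. This is a minimal sketch, not an established standard: the function name `describe_auc` is hypothetical, values are rounded to two decimals as they are reported in the tables, and the "below chance" label for values under 0.50 is an added assumption, since the text gives no label for that range.

```python
def describe_auc(auc):
    """Map an AUC value to the descriptive ranges given in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    auc = round(auc, 2)  # studies report AUC to two decimals
    if auc > 0.80:
        return "moderate to strong"
    if auc > 0.70:
        return "mild to moderate"
    if auc >= 0.65:
        return "weak to mild"
    if auc > 0.60:
        return "weak evidence"
    if auc >= 0.50:
        return "no better or little better than chance"
    return "below chance"  # assumption: this range is not labeled in the text


for value in (0.83, 0.80, 0.67, 0.64, 0.51):
    print(value, describe_auc(value))
```

Because the ranges themselves are contested, as the discussion above makes clear, a lookup like this is useful only for reading the tables consistently, not for settling whether an instrument is valid.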
Predictive Validity of the J-SOAP-II
The J-SOAP-II has received the most attention with respect to its psychometric properties and its capacity for predictive validity. The instrument has also been studied in combination with, and in contrast to, other juvenile risk assessment instruments, such as the ERASOR, JSORRAT-II and other more general (i.e., nonsexual) juvenile risk assessment instruments.
Table 2 summarizes research findings regarding the predictive validity of the J-SOAP-II. Area under the curve values reported in each study for the J-SOAP-II total score and the instrument's four subscale scores are presented in the table, in most cases for both sexual and nonsexual or general recidivism predictive accuracy.
Overall, the values shown in Table 2 tend to follow an inconsistent pattern across individual studies. Area under the curve values for the J-SOAP-II total score, for instance, range from 0.51 to 0.83 for sexual recidivism, indicating that some studies found strong levels of predictive validity while others found that the instrument's predictive accuracy was no better than chance. Similar variation is reported for nonsexual recidivism and for the instrument's four subscales. Variation in the predictive validity of the instrument is even found within individual studies.
The strongest support for the predictive validity of J-SOAP-II arguably comes from the study conducted by Prentky and colleagues (2010). The research examined the predictive validity of the instrument based on an analysis of sexual recidivism for 336 preadolescent and 223 adolescent males using a follow-up period of seven years. Two of the researchers who conducted the study were involved in the development of the instrument. Prentky and colleagues reported total score area under the curve values of 0.80 for the preadolescent males and 0.83 for the adolescent males, who were among the higher risk offenders in the study sample.
| Study Authors | Total Score: Sex | Total Score: NS/Gen | Scale 1: Sex | Scale 1: NS/Gen | Scale 2: Sex | Scale 2: NS/Gen | Scale 3: Sex | Scale 3: NS/Gen | Scale 4: Sex | Scale 4: NS/Gen |
|---|---|---|---|---|---|---|---|---|---|---|
| Aebi et al. (2011) | .65 | .61 | .51 | .47 | .74 | .66 | .50 | .57 | .74 | .60 |
| Caldwell & Dickinson (2009) | NA | NA | .23 | .39 | .59 | .55 | NA | NA | NA | NA |
| Caldwell & Dickinson (2009) | NA | NA | .47 | .39 | .70 | .65 | NA | NA | NA | NA |
| Caldwell, Ziemke & Vitacco (2008) | Not | Not | Not | Not | Not | Not | Sig | Not | Not | Not |
| Chu et al. (2012) | .51 | .79 | .72 | .52 | .37 | .71 | .41 | .79 | .55 | .69 |
| Fanniff & Letourneau (2012) | .58 | .60 | NG | NG | .64 | .61 | NA | .61 | NA | NA |
| Parks & Bard (2006) | Not | Not | Not | Not | Sig | Sig | Not | Not | Not | Not |
| Powers-Sawyer & Miner (2009) | .75 | .45 | .72 | .41 | .64 | .47 | NA | NA | NA | NA |
| Prentky et al. (2010) | .80 | NA | .78 | NA | .56 | NA | NA | NA | .76 | NA |
| Prentky et al. (2010) | .83 | NA | .83 | NA | .66 | NA | NA | NA | .81 | NA |
| Rajlic & Gretton (2010) | .69 | .77 | .65 | .56 | .61 | .79 | .64 | .70 | .68 | .74 |
| Rajlic & Gretton (2010) | .80 | .62 | .66 | .48 | .73 | .72 | .75 | .51 | .77 | .62 |
| Rajlic & Gretton (2010) | .51 | .74 | .59 | .51 | .41 | .72 | .51 | .73 | .53 | .69 |
| Viljoen et al. (2008) | .54 | .56 | .60 | .53 | .54 | .64 | .52 | .49 | .45 | .54 |
| Viljoen, Mordell & Beneteau (2012) | .67 | .66 | .61 | .49 | .63 | .66 | .60 | .60 | .70 | .65 |
| Martinez, Flores & Rosenfeld (2007) | .78 | .76 | | | | | | | | |

Note: Results are shown by area under the curve value, unless otherwise noted. Total Score = J-SOAP-II total score; Scales 1-4 = J-SOAP-II subscales; Sex = sexual recidivism; NS/Gen = nonsexual or general recidivism; NG = value not given; NA = scale not assessed; Sig = significant; Not = not significant.
However, in a study involving 73 adolescent males who sexually offended, Fanniff and Letourneau (2012) found that the J-SOAP-II total score was not predictive of either sexual recidivism (AUC = 0.58) or general recidivism (AUC = 0.60). In studying both the J-SOAP-II and JSORRAT-II, Viljoen and colleagues (2008) reported that neither instrument reached a level of statistical significance in predicting sexual recidivism; the J-SOAP-II achieved an area under the curve value of only 0.54 for sexual recidivism and 0.56 for general recidivism. Similarly, Parks and Bard (2006) and Caldwell, Ziemke and Vitacco (2008) found no relationship between the total score of the instrument and either sexual or nonsexual recidivism. Chu and colleagues (2012) studied the use of J-SOAP-II in Singapore and also found that the total score was not predictive of sexual recidivism (AUC = 0.51); however, the researchers reported that the instrument showed moderate predictive validity for general recidivism (AUC = 0.79).
In their study, Rajlic and Gretton (2010) found substantial variation in the predictive accuracy of the J-SOAP-II within subgroups of sexually abusive youth who later sexually recidivated. While the researchers reported a sexual recidivism area under the curve value of 0.69 (demonstrating mild predictive accuracy) based on an analysis of all 286 study subjects, a higher degree of predictive validity was found among the 128 juveniles who had previously committed only sexual offenses, and a much lower degree of predictive validity was found among study subjects who had previously committed both sexual and nonsexual offenses. For juvenile recidivists who had previously committed only sexual offenses, Rajlic and Gretton reported an area under the curve value of 0.80, indicating moderate validity in predicting sexual recidivism. Conversely, they reported a value of only 0.51 in predicting sexual recidivism in the group of 140 juveniles[10] who had previously committed both sexual and nonsexual offenses, indicating only chance levels of predictive validity.
Finally, Viljoen, Mordell and Beneteau (2012) conducted a meta-analysis that examined the predictive accuracy of several sexual risk assessment instruments, including the J-SOAP-II. Aggregated area under the curve values were reported for each instrument studied.[11] For the J-SOAP-II, the researchers reported aggregated values of 0.67 for sexual recidivism and 0.66 for general recidivism, both of which narrowly fall into the range of mild predictive validity. It is important to note, however, that these findings arguably reflect a homogenized view of the instrument's predictive validity rather than a set of consistent or stable validation results across different studies, as aggregated area under the curve values mask and filter out significant variation in outcomes produced across different studies.
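The point about aggregation can be illustrated with the J-SOAP-II total score values for sexual recidivism listed in Table 2. The sketch below is only a rough illustration under stated assumptions: it takes a simple unweighted average, which is not the procedure used in the meta-analysis, and it treats every numeric table row as a separate result even though some rows (for example, the three Rajlic and Gretton rows) describe overlapping samples.

```python
from statistics import mean

# J-SOAP-II total score AUC values for sexual recidivism, read from the
# individual-study rows of Table 2 (rows reporting NA, "Not" or "Sig" omitted).
total_score_sexual_auc = {
    "Aebi et al. (2011)": [0.65],
    "Chu et al. (2012)": [0.51],
    "Fanniff & Letourneau (2012)": [0.58],
    "Powers-Sawyer & Miner (2009)": [0.75],
    "Prentky et al. (2010)": [0.80, 0.83],
    "Rajlic & Gretton (2010)": [0.69, 0.80, 0.51],
    "Viljoen et al. (2008)": [0.54],
    "Martinez, Flores & Rosenfeld (2007)": [0.78],
}

values = [v for study_values in total_score_sexual_auc.values() for v in study_values]
lowest, highest = min(values), max(values)
average = mean(values)  # unweighted; an assumption, not the meta-analytic method

print(f"range: {lowest:.2f} to {highest:.2f}")
print(f"unweighted mean: {average:.2f}")
```

A single summary figure in the high 0.60s sits between results that are no better than chance and results above 0.80, which is the variation that an aggregate value conceals.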
J-SOAP-II Subscales
As previously noted, the J-SOAP-II consists of four subscales, each of which produces a risk score. As the data reported in Table 2 indicate, research examining the predictive validity of these subscales has also produced inconsistent findings. Wide variations in predictive accuracy are found across studies even within specific subscales. For example, area under the curve values for sexual recidivism within subscale 1 of the J-SOAP-II range from a high of 0.83 (strong predictive accuracy) to a low of 0.23 (well below the 0.50 chance level). Similar variation is apparent within other subscales of the instrument.
Although there is some support in the literature for the predictive validity of the J-SOAP-II, the empirical evidence can best be described as partial and inconsistent. In some studies, evidence of predictive accuracy has been found for the total score of the instrument, while in others the total score was found to be less predictive than the individual subscales of the instrument. Rajlic and Gretton (2010) also found significant differences in the predictive capacity of the instrument based on the composition of the juveniles being assessed. Further, in some independent research, the J-SOAP-II has been found to be effective in predicting general, but not sexual, recidivism. Given these disparate findings, the J-SOAP-II cannot be considered an empirically validated instrument at this time. Far more research is needed to determine whether the disparate validation findings reflect true weaknesses in the predictive accuracy of the instrument or shortcomings within the validation research undertaken to date. However, as Fanniff and Letourneau (2012, p. 403) aptly state —
Mental health professionals conducting predisposition evaluations should proceed with great caution when interpreting J-SOAP-II scores as part of broader risk assessments. Even when J-SOAP-II is only one source informing clinical judgment, evaluators have been unable to produce valid estimates of risk.
Although Hecker (2014) challenged Fanniff and Letourneau's (2012) conclusion regarding scale 1 of the J-SOAP-II, Fanniff and Letourneau (2014) stood by the results and implications of their study. They, in turn, argued that a greater level of accuracy is demanded when instrument scores may be used to inform decisions that may have a serious, long-term and possibly harmful impact on the young person, urging "caution in the interpretation of J-SOAP-II scores in disposition evaluations" (p. 410).
Predictive Validity of the ERASOR
The ERASOR has not been as widely examined as the J-SOAP-II. However, like the J-SOAP-II, the available research on the ERASOR offers inconsistent and, at best, partial, support for its predictive validity.
Table 3 summarizes research findings concerning the predictive validity of the instrument. Area under the curve values reported in each study for the instrument's clinical rating score and total score are presented for both sexual and general recidivism predictive accuracy. The reader should note that the ERASOR total score is a numerical scoring system assigned by researchers, rather than a scale that appears in the instrument itself; it is not likely to be used by practitioners in the field. Instead, the instrument employs only a clinical rating system based on the evaluator's professional judgment of risk associated with the presenting risk factors.
As with the J-SOAP-II, the area under the curve values reported for the ERASOR vary considerably across studies. For example, area under the curve values for the clinical rating score for sexual recidivism range from 0.83 (moderate to strong predictive validity) to 0.54 (no better than chance). Total score values for sexual recidivism range from 0.93 to 0.54.
The strongest support for the predictive validity of ERASOR comes from the study conducted by Worling, Bookalam and Litteljohn (2012). The researchers reported an area under the curve value of 0.82 for the sexual recidivism clinical rating score based on a mean follow-up period of 1.4 years. However, the reported value drops to 0.61 when the follow-up period increases to a mean of 3.7 years. Worling and his colleagues suggested that this may reflect the deterioration of accurate risk prediction in still-developing adolescents, and noted that the instrument is intended to measure risk in a two-year period. In discussing their findings, Worling, Bookalam and Litteljohn (2012, p. 14) stated —
The fact that more contemporaneous ratings were … more predictive of subsequent sexual offending suggests that it is important for clinicians to reassess adolescents and that clinical and forensic decisions are likely to be more accurate if they are based on more recent risk assessments.
| Study Authors | Total Score: Sexual Recidivism | Total Score: General Recidivism | Clinical Rating: Sexual Recidivism | Clinical Rating: General Recidivism |
|---|---|---|---|---|
| Chu et al. (2012) | .74 | .66 | .83 | .69 |
| Rajlic & Gretton (2010) | .71 | .70 | .67 | .71 |
| Rajlic & Gretton (2010) | .86 | .66 | .77 | .64 |
| Rajlic & Gretton (2010) | .54 | .61 | .54 | .58 |
| Viljoen et al. (2009) | .60 | .53 | .64 | .50 |
| Viljoen, Mordell & Beneteau (2012) | .66 | .59 | .66 | .59 |
| Worling, Bookalam & Litteljohn (2012) | .72 | .65* | .61 | .61 |
| Worling, Bookalam & Litteljohn (2012) | .93 | .62* | .82 | .62 |

Note: Results shown in area under the curve values.
Indeed, the study conducted by Worling — the instrument's primary author — and his colleagues shows variability in results depending on what is measured, when it is measured and how it is measured. Area under the curve values range from 0.61 to 0.82 for the clinical rating score, and from 0.72 to 0.93 for the total score in this study.
Although Worling et al. (2012) argue that the results of their study provide support for the predictive validity of the ERASOR regarding sexual recidivism, their results varied depending on the length of the follow-up period and how the instrument was scored. Moreover, for the purposes of research, Worling and colleagues, like researchers in some other studies, scored the ERASOR in ways that most field evaluators would not. In addition to a clinical rating (low, moderate or high) based on the final judgment of the evaluator (which is the way in which ERASOR is designed to be scored by evaluators in the field), Worling et al. additionally scored the total number of risk factors assessed as present and calculated a total score, based on assigning numerical values to each risk factor. As noted, based on the design of and instructions for the instrument, it is only the clinical rating score that is most likely to be used in the field.
While some studies other than that conducted by Worling and his colleagues (2012) have found moderate to high levels of sexual recidivism predictive accuracy associated with the ERASOR clinical rating score, others have not produced similar results. For example, Chu and colleagues (2012) reported an area under the curve value of 0.83 for the ERASOR clinical scale, indicating moderate to strong predictive validity for sexual recidivism. However, Viljoen and colleagues (2009) examined the predictive validity of ERASOR as part of a larger study of risk assessment instruments and reported a value of only 0.64, concluding that the instrument did not show significant validity in predicting juvenile sexual recidivism.[12]
In their study, Rajlic and Gretton (2010) reported that ERASOR was moderately predictive of sexual recidivism, with an overall area under the curve value of 0.71 for the total score and 0.67 for the clinical rating score. When used to evaluate risk for sexual recidivism among juveniles who had previously committed only sexual offenses, ERASOR yielded an area under the curve of 0.86 for the total score and 0.77 for the clinical rating score. However, when used to evaluate predictive validity for sexual recidivism for juvenile sexual offenders who had previously committed both sexual and nonsexual offenses, ERASOR resulted in an area under the curve value of only 0.54 for both the clinical rating and total score, failing to show predictive validity.
Most recently, in their meta-analysis consolidating the results from 33 studies, Viljoen, Mordell and Beneteau (2012) reported aggregate area under the curve values for the ERASOR of 0.66 for sexual recidivism and 0.59 for nonsexual recidivism. Even though an aggregate score potentially inflates the area under the curve value, Viljoen and colleagues' results still produce only marginal evidence of predictive validity for the instrument. Based on the evidence, ERASOR may be considered a promising, but not an empirically validated, instrument.
Predictive Validity of the JSORRAT-II
The JSORRAT-II is the first and only actuarial risk assessment instrument available for use with juveniles who sexually offend. Although it is still undergoing validation, the introduction of JSORRAT-II has added a significant new dimension to the assessment of juveniles who commit sexual offenses. However, few studies focusing on JSORRAT-II have been undertaken to date, and their findings offer little consistent empirical support for the predictive validity of the instrument, based on area under the curve values.
Table 4 summarizes research findings from five studies that examined the predictive validity of JSORRAT-II. Area under the curve values reported in each study for the instrument's sexual and nonsexual recidivism predictive validity are presented in the table. Again, the research has produced mixed results. Area under the curve values for sexual recidivism range from a high of 0.89 (strong predictive validity) to a low of 0.53 (predictive validity that is no better than chance).
| Study Authors | JSORRAT-II: Sexual Recidivism | JSORRAT-II: General/Nonsexual Recidivism |
|---|---|---|
| Epperson et al. (2006) | .89* | NA |
| Epperson & Ralston (2009) | .65–.66 | NA |
| Ralston & Epperson (2012) | .70 | .54 |
| Ralston, Epperson & Edwards (2014) | .70 | .54 |
| Age 11-13 | .73 | NA |
| Viljoen et al. (2008) | .53 | .54 |
| Viljoen, Mordell & Beneteau (2012) | .64 | NA |

Note: Results shown in area under the curve values.
The strongest support for the predictive validity of the JSORRAT-II comes from a study conducted by the instrument's developers. Based on an initial sample of 636 adjudicated male juveniles who committed sexual offenses, Epperson and colleagues (2006) reported an area under the curve value of 0.89 for predicting sexual recidivism prior to age 18, and a value of 0.79 for predicting sexual recidivism any time prior to or after age 18. Both values reflect strong predictive accuracy. However, in examining the instrument's capacity to accurately predict sexual recidivism only after age 18, Epperson and colleagues reported a value of 0.64, indicating weak predictive validity. This led the researchers to speculate that different risk factors may be at play for young adult recidivists compared to juvenile recidivists. Despite the strong area under the curve values found in the 2006 study, in the cross-validation study of the instrument (that is, a study using the sample against which the initial predictive model was tested after first being developed), Epperson and Ralston (2009) and Epperson, Ralston and Edwards (2009) reported sexual recidivism values of only 0.66 and 0.65, respectively, both of which fall at the threshold of mild predictive validity (despite statistical significance, which again simply indicates that the results are greater than chance alone).
In more recent work, Ralston and Epperson (2012) and Ralston, Epperson and Edwards (2014) reported an area under the curve value of 0.70 for the JSORRAT-II's capacity to predict sexual recidivism, indicating mild to moderate predictive accuracy. However, results varied by age: area under the curve values for adolescents aged 11-15 fell between 0.73 and 0.74, indicating moderately good predictive validity, whereas the value for older adolescents, aged 16-17, was only 0.56.
Independent research on the JSORRAT-II is scarce and has not found the same level of predictive validity that Epperson and colleagues found in their 2006 and 2012 studies. In the only independent study of the instrument, Viljoen and colleagues (2008) found no evidence of predictive validity for either sexual or nonsexual recidivism, reporting area under the curve values of 0.53 for sexual recidivism and 0.54 for general recidivism. In their meta-analysis of all juvenile risk assessment instrument validation studies, Viljoen, Mordell and Beneteau (2012) reported an aggregated area under the curve value of 0.64 (which included the values previously reported by Epperson and colleagues) for the capacity of JSORRAT-II to predict juvenile sexual recidivism, which falls just below a marginal level of predictive validity despite the aggregated score.
In summary, while there is some evidence supporting the instrument's capacity for accurately predicting sexual recidivism for juveniles prior to age 18, the research studies conducted by independent researchers have failed to demonstrate that the instrument meets the threshold for predictive accuracy. Given the limited body of research on the instrument and the considerable variation in findings, JSORRAT-II cannot yet be considered an empirically validated instrument.
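The area under the curve values cited throughout this section can be read as the probability that a randomly chosen recidivist received a higher risk score than a randomly chosen nonrecidivist, with 0.50 corresponding to chance. The following is a minimal sketch of that pairwise interpretation. The function name and the risk scores are hypothetical, invented purely for illustration, and are not drawn from any of the studies discussed here.

```python
from itertools import product


def area_under_curve(recidivist_scores, nonrecidivist_scores):
    """Estimate the area under the ROC curve by pairwise comparison.

    Each recidivist score is compared with each nonrecidivist score:
    a higher recidivist score counts as 1, a tie as 0.5, otherwise 0.
    """
    wins = 0.0
    for r, n in product(recidivist_scores, nonrecidivist_scores):
        if r > n:
            wins += 1.0
        elif r == n:
            wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


# Hypothetical instrument scores (assumption: higher score = higher assessed risk)
recidivists = [7, 5, 9, 4]
nonrecidivists = [3, 5, 2, 6, 1, 4]

example_auc = area_under_curve(recidivists, nonrecidivists)
print(f"AUC = {example_auc:.2f}")
```

In this invented example, 20 of the 24 possible pairings favor the recidivist (counting ties as one half), which gives a value of about 0.83. A value near 0.50, such as the 0.53 reported by Viljoen and colleagues (2008), would mean the scores order such pairs little better than a coin toss.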
State-Specific Juvenile Risk Assessment Instruments
In addition to the three instruments discussed above, a handful of state-specific juvenile risk assessment instruments have been developed and placed into use to meet state requirements for sexual offender registration. (For more information on registration, see Chapter 6, "Registration and Notification of Juveniles Who Commit Sexual Offenses," in the Juvenile section.) Structured and empirically based risk assessment instruments have been developed and tailored for use in Texas (Texas Juvenile Sex Offender Risk Assessment Instrument), New Jersey (Juvenile Risk Assessment Scale: JRAS) and Wisconsin (Wisconsin Department of Corrections Guidelines for Release). However, none of these instruments are based on actuarial validation, nor are they empirically validated (Vitacco et al., 2009). Caldwell, Ziemke and Vitacco (2008) concluded that the risk constructs underlying the instruments were not valid, and that none of the three instruments predicts sexual recidivism.
One study has been completed on the JRAS (used in New Jersey), conducted by the instrument's developers, Hiscox, Witt and Haran (2007). The study followed 231 adjudicated male adolescent sexual offenders for an average follow-up period of 8.5 years and found that one of the three primary factors of JRAS — the antisocial factor — was moderately predictive of nonsexual recidivism and mildly predictive of sexual recidivism. Area under the curve values of 0.70 and 0.67 were found for nonsexual and sexual recidivism, respectively. The instrument's sexual deviance factor proved not to be predictive of either sexual or nonsexual recidivism. In terms of the number of youth assessed at a risk level that correctly matched actual recidivism, only 19 percent of youth assessed at moderate risk and 25 percent of youth assessed at high risk actually sexually recidivated; there were false positive rates of 81 percent and 75 percent for youth assessed at moderate and high risk, respectively.
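The 81 percent and 75 percent figures above follow directly from the shares of flagged youth who did recidivate: each is the complement of the corresponding percentage. The short sketch below reproduces that arithmetic using only the two percentages reported in the text. The function and variable names are illustrative assumptions. In the statistical literature, the share of a flagged group that does not go on to the predicted outcome is often described as one minus the positive predictive value.

```python
def flagged_group_breakdown(recidivated_pct):
    """Split a flagged risk group into those who recidivated and those who did not."""
    if not 0 <= recidivated_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return {
        "recidivated": recidivated_pct,
        "did_not_recidivate": 100 - recidivated_pct,
    }


# Percentages of flagged youth who sexually recidivated, as reported in the text
jras_flagged = {"moderate risk": 19, "high risk": 25}

breakdown = {
    level: flagged_group_breakdown(pct) for level, pct in jras_flagged.items()
}
for level, result in breakdown.items():
    print(
        f"{level}: {result['recidivated']}% recidivated, "
        f"{result['did_not_recidivate']}% did not"
    )
```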
The MEGA
The Multiplex Empirically Guided Inventory of Ecological Aggregates for Assessing Sexually Abusive Adolescents and Children (MEGA) is a structured clinical risk assessment instrument released for use in 2013, after several years in development. To date, the instrument has undergone validation studies largely aimed at developing a strong instrument, conducted by its developer or close associate, and has yet to be evaluated by independent researchers. Studies have thus far focused primarily on the instrument's internal construction and consistency (Miccio-Fonseca, 2009, 2010), with one study (Miccio-Fonseca, 2013) describing predictive validity, with reported area under the curve values for the risk scale of the instrument of 0.71 for ages 13-19 and 0.77 for ages 4-12, over a six-month follow-up period. While these values indicate mild-moderate predictive validity, the follow-up period is short, and predictive validity has yet to be examined or established over a longer follow-up period, and awaits independent research.
The MEGA is intended for use with males and females aged 4 through 19, of all IQ levels; this is a remarkably wide range of potentially applicable assessment subjects for a single risk assessment instrument, including both young children and young adults. While the practical benefits of having a single instrument that can be used with so many different subjects are many, targeting such a wide range of subjects with a single instrument (in terms of age, gender and cognitive capacity) may inadvertently undermine the instrument's capacity to predict recidivism accurately. In a study of almost 1,400 juvenile offenders, van der Put and colleagues (2011) found that the effect of both static and dynamic risk factors on recidivism, and hence predictive validity, varied by adolescent age. The researchers suggested not only that different risk assessment instruments be used for juveniles and adults, but that different instruments be used for different age groups within adolescence, as well.
Summary
No juvenile risk assessment instrument has demonstrated consistent predictive validity.
Although some empirical support for the predictive validity of the J-SOAP-II, ERASOR and JSORRAT-II is found in the literature, the instruments do not perform in a manner that suggests or proves their ability to accurately predict juvenile sexual recidivism (Caldwell et al., 2008; Viljoen et al., 2009; Vitacco, Viljoen & Petrila, 2009). As Knight, Ronis and Zakireh (2009) have stated, the relatively few studies of juvenile risk assessment instruments undertaken to date have not produced consistent evidence that the J-SOAP-II, ERASOR or JSORRAT-II is effective in predicting sexual recidivism. Tests of the predictive accuracy of the instruments conducted by independent investigators have typically yielded mixed to poor results for both sexual and nonsexual risk, and especially for the prediction of sexual recidivism. Hence, none of the instruments has a consistently demonstrated record of predictive validity and, as Viljoen, Mordell and Beneteau (2012) note, juvenile risk assessment instruments may be insufficient to make predictions that require a high degree of precision, such as situations in which the civil commitment of juveniles who commit sexual offenses is at stake or juveniles face the possibility of extended or lifetime sexual offender registration.
Until existing or new instruments are better validated, evaluations in this context will remain a complex balancing act between the need to provide the courts and other stakeholders with useful information and the serious limitations in empirically based knowledge about sexual risk.
(Fanniff & Letourneau, 2012, pp. 403–404)
Things to Come
The juvenile sexual risk assessment field continues to develop, and there is a significantly different type of risk assessment model on the horizon. Funded by the U.S. Department of Justice (grant number 2013-AW-BX-0053), Kim and Duwe (2016) describe the development of a statistical machine learning model they hope will address the field's fundamental approach to risk assessment. Machine learning involves mathematical algorithms, in which the assessment model iteratively learns from data in order to form predictions, in this case regarding sexual recidivism.
Based on a sample of over 3,000 juvenile offenders, Kim and Duwe (2016) describe the comparison of machine learning models against more traditional methods of risk prediction (comparing models for statistically estimating risk, not risk assessment instruments). In this study, one of several underway to build the model, machine learning models resulted in the highest area under the curve values for two-year and three-year follow-ups in the validation sample of 1,125 juvenile sexual offenders, ranging between 0.71 and 0.81, compared to values that ranged between 0.58 and 0.72 for more traditional models of prediction. As each prediction model was based on exactly the same set of information and variables, Kim and Duwe highlight the importance of how risk predictors were measured, and not simply what was measured or classified.
However, statistical and computerized models of assessment involve a tradeoff. Such models may yield information about possible sexual recidivism, but, importantly, a risk assessment model is also used as a means to inform, shape and guide treatment, and for the purposes of re-evaluation over time. That is, the possibility of gaining greater predictive power may also result in a loss of interpretive power and the capacity of risk assessment to understand youthful offenders and inform treatment. This juxtaposition of statistical (and in this case entirely computerized) and clinical assessment models raises questions about the role and nature of the risk assessment instrument/model in the larger and more comprehensive assessment of risk designed to drive case management and treatment, rather than passively predicting future behavior. Indeed, as Kim and Duwe (2016) note, the black box process of machine learning is unable to provide guidance on how to identify and address treatment needs. This is a bridge to be crossed in the event that machine learning models for juvenile sexual risk assessment become a reality, possibly sometime within the next one to two years.
Protective Factors in Assessments of Juvenile Risk
Although risk factors are the foundation of virtually all risk assessment instruments, protective factors and their role in mitigating the effects of risk factors have received increasing attention in recent years. Protective factors have been described in the child and adolescent development literature, and their role in delinquency prevention has long been recognized. Their appearance in the forensic literature and their consideration in the process of evaluating and treating risk for juvenile sexual recidivism, however, are both relatively new (for example, Langton & Worling, 2015; de Vries Robbé, Mann, Maruna & Thornton, 2015).
The relationship between risk and protective factors is complex. Jessor and colleagues (1995, 2014) describe risk and protection as opposite ends of the same constructs. They argue that risk and protective factors exist independently of one another, and are not statistically correlated. Similarly, Hall and colleagues (2012) view risk and protective factors as conceptually distinct (rather than opposite ends of a single dimension) and assert that it is not only possible, but essential to conceptualize and define risk and protective factors independently from one another. Regardless, it remains difficult to estimate the effects protective factors have on risk, even though the process of risk assessment arguably must take protective factors into account. In addition, there are different types of protective factors, including those that act directly and those that act indirectly, sometimes also known as promotive factors (Farrington & Ttofi, 2011).
In his critique of forensic risk assessment, Rogers (2000) describes assessment as inherently flawed if it pays attention only to risk factors without consideration of the presence, weight and action of protective factors. Similarly, Rutter (2003) describes the importance of paying attention to the possibility of factors that protect against antisocial behavior, as well as to those that fuel it. Although not referring to protective factors per se, in describing clinical predictions of risk Monahan (1995) noted the importance of giving balanced consideration to factors that indicate the absence of violent behavior, as well as those that suggest the recurrence of violence. Finally, several researchers have described the mitigating effects protective factors can have on risk, noting that these effects have direct implications for programming to reduce violent recidivism, as both risk and protective factors should be targets of intervention and treatment efforts (Lodewijks, de Ruiter & Doreleijers, 2010; Stouthamer-Loeber et al., 2002). Indeed, Lodewijks, de Ruiter and Doreleijers (2010, p. 584) stated, "We can safely conclude that protective factors should be an inextricable part of all risk assessment instruments used with youth."
Despite their importance in mitigating risk, protective factors are incorporated in few juvenile instruments at this time.
Despite the apparent importance of protective factors, few of the instruments commonly used with juveniles incorporate protective factors, and those that do either have no empirical support or are in development and have not yet been empirically validated. In fact, Worling, Bookalam and Litteljohn (2012) noted that very little research regarding factors that lead to the cessation of sexual offending behaviors for juveniles has been undertaken to date, and that it will be important for future research to identify protective factors and determine how best to combine risk and protective factors to enhance judgments of future sexual behavior.
One of the first studies to examine the relationship of risk and protective factors to sexual and nonsexual recidivism was conducted by Spice and colleagues (2013) using a sample of adolescent males who committed sexual offenses. Although the study failed to find any protective factors that were statistically related to sexual recidivism or desistance, study findings nonetheless suggest there may be protective factors that are specific to sexual, rather than nonsexual, recidivism. Like Worling, Bookalam and Litteljohn (2012), the researchers called for more research on both risk and protective factors and the roles they play in sexual offending, and they specifically noted the need for studies that examine whether there are protective factors that apply to sexually abusive youth specifically. Since that time, a handful of studies have emerged that address the nature and role of protective factors in helping to reduce or buffer against sexual recidivism, but this research is in its early stages. Moreover, understanding the mechanisms and effects of protective factors on risk for sexual reoffense will perhaps prove as difficult as better understanding the actions and complexities of risk factors. For example, Klein, Rettenberger, Yoon, Köhler and Briken (2015) found that the presence of protective factors failed to achieve any measurable effect on juvenile sexual recidivism, whereas van der Put and Asscher (2015) arrived at the opposite conclusion. In their study of sexually abusive youth, van der Put and Asscher found that the inclusion of protective factors in the risk assessment process increased predictive validity regarding sexual recidivism (although not in those juveniles who also had a history of violent offenses).
A handful of juvenile sexual risk assessment instruments are worth noting due to their assimilation of protective factors. These include the AIM2 (Print et al., 2007), the Juvenile Risk Assessment Tool (J-RAT) (Rich, 2011) and the previously mentioned MEGA (Miccio-Fonseca, 2010). However, the protective factors scales of these instruments have not yet been empirically tested, and so remain only theoretical instruments for clinical use and treatment planning at this time. The Protective Factors Scale (PFS) (Bremer, 2006) is not a risk assessment instrument, but was nevertheless developed specifically for work with sexually abusive youth and its sexuality scale reviews three elements specifically related to such behavior. However, the PFS has received scant attention from researchers and practitioners: It has not been subject to any form of validation and is not in general use in the field. More recently, the Desistence for Adolescents Who Sexually Harm (DASH-13) has become available for helping to incorporate protective factors into the process of risk assessment, and has been the subject of one research study. However, Zeng, Chu & Lee (2015) found that the instrument did not predict desistance from sexual or nonsexual recidivism.
Summary
Research concerning the factors that place juveniles at risk for sexual offending behavior and sexual recidivism is still in its infancy, as is research on the capacity of risk assessment instruments to accurately predict risk for sexual recidivism. Nevertheless, studies that have been undertaken to date provide some important insights about both issues.
First, the range of risk factors for juvenile sexual offending behavior and recidivism is relatively well defined, and the types and classes of factors that place youth at risk for sexually abusive behavior or sexual recidivism have been identified. However, our understanding of these factors and how they relate to sexual offending tends to be global rather than specific in nature. The role and effect of risk factors is fairly well understood, but the specific mechanisms through which risk factors develop and ultimately impact the behavior of children and adolescents are not. The effects of risk factors under different circumstances and their interactions with one another are particularly obscure. Moreover, research has not yet produced a universally agreed upon, finite and valid set of risk factors for sexually abusive behavior.
Second, the risk assessment instruments that are currently available for use with juveniles who sexually offend are far from empirically validated. In short, there is a lack of consistent, independently corroborated empirical evidence concerning both inter-rater reliability and the predictive validity of current juvenile sexual risk assessments, making it difficult to conclude with any degree of confidence that the instruments are scientifically valid. This raises concerns about the capacity of such instruments to reliably and accurately predict the risk of juvenile sexual recidivism or to inform either juvenile court decisions or public policy debates. While some validation research has produced promising findings, the evidence concerning the predictive accuracy of various instruments is mixed and inconsistent overall. Thus, Vitacco and colleagues (2009) describe current instruments as important developmental milestones in further refining the risk assessment process and method, but far from complete. Viljoen, Mordell and Beneteau (2012) also warn that such instruments are not yet capable of making precise and certain estimates of risk and should thus be used cautiously in legal procedures, such as the civil commitment of juveniles who commit sexual offenses or their placement onto sex offender registries.
Third, given the developmental processes that characterize both childhood and adolescence, there is a clear need for juvenile risk assessment instruments and processes to focus on estimates of short-term rather than long-term risk (Fanniff & Letourneau, 2012; Vitacco et al., 2009; Worling, Bookalam & Litteljohn, 2012). Participants in the Sex Offender Management Assessment and Planning Initiative forum expressed concern that estimates of risk reaching more than one to three years into the future are unlikely to sufficiently account for the fluid nature of child and adolescent development. However, the adoption of a short-term assessment model will likely mean that the manner in which juvenile risk instruments are used and researched will have to significantly change.
Finally, Rich (2011) and Spice and colleagues (2013) have argued for future research to study not only risk factors and the accuracy of risk assessment instruments, but also the nature of risk itself. They further argue that risk assessment instruments should be used as a platform for case management and treatment rather than for making "passive predictions of limited practical use" (Boer et al., 1997, p. 4). In this vein, Viljoen, Mordell and Beneteau (2012) write that despite the research focus on the prediction of sexual recidivism, these instruments are also intended to help manage risk and plan treatment to prevent reoffense. They note that increased attention to the utility of tools for these purposes will enable us to move beyond the prediction of sexual reoffense toward the prevention of sexual reoffense.
Regardless of the strength of the instrument, sound risk assessment requires well-trained risk evaluators who do not simply rely on risk scores when making decisions about a juvenile offender, particularly decisions with potentially lifelong consequences. As described in the psychological evaluation guidelines of the American Psychological Association (Turner et al., 2001), risk evaluators should use their professional training and knowledge of psychology, human behavior and social interactions to draw clinical conclusions. Even when using an actuarial assessment tool, it remains important for the evaluator to apply clinical judgment in the risk assessment process.
Indeed, SOMAPI national forum participants noted a need for the provision of federally funded training and technical assistance to ensure the development of well-trained evaluators who understand the nature of the risk assessment process and the limitations of assessment instruments that are currently available. Well-trained, knowledgeable evaluators are the best defense against the pitfalls associated with erroneous assumptions concerning the predictive accuracy or use of risk assessment instruments for juveniles who sexually offend. Those who use the results of juvenile risk assessments must also understand the strengths and weaknesses of the risk assessment process and the limitations of risk assessment instruments in use today, and particularly the lack of empirical evidence demonstrating their predictive accuracy.
Perhaps most important, risk assessment instruments must be integrated into a comprehensive assessment process that produces a thorough understanding of the juvenile who is being assessed. Risk assessment instruments certainly can play an important role in the process, but their current value arguably lies more in their ability to serve as a basis for case management and treatment rather than in their capacity to accurately predict risk. The role that risk assessment instruments can play in identifying the presence of dynamic risk factors that provide targets for treatment is particularly important, as is the role they can play in identifying the presence of protective factors and their potentially mitigating effects on risk. Indeed, participants in the 2012 SOMAPI forum recommended that protective factors be incorporated into juvenile risk assessment instruments, both those currently in use and those that will be developed in the future. Future research should be concerned with expanding the knowledge base concerning both risk and protective factors, including the mechanisms through which they affect the propensity to reoffend, particularly in combination with one another.
Finally, better risk assessment instruments for juveniles who sexually offend and better trained evaluators are both needed. In describing the "covenant" between the developers and users of risk assessment instruments, Rich (2009) underscored how important well-designed instruments and trained, experienced evaluators are for effective professional practice. As Ward, Gannon and Birgden (2007, p. 207) aptly stated in discussing the responsibility of the instrument end user:
Practitioners have obligations to always use such measures appropriately, ensure they are trained in their administration and, most importantly, make sure that the assessment process culminates in an etiological formulation that is based around the individual's features alongside those they share with other offenders.
1 The base rate refers to the frequency with which a defined situation occurs, or its incidence rate.
2 Juvenile Sex Offender Assessment Protocol-II (J-SOAP-II).
3 Juvenile Sexual Offense Recidivism Risk Assessment Tool-II (JSORRAT-II).
4 Inter-rater reliability refers to the consistency of a measure or tool in giving the same result when the same information is collected and assessed by different evaluators.
5 Rich (2011) identified 101 risk factors for juvenile sexually abusive behavior and more have since been described in the literature.
6 The meta-analysis involved a combined sample of almost 3,900 adolescent male sexual offenders. In this meta-analysis, Seto and Lalumière described deviant sexual interest as "atypical" sexual interest.
7 The meta-analysis involved nine studies.
8 An intra-class correlation coefficient greater than 0.75 indicates a high level of inter-rater consistency.
9 However, receiver operating characteristic scores and resulting area under the curve values have been increasingly used in the assessment of the predictive validity of risk assessment instruments.
10 Eighteen juveniles were excluded from the breakdown into subgroups due to unavailable data regarding prior sexual and nonsexual offenses.
11 Viljoen and colleagues studied the J-SOAP-II, ERASOR, JSORRAT-II and Static-99, an adult risk assessment instrument. Overall, the meta-analysis consolidated 33 studies involving more than 6,000 male adolescent sexual offenders.
12 The other instruments in the study were the Hare Psychopathy Checklist: Youth Version (PCL:YV) and the Youth Level of Service/Case Management Inventory (YLS/CMI), which are designed to assess and predict risk for juvenile non-sexual violence and aggression, respectively; and Static-99, an adult actuarial risk assessment instrument.
13 Also see Hempel et al. (2011). In their review of juvenile sexual risk assessment instruments, the researchers conclude that "the predictive validities of the risk assessment instruments for JSOs are still insufficient to accurately predict recidivism" (p. 16).