Use and Abuse of "Risk Factors" in Sexually Violent Predator Trials
by John Tennison, MD, Copyright September 18, 2011
In sexually violent predator trials that I have witnessed in Texas, comments by prosecution attorneys and testimony from experts retained by the prosecution have involved discussions and presentations of “Risk Factors” and “Protective Factors” in ways that are deceptive and misleading to juries.
Consequently, I hope that the following discussion will contribute to a more rational understanding and consideration of so-called “risk factors” and “protective factors,” both in and out of the courtroom.
Causality
First and foremost, “risk factors” are associations. The mathematical or statistical determination of a construct or item as a “risk factor” does not demonstrate causality. That is, even though something is mathematically and statistically a “risk factor,” such a risk factor might or might not have a causal relationship to the outcome of concern.
“Meaningful” Risk Factors
On page 191 of their 2010 article, Ruth E. Mann, R. Karl Hanson, and David Thornton proposed that:
“the basic requirements for a psychologically meaningful risk factor are (a) a plausible rationale that the factor is a cause of sexual offending and (b) strong evidence that it predicts sexual recidivism.”
A third important criterion for a risk factor to be “meaningful” is that it not be redundant with other risk factors that have already been taken into account. For further discussion of the meaninglessness of redundant risk factors, refer to the sections below titled “White Wood-Frame Houses,” “Redundancy of Risk Factors: Psychopathy and other ‘Risk Factors’,” and “Identified ‘Risk Factors’ Are Rarely Additive.”
Calling Something “Protective” Does Not Make It So
Despite criterion (b) above (strong evidence that the factor predicts sexual recidivism), I have personally witnessed experts for the prosecution in Texas declare various things to be “protective factors” when there is very little, if any, evidence to support such claims. That is, in many instances, items are presented to juries that (to the best of my knowledge) have not been shown to be associated with a reduction in sexual recidivism. For example, I have personally witnessed prosecution experts declare that having earned a GED is a “protective factor.” Yet I know of no evidence anywhere, let alone “strong” evidence, demonstrating that earning a GED is associated with a reduction in sexual recidivism. More generally, if actually studied, the extent of education might turn out to have very little, if any, association with the extent of sexual recidivism. For example, despite being highly educated, some priests have perpetrated some of the worst acts of pedophilia on record.
Effect Size – “d,” “r,” and AUC (ROC Area)
Every construct or item that has been demonstrated by scientific and statistical methods to be a “risk factor” has an associated “effect size,” “magnitude,” or “potency.” This is the strength of the association between that “risk factor” and a particular outcome, such as sexual recidivism.
Effect size is commonly computed with what is known as the “d statistic” or “Cohen’s d.” Here, “d” stands for “difference,” and is more completely known as “standardized mean difference” (Cohen, 1988).
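Here is an illustration of the formula. Cohen's d is the difference between two group means divided by their pooled standard deviation. The Python sketch below computes it with the standard library. The scores are invented purely for illustration and do not come from any study.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference: (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical scores for two groups (made-up numbers)
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]
print(round(cohens_d(group_a, group_b), 2))  # 1.26
```

The sign of d simply reflects which group is listed first. Its magnitude is what carries the "effect size."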
Other ways of measuring effect size include “r,” also known as the “correlation coefficient,” and AUC (Area Under the Curve, also known as “ROC Area”). The mathematical relationship between d, r, and AUC is presented in a table in the 2005 paper “Comparing Effect Sizes in Follow-Up Studies: ROC Area, Cohen’s d, and r,” by Marnie E. Rice and Grant T. Harris (Law and Human Behavior, Vol. 29, No. 5, October 2005).
For the range of d values typically reported in sexual recidivism studies, and when the two groups being compared are of similar size, the value of d will be slightly more than twice as large as r (the correlation coefficient) calculated from the same data.
Moreover, the POV (proportion of variance) shared between two variables can be calculated by squaring r. In statistical analysis, the squared r value is called the “coefficient of determination.” The coefficient of determination is understood by statisticians as quantifying the strength of association or the magnitude of variance that is shared between two variables (page 12, Ellis, 2010).
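The conversions among d, r, POV, and AUC described above can be sketched in Python. This is a minimal sketch that rests on two assumptions:

- The d-to-r step uses the standard point-biserial conversion. Here `base_rate` (a parameter name chosen for illustration) is the proportion of cases in one of the two groups, and 0.5 means equal group sizes.
- The d-to-AUC step assumes normally distributed scores with equal variances in both groups.

Under those assumptions, the code reproduces the familiar correspondence of d = .2, .5, and .8 with AUC values of roughly .56, .64, and .71.

```python
import math
from statistics import NormalDist

def d_to_r(d, base_rate=0.5):
    """Convert Cohen's d to a point-biserial correlation r.

    base_rate is the proportion of the sample in one of the two groups;
    0.5 (equal group sizes) makes the constant a equal to 4."""
    a = 1.0 / (base_rate * (1.0 - base_rate))
    return d / math.sqrt(d * d + a)

def proportion_of_variance(r):
    """Coefficient of determination (POV): the squared correlation."""
    return r * r

def d_to_auc(d):
    """AUC (ROC area), assuming normal scores with equal variances in both groups."""
    return NormalDist().cdf(d / math.sqrt(2))

for d in (0.2, 0.5, 0.8):
    r = d_to_r(d)
    print(f"d = {d}: r = {r:.2f}, POV = {proportion_of_variance(r):.3f}, "
          f"AUC = {d_to_auc(d):.2f}")
```

With `base_rate=0.5`, d is a little more than twice r across the d values common in this literature. With a lower base rate (for example, when recidivists are a minority of the sample), the same d corresponds to a smaller r, and hence to an even smaller POV.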
When “Medium” and “Large” Are Small
In his Statistical Power Analysis text, Cohen gives three anchor points for what can be conventionally considered “small,” “medium,” and “large” d statistics within the behavioral sciences. These values are .2, .5, and .8 respectively (pages 24-27, Cohen, 1988).
If one also considers the ranges of d values to which the labels “small,” “medium,” and “large” could reasonably be applied, .35 seems a reasonable threshold between “small” and “medium,” and .65 a reasonable threshold between “medium” and “large.” Thus, “small” d values would be those less than .35, “medium” d values would be those from .35 to .65, and “large” d values would be anything greater than .65.
However, the labels “small,” “medium,” and “large” are so vague that it is ultimately better to consider the actual statistical and probabilistic meaning of d values in a certain range.
Indeed, in their 2005 paper, "Comparing Effect Sizes in Follow-Up Studies: ROC Area, Cohen's d, and r," Rice and Harris state the following in their abstract (page 615):
".....we urge researchers and practitioners to use numbers rather than verbal labels to characterize effect sizes."
and (page 619):
"The verbal characterizations proposed by Cohen could be very useful when communicating to laypersons such as court officials, for example, if they were commonly understood and adhered to by researchers and practitioners. However, we suggest the field of risk assessment place little reliance on plain language verbal labels because of the considerable disagreement about what they mean. For example, our colleagues (Hilton, Carter, Harris, & Bryans, 2005) found that, among a small group of forensic clinicians, some risks (i.e., probabilities between .40 and .55 of violent recidivism in 10 years of opportunity) were simultaneously characterized as "high," "moderate," and "low" by different clinicians. Clarity is best reflected by numerical characterization and we offer the accompanying table to aid communication among researchers and practitioners."
To illustrate, let’s examine actual sexual recidivism data that underscore how misleading labels such as "small," "medium," and "large" can be.
In their large 2009 meta-analysis (Hanson & Morton-Bourgon, 2009, page 17), which considered outcomes for 20,010 individuals to whom the Static-99 had been administered, the authors found that the Static-99 had a mean d value of .67 with regard to the outcome of “sexual recidivism.” Thus, r, the correlation coefficient, can be approximated as half of .67, which is .335. Squaring .335 gives .112225, which, after rounding off non-significant digits, is .11. This is the value of the coefficient of determination. Thus, the POV (proportion of variance) is only 11%, a very small proportion indeed.
The median d value for the Static-99 in this study was .74, which yields an r value of approximately .37 and a squared r value of .1369, which rounds to .14. Thus, even when considering the slightly larger median d value, the POV is still only 14%.
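The arithmetic in the two paragraphs above can be checked with a few lines of Python. The sketch below applies the article's shortcut (r taken as half of d) to the mean and median d values of .67 and .74. For comparison, it also applies the exact conversion r = d / sqrt(d² + 4). That conversion assumes equal group sizes, which is an assumption made here and not something reported in the meta-analysis. The function names are illustrative.

```python
import math

def pov_halving(d):
    """The article's shortcut: r is approximately d / 2, so POV is (d / 2) ** 2."""
    r = d / 2
    return r * r

def pov_exact_equal_groups(d):
    """Exact conversion assuming equal group sizes: r = d / sqrt(d**2 + 4)."""
    r = d / math.sqrt(d * d + 4)
    return r * r

for label, d in (("mean", 0.67), ("median", 0.74)):
    print(f"{label}: d = {d}, halving POV = {pov_halving(d):.4f}, "
          f"exact POV = {pov_exact_equal_groups(d):.4f}")
# mean: d = 0.67, halving POV = 0.1122, exact POV = 0.1009
# median: d = 0.74, halving POV = 0.1369, exact POV = 0.1204
```

The shortcut slightly overstates the POV relative to the exact equal-groups conversion: about 10% and 12%, rather than 11% and 14%. If anything, then, the halving approximation favors the Static-99.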
Thus, the overall variance in sexual recidivism that is “explained” or “taken into account” by, or “associated with,” the Static-99 is in the range of 11-14%. That is, whatever factors account for the other 86-89% of the variance in sexual recidivism are not taken into account by the Static-99.
Recall that these small POV values correspond to d values that have conventionally been described as falling in the “medium” to “large” range, i.e., between .5 and .8. Only after converting such d values to POV values does it become clear how misleading it is to attach the words “medium” and “large” to d values in the range of .5 to .8.
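A short Python sketch makes the same point for Cohen's three anchor values. It labels each d value using the thresholds proposed earlier in this article (.35 and .65). It then converts each d value to a POV with the same halving shortcut used above, taking r as d / 2. The function name `size_label` is illustrative.

```python
def size_label(d):
    """Verbal label using the thresholds proposed above: < .35 small, .35-.65 medium, > .65 large."""
    magnitude = abs(d)
    if magnitude < 0.35:
        return "small"
    if magnitude <= 0.65:
        return "medium"
    return "large"

for d in (0.2, 0.5, 0.8):
    r = d / 2  # halving shortcut from the text
    print(f"d = {d} ({size_label(d)}): POV of about {r * r:.0%}")
# d = 0.2 (small): POV of about 1%
# d = 0.5 (medium): POV of about 6%
# d = 0.8 (large): POV of about 16%
```

Even Cohen's “large” anchor of .8 corresponds to only about 16% of shared variance.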
Given the small proportion of variance (POV) of sexual recidivism associated with scores on actuarial instruments, such as the Static-99, basing involuntary commitments that will probably last a lifetime solely on such instruments seems scientifically unjustified.
Is Ice Cream Consumption in China a Risk Factor for Wildfires in Texas?
On page 338 of the excellent article, “Coming to Terms With the Terms of Risk,” from the Archives of General Psychiatry, Helena Kraemer (of Stanford University) and other authors make the following statement:
“It is difficult to find 2 variables that are absolutely independent of each other. Without considering some notion of potency in the assessment, given a large enough sample size, virtually every factor could be demonstrated to be a risk factor for every outcome that follows it. That would trivialize the concept of a risk factor.”
Based on this consideration, with a large enough sample size, it would almost certainly be possible to show that ice cream consumption in China was a risk factor for wildfires in Texas, yet such a “risk factor” would be expected to have a very small d value, or magnitude.
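Kraemer's point about sample size can be illustrated numerically. The Python sketch below assumes a hypothetical correlation of r = .01, which is a POV of .0001, or one hundredth of one percent. It computes the two-sided p-value for testing whether the correlation is zero, using the standard statistic t = r * sqrt((n - 2) / (1 - r²)). As a simplifying assumption, it uses a normal approximation to the t distribution, which is reasonable for large n. The function name is illustrative.

```python
import math

def correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) with a normal approximation
    to the t distribution, which is adequate for large n."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))

r = 0.01  # hypothetical, trivially small association
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {correlation_p_value(r, n):.2g}, POV = {r * r:.4f}")
```

The POV stays at .0001 throughout. The p-value, however, falls from roughly .9 at n = 100 to far below any conventional threshold at n = 1,000,000. “Statistical significance” here reflects sample size, not potency.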
Since virtually any construct or item can be statistically shown to have a non-zero (either positive or negative) association with some particular outcome, virtually any construct or item could in theory be presented to juries as a “risk factor.” This is true even if its d statistic differs from zero by only a small amount (less than .3 in absolute value). In the field of assessing the probability of sexual recidivism, “risk factors” whose d statistics differ from zero by such small amounts have effect sizes so small that calling them “risk factors” can be highly deceptive and misleading to juries. The label implies a strength of association (or, even worse, causality) far greater than such small d statistics indicate.
Here is an example that will help to clarify several statistical principles related to risk factors. I call my example “White Wood-Frame Houses.”
“White Wood-Frame Houses”
Let’s say an arsonist was preferentially seeking out and burning down “white wood-frame houses” in a particular neighborhood. Let’s also say that brick houses and wood-frame houses of colors other than white were only occasionally burned down by this arsonist, and only then by random selection rather than for any specific feature. Such a situation would allow a statistical calculation to correctly conclude that a “white wood-frame house” was a high-magnitude “risk factor” for the outcome of being burned down.
However, let’s say that it was also the case that all white wood-frame houses in this neighborhood had brass doorknobs, red doors, and chimneys. Meanwhile, brick houses and houses of colors other than white mostly had doorknobs made of metals other than brass, mostly had doors of colors other than red, and only rarely had chimneys. If so, then a statistical analysis would also correctly conclude that “brass doorknobs,” “red doors,” and “chimneys” were high-magnitude “risk factors” for the outcome of being burned down.
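The example can be made concrete with invented numbers. In the hypothetical neighborhood below (all counts are made up for illustration), the arsonist burns half of the 100 white wood-frame houses. Every other house burns at the same 5% rate regardless of its doorknobs, door color, or chimney. The sketch computes each feature's risk ratio: the burn rate among houses with the feature divided by the rate among houses without it. All four features come out as strong “risk factors,” even though only one of them is the arsonist's selection criterion. The class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HouseGroup:
    white_wood_frame: bool
    brass_doorknobs: bool
    red_doors: bool
    chimney: bool
    houses: int
    burned: int

# Hypothetical neighborhood (invented counts). The arsonist targets white
# wood-frame houses; every other house burns at the same 5% rate,
# whatever its doorknobs, door color, or chimney.
NEIGHBORHOOD = [
    HouseGroup(True, True, True, True, houses=100, burned=50),
    HouseGroup(False, True, True, True, houses=100, burned=5),
    HouseGroup(False, False, False, False, houses=700, burned=35),
]

FEATURES = ("white_wood_frame", "brass_doorknobs", "red_doors", "chimney")

def burn_rate(groups):
    """Fraction of houses in the given groups that were burned."""
    return sum(g.burned for g in groups) / sum(g.houses for g in groups)

def risk_ratio(feature, groups=NEIGHBORHOOD):
    """Burn rate among houses with the feature divided by the rate among houses without it."""
    with_feature = [g for g in groups if getattr(g, feature)]
    without_feature = [g for g in groups if not getattr(g, feature)]
    return burn_rate(with_feature) / burn_rate(without_feature)

for feature in FEATURES:
    print(f"{feature}: risk ratio {risk_ratio(feature):.1f}")
```

White wood-frame houses show a risk ratio of 10. Brass doorknobs, red doors, and chimneys each show a risk ratio of 5.5, although none of those three features plays any role in the arsonist's choices.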
Three important points can be made from this example. In particular, it illustrates 3 errors that have been, and continue to be, frequently made in the thinking of prosecutors and prosecution experts in the Sexually Violent Predator trials that I have observed. Two are “errors of causal attribution,” and one is an “error of redundant risk factors.” I break the causal attribution errors down into 2 subtypes, A and B:
Type A Causal Attribution Error: The fact that “white wood-frame houses,” “brass doorknobs,” “red doors,” and “chimneys” are risk factors DOES NOT prove that these items caused the outcome (burning down) being measured. Indeed, to say that “white wood-frame houses,” “brass doorknobs,” “red doors,” or “chimneys” cause a house to be burned down would be absurd. Rather, it would make sense to attribute the causality to the choice of the arsonist, who preferentially targeted white wood-frame houses.
Type B Causal Attribution Error: Now imagine a realtor showing a potential buyer houses in this neighborhood. If a white wood-frame house were for sale in the neighborhood, a statistically unsophisticated realtor might say, “I’d stay away from that white wood-frame house because it has 4 risk factors for being burned down, namely a white wood frame, brass doorknobs, red doors, and a chimney.” If there were also a brick house for sale with brass doorknobs, red doors, and a chimney, the realtor might say, “This brick house is safer because it only has 3 risk factors for being burned down, namely brass doorknobs, red doors, and a chimney.” However, by calling the brass doorknobs, red doors, and chimney “risk factors,” the realtor would be erroneously conveying the idea that these 3 items were associated with an increased probability of the brick house being burned down. In fact, given the preference of the arsonist, the presence of these items was merely a coincidence that was not associated with any increased probability of the brick house being burned down. That is, only the fact of a house being a “white wood-frame house” was the selection criterion of the arsonist. Moreover, the realtor might say that a brick house with aluminum doorknobs, green doors, and no chimney had no “risk factors” for being burned down. Yet the risk of such a house being burned down would actually be no lower than that of a brick house with brass doorknobs, red doors, and a chimney.
The 3rd error is an “Error of Redundant Risk Factors.” That is, risk factors can be partially or completely redundant with each other. For example, once someone has identified that a “white wood-frame house” is a risk factor for being burned down, no additional risk is accounted for, and no additional explanatory capacity is gained, by also taking into account “brass doorknobs,” “red doors,” or “chimneys.” This is true even though “brass doorknobs,” “red doors,” and “chimneys” remain “risk factors” from a mathematical and statistical standpoint. Since all white wood-frame houses in the neighborhood already have these 3 features, recognizing these features accounts for no additional risk. That is, after having identified the actual selection criterion of the arsonist, namely “white wood-frame houses,” no additional variability in risk is explained by also listing “brass doorknobs,” “red doors,” and “chimneys.” These three risk factors are already subsumed under, and thus redundant with, “white wood-frame house” as a risk factor.
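The same invented counts can be reused here; they are repeated so the sketch stands alone. The redundancy becomes visible once burn rates are compared within strata defined by the arsonist's actual selection criterion. Because brass doorknobs, red doors, and chimneys always occur together in this hypothetical neighborhood, a single flag represents all three.

- Among houses that are not white wood-frame, the burn rate is 5% whether or not they have those features. This is the realtor's Type B error.
- Among white wood-frame houses, the features never vary, so they cannot explain any additional risk.

The function name `rate` is illustrative.

```python
# Hypothetical counts (invented), same construction as the arsonist example:
# (white_wood_frame, has_brass_red_door_chimney, houses, burned)
NEIGHBORHOOD = [
    (True, True, 100, 50),
    (False, True, 100, 5),
    (False, False, 700, 35),
]

def rate(white, extras):
    """Burn rate for houses matching both flags, or None if no such houses exist."""
    rows = [(h, b) for w, e, h, b in NEIGHBORHOOD if w == white and e == extras]
    houses = sum(h for h, _ in rows)
    return sum(b for _, b in rows) / houses if houses else None

for white in (True, False):
    for extras in (True, False):
        burned_share = rate(white, extras)
        shown = "no such houses" if burned_share is None else f"{burned_share:.0%}"
        print(f"white wood-frame={white}, brass/red/chimney={extras}: {shown}")
```

Within each stratum, knowing about the doorknobs, doors, and chimney changes nothing. Only the white wood-frame distinction separates a 50% burn rate from a 5% one.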
Redundancy of Risk Factors: Psychopathy and other “Risk Factors”
Although other variables, such as “psychopathy” as scored on the PCL-R, have been shown to be “risk factors” with small d values (i.e., less than .3), such “risk factors” do not appear to add any explanatory capacity. Nor do they appear to be associated with any additional variance in sexual recidivism beyond what is already associated with actuarial instruments such as the Static-99. For example, even though the SVR-20, the VRAG, and the SORAG take into account psychopathy as measured by the PCL-R, none of these 3 assessments has median or mean d values better than those of the Static-99. In fact, the d values for these 3 instruments are actually lower than those of the Static-99. Thus, factoring in psychopathy, as measured by the PCL-R, appears to do no better than the Static-99 alone.
Thus, even if someone were truly a “psychopath” by PCL-R standards, that fact does not appear to contribute to or explain any additional risk beyond the factors already considered in the Static-99. In other words, once the risk factors in the Static-99 have been considered, someone’s score on the PCL-R seems to have limited relevance, if any.
Identified “Risk Factors” Are Rarely Additive
Yet, despite the high potential for substantial redundancy in what are identified as “risk factors,” I have personally witnessed laundry lists of risk factors being presented to juries in Sexually Violent Predator trials in Texas with no consideration whatsoever of the fact that there could be a high degree of redundancy between the individual risk factors. That is, the so-called “risk factors” are saliently listed out so as to suggest that each risk factor listed adds additional risk not already accounted for by the other risk factors.
If the hypothetical realtor in my example above were an expert witness, a jury would likely be erroneously told by the realtor that a white wood-frame house with brass doorknobs, red doors, and a chimney was 4 times as unsafe from arson as a white wood-frame house without brass doorknobs, red doors, and a chimney. Yet, based on the selection criterion of the arsonist, neither white wood-frame house would be any more dangerous than the other.
Thus, simply presenting a list of constructs or items to a jury as risk factors can falsely imply an equal magnitude of risk for each of the risk factors. It can also falsely imply that the overall risk is an arithmetic sum of the individual risk magnitudes associated with each risk factor.
Such misleading presentations to juries can be avoided in two ways. First, by citing the specific d statistic that has been calculated for each item or construct being presented. Second, by explicitly considering (by mathematical and statistical methods, when possible) the degree to which there is redundancy between risk factors. In general, there will almost always be some redundancy between risk factors in the social sciences. This leads me to conclude with Tennison’s Rule of Risk Factor Redundancy:
“The Whole is Less Than the Sum of The Parts”
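The rule can be illustrated with the standard formula for the squared multiple correlation of an outcome on two predictors:

R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²)

Here r1 and r2 are each predictor's correlation with the outcome, and r12 is the correlation between the predictors. In the Python sketch below, both predictors are assumed, hypothetically, to correlate .3 with the outcome, so each alone accounts for 9% of the variance. As their intercorrelation r12 rises, the variance they jointly account for falls further below the 18% that simply adding their individual POVs would suggest. The function name is illustrative.

```python
def r_squared_two_predictors(r1, r2, r12):
    """Squared multiple correlation (R^2) of an outcome on two predictors.

    r1, r2: correlation of each predictor with the outcome.
    r12: correlation between the two predictors (must satisfy |r12| < 1)."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)

r1 = r2 = 0.3  # hypothetical validities, each a POV of .09
for r12 in (0.0, 0.5, 0.9):
    whole = r_squared_two_predictors(r1, r2, r12)
    print(f"r12 = {r12}: sum of parts = {r1**2 + r2**2:.3f}, whole = {whole:.3f}")
```

For two equally valid predictors, the formula reduces to 2r² / (1 + r12). This is below the simple sum 2r² whenever the predictors are positively correlated with each other.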
References
1. “Coming to Terms With the Terms of Risk,” by H. C. Kraemer, A. E. Kazdin, D. R. Offord, R. C. Kessler, P. S. Jensen, and D. J. Kupfer, Archives of General Psychiatry, 1997, Vol. 54, No. 4, 337-343.
2. “The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results,” by Paul D. Ellis, Cambridge University Press, 2010.
3. “Statistical Power Analysis for the Behavioral Sciences, 2nd Edition,” by Jacob Cohen, The Psychology Press, 1988 (reprinted 2009).
4. “The Accuracy of Recidivism Risk Assessments for Sexual Offenders: A Meta-Analysis of 118 Prediction Studies,” by R. Karl Hanson & Kelly E. Morton-Bourgon, Psychological Assessment, 2009, Vol. 21, No. 1, 1-21.
5. “Assessing Risk for Sexual Recidivism: Some Proposals on the Nature of Psychologically Meaningful Risk Factors,” by Ruth E. Mann, R. Karl Hanson, and David Thornton, Sexual Abuse: A Journal of Research and Treatment, 2010, Vol. 22, No. 2, 191-217.
6. “Comparing Effect Sizes in Follow-Up Studies: ROC Area, Cohen’s d, and r,” by Marnie E. Rice and Grant T. Harris, Law and Human Behavior, 2005, Vol. 29, No. 5, 615-620.