**Use and Abuse of "Risk Factors" in Sexually Violent Predator Trials**

**by John Tennison, MD, Copyright September 18, 2011**

In sexually violent predator
trials that I have witnessed in Texas, comments by prosecution attorneys and
testimony from experts retained by the prosecution have involved discussions and
presentations of “**Risk Factors**” and “**Protective Factors**” in ways
that are deceptive and misleading to juries.

Consequently, I am hopeful that
the following discussion will contribute to a more rational understanding and
consideration of so-called **"risk factors"** and **"protective factors"**
both in and out of the courtroom.

__
Causality__

First and foremost, **"risk
factors" are associations.** The mathematical or statistical determination of
a construct or item as a “risk factor” does not demonstrate causality. That is,
even though something is mathematically and statistically a “risk factor,” such
a risk factor might or might not have a causal relationship to the outcome of
concern.

__
“Meaningful” Risk Factors__

On page 191 of their 2010 article, Ruth E. Mann, R. Karl Hanson, and David Thornton proposed that:

**“the basic requirements for a
psychologically meaningful risk factor are (a) a plausible rationale that the
factor is a cause of sexual offending and (b) strong evidence that it predicts
sexual recidivism.”**

A third important criterion for
a risk factor to be **"meaningful"** is that it **not** be
redundant with other risk factors that have already been taken into account.
For a further discussion of the meaninglessness of redundant risk factors, refer
to the sections below titled "Redundancy of Risk Factors: Psychopathy and other 'Risk Factors'" and "Identified 'Risk Factors' Are Rarely Additive."

__
Calling Something “Protective” Does Not Make It
So__

Despite criterion b above (**strong evidence
that it predicts sexual recidivism**), I have personally witnessed experts for
the prosecution in Texas declare various things as being “**protective factors**”
for which there is very little, if any, evidence to support such claims. That
is, in many instances, items get presented to juries that (to the best of my
knowledge) have not been shown to be associated with a reduction in sexual
recidivism. For example, I have personally witnessed prosecution experts
declare that having earned a **GED** is a “protective factor.” Yet, I know
of no evidence anywhere, let alone “**strong**” evidence, that has ever
demonstrated that earning a **GED** is associated with a reduction in sexual
recidivism. More generally, if actually studied, the extent of education might
turn out to have very little, if any, association with the extent of sexual
recidivism. For example, despite being highly-educated, some priests have
perpetrated some of the worst recorded acts of pedophilia on record.

__
Effect Size – "d," "r," and AUC (ROC Area)__

Every construct or item that has
been demonstrated by scientific and statistical methods to be a “risk factor”
has an associated “**effect size**,” “**magnitude**,” or “**potency**,”
which is the **strength** of association between that “risk factor” and a
particular outcome (such as sexual recidivism) to which it is being associated.

Effect size is commonly computed
with what is known as the “**d statistic**” or “**Cohen’s d**.” Here, “d”
stands for “difference,” and is more completely known as “standardized mean
difference” (Cohen, 1988).

Other ways of measuring effect
size include “**r**,” also known as a “**correlation coefficient**,” and
**AUC (Area Under the Curve, also known as "ROC Area")**. The
mathematical relationship between **d**, **r**, and **AUC** is
presented in a table in the 2005 paper **"Comparing Effect Sizes in Follow-Up
Studies: ROC Area, Cohen's d, and r," **by Marnie E. Rice, Grant
T. Harris, Law and Human Behavior, Vol. 29, No. 5, October, 2005.

With regard to the range of **d**
values normally associated with sexual recidivism studies, the value of
**d** will be slightly more than twice as large as **r (the correlation coefficient)**
calculated from the same data.

Moreover, the **POV (proportion
of variance)** shared between two variables can be calculated by squaring **r**.
In statistical analysis, the squared r value is called the “**coefficient of
determination**.” The **coefficient of determination** is understood by
statisticians as quantifying the strength of association or the magnitude of
variance that is shared between two variables (page 12, Ellis, 2010).
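These conversions are easy to carry out directly. The following Python sketch is my own illustration (not code from the Rice and Harris paper): it converts a d value to r using both the exact two-group formula (which assumes equal group sizes) and the rough "r is about half of d" rule of thumb used in this article, converts d to AUC using AUC = Φ(d/√2) under the equal-variance normal model, and squares r to obtain the POV.

```python
import math

def d_to_r_exact(d):
    """Point-biserial r for Cohen's d, assuming two equal-sized groups:
    r = d / sqrt(d**2 + 4)."""
    return d / math.sqrt(d**2 + 4)

def d_to_r_approx(d):
    """Rule of thumb used in this article: r is roughly half of d."""
    return d / 2

def d_to_auc(d):
    """ROC area for Cohen's d under the equal-variance normal model:
    AUC = Phi(d / sqrt(2)), where Phi is the standard normal CDF,
    computed here as Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1 + math.erf(d / 2))

def pov(r):
    """Proportion of variance shared (coefficient of determination)."""
    return r ** 2

d = 0.67  # mean Static-99 d from Hanson & Morton-Bourgon (2009)
print(round(d_to_r_exact(d), 3))        # exact conversion
print(round(d_to_r_approx(d), 3))       # rule-of-thumb conversion
print(round(d_to_auc(d), 3))            # ROC area
print(round(pov(d_to_r_approx(d)), 2))  # POV of about .11
```

The exact formula yields a slightly smaller r (and hence POV) than the rule of thumb, which only strengthens the point made below about how little variance such d values actually account for.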

__
When “Medium” and “Large” Are Small__

In his Statistical Power Analysis
text, Cohen gives three anchor points for what can be conventionally considered
“**small**,” “**medium**,” and “**large**” d statistics within the behavioral sciences.
These values are .2, .5, and .8 respectively (pages 24-27, Cohen, 1988).

If one also considers ranges of d values to which the labels "small," "medium," and "large" could be reasonably applied, a d value of .35 seems a reasonable threshold between "small" and "medium," and a d value of .65 seems a reasonable threshold between "medium" and "large." Thus, "small" d values would be those less than .35; "medium" d values would be those from .35 to .65; and "large" d values would be anything greater than .65.

However, the labels “small,” “medium,” and “large” are so vague that it is ultimately better to consider the actual statistical and probabilistic meaning of d values in a certain range.

Indeed, in their 2005 paper, **
"Comparing Effect Sizes in Follow-Up Studies: ROC Area, Cohen's d,
and r," **Rice and Harris state the following in their abstract (page 615):

**"...we urge researchers and
practitioners to use numbers rather than verbal labels to characterize effect
sizes."**

and (page 619):

**"The verbal
characterizations proposed by Cohen could be very useful when communicating to
laypersons such as court officials, for example, if they were commonly
understood and adhered to by researchers and practitioners. However, we
suggest the field of risk assessment place little reliance on plain language
verbal labels because of the considerable disagreement about what they mean.
For example, our colleagues (Hilton, Carter, Harris, & Bryans, 2005) found that,
among a small group of forensic clinicians, some risks (i.e., probabilities
between .40 and .55 of violent recidivism in 10 years of opportunity) were
simultaneously characterized as "high," "moderate," and "low" by different
clinicians. Clarity is best reflected by numerical characterization and we
offer the accompanying table to aid communication among researchers and
practitioners."**

For example, let’s examine actual data pertaining to sexual recidivism
which underscores the misleading use of such labels as "**small**," "**medium**,"
and "**large**."

In their large 2009 meta-analysis
(Hanson & Morton-Bourgon, 2009, page 17), which considered the outcome for
20,010 individuals for whom the Static-99 had been administered, the authors
found that the Static-99 had a mean d value of .67 with regard to the outcome of
“**sexual recidivism**.” Thus, r, the **correlation coefficient**, can be
taken to be half of .67, which is .335. Moreover, squaring .335 equals .112225,
which, after rounding off non-significant digits, equals .11, which is the value
of the **coefficient of determination**. Thus, the **POV (proportion of
variance)** is only 11% (a very small proportion indeed).

The median d value for the
Static-99 in this study was .74, which yields an r value of .37, and a squared r
value of .1369, which rounds to .14. Thus, even when considering the slightly
larger median d value, the **POV** is still only 14%.

Thus, the overall variance in sexual recidivism that is “explained” or “taken into account” by or “associated with” the Static-99 is in the range of 11-14%. That is, risk factors that cause the other 86-89% of variance in sexual recidivism are not taken into account by the Static-99.

Recall that these small POV
values are associated with d values that have conventionally been described in
the **“medium**” to **“large” **range, i.e. between .5 and .8. However,
only after converting such d values to POV values does it become clear that it
is misleading to associate the words **“medium**” and **“large”** with d
values in the range of .5 to .8.

Given the small proportion of variance (POV) of sexual recidivism associated with scores on actuarial instruments, such as the Static-99, basing involuntary commitments that will probably last a lifetime solely on such instruments seems scientifically unjustified.

__
Is Ice Cream Consumption in China a Risk Factor for Wildfires in
Texas?__

On page 338 of the excellent article, “**Coming
to Terms With the Terms of Risk**,” from the **Archives of General Psychiatry**,
Helena Kraemer (of Stanford University) and other authors make the following
statement:

**“It is
difficult to find 2 variables that are absolutely independent of each other.
Without considering some notion of potency in the assessment, given a large
enough sample size, virtually every factor could be demonstrated to be a risk
factor for every outcome that follows it. That would trivialize the concept of
a risk factor.” **

Based on this consideration, with a large enough sample size, it would almost certainly be possible to show that ice cream consumption in China was a risk factor for wildfires in Texas, yet such a “risk factor” would be expected to have a very small d value, or magnitude.
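Kraemer's point can be made concrete with a back-of-the-envelope calculation (my own illustration, using the standard t statistic for testing whether a correlation differs from zero): a negligible correlation is nowhere near "statistically significant" in a modest sample, yet it sails past the conventional cutoff once the sample is large enough.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a correlation r differs from zero,
    given sample size n: t = r * sqrt(n - 2) / sqrt(1 - r**2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.02  # a trivially small association (POV of only .04%)

# With 100 subjects, t is tiny and the association is not "significant"...
print(round(t_for_r(r, 100), 2))

# ...but with a million subjects, t far exceeds the usual ~1.96 cutoff,
# so the same negligible association would be declared a "risk factor."
print(round(t_for_r(r, 1_000_000), 2))
```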

Since virtually any construct or
item can be statistically shown to have a non-zero (whether positive or negative)
association with some particular outcome, virtually any construct or item can be
theoretically presented to juries as “risk factors,” even if such constructs or
items have d statistics that are above or below zero by small amounts (**less
than .3**). In the field of assessment for sexual recidivism probability,
“risk factors” with d statistics above or below zero by small amounts (**less
than .3**) have effect sizes that are so small that to call such constructs or
items “risk factors” can be highly deceptive and misleading to juries, as it
implies a strength of association (or even worse, causality) that is far greater
than that calculated by such small d statistics.

Here is an example that will help
to clarify several statistical principles related to risk factors. I call my
example “**White Wood-Frame Houses**.”

__
“White Wood-Frame Houses”__

Let’s say an arsonist was preferentially
seeking out and burning down “**white wood-frame houses**” in a particular
neighborhood. Let’s say that brick houses and wood-frame houses of colors other
than white were only occasionally burned down by this arsonist, and only then by
random selection, rather than being chosen for some specific feature. Such a
state would allow for a statistical calculation that would correctly conclude
that a “White wood-frame house” was a high-magnitude “risk factor” for the
outcome of being burned down.

However, let’s say that it was also the case that all white wood-frame houses in this neighborhood had brass doorknobs, red doors, and chimneys, while brick houses and wood-frame houses of colors other than white had doorknobs made of metals mostly other than brass; doors of colors mostly other than red; and only rarely had chimneys. If so, then a statistical analysis would also correctly conclude that “brass doorknobs,” “red doors,” and “chimneys” were also high-magnitude “risk factors” for the outcome of being burned down.

Three important points can be made from this
example. In particular, 3 errors, two of which are “**errors of causal
attribution**” and the third of which is an “**error of redundant risk
factors,**” have been and continue to be frequently made in the thinking of
prosecutors and prosecution experts in the Sexually Violent Predator Trials that
I have observed. I break down the causal attribution errors into 2 subtypes:
A and B:

**Type A Causal Attribution Error**: The fact that “white wood-frame houses,” “brass
doorknobs,” “red doors,” and “chimneys” are risk factors does not mean that any of
them is a cause of the outcome. In this example, only the arsonist’s preference for
white wood-frame houses played a causal role; the brass doorknobs, red doors, and
chimneys were “risk factors” merely because they happened to co-occur with white
wood-frame construction.

** Type B Causal Attribution Error**:
Now imagine a realtor showing a potential buyer houses in this neighborhood. If
a white wood-frame house was for sale in the neighborhood, a
statistically-unsophisticated realtor might say, “I’d stay away from that white
wood-frame house because it has 4 risk factors for being burned down, namely a
white wood-frame, brass doorknobs, red doors and a chimney.” However, if there
was a brick house for sale with brass doorknobs, red doors, and a chimney, the
realtor might say “This brick house is safer because it only has 3 risk factors
for being burned down, namely brass doorknobs, red doors and a chimney.”
However, by calling the brass doorknob, red doors, and chimney “risk factors,”
the realtor would be erroneously conveying the idea that these 3 items were
associated with an increased probability of the brick house being burned down,
when in fact, the presence of these items was merely (given the preference of
the arsonist) a coincidence that was not associated with an increased
probability of the brick house being burned down at all. That is, only the fact
of a house being a “white wood-frame house” was the selection criterion of the
arsonist. Moreover, the realtor might say that a brick house with aluminum
doorknobs, green doors, and no chimney had no “risk factors” for being burned
down, yet the risk of such a house being burned down would actually be no less
than a brick house with brass doorknobs, red doors and a chimney.
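The realtor’s error can be checked with simple arithmetic. Below is a Python sketch of a hypothetical neighborhood (all numbers invented for illustration) in which only white wood-frame houses are targeted: “chimney” looks like a strong risk factor overall, yet among brick houses it adds no risk at all.

```python
# Hypothetical neighborhood: half the houses are white wood-frame (WWF),
# all of which have chimneys; 10% of brick houses also have chimneys.
# The arsonist targets WWF houses (burn probability .50); brick houses
# burn only occasionally and at random (burn probability .05).
frac_wwf, frac_brick = 0.5, 0.5
frac_brick_chimney = 0.1
p_burn_wwf, p_burn_brick = 0.50, 0.05

# Overall: compare P(burn | chimney) with P(burn | no chimney).
w_chimney = frac_wwf + frac_brick * frac_brick_chimney
p_burn_and_chimney = (frac_wwf * p_burn_wwf
                      + frac_brick * frac_brick_chimney * p_burn_brick)
p_burn_given_chimney = p_burn_and_chimney / w_chimney
p_burn_given_no_chimney = p_burn_brick  # all chimneyless houses are brick

# Within brick houses, the "chimney" factor is pure coincidence:
p_burn_brick_with_chimney = p_burn_brick
p_burn_brick_without_chimney = p_burn_brick

print(round(p_burn_given_chimney, 3))     # chimney "predicts" burning overall
print(round(p_burn_given_no_chimney, 3))
print(p_burn_brick_with_chimney == p_burn_brick_without_chimney)  # redundant
```

The overall comparison would correctly identify “chimney” as a high-magnitude statistical risk factor, yet conditioning on house type shows it conveys no risk information beyond “white wood-frame,” which is exactly the redundancy discussed in the next sections.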

The third error is an __“Error of
Redundant Risk Factors.”__ That is, risk factors can be partially or
completely redundant with one another, so that listing them separately adds little
or no information about risk beyond what the other risk factors already convey.

__
Redundancy of Risk Factors: Psychopathy and
other “Risk Factors”__

Although other variables, such as “Psychopathy” as scored on the PCL-R, have been shown to be “risk factors” with small d values (i.e., less than .3), such “risk factors” do not appear to add any explanatory capacity or to be associated with any additional variance in sexual recidivism over and beyond what is already associated with actuarial instruments, such as the Static-99. For example, even though the SVR-20, the VRAG, and the SORAG are assessments that take into account psychopathy as measured by the PCL-R, each of these 3 assessments has median and mean d values that are no better than those of the Static-99. In fact, the d values as measured for these 3 instruments are actually worse than those of the Static-99. Thus, factoring in psychopathy, as measured by the PCL-R, appears to yield no better prediction than the Static-99 alone.

Thus, even if someone was truly a “psychopath” by PCL-R standards, such a fact does not appear to contribute to or explain any additional risk over and beyond the factors already considered in the Static-99. Thus, after risk factors in the Static-99 have been considered, someone’s score on a PCL-R seems to have limited relevance, if any.

__
Identified “Risk Factors” Are Rarely Additive__

Yet, despite the high potential for substantial redundancy in what are identified as “risk factors,” I have personally witnessed laundry lists of risk factors being presented to juries in Sexually Violent Predator trials in Texas with no consideration whatsoever of the fact that there could be a high degree of redundancy between the individual risk factors. That is, the so-called “risk factors” are saliently listed out so as to suggest that each risk factor listed adds additional risk not already accounted for by the other risk factors.

If the hypothetical realtor in my example above was an expert witness, a jury would likely be erroneously told by the realtor that a white wood-frame house with brass doorknobs, red doors, and a chimney was 4 times as unsafe from arson as a white wood-frame house without brass doorknobs, red doors, and a chimney, yet (based on the selection criterion of the arsonist), neither white wood-frame house would be any more dangerous than the other.

Thus, simply presenting a list of constructs or items as risk factors to a jury can falsely imply an equal magnitude of risk for each of the risk factors; and can falsely imply that the overall risk is an arithmetic sum of the individual risk magnitudes associated with each risk factor.

Such misleading presentations to
juries can be avoided by citing the specific **d statistic** that has been
calculated for each item or construct being presented; and by explicitly
considering the degree (when possible, by mathematical and statistical methods)
to which there is redundancy between risk factors. In general, there will
almost always be some redundancy between risk factors in the social sciences.
This leads me to conclude with __Tennison’s Rule of Risk Factor Redundancy__:


**
“The Whole is Less Than the Sum of The Parts”**

__
References__

**1. “Coming to Terms With the Terms of Risk,”** by
Kraemer HC, Kazdin AE, Offord DR, Kessler RC, Jensen PS, Kupfer DJ.; Department
of Psychiatry and Behavioral Sciences, Stanford University, California, USA;
Archives of General Psychiatry, April, 1997, 54(4):337-43.

**2. “The Essential Guide to Effect Sizes: Statistical
Power, Meta-Analysis, and the Interpretation of Research Results,”** by Paul
D. Ellis, Cambridge University Press, 2010.

**3. “Statistical Power Analysis for the Behavioral
Sciences, 2nd Edition,”** by Jacob Cohen, 1988 (reprinted in
2009), The Psychology Press.

**4. “The Accuracy of Recidivism Risk Assessments for
Sexual Offenders: A Meta-Analysis of 118 Prediction Studies,”** by R. Karl
Hanson & Kelly E. Morton-Bourgon, Psychological Assessment, 2009, Vol. 21, No.
1, 1-21.

**5. “Assessing Risk for Sexual Recidivism: Some
Proposals on the Nature of Psychologically Meaningful Risk Factors,” **by Ruth
E. Mann, R. Karl Hanson, and David Thornton,** **Sexual Abuse: A Journal of
Research and Treatment**, **22(2) 191–217, 2010.

**6. "Comparing Effect Sizes in Follow-Up Studies:
ROC Area, Cohen's d, and r," **by Marnie E. Rice, Grant T. Harris, Law
and Human Behavior, Vol. 29, No. 5, October, 615-620, 2005.