The many ways to calculate adverse impact

Yesterday I attended a pretty good workshop put on by the Personnel Testing Council of Southern California in which Dennis Doverspike talked about assessing adverse impact, which is when a test or other hiring system discriminates against one group more than another. (He also spoke on hiring based on a public service work ethic, which I’ll probably write about next week.)
Adverse impact analyses had always been pretty straightforward to me. I was certainly aware that other methods existed, but I had always used the “Four-Fifths or 80% Rule” to determine the presence of a hiring system’s adverse impact against minorities or women. Quoth the Uniform Guidelines on Employee Selection Procedures:

A selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by the Federal enforcement agencies as evidence of adverse impact.

So here’s an example:

In this example 64 males took a test and 13 passed while 20 women took the test and 3 passed. So the passing rates were about 20% for males and 15% for females. Is the 5-point difference enough to signal adverse impact?
The answer is yes: 15 / 20 = 75% or three quarters. The Four-Fifths rule says that if the lower passing rate is less than 80% (i.e., four-fifths) of the higher one, then you’ve got evidence of adverse impact. Pretty cut and dried, right?
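For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Four-Fifths calculation. The function names and the `threshold` argument are illustrative choices, not anything from the Uniform Guidelines, and the counts (13 of 64 males, 3 of 20 females) are the ones from the example table as cited in the comments below. The unrounded ratio comes out slightly under the 75% you get from the rounded rates, but the conclusion is the same.

```python
def selection_rate(passed, applicants):
    """Fraction of applicants in a group who passed."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return passed / applicants


def impact_ratio(focal_rate, reference_rate):
    """Focal group's selection rate divided by the highest group's rate."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return focal_rate / reference_rate


def fails_four_fifths(focal_rate, reference_rate, threshold=0.8):
    """True when the ratio falls below four-fifths (a red flag)."""
    return impact_ratio(focal_rate, reference_rate) < threshold


# Counts assumed from the example table: 13 of 64 males, 3 of 20 females.
male_rate = selection_rate(13, 64)
female_rate = selection_rate(3, 20)
ratio = impact_ratio(female_rate, male_rate)
flagged = fails_four_fifths(female_rate, male_rate)
print(f"ratio = {ratio:.3f}, flagged = {flagged}")
```

Note that a focal rate of zero also gets flagged here, since zero is less than four-fifths of any positive rate.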
Well, as the PTC-SC workshop pointed out, no. There’s also language in the Uniform Guidelines that allows for more rigorous statistical tests like Chi Square or Fisher’s Exact Test, and there’s a history of court cases that use other quasi-statistical rules of thumb, like saying that the pass rate for the protected group must be within 1.97 standard deviations of the dominant group’s passing rate. And the thing is that depending on the distribution of your data, one method may yield a red flag while another may not. There are also different assumptions about what the population of interest is: is it all the people who applied for the job, or is it all the people in your labor market who could have applied? And don’t even get me started about setting different levels of alpha (i.e., accepting a 5% or 10% or 1% chance of saying there’s a difference between the groups when there’s not). Seriously, don’t. We’ll be here all day.
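To make the "one method flags it, another doesn't" point concrete, here is a rough Python sketch of two of those alternatives, again using the example table's counts as cited in the comments (13 of 64 males, 3 of 20 females). The first function computes the lower-tail hypergeometric probability, which is what a one-sided Fisher's Exact Test reports. The second expresses the difference in pass rates in pooled standard errors; treating that as the "standard deviations" rule is an assumption on my part, since different formulas have been used in practice. Function and parameter names are hypothetical.

```python
from math import comb, sqrt


def hypergeom_lower_tail(passed_focal, n_focal, n_total, passed_total):
    """P(X <= passed_focal) if passed_total passers were drawn at random
    from n_total applicants, n_focal of whom belong to the focal group."""
    denom = comb(n_total, passed_total)
    prob = 0.0
    for k in range(passed_focal + 1):
        ways = comb(n_focal, k) * comb(n_total - n_focal, passed_total - k)
        prob += ways / denom
    return prob


def pooled_z(passed_ref, n_ref, passed_focal, n_focal):
    """Difference in pass rates, in pooled standard errors
    (an assumed reading of the 'standard deviations' rule)."""
    p_ref = passed_ref / n_ref
    p_focal = passed_focal / n_focal
    p_pool = (passed_ref + passed_focal) / (n_ref + n_focal)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_focal))
    return (p_ref - p_focal) / se


# 84 applicants in total (64 males + 20 females), 16 passers (13 + 3).
p_tail = hypergeom_lower_tail(passed_focal=3, n_focal=20,
                              n_total=84, passed_total=16)
z = pooled_z(passed_ref=13, n_ref=64, passed_focal=3, n_focal=20)
print(f"P(3 or fewer females pass by chance) = {p_tail:.3f}")
print(f"difference = {z:.2f} pooled standard errors")
```

With these counts the tail probability is large and the difference is well under two standard errors, so neither of these checks raises a flag even though the Four-Fifths ratio does.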
Dr. Doverspike’s presentation provided a long list of helpful formulas and procedures, but one thread ran through them all: there’s more than one way to skin a cat and then not hire it based on discriminatory hiring practices against skinless cats. In other words, the Four-Fifths rule isn’t the final word, and whether your hiring procedure has adverse impact may depend as much on your data as on your lawyer.
In the end, though, it’s almost all a moot point. My own rule of thumb would be this: Unless you’re actively trying to increase the diversity of your workforce, assume you have adverse impact and move on to looking at validity and utility. If you use your favorite method and find out that you don’t have adverse impact, assume that some other lawyer or expert witness could come along and uncover some just by slicing your data differently or making a couple of assumptions differently. If you want to maximize the usefulness of your test, you should be more worried about whether or not it’s valid and what kind of utility you’re getting out of it.


8 Comments on “The many ways to calculate adverse impact”

  1. David says:

    Thought you might find the following link interesting and useful: http://www.hr-software.net/EmploymentStatistics/DisparateImpact.htm
    It is a disparate impact calculator written in Java that calculates most of the tests we learned in the workshop. Sandy came across it and I’ve been using it for some additional analyses I’ve been doing for another company.
    I’ve been trying to replicate the programs in SAS this weekend. There are a couple I really like, and I am trying to fully understand the statistics behind them.

  2. Stephen says:

    The adverse impact arithmetic based on Table 1 seems a bit off. The pass rate for males is .25 (16/64) not .20, and the pass rate for females is .18 (3/17) not .15. The adverse impact ratio is therefore .18/.25 or .72 (not .75 as reported).
    Under the four-fifths rule-of-thumb, there is a possible indication of adverse impact. However, the exact probability of having selected 3 or fewer females by chance alone under this scenario is .389. I think most analysts would agree that this result could easily have occurred by chance and does not indicate adverse impact. The example serves as a reminder that the crude four-fifths rule-of-thumb quite often produces results different from more refined tests.

  3. Bryan says:

    Thanks for the link, David! Very interesting.

  4. Jamie says:

    Stephen: Uh, yes! Congratulations! You win the prize for spotting the error. As I planned. Now that the contest is over I’ll change it so that it’s right. 🙂
    David: Cool, thanks for the link. That’s pretty neat.

  5. David says:

    I wouldn’t go so far as calling the 4/5ths rule “crude”. Over the past couple of weeks, I’ve run about 60 or so impact analyses. I’d run the 4/5ths calculation at the highest level. If there was evidence of impact, I’d dig down deeper until I found out where the biggest differences were. In most cases, the 4/5ths result coincided with other results. What I like about the 4/5ths calculation is that I can tell managers what percentage or how many people they need to come down (I’m looking at terminations, not hires). It is far easier to explain percentages to them than observed and expected values or 95% confidence intervals. Although with a little coaching, I think looking at probability distributions is the way to go.

  6. Stephen says:

    Using your revised Table 1 data, the Four-Fifths rule of thumb still indicates adverse impact against females. However, the probability of blindly selecting 3 or fewer females (where no regard is given to gender) under this scenario is now .434. The outcome does not seem to indicate bias against females because the results could have easily occurred by chance (about 43% of the time, based on a Monte Carlo approach using one million random trials). This makes sense because the most likely outcome would have been to select 4 females (16 of the applicants passed out of 84, and 16/84 times 20 females equals about 3.8, or roughly 4 females). What the probability approach is saying is that the gender-neutral process would produce 4 females and 3 actually made it, a number very close to the gender-neutral result of 4 females.
    Be aware that the on-line HR-Guide Disparate Impact Analysis calculator seems to have a glitch. Assume in the above example that ZERO females were selected. The on-line calculator results for the Four-Fifths Rule were reported thusly: ‘Adverse impact as defined by the 4/5ths rule was NOT found in the above data.’ That cannot be correct because the selection rate for females (0/20 or 0.0%) divided by the selection rate for males (13/64 or 20%) gives 0.0/.20 or zero. Females are selected at a rate less than 80% (zero is less than 80%) of males so the correct conclusion should be that Adverse Impact IS found in the data. I wrote to the HR-Guide folks some time ago but they never responded.

  7. Jeff says:

    I think part of the problem described above is that it is difficult to interpret statistics based on small numbers. You are describing a situation in which 0 (none; nada) females were hired and are expecting to measure the level of adverse impact against females using a ratio. If no (zero; nada) females were hired, then why bother with a statistic? You can simply declare that there was adverse impact (if you’re just using the 80% rule).

  8. Madison says:

    Both of the calculations are wrong. The chart example states that 13 males passed out of 64 applicants (13/64 = .20) and 3 females passed out of 20 applicants (3/20 = .15), which works out to .15/.20 = .75 * 100 = 75%.
    In each example you’re using different data. The chart differs from the verbiage example, which reads “In this example 64 males took a test and 16 passed while 17 women took the test and 3 passed. So the passing rates were 20% for males and 15% for females. Is the 5% difference enough to signal adverse impact.”