Testing the significance of corrected validities

Put the following in that file folder labeled “Technical Stuff I Probably Should Have Known Coming Out of Grad School” along with “how to change that bulb thingie in the back of the toilet tank.”
Some colleagues and I were recently debating whether or not to conduct significance tests on corrected validity coefficients. For example, say we’re doing a validation study and have a huge number of test-takers and a less-than-huge number of employees with job performance data. Once you match up the two, the variation in test scores among the employees who made it into the matched sample is truncated, maybe because only the brainiacs with higher test scores got hired. Because of the way God made math, that restriction in range is going to hold down your validity coefficient like a pair of cement shoes.
So what to do? Whip out your Bag of Magic Pixie Dust and correct for restriction of range, of course. If the magic dust doesn’t work, apply this formula:

Restriction of Range Formula

Where ru is the unrestricted validity coefficient, r is the observed validity, S1 is the unrestricted standard deviation, and s1 is the observed standard deviation.
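For the code-inclined, here is a minimal sketch of that correction. It assumes the formula in question is the standard Thorndike Case II correction for direct range restriction on the predictor, ru = r(S1/s1) / sqrt(1 - r^2 + r^2(S1/s1)^2), which matches the symbols defined above. The function name and the example numbers are made up for illustration.

```python
import math


def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted validity coefficient (ru).

    Assumes the Thorndike Case II formula for direct range restriction:
        ru = r * k / sqrt(1 - r^2 + r^2 * k^2), where k = S1 / s1
    r               -- observed (restricted) validity coefficient
    sd_unrestricted -- S1, predictor SD among all test-takers
    sd_restricted   -- s1, predictor SD among those with criterion data
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    k = sd_unrestricted / sd_restricted
    return (r * k) / math.sqrt(1 - r**2 + (r**2) * (k**2))


# Hypothetical numbers: observed r of .25, applicant SD of 10, employee SD of 6.
observed_r = 0.25
corrected_r = correct_for_range_restriction(observed_r, 10.0, 6.0)
print(f"observed r = {observed_r:.3f}, corrected r = {corrected_r:.3f}")  # corrected r is about 0.395
```

When the two standard deviations are equal, the function just hands back the observed r, which is a handy sanity check.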
So all this I already knew, minus having to look up the actual formula. The question that my chums and I were debating, though, is what to do with that shiny new corrected validity coefficient. My first inclination is to answer the question “Is it significantly different from 0?” In other words, do a significance test on it. Seems natural enough.
One of the other folks I was talking to, though, said he had vague recollections that such tests weren’t appropriate on corrected coefficients and that making claims that a corrected validity coefficient is “significant” was naughty at best and nonsensical at worst. So I did some digging, and sure enough, he was right.
To quote SIOP’s Principles for the Validation and Use of Personnel Selection Procedures:

When range restriction causes underestimation of the validity coefficient, a suitable bivariate or multivariate adjustment should be made when the necessary information is available.

…When adjustments are made, both the unadjusted and adjusted validity coefficients should be reported. Researchers should be aware that the usual tests of statistical significance do not apply to adjusted coefficients such as those adjusted for restriction of range and/or criterion unreliability.

That last sentence is key. The Principles pretty much carry enough weight to stop the debate right there, but I decided to get a second opinion from the American Psychological Association’s Standards for Educational and Psychological Testing. Sure enough in Standard 14.5:

Statistical significance tests for uncorrected correlations should not be used with corrected correlations.

Both works provide references for explanations of why this is the case, but suffice it to say I don’t want to get into the uber-technicalities now. In the meantime, maybe this is something new you didn’t know. And knowing is half the battle! (The other half is actually doing stuff.)

Teach the children to cheat

I normally don’t write about educational testing because it’s not my purview (other than being on the receiving end for so long). But this article on WSJ.com had such a great tagline I had to read it: “Text-messaging answers. Googling during exams. In the Internet age, some schools have a new approach to cheating: Make it legal.”
The basic idea is that some schools are deciding that rote memorization isn’t as important as learning how to solve problems or learning how to find answers to problems using the tools that will be available in the “real world.” Don’t know the definition of the word “omphaloskepsis”? If you came across that word in a quarterly report at work, you’d look it up on dictionary.com, so why not test students’ ability to do that if we’re interested in making sure they’re prepared for reality? To quote the article:

The move, which includes some of the country’s top institutions, reflects a broader debate about what skills are necessary in today’s world — and how schools should teach them. The real-world strengths of intelligent surfing and analysis, some educators argue, are now just as important as rote memorization.

The old rules still reign in most places, but an increasing number of schools are adjusting them. This includes not only letting kids use the Internet during tests, but in the most extreme cases, allowing them to text message notes or beam each other definitions on vocabulary drills. Schools say they in no way consider this cheating because they’re explicitly changing the rules to allow it.

It’s an interesting concept and I have no beef with teaching concepts like how to use a calculator, search engine, dictionary, or that nerd who sits behind you in Algebra (who, by the way, is going to be your boss in a few years so be nice to him). And I generally don’t give a flip about memorizing dates, capitals, or names that have no relevance to anything important. But I think this kind of thing has to be alloyed with the good old-fashioned “you know it or you fail” approach. There’s value to being able to quickly calculate, in your head, what 40% off of $35 is. Or to write an e-mail without giving your spell-checker a nervous breakdown or making the recipient scratch her head raw from trying to figure out what the heck your disorganized jumble of internet idioms means.
So yeah, teach (and evaluate) students’ ability to use real-world tools and resources, but don’t forget the other stuff, either.

Things I wish I knew about surveys


I was talking to a co-worker this morning about ways to optimize responses to surveys. I use job analysis surveys from time to time, and I’m always fretting over details in the hopes that they’ll help me get more responses and cleaner data. And it’s easy to come up with a long list of possible factors that affect all this, but by and large I’m not aware of any research to back them up.
For example, I want to know how much the following affect response rates and data integrity:

  • Sending out reminders
  • Who the reminders come from (e.g., me vs. a Vice President)
  • The timing of the reminders: day of the week, how far apart, etc.
  • 5-point vs. 7-point vs. 9-point response scale for Likert-type items
  • Whether or not to use a response scale with a “neutral” option vs. forcing people to one side of the fence
  • Font sizes, color, and general design rules
  • Personalizing the survey with the respondent’s name or other info
  • Offering incentives for completing the survey
  • Print vs. e-mail vs. Internet-based surveys
  • Survey length
  • Complexity/length of directions (is shorter better or worse?)
  • Randomization of items vs. grouping them together by scale
  • Annual vs. monthly surveys for recurring metrics?
  • “Survey fatigue” (that is, getting too many surveys too close together)
  • Putting “WON’T SOMEBODY THINK OF THE CHILDREN?” in bold across every page?

Now, I’ve seen anecdotal evidence for some of these questions and I think I’ve seen research-based discussions for others (in fact, a co-worker and I addressed the “annual vs. monthly” question in a SIOP presentation a couple years ago) but what I’d really like to see is all this information scientifically studied and put together in one place for easy reference. While it may never find its way to the “Best Sellers” rack at your local Barnes and Noble bookstore, a book like this entitled “Survey Hacks: How Little Changes Improve Your Surveys” would be quite useful.
If anyone knows of such a tome, let me know. The world needs to know.

January 2006 issue of TIP


The January 2006 issue of The Industrial-Organizational Psychologist (TIP) is out and available online. One article in particular caught my eye. It’s entitled “The Association of Test Publishers and the State of the Testing Industry: Perceptions From the Front Lines”, and here’s the abstract:

In his 2005 SIOP Presidential Address, Fritz Drasgow indicated that SIOP members should be concerned about the proliferation of Internet tests that have little or no validity and reliability evidence, and he characterized this situation as the “wild, wild West” of testing (Drasgow, 2005). The Association of Test Publishers (ATP) shares all professional concerns regarding the psychometric properties of assessments and the criticality of these properties to sound psychological measurement. Throughout our history, ATP has worked diligently to help promote professionalism in the development, marketing, and use of testing. Through this article we hope to provide greater insight into ATP for those SIOP members who are not familiar with the organization, as well as to comment on the actual impact of the many poorly developed assessments that are available via the Internet and other sources.

A good read, though in places it kind of seems like self-promotion for the Association of Test Publishers.
There are also some articles on “Cross-Cultural skills” and reverse discrimination.

HR practices in video games? Yep.

A different kind of post today. When not doing I/O type stuff, I’ve been playing a computer game called “World of Warcraft.” For those of you who haven’t heard of it, it’s an online, fantasy-themed game where you create a character and play with (or against) thousands of other, real people from all over the world. This human element adds all kinds of new twists to things, one of which is the organization of, well, organizations in the virtual world.

These assemblies of players, called “guilds,” come together for a variety of reasons. Many of them are just social groups comprised of people who know each other outside of the game or who have become friends through it. Others, as I’ve recently found out, are way more like businesses. They have officers, jobs/roles, rules, policies, budgets, mission statements, performance appraisals, and selection processes for new members. Some of them even have formal work (or in this case, play) hours where you’re expected to show up on time and put your virtual nose to the virtual grindstone!

My friend, who is in one of these guilds, was telling me about them today and all this made me think how much their operations sometimes resemble real organizations. When my friend applied for membership in the guild, they took his application and reviewed his qualifications and work/play history. They then brought him along for an employment test of sorts: a foray into a particularly dangerous part of the game world that demands skillful performance and cooperation with other team members in order to succeed. During this test, the guild’s officers evaluated my friend’s performance with a number of tools that gave hard data on his and others’ performance.

These tools assessed things like how much damage team members did to enemies, how much they endangered their teammates, and how well they used their special talents. It was, in effect, the data-driven decision making of Total Quality Management adapted for use in a video game. Certain players were expected to fulfill certain roles or jobs (attacking, healing, enhancing, controlling the actions of enemies, etc.), and these statistics made it easy to see who was doing his job and who wasn’t. If someone consistently failed, there were escalating levels of reprimand. Depending on the nature of the infraction, there could be warnings, performance improvement plans, training, demotions, or even expulsion from the group. These guilds were handling things more efficiently than many real life businesses I’ve seen!

There are differences, I know, so I’ll try not to overstate things. Consequences in real life are more dear, though you may have difficulty convincing the more fanatical players of that. And there are completely different mores in games and in business. You wouldn’t, for example, tolerate an office full of people screaming vulgarities when your Hunter adds two elite MOBs while trying to kite an instance boss. …So to speak.

Anyway, I don’t have much of a point beyond the observation that organizations and various Human (or Elf or Orc) Resources functions almost seem endemic to human nature when the circumstances are right. Similar problems in real life and in games lead to similar solutions, even if one results in increased stock price while another results in a dead dragon.

New EEOC Commissioner sworn in

Looks like the EEOC is all staffed up with today’s swearing in of Christine Griffin.

Christine M. Griffin was sworn in today as a Commissioner of the U.S. Equal Employment Opportunity Commission (EEOC), bringing the bipartisan panel to its full five-member complement for the first time in more than a year. Ms. Griffin was nominated by President George W. Bush on July 28, 2005, and unanimously confirmed by the U.S. Senate on Nov. 4 to serve the remainder of a five-year term expiring July 1, 2009.

“Christine Griffin brings to the Commission a wealth of talent and experience in employment law and disability issues that will serve the American public well,” said EEOC Chair Cari M. Dominguez, noting that Ms. Griffin worked at the agency in the mid-1990s as a senior staff attorney. “On behalf of my fellow Commissioners, I welcome Commissioner Griffin back and look forward to working closely with her.”

My reaction to this article and the accompanying picture, of course, is “What? There are only five people on the Equal Employment Opportunity Commission? And they inhabit that office? Doesn’t that seem a little …underwhelming?” I guess I just had images of the EEOC as this giant, sprawling complex full of bustling people. Some of them wore capes and watched thirty-foot high stacks of monitors for workplace injustices. When they saw one, they’d leap into action and blast off in supersonic jets while a bombastic narrator intones “Meanwhile…” Or something.
Instead we see a pleasant, middle-aged woman who looks like half the HR Managers I’ve ever met. And a guy who apparently fell asleep while holding the Bible for the ceremony. But that’s fine, too. I guess.