5.4.2 The validity of interview data

As we have already seen, interviews are a very flexible method of assessment. An interview could be assessing knowledge, skills, abilities, personality, motivation, and so on, and each of these constructs can have a different relationship with job performance. This means that any examination of the validity of interviews needs to take these differences into account, which is why meta-analyses have been an important source of information about the factors that influence the validity of interviews.

Large meta-analyses tend to reveal modest validity coefficients for selection interviews. Reilly and Chao (1982) found an average coefficient of .19 across a variety of criteria, and Wiesner and Cronshaw (1988) found a slightly higher coefficient of .26 with supervisor ratings of performance. Note that many of these average coefficients pool a range of different types of interview.

When one delves deeper into the impact of the design and execution of interviews on validity coefficients, a number of important findings emerge. In one of the most substantial meta-analyses of interviews, McDaniel, Whetzel, Schmidt and Maurer (1994) reviewed 245 different validity coefficients from studies of interviews. They found that validity was highest when:

  • The interview used situational and job-related questions (interview content)
  • The interview was highly structured and carried out by one person (interview execution). Salgado's (1999) review reports that highly structured interviews have an average validity coefficient of around .5, whereas those with little structure have coefficients of around .2
  • Job performance measures (rather than tenure) were the criteria for the validation study (validation criteria)

These findings are logical. If the interview is specifically designed to examine job-related competencies in an organised and methodical way, then it has a better chance of predicting future performance than if it is conducted in a haphazard fashion.
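One way to appreciate the gap between structured and unstructured interviews is to square the validity coefficients: r² estimates the proportion of job-performance variance a predictor accounts for. A minimal sketch (the .5 and .2 figures come from Salgado's review cited above; the r² calculation itself is a standard statistical result, not something from these sources):

```python
# r squared estimates the share of criterion (job-performance) variance
# accounted for by a predictor with validity coefficient r.

def variance_explained(r: float) -> float:
    """Proportion of criterion variance explained by a predictor."""
    return r ** 2

structured = variance_explained(0.5)    # highly structured interviews
unstructured = variance_explained(0.2)  # interviews with little structure

print(f"Structured: {structured:.0%}")      # Structured: 25%
print(f"Unstructured: {unstructured:.0%}")  # Unstructured: 4%
```

On these figures, a highly structured interview accounts for roughly six times as much performance variance as an unstructured one, which is why the design recommendations above matter so much in practice.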

Campion, Palmer and Campion (1997) provided a more detailed analysis of the determinants of the predictive validity of interviews and concluded that predictive validity was improved by certain design characteristics.

A quick task: Think back to your own experiences of being a candidate in selection interviews. How many of those interviews followed the recommendations that have emerged from the research you have read so far in this section of the unit? What were the main similarities and differences?

Campion et al. (1997) also found that the way data were collected and evaluated had a significant impact on the validity of the interview. Higher validities tended to be obtained when:

  • Each answer given by the candidate was rated separately and on multiple rating scales (e.g. different rating scales for interpersonal skills, drive and determination, technical knowledge etc. rather than a single overall rating for the answer to the question)
  • Interviewers took detailed notes of candidate performance and used rating scales that had clearly defined rating points (i.e. anchors such as "displays many of the positive behavioural indicators of this competency" or "displays mostly negative behavioural indicators of the competency")
  • Interviewers used information about the links between interview performance and job performance in their decision-making
  • Overall evaluations of candidates were determined by summing the scores obtained in the interview rather than allowing interviewers to determine the overall rating using their own individual rationale
  • Interviewers were provided with extensive training in all aspects of the interviewing process
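Several of the points above (separate ratings for each answer, multiple anchored scales, summing rather than holistic judgment) amount to mechanical score combination. A hypothetical sketch of how such a scoring sheet might be tallied; the competency names and ratings here are invented for illustration, not taken from Campion et al.:

```python
# Hypothetical anchored scale: 1 = "displays mostly negative behavioural
# indicators" ... 5 = "displays many positive behavioural indicators".

# Each answer is rated separately on several competency scales, rather
# than receiving a single overall rating (competency names illustrative).
ratings = {
    "question_1": {"interpersonal_skills": 4, "technical_knowledge": 3},
    "question_2": {"interpersonal_skills": 5, "drive": 4},
    "question_3": {"technical_knowledge": 2, "drive": 3},
}

# The overall evaluation is the sum of all individual ratings, not an
# interviewer's holistic impression of the candidate.
overall = sum(score
              for answer in ratings.values()
              for score in answer.values())

print(overall)  # 21
```

The point of summing mechanically is that it removes the opportunity for an interviewer's private rationale (or bias) to override the evidence recorded against each competency.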

As you can see, these measures are designed to restrict the impact of human biases and decision-making heuristics on the outcome of the interview. Many are measures designed to improve the reliability of ratings and, as we know, reliability is a necessary condition for validity.
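The link between reliability and validity can be made concrete with the standard psychometric attenuation result (a textbook formula, not one drawn from the readings cited here): an observed validity coefficient cannot exceed the square root of the product of the predictor's and the criterion's reliabilities.

```python
import math

def max_observed_validity(rating_reliability: float,
                          criterion_reliability: float) -> float:
    """Upper bound on observed validity given the reliability of the
    predictor (interview ratings, r_xx) and of the criterion
    (e.g. supervisor ratings, r_yy): sqrt(r_xx * r_yy)."""
    return math.sqrt(rating_reliability * criterion_reliability)

# Illustrative figures (assumed, not from the sources above): if poorly
# standardised interview ratings have reliability .4 and the criterion
# measure .6, validity is capped at about .49 regardless of how
# job-related the interview questions are.
print(round(max_observed_validity(0.4, 0.6), 2))  # 0.49
```

This is why measures that merely improve rating reliability (anchored scales, note-taking, interviewer training) can raise validity even though they change nothing about what the interview asks.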

The issue of incremental validity is perhaps less important for selection interviews than it is for other methods of assessment. Given that an interview forms the bulk of many selection processes, the fact that it captures data on individual differences that are also captured by other selection methods is perhaps desirable. For example, if the interview captures good data on personality then might there be an argument that the omission of a personality questionnaire from the selection process is less of a problem?

Research on the incremental validity of interviews yields fairly unsurprising results. In terms of predicting job performance, there is a lack of incremental validity over intelligence tests (Mayfield, 1964; Schmidt & Hunter, 1998). It appears that interview performance is significantly related to, but not the same as, intelligence. In Reading 1.1 (p.456) Robertson and Smith (2001) present a usefully succinct discussion of the construct validity of interviews. Performance in unstructured interviews tends to rely more upon social skills and personality, while cognitive ability has more of a role to play in determining performance in highly structured interviews. There is also evidence that interview performance is related to performance on more elaborate and complex selection methods such as assessment centres (Dayan, Fox & Kasten, 2008). Typical selection interviews therefore tend to have broad construct validity: this is perhaps reassuring, as they tend to dominate many selection procedures.