Internationally recognised standardised tests: Are they accurate measures of scholastic ability?

Standardised tests, such as the SAT and GRE, are often criticised for failing to measure what they are supposed to measure. Critics question both the accuracy of the evaluations themselves and whether the tests achieve their purpose (namely, predicting academic performance in college or graduate school).

The issues are broad-ranging:

  • Should tests such as the SAT and GRE be used as admission criteria by institutions of higher learning?
  • Are they skewed towards certain segments of the student population?
  • Are they accurate measures of scholastic ability?
  • Are they useful for the institutions themselves? That is, do they accurately predict future academic performance?
  • Why do educators place a heavy emphasis on performance on such tests? Do alternative metrics exist?

Of these points of debate, let us focus on the tests themselves and ask: Can the SAT (Scholastic Aptitude Test), GRE (Graduate Record Examination), and CAT (Common Admission Test) accurately measure a student’s overall scholastic ability? Here are some points and counterpoints.

1. Standardised aptitude tests attempt to measure a limited set of capabilities: logic, reasoning, language, reading comprehension, and basic mathematics (LRLCM). The GRE was modified in 2002 to include essays, but these fall within the bracket of reasoning and language. Many will argue that this set is insufficient. For example, some MBA programs accept GRE scores, yet the GRE does not measure many of the skills required of management graduates, such as leadership and effective spoken communication.

Counterpoint: Advocates of standardised tests claim that the measured LRLCM skills are good predictors of success in graduate school, and that LRLCM skills are valuable in themselves. The predictive claim has been disputed, however, as in (1): a review in the psychology department of Yale University found that test scores could explain only 3% of the variation in grades over the first two years of graduate school.
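To put the 3% figure in perspective, here is a minimal sketch. It assumes that "explain only 3%" refers to the share of variance in grades accounted for by test scores (an R-squared of 0.03) and that a single predictor is involved; the source does not state either, so treat the numbers as illustrative only.

```python
import math

# Assumption: "3%" is read as variance explained (R^2 = 0.03), single predictor.
variance_explained = 0.03

# With one predictor, R^2 is the square of the correlation coefficient,
# so the size of the correlation is the square root of R^2.
correlation_size = math.sqrt(variance_explained)

# Share of the variation in grades left unaccounted for by test scores.
unexplained_share = 1 - variance_explained

print(f"Size of correlation: about {correlation_size:.2f}")
print(f"Unexplained share of variation: {unexplained_share:.0%}")
```

Under that reading, the correlation between test scores and grades would be weak, and nearly all of the variation in grades would be left to other factors.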

2. Most often, institutions look at prospective students’ combined scores across all sections of the test (GRE, SAT, or CAT, as the case may be). A student who performed poorly on the language section and exceptionally well on the mathematics section would have a similar combined score to one whose performance was the reverse. This, it has been claimed, reduces the overall correlation with future academic performance, because different programs require different skills of the student.

Students’ aptitudes on different sections of the test can vary widely, and a weak section sometimes should, and sometimes should not, be penalised. A student aspiring to enter a foreign language program should not be penalised for a below-average score on the analytical-ability section. Conversely, for a mathematics program, a good combined score that conceals a sub-par mathematics score should be penalised. The predictive ability of a combined score is thus enhanced in some cases and lowered in others; overall, therefore, relying on a combined score is undesirable.
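The arithmetic behind this point can be sketched in a few lines. The applicants, the section scores, and the cutoff of 600 below are all hypothetical, chosen only to show how two opposite profiles collapse into one combined score; they are not taken from any real test or admissions policy.

```python
# Hypothetical section scores for two invented applicants; not real data.
applicants = {
    "strong_verbal": {"verbal": 720, "math": 480},
    "strong_math": {"verbal": 480, "math": 720},
}


def combined_score(sections):
    """Sum all section scores, as a combined-score rule would."""
    return sum(sections.values())


def meets_section_cutoff(sections, section, cutoff):
    """Check a single program-relevant section against a hypothetical cutoff."""
    return sections[section] >= cutoff


# Both profiles produce the same combined score.
combined = {name: combined_score(s) for name, s in applicants.items()}

# A mathematics program looking only at the math section tells them apart.
math_ready = {
    name: meets_section_cutoff(s, "math", 600) for name, s in applicants.items()
}

print(combined)
print(math_ready)
```

Here the combined score cannot separate the two applicants, while a check on the section most relevant to the program can. Whether such section-level checks actually predict performance better is exactly what the counterpoint below calls into question.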

Counterpoint: There are claims that a well-rounded score is a better predictor of future performance. There is not, however, enough documented evidence about the use or benefit of individual versus combined scores.

3. A classic argument is that standardised competitive tests reward practice and coaching. An ideal test would not. It should not make a difference whether a student has practised on test material for two months or meets the test questions unprepared.

Counterpoint: Academic programs themselves reward test practice, revision of material before examinations, and so on. That a standardised test rewards practice is therefore consistent with the nature of academic programs.

4. Tests are skewed towards “good test takers”, that is, those who are more attuned to time and pacing, who work well under pressure, and so on. It has been said that such individuals are able to treat the test like a game or puzzle, and that they plan a strategy in accordance with “the rules of the game.” These soft skills do not necessarily correlate with the skills that the test is meant to measure.

Counterpoint: As with the reward for practice and coaching, the ability to work under pressure and to deadlines, and to estimate workloads correctly, matches the demands of academic work.

5. In the case of internationally recognised examinations (such as the GRE), language becomes a source of bias against those whose native language differs from that of the test. Such students have the additional task of becoming familiar with the test language if they were not exposed to it early in their lives.

Counterpoint: Some might argue that proficiency in the test language (most often English) is essential for success in a study program offered in English. However, tests like the SAT and GRE heavily reward above-average proficiency in English, which necessarily creates a bias against those for whom it is a second language. According to (2), at one university, bilingual Hispanic students who scored low on the GRE performed exceptionally well on a comparable test administered in Spanish.

In summary, performance on such tests does not necessarily correlate with academic excellence, past or future. The points above remain open to debate, and a firm conclusion about the value, or otherwise, of standardised tests remains elusive.

References:

1. Sternberg, R. & Williams, W. (1997). Does the Graduate Record Examination Predict Meaningful Success in the Graduate Training of Psychologists?

2. Bornheimer, D.G. (1984). Predicting Success in Graduate School Using GRE and PAEG Aptitude Test Scores. College and University, v. 60 (no. 1) pp. 54-62.