Research
The Army Alpha and Beta tests opened the floodgates to a host of other group tests of mental ability. Today, most U.S. students are assessed with at least one of these tests at some point while in school.
The SAT is just one familiar example. This sort of test is typically given to a whole class of students or group of individuals at once. The test often consists of numerous multiple-choice questions, which are answered on a special answer sheet that can be scored by a machine.
One of the big advantages to such tests is that they are standardized. This means that the tests themselves have undergone extensive testing before ever being used in an actual classroom or other real-life situation. During the development phase, a test is given to a representative sample of individuals under clearly spelled-out conditions, and the results are scored and interpreted according to set criteria. The goal is to establish a standardized method of giving, scoring, and interpreting the test in the future. This helps ensure that as much as possible of the variance in scores will be caused by true differences in ability, and not by differences in the testing procedure.
Norms are also provided to help with the interpretation process. These are the test results gathered from a particular group of test takers during the development phase. The norms can then be used as benchmarks for interpreting individual test scores in the future. Depending on the test, different kinds of norms may be provided. For example, age norms and grade norms indicate the average scores of a group of test takers who are of a certain age or in a particular grade.
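As a rough illustration of how norms can serve as benchmarks, the following Python sketch places one raw score against a norm group. The scores, the name `grade_5_norms`, and both helper functions are hypothetical and made up for this example; published tests supply their own norm tables and conversion rules.

```python
from statistics import mean, stdev


def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group scoring below raw_score (ties count as half)."""
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def z_score(raw_score, norm_scores):
    """Distance from the norm-group mean, in sample standard deviations."""
    return (raw_score - mean(norm_scores)) / stdev(norm_scores)


# Hypothetical raw scores from a grade-norm group (invented data).
grade_5_norms = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]

print(percentile_rank(55, grade_5_norms))
print(round(z_score(55, grade_5_norms), 2))
```

With these invented numbers, a raw score of 55 has a percentile rank of 75: seven of the ten norm-group scores fall below it, and the one tied score counts as half.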
The other major advantage to group tests is their efficiency. It might take hundreds of hours for a skilled examiner to administer 100 individual intelligence tests. In contrast, it might take just a few hours to give a group test to an entire roomful of people. Clearly, group testing is much less expensive and time-consuming. In fact, without the development of group tests, intelligence testing would never have become the large-scale industry that it is today.
Group tests also have some disadvantages, however. The group setting makes it impossible to take into account individual factors—such as being sleepy, sick, uncooperative, or anxious—that might affect a person's score, but that have nothing to do with his or her intelligence. The group format also does not allow an examiner to note why a particular answer was chosen or question was skipped. It simply scores how many correct answers were chosen. No distinction is made between questions that were missed because the person simply did not know the answer and those that were missed because the person could not read them, did not understand them, or simply was taking his or her time in an effort to avoid careless mistakes.
Another drawback relates to the multiple-choice format that these tests favor. Multiple-choice questions may call for the use of different psychological strategies than the open-ended questions often found on individual tests. For one thing, multiple-choice questions, which are based on the assumption that there is one right answer, may penalize creative thinkers, who often see the same problem from many different angles. Nevertheless, research has shown that scores on the best group tests are generally highly correlated with those on individual tests. In other words, a person who gets a certain score on a group intelligence test is likely to also get a similar score on an individual intelligence test.
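The phrase "highly correlated" can be made concrete with a Pearson correlation coefficient. The sketch below is only an illustration: the two score lists are invented, and `pearson_r` is a hypothetical helper written for this example, not a procedure taken from any particular study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented scores for six people who took both kinds of test.
group_test = [95, 102, 110, 88, 120, 105]
individual_test = [98, 100, 112, 90, 117, 107]

print(round(pearson_r(group_test, individual_test), 2))
```

A value near 1 means that people who score high on one test tend to score high on the other. It does not mean that any one person's two scores will be identical.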
The mass testing of mental abilities remains controversial. Yet many organizations have concluded that the pros outweigh the cons. Group tests assessing various mental abilities have become a fixture in American society. These are just a few common examples:
• Multidimensional Aptitude Battery. This is a test of general thinking ability, designed to be given to groups of adolescents or adults. It is an adaptation of the Wechsler Adult Intelligence Scale-Revised (WAIS-R), the most widely used individual test of adult intelligence.
• Cognitive abilities tests. These are two distinct group tests designed to assess the general mental ability of schoolchildren.
• SAT. This is a test of general scholastic ability that is used to help colleges make decisions about which students to admit.
• Graduate Record Examinations (GRE). These scholastic ability tests are used to make graduate school admission and placement decisions.
• Armed Services Vocational Aptitude Battery. This is an example of a test that attempts to measure several specific aptitudes. It is used to screen military recruits and help place them in appropriate jobs.
• General Aptitude Test Battery. This test also assesses several aptitudes. It was developed by the U.S. Department of Labor and is currently used by the U.S. Employment Service to help guide job placements.
Certainly, a huge amount of data has been amassed over the years on the validity and reliability of various group intelligence tests. In addition, great strides have been made in the way such tests are standardized and normalized. Nevertheless, the underlying philosophy and basic procedures for most group tests still bear a strong family resemblance to their ancestor: Yerkes's World War I Army tests.
The SAT
While all group intelligence tests owe a debt to Yerkes, one test has a more direct link to him.
The original version of the modern SAT was developed by Brigham, Yerkes's junior colleague in the Army testing program. Soon after the war, Brigham began adapting the Army Alpha test for use in screening college applicants. In the 1920s, Brigham first tried out his new test on freshmen at Princeton University and applicants to the Cooper Union, an all-scholarship college in New York City.
About two decades earlier, in 1900, the College Entrance Examination Board had been founded. The board was set up by the presidents of a dozen leading universities, who sought to simplify the application process for the benefit of both prospective students and admissions officers. To do that, the board wanted to devise a common entrance exam that could be used by all the universities. That way, an applicant would have to take only one entrance exam, rather than a separate test for each school to which he or she applied. At first, the exam consisted of essay tests in specific subject areas. When the board heard about Brigham's research, however, it put him in charge of a committee, which was asked to develop a test that could be used by a broad range of colleges as an objective measure of academic potential. The test also needed to streamline the admissions process and level the playing field for students from a wide variety of backgrounds.
In 1926, Brigham's test, which later came to be known as the SAT, was given to high-school students for the first time. Then, in 1933, officials of Harvard University set out to find a way of evaluating candidates for a new scholarship program. The program was intended to help academically gifted young men who had not graduated from the elite Eastern boarding college preparatory schools that supplied most of Harvard's students. The officials settled on Brigham's test, because they thought it measured pure intelligence rather than the quality of a student's high-school education. By the late 1930s, the SAT was being used as a scholarship test by all of the prestigious Ivy League schools.
Use of the SAT soon spread beyond its Ivy League roots, and the test remains very widely used today. In fact, in 2003, a record 1.4 million high school seniors took it. Yet, in spite of (or perhaps because of) the SAT's popularity, the test has been a lightning rod for controversy over the decades. Critics have charged that the test systematically underestimates the academic ability of females, applicants over age 25, and those whose first language is not English. In addition, some studies have shown that the SAT does not predict college performance (such as freshman grades, undergraduate class rank, college graduation rates, or attainment of a graduate degree) as well for black students as it does for white ones.
In general, studies have shown that high school grades are better predictors of college grades than SAT scores are. The SAT still does a fair job of predicting how well a college freshman will perform, however. When SAT scores and high school grades are both used, their combined predictive ability is slightly better than that of grades alone. One problem with using grades alone is that they are less comparable, since they may reflect not only a student's ability, but also the difficulty of the courses the student has taken and the standards of the school. On the other hand, SAT scores alone cannot reveal anything about a student's motivation or work habits. Therefore, most psychologists currently recommend that, if SAT scores are used at all, they should be combined with grades, portfolios, or other evidence of academic potential.
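One way to see why combining two predictors can never do worse than the better one alone is to compare the proportion of variance explained (R squared). The Python sketch below uses the standard identity for a linear regression with two predictors, computed from the three pairwise correlations. All of the data are invented, and the function names are hypothetical; this is not the method or the result of any study mentioned here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """R^2 of a linear regression of y on x1 and x2, from pairwise correlations.

    Assumes x1 and x2 are not perfectly correlated with each other.
    """
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Invented records for eight students (illustrative only).
hs_gpa = [2.8, 3.0, 3.2, 3.5, 3.7, 3.9, 3.1, 3.4]
sat = [950, 1100, 1000, 1200, 1150, 1350, 1250, 1050]
college_gpa = [2.5, 2.9, 3.0, 3.4, 3.3, 3.8, 3.1, 3.0]

r2_grades = pearson_r(college_gpa, hs_gpa) ** 2
r2_sat = pearson_r(college_gpa, sat) ** 2
r2_combined = r_squared_two_predictors(college_gpa, hs_gpa, sat)

print(f"grades alone: {r2_grades:.2f}")
print(f"SAT alone:    {r2_sat:.2f}")
print(f"combined:     {r2_combined:.2f}")
```

In the sample used to fit the model, the combined figure is always at least as large as either single-predictor figure. The practical question is whether the gain is large enough to matter.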
A version of the SAT introduced in 2005 includes a new essay-writing section. In part, then, the test has come full circle. Yerkes and his followers introduced the idea that large groups of people could be tested and compared quickly using objective methods. Many people still believe that group tests can be quite useful as an efficient screening tool. Even advocates of this approach recognize that it has its limits, however. To fully assess any individual's capabilities, it is necessary to look at other dimensions besides a test score.
Case studies
Several colleges and universities have studied the validity of the SAT. The aim of such studies is to measure the predictive power of SAT scores for that particular college's student body. The College Board (the current name for the College Entrance Examination Board) encourages such research through its Validity Study Service. The service itself has come under fire in recent years, however. Critics claim that it encourages the use of flawed research methods that overstate the SAT's benefits.
Nevertheless, validity research at individual institutions has generally found that the SAT has relatively weak predictive ability. The National Center for Fair and Open Testing (nicknamed FairTest), an organization that opposes the misuse of standardized testing, has documented some of the less encouraging results. For example, researchers at the University of Pennsylvania looked at high-school class rank, scores on the SAT I (the main test), and scores on the SAT II (optional subject area tests). They compared all these factors to students' cumulative grade point average (GPA) in college. The researchers found that the SAT I was the poorest predictor of all, explaining a mere 4% of the