If The Graduate were remade today, I’d have the “one word” for Dustin Hoffman’s character: testing. If it isn’t already, testing will soon be as ubiquitous as plastics. If your interest is in making money, you should consider investing in this high-growth industry. Like plastics, though, testing clutters the landscape with so much cheap refuse.
I’m a teacher-educator at Emory & Henry College, a small private liberal arts college in SW Virginia. To enter our program, students must take and pass the Praxis I test in mathematics, reading, and writing (or its equivalent). For many this is just another $135 hoop. For a few it is a major roadblock. And then there are those for whom the Praxis I is a whole series of $135 stepping stones. These students are making the yacht payments for the well-placed at ETS.
OK, so I’m revealing my bias against testing as big business. So much of our schooling centers on testing that we should hope it isn’t all bad. I am prepared to at least provisionally accept that well-constructed tests measure something of value. I think we are wrong about what tests measure, though, and I will use the data from a recent study by (who else?) ETS to prove it.
The explicit assumption most of us make is that tests measure knowledge and/or skills. Many criticisms may be leveled at tests, but most take a form like this: the test isn’t good because it doesn’t accurately measure what the test-taker knows or can do. In other words, we take for granted that tests, by definition, are at least meant to measure knowledge and skills. For example, let’s say Johnny takes the Praxis I and gets a combined score (across the 3 subtests) of 532. The Virginia legislature has deemed that this score indicates Johnny has the minimal knowledge and skills to enter a teacher preparation program. Let’s set aside (for now) the problem that if Johnny had, because of some very minor perturbation, scored a minutely different 531, his score would have indicated he did NOT have the minimal knowledge and skills to enter a teacher preparation program. Apart from that, I’m guessing that my description so far is uncontroversial. I can probably even safely add that the reason Johnny scored better than Jerry -- who took the same test on the same day and scored a distinctly lower 515 -- is that Johnny knows and/or can do more than Jerry. Getting agreement beyond this is going to be more difficult -- but let’s try. And fair warning: I’m trying to describe how most people think about this so I can then criticize it and suggest a better alternative.
Why is it that Johnny knows and/or can do more than Jerry? Some will say Johnny was better prepared by his prior experiences, some will say Johnny has higher native ability (at least for the sort of knowledge and skills measured on the Praxis test), and I think most will agree to a generic statement like this: a combination of differences in ability and experience led to the difference in knowledge and/or skills between Johnny and Jerry. I think this level of generality makes the statement as strong as it can be while keeping as many subscribers as possible. There may be a few strict “nativists” who claim that all the difference can be explained by innate ability, and there may be a few strict developmentalists who claim that all the difference can be explained by experience, but most people will accept that both play a role and, indeed, may only make sense taken together. And we know nothing else about Johnny and Jerry that might prejudice us toward one explanation or another.
It may be an interesting aside to consider what question you would ask first if you sought an explanation for the difference between Johnny and Jerry. I suspect it would be something like: how did Johnny’s and Jerry’s childhood environments differ? Questions that smack of biological determinism carry such negative connotations that you will likely avoid them. Instead you will ask: Did the caretakers encourage reading early? Did Johnny have better teachers in school? Did Jerry’s friends distract him from his studies? Like biology, though, these environmental factors are also “deterministic.” Early childhood environments are no more chosen than one’s genes. I suppose we feel more able to intervene in the seemingly fluid realm of the environment. I want you to see why that feeling is fallacious. More broadly, I want you to see that these explanations are so brittle and prone to being misconstrued because they are built on a false assumption about what tests really measure.
If you’ve read this far, you are probably a sincere educator, so I’d like to ask you a question that might clear up the fallacy surrounding biological determinism. Given the choice, which would you prefer to help: a child with a difficult home life or a child with a learning disability? Silly question? I know, you don’t choose who comes in the door. And I also know you are exasperated by some of what you hear about what goes on at home. And you are frustrated by the difficulties some children have, and by the time it takes them to learn what others pick up quickly. Most importantly, the life work you have chosen is to help both. You will show the child with a difficult home life the security of a consistent and supportive classroom environment. You will help the child with a learning disability develop the specific reading strategies that will help him progress. We have equal purchase to intervene on a child’s behalf regardless of the source of the difference in that child’s performance.
It shouldn’t be about us, though. The child with a rough home life will go to that home at the end of the day. She will always be the child of her parents -- and the parent of her own children. The child with the learning disability will go to other classes and on to places of employment where he will have to rely on his own assimilated strategies to process information. We have to be careful not to overstep the boundaries of our responsibility -- because only one person can ultimately be responsible.
That brings me back to testing. Who is ultimately responsible for the score of a test-taker? I don’t ask this to start a buck-passing game. The answer tells us something important about what is really being tested. Here is the title (and conclusion) of the study conducted by ETS that I referenced near the beginning of this post: Motivation Matters. [full title and abstract at http://edr.sagepub.com/content/41/9/352.abstract]
The test that was tested (leave it to ETS!) was the Proficiency Profile, one of 3 major assessments that colleges and universities use to measure the “value added” by higher education. The general idea of these tests has been to give the assessment to freshmen and seniors and use the difference as a measure of the value added at the institution. Colleges that bought into such assessments, hoping to use them for accreditation or similar purposes, have been torpedoed by results like those widely publicized and discussed in the book Academically Adrift, by Richard Arum and Josipa Roksa. If ETS wants to continue to sell this test to institutions of higher education, it desperately needs to explain why the results of the assessments so far show little or no value added. Why wouldn’t the gains in knowledge and skills that surely come with years of college-level work show up on a test designed to measure knowledge and skills? Maybe (the test-makers at ETS think to themselves) the test-takers are not sufficiently motivated! Let’s do a study to try to prove our hypothesis!
The study compared the results of 3 randomly assigned groups, each of which was given a slightly different consent form. The control group was given a standard consent form indicating that the test results would be kept confidential and used only by the research team. The personal condition group was given the same consent form with this added clause: “Your test scores may be released to faculty in your college or to potential employers to evaluate your academic ability.” The institutional condition group was given the standard consent form with this added clause: “Your test scores will be averaged with all other students taking the test at your college. Only this average will be reported to your college. This average may be used by employers and others to evaluate the quality of instruction at your college. This may affect how your institution is viewed and therefore affect the value of your diploma.”
Check it out for yourself [full article at http://edr.sagepub.com/content/41/9/352.full#sec-4], but in my view the study was well designed and carried out, and the results are robust. Rather than follow the evidence to its logical conclusion, however, the researchers clung to their initial assumptions and then bent their analysis to fit them.
The researchers found that students in the personal condition group reported higher motivation (as measured by the Student Opinion Scale, a 10-item self-report of test-taking motivation) and scored significantly higher on the Proficiency Profile than students in the control group. Students in the institutional condition also reported higher motivation than those in the control group, and also scored higher on the Proficiency Profile. Of course, the researchers (who work for ETS) do not accept the obvious implications of the results. They stick to the assumption that the assessment measures skills and knowledge -- so long as students are sufficiently motivated. What is needed, they conclude, is to motivate students to take the test seriously (sound familiar?). So ETS will now provide all test-takers with an electronic certificate that reports their score and can be shared. Oh boy, another rationale to take the Proficiency Profile: it may be required by future employers.
What the results of the ETS study show is that the Proficiency Profile measures the motivation to perform the skills and demonstrate the knowledge captured by the assessment. The model underlying our assumption that tests measure skills and knowledge directly is faulty. Using the conventional model of human performance, we imagine that competence is derived from (some mixture of) innate ability and appropriately stimulating prior experience. Further, we imagine that levels of performance can then be measured by a test so long as the test-taker is sufficiently motivated.
Here’s a better model: personal competence is the motivation to perform the skill or demonstrate the knowledge. That’s what the ETS study revealed -- test scores correlated with motivation scores. Motivation is considerably more complex than just the perceived use of a test score, of course, but the result is generalizable. Why did Johnny score higher on the Praxis I than Jerry? Johnny was more motivated than Jerry, and likely in a much deeper and more complex way than his feelings about a single test.
Ability and experience are the mechanical components of human performance. They twist and pull us like the strings of a puppeteer. Motivation cuts through and surrounds all of that like a fourth dimension. It isn’t the same as free-floating and mysterious willpower, but it does reside in and ultimately define the person. For better or worse, it corresponds directly to the responsibility given to the person. We cannot motivate a person, but we can -- and should -- hold him or her accountable. A person must be credited for what is due and debited for what is owed. We needn’t have tests for this -- the world has its own way of settling accounts -- but any test we administer to a person can only give us a measure of his or her motivation to take it.