By Rob Schroeder
“The last 15 years have given educators and parents good reason to be skeptical about standardized testing. The industry has earned our distrust.”
If there were a Paul Revere in the assessment revolution, he would probably be galloping away right now, proclaiming “The PARCC exam is coming! The PARCC exam is coming!” But these days, it’s unclear how the citizenry is going to respond.
By any measure, standardized testing is a hot mess. Consider the following:
- Increasing numbers of teachers and parents decry the practice;
- The Illinois State Board of Education had to threaten the Chicago Public Schools with $1.4 billion in sanctions before CPS agreed to test all of its students this spring;
- President Obama believes testing received too much emphasis under No Child Left Behind but committed over $350 million to develop new Common Core assessments like PARCC;
- Republican presidential hopefuls decry the PARCC exam as a “top-down” approach but universally approve of standardized testing to hold schools and teachers accountable for improvement.
One if by land, two if by sea, and three if no one knows what they are talking about.
The Center for Urban Education Leadership doesn’t own any horses, so Center leadership coach and assessment specialist Paul Zavitkovsky is going to have to find another way to spread the results of his three most recent studies of standardized testing.
In ISAT R.I.P, Zavitkovsky examined test results from the Measures of Academic Progress (MAP), National Assessment of Educational Progress (NAEP), and Illinois Standards Achievement Test (ISAT). The ISAT has long been viewed as an easy, low-rigor test compared to the MAP and NAEP. Surprisingly, he found that all three tests produced more or less identical results for comparable test populations in Chicago, Evanston and the State of Illinois. These findings raise complicated questions about what it means for tests to be “rigorous” and “standards-based.”
A key challenge of No Child Left Behind (NCLB) was a requirement for tests to be “standards-based.” Unlike conventional, norm-referenced tests that measure achievement in comparison with other students, standards-based tests were supposed to measure achievement in relation to specific state standards. In Standards-Based Clothing, Zavitkovsky examined how Illinois responded to the challenge. He found that most of what Illinois has called standards-based testing is plain old norm-referenced testing dressed up in standards-based clothing. This “clothing” distorted what the ISAT actually measured and sealed its reputation as an easy, low-rigor exam.
Items and passages on standardized tests are built to assess depth and breadth of student knowledge along a continuum of academic difficulty. The standardization process uses a numeric scale to define this continuum. Then it translates raw test results into “scale scores” that lie somewhere along that continuum. Students who are able to size up and work through test items at higher levels of academic complexity earn higher scale scores than students who are stumped by more complex items and passages.
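The raw-to-scale translation described above can be sketched in a few lines of code. This is a toy illustration only: real standardization programs use item response theory models that weight items by difficulty, and the scale endpoints below are hypothetical, not the ISAT’s actual scale.

```python
# Toy sketch of how raw test results are translated into "scale scores"
# along a fixed numeric continuum. A simple linear mapping stands in for
# the far more sophisticated models real test makers use; all numbers
# here are hypothetical.

def to_scale_score(raw_correct, total_items, scale_min=120, scale_max=400):
    """Map a raw count of correct answers onto a fixed numeric scale."""
    fraction = raw_correct / total_items
    return round(scale_min + fraction * (scale_max - scale_min))

# Two students on a hypothetical 40-item test:
print(to_scale_score(22, 40))  # mid-range performance -> 274
print(to_scale_score(36, 40))  # stronger performance -> 372
```

The point of the sketch is simply that every student’s result lands somewhere on the same continuum, so scores can be compared across students and, in principle, across test forms.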
Information from the ISAT was reported to educators and parents in two different ways. The first was an overall proficiency rating based on scale scores. The very low cut scores that the Illinois State Board of Education used to define what it meant to “meet standards” drew significant public attention and created the ISAT’s reputation as an easy, low-rigor test. Standards-Based Clothing detailed just how low those cut scores actually were and how poorly aligned they were across the six grade levels that the ISAT tested.
The second type of ISAT reporting used “content strands” to offer diagnostic information detailing what students knew and were able to accomplish on the ISAT. Content strands had names like “main idea,” “supporting details,” “number sense” and “measurement.” Student mastery of academic standards in each content strand was reported out as the percentage of correct answers students earned in each strand.
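The strand-level arithmetic described above is straightforward: each strand’s score is just the percentage of that strand’s items answered correctly. A minimal sketch, with hypothetical strand names and item counts:

```python
# Sketch of content-strand reporting as described above: mastery in each
# strand is reported as the percentage of that strand's items a student
# answered correctly. Strands and answers here are hypothetical.

from collections import defaultdict

# Each answered item is tagged with a single strand: (strand, was_correct)
answers = [
    ("main idea", True), ("main idea", False), ("main idea", True),
    ("supporting details", True), ("supporting details", True),
    ("number sense", False), ("number sense", True),
]

totals = defaultdict(int)
correct = defaultdict(int)
for strand, ok in answers:
    totals[strand] += 1
    correct[strand] += ok

for strand in totals:
    pct = 100 * correct[strand] / totals[strand]
    print(f"{strand}: {pct:.0f}% correct")
```

Note that this bookkeeping assigns every item to exactly one strand, which is precisely the simplification the next section takes issue with: real items usually draw on several skills at once.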
Content strands delivered two clear messages: first, the ISAT was standards-based because it tested discrete bits of knowledge and skill which fit neatly into separate topical categories. Second, earning higher test scores was a result of students mastering greater volumes of content and skill in each topical category. Standards-Based Clothing demonstrates that neither of these messages is actually true.
Zavitkovsky contends that content strands created an alternate universe of test information that did not reflect what the ISAT actually measured. Standards-Based Clothing used item analysis of content strands from ACT and Learning First exams (an early ISAT proxy) to illustrate that content strands grossly distort what standardized tests actually measure. This analysis showed that questions almost always test more than one skill at the same time. For example, if students struggle with identifying and comprehending supporting details, their ability to get a “main idea” question right is also affected. It is exactly this mix of skills and content knowledge that test makers use to adjust the depth and breadth of knowledge that individual items represent, but content-strand reporting has failed to convey that mix.
Because of low cut scores, bogus content strands and other reporting problems, the research community and many school districts, including CPS, stopped using the ISAT as a credible source of achievement information. In The Changing Face of Achievement in Chicago and the Rest of the State, Zavitkovsky showed that giving up on the ISAT created an information vacuum that kept important statewide achievement trends from coming into view. Examples include a dramatic flattening of achievement outside of Chicago; sustained, across-the-board growth of achievement in Chicago beginning in 2007; and big drop-offs in achievement as students in school districts outside of Chicago transition from elementary to middle school.
“The last 15 years have given educators and parents good reason to be skeptical about standardized testing,” Zavitkovsky said. “The industry has earned our distrust.
“The irony is that PARCC exams are now taking heat for the very forms of malpractice that they’ve been designed to correct. The shame is that we didn’t have the political will, or the systemic accountability, to make these corrections a whole lot sooner than we have.”