How much should standardized tests count in school review? A look at local schools' results
Students across Maine are taking federally mandated standardized tests this week, at a time when the role of the results in school accountability is in flux and the tests themselves are as polarizing as ever.
Most teachers contacted for this story did not respond to requests for interviews, and one declined because, she said, the topic is so controversial she was afraid "it would be one of those 'throw myself to the wolves' situations."
Student performance on the Maine Educational Assessments is the primary factor in accountability formulas the state uses to identify underperforming schools for review and interventions. Earlier this month, Congress voted to do away with the complex rules governing those accountability systems in order to give states more flexibility in their design. The state had already developed its Consolidated State Plan for compliance with the Every Student Succeeds Act (ESSA) of 2015, the broader federal law which requires student testing and school review, so its proposed accountability system still complies with the rescinded rules. That state plan is open for comment until March 30.
The testing requirement was on the line as ESSA was going through Congress, but Democrats' support for the provision, meant to keep schools accountable to their students and expose inequality, outweighed Republicans' opposition to excessive federal intervention in local schools. Unlike its predecessor, the No Child Left Behind Act of 2001, ESSA allows measures besides test scores, including graduation rate, attendance and an additional state-chosen measure, to factor into school accountability. Maine has not yet chosen its additional measure or set the weights each will carry in the formulas.
Any comments the state receives on its plan will add to years of debate over standardized testing. The Maine teachers' union says that judging a school by student performance on standardized tests is "unfair," and that the overuse of standardized tests is "detrimental to students and to public education."
Its position is neatly encapsulated in a quote highlighted on its website: “I trust my daughter’s teachers and our public school. I know my child is getting a great education; I don’t need a test to prove that to me.”
But Steve Lazar of the Educational Testing Service, which produces the National Assessment of Educational Progress, argues that standardized tests provide an important external measure of student achievement beyond what teachers assess and report themselves.
"It is important to have everyone's grades matched up to a common yardstick," he said. "There are, frankly, civil rights implications in doing this. If we thought education experiences were perfectly equitable, you could make a case that you wouldn’t need these things, but I think for the most part we do."
The DOE emphasizes that schools are ranked by their test results for the purpose of identifying schools that need support, which includes funding for specific professional development programs and meetings with school improvement coaches. But that support is coupled with blame. The department also publicizes letter grades for schools based on their students' test results in order "to make sure that schools are accountable for explaining school performance to their communities," according to the department's website.
As the debate over testing rages at the national and state levels, students take the tests each year, adding to a massive trove of data that may or may not be useful. Maine has spent nearly $30 million on assessments since 2009, the earliest year for which data is available on the state transparency website Maine Open Checkbook.
Looking at last year's results, fewer than half of the students in most Knox and Waldo county schools were proficient in math. In English language arts, they did better.
But there are important things to consider when interpreting the results. What does it mean that a student is proficient? Is student proficiency the right measure of school quality? What other factors might affect proficiency?
One issue with state assessment data is that the definition of proficiency is changeable and inconsistent. States determine which tests they use and set the cutoff scores above which a student will be counted as proficient.
In “The Proficiency Illusion,” a 2007 report by the Northwest Evaluation Association (NWEA), author Deborah Adkins wrote that there was no common understanding of what proficiency means.
The definition, she said, varied state to state, year to year, subject to subject, and grade level to grade level. Adkins concluded students might clear the hurdles of proficiency, yet emerge unprepared.
"We run the risk that children in many states may be nominally proficient, but still lack the education needed to be successful on a shrinking, flattening, and highly competitive planet,” she said.
No Child Left Behind imposed punitive measures for schools that didn’t make “adequate yearly progress.” In response, schools and state education departments did whatever they could to nudge their percent-proficient numbers upward. Those numbers could be efficiently increased by focusing attention on the students who scored just below the proficiency cutoff, the “bubble effect” referred to by Sen. Al Franken (D-Minn.) in remarks at a confirmation hearing for Education Secretary Betsy DeVos.
States could also simply lower their proficiency cutoff scores. Maine did just that between 2006 and 2011, according to a study by NWEA. Researchers mapped Maine's proficiency cutoffs onto percentiles of a standardized test NWEA administers and found that the cutoff scores dropped over those years.
Former Education Secretary Arne Duncan said the law unintentionally encouraged many states to lower their cutoff scores "so that more students would appear to be proficient, even though they weren't."
Maine's proficiency standards historically have been lower than national proficiency standards set for the National Assessment of Educational Progress given to a sample of fourth, eighth and 12th graders across the nation. (See chart 2 in gallery.)
Now Maine's definition of proficiency is more in line with national standards. (See chart 3 in gallery.) In 2015 and 2016, the state changed the tests it uses for state assessments. A DOE spokesman said that the department was careful to establish cutoff scores that set high standards for proficiency.
"Three different methods were used to estimate cut scores, including input from Maine educators, and then a triangulation process was used to establish the final achievement levels," he wrote.
Is growth a better measure?
The problem with proficiency tests with fixed cutoff scores, some say, is that they do not capture student improvement at either end of the spectrum.
“Proficiency just grades you for how many kids are above that level; it is not an accurate representation of what’s really happening in the classroom,” said Jessica Hahn, a spokesperson for NWEA, which produces a growth test used by 70 schools and districts in Maine.
She gave the example of a fourth grade class in which most students are two years below their grade level in reading. If a teacher brings them up to one-quarter year below grade level, she said, that is exceptional education but it will not be reflected in the percent-proficient number at the end of the year.
Students and teachers like NWEA's growth test, Hahn said, because it gives immediate results on screen.
"We are super motivated about NWEA,” said Camden Hills Regional High School senior Zoe Zwecker, who pronounced it "nuweeah." “We are kind of competitive and talk about what we got on it.” She said she would try harder on that test than on the state assessments.
Neal Guyer, director of school improvement for Regional School Unit 13, said the results give detailed profiles of individual students' strengths and weaknesses.
“We can drill down into their math performance and see how they did on algebraic thinking, or numeric reasoning," he said. "That’s the kind of information that we value that you don’t get from standardized testing that the state mandates."
NWEA's is a "norm-referenced" test: its scores indicate the difficulty of questions students can answer correctly, independent of which grade they are in. In contrast, proficiency tests, like the state assessments, measure performance against a fixed set of standards for each grade level.
"You're basically picking standards and saying, we think this standard is a good standard for grade six in math," RSU 13 Superintendent John MacDonald said of the state assessments. "The people that are doing that are a committee that are looking at state standards and kind of plucking them and putting them in tests."
The state teachers’ union writes in its position on standardized assessments that it "has concerns with assessments that evaluate student achievement on a static statewide set of standards and skills."
Proficiency tests based on curriculum standards are sometimes criticized for encouraging "teaching to the test." But Bruce Bailey, principal of Troy Howard Middle School in Belfast, said that if the curriculum standards used in the test are good — which he says they are — that should not matter. There are so many that teachers cannot teach them all, he said, so they look at the previous year's questions to see which are covered and arrange their curricula to be sure those standards are taught before the testing period.
Though many administrators advocated for NWEA's growth test to be used as the state assessment, it would not satisfy the federal requirements because it measures growth, not proficiency.
Changing the yardstick
Because Maine has used four different tests since 2009, results cannot be compared from one year to the next. The state changed from the “MEAs” to the “NECAP” test in 2010, then to the “SBAC” in 2015. Last year the eMPower ME test, also referred to as the MEAs, was administered for the first time, setting a new baseline for comparison.
"This year we have only one year of data," RSU 13's Guyer said. "Until you have at least three years of data, you can’t really have a lot of confidence in your metrics in determining much of anything."
Changing the tests meant that the state could not track schools' progress, so accountability systems under No Child Left Behind could not be fully implemented. Superintendent MacDonald said the state identified some schools in his district as needing improvement but never followed up because there was no way of knowing whether growth targets were being met.
“They evaluate your test scores, and they say you need to implement changes, and you come up with your plan and you implement your changes,” he said. “They change the assessment; now you’ve changed the baseline so you don’t know apple-to-apple whether you have improved as far as that standardized test goes or not.”
Though the state assessments cannot yet track improvement of individual schools over time, the NAEP test can show long-term trends for the state.
Lazar, of ETS, said the designers go to “extraordinary lengths” to ensure that that test is comparable over time, and that “if they do break that trend line they do it consciously and on purpose.”
The NAEP test underwent major changes in the early 1990s, including selecting students by grade rather than by age and breaking out results by state. It now has robust trend lines for individual states going back 26 years for math and 24 years for reading, he said. Maine has shown some improvement over that time period, but its average score is still below proficient in math. (See charts 4 and 6 in gallery.)
Maine has been changing its tests because the paper-and-pencil “bubble tests” it used through 2014 were criticized for testing only low-level skills, which some say is partly to blame for the difficulty Maine employers have finding workers with the science and math skills they need among Maine public school graduates.
The state's first attempt to increase the rigor of its assessments and raise proficiency standards with the computer-based Smarter Balanced Assessment Consortium tests in 2015 was a flop. Five Town CSD Superintendent Maria Libby said many teachers felt the tests were not age-appropriate, and students went into them with the expectation of failure. Before the testing period was over that year, the Legislature passed a bill prohibiting its use in the future.
"It was a problem with branding," MacDonald said. "They didn't get the message out right."
The current MEAs, first administered in the spring of 2016, have gotten a better reception than the SBAC and may have a better chance of sticking around long enough to track school progress.
Libby said she was pleased with the test questions that were released so far. “I thought they were really challenging, good, thoughtful questions,” she said. “The test is better because the state listened to feedback.”
Lois Kilby-Chesley, president of the Maine Education Association, the state teachers' union, said March 3, “I think the teachers last year found that it was better, but it’s not perfect."
One problem, she said, was that the results — released in December 2016 — did not come back in time to be useful to teachers in developing their curriculum for the following year.
Measured Progress’s spokesperson declined to be interviewed, but the organization’s founder, Stuart Kahl, wrote in a March 2016 opinion piece subtitled “One test can’t do it all” that the mindset that state accountability tests can serve multiple purposes has led to their misuse, and that teachers' expectations of immediate results compound the problem.
"Unfortunately, the desire for inexpensive machine-scoring for quick turnaround of results means that these tests often emphasize low-level knowledge and skills, with a resulting negative impact on instruction and student achievement,” he wrote.
In other words, the purpose of the test is to apply a single measure to all schools in the state for accountability, not to provide information to teachers to develop their curriculums. However, the Department of Education said in a press release that results will be delivered sooner this year.
Though tracking individual schools’ progress has not yet been possible with the state assessments, at least they can be used for their primary purpose, comparing schools in a single year to identify the lowest performing schools for added scrutiny and supports.
However, MacDonald said he does not use the state assessment data to compare schools in the district because the populations differ from one school to the next.
“You’ve got one school that has 90 kids, you've got another that has 250, and the communities they’re in have different special-ed populations and different poverty levels and everything else.
“I think it’s tough,” he said, “especially when you have these small, more rural-community-type schools (and) a more centralized school like in Rockland, to make any kind of a comparison that’s meaningful.”
When asked if comparing the percentage of proficient students in each school is worthwhile, RSU 71 Superintendent Paul Knowles responded, "I would say no."
"Every school system around this area is different and has a different population of students," he said. "Camden students are different than Rockland students are different than (RSU) 71 students are different from Searsport students are different from (RSU 3) students. They're all different, and so trying to put us all in a big bin, you can't. You can't do it. All schools are doing good things, but again, they're our students, they're different."
Several variables, most significantly poverty, affect each school’s scores differently. The state does not take these variables into account, according to Rachel Paling, director of communication for Maine DOE.
“There is no value measure added for certain student groups or conditions mentioned," she said. "Expectations are applied uniformly to all schools.”
Some see the simplicity and consistency of a single measure applied across all schools as a strength, while others say the correlation with poverty levels indicates that poverty has a strong influence on the results. The Maine Education Association reported in 2013 that an average of 67 percent of students were eligible for free or reduced-price lunch in schools receiving an F grade (based on state assessment proficiency rates), while only 25 percent of students in schools receiving an A grade were eligible.
Looking at local schools, we found that those with higher proportions of economically disadvantaged students had lower percentages of proficient students.
The chart above shows how students eligible for free or reduced-price lunch performed compared to all tested students in 2014.
No Child Left Behind's requirement to publicize these “achievement gaps” has done little to improve the situation. According to Maine’s results on the NAEP test, the gap in average eighth-grade math scores between economically disadvantaged students and their peers is wider now, at 22 points, up from an eight-point gap in 2003, when that reporting began. (See image 5 in gallery.)
Five Town CSD Superintendent Maria Libby said that because student eligibility for free or reduced lunches is confidential, schools may not know who the economically disadvantaged students are and it can be difficult to target them specifically. Instead, federal funding for schools with high percentages of those students goes to programs that benefit all students.
Some students say they have no reason or incentive — besides the pleas of teachers and administrators — to try to do well on the tests. Thus the results may not accurately reflect a student’s ability, and differing school cultures might make this factor more or less significant.
A group of seventh-graders from Troy Howard Middle School who were at the Waldo County YMCA on March 17 said they try hard on the tests because it shows their teachers that they care about education, and because they believe their performance affects their placement in classes the following year.
Zoe Zwecker, a senior at Camden Hills Regional High School, said her fellow students are less likely to give their best effort on the state assessments because of the perception that the results do not affect them, but are used only to judge the school.
"There are so many tests, it kind of bogs us down," she said, "so it’s hard to maintain motivation” by the time the state assessments are given in the spring.
One eighth-grader at Troy Howard said most of her fellow students do not understand the point of the state assessments, and estimated that only 5 percent of them put in an honest effort on the tests. She said many students rush through the multiple choice questions and skim the reading sections.
These kinds of shenanigans were easier to spot when the state used paper-and-pencil tests, said Chris Walker-Spencer, principal of Camden-Rockport Elementary School. Teachers could see on the test form and booklet if students had skipped a lot of questions or written too little, and send them back to their desks to keep working. That is not possible with the current computer-based assessments.
The Maine Department of Education's recent efforts to improve the tests it uses for its state assessments have the potential to make future data more valid as a measure of school quality and more useful to teachers as a curriculum tool. Despite their imperfections, there are those who believe the state assessments provide a useful measure of school success. Walker-Spencer summed up this view during an interview in his office at Camden Elementary School.
"Standardized tests definitely have a role," he said. "They are one — and I emphasize one — important way to measure success of our school. It is important for the public to know how we’re doing. In general, how we’re doing shows up in our test scores.”
As Maine finalizes the measures it will use and the weights they will carry in its school accountability system, some districts are coming up with other ways to publicize their schools' performance. Regional School Unit 71 Superintendent Paul Knowles has initiated academic audits for schools in the district that he said will be made public when they are completed. Five Town CSD is developing its own report card system using a set of indicators which, according to Libby, will present "a more holistic view of the health of our schools."