The New York State Education Department has been working on creating a VAM measure of high school principals to be used this year, even though its parameters have not been shared with those who will be evaluated. It was just introduced to the Board of Regents this month.
Below is a letter that I sent to the Regents expressing my concerns. Thanks to Kevin Casey, the Executive Director of SAANYS; Dr. Jack Bierwirth, the Superintendent of Herricks; and fellow high school principals Paul Gasparini and Harry Leonardatos for their review and input.
May 18, 2013
Dear Members of the Board of Regents:
It is mid-May and the school year is nearly over. High school principals, however, have yet to be informed about what will make up our VAM score, which will be 25% of our APPR evaluation this year. A PowerPoint presentation was recently posted on the State Education Department website, following the April meeting of the Regents. The very few ‘illustrative’ slides relevant to our value-added measure do not provide sufficient detail regarding how scores will be derived, or information regarding the validity or reliability of the model that will be used to evaluate our work. The slides also do not answer the most important question of all: what specifically does VAM evaluate about a principal’s work?
Upon seeing the slides, I contacted SAANYS and they provided additional information. What I received raised more doubts regarding the validity, reliability and fairness of the measure. I will be most interested to read the BETA report when it becomes available. More important, it is apparent that, as with the 3-8 evaluation system, there may be unintended consequences for students and for schools.
Construct validity is the degree to which a measurement instrument, in this case the VAM score, actually measures what it purports to measure. The measure, therefore, should isolate the effect of the high school principal on student learning, to the exclusion of other factors that might influence student achievement. Because this model does not appear in any of the research on principal effectiveness, we do not know if it indeed isolates the influence of a high school principal on the chosen measures outside of the context of factors such as setting, funding, Board of Education policies and the years of service (and therefore influence) of the individual principal.
Simply because AIR can produce a bell curve on which to place high school leaders, it does not follow that the respective position of any principal on that curve is meaningful. That is because the individual components of the measure, which I discuss below, are highly problematic.
The First Component—ELA/Algebra Growth
The first proposed measure compares student scores on seventh and eighth grade tests against scores on two Regents exams—the Integrated Algebra Regents and the English Language Arts Regents. It is a misnomer to call it a growth measure. The Integrated Algebra Regents, which is taken by students in Grades 7-12, is a very different test from the seventh and eighth grade math tests. It is an end-of-course exam, not one that shows continuous growth in skills. Because it is a graduation requirement, some students take it several times prior to passing.
Because many students take the Integrated Algebra Regents in Grade 8, the number of data points with which to compare principals will also vary widely across the state. For example, if you were to use the Integrated Algebra scores of my ninth-grade students this year, you would have 14 scores of the weakest math students in the ninth-grade class. That is because about 250 ninth-graders passed the test in Grade 8. You would have a few more scores if you included 10th-12th graders who have yet to pass the test. These scores would be the scores of ELL students who are recent arrivals, students who transferred in from out of state or from other districts, students with severe learning disabilities, or students with attendance issues. In many of these cases, there would be no middle-school scores for comparison purposes.
At the end of the day, perhaps there would be 20 scores in the possible pool. How is that a defensible partial measure of the effectiveness of a principal of nearly 1200 students? There are other schools that accelerate all students in Grade 8, and still others that accelerate not all, but nearly all eighth-graders. There are still other schools that give the Algebra Regents to some students in Grade 7, thus further complicating the problem.
The second part of the Math/ELA growth measure compares similar students’ performance on the eighth-grade ELA exam and the ELA Regents. Some schools give 11th graders that test in January and others in June. That means that principal ‘effectiveness’ will be, as in the case of Algebra, compared using different exams at different times of the year. The ‘growth’ in English Language Arts skills takes place over the course of three years in Grades 9, 10 and 11. Therefore, any principal who has been in her school for less than three years has only proportional influence on the scores.
The Second Component—The Growth in Regents Exams Passed
The second component of principal effectiveness counts the number of Regents examinations passed in a given year, comparing the progress of similar students. This is a novel concept, but again there is no research that demonstrates that it has any relationship to principal effectiveness, and like the first measure, it is highly problematic.
First, not all Regents exams are similar in difficulty, although they are counted equally in the proposed model. There are 11th graders who take the Earth Science Regents, a basic science exam of low difficulty, and others who take the Physics Regents, which the state correctly categorizes as an advanced science exam. Both groups of students may have accrued the same number of Regents exams (5) and have similar middle-school scores (thus meeting the test of ‘similar student’), but certainly Earth Science would be far easier to pass. Yet for each exam, the principal gets (or does not get) a comparative point.
And what of schools that are unusual in their course of studies? Scarsdale High School only offers 6 Regents exams, choosing instead to have its students take rigorous tests based on Singapore Math. It also gives its own challenging physics exam in lieu of the Regents. Will the principal of Scarsdale High School be scored ineffective because he cannot keep up with the count with his high performing students? Or will he be advantaged in the upper grades when his high performing students are now compared to students with low Regents counts who frequently failed exams, thus disadvantaging the principals of schools serving less affluent populations?
The Ardsley School District double-accelerates a group of students in mathematics. Some students enter their high school having passed 3 Regents exams—two in mathematics and one in science. Who will be the ‘similar students’ for these ninth-graders? How will the principals of portfolio schools, which only give the English Regents, receive a score? Is the system fair to principals of schools with no summer school program, whose students therefore have fewer opportunities to pass the exams? How will a VAM score be generated for principals of BOCES high schools who give few or no Regents exams? Will those Regents exams, taken at BOCES, reflect the score of the home school principal who has absolutely no influence on instruction, or the BOCES principal? The scores are presently reported from the home school.
The Board of Regents allows a score of 55 to serve as a passing score for students with disabilities. How will this measure affect the principals of schools with large numbers of special education students, especially those schools that have, as their mission, the improvement of the emotional health of the student, rather than the attainment of a score of 65?
The Unintended Consequences of Implementation
All of the above bring into question the incentives and disincentives that will be created by the system. This is the most important consideration of all, because the unintended consequences of change affect our students. Will this point system incentivize principals to encourage students to take less challenging, easier-to-pass science Regents rather than the advanced sciences of chemistry and physics? Will schools such as Scarsdale High School and portfolio schools abandon their unusual curricula from which we all can learn, in order to protect their principals from ineffective and developing scores?
Will principals find themselves in conflict with parents who want their children to attend BOCES programs in the arts and in career tech, rather than continue the study of advanced mathematics and science that are rewarded by the system? Will we find that in some cases, it is in a principal’s interest that students take fewer exams so that they are compared with lower performing ‘similar’ students? What will happen to rigorous instruction when simply passing the test is rewarded? Will special education students be pressured to repeatedly take exams beyond what is in their best interest in order to achieve a 65 for ‘the count’? No ethical measure of performance should put the best interests of students in possible conflict with the best interests of the adults who serve them.
Most important of all, how will this affect the quality of leadership of our most at-risk schools, where principals work with greater stress and fewer supports? School improvement is difficult work, especially when it involves working with high-needs students. This model does not control for teacher effects; therefore, it is in fact a crude measure of both teacher and principal effects. If the leadership of the school is removed due to an ineffective VAM score, who will want to step in, knowing that receiving an ineffective score the following year is nearly inevitable?
Why would a new principal who receives a developing score want to risk staying in a school in need of strong leadership, knowing that it will take several years before they can achieve substantial improvement on any of these measures? The response that VAM is only a partial measure of effectiveness is hollow. An additional 15% is based on student achievement, and the majority of composite points are in the ineffective category, deliberately designed so that ‘ineffective’ in the first two categories assures an ineffective rating overall.
We frequently see the unintended consequences of changes in New York State education policy. The press recently noted a drop in the Regents Diploma with Advanced Designation rate, which resulted from the decision to eliminate the August Advanced Algebra/Trigonometry and Chemistry Regents. The use of the four-year graduation rate as a high-stakes measure has resulted in the proliferation of ‘credit recovery’ programs of dubious quality along with teacher complaints of being pressured to pass students with poor attendance and grades, especially in schools under pressure to improve. These are but two obvious examples of the unintended consequences of policy decisions. The actions that you take and the measures that you use in a high-stakes manner greatly affect our students, often in negative ways that were never intended.
You have before you an opportunity to show courageous leadership. You can acknowledge with candor that the VAM models for teachers and principals developed by the department and AIR are not ready to be used because they have not met the high standards of validity, reliability and fairness that you require. You can acknowledge that even if they were perfect measures, the unintended consequences from using them make them unacceptable. Or, you can favor form over substance, allowing the consequences that come from rushed models to occur. You can raise every bar and continue to load on change and additional measures, or you can acknowledge and support the truth that school improvement takes time, capacity building, professional development and district and state support.
I hope that you will seize this moment to pause, ask important questions, provide transparency to stakeholders and seek their input before rushing yet another evaluation system into place. Creating a bell curve of relative performance may look like progress and science, but it is a measure without meaning that will not help schools improve. Thank you for considering these concerns.