However, I am still at work, so these are my five cents on the
new teacher evaluation system that RIDE is implementing this year. None of
this is news to them; I have had many opportunities to share my thoughts with
the RIDE team members responsible for this impressive project. I am writing as
a concerned friend, not as a disengaged critic. The system has real potential,
and I very much want it to succeed.
The main idea is to use the value-added model to evaluate
teacher effectiveness. In other words, if your students show growth, you must
be an effective teacher; if they do not, you are not effective, whatever you
say and whatever your credentials are. Intuitively, this makes a lot of sense
to the policy makers and to the public at large. And RIDE’s statistics experts
developed a very clever model that measures just the growth, not the absolute
test scores, against the average growth rates in the state. There are also
multiple safety checks in place so that teachers are not dismissed by accident.
First, the growth model is only about one third of the evaluation; the rest is
observations and professionalism. Second, you’d need to show several years of
low performance to be actually dismissed. Third, you will be offered help along
the way.
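As a sketch of how such a weighted composite might work (the exact weights beyond the roughly one-third growth share, and the 1-4 rating scale, are my illustration, not RIDE's published formula):

```python
# Hypothetical composite evaluation score: student growth counts for about
# one third, observations and professionalism for the remaining two thirds.
# The specific weights and the 1-4 rating scale are illustrative assumptions.
def composite_score(growth, observation, professionalism,
                    weights=(1 / 3, 1 / 2, 1 / 6)):
    """Combine three 1-4 ratings into one score; weights must sum to 1."""
    w_growth, w_obs, w_prof = weights
    return w_growth * growth + w_obs * observation + w_prof * professionalism

# A teacher with weak measured growth but strong practice still scores near 3:
print(round(composite_score(growth=2.0, observation=3.5, professionalism=3.0), 2))
```

The point of the sketch: because growth is only one of several weighted components, a single bad growth measure cannot by itself pull a strong teacher's rating to the bottom.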
There are still serious scholarly concerns
about how the model will behave in a large-scale trial. Most of them have to do
with the stability of its measurements. If you were rated excellent one year but
poor the next, the measure is not likely to be accurate. This happens more often
than people expect: an instrument
may be tested to be valid, but after scaling up its use, or after the
conditions of use change (say, from clinical to field application), it loses
its reliability and validity. Unmeasured external influences may become
too strong, the sample may become less random (more biased), and its size may
turn out to be too small. Will that happen in this case? We don’t know yet.
The RIDE team has run some older data through the model, and it seems to be checking
out. But no one can say it will be fine once the data is collected in the
context of the high-stakes system. The math in the model is not really a
problem. (Well, it may disadvantage teachers who work with gifted students,
who tend to score very high on any test to begin with, so their growth may
not look as impressive. It may also reflect poorly on teachers who work with
students whose scores are so low that their growth is invisible on the
available instruments.)
Once the evaluation system is established, people will start
manipulating it, consciously or unconsciously. That pull may or may not be
strong enough to undermine the validity of the central measure, but we simply
cannot tell in advance. It is very hard to predict how the pressures of the new
system will affect teachers’ and principals’ behavior. For example, if I teach
a subject or grade not covered by NECAP, I get to establish my own learning
objectives and measure their achievement with an instrument I construct myself.
There are very good guidelines on learning objectives, and they can be
mastered, no doubt. But it takes years of trying to develop a good sense of
what is achievable and to construct a good instrument to measure growth, and
people may set learning objectives too high or too low. Every incentive,
though, points toward setting them too low.
There is a comprehensive lesson observation and teacher
evaluation tool based on Charlotte Danielson’s framework. RIDE estimates that
a principal will spend 10 hours a year evaluating each teacher. I think this
is an underestimate, because the learning curve needs to be factored in. Coventry
High School has 172 full-time teachers, and Frank D. Spaziano Annex Elementary
School has 8; on average, a school has about 42. Even with the most optimistic
estimate, that adds up to 420 hours, or 56 full days (assuming a 7.5-hour
workday), or about 11 full weeks. One third of the entire school year is gone
from the principal’s time budget, if he or she did it all alone in an average
school.
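The arithmetic above can be checked directly (the per-teacher hours, average school size, and workday length come from this post; the 180-day school year is my assumption):

```python
# Back-of-the-envelope check of the principal's evaluation time budget.
# Figures from the post: 10 hours per teacher (RIDE's estimate), 42 teachers
# in an average school, a 7.5-hour workday; the 180-day year is an assumption.
hours_per_teacher = 10
teachers_in_average_school = 42
workday_hours = 7.5
school_year_days = 180

total_hours = hours_per_teacher * teachers_in_average_school  # 420 hours
full_days = total_hours / workday_hours                       # 56.0 days
full_weeks = full_days / 5                                    # 11.2 weeks
share_of_year = full_days / school_year_days                  # ~0.31 of the year

print(total_hours, full_days, full_weeks, round(share_of_year, 2))
# 420 56.0 11.2 0.31
```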
And then again, enter the human factor. Most of the observation
criteria are by necessity vague, the time is very limited, and the stakes are
fairly high. From my experience, this is the recipe for the “regression to the
excellent” phenomenon, which we are struggling with in teacher preparation. If
you are a principal checking 80 items within 50 minutes, and you know it
actually matters, you will be tempted to rate everyone highly, just to be safe.
Then you get flat, uninteresting data in the end, where everyone is above average.
Looking at the data will reinforce your low buy-in. That is the real danger.
Once people lose faith in the system (even when they are at fault), their next
cycle of observations becomes even less accurate. Why should I care if this
does not tell me anything useful anyway? You can probably tell I am speaking
from experience here. What begins as a big scare ends up as the biggest joke.
Now, I don’t want any of these things to happen, and I hope
they won’t. This is not a call to abandon or dismantle the new evaluation
system. We should give it a very serious try, and work earnestly on applying
what we all learn. Expect years of discovering new unintended consequences, and
of not despairing but fixing them, one by one.
I do, however, believe that the timelines set by the Federal
Government are utterly unrealistic. The State’s educators led by the “RIDE
rangers,” no matter how competent and hard-working, simply cannot deliver a
functioning evaluation system within a couple of years. It would also be
absolutely unrealistic to count on that system working properly within the next
five years. So when we pin the other items on the reform agenda to this
unrealistic hope, we only increase the uncertainty. For example, moving the
professional development requirements from certification into the new
evaluation system is hugely risky. We are dismantling one quality control
mechanism on the pure hope that the new one will be better. Yes, the old
system was not that great, but at least we know it worked somewhat. Remember,
American education has been slowly improving over the last thirty years by
almost every measure available.
There is a huge distance between a promising idea and a
working public policy, with all its underlying processes and procedures. The
new evaluation system elevates the level of complexity tenfold, because of the
sophisticated information technology requirements and the number of decisions
that need to be made and recorded. One cannot expect the Great Leap Forward. Didn’t
we try this before? Goals 2000, anyone? We all remember what happened: the
financial collapse, the stimulus money, the mad rush to spend it. Mistakes have
been made, but they must be corrected. The sense of urgency is great, but not
when it can actually make things worse rather than better.
This is not really a message to RIDE – they cannot do
anything about the timelines in the Race to the Top grant, which the entire
State (with the exception of higher education) has happily signed onto. The feds
screwed up (which has never happened before, right?). We should try to persuade our
Congressional delegation to work with the Federal Department of Education to
allow for more flexibility.
I worry that every new failed reform undermines our
collective ability to hope, to learn, and to trust each other. And we need
all three of those things to move education forward. Hope, learning, and trust
are what we need most. It is easy to get cynical and just wait for all this to
pass, for stuff to hit the fan, etc., etc. That is not much of an option,
really. Educators in this State have already invested an ocean of energy into
the reform. Let’s just do it right this time.
Well said - do wish all those legislators would read your blog! I saw new programs fail in international schools too, when even good faculty were worn down with new program after new program, and the previous ones were 'erased'. Let's hope this one is done slowly enough to tweak along the way.