Oct 7, 2011

The human factor

It is a gorgeous fall Friday, one of those days that puts one's senses into a state of hyper-alertness. Certain smells, shades, and views bring out misfiled, but never quite discarded, memories. This is all I want to think and write about.

However, I am still at work, so here are my five cents on the new teacher evaluation system that RIDE is implementing this year. None of this is news to them; I have had many opportunities to share my thoughts with the RIDE team members responsible for this impressive project. I am writing as a concerned friend, not as a disengaged critic. The system has real potential, and I very much want it to succeed.

The main idea is to use a value-added model to evaluate teacher effectiveness. In other words, if your students show growth, you must be an effective teacher; if they do not, you are not effective, whatever you say and whatever your credentials are. Intuitively, this makes a lot of sense to policy makers and to the public at large. And RIDE's statistics experts developed a very clever model that measures just the growth, not the absolute test scores, against the average growth rates in the state. There are also multiple safety checks in place so that teachers are not dismissed by accident. First, the growth model is only about one third of the evaluation; the rest is observations and professionalism. Second, you would need to show several years of low performance to actually be dismissed. Third, you will be offered help along the way.
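The growth-versus-level distinction can be shown with a toy sketch. To be clear: this is only an illustration of the general idea, not RIDE's actual model, which is far more sophisticated; the state-average figure and the scores are made up.

```python
# Toy illustration (NOT RIDE's actual model): a value-added approach
# compares each student's score *gain* to the average gain statewide,
# rather than comparing raw score levels.

STATE_AVG_GROWTH = 12.0  # hypothetical average score gain across the state

def relative_growth(pre_score, post_score):
    """A student's score gain, measured relative to the state-average gain."""
    return (post_score - pre_score) - STATE_AVG_GROWTH

# A low-scoring class that grows faster than the state average
# still reflects well on its teacher:
print(relative_growth(pre_score=40, post_score=58))   # 6.0 above average

# A very high-scoring class with little room left to grow can look
# unimpressive, even though its absolute scores are the highest:
print(relative_growth(pre_score=90, post_score=95))   # -7.0 below average
```

The second call is exactly the ceiling effect mentioned below: teachers of students who already score near the top of the scale can show little measurable growth.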

There are still serious scholarly concerns about how the model will behave in a large-scale trial. Most of them have to do with the stability of the measurements. If you score as excellent one year but poor the next, the measure is not likely to be accurate. This happens more often than people expect: an instrument may test as valid, but after its use is scaled up, or after the conditions of use change (say, from clinical to field application), it can lose its reliability and validity. The non-measured, external influences may become too strong, the sample may become less random (more biased), and its size may turn out to be too small. Will that happen in this case? We don't know yet. The RIDE team has run some older data through the model, and it seems to check out. But no one can say it will be fine once the data is collected in the context of a high-stakes system. The math in the model is not really the problem. (Well, it may disadvantage teachers who work with gifted students, who tend to score very high on any test to begin with, so their growth may not look as impressive. And it may reflect poorly on teachers who work with students so far behind that their growth is invisible on the available instruments.)

Once the evaluation system is established, people will start manipulating it, consciously or unconsciously. That pull may or may not be strong enough to undermine the validity of the central measure, but we simply cannot tell in advance. It is very hard to predict how the pressures of the new system will affect teachers' and principals' behavior. For example, if I teach subjects and grades not covered by NECAP, I get to establish my own learning objectives and measure their achievement with an instrument I construct myself. There are very good guidelines on learning objectives, and they can be mastered, no doubt. But it takes years of trying to develop a good sense of what is achievable and to construct a good instrument for measuring growth, and people may set learning objectives too high or too low. Every incentive, though, pushes toward setting them too low.

There is a comprehensive lesson observation and teacher evaluation tool built on Charlotte Danielson's framework. RIDE estimates that a principal will spend 10 hours a year evaluating each teacher. I think that is an underestimate, because the learning curve needs to be factored in. Coventry High School has 172 full-time teachers, and Frank D. Spaziano Annex Elementary School has 8; the average is 42. Even at the most optimistic estimate, that adds up to 420 hours, or 56 full days (assuming a 7.5-hour workday), or 11 full weeks. In an average school, roughly one third of the entire school year is gone from the principal's time budget, if she or he did it all alone.
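The time-budget arithmetic above can be checked directly. The teacher counts and the 10-hour estimate come from the post; the 7.5-hour workday is the post's stated assumption, and the 5-day work week is mine.

```python
# Back-of-the-envelope check of the principal's evaluation time budget.
# Inputs: average school size and RIDE's per-teacher estimate (from the post);
# 7.5-hour day (from the post) and 5-day week (my assumption).

teachers_avg = 42          # average full-time teachers per school
hours_per_teacher = 10     # RIDE's estimated evaluation hours per teacher/year
workday_hours = 7.5
workdays_per_week = 5

total_hours = teachers_avg * hours_per_teacher        # 420 hours
full_days = total_hours / workday_hours               # 56.0 workdays
full_weeks = full_days / workdays_per_week            # 11.2 work weeks

print(total_hours, full_days, full_weeks)  # 420 56.0 11.2
```

Against a school year of roughly 36 to 40 weeks, 11-plus weeks is indeed close to a third of a principal's working time.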

And then again, enter the human factor. Most of the observation criteria are by necessity vague, the time is very limited, and the stakes are fairly high. In my experience, this is the recipe for the "regression to the excellent" phenomenon we are struggling with in teacher preparation. If you are a principal checking 80 items within 50 minutes, and you know it actually matters, you will be tempted to rate everyone high, just to be safe. Then you end up with flat, uninteresting data, where everyone is above average. Looking at that data will only reinforce your low buy-in. That is the real danger. Once people lose faith in the system (even when they are at fault), their next cycle of observations becomes even less accurate. Why should I care if this does not tell me anything useful anyway? You can probably tell I am speaking from experience here. What begins as a big scare ends up as the biggest joke.

Now, I don't want any of these things to happen, and I hope they won't. This is not a call to abandon or dismantle the new evaluation system. We should give it a very serious try, and work earnestly on applying what we all learn. Expect years of discovering new unintended consequences, and of fixing them, one by one, rather than despairing.

I do, however, believe that the timelines set by the federal government are utterly unrealistic. The State's educators, led by the "RIDE rangers," no matter how competent and hard-working, simply cannot deliver a functioning evaluation system within a couple of years. It would also be absolutely unrealistic to count on that system working properly within the next five years. So when we pin other items on the reform agenda to this unrealistic hope, we only increase the uncertainty. For example, moving the professional development requirements from certification into the new evaluation system is hugely risky. We are dismantling one quality control mechanism on the pure hope that the new one will be better. Yes, the old system was not that great, but at least we know it worked somewhat. Remember, American education has been slowly improving over the last thirty years by almost every measure available.

There is a huge distance between a promising idea and a working public policy, with all its underlying processes and procedures. The new evaluation system raises the level of complexity tenfold, because of the sophisticated information technology requirements and the number of decisions that need to be made and recorded. One cannot expect a Great Leap Forward. Didn't we try this before? Goals 2000, anyone? We all remember what happened: the financial collapse, the stimulus money, the mad rush to spend it. Mistakes have been made, but they must be corrected. A sense of urgency is great, but not when it can actually make things worse rather than better.

This is not really a message to RIDE; they cannot do anything about the timelines in the Race to the Top grant, which the entire State (with the exception of higher education) happily signed on to. The feds screwed up (which never happened before, right?). We should try to persuade our Congressional delegation to work with the federal Department of Education to allow for more flexibility.

I worry that every new failed reform undermines our collective ability to hope, to learn, and to trust each other. And we need all three of those things to move education forward. Hope, learning, and trust are what we need most. It is easy to get cynical and just wait for all this to pass, for stuff to hit the fan, etc., etc. That is not much of an option, really. Educators in this State have already invested an ocean of energy into the reform. Let's just do it right this time.

2 comments:

  1. Well said - do wish all those legislators would read your blog! I saw new programs fail in international schools too, when even good faculty were worn down with new program after new program, and the previous ones were 'erased'. Let's hope this one is done slowly enough to tweak along the way.
    Connie

  2. Well said - I do wish all the legislators would read your blog.
    I also saw reforms in international schools come and then be 'erased' for a new one, over and over, and even very good teachers lose hope and get discouraged. Here's hoping this reform will be done slowly enough to tweak it as we go along.
