I just read Teacherken's diary on the subject of teacher evaluations with some interest. One of the reasons I don't feel drawn to write very many diaries is that Teacherken normally says what needs to be said before I've even had the chance to start writing. But this time I think some substantive things were left out of the discussion. By dint of having spent most of their childhood in a classroom, everyone thinks they're an expert on education; the call for evaluations by politicians, pundits and parents is a perfect example of this. There haven't been enough voices from the classroom speaking against the lunacy. Not everyone is waiting for Superman. I, for one, think that the current move towards 'evaluating teachers' is wrong. I may not have Teacherken's decade-plus of experience, and my subject is art, not civics, but even fifth-year art teachers have a stake in this. Let me say it bluntly: the current move to enshrine 'teacher evaluations' is misguided and potentially destructive, if not outright evil. But why are evaluations so bad?
Teachers Are Already Evaluated
The first problem is that evaluating teacher performance is not new. Although there is a notion that teachers work a few years, gain tenure, then stop trying because their jobs are 'safe,' it simply isn't true for the majority of teachers. Teachers have lost their jobs for all manner of reasons: being a Wiccan in Florida, bringing kids to an art museum with nudes in it in Texas. They weren't safe, tenured or not. If they are lucky, such teachers might have professional appraisal instruments to fall back on when a grudge-bearing parent or tin-pot dictator of an admin has it out for them. That's actually why tenure exists in the first place: to prevent outside influences from stifling academic discussion and debate. Even in instances where a teacher is terminated for reasons of competence, the proof of their incompetence comes in the form of their professional appraisals. The system already works, and has for decades. Why are we suddenly so gung-ho to change it?
Part of the problem is one of politicians talking out of both sides of their mouths. A recent study conducted in North Carolina found that most government employees, including teachers, were rated as 'above standard.' This was noted, with some sarcasm, as requiring that the 'law of averages' be rewritten; they can't all be above standard, after all. But that is exactly what is called for by No Child Left Behind: teachers can no longer be merely qualified to teach, they must all be highly qualified. Since the continuing qualification to teach is based on ongoing evaluations, by law all teachers must be above standard.
Testing Doesn't Prove Anything
There is another popular notion that test scores prove excellence in teaching. I'm calling BS on that one. Testing, at its best, only proves the ability of that student to take that test at that time. In order for any test to be considered statistically valid, it would have to be taken multiple times and in diverse settings to eliminate the inherent randomness of the resultant score. Even then it may not be enough. If Johnny Student takes the same test five times under different circumstances and scores an 85 every time, you could be fairly confident he had mastered 85 percent of the material. But if his sister Suzie has scores ranging from 25 to 100, could you have the same confidence in her average score? What if the 15 percent Johnny got wrong was different every time? Even if repeated testing worked perfectly, there isn't enough time in the year to test everything enough times to eliminate statistical error.
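To put rough numbers on the Johnny and Suzie comparison, here is a minimal Python sketch. Suzie's individual scores are made up for illustration (the only thing given above is that they range from 25 to 100), and the `summarize` helper is a hypothetical name, not part of any real evaluation system. It uses the standard error of the mean as one crude way to say how far an average can be trusted.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (average, spread, rough standard error of the average)."""
    avg = mean(scores)
    spread = stdev(scores)  # sample standard deviation across the retakes
    std_err = spread / sqrt(len(scores))  # how shaky the average itself is
    return avg, spread, std_err


johnny = [85, 85, 85, 85, 85]
suzie = [100, 25, 100, 100, 100]  # hypothetical scores, chosen to span 25 to 100

for name, scores in (("Johnny", johnny), ("Suzie", suzie)):
    avg, spread, std_err = summarize(scores)
    print(f"{name}: average {avg:.1f}, spread {spread:.1f}, give or take {std_err:.1f}")
```

With these invented numbers both kids average 85, but Johnny's average has no wobble at all while Suzie's is good only to within about 15 points either way. The single number a score report would show is identical for both.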
Because of this statistical difficulty, the pay-for-performance cronies use two tricks to try to make testing-as-evaluation seem worthwhile. The first is to aggregate the data. The problem with aggregation is that it has no mechanism to increase our confidence in any individual score (repeated testing does so because one student's scores can be compared one to one, but how would glomming together Johnny and Suzie's scores prove that either score is a valid measure of material learned?). The resultant set of data is used to compare one teacher to another, one class to another, one school to another and all of these to an aggregation of the whole. How is this valid? Mr. Simmon's kids are not Ms. Murray's kids; comparing their aggregate scores is as meaningless as pulling a single child's scores from each class at random and then claiming one teacher was better than another.
The second merit-pay inspired trick is the value-added model. This model assumes a certain level of growth from year to year. If little Suzie grows more than expected, the teacher gets the credit for the difference, having 'added value.' If little Johnny grows less than expected, the teacher similarly carries the blame. But what about Mortimer, who didn't deviate from the expected growth? Apparently that means that the teacher sat on their tuchus and did nothing. This model quite literally assumes that students will grow in their abilities whether there was a teacher there or not. Do we really think that? Do you think that, once you've mastered Trig, Calculus just floats into your brain without any outside assistance? It's like the old Garfield 'learning by osmosis' poster; it'd be silly if so many people didn't take it seriously.
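The arithmetic of the value-added idea, as described above, fits in a few lines of Python. This is a toy sketch of that description only, not any district's actual formula: the expected growth figure, the test scores and the `value_added` function name are all hypothetical.

```python
EXPECTED_GROWTH = 10  # hypothetical: points the model assumes every student gains in a year


def value_added(last_year, this_year, expected=EXPECTED_GROWTH):
    """Growth beyond (or short of) what the model expected."""
    return (this_year - last_year) - expected


# Made-up (last year, this year) scores for the three students in the example.
students = {
    "Suzie": (60, 78),     # grew 18 points
    "Johnny": (70, 74),    # grew 4 points
    "Mortimer": (65, 75),  # grew exactly the expected 10 points
}

credit = {name: value_added(*scores) for name, scores in students.items()}
teacher_score = sum(credit.values()) / len(credit)

for name, points in credit.items():
    print(f"{name}: {points:+d}")
print(f"Teacher's average 'value added': {teacher_score:+.2f}")
```

Mortimer comes out at zero. Ten points of real growth count for nothing on the teacher's ledger, because the model has already assumed them.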
But no matter what type of testing you use, there will always be the problem of which students are taking the test. An incompetent, should-be-fired teacher with a room full of geniuses might get amazing test results (and even apparent added value) while the well-meaning teacher trying to get students to grade level when they are multiple years behind would have every right to be satisfied if the majority barely passed. But under this current testing and evaluation regime, we'd keep the incompetent one, wouldn't we?
What Would a Good Test Be Like?
A well-written test might have five questions relating to each concept, each one asking about a different portion of it or approaching it from a different angle. If the student can answer four out of five of those questions, we can assume they have mastered the concept. We can then measure the proportion of the concepts they have mastered, but it must also be noted that not all concepts are as important as others, so we must also have a way of assigning weight to each section. Even then we must test and retest to try to eliminate any error in the test (otherwise, how do we know that a student who only got one of five in a section didn't know it? What if the questions were faulty?). And even then, such a test can only measure knowledge; no standardized multiple choice test can measure understanding. No matter what, though, test scores alone, whether raw, aggregated or value-added, should not be used to evaluate a teacher's performance.
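Scoring a test like that might look something like the Python sketch below. The four-of-five threshold comes from the description above; the concept names, the weights and the `weighted_mastery` function are hypothetical, picked only to show the weighting.

```python
MASTERY_THRESHOLD = 4  # correct answers out of 5 needed to count a concept as mastered


def weighted_mastery(correct_by_concept, weights):
    """Share of the total weight that sits on concepts the student has mastered."""
    total_weight = sum(weights.values())
    mastered_weight = sum(
        weights[concept]
        for concept, correct in correct_by_concept.items()
        if correct >= MASTERY_THRESHOLD
    )
    return mastered_weight / total_weight


# Made-up results: number of questions answered correctly (out of 5) per concept.
correct_by_concept = {"color theory": 5, "perspective": 4, "art history": 1}

# Made-up weights: the first two concepts count twice as much as the third.
weights = {"color theory": 2, "perspective": 2, "art history": 1}

print(f"Weighted mastery: {weighted_mastery(correct_by_concept, weights):.0%}")
```

This student has mastered two of three concepts, but because those two carry more weight the score is 80 percent rather than 67. The one-of-five on art history still can't tell us whether the student or the questions were at fault.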
So How Should Teachers Be Measured?
If we want to evaluate teachers, we first need to figure out what a good teacher does. It can't be producing high test scores; if that were the case, then the best practice for classroom teachers would be to simply have their students cheat. So what is it? Good teachers motivate and engage their students. Good teachers produce students that are more capable and knowledgeable at the end of the semester/quarter/year than they were at the beginning. Good teachers impart social skills and experience. Good teachers prepare their students for subsequent courses, college and a career. There may be many other things, but we should be deciding that as a community. Whatever you decide a good teacher should do, that, and that alone, is how they should be measured.
So let's look at engagement: are the students there, awake, actively learning? Let's look at motivation: do they do the work and share their learning with others? Let's look at growth: if little Johnny started at a 3rd grade level and ended at a 6th, shouldn't that be an amazing outcome even if the test he was supposed to pass was a 9th grade one? Let's look at community engagement, civic-mindedness and other measures. Let's look at how the students did the year after. Let's talk to the students. Let's talk to the teacher's peers. While we're at it, we might try actually talking to the teachers.