Here's my bottom line re testing: Wherever possible, we should be relying on direct evidence of the domain we're investigating. And the final judges should be close to the action being judged. If they need or value information from standardized tests, they should be free to do so. (Otherwise large-scale standardized testing should be sample-based for the purpose of gaining information on trends, regions, subjects, etc—not on individual students or schools)
We're always in the end resting our case on fallible judgment. Until and unless we turn ourselves into robots. The further my evidence is from the object of judgment, the less reliable.
The above words are from Road Testing for Schools, a posting by Deb Meier on the joint blog she shares with Diane Ravitch entitled Bridging Differences. It caught my attention because it overlaps very much with my own thinking, and with what I was doing yesterday. I will explain a bit below the fold.
This is one time I cannot go into as much detail as I would like. I have recently had the opportunity to participate in some serious lobbying on the Hill about NCLB. It is fair to say that any lobbying about NCLB is likely to address issues with the entire approach of how we measure and what we use those measurements for. The material I have quoted, which is from the first paragraph and the beginning of the second paragraph of Meier's blog post, points at something which greatly concerns me: the way we test is so removed from what SHOULD be occurring in the classroom that it neither accurately represents what the students can do nor provides meaningful information to the teacher and the student that can be used to inform and improve instruction FOR THE STUDENT BEING TESTED.
Recently we have had reports about civic education showing improvement, at least as shown by the National Assessment of Educational Progress (NAEP) scores. NAEP serves as the nation's educational report card, and is a periodic sampling, not entirely random, of students and schools across the country. It provides a snapshot that can be compared to previous snapshots (not taken every year) to provide trend information on educational progress. It does not realistically allow one to draw specific inferences about individual schools (they may not be included in samples from one year to the next) and certainly allows no inferences to be drawn about individual students. But it does serve as a check on trends in the nation as a whole, and in the past it has allowed one to check claims about performance within states. Of course, its value has largely been because there have not been high stakes attached to it, which allows it to serve as a neutral indicator.
To me as a teacher, the most meaningful evaluation of what a student can do is to see the student do something: I want a meaningful performance task, or series of such tasks, in which a student demonstrates knowledge and skill by doing a task or a series of tasks relevant to the domain(s) in which s/he is being instructed. Further, I want the student to be able to explain and reflect upon what s/he has done. And by accumulating a collection of such tasks over time (a portfolio, if you will), we gain evidence of improvement over the time that student has been in my class.
Because the tasks included in such an evaluation approach can - and should - be evaluated by the teacher (after sufficient training so that the teacher can do the evaluation in a consistent fashion), it enables the kind of feedback so essential to improving learning and instruction to occur in a timely fashion, which one-shot end-of-year tests do not. And it provides evidence that is both clearer and more comprehensive for a growth model - the improvement the student has made during the time for which I have borne instructional responsibility for her learning.
Let me use Deb Meier's words to provide the clearest example of what I mean by performance tasks:
If I want to know if I should trust Sam to drive a car on his own, the best source for deciding this is to ask someone who has driven with him. I could design a more efficient method—a test that "correlates" with some other criteria for measuring good driving (number of accidents?). But even so, once the word was "out", the correlation would disappear. Even on a driving road test, if I know exactly what they are going to ask me to do (and where), I may narrow my practice down to those particulars. On manual cars, stopping and starting on a hill was the supreme test. "Lucky" were those whose test route didn't include a hill, "phew". No more hours and hours of practicing for that skill. If I want information on the status of U.S. drivers, not just Sam, I'd sample the population with a good road test.
For all its faults, compared with the written test, the road test still is the real thing. The hardest bubble-in or "constructed text" paper-and-pencil test on driving won't be of any use to me at all in deciding whether to hand my car over to Sam. Yet what we have done in schooling is try to make the paper-and-pencil driver's test harder, and give it more often, and eliminate the performance test entirely. (Do we agree so far?)
Deb's post is fairly short. She talks about the work of Ted Sizer, who tried to persuade us through his writing that the best way to evaluate what our students were learning was through the use of performance tasks and portfolios. For the past several decades the Coalition of Essential Schools has followed Sizer's approach. Deb Meier has established and run several schools using those principles, schools which have been successful in educating students from the kinds of backgrounds where they often struggle in our current model of education. To see the kind of thing one can use as a summative assessment, one can explore what CES provides during its National Exhibition Month (there is a link in Deb's blog post).
I don't intend this diary to teach you everything there is to know about this different approach to assessment. It can be used to ensure that teachers and schools are fulfilling their responsibilities - and note that I prefer the terminology of responsibility, which implies ownership of the task, to accountability, which has become so negatively charged. As a teacher I am responsible for my students regardless of any external oversight of what occurs (and I accept that there will be external oversight - from my school, my district, and through these at least my state), and like many teachers my standards are far more rigorous than those applied through measures such as end-of-course tests.
I also know that it is possible to use such an approach, sometimes designed and usually applied and evaluated at the local level, for "accountability" purposes even on a statewide basis. This is already being done to one degree or another in New York City, in Rhode Island, in Wyoming, and in Nebraska.
If our goal is to serve all of our students, we need to consider how we use the tasks we assign to inform both the students and their teachers, in a meaningful way, of what the students can do and how much they have learned. I think this is a far superior approach to a series of end-of-course tests, or even pre- and post-testing, which is unfortunately now the predominant (and sometimes the sole) method of assessment being used.
I will be interested in your responses.