
There's a lot of talk these days about testing, fairness, and education. A lot of this relates to the field known as psychometrics. Since it seems like some of this is relevant to some people here, and since that's what I got my PhD in, I thought I'd share some thoughts. A gentle introduction to the field, if you will. So, if you will, join on the flip, and if you won't, then I'll see you on other diaries.

Any measurement of any quality - weight, income, intelligence, scholastic aptitude, whatever - has certain properties that can be used to judge how good a measurement it is. Three that are of concern to psychometricians are reliability, validity, and bias.

Reliability is whether a measure is consistent: whether it gives about the same result each time.
The most common ways to assess reliability are test-retest reliability and split-half reliability. Test-retest means you give the same test (or measure) to the same people more than once, and see how similar their scores are. Split-half means you divide the items on a test into two parts, and see if the two parts give similar scores.
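
For the curious, here is a minimal sketch of both checks in Python. Everything in it is an assumption for illustration: the scores are made up, and the function names (`pearson`, `retest_reliability`, `split_half_reliability`) are my own, not from any standard package. The split-half version splits items into odd- and even-numbered halves and applies the Spearman-Brown correction, which is one common convention among several.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def retest_reliability(first, second):
    """Test-retest: correlate scores from two sittings of the same test."""
    return pearson(first, second)

def split_half_reliability(item_scores):
    """Split-half: item_scores holds one list of item scores per person."""
    half_a = [sum(person[0::2]) for person in item_scores]  # items 1, 3, 5, ...
    half_b = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r = pearson(half_a, half_b)
    # Spearman-Brown correction: estimate reliability of the full-length test.
    return 2 * r / (1 + r)

# Hypothetical scores for five people who took the same test twice.
first_try = [98, 105, 110, 121, 130]
second_try = [101, 103, 112, 119, 133]
print("test-retest:", round(retest_reliability(first_try, second_try), 3))

# Hypothetical right/wrong (1/0) answers: one row per person, six items each.
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print("split-half:", round(split_half_reliability(answers), 3))
```

Values near 1 mean the scores line up well; values near 0 mean the measure is mostly noise.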

Validity is whether a measure measures what you think it measures. Although there are lots of types, they boil down to two sorts of things: face validity (does the measure seem right?) and all the other kinds, which all involve seeing whether the measure is related to things you think it should be, and not related to things it shouldn't be.

Bias is systematic over- or under-estimation.

For something like weight, this is relatively straightforward. Step on a scale. Get off. Step back on. Do the numbers match? Are they close? That's a measure of reliability. (Split-half reliability isn't really meaningful here.) If you add a weight to the scale, do the numbers go up? Do things that seem bigger and heavier weigh more than things that seem smaller and lighter? Do you weigh more with your clothes on than naked? More after a big meal than before? All those are signs of validity. OTOH, if your weight changes when you touch your nose, that's not so good.

Bias would mean that a scale consistently overestimated or underestimated your weight. If three scales say you weigh 120 pounds, and another says you weigh 130, and there are similar differences for other people, then at least one is biased.
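
The scale example can be sketched in a few lines of Python. The weights are made up, and `mean_error` is a name I chose for illustration. Note what it does and doesn't show: it measures how far one scale sits from the others on average, but without a known true weight it can't tell you which scale is the biased one.

```python
def mean_error(readings, reference):
    """Average signed difference: reading minus reference, per person."""
    diffs = [r - ref for r, ref in zip(readings, reference)]
    return sum(diffs) / len(diffs)

# Hypothetical weights (in pounds) for four people.
consensus = [120, 150, 180, 200]  # what three of the scales agree on
fourth = [130, 160, 190, 210]     # the odd scale out

print(mean_error(fourth, consensus))  # prints 10.0
```

A scale like the fourth one can be perfectly reliable (it gives the same number every time) and still be off by ten pounds, which is why reliability and bias are separate questions.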

When it comes to things like intelligence or scholastic aptitude, though, things are murky. They are also controversial. This combination is hardly one to lead to sensible debate! For intelligence, in fact, things are so murky that they might as well be statements by George Bush.

Why? Well, for intelligence, no one knows what it means. There's no agreed-upon definition. So, does IQ measure intelligence? Well, if you tell me what intelligence is, I will tell you if IQ measures it. I will say, though, that IQ seems to be related to something that most people seem to think of as intelligence. In general. For example, if you had a 10 minute conversation with 10 people who had IQs of 140, 10 with IQs of 100, and 10 with IQs of 60, you'd probably be able to guess which is which. And IQ tests are at least moderately reliable (not nearly as reliable as bathroom scales, but better than tea leaves or astrology), so they are measuring something.

For scholastic aptitude, things are a little better. We have at least some idea what we mean by that. It has something to do with getting good grades. But not exactly, because it is APTITUDE, and isn't intended to measure desire, or time availability, or a dozen other things.
But there's a bigger problem with assessing the validity of SATs - which college you go to is determined in part by your SAT score, and what grades you get is determined in part by which college you go to. There are ways around this; in particular, there's a statistical technique called hierarchical linear models (aka mixed models, random effects models, and several other terms) that seems promising. I have, however, seen no studies using this method for SATs and college grades.
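
Here is a toy Python sketch of why the nesting matters. To be clear about what it is: this is NOT a hierarchical linear model. It is a much simpler stand-in (centering each student's numbers on their own college's average before correlating), and a real mixed model would need a proper statistics package. The two colleges, the SAT scores, and the GPAs are all invented, and I've rigged them so that each college hands out the same spread of grades no matter who enrolls.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def center(values):
    """Subtract the group's own mean from each value."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Hypothetical data: students are sorted into colleges by SAT score,
# and each college grades on its own curve.
colleges = {
    "selective": {"sat": [1300, 1400, 1500], "gpa": [2.5, 3.0, 3.5]},
    "open_admission": {"sat": [900, 1000, 1100], "gpa": [2.5, 3.0, 3.5]},
}

# Naive approach: lump everyone together and ignore which college they attend.
all_sat = [s for c in colleges.values() for s in c["sat"]]
all_gpa = [g for c in colleges.values() for g in c["gpa"]]
pooled_r = pearson(all_sat, all_gpa)

# Within-college approach: compare each student only to classmates.
centered_sat = [d for c in colleges.values() for d in center(c["sat"])]
centered_gpa = [d for c in colleges.values() for d in center(c["gpa"])]
within_r = pearson(centered_sat, centered_gpa)

print(round(pooled_r, 2), round(within_r, 2))  # prints 0.38 1.0
```

In this rigged example, SAT scores line up perfectly with grades inside each college, yet the lumped-together correlation looks weak. Real data won't be this clean, but it shows how ignoring the college can distort the picture.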

There's even MORE to the problem, because the SATs might be more valid for some people than for others.  In particular, people with various disabilities (blindness, deafness, and various learning disabilities) may be measured less validly than more typical people.

SATs, like IQs, are at least somewhat reliable.

The real controversy comes in with regard to bias. Some groups, in particular certain racial and ethnic groups, do worse on IQs and SATs than other groups. That's just a fact, and it's not controversial. What is blazingly controversial is why these differences exist. Part of it is due to factors that correlate with race - number of siblings, parental income, parental education, likelihood of one versus two parents in the home. But, AFAIK, some differences still persist, and no one is sure why. I certainly don't know.

But another part of the controversy is whether SATs should be used for college admission, given the above.  Well, the real problem here is what to use instead.  Typical choices are high school grades, college entrance essays, and interviews.  The problem is, these other methods are even more biased than SATs.  And interviews have MUCH lower reliabilities than SATs.

I don't have any answers, but I hope this at least illuminates the questions.

Originally posted to plf515 on Tue Apr 11, 2006 at 07:22 PM PDT.


  •  Tips? Comments? Arguments? (4+ / 0-)
    Recommended by:
    Emerson, hubcap, ek hornbeck, Capn Guts

    This is the spot.

    "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

    by plf515 on Tue Apr 11, 2006 at 07:21:46 PM PDT

  •  good start (1+ / 0-)

    i was disappointed that you ended by talking only about SAT's (and IQ's).  in the age of NCLB, other acronyms and weeks of testing play a big role in making school boring... and graduation impossible... for a whole lot of kids.  see e.g. http://www.civilrightsproject.harvar...

    •  Thanks (0+ / 0-)

      I don't know that much about the NCLB tests; I studied this stuff long before that happened.

      I do agree that a lot of NCLB is just plain dumb, and I've said so in comments on other diaries (I think I made these comments on one of TeacherKen's diaries)

      "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

      by plf515 on Tue Apr 11, 2006 at 07:36:30 PM PDT

      [ Parent ]

      •  The testing for NCLB (1+ / 0-)

        should be measuring achievement, which is completely different from IQ testing, and somewhat different from aptitude testing.

        Aptitude testing like the SAT seeks to predict future performance, and although some of the measurement may be of things you already know or are expected to know (and a lot of the SAT is), the goal of the test is still predictive.

        Achievement testing, which is what most classroom tests are, simply tries to measure if you've mastered certain learning objectives (and even higher-order skills, like analysis, criticism and synthesis can be learning objectives).

        At least that's my recollection from a 3 credit course in Test and Measurement - no PhD here.

        We all go a little mad sometimes - Norman Bates

        by badger on Tue Apr 11, 2006 at 08:05:28 PM PDT

        [ Parent ]

        •  OK (0+ / 0-)

          That's about what I knew about the NCLB tests - what they are supposed to be doing.  But I haven't seen anything on their reliability or validity.

          The SAT isn't supposed to measure future performance, at least not exactly. It's supposed to measure aptitude, which is a little different. SAT scores ought to measure something like 'future performance if effort were equal'.

          "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

          by plf515 on Wed Apr 12, 2006 at 03:43:39 AM PDT

          [ Parent ]

  •  I think... (1+ / 0-)

    ...that the SATs have some use.  For example, I enrolled in a community college and my SAT scores were high enough to allow me to skip the placement tests normally required.  So it personally benefitted me.

    However, to use it as a measure of intelligence is just stupid.  My SAT didn't get me into college, or help me get in.  An enrollment form and $45 did that.  Many smart people suck at tests.  So using them for college admissions seems pointless to me.

    We will appoint as...officials, only men that know the law of the realm and are minded to keep it well. -- Magna Carta, #46 (-6.25, -7.18)

    by DH from MD on Tue Apr 11, 2006 at 07:30:45 PM PDT

    •  What should we use instead? (0+ / 0-)

      I agree that there are people who suck at tests but would do well in school. And people who do well on tests, but won't do well in college. No one is proposing (least of all me) that they be the ONLY criterion. But the other criteria are, if anything, worse.

      "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

      by plf515 on Tue Apr 11, 2006 at 07:34:34 PM PDT

      [ Parent ]

      •  I don't know (0+ / 0-)

        I'd suggest using mainly high school scores and merits for admissions.  Though I didn't mean to imply anything about you.

        We will appoint as...officials, only men that know the law of the realm and are minded to keep it well. -- Magna Carta, #46 (-6.25, -7.18)

        by DH from MD on Tue Apr 11, 2006 at 08:34:46 PM PDT

        [ Parent ]

        •  The problem with HS grades (0+ / 0-)

          The problem with HS grades is 1)  that they aren't comparable to each other - there are thousands and thousands of high schools, and an A from one is not necessarily an A from another.  This is helped a little by using class rank.  But some HS are now refusing to release class rank (saying it puts too much pressure on kids) and even when they do, being in the top 10% of different classes means different things in different HS as well.  2) They are also subject to bias - teachers are human, and may, consciously or not, grade kids unfairly.

          OTOH, HS grades DO measure some things that SATs don't, such as effort in class. So, I think we should use both.

          "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

          by plf515 on Wed Apr 12, 2006 at 03:47:57 AM PDT

          [ Parent ]

          •  and yet correlate w/college grades (0+ / 0-)
            at least as well as SAT, which lacks any predictive value beyond the 1st year, for a variety of reasons, and which really does not account for that much of the variance in 1st year college grades.

            Those who can, do. Those who can do more, TEACH!

            by teacherken on Wed Apr 12, 2006 at 07:16:18 AM PDT

            [ Parent ]

            •  Data (0+ / 0-)

              I have seen others say this as well, but never seen the actual data. Have you got a link? Or a citation? I tried googling (including Google Scholar) and found very little.

              But there is also the statistical problem - as I mentioned in my diary. Straight correlation or regression would assume that the observations are independent, when they clearly are not. First, GPA is not independent of school; second, if you try to figure out multiple years, then GPA in year 2 is not independent of GPA in year 1.

              Then there's the problem of figuring in for people who drop out...

              Very tricky.  I'd be very interested in any articles you (or anyone) can find on this

              "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

              by plf515 on Wed Apr 12, 2006 at 07:21:33 AM PDT

              [ Parent ]

              •  not readily available (0+ / 0-)

                and sorry, today is the first of two days set aside for doing taxes, so I cannot search.  I know I have stuff in hardcopy someplace that cites this, but cannot look for it now.

                also, given my own experience of doing test prep  - for Princeton and another company, a total of 5 years  -  I would question the reliability of SAT because of how much I was able to raise scores.

                I am not a fan of SAT, even though I personally do quite well on such tests.

                Those who can, do. Those who can do more, TEACH!

                by teacherken on Wed Apr 12, 2006 at 07:26:54 AM PDT

                [ Parent ]

  •  Rehabilitate the SAT (0+ / 0-)

    The SAT actually began as a way to make college more accessible to less privileged students. Before the SAT, admissions at top colleges were sometimes granted based upon nothing more than a letter of introduction from a well-placed family member or school dean.

    The SAT could be seen as a democratic challenge to admissions policies that are biased in favor of privilege. Unfortunately, the same inequities in education that it was meant to address simply repeated themselves in the training for the test. So privileged kids get $100/hour tutoring and poor, public-school students get nothing but a foreign-looking puzzle.

    What's worse, the test gets slammed (often deservedly) in a way that can only diminish the motivation of the students who would benefit from it the most.

    I'm working on it, though.

  •  One question. And you better get it right. (1+ / 0-)

    At least SAT and ACT tests are, in the main, voluntary and used by colleges who can pick and choose how much weight to give them.

    The same cannot be said about the  State tests under NCLB which are given unusually high credence by government officials in terms of validity and reliability.

    For the most part, they attempt to measure whether or not a student is scoring at "grade level".  What is grade level?  Well, that is the median score in a random sample of students in that grade:  half score above, half score below.  

    And so, under NCLB, the trick is to get everyone, 100% up to grade level.  But you see the problem, we already have determined that grade level is simply a statistical mid-point and does not really exist in reality.  In reality, kids are coming and going from schools, rising and falling in interest, attention, engagement--in point of fact, just about everything imaginable is happening to those kids.

    And yet, if just 1 of them does not test up to the median for their grade level, the school will be judged to have failed by 2013.  

    Psychometrician Man, tell me if this makes sense to you on a statistical level.

    Education? Teaching? NCLB? Read my book _Becoming Mr. Henry_

    by Mi Corazon on Tue Apr 11, 2006 at 08:05:54 PM PDT

    •  Lake Wobegon already compliant (1+ / 0-)
      Recommended by:
      Mi Corazon

      Every week the report says that 'all the children are above average'.

      One small school district in Minnesota down, rest of United States to go.

      Live Free or Die-words to live by

      by ForFreedom on Tue Apr 11, 2006 at 08:41:49 PM PDT

      [ Parent ]

    •  Well, no, that makes no sense (0+ / 0-)

      I am not that familiar with NCLB legislation.  But it's hard to believe that even the Republicans would be dumb enough to say that everyone has to be above the median - that's an impossibility.  The median is the point where 50% are below and 50% above.

      Others here will know more, but I THINK what they say is that all the kids have to be at some proficiency level.

      Even then, though, the idea of putting that much emphasis on one test, with limited reliability and validity, is highly questionable.  

      "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

      by plf515 on Wed Apr 12, 2006 at 03:52:08 AM PDT

      [ Parent ]

      •  depends where state sets proficiency (1+ / 0-)
        in theory a state can set proficiency levels at a high enough point that you are trying to achieve Lake Wobegon effects

        or because of the political implications of too many children "failing" the state could set its levels so low that almost everyone is shown as proficient.

        We have clear examples of states manipulating cut scores on their own tests to show "improvement"  --  in Virginia this happened with SOL tests for US history at both the high school and middle school levels.  Most of the "improvement" in pass rates between the testing in 2001 and that in  2003 was because the cut scores were significantly lower.

        With respect to NCLB, in theory NAEP is supposed to be used as a control, but that means it will lose its value as a research evaluator because states will begin to game it, as they already have in so-called state NAEP. And that is still independent of what the proficiency levels of NAEP mean, a point on which Bracey for one fulminates with regularity.

        Those who can, do. Those who can do more, TEACH!

        by teacherken on Wed Apr 12, 2006 at 07:21:04 AM PDT

        [ Parent ]

  •  Not familiar with SAT... (1+ / 0-)
    but I watched my husband go through GMAT.

    I don't think it measures what it purports to measure. For one thing, there are all those books that "train" you to get a higher score. And they work. They work by teaching people how to get around the weird quirks of the GMAT questions. That's not measuring "aptitude" at English, or math, or anything but reading the test producers' minds.

    I have been reading since I was three. I was reading Dickens for pleasure when I was ten; I have probably read at least 40,000 books in my life. I edited a small newspaper for five years. I have written graduate-level papers, corporate vision statements, technical documentation, cost-benefit studies for new technology, letters to the editor, comments on usenet and blogs, a number of short stories, and two novels. I have made a particular study all my life of the effective use of English. I've been at it for sixty-plus years, and I'm still working at it.

    I say this not to brag, but to make a point. You see, if I had taken the test without preparation, I would have scored quite badly in the language section. Now, if I can't do reasonably well on a test that supposedly measures "aptitude" for English, then I'm confident it's not me that's at fault. It's the test.

    I would have done worse, in fact, than some of my husband's fellow students whose native language was not English, who had little grasp of idiom and could barely construct a grammatical sentence. They did well only because they had drilled on sample test questions over and over and over again.

    That drilling, by the way, did not help them to handle the level of English required by an MBA program. They struggled; if they were lucky, they'd have a fellow team member who could edit their contribution to the team's assignment to be readable.

    My conclusion? The GMAT doesn't actually measure anything except the learnable but essentially useless skill of acing the GMAT.

    Folly is fractal: the closer you look at it, the more of it there is. - TNH

    by Canadian Reader on Tue Apr 11, 2006 at 09:17:44 PM PDT

    •  A good point, clearly (0+ / 0-)

      Yes, the role of the tutoring firms and preparation books is problematic. The evidence for their effect on SAT scores is not conclusive, but there is at least some evidence that they help. They may help more for the GMAT, which is more specialized.

      Unfortunately, there is no way to ban these.....

      As for your particular case; there's a Yiddish saying "For instance is not proof".  I admit, as do all responsible parties, that some people who are highly competent (obviously you are)  do badly on these tests.

      "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

      by plf515 on Wed Apr 12, 2006 at 03:56:04 AM PDT

      [ Parent ]

      •  The interesting thing is (0+ / 0-)
        the reason I would have done badly.

        For many of the sample questions I saw, the "right" answer depended, not on accurate comprehension or a large vocabulary, but on noticing deliberate trickery. In real life, the texts and academic papers a grad student is required to read may at times be obscurely or awkwardly phrased, and will often use very specialized terminology -- but they are never purposely deceptive.

        The "training" works because the people constructing the GMAT test have only a limited bag of tricks. Once you have seen them repeated often enough, you can learn to avoid the traps.

        On the composition questions, GMAT scoring rewards rote learning of a handful of pre-approved formulas that can be quickly filled out with details, rather than original thought or clear, effective expression. The training books are quite frank about this. They explicitly tell students that these formulas they're learning will be of no use to them in graduate studies.

        The problem is not the existence of preparation books, and the solution is not to ban them. They would not be able to improve scores -- nobody would want to buy them -- if the GMAT were really testing what it pretends to test: the ability to excel in graduate school.

        I think whatever correlation is found to exist between the test and success in grad school has a common third cause. The test selects, not so much for aptitude, as for a pragmatic willingness (and ability) to game the system. That is a success factor of sorts, I suppose, but it's not the one the GMAT claims to be measuring.

        Moreover, it penalizes some students who are naively idealistic and interested only in the subject matter they want to study. Unfortunately, it is this latter group, caring only about knowledge for knowledge's sake, too impatient to waste time on anything as irrelevant as gaming the system, that is most likely to give us a real advancement in human understanding of the world. What is needed is a better test, one that can select for the people who really should be in grad school.

        Folly is fractal: the closer you look at it, the more of it there is. - TNH

        by Canadian Reader on Wed Apr 12, 2006 at 09:20:18 AM PDT

        [ Parent ]

  •  sorry to be so late to discussion (1+ / 0-)
    had not known of diary until receiving your email.  I would have recommended for the quality of clear explanations of key terminology.

    Keep posting.

    Those who can, do. Those who can do more, TEACH!

    by teacherken on Wed Apr 12, 2006 at 07:22:06 AM PDT

    •  thanks! (0+ / 0-)


      It is so hard to keep track of all the diaries here!
      And I keep adding people to my hotlist!

      I'm gonna have to retire just to read daily kos!  

      "Necessity is the plea for every infringement of human freedom. It is the argument of tyrants, and the creed of slaves." William Pitt

      by plf515 on Wed Apr 12, 2006 at 07:26:15 AM PDT

      [ Parent ]
