Mar 27, 2017

Thou shalt not worship data, because much of it is no good

Here is what happened: we (higher ed/researchers) oversold the public on the idea of data-informed (or evidence-based) improvement decisions, and are now paying for it. While I see lots and lots of data being collected, decisions that actually use the collected data as a critical piece of evidence are rare. When we have problems, for example in teacher preparation, we know about them before data is available to corroborate them. And when data contradicts the anecdotal evidence, we tend to distrust it.

In those rare occasions where data is reliable, timely, and complete, it is more often than not correlational, and thus tells us little about causality. For example, you may find that class size correlates with failure rate. So what? It is very likely that there is a confounding variable that explains both, one you could not or did not think to measure. If, for example, you find no correlation between student evaluations and grades, then yes, that busts the myth that grade inflation is fueled by student evaluations. While correlation does not imply causation, the lack of correlation usually means the absence of causation (or a measurement error). Similarly, if you find no correlation between the scores on your math placement test and student performance in subsequent courses, your placement test is no good. However, if you do find a correlation, it does not mean the test is good.

So, before collecting anything, the simple check is: what are you going to do with it, exactly? Don't collect data just because it is there, hoping it will yield some useful knowledge – it won't. And if you're an accrediting agency, don't ask for data that will not result in any decision.
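
To make the confounding point concrete, here is a toy simulation with entirely invented numbers, not real institutional data: a hidden "how introductory the course is" variable drives both class size and failure rate, so the two correlate strongly even though neither causes the other.

```python
# Toy illustration: a hidden confounder produces a class-size/failure-rate
# correlation with no causal link between the two. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical course sections

# Hidden confounder: how "introductory" a course is (0 = advanced seminar, 1 = intro survey)
intro_level = rng.uniform(0, 1, n)

# Intro courses tend to be larger AND have higher failure rates,
# but class size itself has no effect on failure in this simulated world.
class_size = 15 + 120 * intro_level + rng.normal(0, 10, n)
failure_rate = 0.05 + 0.20 * intro_level + rng.normal(0, 0.03, n)

r = np.corrcoef(class_size, failure_rate)[0, 1]
print(f"correlation between class size and failure rate: {r:.2f}")
# Prints a strong positive correlation, even though shrinking classes
# would change nothing here.
```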

My basic claim is this: an organization has to evaluate any data collection like this: i = R/U, where R is the resources expended to gather, analyze, and keep the data, and U is the potential usefulness of that data for real decisions. One very important stipulation: R should include time expenditures. Sadly, it is often the case that all the time is spent on collecting and crunching, so no time or energy is left for actually using the results. My best guess is that in the overwhelming majority of cases i is greater than 1. In other words, we're wasting a lot of valuable time.
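
Here is a minimal sketch of that check, with made-up numbers. To make the ratio computable I assume both R and U are expressed in person-hours, which is itself an assumption on my part; the original claim only says R must include time.

```python
# A minimal sketch of the i = R / U check, with hypothetical numbers.
# R: resources spent (rough person-hours per year, including analysis time)
# U: potential usefulness, here expressed as hours of decision-relevant work
#    the data would actually save or improve (a subjective estimate)

def collection_index(resources_hours: float, usefulness_hours: float) -> float:
    """Return i = R / U; values above 1 mean the data costs more than it is worth."""
    if usefulness_hours <= 0:
        return float("inf")  # data nobody will use has unbounded cost per unit of use
    return resources_hours / usefulness_hours

# Hypothetical example: an annual survey that feeds one minor scheduling decision
print(collection_index(resources_hours=120, usefulness_hours=10))  # 12.0 -> stop collecting
```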

Of course, much of this data is collected because various accrediting bodies tell us to do so. However, they also have no idea how exactly the data is going to be used, or whether it is any good. For example, NCATE asked us to measure the impact that teacher candidates make on student learning. So we all figured out some sort of action research project for our candidates, with a pre- and post-test, calculating the effect size, and so on. We complied with the requirement to measure impact on student learning, but that is simply bad science. We could never even teach teacher candidates how to build valid measurements, and an instrument simply cannot be validated after a one-time use. Or take another requirement: NCATE made us measure candidate performance as observed in the field. But those observation rubrics often produce flat, uninteresting data, because they are not reliable and do not measure what they are intended to measure. Even more rigorously designed instruments, like the Danielson framework, show only modest correlations with teacher quality as measured by student achievement. And in the field, with dozens of supervisors who change constantly, who has time or money for interrater reliability training?
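
For what it is worth, the effect-size arithmetic the candidates were asked to do is roughly this: a Cohen's d sketch using one common pooling convention, with invented scores. Note that a large d says nothing about whether the instrument was valid, which is exactly the problem.

```python
# Rough sketch of a pre/post effect-size calculation (Cohen's d, pooled SD as the
# root mean square of the two SDs, which assumes equal group sizes). Scores are invented.
import statistics as stats

def cohens_d(pre: list[float], post: list[float]) -> float:
    """Standardized mean difference between post-test and pre-test scores."""
    mean_pre, mean_post = stats.mean(pre), stats.mean(post)
    sd_pre, sd_post = stats.stdev(pre), stats.stdev(post)
    pooled_sd = ((sd_pre ** 2 + sd_post ** 2) / 2) ** 0.5
    return (mean_post - mean_pre) / pooled_sd

pre_scores = [12, 15, 14, 10, 13, 16, 11, 14]
post_scores = [16, 18, 17, 14, 15, 19, 15, 17]
print(round(cohens_d(pre_scores, post_scores), 2))
# A large d from an unvalidated, one-time instrument proves very little.
```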

Everyone looks at colored charts, happy, pretending those numbers mean something. And we pretend that yes, we looked at this data and made this specific decision because of it. I don't want to accuse everyone, but in most cases it is not true. Notice, I am not saying it is never true – the good examples are just too rare to justify the enormous time and effort.

Many of us got a case of what I call “the compliance disease.” It feels good to be proficient at something, and we find clever ways of collecting data, aligning it to standards, and presenting it. The process itself takes a lot of skill and creativity, so we forget that it is less than useful in the end. This is a common phenomenon: people get better and better at figuring out how to comply, and stop questioning what they have agreed to comply with.

There is a class of data that has direct significance: how many students we have, what the average class sizes are, which groups succeed and which tend to drop out, where the bottlenecks are, and so on. It is the measures of quality, derived from performance standards, that remain elusive – and that is after at least 30 years of trying. Measuring the quality of higher education is still an aspiration rather than a reality. We can measure the quality of K-12 education, but only very narrowly; it is like looking at a vast landscape through the keyhole of standardized testing. In higher ed, we cannot see much at all.

Data technology is still primitive. What we have now are really quite basic hand tools that require a lot of human labor and subjective judgment. All I am saying is that brains are better spent improving things we already know need fixing than on collecting mountains of data we have no time to do anything with. We should only do things that move us forward.

I am not suggesting we give up on the idea of data-informed decision-making. The alternative is pure guessing, or gut instinct – notoriously unreliable means of making decisions; it would be going back to the dark ages. Many people, including me, hope that the next generation of data technology, based on naturally occurring digital traces and combined with neural networks and predictive analytics, will change everything. In the meanwhile, modesty is a virtue.
