May 9, 2013

Big data and big-enough data

Big data is information too large to process with any kind of manual data analysis tool. For example, Walmart has databases equivalent to 167 times the information in all the books in the Library of Congress. But besides the purely technological limits, there are also significant institutional limits on processing data. For example, in our old assessment system, one data manager was able to process most of the reports within six months or so. That was always too late to inform any decisions for the following academic year. It simply took too much work to bring our puny (in comparison to Walmart’s) amount of data into a usable form.

Now we are working on a new system, in which many different people will enter data more or less in real time into one integrated and publicly accessible warehouse. But even with those efficiencies, the question remains: is this the right size and the right quality of data, something we can actually digest? Is it too big, or too small? Is the quality of the data good enough to make it actionable? And finally, are the time and resources used to collect and aggregate it justifiable? Can it actually improve the quality of our decisions? Those are all questions that can only be answered through experience.

Various accrediting bodies, including NCATE and most state departments of education, tried to impose a culture of data on their member institutions. That first attempt more or less failed, because no one knew what size of data is appropriate for a particular kind of institution. As a result, most teacher preparation institutions contracted the compliance disease. I venture to guess that the quality of the data collected actually got worse because of these miscalculated policies. The same bodies are now trying to correct their own error by encouraging institutions to think more deeply about what data is needed, and how it can be improved and used. Data that is generated for compliance reasons only is always too big. Ownership of the process therefore turned out to be a lot more important than the size of the data.

In a sense, we are all starting from square one again, and this is true not just for teacher preparation programs. The questions to ask at square one are not what we can collect or what RIDE or CAEP or a SPA wants. Those are entirely the wrong questions. The questions should be more like these: What do we not know, but would like to know? How can we be surprised by data? Is it interesting to look at? What can we feasibly collect and store? What can we process quickly, and how, in time for some decisions to be made? What tools and resources do we have to make all of this possible?

2 comments:

  1. There's also an element of opportunity cost here, as usual. What is the 'right' amount of time and effort to spend on data collection and analysis, compared to our other priorities?

  2. Several of us were talking about this at the workshop today. It was in relation to what RIDE is requiring of teachers in the State. Repeated research shows that only roughly 3-5% of teachers are rated as failing to meet a minimum standard. Who will sift through all this RIDE data to find and work to improve these teachers...or fire them? Educators, as well as the medical profession, are being harassed by repeated requests for more data. As a result, both professions are suffering. Most of the doctors I know tell me they would not go into medicine again. Teachers tell me they are telling their children not to consider teaching as a profession. All of us need data to drive our decision-making processes (e.g. investments, health risks, etc.), but how much is enough?

    Well put, Sasha. I'm going to miss your blogs.
