Tuesday, January 10, 2012

Where does all the data come from?

Tonight we discussed an amazing statistic I saw on Wikipedia.  My goal was to have a conversation where the kids would have to create hypotheses as to why the statistic might or might not be true.  This exercised their logical thinking, their creativity and their ability to judge data.

According to the Wikipedia article on big data (http://en.wikipedia.org/wiki/Big_data), 90% of the data in the world today has been created in the last two years.  I told the kids this and asked them how that could be possible.

The discussion started with Olivia laying out a hypothesis - "there is a lot of new technology that must be generating data," she said.  We stumbled around this for a few minutes and I realized we needed to define what data is in order to figure out why so much of it has been created in the last two years.  So I asked them - "give me some examples of data."

Nicholas said - "quantitative measurements like the temperature."

Olivia added - "qualitative data".  I asked what she meant.  "Descriptions of things, like 'the wick of the candle is white.'"  After a bit more discussion we generalized qualitative data to include both text and pictures.

Jamie then said - "well if data includes pictures - how about photographs?"

The list building continued for a few minutes and included things like machine-generated information, internet search data, e-commerce data (what you clicked on), and cell phone texts.

Once we had a pretty good list I returned to the original question.  "How could it be possible that 90% of the world's data has been generated in the last two years?"  It was incredible to see how, once they had a better understanding of what counts as data, their ability to come up with legitimate hypotheses rose dramatically.

Olivia - people are generating huge amounts of data on Facebook and other social media sites when they post photos and other things.

Jamie - people are using their cell phones to take lots of pictures and (after some prompting) to send lots of texts.

Nicholas - the use of the internet for e-commerce has exploded and they capture every click you make.  On this one I pushed - "but e-commerce has been around for a long time".  He thought about it and said - "but so many more people have access to the internet now through their smartphones so the usage must be a lot higher."

The discussion also covered the fact that the cloud has made it much cheaper and easier to store information, so we can keep more of it and duplicate it for backup.

Unbelievable.  They even thought of things I hadn't!  I would have loved to turn the conversation in the direction of trying to estimate if the 90% statistic seemed reasonable, but alas it was time for homework and bed.  Nevertheless, in 15 minutes we had tested the validity of the Wikipedia statement by thinking through what it really meant (what is data?), coming up with a basic hypothesis (technology is creating it) and then generating specific examples that allowed us to judge the statistic to be potentially reasonable (social media, smartphones, increased access, the cloud, etc.).  Repeat that 1000 times and we'll have some pretty good critical thinkers.
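
For the estimate we didn't get to, here is a rough back-of-the-envelope sketch of what the 90% claim implies.  It rests on one assumption that is mine, not Wikipedia's: that the world's total stock of data grows by the same factor every year.  The function name and its parameters are made up for illustration.

```python
# Back-of-the-envelope check of the "90% in the last two years" claim.
# Assumption (mine, not from the Wikipedia article): the world's total
# stock of data grows by the same factor every year.

def implied_growth(recent_share, years):
    """Return (total_multiple, yearly_factor) implied by the claim.

    recent_share: fraction of all data created in the last `years` years.
    """
    older_share = 1.0 - recent_share          # data that already existed
    total_multiple = 1.0 / older_share        # how much the total has grown
    yearly_factor = total_multiple ** (1.0 / years)
    return total_multiple, yearly_factor


total_multiple, yearly_factor = implied_growth(0.90, 2)
print(f"Total data is about {total_multiple:.0f}x what it was two years ago")
print(f"That means roughly {yearly_factor:.2f}x growth per year")
```

In other words, if 90% of the data is new, the other 10% is everything that existed two years ago, so the total must have grown about tenfold in two years, or a little more than tripled each year.  Whether that pace is believable is exactly the question the kids' examples (photos, texts, clicks, backups) speak to.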
