There was a time when ancient wise men used astrology the way we use science and analysis today to make important decisions. But the astrology was more reality-based than we appreciate these days. If you wanted to know when tons of tasty mammoth meat would come lumbering down to the perilous annual water crossing, or the best time to plant crops, paying close attention to the apparent motion of the sun, moon, and stars paid off. So, our early analysts reasoned, if subtle celestial changes foretold the migrations or the coming of summer, perhaps even more subtle motion could predict the best time to get married, or start a war.
Nowadays, with the benefit of hindsight, we know that the calendar really does depend on celestial mechanics, and we know individual horoscopes are just for fun. But you can’t blame those early astronomers for trying to use what worked, and considering some of the other jobs available at the time, it was a pretty good gig.
Today there’s a new predictive discipline, and it’s not hard to find modern-day analogues of those ancient stargazers practicing it. They pore over and write arcane scripts full of confusing glyphs to extract what they say is a valuable product, one that lends itself to just about every aspect of modern commerce. Which is not a bad description of modern science in general, but we call this particular gig data science, and it has rapidly become one of the hottest jobs in the world.
But data science is not as new as some people think. In fact, it’s older than your grandparents. What’s new is the explosion in it, and that happened for a simple reason: you!
Wikipedia defines data science in part as “an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.” There’s an assumption out there that companies like Google and Facebook invented it, but it’s been around in one form or another for as long as detailed records have been kept. In fact, data science is a bit of a misnomer: all science is data-driven. Weather forecasting and climate science depend on detailed local records of temperature, wind, and rainfall that go back for centuries in some places. In the world of business, insurance companies and credit reporting agencies are both familiar examples of data science in action that have been around since the industrial revolution.
Old or new, one important thing to understand about the field is that you, as a private individual, are not generally the consumer of these kinds of products and services: you are the product. Or rather, everything you do (your birth date, school transcripts, rent and mortgage history, buying habits by day of the week or time of year, etc.) is all part of the rambling footprint you create on your personal journey through life. For most of history, only a fraction of all that stuff was written down and preserved. But today, virtually everything is recorded for everyone. In that sense the internet, social media, and especially smartphones and tablets, with the endless apps that track and report your every physical, social, and financial move, are arguably way more responsible for the explosion in data science over the last several years than any single technology or company.
Many would argue that data science is just another name for statistics; that the stats and formulae are merely souped up thanks to computers fueled with all this new data about who did what, when, where, and how. But the impetus behind the explosion in data science, what has business owners so interested in it, is the notion that with an exponential jump in data points summarizing your activities 24/7, any business serving your needs and desires might benefit from an exponential jump in target marketing and smart advertising that converts into good old-fashioned, bottom-line dollars. The way that manifests itself in the day-to-day lives of us ordinary people ranges from invisible to annoying to creepy. For example, say I text you about a cool wakeboard I saw on sale, and the next time I sign into Facebook or read a blog, there are ads for sales on wakeboards and other water sport items in the margins. Well, that’s a nicely targeted ad!
As the industry matures, it’s getting more comprehensive, bigger, and faster, sharing more info between giant databases measured in petabytes and exabytes. How that all meshes together is more complicated, and some might say it’s also a little more intrusive. As an example, it might mean that automated cross-referencing creates a record, one I know nothing about, showing that Steven Andrew who writes on Daily Kos is the same guy who lives at 123 Main Street USA, overdraws his checking account a couple of times a year around the holidays, and spends more than 300 bucks at a mail-order pharmacy every quarter. And while one such person isn’t worth much, a spreadsheet containing 50,000 similar individuals complete with current phone numbers, email and snail mail addresses, and fave websites would be of great value to a competing mail-order Rx company or a payday lender.
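For the curious, here is a minimal sketch of what that kind of cross-referencing could look like in Python. Everything in it is made up for illustration: the three data sources, the field names, the people, and the helper functions `normalize`, `cross_reference`, and `marketing_list` are all hypothetical, and it assumes an email address is the shared key. Real systems are far bigger and match on much messier clues, but the basic move is the same: line up records from unrelated places, then filter for the people someone would pay to reach.

```python
from collections import defaultdict

# Hypothetical records from three unrelated sources; every value is invented.
blog_profiles = [
    {"email": "Pat@Example.com", "pen_name": "pat_writes"},
    {"email": "lee@example.com", "pen_name": "lee_reads"},
]
bank_records = [
    {"email": "pat@example.com", "address": "123 Main Street", "overdrafts_per_year": 2},
]
pharmacy_orders = [
    {"email": "pat@example.com ", "quarter": "Q1", "spend": 310.0},
    {"email": "pat@example.com", "quarter": "Q2", "spend": 325.0},
    {"email": "lee@example.com", "quarter": "Q1", "spend": 40.0},
]


def normalize(email):
    """Lowercase and trim so trivially different spellings still match."""
    return email.strip().lower()


def cross_reference(profiles, bank, orders):
    """Merge the three sources into one record per normalized email."""
    merged = defaultdict(dict)
    for row in profiles:
        merged[normalize(row["email"])]["pen_name"] = row["pen_name"]
    for row in bank:
        key = normalize(row["email"])
        merged[key]["address"] = row["address"]
        merged[key]["overdrafts_per_year"] = row["overdrafts_per_year"]
    for row in orders:
        record = merged[normalize(row["email"])]
        record.setdefault("quarterly_spend", []).append(row["spend"])
    return dict(merged)


def marketing_list(merged, min_quarterly_spend=300.0):
    """Return emails whose every recorded quarter exceeds the spend threshold."""
    return sorted(
        email
        for email, record in merged.items()
        if record.get("quarterly_spend")
        and min(record["quarterly_spend"]) > min_quarterly_spend
    )


merged = cross_reference(blog_profiles, bank_records, pharmacy_orders)
print(marketing_list(merged))  # → ['pat@example.com']
```

Notice that none of the three sources is very revealing by itself. The merged record is where the value, and the creepiness, comes from.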
What works in business might also work in politics, for fundraising, PR, passing on misinformation, or delivering votes. If a social media platform is set up to correlate my profile with wakeboards and populate ads to click or feed me articles to read and share accordingly, it would presumably work the same way for those interested in material supporting or criticizing a particular party, candidate, or issue. It’s no surprise to anyone in the field that Russian operatives looking to influence the U.S. election might have given that a whirl.
Because there’s been such an explosion in data science, there’s been a mini-explosion in various systems for managing and applying it. You can download demo versions of some, like Alteryx, Splunk, and Tableau, to name a few, and play around with some really fun software. If this field interests you, if you are not afraid of spreadsheets and have some facility for basic math, there’s no reason why you can’t learn to use many of the features of these programs. There are YouTube channels dedicated to support, active communities of users, and in some cases, even live experts at the respective companies you can chat with and learn from.
There’s also a lot of hype surrounding data science: anything involving methods and specialized knowledge that is mysterious to the average layperson, and potentially leads to fabulous riches, is ripe for exaggeration or misrepresentation. In the hands of the less-informed or the outright unscrupulous, data science could be exploited, turned into a pseudo-science making all sorts of preposterous claims and promises reminiscent of the laughable contraptions using electricity to cure all ills that popped up in the early 1900s. Since electricity really was so damn useful (and yet so new and mysterious), it was easy to fool the uninformed about its actual capabilities. A similar dynamic could be in play with something as useful and promising as data science.
As an analogy, remember that those stars in the zodiac making up the familiar constellations mentioned earlier are distributed more or less randomly from our planetary perspective. But random does not mean uniform. Random means clusters, singles, triplets, and pairs, and mottled regions of emptiness. When we divine meaning from small changes in those clusters or regions of sparsity, we might be seeing something that turns out to be very real and very important, like the astral signs marking the first day of summer. But guard against the human tendency to impose patterns on a fluke of random distribution, and then to assign the invented figures meaning, influence, and even intent that cannot be tested. They may be no more real or lasting than faces seen in white, puffy clouds drifting overhead on a warm, lazy day.
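If you want to see "random does not mean uniform" for yourself, here is a small Python sketch. It is only an illustration under assumptions of my own choosing: the function name `scatter_counts`, the 10-by-10 grid, the 100 points, and the seed are all arbitrary. It scatters points uniformly at random across the grid, a stand-in for stars on a patch of sky, and then tallies how many cells ended up with zero, one, two, or more points.

```python
import random
from collections import Counter


def scatter_counts(n_points=100, grid=10, seed=0):
    """Drop n_points uniformly at random on a grid x grid board.

    Returns a flat list with the number of points that landed in each cell,
    including the cells that stayed empty.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_points):
        cell = (int(rng.random() * grid), int(rng.random() * grid))
        counts[cell] += 1
    return [counts.get((row, col), 0) for row in range(grid) for col in range(grid)]


counts = scatter_counts()
histogram = Counter(counts)
for points_in_cell in sorted(histogram):
    print(f"{histogram[points_in_cell]:3d} cells hold {points_in_cell} point(s)")
```

With 100 points and 100 cells, a perfectly even spread would put exactly one point in every cell. Chance almost never does that: on average a bit more than a third of the cells stay empty while others collect pairs, triplets, and the occasional bigger clump. Those clumps mean nothing, yet they are exactly the kind of thing the eye is eager to turn into a constellation.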