
Prediction: Do we need perfection or just to be better than the rest?

Prediction is a difficult art. Some questions involve random variables that simply can’t be predicted. What will the spot price of oil be at close on June 20, 2017? You may be a fabulous forecaster who takes into account historical trends, published production figures, geopolitical risks and more, and be better informed than anyone on the planet on the topic, but the likelihood of precisely hitting the closing spot price is still very low. So although you may be, on average, more accurate than a less capable forecaster, you will nevertheless more than likely get the final answer wrong. Does that make you just as useless as someone who is merely guessing?

So, are perfect forecasts really the gold standard we need to aim for? Or, like the old joke about running away from the bear, do we just need to be faster than “the other guy” to gain a competitive advantage? The answer is yes: you just need to be better at predicting than your competitors. You don’t need to achieve the impossible, i.e. perfect accuracy in your predictions. Many people simply abandon any effort to get better at prediction once they realise that perfection is unattainable. This is a big mistake: getting better at prediction is both worthwhile and eminently doable.

There is no advantage in predicting things that are perfectly predictable, and no-one can predict the totally unpredictable. The competitive advantage lives in the middle. Being better than everyone else at forecasting hard-to-predict things gives you an edge, even though you are unlikely ever to get the answer perfectly right.

In fact, as the diagram above shows, there is no competitive advantage in either “totally unpredictable” or “fully predictable” events. No-one is going to get rich predicting the time of the next lunar eclipse any more: the equations and data that exist make forecasting eclipse events to the second quite mundane. Similarly, no-one can predict the next meteor strike (yet), so we are all as inaccurate as each other and no better than pure guesswork regarding when and where the next one will strike. But in between these two extremes there’s plenty of money to be made.

In the chart above, the actuals are the orange dots and the blue line is a typical forecast. The typical forecast (blue line) even got the answer perfectly right in period 5, hitting the actual number of 33 precisely. The Superforecast (orange line), by contrast, is almost twice as accurate overall, yet never hit the precise answer in any single period. A decision maker armed with the Superforecast is going to be in a much better position than someone armed with the typical forecast.
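To make that concrete, here is a minimal sketch of how such a comparison might be scored, using mean absolute error (one common accuracy measure; Tetlock’s tournaments scored probability forecasts with Brier scores instead). The numbers are hypothetical, invented to mirror the shape of the chart rather than taken from it:

    # Hypothetical data in the spirit of the chart above, not its actual values.
    actuals = [30, 41, 28, 35, 33, 39, 31, 44]   # the orange dots
    typical = [22, 50, 35, 27, 33, 48, 24, 52]   # exactly right in period 5 (33)
    superf  = [26, 38, 32, 31, 36, 35, 34, 40]   # never exactly right

    def mean_absolute_error(forecast, actual):
        """Average size of the miss across all periods."""
        return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

    print(mean_absolute_error(typical, actuals))  # 7.0
    print(mean_absolute_error(superf, actuals))   # 3.625, i.e. almost twice as accurate

Despite its one bullseye, the typical forecast misses by almost twice as much on average: a single exact hit tells you very little about overall forecast quality.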

So the key is to be as accurate as possible, and more accurate than your competitors, when predicting market demand, geopolitical outcomes, crop yields, productivity and so on. Even though you still can’t predict with perfect accuracy, being better than everyone else yields significant competitive advantage when deciding whether to invest your capital, divest that business or acquire that supplier. So how do you get better at predicting the future? That’s where the combination of Big Data and Superforecasting comes in.

Big Data is the opportunistic use of data, both internal to your organisation and available from third parties, with modern data-crunching technology to make better predictions about what is likely to happen. Superforecasting is the practical application of techniques born of cognitive science (commonly mislabelled Behavioural Economics) that overcome humans’ natural cognitive biases and lack of statistical/probabilistic thinking, improving forecasting across any domain of expertise. Between the two, any organisation can significantly improve its forecasting capability and reap the benefits of clearing away more of the mists of time than its competitors.

The key is not giving up simply because perfect prediction is impossible.

Do you know which activities in your organisation would seriously improve their performance through better predictive accuracy, and thereby significantly impact the bottom line?

https://www.linkedin.com/pulse/prediction-do-we-need-perfection-just-better-than-popova-clark/


Traits of a Superforecaster… Hang on, that’s a top Data Scientist

I have been a follower of Philip Tetlock, Daniel Kahneman, Richard Nisbett, Thomas Gilovich and others for over 20 years, and devoured Tetlock’s recent Superforecasting book on its release. Tetlock led a team of forecasters that used a combination of crowdsourcing, cognitive-bias training, performance feedback and other techniques to outperform a suite of rival teams at forecasting geopolitical events in a controlled multi-year forecasting tournament. The tournament was run by IARPA (the Intelligence Advanced Research Projects Activity, which serves the US intelligence community: CIA, NSA, FBI and so on), the intelligence community’s equivalent of DARPA, as part of its Analysis and Anticipatory Intelligence research streams.

IARPA invited teams from multiple elite US universities to forecast hundreds of important geopolitical questions, to gauge their performance against professional intelligence analysts’ forecasts (the professionals had access to classified information; the teams did not). In the first year Tetlock’s team beat all its competitors handsomely, and surpassed not only IARPA’s forecast-accuracy goals for that year but also IARPA’s initial goals for the second and third years of the tournament. The next year Tetlock’s team improved again, outperforming all the new goals (and competitors) by such a margin that IARPA decided to cancel the tournament and bring Tetlock’s team in to do all future IARPA forecasting. In many cases Tetlock’s team of Superforecasters, with access only to publicly available information, significantly outperformed the intelligence analysts with access to classified information.

But what makes Tetlock’s team of Superforecasters so good at predicting the future? Tetlock, being an academic, took the opportunity to identify the habits and characteristics his team of forecasting geniuses had that others didn’t. Here are the attributes he found:

  • Cautious – they predicted outcomes with less personal certainty than others did, always keeping “on the other hand” in mind
  • Humble – they didn’t claim that they knew everything or that they fully understood all aspects of a problem and would readily change their mind when new evidence came in
  • Nondeterministic – just because something has happened, they don’t ascribe a “Monday-morning quarterback” explanation to its occurrence. They keep in mind that events may have turned out differently for potentially unknown reasons.
  • Actively open-minded – they’re always testing their own beliefs and see them as hypotheses to be tested, not dogma to be protected
  • Naturally Curious with a need-for-cognition – love solving problems and adding new ideas and facts to their own knowledge
  • Reflective, introspective and self-critical – they are constantly re-evaluating their own performance and trying to uncover and correct the errors they themselves made
  • Numerate – tend to do back-of-the-envelope calculations, comfortable with numbers
  • Pragmatic (vs Big Idea) – not wedded to any one ideology or worldview, preferring reality and facts over opinions and untested theories
  • Analytical (capable of seeing multiple perspectives) – break problems up into logical parts and consider many different views of a problem
  • Dragonfly-eyed (value multiple perspectives) – value the inputs of viewpoints that are new and different to their own and can handle assessing differing theories at the same time.
  • Probabilistic – see events as likely or unlikely (as opposed to will or won’t happen) and are comfortable with uncertainty
  • Thoughtful updaters – willing to cautiously adjust their previous assessments as new information comes in (see the sketch after this list)
  • Good intuitive psychologists (aware of and able to compensate for common biases) – they understand the various cognitive biases we humans bring to problems and put in the effort to overtly counteract the shortfalls those biases cause
  • Personal Improvement mindset – always trying to learn more and get better at whatever they are doing
  • Grit – simply don’t give up until they feel they can’t improve their assessment any further
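The “probabilistic” and “thoughtful updater” traits are essentially Bayesian habits of mind. Here is a minimal sketch of that style of updating; the question, probabilities and numbers are invented for illustration and are not drawn from Tetlock’s work:

    def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
        """Posterior probability of a hypothesis after one piece of evidence."""
        numerator = p_evidence_if_true * prior
        return numerator / (numerator + p_evidence_if_false * (1 - prior))

    # Hypothetical question: "Will country X hold elections this year?"
    belief = 0.30                              # initial estimate: unlikely
    belief = bayes_update(belief, 0.80, 0.30)  # a news report that favours "yes"
    print(round(belief, 2))                    # 0.53: nudged up, not flipped to certainty

Each piece of evidence moves the estimate by a measured amount, which is exactly the cautious, incremental updating described above: events stay likely or unlikely rather than flipping between “will” and “won’t”.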

When I reviewed this list of personal attributes and habits, it struck me just how similar these Superforecaster attributes are to those ascribed to truly excellent data scientists. A quick review of articles (e.g. Harvard Business Review, The Data Warehousing Institute, InformationWeek; see the list at the end of this post) about the differences between good Data Scientists and great ones turned up this aggregated list (✔ indicates overlap with the Superforecaster attributes; ✗ marks a trait unique to excellent Data Scientists):

  • Humble (before the data) ✔
  • Open-minded (will change their mind with new evidence) ✔
  • Self-critical ✔
  • Analytical (breaks problems down, does quick back-of-the-envelope calcs) ✔
  • Persistence/grit ✔
  • Comfort with uncertainty/able to hold multiple theories at once ✔
  • Cognitive/Innate Need-for-understanding ✔
  • Pragmatic (as opposed to theoretical) ✔
  • Interested in constantly improving ✔
  • Creative (able to generate many hypotheses) ✗
  • Has innate understanding of probability/statistical concepts (conditional probability, large numbers etc) ✔
  • Business Understanding ✗
  • Understands Databases and scripting/coding ✗ 

This also works in the other direction: only a couple of Superforecaster-specific attributes aren’t covered by the great-Data-Scientist list. Tetlock studied his Superforecasters with academic rigour, whereas the Data Scientist list is more likely opinion and untested hypotheses (so I suspect the Superforecasters won’t like this article :-), yet one cannot help being impressed by the significant overlap between the two lists.

Is it possible that the big successes of Data Science to date have come not primarily from massive data-crunching capability and the ubiquity of IoT and social-media data collection, but primarily from applying the personal habits and attributes of Superforecaster-like Data Scientists to business problems? If so, we could get a lot of business value from applying Superforecaster-like people and approaches to many business problems, especially those that may not have a lot of data available.

Do you know of any applications where a Superforecaster approach might help your organisation?

List of articles:

https://hbr.org/2013/01/the-great-data-scientist-in-fo

http://www.informationweek.com/big-data/14-traits-of-the-best-data-scientists/d/d-id/1326993?image_number=1

https://infocus.emc.com/william_schmarzo/traits-that-differentiate-successful-data-scientists/

http://venturebeat.com/2015/04/20/the-top-3-qualities-of-a-great-data-scientist/

http://www.cio.com/article/2377108/big-data/4-qualities-to-look-for-in-a-data-scientist.html

http://bigdata-madesimple.com/9-qualities-that-make-a-good-data-scientist/

http://www.boozallen.com/insights/2015/12/data-science-field-guide-second-edition

https://upside.tdwi.org/articles/2016/06/13/five-characteristics-good-data-scientist.aspx

Note that I had to edit the aggregated list because some articles compared Data Scientists to general IT people or general executives, rather than to less effective Data Scientists. This led some articles to concentrate on the technical capabilities required of every Data Scientist rather than the personal characteristics and habits that result in excellence.