Neural Nets: Use with caution

There has been significant hype surrounding artificial intelligence and neural nets recently, and with good reason. AI is superb at learning to respond appropriately to a set of inputs, like a stream of video or audio data, and at handling complexity beyond the capacity of us mere humans. Recent efforts in natural language processing and driverless vehicles have been nothing short of astounding.

As the tools of data mining have morphed into big data, the capabilities coming out of these leading AI projects are being incorporated into the toolkits available to the corporate data analyst. Many tools, such as R, Oracle Data Mining, Weka, Orange and RapidMiner, as well as libraries for languages like Python, make neural nets readily available for a vast range of analytic endeavours.

But caution is advised. The great results achieved by these major AI projects come with a few little-noticed advantages. First, the humans have already determined that there is indeed a signal in the data they are trying to model. When the incoming video stream shows a parking sign that says “Handicapped Only”, the neural net quickly learns that parking the driverless vehicle there is an error. But the fact that the parking sign is verifiably there in the data is known by the modelling Data Science team a priori. A neural net will find the signal that we knew was there. Another advantage is the sheer quantity of data. Teams working for Google, Apple and the like use billions and even trillions of data records to train their neural nets. This makes up for the relatively high number of “degrees of freedom” inherent in their neural net models.
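The data-quantity point can be illustrated without any neural net at all. In the sketch below (an invented example, using a high-degree polynomial as a cheap stand-in for a model with many degrees of freedom), the same flexible model overfits badly on a small sample but behaves well once the training set is large:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n records from an illustrative truth: quadratic signal + noise."""
    x = rng.uniform(-1, 1, size=n)
    return x, x**2 + rng.normal(scale=0.1, size=n)

def holdout_rmse(n_train, degree=15):
    """Fit a degree-15 polynomial (a stand-in for a model with many
    degrees of freedom) and measure its error on fresh data."""
    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(1000)
    coef = np.polyfit(x_tr, y_tr, deg=degree)
    return np.sqrt(np.mean((np.polyval(coef, x_te) - y_te) ** 2))

rmse_small = holdout_rmse(30)       # few records: the flexible model overfits
rmse_large = holdout_rmse(30_000)   # big data: the extra flexibility is tamed

print(f"holdout RMSE with     30 records: {rmse_small:.3f}")
print(f"holdout RMSE with 30,000 records: {rmse_large:.3f}")
```

With enough records, the sixteen free parameters do little harm; with thirty, they chase the noise.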

However, in many real-world situations we are looking for signals in the data that may or may not be there. For instance, if we are trying to predict the future performance of one of our regional sales locations based on attributes of the centre and its surrounding catchment, we may or may not have sufficient information in the data to detect a signal. Perhaps a natural disaster will impact the performance, or there will be industrial action, or a terrorist act, or… The key is that the data we possess may or may not be sufficient to make a reliable prediction. If it is sufficient, that’s great; if it isn’t, we need to know that fact.
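One way to find out whether the data carries a detectable signal is a permutation test: score a simple model on the real labels, then rescore it many times on shuffled labels, and check whether the real score stands out from that null distribution. A minimal numpy-only sketch (the synthetic "sales centre" data and the ordinary-least-squares model are illustrative assumptions, not a prescribed workflow):

```python
import numpy as np

rng = np.random.default_rng(0)

def r2_of_linear_fit(X, y):
    """Fit ordinary least squares (with intercept) and return in-sample R^2."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ coef
    return 1 - resid.var() / y.var()

# Illustrative data: 200 "sales centres", 5 attributes, one weak real signal.
X = rng.normal(size=(200, 5))
y = 0.3 * X[:, 0] + rng.normal(size=200)

real_score = r2_of_linear_fit(X, y)

# Null distribution: how well the model scores once shuffling destroys any signal.
null_scores = np.array([r2_of_linear_fit(X, rng.permutation(y))
                        for _ in range(500)])

# p-value: how often pure noise scores at least as well as the real fit.
p_value = (null_scores >= real_score).mean()
print(f"real R^2 = {real_score:.3f}, p = {p_value:.3f}")
```

A high p-value here is the honest answer "the data we possess is not sufficient" — exactly the fact we need to know before trusting any model, neural net or otherwise.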

Unlike more traditional predictive analytic techniques (like regression, survival analysis and decision trees), neural net models are difficult for us mere humans to interpret. But the powerful data tools mentioned above let our corporate data science team casually throw a neural net model at the data. What’s more, these models will often produce seemingly more accurate results than the traditional ones. The old “this model makes no logical sense” sanity check is not available for neural nets, which leaves our data science team at high risk of modelling simple noise. This is a particular problem when we re-use the same holdout sets while trialling hundreds of slightly differently configured neural net models.
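The holdout-reuse trap is easy to demonstrate with pure noise. In this hypothetical sketch, the labels are coin flips, so no model can genuinely beat 50% accuracy; yet selecting the best of a few hundred chance-level "configurations" against the same holdout set makes the winner look clearly better than chance:

```python
import numpy as np

rng = np.random.default_rng(42)

n_holdout = 200    # records in the (re-used) holdout set
n_configs = 300    # "slightly differently configured" models trialled

# Labels are pure noise: no configuration can truly beat 50% accuracy.
y_holdout = rng.integers(0, 2, size=n_holdout)

best_acc = 0.0
for _ in range(n_configs):
    # Each "model" predicts at chance — a stand-in for retraining
    # with yet another tweaked configuration.
    preds = rng.integers(0, 2, size=n_holdout)
    best_acc = max(best_acc, (preds == y_holdout).mean())

print(f"best accuracy over {n_configs} chance-level models: {best_acc:.1%}")
```

The winning configuration will typically score around 60% on this holdout set, and on a fresh holdout it would fall straight back to roughly 50%. Every re-use of the holdout set leaks a little more of it into the model selection.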

So who cares? Essentially, using uninterpretable but powerful neural net models may make you feel like you are predicting the future more accurately. In reality, they may simply be capturing the noise in your input data. You may waste lots of time chasing ghosts or, worse, deploy a model into operation that performs little better than chance in the real world.

Have you seen an example of a Data Science team chasing ghosts?