July, 2016 - Data Analytics

In the design of enterprise-level information technology solutions, a new dichotomy is becoming increasingly apparent: transaction generators versus information consumers. When thinking about single applications, we have traditionally thought in terms of business functionality: for example, an accounts receivable function needs to be supported by an accounts receivable system. It has been standard to rely on each application’s own reporting system to satisfy the portion of the user base that needs analytics. For more esoteric requirements, we have started to extract data into data warehouses and, more recently, data lakes, where data from the various functions can be mashed up with each other and with third-party data (such as data streams).

However, it is becoming clearer that there is a set of users who generate very few transactions and do not need to be users of our transaction systems at all. This includes executives, analysts, consultants, auditors, data scientists, and so on: the information consumers. This class of users is better served by obtaining almost all of what they require from a comprehensive data lake. Such a design is preferable because the data lake becomes a single source for all of their analysis, removing the need to gain access to, and learn how to use, each of the transaction systems that generate the source data. The transaction systems can then concentrate on capturing transactions efficiently and effectively, freed from having to accommodate these fundamentally different types of users.

Aggregating all of the required analytics data into a single enterprise repository has many synergistic benefits. In a single repository, data from multiple functions can be seen in whole-of-enterprise, and even whole-of-supply-chain and whole-of-market, contexts. The analytics interface a user needs to be familiar with then depends not on the content of the data but on the analytics task: visualisation (e.g. Tableau, Qlik, PowerBI) for building interactive dashboards, continuous monitoring for alerting and compliance (e.g. ACL, Logstash, Norkom), data discovery for investigations (e.g. SAS Analytics, Spotfire), reporting (e.g. Cognos, BusinessObjects), search (e.g. Elastic), data mining (e.g. Oracle Data Mining, EnterpriseMiner) and statistical analytics (e.g. SAS, R).

But doesn’t this approach violate the concept of a single point of truth (SPOT)? No. The transaction system may well remain the SPOT, while our data repository is a replica of it that differs in known ways (e.g. it is a copy as of midnight the previous night, or as of 30 minutes ago). For the vast majority of information analysis needs, this level of “fresh enough” is perfectly fit for purpose.
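To make the "differs in known ways" idea concrete, here is a minimal sketch in Python, assuming the repository records when each copy was taken from its source system. The names ReplicaSnapshot, staleness and is_fresh_enough are hypothetical, invented for illustration, and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReplicaSnapshot:
    """Hypothetical metadata about one copy of transaction data in the repository."""
    source_system: str   # the transaction system that remains the SPOT
    copied_at: datetime  # when this copy was taken from that system


def staleness(snapshot: ReplicaSnapshot, now: datetime) -> timedelta:
    """How far the replica lags behind the transaction system."""
    return now - snapshot.copied_at


def is_fresh_enough(snapshot: ReplicaSnapshot, max_age: timedelta, now: datetime) -> bool:
    """True if the replica's known lag is within what the analysis task tolerates."""
    return staleness(snapshot, now) <= max_age


# Example: a copy taken at midnight, queried at 9 am the same day.
now = datetime(2016, 7, 1, 9, 0)
nightly = ReplicaSnapshot("accounts_receivable", datetime(2016, 7, 1, 0, 0))

print(is_fresh_enough(nightly, timedelta(hours=24), now))    # True
print(is_fresh_enough(nightly, timedelta(minutes=30), now))  # False
```

The point of the sketch is that the lag is explicit and checkable: a task that tolerates day-old data accepts the nightly copy, while a task that needs data no older than 30 minutes would need a more frequent copy or the transaction system itself.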

Functional systems have only the most rudimentary analytics capability (likely just some basic reporting), which is a small fraction of the analytics capability of value to information consumers. Executives and other information consumers have been short-changed by the analytic tools provided by functionally focussed transaction systems up to now.

Modern enterprise solution architects need to split the data analytics functions from the transaction systems: free the data and then deploy the analytics power of Big Data.