By Irene Aldridge
Big Data, Machine Learning and Artificial Intelligence are three buzzwords du jour of today’s business. If your business does not do at least one of the three, you risk being considered tardy, inefficient or, gasp, uncool, particularly with the dreaded taste-making millennial set. Worst of all, you may miss your chance of becoming the next unicorn: a billion-dollar entity like Google or Facebook that deployed Big Data, Machine Learning and Artificial Intelligence techniques to turn reams of data points into solid gold. Various Big Data, Machine Learning and AI methodologies, trends and ideas will be discussed in detail at the upcoming 7th Annual Big Data Finance conference, to take place at Cornell Tech in NYC on May 9–10, 2019 (to register, please click here: https://bit.ly/2UTAVIr).
While many people and companies enjoy throwing around these new buzz terms, few have a clear understanding of them, let alone of the distinctions among them. This article attempts to shed light on the much-talked-about areas, their commonalities and their differences.
We’ll start with Machine Learning, the first of the three disciplines to harness the efficiency of raw computational power for problem solving. Like traditional statistical and econometric analysis, Machine Learning was developed to answer questions about Nature:
- How does it work?
- Why does a sequence of inputs X generate outputs Y, as shown in Figure 1?
Perhaps the earliest idea of machine learning can be traced back to 1950s Control Theory, the science of feedback loops and error minimization made possible by the invention and proliferation of computer technology. In the mid-1980s, machine learning produced neural networks, still the cornerstone of the field today. A neural network is an advanced optimization tool that, by trial and error, uncovers complex functional relationships between a set of observable inputs and outputs. This sets it apart from most traditional forecasting and econometric modeling: in traditional statistics or econometrics, researchers make assumptions about data distributions ahead of the analysis, whereas Machine Learning practitioners make no such distributional assumptions and let the data (and computers) decide what fits best. Figures 2 and 3 illustrate the differences between traditional statistical analysis and machine learning.
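As a toy sketch of the idea (illustrative only, with all numbers made up), a small neural network in Python/NumPy can learn a nonlinear relationship between inputs X and outputs Y purely by trial and error, without assuming anything about how the data is distributed:

```python
import numpy as np

# Toy example: a one-hidden-layer neural network learns a nonlinear
# input-output relationship by iterative trial and error (gradient
# descent), with no distributional assumptions about the data.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)                        # the "true" relationship, unknown to the model

# Randomly initialized weights for a network with 16 hidden units
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):                # iterative fitting: many passes over the data
    H = np.tanh(X @ W1 + b1)         # hidden layer
    pred = H @ W2 + b2               # network output
    err = pred - Y
    # Backpropagate the mean-squared error and nudge the weights
    dW2 = H.T @ err / len(X); db2 = err.mean(axis=0)
    dH = err @ W2.T * (1 - H**2)
    dW1 = X.T @ dH / len(X); db1 = dH.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

pred = np.tanh(X @ W1 + b1) @ W2 + b2
mse = float(np.mean((pred - Y) ** 2))
print(f"final mean-squared error: {mse:.4f}")
```

Note the iteration count: even this tiny example takes thousands of passes, which hints at why full-scale Machine Learning has historically been so computationally expensive.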
The main drawback of Machine Learning has always been its computational complexity. To accurately map or fit a function transforming inputs X into outputs Y, computer programs require millions of iterations. The iterative nature of Machine Learning has resulted in two major issues: over-fitting and (relatively) slow processing. Over-fitting refers to the situation where the output function fits the observable data X and Y closely but may have little to do with the “true” relationship between X and Y once new, not-yet-available observations arrive. The over-fitting problem has long plagued industries like Finance, where data was traditionally collected on a daily basis and was, as a result, expensive to generate and use: a full three years of financial data amount to just about 750 daily trading observations!
Scientists have come up with ways to penalize fitting X to Y too closely, leaving room for the models “to breathe”: allowing for a potential modeling error and for more successful application to data yet unseen. Still, pure Machine Learning has faced adoption challenges, mostly due to the cost and inefficiency of the heavy-duty processing required by the iterative approach of Machine Learning algorithms. The number of runs a Machine Learning program needs to generate a solid non-linear prediction can reach into the hundreds of thousands, which is costly in both time and processing power.
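The over-fitting problem and its penalty-based remedy can be sketched in a few lines (a hypothetical setup, not drawn from any real financial series): fit a flexible, high-degree polynomial to a handful of noisy observations, once with no penalty and once with a ridge penalty that discourages an overly close fit, then compare both on data the model has not seen.

```python
import numpy as np

# Over-fitting demo: scarce, noisy "daily" observations, a very flexible
# model, and a ridge penalty that leaves the model room "to breathe".
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(2 * x) + rng.normal(0, 0.1, n)   # truth + noise

x_train, y_train = make_data(15)     # scarce data, as with daily finance series
x_test, y_test = make_data(200)      # "data yet unseen"

def design(x, degree=12):
    return np.vander(x, degree + 1)  # polynomial feature matrix

def fit(x, y, penalty):
    A = design(x)
    # Ridge solution: (A'A + penalty*I)^{-1} A'y; penalty=0 is plain least squares
    return np.linalg.solve(A.T @ A + penalty * np.eye(A.shape[1]), A.T @ y)

def mse(w, x, y):
    return float(np.mean((design(x) @ w - y) ** 2))

w_plain = fit(x_train, y_train, penalty=0.0)   # fits the 15 points very closely
w_ridge = fit(x_train, y_train, penalty=0.1)   # penalized, deliberately looser fit
print("unpenalized out-of-sample error:", mse(w_plain, x_test, y_test))
print("penalized out-of-sample error:  ", mse(w_ridge, x_test, y_test))
```

On data the model has never seen, the deliberately looser, penalized fit typically wins, which is exactly the point of leaving room for modeling error.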
The processing-power conundrum has been largely solved by the computing industry via cloud technology (outsourced computation on distant, cheap server farms) and, more generally, by the ever-decreasing cost of computers driven by the insatiable demand for technology from people of all walks of life.
While new, more powerful computer chips have moved the needle on the time dimension of Machine Learning as well, hardware innovations alone have not sped up Machine Learning enough to make it a daily routine of researchers. The Data Science of Big Data, however, did just that. With advanced mathematical methods, Data Science streamlines the optimization at the heart of Machine Learning, making it fast and useful for frequent applications. Figure 4 illustrates the idea.
What exactly does Data Science do? Whether applied to Machine Learning algorithms or to raw data, Data Science identifies the core characteristics of the data at hand. These characteristics can often be summarized by what have long been known as characteristic values: singular values, eigenvalues (from the German “eigen”, meaning “own” or “characteristic”) or principal components. These data descriptors capture the statistical properties of the data in a succinct and computer-friendly way, elucidating the key drivers of the data in the process. Armed with the key drivers, the researcher’s problem instantly shrinks into a manageable, smaller-scale optimization. Best of all, the characteristic values capture the “feel” of the entire data population, mathematically stretching well beyond the observable X and Y we have fed to our Machine Learning algorithms. As a result, the over-fitting problem largely dissipates and state-of-the-art machine inferences emerge.
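A minimal sketch of this shrinking in action (with an entirely made-up data set): suppose 50 observed series are secretly driven by just 3 underlying factors. A singular value decomposition recovers the characteristic values, reveals that a handful of components carry nearly all the variance, and lets us collapse 50 columns into 3.

```python
import numpy as np

# Hypothetical example: 1000 observations of 50 correlated series
# that are really driven by just 3 underlying factors plus noise.
rng = np.random.default_rng(2)
factors = rng.normal(size=(1000, 3))           # the true key drivers
loadings = rng.normal(size=(3, 50))
data = factors @ loadings + 0.1 * rng.normal(size=(1000, 50))

# Singular value decomposition exposes the characteristic values
data = data - data.mean(axis=0)                # center each column first
U, s, Vt = np.linalg.svd(data, full_matrices=False)

# Share of total variance captured by the top components
explained = (s**2) / (s**2).sum()
print("variance explained by top 3 components:", explained[:3].sum())

# Project onto the top 3 principal components: 50 columns shrink to 3
reduced = data @ Vt[:3].T
print("reduced shape:", reduced.shape)
```

The subsequent modeling problem now runs on 3 columns instead of 50, which is the smaller-scale optimization the text describes; because the components summarize the population-level structure rather than individual observations, they also generalize beyond the specific X and Y in hand.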
What kind of inferences are we talking about? Inferences that made billions of dollars for companies like Google and Facebook, of course! And while Google and Facebook focused on modeling online human behavior, the data of other industries can bring its own pots of gold to capable hands.
And what about Artificial Intelligence, the beast that evokes images of cyborgs from Arnold Schwarzenegger’s most famous movies? It turns out that Artificial Intelligence is a direct by-product of Data Science. Traditional statistical or econometric analysis requires a researcher to form a “hypothesis” by asking whether a specific idea is true or false, given the data. The unfortunate side effect of that approach is that the output can only be as good as the input: a researcher incapable of dreaming up a hypothesis “outside the box” is stuck with mundane inferences. Big Data clears these boundaries, instead telling the researcher the key features and factors of the data. In this sense, Big Data lays out all possible hypotheses for the researcher, without any preconceived notions. The new, expanded frontiers of inference can turn even the dullest accountant-type scientists into superstars capable of spotting the strangest events on their respective horizons. Artificial Intelligence, then, is the output of Data Scientists letting the data do the talking, along with the breathtaking results and business decisions this may bring.
Irene Aldridge is a data scientist, researcher, Visiting Professor at Cornell University and Managing Director of AbleMarkets, a Deep Learning and Big Data Platform for Finance. She is a co-author of “Big Data Science in Finance: Mathematics and Applications” (forthcoming) and an organizer of the annual Big Data Finance Conference, a forum for Big Data trends and advances in Financial services (May 9–10, 2019, BigDataFinance.org). Ms. Aldridge is also the author of High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems (Wiley, 2013, second edition) and co-author of Real-Time Risk: What Investors Should Know About Fintech, High-Frequency Trading and Flash Crashes (Wiley, 2017).