By Irene Aldridge

In the spring of 2013, a lively discussion on LinkedIn went like this:

– "If someone says 'Big Data' one more time, I am going to throw up," declared the Head of Marketing at a prominent software firm.
– "Agree, 'Big Data' is such an annoying buzzword," chimed in the Head of Research at a mid-tier broker-dealer.
– "Uh, it's such a fad," stated a well-funded hedge fund manager.

And so it went: "Big Data" is annoying, fleeting, and, by implication, useless. Fast forward to today: even though Big Data is now a much more established term, eyes still roll when the subject comes up.

Just 10 years ago, finance was a small-data discipline, partly because the data simply was not available. To most investors, exchanges offered only four prices per stock per day: Open, High, Low and Close, all of them reported the following day (on a T+1 basis). Even the largest market makers did not store intraday data beyond what regulators mandated. Commodity trading floors, for instance, had only 21 days of history on hand until approximately five years ago. Finance Ph.D. programs taught the analysis of closing prices almost exclusively, mentioning intraday variations only in passing. Today, real-time streaming data

By Irene Aldridge

It's no secret that many pension fund, mutual fund and hedge fund managers are concerned about high-frequency traders (HFTs). While their concerns are many, perhaps the biggest uncertainty involves the actual extent of HFT participation in the markets, along with the identities and intent of the HFTs themselves. While some claim that HFTs comprise 60-70% of all market participants, such numbers are seldom reached in reality. Scientific examinations find that HFTs still account for as little as 25% of all market activity in such frequently traded instruments as the S&P 500 E-mini futures (see Kirilenko et al., 2011). As Figure 1 shows, even in the very