Once upon a time, or, more precisely, some 20 years ago, “data” was a term reserved for magnetic tapes and for the numerous governance committees that met weekly to debate the best ways to name data fields so as to accommodate the universe of financial products. Fast forward to today and, surprise, many financial firms still engage in the same practices. Extensive data governance committees spar over the optimal way to write out the name of a listed option, and how that should differ from the requirements of a custom over-the-counter derivative.
Much of the latest CFTC Technology Advisory Committee (TAC) discussion focused on creating a unified database with clean, orderly columns that various parties can populate. The challenge of fitting data into rigid databases costs serious money, not just for regulators but for everyone who falls under the regulators’ jurisdiction. Some banks, for instance, employ teams of people, often one per product, whose sole task is to work out how to store data neatly enough to match regulatory requirements. Yet the idea that all data should fit neatly into a two-dimensional table is prehistoric in the modern computing age.
A discipline aptly named “data science” has helped turn volumes of disparate, unformatted data, simply called “unstructured” data, into meaningful inferences without so much as a squeak from a committee of any kind. Advances in data analysis, many pioneered for applications such as genome sequencing and mapping matter transitions, appear to have quietly bypassed finance altogether. And that is a shame.
The analysis of unstructured data can be as simple or as complex as one cares to make it. Random matrix analysis, for instance, helps researchers find dependencies, inferences and clusters of insight in seemingly random, poorly organized, sparsely populated data sets. The findings, in addition to answering the key questions of who, how and when, have proved a goldmine for companies like Google, uncovering marketable patterns. So why is it that financial businesses and regulators cannot harness this technology?
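To make the idea concrete, here is a minimal sketch of one standard random-matrix technique: comparing the eigenvalues of an empirical correlation matrix against the Marchenko-Pastur bound for pure noise. Everything in it (the synthetic data, the single hidden factor, the dimensions) is a hypothetical illustration, not any particular firm's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: T observations of N noisy series,
# with one common "signal" factor mixed into all of them.
T, N = 500, 50
signal = rng.normal(size=(T, 1))
data = rng.normal(size=(T, N)) + 0.5 * signal  # noise + shared factor

# Eigenvalues of the empirical correlation matrix.
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)

# Marchenko-Pastur upper edge: for large T and N, eigenvalues of a
# purely random correlation matrix fall below this bound, so anything
# above it signals genuine structure rather than noise.
q = N / T
lambda_max = (1 + np.sqrt(q)) ** 2

significant = eigenvalues[eigenvalues > lambda_max]
print(f"MP upper edge: {lambda_max:.2f}")
print(f"Eigenvalues above the noise band: {significant.round(2)}")
```

No committee defined a schema here: the method separates structure from noise directly, which is precisely the appeal for messy, sparsely populated financial data.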
The resistance appears to come from traditional settlement firms that have depended for years on neatly organized rows of data, an approach that is slow and cumbersome to manage. Under the current data-gathering and management regime, trades take at least a full day to process and reconcile: an eternity in the big data world. Those very firms now face total disruption from blockchain, a nascent yet winning way to manage settlement risk in real time. What prevents these firms from adopting blockchain technology themselves?
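The core mechanism behind that real-time settlement claim is simple to sketch. Below is a toy hash chain, a deliberately stripped-down illustration rather than any production ledger: each block commits to the previous block's hash, so tampering with an earlier settlement record breaks every later link and is detected immediately, without a day-long reconciliation cycle. All names and fields are invented for the example.

```python
import hashlib
import json

def block_hash(body: dict) -> str:
    # Deterministic hash of a block's contents.
    payload = json.dumps(body, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: list, trade: dict) -> None:
    # Each new block commits to the hash of the previous one.
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev": prev, "trade": trade, "seq": len(chain)}
    chain.append({**body, "hash": block_hash(body)})

def verify(chain: list) -> bool:
    # Recompute every link; any tampering breaks the chain.
    for i, block in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev"] != prev or block["hash"] != block_hash(body):
            return False
    return True

chain: list = []
append_block(chain, {"buyer": "A", "seller": "B", "qty": 100})
append_block(chain, {"buyer": "B", "seller": "C", "qty": 40})
print(verify(chain))  # True

chain[0]["trade"]["qty"] = 999  # tamper with a settled trade
print(verify(chain))  # False
```

Note what is absent: no pre-agreed column layout, no reconciliation batch. The integrity guarantee comes from the chain of hashes, not from how tidily the trade records are structured.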
Why, the data, the settlement firms say. It is not orderly enough to foster the change. It needs years of discussions before it can be used. It’s a complete disaster!
Not so at all, say data scientists. However, the parties in charge of financial data are presently too busy lamenting the faults in current data structures to listen.
Irene Aldridge is a portfolio manager, market microstructure analyst and big data aficionado, and a member of the committee for the Big Data Finance conference at New York University's Courant Institute of Mathematical Sciences, to take place on May 19-20, 2016. She has served on the CFTC TAC sub-committee on high-frequency trading. Irene can be reached at Irene@ .