Big Data Techniques Automate Portfolio Management, and Here Is How

By Irene Aldridge

Big Data techniques are transforming portfolio management analytics. The classic portfolio allocation framework developed by Markowitz (1952, 1959) suggests that investors can improve the risk-return profile of their portfolios by distributing their holdings across many different assets and, potentially, asset classes. In other words, investors benefit from distributing their nest eggs among different investment baskets: should one basket fall and the eggs within it break, the other baskets will be unharmed. The key inputs commonly considered in the Markowitz framework are the historical returns of the prospective portfolio constituents, which in turn allow for the measurement of the historical variances and correlations of returns among the different “eggs”.

Modern Big Data analyses of the factors that dominate the variance-covariance structure of returns show that factors beyond the traditional measures, such as microstructure-related factors, improve portfolio analysis. Consider the following example: the Russell 3000 equities, that is, roughly 3,000 of the largest U.S. stocks by market capitalization, and 10 of their characteristics as recorded on December 29, 2015:

The price is the most basic characteristic of most financial instruments. In the context of daily analysis, the price usually represents the price of the last trade at which the stock was transacted during normal market hours (9:30 AM – 4:00 PM ET). Such a price is known as the closing price for a particular stock.

The previous-day return is another common metric that shows the change in the price of the stock from the previous day to the current day. In daily data analysis, closing prices are used to compute returns. To compare returns objectively across stocks, the return is computed not as the raw difference between today’s and the last trading day’s closing prices, but as that difference divided by the last closing price, which yields a percentage change. Percentage changes are easy to compare across different stocks and even other financial instruments.
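As a minimal sketch of the normalization described above, the snippet below computes percentage returns from a list of closing prices. The function name and the sample prices are hypothetical and chosen only for illustration.

```python
def daily_returns(closes):
    """Percentage change between consecutive closing prices."""
    return [(today - prev) / prev for prev, today in zip(closes, closes[1:])]


# Hypothetical closing prices for two stocks trading at very different levels.
closes_a = [100.0, 102.0, 101.0]
closes_b = [10.0, 10.2, 10.1]

returns_a = daily_returns(closes_a)
returns_b = daily_returns(closes_b)
```

The raw price moves differ by a factor of ten between the two hypothetical stocks, yet the percentage returns are essentially identical, which is what makes them comparable.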

Intraday range volatility is a metric of variation of the price throughout the day. Covering the extremes of the intraday price range, High and Low, and normalized by the daily closing price, the metric presents the volatility in an easy-to-compare framework. At the same time, intraday range volatility captures intraday occurrences such as flash crashes, unexpected price rallies and the like.
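One plausible reading of this definition is the day's High minus Low, divided by the closing price. The sketch below assumes that formula; the article does not spell out the exact calculation, and the function name and sample values are hypothetical.

```python
def intraday_range_volatility(high, low, close):
    """Intraday price range (High - Low) normalized by the closing price.

    Assumed formula; the exact definition used in the article is not given.
    """
    if close <= 0:
        raise ValueError("close must be positive")
    if high < low:
        raise ValueError("high must not be below low")
    return (high - low) / close


# Hypothetical day: the stock traded between 95 and 105 and closed at 100.
range_vol = intraday_range_volatility(high=105.0, low=95.0, close=100.0)
```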

The 10-day historical volatility is a frequently used measure of recent variation in stock prices. Historical volatility has been shown to be “sticky” in several academic studies (see Cont, 2005, for a comprehensive survey of the literature). Volatility stickiness or clustering describes the phenomenon where high volatility follows periods of high volatility, and low volatility precedes periods of low volatility. Many companies like MSCI Barra consider the recent two weeks of historical volatility a reliable guide to near-term volatility.
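A common way to estimate historical volatility is the sample standard deviation of daily returns over the window. The sketch below assumes simple (not log) returns and no annualization; these conventions, the function name, and the sample prices are assumptions rather than the method of any particular vendor.

```python
import statistics


def historical_volatility(closes, window=10):
    """Sample standard deviation of the last `window` daily returns.

    Needs window + 1 closing prices to form `window` returns.
    """
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 closing prices")
    recent = closes[-(window + 1):]
    returns = [(b - a) / a for a, b in zip(recent, recent[1:])]
    return statistics.stdev(returns)


# Hypothetical series: a calm stock and a choppy one over 11 trading days.
calm = [100.0 * 1.001 ** i for i in range(11)]
choppy = [100.0, 103.0, 99.0, 104.0, 98.0, 102.0, 97.0, 103.0, 99.0, 105.0, 100.0]

vol_calm = historical_volatility(calm)
vol_choppy = historical_volatility(choppy)
```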

Market capitalization is not an immediately obvious potential factor, but it may indicate the liquidity of a stock: the larger the pool of shares outstanding, the easier it may be to find a suitable counterparty to trade a large position. Market capitalization is also used by MSCI Barra as an important factor in their portfolio management optimization framework.

Beta broadly describes the sensitivity of the changes in the stock price to changes in the valuation in the broader market. Computed as a regression coefficient in the market model, beta captures the stock’s sensitivity to issues affecting the stock market as a whole: macroeconomic news, geopolitical news, and the like.
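In the market model, the regression slope of stock returns on market returns equals the covariance of the two series divided by the variance of the market returns. The sketch below computes that slope directly; the function name and the sample returns are hypothetical.

```python
def beta(stock_returns, market_returns):
    """Slope of the regression of stock returns on market returns.

    Equivalent to cov(stock, market) / var(market).
    """
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


# Hypothetical data: the stock moves twice as much as the market, plus a constant.
market = [0.01, -0.02, 0.015, 0.005, -0.01]
stock = [2.0 * m + 0.001 for m in market]

stock_beta = beta(stock, market)
```

A beta near 2 for this constructed series says the hypothetical stock amplifies market moves by about a factor of two.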

Dividend yield, typically computed as the cash dividend over the past 4 quarters divided by the stock price, is another stock characteristic the relevance of which is not entirely obvious. A body of academic literature, however, shows that the dividend yield is a proxy for corporate growth. Companies are simply more likely to pay dividends, that is, return cash to shareholders, when their growth opportunities and the associated investment opportunities shrink.
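The trailing calculation described above can be sketched as follows. The function name and the sample figures are hypothetical.

```python
def dividend_yield(last_four_quarterly_dividends, price):
    """Cash dividends paid over the past 4 quarters divided by the stock price."""
    if len(last_four_quarterly_dividends) != 4:
        raise ValueError("expected exactly four quarterly dividends")
    if price <= 0:
        raise ValueError("price must be positive")
    return sum(last_four_quarterly_dividends) / price


# Hypothetical stock paying 0.50 per share each quarter and trading at 50.
yield_example = dividend_yield([0.50, 0.50, 0.50, 0.50], price=50.0)
```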

Finally, Aggressive HFT participation and Institutional buy and sell activity are the innovative data sets developed and distributed by AbleMarkets, a Big Data for Capital Markets company. The data products track the participation of the aggressive HFT and institutional block trades in lit anonymous markets using algorithms.


Performing a key Big Data analysis, known as Singular Value Decomposition, on the data shows that the factors driving returns in the Russell 3000 are distributed in the following order, from left to right: Beta, Stock price, Market capitalization, Dividend yield, Institutional Buying activity, Proportion of aggressive high-frequency traders in the markets, Institutional selling activity, 1-Day return, Intraday range volatility, 10-trading day volatility. Figure 2 shows the detail of Figure 1 “after the bend”.
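The values behind a scree plot can be obtained from the singular values of a stocks-by-characteristics matrix. The sketch below uses NumPy on synthetic data and standardizes each column first; the article does not describe its preprocessing, so the standardization, the function name, and the random data are all assumptions made for illustration. It does not reproduce the article's mapping of singular values to named factors.

```python
import numpy as np


def scree_values(characteristics):
    """Singular values of the standardized data matrix, largest first,
    together with each one's share of the total variance."""
    X = np.asarray(characteristics, dtype=float)
    # Standardize columns so that scale differences (e.g. price vs. beta)
    # do not dominate. This is an assumption, not the article's stated method.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    s = np.linalg.svd(Z, compute_uv=False)  # returned in descending order
    share = s ** 2 / np.sum(s ** 2)
    return s, share


# Synthetic stand-in for 3000 stocks x 10 characteristics.
rng = np.random.default_rng(0)
data = rng.normal(size=(3000, 10))
# Make two columns strongly correlated so one component stands out.
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=3000)

singular_values, share = scree_values(data)
```

Plotting `singular_values` (or `share`) against their rank gives the scree plot; the "bend" is where the values stop falling sharply.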

Figure 1. Scree plot of Russell 3000 SVD analysis across 10 factors. Beta and stock price appear well before the bend.

Figure 2. “After the Bend” Detail of Figure 1.

Beta comes out as the undisputed most important factor influencing Russell 3000 stocks. Intuitively, Beta is an aggregate measure of risk vis-à-vis the market. Beta incorporates covariances with the market (a basket of all financial instruments). Beta also incorporates a given stock’s idiosyncratic risk in variance. Price is the second most important factor. Price is necessary to determine an absolute gain: for a given percentage return, the higher the price, the larger the gain per share.

Market capitalization is the total number of shares outstanding multiplied by the share price, and can be thought of as a proxy for liquidity. Dividend yield is a part of the return, and also a measure of corporate growth: companies paying out larger dividends often have a smaller set of investment opportunities.

Institutional buying activity as a percentage of buyer-initiated volume, average aggressive HFT activity, and institutional selling activity as a percentage of seller-initiated volume are determined via Big Data techniques that reverse-engineer the order flow from the algorithmically split pieces observed in the anonymous markets. Aggressive HFT is a subset of high-frequency trading (HFT) strategies that: 1) use aggressive orders, that is, market orders or close-to-market limit orders; 2) comprise both short-term alpha-driven and stat-arb strategies; 3) can hold positions from milliseconds to 30 minutes; and 4) erode liquidity on the opposing side of the limit-order book and create surplus liquidity on the same side of the limit-order book. Institutional activity measures the cumulative trading volume of buying and selling large blocks of equities, fixed income, commodities or FX. Typically, such volume is broken down into small orders using algorithms such as Volume-Weighted Average Price (VWAP) and Time-Weighted Average Price (TWAP). AbleMarkets streams Aggressive HFT activity and Institutional activity every 20 to 30 minutes intraday, and also provides end-of-day and end-of-month numbers. Institutional activity and aggressive HFT activity are important factors in today’s markets, presenting risks and influencing returns of portfolios with short-term and long-term holding horizons, as described in “Real-Time Risk”.
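To illustrate how a large block ends up as many small orders, the sketch below splits a parent order into equal child orders in the spirit of a Time-Weighted Average Price (TWAP) schedule. It is a deliberately simplified assumption: production execution algorithms typically add randomization, price limits and other logic, and the function name and order size are hypothetical.

```python
def twap_slices(total_shares, n_slices):
    """Split a parent order into n_slices child orders of near-equal size,
    one per equally spaced time interval."""
    if total_shares <= 0 or n_slices <= 0:
        raise ValueError("total_shares and n_slices must be positive")
    base, remainder = divmod(total_shares, n_slices)
    # Spread any leftover shares over the earliest slices.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]


# Hypothetical 10,000-share block worked over 3 intervals.
child_orders = twap_slices(10_000, 3)
```

Seen from the outside, only the small child orders are visible in the market, which is why recovering the parent block requires inference from the observed flow.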

Finally, the previous day’s return, intraday range volatility and ten-trading-day historical volatility all come out last in the Big Data evaluation of key portfolio drivers. These are the parameters most commonly used in traditional Markowitz-style portfolio optimization.

How do these Big Data inferences help in practice? It turns out, a lot. Portfolios that incorporate factors other than returns and volatility are significantly outperforming the traditional approach, and not just in equities, but also with foreign exchange and commodity futures. Big Data Finance is a must-know discipline and it is critical for modern portfolio management.


Irene Aldridge is Managing Director, Head of Research at AbleMarkets, a Big Data for Capital Markets company specializing in real-time and near-real-time Software-as-a-Service that improves execution, portfolio allocation and risk management. She is a co-author of “Real-Time Risk: What Investors Should Know About Fintech, High-Frequency Trading and Flash Crashes” (Wiley, 2017), a #1 New Release and #1 International Bestseller in the Financial Risk Management category, and the author of “High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems” (Wiley, 2nd edition, 2013). She can be seen at the upcoming 5th annual Big Data Finance Conference at NYU Center for Data Science on May 19, 2017.
