# Final Project¶

## Abstract¶

It was hypothesized that Warren Buffett's claim is incorrect: "Over a ten-year period commencing on January 1, 2008, and ending on December 31, 2017, the S&P 500 will outperform a portfolio of funds of hedge funds, when performance is measured on a basis net of fees, costs and expenses."

I will simulate a portfolio of funds by trading stocks based on their operating margins and revenue growth; by trading a large number of companies, the risk per company is reduced. Similarly, a fund of hedge funds is an investment in several hedge funds, which in turn use several strategies to trade in the market. Using operating margins and revenue growth as criteria, and both shorting and longing stocks, lets my portfolio approximate a fund of hedge funds.

Using the Quantopian platform, I will simulate my algorithm between the 1st of January 2008 and the 1st of June 2017 and compare its performance against the S&P 500 to see whether the results support or contradict my hypothesis.

## Introduction¶

### Background¶

The idea behind algorithmic trading is not new: human traders have used machine-assisted platforms to trade since the 1980s, though the practice has only gained widespread popularity in the last two decades. A high percentage of these automated trades are made by pension and retirement funds rebalancing their portfolios between index funds. The other side of algorithmic trading is High Frequency Trading (HFT), where algorithms leverage high speed, high order volume, and high-frequency market data to make decisions. According to a Congressional Research Service paper published in 2016, HFT accounts for 55% of the trading volume in US equity markets.

In the United States (2013), there were about 17 million households with brokerage accounts: 65% of the accounts made between 0 and 3 trades, 18% traded at least once a month, and only 4% executed 100 or more trades in the preceding year. Meanwhile, Wall Street employed only 51,600 people in 2014. According to Warren Buffett's claim: "If Group A (active investors) and Group B (do-nothing investors) comprise the total investing universe, and B is destined to achieve average results before costs, so, too, must A. Whichever group has the lower costs will win."

But this is arguably not the case, because: 1) the cost of executing trades for the general public has fallen significantly, with some platforms like Robinhood allowing trades without any charges; and 2) Wall Street professionals would be out of jobs if anybody (the do-nothing investors) could trade as well as they do.

As for the data, Quantopian provides, free of charge, data about all publicly traded companies, sourced from the companies' public filings.

Hedge funds: Hedge funds are partnerships of investors who devise investing strategies to make money whether the market is going up or down. These funds usually short or long stocks based on certain criteria. A fund of hedge funds is a portfolio of several hedge funds, each shorting and longing different stocks based on its individual criteria.

Simulation: My hypothesis is based upon a simulation of such a fund of hedge funds. While I am not dictating different selection criteria per fund, I have a broad criterion for shorting and longing stocks (revenue growth and operating margin), two significant metrics in the financial industry.
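The selection idea above can be sketched in plain Python (this is an illustration, not the Quantopian pipeline itself, and the factor values below are made up): rank stocks by the combined factor (revenue growth plus operating margin) and split them into two halves, shorting the bottom half and longing the top half.

```python
def classify(factors):
    """factors: dict of ticker -> (revenue growth + operating margin).

    Returns (longs, shorts): the top and bottom halves of the ranking.
    """
    ranked = sorted(factors, key=factors.get)  # ascending by factor value
    cutoff = len(ranked) // 2
    shorts = set(ranked[:cutoff])   # bottom quantile: weakest fundamentals
    longs = set(ranked[cutoff:])    # top quantile: strongest fundamentals
    return longs, shorts

longs, shorts = classify({"AAA": 0.40, "BBB": -0.10, "CCC": 0.25, "DDD": 0.05})
```

This mirrors the two-quantile split that the pipeline later performs with `quantiles(2)`.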

### Reason¶

If there is so much data available to traders who do this for a living, there must be ways for them to leverage it to make better decisions, and hence better profits, than investors who do not have a deep understanding of the market.

### Hypothesis¶

It was hypothesized that Warren Buffett's claims, "Over a ten-year period commencing on January 1, 2008, and ending on December 31, 2017, the S&P 500 will outperform a portfolio of funds of hedge funds, when performance is measured on a basis net of fees, costs and expenses." and "If Group A (active investors) and Group B (do-nothing investors) comprise the total investing universe, and B is destined to achieve average results before costs, so, too, must A. Whichever group has the lower costs will win.", are false.

The justification behind this hypothesis is that these investors trade as a profession, so they should be able to leverage the huge amounts of data available to them to beat the market, and in turn the average investor; I leverage the same data points while rebalancing my portfolio.

## Data and Methods¶

### Data Source¶

The data come from the Quantopian platform, which in turn gets its data from the public filings of companies listed on the stock exchanges. First we take all the stocks in the Q1500US data source, the default universe of 1500 tradable stocks on the stock exchange. We then restrict to companies that have a valid revenue growth and operating margin in the Morningstar data source, so our final universe only includes stocks that are tradable and have both a latest revenue growth value and a latest operating margin value.
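The universe filter can be sketched with pandas instead of the Quantopian pipeline API (a minimal illustration with invented tickers and values): keep only the rows where both metrics are present.

```python
import pandas as pd

# Hypothetical fundamentals table; None marks a missing (invalid) value.
raw = pd.DataFrame({
    "revenue_growth":   [0.12, None, 0.03],
    "operation_margin": [0.25, 0.10, None],
}, index=["AAA", "BBB", "CCC"])

# Final universe: rows with both revenue growth and operating margin present.
universe = raw.dropna(subset=["revenue_growth", "operation_margin"])
```

In the actual pipeline the same idea is expressed with `.notnull()` filters combined with `Q1500US()`.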

The values in the data that we care about are:

• Stock ticker - uniquely identifies the company (stock): a textual, observational data point.
• Operating margin - margin ratio measuring the company's (stock's) efficiency: a real-valued, observational data point.
• Revenue growth - growth percentage of the revenue that the company (stock) makes: a real-valued, observational data point.
• Short - whether the company (stock) should be shorted: a boolean, interventional data point based upon the operating margin and revenue growth.
• Long - whether the company (stock) should be longed: a boolean, interventional data point based upon the operating margin and revenue growth.
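The five fields above can be captured in a small record type (the class and field names are mine, chosen for illustration; the long/short booleans are only filled in after the cross-sectional quantile split):

```python
from dataclasses import dataclass

@dataclass
class StockRecord:
    ticker: str               # textual, observational
    operating_margin: float   # real-valued, observational
    revenue_growth: float     # real-valued, observational
    short: bool = False       # interventional, set after the quantile split
    long: bool = False        # interventional, set after the quantile split

rec = StockRecord("AAPL", operating_margin=0.27, revenue_growth=0.35)
```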

### Data Cleaning¶

In the trading algorithm, trades are made on a monthly basis. Although the underlying data arrives quarterly, it is not unreasonable to assume that some fund managers make decisions on a shorter time frame. Every month, 3 days into the month and 1 hour after the market opens, the portfolio is rebalanced. First the universe is computed from the 1500 tradable stocks that have a non-null revenue growth and operating margin. These are then split into two quantiles (longs and shorts). Finally the algorithm trades the difference between the current portfolio and the rebalanced target: new stocks are shorted or longed, and existing positions are sold to free up the necessary capital.
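The rebalance step can be sketched as a set difference (a simplification of what happens on Quantopian, where orders are placed per position; tickers below are invented): exit positions that left the target sets, enter the new ones.

```python
def rebalance(current, target_longs, target_shorts):
    """Return (to_close, to_open) given current holdings and target sets."""
    targets = target_longs | target_shorts
    to_close = current - targets   # positions sold to free up capital
    to_open = targets - current    # new longs/shorts to enter
    return to_close, to_open

to_close, to_open = rebalance(
    current={"AAA", "BBB"},
    target_longs={"BBB", "CCC"},
    target_shorts={"DDD"},
)
```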

Note: I also decided not to short stocks as heavily as I long them. In the long term the market tends to grow, so a 50-50 split would be detrimental. I adopted a more conservative 85:15 split, where I still long a significant share of my capital but also make some short bets.
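The 85:15 capital split can be expressed as target portfolio weights, assuming capital is divided equally within each side (negative weights denote short positions; this is my own sketch, not code from the backtest):

```python
def target_weights(longs, shorts, long_frac=0.85):
    """Map each ticker to its fraction of portfolio capital (shorts negative)."""
    weights = {s: long_frac / len(longs) for s in longs}           # 85% across longs
    weights.update({s: -(1 - long_frac) / len(shorts) for s in shorts})  # 15% across shorts
    return weights

w = target_weights(longs={"AAA", "BBB"}, shorts={"CCC"})
```

With two longs and one short, each long gets 42.5% of capital and the short gets -15%.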

In [32]:
from quantopian.pipeline.filters.morningstar import Q1500US
from quantopian.pipeline.data.morningstar import operation_ratios
from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
import alphalens

In [36]:
def make_pipeline():
    # Factors: latest revenue growth and operating margin from Morningstar
    testing_factor1 = operation_ratios.revenue_growth.latest
    testing_factor2 = operation_ratios.operation_margin.latest

    # Universe: tradable Q1500US stocks with both factors present
    universe = (Q1500US() &
                testing_factor1.notnull() &
                testing_factor2.notnull())

    # Combined factor, split into two quantiles: bottom = shorts, top = longs
    testing_factor = testing_factor1 + testing_factor2
    testing_quantiles = testing_factor.quantiles(2)

    pipe = Pipeline(columns={'testing_factor': testing_factor,
                             'shorts': testing_quantiles.eq(0),
                             'longs': testing_quantiles.eq(1)},
                    screen=universe)
    return pipe

In [37]:
res = run_pipeline(make_pipeline(), start_date='2008-01-01', end_date='2008-02-01')

In [38]:
res.head()

Out[38]:
longs shorts testing_factor
2008-01-02 00:00:00+00:00 Equity(2 [ARNC]) False True 355.0
Equity(24 [AAPL]) True False 2101.0
Equity(62 [ABT]) True False 1689.0
Equity(76 [TAP]) False True 1139.0

### Explanation¶

What we see from creating this pipeline and running the analysis is that, based on the last quarter's financial results, some of these stocks should be longed (on the 2nd of January 2008) while others should be shorted. Note that this is just a sample of the stocks in the result; the actual size of the result set is:

In [39]:
assets = res.index.levels[1].unique() # Unique companies(stocks)
len(assets)

Out[39]:
1510

Moreover, not all of these stocks are longed or shorted. We are limited both by the amount of capital available to make these trades and by the time in a trading day. While trades might seem instantaneous, they do take some time to execute, and executing millions of different trades may simply not be possible.

In [40]:
pricing = get_pricing(assets, start_date='2007-12-01', end_date='2008-03-01', fields='open_price')

In [41]:
alphalens.tears.create_factor_tear_sheet(factor = res['testing_factor'],
prices = pricing,
quantiles = 2,
periods = (1,5,10) )


Quantiles Statistics

| factor_quantile | min | max | mean | std | count | count % |
|---|---|---|---|---|---|---|
| 1 | 7.0 | 1496.0 | 969.945585 | 385.772517 | 16227 | 50.037003 |
| 2 | 1479.0 | 2931.0 | 1983.609702 | 377.060410 | 16203 | 49.962997 |
Returns Analysis

| | 1 | 5 | 10 |
|---|---|---|---|
| Ann. alpha | -0.211 | -0.270 | -0.269 |
| beta | -0.009 | -0.008 | -0.012 |
| Mean Period Wise Return Bottom Quantile (bps) | 6.359 | 42.723 | 86.517 |
| Mean Period Wise Spread (bps) | -12.782 | -17.131 | -17.335 |

Information Analysis

| | 1 | 5 | 10 |
|---|---|---|---|
| IC Mean | -0.015 | -0.059 | -0.083 |
| IC Std. | 0.090 | 0.084 | 0.072 |
| t-stat(IC) | -0.805 | -3.298 | -5.396 |
| p-value(IC) | 0.430 | 0.003 | 0.000 |
| IC Skew | 0.102 | -0.100 | 0.771 |
| IC Kurtosis | -0.446 | -0.731 | -0.200 |
| Ann. IR | -2.723 | -11.163 | -18.263 |

Turnover Analysis

| | 1 | 5 | 10 |
|---|---|---|---|
| Quantile 1 Mean Turnover | 0.002 | 0.005 | 0.01 |
| Quantile 2 Mean Turnover | 0.001 | 0.005 | 0.01 |

| | 1 | 5 | 10 |
|---|---|---|---|
| Mean Factor Rank Autocorrelation | 1.0 | 0.998 | 0.995 |

(Tear sheet plots omitted.)