Tuesday, July 28, 2015

Momentum: Jegadeesh and Titman 2001

Momentum strategies based on past 3-12 month returns, as well as earnings momentum, are consistently profitable. The best performers are no riskier than the worst performers. Hence, standard risk adjustments tend to increase the return spread between winners and losers.

The proposed cause is overreaction or underreaction to information. Returns reverse at short horizons (weeks to a month) and at long horizons (years, up to 5 years), while momentum shows up at 3-12 months. Momentum profits are seasonal: negative in January and positive in every other month.

A New Anomaly: The Cross-Sectional Profitability of Technical Analysis - Han, Yang, Zhou 2013

A technical (moving-average) strategy applied to portfolios sorted by volatility generates better profits than well-known return-based momentum strategies. The correlations with those strategies are low as well. These excess returns are not explained by market timing, investor sentiment, default risk or liquidity risk. Similar results hold if the portfolios are sorted on other proxies of information uncertainty (size, distance to default, credit rating, analyst forecast dispersion, earnings volatility). The higher the noise-to-signal ratio, i.e. the more uncertain the information, the more profitable the technical analysis.

Strategy: buy or remain long the portfolio today when yesterday's price is above its 10-day moving-average (MA) price, and invest in the risk-free asset otherwise. This is compared against buy-and-hold, for the top decile.
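A minimal sketch of this timing rule, to make the timing convention explicit. The function name and inputs are hypothetical, and it assumes the 10-day MA is the average of the 10 prices ending yesterday; the paper's exact convention may differ.

```python
def ma_timing_returns(prices, risky_returns, rf_returns, window=10):
    """Daily returns of the MA timing rule for days t >= window.

    prices[t]        : portfolio price (index level) at the close of day t
    risky_returns[t] : portfolio return earned over day t
    rf_returns[t]    : risk-free return earned over day t
    """
    out = []
    for t in range(window, len(prices)):
        # Assumption: the MA uses the `window` prices ending yesterday (t-1).
        ma = sum(prices[t - window:t]) / window
        in_market = prices[t - 1] > ma
        # Hold the portfolio today if yesterday's price was above its MA,
        # otherwise hold the risk-free asset.
        out.append(risky_returns[t] if in_market else rf_returns[t])
    return out


# Toy usage with made-up numbers: a steadily rising price keeps us invested.
rising = [float(p) for p in range(1, 13)]
print(ma_timing_returns(rising, [0.01] * 12, [0.0] * 12))
```

The benchmark in the comparison is simply `risky_returns` itself (buy-and-hold) over the same days.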


Monday, July 27, 2015

Momentum and Autocorrelation in Stock Returns - Lewellen

The paper studies the role of size and book-to-market (BM) factors in stock momentum. Both are negatively autocorrelated and cross-serially correlated over intermediate horizons. The excess covariance of stocks with each other, and not under-reaction, explains momentum in the portfolios.

Firm-specific returns, together with investor under-reaction and belated overreaction, do not explain a significant component of momentum. Momentum based on size and BM factors is strong and distinct, showing that momentum can't be attributed solely to firm-specific returns - there must be multiple sources of momentum. Momentum shows up in individual stocks and size quintiles, but vanishes at the market level.

Sources of Momentum

Profits depend on both auto-correlations and the lead-lag relationship. The portfolio weight of asset $i$ in month $t$ is
$$w_{i,t}=\frac{1}{N}(r_{i,t-1}-r_{m,t-1})$$
where $r_{m,t}$ is the equal-weighted market index return in month $t$. Assume returns have unconditional mean $\mu=E[r_t]$ and autocovariance matrix $\Omega=E[(r_{t-1}-\mu)(r_t-\mu)^T]$. The portfolio return in month $t$ equals:
$$\pi_t=\sum_i w_{i,t}r_{i,t}=\frac{1}{N}\sum_i (r_{i,t-1}-r_{m,t-1})r_{i,t}.$$
Hence, the expected profit is
$$\begin{aligned}E[\pi_t] &= \frac{1}{N}E\Bigg[\sum_i r_{i,t-1}r_{i,t}\Bigg]-\frac{1}{N}E\Bigg[r_{m,t-1}\sum_i r_{i,t}\Bigg] \\
 &= \frac{1}{N} \sum_i (\rho_i+\mu_i^2)-(\rho_m+\mu_m^2),\end{aligned}$$
where $\rho_i$ and $\rho_m$ are the autocovariances of asset $i$ and the equal-weighted index, respectively. The average autocovariance equals $tr(\Omega)/N$ and the autocovariance of the market portfolio equals $\varsigma^T\Omega\varsigma/N^2$, where $\varsigma$ is the vector of ones. Writing $\sigma_{\mu}^2=\frac{1}{N}\sum_i\mu_i^2-\mu_m^2$ for the cross-sectional variance of the unconditional means, we get
$$E[\pi_t]=\frac{1}{N}tr(\Omega)-\frac{1}{N^2}\varsigma^T\Omega\varsigma+\sigma_{\mu}^2=\frac{N-1}{N^2}tr(\Omega)-\frac{1}{N^2}[\varsigma^T\Omega\varsigma-tr(\Omega)]+\sigma_{\mu}^2.$$
This decomposition says that momentum can arise in three ways:
1) Stocks might be positively autocorrelated (first term) - meaning stocks with high returns today are expected to have high returns tomorrow.
2) Cross-serial correlations might be negative (second term) - meaning a high return on one firm today predicts low returns on other firms in the future. This is related to excess covariance among stocks.
3) Unconditional means might differ across stocks (third term) - meaning the strategy on average buys stocks with high unconditional mean returns.

This decomposition is not unique. 
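As a sanity check on the algebra, the sketch below computes the three terms for a given $\Omega$ and $\mu$ and compares their sum with the expected profit computed directly from the second moments $E[r_{i,t-1}r_{j,t}]=\Omega_{ij}+\mu_i\mu_j$. The function names and the numbers in the usage lines are made up for illustration.

```python
def momentum_profit_terms(omega, mu):
    """Return (autocovariance term, cross-serial term, variance of means)."""
    n = len(mu)
    trace = sum(omega[i][i] for i in range(n))
    total = sum(sum(row) for row in omega)  # ones' * Omega * ones
    mu_m = sum(mu) / n
    var_mu = sum((m - mu_m) ** 2 for m in mu) / n
    auto = (n - 1) / n ** 2 * trace
    cross = -(total - trace) / n ** 2
    return auto, cross, var_mu


def expected_profit_direct(omega, mu):
    """E[pi_t] computed directly from E[r_{i,t-1} r_{j,t}] = Omega_ij + mu_i mu_j."""
    n = len(mu)
    m = [[omega[i][j] + mu[i] * mu[j] for j in range(n)] for i in range(n)]
    own = sum(m[i][i] for i in range(n)) / n          # (1/N) sum_i E[r_{i,t-1} r_{i,t}]
    market = sum(sum(row) for row in m) / n ** 2      # E[r_{m,t-1} r_{m,t}]
    return own - market


# Hypothetical two-asset example.
omega = [[0.02, -0.01], [-0.005, 0.03]]
mu = [0.01, 0.02]
print(momentum_profit_terms(omega, mu), expected_profit_direct(omega, mu))
```

In this example the off-diagonal entries of $\Omega$ are negative, so the cross-serial term contributes positively to the profit, as in case 2) above.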

Saturday, July 25, 2015

Anticipating Correlations - Engle

These are my notes on Robert Engle's book 'Anticipating Correlations: A New Paradigm for Risk Management'. Engle is a Nobel laureate, celebrated for his contributions to the ARCH/GARCH family of volatility models.

Ch1: Correlation Economics 

Movements in asset prices are not independent. If they were, it would be possible to construct a portfolio with negligible volatility. Estimating the correlations for a big cross-section is a Herculean task, especially once it is recognized that these correlations vary over time. Hence, forward-looking correlation estimates are needed for optimal risk management, portfolio selection and hedging. The main method developed is dynamic conditional correlation (DCC).
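The diversification argument can be made concrete with a small sketch. It assumes, purely for illustration, an equal-weighted portfolio of $N$ assets that all have the same volatility $\sigma$ and the same pairwise correlation $\rho$, so that the portfolio variance is $\sigma^2\left(\frac{1}{N}+\left(1-\frac{1}{N}\right)\rho\right)$; the function name is hypothetical.

```python
import math


def equal_weight_portfolio_vol(sigma, rho, n):
    """Volatility of an equal-weighted portfolio of n assets with common
    volatility sigma and common pairwise correlation rho (illustrative setup)."""
    variance = sigma ** 2 * (1.0 / n + (1.0 - 1.0 / n) * rho)
    return math.sqrt(variance)


# With independent assets the volatility shrinks like 1/sqrt(n);
# with positive correlation it is bounded below by sigma * sqrt(rho).
for rho in (0.0, 0.3):
    print(rho, [round(equal_weight_portfolio_vol(0.2, rho, n), 4) for n in (1, 10, 100, 1000)])
```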

Correlations are high between stocks in the same industry sector but lower otherwise. The correlation between different asset classes is lower still. For equities of different countries, non-synchronous trading should be corrected for (e.g. by averaging over more than one day) before computing correlations.

Changes in asset prices and correlations reflect changing forecasts of future payments. A piece of news affects all asset prices to a greater or lesser extent, depending on their correlations. The most important reason these correlations change over time is that firms change their lines of business. A second important factor is that the characteristics of the news change (e.g. a change in the magnitude of the news).

Saturday, July 18, 2015

Modeling return dynamics via decomposition

The paper 'Modeling Financial Return Dynamics via Decomposition' by Anatolyev and Gospodinov (2010) points out that predicting excess stock returns is much more difficult than simply predicting the direction of change. They therefore decompose returns into a direction (sign) component and a magnitude component, and model the two jointly using a copula for their interaction. This lets them incorporate important non-linearities.

Introduction

Valuation ratios (dividend-price, earnings-price) and yields on short- and long-term Treasury and corporate bonds appear to possess some predictive power at short horizons for timing the market. Newer variables with incremental predictive power can also be used, such as the share of equity issues in total new equity and debt issues, the consumption-wealth ratio, relative valuations of high- and low-beta stocks, and factors estimated from large economic datasets (see the Lettau and Ludvigson 2008 review paper).

This paper, instead of trying to identify better predictors, looks for better ways of using predictors. This is done by decomposing returns into sign and magnitude; the sign is more predictable. The aim is to predict expected returns, and the following decomposition model is proposed:
$$E[r_t|F_{t-1}]=E[|r_t|sign(r_t)|F_{t-1}]=f(|r_t|)\times g(sign(r_t))\times \text{interaction copula}.$$
The magnitude is modeled using a multiplicative error model, the sign using a dynamic binary choice model, and their interaction using a copula. This captures hidden nonlinearities that are absent from the regression setup. Magnitudes and signs show substantial dependence over time, while returns themselves show hardly any; the magnitude, for example, behaves like volatility, which shows significant dependence. One important aspect of the bivariate analysis is that, in spite of a large unconditional correlation between the multiplicative components, they appear to be conditionally only very weakly dependent.

This opens avenues for trading strategies as well (Anatolyev and Gerko 2005). The decomposition model outperforms the predictive regression, which in turn outperforms the buy-and-hold strategy, both in and out of sample. The decomposition model also produces unbiased forecasts.

Methodological Framework

The key identity is
$$r_t=c+|r_t-c|sign(r_t-c)=c+|r_t-c|(2\mathbb{I}[r_t>c]-1)$$
and hence,
$$E[r_t|F_{t-1}]=c-E[|r_t-c| | F_{t-1}]+2E[|r_t-c|\mathbb{I}[r_t>c]|F_{t-1}],$$
where $c$ is a user-defined constant that can be used to model transaction costs or the different dynamics of small, large positive and large negative returns. In other applications it would be 0 for modeling recessions and expansions using GDP, 3% for modeling the output gap, and 2% for forecasting inflation. $F_{t-1}$ is all the information available up to time $t-1$, which in practice consists of data such as lagged returns, volatility, volume and other predictive variables available at time $t-1$. A toy example, where the predictive variables are based on realized volatility $RV_{t-1}$:
a) For the direct regression model: $E[r_t]=\alpha+\beta RV_{t-1}$ gives an $R^2$ of 0.39%.
b) For the decomposition model: $E[|r_t|]=\alpha_{|r|}+\beta_{|r|}RV_{t-1}$ and $Pr[r_t>0]=\alpha_{\mathbb{I}}+\beta_{\mathbb{I}}RV_{t-1}$. Assuming the two components are stochastically independent, $E[r_t]=(2Pr[r_t>0]-1)E[|r_t|]$, which gives $E[r_t]=\alpha_r+\beta_rRV_{t-1}+\gamma_r RV^2_{t-1}$. This shows that nonlinearities are captured by the decomposition model, which gives an $R^2$ of 0.72%.
c) Further adding $\mathbb{I}[r_{t-1}>0]$ and $RV_{t-1}\mathbb{I}[r_{t-1}>0]$ to the regressor list increases the $R^2$ to 1.21%.
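To see where the quadratic term in b) comes from, the sketch below multiplies out $(2Pr[r_t>0]-1)E[|r_t|]$ with $c=0$ under the independence assumption and returns the implied $\alpha_r$, $\beta_r$, $\gamma_r$. The function name and the coefficient values in the usage lines are made up for illustration and are not estimates from the paper.

```python
def implied_quadratic(a_abs, b_abs, a_ind, b_ind):
    """Coefficients of E[r_t] = alpha_r + beta_r*RV + gamma_r*RV^2 implied by
    E|r_t| = a_abs + b_abs*RV and Pr[r_t > 0] = a_ind + b_ind*RV (with c = 0),
    assuming sign and magnitude are independent."""
    alpha_r = (2.0 * a_ind - 1.0) * a_abs
    beta_r = (2.0 * a_ind - 1.0) * b_abs + 2.0 * b_ind * a_abs
    gamma_r = 2.0 * b_ind * b_abs
    return alpha_r, beta_r, gamma_r


# Hypothetical coefficients and one value of realized volatility.
a_abs, b_abs, a_ind, b_ind, rv = 0.005, 0.5, 0.55, -2.0, 0.01
alpha_r, beta_r, gamma_r = implied_quadratic(a_abs, b_abs, a_ind, b_ind)
quadratic = alpha_r + beta_r * rv + gamma_r * rv ** 2
product = (2.0 * (a_ind + b_ind * rv) - 1.0) * (a_abs + b_abs * rv)
print(quadratic, product)  # the two agree up to rounding
```

The $RV^2_{t-1}$ term appears only because both components depend on $RV_{t-1}$; if either slope is zero, $\gamma_r=0$ and the model collapses to a linear regression.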

It is important to note that it is the augmentation of the sign component which delivers nonlinear dependence, improving the prediction. The driving force behind the predictive ability of the decomposition model is the predictability in the two components. The interaction term is less significant. This is the main theme of this work.

Marginal distributions and Copula model

a) Volatility model: The absolute return $|r_t-c|$ is a positive-valued variable and is modeled using the multiplicative error framework of Engle (2002) $$|r_t-c|=\psi_t\eta_t,$$ where $\psi_t=E[|r_t-c||F_{t-1}]$ and $\eta_t$ is a positive multiplicative error with $E[\eta_t|F_{t-1}]=1$ and conditional distribution $\mathbb{D}$. $\psi_t$ can be modeled using the logarithmic autoregressive conditional duration (LACD) specification as
$$\ln\psi_t=\omega_v+\beta_v\ln\psi_{t-1}+\gamma_v\ln|r_{t-1}-c|+\rho_v\mathbb{I}[r_{t-1}>c]+\pmb{x}^T_{t-1}\pmb{\delta}_v.$$ The second-to-last term allows for regime-specific volatility dependence, while the last term represents macroeconomic predictors of volatility. $\mathbb{D}$ can be modeled as a constant-parameter Weibull distribution (or another distribution whose shape parameter vector $\varsigma$ is a function of the past).
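A minimal sketch of the LACD recursion for $\psi_t$ follows. It drops the macroeconomic term $\pmb{x}^T_{t-1}\pmb{\delta}_v$, takes the starting value `psi0` as given, and assumes no return is exactly equal to $c$ (otherwise the logarithm is undefined). The function name and parameter values are hypothetical.

```python
import math


def lacd_psi(returns, c, omega, beta, gamma, rho, psi0):
    """Conditional mean psi_t of |r_t - c| from the LACD recursion
    (macro predictors omitted). psi[0] is set to psi0."""
    psi = [psi0]
    for t in range(1, len(returns)):
        prev = returns[t - 1]
        log_psi = (
            omega
            + beta * math.log(psi[-1])
            + gamma * math.log(abs(prev - c))  # assumes prev != c
            + rho * (1.0 if prev > c else 0.0)
        )
        psi.append(math.exp(log_psi))
    return psi


# Made-up daily returns and parameters, for illustration only.
rets = [0.010, -0.020, 0.005, -0.001, 0.015]
print(lacd_psi(rets, c=0.0, omega=-0.5, beta=0.8, gamma=0.1, rho=-0.05, psi0=0.01))
```

Working in logs keeps $\psi_t$ positive without any constraints on the parameters.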

b) Direction model: The indicator $\mathbb{I}[r_t>c]$ has a conditional Bernoulli distribution $\mathbb{B}(p_t)$ with probability mass function $f_{\mathbb{I}[r_t>c]}(v)=p^v_t(1-p_t)^{1-v}, v\in\{0,1\}$, where $p_t$ denotes the conditional 'success' probability $Pr(r_t>c|F_{t-1})=E[\mathbb{I}[r_t>c]|F_{t-1}]$. Christoffersen and Diebold (2006) show a remarkable result: if the data are generated by $r_t=\mu_t+\sigma_t\epsilon_t$, where $\mu_t=E[r_t|F_{t-1}]$, $\sigma^2_t=Var[r_t|F_{t-1}]$, and $\epsilon_t$ is a homoskedastic martingale difference with unit variance (so that $r_t$ can be modeled, e.g., as a GARCH process) and distribution function $\mathbb{F}_{\epsilon}$, then
$$Pr[r_t>c|F_{t-1}]=1-\mathbb{F}_{\epsilon}\left(\frac{c-\mu_t}{\sigma_t}\right).$$
This suggests that time-varying volatility can generate sign predictability as long as $c-\mu_t\ne0$. Furthermore, Christoffersen et al. (2007) derive a Gram-Charlier expansion of this distribution and show that $Pr[r_t>c|F_{t-1}]$ depends on the third and fourth conditional cumulants of the standardized errors $\epsilon_t$. Hence, sign predictability can arise from time variation in second and higher-order moments. This leads us to parametrize $p_t$ as a dynamic logit model:
$$p_t=\frac{e^{\theta_t}}{1+e^{\theta_t}}\quad\text{with}\quad\theta_t=\omega_d+\phi_d\mathbb{I}[r_{t-1}>c]+\pmb{y}^T_{t-1}\pmb{\delta}_d,$$
where the last term collects macroeconomic variables (valuation ratios, interest rates) and realized measures (variance, bipower variation, realized third and fourth moments of returns).
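The two formulas in b) can be illustrated with a short sketch. The first function evaluates the Christoffersen and Diebold probability under the extra assumption that $\epsilon_t$ is standard normal (the result itself does not require normality); the second evaluates the logit link, with the macro/realized term $\pmb{y}^T_{t-1}\pmb{\delta}_d$ passed in as a single precomputed number. Function names and inputs are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sign_prob_gaussian(mu, sigma, c=0.0):
    """Pr[r_t > c] = 1 - F((c - mu)/sigma), assuming Gaussian errors."""
    return 1.0 - normal_cdf((c - mu) / sigma)


def logit_prob(omega_d, phi_d, prev_return, c=0.0, extra=0.0):
    """Dynamic logit p_t with theta_t = omega_d + phi_d*I[r_{t-1} > c] + extra,
    where `extra` stands in for the y'delta_d term."""
    theta = omega_d + phi_d * (1.0 if prev_return > c else 0.0) + extra
    return 1.0 / (1.0 + math.exp(-theta))  # equals e^theta / (1 + e^theta)


# With a positive mean, higher volatility pulls the probability towards 1/2.
print(sign_prob_gaussian(0.05, 0.1), sign_prob_gaussian(0.05, 0.2))
print(logit_prob(omega_d=0.1, phi_d=0.2, prev_return=0.01))
```

When $\mu_t=c$ the first probability is exactly 1/2 whatever the volatility, which is the $c-\mu_t\ne0$ condition in the text.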

c) Copula model: To construct the bivariate conditional distribution of $R_t=[|r_t-c|, \mathbb{I}[r_t>c]]^T$, copula theory is used. In particular, $$F_{R_t}(u,v)=C(F_{|r_t-c|}(u), F_{\mathbb{I}[r_t>c]}(v)),$$ where $F$ denotes the CDF and $C(u,v)$ is a copula. The most common choices are the Frank, Clayton or Farlie-Gumbel-Morgenstern copulas. Once the three ingredients of the joint distribution of $R_t$, i.e. the volatility model, the direction model and the copula, are specified, the parameter vector can be estimated by maximum likelihood.
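As an illustration of one of the named choices, here is a sketch of the Farlie-Gumbel-Morgenstern (FGM) copula in its standard form $C(u,v)=uv\,(1+\theta(1-u)(1-v))$ with $\theta\in[-1,1]$. The function name is hypothetical, and how the copula enters the likelihood with a discrete sign component is not covered here.

```python
def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula C(u, v) for u, v in [0, 1]
    and dependence parameter theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


# theta = 0 gives independence, C(u, v) = u * v.
print(fgm_copula(0.3, 0.7, 0.0), fgm_copula(0.3, 0.7, 0.5))
```

The FGM family can only represent weak dependence, which is in line with the finding that the two components are conditionally only weakly dependent.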

Conditional mean prediction in decomposition model

The main interest is the mean forecast of returns
$$E[r_t|F_{t-1}] = c - E[|r_t-c||F_{t-1}]+2E[|r_t-c|\mathbb{I}[r_t>c]|F_{t-1}]$$
The corresponding forecast is
$$\hat{r}_t=c-\hat{\psi}_t+2\hat{\xi}_t,$$
where $\hat{\psi}_t$ and $\hat{\xi}_t$ are the estimates of $\psi_t=E[|r_t-c||F_{t-1}]$ and $\xi_t=E[|r_t-c|\mathbb{I}[r_t>c]|F_{t-1}]$. Under conditional independence, or if conditional dependence is weak, we have
$$\xi_t=E[|r_t-c||F_{t-1}]E[\mathbb{I}[r_t>c]|F_{t-1}]=\psi_tp_t,$$
so
$$\hat{r}_t=c+(2\hat{p}_t-1)\hat{\psi}_t.$$
In the general case of dependence, estimating the copula is essential.
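The two forecast formulas can be written as a small sketch. The function names are hypothetical, and the inputs $\hat{\psi}_t$, $\hat{p}_t$ and $\hat{\xi}_t$ are assumed to come from already-fitted volatility, direction and copula models.

```python
def forecast_general(c, psi_hat, xi_hat):
    """r_hat = c - psi_hat + 2*xi_hat, where xi_hat estimates
    E[|r_t - c| * I[r_t > c] | F_{t-1}]."""
    return c - psi_hat + 2.0 * xi_hat


def forecast_independent(c, psi_hat, p_hat):
    """r_hat = c + (2*p_hat - 1)*psi_hat, valid when sign and magnitude
    are conditionally independent so that xi_hat = psi_hat * p_hat."""
    return c + (2.0 * p_hat - 1.0) * psi_hat


# Made-up inputs: expected absolute return of 1% and a 53% chance of an up move.
print(forecast_independent(0.0, 0.01, 0.53))
print(forecast_general(0.0, 0.01, 0.01 * 0.53))  # same value under independence
```

The forecast is positive only when $\hat{p}_t>1/2$, so under independence the sign model decides the direction of the forecast and the volatility model scales it.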

Empirical Analysis

TBD