Step 1

Basic Notes on the Data

We are asked to first read the data and familiarize ourselves with the variables:

Plots for Step 1

For both plots:

First, let’s plot the input variables only.

# Plot of input variables (Treasuries) only
tplot.inputs.only

Interpreting Plot 1

The first plot displays how the YTM of 7 US Treasury securities changes over time and how each Treasury’s YTM changes with respect to that of its counterparts along the yield curve. The visualization uncovers a couple interesting patterns in the US Treasury market.

First, every Treasury’s YTM has fallen substantially over time, regardless of its maturity. Treasuries of all maturities fell from 13%-16% in the early 1980s to ~10% in 2000 to between 0% and 4% today. This movement represents a quasi-parallel, downward shift in the yield curve over time. From an economic perspective, this downward trend indicates a relatively strong bull market in Treasuries over the past 30-40 years.

Second, each Treasury’s rate of change is not the same at any given time. In certain periods, YTM rates on the short end of the curve (<2yr) fall faster than those on the long end (10-30yr). These periods indicate the spread between Treasuries widening (or the yield curve steepening). In other periods (such as those leading up to 1987, 2000, & 2007), rates on the short end of the curve increase much quicker than rates on the long end of the curve. These periods demonstrate the spread between Treasuries tightening (or the yield curve flattening). While the first trend deals with the position of the yield curve, this movement represents the shape of the yield curve. From an economic perspective, the varying shape (flat vs. steep) may indicate investor’s expectations about the health of the economy, its direction, and its momentum (or lack thereof).

Next, let’s plot the input variables with the output variable.

# Plot of input variables (Treasuries) and the output column
tplot.inputs.output

Interpreting Plot 2

YTM may no longer be an appropriate label for the Y-axis, as we are not sure what the output variable is at this point. Regardless, we can make some initial observations about it from looking at the graph above. The output variable trends similarly over time with YTM of Treasuries. While it rises and falls during the same periods, the output variable is more volatile, and its swings are more pronounced. Additionally, the output value falls well below zero. Because government bond yields should be positive (let’s forget about bonds in the EU), we know the output variable is not the yield of one security.

Potential Concerns for Analysis

These plots also raise two red flags we must consider before conducting analysis.

First, these plots demonstrate that each treasury generally moves in the same direction at the same time even if each changes at different rates. This trend means Treasury YTMs are likely highly correlated. The correlation plot below confirms this suspicion:

(t_cor <- as.matrix(cor(AssignmentData[,c(1:7)])))
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   1.0000000 0.9979328 0.9843135 0.9768273 0.9603153 0.9348717
## USGG6M   0.9979328 1.0000000 0.9910013 0.9845272 0.9691905 0.9444964
## USGG2YR  0.9843135 0.9910013 1.0000000 0.9987920 0.9916366 0.9753836
## USGG3YR  0.9768273 0.9845272 0.9987920 1.0000000 0.9963859 0.9836936
## USGG5YR  0.9603153 0.9691905 0.9916366 0.9963859 1.0000000 0.9948318
## USGG10YR 0.9348717 0.9444964 0.9753836 0.9836936 0.9948318 1.0000000
## USGG30YR 0.9069437 0.9167172 0.9539191 0.9648236 0.9819750 0.9956347
##           USGG30YR
## USGG3M   0.9069437
## USGG6M   0.9167172
## USGG2YR  0.9539191
## USGG3YR  0.9648236
## USGG5YR  0.9819750
## USGG10YR 0.9956347
## USGG30YR 1.0000000

Highly correlated inputs may introduce collinearity to our model if we use them as inputs to a multivariate regression. When our input variables are highly correlated, it is hard to distinguish the effect on Output of one Treasuy’s YTM from another Treasury’s YTM.

Second, a Treasury’s YTM may be correlated with its former yields. If we use this type of input as a predictor in a linear model, we then must be cautious about autocorrelation in the resulting residuals of that model. Autocorrelated residuals stem from violation of the OLS assumption that error terms are independent and identically distributed. When violated, an OLS may produce erroneous results for key estimates we use to measure a model’s quality and statistical significance.

Based on the graphs, we may be able to predict the output variable from one or a combination of the input variables. While we explore, we keep in mind the correlation between our input variables and potential autocorrelated residuals that could result from our model.

Step 2

Estimate simple regression model with each of the inputs and outputs.

First, let’s collect all slopes and intercepts in one table and print this table.

t(apply(AssignmentData[,1:7],2,function(x) lm(Output1~x,data=AssignmentData)$coefficients))
##          (Intercept)        x
## USGG3M     -11.72318 2.507561
## USGG6M     -12.09753 2.497235
## USGG2YR    -13.05577 2.400449
## USGG3YR    -13.86162 2.455793
## USGG5YR    -15.43665 2.568742
## USGG10YR   -18.06337 2.786991
## USGG30YR   -21.08590 3.069561

Estimation Function

Because this step calls for comparative analysis of 7 different linear models, we need more information about each one. The function below (regression.analyzer) collects and compares more detailed regression statistics for each model. At a high level, this method:
# - Names for the eventual data.frame we will want. 
# - Created globally as the default names passed to the regression.analyzer function
simple.names <- c("Int_Estimate","Pred_Estimate","Int_StdError","Pred_StdError","Int_T","Pred_T",
                  "Int_P-val","Pred_P-val","r.sqr","adj.r.sqr","f.stat","ndf.num","ndf.den","sigma")
simple.stats <- c("coefficients","r.squared","adj.r.squared","fstatistic","sigma")

# - method to return a dataframe from a list of linear regressions. USED IN 2 AND 3
regression.analyzer <- function(l.of.regs, stats = simple.stats, df.colnames = simple.names){
  
  # - helper function to extract the info we want from a regression model. We then extract stats of interest.
  simple.extractor <- function(extract) t(sapply(l.of.regs,function(l) summary(l)[[extract]]))
  simple.model.data <- sapply(stats,simple.extractor)
  
  # - prep the data we've extracted so that it plays nicely with a data.frame. Enrich the dataframe as well.
  simple.model.prep <- lapply(simple.model.data,function(i) if(nrow(i)==1) t(i) else i) 
  simple.model.df <- setNames(data.frame(simple.model.prep),df.colnames)
  simple.model.df["total.var"] <- var(AssignmentData[,8])
  if ("sigma" %in% names(simple.model.df)) simple.model.df["unexpl.var"] <- simple.model.df$sigma^2
  return(round(simple.model.df, 3))
}

Results from Estimation Function

Let’s test this function above on our Assignment Data by passing a list of regression objects. 1 for each explanatory variable.

# - generate the regression models and store them in a list
simple.reg.models <- lapply(AssignmentData[,1:7], function(x) lm(Output1~x,data=AssignmentData))
(step2.models <- regression.analyzer(simple.reg.models))
##          Int_Estimate Pred_Estimate Int_StdError Pred_StdError     Int_T
## USGG3M        -11.723         2.508        0.031         0.005  -373.713
## USGG6M        -12.098         2.497        0.026         0.004  -457.046
## USGG2YR       -13.056         2.400        0.010         0.002 -1301.507
## USGG3YR       -13.862         2.456        0.008         0.001 -1687.624
## USGG5YR       -15.437         2.569        0.018         0.003  -866.314
## USGG10YR      -18.063         2.787        0.039         0.005  -461.015
## USGG30YR      -21.086         3.070        0.066         0.009  -321.448
##            Pred_T Int_P-val Pred_P-val r.sqr adj.r.sqr    f.stat ndf.num
## USGG3M    463.464         0          0 0.963     0.963  214798.7       1
## USGG6M    561.868         0          0 0.974     0.974  315696.0       1
## USGG2YR  1566.691         0          0 0.997     0.997 2454520.2       1
## USGG3YR  1995.999         0          0 0.998     0.998 3984011.3       1
## USGG5YR   995.168         0          0 0.992     0.992  990359.0       1
## USGG10YR  510.897         0          0 0.969     0.969  261016.2       1
## USGG30YR  346.441         0          0 0.935     0.935  120021.6       1
##          ndf.den sigma total.var unexpl.var
## USGG3M      8298 1.690    76.804      2.857
## USGG6M      8298 1.403    76.804      1.967
## USGG2YR     8298 0.509    76.804      0.259
## USGG3YR     8298 0.400    76.804      0.160
## USGG5YR     8298 0.799    76.804      0.638
## USGG10YR    8298 1.538    76.804      2.367
## USGG30YR    8298 2.229    76.804      4.967
A quick note on the data frame column naming convention above:

Visualizing Model Fit

We can visually explore how well each regression predicts the output. Below are 7 plots for each regression’s fit vs. the ouptut varibale.

plot.simple.regs <- function(reg,title.name){
  matplot(AssignmentData[,8],type="l",xaxt="n",main=title.name)
  lines(reg$fitted.values,col="red")
}

mapply(plot.simple.regs,simple.reg.models,names(simple.reg.models))

## $USGG3M
## NULL
## 
## $USGG6M
## NULL
## 
## $USGG2YR
## NULL
## 
## $USGG3YR
## NULL
## 
## $USGG5YR
## NULL
## 
## $USGG10YR
## NULL
## 
## $USGG30YR
## NULL

Commentary and Analysis - The Raw Stats and Plots

Each model has a postive regession coefficient estimate for \(\hat{B}_1\), with the predictor’s coefficient increasing for Treasuries with longer maturities. The models suggest that a one unit increase in YTM will increase our Output by a factor of roughly 2.5 to 3, depending on the model. Additionally, each model’s predictor coefficient is quite significant. Their t-statistics are extremely large, and the p-value from those t-tests is 0. Therefore, getting our predictors’ coefficient values could not have happened by chance. Each model’s F-statistic confirms the significance of its predictor. Their F-statistics and corresponding p-values lead us to reject each model’s null-hypothesis and conclude that it provides a better fit than its corresponding intercept-only model. Lastly, each model has an adjusted \(R^2\) value above .9, with the 3yr adjusted \(R^2\) > .99. These values suggest that each model captures an extremely high proportion of the variance in the Output, leaving almost no unexplained variance.

Each model predicts the output variable well, but the 3yr UST does the best job while the extremes (3m and 30yr) perform the worst. The 3yr has the highest adjusted \(R^2\) and the largest t-value and F-statistic. Therefore, it seems to predict the output variable the best. The plot of its fitted values confirm its fit. This plot makes the model fitted values and the Output variable seem indistinguishable.

This outcome is in line with my expectations. I figured that the best fit would likely be somewhere in the middle of the curve, either the 3yr or 5yr USTs. This maturity range is the belly of the Treasury yield curve. It is also the portion of the yield curve where rate fluctuations can be the most volatile. Given that the Output is extremely volatile, I reasoned the 3yr or 5yr would have the best chance to explain the Output’s variance and thus provide a superior fit.

Commentary and Analysis - Model Criticism

The results from the models should make us quite suspicious. The models are essentially perfect fits of the Output variable, explaining nearly all its variance. The T and F statistics seem abnormally inflated as well.

In step 1, I mentioned that dependent error terms violate an important assumption of OLS and can result in dependent residuals. When residuals are correlated, model estimates may underestimate or overestimate their true value.

Let’s explore our residuals with a couple of plots from the 3yr UST model, which supposedly performs the best.

plot(simple.reg.models$USGG3YR$residuals)
abline(h=0,col="red")

acf(simple.reg.models$USGG3YR$residuals)

The pattern shown in the first plot represents positive autocorrelation among residuals. The ACF plot confirms this pattern. Autocorrelation among residuals violates an OLS assumption from the start, so the model’s results may be erroneous. In this case, positive autocorrelation underestimates the true variance of each regression coefficient \(\hat{B}_1\). When underestimated, then the coefficient’s standard error is artificially small as well. An artificially small standard error inflates the coefficient’s t-statistic, f-statistic, and \(R^2\) value. As a result, the model’s true predictive power suffers because we can’t trust the statistics we use to assess that power in the first place.

Conclusion

Our simple linear regressions should not be trusted. While the results seem attractive, we’ve violated a key assumption of OLS. Because of this violation, estimations are inaccurate, and in our case, this inaccuracy inflates key indicators of a model’s statistical significance. This inaccuracy increases chances of Type I error (incorrectly rejecting the null hypothesis that the predictor’s coefficient is not different from 0).

Step 3

Estimate simple models where the output is now the predictor and each input is the dependent variable.

First, let’s collect all slopes and intercepts in one table and print this table.

t(apply(AssignmentData[,1:7],2,function(x) lm(x~Output1,data=AssignmentData)$coefficients))
##          (Intercept)   Output1
## USGG3M      4.675134 0.3839609
## USGG6M      4.844370 0.3901870
## USGG2YR     5.438888 0.4151851
## USGG3YR     5.644458 0.4063541
## USGG5YR     6.009421 0.3860610
## USGG10YR    6.481316 0.3477544
## USGG30YR    6.869355 0.3047124

Results from Estimation Function

Now, let’s use the function created in step 2 to print more details regarding the new set of linear models.

#- generate reverse regression list
simple.reg.models.reversed <- lapply(AssignmentData[,1:7], function(x) lm(x~Output1,data=AssignmentData))
(step3.models <- regression.analyzer(simple.reg.models.reversed))
##          Int_Estimate Pred_Estimate Int_StdError Pred_StdError    Int_T
## USGG3M          4.675         0.384        0.007         0.001  643.956
## USGG6M          4.844         0.390        0.006         0.001  796.035
## USGG2YR         5.439         0.415        0.002         0.000 2341.988
## USGG3YR         5.644         0.406        0.002         0.000 3163.813
## USGG5YR         6.009         0.386        0.003         0.000 1767.690
## USGG10YR        6.481         0.348        0.006         0.001 1086.569
## USGG30YR        6.869         0.305        0.008         0.001  891.227
##            Pred_T Int_P-val Pred_P-val r.sqr adj.r.sqr    f.stat ndf.num
## USGG3M    463.464         0          0 0.963     0.963  214798.7       1
## USGG6M    561.868         0          0 0.974     0.974  315696.0       1
## USGG2YR  1566.691         0          0 0.997     0.997 2454520.2       1
## USGG3YR  1995.999         0          0 0.998     0.998 3984011.3       1
## USGG5YR   995.168         0          0 0.992     0.992  990359.0       1
## USGG10YR  510.897         0          0 0.969     0.969  261016.2       1
## USGG30YR  346.441         0          0 0.935     0.935  120021.6       1
##          ndf.den sigma total.var unexpl.var
## USGG3M      8298 0.661    76.804      0.437
## USGG6M      8298 0.554    76.804      0.307
## USGG2YR     8298 0.212    76.804      0.045
## USGG3YR     8298 0.163    76.804      0.026
## USGG5YR     8298 0.310    76.804      0.096
## USGG10YR    8298 0.543    76.804      0.295
## USGG30YR    8298 0.702    76.804      0.493

Analysis of new Models and Differences from Step 2

The new linear models regress the Output variable on each of the inputs, creating 7 new linear models. The data from these linear models exhibit similar results as Step 2 with a few key differences.

First, briefly touching on similarities: Now, the main differences:

Conclusion

Step 2 shows clear violation of OLS assumptions. Step 3 contains these violations as well, and the results are even more abnormal. The absence of unexplained variance also makes these models less interesting. Because the Ouput essentially explains all the variance in each of the UST YTMs, I’m suspicious that the Output variable is simply some linear combination of the input variables or the result of some linear transformation mapping the input variables to the Output. Overall, we conclude that a simple OLS model may not be an appropriate measure for our data.

Step 4

Logistic Regression - A Quick Note on What We’re Trying to Do

Historically there are periods where the spreads between Treasuries tighten and periods where they ease. We model two logistic regressions to examine if we can predict whether the spread between UST YTMs is tightening (1) or easing (0). The first model uses a single Treasury (the 3m), while the second model uses all the Treasuries in our dataset.

In logistic regression, change in probalility is not constant. Therefore, this analysis looks at odds ratios instead.

Easing and Tightening Plot

First, plot the easing and tightening periods. Steps taken to prepare plot below:
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Binary Fed Mode")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")

Logistic Regression for 3M UST

s4 <- AssignmentDataLogistic.EasingTighteningOnly
LogisticModel.TighteningEasing_3M <- glm(s4[,10]~s4[,1],family=binomial(link=logit))
summary(LogisticModel.TighteningEasing_3M)
## 
## Call:
## glm(formula = s4[, 10] ~ s4[, 1], family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4239  -0.9014  -0.7737   1.3548   1.6743  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.15256    0.17328 -12.422   <2e-16 ***
## s4[, 1]      0.18638    0.02144   8.694   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2983.5  on 2357  degrees of freedom
## Residual deviance: 2904.8  on 2356  degrees of freedom
## AIC: 2908.8
## 
## Number of Fisher Scoring iterations: 4
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Data and Fitted Values")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_3M$fitted.values*20,col="green")

Analysis of 3M Logistic Regression

This model attempts to predict whether Treasury spreads are tightening or easing using the 3m UST YTM only. The model’s coefficient is significant, but logits are hard to interpret. So let’s convert the coefficient to an odds ratio to better understand what it means:

exp(cbind(ratio = coef(LogisticModel.TighteningEasing_3M), confint(LogisticModel.TighteningEasing_3M)))
## Waiting for profiling to be done...
##                 ratio      2.5 %    97.5 %
## (Intercept) 0.1161861 0.08246528 0.1626978
## s4[, 1]     1.2048754 1.15559780 1.2569543

Given that \(e^0 = 1\), an odds ratio of 1 indicates no effect on the response from the predictor. The 3 month’s odds ratio is 1.20, which implies that a one unit will increases the odds of tightening by a factor of 1.20. While this relationship is positive, the 3m’s predictive power seems weak.

The difference between the null and residual deviance further confirms the weakness of this logistic regression. The null deviance is very high, and the residual deviance makes very minor improvements. Thus, adding 3m to the model does not significantly improve our ability to predict tightening.

This weakness should make sense. Periods of tightening and easing arise when Treasury YTMs move at different rates. It’s hard to predict whether spreads are tightening or easing when looking at the behavior of only one UST YTM.

Logistic Regression for All UST

LogisticModel.TighteningEasing_All<-glm(s4[,10]~s4[,1]+s4[,2]+s4[,3]+s4[,4]+s4[,5]+s4[,6]+s4[,7],
                                    family=binomial(link=logit))
summary(LogisticModel.TighteningEasing_All)
## 
## Call:
## glm(formula = s4[, 10] ~ s4[, 1] + s4[, 2] + s4[, 3] + s4[, 4] + 
##     s4[, 5] + s4[, 6] + s4[, 7], family = binomial(link = logit))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2113  -0.8595  -0.5935   1.1306   2.5530  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -4.7552     0.4312 -11.029  < 2e-16 ***
## s4[, 1]      -3.3456     0.2666 -12.548  < 2e-16 ***
## s4[, 2]       4.1559     0.3748  11.089  < 2e-16 ***
## s4[, 3]       3.9460     0.7554   5.224 1.75e-07 ***
## s4[, 4]      -3.4642     0.9340  -3.709 0.000208 ***
## s4[, 5]      -3.2115     0.7795  -4.120 3.79e-05 ***
## s4[, 6]      -0.9705     0.9764  -0.994 0.320214    
## s4[, 7]       3.3254     0.6138   5.418 6.04e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2983.5  on 2357  degrees of freedom
## Residual deviance: 2629.6  on 2350  degrees of freedom
## AIC: 2645.6
## 
## Number of Fisher Scoring iterations: 4
matplot(AssignmentDataLogistic.EasingTighteningOnly[,-c(9,10)],type="l",ylab="Results of Logistic Regression")
lines(AssignmentDataLogistic.EasingTighteningOnly[,10]*20,col="red")
lines(LogisticModel.TighteningEasing_All$fitted.values*20,col="green")

Analysis of All Inputs Logistic Regression

First, note that all coefficients are significant except for one, the 10yr UST. Again, let’s look at odds instead of probabilites:

exp(cbind(ratio = coef(LogisticModel.TighteningEasing_All), confint(LogisticModel.TighteningEasing_All)))
## Waiting for profiling to be done...
##                    ratio        2.5 %       97.5 %
## (Intercept)  0.008606885  0.003661540   0.01986140
## s4[, 1]      0.035238657  0.020726877   0.05897471
## s4[, 2]     63.806401309 30.902619981 134.38283941
## s4[, 3]     51.729573306 12.255538668 235.63387413
## s4[, 4]      0.031296610  0.004762038   0.18400842
## s4[, 5]      0.040294836  0.008711126   0.18524705
## s4[, 6]      0.378876738  0.055679150   2.56672506
## s4[, 7]     27.808776261  8.434165561  93.64398047

Coefficients with negative logit values correspond to odds ratios below 1. Again, this makes sense, given that \(e^0 = 1\). An odds ratio below 1 indicates a negative relationship between a Treasury’s YTM and the odds of tightening. These odds ratios are more interesting than the 3m only model, but they seem to make matters more complicated. The 3m had a positive (albeit weak) relationship with the likelihood of tightening. Now, the 3m relationship is very negative, while the 6m is substantially positive. The high correlation among Treasuries may make it hard to decipher which one has the strongest impact on the response, and it may make coefficients themselves extremely unstable when we add more predictors to said models. In this example, the 3m changes significantly when we introduce additional inputs.

The logistic regression with all Treasuries provides more information, but it is still not that powerful. Our Residual Deviance has dropped more significantly than the 3m model’s, but the difference from the null model is still pretty small.

Comparison between the Two Models

The model with all the Treasuries performs better than the 3m model. This seems fair, as one Treasury’s movement alone makes it hard to predict tightening/easing, which relies on the relationship between the movement of Treasuries against each other. The “all inputs” model has a larger difference in deviances, and its AIC score is lower than the 3m model, so we can conclude that it is likely the better predictor of tightening/easing.

Achieving our Goal

At this point, we’ve simply compared the results of two logistic regressions. We can conclude that one is better than the other, but we need to check if either are appropriate for achieving our goal: predicting tightening and easing periods. Let’s break down each model’s performance.

First, let’s look at class membership: easing = 0, tigthening = 1

# - ANALYSIS RESULTS
round(rbind(table(s4[,10]),table(s4[,10])/length(s4[,10])),3)
##             0       1
## [1,] 1585.000 773.000
## [2,]    0.672   0.328

The sample size is 2358. Of that sample, only 773 (~32.8%) records represent tigthening. Therefore, the baseline model should accurately classify easing or tightening ~67% of the time if it naively assumes easing in every case.

The null model demonstrates that baseline:

# - create the base case
LogisticModel.TighteningEasing_null<-glm(s4[,10]~1,family=binomial(link=logit))

# - create confusion matrix and display the accuracy of the model
confusion.matrix <- function(logit.reg,conf.vect=s4[,10]){
  pred <- predict(logit.reg, type = 'response')
  confusion <- table(conf.vect, pred > 0.5)
  return(list("confusion" = confusion, accuracy = sum(diag(confusion)) / sum(confusion)))
}

confusion.matrix(LogisticModel.TighteningEasing_null)
## $confusion
##          
## conf.vect FALSE
##         0  1585
##         1   773
## 
## $accuracy
## [1] 0.6721798

As expected, the null model simply predicts everything is easing. As a result, it correctly classifies 1585 records as easing when they are easing, but it incorrectly classifies 773 records as easing when they are, in fact, tightening. As a result, the null model’s accuracy is ~67%.

Given that random chance yields an accuracy of 67%, a predictive model should not perform worse. Let’s check how the 3m and “all input” models perform:

confusion.matrix(LogisticModel.TighteningEasing_3M)
## $confusion
##          
## conf.vect FALSE TRUE
##         0  1452  133
##         1   773    0
## 
## $accuracy
## [1] 0.6157761
confusion.matrix(LogisticModel.TighteningEasing_All)
## $confusion
##          
## conf.vect FALSE TRUE
##         0  1444  141
##         1   551  222
## 
## $accuracy
## [1] 0.706531

Not good. The 3m model actually did worse than the baseline model . It incorrectly guessed that some easing periods were actually tightening periods. Even worse, it did not detect any of the tightening periods, classifying them all as easing.

The “all input” model did slightly better. Its accuracy was ~71%, but that is only slightly better than chance! We went through a lot of effort to improve 3-4% in accuracy.

Conclusion

Whether or not we achieved our goal depends on context. If we strive to improve accuracy only, the “all input” model technically succeeds. It predicts at least some of the tightening periods, and in our case, predicting some tightening is better than predicting none.

But in other areas, improvements in accuracy may come at the costs of false positives and false negatives. What if these numbers represented convictions in criminal court? In that case, the “all inputs” model wrongly convicts 141 people to simply improve the accuracy of its conviction rate by ~3 or 4%. That seems like a steep price to pay for marginal gains.

Ultimately, context matters. In this context, the model predict some tigthening periods, which is better than none. But either way, it still did a pretty bad job. Therefore, we have not really achieved much using a logistic regression to predict periods of tightening and easing.

Step 5

We’ve evaluated simple and logistic regressions. Now, we evaluate what combination of inputs produces the “best” model in multiple regression.

Estimating the Full Model with All 7 Predictors

full.model <- lm(Output1~.,data=AssignmentDataRegressionComparison)
summary(full.model)
## 
## Call:
## lm(formula = Output1 ~ ., data = AssignmentDataRegressionComparison)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -5.108e-09 -3.430e-10 -1.000e-12  3.340e-10  5.085e-09 
## 
## Coefficients:
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept) -1.490e+01  1.057e-10 -1.410e+11   <2e-16 ***
## USGG3M       3.840e-01  9.860e-11  3.894e+09   <2e-16 ***
## USGG6M       3.902e-01  1.500e-10  2.601e+09   <2e-16 ***
## USGG2YR      4.152e-01  2.569e-10  1.616e+09   <2e-16 ***
## USGG3YR      4.064e-01  3.299e-10  1.232e+09   <2e-16 ***
## USGG5YR      3.861e-01  2.618e-10  1.474e+09   <2e-16 ***
## USGG10YR     3.478e-01  2.800e-10  1.242e+09   <2e-16 ***
## USGG30YR     3.047e-01  1.566e-10  1.945e+09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.562e-09 on 8292 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 3.73e+22 on 7 and 8292 DF,  p-value: < 2.2e-16
sapply(c("r.squared","adj.r.squared","df"),function(i) summary(full.model)[[i]])
## $r.squared
## [1] 1
## 
## $adj.r.squared
## [1] 1
## 
## $df
## [1]    8 8292    8

When we regress Output1 on all 7 Treasuries, the model is a perfect fit. \(R^2\) and adj. \(R^2\) equal 1, so the model explains the entire variance in the Output variable. A perfect \(R^2\) is guaranteed if we have the same number of predictors as observations (minus 1 if intercept included), but in this model, we still have 8292 degrees of freedom in the residuals. Since the predictors account for only 8 degrees of freedom, why is the fit perfect?

full.model.intercept <- full.model$coefficients[1]
full.model.weights <- as.matrix(full.model$coefficients[2:length(full.model$coefficients)])
first.five.days <- as.matrix(AssignmentDataRegressionComparison[1:5,1:7]) %*% full.model.weights + full.model.intercept
first.five.example <- cbind(first.five.days,AssignmentDataRegressionComparison$Output1[1:5])
colnames(first.five.example) <- c("linear.combo.of.inputs","output")
first.five.example
##          linear.combo.of.inputs   output
## 1/5/1981               18.01553 18.01553
## 1/6/1981               18.09140 18.09140
## 1/7/1981               19.44731 19.44731
## 1/8/1981               19.74851 19.74851
## 1/9/1981               20.57204 20.57204
round(sum(full.model$fitted.values - AssignmentDataRegressionComparison$Output1),5)
## [1] 0

The Output variable is a linear combination of the 7 inputs and intercept. The input matrix is overdetermined (8300 x 8 w/ the intercept), so the matrix equation \(Ax = b\) has 0 or 1 solutions for a given vector \(b\). In this case, the Output vector exists in the column space of the input Matrix, so we find a unique solution. Therefore, the Output vector is a linear combination of our intercept and each of the 7 treasuries. We need 8 dimensions to perfectly fit the Output vector, so \(R^2\) and adj. \(R^2\) equal 1 even though 8292 degrees of freedom remain in residuals.

At this point, we have a better idea of what the Output column is based on its relationship with the inputs. That being said, we can’t really explain the Output yet other than saying it’s a linear combination. In the final steps, we’ll come to a more contrete conclusion.

Estimating the Null Model

Now that we’ve fit the full model and each individual model, let’s fit the null model:

null.model <- lm(Output1~1,data=AssignmentDataRegressionComparison)
summary(null.model)
## 
## Call:
## lm(formula = Output1 ~ 1, data = AssignmentDataRegressionComparison)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.173  -6.509  -0.415   4.860  27.298 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.42e-11   9.62e-02       0        1
## 
## Residual standard error: 8.764 on 8299 degrees of freedom
sapply(c("r.squared","adj.r.squared","df"),function(i) summary(null.model)[[i]])
## $r.squared
## [1] 0
## 
## $adj.r.squared
## [1] 0
## 
## $df
## [1]    1 8299    1

The null model is a base model. With no predictors, the intercept above is \(\bar{Y}\), or the mean of the Output. As a coefficient, it is not significant because it is simply a mean - an estimate from \(Y\) but not a predictor of \(Y\). The fitted values from the model above form a straight line through \(\bar{Y}\) for every point on the X-axis. This straight line helps visualize why the model does not display \(R^2\) or adjusted \(R^2\), because both of these values are 0, as shown below:

Compare each Model using ANOVA

anova(full.model,null.model)
## Analysis of Variance Table
## 
## Model 1: Output1 ~ USGG3M + USGG6M + USGG2YR + USGG3YR + USGG5YR + USGG10YR + 
##     USGG30YR
## Model 2: Output1 ~ 1
##   Res.Df    RSS Df Sum of Sq        F    Pr(>F)    
## 1   8292      0                                    
## 2   8299 637400 -7   -637400 3.73e+22 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table above compares two regressions which both share the same Output variable but have a different number of predictors. The null model has no predictors, while the full model has 7. The “Df” column of the ANOVA table displays this difference. Next, observe the difference in residual sum of squares between the two models. The full model is a perfect fit, so its RSS is 0, while the null model explains none of the variance in \(Y\), so its RSS is extremely large.

The F-test and associated p-value are significant. This significance indicates that the additional coefficients in the first model explain significantly more variance in \(Y\). Thus, we can reject the null hypothesis and conclude that the two models are not the same.

Fitting the Optimal Model

So far, we’ve seen:

To figure out what the best model is, let’s review what the best model is NOT :

The Full Model
The Null Model
Simple OLS Models
A Multivariate Regression with more than 1 but less than 7 Inputs.

Now, let’s discuss what the best model COULD BE :

The optimal model will not violate OLS assumptions. Therefore, we need a model where the residuals are independent. Additionally, if our model uses multiple predictors, we need them to have as little correlation as possible. We will want to satisfy both of these OLS assumptions but still produce a statistically significant result.

An Example: The Simple Difference Model

The model below regresses the change in the Output variable from the change in the 3yr UST YTM, which was previously the best predictor in any simple linear model. The summary results show a statistically signficant values from the t-test and F-statistic. Additionally, the \(R^2\) is quite high, so our model captures a significant portion of the Output’s variance. While these estimates are lower than the original 3yr model, we must remember that the original model’s estimates were inflated. Our new model does not violate key assumptions of OLS, so we can trust its results. Therefore, the first “difference model” is much more robust than any of our original simple linear regressions.

diff.3yr <- lm(diff(Output1)~diff(USGG3YR),data=AssignmentData)
summary(diff.3yr)
## 
## Call:
## lm(formula = diff(Output1) ~ diff(USGG3YR), data = AssignmentData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.99705 -0.02053  0.00088  0.02272  1.53558 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.0006301  0.0007249  -0.869    0.385    
## diff(USGG3YR)  2.1560782  0.0089390 241.200   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06603 on 8297 degrees of freedom
## Multiple R-squared:  0.8752, Adjusted R-squared:  0.8752 
## F-statistic: 5.818e+04 on 1 and 8297 DF,  p-value: < 2.2e-16
plot(diff.3yr$residuals)
abline(h=0,col="red")

acf(diff.3yr$residuals)

matplot(diff(AssignmentData[,8]),type="l",xaxt="n",main="Diff Plot")
lines(diff.3yr$fitted.values,col="red")

Further Example: Time Series Models

This evaluation sticks to simple and multivariate OLS regressions, attempting to demonstrate that we can improve upon the models in Steps 1-3 without adding much complexity or using more advanced techniques. That being said, a number of models out of scope for this analysis should be explored. These include specific time series models that are geared toward time series data.

Step 6

Plot of the Rolling Mean Graph

A quick note on what the graph below plots:
plot(Means.forPlot,col="red")
lines(AssignmentDataRegressionComparison[,1])

Plot the Rolling Standard Deviation

The code below does the following:
# daily differences first
dailydiff <- diff(as.matrix(AssignmentDataRegressionComparison))
head(dailydiff)
##           USGG3M USGG6M USGG2YR USGG3YR USGG5YR USGG10YR USGG30YR
## 1/6/1981    0.06   0.07    0.14    0.03   -0.08    -0.04     0.00
## 1/7/1981    0.92   0.74    0.50    0.47    0.40     0.27     0.22
## 1/8/1981    0.26   0.10    0.17    0.17    0.07    -0.03     0.02
## 1/9/1981    0.44   0.30    0.44    0.33    0.20     0.22     0.22
## 1/12/1981   0.02  -0.07   -0.36   -0.34   -0.17    -0.12    -0.05
## 1/13/1981   0.02  -0.13    0.13    0.03   -0.03     0.08     0.00
##               Output1
## 1/6/1981   0.07587222
## 1/7/1981   1.35591615
## 1/8/1981   0.30119608
## 1/9/1981   0.82353207
## 1/12/1981 -0.42985741
## 1/13/1981  0.03935812
# standard deviation rolling window second
rolling.sd <- rollapply(dailydiff,width=Window.width,by=Window.shift,by.column=TRUE, sd)
head(rolling.sd)
##         USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR
## [1,] 0.3433258 0.3262462 0.2748258 0.2030161 0.1713192 0.1299585 0.1202147
## [2,] 0.2933383 0.2907504 0.2261811 0.1499219 0.1450082 0.1146895 0.1192201
## [3,] 0.2613180 0.2437530 0.2006201 0.1632596 0.1654110 0.1459308 0.1351909
## [4,] 0.2551754 0.2469663 0.1989446 0.1692794 0.1717219 0.1551052 0.1422183
## [5,] 0.2480551 0.2481595 0.2102004 0.1786057 0.1744767 0.1643960 0.1516540
## [6,] 0.1963884 0.2363672 0.2095082 0.1809180 0.1822917 0.1664956 0.1537351
##        Output1
## [1,] 0.5639875
## [2,] 0.4707427
## [3,] 0.4681168
## [4,] 0.4786189
## [5,] 0.4888569
## [6,] 0.4788897
# dates table third
rolling.dates<-rollapply(AssignmentDataRegressionComparison[-1,],width=Window.width,by=Window.shift,
                         by.column=FALSE,FUN=function(z) rownames(z))

rownames(rolling.sd)<-rolling.dates[,10]

# generate the plots
matplot(rolling.sd[,c(1,5,7,8)],xaxt="n",type="l",col=c("black","red","blue","green"))
axis(side=1,at=1:1656,rownames(rolling.sd))

Show periods of high volatility.

# Show periods of high volatility
high.volatility.periods<-rownames(rolling.sd)[rolling.sd[,8]>.5]
high.volatility.periods
##  [1] "1/19/1981"  "3/25/1981"  "4/1/1981"   "4/8/1981"   "4/23/1981" 
##  [6] "4/30/1981"  "5/7/1981"   "5/14/1981"  "5/21/1981"  "5/29/1981" 
## [11] "6/5/1981"   "6/12/1981"  "6/19/1981"  "6/26/1981"  "7/6/1981"  
## [16] "7/13/1981"  "7/20/1981"  "7/27/1981"  "10/28/1981" "11/5/1981" 
## [21] "11/13/1981" "11/20/1981" "11/30/1981" "12/7/1981"  "12/14/1981"
## [26] "12/29/1981" "1/14/1982"  "1/21/1982"  "1/28/1982"  "2/4/1982"  
## [31] "2/11/1982"  "7/29/1982"  "8/5/1982"   "8/12/1982"  "8/19/1982" 
## [36] "8/26/1982"  "9/24/1982"  "10/1/1982"  "10/8/1982"  "10/18/1982"
## [41] "10/13/1987" "10/20/1987" "10/27/1987" "11/19/2007" "11/26/2007"
## [46] "11/12/2008" "11/19/2008"

Look at pairwise X-Y plots of regression coefficients for the 3M, 5Yr and 30Yr yields as inputs.

# Pairs plot of Coefficients
pairs(Coefficients)

Interpret the Pairwise Plots

Basic notes on the pairwise plots:

The first interesting note is the range and spread of each cofficient over time.

Coeff.list <- Coefficients
lapply(c("min"=min,"max"=max,"mean"=mean),function(f) apply(Coeff.list,2,f))
## $min
## (Intercept)      USGG3M     USGG5YR    USGG30YR 
## -23.9328528  -0.9324927  -0.7510736  -1.7515157 
## 
## $max
## (Intercept)      USGG3M     USGG5YR    USGG30YR 
##   -7.819746    2.459668    3.195845    2.634268 
## 
## $mean
## (Intercept)      USGG3M     USGG5YR    USGG30YR 
## -14.7798088   0.7410698   1.4856917   0.2952447

In general, the 3m and 30yr coefficients range from -1 to 2. This range indicates that the 3m and 30yr are positively correlated with the Output variable in some rolling windows and a negatively correlated in others. The 5yr, on the other hand, almost always has a positive relationship with the Output variable. On average, it tends to have the highest impact on the Output variable, as seen by its mean above. Next, the 3m’s coefficient distribution seems to have much fatter tails than those of the 5yr and 30yr.

The next interesting observation is the relationships between each set of coefficients. First, the correlation between the 5yr and 30yr coefficients is strongly negative. The 5yr has the most positive influence in linear models where the 30yr has the most negative influence (and vice versa). Next, the 3m’s coefficient seems largely independent of the 5yr and the 30yr.

Plot the coefficients. Show periods.

# Plot of coefficients
matplot(Coefficients[,-1],xaxt="n",type="l",col=c("black","red","green"))
axis(side=1,at=1:1657,rownames(Coefficients))

Is the picture of coefficients consistent with the picture of pairs? If yes, explain why.

The coefficient line plot is consistent with the picture of pairs based on my observations:

Each plot has strengths and weaknesses. The line plot demonstrates how the variance of each coefficient changes over time, and it clearly shows the positive influence of the 5yr coefficient. The pairs plot, on the other hand, is much better for observing the relationship between coefficients.

Plot the rolling R^2 values

plot(r.squared[,2],xaxt="n",ylim=c(0,1))
axis(side=1,at=1:1657,rownames(Coefficients))

(low.r.squared.periods<-r.squared[r.squared[,2]<.9,1])
## [1] "6/24/1987" "6/27/1991" "4/28/2005" "6/20/2012"

What could cause decrease of R^2?

We know that a combination of all 7 UST YTM’s (and an intercept) fit the Output perfectly. But when we use 3 UST YTMs only and restrict the time frame for analysis, the \(R^2\) value of a model will fluctuate dependending on how well the 3 inputs explain the Output’s variance during that rolling time window. In an extreme example, let’s assume that the 3m, 5yr, and 30yr are all zero (or held constant) for a 20 day period, while the remaining 4 UST YTMs and thus the Output continue fluctuating. In this case, the model would be insignificant because its inputs would not explain any of the variance in the Output during that time window. This extreme example applies to less extreme scenarios as well.

To help visualize the concept above, let’s explore the fit of the 3m UST vs the 10yr UST in the different time frames:

example.r2 <- cbind(rbind(summary(lm(Output1~USGG3M,AssignmentData))$r.squared,
                          summary(lm(Output1~USGG3M,AssignmentData[7000:8300,]))$r.squared),
                          rbind(summary(lm(Output1~USGG10YR,AssignmentData))$r.squared,
                          summary(lm(Output1~USGG10YR,AssignmentData[7000:8300,]))$r.squared))

rownames(example.r2) <- c("all-time","last ~5 years")
colnames(example.r2) <- c("3M UST R^2","10Y UST R^2")
example.r2
##               3M UST R^2 10Y UST R^2
## all-time       0.9628054   0.9691884
## last ~5 years  0.1648486   0.9754343

Over the course of 30-40 years, the 3m and the 10yr fit the Output variable almost perfectly. Yet in the last ~5 years, the 3m has an extremely low \(R^2\), while the 10yr fits the data even better than it does over the course of the 30-40 year history. This observation has to do with context. Over the past 5-7 years, the 3m UST YTM has been essentially pinned at 0, while more distant maturity YTMs such as the 10yr have continued to fluctuate. Therefore, while the 3m is a good predictor of the Output overall, it is not a good predictor right now. Dips in \(R^2\) in the graph above result from context as well. These dips are 20 day windows where the 3m, 5yr, and 30yr were not moving as much as they usually do in relation to their peers, and thus USTs not in the model explain more of the Output’s variance than they normally do.

Plot the rolling p-values

matplot(Pvalues,xaxt="n",col=c("black","blue","red","green"),type="o")
axis(side=1,at=1:1657,rownames(Coefficients))

Analyze the rolling p-values plot.

Basic notes on the plot:

The p-values plot identifies which coefficients are significant in each multiple linear regression. It also complements the other plots in this section nicely and exposes the issue with multicollinearity.

Over time, the 5yr coefficient is consistently significant, with the exception of a few periods (~1987 and ~2000). This trend is also consistent with our line graph, which demonstrates the 5yr’s continuously positive influence over time.

In this plot, the 3m becomes less and less significant in models as time goes on, demonstrated by its string of high p-values from ~2008 onward. This trend is consistent with the experiment in the \(R^2\) analysis section. The 3m has been pegged near 0 for quite some time. Therefore, it is not surprising that the 3m’s coefficient is often not statistically significant from 0, as evidenced by its high p-values in the rolling p-values plot.

Next, the 30yr generally lacks influence in the multiple regressions. The few times it does have a low p-value (and thus statistical significance) are the same instances when the 5yr’s p-value is high. This trend is consistent with the negative relationship between the two predictor’s coefficients, which we first discovered in the pairs plot.

Finally, let’s touch briefly on the issue of multicollinearity:

vif.test<-rollapply(AssignmentDataRegressionComparison,width=Window.width,by=Window.shift,by.column=FALSE,
                        FUN=function(z) vif(lm(Output1~USGG3M+USGG5YR+USGG30YR,data=as.data.frame(z))))
rownames(vif.test)<-rolling.dates[,10]

vif.test[sample(rownames(Pvalues)[Pvalues[,4]>.5],5),]
##              USGG3M   USGG5YR USGG30YR
## 10/10/1990 1.021178 17.559289 17.53573
## 3/4/2004   1.083347 90.567743 90.77262
## 10/18/1989 1.744119 13.770313 11.59324
## 8/15/1988  2.361954 36.245538 30.67140
## 9/7/2007   1.049864  6.190367  6.21269

The table above shows the VIF for each variable in linear models from 5 random periods where the p-value for the 30yr was greater than 0.5. While not always the case, the 5yr and 30yr tend to have high variable inflation scores. This indicates strong multicollinearity in each model, as the inputs to the multiple regression are highly correlated with each other. In this case, we likely don’t need both the 5yr and the 30yr as inputs.

Step 7

Perform PCA with the inputs (columns 1-7).

First, let’s prepare the data and explore the dimensionality of 3 variables: The 3m, 2yr, and 5yr.

AssignmentData.Output<-AssignmentData$Output1
AssignmentData<-data.matrix(AssignmentData[,1:7],rownames.force="automatic")
AssignmentData.3M_2Y_5Y<-AssignmentData[,c(1,3,5)]
pairs(AssignmentData.3M_2Y_5Y)

Analyze the covariance matrix of the data. Compare results of manual calculation and cov().

Centered.Data<-apply(AssignmentData,2,function(x) x-mean(x))
(Manual.Covariance.Matrix<-(t(Centered.Data) %*% Centered.Data)/(nrow(Centered.Data)-1))
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   11.760393 11.855287 12.303031 11.942035 11.188856  9.924865
## USGG6M   11.855287 12.000510 12.512434 12.158422 11.406959 10.128890
## USGG2YR  12.303031 12.512434 13.284203 12.977542 12.279514 11.005377
## USGG3YR  11.942035 12.158422 12.977542 12.708647 12.068078 10.856033
## USGG5YR  11.188856 11.406959 12.279514 12.068078 11.543082 10.463386
## USGG10YR  9.924865 10.128890 11.005377 10.856033 10.463386  9.583483
## USGG30YR  8.587987  8.768702  9.600181  9.497246  9.212159  8.510632
##          USGG30YR
## USGG3M   8.587987
## USGG6M   8.768702
## USGG2YR  9.600181
## USGG3YR  9.497246
## USGG5YR  9.212159
## USGG10YR 8.510632
## USGG30YR 7.624304
(Covariance.Matrix <- cov(AssignmentData))
##             USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## USGG3M   11.760393 11.855287 12.303031 11.942035 11.188856  9.924865
## USGG6M   11.855287 12.000510 12.512434 12.158422 11.406959 10.128890
## USGG2YR  12.303031 12.512434 13.284203 12.977542 12.279514 11.005377
## USGG3YR  11.942035 12.158422 12.977542 12.708647 12.068078 10.856033
## USGG5YR  11.188856 11.406959 12.279514 12.068078 11.543082 10.463386
## USGG10YR  9.924865 10.128890 11.005377 10.856033 10.463386  9.583483
## USGG30YR  8.587987  8.768702  9.600181  9.497246  9.212159  8.510632
##          USGG30YR
## USGG3M   8.587987
## USGG6M   8.768702
## USGG2YR  9.600181
## USGG3YR  9.497246
## USGG5YR  9.212159
## USGG10YR 8.510632
## USGG30YR 7.624304

Plot the covariance matrix.

Maturities<-c(.25,.5,2,3,5,10,30)
contour(Maturities,Maturities,Covariance.Matrix)

Before continuing, it’s important to understand 2 things:

1) What is the Covariance Matrix, and What’s in it?
2) What does the Covariance Matrix Tell Us and Why does it Matter?

We tend to think of variables {\({X_1,X_2,...,X_N}\)} as a set existing in N-dimensional space because we have \(N\) inputs. But some of these variables may be correlated with each other, and the true variance of our input space can be explained using fewer dimensions. To examine this redundancy, we can look at the covariance matrix \(A\) of our inputs and find which directions contain the most variance and how much variance those directions explain.

In an extreme example where our inputs are uncorrelated and independent, then the off-diagonal of our covariance matrix \(A\) will contain all zeros. In this case, the largest value on the main diagonal scales the unit vector of the dimension where variance is the largest. More likely than not, however, our inputs are correlated, so \(A\) is not diagonal. In this case, it’s not immediately clear which direction explains the most variance in our data. Luckily, \(A\) has some interesting properties that help us out.

\(A\) can have eignvalue(s) and eigenvector(s) because \(A\) is square. And because \(A\) is also symmetric, any different eigenvectors in \(A\) will be orthogonal. As a result, if \(A\) transforms random vector \(v\), the transformation \(Av\) points closer toward the direction in which the variance is largest unless \(v\) is an eigenvector itself, in which case \(Av\) is a scalar multiple of original vector \(v\). The eigenvectors of \(A\) represent a new set of vectors that are orthogonal to each other and thus uncorrelated with each other. The eigenvectors uncover the direction of greatest variance in each dimension of our dataset, while their associated eigenvalues represent the actual variance along each corresponding eigenvector.

To isolate these eigenvectors (and eigenvalues), we factor our covariance matrix. We can then use this factored matrix to transform our original inputs into uncorrelated vectors. PCA is the technique we’ll explore to perform this factorization and input transformation.

Perform PCA by manually calculating factors, loadings and analyzing the importance of factors.

First, let’s perform PCA manually. Here are the steps:
Eigen.Decomposition <- eigen(Manual.Covariance.Matrix)

# L0 is the mean of each column in our original data set
L0 <- apply(AssignmentData,2,mean)

# Loadings are the coefficients of our principal components, which are equal to orthonormal eigenvectors
Loadings <- Eigen.Decomposition$vectors

# Show that the loadings are orthormal eigenvectors -- transpose and dot product should return identity
round(t(Loadings) %*% Loadings,5)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    0    0    0    0    0    0
## [2,]    0    1    0    0    0    0    0
## [3,]    0    0    1    0    0    0    0
## [4,]    0    0    0    1    0    0    0
## [5,]    0    0    0    0    1    0    0
## [6,]    0    0    0    0    0    1    0
## [7,]    0    0    0    0    0    0    1
# Note that using Centered.Data here essentially takes care of L0 intercept in calculation
Factors <- Centered.Data %*% Loadings
colnames(Loadings) <- c("Loadings 1","Loadings 2","Loadings 3","Loadings 4","Loadings 5","Loadings 6","Loadings 7")
rownames(Loadings) <- colnames(AssignmentData)
colnames(Factors) <- c("PC1","PC2","PC3","PC4","PC5","PC6","PC7")
Loadings[,1:3]
##          Loadings 1  Loadings 2 Loadings 3
## USGG3M   -0.3839609 -0.50744508  0.5298222
## USGG6M   -0.3901870 -0.43946144  0.1114737
## USGG2YR  -0.4151851 -0.11112721 -0.4187873
## USGG3YR  -0.4063541  0.01696988 -0.4476561
## USGG5YR  -0.3860610  0.23140317 -0.2462364
## USGG10YR -0.3477544  0.43245979  0.1500903
## USGG30YR -0.3047124  0.54421228  0.4979195
head(Factors[,1:3],5)
##                PC1       PC2      PC3
## 1/5/1981 -18.01553 -2.240277 1.461149
## 1/6/1981 -18.09140 -2.352346 1.442377
## 1/7/1981 -19.44731 -2.862932 1.644084
## 1/8/1981 -19.74851 -3.040712 1.633909
## 1/9/1981 -20.57204 -3.177974 1.661795
# Show that the Factors are uncorrelated versions their original X variables
round(cor(Factors),5)
##     PC1 PC2 PC3 PC4 PC5 PC6 PC7
## PC1   1   0   0   0   0   0   0
## PC2   0   1   0   0   0   0   0
## PC3   0   0   1   0   0   0   0
## PC4   0   0   0   1   0   0   0
## PC5   0   0   0   0   1   0   0
## PC6   0   0   0   0   0   1   0
## PC7   0   0   0   0   0   0   1

Plot the Importance of the Factors:

barplot(Eigen.Decomposition$values/sum(Eigen.Decomposition$values),width=2,col = "black",
        names.arg=c("F1","F2","F3","F4","F5","F6","F7"))

Plot the First Three Loadings:

matplot(Maturities,Loadings[,1:3],type="l",lty=1,col=c("black","red","green"),lwd=3)

Interpret the factors by looking at the shapes of the loadings.

A quick note about the Loadings themselves:

Using the loadings alone, we can make some observations about the factors themselves:

Loading 1 and Factor 1

In the plot above, the roughly horizontal black line represents the loadings for Factor 1. Note that the loadings are approximately equal and their line never crosses zero. Because Treasury YTMs often move in the same direction, this means that Factor 1 captures quasi-parallel shifts of the UST yield curve. Given that the yield curve’s most dominant trend is a downward shift over time, it makes sense that the most important factor accounts for its shifts.

Loading 2 and Factor 2

While the most dominant trend is a yield curve shift, each UST along the yield curve does not move at the same rate. In some periods, the short-end of the curve increaes/decreases faster than the long-end of the curve (and vice versa), causing the curve to flatten/steepen as it shifts upward or downward. Factor 2 complements factor 1, explaining the changes in the shape of the yield curve as it shifts. The red line in the loadings plot demonstrates this behavior. The coefficients have a larger magnitude for the <2yr and >10yr, while the 3-7 range loadings are not very impactful. Economically, this makes sense, as flattening/steepening depends on the spread between longer-dated USTs and shorter-dated USTs. Therefore, Factor 2 captures the second most dominant change in the curve – its average slope.

Loading 3 and Factor 3

YTM movement in the belly of the curve defines the Yield Curve’s curvature. If there were no curvature at all, the Yield Curve’s slope would be constant along the curve. But when curvature exists, the Yield curve’s slope changes along the curve, even though its position and general steepness / flatness are already defined. Factor 3 captures how curvature changes when the yield curve shifts and tilts. This movement corresponds to the green line on the plot. The gleen line crosses zero twice before increasing at a diminishing rate. This shows that Factor 3 changes the yield curve’s curvature, influencing rates along the curve very differently before mellowing out for later maturities.

Calculate and plot 3 selected factors

matplot(Factors[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

Change the signs of the first factor and the corresponding factor loading.

Loadings[,1]<--Loadings[,1]
Factors[,1]<--Factors[,1]
matplot(Factors[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(Maturities,Loadings[,1:3],type="l",lty=1,col=c("black","red","green"),lwd=3)

plot(Factors[,1],Factors[,2],type="l",lwd=2)

Draw at least three conclusions from the plot of the first two factors above.

Conclusion 1: Spread and Influence

The graph above demonstrates that Factor 1 has a much larger range and standard deviation than Factor 2. As a result, Factor 1’s influence captures more of the variance in the true yield curve’s movement over time. The yield curve has shifted dramatically downward, and Factor 1 demonstrates that behavior given its large movement form right-to-left on the X-axis. Factor 2’s range and standard deviation are much smaller. This makes sense, as Factor 2 adjusts the yield curve’s shape. While Factor 2 can cause the yield curve to move, it does not capture or measure the yield curve’s shifts. Rather, it measures the flattening or steepening of the yield curve. Those measurements are seen as adjustments to an otherwise normal yield curve that is defined more frequently by its quasi-parallel movements than its changes in shape.

Conclusion 2: Relationships between Factors Leading Up to Major Market Events

The first chart below plots Factor 1 against Factor 2, with market behavior ~1999-02 in light green / dark green and ~2006-08 in pink / red. In both time frames, the factors demonstrate a strong negative relationship, confirmed by the correlation table. This relationship strengthens as a market bubble builds leading up to an eventual market crash. The presence of this negative correlation is important, because it breaks away from the general independence between the two factors (which are orthogonal when considering the entire dataset).

From this plot, we can also examine if one Factor’s movement causes another Factor’s movement. In ~1999-02, for example, the yield curve begins shifting upward before it begins to flatten. The light green line captures this pattern. Eventually, as an economic bubble is about to burst, the market begins to correct, and the yield curve shifts downward then steepens dramatically. The dark-green line follows this general pattern.

While this pattern is not the case in every market boom and crash, its presence in this time frame brings to light the effect each factor might have as a leading or lagging indictor.

plot(Factors[,1],Factors[,2],type="l",lwd=2)

# show the rates movement in ~1998 to ~2002
lines(Factors[4450:4850,1],Factors[4450:4850,2],col="light green")
lines(Factors[4851:5450,1],Factors[4851:5450,2],col="forest green")

# show the rates movement in ~2004 to ~2007
lines(Factors[6200:6600,1],Factors[6200:6600,2],col="pink")
lines(Factors[6601:7000,1],Factors[6601:7000,2],col="red")

# show the correlation changes in these plots
cor.plots <- cbind(cor(Factors[,1],Factors[,2]),
                   cor(Factors[4450:5450,1],Factors[4450:5450,2]),
                   cor(Factors[6200:7000,1],Factors[6200:7000,2]))

colnames(cor.plots) <- c("Overall","~2001 crash", "~2008 crash")
cor.plots
##           Overall ~2001 crash ~2008 crash
## [1,] 2.196545e-15  -0.8050123  -0.9216871

Lastly, the line plots below demonstrate the effect of major market events. Normally, Factor 1 captures most of the yield curve’s movements. When the yield curve’s shape changes dramatically, however, Factor 1 does not predict the yield curve changes as well. In the two time frames above, Factor 2 is necessary to account for the unexplained variance in yield curve moves that Factor 1 cannot account for.

Conclusion 3: Volatile vs. Normal Markets

The plot below visualizes the difference between calm (if you can call them that) and volatile markets. Both the red and the blue line represent ~4-5 year periods, yet the red line’s changes are minor relative to those of the blue. This observation represents large differences in the yield curve’s volatility during these time frames. During 1981-1985 (blue line), the yield curve shifted dramatically, inverted, and normalized multiple times. As a result, the UST yield curve was all over the place, and the Factor plot below evidences this volatility. From ~2009-2014, on the other hand, the yield curve has been fairly stable. It’s remained normal the entire time, and its main movements include minor flattening and steepening periods. The factor plot demonstrates this relative calmness as well. The red line has almost no variance on the x-axis, and its variance along the y-axis is much smaller than that of the blue line.

plot(Factors[,1],Factors[,2],type="l",lwd=2)

# show the rates movement in ~1981 to ~1985 -- high volatility
lines(Factors[1:1000,1],Factors[1:1000,2],col="blue")

# show the rates movement in ~1990 to ~1995
lines(Factors[7300:8300,1],Factors[7300:8300,2],col="red")

Analyze the adjustments that each factor makes to the term curve.

OldCurve<-AssignmentData[135,]
NewCurve<-AssignmentData[136,]
CurveChange<-NewCurve-OldCurve
FactorsChange<-Factors[136,]-Factors[135,]
ModelCurveAdjustment.1Factor<-OldCurve+t(Loadings[,1])*FactorsChange[1]
ModelCurveAdjustment.2Factors<-OldCurve+t(Loadings[,1])*FactorsChange[1]+t(Loadings[,2])*FactorsChange[2]
ModelCurveAdjustment.3Factors<-OldCurve+t(Loadings[,1])*FactorsChange[1]+t(Loadings[,2])*FactorsChange[2]+
  t(Loadings[,3])*FactorsChange[3]
matplot(Maturities,
        t(rbind(OldCurve,NewCurve,ModelCurveAdjustment.1Factor,ModelCurveAdjustment.2Factors,
                ModelCurveAdjustment.3Factors)),
        type="l",lty=c(1,1,2,2,2),col=c("black","red","green","blue","magenta"),lwd=3,ylab="Curve Adjustment")
legend(x="topright",c("Old Curve","New Curve","1-Factor Adj.","2-Factor Adj.",
                      "3-Factor Adj."),lty=c(1,1,2,2,2),lwd=3,col=c("black","red","green","blue","magenta"))

rbind(CurveChange,ModelCurveAdjustment.3Factors-OldCurve)
##               USGG3M   USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR
## CurveChange 1.070000 1.070000 0.8900000 0.8300000 0.7200000 0.5000000
##             1.090063 1.041267 0.9046108 0.8248257 0.6979317 0.5531734
##              USGG30YR
## CurveChange 0.4700000
##             0.4357793

Explain how shapes of loadings affect adjustments using only factor 1, factors 1 and 2, and all 3 factors.

Let’s start with some context:

The first three factors help us model how each Treasury’s YTM change contributed to the overall yield curve shift. Factor 1 explains most of the day-over-day movement in the yield curve. The dotted-green line in the plot above represents the quasi-parallel upward shift of the UST curve. That being said, the yield curve also steepened (because in this case it’s inverted), so Factor 1 line underestimates the upward shift on the short end of the curve and overstates the upward shift on the long end of the curve. Factor 2 captures some of the unexplained variance left over from Factor 1.

Factor 2, the dotted blue line, makes adjustments to the Factor 1 curve to bring us closer to the true day-over-day shift in the UST yield curve. While it’s clear the yield curve shifted updward, the UST yields did not move proportionally across the curve. In this case, the short end of the curve (<2yr) YTM’s shifted more than they would if the shift were parallel, while the long end of the curve (>10 year) moved less than expected. Because Factor 2 accounds for the steepness/flatness of the curve, it accounts for the unexplained variance left over from Factor 1. It re-adjusts the short end of the curve upward and moves the long end of the curve downward.

Factor 3, the dotted pink line, makes some final touches to account for unexplained variance left after from Factors 1 and 2. The yield curve not only shifted upward and steepened (inversely), it also changed its curvature slightly. Factor 3 accounts for this movement, edging the longer end of the curve slightly upward in comparison to Factor 2.

All three of these Factors’ movements represent the Loading lines from the Loading vs. Maturity Plot. The best example is Loading 2 and Factor 2. Loading 2 tells us that the short end of the yield curve and the long end of the yield curve will adjust more than the belly will. As a result, Factor 2 adjusts the yield curve by pushing up the front end of the curve and pushing down the long end of the curve. The line plot for Loading 2 shows those weights in action because they correspond to how Factor 2 moves in the adjustment plot.

See the goodness of fit for the example of 10Y yield.

cbind(Maturities,Loadings[,1:3])
##          Maturities Loadings 1  Loadings 2 Loadings 3
## USGG3M         0.25  0.3839609 -0.50744508  0.5298222
## USGG6M         0.50  0.3901870 -0.43946144  0.1114737
## USGG2YR        2.00  0.4151851 -0.11112721 -0.4187873
## USGG3YR        3.00  0.4063541  0.01696988 -0.4476561
## USGG5YR        5.00  0.3860610  0.23140317 -0.2462364
## USGG10YR      10.00  0.3477544  0.43245979  0.1500903
## USGG30YR      30.00  0.3047124  0.54421228  0.4979195
Model.10Y<-L0[6]+Loadings[6,1]*Factors[,1]+Loadings[6,2]*Factors[,2]+Loadings[6,3]*Factors[,3]
matplot(cbind(AssignmentData[,6],Model.10Y),type="l",lty=1,lwd=c(3,1),col=c("black","red"),ylab="5Y Yield")

Repeat the PCA using princomp

PCA.Yields<-princomp(AssignmentData)
names(PCA.Yields)
## [1] "sdev"     "loadings" "center"   "scale"    "n.obs"    "scores"  
## [7] "call"
# Check that the loadings are the same
cbind(PCA.Yields$loadings[,1:3],Maturities,Eigen.Decomposition$vectors[,1:3])
##              Comp.1      Comp.2     Comp.3 Maturities           
## USGG3M   -0.3839609  0.50744508  0.5298222       0.25 -0.3839609
## USGG6M   -0.3901870  0.43946144  0.1114737       0.50 -0.3901870
## USGG2YR  -0.4151851  0.11112721 -0.4187873       2.00 -0.4151851
## USGG3YR  -0.4063541 -0.01696988 -0.4476561       3.00 -0.4063541
## USGG5YR  -0.3860610 -0.23140317 -0.2462364       5.00 -0.3860610
## USGG10YR -0.3477544 -0.43245979  0.1500903      10.00 -0.3477544
## USGG30YR -0.3047124 -0.54421228  0.4979195      30.00 -0.3047124
##                                
## USGG3M   -0.50744508  0.5298222
## USGG6M   -0.43946144  0.1114737
## USGG2YR  -0.11112721 -0.4187873
## USGG3YR   0.01696988 -0.4476561
## USGG5YR   0.23140317 -0.2462364
## USGG10YR  0.43245979  0.1500903
## USGG30YR  0.54421228  0.4979195
matplot(Maturities,PCA.Yields$loadings[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(PCA.Yields$scores[,1:3],type="l",col=c("black","red","green"),lwd=3,lty=1)

# Change the signs of the 1st factor and the first loading
PCA.Yields$loadings[,1]<--PCA.Yields$loadings[,1]
PCA.Yields$scores[,1]<--PCA.Yields$scores[,1]
matplot(Maturities,PCA.Yields$loadings[,1:3],type="l",col=c("black","red","green"),lty=1,lwd=3)

matplot(PCA.Yields$scores[,1:3],type="l",col=c("black","red","green"),lwd=3,lty=1)

Uncover the Mystery of Output 8

matplot(cbind(PCA.Yields$scores[,1],AssignmentData.Output,Factors[,1]),type="l",col=c("black","red","green"),lwd=c(3,2,1),lty=c(1,2,3),ylab="Factor 1")

The Output variable in our dataset is principal compenent 1, or factor 1. Factor 1 itself is a linear combination of each UST YTM weighted by its respective loading. This corresponds to the daily dot product between the UST YTM’s centered values and the eigenvector which explains the most variance in our input data, as shown below:

cbind(head(Centered.Data,5) %*% Loadings[,1],head(AssignmentData.Output,5))
##              [,1]     [,2]
## 1/5/1981 18.01553 18.01553
## 1/6/1981 18.09140 18.09140
## 1/7/1981 19.44731 19.44731
## 1/8/1981 19.74851 19.74851
## 1/9/1981 20.57204 20.57204

Therefore, the Output variable is the weighted average of all 7 Treasury maturities where the weights of each UST correspond to the first vector of loadings. It’s important to note that the Output does not explain the full variance in yield curve movement but rather the maximum amount of variance when using one input only. Each additional principal component would explain the maximum amount of remaining variance not captured by more important principal components. Using Output1 (or Factor 1) only, however, we capture a significant portion of the variance in the UST Yield curve because Output 1 ultimately measures how the yield curve shifts, which is its most profound behavior over time.

Compare the regression coefficients from Step 2 and Step 3 with factor loadings.

First, look at the slopes for AssignmentData.Input~AssignmentData.Output

t(apply(AssignmentData, 2, function(AssignmentData.col) lm(AssignmentData.col~AssignmentData.Output)$coef))
##          (Intercept) AssignmentData.Output
## USGG3M      4.675134             0.3839609
## USGG6M      4.844370             0.3901870
## USGG2YR     5.438888             0.4151851
## USGG3YR     5.644458             0.4063541
## USGG5YR     6.009421             0.3860610
## USGG10YR    6.481316             0.3477544
## USGG30YR    6.869355             0.3047124
cbind(PCA.Yields$center,PCA.Yields$loadings[,1])
##              [,1]      [,2]
## USGG3M   4.675134 0.3839609
## USGG6M   4.844370 0.3901870
## USGG2YR  5.438888 0.4151851
## USGG3YR  5.644458 0.4063541
## USGG5YR  6.009421 0.3860610
## USGG10YR 6.481316 0.3477544
## USGG30YR 6.869355 0.3047124

This shows that the zero loading equals the vector of intercepts of models Y~Output1, where Y is one of the columns of yields in the data. Also, the slopes of the same models are equal to the first loading.

AssignmentData.Centered<-t(apply(AssignmentData,1,function(AssignmentData.row) AssignmentData.row-PCA.Yields$center))
dim(AssignmentData.Centered)
## [1] 8300    7
t(apply(AssignmentData.Centered, 2, function(AssignmentData.col) lm(AssignmentData.Output~AssignmentData.col)$coef))
##           (Intercept) AssignmentData.col
## USGG3M   1.420077e-11           2.507561
## USGG6M   1.421187e-11           2.497235
## USGG2YR  1.419747e-11           2.400449
## USGG3YR  1.419989e-11           2.455793
## USGG5YR  1.419549e-11           2.568742
## USGG10YR 1.420297e-11           2.786991
## USGG30YR 1.420965e-11           3.069561

To recover the loading of the first factor by doing regression, use all inputs together.

This means that the factor is a portfolio of all input variables with weights.

t(lm(AssignmentData.Output~AssignmentData.Centered)$coef)[-1]
## [1] 0.3839609 0.3901870 0.4151851 0.4063541 0.3860610 0.3477544 0.3047124
PCA.Yields$loadings[,1]
##    USGG3M    USGG6M   USGG2YR   USGG3YR   USGG5YR  USGG10YR  USGG30YR 
## 0.3839609 0.3901870 0.4151851 0.4063541 0.3860610 0.3477544 0.3047124