Data visualization and non-parametric fitting can become very challenging when data is high-dimensional. This has led to the development of many dimension reduction techniques that focus on the entire conditional distribution of the data. However, when specific aspects of the conditional distribution are of interest, these methods can provide more directions than necessary.
Quantile regression (QR) has received a lot of attention since its inception from Koenker and Bassett (1978) and it has been an attractive tool for research areas where non-central parts of the data are important. Dimension reduction techniques for conditional quantiles have recently made an appearance and are still in development. For some references see Wu et al. (2010), Kong and Xia (2012, 2014), Luo et al. (2014), and Christou and Akritas (2016).
More recently, Christou (2020) proposed a new technique for finding the fewest linear combinations of the predictor variable X that contain all the information about the conditional quantile. To give further intuition, for a univariate response Y and a p-dimensional predictor X, the method focuses on finding a p × dτ matrix Bτ, for τ ∈ (0, 1) and dτ ≤ p, such that Y and Qτ(Y|X) are conditionally independent given Bτ⊤X, where Qτ(Y|X) denotes the τth conditional quantile of Y given X. The space spanned by Bτ is called the τth quantile dimension reduction subspace for the regression of Y on X. The intersection of all τth dimension reduction subspaces is called the τth central quantile subspace (τ-CQS) and is denoted by 𝒮Qτ(Y|X).
The purpose of this vignette is to demonstrate the implementation of
the functions appearing in the quantdr
package, along with
some examples. The main function of the package is cqs
which returns the estimated directions of the τ-CQS.
To get started you need to first install the package using the command
and then use it during any R
session by issuing the
command
For an overview of the available help files of the package use
The package includes the following functions:
llqr
Local linear quantile regressionllqrcv
Cross-Validation for bandwidth selection of
local linear quantile regressioncqs
Central quantile subspaceValAR
Value-at-Risk estimation using the central
quantile subspace
To get help on a specific function type
help(function.name)
or ?function.name
for a
convenient shorthand. Try
For further examples, refer to the coding examples listed in the
documentation for each quantdr
function.
The overall goal of the cqs
function is to identify the
coefficients of the linear combination Bτ⊤X
in order to replace the p × 1
predictor vector X
with the low-dimensional dτ × 1 predictor
vector Bτ⊤X.
To do that, Christou (2020) proved two important results:
βτ ∈ 𝒮Qτ(Y|X), where βτ is the slope vector from regressing the conditional quantile on X, i.e., and A spans the central subspace (Li 1991). This implies an initial dimension reduction using A, making the nonparametric estimation of Qτ(Y|A⊤X) tractable.
E{Qτ(Y|Uτ)X} ∈ 𝒮Qτ(Y|X), where Uτ is a measurable function of Bτ⊤X, provided that Qτ(Y|Uτ)X is integrable.
The above two results imply the following:
If the dimension of the τ-CQS is known to be one, then we can fit a linear regression model of Qτ(Y|A⊤X) on X and report the slope vector as the basis vector.
If the dimension of the τ-CQS is known to be greater than one, then we can set βτ, 0 = βτ as the initial vector and then create more vectors using βτ, j = E{Qτ(Y|βτ, j − 1⊤X)X}, for j = 1, …, p − 1. In order to obtain linearly independent vectors, we can perform an eigenvalue decomposition on VτVτ⊤, where Vτ = (βτ, 0, …, βτ, p − 1) and choose the eigenvectors corresponding to the dτ nonzero eigenvalues.
If the dimension of the τ-CQS is unknown, we can apply the procedure from 2. Then, we can estimate the dimension dτ using the eigenvalues resulting from the eigenvalue decomposition. Existing techniques such as the Cross-Validation (CV) criterion or the modified-BIC type criterion of Zhu et al. (2010) can be used to determine the structural dimension.
All of the above are incorporated in the cqs
function.
The first step of the algorithm requires to fit a linear regression model of Qτ(Y|A⊤X) on X, which implies the non-parametric estimation of the conditional quantile. The second step of the algorithm relies on estimating an expectation and performing an eigenvalue decomposition. Specifically,
We use a dimension reduction technique to estimate the basis
matrix A of the
central subspace, denoted by $\widehat{\mathbf{A}}$, and form the new
sufficient predictors $\widehat{\mathbf{A}}^{\top} \mathbf{X}_{i}$,
i = 1, …, n. For this
we use the function dr
from the package
dr
.
For each i = 1, …, n, we use the
local linear conditional quantile estimation method of Guerre and Sabbah
(2012) to estimate $Q_{\tau}(Y|\widehat{\mathbf{A}}^{\top}
\mathbf{X}_{i})$. For this we use the llqr
of the
presented package.
We set $\widehat{\boldsymbol{\beta}}_{\tau}$ to be
For this we use the lm
function of the stats
package.
If dτ = 1 we stop and report $\widehat{\boldsymbol{\beta}}_{\tau}$ as the estimated basis vector for 𝒮Qτ(Y|X). Otherwise, we move to Step 5.
We set $\widehat{\boldsymbol{\beta}}_{\tau, 0}=\widehat{\boldsymbol{\beta}}_{\tau}$, and, for j = 1, …, p − 1, we
llqr
function of the presented paper.We form the p × p matrix $\widehat{\mathbf{V}}_{\tau}=(\widehat{\boldsymbol{\beta}}_{\tau,
0}, \dots, \widehat{\boldsymbol{\beta}}_{\tau, p-1})$ and choose
the dτ
eigenvectors corresponding to the dτ largest
eigenvalues of $\widehat{\mathbf{V}}_{\tau}
\widehat{\mathbf{V}}_{\tau}^{\top}$. For this we use the
eigen
function of the base
package.
If dτ
is unknown, a suggested structural dimension can be estimated using
existing techniques such as the CV criterion or the modified-BIC type
criterion of Zhu et al. (2010). In terms of the cqs
function, the modified-BIC type criterion is utilized and returns a
suggested dimension. Note that the user can extract the eigenvalues and
apply them to any other existing technique.
cqs
For the purpose of this section, we simulate data from the homoscedastic single-index model where X = (X1, …, X10)⊤ and the error ϵ are generated according to a standard normal distribution. The τ-CQS is spanned by (3, 1, 0, …, 0)⊤ for all τ. We focus on the estimation of the 0.5-CQS.
set.seed(1234)
n <- 100
p <- 10
tau <- 0.5
x <- matrix(rnorm(n * p), n, p)
error <- rnorm(n)
y <- 3 * x[, 1] + x[, 2] + error
The cqs
function proceeds by specifying an n × p design matrix X, a vector of the response
variable Y, and a quantile
level τ. The dimension of the
τ-CQS, i.e., dτ, is optional.
Note that for this example dτ = 1 for every
quantile level τ.
We now discuss the output of the cqs
function.
out1 <- cqs(x, y, tau = tau, dtau = 1)
out1
#> $qvectors
#> [,1]
#> [1,] 0.952948139
#> [2,] 0.287974471
#> [3,] 0.042667367
#> [4,] 0.038590479
#> [5,] -0.003885815
#> [6,] -0.042769158
#> [7,] 0.028417264
#> [8,] 0.018789922
#> [9,] 0.023914784
#> [10,] 0.045541197
#>
#> $dtau
#> [1] 1
When the dimension of the τ-CQS is known to be one, the
algorithm reports the ordinary least-squares slope vector as the basis
vector of the subspace. This is denoted by qvectors
.
Moreover, the function cqs
outputs dtau
, which
in this case is specified by the user.
When the dimension of the τ-CQS is unknown (or known to be
greater than one), the algorithm continues, creates more vectors, and
performs an eigenvalue decomposition. Therefore, the output includes one
more element, the eigenvalues provided by the eigenvalue decomposition
and denoted by qvalues
.
out2 <- cqs(x, y, tau = tau)
out2
#> $qvectors
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 0.94285965 -0.04373861 0.323724575 0.10473035 0.06290808 -0.04667851
#> [2,] 0.30671396 0.29208998 -0.401758607 -0.16287563 0.15202916 -0.21945183
#> [3,] 0.03101309 -0.28351089 -0.199986929 -0.59193015 -0.45455105 -0.19721619
#> [4,] 0.03394904 -0.16427022 -0.478614025 0.35884778 -0.13384503 0.13125004
#> [5,] -0.00658252 -0.09730212 -0.205810821 -0.08802898 0.75194191 -0.04145494
#> [6,] -0.02433910 0.31968626 0.006386568 -0.65344860 0.37885008 -0.06677676
#> [7,] 0.02707587 0.03398101 0.354491863 -0.21839222 0.12549555 0.77574347
#> [8,] 0.01569753 -0.12261129 -0.358513162 0.23379874 0.04091144 0.40756218
#> [9,] 0.06317845 0.79095189 0.144685863 0.19507455 -0.18329861 -0.20917231
#> [10,] 0.07962450 0.54997345 -0.443034398 -0.38301190 -0.35618445 0.33809519
#> [,7] [,8] [,9] [,10]
#> [1,] 0.07376381 -0.16872174 0.008474194 0.14020956
#> [2,] -0.00725682 0.78950190 0.134953357 0.14996842
#> [3,] -0.06903646 -0.25806831 0.566790028 -0.28810369
#> [4,] 0.31360476 -0.09073977 -0.387039672 -0.57510228
#> [5,] 0.39371036 -0.29970274 0.123874984 -0.05495289
#> [6,] -0.47061321 -0.12265678 -0.366212884 -0.45406911
#> [7,] 0.07727430 0.30837138 0.410192154 -0.40082730
#> [8,] -0.74653472 -0.32831788 0.339507633 0.04297342
#> [9,] -0.22776133 -0.25554310 0.624061137 -0.30765592
#> [10,] 0.10565655 -0.04530856 -0.094568566 0.39571912
#>
#> $qvalues
#> [1] 9.943967e+01 9.879354e-02 3.285674e-03 4.633843e-04 1.760913e-04
#> [6] 3.195217e-05 1.161632e-05 4.207074e-06 1.322628e-06 1.268059e-10
#>
#> $dtau
#> [1] 1
Note that the dimension dtau
is correctly estimated by
the modified-BIC type criterion. Therefore, this output suggests to take
the first eigenvector as the basis vector for the τ-CQS. We can extract the direction
using
out2$qvectors[, 1:out2$dtau]
#> [1] 0.94285965 0.30671396 0.03101309 0.03394904 -0.00658252 -0.02433910
#> [7] 0.02707587 0.01569753 0.06317845 0.07962450
If we want to measure the estimation error we can use the angle
between the true and the estimated subspaces. This is performed using
the subspace
function of the pracma
package.
Note that the angle is measured in radians, and so we divide by π/2.
library(pracma)
beta_true <- c(3, 1, rep(0, p - 2))
beta_hat1 <- out1$qvectors
beta_hat2 <- out2$qvectors[, 1:out2$dtau]
subspace(beta_true, beta_hat1) / (pi / 2)
#> [1] 0.06297377
subspace(beta_true, beta_hat2) / (pi / 2)
#> [1] 0.07591763
Note that the estimation error is slightly higher when the dimension of the subspace is unknown and needs to be estimated.
We can continue further and estimate the conditional quantile function. To do this, we first form the new sufficient predictor $\widehat{\mathbf{B}}_{\tau}^{\top}\mathbf{X}$ using
We then estimate the conditional quantile using the llqr
function of the presented package. The main arguments of the function
are: the design matrix, which in this case is $\widehat{\mathbf{B}}_{\tau}^{\top}\mathbf{X}$,
the vector of the response variable Y, and the quantile level τ. The rest of the arguments, i.e.,
the bandwidth h, the method
for the estimation of h, and
the single observation x0 for which to perform
the estimation, are optional. If h is not specified, then it will be
determined using either the rule-of-thumb bandwidth of Yu and Jones
(1998) or the CV criterion; the default choice is the rule-of-thumb.
However, the user needs to be careful about the bandwidth selection.
When the dimension of the predictor variable is large compared to the
sample size, local linear fitting meets the ‘curse of dimensionality’
problem. In situations like that, the bandwidth selected by the
rule-of-thumb or the CV criterion might be small and cause the function
to fail. For these cases, we advice the user to pre-specify the
bandwidth in the function. Finally, if x0 is not specified, the
estimation will be performed on the entire design matrix. For this
example we use
qhat1 <- llqr(newx, y, tau)
qhat1
#> $ll_est
#> [1] -4.29612236 0.50206070 2.83802778 -6.90420017 0.19579698 1.85561955
#> [7] -2.31913140 -1.33434436 -1.72001737 -3.20272158 -1.28031021 -3.94174571
#> [13] -3.49941104 0.52131417 2.73413065 0.56630321 -2.36093711 -3.12557074
#> [19] -2.51407640 6.52189600 -0.08583814 -2.33251792 0.55809983 2.13418429
#> [25] -0.34625794 -4.56943348 1.68280128 -4.83741324 -0.91380392 -3.26092075
#> [31] 4.29663797 -1.08891160 -3.99021058 -1.31242281 -6.35220726 -4.49313490
#> [37] -7.03023081 -5.53238392 0.11381859 -2.35896143 4.22278587 -2.60786133
#> [43] -2.04607770 -1.12522131 -2.76437352 -2.77165266 -1.55178162 -4.52444343
#> [49] -1.83495363 -1.63030447 -6.01074528 -1.51250554 -2.17197891 -3.86216645
#> [55] -1.07434001 2.96517288 4.61003905 -3.74261116 3.95256798 -4.11844704
#> [61] 1.69068143 7.09879779 -0.58651720 -2.81532043 0.15631061 5.27102863
#> [67] -2.71082703 3.91064983 3.68574190 2.73121453 1.64619818 -1.30946960
#> [73] -1.27896874 0.55501360 7.05577045 -1.05247670 -5.45933552 0.89422383
#> [79] 0.89854628 -1.36919251 -3.28789275 -1.27642763 -3.79301884 0.14684430
#> [85] 3.47472770 4.20733501 3.02325432 -2.32795975 -0.19731735 -4.25561959
#> [91] -0.47404473 -2.57259595 4.22777744 3.91939381 0.47869246 1.89990559
#> [97] -2.87174638 1.69526276 3.55421064 4.38360715
#>
#> $h
#> [1] 0.3587732
The output consists of the estimation of the conditional quantile function at each point of the design matrix, i.e., $\widehat{Q}_{\tau}(Y|\widehat{\mathbf{B}}_{\tau}^{\top} \mathbf{X}_{i})$, i = 1, …, n, and the estimation of the bandwidth using the rule-of-thumb. For illustration purposes, we repeat the above using the CV criterion for the bandwidth selection.
qhat2 <- llqr(newx, y, tau, method = "CV")
qhat2
#> $ll_est
#> [1] -4.16471702 0.42243692 2.92565950 -6.90420017 0.13404652 1.85561955
#> [7] -2.37072469 -1.34044965 -1.74117469 -3.22002253 -1.28563104 -3.81894259
#> [13] -3.53049520 0.44056678 2.83018199 0.48293029 -2.41430029 -3.15966390
#> [19] -2.57392284 6.65279536 -0.13115259 -2.38467794 0.47520565 2.13529423
#> [25] -0.37101672 -4.30454499 1.62995552 -4.65447360 -0.91380392 -3.27669481
#> [31] 4.35903141 -1.08469511 -3.86885124 -1.31820985 -6.35093348 -4.23891907
#> [37] -7.04312759 -5.49387034 0.05685232 -2.41224096 4.23569199 -2.67233348
#> [43] -2.09085856 -1.12013049 -2.83530198 -2.84288140 -1.56852396 -4.26584816
#> [49] -1.87138464 -1.64910747 -6.01074528 -1.52119732 -2.22173910 -3.74999646
#> [55] -1.07047439 3.04250102 5.05306616 -3.64641578 3.88369939 -3.99491726
#> [61] 1.63802186 7.09879779 -0.59906765 -2.88835051 0.09686454 5.54979008
#> [67] -2.77954663 3.83234757 3.56330169 2.82750218 1.59248756 -1.31521376
#> [73] -1.28427010 0.47229953 7.07269696 -1.04913754 -5.41494071 0.87880733
#> [79] 0.88309542 -1.37580371 -3.30295918 -1.28169210 -3.69008814 0.08795068
#> [85] 3.36253416 4.21394754 3.09136422 -2.37992680 -0.22964427 -4.12600955
#> [91] -0.49231026 -2.63561335 4.25858264 3.80649004 0.40043244 1.89990559
#> [97] -2.94710410 1.64271143 3.43847962 4.50888583
#>
#> $h
#> [1] 0.5981072
If h is pre-specified, then the output reports the value given by the user, i.e.,
qhat3 <- llqr(newx, y, tau, h = 1)
qhat3
#> $ll_est
#> [1] -4.10733119 0.41575048 2.82984989 -6.96560477 0.12867419 1.80232825
#> [7] -2.31913140 -1.33238865 -1.71846335 -3.13228281 -1.27829822 -3.78217584
#> [13] -3.43094360 0.43379773 2.73390739 0.47596820 -2.36093711 -3.07602690
#> [19] -2.51407640 6.86316722 -0.13531649 -2.33251792 0.46827876 2.07701731
#> [25] -0.37942098 -4.33565856 1.58076620 -4.56941194 -0.91141020 -3.18679900
#> [31] 4.36291723 -1.08003138 -3.83145286 -1.31044427 -6.38394707 -4.27831201
#> [37] -7.11408028 -5.47538775 0.05183175 -2.35896143 4.23499991 -2.60848998
#> [43] -2.04607770 -1.11499607 -2.76483932 -2.77211089 -1.55005238 -4.30757720
#> [49] -1.83495363 -1.62865701 -6.01074528 -1.51073539 -2.17197891 -3.74999646
#> [55] -1.06599957 2.94726045 5.17473676 -3.64641578 3.87772477 -3.95592341
#> [61] 1.58868591 7.09879779 -0.60462800 -2.81573322 0.09166164 5.80181928
#> [67] -2.71134854 3.82487019 3.55953216 2.73121453 1.54397932 -1.30748798
#> [73] -1.27695535 0.46538589 7.07218396 -1.04494614 -5.38301842 0.86579411
#> [79] 0.87005822 -1.36727309 -3.21206415 -1.27441160 -3.69008814 0.08278840
#> [85] 3.35565465 4.21259803 2.99636162 -2.32795975 -0.23981152 -4.08535663
#> [91] -0.49920193 -2.57326128 4.25858264 3.79825597 0.39384627 1.84580928
#> [97] -2.87210046 1.59329025 3.43277654 4.52152403
#>
#> $h
#> [1] 1
To illustrate the median regression, compared with the original data, we can use
true_dir <- x %*% beta_true
plot(true_dir, y, xlab = "sufficient direction", ylab = "y", pch = 16)
points(true_dir, qhat1$ll_est, pch = 16, col = 'red')
The same procedure can be performed for multiple quantile levels. For example, we estimate the τ-CQS for τ = 0.1, 0.25, 0.5, 0.75, 0.9.
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
out3 <- matrix(0, p, length(taus))
for (i in 1:length(taus)) {
out3[, i] <- cqs(x, y, tau = taus[i], dtau = 1)$qvectors
}
out3
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.955128387 0.953563098 0.952948139 0.95432205 0.957157424
#> [2,] 0.282221145 0.286814969 0.287974471 0.28469153 0.275583662
#> [3,] 0.040025231 0.040124118 0.042667367 0.04252055 0.039230711
#> [4,] 0.037794516 0.037393312 0.038590479 0.03742298 0.037849461
#> [5,] -0.002145479 -0.002710592 -0.003885815 -0.00284509 0.002339282
#> [6,] -0.049903668 -0.047392988 -0.042769158 -0.05199378 -0.053432303
#> [7,] 0.029549555 0.029730897 0.028417264 0.02857128 0.032300729
#> [8,] 0.016985520 0.017341603 0.018789922 0.01678999 0.017397007
#> [9,] 0.015690746 0.019725351 0.023914784 0.01175831 0.009852825
#> [10,] 0.033877447 0.040239263 0.045541197 0.03261534 0.025062461
Similarly, we can plot the data along with the estimated conditional quantile functions for the various quantile levels.
newx <- x %*% out3
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(2,3))
qhat_tau <- as.null()
for (i in 1:length(taus)) {
plot(true_dir, y, xlab = "sufficient direction", ylab = "y", main = taus[i], pch = 16)
qhat_tau <- llqr(newx[, i], y, tau = taus[i])$ll_est
points(true_dir, qhat_tau, pch = 16, col = "red")
}
par(oldpar)
Value-at-risk (VaR) is an important financial quantity directly related to the QR framework. Specifically, for {rt}t = 1n a time series of returns, the τth VaR at time t, denoted by VaRτ(t), is the smallest number for which P{rt < −VaRτ(t)|ℱt − 1} = τ, where ℱt − 1 denotes the information set at time t − 1. Therefore, −VaRτ(t) is the τth conditional quantile of rt, and we can write Qτ(rt|ℱt − 1) = −VaRτ(t).
Christou and Grabchak (2019) used the methodology proposed by
Christou (2020) to estimate VaR and compare it with existing
commonly-used methods. The estimation of VaRτ(t)
at time t reduces to the
estimation of Qτ(rt|ℱt − 1),
which can be performed using the cqs
and llqr
functions of the presented package. Specifically, let rt denote a
daily return (usually log-return) and rt
denote a p-dimensional vector
of covariates consisting of the p past returns, i.e., rt, p = (rt − 1, …, rt − p)⊤.
We want to estimate Qτ(rt|rt, p)
at time t by assuming that
there exists a p × dτ
matrix Bτ,
dτ ≤ p,
such that Qτ(rt|rt, p) = Qτ(rt|Bτ⊤rt, p).
This implies that Bτ⊤rt, p
contains all the information about rt that is
available from Qτ(rt|rt, p).
Once Bτ is
estimated using the cqs
function, the sufficient predictors
$\widehat{\mathbf{B}}_{\tau}^{\top}\mathbf{r}_{t,p}$
are formed and $Q_{\tau}(r_{t} |
\widehat{\mathbf{B}}_{\tau}^{\top}\mathbf{r}_{t,p})$ is estimated
using the llqr
function.
ValAR
For an illustration of VaR estimation, we use the edhec
data set available in the PerformanceAnalytics
package in
R. The data set includes the EDHEC composite hedge fund style index
returns.
library(PerformanceAnalytics)
data(edhec, package = "PerformanceAnalytics")
head(edhec)
#> Convertible Arbitrage CTA Global Distressed Securities
#> 1997-01-31 0.0119 0.0393 0.0178
#> 1997-02-28 0.0123 0.0298 0.0122
#> 1997-03-31 0.0078 -0.0021 -0.0012
#> 1997-04-30 0.0086 -0.0170 0.0030
#> 1997-05-31 0.0156 -0.0015 0.0233
#> 1997-06-30 0.0212 0.0085 0.0217
#> Emerging Markets Equity Market Neutral Event Driven
#> 1997-01-31 0.0791 0.0189 0.0213
#> 1997-02-28 0.0525 0.0101 0.0084
#> 1997-03-31 -0.0120 0.0016 -0.0023
#> 1997-04-30 0.0119 0.0119 -0.0005
#> 1997-05-31 0.0315 0.0189 0.0346
#> 1997-06-30 0.0581 0.0165 0.0258
#> Fixed Income Arbitrage Global Macro Long/Short Equity
#> 1997-01-31 0.0191 0.0573 0.0281
#> 1997-02-28 0.0122 0.0175 -0.0006
#> 1997-03-31 0.0109 -0.0119 -0.0084
#> 1997-04-30 0.0130 0.0172 0.0084
#> 1997-05-31 0.0118 0.0108 0.0394
#> 1997-06-30 0.0108 0.0218 0.0223
#> Merger Arbitrage Relative Value Short Selling Funds of Funds
#> 1997-01-31 0.0150 0.0180 -0.0166 0.0317
#> 1997-02-28 0.0034 0.0118 0.0426 0.0106
#> 1997-03-31 0.0060 0.0010 0.0778 -0.0077
#> 1997-04-30 -0.0001 0.0122 -0.0129 0.0009
#> 1997-05-31 0.0197 0.0173 -0.0737 0.0275
#> 1997-06-30 0.0231 0.0198 -0.0065 0.0225
To estimate the one-step ahead VaR we need a vector of returns in a
standard chronological order. For this, we will select the ‘CTA Global
Distressed’ as a vector of returns. The ValAR
function
proceeds by specifying the vector of returns y, the number p of past observations to be used as
the predictor variables, the quantile level τ, a moving window for the number of
observations to be used for fitting the model, and a logical operator to
indicate whether the returns are given in a standard chronological order
(oldest to newest) or not. If a moving window is not specified, then the
default value of min(250, n - p) will be used to estimate the
conditional quantile. Note that, typical values for a moving window
correspond to one or two years of returns. A moving window will be
useful if the number of observations is large and old returns will not
be of much value to the prediction. Moreover, if the number of
observations is large, a moving window will also speed up the algorithm,
since the number of observations to be used will be smaller. Finally, if
the returns are in reversed chronological order, then the user should
either reverse the vector of returns or indicate that using the
chronological
option of the function.
For this example, we will use p = 5, τ = 0.05, and will not specify any
moving window so that the default value of min(250, 275 - 5) = 250 will
be used. Also, the returns are given in standard chronological order
and, since this is the default setting, we do not specify anything for
chronological
. Note that, the last day of returns is
2019-11-30 and therefore, we are predicting the 5% VaR for
2019-12-31.
y <- as.vector(edhec[, 2])
n <- length(y)
p <- 5
tau <- 0.05
ValAR(y, p = p, tau = tau)
#> [1] -0.02966812
If we want to compare this number with the 5% VaR estimated by the historical method, we can use
For illustration purposes, we will use the last 100 observations as a testing data set for one-step ahead VaR estimation and the rest of the observations as historical data. Specifically, for each of the time points, t, of the observations not included in the historical data, we estimate the model based on all of the data up to time t − 1 and report the 5% VaR.
size <- 100
VaRest <- as.null(size)
for (i in 1:size){
VaRest[i] <- ValAR(y[1:(n - size + i - 1)], p, tau)
}
To visualize the results, we can plot the time series of returns with the one-step ahead 5% VaR forecasts.
plot.ts(y[(n - size + 1):n], ylim = range(y[(n - size + 1):n], VaRest), ylab = 'returns')
lines(VaRest, col = 'red')
Moreover, we can calculate the proportion of exceptions, i.e., the number of times $r_{t} < -\widehat{VaR}_{\tau}(t)$ for each time point t of the last 100 observations. This number should be a reasonable estimate for τ. In this case we can see that the estimated proportion is actually 0.05.
We can repeat the above procedure for multiple quantile levels, i.e., for τ = 0.01, 0.025, 0.05.
taus <- c(0.01, 0.025, 0.05)
VaRest <- matrix(0, size, length(taus))
for (i in 1:size) {
for (j in 1:length(taus)) {
VaRest[i, j] <- ValAR(y[1:(n - size + i - 1)], p, taus[j])
}
}
# plots
plot.ts(y[(n - size + 1):n], ylim = range(y[(n - size + 1):n], VaRest), ylab = 'returns')
lines(VaRest[, 1], col = 'red')
lines(VaRest[, 2], col = 'blue')
lines(VaRest[, 3], col = 'green')
legend('top', legend = c("1%", "2.5%", "5%"), col = c("red", "blue", "green"),
lty=1, cex=1, horiz = T, bty = "n")
# proportion of exceptions
sum(y[(n - size + 1):n] < VaRest[, 1]) / size
#> [1] 0.03
sum(y[(n - size + 1):n] < VaRest[, 2]) / size
#> [1] 0.03
sum(y[(n - size + 1):n] < VaRest[, 3]) / size
#> [1] 0.05
We can see that the estimated proportion of exceptions are very close to the true values of 0.01, 0.025, and 0.05.
This vignette provides a brief introduction to the R package
quantdr
and presents a tutorial on how to implement the
basic function cqs
. Performing dimension reduction
techniques to conditional quantiles is an active research topic and
therefore updates and new functions will be incorporated into
forthcoming versions of the package. For this reason, this document will
be updated accordingly and will be made available through the package.
Suggestions and recommendations from the users will be highly
appreciated and welcomed. This package is also available through GitHub. Feel free
to contact the author at [email protected].
Christou, E. (2020) Central quantile subspace. Statistics and Computing, 30, 677–695
Christou, E. and Akritas, M. (2016) Single index quantile regression for heteroscedastic data. Journal of Multivariate Analysis, 150, 169-182
Christou, E. and Grabchak, M. (2019) Estimation of value-at-risk using single index quantile regression. Journal of Applied Statistics, 46(13), 2418–2433
Guerre, E., and Sabbah, C. (2012) Uniform bias study and Bahadur representation for local polynomial estimators of the conditional quantile function. Econometric Theory, 28, 87-129
Koenker, R., and Bassett, G. (1978) Regression quantiles. Econometrica, 46, 33-50
Kong, E., and Xia, Y. (2012) A single-index quantile regression model and its estimation. Econometric Theory, 28, 730-768
Kong, E., and Xia, Y. (2014) An adaptive composite quantile approach to dimension reduction. Annals of Statistics, 42, 1657-1688
Li, K.-C. (1991) Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316-327
Luo, W., Li, B., and Yin, X. (2014) On efficient dimension reduction with respect to a statistical functional of interest. Annals of Statistics, 42, 382-412
Wu, T. Z., Yu, K., and Yu, Y. (2010) Single index quantile regression. Journal of Multivariate Analysis, 101, 1607-1621
Yu, K., and Jones, M. C. (1998) Local linear quantile regression. Journal of the American Statistical Association, 93, 228-238
Zhu, L.-P., Zhu, L.-X., and Feng, Z.-H. (2010) Dimension reduction in regression through cumulative slicing estimation. Journal of the American Statistical Association, 105, 1455-1466