<p><em>∫ntegrabℓε ∂ifferentiαℓs: unorganised memo, notes, code, data, and writings of random topics</em></p>
<h1>Hurst parameter and fractional Brownian motion</h1>
<p><em>Adrian S. Tam, 2021-07-26</em></p>
<p>I was introduced to the concept of self-similarity and long-range dependency of
a time series from the seminal paper <a href="http://ccr.sigcomm.org/archive/1995/jan95/ccr-9501-leland.pdf">On the Self-Similar Nature of Ethernet
Traffic</a> by
Leland et al. (1995). The Hurst parameter, or Hurst exponent, is the key
concept behind all of these.</p>
<p>If we consider a Brownian motion, regardless of scale, we always have the
property that the standard deviation of the process is proportional to the
square root of time, namely, \(B_t - B_s \sim N(0, t-s)\) in distribution. The
Brownian motion is memoryless, hence no long-range dependency. When we
generalize the Brownian motion, we can consider a zero-mean process \(B_H(t)\)
with the property</p>
\[\langle\vert B_H(t+\tau) - B_H(t)\vert^2\rangle \sim \tau^{2H}\]
<p>namely, the mean of the square difference is proportional to the time window to
the power of \(2H\). The range of \(H\) is from 0 to 1 and Brownian motion has
\(H=0.5\). The parameter \(H\) is the Hurst exponent. Fractal dimension is
defined in terms of Hurst exponent as \(D=2-H\).</p>
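<p>As a quick numerical check of this scaling (a sketch using numpy; the names
here are made up for illustration), an ordinary Brownian motion should give a
fitted exponent close to \(H=0.5\):</p>

<pre><code class="language-python">import numpy as np

rng = np.random.default_rng(42)
B = np.cumsum(rng.standard_normal(100_000))  # ordinary Brownian motion

taus = np.array([10, 100, 1000])
msd = [np.mean((B[tau:] - B[:-tau])**2) for tau in taus]  # mean square difference
H = np.polyfit(np.log(taus), np.log(msd), 1)[0] / 2  # slope of the log-log fit is 2H
</code></pre>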
<p>In J. Feder’s book <em>Fractals</em> (1988), he recounts how Hurst calculated the
Hurst exponent for the water level in Lake Albert. Hurst denotes the influx
of year \(t\) as \(\xi(t)\) and the mean discharge over \(\tau\) years as \(\langle\xi\rangle_\tau\), where</p>
\[\langle\xi\rangle_\tau = \frac{1}{\tau}\sum_{t=1}^\tau \xi(t)\]
<p>The accumulation is therefore the running sum</p>
\[X(t)=\sum_{u=1}^t\left(\xi(u)-\langle\xi\rangle_\tau\right)\]
<p>The range is defined as</p>
\[R(\tau) = \max_{t: t=1,\cdots,\tau} X(t) - \min_{t: t=1,\cdots,\tau} X(t)\]
<p>and the standard deviation is defined as</p>
\[S=\sqrt{\frac{1}{\tau}\sum_{t=1}^\tau\left(\xi(t)-\langle\xi\rangle_\tau\right)^2}\]
<p>Hurst found empirically that \(R/S=(\tau/2)^H\). The LHS is called the <em>rescaled
range</em>, and it is proportional to \(\tau^H\). This can be understood intuitively
if we consider the range as roughly a measure of the standard deviation, whose
square is the variance, which is proportional to \(\tau^{2H}\).</p>
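<p>Following these definitions, the rescaled range of a single record can be
computed in a few lines (a sketch; <code>rescaled_range</code> is a name made up here):</p>

<pre><code class="language-python">import numpy as np

def rescaled_range(xi):
    """R/S of one record of yearly influxes xi(1), ..., xi(tau)."""
    mean = xi.mean()                       # the mean discharge over tau years
    X = np.cumsum(xi - mean)               # accumulated departure X(t)
    R = X.max() - X.min()                  # range R(tau)
    S = np.sqrt(np.mean((xi - mean)**2))   # standard deviation S
    return R / S
</code></pre>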
<h2 id="determining-hurst-exponent">Determining Hurst exponent</h2>
<p>If we are given a time series \(X(t)\), how could we find its Hurst exponent
(and hence tell if it is Brownian)?</p>
<p>The intuitive way is to use Hurst’s empirical method: with different time ranges
\(\tau\), find the rescaled range \(R/S\) and then fit for the parameter \(H\)
using \(R/S = C\tau^H\) for some constant \(C\). As the time range
\(\tau\) varies, we may be able to fit multiple windows into the input time
series. Hence multiple rescaled ranges can be computed, and we take their
average for each particular \(\tau\).</p>
<p>Here is the code:</p>
<pre><code class="language-python">import numpy as np

def hurst_rs(ts, min_win=5, max_win=None):
    """Find Hurst exponent using rescaled range method
    Args:
        ts: The time series, as 1D numpy array
        min_win: Minimum window to use
        max_win: Maximum window to use
    Return:
        Hurst exponent as a float
    """
    ts = np.array(ts)
    max_win = max_win or len(ts)
    win = np.unique(np.round(np.exp(np.linspace(np.log(min_win), np.log(max_win), 10))).astype(int))
    rs_w = []
    for tau in win:
        rs = []
        for start in np.arange(0, len(ts)+1, tau)[:-1]:
            pts = ts[start:start+tau]  # partial time series
            r = np.max(pts) - np.min(pts)  # range
            s = np.sqrt(np.mean(np.diff(pts)**2))  # RMS of increments as standard deviation
            rs.append(r/s)
        rs_w.append(np.mean(rs))
    p = np.polyfit(np.log(win), np.log(rs_w), deg=1)
    return p[0]
</code></pre>
<p>The function does not find the rescaled range for every time window \(\tau\)
because that would be too slow for practical use. Instead, it evenly takes 10 points
on the log scale from the minimum to the maximum. For each \(\tau\), <code>np.arange(0,
len(ts)+1, tau)</code> generates starting points separated by one full window, hence
we partition the time series into non-overlapping sequences of length <code>tau</code>,
except the last one, where the input time series may run out, and hence it is
discarded. Then for each partial time series, a range is found, and the
root-mean-squared increment is taken as the standard deviation (since we
assume the increments have zero mean). For each \(\tau\), the \(R/S\) is
taken as the mean of all rescaled ranges from the different partial time series. Then we consider</p>
\[\log(R/S) = k + H\log(\tau)\]
<p>and hence a linear regression (degree-1 polynomial) fitting \(\log(R/S)\)
against \(\log(\tau)\) will produce the Hurst exponent as the order-1
coefficient.</p>
<p>Another method is to use the scaling properties of a fBm:</p>
<pre><code class="language-python">def hurst_sp(ts, max_lag=50):
    """Returns the Hurst Exponent of the time series using scaling properties"""
    lags = range(2, max_lag)
    ts = np.array(ts)
    stdev = [np.std(ts[tau:]-ts[:-tau]) for tau in lags]
    p = np.polyfit(np.log(lags), np.log(stdev), 1)
    return p[0]
</code></pre>
<p>This is much shorter code, but it considers \(B_H(t+\tau)-B_H(t)\), whose
standard deviation is expected to be proportional to \(\tau^H\). The difference
is computed directly across the entire time series and then its standard
deviation is computed. Then, as before, we fit a linear equation between the
log of the time lag and the log of the standard deviation of the difference,
and the Hurst exponent is the order-1 coefficient.</p>
<p>It turns out, I found, that the rescaled range method often overestimates the
Hurst exponent and the scaling property method sometimes underestimates it, as
seen below:</p>
<pre><code class="language-python">N = 2500
sigma = 0.15
dt = 1/250.0
bm = np.cumsum(np.random.randn(N)) * sigma / (N*dt)
h1 = hurst_rs(bm)
h2 = hurst_sp(bm)
print(f"Hurst (RS): {h1:.4f}")
print(f"Hurst (scaling): {h2:.4f}")
print(f"Hurst (average): {(h1+h2)/2:.4f}")
</code></pre>
<p>This gives</p>
<pre><code class="language-text">Hurst (RS): 0.5927
Hurst (scaling): 0.4783
Hurst (average): 0.5355
</code></pre>
<h2 id="generating-fractional-brownian-motion">Generating fractional Brownian motion</h2>
<p>What if we are given \(H\) and want to generate a time series? This is more difficult
than it seems. The Hurst exponent ranges from 0 to 1, with Brownian motion
at \(H=0.5\). If \(H<0.5\), the time series is <em>mean-reverting</em>, and if
\(H>0.5\), the time series is trending or with long-range dependency (LRD). The
other way to understand this is that if \(H>0.5\), the increments are
positively correlated, while if \(H<0.5\) they are negatively correlated.</p>
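<p>The sign of the increment correlation can be read off the autocovariance of
fractional Gaussian noise, the increments of fBm (a sketch; <code>fgn_autocov</code>
is a name made up here):</p>

<pre><code class="language-python">def fgn_autocov(k, H):
    """Autocovariance of unit-variance fGn increments at integer lag k."""
    return 0.5 * (abs(k + 1)**(2*H) - 2 * abs(k)**(2*H) + abs(k - 1)**(2*H))
</code></pre>

<p>At lag 1 this is positive for \(H>0.5\), negative for \(H<0.5\), and exactly
zero for ordinary Brownian motion.</p>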
<p><a href="https://en.wikipedia.org/wiki/Fractional_Brownian_motion">Wikipedia</a> gives a
few properties of the fractional Brownian motion:</p>
<ul>
<li>self-similarity: \(B_H(at) \sim \vert a\vert^H B_H(t)\)</li>
<li>stationary increment: \(B_H(t)-B_H(s) \sim B_H(t-s)\) (equality in distribution)</li>
<li>long range dependency: if \(H>0.5\), we have \(\sum_{k=1}^\infty \mathbb{E}[B_H(1)(B_H(k+1)-B_H(k))] = \infty\)</li>
<li>regularity: for any \(\epsilon>0\), there exists constant \(c\) such that \(\vert B_H(t) - B_H(s)\vert \le c\vert t-s\vert^{H-\epsilon}\)</li>
<li>covariance: \(R(s,t) = \mathbb{E}[B_H(s)B_H(t)] = \frac12(s^{2H}+t^{2H}-\vert t-s\vert^{2H})\)</li>
</ul>
<p>Based on the covariance structure, we can create a huge covariance matrix for
the increments (each row and column corresponds to one time sample) and use the
Cholesky decomposition method to generate correlated Gaussian samples. The fBm
is then the running sum of these samples.</p>
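<p>A minimal sketch of this Cholesky approach, using the increment autocovariance
\(\gamma(k) = \frac12(\vert k+1\vert^{2H}-2\vert k\vert^{2H}+\vert k-1\vert^{2H})\)
(<code>fbm_cholesky</code> is a name made up here; the method is \(O(n^3)\), hence slow
for long series):</p>

<pre><code class="language-python">import numpy as np

def fbm_cholesky(H=0.7, n=500):
    """Generate an fBm sample path of length n by Cholesky decomposition
    of the covariance matrix of the increments (fractional Gaussian noise)."""
    k = np.arange(n)
    gamma = 0.5 * (np.abs(k+1)**(2*H) - 2*np.abs(k)**(2*H) + np.abs(k-1)**(2*H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    L = np.linalg.cholesky(cov)
    fgn = L @ np.random.randn(n)  # correlated Gaussian increments
    return np.cumsum(fgn)         # fBm is the running sum of the increments
</code></pre>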
<p>Another way to generate this is as follows, adapted from a MATLAB code:</p>
<pre><code class="language-python">def fbm1d(H=0.7, n=4096, T=10):
    """fast one dimensional fractional Brownian motion (FBM) generator
    output is 'W_t' with t in [0,T] using 'n' equally spaced grid points;
    code uses Fast Fourier Transform (FFT) for speed.
    Adapted from http://www.mathworks.com.au/matlabcentral/fileexchange/38935-fractional-brownian-motion-generator
    Args:
        H: Hurst parameter, in [0,1]
        n: number of grid points, will be adjusted to a power of 2 by n:=2**ceil(log2(n))
        T: final time
    Returns:
        W_t and t for the fBm and the time
    Example:
        W, t = fbm1d(H, n, T)
        plt.plot(t, W)
    Reference:
        Kroese, D. P., & Botev, Z. I. (2015). Spatial Process Simulation.
        In Stochastic Geometry, Spatial Statistics and Random Fields (pp. 369-404)
        Springer International Publishing, DOI: 10.1007/978-3-319-10064-7_12
    """
    # sanitation
    assert 0<H<1, "Hurst parameter must be between 0 and 1"
    n = int(np.exp2(np.ceil(np.log2(n))))
    r = np.zeros(n+1)
    r[0] = 1
    idx = np.arange(1,n+1)
    r[1:] = 0.5 * ((idx+1)**(2*H) - 2*idx**(2*H) + (idx-1)**(2*H))
    r = np.concatenate([r, r[-2:0:-1]])  # First row of circulant matrix
    lamb = np.fft.fft(r).real/(2*n)  # Eigenvalues
    z = np.random.randn(2*n) + np.random.randn(2*n)*1j
    W = np.fft.fft(np.sqrt(lamb) * z)
    W = n**(-H) * np.cumsum(W[:n].real)  # rescale
    W = T**H * W
    t = np.arange(n)/n * T  # Scale for final time T
    return W, t
</code></pre>
<p>The explanation of why this works is in the article referenced above. But we can see the plot as follows:</p>
<p><img src="/img/hurst.png" alt="fbm sample paths" /></p>
<p>from which we can see that the lower the Hurst exponent, the more the random
walk fluctuates, and the higher the Hurst exponent, the smoother it is.</p>
<h1>QQ-plot and PP-plot</h1>
<p><em>Adrian S. Tam, 2021-07-23</em></p>
<p>Both QQ-plot and PP-plot are called the probability plot, but they are
different. These plots are intended to compare two distributions, usually with at
least one of them empirical. They graphically tell how well the two
distributions fit each other.</p>
<p>Assume the two distributions have the cumulative distribution functions
\(F(x)\) and \(G(x)\); the PP-plot shows \(G(x)\) against \(F(x)\) for
varying \(x\). Hence the domain and range of the plot are always from 0 to 1, as
we are plotting only the range of the cumulative distribution functions.</p>
<p>The QQ-plot, however, plots the inverse cumulative distribution function
\(G^{-1}(q)\) against \(F^{-1}(q)\) for varying \(q\in[0,1]\). Therefore the
domain and range of the plot are the supports of the cumulative distribution
functions \(F(x)\) and \(G(x)\). If we consider the data to be empirical, we can
see this as a plot of the order statistics of the sample from \(G\) against
those of the sample from \(F\).</p>
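<p>For two theoretical distributions, the QQ-plot can be computed directly from
the inverse CDFs; here is a sketch with scipy, using the same uniform and normal
parameters as the example that follows:</p>

<pre><code class="language-python">import numpy as np
from scipy.stats import norm, uniform

q = np.linspace(0.01, 0.99, 99)       # probabilities, avoiding 0 and 1
x = uniform(loc=-4, scale=8).ppf(q)   # inverse CDF of uniform on [-4, 4]
y = norm(loc=1, scale=2).ppf(q)       # inverse CDF of normal, mean 1, s.d. 2
# plotting y against x gives the theoretical QQ-plot
</code></pre>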
<h2 id="tools">Tools</h2>
<p>In Python, it is a surprise that matplotlib supports neither the PP-plot nor
the QQ-plot out of the box. However, it should not be difficult to see that
we can make use of the order statistics to do a QQ-plot:</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt
N = 1000 # number of samples
rv_norm = np.random.randn(N) * 2 + 1 # normal with mean 1 s.d. 2
rv_uni = np.random.rand(N) * 8 - 4 # uniform in [-4,4]
plt.scatter(np.sort(rv_uni), np.sort(rv_norm), alpha=0.2, s=2)
</code></pre>
<p><img src="/img/qqplot-01.png" alt="QQplot" /></p>
<p>PP-plot is a bit more complicated. We need an interpolation function to
achieve that. The idea is that the CDF of an empirical distribution can be
constructed using <code>np.sort(rv_norm)</code> and <code>np.linspace(0,1,N)</code>. Then with the
other distribution, we can look up the value of the CDF by interpolation using
<code>np.interp(x0, x, y)</code>, which returns \(y_0 = f(x_0)\) from the provided
curve \(y=f(x)\):</p>
<pre><code class="language-python">plt.scatter(np.linspace(0,1,N), np.interp(np.sort(rv_uni), np.sort(rv_norm), np.linspace(0,1,N)), alpha=0.2, s=2)
</code></pre>
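<p>The same interpolation trick can be wrapped into a small helper (a sketch;
<code>pp_points</code> is a name made up here):</p>

<pre><code class="language-python">import numpy as np

def pp_points(a, b):
    """PP-plot coordinates: empirical CDF of sample b evaluated at the
    sorted values of sample a, paired with the empirical CDF of a."""
    q = np.linspace(0, 1, len(a))
    p = np.interp(np.sort(a), np.sort(b), np.linspace(0, 1, len(b)))
    return q, p
</code></pre>

<p>Two samples from the same distribution give points close to the diagonal.</p>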
<p><img src="/img/qqplot-02.png" alt="PPplot" /></p>
<p>However, there is a fancier tool in Python to do this. <code>statsmodels</code> has
functions <code>qqplot()</code> and <code>qqplot_2samples()</code> for doing a QQ-plot of one empirical
distribution against a theoretical normal distribution, and between two empirical
distributions, respectively. These are just wrappers for the more generic
<code>ProbPlot</code> object. For example, this is how we can do the same as above:</p>
<pre><code class="language-python">import statsmodels.api as sm
_ = sm.ProbPlot(rv_uni) \
.qqplot(other=sm.ProbPlot(rv_norm), line="r", alpha=0.2, ms=2, lw=1)
plt.show()
_ = sm.ProbPlot(rv_uni) \
.ppplot(other=sm.ProbPlot(rv_norm), line="r", alpha=0.2, ms=2, lw=1)
plt.show()
</code></pre>
<p>Its output comes with the regression line (<code>line="r"</code>):</p>
<p><img src="/img/qqplot-03.png" alt="QQplot from statsmodels" /></p>
<p><img src="/img/qqplot-04.png" alt="PPplot from statsmodels" /></p>
<h2 id="qq-plot-and-pp-plot-as-eda-tool">QQ-plot and PP-plot as EDA tool</h2>
<p>When we get a table of data for the first time, we would like to get some insight
from it before processing it further. This is what exploratory data
analysis is about. For a multidimensional data set, my favorite is to run a
correlogram to see what the data looks like, visually:</p>
<pre><code class="language-python">import seaborn as sns
sns.pairplot(df_data)
</code></pre>
<p><img src="/img/qqplot-05.png" alt="correlogram" /></p>
<p>This graph is generated using Seaborn, a wrapper for matplotlib. We can make
the graph prettier, for example, draw the regression line between each pair of
data and show the distribution estimated empirically by KDE instead of a histogram:</p>
<pre><code class="language-python">sns.pairplot(df_data, diag_kind="kde", kind="reg",
plot_kws={'line_kws':{'color':'red', 'alpha':0.2}, 'scatter_kws': {'alpha': 0.2, 's':4}},
)
plt.show()
</code></pre>
<p><img src="/img/qqplot-06.png" alt="KDE correlogram" /></p>
<p>We call the same Seaborn function <code>pairplot()</code> with <code>kind="reg"</code>
(regression) for off-diagonal charts and <code>diag_kind="kde"</code> for on-diagonal
charts. This tells you how correlated any two series are and the distribution
of each sample. In this graph, we do not see any particularly strong
correlation. So could the series be independent but similarly
distributed? This can be answered by a PP-plot of each pair. Unfortunately,
PP-plot and QQ-plot are not supported by Seaborn. Nevertheless, we can add them.
Here is the patch file; we need only to modify <code>axisgrid.py</code> and <code>regression.py</code>:</p>
<pre><code class="language-diff">diff --git a/seaborn/axisgrid.py b/seaborn/axisgrid.py
index ba70553..7a9d836 100644
--- a/seaborn/axisgrid.py
+++ b/seaborn/axisgrid.py
@@ -1959,6 +1959,7 @@ def pairplot(
"""
# Avoid circular import
from .distributions import histplot, kdeplot
+ from .regression import qqplot, ppplot # Avoid circular import
# Handle deprecations
if size is not None:
@@ -1992,7 +1993,7 @@ def pairplot(
# Add the markers here as PairGrid has figured out how many levels of the
# hue variable are needed and we don't want to duplicate that process
if markers is not None:
- if kind == "reg":
+ if kind in ["reg", "pp", "qq"]:
# Needed until regplot supports style
if grid.hue_names is None:
n_markers = 1
@@ -2020,6 +2021,10 @@ def pairplot(
diag_kws.setdefault("fill", True)
diag_kws.setdefault("warn_singular", False)
grid.map_diag(kdeplot, **diag_kws)
+ elif diag_kind == "pp":
+ grid.map_diag(ppplot, **diag_kws)
+ elif diag_kind == "qq":
+ grid.map_diag(qqplot, **diag_kws)
# Maybe plot on the off-diagonals
if diag_kind is not None:
@@ -2030,6 +2035,10 @@ def pairplot(
if kind == "scatter":
from .relational import scatterplot # Avoid circular import
plotter(scatterplot, **plot_kws)
+ elif kind == "qq":
+ plotter(qqplot, **plot_kws)
+ elif kind == "pp":
+ plotter(ppplot, **plot_kws)
elif kind == "reg":
from .regression import regplot # Avoid circular import
plotter(regplot, **plot_kws)
diff --git a/seaborn/regression.py b/seaborn/regression.py
index ce21927..cc366d1 100644
--- a/seaborn/regression.py
+++ b/seaborn/regression.py
@@ -20,7 +20,7 @@ from .axisgrid import FacetGrid, _facet_docs
from ._decorators import _deprecate_positional_args
-__all__ = ["lmplot", "regplot", "residplot"]
+__all__ = ["lmplot", "regplot", "residplot", "ppplot", "qqplot"]
class _LinearPlotter(object):
@@ -833,6 +833,91 @@ lmplot.__doc__ = dedent("""\
""").format(**_regression_docs)
+@_deprecate_positional_args
+def qqplot(
+ *,
+ x=None, y=None,
+ data=None,
+ x_estimator=None, x_bins=None, x_ci="ci",
+ scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None,
+ seed=None, order=1, logistic=False, lowess=False, robust=False,
+ logx=False, x_partial=None, y_partial=None,
+ truncate=True, dropna=True, x_jitter=None, y_jitter=None,
+ label=None, color=None, marker="o",
+ scatter_kws=None, line_kws=None, ax=None,
+ legend=None
+):
+
+ plotter = _RegressionPlotter(x, y, data, x_estimator, x_bins, x_ci,
+ scatter, fit_reg, ci, n_boot, units, seed,
+ order, logistic, lowess, robust, logx,
+ x_partial, y_partial, truncate, dropna,
+ x_jitter, y_jitter, color, label)
+
+ # Manipulate input data for plotting
+ if plotter.x is None:
+ err = "missing x or y in plot data"
+ raise ValueError(err)
+ if plotter.y is None:
+ # set it to normal distribution scaled according to x
+ from scipy.stats import norm
+ plotter.y = norm.ppf(np.linspace(0,1,len(plotter.x)+2)[1:-1])
+ plotter.y = plotter.y * plotter.x.std() + plotter.x.mean()
+ plotter.x = np.sort(plotter.x)
+ plotter.y = np.sort(plotter.y)
+
+ if ax is None:
+ ax = plt.gca()
+
+ scatter_kws = {} if scatter_kws is None else copy.copy(scatter_kws)
+ scatter_kws["marker"] = marker
+ line_kws = {} if line_kws is None else copy.copy(line_kws)
+ plotter.plot(ax, scatter_kws, line_kws)
+ return ax
+
+
+@_deprecate_positional_args
+def ppplot(
+ *,
+ x=None, y=None,
+ data=None,
+ x_estimator=None, x_bins=None, x_ci="ci",
+ scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None,
+ seed=None, order=1, logistic=False, lowess=False, robust=False,
+ logx=False, x_partial=None, y_partial=None,
+ truncate=True, dropna=True, x_jitter=None, y_jitter=None,
+ label=None, color=None, marker="o",
+ scatter_kws=None, line_kws=None, ax=None,
+ legend=None
+):
+
+ plotter = _RegressionPlotter(x, y, data, x_estimator, x_bins, x_ci,
+ scatter, fit_reg, ci, n_boot, units, seed,
+ order, logistic, lowess, robust, logx,
+ x_partial, y_partial, truncate, dropna,
+ x_jitter, y_jitter, color, label)
+
+ # Manipulate input data for plotting
+ if plotter.x is None:
+ err = "missing x in plot data"
+ raise ValueError(err)
+ if plotter.y is None:
+ # set it to normal distribution
+ from scipy.stats import norm
+ plotter.y = norm.ppf(np.linspace(0,1,len(plotter.x)+2)[1:-1])
+ linspace = np.linspace(0,1,len(plotter.x))
+ plotter.y = np.interp(np.sort(plotter.x), np.sort(plotter.y), linspace)
+ plotter.x = linspace
+ if plotter.fit_reg:
+ plotter.x_range = (0, 1)
+
+ if ax is None:
+ ax = plt.gca()
+
+ scatter_kws = {} if scatter_kws is None else copy.copy(scatter_kws)
+ scatter_kws["marker"] = marker
+ line_kws = {} if line_kws is None else copy.copy(line_kws)
+ plotter.plot(ax, scatter_kws, line_kws)
+ return ax
+
+
@_deprecate_positional_args
def regplot(
*,
</code></pre>
<p>The key changes are the functions <code>sns.ppplot()</code> and <code>sns.qqplot()</code> defined in
<code>regression.py</code>, which are modified from the function <code>regplot()</code>. The function
<code>regplot()</code> does a scatter plot, then draws a regression line on top of it.
As we saw, PP-plot and QQ-plot are just modified scatter plots. Therefore we
manipulate the data in the plotter using <code>np.sort()</code> and <code>np.interp()</code> before
invoking its <code>plot()</code> function. At this point, these two functions can
compare <em>two</em> empirical distributions. However, in <code>pairplot()</code>, the
diagonal charts are handled differently, be it a KDE plot or a histogram plot.
We can indeed make the PP-plot and QQ-plot a single-distribution plot by
plotting against a theoretical normal distribution. The way we do it is to
generate one if the second distribution (<code>y</code>) is not provided: using the inverse
normal CDF function <code>norm.ppf()</code> from scipy, we look up the evenly distributed
values from 0 to 1 (clipping the two ends, as we know they would be infinite). In
case of QQ-plot, the data should be scaled according to the input data to
match the mean and standard deviation. The plot will be as follows:</p>
<pre><code class="language-python"># sns2: the patched copy of seaborn, imported under a different name
sns2.pairplot(df_data, diag_kind="pp", kind="pp",
plot_kws={'line_kws':{'color':'red', 'alpha':0.2}, 'scatter_kws': {'alpha': 0.2, 's':4}},
diag_kws={'line_kws':{'color':'red', 'alpha':0.2}, 'scatter_kws': {'alpha': 0.2, 's':4}})
plt.show()
</code></pre>
<p><img src="/img/qqplot-07.png" alt="PPplot from seaborn" /></p>
<pre><code class="language-python">sns2.pairplot(df_data, diag_kind="qq", kind="qq",
plot_kws={'line_kws':{'color':'red', 'alpha':0.2}, 'scatter_kws': {'alpha': 0.2, 's':4}},
diag_kws={'line_kws':{'color':'red', 'alpha':0.2}, 'scatter_kws': {'alpha': 0.2, 's':4}})
plt.show()
</code></pre>
<p><img src="/img/qqplot-08.png" alt="QQplot from seaborn" /></p>
<p>Of course, if you just want one PP-plot, you can use <code>sns.ppplot()</code> directly.</p>
<h1>Interpreting linear regression summary from statsmodels</h1>
<p><em>Adrian S. Tam, 2021-07-16</em></p>
<p>The Python package statsmodels has OLS functions to fit a linear regression
problem. How well the linear regression fits, or whether the data fits a
linear model at all, is a common question. The way to tell is through some
statistics, several of which the OLS module produces by default in its summary.</p>
<p>This is an example of using statsmodels to fit a linear regression:</p>
<pre><code class="language-python">import statsmodels.api as sm
import numpy as np
import pandas as pd
X1 = np.random.rand(200)*3.1
X2 = np.random.rand(200)*4.1
X3 = np.random.rand(200)*5.9
X4 = np.random.rand(200)*2.6
X5 = np.random.rand(200)*5.3
Y0 = 0.58*X1 - 0.97*X2 + 0.93*X3 - 2.3
err = np.random.randn(200)
df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X4, X5=X5, Y=Y0+err))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>We print the summary using the <code>summary2()</code> function instead of <code>summary()</code>
because it looks more compact, but the result should be the same. This is what the output looks like:</p>
<pre><code class="language-text"> Results: Ordinary least squares
=================================================================
Model: OLS Adj. R-squared: 0.799
Dependent Variable: Y AIC: 572.1603
Date: 2021-07-16 11:49 BIC: 591.9502
No. Observations: 200 Log-Likelihood: -280.08
Df Model: 5 F-statistic: 159.0
Df Residuals: 194 Prob (F-statistic): 1.27e-66
R-squared: 0.804 Scale: 0.99341
-------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-------------------------------------------------------------------
const -2.2590 0.2889 -7.8187 0.0000 -2.8288 -1.6892
X1 0.6440 0.0848 7.5968 0.0000 0.4768 0.8112
X2 -0.9834 0.0595 -16.5186 0.0000 -1.1009 -0.8660
X3 0.8920 0.0445 20.0478 0.0000 0.8043 0.9798
X4 -0.0200 0.0921 -0.2167 0.8287 -0.2015 0.1616
X5 -0.0209 0.0465 -0.4486 0.6542 -0.1126 0.0709
-----------------------------------------------------------------
Omnibus: 0.319 Durbin-Watson: 1.825
Prob(Omnibus): 0.853 Jarque-Bera (JB): 0.471
Skew: 0.030 Prob(JB): 0.790
Kurtosis: 2.770 Condition No.: 22
=================================================================
</code></pre>
<p>Showing the names of the dependent and independent variables is supported if
the data are provided as a pandas dataframe. We can see that the summary screen
above has three sections, and the elements in each are explained as follows:</p>
<p>First section: The statistics of the overall linear model. In a linear
regression fitting \(y = \beta^T X + \epsilon\) using \(N\) data points with
\(p\) regressors and one regressand, and with \(\hat{y}_i\) the value predicted by the
model, we have the RSS (residual sum of squares) defined as
\(RSS=\sum_i (y_i-\hat{y}_i)^2\), the ESS (explained sum of squares) defined
as \(ESS = \sum_i (\hat{y}_i - \bar{y})^2\), and the total sum of squares
\(TSS=ESS+RSS=\sum_i(y_i-\bar{y})^2\). The items in the first section of the
summary are:</p>
<ul>
<li>No. Observations: The number of data points \(N\)</li>
<li>Df model: Number of parameters in the model \(p\)
<ul>
<li>statsmodels can take string-typed categorical variables in regression. In
that case, one-hot encoding would be used and the number of parameters will
be expanded by the number of categories in such variables</li>
</ul>
</li>
<li>Df residuals: Degree of freedom of the residuals, equal to \(N-p-1\)</li>
<li>R-squared: \(R^2 = 1-\dfrac{RSS}{TSS} = 1-\dfrac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\bar{y})^2}\) as the coefficient of determination</li>
<li>adjusted R-squared: \(\bar{R}^2 = 1-\dfrac{RSS/df_e}{TSS/df_t}=1-(1-R^2)\dfrac{N-1}{N-p-1}\) where
\(df_t=N-1\) is the degrees of freedom of the estimate of the population
variance of the dependent variable, and \(df_e = N-p-1\) is the degrees of
freedom of the estimate of the underlying population error variance</li>
<li>Log-Likelihood: \(\log L=\sum_{i=1}^N\log\mathcal{N}(e_i\mid 0,\sigma^2)\), where \(e_i\) is the residual. Assuming
the model is correct and the errors are Gaussian, this is the log of the probability that the set of data is produced by the model</li>
<li>AIC: Akaike Information Criterion, \(-2\log L + kp\) with \(k=2\). It depends on
the log-likelihood \(\log L\) and estimates the relative distance between the
unknown true likelihood and the fitted likelihood. The lower the AIC, the
closer to the truth</li>
<li>BIC: Bayesian Information Criterion, \(-2\log L + kp\) with \(k=\log(N)\). Based
on a Bayesian setup, it measures the posterior probability of a model being
true. The lower the BIC, the closer to the truth
<ul>
<li>BIC penalizes the model complexity more heavily (usually \(\log N>2\)) than
AIC, hence AIC may prefer a bigger model compared to BIC</li>
<li>AIC is better in situations where a false negative is more misleading than a
false positive; BIC is better in situations where a false positive is more
misleading than a false negative</li>
</ul>
</li>
<li>F-statistic and Prob (F-statistic): A test with the null hypothesis that all the
coefficients of the regressors are zero; a low p-value means the model is
significant</li>
<li>Scale: The scale factor of the covariance matrix, \(\dfrac{RSS}{N-p}\)</li>
</ul>
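<p>As a sanity check of these formulas, here is a sketch with synthetic data for
the sum-of-squares decomposition, plus the AIC and BIC recomputed from the
numbers printed in the summary above (log-likelihood -280.08, \(p=6\) parameters
including the constant, \(N=200\)):</p>

<pre><code class="language-python">import numpy as np

# TSS = ESS + RSS holds for OLS with an intercept
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.random(200), rng.random(200)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(200)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta
RSS = np.sum((y - yhat)**2)
ESS = np.sum((yhat - y.mean())**2)
TSS = np.sum((y - y.mean())**2)

# AIC/BIC from the log-likelihood of the first summary above
logL, p, N = -280.08, 6, 200
aic = -2*logL + 2*p           # reproduces AIC: 572.16
bic = -2*logL + np.log(N)*p   # reproduces BIC: 591.95
</code></pre>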
<p>The second section: Coefficients determined by the regression.</p>
<ul>
<li>Coef: Coefficient determined by OLS regression, it is solved analytically with \(\beta=(X^TX)^{-1}X^Ty\)</li>
<li>Std Err: Estimate of the standard deviation of the coefficient,
\(\hat\sigma^2_j = \hat\sigma^2[Q_{xx}^{-1}]_{jj}\) with \(Q_{xx}=X^TX\) and
\(\hat\sigma^2=\dfrac{\epsilon^T\epsilon}{N}\)</li>
<li>t: the t statistic, with the null hypothesis that this particular coefficient
is zero. It is used as a measurement of whether the coefficient is
significant. A coefficient is significant if its magnitude is large with a
small standard error</li>
<li>P>|t|: the p-value of the t test, i.e., the probability of seeing a coefficient
at least this extreme if the variable truly has no effect on the dependent variable (the null hypothesis)</li>
<li>0.025 and 0.975: The two boundaries of the 95% confidence interval of the
coefficient, approximately the coefficient value ±2 standard errors</li>
</ul>
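<p>These columns can be reproduced from one another. For example, taking the
<code>const</code> row of the first summary (coefficient -2.2590, standard error 0.2889,
194 residual degrees of freedom), a sketch using scipy:</p>

<pre><code class="language-python">import numpy as np
from scipy import stats

coef, stderr, df_resid = -2.2590, 0.2889, 194
t = coef / stderr                             # t statistic, about -7.82
p_value = 2 * stats.t.sf(abs(t), df_resid)    # two-sided p-value
half = stats.t.ppf(0.975, df_resid) * stderr  # half-width of the 95% CI
lo, hi = coef - half, coef + half             # the [0.025, 0.975] columns
</code></pre>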
<p>The third section: Normality of the residuals. Linear regression is built based
on the assumption that \(\epsilon\) is normally distributed with zero mean.</p>
<ul>
<li>Omnibus: D’Agostino’s \(K^2\) test, based on skew and kurtosis. Perfect normality will produce 0</li>
<li>Prob(Omnibus): Probability that the residuals are normally distributed according to the omnibus statistic</li>
<li>Skew: Skewness (symmetry) of the residual, 0 if perfect symmetry</li>
<li>Kurtosis: Peakiness of the residual (concentration around 0); higher kurtosis means heavier tails and more outliers. A normal distribution gives 3 here</li>
<li>Durbin-Watson: Test for autocorrelation in the residuals, related to homoscedasticity, i.e., whether the errors are independent of each other and even throughout the data
<ul>
<li>if the relative error is higher when the data points are higher, then the error is not even</li>
<li>the statistic ranges from 0 to 4; the ideal value is close to 2, which indicates no autocorrelation</li>
</ul>
</li>
<li>Jarque-Bera (JB) and Prob(JB): also a normality test using skewness and kurtosis, as an alternative way to omnibus statistic
<ul>
<li>we need JB and the omnibus statistic to mutually confirm each other</li>
</ul>
</li>
<li>Condition no.: Measurement of sensitivity of the model compared to the size of changes in the data
<ul>
<li>multicollinearity (i.e., two independent variables are linearly related) has high condition number</li>
</ul>
</li>
</ul>
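<p>Two of these statistics are easy to reproduce by hand. A sketch: the
Jarque-Bera statistic from the skew and kurtosis printed in the first summary
above, and the effect of an exactly collinear column on the condition number
(the data and names here are made up):</p>

<pre><code class="language-python">import numpy as np

def jarque_bera(skew, kurtosis, n):
    """JB statistic from sample skewness and (non-excess) kurtosis."""
    return n / 6 * (skew**2 + (kurtosis - 3)**2 / 4)

jb = jarque_bera(0.030, 2.770, 200)  # close to the 0.471 printed above

rng = np.random.default_rng(0)
x1, x2 = rng.random(100), rng.random(100)
cond_ok = np.linalg.cond(np.column_stack([x1, x2, rng.random(100)]))
cond_bad = np.linalg.cond(np.column_stack([x1, x2, x1 + 0.5*x2]))  # collinear
</code></pre>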
<p>Knowing what each of these elements measures, we can see how well the model
fits. Here we change the code to produce different summaries.</p>
<p>If we use fewer regressors in the input, we should see a lower AIC and BIC:</p>
<pre><code class="language-python">model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>The result is as follows; the AIC and BIC are lowered a bit due to the lower model
df (a simpler model), but the \(R^2\) has not changed:</p>
<pre><code class="language-text"> Results: Ordinary least squares
=================================================================
Model: OLS Adj. R-squared: 0.801
Dependent Variable: Y AIC: 568.4052
Date: 2021-07-16 11:51 BIC: 581.5985
No. Observations: 200 Log-Likelihood: -280.20
Df Model: 3 F-statistic: 267.3
Df Residuals: 196 Prob (F-statistic): 5.35e-69
R-squared: 0.804 Scale: 0.98447
-------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-------------------------------------------------------------------
const -2.3391 0.2294 -10.1962 0.0000 -2.7915 -1.8867
X1 0.6385 0.0836 7.6355 0.0000 0.4735 0.8034
X2 -0.9812 0.0591 -16.6130 0.0000 -1.0977 -0.8647
X3 0.8921 0.0443 20.1416 0.0000 0.8048 0.9795
-----------------------------------------------------------------
Omnibus: 0.378 Durbin-Watson: 1.826
Prob(Omnibus): 0.828 Jarque-Bera (JB): 0.526
Skew: 0.029 Prob(JB): 0.769
Kurtosis: 2.755 Condition No.: 14
=================================================================
</code></pre>
<p>Indeed, if we check the p-values of the t tests in the previous output, we can see
that they are high and the null hypothesis is not rejected for X4 and X5,
hinting that these two regressors should not be included in the model.</p>
<p>If we skew the error by taking its absolute value, the error distribution is no
longer normal:</p>
<pre><code class="language-python">df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X4, X5=X5, Y=Y0+np.abs(err)))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>The result is as follows. We see that the \(R^2\) is higher (because the range of the error is smaller now),
but the test of normality on the residuals has a low p-value in both the omnibus
test and the Jarque-Bera statistic. Hence we conclude that the residuals are not
normal. This is why the coefficients found deviate from the truth.</p>
<pre><code class="language-text"> Results: Ordinary least squares
==================================================================
Model: OLS Adj. R-squared: 0.922
Dependent Variable: Y AIC: 359.9204
Date: 2021-07-16 11:52 BIC: 379.7103
No. Observations: 200 Log-Likelihood: -173.96
Df Model: 5 F-statistic: 474.7
Df Residuals: 194 Prob (F-statistic): 1.02e-106
R-squared: 0.924 Scale: 0.34376
--------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
--------------------------------------------------------------------
const -1.2735 0.1700 -7.4931 0.0000 -1.6087 -0.9383
X1 0.4774 0.0499 9.5733 0.0000 0.3790 0.5757
X2 -1.0152 0.0350 -28.9883 0.0000 -1.0843 -0.9461
X3 0.9284 0.0262 35.4709 0.0000 0.8768 0.9801
X4 -0.0195 0.0542 -0.3606 0.7188 -0.1264 0.0873
X5 0.0183 0.0274 0.6691 0.5042 -0.0357 0.0723
------------------------------------------------------------------
Omnibus: 21.305 Durbin-Watson: 2.091
Prob(Omnibus): 0.000 Jarque-Bera (JB): 24.991
Skew: 0.854 Prob(JB): 0.000
Kurtosis: 3.291 Condition No.: 22
==================================================================
</code></pre>
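<p>The Jarque-Bera statistic reported above can be computed by hand from the sample skewness and kurtosis. Below is a small numpy sketch with made-up data, showing why taking the absolute value of a normal error makes the statistic blow up:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
err = rng.normal(size=2000)

def jarque_bera(x):
    """JB = n/6 * (S^2 + (K-3)^2/4); large when skewness or kurtosis deviate from normal"""
    n = len(x)
    z = (x - x.mean()) / x.std()
    S = np.mean(z**3)          # sample skewness
    K = np.mean(z**4)          # sample kurtosis
    return n / 6 * (S**2 + (K - 3)**2 / 4)

print(jarque_bera(err))          # small: consistent with normality
print(jarque_bera(np.abs(err)))  # large: |err| is skewed, hence non-normal
```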
<p>If we introduce multicollinearity, statsmodels will produce a very large
condition number and warn us about the result:</p>
<pre><code class="language-python">df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X2-2*X3, X5=X1+0.5*X2, Y=Y0+(Y0**2)*err))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>The result is as follows. All coefficients are significant according to the
p-values of the t-tests, but in fact only the first three regressors are independent.
The huge condition number suggests that this set of coefficients is not stable.</p>
<pre><code class="language-text"> Results: Ordinary least squares
=================================================================
Model: OLS Adj. R-squared: 0.801
Dependent Variable: Y AIC: 568.4052
Date: 2021-07-16 13:07 BIC: 581.5985
No. Observations: 200 Log-Likelihood: -280.20
Df Model: 3 F-statistic: 267.3
Df Residuals: 196 Prob (F-statistic): 5.35e-69
R-squared: 0.804 Scale: 0.98447
-------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-------------------------------------------------------------------
const -2.3391 0.2294 -10.1962 0.0000 -2.7915 -1.8867
X1 0.4671 0.0473 9.8842 0.0000 0.3739 0.5603
X2 -0.5917 0.0498 -11.8909 0.0000 -0.6898 -0.4935
X3 -0.0582 0.0243 -2.3936 0.0176 -0.1062 -0.0103
X4 -0.4752 0.0172 -27.6363 0.0000 -0.5091 -0.4413
X5 0.1713 0.0396 4.3213 0.0000 0.0931 0.2495
-----------------------------------------------------------------
Omnibus: 0.378 Durbin-Watson: 1.826
Prob(Omnibus): 0.828 Jarque-Bera (JB): 0.526
Skew: 0.029 Prob(JB): 0.769
Kurtosis: 2.755 Condition No.: 24475138936904036
=================================================================
* The condition number is large (2e+16). This might indicate
strong multicollinearity or other numerical problems.
</code></pre>
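<p>The condition number can also be checked directly on the design matrix with <code>np.linalg.cond</code>. A small sketch with made-up data (note that statsmodels may scale the regressors before reporting its condition number, so the magnitudes will not match exactly):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
x4 = x2 - 2 * x3                      # exact linear combination of x2 and x3

X_indep = np.column_stack([np.ones(n), x1, x2, x3])
X_collin = np.column_stack([np.ones(n), x1, x2, x3, x4])

print(np.linalg.cond(X_indep))    # moderate: columns are independent
print(np.linalg.cond(X_collin))   # astronomically large: rank-deficient design
```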
<p>We can also create heteroscedasticity by making the residual larger when the regressand is small:</p>
<pre><code class="language-python">df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X4, X5=X5, Y=Y0+err/Y0))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>The result is as follows. The \(R^2\) collapses, and the residuals are clearly
not normally distributed, as both the omnibus test and the huge Jarque-Bera
statistic indicate:</p>
<pre><code class="language-text"> Results: Ordinary least squares
==================================================================
Model: OLS Adj. R-squared: 0.074
Dependent Variable: Y AIC: 1330.7666
Date: 2021-07-16 13:16 BIC: 1350.5565
No. Observations: 200 Log-Likelihood: -659.38
Df Model: 5 F-statistic: 4.177
Df Residuals: 194 Prob (F-statistic): 0.00126
R-squared: 0.097 Scale: 44.098
--------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
--------------------------------------------------------------------
const -1.5268 1.9250 -0.7932 0.4287 -5.3235 2.2698
X1 1.2981 0.5648 2.2983 0.0226 0.1841 2.4120
X2 -1.0072 0.3967 -2.5393 0.0119 -1.7896 -0.2249
X3 0.7941 0.2965 2.6786 0.0080 0.2094 1.3788
X4 -0.3668 0.6134 -0.5979 0.5506 -1.5766 0.8431
X5 -0.2874 0.3100 -0.9271 0.3550 -0.8987 0.3240
------------------------------------------------------------------
Omnibus: 147.586 Durbin-Watson: 2.232
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9060.224
Skew: 2.033 Prob(JB): 0.000
Kurtosis: 35.721 Condition No.: 22
==================================================================
</code></pre>
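<p>A crude way to detect heteroscedasticity (a simplified version of formal tests such as Breusch-Pagan) is to sort the residuals by the fitted values and compare the spread in the two halves. A sketch with made-up data where the noise grows with the regressor:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 5, size=n)
y = 2 * x + rng.normal(size=n) * x       # noise scale grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
order = np.argsort(X @ beta)             # sort residuals by fitted value
lo, hi = resid[order[:n//2]], resid[order[n//2:]]
print(lo.std(), hi.std())                # hi should be noticeably larger
```

Under homoscedasticity the two standard deviations would be about equal; here the upper half is roughly twice as spread out.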
<p>We can also do a nonlinear model:</p>
<pre><code class="language-python">Y0 = 0.58*X1 - 0.97*X2 + 0.93*X3**2 - 2.3
df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X4, X5=X5, Y=Y0+err))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>Here we take the square of X3 in the true model, and the result is as follows.
Because of the nonlinearity, the residuals are no longer normally distributed.
The \(R^2\) here is larger than before; hence we should be cautious not to
select a model merely on the coefficient of determination.</p>
<pre><code class="language-text"> Results: Ordinary least squares
==================================================================
Model: OLS Adj. R-squared: 0.930
Dependent Variable: Y AIC: 926.7164
Date: 2021-07-16 13:31 BIC: 946.5063
No. Observations: 200 Log-Likelihood: -457.36
Df Model: 5 F-statistic: 532.4
Df Residuals: 194 Prob (F-statistic): 3.37e-111
R-squared: 0.932 Scale: 5.8484
--------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
--------------------------------------------------------------------
const -7.9247 0.7010 -11.3043 0.0000 -9.3074 -6.5421
X1 0.5560 0.2057 2.7031 0.0075 0.1503 0.9616
X2 -1.0398 0.1445 -7.1978 0.0000 -1.3247 -0.7549
X3 5.4317 0.1080 50.3107 0.0000 5.2187 5.6446
X4 0.2395 0.2234 1.0720 0.2850 -0.2011 0.6801
X5 -0.0700 0.1129 -0.6198 0.5361 -0.2926 0.1527
------------------------------------------------------------------
Omnibus: 12.714 Durbin-Watson: 1.895
Prob(Omnibus): 0.002 Jarque-Bera (JB): 13.907
Skew: 0.631 Prob(JB): 0.001
Kurtosis: 2.727 Condition No.: 22
==================================================================
</code></pre>
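<p>If we know (or suspect) the nonlinearity, the remedy is simply to include the transformed regressor. A numpy sketch with made-up data in the same spirit as the model above, showing that OLS recovers the coefficients once <code>x3**2</code> is in the design matrix:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
err = rng.normal(size=n)
y = 0.58 * x1 - 0.97 * x2 + 0.93 * x3**2 - 2.3 + err

# include the squared term as a regressor; the model is linear in parameters
X = np.column_stack([np.ones(n), x1, x2, x3**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # close to [-2.3, 0.58, -0.97, 0.93]
```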
<p>Finally, we can use the pure error as the regressand and see that the F-statistic
becomes low (and its p-value high):</p>
<pre><code class="language-python">df = pd.DataFrame(dict(X1=X1, X2=X2, X3=X3, X4=X4, X5=X5, Y=err))
model = sm.OLS(df["Y"], sm.add_constant(df[["X1","X2","X3","X4","X5"]]), missing="drop").fit()
print(model.summary2())
</code></pre>
<p>result:</p>
<pre><code class="language-text"> Results: Ordinary least squares
=================================================================
Model: OLS Adj. R-squared: -0.018
Dependent Variable: Y AIC: 572.1603
Date: 2021-07-16 13:36 BIC: 591.9502
No. Observations: 200 Log-Likelihood: -280.08
Df Model: 5 F-statistic: 0.2807
Df Residuals: 194 Prob (F-statistic): 0.923
R-squared: 0.007 Scale: 0.99341
-------------------------------------------------------------------
Coef. Std.Err. t P>|t| [0.025 0.975]
-------------------------------------------------------------------
const 0.0410 0.2889 0.1419 0.8873 -0.5288 0.6108
X1 0.0640 0.0848 0.7547 0.4513 -0.1032 0.2312
X2 -0.0134 0.0595 -0.2257 0.8217 -0.1309 0.1040
X3 -0.0380 0.0445 -0.8531 0.3947 -0.1257 0.0498
X4 -0.0200 0.0921 -0.2167 0.8287 -0.2015 0.1616
X5 -0.0209 0.0465 -0.4486 0.6542 -0.1126 0.0709
-----------------------------------------------------------------
Omnibus: 0.319 Durbin-Watson: 1.825
Prob(Omnibus): 0.853 Jarque-Bera (JB): 0.471
Skew: 0.030 Prob(JB): 0.790
Kurtosis: 2.770 Condition No.: 22
=================================================================
</code></pre>Adrian S. Tamrighthandabacus@users.github.comThe python package statsmodels has OLS functions to fit a linear regression problem. How well the linear regression is fitted, or whether the data fits a linear model, is often a question to be asked. The way to tell is to use some statistics, which by default the OLS module produces a few in its summary.Bokeh, interactive widgets, and jupyterlab2021-07-13T21:37:24-04:002021-07-13T21:37:24-04:00https://www.adrian.idv.hk/jupyter<p>Jupyter notebooks and visualization are natural marriage. It is more fun if we
can tweak this or that a bit by turning a knob or selecting something from a
drop-down. This is where the so-called <em>interactive widgets</em> come into play. There
are a lot of examples on how to set up a widget and control a matplotlib
chart interactively. Doing so in JupyterLab, however, is not so straightforward.</p>
<h2 id="matplotlib-and-the-widgets">matplotlib and the widgets</h2>
<p>Jupyter notebook widgets are just some control elements for user interaction.
They receive user input and trigger events, which can then invoke some
function. To use widgets to control matplotlib graphics, we have to understand
what the matplotlib backends are.</p>
<pre><code class="language-text">%matplotlib --list
</code></pre>
<p>This, if run in jupyter, will list out all backends. In my case,</p>
<pre><code class="language-text">Available matplotlib backends: ['tk', 'gtk', 'gtk3', 'wx', 'qt4', 'qt5', 'qt',
'osx', 'nbagg', 'notebook', 'agg', 'svg', 'pdf', 'ps', 'inline', 'ipympl',
'widget']
</code></pre>
<p>Amongst them, the <code>inline</code> backend is the dumbest: it just renders the plot
and makes it read-only. Therefore, no update is allowed on the chart, but you
can always clear it and redraw. The <code>notebook</code> backend makes the matplotlib
output aware of the Jupyter environment so the charts can be updated. The
<code>widget</code> and <code>ipympl</code> backends are similar to <code>notebook</code>, but fancier:
they turn the matplotlib output into a widget that you can pan or zoom. Using
matplotlib with a different backend requires the interactive widgets to be
configured differently.</p>
<p>The widgets on jupyter is from the module
<a href="https://ipywidgets.readthedocs.io/en/latest/">ipywidgets</a>. The simplest
example (without graphics!) is as follows:</p>
<pre><code class="language-python">from ipywidgets import interact
def f(x):
    return x**2
interact(f, x=10.0);
</code></pre>
<p>Running this in a jupyter notebook will give you a slider and a row of text (for printing the output of the function):</p>
<p><img src="/img/jupyter-01.png" alt="" /></p>
<p>An equivalent way to do the above is the following snippet, which uses <code>interact()</code> as a decorator:</p>
<pre><code class="language-python">from ipywidgets import interact, widgets
@interact(x=widgets.FloatSlider(min=-10, max=30, step=0.1, value=10))
def f(x):
    return x**2
</code></pre>
<p>The <code>interact()</code> decorator accepts keyword arguments that match the
function's arguments. You may create the widget explicitly and assign it to the
keyword argument, or, as a shorthand, simply provide a value and let
<code>interact()</code> infer the widget. If the argument is:</p>
<ul>
<li>a boolean (<code>True</code> or <code>False</code>), a checkbox widget is provided (<code>widgets.Checkbox</code>)</li>
<li>an integer or a float: a slider widget is provided (<code>widgets.IntSlider</code> or <code>widgets.FloatSlider</code>)</li>
<li>a string: a textbox widget is provided (<code>widgets.Text</code>)</li>
<li>a list of strings: a dropdown widget is provided (<code>widgets.Dropdown</code>)</li>
</ul>
<p>A full list of available widgets and their configuration can be found in the
<a href="https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20List.html">ipywidgets documentation</a>.</p>
<p>The way to connect the ipywidgets to matplotlib is as follows.</p>
<p>Let us try to plot a sine curve with different angular frequency, phase, and
amplitude. If we use the <code>inline</code> backend, this is the code:</p>
<pre><code class="language-python">import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact
%matplotlib inline
x = np.linspace(0, 5*np.pi, 500)
@interact(w=(0, 10, .1), amp=(-4, 4, .1), phi=(0, 2*np.pi, 0.1))
def plot(w=1.0, amp=1, phi=0):
    y = amp*np.sin(w*x-phi)
    plt.plot(x,y)
    plt.ylim([-4,4])
    plt.show()
</code></pre>
<p><img src="/img/jupyter-02.png" alt="" /></p>
<p>The function <code>plot()</code> uses keyword arguments that match those of the <code>interact()</code>
decorator. The tuple notation is just another way to specify a slider (in
terms of min, max, and step). The function simply replots the figure
every time using the values provided by the sliders. This works in the <code>inline</code>
backend because all pictures are static.</p>
<p>If we use the <code>widget</code> backend or <code>ipympl</code> backend instead, we do this:</p>
<pre><code class="language-python">%matplotlib widget
fig, ax = plt.subplots(figsize=(6, 4))
ax.set_ylim([-4, 4])
ax.grid(True)
# fix x values
x = np.linspace(0, 5*np.pi, 500)
ax.scatter(x[::20], np.cos(x)[::20], color='r', alpha=0.5)
@interact(w=(0, 10, .1), amp=(-4, 4, .1), phi=(0, 2*np.pi, 0.1))
def update(w=1.0, amp=1, phi=0):
    """Remove old lines from plot and plot new one"""
    for l in list(ax.lines):  # iterate over a copy while removing
        l.remove()
    ax.plot(x, amp*np.sin(w*x-phi), color='C0')
</code></pre>
<p><img src="/img/jupyter-03.png" alt="" /></p>
<p>This differs from the previous example in that we do not call
<code>plt.show()</code> but simply remove the plotted line and draw a new one. Note that
the scatter plot is not removed, because it is not part of <code>ax.lines</code>. Also,
we do not need to redraw the other elements of the plot. The figure above also
shows the icons on the left, which are part of the <code>widget</code>
or <code>ipympl</code> backend and allow us to pan and zoom.</p>
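<p>The claim that removing lines leaves the scatter intact can be verified outside a notebook. A small sketch using the non-interactive <code>Agg</code> backend (so it runs headless); the key point is that scatter artists live in <code>ax.collections</code>, not <code>ax.lines</code>:</p>

```python
import matplotlib
matplotlib.use("Agg")          # headless backend for this illustration
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5 * np.pi, 500)
fig, ax = plt.subplots()
ax.scatter(x[::20], np.cos(x)[::20])   # a PathCollection in ax.collections
line, = ax.plot(x, np.sin(x))          # a Line2D in ax.lines

for l in list(ax.lines):               # iterate over a copy while removing
    l.remove()

print(len(ax.lines), len(ax.collections))   # 0 lines left, 1 collection kept
```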
<p>There is yet another way to do the same, in which the line is not even removed but
simply updated:</p>
<pre><code class="language-python">%matplotlib notebook
fig, ax = plt.subplots(figsize=(6, 4))
ax.set_ylim([-4, 4])
ax.grid(True)
# fix x values, and create line plot object
x = np.linspace(0, 5*np.pi, 500)
line, = ax.plot(x,np.sin(x))
@interact(w=(0, 10, .1), amp=(-4, 4, .1), phi=(0, 2*np.pi, 0.1))
def plot(w=1.0, amp=1, phi=0):
    line.set_ydata(amp*np.sin(w*x-phi))
    #fig.canvas.draw()
</code></pre>
<p><img src="/img/jupyter-04.png" alt="" /></p>
<p>This code works only in Jupyter notebooks but not in JupyterLab, since the
<code>notebook</code> backend is used. What it does is create a line object from
<code>ax.plot()</code>; when the widgets are updated, the data of the line is
updated using <code>line.set_ydata()</code>. You may see examples elsewhere
that invoke <code>fig.canvas.draw()</code> after the <code>set_ydata()</code> call so the
changes are applied, but I found that to be unnecessary.</p>
<p>If we use seaborn, the code is mostly the same since it is just a wrapper around
matplotlib. The exception is the line object in the <code>notebook</code> backend example
above, as <code>seaborn.lineplot()</code> returns the axis, not the line object.</p>
<h2 id="bokeh-and-ipywidget">Bokeh and ipywidget</h2>
<p>Here is a similar example in Bokeh:</p>
<pre><code class="language-python">from bokeh.io import output_notebook, push_notebook
output_notebook()
from bokeh.layouts import column, row
from bokeh.models import Slider, Span, Range1d
from bokeh.plotting import figure, show
from bokeh.palettes import cividis
from ipywidgets import interact, interactive, widgets
plot = figure(plot_width=800, plot_height=400)
x = np.linspace(0, 5*np.pi, 500)
color = cividis(5)
sine = plot.line(x, np.sin(x), line_width=1, alpha=0.8, line_color=color[0], legend_label="sin")
cosine = plot.line(x, np.cos(x), line_width=1, alpha=0.8, line_color=color[3], legend_label="cos")
vline = Span(location=0, dimension="height", line_color=color[2], line_width=3, line_alpha=0.5)
hline = Span(location=0, dimension="width", line_color=color[2], line_width=3, line_alpha=0.5)
plot.add_layout(vline)
plot.add_layout(hline)
plot.title.text = "Sine and cosine"
plot.legend.click_policy = "hide"
plot.legend.location = "top_left"
plot.xaxis.axis_label = "x"
plot.yaxis.axis_label = "y"
plot.y_range = Range1d(-4, 4)
handle = show(plot, notebook_handle=True)
# Slider: Using ipython widgets slider instead of Bokeh slider
@interact(w=widgets.FloatSlider(min=-10, max=10, value=1),
          amp=widgets.FloatSlider(min=-5, max=5, value=1),
          phi=widgets.FloatSlider(min=-4, max=4, value=0))
def update(w=1.0, amp=1, phi=0):
    sine.data_source.data["y"] = amp*np.sin(w*x-phi)
    cosine.data_source.data["y"] = amp*np.cos(w*x-phi)
    vline.location = phi
    hline.location = amp*np.sin(-phi)
    push_notebook(handle=handle)
</code></pre>
<p><img src="/img/jupyter-05.png" alt="" /></p>
<p>The logic is similar to the <code>notebook</code> backend case for matplotlib, but this
works for both Jupyter notebooks and JupyterLab. Bokeh allows changing the
data of the data source, but the x and y dimensions must stay consistent. If we
change the curve entirely, we can either use <code>data_source.data.update(x=x, y=y)</code>
to do the update in one shot, or reassign the data with <code>data_source.data =
newdata</code>. What is necessary to use Bokeh interactively is:</p>
<ul>
<li>after we set up the figure, we show it with <code>show(plot, notebook_handle=True)</code> and remember the handle</li>
<li>in the update function, after we update the data, we need to invoke
<code>push_notebook(handle=handle)</code> to refresh the figure as pointed by the handle</li>
</ul>
<p>The handle is not necessarily for one figure. Other widgets or multiple figures
can be shown using the same notebook handle. The <code>push_notebook()</code> call makes
the handle refresh itself once some underlying data is known to have changed.</p>
<p>Bokeh indeed comes with its own slider widget, but it will not work in the
notebook because it is purely JavaScript. Unless we can do the interactive
update in JavaScript (e.g., all data are loaded, and the updated values can be
computed in JavaScript), it will not get the job done. The other use of
Bokeh widgets is with a Bokeh server, in which the widget gets the
data updated via a web request. If we use the Bokeh slider anyway, we will get
an error message:</p>
<pre><code class="language-python">def update(w=1.0, amp=1, phi=0):
    sine.data_source.data["y"] = amp*np.sin(w*x-phi)
    cosine.data_source.data["y"] = amp*np.cos(w*x-phi)
    vline.location = phi
    hline.location = amp*np.sin(-phi)
    push_notebook(handle=handle)
# Bokeh sliders
slider_w = Slider(start=-10, end=10, value=1, step=0.1, title="frequency")
slider_amp = Slider(start=-5, end=5, value=1, step=0.1, title="amplitude")
slider_phi = Slider(start=-4, end=4, value=0, step=0.1, title="phase")
def slider_change(attr, old, new):
    update(slider_w.value, slider_amp.value, slider_phi.value)
slider_w.on_change('value', slider_change)
slider_amp.on_change('value', slider_change)
slider_phi.on_change('value', slider_change)
handle = show(column(plot, slider_w, slider_amp, slider_phi), notebook_handle=True)
</code></pre>
<p>This will be rendered in the notebook, but the widgets will be inoperative:</p>
<pre><code class="language-text">WARNING:bokeh.embed.util:
You are generating standalone HTML/JS output, but trying to use real Python
callbacks (i.e. with on_change or on_event). This combination cannot work.
Only JavaScript callbacks may be used with standalone output. For more
information on JavaScript callbacks with Bokeh, see:
https://docs.bokeh.org/en/latest/docs/user_guide/interaction/callbacks.html
Alternatively, to use real Python callbacks, a Bokeh server application may
be used. For more information on building and running Bokeh applications, see:
https://docs.bokeh.org/en/latest/docs/user_guide/server.html
</code></pre>
<h2 id="jupyterlab">Jupyterlab</h2>
<p>Because of its different design, the Jupyter notebook is much easier to set up
for interactive widgets. Your installation should include <code>ipywidgets</code> and
<code>widgetsnbextension</code> (the latter should be installed automatically by the
former). To get ipywidgets working in JupyterLab, after these Python
modules are installed, you still need to install Node.js (<code>brew install
nodejs</code>) and then run the following command:</p>
<pre><code>jupyter labextension install @jupyter-widgets/jupyterlab-manager
</code></pre>
<p>After this, a restart of jupyterlab will make it work.</p>Adrian S. Tamrighthandabacus@users.github.comJupyter notebooks and visualization are natural marriage. It is more fun if we can skew this or that a bit by turning a knob or selecting something from a drop down. This is where so called interactive widgets come to play. There are a lot of examples on how to set up a widget and control the matplotlib chart interactively. Doing so in jupyterlab, however, is not so straightforward.Lagrangians and Portfolio Optimization2021-06-22T12:04:14-04:002021-06-22T12:04:14-04:00https://www.adrian.idv.hk/kkt<p>A portfolio optimization problem in Markowitz style looks like the following</p>
\[\begin{aligned}
\min && f(w) &= \frac12 w^T\Sigma w\\
\textrm{subject to} && w^Tr &= R \\
&& w^T e &= 1 \\
&& w & \succeq b_L \\
&& w & \preceq b_U
\end{aligned}\]
<p>which the last two are to bound the weight of each asset in the portfolio. This
is a nicely formulated optimization problem and one way to analytically solve
it is to use Lagrange multipliers.</p>
<h2 id="shadow-price">Shadow price</h2>
<p>Assume we do not have the last two inequality constraints, the Lagrangian for
the above problem would be</p>
\[L(w,\lambda) = \frac12w^T\Sigma w - \lambda_1(w^Tr-R) - \lambda_2(w^Te-1)\]
<p>The Lagrangian has the property that, for the optimal solution \(w^*\) to the
original problem, \(L(w^*,\lambda) = f(w^*)\); namely, the Lagrangian
attains the same value as the objective function. This is trivial, as we know
that the optimizer must satisfy the equality constraints and hence the two
extra terms in the Lagrangian always reduce to zero.</p>
<p>Mathematically, we could also form the Lagrangian with the two Lagrange multiplier
terms added to the objective function instead of subtracted. In doing so, we
would flip the signs of \(\lambda_1\) and \(\lambda_2\) above. But if we
consider that</p>
\[\frac{\partial L(w^*,\lambda)}{\partial R} = \lambda_1\]
<p>we see that the subtraction bears a physical meaning: \(\lambda_1\)
indicates how much the objective function changes if we marginally
increase the boundary value \(R\) of the constraint. In this particular
equality constraint, we are pushing the expected portfolio return \(R\) to a
higher level, and \(\lambda_1\) is the amount by which the variance increases. Hence the
Lagrange multiplier \(\lambda_1\) is called the <em>shadow price</em> for the return
\(R\).</p>
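<p>The shadow-price interpretation can be checked numerically. Below is a sketch with a made-up two-asset covariance matrix and return vector: we solve the equality-constrained problem in matrix form for targets \(R\) and \(R+h\), and compare the finite-difference change in variance against \(\lambda_1\):</p>

```python
import numpy as np

Sigma = np.array([[0.04, 0.01], [0.01, 0.09]])   # made-up covariance
r = np.array([0.05, 0.10])                       # made-up asset returns
e = np.ones(2)

def solve(R):
    """Solve [Sigma r e; r' 0 0; e' 0 0][w; -l1; -l2] = [0; R; 1]"""
    A = np.block([[Sigma, r[:, None], e[:, None]],
                  [r[None, :], np.zeros((1, 2))],
                  [e[None, :], np.zeros((1, 2))]])
    x = np.linalg.solve(A, np.array([0.0, 0.0, R, 1.0]))
    w, lam1 = x[:2], -x[2]                       # x[2] holds -lambda_1
    return 0.5 * w @ Sigma @ w, lam1

h = 1e-6
f0, lam1 = solve(0.08)
f1, _ = solve(0.08 + h)
print((f1 - f0) / h, lam1)    # the two numbers should nearly coincide
```

The derivative of the minimized variance with respect to \(R\) matches \(\lambda_1\), and it is positive here since the target return is above the minimum-variance return.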
<h2 id="inequality-constraints-and-activeness">Inequality constraints and activeness</h2>
<p>A similar Lagrangian can be created when there are inequality constraints, but
their Lagrange multipliers are no longer arbitrary:</p>
\[L(w, \lambda, \theta, \phi)
=\frac12 w^T\Sigma w
-\lambda_1(w^Tr-R) - \lambda_2(w^Te-1)
-\theta^T(w-b_L) + \phi^T(w-b_U)\]
<p>The way to think of what sign a Lagrange multiplier should carry is to consider
the dual. As we are doing a minimization here, the dual is a maximization
problem, namely,</p>
\[g(\lambda,\theta,\phi) = \inf_w L(w,\lambda,\theta,\phi)\]
<p>and according to the max-min inequality we have the weak-duality property</p>
\[\sup_{\lambda,\theta,\phi}\inf_w L(w,\lambda,\theta,\phi)
\le
\inf_w \sup_{\lambda,\theta,\phi}L(w,\lambda,\theta,\phi)\]
<p>and the equality holds if we have strong duality. The RHS is the solution to
the optimization problem and the LHS is the dual problem. Therefore the dual
can be no greater than the optimal solution of the original problem:</p>
\[g(\lambda,\theta,\phi) \le \inf_w\sup_{\lambda,\theta,\phi} L(w,\lambda,\theta,\phi)\]
<p>If we require the Lagrange multipliers associated with the inequality constraints,
\(\theta\) and \(\phi\), to be non-negative (there is no restriction for equality
constraints), we must augment the objective function into
\(L(w,\lambda,\theta,\phi)\) with non-positive terms. Hence for \(w-b_L\succeq 0\),
we augment it with \(-\theta^T(w-b_L)\), and for \(w-b_U\preceq 0\), we augment
it with \(+\phi^T(w-b_U)\).</p>
<p>Why do we need it this way? Let us denote the feasible domain by \(\mathcal{D}\)
and the optimal solution to the problem by \(w^*\in\mathcal{D}\). At \(w^*\), an
inequality constraint either holds with equality (in which case we call the
constraint <em>active</em>) or not (<em>inactive</em>). A constraint is inactive iff its
removal does not change the optimal solution. The boundary of \(\mathcal{D}\)
is defined by the constraints as if they were active (the equality
constraints can be assumed always active).</p>
<p>The solution \(w^*\) is a point on this boundary. As we are studying a
minimization problem, \(f\) increases as we move from \(w^*\) into \(\mathcal{D}\) and
decreases as we move away from it. Similarly, if \(w-b_L\succeq 0\) is
active, then \(w^*-b_L=0\), and \(w-b_L\) increases into \(\mathcal{D}\) and decreases
away from it (and similarly for \(w-b_U\preceq 0\)). In summary, we have</p>
<table>
<thead>
<tr>
<th> </th>
<th>\(w^*+\delta\in\mathcal{D}\)</th>
<th>\(w^*+\delta \notin\mathcal{D}\)</th>
</tr>
</thead>
<tbody>
<tr>
<td>\(f(w^*)\)</td>
<td>\(f(w^*+\delta)\ge f(w^*)\)</td>
<td>\(f(w^*+\delta)\le f(w^*)\)</td>
</tr>
<tr>
<td>\(w^*-b_L = 0\)</td>
<td>\(w^*+\delta-b_L\succeq 0\)</td>
<td>\(w^*+\delta-b_L\preceq 0\)</td>
</tr>
<tr>
<td>\(w^*-b_U = 0\)</td>
<td>\(w^*+\delta-b_U\preceq 0\)</td>
<td>\(w^*+\delta-b_U\succeq 0\)</td>
</tr>
</tbody>
</table>
<p>and we need to make
\(L(w^*+\delta,\lambda,\theta,\phi)\ge L(w^*,\lambda,\theta,\phi)\)
for \(w^*+\delta\in\mathcal{D}\) so that we can find the optimal solution
\(w^* = \arg\min L(w,\lambda,\theta,\phi)\) as a saddle point.</p>
<h2 id="karush-kuhn-tucker-conditions">Karush-Kuhn-Tucker conditions</h2>
<p>The KKT conditions state that</p>
<ol>
<li>\(\nabla L(w^*,\lambda,\theta,\phi)=0\) at the optimal solution \(w^*\)</li>
<li>Primal constraints are satisfied for \(w^*\)</li>
<li>Dual constraints \(\theta\ge 0\) and \(\phi\ge 0\) are satisfied, i.e. the
Lagrange multipliers for inequality constraints are non-negative</li>
<li>Complementary slackness: \(\theta\odot(w^*-b_L)=0\) and
\(\phi\odot(w^*-b_U)=0\), i.e., the Lagrange multiplier will be zero if the
corresponding inequality constraint is inactive</li>
</ol>
<h2 id="solution">Solution</h2>
<p>We can use the KKT conditions to solve for the above optimization problem. Since</p>
\[L(w, \lambda, \theta, \phi)
=\frac12 w^T\Sigma w
-\lambda_1(w^Tr-R) - \lambda_2(w^Te-1)
-\theta^T(w-b_L) + \phi^T(w-b_U)\]
<p>The first condition states that</p>
\[\nabla_w L(w, \lambda, \theta, \phi)
= \Sigma w
-\lambda_1r - \lambda_2 e
-\theta + \phi = 0\]
<p>the second condition states that</p>
\[\begin{aligned}
w^Tr - R = -\nabla_{\lambda_1} L(w, \lambda, \theta, \phi) &=0 \\
w^Te - 1 = -\nabla_{\lambda_2} L(w, \lambda, \theta, \phi) &=0 \\
w - b_L & \succeq 0 \\
w - b_U & \preceq 0
\end{aligned}\]
<p>the third condition states that</p>
\[\theta \ge 0;\qquad\phi \ge 0\]
<p>and the fourth condition states that</p>
\[\theta\odot(w-b_L)=0;\qquad\phi\odot(w-b_U)=0.\]
<p>Assume \(w\) is a vector of \(n\) elements; then we have \(3n+2\) unknowns
(\(w,\theta,\phi\) are \(n\)-vectors and \(\lambda\) is a 2-vector),
\(n+2+0+2n=3n+2\) equalities from the four conditions, and \(0+2n+2n+0=4n\)
inequalities. This should be sufficient to provide a solution, but note that the
equations from the fourth condition are nonlinear, as they include \(\theta\odot w\)
and \(\phi\odot w\) terms. To make it a system of linear equations, we can
consider the various combinations of activeness of the inequality constraints to
simplify it. It would be tremendously easier if none of the inequality
constraints were active (e.g., when \(b_L=-\infty\) and \(b_U=\infty\), in which
case surely \(\theta=\phi=\mathbf{0}\) by complementary slackness); in
this case we have</p>
\[\begin{aligned}
\Sigma w - \lambda_1r-\lambda_2e &=0 \\
w &= \Sigma^{-1}(\lambda_1r+\lambda_2e) \\
&= \lambda_1\Sigma^{-1}r+\lambda_2\Sigma^{-1}e
\end{aligned}\]
<p>substitute:</p>
\[\begin{aligned}
w^Tr - R &=
\lambda_1 r^T\Sigma^{-1}r + \lambda_2 e^T\Sigma^{-1}r - R = 0 \\
w^Te - 1 &= \lambda_1r^T\Sigma^{-1}e+\lambda_2e^T\Sigma^{-1}e - 1 = 0
\end{aligned}\]
<p>therefore</p>
\[\begin{aligned}
\begin{bmatrix}r^T\Sigma^{-1}r & r^T\Sigma^{-1}e\\ r^T\Sigma^{-1}e & e^T\Sigma^{-1}e\end{bmatrix}
\begin{bmatrix}\lambda_1\\ \lambda_2\end{bmatrix} &=
\begin{bmatrix}R\\ 1\end{bmatrix} \\
\begin{bmatrix}\lambda_1\\ \lambda_2\end{bmatrix} &=
\begin{bmatrix}r^T\Sigma^{-1}r & r^T\Sigma^{-1}e\\ r^T\Sigma^{-1}e & e^T\Sigma^{-1}e\end{bmatrix}^{-1}\begin{bmatrix}R\\ 1\end{bmatrix}
\end{aligned}\]
<p>and substitute back into the above for \(w^*\). But the solution obtained this way
must not violate the second condition, namely \(b_L \preceq w^* \preceq b_U\).
In fact, we can also solve for both \(w\) and \(\lambda\) together in one matrix
equation:</p>
\[\begin{aligned}
\Sigma w -\lambda_1 r - \lambda_2 e &= 0 \\
w^Tr - R &=0 \\
w^Te - 1 &=0 \\
\implies\quad
\begin{bmatrix}\Sigma & r & e\\ r^T & 0 & 0\\ e^T & 0 & 0\end{bmatrix}
\begin{bmatrix}w\\ -\lambda_1\\ -\lambda_2\end{bmatrix} &= \begin{bmatrix}0\\ R\\ 1\end{bmatrix} \\
\therefore\quad
\begin{bmatrix}w\\ -\lambda_1\\ -\lambda_2\end{bmatrix} &= \begin{bmatrix}\Sigma & r & e\\ r^T & 0 & 0\\ e^T & 0 & 0\end{bmatrix}^{-1}\begin{bmatrix}0\\ R\\ 1\end{bmatrix}.
\end{aligned}\]
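<p>The matrix-form solution above is easy to verify numerically. Below is a sketch with made-up data (a random positive-definite covariance and random returns, no bound constraints); the last two rows of the system guarantee that the solved weights hit the target return and sum to one:</p>

```python
import numpy as np

rng = np.random.default_rng(7)
N = 4
B = rng.normal(size=(N, N))
Sigma = B @ B.T + N * np.eye(N)       # a positive-definite covariance
r = rng.uniform(0.02, 0.12, size=N)   # made-up asset returns
e = np.ones(N)
R = 0.07                              # target portfolio return

# Assemble the block matrix [Sigma r e; r' 0 0; e' 0 0]
A = np.zeros((N + 2, N + 2))
A[:N, :N] = Sigma
A[:N, N] = A[N, :N] = r
A[:N, N + 1] = A[N + 1, :N] = e
b = np.zeros(N + 2)
b[N], b[N + 1] = R, 1.0

x = np.linalg.solve(A, b)
w = x[:N]                             # x[N] = -lambda_1, x[N+1] = -lambda_2
print(w @ r, w.sum())                 # should print 0.07 and 1.0
```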
<p>But the essence of using the Karush-Kuhn-Tucker conditions to solve an optimization
problem with inequality constraints is that it becomes combinatorial. Assume
\(b_L\prec b_U\) and both are in some reasonable finite range (e.g., \(b_L=\mathbf{0}\)
and \(b_U=e\)). To solve this, we need to test all combinations of activeness of the
inequality constraints. Above, we have \(2n\) inequalities from the second
KKT condition, so there are \(2^{2n}\) combinations of activeness. When an
inequality is active, its equality holds and the corresponding Lagrange
multiplier can be non-zero. Each combination yields a new set of linear equations
from which we can solve for \(w\) and the other Lagrange multipliers, but we need
to validate that the solution does not violate the KKT conditions, especially
\(b_L \preceq w \preceq b_U\), and check the objective function. For example,
if all inequality constraints were active, the optimization problem would have its
solution presented as</p>
\[\begin{aligned}
\begin{bmatrix}\Sigma & r & e & I & I\\ r^T & 0 & 0 & 0 & 0\\ e^T & 0 & 0 & 0 & 0 \\I & 0 & 0 & 0 & 0\\ I & 0 & 0 & 0 & 0\end{bmatrix}
\begin{bmatrix}w\\ -\lambda_1\\ -\lambda_2\\ -\theta\\ \phi\end{bmatrix} &= \begin{bmatrix}0\\ R\\ 1\\ b_L\\ b_U\end{bmatrix} \\
\therefore\quad
\begin{bmatrix}w\\ -\lambda_1\\ -\lambda_2\\ -\theta\\ \phi\end{bmatrix} &=
\begin{bmatrix}\Sigma & r & e & I & I\\ r^T & 0 & 0 & 0 & 0\\ e^T & 0 & 0 & 0 & 0 \\I & 0 & 0 & 0 & 0\\ I & 0 & 0 & 0 & 0\end{bmatrix}^{-1}
\begin{bmatrix}0\\ R\\ 1\\ b_L\\ b_U\end{bmatrix}
\end{aligned}\]
<p>and if some constraints are inactive, the corresponding rows and columns above are
removed. After checking all combinations of activeness, the best solution
according to the objective function is selected.</p>
<h2 id="implementation">Implementation</h2>
<p>The function below shows how the above optimization can be solved numerically.
It tries out all combinations of activeness and finds a solution using the
matrix equation described above. The best solution is then returned.</p>
<pre><code class="language-python">import numpy as np

def markowitz(ret, cov, r, lb=np.nan, ub=np.nan):
    """Markowitz minimizer with bounds constraints for a specified portfolio return
    Args:
        ret: A vector of N asset returns
        cov: NxN matrix of covariance of asset returns
        r (float): portfolio return to achieve
        lb, ub (float or vector): lowerbound and upperbound for the portfolio weights,
            if float, all weights are subject to the same bound
    Returns:
        A (3N+2) vector of portfolio weights and the Lagrange multipliers, or
        None if no solution can be found
    """
    # Sanitation
    ret = np.array(ret).squeeze()
    cov = np.array(cov).squeeze()
    r = float(r)
    N = len(ret)
    if ret.shape != (N,):
        raise ValueError("Asset returns `ret` should be a vector")
    if cov.shape != (N,N):
        raise ValueError("Covariance matrix `cov` should be in shape ({},{}) to match the return vector".format(N,N))
    if isinstance(lb, (float,int)):
        lb = np.ones(N) * lb
    if isinstance(ub, (float,int)):
        ub = np.ones(N) * ub
    lb = lb.squeeze()
    ub = ub.squeeze()
    if lb.shape != (N,):
        raise ValueError("Lowerbound `lb` should be in shape (%d,) to match the return vector" % N)
    if ub.shape != (N,):
        raise ValueError("Upperbound `ub` should be in shape (%d,) to match the return vector" % N)
    if (lb > ub).any():
        raise ValueError("Lowerbound must be no greater than upperbound")
    # Construct matrices as templates for the equation AX=B
    A = np.zeros((N+2+N+N, N+2+N+N))
    A[:N, :N] = cov
    A[:N, N] = A[N, :N] = ret
    A[:N, N+1] = A[N+1, :N] = np.ones(N)
    A[:N, N+2:N+N+2] = A[N+2:N+N+2, :N] = A[:N, N+N+2:] = A[N+N+2:, :N] = np.eye(N)
    b = np.zeros((N+2+N+N, 1))
    b[N:N+2, 0] = [r, 1]
    b[N+2:N+N+2, 0] = lb
    b[N+N+2:, 0] = ub
    # Try all activeness combinations and track the best result to minimize objective
    bitmaps = 2**(2*N)
    best_obj = np.inf
    best_vector = None
    for bitmap in range(bitmaps):
        # constraints 0 to N-1 are for lowerbound and N to 2N-1 are for upperbound
        # row/column N+2+i corresponds to the constraint i
        inactive = [N+2+i for i in range(2*N) if bitmap & (2**i)]
        active = [N+2+i for i in range(2*N) if N+2+i not in inactive]
        # verify no conflicting active constraints: the lowerbound and the
        # upperbound of the same weight cannot both be active
        if any(row+N in active for row in active):
            continue # conflicting activeness found, skip this one
        # Delete some rows and columns from the template for this activeness combination
        A_ = np.delete(np.delete(A, inactive, axis=0), inactive, axis=1)
        b_ = np.delete(b, inactive, axis=0)
        # Solve and check using matrix algebra
        try:
            x_ = (np.linalg.inv(A_) @ b_).squeeze()
            w = x_[:N]
            if (w < lb).any() or (w > ub).any():
                continue # solution not in feasible domain, try next one
            obj_val = w @ cov @ w # compute the variance, i.e., objective function * 2
            if obj_val < best_obj:
                # Lower variance found, save the solution vector
                best_obj = obj_val
                x = np.zeros(N+2+N+N)
                x[:N+2] = x_[:N+2] # w and negative lambda
                x[active] = x_[N+2:] # negative theta and phi
                x[N:N+2+N] *= -1 # lambda and theta are negated
                best_vector = x
        except np.linalg.LinAlgError:
            pass # no solution found for this combination
    return best_vector
</code></pre>Adrian S. Tamrighthandabacus@users.github.comA portfolio optimization problem in Markowitz style looks like the followinghtop cheatsheet2021-05-25T00:00:00-04:002021-05-25T00:00:00-04:00https://www.adrian.idv.hk/htop<p><code>htop</code> is useful and brings forth very rich information on one screen. Here is the cheatsheet to understand it:</p>
<p><img src="/img/htop.png" alt="htop cheatsheet" /></p>
<p>The source <a href="/img/htop.key">Keynote file is available</a>. Of course, there is a
<a href="https://peteris.rocks/blog/htop/">more detailed explanation</a> as well as the
help screen available by pressing <code>h</code>.</p>Adrian S. Tamrighthandabacus@users.github.comhtop is useful and brings forth very rich information on one screen. Here is the cheatsheet to understand it:Correct ways of deploying NFSv42021-05-05T10:10:57-04:002021-05-05T10:10:57-04:00https://www.adrian.idv.hk/nfsv4<p>NFS is old. Its roots trace back to the remote file system in System V R3
while the first release of NFS (version 2) was in 1985 on SunOS 2.0. It must not
be considered a counterpart of Samba or CIFS, since NFS does not do user
authentication while CIFS is built around user credentials.</p>
<p>Traditionally NFS runs over UDP — for it can then be run stateless.
One concern is the resilience of clients after a server reboot. If we connect NFS
over TCP, the stream protocol will never recover if the server dies. However,
even if NFS is over UDP, the client will hang until the server responds,
which may take a few minutes, and in the meantime the client program that reads
the NFS mount will be stuck in uninterruptible sleep. This failure recovery behavior
at the client side can be tuned using the following options during NFS mount:</p>
<pre><code>mount -t nfs -o soft,intr,timeo=600,retrans=3 host:/path /mountpoint
</code></pre>
<p>where <code>timeo</code> is the response timeout in deciseconds and <code>retrans</code> is the
number of retransmissions before the client determines that the server is not
responding.</p>
<p>The issue of reliable communication between NFS server and client is profound,
especially when the mount is writable. There are questions of whether the
written data are committed at the server, issues of detecting data corruption
in the channel, acknowledgement and retransmission of requests, cache
consistency, file locking, and even quota management. These are the problems
addressed by the
<a href="https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pdf">2006 paper</a>.</p>
<p>The latest version of NFS is v4.2 (RFC 7862) but mostly we identify NFS by its
major version. Besides support for security measures (e.g., Kerberos 5), NFSv4
differs from its predecessor versions in that a distinguished filesystem
root must be identified. It is marked with the <code>fsid=0</code> option in <code>/etc/exports</code> and
we cannot avoid that. Depending on how the server side is configured, sometimes
we will see the error message “<code>exportfs: Warning: /mnt does not support NFS
export.</code>” This is what I encountered in exporting <code>/mnt</code> in OpenWRT. It turns
out that an NFS kernel server will not see everything we see in the userland. If
we are not going to use an NFS userspace server (such as the unfs daemon), we need to make
sure the NFSv4 root is not on a FUSE mount or anything special, such as an
overlay mount in the case of OpenWRT:</p>
<pre><code class="language-text">root@openwrt:~# mount
/dev/root on /rom type squashfs (ro,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,noatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,noatime)
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noatime)
/dev/ubi0_1 on /overlay type ubifs (rw,noatime,ubi=0,vol=1)
overlayfs:/overlay on / type overlay (rw,noatime,lowerdir=/,upperdir=/overlay/upper,workdir=/overlay/work)
ubi1:syscfg on /tmp/syscfg type ubifs (rw,relatime,ubi=1,vol=0)
tmpfs on /dev type tmpfs (rw,nosuid,relatime,size=512k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,mode=600,ptmxmode=000)
debugfs on /sys/kernel/debug type debugfs (rw,noatime)
</code></pre>
<p>In my case of OpenWRT, I put the mount point under <code>/tmp</code> (which is on a
tmpfs), so the NFS root is under tmpfs instead of the overlay file system. Whether a
directory can be exported under NFS can be checked right away with <code>exportfs -ra</code>,
which should print nothing if there are no issues. Afterwards, we can check from a remote host
what the NFS server exports using <code>showmount -e hostname</code>.</p>
<p>One may see that such an implementation in NFSv4 is restrictive. If we want to
share <code>/path1/dir1</code> and <code>/path2/dir2</code>, we must export the root directory in NFS
as it is the first common parent directory. But then, everything under the root
directory would technically be shared too (access control can still be set up,
but files and directories not supposed to be exported are still visible). That
is why an alternative setup is to create a dedicated NFS root directory to
export and bind mount the exportable paths under it. Below is an example from
the Ubuntu NFSv4 how-to:</p>
<pre><code class="language-sh">mkdir /export
mkdir /export/users
mount --bind /home/users /export/users
</code></pre>
<p>then the corresponding <code>/etc/exports</code> is</p>
<pre><code class="language-text">/export 192.168.1.0/24(rw,fsid=0,no_subtree_check,sync)
/export/users 192.168.1.0/24(rw,nohide,insecure,no_subtree_check,sync)
</code></pre>
<p>On the client side, NFSv4 is mounted using the command:</p>
<pre><code>mount -t nfs -o vers=4,soft serverhost:/users /mountpoint
</code></pre>
<p>where the server’s exports to mount are given as paths under the root export
instead of the absolute path at the server as in NFSv3.</p>
<h2 id="openwrt-example">OpenWRT example</h2>
<p>I am exporting NFSv4 mounts in OpenWRT (<code>opkg install kmod-fs-nfs-v4
nfs-kernel-server</code>). There are a few points that are quite different from other
systems. First is the <code>fstab</code>. Instead of <code>/etc/fstab</code>, the one honoured by
OpenWRT is indeed <code>/etc/config/fstab</code>, which can be initialized with the stdout
of <code>block detect</code> command. The mount points specified over there will be
automatically created. So we can safely point it to a tmpfs directory. If a
more complicated setup is required, we probably need to create a new init
script in <code>/etc/init.d</code>.</p>
<p>The second is the <code>fsid</code> option for NFSv4. It is required to say <code>fsid=0</code> for
the root directory in NFSv4, but the others should be automatically assigned.
However, I found that we still need to specify <code>fsid</code> in OpenWRT. So my <code>/etc/exports</code> now looks like this:</p>
<pre><code class="language-text">/tmp *(rw,all_squash,insecure,no_subtree_check,sync,fsid=0)
/tmp/dir1 *(rw,all_squash,insecure,no_subtree_check,sync,fsid=1)
/tmp/dir2 *(rw,all_squash,insecure,no_subtree_check,sync,fsid=2)
</code></pre>
<h2 id="macos-issues">MacOS issues</h2>
<p>While MacOS ships with an NFSv4 client, its NFS server only supports v2 and v3.
One issue common on MacOS is the <em>insecure mount</em>, in which the client uses
high ports instead of ports below 1024 as expected by most NFS servers by
default. Therefore the NFS server needs to say <code>insecure</code> in <code>/etc/exports</code>.</p>
<p>The NFS server in MacOS, however, expects clients using low port numbers unless
launched with the <code>-N</code> option. We can either use <code>launchctl stop
com.apple.nfsd</code> and then run <code>/sbin/nfsd -N</code>, or modify
<code>/System/Library/LaunchDaemons/com.apple.nfsd.plist</code> to add the <code>-N</code> option by adding a row:</p>
<pre><code class="language-xml"><array>
<string>/sbin/nfsd</string>
<string>-N</string>
</array>
</code></pre>
<p>However, in MacOS 10.11+, we need to override the system integrity protection
(SIP) before anything under <code>/System</code> can be modified (which is to use
<code>Command+R</code> at boot to start recovery mode and run <code>/usr/bin/csrutil disable</code>
in terminal and then boot back to normal MacOS; and we should do this again to
re-enable SIP after modification).</p>
<p>The MacOS implementation is also using BSD syntax for <code>/etc/exports</code>. It looks like this:</p>
<pre><code>/path/to/export -ro -alldirs -mapall=nobody -32bitclients -network 192.168.0.0 -mask 255.255.255.0
</code></pre>
<h2 id="references">References</h2>
<ul>
<li>Olaf Kirch. Why NFS Sucks. In Proceedings of the Ottawa Linux Symposium, pp.59-72, 2006. <a href="https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pdf">https://www.kernel.org/doc/ols/2006/ols2006v2-pages-59-72.pdf</a></li>
<li>Ubuntu NFSv4 how-to <a href="https://help.ubuntu.com/community/NFSv4Howto">https://help.ubuntu.com/community/NFSv4Howto</a></li>
<li>RHEL Storage Administration Guide, sec 8.6 Configuring the NFS server <a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/nfs-serverconfig">https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/nfs-serverconfig</a></li>
</ul>Adrian S. Tamrighthandabacus@users.github.comNFS is old. Its root can trace back to the remote file system in System V R3 while the first release of NFS (version 2) is in 1985 on SunOS 2.0. It must not be considered as a counterpart of Samba or CIFS, since NFS does not do user authentication while CIFS is created with user credentials.Threshold linear regression2021-04-12T14:41:53-04:002021-04-12T14:41:53-04:00https://www.adrian.idv.hk/threshreg<p>Threshold regression means to do regression separately in different segments,
as separated by some threshold. Take linear models as example, we have a
response variable \(y\) and predictors \(X\), and additionally, we have a
discriminator \(q\), which may be derived from \(X\), and set of transition
functions \(g_i\). The model is expressed as
\(y = \sum_{i=1}^m (\mu_i + \beta_i^T x)g_i(q),\)
of which exactly one of the \(g_i(q)\) would be set to 1 and all
others would be zero. In essence, if we know the value of \(g_i(q)\) a priori,
\(X\) and \(y\) are related by a linear model. For convenience of some
derivations, we may rewrite the above into</p>
\[y = \mu_1 + \beta_1^T x + \sum_{i=2}^m (\mu'_i + \beta'^T_i x)g'_i(q)\]
<p>so that \(\mu_k = \sum_{i=1}^k \mu'_i\) and \(\beta_k = \sum_{i=1}^k \beta'_i\)
with \(\mu_1=\mu'_1\) and \(\beta_1=\beta'_1\) and \(g'_i(q)\) defined
correspondingly. A simple case of transition functions \(g_i\) would be an
indicator function. For example, we have a vector of thresholds \((-\infty,
c_1, c_2, \cdots, c_{m-1}, \infty)\) to partition \(\mathbb{R}\) into \(m\)
segments, it can be that \(g_i(q)=\mathbb{I}\{q\in(c_{i-1},c_i]\}\) and
therefore \(g'_i(q)=\mathbb{I}\{q>c_{i-1}\}\). Further detail can be seen
from, for example, González et al (2005)<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
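<p>To see that the two parameterizations agree, here is a small numerical check
(with arbitrary made-up segment parameters; the variable names are mine, not
from the references):</p>
<pre><code class="language-python">import numpy as np

# Hypothetical 3-segment model with thresholds (c_1, c_2)
c = np.array([-1.0, 1.0])
mu = np.array([2.0, -0.5, 3.0])       # segment intercepts mu_1..mu_3
beta = np.array([1.0, 0.3, -2.0])     # segment slopes (scalar x for simplicity)
mu_p = np.diff(np.concatenate([[0.0], mu]))      # mu'_i = mu_i - mu_{i-1}
beta_p = np.diff(np.concatenate([[0.0], beta]))  # beta'_i

q = np.linspace(-3, 3, 13)
x = q  # use the predictor itself as the thresholding variable

# First form: exactly one g_i(q) = I{c_{i-1} < q <= c_i} is nonzero
seg = np.digitize(q, c, right=True)
y1 = mu[seg] + beta[seg] * x

# Second form: g'_i(q) = I{q > c_{i-1}} with c_0 = -infinity
gp = q[:, None] > np.concatenate([[-np.inf], c])[None, :]
y2 = (gp * (mu_p + beta_p * x[:, None])).sum(axis=1)
print(np.allclose(y1, y2))  # True
</code></pre>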
<p>Regression on a linear model can be obtained from OLS. But in the case of threshold
regression, the introduction of the unknown threshold variables
\((c_1,\cdots,c_{m-1})\) makes the regression difficult. Usually the thresholds
are found by exhaustive search on grid points<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>. There is, however, an attempt to
solve for the threshold values together with the regression coefficients<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.
Its model is as follows:</p>
\[\begin{aligned}
y &= \sum_{i=1}^m \beta_i^T x \mathbb{I}\{c_{i-1}<q\le c_i\} + \epsilon \\
&= \sum_{i=1}^m \beta'^T_i x \mathbb{I}\{c_{i-1}<q\} + \epsilon
\end{aligned}\]
<p>where \(m\) is also a parameter to be found, and can be greater than the actual
number of segments in the model. In order to perform regression, we can
introduce the following loss metric<sup id="fnref:3:1" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> to be minimized:</p>
\[L(\beta,c) = \frac{1}{n}\sum_{i=1}^n\Big(y_i - \sum_{j=1}^m \beta'^T_j x_i\mathbb{I}\{c_{j-1}<q_i\}\Big)^2 + \sum_{j=2}^m p_\lambda(\Vert \beta'_j\Vert_1)\]
<p>This is defined as the mean squared error plus some penalty metric. The penalty
function \(p_\lambda(\cdot)\) of choice is the smoothly clipped absolute
deviation (SCAD) function<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. It is defined, with parameters \(a\) and
\(\lambda\), as</p>
\[p_\lambda(\omega)=\begin{cases}
\lambda\omega & \omega\le\lambda \\
\dfrac{2a\lambda\omega-\omega^2-\lambda^2}{2(a-1)} & \lambda<\omega\le a\lambda \\
\dfrac{\lambda^2(a+1)}{2} & a\lambda<\omega
\end{cases}\]
<p>and its first-order derivative is correspondingly</p>
\[p'_\lambda(\omega)=\begin{cases}
\lambda & \omega\le\lambda \\
\dfrac{a\lambda-\omega}{a-1} & \lambda<\omega\le a\lambda\\
0 & a\lambda<\omega
\end{cases}\]
<p>Note that the argument to the SCAD function is the L1-norm of \(\beta'_j\);
in the region \(\Vert \beta'_j\Vert_1 \le \lambda\), minimizing this
loss metric is the same as performing lasso regression.</p>
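<p>The SCAD penalty and its derivative above can be checked numerically. Below is a
sketch (with arbitrary \(a\) and \(\lambda\)) verifying that the three pieces join
continuously and that the stated derivative matches a finite-difference estimate:</p>
<pre><code class="language-python">import numpy as np

def scad(w, lamb=1.0, a=3.7):
    # SCAD penalty p_lambda(w) for w >= 0, as defined above
    return np.where(w <= lamb, lamb * w,
           np.where(w <= a * lamb,
                    (2*a*lamb*w - w**2 - lamb**2) / (2*(a - 1)),
                    lamb**2 * (a + 1) / 2))

def scad_grad(w, lamb=1.0, a=3.7):
    # its first-order derivative
    return np.where(w <= lamb, lamb,
           np.where(w <= a * lamb, (a*lamb - w) / (a - 1), 0.0))

w = np.linspace(0.1, 6.0, 50)
fd = (scad(w + 1e-6) - scad(w - 1e-6)) / 2e-6  # central difference
print(np.abs(fd - scad_grad(w)).max())  # tiny: the pieces join smoothly
</code></pre>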
<p>The benefit of using SCAD as penalty function can be found from Fan & Li
(2001)<sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Essentially, if this loss metric is minimized, some of the
\(\beta'_j\) term may be forced to zero by such penalty to infer that the
corresponding segment can be omitted but merged with the previous segment.
However, this loss metric cannot be easily minimized as the transition function
\(\mathbb{I}\{c_{j-1}<q\}\) is not differentiable at the threshold. Therefore, Jiang et
al (2016)<sup id="fnref:3:2" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> suggested replacing it with a smoothed function. Namely, make
\(\mathbb{I}\{c_{j-1}<q\} = \Phi\left(\frac{q-c_{j-1}}{h}\right)\)
for some small value \(h\). This is the standard normal CDF and runs from 0 to
1 over \(\mathbb{R}\). With small \(h\), this is approximately an indicator
function turning from 0 to 1 at \(c_{j-1}\). Hence the smoothed loss metric is</p>
\[Q(\beta,c) = \frac{1}{n}\sum_{i=1}^n\Big(y_i - \sum_{j=1}^m \beta'^T_j x_i\Phi\left(\frac{q_i-c_{j-1}}{h}\right)\Big)^2 + \sum_{j=2}^m p_\lambda(\Vert \beta'_j\Vert_1)\]
<p>and \(\partial Q/\partial c_i\) and \(\partial Q/\partial \beta'_i\) are well
defined. We may therefore find the minimizer using Newton’s method or otherwise
and the regression can be obtained.</p>
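<p>A quick sketch of how \(\Phi\left(\frac{q-c}{h}\right)\) approximates the
indicator for small \(h\) (the threshold and bandwidth below are arbitrary,
chosen only for illustration):</p>
<pre><code class="language-python">import numpy as np
from scipy.stats import norm

c, h = 0.5, 0.01  # hypothetical threshold and bandwidth
q = np.linspace(-2, 2, 401)
smooth = norm.cdf((q - c) / h)  # smoothed transition function
hard = (q > c).astype(float)    # the indicator it approximates
# outside a narrow band around c, the two agree closely
print(np.abs(smooth - hard)[np.abs(q - c) > 5*h].max())
</code></pre>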
<p>In order to try out this idea, the following code is implemented:</p>
<pre><code class="language-python">import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np
import scipy as sp
from scipy.stats import norm

# Generate data points
n = 2000
in_theta = [-3, -2, 4, -1, -7, 4]
in_c = [-np.sqrt(3), np.sqrt(3)]
x = np.linspace(-6, 6, n).reshape(-1, 1)
y = np.where(x < in_c[0], in_theta[0] + in_theta[1]*x,
             np.where(x < in_c[1], in_theta[2] + in_theta[3]*x,
                      in_theta[4] + in_theta[5]*x)
    ) + np.random.rand(n).reshape(-1, 1)
X = tf.constant(np.hstack([np.ones((n,1)), x]), dtype="float32")
y = tf.constant(y, dtype="float32")
in_theta = np.asarray(in_theta).reshape(-1, 2)
in_beta = np.vstack([in_theta[0:1], np.diff(in_theta, axis=0)])

class ThresholdRegressionKeras(tf.keras.Model):
    def __init__(self, d, m, lamb, a, h, **kwargs):
        super().__init__(**kwargs)
        self.d = d
        self.m = m
        self.lamb = lamb
        self.a = a
        self.h = h
        self.c = tf.Variable(np.sort(np.random.randn(m-1)), dtype='float32', name="c")
        self.beta = tf.Variable(np.random.randn(d,m), dtype='float32', name="beta")
        self.normal = tfp.distributions.Normal(loc=0, scale=1)
    def call(self, x):
        """Compute the threshold regression response y using the last column of
        x as thresholding variable"""
        # thresholding variable as the last column of x
        q = x[:, -1:]
        # find the smoothed indicator function
        ind = self.normal.cdf((q - self.c) / self.h)
        # threshold linear model using the smoothed indicator function
        xbeta = x @ self.beta
        y = xbeta[:, 0] + tf.reduce_sum(xbeta[:, 1:] * ind, axis=1)
        return tf.reshape(y, (-1, 1))

def loss(model):
    '''Penalty loss function factory'''
    def loss_(y, y_hat):
        mse = tf.reduce_mean(tf.square(y - y_hat))
        b_norm = tf.norm(model.beta[:, 1:], ord=1, axis=0)
        # SCAD penalty on L1-norm of beta
        penalty = tf.where(b_norm <= model.lamb,
                           model.lamb * b_norm,
                           tf.where(b_norm <= model.a * model.lamb,
                                    (2*model.a*model.lamb*b_norm - tf.square(b_norm) - model.lamb**2)/(2*model.a - 2),
                                    (model.a + 1) * model.lamb**2 / 2))
        return mse + tf.reduce_sum(penalty)
    return loss_

# find h
q = X[:, -1:]
n = len(q)
h = np.log(n) * q.numpy().std(ddof=1) / n
earlystop = tf.keras.callbacks.EarlyStopping(
    monitor='loss',
    min_delta=0,
    patience=100,
    verbose=1,
    mode='min',
    baseline=None,
    restore_best_weights=True
)
model = ThresholdRegressionKeras(d=2, m=3, lamb=1.0, a=3.7, h=h)
sgd = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.0, nesterov=False, name="SGD")
loss_metric = loss(model)
model.compile(optimizer=sgd, loss=loss_metric)
model.fit(X, y, epochs=10000, batch_size=n, verbose=False, callbacks=[earlystop])
with np.printoptions(precision=4, suppress=True):
    for v in model.variables:
        print("Model {} (dtype {})".format(v.name, v.dtype.name))
        print(v.numpy())
    print("Loss: {}".format(loss_metric(y, model.predict(X)).numpy().squeeze()))
    print("MSE: {}".format(tf.keras.losses.MSE(tf.reshape(y, (-1,)), tf.reshape(model.predict(X), (-1,))).numpy().squeeze()))
    print("h: {}".format(model.h))
    print("Expected beta:")
    print(in_beta.T)
    print("Expected c:")
    print(np.asarray(in_c))
</code></pre>
<p>We could do the same with SciPy and its optimize functions, but using Tensorflow
is easier as the gradient vector can be inferred automatically (and it comes
with the gradient descent algorithm). This model, however, does not perform
well. The issue is with the threshold values \(c_j\): the fit may produce
\(c_{j-1} > c_j\) and distort the linear model. We may further add a penalty
term to the loss function to correct this, but after all, I suspect the loss
function is not convex with respect to the threshold variables \(c_j\) —
hence the global minimum is not always found.</p>
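<p>One possible form of such an ordering penalty (my own sketch, not from the
paper) is a hinge term on each adjacent pair of thresholds that is out of order:</p>
<pre><code class="language-python">import numpy as np

def ordering_penalty(c, weight=10.0):
    # penalize violations of c_1 <= c_2 <= ... with a hinge term;
    # the weight is a tunable assumption
    return weight * np.maximum(0.0, c[:-1] - c[1:]).sum()

c = np.array([0.5, -0.2, 1.0])  # first pair is out of order by 0.7
print(ordering_penalty(c))  # 7.0
</code></pre>
<p>A differentiable version of the same term can be added to the loss above so that
the gradient pushes the thresholds back into ascending order.</p>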
<p>To compare, we can build a grid-search model as follows, using scikit-learn for
its linear regression function:</p>
<pre><code class="language-python">import json
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate data points
n = 2000
in_theta = [-3, -2, 4, -1, -7, 4]
in_c = [-np.sqrt(3), np.sqrt(3)]
x = np.linspace(-6, 6, n).reshape(-1, 1)
y = np.where(x < in_c[0], in_theta[0] + in_theta[1]*x,
             np.where(x < in_c[1], in_theta[2] + in_theta[3]*x,
                      in_theta[4] + in_theta[5]*x)
    ) + np.random.rand(n).reshape(-1, 1)
in_theta = np.asarray(in_theta).reshape(-1, 2)

# Threshold regression model
class ThresholdRegression:
    def __init__(self, thresholds):
        self.thresholds = thresholds
    def fit(self, X, y, q):
        self.regs = []
        thresh = np.append(-np.inf, np.append(self.thresholds, np.inf))
        for lthresh, uthresh in zip(thresh, self.thresholds):
            selector = (lthresh < q) & (q <= uthresh)
            if selector.any():
                self.regs.append(LinearRegression().fit(X[selector], y[selector]))
            else:
                # degenerated case of no data point
                self.regs.append(None)
        selector = (q > self.thresholds[-1])
        self.regs.append(LinearRegression().fit(X[selector], y[selector]))
        return self
    def predict(self, X, q):
        thresh = np.append(-np.inf, np.append(self.thresholds, np.inf))
        y = np.zeros_like(q)
        for i in range(len(self.thresholds)+1):
            # segment i: thresh[i] < q <= thresh[i+1], same convention as fit()
            selector = (thresh[i] < q) & (q <= thresh[i+1])
            if selector.any() and self.regs[i] is not None:
                y = np.where(selector, self.regs[i].predict(X).reshape(y.shape), y)
        return y.reshape(-1, 1)
    def mse(self, X, y, q):
        y_hat = self.predict(X, q)
        return ((y - y_hat)**2).mean()
    def score(self, X, y, q):
        y_hat = self.predict(X, q)
        u = ((y - y_hat)**2).sum()
        v = ((y - y.mean())**2).sum()
        return 1 - (u/v)
    def get_params(self):
        params = {
            th: {'inter': reg.intercept_.squeeze().tolist(),
                 'coef': reg.coef_.squeeze().tolist()}
            for reg, th in zip(self.regs, self.thresholds) if reg
        }
        if self.regs[-1]:
            params[float('inf')] = {
                'inter': self.regs[-1].intercept_.squeeze().tolist(),
                'coef': self.regs[-1].coef_.squeeze().tolist(),
            }
        return params

def is_ascending(vec):
    "Check if the sequence `vec` is strictly ascending numbers"
    if len(vec) < 2:
        return True
    vec = np.asarray(vec).reshape(-1,)
    return (np.diff(vec) > 0).all()

# Enumerate thresholds in grid
m = 3 # num segments, i.e., m-1 thresholds
lbound, ubound = -5.0, 5.0
grid_precision = 0.01
n_grid = int((ubound - lbound)/grid_precision) + 1
thresh_points = np.linspace(lbound, ubound, n_grid)
thresh_arrays = np.meshgrid(*([thresh_points]*(m-1)))
best_r_score = -np.inf
best_mse_score = np.inf
best_r_model = best_mse_model = None
for thresh_rows in zip(*thresh_arrays):
    for threshs in zip(*thresh_rows):
        if not is_ascending(threshs):
            continue # bad thresholds vector
        model = ThresholdRegression(threshs).fit(x, y, x[:,0])
        score = model.score(x, y, x[:,0])
        if score > best_r_score:
            best_r_score, best_r_model = score, model
        score = mean_squared_error(y, model.predict(x, x[:,0]))
        if score < best_mse_score:
            best_mse_score, best_mse_model = score, model

# Reference model
ref_model = ThresholdRegression(in_c).fit(x, y, x[:,0])
with np.printoptions(precision=4, suppress=True):
    print("Best R^2 model")
    print(json.dumps(best_r_model.get_params(), indent=4))
    print("Best R^2 score: {}".format(best_r_score))
    print("Best R^2 MSE: {}".format(mean_squared_error(y, best_r_model.predict(x, x[:,0]))))
    print("Best MSE model")
    print(json.dumps(best_mse_model.get_params(), indent=4))
    print("Best MSE R^2: {}".format(best_mse_model.score(x, y, x[:,0])))
    print("Best MSE score: {}".format(best_mse_score))
    print("Reference model")
    print(json.dumps(ref_model.get_params(), indent=4))
    print("Reference score: {}".format(ref_model.score(x, y, x[:,0])))
    print("Reference MSE: {}".format(mean_squared_error(y, ref_model.predict(x, x[:,0]))))
</code></pre>
<p>This code often gives good results but, surprisingly, does not always reproduce the
parameters of the original model. Depending on the particular structure of the
data, sometimes a slightly different threshold partition, and hence a slightly
different model, would be found that produces a smaller MSE or higher \(R^2\)
score. But this is a very slow search, trying out half a million combinations of
threshold partitions. I can identify several factors that impact the accuracy of the
regression: the imbalanced number of samples in each segment, the magnitude of
the noise, and the similarity of the linear models of neighbouring segments.</p>
<p>Introducing a test set or using k-fold validation may improve the search results
in both implementations. But this is sufficient to show that threshold regression, despite
being a simple extension of the linear model, can be a hard problem to solve.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Andrés González, Timo Teräsvirta, and Dick van Dijk, “Panel Smooth Transition Regression Models”. Research Paper 165, Quantitative Finance Research Centre, University of Technology Sydney. August 2005. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Youyi Fong, Ying Huang, Peter B. Gilbert, and Sallie R. Permar, “chngpt: threshold regression model estimation and inference”. BMC Bioinformatics 18:454, 2017. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>J. K. Jiang, H. Z. Lin, L. Jiang, and Paul Siu Fai Yip, “Estimation of threshold values and regression parameters in threshold regression model” (in Chinese). Scientia Sinica Mathematica, 46(4):409–422, 2016. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:3:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a> <a href="#fnref:3:2" class="reversefootnote" role="doc-backlink">↩<sup>3</sup></a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>Jianqing Fan and Runze Li, “Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties”. Journal of the American Statistical Association, 96(456):1348–1360, December 2001. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">↩<sup>2</sup></a></p>
</li>
</ol>
</div>Adrian S. Tamrighthandabacus@users.github.comThreshold regression means to do regression separately in different segments, as separated by some threshold. Take linear models as example, we have a response variable \(y\) and predictors \(X\), and additionally, we have a discriminator \(q\), which may be derived from \(X\), and set of transition functions \(g_i\). The model is expressed as \(y = \sum_{i=1}^m (\mu_i + \beta_i^T x)g_i(q),\) of which, there would always be exactly one of the \(g_i(q)\) set to 1 and all other would be zero. In essense, if we know of the value of \(g_i(q)\) apriori, \(X\) and \(y\) are related with a linear model. For convenience of some derivation, we may rewrite the above intoConverting a DD-WRT router to OpenWRT2021-04-06T00:00:20-04:002021-04-06T00:00:20-04:00https://www.adrian.idv.hk/ddwrt<p>I have an old WiFi router (not my gateway, but that doesn’t matter) that used
to be running DD-WRT, and I am experimenting with WiFi mesh and 802.11r
roaming at home. Unfortunately the DD-WRT version is too old to do either of
these – it was a Linux 2.4 kernel on a Netgear WNR3500L. There seems to be a
K3X build that works with it, and there is definitely a working OpenWRT version
too. The problem is, both of them are provided as a <code>.chk</code> file while, if you
have DD-WRT already installed, you can only refresh the firmware using a <code>.bin</code>
file.</p>
<p>There doesn’t seem to be any way out. I tried to use the DD-WRT web interface to
flash a <code>.chk</code> file anyway but it doesn’t seem to do anything. Similarly, trying
to use the <code>write</code> command after you ssh into the router also does nothing.
<a href="https://wiki.dd-wrt.com/wiki/index.php/Tftp_flash">TFTP</a> seems to be the last
resort.</p>
<p>I am not sure about other routers, but this particular model seems to have a boot
cycle as follows. When the router is powered on, in a few seconds it will pick
up the IP address of 192.168.1.1/24 in the first phase of its boot cycle. Then it
will try to load the firmware from its flash memory and start the second phase
of boot. When the ownership is transitioned, the IP address of the router will
be reset. This can be observed by a computer connected to one of its LAN ports,
carrying a 192.168.1.0/24 address, that keeps pinging 192.168.1.1 while the
router is power cycled. We will see a “no route to host” when it is powered on,
then responses with TTL of 100 after a few seconds, then no response again,
and finally responses with TTL of 64 when the router successfully boots from
the firmware.</p>
<p>Sources from the web tell that the router will check for TFTP and load
firmware from it in the first phase of the boot cycle, namely, at the time when
ping is responding with TTL 100. That is <strong>not correct</strong>. Firmware will be loaded
from TFTP <strong>only if</strong> the router is bricked! That is, the router must not be
able to load firmware from its flash storage; only then will the TFTP path be taken.</p>
<p>Therefore, my way of loading OpenWRT onto a DD-WRT Netgear WNR3500L is
the following:</p>
<ol>
<li>Prepare the OpenWRT firmware. Mine is downloaded as
<code>openwrt-19.07.7-brcm47xx-mips74k-netgear-wnr3500l-v1-na-squashfs.chk</code></li>
<li>Unplug all cables except the one connecting a Mac to its LAN port. Reset the
DD-WRT on the WNR3500L, so that it takes the 192.168.1.1 address as default and
nothing unusual is running on it. But I turned on SSH from it.</li>
<li>ssh into the router, then run <code>/sbin/mtd erase linux</code>; this will take around
1 minute to complete. Afterwards, the router will brick if power cycled.</li>
<li>Turn off the router</li>
<li>On a Mac, keep it connected to the router through an Ethernet cable to its
LAN port. Set the Mac to a 192.168.1.0/24 address.</li>
<li>On the Mac, open two terminal windows. In one, <code>cd</code> into the directory of the
downloaded firmware, then run <code>tftp</code> and leave the prompt open. In the other,
simply run <code>ping 192.168.1.1</code>. Keep the two terminals side by side</li>
<li>
<p>Open TextEdit in Mac and type the following</p>
<pre><code> connect 192.168.1.1
binary
rexmt 1
timeout 60
put openwrt-19.07.7-brcm47xx-mips74k-netgear-wnr3500l-v1-na-squashfs.chk
</code></pre>
<p>Then copy the whole 5 lines into clipboard</p>
</li>
<li>Power on the router, set focus to the <code>tftp</code> terminal, and keep an eye on the ping terminal</li>
<li>If the ping terminal starts to respond (with TTL 100), then paste the
clipboard into the tftp terminal. If successful, the ping terminal will
keep responding with TTL 100 until the tftp terminal says the transmission
completed. But <code>tftp</code> will fail if the router is not bricked, i.e., if the
firmware can still be found.</li>
<li>If the previous step is successful, watch the ping window until you see it
stop responding and then come back again. This takes a few minutes.
Alternatively, power cycle the router, but you still need to wait a few
minutes while the new firmware boots for the first time.</li>
<li>Open up a browser to 192.168.1.1, and wait for the LuCI interface of OpenWRT to appear</li>
</ol>Adrian S. Tamrighthandabacus@users.github.comI have an old WiFi router (not my gateway, but that doesn’t matter) that used to be running DD-WRT and I am experimenting with the WiFi mesh or 802.11r roaming at home. Unfortunately the DD-WRT version is too old to do either of these – it was a Linux 2.4 kernel on a Netgear WNR3500L. There seems to be a K3X build that works with it, and there is definitely a working OpenWRT version too. The problem is, both of them are provided as a .chk file while, if you have DD-WRT already installed, you can only refresh the firmware using .bin file.Heat equation and Black-Scholes formula2021-01-16T17:46:13-05:002021-01-16T17:46:13-05:00https://www.adrian.idv.hk/heateq<p>It has long been well known that quantitative finance borrows many
results from physics; the notable Feynman-Kac formula is one example. In the
case of vanilla European option pricing, the Black-Scholes formula gives the
following result:</p>
\[\begin{aligned}
C &= S\Phi(d_1) - Ke^{-rT}\Phi(d_2) \\
P &= -S\Phi(-d_1) + Ke^{-rT}\Phi(-d_2) \\
\textrm{where}\qquad
d_1 &= \frac{1}{\sigma\sqrt{T}}\left(\ln\frac{S}{K}+\left(r+\frac{\sigma^2}{2}\right)T\right) \\
d_2 &= \frac{1}{\sigma\sqrt{T}}\left(\ln\frac{S}{K}+\left(r-\frac{\sigma^2}{2}\right)T\right) = d_1 - \sigma\sqrt{T}
\end{aligned}\]
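<p>For reference, the closed-form prices translate directly into code. Below is a minimal sketch (the function name and parameter names are my own) using only the standard library, with a put-call parity check:</p>

```python
from math import log, sqrt, exp
from statistics import NormalDist

def black_scholes(S, K, r, sigma, T):
    """Vanilla European call and put prices under Black-Scholes."""
    Phi = NormalDist().cdf
    d1 = (log(S / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    call = S * Phi(d1) - K * exp(-r * T) * Phi(d2)
    put = K * exp(-r * T) * Phi(-d2) - S * Phi(-d1)
    return call, put

# Put-call parity: C - P = S - K e^{-rT}
C, P = black_scholes(S=100, K=95, r=0.05, sigma=0.2, T=1.0)
assert abs((C - P) - (100 - 95 * exp(-0.05))) < 1e-9
```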
<p>This result can be derived using the heat equation, a route usually
mentioned but not carried out in detail in textbooks. Here I try to lay out
the full derivation.</p>
<h2 id="heat-equation">Heat equation</h2>
<p>The heat equation relates the temperature \(u(\vec{x},t)\) at
position \(\vec{x}\) and time \(t\) to the temperature at neighbouring
positions and times:</p>
\[\frac{\partial u}{\partial t} = k \nabla^2u\]
<p>The RHS is the net heat flow from the neighbouring positions and the LHS is
the temporal rate of temperature change. Often we assume the diffusivity
\(k=1\) and the equation becomes</p>
\[\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}.\]
<p>The heat flow equation has the following boundary and initial conditions:</p>
<ul>
<li>as \(|x|\to\infty\), \(|u(x,t)|\le \alpha_t e^{a|x|}\) for some constants
\(a>0\) and \(\alpha_t>0\), where \(\alpha_t\) is independent of \(x\), i.e.,
\(u(x,t)\) grows no faster than \(e^{a|x|}\)</li>
<li>the temperature at time 0 is known: \(u(x,0)=u_0(x)\) for all \(x\)</li>
</ul>
<p>and the solution is</p>
\[\begin{aligned}
u_t &= ku_{xx} \\
u(x,0) &= u_0(x) \\
\implies u(x,t) &= \frac{1}{\sqrt{4\pi kt}}\int_{-\infty}^\infty u_0(s)\exp\left(-\frac{(x-s)^2}{4kt}\right)ds
\end{aligned}\]
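<p>The kernel solution can be sanity-checked numerically. In the sketch below (the Gaussian initial condition, grid, and evaluation point are arbitrary choices of mine), the integral is evaluated by a Riemann sum and \(u_t = ku_{xx}\) is verified by central differences:</p>

```python
import numpy as np

k = 0.5  # diffusivity, arbitrary choice

def u(x, t, u0=lambda s: np.exp(-s**2)):
    """Heat-kernel solution: convolve u0 with the Gaussian kernel."""
    s = np.linspace(-20.0, 20.0, 40001)
    kern = np.exp(-(x - s)**2 / (4 * k * t)) / np.sqrt(4 * np.pi * k * t)
    return np.sum(u0(s) * kern) * (s[1] - s[0])  # Riemann sum of the integral

# Verify u_t = k * u_xx at an arbitrary point by central differences
x, t, h = 0.3, 1.0, 1e-3
ut = (u(x, t + h) - u(x, t - h)) / (2 * h)
uxx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
assert abs(ut - k * uxx) < 1e-4
```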
<h2 id="option-pricing-by-heat-flow-equation">Option pricing by heat flow equation</h2>
<p>Assume the price follows \(dS=\mu S dt + \sigma S dW_t\), the Black-Scholes PDE says</p>
\[\begin{aligned}
\frac{\partial V}{\partial t}+\frac12\sigma^2 S^2\frac{\partial^2V}{\partial S^2}+rS\frac{\partial V}{\partial S}-rV &=0 \\
V(S,T) &= f(S)
\end{aligned}\]
<p>where \(V(S,t)\) is the option price and \(f(S)\) is the payoff function at
maturity \(T\). We need to transform the Black-Scholes PDE into the form of the
homogeneous heat equation (with \(k=1\)). The first step is a change of
variables:</p>
\[\begin{aligned}
S &= e^x \\
t &= T-\frac{2\tau}{\sigma^2} \\
V(S,t) &= v(x,\tau) = v\left(\ln S, \frac{\sigma^2}{2}(T-t)\right)
\end{aligned}\]
<p>then</p>
\[\begin{aligned}
\frac{\partial V}{\partial t}
&= -\frac{\sigma^2}{2}\frac{\partial v}{\partial\tau} \\
\frac{\partial V}{\partial S}
&= \frac{1}{S}\frac{\partial v}{\partial x} \\
\frac{\partial^2 V}{\partial S^2}
&= -\frac{1}{S^2}\frac{\partial v}{\partial x}+\frac{1}{S}\frac{\partial v}{\partial x}\frac{\partial x}{\partial S} = -\frac{1}{S^2}\frac{\partial v}{\partial x}+\frac{1}{S^2}\frac{\partial^2 v}{\partial x^2} \\
%
\therefore
\frac{\partial V}{\partial t}+\frac12\sigma^2 S^2\frac{\partial^2V}{\partial S^2}+rS\frac{\partial V}{\partial S}-rV
&= -\frac{\sigma^2}{2}\frac{\partial v}{\partial\tau}+\frac{\sigma^2}{2} S^2\left(-\frac{1}{S^2}\frac{\partial v}{\partial x}+\frac{1}{S^2}\frac{\partial^2 v}{\partial x^2}\right)+rS\frac{1}{S}\frac{\partial v}{\partial x}-rv \\
&= -\frac{\sigma^2}{2}\frac{\partial v}{\partial\tau}
-\frac{\sigma^2}{2}\frac{\partial v}{\partial x}+\frac{\sigma^2}{2} \frac{\partial^2 v}{\partial x^2}
+r\frac{\partial v}{\partial x}
-rv \\
&= -\frac{\partial v}{\partial\tau}
-\frac{\partial v}{\partial x}
+ \frac{\partial^2 v}{\partial x^2}
+\frac{2r}{\sigma^2}\frac{\partial v}{\partial x}
-\frac{2r}{\sigma^2}v \\
\therefore \frac{\partial v}{\partial\tau} &= \frac{\partial^2 v}{\partial x^2}
+\left(\frac{2r}{\sigma^2}-1\right)\frac{\partial v}{\partial x}
-\frac{2r}{\sigma^2}v
\end{aligned}\]
<p>further substitute \(k=2r/\sigma^2\),</p>
\[\begin{aligned}
\frac{\partial v}{\partial\tau} &= \frac{\partial^2 v}{\partial x^2}
+(k-1)\frac{\partial v}{\partial x}
-kv \\
v(x,0) &= V(e^x,T) = f(e^x)
\end{aligned}\]
<p>The above is a forward parabolic equation. To leave only the
second-derivative term on the RHS, we substitute
\(v(x,\tau)=e^{\alpha x+\beta \tau}u(x,\tau)=\phi u\), which gives</p>
\[\begin{aligned}
\frac{\partial v}{\partial \tau} &= \beta \phi u + \phi \frac{\partial u}{\partial \tau} \\
\frac{\partial v}{\partial x} &= \alpha \phi u + \phi \frac{\partial u}{\partial x} \\
\frac{\partial^2 v}{\partial x^2} &= \alpha^2 \phi u + 2\alpha\phi \frac{\partial u}{\partial x}+\phi\frac{\partial^2 u}{\partial x^2}
\end{aligned}\]
<p>Then we can have</p>
\[\begin{aligned}
\frac{\partial v}{\partial\tau} &= \frac{\partial^2 v}{\partial x^2}
+(k-1)\frac{\partial v}{\partial x}
-kv \\
\implies \beta \phi u + \phi \frac{\partial u}{\partial\tau} &= (\alpha^2 \phi u + 2\alpha\phi \frac{\partial u}{\partial x}+\phi\frac{\partial^2 u}{\partial x^2})
+(k-1)(\alpha \phi u + \phi \frac{\partial u}{\partial x})
-k\phi u \\
\frac{\partial u}{\partial\tau} &= \frac{\partial^2 u}{\partial x^2}
+(k-1+2\alpha)\frac{\partial u}{\partial x}
+(\alpha^2 +(k-1)\alpha-k-\beta)u \\
\textrm{with }\alpha &=-\frac12(k-1) \\
\beta &= -\frac14(k+1)^2 \\
\implies \frac{\partial u}{\partial\tau} &= \frac{\partial^2 u}{\partial x^2}
\end{aligned}\]
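<p>That the chosen \(\alpha\) and \(\beta\) kill the first- and zeroth-order terms can be checked mechanically; the sketch below (the sample values of \(k\) are arbitrary) evaluates both coefficients:</p>

```python
def coeffs(k):
    """Coefficients of u_x and u after substituting v = exp(alpha*x + beta*tau) u."""
    alpha = -(k - 1) / 2
    beta = -(k + 1)**2 / 4
    return (k - 1 + 2 * alpha,                       # u_x coefficient, must vanish
            alpha**2 + (k - 1) * alpha - k - beta)   # u coefficient, must vanish

for k in (0.5, 1.0, 2.0, 7.3):
    cx, c0 = coeffs(k)
    assert abs(cx) < 1e-12 and abs(c0) < 1e-12
```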
<p>and with the initial conditions</p>
\[\begin{aligned}
u(x,\tau)&=e^{-\alpha x - \beta \tau}v(x,\tau)\\
u(x,0)&=e^{-\alpha x}v(x,0) = e^{\frac12(k-1)x}v(x,0) \\
&=e^{(\frac{r}{\sigma^2}-\frac12)x}f(e^x)
\end{aligned}\]
<p>In the above, \(\tau\) measures the (scaled) time to maturity; hence at
\(\tau=0\) the payoff is known. The solution to the above is</p>
\[u(x,\tau) = \frac{1}{\sqrt{4\pi \tau}}\int_{-\infty}^\infty\exp\left(-\frac{(x-s)^2}{4\tau}\right)u(s,0)ds\]
<p>and we get the pricing formula by reversing all substitutions.</p>
<h2 id="european-call-option-as-an-example">European call option as an example</h2>
<p>Step 1: Transform Black-Scholes equation into heat equation. Black-Scholes is</p>
\[\frac{\partial V}{\partial t}+\frac12\sigma^2 S^2\frac{\partial^2V}{\partial S^2}+rS\frac{\partial V}{\partial S}-rV=0\]
<p>and boundary conditions are:</p>
\[\begin{cases}
V(0,t)=0 & \textrm{($V=0$ whenever $S_t=0$ for any $t$)}\\
\displaystyle\lim_{S\to\infty}V(S,t)=S-Ke^{-r(T-t)} & \textrm{(discounted exercise price)}\\
V(S,T) = (S-K)^+ & \textrm{(payoff at expiration)}
\end{cases}\]
<p>Transformations</p>
\[\begin{aligned}
S &= Ke^x \\
t &= T-\frac{2\tau}{\sigma^2} \\
V(S,t) &= Kv(x,\tau) = Kv\left(\ln\frac{S}{K}, \frac{\sigma^2}{2}(T-t)\right)
\end{aligned}\]
<p>and we can derive</p>
\[\begin{aligned}
\frac{\partial v}{\partial\tau} &= \frac1K\frac{\partial V}{\partial t}\frac{\partial t}{\partial\tau} = -\frac{2}{K\sigma^2}\frac{\partial V}{\partial t} \\
\frac{\partial v}{\partial x} &= \frac1K\frac{\partial V}{\partial S}\frac{\partial S}{\partial x} = e^x \frac{\partial V}{\partial S} \\
\frac{\partial^2 v}{\partial x^2} &= e^x\frac{\partial V}{\partial S} + e^x\frac{\partial}{\partial x}\frac{\partial V}{\partial S} = e^x\frac{\partial V}{\partial S} + e^x\frac{\partial^2 V}{\partial S^2}\frac{\partial S}{\partial x} = e^x\frac{\partial V}{\partial S} + Ke^{2x}\frac{\partial^2 V}{\partial S^2}
\end{aligned}\]
<p>then substitute back to the Black-Scholes equation to get</p>
\[\begin{aligned}
\left(-\frac{K\sigma^2}{2}\frac{\partial v}{\partial\tau}\right) + \frac12\sigma^2S^2\left(\frac{1}{Ke^{2x}}(\frac{\partial^2v}{\partial x^2}-e^xe^{-x}\frac{\partial v}{\partial x})\right)+rS\left(e^{-x}\frac{\partial v}{\partial x}\right)-rKv &=0 \\
\left(-\frac{K\sigma^2}{2}\frac{\partial v}{\partial\tau}\right) + \frac{\sigma^2S^2}{2Ke^{2x}}\left(\frac{\partial^2v}{\partial x^2}-\frac{\partial v}{\partial x}\right)+rSe^{-x}\frac{\partial v}{\partial x}-rKv &=0 \\
-\frac{\partial v}{\partial\tau} + \frac{S^2}{K^2e^{2x}}\frac{\partial^2v}{\partial x^2}-\frac{S^2}{K^2e^{2x}}\frac{\partial v}{\partial x}+2r\sigma^{-2}\frac{S}{Ke^{x}}\frac{\partial v}{\partial x}-2r\sigma^{-2}v &=0 \\
-\frac{\partial v}{\partial\tau} + \frac{\partial^2v}{\partial x^2}-\frac{\partial v}{\partial x}+2r\sigma^{-2}\frac{\partial v}{\partial x}-2r\sigma^{-2}v &=0 \\
-\frac{\partial v}{\partial\tau} + \frac{\partial^2v}{\partial x^2}+(2r\sigma^{-2}-1)\frac{\partial v}{\partial x}-2r\sigma^{-2}v &=0 \\
\end{aligned}\]
<p>With \(k=2r\sigma^{-2}\), we again have the forward parabolic equation</p>
\[\frac{\partial v}{\partial\tau} = \frac{\partial^2v}{\partial x^2}+(k-1)\frac{\partial v}{\partial x}-kv\]
<p>If we further substitute</p>
\[v(x,\tau)=e^{\alpha x+\beta\tau}u(x,\tau)\]
<p>then the forward parabolic equation becomes</p>
\[\begin{aligned}
e^{\alpha x+\beta\tau}\frac{\partial u}{\partial\tau}+\beta e^{\alpha x+\beta\tau}u &= \left(e^{\alpha x+\beta\tau}\frac{\partial^2 u}{\partial x^2}+\alpha e^{\alpha x+\beta\tau}\frac{\partial u}{\partial x}+\alpha^2 e^{\alpha x+\beta\tau}u+\alpha e^{\alpha x+\beta\tau} \frac{\partial u}{\partial x}\right)\\
&\qquad
+(k-1)\left(e^{\alpha x+\beta\tau}\frac{\partial u}{\partial x}+\alpha e^{\alpha x+\beta\tau} u\right)-ke^{\alpha x+\beta\tau}u \\
%
\frac{\partial u}{\partial\tau}+\beta u &= \left(\frac{\partial^2 u}{\partial x^2}+\alpha\frac{\partial u}{\partial x}+\alpha^2u+\alpha \frac{\partial u}{\partial x}\right)
+(k-1)\left(\frac{\partial u}{\partial x}+\alpha u\right)-ku \\
\frac{\partial u}{\partial\tau}+\beta u &= \left(\frac{\partial^2 u}{\partial x^2}+2\alpha\frac{\partial u}{\partial x}+\alpha^2u\right)
+(k-1)\left(\frac{\partial u}{\partial x}+\alpha u\right)-ku \\
\frac{\partial u}{\partial\tau} &= \frac{\partial^2 u}{\partial x^2}+2\alpha\frac{\partial u}{\partial x}+(k-1)\frac{\partial u}{\partial x}+\alpha^2u
+(k-1)\alpha u-ku-\beta u
\end{aligned}\]
<p>to make this become the heat equation, we need</p>
\[\begin{aligned}
& \begin{cases}
2\alpha+(k-1) = 0 \\
\alpha^2+(k-1)\alpha-k-\beta = 0
\end{cases} \\
\therefore&
\begin{cases}
\alpha = -\frac{k-1}{2} \\
\beta = \alpha^2+(k-1)\alpha-k = \frac{(k-1)^2}{4}-\frac{(k-1)^2}{2}-k = -\frac{(k+1)^2}{4}
\end{cases}
\end{aligned}\]
<p>i.e., to substitute</p>
\[v(x,\tau)=e^{-\frac12(k-1)x-\frac14(k+1)^2\tau}u(x,\tau)\]
<p>The boundary conditions become:</p>
\[\begin{cases}
\displaystyle\lim_{x\to-\infty}v(x,\tau)=0 & \textrm{($V=0$ whenever $S_t=0$ for any $t$)}\\
\displaystyle\lim_{x\to\infty}v(x,\tau)=e^x-e^{-k\tau} & \textrm{(discounted exercise price)}\\
v(x,0) = (e^x-1)^+ & \textrm{(payoff at expiration)}
\end{cases}\]
<p>or in terms of \(u(x,\tau)\):</p>
\[u(x,0) = e^{\frac12(k-1)x}(e^x-1)^+ = (e^{\frac12(k+1)x}-e^{\frac12(k-1)x})^+\]
<p>where \(u(x,0)>0\) exactly when \(x > 0\)</p>
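<p>This rewriting just pulls the positive factor \(e^{\frac12(k-1)x}\) inside the positive part, so both forms must agree; a quick numerical check (the sample points and the value of \(k\) are arbitrary):</p>

```python
from math import exp

def u0_two_forms(x, k):
    """u(x,0) written both ways; they agree since exp(0.5*(k-1)*x) > 0."""
    lhs = exp(0.5 * (k - 1) * x) * max(exp(x) - 1.0, 0.0)
    rhs = max(exp(0.5 * (k + 1) * x) - exp(0.5 * (k - 1) * x), 0.0)
    return lhs, rhs

for x in (-2.0, -0.5, 0.0, 0.7, 3.0):
    lhs, rhs = u0_two_forms(x, k=1.8)
    assert abs(lhs - rhs) < 1e-12
```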
<p>Step 2: Use the solution of the heat equation</p>
\[\begin{aligned}
u_t &= u_{xx} \\
u(x,0) &= u_0(x) \\
\implies u(x,t) &= \frac{1}{\sqrt{4\pi t}}\int_{-\infty}^\infty u_0(s)\exp\left(-\frac{(x-s)^2}{4t}\right)ds
\end{aligned}\]
<p>therefore</p>
\[\begin{aligned}
u(x,\tau)
&= \frac{1}{\sqrt{4\pi\tau}}\int_{-\infty}^\infty (e^{\frac12(k+1)s}-e^{\frac12(k-1)s})^+\exp\left(-\frac{(x-s)^2}{4\tau}\right)ds \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty (e^{\frac12(k+1)(\sqrt{2\tau}y+x)}-e^{\frac12(k-1)(\sqrt{2\tau}y+x)})^+e^{-\frac12y^2}dy & (\textrm{Sub. }y=\frac{s-x}{\sqrt{2\tau}})\\
&= \frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty (e^{\frac12(k+1)(\sqrt{2\tau}y+x)}-e^{\frac12(k-1)(\sqrt{2\tau}y+x)}) e^{-\frac12y^2}dy\\
&= \frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{\frac12(k+1)(\sqrt{2\tau}y+x)-\frac12y^2}dy-\frac1{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{\frac12(k-1)(\sqrt{2\tau}y+x)-\frac12y^2}dy
\end{aligned}\]
<p>for the first integral, completing the square gives</p>
\[\begin{aligned}
\frac12\left[(k+1)(\sqrt{2\tau}y+x)-y^2\right]
& = -\frac12\left[y^2-(k+1)\sqrt{2\tau}y-(k+1)x\right]\\
& = -\frac12\left[\left(y-\frac12(k+1)\sqrt{2\tau}\right)^2-\frac12(k+1)^2\tau-(k+1)x\right]\\
\end{aligned}\]
<p>therefore</p>
\[\begin{aligned}
& \frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{\frac12(k+1)(\sqrt{2\tau}y+x)-\frac12y^2}dy \\
=&\frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{-\frac12\left[\left(y-\frac12(k+1)\sqrt{2\tau}\right)^2-\frac12(k+1)^2\tau-(k+1)x\right]}dy \\
=& e^{\frac14(k+1)^2\tau+\frac12(k+1)x}\frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{-\frac12\left(y-\frac12(k+1)\sqrt{2\tau}\right)^2}dy \\
=& e^{\frac14(k+1)^2\tau+\frac12(k+1)x}\frac{1}{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}-\frac12(k+1)\sqrt{2\tau}}^\infty e^{-\frac12z^2}dz \\
=& e^{\frac14(k+1)^2\tau+\frac12(k+1)x}\Phi(\frac{x}{\sqrt{2\tau}}+\frac12(k+1)\sqrt{2\tau}) \\
=& e^{\frac14(k+1)^2\tau+\frac12(k+1)x}\Phi(d_1)
\end{aligned}\]
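<p>The completing-the-square step can be verified numerically. In this sketch (the values of \(x\), \(\tau\), \(k\) are arbitrary), the left side is the integral computed by a Riemann sum and the right side is the closed form in terms of \(\Phi\):</p>

```python
import numpy as np
from math import sqrt, exp, pi
from statistics import NormalDist

x, tau, k = 0.4, 0.3, 1.5
a = -x / sqrt(2 * tau)  # lower limit of integration

# Left side: direct numerical integration (Riemann sum)
y = np.linspace(a, a + 30.0, 600001)
integrand = np.exp(0.5 * (k + 1) * (sqrt(2 * tau) * y + x) - 0.5 * y**2)
lhs = np.sum(integrand) * (y[1] - y[0]) / sqrt(2 * pi)

# Right side: closed form after completing the square
rhs = exp((k + 1)**2 * tau / 4 + (k + 1) * x / 2) \
      * NormalDist().cdf(x / sqrt(2 * tau) + (k + 1) * sqrt(2 * tau) / 2)

assert abs(lhs - rhs) < 1e-4
```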
<p>and for the second integral, completing the square gives</p>
\[\begin{aligned}
\frac12\left[(k-1)(\sqrt{2\tau}y+x)-y^2\right]
& = -\frac12\left[y^2-(k-1)\sqrt{2\tau}y-(k-1)x\right]\\
& = -\frac12\left[\left(y-\frac12(k-1)\sqrt{2\tau}\right)^2-\frac12(k-1)^2\tau-(k-1)x\right]\\
\end{aligned}\]
<p>therefore</p>
\[\begin{aligned}
& \frac1{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{\frac12(k-1)(\sqrt{2\tau}y+x)-\frac12y^2}dy \\
=& \frac1{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{-\frac12\left[\left(y-\frac12(k-1)\sqrt{2\tau}\right)^2-\frac12(k-1)^2\tau-(k-1)x\right]}dy \\
=& e^{\frac14(k-1)^2\tau+\frac12(k-1)x}\frac1{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}}^\infty e^{-\frac12\left(y-\frac12(k-1)\sqrt{2\tau}\right)^2}dy \\
=& e^{\frac14(k-1)^2\tau+\frac12(k-1)x}\frac1{\sqrt{2\pi}}\int_{-x/\sqrt{2\tau}-\frac12(k-1)\sqrt{2\tau}}^\infty e^{-\frac12z^2}dz \\
=& e^{\frac14(k-1)^2\tau+\frac12(k-1)x}\Phi(\frac{x}{\sqrt{2\tau}}+\frac12(k-1)\sqrt{2\tau}) \\
=& e^{\frac14(k-1)^2\tau+\frac12(k-1)x}\Phi(d_2)
\end{aligned}\]
<p>Step 3: The pricing formula can be obtained by reversing all substitutions back to \(V(S,t)\)</p>
\[\begin{aligned}
v(x,\tau)
&=e^{-\frac12(k-1)x-\frac14(k+1)^2\tau}u(x,\tau) \\
&= e^{-\frac12(k-1)x-\frac14(k+1)^2\tau}\left(e^{\frac14(k+1)^2\tau+\frac12(k+1)x}\Phi(d_1)-e^{\frac14(k-1)^2\tau+\frac12(k-1)x}\Phi(d_2)\right) \\
&= e^{-\frac12(k-1)x+\frac12(k+1)x}\Phi(d_1)-e^{-\frac14(k+1)^2\tau+\frac14(k-1)^2\tau}\Phi(d_2) \\
&= e^{x}\Phi(d_1)-e^{-k\tau}\Phi(d_2) \\
&= \frac{S}{K}\Phi(d_1)-e^{-2r\sigma^{-2}(\frac12\sigma^2(T-t))}\Phi(d_2) \\
&= \frac{S}{K}\Phi(d_1)-e^{-r(T-t)}\Phi(d_2) \\
V(S,t) &= Kv(x,\tau) \\
&= S\Phi(d_1)-Ke^{-r(T-t)}\Phi(d_2)
\end{aligned}\]
<p>with</p>
\[\begin{aligned}
d_1 &= \frac{x}{\sqrt{2\tau}}+\frac12(k+1)\sqrt{2\tau} \\
&= \frac{\ln(S/K)}{\sqrt{2\frac{\sigma^2}{2}(T-t)}}+\frac12(\frac{2r}{\sigma^2}+1)\sqrt{2\frac{\sigma^2}{2}(T-t)}\\
&= \frac{\ln(S/K)}{\sqrt{\sigma^2(T-t)}}+(\frac{r}{\sigma^2}+\frac12)\sqrt{\sigma^2(T-t)}\\
&= \frac{\ln(S/K)+(r+\frac12\sigma^2)(T-t)}{\sqrt{\sigma^2(T-t)}}\\
%
d_2 &= \frac{x}{\sqrt{2\tau}}+\frac12(k-1)\sqrt{2\tau} \\
&= \frac{\ln(S/K)}{\sqrt{2\frac{\sigma^2}{2}(T-t)}}+\frac12(\frac{2r}{\sigma^2}-1)\sqrt{2\frac{\sigma^2}{2}(T-t)} \\
&= \frac{\ln(S/K)}{\sqrt{\sigma^2(T-t)}}+(\frac{r}{\sigma^2}-\frac12)\sqrt{\sigma^2(T-t)} \\
&= \frac{\ln(S/K)+(r-\frac12\sigma^2)(T-t)}{\sqrt{\sigma^2(T-t)}} \\
\end{aligned}\]Adrian S. Tamrighthandabacus@users.github.comIt is well known for a long time that the quant finance borrowed a lot of results from physics. The notable Feynman-Kac formula is one example. In the case of vanilla European option pricing, the Black-Scholes formula gives the following result:
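<p>As a closing sanity check on the derivation (parameter values are illustrative choices of mine), a Monte Carlo simulation under the risk-neutral dynamics \(dS=rS\,dt+\sigma S\,dW_t\) should reproduce the call price given by the formula just derived:</p>

```python
import numpy as np
from math import log, sqrt, exp
from statistics import NormalDist

S0, K, r, sigma, T = 100.0, 95.0, 0.05, 0.2, 1.0

# Closed-form call price from the derivation above
d1 = (log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
d2 = d1 - sigma * sqrt(T)
Phi = NormalDist().cdf
bs_call = S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

# Risk-neutral Monte Carlo: S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z)
rng = np.random.default_rng(0)
Z = rng.standard_normal(2_000_000)
ST = S0 * np.exp((r - sigma**2 / 2) * T + sigma * sqrt(T) * Z)
mc_call = exp(-r * T) * np.maximum(ST - K, 0.0).mean()

assert abs(mc_call - bs_call) < 0.05  # agreement within Monte Carlo error
```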