The Lipschutz continuity definition inspired me on how to distort a curve to “smooth out” outliers. Smoothing out means that we want to retain most characteristics of a curve if it falls within proximity of the mean (for whatever definition of proximity and mean) and for the part that the curve is distorted, it should not depends on explicit parameters. Thus, for example, cap the curve into is less favorible because the cap has to be dependent to to be useful.

Recall that Lipschutz continuity means:

Roughly speaking, the slope of is bounded (by ), for slope is defined as if is connected. Now consider a simplified function to make the problem tractable, namely, is monotonically increasing and defined only on . Then the outlier is either at the proximity of 0 or 1, and the derivative due to monotonicity.

If we define a *modified* derivative, for some ,
then we have an approximate of as

Consider the case is roughly at the middle portion of and is defined as

Then will be exactly the same as for , and it will be distorted from at iff too large (gauged by ).

Let’s look at an example (code in Julia):

```
using PyPlot
function outlier_distort(vector, xmin, xmax)
N = length(vector)
imin = round(Int, (xmin * (N-1)) + 1)
imax = round(Int, (xmax * (N-1)))
delta = vector[2:end] - vector[1:end-1]
maxdelta = maximum(delta[imin:imax]) # max element in delta[imin:imax]
delta = min.(delta, fill(maxdelta, size(delta))) # elementwise cap delta to maxdelta
newvector = foldl( (v,x)->push!(v,v[end]+x), [vector[1]], delta)
offset = newvector[round(Int,N/2)] - vector[round(Int,N/2)]
return newvector .- offset
end
value = [30.67, 31.88, 31.99, 32.11, 32.19, 32.56, 32.61, 32.89, 32.90, 32.91,
32.94, 33.02, 33.08, 33.16, 33.41, 33.74, 33.77, 33.92, 33.96, 34.07,
34.12, 34.14, 34.18, 34.18, 34.25, 34.33, 34.33, 34.33, 34.36, 34.36,
34.38, 34.51, 34.55, 34.57, 34.59, 34.59, 34.62, 34.68, 34.68, 34.70,
34.72, 34.74, 34.79, 34.86, 34.86, 34.89, 34.91, 34.92, 34.96, 34.97,
34.98, 35.00, 35.00, 35.00, 35.01, 35.03, 35.03, 35.03, 35.03, 35.05,
35.05, 35.06, 35.09, 35.09, 35.09, 35.12, 35.13, 35.14, 35.19, 35.19,
35.24, 35.25, 35.25, 35.27, 35.27, 35.29, 35.29, 35.35, 35.36, 35.38,
35.40, 35.41, 35.43, 35.50, 35.50, 35.56, 35.60, 35.60, 35.63, 35.66,
35.66, 35.68, 35.68, 35.69, 35.71, 35.71, 35.71, 35.74, 35.76, 35.76,
35.77, 35.78, 35.78, 35.80, 35.80, 35.81, 35.81, 35.81, 35.82, 35.83,
35.85, 35.87, 35.87, 35.88, 35.89, 35.89, 35.89, 35.89, 35.90, 35.91,
35.93, 35.94, 35.94, 35.94, 35.94, 35.97, 35.97, 35.98, 36.00, 36.00,
36.01, 36.01, 36.01, 36.02, 36.03, 36.03, 36.03, 36.03, 36.04, 36.04,
36.07, 36.11, 36.11, 36.11, 36.12, 36.12, 36.16, 36.17, 36.17, 36.19,
36.19, 36.20, 36.21, 36.21, 36.24, 36.24, 36.25, 36.26, 36.39, 36.39,
36.39, 36.39, 36.44, 36.46, 36.58, 36.69, 36.90, 37.02, 37.08, 37.11,
37.15, 37.16, 37.18, 37.37, 37.69, 37.71, 37.71, 37.74, 37.87, 37.92,
37.97, 37.97, 38.02, 38.02, 38.05, 38.15, 38.22, 38.22, 38.25, 38.36,
38.36, 38.38, 38.47, 38.55, 38.57, 38.63, 38.65, 38.65, 38.73, 38.76,
38.76, 38.80, 38.80, 38.84, 38.85, 38.85, 38.91, 39.07, 39.09, 39.12,
39.15, 39.27, 39.27, 39.32, 39.34, 39.42, 39.43, 39.46, 39.48, 39.50,
39.57, 39.57, 39.59, 39.67, 39.70, 39.83, 39.87, 39.87, 39.88, 39.91,
39.97, 40.00, 40.00, 40.05, 40.06, 40.06, 40.08, 40.11, 40.13, 40.16,
40.23, 40.27, 40.30, 40.35, 40.35, 40.40, 40.41, 40.58, 40.58, 40.63,
40.67, 40.68, 40.71, 40.71, 40.71, 40.83, 40.86, 40.87, 40.87, 40.92,
41.01, 41.01, 41.03, 41.07, 41.12, 41.12, 41.23, 41.25, 41.25, 41.27,
41.29, 41.33, 41.33, 41.34, 41.42, 41.47, 41.54, 41.58, 41.59, 41.65,
41.65, 41.75, 41.85, 41.85, 41.85, 41.85, 41.88, 42.01, 42.32, 42.57,
42.57, 42.61, 42.66, 42.66, 42.78, 42.78, 42.89, 42.93, 42.93, 43.05,
43.21, 43.47, 43.60, 43.87, 44.06, 44.35, 45.04, 45.04, 45.09, 45.09,
45.45, 46.02, 46.02, 46.52, 46.90, 46.90, 46.90, 46.90, 47.69, 50.52,
55.76, 98.97, 99.07, 99.25, 99.28, 99.33, 99.40, 99.40, 99.42, 99.43,
99.44, 99.44, 99.46, 99.47, 99.50, 99.55, 99.60, 99.79, 99.84]
N = max(size(value)...)
x = collect(1:N) ./ N
distorted = outlier_distort(value, 0.2, 0.8)
plot(x,value,x,distorted)
```

The function `outlier_distort`

expects input of monotonically increasing
values, and finds from the 20% to 80%-ile, with . The plot of the
above code:

The orange line is the distorted curve and the blue curve is . We can notice that at above 0.95, that bit of flat curve looks alike in both colors albeit at different altitude. And the part that quickly ramps up in the blue curve is moderated in the orange curve such that it is less disruptive. A larger jump would be allowed if is larger.

The code above is suitable for a percentile function. For general , we can use similar technique with a “translate” function that converts any value into . Such as the following:

```
function interpolate(invector, outvector, val)
r = searchsorted(invector, val)
lbound = invector[r.start]
ubound = invector[r.stop]
if lbound == ubound
return outvector[r.start]
else
ratio = (val-lbound)/(ubound-lbound)
return outvector[r.start] * ratio + outvector[r.stop] * (1-ratio)
end
end
perm = shuffle(value)
distortedperm = [interpolate(value, distorted, v) for v=perm]
plot(x,perm,x,distortedperm)
```

Plot: