Recently I encountered with the problem of range minimum query (RMQ) and there are many ways to solve it, depending on the trade off between preprocessing complexity and per-query runtime complexity. Block-paritioning and a sparse table would be clever techniques. A hybrid of both would need some more code but brings great performance.

For reference, this is how we can do a RMQ with different algorithms:

```
import math
import random
import sys
def minargmin(arr, L, R):
"""Reusable function to find the min element in arr between indices L and R inclusive
Returns the min value and the index
"""
try:
return min((v,i) for i,v in enumerate(arr[L:R+1], L))
except ValueError:
return (float("inf"), -1)
def naive(arr, queries):
"""Complexity: < O(1), O(n) >
Search on array directly every time, O(1) preprocessing as none is done,
O(n) time for each query"""
ans = []
for L,R in queries:
ans.append(minargmin(arr, L, R))
return ans
def lookup(arr, queries):
"""Complexity < O(n^2), O(1) >
Build lookup table of O(n^2) size for all possible queries, then lookup
immediately for answer
"""
# preprocess a lookup table
table, N = [], len(arr)
for i in range(N):
table.append([0] * N)
idx, val = i, arr[i]
table[i][i] = i
for j in range(i+1, N):
if arr[j] < val:
idx, val = j, arr[j]
table[i][j] = idx
# find answer for each query
ans = []
for L,R in queries:
idx = table[L][R]
val = arr[idx]
ans.append((val,idx))
return ans
def blockpartition(arr, queries):
"""Complexity < O(n), O(sqrt(n)) >
Build a block-partition table of minimum and combine the array search with table for answer.
Need to consider the corner case that the query range covers no block
"""
# build a block-partition table
N = len(arr)
sqrtN = int(math.sqrt(N))
block = [minargmin(arr, i, i+sqrtN-1) for i in range(0, N, sqrtN)]
def blockmin(begin, end):
try:
return min(block[begin:end])
except ValueError:
return float("inf"), -1
# search on query
ceildiv = lambda a,b: -(a // -b) # ceiling division
ans = []
for L, R in queries:
block_begin = ceildiv(L, sqrtN)
block_end = R // sqrtN
if block_begin < block_end:
# we have one block at least!
L_end = block_begin * sqrtN
R_begin = block_end * sqrtN
candidate = [
minargmin(arr, L, L_end-1),
blockmin(block_begin, block_end),
minargmin(arr, R_begin, R)
]
ans.append(min(candidate))
else:
# range too small to use any block, find minimum directly
ans.append(minargmin(arr, L, R))
return ans
def sparsetable(arr, queries):
"""Complexity < O(n log n), O(1) >
Build a sparse table for lookup, table only supports range of 2^n but can
start anywhere. Any range (L,R) can be a combination of two range at most.
"""
# preprocess a sparse table
N = len(arr)
log2 = N.bit_length()
table = [[0]*log2 for _ in range(N)]
# fill the first column
for i in range(N):
table[i][0] = (arr[i], i)
# fill the subsequent columns
for j in range(1,log2):
size = 1 << j
halfsize = size >> 1
for i in range(N):
if i+size > N:
break # unusable range
table[i][j] = min(table[i][j-1], table[i+halfsize][j-1])
# search on query
ans = []
for L, R in queries:
length = R-L+1
log2 = length.bit_length() - 1
pow2 = 1 << log2
R2 = R - pow2 + 1
# (L,R) as union of (L,L+pow2-1) and (R2,R)
ans.append(min(table[L][log2], table[R2][log2]))
return ans
def test(N, Q):
# Generate random array of N integers
arr = [random.randint(0, 5000) for _ in range(N)]
# Generate queries (L,R) for finding minimum in arr[L:R+1]
queries = [sorted([random.randint(0,N-1), random.randint(0,N-1)]) for _ in range(Q)]
print(arr)
print(f"Queries: {queries}")
print("Answers (naive):")
ans_1 = naive(arr, queries)
print(", ".join(f"arr[{idx}]={val}" for val,idx in ans_1))
print("Answers (lookup):")
ans_2 = lookup(arr, queries)
print(", ".join(f"arr[{idx}]={val}" for val,idx in ans_2))
assert ans_1 == ans_2
print("Answers (block partition):")
ans_3 = blockpartition(arr, queries)
print(", ".join(f"arr[{idx}]={val}" for val,idx in ans_3))
assert ans_1 == ans_3
print("Answers (sparse table):")
ans_4 = sparsetable(arr, queries)
print(", ".join(f"arr[{idx}]={val}" for val,idx in ans_4))
assert ans_1 == ans_4
if __name__ == "__main__":
try:
N = int(sys.argv[1])
except:
N = 100
try:
Q = int(sys.argv[2])
except:
Q = 30
test(N, Q)
```

There is Fischer-Heun structure too, but it would be more code to implement and harder to explain.

Now here’s the Arpa’s trick. First
we use a disjoint set union (DSU) for each element of a unsorted array `arr`

. A
DSU is merely some tree structure stored as an array, zero-indexed with length
N, that each element is an integer of 0 to N such that it is a pointer to
another element in the array. An element may point to itself, which is the root
of a tree. Every element in the array is an arc in a digraph and no cycle should
be formed. It is called DSU because all elements in the same tree will be
considered as in the same set. The tree would not necessarily binary. In fact,
for optimal use, it should be a very fat tree with high fan-out at root and
depth 1.

What the DSU does here is to tell on each element, what is the next number smaller than myself. It is used like this (original was in C++, converted the code into Python):

```
import numpy as np
arr = [3928, 53, 3093, 4657, 2209, 1823, 3613, 1018, 129, 32, # 10
3585, 903, 1538, 2462, 2092, 2093, 2230, 3209, 2800, 1689, # 20
4938, 3443, 386, 2725, 3363, 2351, 2696, 1641, 3931, 1073, # 30
3121, 2160, 1132, 2829, 2447, 2411, 381, 3528, 3309, 1496, # 40
4439, 4848, 4050, 2572, 158, 1076, 4222, 662, 3294, 4084, # 50
4312, 2752, 4420, 210, 4073, 1403, 800, 766, 2433, 1255, # 60
4260, 1391, 215, 1826, 488, 4379, 2582, 4896, 1245, 1328, # 70
1093, 2146, 1081, 48, 4918, 1037, 2653, 2201, 2080, 656, # 80
1124, 2575, 2037, 183, 2912, 2952, 2409, 1323, 1764, 2647, # 90
2035, 1950, 4997, 844, 2437, 2825, 4001, 3263, 3897, 2227] # 100
# variables
N = len(arr) # size of input
dsu = list(range(N)) # DSU
stack = [] # stack
# algorithm
for i in range(N):
while stack and arr[stack[-1]] >= arr[i]:
dsu[stack.pop()] = i
stack.append(i)
# print answer
with np.printoptions(precision=2, linewidth=80, suppress=True, threshold=1000):
print("Array")
print(np.array(arr).reshape(10,-1))
print("DSU")
print(np.array(dsu).reshape(10,-1))
print("Stack")
print(stack)
```

The use of numpy above is just for aesthetic display. Initially the DSU makes
every node a tree of itself. The for loop moves the cursor `i`

from index 0
till the end of the array `arr`

. All indices will be pushed into the stack. But
when we are at an index, we check with the elements at the top of the stack and
rewrite the DSU to the current index if the array’s element is greater. This
for loop essentially makes the DSU tells what is the next element that is
smaller than myself.

The above code runs to produce the following:

```
Array
[[3928 53 3093 4657 2209 1823 3613 1018 129 32]
[3585 903 1538 2462 2092 2093 2230 3209 2800 1689]
[4938 3443 386 2725 3363 2351 2696 1641 3931 1073]
[3121 2160 1132 2829 2447 2411 381 3528 3309 1496]
[4439 4848 4050 2572 158 1076 4222 662 3294 4084]
[4312 2752 4420 210 4073 1403 800 766 2433 1255]
[4260 1391 215 1826 488 4379 2582 4896 1245 1328]
[1093 2146 1081 48 4918 1037 2653 2201 2080 656]
[1124 2575 2037 183 2912 2952 2409 1323 1764 2647]
[2035 1950 4997 844 2437 2825 4001 3263 3897 2227]]
DSU
[[ 1 9 4 4 5 7 7 8 9 9]
[11 22 22 14 19 19 19 18 19 22]
[21 22 36 25 25 27 27 29 29 36]
[31 32 36 34 35 36 44 38 39 44]
[42 42 43 44 73 47 47 53 51 51]
[51 53 53 73 55 56 57 62 59 62]
[61 62 73 64 73 66 68 68 70 70]
[72 72 73 73 75 79 77 78 79 83]
[83 82 83 83 86 86 87 93 93 90]
[91 93 93 93 99 99 97 99 99 99]]
Stack
[9, 73, 83, 93, 99]
```

We can see that `dsu[0]==1`

as the next element in `arr`

that is smaller than
`arr[0]`

is `arr[1]`

. Similarly `dsu[8]==9`

. But `dsu[9]==9`

because itself is
the smallest element in the array. We also have `dsu[73]==73`

because
`arr[73]==48`

, the second smallest element in the array. We have `dsu[73] != 9`

because we always have `dsu[i] >= i`

for it points to the *next* smaller
element. Upon finish, the stack has all the elements that has `dsu[i]==i`

,
i.e., the root of trees. The DSU partitions nodes into index ranges 0 to
\(k_1\), then \((k_1+1)\) to \(k_2\), etc., with final partition
\((k_{n-1}+1)\) to \(k_n=N-1\). Each partition is one set such that if we
navigate from index `i`

according to the index pointed by `dsu[i]`

, we
eventually reached and stay at \(k_m\) which `i`

was started in the same
partition. For example, let’s pick `i==23`

. We see this:

```
dsu[23] == 25
dsu[25] == 27
dsu[27] == 29
dsu[29] == 36
dsu[36] == 44
dsu[44] == 73
dsu[73] == 73
```

and we will never jump to below 23 or above 73. Now we can see that the line
`dsu[stack.pop()] = i`

is to merge an existing set to a new root in DSU.

This is a powerful structure because for any range \((L,R)\) with \(R\) at the
end of the array (or DSU), we just need to start with the index \(L\) of the
DSU and navigate until we meet the root. Then it is the minimum of the range.
In fact, the DSU we built in the above loop is incremental. When the loop index
is at `i`

, we processed the input array `arr`

up to index `i`

and the DSU works
as we described in the above paragraph for up to index `i`

too.

Here is how we can use this DSU trick to do the RMQ. Consider the queries and additional code below:

```
queries = [[61, 78], [53, 74], [14, 26], [15, 96], [63, 80],
[ 3, 62], [ 1, 49], [ 2, 57], [ 9, 33], [16, 83],
[69, 80], [62, 84], [25, 58], [29, 75], [28, 55],
[12, 53], [52, 97], [11, 96], [66, 98], [ 9, 27],
[39, 86], [23, 88], [22, 96], [66, 68], [56, 83],
[ 3, 7], [31, 44], [ 9, 88], [ 5, 60], [18, 71]]
Q = len(queries) # number of queries
qu = [[] for _ in range(N)] # 2D array to group queries by R-end
ans = [None]*Q # to hold answers
# set up 2D array of queries
for i,(L,R) in enumerate(queries):
qu[R].append(i)
# helper function on DSU
def root(k):
if dsu[k] == k:
return k
dsu[k] = root(dsu[k])
return dsu[k]
# algorithm
for i in range(N):
while stack and arr[stack[-1]] >= arr[i]:
dsu[stack.pop()] = i
stack.append(i)
for j in qu[i]:
idx = root(queries[j][0])
ans[j] = [idx, arr[idx]]
```

The function `root(k)`

is to get the root of a tree that element `k`

belongs in
the DSU. When it is invoked, it may mutate the DSU to make it more efficient by
bringing nodes to direct element of the root (a.k.a. *path compression*). Hence
after we called `root(23)`

, we will see `dsu[23]==73`

so as `dsu[25]==73`

etc.,
due to the recursive nature of the function. This is allowed because the
ultimate use of DSU is to find the root, i.e., the minimum element found so far
starting from index `i`

, rather than what is the next smaller element.

Range queries are presented as a pair \((L,R)\) which, in Python notation, is
to find `min(arr[L:R+1])`

. We assumed \(L\le R\). We used a kind of radix sort
to organize the queries by the \(R\) term. And we answer those queries right
after we updated the DSU up to index \(R\). We can’t wait until the DSU is
fully processed for the entire array or otherwise the DSU will reflect the
minimum in ranges of \((L,N-1)\) instead.

The following is the complete code, with the model answer to verify the correctness of the code:

```
import numpy as np
# input array
arr = [3928, 53, 3093, 4657, 2209, 1823, 3613, 1018, 129, 32, # 10
3585, 903, 1538, 2462, 2092, 2093, 2230, 3209, 2800, 1689, # 20
4938, 3443, 386, 2725, 3363, 2351, 2696, 1641, 3931, 1073, # 30
3121, 2160, 1132, 2829, 2447, 2411, 381, 3528, 3309, 1496, # 40
4439, 4848, 4050, 2572, 158, 1076, 4222, 662, 3294, 4084, # 50
4312, 2752, 4420, 210, 4073, 1403, 800, 766, 2433, 1255, # 60
4260, 1391, 215, 1826, 488, 4379, 2582, 4896, 1245, 1328, # 70
1093, 2146, 1081, 48, 4918, 1037, 2653, 2201, 2080, 656, # 80
1124, 2575, 2037, 183, 2912, 2952, 2409, 1323, 1764, 2647, # 90
2035, 1950, 4997, 844, 2437, 2825, 4001, 3263, 3897, 2227] # 100
# queries, (L,R) inclusive
queries = [[61, 78], [53, 74], [14, 26], [15, 96], [63, 80],
[ 3, 62], [ 1, 49], [ 2, 57], [ 9, 33], [16, 83],
[69, 80], [62, 84], [25, 58], [29, 75], [28, 55],
[12, 53], [52, 97], [11, 96], [66, 98], [ 9, 27],
[39, 86], [23, 88], [22, 96], [66, 68], [56, 83],
[ 3, 7], [31, 44], [ 9, 88], [ 5, 60], [18, 71]]
# model answers, (idx, value)
model_answer = [[73, 48], [73, 48], [22, 386], [73, 48], [73, 48],
[ 9, 32], [ 9, 32], [ 9, 32], [ 9, 32], [73, 48],
[73, 48], [73, 48], [44, 158], [73, 48], [44, 158],
[44, 158], [73, 48], [73, 48], [73, 48], [ 9, 32],
[73, 48], [73, 48], [73, 48], [68,1245], [73, 48],
[ 7,1018], [44, 158], [ 9, 32], [ 9, 32], [44, 158]]
# variables
N = len(arr) # size of input
Q = len(queries) # number of queries
stack = [] # stack
dsu = list(range(N)) # DSU
ans = [None]*Q
qu = [[] for _ in range(N)] # 2D array of queries
# set up 2D array of queries
for i,(L,R) in enumerate(queries):
qu[R].append(i)
# helper function on DSU
def root(k):
if dsu[k] == k:
return k
dsu[k] = root(dsu[k])
return dsu[k]
# algorithm
for i in range(N):
while stack and arr[stack[-1]] >= arr[i]:
dsu[stack.pop()] = i
stack.append(i)
for j in qu[i]:
idx = root(queries[j][0])
ans[j] = [idx, arr[idx]]
# print answer
with np.printoptions(precision=2, linewidth=80, suppress=True, threshold=1000):
print("Array")
print(np.array(arr).reshape(10,-1))
print("Queries")
print(np.array(queries))
print("Answers")
print(np.array(ans))
print("(correct)" if ans == model_answer else "(incorrect)")
```

This code processed the input array sequentially. When the DSU is built, the while-loop will run in the order of the size of the stack. Similarly at the inner for-loop, the DSU root query will be done in the order of the depth of a tree (\(O(\log n)\)). When the path compression is performed, a DSU should be run in the order of \(O(\alpha(n))\) for \(\alpha(n)\) the inverse Ackermann function, which is very close to \(O(1)\). Hence the total overall time complexity (so as space complexity) should be close to \(O(N+Q)\).