Recently I encountered a problem of mixed integer programming with linear constraints but non-linear convex objective function. There do not seem to have any nice solver package available in Python. Even if I tried to look for anything with a C/C++ API, there are not many open source solutions, AFAIK. Therefore my second attempt is to solve the problem heuristically. It turns out the problem structure may allow a simple hill climbing (i.e., greedy search algorithm). However, while the objective function is convex, I cannot prove the solution domain is contiguous in all cases, hence I cannot guarantee there will not be any local minima. So play safe and use simulated annealing can be a good move.
Simulated annealing is a metaheuristic algorithm which means it depends on a handful of other parameters to work. But a simple skeleton algorithm is as follows:
def simulated_annealing(s0, k_max): s = s0 for k in range(k_max): T = temperature(k/k_max) s_new = neighbour(s) if P(E(s), E(s_new), T) >= random.random(): s = s_new return s
It starts searching with an initial solution
s0 at temperature
depends on an annealing schedule
temperature(r). It keeps looking for a
neighbour(s) and accept such neighbour by a probability
P(), which in turn depends on the energy level
E() of solutions.
The acceptance probability function (e.g., in Metropolis-Hastings algorithm) is
exponential for to be the
energy level of
s and its neighbour respectively. The function
raise above 1 if the neighbour of
s has lower energy (i.e. better) and in
such case the neighbour is accepted deterministically.
Variations of algorithm
Because simulated annealing is a metaheuristic algorithm, there are many variations available. A paper, Adaptive simulated annealing (ASA): Lessons learned, gave a very concise summary.
The original simulated annealing is modeled after Boltzmann equation. Thus named as Boltzmann Annealing (BA). It has a solution modelled as a -vector with the annealing schedule for denote the step count. It search a solution space by the probability density
where which usually is the current solution and is the neighbour. This is modelling the class of physical systems that called the Gaussian-Markovian systems. And the acceptance probability function is
where and this is also the form of probability function mentioned above. The Boltzmann Annealing is characterized by the annealing schedule . It is proved that with such schedule, any point in the solution space can be sampled infinitely often in annealing time, i.e., , or the sampling is ergodic.
If we switch to use the exponential schedule
this will make the annealing faster but violates the assumption of ergodic sampling in BA. But people find this useful and named this as simulated quenching. Such schedule is also called the exponential cooling scheme (ECS).
Another variation is to use the Cauchy distribution for solution probability density:
which has a fatter tail than the Gaussian distribution as in BA. Therefore, it is easier to get around the local minima while searching for the global minima. However, such density allows statistically same ergodicity as BA if the annealing schedule is ., due to
Therefore, it is faster while similarly effective. This one is called fast annealing (FA).
Adaptive Simulated Annealing (ASA) is a further extension that allows each component of the solution vector to take a different domain, with different density function, and undergo a different annealing schedule. This expands the number of parameters to the algorithm to many folds.
The literature of simulated annealing usually do not explicitly concern about constrained optimization. One way to apply SA to constrained optimization is to implement the logic in the neighbour function so that each neighbour used is always within constrain. Another way is to implement the constrain in the objective function as penalties, e.g.
for each the magnitude of violation in constrain and the corresponding weight in penalty. This approach may not be a good one especially if the valid solutions are sparse.
SA algorithm is mostly controlled by the temperate and its annealing schedule.
The well-received believe is that the initial temperature should be set to such a level that there is at least 80% probability to accept an increase in the objective function. And during the last 40% of annealing time, the algorithm should search in proximity of the optimal solution. One way to find the initial temperature is to estimate (or observe) the average objective increase and then set for the initial probability of acceptance (e.g., ).
The annealing schedule can also be fine-tuned such that there can be more than one neighbour searched in every temperature step . Indeed, for a fixed temperature, the search and transition between neighbours forms a Markov Chain. We can designate a maximum length for a Markov Chain at step such that we either check a total of solutions or accepted solutions, then the temperate should be decreased for one step.
I found a Python package simanneal for simulated annealing. SA algorithm is simple enough that we can code our own, this package may not be an absolute necessary. However, it is simple enough to use. Let’s start with the code:
#!/usr/bin/env python import copy import random from simanneal import Annealer # from https://neos-guide.org/content/sudoku _ = 0 PROBLEM = [ 1, _, _, _, _, 6, 3, _, 8, _, _, 2, 3, _, _, _, 9, _, _, _, _, _, _, _, 7, 1, 6, 7, _, 8, 9, 4, _, _, _, 2, _, _, 4, _, _, _, 9, _, _, 9, _, _, _, 2, 5, 1, _, 4, 6, 2, 9, _, _, _, _, _, _, _, 4, _, _, _, 7, 6, _, _, 5, _, 7, 6, _, _, _, _, 3, ] def print_sudoku(state): border = "------+------+------" rows = [state[i:i+9] for i in range(0,81,9)] for i,row in enumerate(rows): if i % 3 == 0: print(border) three = [row[i:i+3] for i in range(0,9,3)] print(" |".join( " ".join(str(x or "_") for x in one) for one in three )) print(border) def init_solution_row(problem): """Generate a random solution from a Sudoku problem """ solution =  for i in range(0, 81, 9): row = problem[i:i+9] permu = [n for n in range(1,10) if n not in row] random.shuffle(permu) solution.extend([n or permu.pop() for n in row]) return solution def coord(row, col): return row*9 + col class Sudoku_Row(Annealer): def __init__(self, problem): self.problem = problem state = init_solution_row(problem) super().__init__(state) def move(self): """randomly swap two cells in a random row""" row = random.randrange(9) coords = range(9*row, 9*(row+1)) n1, n2 = random.sample([n for n in coords if self.problem[n] == 0], 2) self.state[n1], self.state[n2] = self.state[n2], self.state[n1] def energy(self): """calculate the number of violations: assume all rows are OK""" column_score = lambda n: -len(set(self.state[coord(i, n)] for i in range(9))) square_score = lambda n, m: -len(set(self.state[coord(3*n+i, 3*m+j)] for i in range(3) for j in range(3))) return sum(column_score(n) for n in range(9)) + sum(square_score(n,m) for n in range(3) for m in range(3)) def coord(row, col): return row*9+col def main(): sudoku = Sudoku_Row(PROBLEM) sudoku.copy_strategy = "slice" sudoku.steps = 1000000 print_sudoku(sudoku.state) state, e = sudoku.anneal() print("\n") print_sudoku(state) print(e) if __name__ == "__main__": main()
Instead of try with the traditional unconstrained continuous minimization
problem, we go with an unnatural choice of a constrained combinatorial problem.
We pick the sudoku problem from https://neos-guide.org/content/sudoku, and
simanneal module. The implementation is easy: Just derive a class
Annealer and define two member functions
neighbour search and objective function to minimize respectively. The SA
algorithm should start with initializing the
Annealer with an initial
solution and call
anneal() to get the optimal state variable and its
corresponding energy level.
The above program try to find a permutation of 1 to 9 in each row and check if it comes up with a solution to the sudoku problem. Every step is to pick a row in the original state and swap two positions. The energy function is to count the number of distinct elements in each row and square block and take the negation of the sum. A perfect solution will be . This program does not work well. It does not come up with a correct solution. So my second attempt is not to shuffle by row but by square blocks:
import copy import random import numpy as np from simanneal import Annealer # from https://neos-guide.org/content/sudoku _ = 0 PROBLEM = np.array([ 1, _, _, _, _, 6, 3, _, 8, _, _, 2, 3, _, _, _, 9, _, _, _, _, _, _, _, 7, 1, 6, 7, _, 8, 9, 4, _, _, _, 2, _, _, 4, _, _, _, 9, _, _, 9, _, _, _, 2, 5, 1, _, 4, 6, 2, 9, _, _, _, _, _, _, _, 4, _, _, _, 7, 6, _, _, 5, _, 7, 6, _, _, _, _, 3, ]) def print_sudoku(state): border = "------+-------+------" rows = [state[i:i+9] for i in range(0,81,9)] for i,row in enumerate(rows): if i % 3 == 0: print(border) three = [row[i:i+3] for i in range(0,9,3)] print(" | ".join( " ".join(str(x or "_") for x in one) for one in three )) print(border) def coord(row, col): return row*9+col def block_indices(block_num): """return linear array indices corresp to the sq block, row major, 0-indexed. block: 0 1 2 (0,0) (0,3) (0,6) 3 4 5 --> (3,0) (3,3) (3,6) 6 7 8 (6,0) (6,3) (6,6) """ firstrow = (block_num // 3) * 3 firstcol = (block_num % 3) * 3 indices = [coord(firstrow+i, firstcol+j) for i in range(3) for j in range(3)] return indices def initial_solution(problem): """provide sudoku problem, generate an init solution by randomly filling each sq block without considering row/col consistency""" solution = problem.copy() for block in range(9): indices = block_indices(block) block = problem[indices] zeros = [i for i in indices if problem[i] == 0] to_fill = [i for i in range(1, 10) if i not in block] random.shuffle(to_fill) for index, value in zip(zeros, to_fill): solution[index] = value return solution class Sudoku_Sq(Annealer): def __init__(self, problem): self.problem = problem state = initial_solution(problem) super().__init__(state) def move(self): """randomly swap two cells in a random square""" block = random.randrange(9) indices = [i for i in block_indices(block) if self.problem[i] == 0] m, n = random.sample(indices, 2) self.state[m], self.state[n] = self.state[n], self.state[m] def energy(self): """calculate the number of violations: assume all rows are OK""" column_score = lambda n: -len(set(self.state[coord(i, n)] for i in range(9))) row_score = lambda n: -len(set(self.state[coord(n, i)] for i in range(9))) score = sum(column_score(n)+row_score(n) for n in range(9)) if score == -162: self.user_exit = True # early quit, we found a solution return score def main(): sudoku = Sudoku_Sq(PROBLEM) sudoku.copy_strategy = "method" print_sudoku(sudoku.state) #sudoku.steps = 1000000 #auto_schedule = sudoku.auto(minutes=1) #print(auto_schedule) #sudoku.set_schedule(auto_schedule) sudoku.Tmax = 0.5 sudoku.Tmin = 0.05 sudoku.steps = 100000 sudoku.updates = 100 state, e = sudoku.anneal() print("\n") print_sudoku(state) print("E=%f (expect -162)" % e) if __name__ == "__main__": main()
This version is to find a neighbour by picking a square
block randomly and swap two positions in it. Therefore each block is guaranteed
to have 1 to 9 but we have to check each row and column for consistency. We
also add assert
self.user_exit when a solution is found so we can have a
early termination. The performance is only slightly better (or may be not, I
didn’t collect enough statistics to claim this). And neither program ever find
However, if we try to use the
auto() function to find the schedule (the
commented lines in
main() above), the program start to find a solution,
albeit not always. A few other modification increased the chance of reaching a
simanneal module, the temperature update is based on this formula:
T = self.Tmax * math.exp(Tfactor * step / self.steps)
Tfactor is a constant derived from
Tmin so that it
guarantees the temperate
T goes from
Tmin over the annealing
schedule. It turns out the normal setting is having the temperature dropping
too fast. This is an adverse effect to the search that trap us in a sub-optimal
solution. If we change this to
T = 0.99999 * T, it will be easier to reach a
solution. But we can achieve the same by gauging
Tmin to closer to
each other, so we have the second modification:
Secondly, we set some particular number for
They are found empirically such that a solution can be reached in a few seconds
instead of minutes. It turns out a low
Tmax and comparable
maintain a good effectiveness by keeping the temperature decrease steady. The
following chart shows the annealing schedule according to the exponential decay
simanneal above, scaled to match the starting point at step 0. The
blue curve is using the value 0.5 and 0.05 respectively, as in the code above.
The red curve is the default value of 2500 and 2.5 respectively. The absolute
T does not matter in this chart but the ratio of
do. The value of
T, however, plays a role in the probability function of
acceptance and thus it set as such.
How about SA with constant temperature? Make
Tmax equal. How
about simple hill-climbing instead of SA? Set
Tmax to as close to zero as
possible so the acceptance probability function is virtually 0. But a problem
of combinatorial nature should not be suitable for hill-climbing search.
Lester Ingber, Adaptive simulated annealing (ASA): Lessons learned. Control and Cybernetics, 1995. arXiv:cs/0001018
Franco Busetti, Simulated annealing overview