Let’s hold back for a moment and talk about why we can’t stop at tuning solvers.

Contents

Background
Knapsack problem
Assignment problem
Summary
References

Background

After practicing the "integer programming model + solver" approach for a while, I felt I had found a master key: every combinatorial optimization problem could be solved this way.

But I also knew that many classic combinatorial optimization problems have their own classic algorithms, such as dynamic programming for the knapsack problem and the Hungarian algorithm for the assignment problem.

This raised a question for me: what is the value of these classic algorithms? Are they still worth learning and using?

The rest of this article examines the solution methods and results for the knapsack problem and the assignment problem in order to answer that question.

Knapsack problem

The knapsack problem can be described as follows: given $n$ items with weights $w_1, w_2, \ldots, w_n$ and values $v_1, v_2, \ldots, v_n$, and a knapsack with maximum load capacity $W$, find the subset of items with the highest total value that fits into the knapsack.

Mathematical model

Integer programming

Ignore the classical algorithm for the moment and model it directly as an integer programming problem.

Define $x_i$ to indicate whether item $i$ is put into the knapsack: $x_i = 0$ means it is left out and $x_i = 1$ means it is put in.

The following integer programming model can then be established:
$$\begin{aligned} \max \quad & \sum_{i=1}^n v_i x_i \\ \text{s.t.} \quad & \sum_{i=1}^n w_i x_i \le W \\ & x_i \in \{0,1\}, \quad i=1,2,\ldots,n \end{aligned}$$
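To make the model concrete, here is a minimal sketch that enumerates every 0-1 vector $x$ for a tiny instance and keeps the best one that satisfies the capacity constraint. The function name `knapsack_brute_force` and the toy data are assumptions made up for illustration; they are not part of the experiments in this article, and this exhaustive search is only practical for a handful of items.

```python
from itertools import product


def knapsack_brute_force(weights, values, capacity):
    """Enumerate every 0-1 vector x and keep the best feasible one."""
    best_value, best_x = 0, tuple(0 for _ in weights)
    for x in product((0, 1), repeat=len(weights)):
        total_weight = sum(w * xi for w, xi in zip(weights, x))
        total_value = sum(v * xi for v, xi in zip(values, x))
        # Feasibility check: sum_i w_i x_i <= W
        if total_weight <= capacity and total_value > best_value:
            best_value, best_x = total_value, x
    return best_value, best_x


if __name__ == '__main__':
    # Hypothetical toy data, chosen only for illustration
    weights = [2, 3, 4, 5]
    values = [3, 4, 5, 6]
    print(knapsack_brute_force(weights, values, capacity=5))
```

On this toy instance the best choice is the first two items (total weight 5, total value 7). The search visits $2^n$ vectors, which is why a solver or a dedicated algorithm is needed for realistic sizes.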

Dynamic programming

As for the classic algorithm for the knapsack problem, anyone who has practiced on LeetCode will know it is dynamic programming. Since the principles of dynamic programming are not the focus of this article, I only point to the links in the references; interested readers can look them up there.
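For quick reference, the standard 0-1 knapsack recurrence can be written as follows. The notation $f(i,j)$, meaning the best total value achievable using only the first $i$ items with capacity $j$, is introduced here for illustration and does not come from the linked articles.

$$f(0,j)=0, \qquad f(i,j)=\begin{cases} f(i-1,j), & j < w_i \\ \max\{\,f(i-1,j),\; f(i-1,j-w_i)+v_i\,\}, & j \ge w_i \end{cases}$$

The answer is $f(n,W)$. Filling the table takes $O(nW)$ updates, and a one-dimensional array suffices if $j$ is traversed from $W$ downward, so that each item is counted at most once.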

Simulation

The following Python code solves the knapsack problem with both the integer programming approach and the dynamic programming algorithm. Changing the value of $N$ changes the scale of the problem, so we can directly compare the two approaches at different problem sizes, in terms of both the quality of the solution found and the solution time.

from ortools.linear_solver import pywraplp
import numpy as np
import time


def calc_by_ortools(N, w, v, W):
    # Create the OR-Tools solver with the SCIP backend
    solver = pywraplp.Solver.CreateSolver('SCIP')

    # Decision variables: one 0-1 variable per item
    x = {}
    for j in range(N):
        x[j] = solver.IntVar(0, 1, 'x[%i]' % j)

    # Objective: maximize the total value (int() converts NumPy integers)
    obj_expr = [int(v[j][0]) * x[j] for j in range(N)]
    solver.Maximize(solver.Sum(obj_expr))

    # Constraint: total weight must not exceed the capacity W
    cons_expr = [int(w[j][0]) * x[j] for j in range(N)]
    solver.Add(solver.Sum(cons_expr) <= W)

    # Solve the model
    status = solver.Solve()

    # Report the result
    if status == pywraplp.Solver.OPTIMAL:
        # Solved to optimality: print the optimal objective value
        print('ortools, best_f =', solver.Objective().Value())
    else:
        # No optimal solution was found
        print('not converge.')


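def calc_by_dp(weight, value, bag_weight):
    # dp[j] is the best value achievable with capacity j; start from all zeros
    dp = [0] * (bag_weight + 1)

    # Outer loop over items, inner loop over capacities
    for i in range(len(weight)):
        # Iterate capacities downward so each item is used at most once
        for j in range(bag_weight, weight[i][0] - 1, -1):
            # Recurrence: skip item i, or take it and add its value
            dp[j] = max(dp[j], dp[j - weight[i][0]] + value[i][0])
    print('dp, best_f =', dp[-1])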


if __name__ == '__main__':
    # Fix the random seed so every run generates the same data
    np.random.seed(0)

    # Number of items N, weights w, values v, knapsack capacity W
    N = 1000
    w = np.random.randint(1, 10, (N, 1))
    v = np.random.randint(1, 100, (N, 1))
    W = int(N / 10)
    print('N = ', N)

    # Solve with OR-Tools and measure the elapsed time
    t0 = time.time()
    calc_by_ortools(N, w, v, W)
    print('ortools time: {} s'.format(time.time() - t0))

    # Solve with dynamic programming and measure the elapsed time
    t1 = time.time()
    calc_by_dp(w, v, W)
    print('dp time: {} s'.format(time.time() - t1))

The following table shows detailed performance data for the two algorithms at different values of $N$, where ortools refers to the integer programming approach and dp refers to the dynamic programming algorithm.

From the perspective of solution quality, both algorithms find the global optimum, so there is no difference.

In terms of efficiency, however, the two differ considerably. When $N<1000$, ortools takes longer than dp, but both times are very small in absolute terms. At $N=1000$ the two times are already close. As $N$ increases further, ortools becomes faster than dp, and the dp time clearly grows faster than the ortools time.

N	Algorithm	Optimal value	Time (s)
10	ortools	89	0.0085
10	dp	89	0.0000
100	ortools	616	0.0117
100	dp	616	0.0007
1000	ortools	6154	0.0424
1000	dp	6154	0.0629
10000	ortools	60509	0.4257
10000	dp	60509	7.6769
100000	ortools	617258	5.111
100000	dp	617258	730.8

At first glance there seems to be nothing wrong with this comparison. But while writing the summary I remembered that OR-Tools is implemented in C++, whereas my dp is written in Python. To check whether dp was being penalized by the programming language, I rewrote it in Java and tried again (don't ask why not C++; the answer is that I can't write it).

The Java implementation is shown below. Its overall logic matches the Python version, so I won't go into details.

import java.util.Random;
import com.google.ortools.Loader;
import com.google.ortools.linearsolver.MPConstraint;
import com.google.ortools.linearsolver.MPObjective;
import com.google.ortools.linearsolver.MPSolver;
import com.google.ortools.linearsolver.MPVariable;

public class ZeroOnePack {

    // Preload the OR-Tools native libraries
    static {
        Loader.loadNativeLibraries();
    }

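    public static void DP(int W, int N, int[] weight, int[] value) {
        // Dynamic programming: dp[j] is the best value with capacity j
        int[] dp = new int[W + 1];
        for (int i = 1; i < N + 1; i++) {
            // Iterate capacities in reverse so each item is used at most once
            for (int j = W; j >= weight[i - 1]; j--) {
                dp[j] = Math.max(dp[j - weight[i - 1]] + value[i - 1], dp[j]);
            }
        }

        // Print the optimal value
        System.out.println("DP, best_f: " + dp[W]);
    }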

    public static void orToolsMethod(int W, int N, int[] weight, int[] value) {
        // Create the solver
        MPSolver solver = MPSolver.createSolver("SCIP");
        if (solver == null) {
            System.out.println("Could not create solver SCIP");
            return;
        }

        // Decision variables: one 0-1 variable per item
        MPVariable[] x = new MPVariable[N];
        for (int j = 0; j < N; ++j) {
            x[j] = solver.makeIntVar(0.0, 1, "");
        }

        // Objective: maximize the total value
        MPObjective objective = solver.objective();
        for (int j = 0; j < N; ++j) {
            objective.setCoefficient(x[j], value[j]);
        }
        objective.setMaximization();

        // Constraint: total weight between 0 and W
        MPConstraint constraint = solver.makeConstraint(0, W, "");
        for (int j = 0; j < N; ++j) {
            constraint.setCoefficient(x[j], weight[j]);
        }

        // Solve the model
        MPSolver.ResultStatus resultStatus = solver.solve();

        if (resultStatus == MPSolver.ResultStatus.OPTIMAL) {
            // Solved to optimality: print the optimal objective value
            System.out.println("ortools, best_f = " + objective.value());
        } else {
            // No optimal solution was found
            System.err.println("The problem does not have an optimal solution.");
        }
    }

    public static void main(String[] args) {
        // Fix the random seed so every run generates the same data
        Random rand = new Random(0);

        // Number of items N, weights, values, knapsack capacity W
        int N = 1000000;

        int[] weight = new int[N];
        for (int i = 0; i < weight.length; i++) {
            weight[i] = rand.nextInt(10) + 1;
        }

        int[] value = new int[N];
        for (int i = 0; i < value.length; i++) {
            value[i] = rand.nextInt(100) + 1;
        }

        int W = N / 10;
        System.out.println("N = " + N);

        // Solve with OR-Tools and measure the elapsed time
        long start = System.currentTimeMillis();
        orToolsMethod(W, N, weight, value);
        System.out.println("cost time: " + (System.currentTimeMillis() - start) + " ms");

        // Solve with dynamic programming and measure the elapsed time
        start = System.currentTimeMillis();
        DP(W, N, weight, value);
        System.out.println("cost time: " + (System.currentTimeMillis() - start) + " ms");
    }
}

The following table shows detailed performance data for the two algorithms (Java version) at different values of $N$. Note the fourth column: the time unit here is ms, whereas in the Python version it is s.

Switching to Java makes dp significantly faster. For example, at $N = 100000$ the Java version takes about 2 s, while the Python version takes about 730 s.

Even so, the conclusion of the comparison is unchanged: both algorithms find the optimal solution, but in terms of computational efficiency ortools first lags behind and then takes the lead.

N	Algorithm	Optimal value	Time (ms)
10	ortools	78	9
10	dp	78	0
100	ortools	481	10
100	dp	481	0
1000	ortools	6224	17
1000	dp	6224	3
10000	ortools	60442	81
10000	dp	60442	22
100000	ortools	603439	2405
100000	dp	603439	2039
200000	ortools	1207108	3128
200000	dp	1207108	6994
1000000	ortools	6037100	15614
1000000	dp	6037100	135898

Result analysis

The dynamic programming algorithm obtains the global optimum for the knapsack problem because the problem satisfies the principle of optimality and has no aftereffects. Its time complexity is $O(nW)$. However, $W$ is a numeric input whose value can be exponential in the length of its encoding (the number of bits needed to write it down), so this is a pseudo-polynomial algorithm rather than a polynomial one. In the experiments above $W = N/10$, so the running time grows roughly as $N^2$, which is why the dp time increases so quickly as $N$ increases.
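The growth of the dp running time can be made tangible by counting how often the inner loop body would execute. The sketch below is a rough estimate under a simplifying assumption that is not in the experiments: every item is given the same weight of 5 (the experiments draw random weights), and the helper name `dp_update_count` is made up for illustration.

```python
def dp_update_count(weights, capacity):
    """Count how many times the inner DP loop body would run."""
    # For an item of weight w, j runs from capacity down to w,
    # which is capacity - w + 1 iterations (or none if w > capacity).
    return sum(max(capacity - w + 1, 0) for w in weights)


if __name__ == '__main__':
    for n in (1000, 10000, 100000):
        capacity = n // 10      # same rule as in the experiments: W = N / 10
        weights = [5] * n       # hypothetical: every item weighs 5
        print(n, dp_update_count(weights, capacity))
```

Under this assumption each tenfold increase in $N$ multiplies the number of updates by roughly 100, reaching about $10^9$ at $N = 100000$. That is in line with the measured Python dp times, which grow from 0.0629 s to 7.6769 s to 730.8 s.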

Integer programming is also guaranteed to reach the global optimum. It is not a polynomial-time method either, so its solution time also grows considerably as $N$ increases. Nevertheless, according to the measurements, for large $N$ integer programming is still more efficient than the dynamic programming algorithm.

Assignment problem

After analyzing the knapsack problem, let's study the assignment problem.

The assignment problem can be described as follows: $n$ tasks are to be assigned to $n$ people. Each person can be assigned only one task, and each task can be assigned to only one person. Assigning a task to a person requires paying compensation. Find the assignment that minimizes the total compensation paid.

Mathematical model

Integer programming

Let the compensation be given by the matrix $\pmb C_{n\times n}$, where $c_{i,j}$ is the compensation to be paid when person $i$ performs task $j$.

Define $x_{i,j}$ to indicate whether person $i$ is assigned task $j$: $x_{i,j}=0$ means not assigned and $x_{i,j}=1$ means assigned.

The following mathematical programming model can then be established:
$$\begin{aligned} \min \quad & \sum_{i=1}^n\sum_{j=1}^n c_{i,j}x_{i,j} \\ \text{s.t.} \quad & \sum_{j=1}^n x_{i,j}=1, \quad i=1,2,\ldots,n \\ & \sum_{i=1}^n x_{i,j}=1, \quad j=1,2,\ldots,n \\ & x_{i,j} \in \{0,1\}, \quad i,j=1,2,\ldots,n \end{aligned}$$
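The two equality constraints mean that every feasible $x$ corresponds to a permutation: person $i$ receives exactly one task and no task is shared. As a minimal sketch, the model can therefore be checked on a tiny instance by trying every permutation. The function name `assignment_brute_force` and the 3×3 cost matrix are assumptions chosen for illustration, not data from this article.

```python
from itertools import permutations


def assignment_brute_force(cost):
    """Try every permutation; perm[i] is the task given to person i."""
    n = len(cost)
    best_cost, best_perm = None, None
    for perm in permutations(range(n)):
        # Objective: sum_i c[i][perm[i]], i.e. the x_{i,j} = 1 entries
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


if __name__ == '__main__':
    # Hypothetical 3x3 compensation matrix, for illustration only
    cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
    print(assignment_brute_force(cost))
```

For this matrix the cheapest assignment gives person 0 task 1, person 1 task 0, and person 2 task 2, for a total of 5. The search space has $n!$ permutations, so this check is only usable for very small $n$.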

Hungarian algorithm

The classic algorithm for the assignment problem is the Hungarian algorithm. It is not as easy to implement as dynamic programming, so it is more convenient to use a ready-made library. This article uses the linear_sum_assignment function from the scipy.optimize package. The principle behind its implementation is described in the paper "On implementing 2D rectangular assignment algorithms". It is said to be essentially still the Hungarian algorithm, but since this does not affect the conclusions of this article, I did not study it carefully.

Simulation

The following Python code solves the assignment problem with both the integer programming approach and the Hungarian algorithm. Changing the value of $N$ changes the size of the assignment problem, so we can easily compare the two approaches at different problem sizes, in terms of both solution quality and solution time.

from ortools.linear_solver import pywraplp
from scipy.optimize import linear_sum_assignment
import numpy as np
import time


def calc_by_ortools(C):
    # Create the OR-Tools solver with the SCIP backend
    solver = pywraplp.Solver.CreateSolver('SCIP')
    m = C.shape[0]
    n = C.shape[1]

    # Decision variables: x[i, j] = 1 if person i is assigned task j
    x = {}
    for i in range(m):
        for j in range(n):
            x[i, j] = solver.IntVar(0, 1, 'x[%i,%i]' % (i, j))

    # Objective: minimize the total compensation (int() converts NumPy integers)
    obj_expr = [int(C[i][j]) * x[i, j] for i in range(m) for j in range(n)]
    solver.Minimize(solver.Sum(obj_expr))

    # Constraints: each person gets exactly one task
    for i in range(m):
        cons_expr = [x[i, j] for j in range(n)]
        solver.Add(solver.Sum(cons_expr) == 1)

    # Constraints: each task goes to exactly one person
    for j in range(n):
        cons_expr = [x[i, j] for i in range(m)]
        solver.Add(solver.Sum(cons_expr) == 1)

    # Solve the model
    status = solver.Solve()

    # Report the result
    if status == pywraplp.Solver.OPTIMAL:
        # Solved to optimality: print the optimal objective value
        print('ortools, best_f =', solver.Objective().Value())
    else:
        # No optimal solution was found
        print('not converge.')


def calc_by_scipy(C):
    # Call the library routine linear_sum_assignment
    row_ind, col_ind = linear_sum_assignment(C)
    # Print the optimal objective value (use the argument C, not the global cost)
    print('scipy, best_f =', C[row_ind, col_ind].sum())


if __name__ == '__main__':
    # Fix the random seed so every run generates the same data
    np.random.seed(0)

    # Dimension of the compensation matrix
    N = 1000
    # Compensation values are random integers from 10 to 99
    cost = np.random.randint(10, 100, (N, N))
    print('N = ', N)

    # Solve with OR-Tools and measure the elapsed time
    t0 = time.time()
    calc_by_ortools(cost)
    print('ortools time: {} s'.format(time.time() - t0))

    # Solve with SciPy's modified Jonker-Volgenant algorithm and measure the elapsed time
    t1 = time.time()
    calc_by_scipy(cost)
    print('scipy time: {} s'.format(time.time() - t1))

The following table shows detailed performance data for the two algorithms at different values of $N$, where ortools refers to the integer programming approach and scipy refers to the Hungarian algorithm.

From the perspective of solution quality, both algorithms always find the global optimum, so there is no difference.

In terms of solution time, ortools is always slower than scipy. As $N$ increases, the scipy time grows slowly, while the ortools time grows very quickly.

N	Algorithm	Optimal value	Time (s)
10	ortools	222	0.0136
10	scipy	222	0
50	ortools	621	0.1599
50	scipy	621	0.0001
100	ortools	1087	0.9516
100	scipy	1087	0.0003
300	ortools	3034	9.9593
300	scipy	3034	0.0047
500	ortools	5004	24.89
500	scipy	5004	0.0118
1000	ortools	10000	177.5
1000	scipy	10000	0.0396

Result analysis

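The integer programming results are not analyzed again here, since the reasoning is basically the same as for the knapsack problem. Let's take a brief look at the Hungarian algorithm.

In terms of its principle, it is a polynomial-time algorithm designed around the specific structure of the assignment problem, which is why its running time is so short.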

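Summary

After this analysis the conclusion is almost self-evident. Modeling a combinatorial optimization problem as an integer program and handing it to a solver is essentially a general-purpose approach. Because many companies are devoted to continually improving solver efficiency, this general approach currently performs quite well overall. Algorithms designed for a specific problem, by contrast, can be understood as tailored solutions that exploit the problem's own structure in search of a more efficient method.

Based on this understanding, when solving a practical problem, the first choice should be to model it as a classic problem that has a dedicated algorithm and polynomial complexity. The second choice is to model it as a classic problem that has a dedicated algorithm but non-polynomial complexity; in that case the efficiency and accuracy of the classic algorithm and of integer programming need to be compared. Only as a last resort should the problem be modeled directly as an integer program.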

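References

Knapsack problem, dynamic programming Python code: https://blog.csdn.net/m0_51370744/article/details/127120649

Knapsack problem, dynamic programming Java code: https://blog.csdn.net/baidu_41602099/article/details/110383230

Assignment problem and the Hungarian algorithm: https://zhuanlan.zhihu.com/p/103125599

Assignment problem, principle of the scipy algorithm: https://sci-hub.se/10.1109/TAES.2016.140952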