An Algorithm for the Subset Sum Problem with O(nt) Time Complexity and O(n) Space Complexity

The Subset Sum Problem (SSP) is the following: given a set of $n$ natural numbers $S=\{a_1,a_2,\cdots,a_n\}$ and a natural number $s$, decide whether there exists a subset $T$ of $S$ such that the sum of all elements of $T$ equals $s$.

Let $t$ be the sum of all elements of $S$, i.e. $t=\sum\limits_{j=1}^{n}a_j$. Obviously, when $s>t$ the subset $T$ does not exist, and we directly output False; when $s=t$, $T=S$ is a subset satisfying the condition, and we directly output True; when $s=0$, the empty set $\emptyset$ is a subset of $S$ and the sum of its elements is $0$, so we also directly output True. Therefore, we only need to consider the case $0<s<t$.

There are currently two popular solutions. One is search (time complexity $O(2^n)$, space complexity $O(n)$); the other is dynamic programming (reducing the problem to a knapsack problem, with time complexity $O(nt)$ and, after rolling-array optimization, space complexity $O(n+t)$). The advantage of the former is that its time complexity does not depend on the size of $t$, but its worst-case running time is exponential. The advantage of the latter is that when $t$ is relatively small ($t<2^n$) it is faster than search; its disadvantages are that when $t\ge 2^n$ it is slower than search, and that it uses a lot of space (in general $t$ is much larger than $n$). This article presents a strategy that can replace dynamic programming: the time complexity is still $O(nt)$, while the space complexity is only $O(n)$. When $t$ is large we still use search; when $t$ is small we use this new strategy. In this way, when $t$ is small we have a method that is both efficient and space-saving.
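For reference, here is a minimal sketch of the rolling-array dynamic-programming baseline described above (the function name and structure are my own, not from this article):

def subset_sum_dp(a, s):
    # reachable[v] is True iff some subset of the elements processed so far sums to v
    reachable = [False] * (s + 1)
    reachable[0] = True  # the empty set sums to 0
    for x in a:
        # iterate downwards so each element is used at most once
        for v in range(s, x - 1, -1):
            if reachable[v - x]:
                reachable[v] = True
    return reachable[s]

It runs in $O(ns)$ time but needs an $O(s)$ boolean array, and that array is exactly the space cost the new strategy avoids.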

The main idea of our new strategy is to exploit the properties of complex roots of unity. In a previous article I showed how roots of unity can be used to solve a math problem cleverly (see that article for the relevant properties and their proofs); here we apply the same idea to give an algorithm for the subset sum problem.

Consider the function $f(x)=\prod\limits_{j=1}^{n}\left(1+x^{a_j}\right)$ and expand it into a polynomial $f(x)=c_0+c_1x+c_2x^2+\cdots+c_tx^t$. The coefficient $c_p$ of each term equals the number of subsets of $S=\{a_1,a_2,\cdots,a_n\}$ whose elements sum to $p$, that is, $c_p=\left|\left\{T \mid T\subseteq S,\ \mathrm{sum}(T)=p\right\}\right|$. So the problem we want to solve is essentially to decide whether $c_s$ equals $0$. Divide $f(x)$ by $x^s$:

$$\frac{f(x)}{x^s}=c_0x^{-s}+c_1x^{1-s}+\cdots+c_s+c_{s+1}x+\cdots+c_tx^{t-s}$$

Now consider the problem over the complex numbers. Let $z\in\mathbb{C}$; substituting into the above formula gives

$$z^{-s}f(z)=\sum\limits_{j=0}^{t}c_jz^{j-s}$$

Next we use the properties of roots of unity. Let $\omega=e^{\frac{2\pi i}{m}}=\cos\frac{2\pi}{m}+i\sin\frac{2\pi}{m}$ be an $m$-th root of unity, satisfying $\omega^m=1$. We also know that

$$\sum\limits_{k=0}^{m-1}\left(\omega^k\right)^u=\begin{cases}0,&u\text{ is not a multiple of }m\\m,&u\text{ is a multiple of }m\end{cases}$$

holds for any integer $u$ (positive or negative). This lets us isolate the coefficient $c_s$:

$$\sum\limits_{k=0}^{m-1}\left(\omega^k\right)^{-s}f\left(\omega^k\right)=\sum\limits_{j=0}^{t}\left[\sum\limits_{k=0}^{m-1}c_j\left(\omega^k\right)^{j-s}\right]$$

We want the value in square brackets to be $0$ whenever $j-s\ne 0$, and nonzero only when $j-s=0$, i.e. $j=s$; the value of the sum above then becomes $mc_s$, from which we can recover $c_s$. Since $j-s$ ranges from $-s$ to $t-s$, we require $-s>-m$ and $t-s<m$, so that for every $j\ne s$ the value in square brackets is forced to be $0$.
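As a quick numerical sanity check of the root-of-unity identity used above (my own illustration, not part of the original article), with $\omega=e^{2\pi i/m}$:

import cmath

def unit_root_power_sum(m, u):
    # sum_{k=0}^{m-1} (omega^k)^u with omega = e^(2*pi*i/m)
    return sum(cmath.exp(2j * cmath.pi * k * u / m) for k in range(m))

print(abs(unit_root_power_sum(7, 14)))  # 14 is a multiple of 7, so the sum is about 7
print(abs(unit_root_power_sum(7, -3)))  # -3 is not a multiple of 7, so the sum is about 0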
The range of $m$
Therefore $m>s$ and $m>t-s$. For convenience of computation, $m$ should be as small as possible, so we take $m=\max(s,t-s)+1$. Once $m$ is determined, we can compute

$$c_s=\frac{1}{m}\sum\limits_{k=0}^{m-1}\omega^{-ks}f\left(\omega^k\right)=\frac{1}{m}\sum\limits_{k=0}^{m-1}e^{-\frac{2\pi isk}{m}}f\left(e^{\frac{2\pi ik}{m}}\right)$$

Evaluating $f\left(e^{\frac{2\pi ik}{m}}\right)$ takes $O(n)$ time and the summation takes $O(m)$ time, so the total cost is $O(nm)$ time. Since $m\le t$, the worst case takes $O(nt)$ time. We spend only $O(1)$ extra space, so the total space used by the program is the $O(n)$ needed to store $a_1,a_2,\cdots,a_n$. In this way we save a great deal of space compared with the dynamic-programming (knapsack) solution.
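To illustrate the formula for $c_s$ above on a small instance (my own check, not from the article), we can compare it with brute-force subset enumeration; at this tiny scale it is fine to raise $z$ to the power $a_j$ directly, whereas the full code below avoids direct powers to reduce error:

import cmath
from itertools import combinations

def c_s_via_roots_of_unity(a, s):
    t = sum(a)
    m = max(s, t - s) + 1
    total = 0
    for k in range(m):
        z = cmath.exp(2j * cmath.pi * k / m)  # omega^k
        fz = 1
        for aj in a:
            fz *= 1 + z ** aj                 # f(omega^k)
        total += z ** (-s) * fz               # (omega^k)^(-s) * f(omega^k)
    return total / m

a = [3, 5, 8, 13]
s = 16
brute = sum(1 for r in range(len(a) + 1)
              for c in combinations(a, r) if sum(c) == s)
print(round(abs(c_s_via_roots_of_unity(a, s))), brute)  # both are 2: {3, 13} and {3, 5, 8}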

In general, our algorithm is designed as follows. First, handle the cases $s=0$, $s=t$, and $s>t$. Next, compute $m=\max(s,t-s)+1$; if $m>2^n$, call the search algorithm, and if $m\le 2^n$, use the formula for $c_s$ above. Note that in the complex-number calculations we repeatedly use Euler's formula $e^{i\theta}=\cos\theta+i\sin\theta$. When evaluating the function $f$, my idea is to pass in $\theta=\frac{2\pi k}{m}$ as the parameter (instead of passing in the complex number $z=e^{i\theta}$ directly); since $f(z)=\prod\limits_{j=1}^{n}\left(1+z^{a_j}\right)$, we have $z^{a_j}=\cos a_j\theta+i\sin a_j\theta$, which avoids computing the $a_j$-th power of the complex number $z$ directly and thereby reduces the error. In the end we obtain $c_s$. In theory, if no subset summing to $s$ exists then $c_s$ should be $0$, but because of rounding errors in the computation we relax the condition "the subset does not exist" to $|c_s|<\frac{1}{2}$, which largely avoids problems caused by error. Moreover, when $c_s$ is computed accurately enough, it is in fact equal to the number of subsets whose sum is $s$.

The complete Python code is as follows:

# encoding: utf-8

import math
from typing import List

class SubsetSumSolver:
    def __init__(self, a: List[int], s: int):
        self.a = a
        self.s = s
        self.n = len(a)
        self.t = sum(a)
    def search(self, u: int, m: int) -> bool: # brute-force search, used when m > 2^n
        # u: number of leading elements of a still available; m: remaining target sum
        if m == 0:
            return True
        if u == 0:
            return False
        if m >= self.a[u - 1] and self.search(u - 1, m - self.a[u - 1]): # take a[u-1]
            return True
        return self.search(u - 1, m) # skip a[u-1]
    def f(self, theta: float) -> complex:
        r = 1.
        for h in self.a: # h: a_j
            arg = h * theta # a_j*θ
            r  *= 1 + math.cos(arg) + 1j * math.sin(arg) # r*=(1+e^(i*a_j*θ))
        return r
    def complex_method(self) -> complex:
        m = max(self.s, self.t - self.s) + 1
        r = 0 # result
        for k in range(m):
            theta = 2 * math.pi * k / m
            f_result = self.f(theta) # f(e^(2πik/m))
            theta *= -self.s
            zs = math.cos(theta) + 1j * math.sin(theta) # e^(-2πisk/m)
            r += zs * f_result
        r /= m
        return r
    def solve(self) -> bool:
        if self.s > self.t:
            return False
        if self.s == self.t:
            return True
        if self.s == 0:
            return True
        if max(self.s, self.t - self.s) + 1 > 2 ** self.n: # see if m>2^n
            return self.search(self.n, self.s)
        else:
            return abs(self.complex_method()) > 0.5

a = [73383, 66729, 31459, 76611, 70029, 11389, 10089, 63531, \
    87311, 64114, 1566, 30601, 45294, 92796, 57129, 18475, 17759, \
    25253, 93402]
s = 242514 # test data
solver = SubsetSumSolver(a, s)
print(solver.solve())

This algorithm works well in most cases, except in some extreme situations: when the modulus of $f(z)$ is very large (close to $2^n$), the rounding error gets magnified and the result may be inaccurate. However, in my tests the probability of this happening is very small (it never occurred in my test data), so there is no need to worry.
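If this ever becomes a concern, one possible mitigation (my own suggestion, not from the original article; it assumes the third-party mpmath package is installed) is to re-evaluate the sum at a higher working precision:

import mpmath

def subset_count_mp(a, s, dps=50):
    # evaluate c_s with mpmath at dps decimal digits of working precision
    mpmath.mp.dps = dps
    t = sum(a)
    m = max(s, t - s) + 1
    total = mpmath.mpc(0)
    for k in range(m):
        theta = 2 * mpmath.pi * k / m
        prod = mpmath.mpc(1)
        for aj in a:
            prod *= 1 + mpmath.exp(mpmath.mpc(0, aj * theta))  # 1 + e^(i*a_j*theta)
        total += mpmath.exp(mpmath.mpc(0, -s * theta)) * prod  # e^(-i*s*theta) * f
    return total / m

As before, the answer is whether the absolute value of the result exceeds 1/2.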


Origin blog.csdn.net/qaqwqaqwq/article/details/128486167