Alexander Y. Davydov
AlgoTerra LLC, 249 Rollins Avenue, Suite 202, Rockville, MD 20852, USA
E-mail:
June 30, 2011
Using a probability-theory-based approach, this paper reveals the equivalence of an arbitrary NP-complete problem to a problem of checking whether a level hypersurface of a specifically constructed harmonic cost function (with all diagonal entries of its Hessian matrix equal to zero) intersects with a unit hypercube in many-dimensional Euclidean space. This connection suggests the possibility that methods of continuous mathematics can provide crucial insights into the most intriguing open questions in modern complexity theory.
Key words: NP-complete, Harmonic cost function, Level hypersurface, Union bound, Universality
Non-deterministic polynomial time complete (NP-complete) problems are of considerable theoretical and practical interest and play a central role in the theory of computational complexity in modern computer science. Currently, more than three thousand vital computational tasks in operations research, machine learning, hardware design, software verification, computational biology, and other fields have been shown to be NP-complete. 'Completeness' designates the property that, if an efficient (= polynomial-time) algorithm for solving any one of the NP-complete problems could be found, then we would immediately have (as a minimum) an efficient algorithm for all problems in this class [1-4]. Despite persistent efforts by many talented researchers over several decades, it is not currently known whether NP-complete problems can be solved efficiently. An unproven conjecture widely held among complexity theorists is that no such polynomial-time algorithm can exist. It is also generally believed that either a proof or a disproof of this conjecture can only be obtained through the development of new mathematical techniques.
In the present paper, a novel approach to tackling NP-complete problems is proposed, bringing a fresh perspective to the subject. More specifically, our approach takes the problem from the realm of discrete mathematics and reformulates it in the realm of continuous mathematics, where it is then treated using tools of mathematical analysis and probability theory. The main idea of the proposed method stems from recognizing that, owing to the exponentially large solution space of any NP-complete problem, any prospective approach that examines solution candidates sequentially, one by one, is bound to fail to yield an efficient algorithm, regardless of how clever and sophisticated it is. Assume for a moment that an efficient algorithm for solving NP-complete problems does exist. Then we can only hope to discover it if, at the very least, we learn how to manipulate the whole pool of solution candidates and avoid examining any specific candidate in detail prematurely. At first sight, this seems an impossible task. Surprisingly, this paper shows that it is attainable. The two key elements of success are (i) the introduction of a new set of variables based on probabilistic reasoning and (ii) a judicious choice of a cost function to be minimized, expressed in terms of these variables.
For definiteness, we will consider a specific NP-complete problem, the Exact Cover problem with three bits per clause (EC3), which is equivalent to positive 1-in-3 SAT [5]. The problem is to decide whether there exists an assignment of bits $Z = (z_1, z_2, \dots, z_N)$, each taking value 0 or 1, such that $M$ clauses (constraints) are simultaneously satisfied. Each clause involves exactly three bits, say $z_i$, $z_j$, and $z_k$ with $i, j, k \in \{1, 2, \dots, N\}$, and is satisfied if and only if one of the bits is 1 and the other two are 0, i.e., $z_i + z_j + z_k = 1$. It is assumed that, within each clause, the indices $i$, $j$, and $k$ are all distinct.
The rest of the paper is organized as follows. In Section 2, we reformulate the EC3 problem in the realm of continuous mathematics using a new set of variables and a cost function with some rather remarkable properties. Section 3 presents an iterative algorithm for solving EC3 that explicitly exploits these properties. Illustrative examples of the algorithm's performance for different problem sizes are given in Section 4. Empirical evidence of the surprisingly ordered behavior of variable flows at low clauses-to-variables ratios, as the algorithm proceeds, is presented in Section 5. Finally, in Section 6, we discuss open issues related to the computational complexity of the presented algorithm and summarize the results.
There are $2^N$ possible assignments of bits $z_n$, so checking them sequentially until a solution is found would, in general, take exponential time. Instead, one would like to evaluate all candidate solutions simultaneously and manipulate the whole pool of them in a manner ensuring that promising candidate solutions are reinforced while unpromising ones are suppressed in some sense. To reach this goal, we introduce a set of new variables for the problem as follows. With each bit $z_n$, we associate the probability $x_n$ that the value of $z_n$ is chosen as 0:
$$ x_n = \Pr\{z_n = 0\}, \quad 0 \le x_n \le 1, \quad n = 1, 2, \dots, N. \tag{1} $$
It follows that
$$ 1 - x_n = \Pr\{z_n = 1\}, \quad 0 \le x_n \le 1, \quad n = 1, 2, \dots, N. $$
Conversely, given some value of $x_n \in [0, 1]$, the associated bit $z_n$ is selected to be 0 or 1 at random with probabilities $x_n$ and $1 - x_n$, respectively. Let $X = (x_1, x_2, \dots, x_N)$ denote an $N$-element vector representing the whole set of new variables. Consider two limiting cases. In the first case, let all components of vector $X$ be either 0 or 1. Then there is no ambiguity in choosing the corresponding bits $Z$, i.e., there is a one-to-one correspondence between $X$ and $Z$ (namely, $z_n = 1 - x_n$). In the other limiting case, all components of vector $X$ differ from both 0 and 1. Then, for a fixed vector $X$, any of the $2^N$ possible assignments of bits $Z$ can be chosen, although possibly with different weights. In particular, when $X = (\tfrac12, \tfrac12, \dots, \tfrac12)$, each assignment of bits has the same weight (probability) $2^{-N}$, so no assignment is chosen more often (on average) than any other.
Now we would like to construct a continuous cost function $F(X)$ for the EC3 problem with domain coinciding with the hypercube $H$: $0 \le x_n \le 1$ ($n = 1, \dots, N$). A highly desirable property of the cost function is that it depends on the structure of the problem in such a way that it 'feels', in some sense, where the search for satisfying assignments should be conducted. As a first step in the search for such a function, let us find the probability $p_m$ that a clause $C_m$ is unsatisfied given some fixed $X$. Table 1 lists all possible assignments of the three distinct bits involved in the clause along with the corresponding probabilities.
Table 1. List of bit assignments and their probabilities for an arbitrary clause $C_m$
$z_i$ $z_j$ $z_k$ | Clause      | Probability
0 0 0             | unsatisfied | $x_i x_j x_k$
0 0 1             | satisfied   | $x_i x_j (1 - x_k)$
0 1 0             | satisfied   | $x_i (1 - x_j) x_k$
0 1 1             | unsatisfied | $x_i (1 - x_j)(1 - x_k)$
1 0 0             | satisfied   | $(1 - x_i) x_j x_k$
1 0 1             | unsatisfied | $(1 - x_i) x_j (1 - x_k)$
1 1 0             | unsatisfied | $(1 - x_i)(1 - x_j) x_k$
1 1 1             | unsatisfied | $(1 - x_i)(1 - x_j)(1 - x_k)$
The probability $p_m$ reads
$$ p_m = 1 - x_i x_j (1 - x_k) - x_i (1 - x_j) x_k - (1 - x_i) x_j x_k = 1 - x_i x_j - x_i x_k - x_j x_k + 3 x_i x_j x_k. \tag{2} $$
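As a quick numerical sanity check of Eq. (2), the following minimal Python sketch (the function names are ours, chosen only for illustration) compares the closed form with a direct sum over the 'unsatisfied' rows of Table 1, using the convention $x_n = \Pr\{z_n = 0\}$ of Eq. (1).

```python
from itertools import product


def clause_unsat_prob(xi: float, xj: float, xk: float) -> float:
    """Closed form of Eq. (2): probability that z_i + z_j + z_k != 1."""
    return 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk


def clause_unsat_prob_from_table(xi: float, xj: float, xk: float) -> float:
    """Sum the probabilities of the 'unsatisfied' rows of Table 1."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        if sum(bits) == 1:  # exactly one bit equal to 1: clause satisfied, skip row
            continue
        weight = 1.0
        for z, x in zip(bits, (xi, xj, xk)):
            weight *= x if z == 0 else 1 - x  # x = Pr{z = 0}, 1 - x = Pr{z = 1}
        total += weight
    return total


for point in [(0.5, 0.5, 0.5), (0.2, 0.7, 0.9), (1.0, 1.0, 0.0)]:
    print(point, clause_unsat_prob(*point), clause_unsat_prob_from_table(*point))
```

At the centre of the cube both functions give 0.625 (five of the eight equally likely rows violate the clause), and at the vertex (1, 1, 0), which encodes $Z = (0, 0, 1)$, both give 0.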
Let us now define a cost function as a sum of probabilities $p_m$ for all clauses:
$$ F(X) = \sum_{m=1}^{M} p_m(x_i, x_j, x_k). \tag{3} $$
If the events 'clause $C_m$ is unsatisfied' ($m = 1, \dots, M$) were mutually exclusive, $F(X)$ would be exactly the probability that at least one of the clauses of the corresponding EC3 problem is unsatisfied by the assignment of bits chosen in accordance with $X$. In general this is not the case (clauses that contain common variables are, moreover, dependent), so $F$ cannot be interpreted as such a probability; by linearity of expectation, it is the expected number of unsatisfied clauses, which, by the union bound, is an upper bound on that probability. Nevertheless, $F$ has a number of interesting properties which happen to be very useful for reformulating the problem in terms of continuous mathematics. Let us now turn to investigating these properties.
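By linearity of expectation, Eq. (3) makes $F(X)$ equal to the expected number of unsatisfied clauses when each bit is drawn independently with $\Pr\{z_n = 0\} = x_n$, whether or not clauses share variables. The brute-force Python sketch below checks this on a small, hypothetical five-variable instance (the clause list is made up for illustration); enumerating all $2^N$ assignments is, of course, only feasible for tiny $N$.

```python
from itertools import product
import random


def cost_F(x, clauses):
    """Eq. (3): F(X) = sum over clauses of p_m from Eq. (2); clauses are 1-based (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def expected_unsat_bruteforce(x, clauses):
    """E[number of unsatisfied clauses] with independent bits, Pr{z_n = 0} = x_n."""
    expectation = 0.0
    for z in product((0, 1), repeat=len(x)):
        weight = 1.0
        for zn, xn in zip(z, x):
            weight *= xn if zn == 0 else 1 - xn
        unsat = sum(1 for i, j, k in clauses if z[i - 1] + z[j - 1] + z[k - 1] != 1)
        expectation += weight * unsat
    return expectation


# Hypothetical 5-variable instance; the clauses share variables and are therefore dependent.
clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]
random.seed(1)
x = [random.random() for _ in range(5)]
print(cost_F(x, clauses), expected_unsat_bruteforce(x, clauses))
```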
It is useful to extend the domain of the cost function $F$ to the $N$-dimensional space $\Re^N$. This allows us to focus on the general behavior of $F$ without restrictions irrelevant to our goal and has some other benefits, as will become evident shortly. The extension is straightforward since $F$ is a polynomial and hence non-singular everywhere in $\Re^N$.
Lemma 1. $F(X)$ is a harmonic function in $\Re^N$: $\nabla^2 F = 0$. Moreover, $F$ is a harmonic function of any non-empty subset of variables (assuming that the values of the remaining variables are held fixed).
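Proof. Since each clause $C_m$ consists of three distinct bits, the corresponding probability $p_m$ depends on three different variables and, by (2), is linear in each of them. We then observe that
$$ \frac{\partial^2 p_m}{\partial x_i^2} = \frac{\partial^2 p_m}{\partial x_j^2} = \frac{\partial^2 p_m}{\partial x_k^2} = 0 $$
for any $m$, and the statement of Lemma 1 immediately follows. ▄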
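Lemma 1 can also be checked numerically: because $F$ is affine in each single variable, a central second difference along any coordinate vanishes (up to rounding), even at points outside $H$, while mixed second derivatives generally do not. The sketch below uses a hypothetical five-variable instance; the step `h` and the test point are arbitrary choices of ours.

```python
def cost_F(x, clauses):
    """Eq. (3) with Eq. (2); clauses are 1-based index triples (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def shifted(x, shifts):
    """Copy of x with x[n] += d for every (n, d) in shifts."""
    y = list(x)
    for n, d in shifts:
        y[n] += d
    return y


def second_diff(f, x, n, h=1e-2):
    """Central-difference estimate of d^2 f / dx_n^2 (a diagonal Hessian entry)."""
    return (f(shifted(x, [(n, h)])) - 2 * f(x) + f(shifted(x, [(n, -h)]))) / h**2


def mixed_diff(f, x, m, n, h=1e-2):
    """Central-difference estimate of d^2 f / (dx_m dx_n) (an off-diagonal entry)."""
    return (f(shifted(x, [(m, h), (n, h)])) - f(shifted(x, [(m, h), (n, -h)]))
            - f(shifted(x, [(m, -h), (n, h)])) + f(shifted(x, [(m, -h), (n, -h)]))) / (4 * h * h)


clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]  # hypothetical instance


def F(v):
    return cost_F(v, clauses)


x = [-1.3, 2.7, 0.4, 5.0, -0.2]  # a point outside H: Lemma 1 holds on all of R^N
diagonal = [second_diff(F, x, n) for n in range(len(x))]
print("diagonal Hessian entries:", diagonal)       # all ~0, hence the Laplacian is ~0
print("d2F/dx1dx2:", mixed_diff(F, x, 0, 1))       # -1 + 3*x_3 = 0.2 for this instance
```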
From Lemma 1 and the well-known property of harmonic functions in $\Re^N$ (the maximum principle), it follows that the cost function $F$ can attain its maximum and minimum values in an arbitrary compact domain $D$ only on the boundary $\partial D$. Notice also that the Hessian matrix of $F$ has all its diagonal elements equal to zero everywhere in $\Re^N$.
Lemma 2. $F(X) > 0$ everywhere in the interior of the hypercube $H$ ($0 < x_n < 1$, $n = 1, \dots, N$).
Proof. By definition, $F$ is a sum of the functions $p_m$, which all take non-negative values inside the hypercube $H$ because they have the meaning of probabilities there. Note that 'inside the hypercube' means that all components of vector $X$ differ from both 0 and 1, although they might come very close to these limiting values. Recall that any vector $X$ inside $H$ corresponds to all $2^N$ assignments of logic variables, although some of them are more probable than others. Now suppose $F(X) = 0$ somewhere in the interior of $H$. Then it follows that $p_m = 0$ for all $m = 1, \dots, M$ simultaneously. This would mean that every clause of the corresponding EC3 problem is satisfied with probability 1, i.e., by every assignment of bits $Z$, including the all-zero assignment, which violates every clause and has nonzero probability inside $H$. But clearly this is impossible. Thus, by contradiction, $F(X)$ must be positive everywhere inside the hypercube $H$; $F$ can vanish only on the boundary $\partial H$. ▄
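Table 1 also yields a simple explicit lower bound consistent with Lemma 2: the row $z_i = z_j = z_k = 0$ violates every clause, so $p_m \ge x_i x_j x_k$, and therefore $F(X) \ge \sum_m x_i x_j x_k > 0$ at every interior point. This is our own complementary observation rather than the argument above; the Python sketch checks it on random interior points of a hypothetical instance and shows that $F$ can indeed vanish on $\partial H$.

```python
import random


def cost_F(x, clauses):
    """Eq. (3) with Eq. (2); clauses are 1-based index triples (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def all_zero_row_bound(x, clauses):
    """Sum of Pr{z_i = z_j = z_k = 0} = x_i x_j x_k: the first row of Table 1 for each clause."""
    return sum(x[i - 1] * x[j - 1] * x[k - 1] for i, j, k in clauses)


clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]  # hypothetical instance
rng = random.Random(2)
samples = [[rng.uniform(0.01, 0.99) for _ in range(5)] for _ in range(1000)]
pairs = [(cost_F(x, clauses), all_zero_row_bound(x, clauses)) for x in samples]
print("smallest F over interior samples:", min(f for f, _ in pairs))
# On the boundary F may vanish, e.g. at the vertex X = (1, 0, 1, 1, 0), i.e. Z = (0, 1, 0, 0, 1):
print("F at a satisfying vertex:", cost_F([1.0, 0.0, 1.0, 1.0, 0.0], clauses))
```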
Lemma 3. The spectrum of possible values of $F$ on the vertices of the hypercube $H$ is discrete and consists of integers from 0 to $K$, where $K \le M$:
$$ F \in \{0, 1, 2, \dots, K\} \quad \text{for } X \text{ at a vertex of } H. $$
The value of $F$ at any particular vertex gives the number of unsatisfied clauses of the corresponding EC3 problem, provided that the logic variables are chosen in accordance with vector $X$ at this vertex. $K$ is the maximum number of clauses that can be simultaneously unsatisfied; it depends on the specific structure of the problem under consideration.
Proof. As mentioned earlier, there is a one-to-one correspondence between $Z$ and $X$ on the vertices of the hypercube $H$. So there is no ambiguity in the values of the logic variables, and hence every clause is either satisfied or unsatisfied by the definite set of logic variables $Z$. This means that each $p_m$ is either 1 (clause is unsatisfied) or 0 (clause is satisfied). Since $F$ is the sum of all $p_m$, its possible values are the numbers of unsatisfied clauses, which can vary from 0 to some integer $K$ that does not exceed $M$ and is problem-specific. ▄
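Lemma 3 is easy to confirm exhaustively on a tiny instance. The Python sketch below (a hypothetical three-clause, five-variable example of ours) evaluates $F$ at all $2^N$ vertices, checks that it equals the number of unsatisfied clauses for $z_n = 1 - x_n$, and collects the resulting spectrum and $K$.

```python
from itertools import product


def cost_F(x, clauses):
    """Eq. (3) with Eq. (2); clauses are 1-based index triples (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def count_unsatisfied(z, clauses):
    """Number of clauses with z_i + z_j + z_k != 1."""
    return sum(1 for i, j, k in clauses if z[i - 1] + z[j - 1] + z[k - 1] != 1)


clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]  # hypothetical instance, N = 5, M = 3
N = 5
spectrum = {}
for vertex in product((0.0, 1.0), repeat=N):
    z = [1 - int(xn) for xn in vertex]           # on a vertex, z_n = 1 - x_n
    f = cost_F(list(vertex), clauses)
    assert f == count_unsatisfied(z, clauses)    # F counts the unsatisfied clauses
    spectrum.setdefault(int(f), []).append(z)

K = max(spectrum)
print(sorted(spectrum), "K =", K)
print("satisfying assignments:", spectrum.get(0, []))
```

Here $K = M = 3$ (attained by the all-zero assignment), and the vertices with $F = 0$ list the satisfying assignments.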
Lemma 4. If a level hypersurface defined by the equation $F(X) = 1$ passes through the interior of $H$, then there exists at least one satisfying bit assignment for the EC3 problem.
Proof. Recall that, inside the hypercube $H$, each $p_m$ has the meaning of the probability of the event $A_m$ that the corresponding clause $C_m$ is unsatisfied, i.e., $p_m = \Pr\{A_m\}$ for $X \in H$. From the union bound (Boole's inequality) [6], it follows that
$$ \Pr\Big\{\bigcup_{m=1}^{M} A_m\Big\} \le \sum_{m=1}^{M} \Pr\{A_m\}, \tag{4} $$
which can also be recast as
$$ 1 - \Pr\Big\{\bigcup_{m=1}^{M} A_m\Big\} \ge 1 - \sum_{m=1}^{M} \Pr\{A_m\}. \tag{4a} $$
Applying De Morgan's law [7] to the left-hand side of (4a) and making use of definition (3), one obtains
$$ \Pr\Big\{\bigcap_{m=1}^{M} \overline{A}_m\Big\} \ge 1 - F(X). \tag{5} $$
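If $F(X) = 1$ at some point inside $H$, then, since $F$ is continuous and, being a non-constant harmonic function, has no local minimum there, every neighborhood of that point contains an open set of points $X^*$ in the interior of $H$ where $F(X^*) < 1$. It then follows from (5) that, for such $X^*$, the probability of all clauses being simultaneously satisfied is positive:
$$ \Pr\Big\{\bigcap_{m=1}^{M} \overline{A}_m\Big\}(X^*) > 0. $$
Since this probability is a weighted sum over bit assignments, at least one assignment $Z$ satisfies all clauses, which proves Lemma 4.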
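Inequality (5), on which Lemma 4 rests, can be verified by brute force on a tiny instance. The Python sketch below (our illustration, with a hypothetical clause list) checks (5) at random points of $H$ and then evaluates both sides at an interior point near the satisfying vertex $X = (1, 0, 1, 1, 0)$, i.e. $Z = (0, 1, 0, 0, 1)$, where $F < 1$ forces a positive probability of satisfying every clause.

```python
from itertools import product
import random


def cost_F(x, clauses):
    """Eq. (3) with Eq. (2); clauses are 1-based index triples (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def prob_all_satisfied(x, clauses):
    """Pr{all clauses satisfied} when z_n = 0 with probability x_n (brute force)."""
    total = 0.0
    for z in product((0, 1), repeat=len(x)):
        if all(z[i - 1] + z[j - 1] + z[k - 1] == 1 for i, j, k in clauses):
            weight = 1.0
            for zn, xn in zip(z, x):
                weight *= xn if zn == 0 else 1 - xn
            total += weight
    return total


clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]  # hypothetical instance
rng = random.Random(3)
for _ in range(200):
    x = [rng.random() for _ in range(5)]
    assert prob_all_satisfied(x, clauses) >= 1 - cost_F(x, clauses) - 1e-12  # Eq. (5)

x_star = [0.95, 0.05, 0.95, 0.95, 0.05]  # interior point with F < 1
print(cost_F(x_star, clauses), prob_all_satisfied(x_star, clauses))
```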
In Section 2.2, we presented a simple procedure for constructing, for a given EC3 problem with $N$ logic variables and $M$ clauses, a harmonic function $F$ of $N$ continuous variables in analytic form. Function $F$ defines a hypersurface $S$ by the equation $F(X) = 0$. From Lemma 2, it follows that there are only two possibilities:
1. $S$ intersects the hypercube $H$ ($0 \le x_n \le 1$, $n = 1, 2, \dots, N$) on the boundary $\partial H$;
2. $S$ does not have any common points with the hypercube $H$.
From the meaning of $F$, it follows that, in Case #1, the corresponding EC3 problem is satisfiable: at a point of $H$ where $F = 0$, every bit assignment with nonzero probability satisfies all clauses, and the vertices of $H$ that belong to the hypersurface $S$ uniquely determine the satisfying assignments of logic variables. In Case #2, the problem has no solutions. Thus EC3 can be thought of as a problem of intersection between a level hypersurface $S$ of a harmonic cost function $F$ and a unit hypercube $H$ in $\Re^N$.
Alternatively, based on Lemma 4, we can present the problem as a question of whether the hypersurface defined by the equation $F(X) = 1$ intersects the interior of the hypercube $H$.
In any case, we observe that the EC3 problem, initially formulated in the realm of discrete mathematics, can be mapped into a problem in the realm of continuous mathematics. The reformulated problem can then be attacked using the methods of mathematical analysis, differential geometry, and manifold topology. New algorithms for solving NP-complete problems may emerge as a result of these attacks. The next section presents an example of such an algorithm.
3. An iterative algorithm for solving EC3
Possibly the simplest new algorithm for the EC3 problem belongs to the gradient descent family and is described as follows:
BSGD algorithm:
(1) Construct the cost function $F(X)$ using Eqs. (2) and (3);
(2) Select a starting point $X^{(0)}$ inside $H$ and a constant step parameter $\eta > 0$;
(3) Repeat for $k = 0, 1, 2, \dots$:
  (3.1) Update: $x_n^{(k+1)} = x_n^{(k)} - \eta \cdot \dfrac{\partial F}{\partial x_n}\big(X^{(k)}\big)$ for $n = 1, 2, \dots, N$;
  (3.2) If $x_n^{(k+1)} > 1$, set $x_n^{(k+1)} = 1$;
  (3.3) If $x_n^{(k+1)} < 0$, set $x_n^{(k+1)} = 0$;
  until the stopping criterion $X^{(k+1)} = X^{(k)}$ is satisfied;
(4) Compute $F(X^{(k)})$;
(5) If $F(X^{(k)}) = 0$, the EC3 problem has a solution given by $Z$ chosen in accordance with $X^{(k)}$; otherwise, report that a single run from the starting point $X^{(0)}$ failed to find a solution.
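The steps above translate almost line by line into code. The following is a minimal Python sketch under our own assumptions, not the author's implementation: clauses are given as 1-based index triples, the iteration cap `max_iter` and the tolerance used to test $F = 0$ are safeguards we add, and $Z$ is read off by rounding (variables that appear in no clause keep their starting values, so their bits are arbitrary, which does not affect satisfiability).

```python
def cost_F(x, clauses):
    """Eq. (3): F(X) = sum of p_m from Eq. (2); clauses are 1-based (i, j, k)."""
    total = 0.0
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        total += 1 - xi * xj - xi * xk - xj * xk + 3 * xi * xj * xk
    return total


def grad_F(x, clauses):
    """Gradient of F; from Eq. (2), dp_m/dx_i = -x_j - x_k + 3 x_j x_k."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        g[i - 1] += -xj - xk + 3 * xj * xk
        g[j - 1] += -xi - xk + 3 * xi * xk
        g[k - 1] += -xi - xj + 3 * xi * xj
    return g


def bsgd_run(clauses, x0, eta=0.005, max_iter=100_000):
    """One run of steps (2)-(5). Returns (Z or None, final F)."""
    x = list(x0)
    for _ in range(max_iter):                       # step (3); max_iter is our safeguard
        g = grad_F(x, clauses)
        # (3.1) gradient step, then (3.2)/(3.3) clip each coordinate into [0, 1]
        x_new = [min(1.0, max(0.0, xn - eta * gn)) for xn, gn in zip(x, g)]
        if x_new == x:                              # stopping criterion X^(k+1) = X^(k)
            break
        x = x_new
    f = cost_F(x, clauses)                          # step (4)
    z = [1 if xn < 0.5 else 0 for xn in x]          # x_n = Pr{z_n = 0}: small x_n -> z_n = 1
    if abs(f) < 1e-12 and all(z[i - 1] + z[j - 1] + z[k - 1] == 1 for i, j, k in clauses):
        return z, f                                 # step (5): solution found
    return None, f                                  # this run failed


# Usage on a single clause from an asymmetric start (a larger eta for a quick demo):
print(bsgd_run([(1, 2, 3)], [0.95, 0.95, 0.05], eta=0.05))
```

On this input the sketch reaches the vertex (1, 1, 0) within three iterations and returns `([0, 0, 1], 0.0)`. For an unsatisfiable instance every run necessarily returns `None`, since then $F \ge 1$ everywhere in $H$.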
Several remarks are in order. First, Lemma 1 guarantees that the algorithm presented above never becomes trapped at a local minimum in the interior of $H$, simply because a harmonic function has no local minima in the interior of a compact domain. (The clipping steps (3.2) and (3.3) can, however, bring the trajectory to the boundary $\partial H$, and a run may end at a vertex with $F > 0$; this is why repeated runs may be needed.) The only stationary points that can be encountered in the interior during the evolution of $X$ are the saddle points, where $\nabla F = 0$. The saddle points form a set of measure zero in $\Re^N$, which makes the probability of $X$ passing through them negligible. Nevertheless, saddles play a crucial role as bifurcation points that separate different evolution paths: they may cause two trajectories that originate at nearby starting points to end up far away from each other. A notable saddle point in the interior of the hypercube is $X = (\tfrac23, \tfrac23, \dots, \tfrac23)$, as can easily be verified by differentiating $p_m$ with respect to any variable. Hence this point must not be used to start the algorithm. However, any other point in its vicinity can serve as a starting point (provided it is not a saddle). The second remark concerns consecutive runs of the algorithm, each starting from a different point. If the latest run finds a solution, we stop; otherwise, we continue until the maximum time designated for finding a solution has been exhausted. With each unsuccessful run of the algorithm, the confidence that there is no solution increases. Whether it is possible to obtain a quantitative estimate of the probability that a solution exists in terms of $N$, $M$, and the number of unsuccessful runs $L$ is an open question, which we briefly discuss in Section 6. Third, notice that consecutive runs of the algorithm can be implemented in parallel. Once the cost function is generated, steps (2) through (4) of the BSGD algorithm can be carried out on multiple processors simultaneously, resulting in almost linear speedup, with each processor using its own starting point generated at random according to some protocol.
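The saddle-point claim is easy to confirm: at $x_i = x_j = x_k = \tfrac23$ each partial derivative $-x_j - x_k + 3x_j x_k$ of (2) vanishes, and because the Hessian has a zero diagonal but non-zero off-diagonal entries $-1 + 3x_k$, it is indefinite. The Python sketch below (our illustration on a hypothetical instance) evaluates the gradient there and the curvature of $F$ along two directions.

```python
def grad_F(x, clauses):
    """Gradient of F; from Eq. (2), dp_m/dx_i = -x_j - x_k + 3 x_j x_k."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        g[i - 1] += -xj - xk + 3 * xj * xk
        g[j - 1] += -xi - xk + 3 * xi * xk
        g[k - 1] += -xi - xj + 3 * xi * xj
    return g


def hessian_F(x, clauses):
    """Exact Hessian of F: zero diagonal, d2p_m/(dx_a dx_b) = -1 + 3 x_c for the third index c."""
    n = len(x)
    h = [[0.0] * n for _ in range(n)]
    for i, j, k in clauses:
        for a, b, c in ((i, j, k), (i, k, j), (j, k, i)):
            val = -1 + 3 * x[c - 1]
            h[a - 1][b - 1] += val
            h[b - 1][a - 1] += val
    return h


def curvature(h, v):
    """Quadratic form v^T H v: second derivative of F along direction v."""
    n = len(v)
    return sum(v[a] * h[a][b] * v[b] for a in range(n) for b in range(n))


clauses = [(1, 2, 3), (2, 3, 4), (1, 4, 5)]  # hypothetical instance
x_saddle = [2 / 3] * 5
g = grad_F(x_saddle, clauses)
hess = hessian_F(x_saddle, clauses)
print("max |dF/dx_n|:", max(abs(gn) for gn in g))          # ~0: a stationary point
print("diagonal:", [hess[n][n] for n in range(5)])          # all exactly 0
print("along e1 + e2:", curvature(hess, [1, 1, 0, 0, 0]))   # positive (about 2)
print("along e1 - e2:", curvature(hess, [1, -1, 0, 0, 0]))  # negative (about -2): a saddle
```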
4. Illustrative examples
Let us consider several examples illustrating the performance of the algorithm described in Section 3. All problem instances presented below were generated randomly, and the starting points for consecutive runs of the BSGD algorithm were chosen at random on the surface of an $N$-dimensional sphere of radius 0.05 centered at $(\tfrac12, \tfrac12, \dots, \tfrac12)$. The matrix $\mathbf{M}$ of size $3 \times M$ lists, column by column, the indices of the variables involved in each clause. The step parameter $\eta$ was set to 0.005.
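The paper does not say how the random points on the sphere were drawn. One standard way, which we assume here, is to normalize a vector of independent standard normal deviates, which gives a uniformly distributed direction; since the radius 0.05 is below 0.5, every such point lies strictly inside $H$. The function name and parameters below are ours.

```python
import math
import random


def random_start_on_sphere(n, radius=0.05, center=0.5, rng=random):
    """Point drawn uniformly on the sphere |X - (center, ..., center)| = radius."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # guard against the (measure-zero) zero vector
            return [center + radius * c / norm for c in v]


rng = random.Random(4)
starts = [random_start_on_sphere(1000, rng=rng) for _ in range(3)]  # e.g. three runs, N = 1000
for x0 in starts:
    dist = math.sqrt(sum((xn - 0.5) ** 2 for xn in x0))
    print(round(dist, 12), min(x0) > 0.0, max(x0) < 1.0)
```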
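A. Small-size problem: $N = 15$; $M = 8$ ($M/N \cong 0.53$)
$$ \mathbf{M} = \begin{pmatrix} 10 & 8 & 15 & 15 & 12 & 15 & 13 & 15 \\ 9 & 6 & 13 & 12 & 6 & 14 & 7 & 6 \\ 6 & 1 & 5 & 7 & 2 & 11 & 4 & 3 \end{pmatrix} $$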
The solution was found in the very first run:
Z = (1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0)
Figures 1 and 2 show the evolution of the cost function and of the variables $x_n$ ($n = 1, \dots, N$), respectively, as the algorithm proceeds. Notice that some $x_n$ change non-monotonically.
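The reported solution of case A can be checked directly against its clauses. In the Python sketch below we read the 24 indices listed for case A as a $3 \times 8$ matrix whose columns are the clauses; this is our reading of the flattened layout, and the fact that all eight clauses come out satisfied supports it. The sketch also evaluates $F$ at the vertex $x_n = 1 - z_n$ encoding $Z$.

```python
# Indices of case A read as a 3 x 8 matrix: each column is one clause (i, j, k).
M_A = [
    [10, 8, 15, 15, 12, 15, 13, 15],
    [9, 6, 13, 12, 6, 14, 7, 6],
    [6, 1, 5, 7, 2, 11, 4, 3],
]
Z_A = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]

clauses = list(zip(*M_A))                       # columns -> clause index triples
violated = [c for c in clauses if sum(Z_A[i - 1] for i in c) != 1]

X_A = [1 - z for z in Z_A]                      # vertex of H encoding Z_A, since x_n = Pr{z_n = 0}
F_A = sum(1 - X_A[i - 1] * X_A[j - 1] - X_A[i - 1] * X_A[k - 1] - X_A[j - 1] * X_A[k - 1]
          + 3 * X_A[i - 1] * X_A[j - 1] * X_A[k - 1] for i, j, k in clauses)
print(len(clauses), violated, F_A)              # prints: 8 [] 0
```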
Figure 1. Cost function $F(k)$ versus iteration number $k$ for case A.
Figure 2. Evolution of the variables $x_n$ ($n = 1, \dots, N$) versus iteration number for case A.
B. Medium-size problem: $N = 100$; $M = 40$ ($M/N = 0.4$)
It took four successive runs of the BSGD algorithm to find a solution. The evolution curves of the cost function $F$ for all four runs are shown in Figure 3, and the evolution of the variables in the last (successful) run is presented in Figure 4.
Figure 3. Cost function $F(k)$ versus iteration number $k$ for case B (four runs of the BSGD algorithm).
Figure 4. Evolution of the variables $x_n$ ($n = 1, \dots, N$) in the successful run for case B.
C. Large-size problem: $N = 1000$; $M = 250$ ($M/N = 0.25$)
It took three runs to find a solution. Figures 5 and 6 present the results (see captions for details).
Figure 5. Cost function $F(k)$ versus iteration number $k$ for case C (three runs of the BSGD algorithm).
Figure 6. Evolution of the variables $x_n$ ($n = 1, \dots, N$) in the successful run for case C.
5. Variable flows at low clauses-to-variables ratios
There is extensive empirical evidence suggesting that many constraint satisfaction problems exhibit a phase transition from satisfiability to unsatisfiability as the ratio of the number of clauses (constraints) to the number of variables, $r = M/N$, passes through some threshold value $r^*$ in the limit $N \to \infty$ [8-11]. For the EC3 (or positive 1-in-3 SAT) problem, numerical experiments give $r^* \approx 0.62$ [12]. Consequently, a randomly generated instance of the problem with large $N$ has a solution with high probability (w.h.p.) when $r \ll r^*$. Figure 7 shows the evolution of variables for a randomly generated problem with $r = 0.025$ ($M = 25$, $N = 1000$). Surprisingly, these variable flows are highly ordered, and this order persists across different problem instances with small $r$ (which all have satisfying assignments w.h.p.). One can distinguish five families of well-separated variable flows, all originating from starting points around ½ (see Fig. 7):
– Monotonically growing towards 1;
– Monotonically growing to an intermediate plateau and then splitting into two sub-flows leading in opposite directions, to 1 and 0, respectively;
– Growing towards 1 first and then returning to a plateau, where the flow splits into two sub-flows towards 1 and 0, respectively;
– Growing towards 1 first and then changing direction towards 0;
– Horizontal flow of irrelevant variables (those that are not involved in any clause).
Figure 7. Typical variable flows $x_n$ ($n = 1, \dots, N$) versus iteration number for a randomly generated problem at a low clauses-to-variables ratio (see text for details).
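It is also worth noting that, for any variable $x_n$ and independently of the value of $r$, the slope of the corresponding trajectory near the starting point close to ½ can take only a discrete set of values. Specifically, for any $n \in \{1, 2, \dots, N\}$ it holds that
$$ x_n^{(2)} - x_n^{(1)} \cong \frac{1}{4}\, \eta\, c_n, \tag{6} $$
where $c_n \in \{0, 1, 2, \dots, M\}$ denotes the total number of clauses that $x_n$ is involved in. This follows from (2): at $x_i = x_j = x_k = \tfrac12$, each clause contributes $\partial p_m / \partial x_n = -\tfrac14$ for every variable it contains.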
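The discreteness expressed by Eq. (6) can be reproduced in a few lines: at exactly $X = (\tfrac12, \dots, \tfrac12)$ the first update (3.1) moves $x_n$ by $\eta c_n / 4$, and at the actual starting points, 0.05 away from the centre, this holds only approximately. The Python sketch below uses a hypothetical instance of ours in which the variables appear in 3, 2, 1, or 0 clauses.

```python
def grad_F(x, clauses):
    """Gradient of F; from Eq. (2), dp_m/dx_i = -x_j - x_k + 3 x_j x_k."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        xi, xj, xk = x[i - 1], x[j - 1], x[k - 1]
        g[i - 1] += -xj - xk + 3 * xj * xk
        g[j - 1] += -xi - xk + 3 * xi * xk
        g[k - 1] += -xi - xj + 3 * xi * xj
    return g


clauses = [(1, 2, 3), (1, 2, 4), (1, 5, 6)]  # hypothetical instance; x_7 appears in no clause
N, eta = 7, 0.005
x = [0.5] * N
g = grad_F(x, clauses)
first_step = [-eta * gn for gn in g]           # change of x_n in the first update (3.1)
c = [sum(n + 1 in clause for clause in clauses) for n in range(N)]  # c_n of Eq. (6)
predicted = [eta * cn / 4 for cn in c]
print(c)                                       # [3, 2, 1, 1, 1, 1, 0]
print(first_step)
print(predicted)
```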
Figure 8 clearly illustrates this fact by zooming in on the leftmost region of Fig. 7.
Figure 8. Zoom into the region of Fig. 7 near the starting points around ½ (iterations 0 to 200); the discreteness of the spectrum of starting slopes for variable trajectories can be clearly observed.
Numerical experiments show that, as the ratio $r$ increases, the highly ordered behavior of variable flows (as demonstrated in Figure 7) starts to crumble, and new trajectories not belonging to any of the five families mentioned above begin to appear. With a further increase of $r$, the five basic types of flow become wider and start to overlap, while the share of 'irregular' trajectories grows. Finally, when $r$ approaches the threshold value, only a few variables follow the familiar paths, as illustrated in Table 2. The situation, to some extent, resembles the transition from laminar to turbulent flow in hydrodynamics, with the control parameter $r$ playing the role of the Reynolds number. The nature of such behavior of variable flows for randomly generated instances remains to be understood and perhaps even quantified.
Table 2. Typical variable flows generated by the BSGD algorithm: $x_n$ ($n = 1, \dots, N$) versus iteration number for $N = 100$ and $r = 0.28$ ($M = 28$), $r = 0.44$ ($M = 44$, two panels), $r = 0.52$ ($M = 52$, two panels), and $r = 0.6$ ($M = 60$).
6. Discussion and summary
The most interesting question regarding the BSGD algorithm concerns its computational complexity. This is a very difficult question, and we will not attempt to fully address it in this paper. Instead, some preliminary analysis of the complexity issues is given below.
Assume that the EC3 problem at hand has one or several satisfying bit assignments. Suppose also that, without knowing that the problem is satisfiable, we have run the BSGD algorithm independently (using randomly selected starting points) $L$ times and it has failed to find any satisfying assignment. How confident can we be that there is a solution to be found if we continue? Let $q$ be the probability of finding a satisfying bit assignment in a single run of the BSGD algorithm when at least one solution does exist. Then the expected number of trials until the first success is $1/q$. Indeed, the probability that exactly $l$ trials are needed is $\Pr\{l\} = q(1-q)^{l-1}$, and hence
$$ E[l] \equiv \sum_{l=1}^{\infty} l \cdot \Pr\{l\} = \sum_{l=1}^{\infty} l \cdot q(1-q)^{l-1} = \frac{1}{q}. \tag{7} $$
It is also easy to find the standard deviation $\sigma$ of the number of trials until the first success:
$$ \sigma \equiv \sqrt{E[l^2] - E[l]^2} = \sqrt{\sum_{l=1}^{\infty} l^2 \cdot q(1-q)^{l-1} - \frac{1}{q^2}} = \frac{\sqrt{1-q}}{q}. \tag{8} $$
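Eqs. (7) and (8) are the mean and standard deviation of a geometric distribution. A short numerical cross-check in Python (our sketch; the truncation length `terms` is an arbitrary choice that makes the neglected tail negligible for the values of $q$ used here):

```python
import math


def geometric_moments(q, terms=5000):
    """Truncated sums for E[l] (Eq. 7) and sigma (Eq. 8), with Pr{l} = q (1 - q)^(l - 1)."""
    mean = second = 0.0
    for l in range(1, terms + 1):
        pr = q * (1 - q) ** (l - 1)
        mean += l * pr
        second += l * l * pr
    return mean, math.sqrt(second - mean ** 2)


for q in (0.5, 0.1, 0.02):
    mean, sigma = geometric_moments(q)
    print(q, mean, 1 / q, sigma, math.sqrt(1 - q) / q)
```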
Making use of the one-sided Chebyshev inequality (Cantelli's inequality), $\Pr\{l - E[l] \ge a\sigma\} \le 1/(1 + a^2)$, we then obtain
$$ \Pr\left\{ l \ge \frac{1 + a\sqrt{1-q}}{q} \right\} \le \frac{1}{1 + a^2}, \tag{9} $$
where $a > 0$. If, for instance, we select $a = 11$ and run the BSGD algorithm $L \ge (1 + 11\sqrt{1-q})/q$ times without success, then, with confidence of approximately 99% ($1 - 1/122$), we can conclude that no solutions exist and stop. The question of the complexity of the algorithm thus boils down to the estimation of $q$: whether $1/q$ grows polynomially or exponentially with the problem size $N$. Consider the set of all points inside the unit hypercube $H$ which possess the following property: when used as starting points of the BSGD algorithm, they evolve to satisfying bit assignments during its progression. Let $H^*$ and $\mu[H^*]$ denote this set and its measure, respectively. Then, assuming that we choose the starting point at random with uniform distribution inside $H$, we obtain
$$ q = \frac{\mu[H^*]}{\mu[H]} = \mu[H^*], \tag{10} $$
since $\mu[H] = 1$.
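Eq. (9), i.e. Cantelli's bound $\Pr\{l \ge E[l] + a\sigma\} \le 1/(1 + a^2)$ with $E[l]$ and $\sigma$ from (7) and (8), turns into a simple stopping rule once $q$ is given. In practice $q$ is unknown (estimating it is exactly the open question discussed here), so the Python sketch below treats $q$ as a hypothetical input. It also prints the exact geometric tail $(1 - q)^L$, i.e. the probability of $L$ consecutive failures for independent runs, which shows how conservative the distribution-free bound is.

```python
import math


def cantelli_runs(q, a):
    """Smallest integer L with L >= (1 + a*sqrt(1 - q)) / q, the threshold in Eq. (9)."""
    return math.ceil((1 + a * math.sqrt(1 - q)) / q)


def cantelli_confidence(a):
    """1 - 1/(1 + a^2): confidence level guaranteed by Eq. (9)."""
    return 1 - 1 / (1 + a * a)


a = 11
for q in (0.5, 0.1, 0.01):
    L = cantelli_runs(q, a)
    exact_tail = (1 - q) ** L  # exact Pr{L failures in a row} for independent runs
    print(q, L, round(cantelli_confidence(a), 4), exact_tail)
```

For example, with $q = 0.1$ the rule asks for 115 unsuccessful runs, while the exact probability of that many consecutive failures is far below the guaranteed $1/122$.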
Thus, we observe that the computational complexity of the BSGD algorithm is intrinsically linked to the properties of the corresponding harmonic cost function $F$ in $\Re^N$. This connection suggests the possibility that methods of mathematical analysis, algebraic and differential topology, and other disciplines traditionally belonging to the domain of continuous mathematics may provide crucial insights into the most intriguing open questions in modern complexity theory.
Needless to say, the obtained results are relevant not only to the specific problem considered throughout this paper but to all NP-complete problems, since they transform into each other by polynomial-time reductions.
To summarize, we have demonstrated the equivalence of an arbitrary NP-complete problem to a problem of checking whether a level hypersurface of a specifically constructed harmonic cost function (with all diagonal entries of its Hessian matrix equal to zero) intersects with a unit hypercube in many-dimensional Euclidean space. This is the main result of the paper, and it can potentially lead to the development of new algorithms for NP-complete problems. As an illustration of the power of our method, a simple iterative algorithm (BSGD) belonging to the gradient descent family has been implemented for a specific NP-complete problem (EC3, or positive 1-in-3 SAT). The algorithm allows for almost linear speedup when carried out on multiple processors working in parallel. Numerical simulations illustrate its performance on randomly generated problems of different sizes (up to $N = 1000$ variables) and reveal surprising behavior of variable flows for problems with low clause-to-variable ratios. The computational complexity of the BSGD algorithm remains an open question, intrinsically linked to the properties of the corresponding harmonic cost function in $\Re^N$.
[1] Cook, S. A., The complexity of theorem-proving procedures, Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 151–158 (1971).
[2] Karp, R. M., Reducibility among combinatorial problems, in Complexity of Computer Computations, Eds. R. E. Miller and J. W. Thatcher, New York: Plenum, pp. 85–103 (1972).
[3] Garey, M. R., Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co. (1979).
[4] Papadimitriou, C. H., Steiglitz, K., Combinatorial Optimization: Algorithms and Complexity, Dover Publications, Inc. (1998).
[5] Schaefer, T. J., The complexity of satisfiability problems, Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, ACM, New York, pp. 216–226 (1978).
[6] Galambos, J., Simonelli, I., Bonferroni-type Inequalities with Applications, New York: Springer-Verlag (1996).
[7] Papoulis, A., Probability, Random Variables, and Stochastic Processes, 2nd ed., New York: McGraw-Hill, p. 23 (1984).
[8] Prosser, P., An empirical study of phase transitions in binary constraint satisfaction problems, Artificial Intelligence, v. 81, pp. 81–109 (1996).
[9] Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B., & Troyansky, L., Determining computational complexity from characteristic 'phase transitions', Nature, v. 400, pp. 133–137 (1999).
[10] Xu, K., Li, W., Exact phase transitions in random constraint satisfaction problems, Journal of Artificial Intelligence Research, v. 12, pp. 93–103 (2000).
[11] Achlioptas, D., Naor, A., & Peres, Y., Rigorous location of phase transitions in hard optimization problems, Nature, v. 435, pp. 759–764 (2005).
[12] Kalapala, V., and Moore, C., The phase transition in Exact Cover, Chicago Journal of Theoretical Computer Science, 2008(5), pp. 1–9 (2008).
A Probabilistic Attack On NP-Complete Problems

Alexander Y. Davydov
AlgoTerra LLC, 249 Rollins Avenue, Suite 202, Rockville, MD 20852, USA
E-mail:
June 30, 2011

Using a probability-theory-based approach, this paper reveals the equivalence of an arbitrary NP-complete problem to the problem of checking whether a level hypersurface of a specifically constructed harmonic cost function (with all diagonal entries of its Hessian matrix equal to zero) intersects a unit hypercube in many-dimensional Euclidean space. This connection suggests that methods of continuous mathematics may provide crucial insights into the most intriguing open questions in modern complexity theory.

Key words: NP-complete, Harmonic cost function, Level hypersurface, Union bound, Universality

Non-deterministic polynomial time complete (NP-complete) problems are of considerable theoretical and practical interest and play a central role in the theory of computational complexity in modern computer science. Currently, more than three thousand vital computational tasks in operations research, machine learning, hardware design, software verification, computational biology, and other fields have been shown to be NP-complete. 'Completeness' designates the property that, if an efficient (that is, polynomial-time) algorithm for solving any NP-complete problem could be found, then we would immediately have (as a minimum) an efficient algorithm for all problems in this class [1-4]. Despite persistent efforts by many talented researchers over several decades, it is not currently known whether NP-complete problems can be solved efficiently. An unproven conjecture widely held among complexity theorists is that no such polynomial-time algorithm exists. It is also generally believed that either a proof or a disproof of this conjecture can only be obtained through the development of new mathematical techniques.
In the present paper, a novel approach to tackling NP-complete problems is proposed that brings a fresh perspective to the subject. More specifically, our approach takes the problem from the realm of discrete mathematics and reformulates it in the realm of continuous mathematics, where it is then treated using the tools of mathematical analysis and probability theory. The main idea of the proposed method stems from recognizing that, owing to the exponentially large solution space of any NP-complete problem, every prospective approach that examines solution candidates sequentially, one by one, is bound to fail to yield an efficient algorithm, regardless of how clever and sophisticated it is. Assume for a moment that an efficient algorithm for solving NP-complete problems does exist. Then we can only hope to discover it if, at the very minimum, we learn how to manipulate the solution candidates collectively and avoid examining any specific candidate in detail prematurely. At first sight, this seems an impossible task. Surprisingly, this paper shows that it is attainable. The two key elements of success are (i) the introduction of a new set of variables using probabilistic reasoning and (ii) a smart choice of a cost function to be minimized, expressed in terms of these variables.

For definiteness, we will consider a specific NP-complete problem, Exact Cover (EC3), which is equivalent to positive 1-in-3 SAT [5]. The problem is to decide whether there exists an assignment of bits $Z = (z_1, z_2, \ldots, z_N)$, each taking the value 0 or 1, such that $M$ clauses (constraints) are simultaneously satisfied. Each clause $C_m$ involves exactly three bits, say $z_i$, $z_j$, and $z_k$ with $i, j, k \in \{1, 2, \ldots, N\}$, and is satisfied if and only if one of the bits is 1 and the other two are 0, i.e., $z_i + z_j + z_k = 1$. It is assumed that, within each clause, the indices $i$, $j$, and $k$ are all distinct.

The rest of the paper is organized as follows. In Section 2, we reformulate the EC3 problem in the realm of continuous mathematics using a new set of variables and a cost function with some rather remarkable properties. Section 3 presents an iterative algorithm for solving EC3 that explicitly exploits these properties. Illustrative examples of the algorithm's performance for different problem sizes are given in Section 4. Empirical evidence of the surprisingly ordered behavior of variable flows as the algorithm proceeds at low clauses-to-variables ratios is presented in Section 5. Finally, in Section 6, we discuss open issues related to the computational complexity of the presented algorithm and summarize the results.

There are $2^N$ possible assignments of the bits $z_n$, so checking them sequentially until a solution is found would take (on average) exponential time. Instead, one would like to evaluate all candidate solutions simultaneously and manipulate the whole pool of them in a way that ensures that promising candidate solutions gain weight while unpromising ones are, in some sense, suppressed. To reach this goal, we introduce a set of new variables for the EC3 problem as follows. With each bit $z_n$, we associate the probability $x_n$ that the value of $z_n$ is chosen as 0:

$$x_n = \Pr\{z_n = 0\}, \qquad 0 \le x_n \le 1, \qquad n = 1, 2, \ldots, N. \qquad (1)$$

It follows that $1 - x_n = \Pr\{z_n = 1\}$. Conversely, given some value $x_n \in [0, 1]$, the associated bit $z_n$ is selected to be 0 or 1 at random with probabilities $x_n$ and $1 - x_n$, respectively. Let $X = (x_1, x_2, \ldots, x_N)$ denote the $N$-element vector representing the whole set of new variables.

Consider two limiting cases. In the first case, let all components of the vector $X$ be either 0 or 1. Then there is no ambiguity in choosing the corresponding bits $Z$, i.e., there is a one-to-one correspondence between $X$ and $Z$. In the other limiting case, all components of $X$ differ from both 0 and 1. Then, for a fixed vector $X$, we can choose any of the $2^N$ possible assignments of bits $Z$, although possibly with different weights. In particular, when $X = (\tfrac12, \tfrac12, \ldots, \tfrac12)$, each assignment of bits has the same weight (probability), and no assignment is chosen more often (on average) than any other.

Now we would like to construct a continuous cost function $F(X)$ for the EC3 problem whose domain coincides with the hypercube $\Omega$: $0 \le x_n \le 1$ ($n = 1, \ldots, N$). A highly desirable property of the cost function is that it depends on the structure of the problem in such a way that it 'feels', in some sense, where the search for satisfying assignments should be conducted. As a first step toward such a function, let us find the probability $p_m$ that a clause $C_m$ is unsatisfied for a fixed $X$. Table 1 lists all possible assignments of the three distinct bits involved in the clause, along with the corresponding probabilities.

Table 1. List of bit assignments and their probabilities for an arbitrary clause $C_m$

z_i z_j z_k | z_i + z_j + z_k | Clause C_m  | Probability
0   0   0   | 0               | unsatisfied | $x_i x_j x_k$
0   0   1   | 1               | satisfied   | $x_i x_j (1 - x_k)$
0   1   0   | 1               | satisfied   | $x_i (1 - x_j) x_k$
0   1   1   | 2               | unsatisfied | $x_i (1 - x_j)(1 - x_k)$
1   0   0   | 1               | satisfied   | $(1 - x_i) x_j x_k$
1   0   1   | 2               | unsatisfied | $(1 - x_i) x_j (1 - x_k)$
1   1   0   | 2               | unsatisfied | $(1 - x_i)(1 - x_j) x_k$
1   1   1   | 3               | unsatisfied | $(1 - x_i)(1 - x_j)(1 - x_k)$

The probability that the clause is unsatisfied reads

$$p_m = 1 - (1 - x_i)\,x_j x_k - x_i (1 - x_j)\,x_k - x_i x_j (1 - x_k) = 1 - x_i x_j - x_i x_k - x_j x_k + 3\,x_i x_j x_k. \qquad (2)$$

Let us now define the cost function as the sum of these probabilities over all clauses:

$$F(X) = \sum_{m=1}^{M} p_m(x_i, x_j, x_k). \qquad (3)$$
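The construction in Eqs. (2) and (3) is easy to check numerically. The Python sketch below is our own illustration, not code from the paper: clauses are given as hypothetical 0-based index triples, `clause_unsat_prob` implements Eq. (2), `cost` implements Eq. (3), and `unsat_prob_by_enumeration` sums the 'unsatisfied' rows of Table 1 directly. Because $F$ is a sum of clause-violation probabilities, linearity of expectation also makes $F(X)$ equal to the expected number of unsatisfied clauses when the bits are drawn independently according to $X$; `expected_unsat` confirms this by brute force on a toy instance.

```python
from itertools import product


def clause_unsat_prob(xi, xj, xk):
    """Eq. (2): probability that z_i + z_j + z_k = 1 fails, with x_n = Pr{z_n = 0}."""
    return 1.0 - xi * xj - xi * xk - xj * xk + 3.0 * xi * xj * xk


def cost(x, clauses):
    """Eq. (3): F(X) = sum of p_m over all clauses (0-based index triples)."""
    return sum(clause_unsat_prob(x[i], x[j], x[k]) for i, j, k in clauses)


def bit_prob(z, x):
    """Probability of bit value z when Pr{z = 0} = x."""
    return x if z == 0 else 1.0 - x


def unsat_prob_by_enumeration(xi, xj, xk):
    """Sum the probabilities of the five 'unsatisfied' rows of Table 1."""
    total = 0.0
    for zi, zj, zk in product((0, 1), repeat=3):
        if zi + zj + zk != 1:
            total += bit_prob(zi, xi) * bit_prob(zj, xj) * bit_prob(zk, xk)
    return total


def expected_unsat(x, clauses):
    """Brute force over all 2^N assignments: expected number of violated clauses."""
    expectation = 0.0
    for z in product((0, 1), repeat=len(x)):
        weight = 1.0
        for zn, xn in zip(z, x):
            weight *= bit_prob(zn, xn)
        violated = sum(1 for i, j, k in clauses if z[i] + z[j] + z[k] != 1)
        expectation += weight * violated
    return expectation


# Hypothetical toy instance with N = 6 variables and M = 3 clauses.
clauses = [(0, 1, 2), (1, 3, 4), (0, 3, 5)]
x = [0.3, 0.6, 0.5, 0.2, 0.9, 0.4]
print(abs(cost(x, clauses) - expected_unsat(x, clauses)) < 1e-12)  # True
print(cost([0.5] * 6, clauses))  # 1.875, i.e. 3 * 5/8 at the centre of the hypercube
```

At the centre of the hypercube every clause contributes $p_m = 5/8$, so $F = 5M/8$ there.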
If all clauses were independent, then $F(X)$ would be directly related to the probability that at least one of the clauses of the corresponding EC3 problem is unsatisfied by the assignment of bits chosen in accordance with $X$. However, clauses that share variables are dependent, so in general one cannot interpret $F$ as such a probability. Nevertheless, $F$ has a number of interesting properties that turn out to be very useful for reformulating the EC3 problem in terms of continuous mathematics. Let us now investigate these properties.

It is useful to extend the domain of the cost function $F$ to the $N$-dimensional space $\Re^N$. This allows us to focus on the general behavior of $F$ without restrictions irrelevant to our goal, and it has other benefits as well, as will become evident shortly. The extension is straightforward since $F$ is a polynomial and hence non-singular everywhere in $\Re^N$.

Property 1. $F(X)$ is a harmonic function in $\Re^N$: $\nabla^2 F = 0$. Moreover, $F$ is a harmonic function of any non-empty subset of variables (assuming that the values of the remaining variables are held fixed).

Proof. Since each clause $C_m$ consists of three distinct bits, the corresponding probability $p_m$ depends on three different variables. From (2), we then observe that

$$\frac{\partial^2 p_m}{\partial x_i^2} = \frac{\partial^2 p_m}{\partial x_j^2} = \frac{\partial^2 p_m}{\partial x_k^2} = 0$$

for any $m$, and the statement of Property 1 immediately follows. ▄

From Property 1 and the well-known maximum principle for harmonic functions in $\Re^N$, it follows that the cost function $F$ can attain its maximum and minimum values over an arbitrary compact domain $D$ only on the boundary $\partial D$. Notice also that the Hessian matrix of $F$ has all its diagonal elements equal to zero everywhere in $\Re^N$.

Property 2. $F(X) > 0$ everywhere in the interior of the hypercube $\Omega$ ($0 < x_n < 1$, $n = 1, \ldots, N$).

Proof. By definition, $F$ is a sum of the $p_m$, all of which take values in $[0, 1]$ inside the hypercube because they have the meaning of probabilities there. Note that "inside the hypercube" means that all components of the vector $X$ differ from both 0 and 1, although they may come very close to these limiting values.
Recall that any vector $X$ inside $\Omega$ corresponds to all $2^N$ assignments of logic variables, although some of them are more probable than others. Now suppose that $F(X) = 0$ somewhere in the interior of $\Omega$. Then the probabilities $p_m = 0$ for all $m = 1, \ldots, M$ simultaneously. This means that all clauses of the corresponding EC3 problem are satisfied with probability 1. But this is impossible: for any clause $C_m$, the violating assignment $z_i = z_j = z_k = 0$ occurs with probability $x_i x_j x_k > 0$ inside $\Omega$, so $p_m \ge x_i x_j x_k > 0$. Thus, by contradiction, $F(X)$ must be positive everywhere inside the hypercube $\Omega$, and $F$ can vanish (if at all) only on the boundary $\partial\Omega$. ▄

Property 3. The spectrum of possible values of $F$ on the vertices of the hypercube $\Omega$ is discrete and consists of integers from 0 to $K$, $K \le M$: $F \in \{0, 1, 2, \ldots, K\}$ for $X$ at the vertices of $\Omega$.
The value of $F$ at any particular vertex gives the number of unsatisfied clauses of the corresponding EC3 problem, provided that the logic variables are chosen in accordance with the vector $X$ at this vertex. $K$ is the maximum number of clauses that can be simultaneously unsatisfied; it depends on the specific structure of the problem under consideration.

Proof. As mentioned earlier, there is a one-to-one correspondence between $Z$ and $X$ on the vertices of the hypercube $\Omega$. So there is no ambiguity in the values of the logic variables, and hence every clause is either satisfied or unsatisfied by the definite set of logic variables $Z$. This means that each $p_m$ is either 1 (the clause is unsatisfied) or 0 (the clause is satisfied). Since $F$ is the sum of all $p_m$, its spectrum of possible values consists of the numbers of unsatisfied clauses and can vary from 0 to some integer $K$, which does not exceed $M$ and is problem-specific. ▄

Property 4. If the level hypersurface defined by the equation $F(X) = 1$ passes through the interior of $\Omega$, then there exists at least one satisfying bit assignment for the EC3 problem.

Proof. Recall that, inside the hypercube $\Omega$, each $p_m$ has the meaning of the probability of the event $A_m$ that the corresponding clause $C_m$ is unsatisfied, i.e., $p_m(X) = \Pr\{A_m\}$, $X \in \Omega$. From the union bound (or Boole's inequality) [6], it follows that

$$\Pr\Big\{\bigcup_{m=1}^{M} A_m\Big\} \le \sum_{m=1}^{M} \Pr\{A_m\}, \qquad (4)$$

which can also be recast as

$$1 - \Pr\Big\{\overline{\bigcup_{m=1}^{M} A_m}\Big\} \le \sum_{m=1}^{M} \Pr\{A_m\}. \qquad (4a)$$

Applying De Morgan's law [7] to the left-hand side of (4a) and making use of the definition (3), one obtains

$$\Pr\Big\{\bigcap_{m=1}^{M} \overline{A_m}\Big\} \ge 1 - F(X). \qquad (5)$$

If $F(X) = 1$ at some point inside $\Omega$, then there exists an open set of points $X^* \in \Omega$ where $F(X^*) < 1$, due to the continuity of the cost function and the fact that a non-constant harmonic function has no local minimum in the interior of $\Omega$. It then follows from (5) that, for such $X^*$, the probability of all clauses being simultaneously satisfied is strictly positive, $\Pr\{\bigcap_{m=1}^{M} \overline{A_m}\} > 0$, which proves Property 4. ▄

In Section 2.2, we presented a simple procedure by which, for a given EC3 problem with $N$ logic variables and $M$ clauses, a harmonic function $F$ of continuous variables can be constructed in analytic form. The function $F$ defines a hypersurface $H$ by the equation $F(X) = 0$. Since $F > 0$ everywhere inside $\Omega$, there are only two possibilities:
1. $H$ intersects the hypercube $\Omega$ ($0 \le x_n \le 1$, $n = 1, 2, \ldots, N$) on the boundary $\partial\Omega$;
2. $H$ does not have any common points with the hypercube $\Omega$.

From the meaning of $F$, it is obvious that, in Case 1, the corresponding EC3 problem is satisfiable (and the vertices of $\Omega$ that belong to the hypersurface $H$ uniquely determine the satisfying assignments of logic variables), while, in Case 2, the problem has no solutions. Thus, EC3 can be thought of as a problem of intersection between a level hypersurface $H$ of a harmonic cost function $F$ and a unit hypercube $\Omega$ in $\Re^N$. Alternatively, based on Property 4, we can present the EC3 problem as the question of whether the hypersurface defined by the equation $F(X) = 1$ intersects the interior of the hypercube $\Omega$. In either case, we observe that EC3, initially formulated as a problem in the realm of discrete mathematics, can be mapped into a problem in the realm of continuous mathematics. The reformulated problem can then be attacked using the methods of mathematical analysis, differential geometry, and manifold topology. New interesting algorithms for solving NP-complete problems may emerge as a result of these attacks. The next section presents an example of such an algorithm.

Possibly the simplest new algorithm for the EC3 problem belongs to the gradient descent family and is described as follows.

BSGD algorithm
(1) Construct the cost function $F(X)$ using Eqs. (2) and (3);
(2) Select a starting point $X^{(0)}$ inside $\Omega$ and a constant step parameter $\eta > 0$;
(3) Repeat for $k = 0, 1, 2, \ldots$:
  (3.1) Update: $x_n^{(k+1)} = x_n^{(k)} - \eta \cdot \partial F(X^{(k)}) / \partial x_n$ for $n = 1, 2, \ldots, N$;
  (3.2) If $x_n^{(k+1)} > 1$, set $x_n^{(k+1)} = 1$;
  (3.3) If $x_n^{(k+1)} < 0$, set $x_n^{(k+1)} = 0$;
  until the stopping criterion $X^{(k+1)} = X^{(k)}$ is satisfied;
(4) Compute $F(X)$;
(5) If $F(X) = 0$, the EC3 problem has a solution given by $Z$ chosen in accordance with $X$; otherwise, report that a single run from the starting point $X^{(0)}$ failed to find a solution.
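The BSGD steps above can be sketched in a few lines of Python. This is our own minimal illustration under stated assumptions, not the author's implementation: clauses are 0-based index triples, the gradient is taken analytically from Eq. (2), and the iteration cap `max_iter` is a hypothetical safeguard we added, because with a constant step an interior trajectory need not become exactly stationary in finite time.

```python
def cost(x, clauses):
    """F(X) from Eqs. (2)-(3)."""
    return sum(1.0 - x[i] * x[j] - x[i] * x[k] - x[j] * x[k] + 3.0 * x[i] * x[j] * x[k]
               for i, j, k in clauses)


def grad(x, clauses):
    """dF/dx_n: from Eq. (2), dp_m/dx_i = -x_j - x_k + 3 x_j x_k (and cyclically)."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        g[i] += -x[j] - x[k] + 3.0 * x[j] * x[k]
        g[j] += -x[i] - x[k] + 3.0 * x[i] * x[k]
        g[k] += -x[i] - x[j] + 3.0 * x[i] * x[j]
    return g


def bsgd(clauses, x0, eta=0.005, max_iter=100_000):
    """One BSGD run. Returns (Z, X): Z is a satisfying assignment, or None on failure."""
    x = list(x0)                                   # step (2): starting point X^(0)
    for _ in range(max_iter):                      # max_iter: our added safeguard
        g = grad(x, clauses)
        # Steps (3.1)-(3.3): gradient step, then clip every coordinate to [0, 1].
        x_new = [min(1.0, max(0.0, xn - eta * gn)) for xn, gn in zip(x, g)]
        if x_new == x:                             # stopping criterion X^(k+1) = X^(k)
            break
        x = x_new
    # Steps (4)-(5): at a vertex, F counts the violated clauses.
    if all(xn in (0.0, 1.0) for xn in x) and cost(x, clauses) == 0.0:
        return [0 if xn == 1.0 else 1 for xn in x], x   # x_n = Pr{z_n = 0}
    return None, x


# A single clause z_0 + z_1 + z_2 = 1, started near the vertex X = (0, 1, 1).
z, x = bsgd([(0, 1, 2)], [0.05, 0.95, 0.95])
print(z)  # [1, 0, 0]
```

Independent restarts from different starting points are simply repeated calls to `bsgd`, which is what makes the parallel execution mentioned below straightforward.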
Several remarks are in order. First, the fact that $F$ is harmonic guarantees that the algorithm presented above never becomes trapped in a local minimum, simply because a harmonic function has no local minima in the interior of a compact domain. The only stationary points that can be encountered during the evolution of $X$ are saddle points, where $\nabla F = 0$. The saddle points form a set of measure zero in $\Re^N$, which makes the probability of $X$ passing through them negligible. Nevertheless, saddles play a crucial role as bifurcation points that separate different evolution paths: they may cause two trajectories that originate at nearby starting points to end up far away from each other. A notable saddle point in the interior of the hypercube is $X_s = (\tfrac23, \tfrac23, \ldots, \tfrac23)$, as can easily be verified by differentiating $p_m$ with respect to any variable. Hence this point must not be used as a starting point of the algorithm. However, any other point in the vicinity of $X_s$ can serve as a starting point (provided it is not a saddle).

The second remark concerns consecutive runs of the algorithm, each starting from a different point. If a solution is found during the latest run, we stop; otherwise, we continue until the maximum time designated for finding a solution has been exhausted. With each unsuccessful run of the algorithm, the confidence that there is no solution increases. Whether it is possible to obtain a quantitative estimate of the probability that a solution exists in terms of $N$, $M$, and the number of unsuccessful runs $L$ is an open question, which we discuss briefly in Section 6.

Third, notice that consecutive runs of the algorithm can be implemented in parallel. Once the cost function is generated, steps (2) through (4) of the BSGD algorithm can be carried out on several processors simultaneously, resulting in almost linear speedup, with each processor using its own starting point generated at random according to some protocol.
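Both the vanishing diagonal of the Hessian and the saddle point $X_s = (\tfrac23, \ldots, \tfrac23)$ can be checked numerically. The sketch below is our own illustration on a hypothetical random instance (the seed, instance size, and helper names are assumptions). Since $F$ is affine in each single variable, a central second difference along any coordinate is exact up to round-off for any step size. At $X_s$, each shared clause contributes $-1 + 3 \cdot \tfrac23 = 1$ to an off-diagonal Hessian entry, so the Hessian has zero trace but is nonzero, hence indefinite: $X_s$ is a saddle rather than an extremum.

```python
import random


def cost(x, clauses):
    """F(X) from Eqs. (2)-(3); a polynomial, so it is defined on all of R^N."""
    return sum(1.0 - x[i] * x[j] - x[i] * x[k] - x[j] * x[k] + 3.0 * x[i] * x[j] * x[k]
               for i, j, k in clauses)


def grad(x, clauses):
    """dF/dx_n from Eq. (2)."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        g[i] += -x[j] - x[k] + 3.0 * x[j] * x[k]
        g[j] += -x[i] - x[k] + 3.0 * x[i] * x[k]
        g[k] += -x[i] - x[j] + 3.0 * x[i] * x[j]
    return g


def hessian_diag(x, clauses, n, h=0.5):
    """Central difference for d^2F/dx_n^2; exact up to round-off because F is
    affine in each single variable, so a large step h keeps round-off small."""
    xp, xm = list(x), list(x)
    xp[n] += h
    xm[n] -= h
    return (cost(xp, clauses) - 2.0 * cost(x, clauses) + cost(xm, clauses)) / (h * h)


rng = random.Random(0)
N = 8
clauses = [tuple(rng.sample(range(N), 3)) for _ in range(10)]  # hypothetical instance
point = [rng.uniform(-1.0, 2.0) for _ in range(N)]             # may lie outside Omega

# Zero diagonal Hessian everywhere in R^N.
print(max(abs(hessian_diag(point, clauses, n)) for n in range(N)) < 1e-9)  # True
# The gradient vanishes at X_s = (2/3, ..., 2/3).
print(max(abs(g) for g in grad([2.0 / 3.0] * N, clauses)) < 1e-12)        # True
```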
Let us consider several examples illustrating the performance of the algorithm described in Section 3. All problem instances presented below were generated randomly, and the starting points for consecutive runs of the BSGD algorithm were chosen at random on the surface of an $N$-dimensional sphere centered at $(\tfrac12, \ldots, \tfrac12)$ with radius 0.05. $\mathbf{M}$ denotes the matrix of size $3 \times M$ that lists, column by column, the indices of the variables involved in each clause. The step parameter $\eta$ was set to 0.005.

A. Small-size problem: $N = 15$; $M = 8$ ($M/N \approx 0.53$)

$$\mathbf{M} = \begin{pmatrix} 10 & 8 & 15 & 15 & 12 & 15 & 13 & 15 \\ 9 & 6 & 13 & 12 & 6 & 14 & 7 & 6 \\ 6 & 1 & 5 & 7 & 2 & 11 & 4 & 3 \end{pmatrix}$$

The solution was found after the very first run:

Z = (1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0)

Figures 1 and 2 show the evolution of the cost function and of the variables $x_n$ ($n = 1, \ldots, N$), respectively, as the algorithm proceeds. Notice that some $x_n$ change in a non-monotonic manner.
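The solution reported for case A can be verified directly against the clause matrix. The Python sketch below is our own check: it reads each column of $\mathbf{M}$ as one clause (1-based indices, as printed), confirms that every clause contains exactly one bit equal to 1 under $Z$, and evaluates $F$ at the corresponding vertex $x_n = 1 - z_n$ and at the centre of the hypercube, where $F = 5M/8$.

```python
# Example A: each column of the 3 x M matrix is one clause (1-based indices as printed).
MATRIX = [
    [10, 8, 15, 15, 12, 15, 13, 15],
    [9, 6, 13, 12, 6, 14, 7, 6],
    [6, 1, 5, 7, 2, 11, 4, 3],
]
Z = [1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
N, M = 15, 8

# Convert each column to a 0-based index triple.
clauses = [tuple(row[m] - 1 for row in MATRIX) for m in range(M)]


def cost(x, clauses):
    """F(X) from Eqs. (2)-(3)."""
    return sum(1.0 - x[i] * x[j] - x[i] * x[k] - x[j] * x[k] + 3.0 * x[i] * x[j] * x[k]
               for i, j, k in clauses)


print(all(len(set(c)) == 3 for c in clauses))               # True: distinct indices
print(all(Z[i] + Z[j] + Z[k] == 1 for i, j, k in clauses))  # True: all clauses satisfied
print(cost([1.0 - z for z in Z], clauses))                  # 0.0 at the vertex for Z
print(cost([0.5] * N, clauses))                             # 5.0 = 5M/8 at the centre
```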
[Figure 1. Cost function $F$ versus iteration number $k$ for case A.]
[Figure 2. Evolution of variables $x_n$ ($n = 1, \ldots, N$) versus iteration number for case A.]
B. Medium-size problem: $N = 100$; $M = 40$ ($M/N = 0.4$)

It took four successive runs of the BSGD algorithm to find a solution. The evolution curves of the cost function $F$ for all four runs are shown in Figure 3, and the evolution of the variables $x_n$ in the last (successful) run is presented in Figure 4.

[Figure 3. Cost function $F$ versus iteration number $k$ for case B (four runs of the BSGD algorithm).]
[Figure 4. Evolution of variables $x_n$ ($n = 1, \ldots, N$) in the successful run for case B.]
C. Large-size problem: $N = 1000$; $M = 250$ ($M/N = 0.25$)

It took three runs to find a solution. Figures 5 and 6 present the results (see captions for details).

[Figure 5. Cost function $F$ versus iteration number $k$ for case C (three runs of the BSGD algorithm).]
[Figure 6. Evolution of variables $x_n$ ($n = 1, \ldots, N$) in the successful run for case C.]
There is extensive empirical evidence suggesting that many constraint satisfaction problems exhibit a phase transition from satisfiability to unsatisfiability as the ratio of the number of clauses (constraints) to the number of variables, $r = M/N$, passes through some threshold value $r^*$ in the limit $N \to \infty$ [8-11]. For the EC3 (or positive 1-in-3 SAT) problem, numerical experiments give $r^* \approx 0.62$ [12]. Consequently, a randomly generated instance of the EC3 problem with large $N$ has a solution with high probability (w.h.p.) when $r \ll r^*$.

Figure 7 shows the evolution of the variables for a randomly generated EC3 problem with $r = 0.025$ ($M = 25$, $N = 1000$). Surprisingly, these variable flows are highly ordered, and this ordering persists across different problem instances with small $r$ (all of which have satisfying assignments w.h.p.). One can distinguish five families of well-separated variable flows, all originating from starting points around ½ (see Fig. 7):
– Monotonically growing towards 1;
– Monotonically growing to a plateau at (roughly) a fixed intermediate value and then splitting into two sub-flows leading in opposite directions, to 1 and 0, respectively;
– Growing towards 1 first and then returning to a plateau where the flow splits into two sub-flows towards 1 and 0, respectively;
– Growing towards 1 first and then changing direction towards 0;
– Horizontal flow of irrelevant variables (those that are not involved in any clause).

[Figure 7. Typical variable flows $x_n$ ($n = 1, \ldots, N$) versus iteration number for a randomly generated EC3 problem at a low clauses-to-variables ratio (see text for details).]
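The paper does not specify how its random instances were generated, so the following Python sketch rests on an explicit assumption: each clause picks three distinct variables uniformly at random (the helper names are hypothetical). Starting points are drawn uniformly on the sphere of radius 0.05 around $(\tfrac12, \ldots, \tfrac12)$ by normalizing a Gaussian vector, a standard technique. For $r = 0.025$ with $N = 1000$, the 25 clauses can touch at most 75 variables, so at least 925 trajectories belong to the horizontal family of irrelevant variables.

```python
import math
import random


def random_instance(n, m, rng):
    """Assumed generator: each clause picks 3 distinct variables uniformly at random."""
    return [tuple(rng.sample(range(n), 3)) for _ in range(m)]


def random_start(n, radius, rng):
    """Uniform random point on the sphere of given radius centred at (1/2, ..., 1/2)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in v))
    return [0.5 + radius * t / norm for t in v]


def clause_counts(n, clauses):
    """c_n: number of clauses containing variable n (c_n = 0 means irrelevant)."""
    counts = [0] * n
    for clause in clauses:
        for idx in clause:
            counts[idx] += 1
    return counts


rng = random.Random(2011)
N, M = 1000, 25                    # r = M/N = 0.025, as in the Figure 7 experiment
clauses = random_instance(N, M, rng)
x0 = random_start(N, 0.05, rng)
counts = clause_counts(N, clauses)
irrelevant = sum(1 for c in counts if c == 0)
print(irrelevant >= N - 3 * M)     # True: at least 925 horizontal flows
```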
It is also worth noting that, for any variable $x_n$ and independently of the value of $r$, the slope of the corresponding trajectory near the starting point close to ½ can take only a discrete set of values. Specifically, for any $n \in \{1, 2, \ldots, N\}$,

$$x_n^{(2)} - x_n^{(1)} \cong \frac{\eta}{4}\, c_n, \qquad (6)$$

where $c_n \in \{0, 1, 2, \ldots, M\}$ denotes the total number of clauses that $x_n$ is involved in. This follows from (2): at $X = (\tfrac12, \ldots, \tfrac12)$, $\partial p_m / \partial x_n = -\tfrac12 - \tfrac12 + \tfrac34 = -\tfrac14$ for every clause containing $x_n$, so a single gradient step changes $x_n$ by $\eta c_n / 4$. Figure 8 clearly illustrates this fact by zooming in on the leftmost region of Fig. 7.

[Figure 8. Zoom into the region of Fig. 7 near the starting points around ½; the discreteness of the spectrum of starting slopes of the variable trajectories can be clearly observed.]

Numerical experiments show that, as the ratio $r$ increases, the highly ordered behavior of the variable flows (as demonstrated in Figure 7) starts to break down, and new trajectories that do not belong to any of the five families above begin to appear. With a further increase of $r$, the five basic types of flow become wider and start to overlap, while the share of 'irregular' trajectories grows. Finally, when $r$ approaches the threshold value, only a few variables follow the familiar paths, as illustrated in Table 2. The situation, to some extent, resembles the transition from laminar to turbulent flow in hydrodynamics, with the control parameter $r$ playing the role of the Reynolds number. The nature of this behavior of variable flows for randomly generated EC3 instances remains to be understood and perhaps even quantified.
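Equation (6) can be reproduced by performing one BSGD update from the centre of the hypercube. In the Python sketch below (our illustration on a hypothetical 7-variable instance), the displacement of each $x_n$, measured in units of $\eta/4$, equals the clause count $c_n$ exactly. Writing $x = \tfrac12 + \delta$ in Eq. (2) gives $\partial p_m / \partial x_i = -\tfrac14 + \tfrac12(\delta_j + \delta_k) + 3\delta_j\delta_k$, so for starts within 0.05 of ½ in every coordinate the per-clause deviation from $-\tfrac14$ is at most 0.0575, which is why (6) holds only approximately for the perturbed starting points used in the experiments.

```python
def grad(x, clauses):
    """dF/dx_n from Eq. (2): dp_m/dx_i = -x_j - x_k + 3 x_j x_k (cyclically)."""
    g = [0.0] * len(x)
    for i, j, k in clauses:
        g[i] += -x[j] - x[k] + 3.0 * x[j] * x[k]
        g[j] += -x[i] - x[k] + 3.0 * x[i] * x[k]
        g[k] += -x[i] - x[j] + 3.0 * x[i] * x[j]
    return g


def first_step(x, clauses, eta):
    """Change of each x_n produced by one BSGD update (gradient step plus clipping)."""
    g = grad(x, clauses)
    return [min(1.0, max(0.0, xn - eta * gn)) - xn for xn, gn in zip(x, g)]


# Hypothetical instance with N = 7; variable 6 belongs to no clause.
clauses = [(0, 1, 2), (0, 3, 4), (0, 1, 5)]
c = [3, 2, 1, 1, 1, 1, 0]   # c_n = number of clauses containing x_n
eta = 0.005

steps = first_step([0.5] * 7, clauses, eta)
print([round(s / (eta / 4)) for s in steps])  # [3, 2, 1, 1, 1, 1, 0]
```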
[Table 2. Typical variable flows generated by the BSGD algorithm for $N = 100$: plots of $x_n$ ($n = 1, \ldots, N$) versus iteration number at $r = 0.28$ ($M = 28$), $r = 0.44$ ($M = 44$; two panels), $r = 0.52$ ($M = 52$; two panels), and $r = 0.6$ ($M = 60$).]
The most interesting question regarding the BSGD algorithm concerns its computational complexity. This is a very difficult question, and we will not attempt to fully address it in this paper. Instead, some preliminary analysis of the complexity issues is given below.

Assume that the EC3 problem at hand has one or several satisfying bit assignments. Suppose also that, without knowing that the problem is satisfiable, we have run the BSGD algorithm independently (using randomly selected starting points) $L$ times, and it has failed to find any satisfying assignment. How confident can we be that there is a solution to be found if we continue?

Let $P$ be the probability of finding a satisfying bit assignment in a single run of the BSGD algorithm when at least one solution does exist, and let $T$ denote the number of trials until the first success. Then the expected number of trials until the first success is $1/P$. Indeed, the probability that exactly $t$ trials are needed is $\Pr\{T = t\} = P(1 - P)^{t-1}$, and hence

$$E[T] \equiv \sum_{t=1}^{\infty} t \Pr\{T = t\} = \sum_{t=1}^{\infty} t\, P (1 - P)^{t-1} = \frac{1}{P}. \qquad (7)$$

It is also easy to find the standard deviation $\sigma$ of the number of trials until the first success:

$$\sigma^2 \equiv E[T^2] - (E[T])^2 = \sum_{t=1}^{\infty} t^2 P (1 - P)^{t-1} - \frac{1}{P^2} = \frac{1 - P}{P^2}. \qquad (8)$$

Making use of the one-sided Chebyshev (Cantelli) inequality, $\Pr\{T - E[T] \ge \lambda\} \le \sigma^2 / (\sigma^2 + \lambda^2)$, with $\lambda = a/P$, we then obtain

$$\Pr\{T - P^{-1} \ge a P^{-1}\} \le \frac{1 - P}{1 - P + a^2} \le \frac{1}{1 + a^2}, \qquad (9)$$

where $a > 0$. If, for instance, we select $a = 11$ and run the BSGD algorithm $L = (1 + 11)\,P^{-1}$ times without success, then, with a confidence of approximately 99% ($1 - 1/122 \approx 0.992$), we can conclude that no solutions exist and stop. The question of the complexity of the algorithm thus boils down to the estimation of $P$: whether $1/P$ grows as a polynomial or as an exponential function of the problem size $N$.

Consider the set of all points inside the unit hypercube $\Omega$ that possess the following property: when used as starting points for the BSGD algorithm, they evolve to satisfying bit assignments as the algorithm progresses. Let $\Omega^*$ and $\mu[\Omega^*]$ denote this set and its measure, respectively.
Then, assuming that we choose the starting point at random with a uniform distribution inside $\Omega$, we obtain

$$P = \frac{\mu[\Omega^*]}{\mu[\Omega]} = \mu[\Omega^*], \qquad (10)$$

since $\mu[\Omega] = 1$ for the unit hypercube. Thus, we observe that the computational complexity of the problem is intrinsically linked to the properties of the corresponding harmonic cost function $F$ in $\Re^N$. This connection suggests that methods of mathematical analysis, algebraic and differential topology, and other disciplines traditionally belonging to the domain of continuous mathematics may provide crucial insights into the most intriguing open questions in modern complexity theory.
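Equations (7)-(9) can be sanity-checked with a few lines of Python. The sketch below is our own illustration: it confirms (7) and (8) through partial sums of the geometric series and, for a hypothetical single-run success probability $P = 0.01$ (the true $P$ is unknown and is precisely the open question), compares the Cantelli bound (9) with the exact probability $(1 - P)^L$ that $L$ independent runs all fail. The comparison shows that the bound is conservative.

```python
import math


def expected_trials(p):
    """Eq. (7): mean number of runs until the first success."""
    return 1.0 / p


def std_trials(p):
    """Eq. (8): standard deviation sqrt(1 - p) / p."""
    return math.sqrt(1.0 - p) / p


def series_moments(p, terms=5000):
    """Partial sums for E[T] and Var[T] with Pr{T = t} = p (1 - p)^(t - 1)."""
    m1 = m2 = 0.0
    for t in range(1, terms + 1):
        w = p * (1.0 - p) ** (t - 1)
        m1 += t * w
        m2 += t * t * w
    return m1, m2 - m1 * m1


def cantelli_bound(a):
    """Right-hand side of Eq. (9): Pr{T - 1/P >= a/P} <= 1/(1 + a^2)."""
    return 1.0 / (1.0 + a * a)


p = 0.2
mean, var = series_moments(p)
print(round(mean, 9), round(var, 6))       # 5.0 20.0
a = 11
print(round(1.0 - cantelli_bound(a), 3))   # 0.992

# Exact geometric tail after runs = (1 + a)/P failures, for a hypothetical P = 0.01.
p_small = 0.01
runs = round((1 + a) / p_small)            # 1200 runs
print((1.0 - p_small) ** runs < cantelli_bound(a))  # True
```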
Needless to say, the results obtained are relevant not only to the specific EC3 problem considered throughout this paper but to all NP-complete problems, since they can be transformed into one another by polynomial-time reductions. To summarize, we have demonstrated the equivalence of an arbitrary NP-complete problem to the problem of checking whether a level hypersurface of a specifically constructed harmonic cost function (with all diagonal entries of its Hessian matrix equal to zero) intersects a unit hypercube in many-dimensional Euclidean space. This is the main result of the paper, and it can potentially lead to the development of new algorithms for NP-complete problems. As an illustration of the power of our method, a simple iterative algorithm (BSGD) belonging to the gradient descent family has been implemented for a specific NP-complete problem (EC3, or positive 1-in-3 SAT). The algorithm allows for almost linear speedup when carried out on multiple processors working in parallel. Numerical simulations confirm its good performance on problems of different sizes and reveal surprising behavior of variable flows for problems with low clauses-to-variables ratios. The computational complexity of the BSGD algorithm remains an open question, intrinsically linked to the properties of the corresponding harmonic cost function in $\Re^N$.

[1] Cook, S. A., The complexity of theorem-proving procedures, Proceedings of the Third Annual ACM Symposium on Theory of Computing, pp. 151-158 (1971).
[2] Karp, R. M., Reducibility among combinatorial problems, in Complexity of Computer Computations, Eds. R. E. Miller and J. W. Thatcher, New York: Plenum, pp. 85-103 (1972).
[3] Garey, M. R., Johnson, D. S., Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Co. (1979).
[4] Papadimitriou, C. H., Steiglitz, K., Combinatorial Optimization: Algorithms and Complexity, Dover Publications, Inc. (1998).
[5] Schaefer, T. J., The complexity of satisfiability problems, Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, ACM, New York, pp. 216-226 (1978).
[6] Galambos, J., Simonelli, I., Bonferroni-type Inequalities with Applications, New York: Springer-Verlag (1996).
[7] Papoulis, A., Probability, Random Variables, and Stochastic Processes, 2nd ed., New York: McGraw-Hill, p. 23 (1984).
[8] Prosser, P., An empirical study of phase transitions in binary constraint satisfaction problems, Artificial Intelligence, v. 81, pp. 81-109 (1996).
[9] Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B., & Troyansky, L., Determining computational complexity from characteristic 'phase transitions', Nature, v. 400, pp. 133-137 (1999).
[10] Xu, K., Li, W., Exact phase transitions in random constraint satisfaction problems, Journal of Artificial Intelligence Research, v. 12, pp. 93-103 (2000).
[11] Achlioptas, D., Naor, A., & Peres, Y., Rigorous location of phase transitions in hard optimization problems, Nature, v. 435, pp. 759-764 (2005).
[12] Kalapala, V., and Moore, C., The phase transition in Exact Cover, Chicago Journal of Theoretical Computer Science, 2008(5), pp. 1-9 (2008).