NP-Completeness And Reduction


There are many problems for which no polynomial-time algorithms ins known. Some of these problems are traveling salesperson, optimal graph coloring, the knapsack problem, Hamiltonian cycles, integer programming, finding the longest simple path in a graph, and satisfying a Boolean formula.

These problems belongs to an interesting class of problems, called the NP-Complete" problems, whose status is unknown.

The NP-Complete problems are tractable i.e., require a superpolynomial time. The reason is that a polynomial time algorithm to solve any one of the NP-Complete problems would automatically provide us with polynomial-time algorithms for all of them.

The basic idea is that there may be problems that are hard to solve, but the validity of any purported solution can be verified easily.

Consider the problem "Hamiltonian Cycle". Hence an undirected graph, the problem is to find a path that starts from some node, visits each node once and only once, and returns to the starting node. If such cycle exists, we say that the graph is Hamiltonian. This problem is hard. However, it is easy to verify whether a "given" sequence of nodes defines a Hamiltonian cycle.


Polynomial-Time Algorithm

  1. Algorithms with worst case running time of O(nk), where k is a constant, are called tractable others are called intractable or super-polynomial.
  2. Formally, an algorithm is polynomial-time algorithm if there exists a polynomial p(n) such that the algorithm can solve any instance of size n in a time O(p(n)).
  3. Problem requiring Ω(n35) time to solve are essentially intractable for large n. Most known polynomial time algorithm run in time O(nk) for fairly low value of k.

The advantages in considering the class of polynomial-time algorithms is that all reasonable deterministic single processor model of computation can be simulated on each other with at most a polynomial slow-down.



Decision Problem

For these problems, the answer is either yes or no or equivalently either true or false.
For example, the problem "Find a Hamiltonian  cycle in graph G" is not a decision problem, but "Is  graph G Hamiltonian?" is a decision problem.

The theory of NP-Completeness is concerned with the notion of polynomially varifiable properties. Intuitively, a decision problem X is polynomial-time verifiable if someone could show that x in X whenever this is so. Given this example (i.e., x in X), one should be able to verify in polynomial-time (efficiently or easily) that indeed x in X. However, if in fact x not in X, then one should not be falsely show that x in X.


Example: Hamiltonian Cycle

This problem is difficult but it is easily verifiable: If one produces a Hamiltonian cycle, we can easily verify that it is correct. On other hand, nothing one could show to us would convince us that the graph is Hamiltonian if it is not.

Consider a decision problem X. Let the set Q be the proof space of X. A proof system for X is a set F of pairs <x, q>. For any x in X there must exist at least one q in Q such that <x, q> in F; on the other hand no such q must exist when x in X. Therefore, it suffices to such that for some q in Q such that <x, q> in F to convince that x in X.

Formally, F is a subset of X Q such that
    (For all x in X) (There exists q in Q) [<x, q > in F]
Any q such <x, q> in F is called a proof or a certified that x in X .

Note that
    (For all x not in X) (For all q in Q) [<x, q > not in F]
is implicit in the requirement that F must be a subset of X Q.

For example, if X is the set of all Hamiltonian graphs, let Q be a set of sequences of graph nodes and define <G, δ) in F if and only if the sequence of nodes δ specifies a Hamiltonian cycle in graph G.

The class NP correspond to the decision problems that have an efficient proof system, which means that each yes-instance must have at least one certificate whose validity can be verified quickly.



NP is the class of decision problems X that admit a proof system F subset X Q such that there exists a polynomial p(n) and a polynomial-time algorithm A such that

The Hamiltonian graph example files this definition. A Hamiltonian cycle in a graph, if it exists, takes less space to describe than the graph itself, and it can be verified in linear time whether a sequence of nodes defines a Hamiltonian cycle.


Optimization Problem

The problems in which some value must be minimized or maximized. Optimization problems can recast in terms of a decision problem by placing a bound on value to be optimized. For example, in recasting the shortest-path problem as the decision.


Polynomial Reduction

Let A and B two problems. We say that A is polynomially turing reducible to B if there exists an algorithm for sorting A in a time that would be polynomial if we could solve arbitrary instance of problem B at unit cost. This is denoted A T= <p B.
Note that if A T= <p B and B T= <p A then A == pBT and is called polynomially turing equivalent and polynomial reductions are transitive if A T= <p B and B T= <p A  then A T= <p C.

As an example, prove the polynomial equivalence of two versions of the Hamiltonian problem.


HAM == the problem of finding a Hamiltonian cycle in the graph if one exists and
HAMD == the problem of deciding whether or not a graph is Hamiltonian.

We allow an algorithm for HAM to return an arbitrary answer when presented with non-Hamiltonian graph.

The following theorem say that it is not significant harder to find a Hamiltonian cycle than to decide if a graph is Hamiltonian

Theorem: HAM == pHAMDT

1. HAMD ≤ pHAMDT direction

Consider the following algorithm

HamD (G:graph)
    δ Ham (G)
    if δ defines a Hamiltonian Cycle
HamD (G: graph)
δ Ham(G)
if δ defines a Hamiltonian Cycle in G
    then return true
    else return false

This algorithm solves HAMD correctly provided algorithm Ham solves problem HAM correctly: by definition of HAM, algorithm Ham must return a Hamiltonian cycle in G provided one exists, in which case HamD correctly return true. Conversely, if the graph is not Hamiltonian, the output δ returned by Ham cannot be a Hamiltonian cycle, and thus HamD will correctly return false.
If we count the call on Ham at a unit cost then algorithm HamD clearly takes polynomial time.
    HAMD ≤ pHAMDT direction

We are to find a Hamiltonian cycle assuming we know how to decide if such cycle exist. The idea is to remove each edge in turn and see whether the graph skill be Hamiltonian. We keep the edge if its removed would make the graph non-Hamiltonian; otherwise remove it

The resulting graph will still be Hamiltonian because we never make a change that would destroy this property. Moreover, the graph contains only the edges necessary to define a Hamiltonian cycle. Therefore, it suffices to follow the edges of the final graph to obtained a Hamiltonian cycle in the original graph.

Here is a greedy algorithm

Ham (G = (V, E))
    if HamD(G) = false
        then return "No Solution"
    For each e in E
        do if HamD(<V, A\{e}>)
            then E E \{e}

δ Sequence of nodes obtained by following the unique cycle remaining in G.

If we count each call on HamD at unit cost then clearly Ham takes polynomial time.


Abstract Problem

We define a n abstract problem Q to be a binary relation on a set I of problem instances and a set of problem solutions.

For example, An instance of SHORTEST-PATH is a triple consisting of a graph and given two vertices, <G, U, V>. A solution is a sequence of vertices in the graph including empty sequence (no path).

1. SHORTEST-PATH: Find a shortest path between two given vertices in an unweight and undirected graph G = (V, E).

In case of decision problem, we can view an abstract decision problem as a function that maps the instance set, I, to the solution set {0, 1}

f: I {0, 1}
f: I solution {0, 1}

For example, a decision problem PATH related to this shortest-path problem is, "Given a graph G = (V, E), two vertices u, v in V, and a nonnegative integer k, does a path exist in G between u and v whose length is at most k?



An encoding of a set S of abstract objects is a mapping 'e' from S to the set of binary strings
        e: s binary string

For example, encoding of natural number N = {0, 1, 2, 3, 4, . . . } as the string {0, 1, 10, 11, 100, . . . }. Using this encoding e(4) = 100. In the ASCII code e(A) = 100 0001.

Concrete Problem:
A problem whose set of instances is the set of binary strings.


Polynomial-Time Computable Function

A function f: {0, 1} * {0, 1} is polynomial-time computable function if there exists a polynomial time algorithm A that given any input x in {0, 1}*, produces f(x) as output.

The two codings e1 and e2 are polynomially related for some set of problem instances I, if there exists two polynomial-time computable function f12 and f21 such that for any i in I, we have f12(e1(i)) = e2(i) and f21(e2(i)) = e1(i). That is the encoding e2(i) can be computed from encoding e1(i) by a polynomial-time algorithm, and vice versa.

Lemma: Let Q be an abstract decision problem on an instance set I, and let e1 and e2 be polynomially related encoding on I. Then, e1(Q) in P if and on if e2(Q) in P.

Let e1(Q) can be solved in O(nk) time
Suppose, for instance i, encoding can be computed from encoding e1(i) in O(ne) where n = |e2(i)|

To solve problem e2(Q) on input e2(i).

Since both c and k are constant therefore O(nck) is polynomial.
And since, the argument in the other is symmetric, this completes the proof.


Formal Language

An alphabet S is a finite set of symbols.
A language over S is a subset of the set of all strings of symbols from S.
E is the empty language.
S*is the language consisting of all strings of symbols from S.

Operations on Languages

From the language theory, perspective, the set of instances for any decision problem Q is the set S*, where S = {0, 1}.

Since decision problem Q is entirely characterized by those instances of problem that produce a 1 (true) answer, Q can be viewed as a language L over S = {0, 1}, where
        L = {x in Sx : Q(x) =1}

For example,

The language L is decided by an algorithm A if every binary string is either accepted or rejected by the algorithm.

A language L is accepted in polynomial time by an algorithm A if for any length-n string x in L, the algorithm accepts x in time for some constant k.

A language L is decided in polynomial time by an algorithm A if for any length-n string x in {0, 1}* , the algorithm decides x O(nk) time for constant k.

Key Points:

To accept a language in polynomial time, an algorithm A consider only strings in L i.e., x in L.
To decide a language in polynomial time, an algorithm A must consider all string in {0, 1}* .
For example, the language PATH, as define above, can be accepted in polynomial time. One polynomial-time accepting algorithm would be
        Compute the distance d from u to v using a breadth first search.
        Compare: If d ≤ k
                then return true {and halt}.
                else {keep computing}

An algorithm accepts a strings belonging to PATH in polynomial time. But algorithm is unable to reject a string that does not belong to PATH.

Key Point

In order to decide an algorithm must explicitly reject (returns 0 or False) in addition accept (return 1 or True).


Complexity Class P

Recall P is the set of all decision problem that are solvable in polynomial time.

An algorithm definition by using language theory:
P = {L: L Subset {0,1}* and there exists an algorithm A that decides L in polynomial time}.

Theorem: P = {L: L is accepted by a polynomial-time algorithm}

The set of language decided by a polynomial-time algorithm is a subset of those accept by a polynomial algorithm.
Only show that if L is accepted by polynomial-time algorithm, it is also decided by a polynomial-time algorithm.
Let L be a language accepted by some polynomial-time algorithm, say A.
Use a "simulation" argument to constant another polynomial-time algorithm A` that decides L.
Because A accept L in O(nk) time for some constant k, there also exists a constant c such that A accepts L in at most T = cnk steps.
For input string x, the algorithm A` simulates the action of A for time T. At the end of time T, algorithm A` inspects the behavior of A.
        If A has accepted x, then A` accepts x by outputting a 1.
        If A has not accepted x, then A` rejects x by outputting a 0.
The overhead of A` does not increase the running time by more than a polynomial factor, and thus A` is a polynomial-time algorithm that decides L.

Note: Algorithm A` is a polynomial-time algorithm since A is polynomial-time algorithm.



  36.1-1 Let us define the optimization problem LONGEST-PATH-LENG as the relation that associates each instance of an undirected graph and the two vertices u and v. The decision problem  LONGEST-PATH is define as

{<G, u, v, k>: G = (V, E) is an undirected graph, u, v in V,  k>=0 is an integer, and there exists a simple path from u to v whose length is at least k}

Our goal is to proof that the optimization problem LONGEST-PATH-LENGTH can be solved in polynomial-time if and only if Longest-Path in P.

Forward direction: If  LONGEST-PATH-LENGTH can be solved in polynomial-time then following is a polynomial-time algorithm for LONGEST-PATH.

  1. INPUT: G, u, v, k.
  2. RUN the LONGEST-PATH-LENGTH on G, u, v.
  3. Suppose x be the result.
  4. If x ≥ k.
  5.     then return True.
  6.     else return False

Clearly this algorithm requires polynomial time provided we count the call on LONGEST-PATH-LENGTH at unit cost and LONGEST-PATH is in P.

Backward Direction: Suppose LONGEST-PATH can be solved in polynomial-time, then following is a polynomial-time algorithm for LONGEST-PATH-LENGTH:

1.INPUT: G, u, v
2. FOR k = n-1 to 0
        do RUN LONGEST-PATH on G, u, v, k
             IF result of LONGEST-PATH is TRUE
                 THEN return k
                 ELSE return arbitrary answer.

Clearly this requires only polynomial time, and LONGEST-PATH-LENGTH is in P.

This completes the proof.

Problem: Find the longest simple cycle in an undirected graph

To define this problem formally, one could phrase as an "abstract problem", which is a kind of relation:
LONGEST-SIMPLE-PATH = {<G, k>: G =(V, E) is an undirected graph, c is one of the longest simple cycle in G where length is k}.

A closely related decision problem is "Given an undirected graph G = (V, E), and an integer k, does G have a simple cycle of length at least k?

Note that since this is a decision problem, it must give a yes-or-no answer.

The language corresponding to the decision problem is :

SIMPLE-Cycle = {(G, k) is an undirected graph, K>= is an integer, and there exists a simple cycle in G whose length is at least k}.  

In the Encodings
A formal encoding of directed graphs as binary strings using adjacency matrix representation.

To encode this type of graph as a bit string, list the bits in the matrix, in left-to-right, top-to-bottom order. (This matrix taken from CLR- chapter Elementary Graph Algorithms).

0 1 0 0 1
1 0 1 1 1
0 1 0 1 0
0 1 1 0 1
1 1 0 1 0

This matrix-representation of graph could be encoded as:

        0 1 0 0  1 1 0 1 1 1 0 1 0 1 0 0 1 1 0 1 1 1 0 1 0

Since we know the matrix is square, we can create it by counting the bits in the input, taking the square root, and writing the bits in the matrix in same order. The representation uses order. The representation uses |V|2 bits.

A formal encoding of directed graph as binary string using adjacency-list representation.

This is little more complicated. The simplest thing we could do is to just write an ASCII file, with the nth line of file being the vertex number, followed by the numbers of all the vertices connected to this vertex.

For example, again taken from CLR-Chapter, elementary graph algorithms could be written as:

1 2 5    
2 1 5 3 4
3 2 4    
4 2 5 3  
5 4 1 2  

Clearly, an ASCII text file can be represented as a string of bits; that is how one could save some bits by writing the numbers in binary. At any rate, as the CLR says, the number of words of memory required for the adjacency-list representation wiil be O(|V| + |E|). Each vertex number will be between 1 and |V|, and so will require  Theta (lg|V|) bits to store. So, the total number of bits needed will be O((|V| + |E|) lg|V|)

Now the question is "are these two representations are polynomially related?"

To show that these representations are polynomially related, one can how that one representation can be transform into other representation in a polynomial-time.

To Convert the matrix to the adjacency list:

Repeat for every row
    Scan through the row from left-to-right.
    Add the column number to the list

To Convert the adjacency list to matrix:

Start with a matrix M of all 0's.
Let i be a counter of what line you are at. In that line:
    if hit the number j
        then put a 1 at M[i, j]  

  Suppose that a language L can accept any string xis in L in polynomial time, but that the algorithm that does this runs in superpolynomial time if string x is not in L. Now the question is whether or not a language L cab be decided in polynomial-time. Let n be the size of the input, and assume that we are given some algorithm A and a polynomial p(n) meeting the following conditions:

If a string x is in L, then algorithm A halts and answer yes in no more than p(n) steps.

If the string x is not in L,  then we know that string x is in L. If algorithm A answers NO, or has not yet stopped running (does not halt), then we know that string x is not in L.

Therefore, answer to the above question is yes, language L can be decided in polynomial time.

To say same thing in slightly different words we can say that to "accept" a language an algorithm (Turing Machine) halts on all x in L, but may not halt on x not in L. If on 'yes' instance the algorithm always halts in time less that cnk where c and k are constants, then another algorithm can be designed to stop itself after cnk steps, and thus decide the language.  □

Show that the class P is closed under union, intersection, concatenation,  complement, and Kleene star. Put it another way, if L1, L2 are in P then L1 U L2 is in P, etc.
Let L1, L2 be in P and A1, A2 be algorithms which decide L1, L2 within polynomial time. Then all new algorithm A3 (below) can be constructed in polynomial time.

L1 Union L2  = {x: (x is in L1) or (x is in L2) }

Algorithm A3(x)

IF (A1(x) = 1 or A2(x) = 1)
    THEN return 1;
    ELSE return 0

L1 Intersection L2  = {x: (x is in L1) and (x is in L2) }

Algorithm A3(x)

IF (A1(x) = 1 or A2(x) = 1)
    THEN return 1;
    ELSE return 0

L1 + L2  = {xy: (x is in L1) and (y is in L2) }

Algorithm A3(z)

(let |z| = n and z = z1z2 . . . zn)
FOR i = o to n
    do IF (A1(z1 . . . zi) = 1 and
            A2(zi+1 . . . zn) = 1)
         THEN return 1
return 0



Polynomial-time Verification

Suppose we are given the instance <G, u, v, k> of the decision problem PATH and a path p from u to v. We can easily check (or verify) that indeed p belongs to PATH (in polynomial-time). The p can be view as a "certificate" or "proof".


Hamiltonian Cycle

A hamiltonian  cycle of an undirected graph G = (V, E) is a simple cycle that contains each vertex in V. A hamiltonian graph is a graph with a hamiltonian cycle.

Decision problem Version.

Does a graph G have a hamiltonian cycle?
In formal language:
HAM-CYCLE = {<G> : G is a hamiltonian graph}.

Decide the language HAM-CYCLE?


Brute Force Algorithm
                Check all permutations of the vertices of G to see if given instance <G> is a hamiltonian path.



If n vertices in the graph, then there are n! permutation and n! is Ω(2n1/2). See CLR-chapter Growth of functions.
And Ω is not O(nk). Thus, this naive algorithm does run in polynomial time and in fact, no polynomial time algorithm is known (i.e., this problem is NP-complete).


Verification Algorithm

It is a two argument algorithm A(x, y), where x is an input string giving problem instance and y is a binary string giving a proposal solution and called certificate or proof.

A(x, y) = 1, if y is a solution
            = 0, otherwise.

An algorithm A verifies an input string x if there exists a certificate y such that A(x, y) = 1.
The formal language verified by a verification algorithm A is
            L = {x is in {0, 1}*: there exists y is in {0, 1} such that A(x, y) = 1}
Intuitively, an algorithm A(x, y) verifies a language L is for any string x in L, there is a certificate y that algorithm A(x, y) can use to prove that x belongs to L.


The Complexity class NP

The class of languages (i.e.,) decision problems) that can be verified by polynomial-time algorithm.
Formally, a language L belongs to NP if and only if there exists a two-input polynomial-time algorithm A and constant c such that
L = {x belongs to {x, 1}*: there exists a certificate (proof) y with |y| = O(|x|c) such that A(x, y) = 1}

We say that algorithm A(x, y) verifies language L in polynomial time.

Lemma: P subset NP

Proof: There is a polynomial-time algorithm B that decides L.
        Define A(x, y) = B(x) for any x belongs to {0, 1}* and for any certificate y.
        Algorithm A(x, y) ignores y and accepts those x belongs to L.

Note that we only need to show that a certificate exists; we don't need to verify a particular certificate.

Complexity class co-NP
This is the set of languages L such that L  belongs to NP. It is unknown whether NP = Co-NP.



Intuitively, a problem A can be reduced to another problem B if any instance of A can be "rephrased" as an instance of B, the solution to the instance of B provides a solution to the instance of A.

A language L1 is polynomial-time reducible to another language L2, denoted L1  p L2 if there exists a polynomial-time computable function
f:{0, 1}* {0, 1}* such that for all x belongs to {0, 1}*, x belongs to L1 if and only if f(x) belongs to L2.

We call the function f the reduction function, and a polynomial-time algorithm F that computes a reduction function f.