Strongly Connected Components

Decomposing a directed graph into its strongly connected components is a classic application of depth-first search. The problem of finding connected components is at the heart of many graph applications. Generally speaking, the connected components of a graph correspond to different classes of objects. The first linear-time algorithm for strongly connected components is due to Tarjan (1972). The algorithm presented in CLRS, which is due to Sharir and Kosaraju, is perhaps the easiest one to program.

Given a digraph (directed graph) G = (V, E), a strongly connected component (SCC) of G is a maximal set of vertices C ⊆ V such that for all u, v in C, both u ⇝ v and v ⇝ u; that is, u and v are reachable from each other. In other words, two vertices of a directed graph are in the same component if and only if they are reachable from each other.

Figure: SCC example with components C₁, C₂, C₃ and C₄.

The above directed graph has 4 strongly connected components: C₁, C₂, C₃ and C₄. If G has an edge from some vertex in C_i to some vertex in C_j where i ≠ j, then one can reach any vertex in C_j from any vertex in C_i but not return. In the example, one can reach any vertex in C₂ from any vertex in C₁ but cannot return to C₁ from C₂.

The algorithm in CLRS for finding the strongly connected components of G = (V, E) uses the transpose of G, which is defined as:

G^T = (V, E^T), where E^T = {(u, v): (v, u) in E}.
G^T is G with all edges reversed.

From the given graph G, one can create G^T in linear time (i.e., Θ(V + E)) if using adjacency lists.
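As a rough illustration of the linear-time construction, here is a minimal Python sketch. It assumes the graph is stored as a dictionary mapping each vertex to a list of its successors; that representation and the name `transpose` are illustrative choices, not something fixed by these notes.

```python
def transpose(adj):
    """Return G^T for a digraph given as {vertex: [successors]}."""
    # Every vertex of G is also a vertex of G^T, even if it has no incoming edges.
    adj_t = {u: [] for u in adj}
    for u, successors in adj.items():
        for v in successors:
            # Edge (u, v) in E becomes edge (v, u) in E^T.
            adj_t.setdefault(v, []).append(u)
    return adj_t


# Hypothetical 4-vertex digraph used only for illustration.
g = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
g_t = transpose(g)
```

Each vertex and each edge is touched once, which matches the Θ(V + E) bound. Transposing twice gives back the original edge set.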

Observation:

The graphs G and G^T have the same SCC's. This means that vertices u and v are reachable from each other in G if and only if they are reachable from each other in G^T.

Component Graph

The idea behind the computation of SCC comes from a key property of the component graph, which is defined as follows:

G^SCC = (V^SCC, E^SCC), where V^SCC has one vertex for each SCC in G and E^SCC has an edge whenever there is an edge between the corresponding (distinct) SCC's in G.
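The definition can be sketched in Python as follows. The sketch assumes the SCC's are already known and are supplied as a map `comp` from each vertex to a component label; the function name, the label strings and the small graph are hypothetical.

```python
def component_graph(adj, comp):
    """Build G^SCC from adjacency lists and a vertex -> component-label map."""
    vertices = set(comp.values())      # one vertex per SCC
    edges = set()                      # a set, so parallel edges collapse
    for u, successors in adj.items():
        for v in successors:
            if comp[u] != comp[v]:     # keep only edges between distinct SCC's
                edges.add((comp[u], comp[v]))
    return vertices, edges


# Hypothetical digraph with two SCC's, {1, 2} and {3, 4}.
g = {1: [2], 2: [1, 3], 3: [4], 4: [3]}
comp = {1: "C1", 2: "C1", 3: "C2", 4: "C2"}
v_scc, e_scc = component_graph(g, comp)
```

Here the only edge crossing between components is (2, 3), so the component graph has the single edge from C1 to C2.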

For our example (above) the G^SCC is:

Figure: component graph of the SCC example.

The key property of G^SCC is that the component graph is a dag, which the following lemma implies.

Lemma G^SCC is a dag. More formally, let C and C' be distinct SCC's in G, let u, v in C, u', v' in C', and suppose there is a path u ⇝ u' in G. Then there cannot also be a path v' ⇝ v in G.

Proof Suppose there is a path v' ⇝ v in G. Then there are paths u ⇝ u' ⇝ v' and v' ⇝ v ⇝ u in G. Therefore, u and v' are reachable from each other, so they are not in separate SCC's, which contradicts the assumption that C and C' are distinct.

This completes the proof.

ALGORITHM

DFS(G) produces a forest of DFS-trees. Let C be any strongly connected component of G, let v be the first vertex of C discovered by the DFS, and let T be the DFS-tree containing v. When DFS-visit(v) is called, all vertices in C are reachable from v along paths consisting of unvisited vertices, so DFS-visit(v) will visit every vertex in C and add it to T as a descendant of v.

STRONGLY-CONNECTED-COMPONENTS (G)

1. Call DFS(G) to compute finishing times f[u] for all u.
2. Compute G^T
3. Call DFS(G^T), but in the main loop, consider vertices in order of decreasing f[u] (as computed in first DFS)
4. Output the vertices in each tree of the depth-first forest formed in second DFS as a separate SCC.

Time: The algorithm takes linear time, i.e., Θ(V + E), to compute the SCC's of a digraph G.
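The four steps might be coded along the following lines. This is a minimal Python sketch, not the CLRS pseudocode itself: it assumes the graph is a dictionary in which every vertex appears as a key, it uses an explicit stack instead of recursion, and it records the order in which vertices finish rather than numeric values f[u]. The function name and the small graph are hypothetical.

```python
def strongly_connected_components(adj):
    """Return the SCC's of a digraph given as {vertex: [successors]}."""
    visited = set()

    def dfs(graph, root, out):
        # Iterative DFS from root; a vertex is appended to `out` when it finishes.
        visited.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            u, successors = stack[-1]
            for v in successors:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(graph[v])))
                    break
            else:
                # No unvisited successor is left: u is finished.
                stack.pop()
                out.append(u)

    # Step 1: DFS(G), listing vertices in order of increasing finishing time.
    finish_order = []
    for u in adj:
        if u not in visited:
            dfs(adj, u, finish_order)

    # Step 2: compute G^T.
    adj_t = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    # Step 3: DFS(G^T), choosing roots in order of decreasing finishing time.
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            tree = []
            dfs(adj_t, u, tree)
            # Step 4: each tree of the second forest is one SCC.
            components.append(tree)
    return components


# Hypothetical digraph: a 3-cycle {1, 2, 3} with an edge into a 2-cycle {4, 5}.
g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
components = strongly_connected_components(g)
```

Both passes and the transpose each look at every vertex and edge a constant number of times, in line with the Θ(V + E) bound stated above.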

From our Example (above):

1. Do DFS
2. G^T
3. DFS (roots blackened)

Figure: SCC example.

Another Example (CLRS) Consider a graph G = (V, E).

1. Call DFS(G)

2. Compute G^T

3. Call DFS(G^T), but this time consider the vertices in order of decreasing finish time.

4. Output the vertices of each tree in the DFS-forest as a separate strongly connected component.

{a, b, e}, {c, d}, {f, g}, and {h}
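To experiment with this example, one can run a compact recursive version of the procedure in Python. The edge list below is an assumption: the figure is not reproduced in these notes, so the edges were chosen to be consistent with the four components listed above and may differ from the graph drawn in CLRS. The function name `scc` is hypothetical as well.

```python
def scc(adj):
    """Two-pass SCC computation on {vertex: [successors]} (recursive sketch)."""
    seen = set()

    def visit(graph, u, out):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(graph, v, out)
        out.append(u)                  # u finishes after all its descendants

    order = []                         # vertices by increasing finishing time
    for u in adj:
        if u not in seen:
            visit(adj, u, order)

    adj_t = {u: [] for u in adj}       # G^T: every edge reversed
    for u in adj:
        for v in adj[u]:
            adj_t[v].append(u)

    seen.clear()
    result = []
    for u in reversed(order):          # decreasing finishing time
        if u not in seen:
            tree = []
            visit(adj_t, u, tree)
            result.append(sorted(tree))
    return result


# Assumed edge list, consistent with the components listed above.
g = {
    "a": ["b"],
    "b": ["c", "e", "f"],
    "c": ["d", "g"],
    "d": ["c", "h"],
    "e": ["a", "f"],
    "f": ["g"],
    "g": ["f", "h"],
    "h": ["h"],
}
result = scc(g)
```

With this edge list the second DFS produces four trees whose vertex sets are {a, b, e}, {c, d}, {f, g} and {h}.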

Now the question is: how can this possibly work?

Idea By considering vertices in second DFS in decreasing order of finishing times from first DFS, we are visiting vertices of the component graph in topological sort order.

To prove that it really works, first we deal with two notational issues:

We will be discussing d[u] and f[u]. These always refer to the first DFS in the above algorithm.
We extend the notation for d and f to sets of vertices U ⊆ V:
- d(U) = min_{u in U} {d[u]} (earliest discovery time of any vertex in U)
- f(U) = max_{u in U} {f[u]} (latest finishing time of any vertex in U)

Lemma Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in E such that u in C and v in C'. Then f(C) > f(C').

Figure: illustration of the lemma.

Proof There are two cases, depending on which SCC had the first discovered vertex during the first DFS.

Case i. If d(C) < d(C'), let x be the first vertex discovered in C. At time d[x], all vertices in C and C' are white. Thus, because of the edge (u, v), there exist paths of white vertices from x to all vertices in C and C'.

By the white-path theorem, all vertices in C and C' are descendants of x in depth-first tree.

By the parenthesis theorem, we have f[x] = f(C) > f(C').

Case ii. If d(C) > d(C'), let y be the first vertex discovered in C'. At time d[y], all vertices in C' are white and there is a white path from y to each vertex in C'. This implies that all vertices in C' become descendants of y. Again, f[y] = f(C').

At time d[y], all vertices in C are white.

By the earlier lemma, since there is an edge (u, v) from C to C', we cannot have a path from C' to C. So, no vertex in C is reachable from y. Therefore, at time f[y], all vertices in C are still white. Therefore, for all w in C, f[w] > f[y], which implies that f(C) > f(C').

This completes the proof.
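The lemma can be checked numerically on a small graph. The Python sketch below computes d and f with a timestamped DFS and then compares f(C) with f(C') for every edge that crosses between components. The graph, the hand-listed components and the helper names are hypothetical; the two root orders are chosen so that one run falls under Case i and the other under Case ii.

```python
def dfs_times(adj, order):
    """Run DFS on adj, trying roots in `order`; return (d, f) timestamp dicts."""
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                    # discovery time
        for v in adj[u]:
            if v not in d:
                visit(v)
        time += 1
        f[u] = time                    # finishing time

    for u in order:
        if u not in d:
            visit(u)
    return d, f


def lemma_holds(adj, components, order):
    """Check f(C) > f(C') for every edge (u, v) with u in C, v in C', C != C'."""
    _, f = dfs_times(adj, order)
    comp_of = {v: i for i, c in enumerate(components) for v in c}
    f_set = [max(f[v] for v in c) for c in components]   # f(U) = max of f[u]
    return all(
        f_set[comp_of[u]] > f_set[comp_of[v]]
        for u in adj
        for v in adj[u]
        if comp_of[u] != comp_of[v]
    )


# Hypothetical digraph: C = {1, 2}, C' = {3, 4}, with the single edge (2, 3) from C to C'.
g = {1: [2], 2: [1, 3], 3: [4], 4: [3]}
components = [{1, 2}, {3, 4}]
case_i = lemma_holds(g, components, [1, 2, 3, 4])    # C discovered first
case_ii = lemma_holds(g, components, [3, 4, 1, 2])   # C' discovered first
```

In both runs f(C) = 8, while f(C') is 6 when C is discovered first and 4 when C' is discovered first, so the inequality holds either way.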

Corollary Let C and C' be distinct SCC's in G = (V, E). Suppose there is an edge (u, v) in E^T where u in C and v in C'. Then f(C) < f(C').

Proof Edge (u, v) in E^T implies (v, u) in E, which is an edge of G from C' to C. Since the SCC's of G and G^T are the same, the previous lemma gives f(C') > f(C). This completes the proof.

Corollary Let C and C' be distinct SCC's in G = (V, E), and suppose that f(C) > f(C'). Then there cannot be an edge from C to C' in G^T.

Proof Idea It's the contrapositive of the previous corollary.

Now, we have the intuition to understand why the SCC procedure works.

When we do the second DFS, on G^T, we start with the SCC C such that f(C) is maximum. The second DFS starts from some x in C, and it visits all vertices in C. The corollary says that since f(C) > f(C') for all C' ≠ C, there are no edges from C to C' in G^T. Therefore, DFS will visit only vertices in C.

Which means that the depth-first tree rooted at x contains exactly the vertices of C.

The next root chosen in the second DFS is in SCC C' such that f(C') is maximum over all SCC's other than C. DFS visits all vertices in C', but the only edges out of C' go to C, which we've already visited.

Therefore, the only tree edges will be to vertices in C'.

We can continue the process.

Each time we choose a root for the second DFS, it can reach only

- vertices in its own SCC: we get tree edges to these,
- vertices in SCC's already visited in the second DFS: we get no tree edges to these.

We are visiting vertices of (G^T)^SCC in reverse of topologically sorted order. [CLRS has a formal proof.]

Before leaving strongly connected components, let's prove directly that the component graph of G = (V, E) is a directed acyclic graph.

Proof (by contradiction) Suppose the component graph of G = (V, E) is not a DAG. Then it contains a cycle of vertices v₁, v₂, . . . , v_n, where each v_i corresponds to a distinct strongly connected component (SCC) of G. Because v₁, v₂, . . . , v_n form a cycle, every vertex in the SCC corresponding to v_i can reach, and be reached from, every vertex in the SCC corresponding to v_j (for all i ≠ j). All of these vertices should therefore belong to a single SCC. But each v_i represents a different SCC of G. Hence, we have a contradiction! Therefore, the component graph of G is a directed acyclic graph.
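The acyclicity claim can also be tested mechanically on any concrete component graph. The Python sketch below uses Kahn's topological-sort algorithm, a different technique from the DFS-based argument above, and assumes the component graph is given as a set of labels and a set of edges. The labels C1 to C4 and the edges between them are hypothetical; they are not read off the missing figure.

```python
from collections import deque


def is_dag(vertices, edges):
    """Return True if the digraph (vertices, edges) has no directed cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: repeatedly remove a vertex with no incoming edges.
    queue = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Vertices on a cycle never reach indegree 0, so they are never removed.
    return removed == len(vertices)


# Hypothetical component graph on four SCC labels.
labels = {"C1", "C2", "C3", "C4"}
acyclic = is_dag(labels, {("C1", "C2"), ("C1", "C3"), ("C2", "C4"), ("C3", "C4")})
# Adding an edge back from C4 to C1 creates a cycle; in a real component graph
# the components on that cycle would have been merged into a single SCC.
cyclic = is_dag(labels, {("C1", "C2"), ("C2", "C4"), ("C4", "C1"), ("C1", "C3")})
```

A component graph built from correctly computed SCC's should always pass this check; a failure would indicate that the SCC labelling is wrong.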

Related Problems

1. Edge-vertex connectivity problem.
2. Shortest path problem.

Updated: March 13, 2010.