Depth-first search is a systematic way to find all the vertices reachable from a source vertex, s. Historically, depth-first was first stated formally hundreds of years ago as a method for traversing mazes. Like breadth-first search, DFS traverse a connected component of a given graph and defines a spanning tree. The basic idea of depth-first search is this: It methodically explore every edge. We start over from different vertices as necessary. As soon as we discover a vertex, DFS starts exploring from it (unlike BFS, which puts a vertex on a queue so that it explores from it later).
Overall Strategy of DFS Algorithm
Depth-first search selects a source vertex s
in the graph and paint it as "visited." Now the vertex s becomes our
current vertex. Then, we traverse the graph by considering an arbitrary edge (u,
v) from the current vertex u. If the edge (u, v)
takes us to a painted vertex v, then we back down to the vertex u.
On the other hand, if edge (u, v) takes us to an unpainted vertex, then we paint
the vertex v and make it our current vertex, and repeat the above
computation. Sooner or later, we will get to a “dead end,” meaning all the edges
from our current vertex u takes us to painted vertices. This is a
deadlock. To get out of this, we back down along the edge that brought us here
to vertex u and go back to a previously painted vertex v. We
again make the vertex v our current vertex and start repeating the
above computation for any edge that we missed earlier. If all of v's
edges take us to painted vertices, then we again back down to the vertex we came
from to get to vertex v, and repeat the computation at that vertex.
Thus, we continue to back down the path that we have traced so far until we find
a vertex that has yet unexplored edges, at which point we take one such edge and
continue the traversal. When the depth-first search has backtracked all the way
back to the original source vertex, s, it has built a DFS tree of all vertices
reachable from that source. If there still undiscovered vertices in the graph
then it selects one of them as the source for another DFS tree. The result is a
forest of DFS-trees.
Note that the edges lead to new vertices are called discovery or tree edges and
the edges lead to already visited (painted) vertices are called back edges.
Like BFS, to keep track of progress depth-first-search colors each vertex. Each vertex of the graph is in one of three states:
1. Undiscovered;
2. Discovered but not finished (not done exploring from it); and
3. Finished (have found everything reachable from it) i.e. fully explored.
The state of a vertex, u, is stored in a color variable as follows:
1. color[u] = White - for the "undiscovered" state,
2. color[u] = Gray - for the "discovered but not finished" state, and
3. color[u] = Black - for the "finished" state.
Like BFS, depth-first search uses π[v] to record the parent of vertex v. We have π[v] = NIL if and only if vertex v is the root of a depth-first tree.
DFS time-stamps each vertex when its color is changed.
1. When vertex v is changed from white to gray the time is
recorded in d[v].
2. When vertex v is changed from gray to black the time is recorded in f[v].
The discovery and the finish times are unique integers, where for each vertex the finish time is always after the discovery time. That is, each time-stamp is an unique integer in the range of 1 to 2|V| and for each vertex v, d[v] < f[v]. In other words, the following inequalities hold:
1 ≤ d[v] < f[v] ≤ 2|V|
Algorithm Depth-First Search
The DFS forms a depth-first forest comprised of more than one depth-first trees. Each tree is made of edges (u, v) such that u is gray and v is white when edge (u, v) is explored. The following pseudocode for DFS uses a global timestamp time.
DFS (V, E)
1. for each vertex u in V[G]
2.
do color[u] ← WHITE
3.
π[u] ← NIL
4.
time ← 0
5.
for each vertex u in V[G]
6.
do if color[u] ← WHITE
7.
then DFS-Visit(u) ▷ build a new DFS-tree from
u
DFS-Visit(u)
1. color[u] ← GRAY ▷ discover
u
2.
time ← time + 1
3.
d[u] ← time
4.
for each vertex v adjacent to u ▷ explore (u,
v)
5.
do if color[v] ← WHITE
6.
then π[v] ← u
7.
DFS-Visit(v)
8.
color[u] ← BLACK
9.
time ← time + 1
10.
f[u] ← time
▷ we are done with u
Example (CLRS):
In the following figure, the solid edge represents discovery or tree edge and
the dashed edge shows the back edge. Furthermore, each vertex has two time
stamps: the first time-stamp records when vertex is first discovered and second
time-stamp records when the search finishes examining adjacency list of vertex.
Analysis
The analysis is similar to that of BFS analysis. The DFS-Visit is called (from DFS or from itself) once for each vertex in V[G] since each vertex is changed from white to gray once. The for-loop in DFS-Visit is executed a total of |E| times for a directed graph or 2|E| times for an undirected graph since each edge is explored once. Moreover, initialization takes Θ(|V|) time. Therefore, the running time of DFS is Θ(V + E).
Note that its Θ, not just O, since guaranteed to examine every vertex and edge.
Consider vertex u and vertex v in V[G] after a DFS. Suppose vertex v in some DFS-tree. Then we have d[u] < d[v] < f[v] < f[u] because of the following reasons:
1. Vertex u was discovered before vertex v; and
2. Vertex v was fully explored before vertex u was fully explored.
Note that converse also holds: if d[u] < d[v] < f[v] < f[u] then vertex
v is in
the same DFS-tree and a vertex v is a descendent of vertex u.
Suppose vertex u and vertex v are in different DFS-trees or suppose vertex
u and
vertex v are in the same DFS-tree but neither vertex is the descendent of the
other. Then one vertex was discovered and fully explored before the other was
discovered i.e., f[u] < d[v] or f[v] < d[u].
Parenthesis Theorem For all u, v, exactly one of the following holds:
1. d[u] < f[u] < d[v] < f[v]
or d[v] < f[v] < d[u] < f[u] and neither of
u and v is a descendant of the other.
2. d[u] < d[v] < f[v] < f[u] and v
is a descendant of u.
3. d[v] < d[u] < f[u] < f[v] and u
is a descendant of v.
[Proof omitted.]
So, d[u] < d[v] < f[u] < f[v] cannot happen. Like parentheses: ( ) [], ( [ ] ), and [ ( ) ] are OK but ( [ ) ] and [ ( ] ) are not OK.
Corollary Vertex v is a proper descendant of u if and only if d[u] < d[v] < f[v] < f[u].
White-path Theorem Vertex v is a descendant of u if and only if at time d[u], there is a path u to v consisting of only white vertices. (Except for u, which was just colored gray.)
[Proof omitted.]
Consider a directed graph G = (V, E). After a DFS of graph G we can put each edge into one of four classes:
1.
A tree edge is an edge in a DFS-tree.
2.
A back edge connects a vertex to an ancestor in a DFS-tree. Note that a
self-loop is a back edge.
3.
A forward edge is a non-tree edge that connects a vertex to a descendent in a
DFS-tree.
4.
A cross edge is any other edge in graph G. It connects vertices in two different
DFS-tree or two vertices in the same DFS-tree neither of which is the ancestor
of the other.
Lemma 1 An Edge (u, v) is a back edge if and only if d[v] < d[u] < f[u] < f[v].
Proof
(=> direction) From the definition of a back edge, it connects vertex u to an ancestor vertex v in a DFS-tree. Hence, vertex u is a descendent of vertex v. Corollary 22.8 in the CLRS (or see above) states that vertex u is a proper descendent of vertex v if and only if d[v] < d[u] < f[u] < f[v]. Hence proved forward direction.
(<= direction) Again by the Corollary 22.8 (CLRS), vertex u is a proper descendent of vertex v. Hence if an edge (u, v) exists from u to v then it is an edge connecting a descendent vertex u to its ancestor vertex v. Hence, it is a back edge. Hence proved backward direction.
Conclusion: Immediate from both directions.
Lemma 2 An edge (u, v) is a cross edge if and only if d[v] < f[v] < d[u] < f[v].
Proof
First take => direction.
Observation 1 For
an edge (u, v), d[u] < f[u] and d[v]
< f[v] since for any vertex has to be discovered before we can finish
exploring it.
Observation 2 From the definition
of a cross edge it is an edge which is not a tree edge, forward edge or a
backward edge. This implies that none of the relationships for forward edge {d[u]
< d[v] < f[v] < f[u]} or back edge {d[v] <
d[u] < f[u] < f[v]} can hold for a cross edge.
From the above two observations we conclude that the only two possibilities are:
1. d[u] < f[u] < d[v] < f[v]
2. d[v] < f[v] < d[u] < f[u]
When the cross edge (u, v) is discovered we must be at vertex u and vertex v must be black. The reason is that if v was white then edge (u, v) would be a tree edge and if v was gray edge (u, v) would be a back edge. Therefore, d[v] < d[u] and hence possibility (2) holds true.
Now take <= direction.
We can prove this direction by eliminating the various possible edges that the given relation can convey. If d[v] < d[v] < d[u] < f[u] then edge (u, v) cannot be a tree or a forward edge. Also, it cannot be a back edge by lemma 1. Edge (u, v) is not a forward or back edge. Hence it must be a cross edge (Confused? please go above and look again the definition of cross edge).
Conclusion: Immediately from both directions.
DFS-Visit can be modified to classify the edges of a directed graph during the depth first search:
DFS-Visit(u) ▷ with edge classification. G must be a directed graph
1. color[u] ← GRAY
2.
time ← time + 1
3.
d[u] ← time
4. for each vertex
v adjacent to u
5. do if color[v]
← BLACK
6.
then if d[u] < d[v]
7.
then Classify (u, v) as a forward edge
8.
else Classify (u, v) as a cross edge
9.
if color[v] ← GRAY
10.
then Classify (u, v) as a back edge
11.
if color[v] ← WHITE
12.
then π[v] ← u
13.
Classify (u, v) as a tree edge
14.
DFS-Visit(v)
15.
color[u] ← BLACK
16.
time ← time + 1
17.
f[u] ← time
Suppose G be an undirected graph, then we have following edge classification:
1. Tree Edge an edge connects a vertex with its parent.
2. Back Edge a non-tree edge connects a vertex with an ancestor.
3. Forward Edge There is no forward edges because they become back edges when
considered in the opposite direction.
4. Cross Edge There cannot be any cross edge because every edge of G must
connect an ancestor with a descendant.
Theorem In a depth-first search of an undirected graph G, every edge in E[G] is either a tree edge or a back edge. No forward or cross edges.
[Proof omitted.]
Algorithms based on DFS
Based upon DFS, there are O(V + E)-time algorithms for the following problems:
Testing whether graph is connected.
Computing a spanning forest of G.
Computing the connected components of G.
Computing a path between two vertices of G or reporting that no such path exists.
Computing a cycle in G or reporting that no such cycle exists.
Application
As an application of DFS lets determine whether or not an undirected graph contains a cycle. It is not difficult to see that the algorithm for this problem would be very similar to DFS(G) except that when the adjacent edge is already a GRAY edge than a cycle is detected. While doing this the algorithm also takes care that it is not detecting a cycle when the GRAY edge is actually a tree edge from a ancestor to a descendent.
ALGORITHM DFS_DETECT_CYCLES [G]
for each vertex u in V[G]
do color[u] ← WHITE
predecessor[u] ← NIL
time ← 0
for each vertex u in V[G]
do if color[u] ← WHITE
DFS_visit(u)
The sub-algorithm DFS_visit(u) is as follows:
DFS_visit(u)
color(u) ← GRAY
d[u] ← time ← time + 1
for each v adjacent to u
do if color[v] ← GRAY and
Predecessor[u] ≠ v
return "cycle exists"
if color[v]
← WHITE
do predecessor[v] ← u
recursively DFS_visit(v)
color[u] ← BLACK
f[u] ← time ← time + 1
Correctness
To see why this algorithm works suppose the node to visited v is a gray node, then there are two possibilities. The first possibility is that the node v is a parent node of u and we are going back the tree edge which we traversed while visiting u after visiting v. In that case it is not a cycle. The second possibility is that v has already been encountered once during DFS_visit and what we are traversing now will be back edge and hence a cycle is detected.
Time Complexity
The maximum number of possible edges in the graph G if it does
not have cycle is |V| − 1. If G has a cycles, then the number of edges exceeds
this number. Hence, the algorithm will detects a cycle at the most at the V^{th}
edge if not before it. Therefore, the algorithm will run in O(V) time.
Updated: April 2, 2010.