Fix loop in a linked list at Samuel Jacob's Weblog

Fix loop in a linked list

A cycle (loop) in a singly linked list with $n$ nodes is shown in the following picture.

Loop in a singly linked list

In this picture the number of nodes in the list is $n$, and the last node points to the node at position $x$ (which creates the cycle). From the picture it is clear that fixing the loop takes two steps:

1. Find the last element.
2. Make it point to the end (NULL).

Of these steps, step 1 (find the last element) is a little difficult if the total number of nodes in the list is unknown. In this post I will describe a method I discovered to find the last node. I drew a couple of pictures to explain the complexity of the problem. The following pictures show the same loop condition in different forms and explain why it is hard to find the last node.

The illustration also tells us the following:
$n = x+y$
where
$n$ is total number of nodes in the list
$x$ is number of nodes before the cycle starts.
$y$ is number of nodes forming the cycle.

If we know the values of $x$ and $y$, the equation gives $n$, and therefore the location of the last node.
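As a minimal Python sketch of the fix once $x$ and $y$ are known: `Node`, `make_looped_list` and `fix_loop_given_xy` are hypothetical names chosen for illustration, `None` plays the role of NULL, and positions are counted from 0, so the last node is node $n - 1$.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_looped_list(x, y):
    """Hypothetical helper: nodes 0..n-1 (n = x + y, y >= 1); the last node points back to node x."""
    nodes = [Node(i) for i in range(x + y)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[x]  # this back-pointer creates the cycle
    return nodes[0]


def fix_loop_given_xy(head, x, y):
    """Break the cycle when x and y are known: walk to node n-1 and cut its next pointer."""
    n = x + y
    last = head
    for _ in range(n - 1):  # find the last element (position n-1)
        last = last.next
    last.next = None        # make it point to the end (NULL)
    return last


def to_values(head):
    """Collect node values; only safe on a list without a cycle."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = make_looped_list(3, 4)
last = fix_loop_given_xy(head, 3, 4)
print(last.value)       # 6
print(to_values(head))  # [0, 1, 2, 3, 4, 5, 6]
```

The hard part, as the post explains, is obtaining $x$ and $y$ without knowing $n$ in advance.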

Finding $y$

Detect the cycle using Floyd’s cycle detection algorithm (two pointers moving at different speeds).
When the cycle is detected:
1. Store the current node as H and set y = 0.
2. Move to the next node and increment y (y++).
3. If the current node is H, exit.
4. Go to step 2.
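The steps above can be sketched in Python as follows. This assumes the usual form of Floyd’s algorithm (tortoise advances one node per step, hare two); `make_looped_list`, `floyd_meeting_node` and `cycle_length` are hypothetical helper names, and the stored node called H in the steps is named `marker` here.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_looped_list(x, y):
    """Hypothetical helper: n = x + y nodes, the last one pointing back to node x (y >= 1)."""
    nodes = [Node(i) for i in range(x + y)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[x]
    return nodes[0]


def floyd_meeting_node(head):
    """Floyd's cycle detection: tortoise moves 1 node, hare moves 2. Returns the meeting node or None."""
    tortoise = hare = head
    while hare is not None and hare.next is not None:
        tortoise = tortoise.next
        hare = hare.next.next
        if tortoise is hare:
            return tortoise
    return None


def cycle_length(head):
    """Return y, the number of nodes in the cycle (0 if the list has no cycle)."""
    meet = floyd_meeting_node(head)
    if meet is None:
        return 0
    marker = meet  # step 1: store the current node (H in the steps) and start with y = 0
    y = 0
    node = marker
    while True:
        node = node.next    # step 2: move to the next node ...
        y += 1              # ... and increment y
        if node is marker:  # step 3: back at the stored node, the whole cycle was walked
            return y


print(cycle_length(make_looped_list(5, 3)))  # 3
```

Because the meeting node is always inside the cycle, walking from it until it reappears counts exactly the $y$ cycle nodes.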

Finding $x$
Let’s first look at the easiest way to solve this:

1. Have two pointers (p1, p2).
2. p1 = node 0.
3. p2 = p1 + y nodes (advance p2 by y nodes from p1).
4. If p1 == p2, we have found the result: $x$ is the position of p1.
5. Advance both p1 and p2 by one node.
6. Go to step 4.

The cost of this algorithm depends linearly on $x$. For example, if $x$ is 10K and $y$ is 2, the number of node visits is more than 20K. In other words, the complexity is $O(2x + y)$.
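A minimal Python sketch of this two-pointer search, with a counter for pointer moves so the $O(2x + y)$ estimate can be checked. It assumes $y \ge 1$ is already known from the previous step and that p1 and p2 move one node at a time after the initial gap of $y$ nodes (the variant whose cost is $2x + y$). `make_looped_list` and `find_x` are hypothetical names.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_looped_list(x, y):
    """Hypothetical helper: n = x + y nodes, the last one pointing back to node x (y >= 1)."""
    nodes = [Node(i) for i in range(x + y)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[x]
    return nodes[0]


def find_x(head, y):
    """Return (x, visits): x is the position of the first cycle node; visits counts pointer moves.

    Assumes y >= 1 is the cycle length.
    """
    visits = 0
    p1 = head          # p1 starts at node 0
    p2 = head
    for _ in range(y):  # p2 starts y nodes ahead of p1
        p2 = p2.next
        visits += 1
    x = 0
    while p1 is not p2:  # they first coincide when p1 reaches the cycle start
        p1 = p1.next     # advance both pointers by one node
        p2 = p2.next
        visits += 2
        x += 1
    return x, visits


print(find_x(make_looped_list(10_000, 2), 2))  # (10000, 20002)
```

With $x = 10000$ and $y = 2$ this makes $2x + y = 20002$ pointer moves, matching the "more than 20K" estimate.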

I believe it can be done in $O(n)$, that is $O(x + y)$, with the following method. The following diagram captures the state when the “tortoise and the hare” algorithm detects the cycle.

Let’s define a few more variables at the time of cycle detection:
$T$ is the total number of nodes the hare (the pointer moving two nodes at a time) traveled.
$H$ is the total number of nodes the tortoise traveled.
$r$ is the number of nodes before the ‘last node’.

$T = 2H$
$x + cy - r = T$
$x + y - r = H$
where $c$ is the number of complete cycles the hare covered.
$T > H$
$c > 1$
$r \le x$ and $r \le n$
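To make these relations concrete, the sketch below runs Floyd’s algorithm on a list built with known $x$ and $y$, records both distances at the moment the pointers meet, and solves the two distance equations for $r$ and $c$. In the equations $T = 2H$ and $c$ multiplies $y$ in the equation for $T$, so in the code `T` is the distance of the pointer that moves two nodes per step and `H` that of the pointer moving one node per step. `make_looped_list`, `distances_at_meeting` and `solve_r_and_c` are hypothetical helpers; using the known $x$ here is only a way to check the equations, not part of the method.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_looped_list(x, y):
    """Hypothetical helper: n = x + y nodes, the last one pointing back to node x (y >= 1)."""
    nodes = [Node(i) for i in range(x + y)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[x]
    return nodes[0]


def distances_at_meeting(head):
    """Run Floyd's detection and return (T, H): nodes traveled by the 2-step and 1-step pointers."""
    slow = fast = head
    H = T = 0
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        H += 1
        T += 2
        if slow is fast:
            return T, H
    return None  # no cycle


def solve_r_and_c(x, y):
    T, H = distances_at_meeting(make_looped_list(x, y))
    r = x + y - H                  # from x + y - r = H
    c, rem = divmod(T - x + r, y)  # from x + cy - r = T
    # Sanity-check the stated relations: c is a whole number of cycles, T = 2H, T > H, c > 1.
    assert rem == 0 and T == 2 * H and T > H and c > 1
    return T, H, c, r


print(solve_r_and_c(10, 3))  # (24, 12, 5, 1)
```

For $x = 10$ and $y = 3$ the pointers meet after the tortoise has traveled 12 nodes, giving $r = 10 + 3 - 12 = 1$ and $c = (24 - 10 + 1)/3 = 5$.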
From this we can derive
$c = H / y + 1$

Using the value of $c$, $r$ can be derived.
$x + y - r = H \Rightarrow r = x + y - H$
$x + cy - r = T \Rightarrow x = T - cy + r$
Substituting $x$: $r = T - cy + r + y - H$
Using $T = 2H$: $r = H - cy + r + y$
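In the last line, $r$ appears on both sides and cancels, leaving $H - cy + y = 0$, which is the relation $c = H / y + 1$ stated earlier. A brute-force check of that relation over small lists, as a sketch: it computes $c$ from $x + cy - r = T$ using the known $x$ and compares it with `H // y + 1`, which needs only $H$ and $y$. `make_looped_list`, `distances_at_meeting` and `check_c_relation` are hypothetical helpers, and the 2-step pointer's distance is `T` as in $T = 2H$.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def make_looped_list(x, y):
    """Hypothetical helper: n = x + y nodes, the last one pointing back to node x (y >= 1)."""
    nodes = [Node(i) for i in range(x + y)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    nodes[-1].next = nodes[x]
    return nodes[0]


def distances_at_meeting(head):
    """Run Floyd's detection and return (T, H): nodes traveled by the 2-step and 1-step pointers."""
    slow = fast = head
    H = T = 0
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
        H += 1
        T += 2
        if slow is fast:
            return T, H
    return None  # no cycle


def check_c_relation(max_x=20, max_y=10):
    """For every tail length x and cycle length y, compare c from the full equations with H/y + 1."""
    for x in range(max_x + 1):
        for y in range(1, max_y + 1):
            T, H = distances_at_meeting(make_looped_list(x, y))
            r = x + y - H           # from x + y - r = H
            c = (T - x + r) // y    # from x + cy - r = T
            if H % y != 0 or c != H // y + 1:
                return False
    return True


print(check_c_relation())  # True
```

The check passes because, at the meeting point, $T - H = H$ must be a whole number of cycles, so $H$ is a multiple of $y$.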