Samuel Jacob's Weblog

Just another technical blog

Fix loop in a linked list

without comments

Cycle or loop in a linked list with n nodes is shown in pictorial representation.

Loop in a singly linked list

Loop in a singly linked list

In this picture the number of nodes in the list is n and the last node points to node at a position x(which created the cycle). From the picture it is obvious that to fix the loop:

  1. Find the last element
  2. Correct it to point end(NULL)

In the above steps, step 1(find the last element) is little difficult if total number of nodes in the list is unknown. In this post I will describe a method I discovered to find the last node. I drew couple of pictures to explain the complexity of the problem. The following pictures explains the same loop condition but in different form. It explains why it is hard to find the last node.
Cycle2

Cycle3
The illustration also tells us the following
n = x+y
where
  n is total number of nodes in the list
  x is number of nodes before the cycle starts.
  y is number of nodes forming the cycle.

If we have value of x and y then the equation can be solved, which will give the last node’s location.

Finding y

  1. Detect the cycle using Floyd’s Cycle detection algorithm. (Two pointer with varying speed)
  2. When cycle is detected
    1. Store the current node as H
    2. Move to next node and increment y(y++)
    3. If current node is H then exit
    4. Goto step 2

Finding x
Let’s first see the easiest way to solve this:

  1. Have two pointers (p1, p2)
  2. p1 = node 0
  3. p2 = p1 + y nodes
  4. if p1 == p2 then we found the result is x = p1
  5. p1 = p1 + y nodes
  6. goto step 3.

The complexity of this algorithm linear depends on x. For example if x is 10K and y is 2 then number of node visits are > 20K. In other words the complexity is O(2x + y).

I believe it can be done in O(n) or O(x + y) with following method This following diagram captures state when the “tortoise and the hare” algorithm detects the cycle.
CycleFix

Lets define few more variables at the time of cycle detection:
T is the total number of nodes tortoise traveled.
H is the total number of nodes hare traveled.
r can be defined as number of nodes before the ‘last node’

T = 2H                   \newline x + cy - r = T           \newline x + y - r = H            \newline
where c is the number of complete cycles hare covered.
T > H                                \newline c > 1                                \newline r >= x \ and \ r <= n                \newline
From this we can derive
c = H / y + 1

Using value of c, r can be derived.
x + y - r = H            \newline r = x + y - H            \newline \newline x + cy - r = T           \newline x =  T - cy + r           \newline \newline r =  T - cy + r  +  y - H  \newline r =  H - cy + r  +  y      \newline

Using this formula r can be found. With this r value, the last node can be found by travelling r-1 nodes where T and H met.

Written by samueldotj

August 29th, 2013 at 3:35 pm

Posted in Uncategorized