The Interview Cake Course (8. Linked lists)

Linked List

Data Structure

Quick reference

Worst Case
space   O(n)
prepend O(1)
append  O(1)
lookup  O(n)
insert  O(n)
delete  O(n)

A linked list organizes items sequentially, with each item storing a pointer to the next one.

Picture a linked list like a chain of paperclips linked together. It’s quick to add another paperclip to the top or bottom. It’s even quick to insert one in the middle—just disconnect the chain at the middle link, add the new paperclip, then reconnect the other half.

An item in a linked list is called a node. The first node is called the head. The last node is called the tail.

Confusingly, sometimes people use the word tail to refer to “the whole rest of the list after the head.”

A linked list with 3 nodes. The first node is labelled "head" and the last node is labelled "tail."

Unlike an array, consecutive items in a linked list are not necessarily next to each other in memory.

Strengths:

  • Fast operations on the ends. Adding elements at either end of a linked list is O(1). Removing the first element is also O(1).
  • Flexible size. There’s no need to specify how many elements you’re going to store ahead of time. You can keep adding elements as long as there’s enough space on the machine.

Weaknesses:

  • Costly lookups. To access or edit an item in a linked list, you have to take O(i) time to walk from the head of the list to the ith item.

Uses:

  • Stacks and queues only need fast operations on the ends, so linked lists are ideal.

In Python 2.7

Most languages (including Python 2.7) don’t provide a linked list implementation. Assuming we’ve already implemented our own, here’s how we’d construct the linked list above:

a = LinkedListNode(5)
b = LinkedListNode(1)
c = LinkedListNode(9)

a.next = b
b.next = c
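
For reference, here’s a minimal node class that would support the construction above (it matches the LinkedListNode class used later in this section), plus a quick demonstration of an O(1) prepend:

class LinkedListNode(object):

    def __init__(self, value):
        self.value = value
        self.next  = None

# Prepending is O(1): make a new node and point it at the old head
new_head = LinkedListNode(4)
new_head.next = a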

Doubly Linked Lists

In a basic linked list, each item stores a single pointer to the next element.

In a doubly linked list, items have pointers to the next and the previous nodes.

A doubly-linked list with 3 nodes. The first node has value 5 with a "next" arrow pointing ahead to the second node and a "previous" arrow pointing back to "None." The second node has value 1 with a "next" arrow pointing ahead to the third node and a "previous" arrow pointing back to the first node. The third node has value 9 with a "next" arrow pointing ahead to "None" and a "previous" arrow pointing back to the second node.

Doubly linked lists allow us to traverse our list backwards. In a singly linked list, if you just had a pointer to a node in the middle of a list, there would be no way to know what nodes came before it. Not a problem in a doubly linked list.
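
As a quick sketch, a doubly linked node might look like this (we’re calling the extra pointer previous here to match the figure above; the exact attribute name is our choice):

class DoublyLinkedListNode(object):

    def __init__(self, value):
        self.value    = value
        self.next     = None
        self.previous = None

# Linking two nodes in both directions
d1 = DoublyLinkedListNode(5)
d2 = DoublyLinkedListNode(1)

d1.next     = d2
d2.previous = d1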

Not cache-friendly

Most computers have caching systems that make reading from sequential addresses in memory faster than reading from scattered addresses.

Array items are always located right next to each other in computer memory, but linked list nodes can be scattered all over.

So iterating through a linked list is usually quite a bit slower than iterating through the items in an array, even though they’re both theoretically O(n)O(n) time.

Delete a node from a singly-linked list, given only a variable pointing to that node.

The input could, for example, be the variable b below:

class LinkedListNode(object):

    def __init__(self, value):
        self.value = value
        self.next  = None

a = LinkedListNode('A')
b = LinkedListNode('B')
c = LinkedListNode('C')

a.next = b
b.next = c

delete_node(b)

Gotchas

We can do this in O(1) time and space! But our answer is tricky, and it could have some side effects…

Breakdown

It might be tempting to try to traverse the list from the beginning until we encounter the node we want to delete. But in this situation, we don’t know where the head of the list is—we only have a reference to the node we want to delete.

But hold on—how do we even delete a node from a linked list in general, when we do have a reference to the first node?

We’d change the previous node’s pointer to skip the node we want to delete, so it just points straight to the node after it. So if these were our nodes before deleting a node:

A singly-linked list with 3 nodes. The first node has value 1 and points to the second node, which has value 4 and points to the third node, which has value 0 and points to "None." We're trying to delete the second node.

These would be our nodes after our deletion:

Now, in the same linked list, the first node points to the third node. So the second node will be skipped and no arrows point to it.

So we need a way to skip over the current node and go straight to the next node. But we don’t even have access to the previous node!

Other than rerouting the previous node’s pointer, is there another way to skip from the previous pointer’s value to the next pointer’s value?

What if we modify the current node instead of deleting it?

Solution

We take value and next from the input node’s next node and copy them into the input node. Now the input node’s previous node effectively skips the input node’s old value!

So for example, if this was our linked list before we called our function:

A singly-linked list with 3 nodes. The first node has value 3 and points to the second node, which has value 8 and points to the third node, which has value 2 and points to "None." We're trying to delete the second node.

This would be our list after we called our function:

Now, in the same linked list, the second node has value 2 and points to "None." The second node matches the third node, and no arrows point to the third node.

In some languages, like C, we’d have to manually delete the node we copied from, since we won’t be using that node anymore. Here, we’ll let Python’s garbage collector take care of it.

def delete_node(node_to_delete):
    # Get the input node's next node, the one we want to skip to
    next_node = node_to_delete.next

    if next_node:
        # Replace the input node's value and pointer with the next
        # node's value and pointer. The previous node now effectively
        # skips over the input node
        node_to_delete.value = next_node.value
        node_to_delete.next  = next_node.next
    else:
        # Eep, we're trying to delete the last node!
        raise Exception("Can't delete the last node with this technique!")

But be careful—there are some potential problems with this implementation:

First, it doesn’t work for deleting the last node in the list. We could change the node we’re deleting to have a value of None, but the second-to-last node’s next pointer would still point to a node, even though it should be None. This could work—we could treat this last, “deleted” node with value None as a “dead node” or a “sentinel node,” and adjust any node traversing code to stop traversing when it hits such a node. The trade-off there is we couldn’t have non-dead nodes with values set to None. Instead we chose to throw an exception in this case.
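
For illustration only, here’s a rough sketch of what that “dead node” variant might look like (this is not the approach we chose above):

def delete_node_with_sentinel(node_to_delete):
    next_node = node_to_delete.next

    if next_node:
        node_to_delete.value = next_node.value
        node_to_delete.next  = next_node.next
    else:
        # Last node: instead of raising, turn it into a "dead" sentinel.
        # Any traversal code would have to treat value None as "stop here,"
        # so real None values couldn't appear elsewhere in the list.
        node_to_delete.value = None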

Second, this technique can cause some unexpected side-effects. For example, let’s say we call:

a = LinkedListNode(3)
b = LinkedListNode(8)
c = LinkedListNode(2)

a.next = b
b.next = c

delete_node(b)

There are two potential side-effects (there’s a short demonstration after this list):

  1. Any references to the input node have now effectively been reassigned to its next node. In our example, we “deleted” the node assigned to the variable b, but in actuality we just gave it a new value (2) and a new next! If we had another pointer to b somewhere else in our code and we were assuming it still had its old value (8), that could cause bugs.
  2. If there are pointers to the input node’s original next node, those pointers now point to a “dangling” node (a node that’s no longer reachable by walking down our list). In our example above, c is now dangling. If we changed c, we’d never encounter that new value by walking down our list from the head to the tail.
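
To make those side-effects concrete, here’s roughly what we’d observe after the delete_node(b) call above (same a, b, and c as in that example):

# After delete_node(b):
print b.value   # 2 -- not 8! any other reference to b now sees c's old value
print b.next    # None
print c.value   # 2 -- c still exists, but walking a -> b -> None never reaches it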

Complexity

O(1) time and O(1) space.

What We Learned

My favorite part of this problem is how imperfect the solution is. Because it modifies the list “in place” it can cause other parts of the surrounding system to break. This is called a “side effect.”

In-place operations like this can save time and/or space, but they’re risky. If you ever make in-place modifications in an interview, make sure you tell your interviewer that in a real system you’d carefully check for side effects in the rest of the code base.

You have a singly-linked list and want to check if it contains a cycle.

A singly-linked list is built with nodes, where each node has:

  • node.next—the next node in the list.
  • node.value—the data held in the node. For example, if our linked list stores people in line at the movies, node.value might be the person’s name.

For example:

class LinkedListNode(object):

    def __init__(self, value):
        self.value = value
        self.next  = None

A cycle occurs when a node’s next points back to a previous node in the list. The linked list is no longer linear with a beginning and end—instead, it cycles through a loop of nodes.

Write a function contains_cycle() that takes the first node in a singly-linked list and returns a boolean indicating whether the list contains a cycle.

Gotchas

Careful—a cycle can occur in the middle of a list, or it can simply mean the last node links back to the first node. Does your function work for both?

We can do this in O(n) time and O(1) space!

Breakdown

Because a cycle could result from the last node linking to the first node, we might need to look at every node before we even see the start of our cycle again. So it seems like we can’t do better than O(n) runtime.

How can we track the nodes we’ve already seen?

Using a set, we could store all the nodes we’ve seen so far. The algorithm is simple (there’s a quick sketch of it after this list):

  1. If the current node is already in our set, we have a cycle. Return True.
  2. If the current node is None we’ve hit the end of the list. Return False.
  3. Else throw the current node in our set and keep going.
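
Here’s a rough sketch of that set-based approach (the function name is ours; it assumes the LinkedListNode class from earlier):

def contains_cycle_with_set(first_node):
    # Keep track of every node we've visited so far
    nodes_seen = set()

    current_node = first_node
    while current_node is not None:
        # If we've been here before, we've looped around: cycle!
        if current_node in nodes_seen:
            return True
        nodes_seen.add(current_node)
        current_node = current_node.next

    # We hit the end of the list, so there's no cycle
    return False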

What are the time and space costs of this approach? Can we do better?

Our runtime is O(n), the best we can do. But our space cost is also O(n). Can we get our space cost down to O(1) by storing a constant number of nodes?

Think about a looping list and a linear list. What happens when you traverse one versus the other?

A linear list has an end—a node that doesn’t have a next node. But a looped list will run forever. We know we don’t have a loop if we ever reach an end node, but how can we tell if we’ve run into a loop?

We can’t just run our function for a really long time, because we’d never really know with certainty if we were in a loop or just a really long list.

Imagine that you’re running on a long, mountainous running trail that happens to be a loop. What are some ways you can tell you’re running in a loop?

One way is to look for landmarks. You could remember one specific point, and if you pass that point again, you know you’re running in a loop. Can we use that principle here?

Well, our cycle can occur after a non-cyclical “head” section in the beginning of our linked list. So we’d need to make sure we chose a “landmark” node that is in the cyclical “tail” and not in the non-cyclical “head.” That seems impossible unless we already know whether or not there’s a cycle…

Think back to the running trail. Besides landmarks, what are some other ways you could tell you’re running in a loop? What if you had another runner? (Remember, it’s a singly-linked list, so no running backwards!)

A tempting approach could be to have the other runner stop and act as a “landmark,” and see if you pass her again. But we still have the problem of making sure our “landmark” is in the loop and not in the non-looping beginning of the trail.

What if our “landmark” runner moves continuously but slowly?

If we sprint quickly down the trail and the landmark runner jogs slowly, we will eventually “lap” (catch up to) the landmark runner!

But what if there isn’t a loop?

Then we (the faster runner) will simply hit the end of the trail (or linked list).

So let’s make two variables, slow_runner and fast_runner. We’ll start both on the first node, and every time slow_runner advances one node, we’ll have fast_runner advance two nodes.

If fast_runner catches up with slow_runner, we know we have a loop. If not, eventually fast_runner will hit the end of the linked list and we’ll know we don’t have a loop.

This is a complete solution! Can you code it up?

Make sure the function eventually terminates in all cases!

Solution

We keep two pointers to nodes (we’ll call these “runners”), both starting at the first node. Every time slow_runner moves one node ahead, fast_runner moves two nodes ahead.

If the linked list has a cycle, fast_runner will “lap” (catch up with) slow_runner, and they will momentarily equal each other.

If the list does not have a cycle, fast_runner will reach the end.

def contains_cycle(first_node):
    # Start both runners at the beginning
    slow_runner = first_node
    fast_runner = first_node

    # Until we hit the end of the list
    while fast_runner is not None and fast_runner.next is not None:
        slow_runner = slow_runner.next
        fast_runner = fast_runner.next.next

        # Case: fast_runner is about to "lap" slow_runner
        if fast_runner is slow_runner:
            return True

    # Case: fast_runner hit the end of the list
    return False
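
A quick sanity check (assuming the LinkedListNode class from the prompt):

a = LinkedListNode(1)
b = LinkedListNode(2)
c = LinkedListNode(3)

a.next = b
b.next = c

contains_cycle(a)  # False

# Create a cycle: the last node points back to the head
c.next = a

contains_cycle(a)  # True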

Complexity

O(n) time and O(1) space.

The runtime analysis is a little tricky. The worst case is when we do have a cycle, so we don’t return until fast_runner equals slow_runner. But how long will that take?

First, we notice that when both runners are circling around the cycle, fast_runner can never skip over slow_runner. Why is this true?

Suppose fast_runner had just skipped over slow_runner. fast_runner would only be 1 node ahead of slow_runner, since their speeds differ by only 1. So we would have something like this:

[ ] -> [s] -> [f]

What would the step right before this “skipping step” look like? fast_runner would be 2 nodes back, and slow_runner would be 1 node back. But wait, that means they would be at the same node! So fast_runner didn’t skip over slow_runner! (This is a proof by contradiction.)

Since fast_runner can’t skip over slow_runner, at most slow_runner will run around the cycle once and fast_runner will run around twice. This gives us a runtime of O(n).

For space, we store two variables no matter how long the linked list is, which gives us a space cost of O(1).

Bonus

  1. How would you detect the first node in the cycle? Define the first node of the cycle as the one closest to the head of the list.
  2. Would the program always work if the fast runner moves three steps every time the slow runner moves one step?
  3. What if instead of a simple linked list, you had a structure where each node could have several “next” nodes? This data structure is called a “directed graph.” How would you test if your directed graph had a cycle?

What We Learned

Some people have trouble coming up with the “two runners” approach. That’s expected—it’s somewhat of a blind insight. Even great candidates might need a few hints to get all the way there. And that’s fine.

Remember that the coding interview is a dialogue , and sometimes your interviewer expects she’ll have to offer some hints along the way.

One of the most impressive things you can do as a candidate is listen to a hint, fully understand it, and take it to its next logical step. Interview Cake gives you lots of opportunities to practice this. Don’t be shy about showing lots of hints on our exercises—that’s what they’re there for!

Hooray! It’s opposite day. Linked lists go the opposite way today.

Write a function for reversing a linked list. Do it in place.

Your function will have one input: the head of the list.

Your function should return the new head of the list.

Here’s a sample linked list node class:

class LinkedListNode(object):

    def __init__(self, value):
        self.value = value
        self.next  = None

Gotchas

We can do this in O(1) space. So don’t make a new list; use the existing list nodes!

We can do this in O(n) time.

Careful—even the right approach will fail if done in the wrong order.

Try drawing a picture of a small linked list and running your function by hand. Does it actually work?

The most obvious edge cases are:

  1. the list has 0 elements
  2. the list has 1 element

Does your function correctly handle those cases?

Breakdown

Our first thought might be to build our reversed list “from the beginning,” starting with the head of the final reversed linked list.

The head of the reversed list will be the tail of the input list. To get to that node we’ll have to walk through the whole list once (O(n) time). And that’s just to get started.

That seems inefficient. Can we reverse the list while making just one walk from head to tail of the input list?

We can reverse the list by changing the next pointer of each node. Where should each node’s next pointer…point?

Each node’s next pointer should point to the previous node.

How can we move each node’s next pointer to its previous node in one pass from head to tail of our current list?

Solution

In one pass from head to tail of our input list, we point each node’s next pointer to the previous item.

The order of operations is important here! We’re careful to copy current_node.next into next_node before setting current_node.next to previous_node. Otherwise “stepping forward” at the end could actually mean stepping back to previous_node!

def reverse(head_of_list):
    current_node = head_of_list
    previous_node = None
    next_node = None

    # Until we have 'fallen off' the end of the list
    while current_node:
        # Copy a pointer to the next element
        # before we overwrite current_node.next
        next_node = current_node.next

        # Reverse the 'next' pointer
        current_node.next = previous_node

        # Step forward in the list
        previous_node = current_node
        current_node = next_node

    return previous_node

We return previous_node because when we exit the list, current_node is None. Which means that the last node we visited—previous_node—was the tail of the original list, and thus the head of our reversed list.
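
As a quick hand-check (assuming the LinkedListNode class from the prompt), we can reverse a three-node list and walk the result:

a = LinkedListNode(1)
b = LinkedListNode(2)
c = LinkedListNode(3)

a.next = b
b.next = c

new_head = reverse(a)

# Walking from new_head now visits the values 3, 2, 1,
# and a.next is None since a is the new tail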

Complexity

O(n) time and O(1) space. We pass over the list only once, and maintain a constant number of variables in memory.

Bonus

This in-place reversal destroys the input linked list. What if we wanted to keep a copy of the original linked list? Write a function for reversing a linked list out-of-place.

What We Learned

It’s one of those problems where, even once you know the procedure, it’s hard to write a bug-free solution. Drawing it out helps a lot. Write out a sample linked list and walk through your code by hand, step by step, running each operation on your sample input to see if the final output is what you expect. This is a great strategy for any coding interview question.

You have a linked list and want to find the kth to last node.

Write a function kth_to_last_node() that takes an integer k and the head_node of a singly-linked list, and returns the kth to last node in the list.

For example:

class LinkedListNode:

    def __init__(self, value):
        self.value = value
        self.next  = None


a = LinkedListNode("Angel Food")
b = LinkedListNode("Bundt")
c = LinkedListNode("Cheese")
d = LinkedListNode("Devil's Food")
e = LinkedListNode("Eccles")

a.next = b
b.next = c
c.next = d
d.next = e

# Returns the node with value "Devil's Food" (the 2nd to last node)
kth_to_last_node(2, a)

Gotchas

We can do this in O(n) time.

We can do this in O(1) space. If you’re recursing, you’re probably taking O(n) space on the call stack!

Breakdown

It might be tempting to iterate through the list until we reach the end and then walk backward k nodes.

But we have a singly linked list! We can’t go backward. What else can we do?

What if we had the length of the list?

Then we could calculate how far to walk, starting from the head, to reach the kth to last node.

If the list has n nodes:

A linked list represented by circles and arrows, with the distance from the first to the last node labelled n.

And our target is the kth to last node:

A linked list represented by circles and arrows, with the distance from the first to the last node labelled n. The third-to-last node is the kth to last node, with its distance to the last node labelled k.

The distance from the head to the target is n - k:

A linked list represented by circles and arrows, with the distance from the first to the last node labelled n. The third-to-last node is the kth to last node, with its distance to the last node labelled k and its distance to the first node labelled n minus k.

Well, we don’t know the length of the list (n). But can we figure it out?

Yes, we could iterate from the head to the tail and count the nodes!

Can you implement this approach in code?

def kth_to_last_node(k, head):
    # Step 1: get the length of the list
    # Start at 1, not 0
    # else we'd fail to count the head node!
    list_length = 1
    current_node = head

    # Traverse the whole list,
    # counting all the nodes
    while current_node.next:
        current_node = current_node.next
        list_length += 1

    # Step 2: walk to the target node
    # Calculate how far to go, from the head,
    # to get to the kth to last node
    how_far_to_go = list_length - k
    current_node = head
    for i in xrange(how_far_to_go):
        current_node = current_node.next

    return current_node

What are our time and space costs?

O(n) time and O(1) space, where n is the length of the list.

More precisely, it takes n steps to get the length of the list, and another n - k steps to reach the target node. In the worst case k = 1, so we have to walk all the way from head to tail again to reach the target node. This is a total of 2n steps, which is O(n).

Can we do better?

Mmmmaaaaaaybe.

The fact that we walk through our whole list once just to get the length, then walk through the list again to get to the kth to last element sounds like a lot of work. Perhaps we can do this in just one pass?

What if we had a “stick” that was k nodes wide? We could start it at the beginning of the list so that the left side of the stick was on the head and the right side was on the kth node.

A linked list represented by circles and arrows. The third node is labelled "kth," and a linear "stick" k nodes long extends from above the first node to above the kth node.

Then we could slide the stick down the list…

A linked list represented by circles and arrows. The third node is labelled "kth," and a linear "stick" k nodes long extends from above the second node to above the fourth node.

until the right side hit the end. At that point, the left side of the stick would be on the kth to last node!

A linked list represented by circles and arrows. The third-to-last node is labelled "kth to last," and a linear "stick" k nodes long extends from above the kth-to-last node to above the last node.

How can we implement this? Maybe it’ll help to keep two pointers?

We can allocate two variables that’ll be references to the nodes at the left and right sides of the “stick”!

def kth_to_last_node(k, head):
    left_node  = head
    right_node = head

    # Move right_node to the kth node
    for _ in xrange(k - 1):
        right_node = right_node.next

    # Starting with left_node on the head,
    # move left_node and right_node down the list,
    # maintaining a distance of k between them,
    # until right_node hits the end of the list
    while right_node.next:
        left_node  = left_node.next
        right_node = right_node.next

    # Since left_node is k nodes behind right_node,
    # left_node is now the kth to last node!
    return left_node

This’ll work, but does it actually save us any time?

Solution

We can think of this in two ways.

First approach: use the length of the list.

  1. walk down the whole list, counting nodes, to get the total list_length.
  2. subtract k from the list_length to get the distance from the head node to the target node (the kth to last node).
  3. walk that distance from the head to arrive at the target node.
def kth_to_last_node(k, head):
    if k < 1:
        raise ValueError(
            'Impossible to find less than first to last node: %s' % k
        )

    # Step 1: get the length of the list
    # Start at 1, not 0
    # else we'd fail to count the head node!
    list_length = 1
    current_node = head

    # Traverse the whole list,
    # counting all the nodes
    while current_node.next:
        current_node = current_node.next
        list_length += 1

    # If k is greater than the length of the list, there can't
    # be a kth-to-last node, so we'll return an error!
    if k > list_length:
        raise ValueError(
            'k is larger than the length of the linked list: %s' % k
        )

    # Step 2: walk to the target node
    # Calculate how far to go, from the head,
    # to get to the kth to last node
    how_far_to_go = list_length - k
    current_node = head
    for i in xrange(how_far_to_go):
        current_node = current_node.next

    return current_node

Second approach: maintain a k-wide “stick” in one walk down the list.

  1. Walk one pointer k nodes from the head. Call it right_node.
  2. Put another pointer at the head. Call it left_node.
  3. Walk both pointers, at the same speed, towards the tail. This keeps a distance of k between them.
  4. When right_node hits the tail, left_node is on the target (since it’s k nodes from the end of the list).
def kth_to_last_node(k, head):
    if k < 1:
        raise ValueError(
            'Impossible to find less than first to last node: %s' % k
        )

    left_node  = head
    right_node = head

    # Move right_node to the kth node
    for _ in xrange(k - 1):
        # But along the way, if a right_node doesn't have a next,
        # then k is greater than the length of the list and there
        # can't be a kth-to-last node! we'll raise an error
        if not right_node.next:
            raise ValueError(
                'k is larger than the length of the linked list: %s' % k
            )
        right_node = right_node.next

    # Starting with left_node on the head,
    # move left_node and right_node down the list,
    # maintaining a distance of k between them,
    # until right_node hits the end of the list
    while right_node.next:
        left_node  = left_node.next
        right_node = right_node.next

    # Since left_node is k nodes behind right_node,
    # left_node is now the kth to last node!
    return left_node

In either approach, make sure to check if k is greater than the length of the linked list! That’s bad input, and we’ll want to raise an error.

Complexity

Both approaches use O(n) time and O(1) space.

But the second approach is fewer steps since it gets the answer “in one pass,” right? Wrong.

In the first approach, we walk one pointer from head to tail (to get the list’s length), then walk another pointer from the head node to the target node (the kth to last node).

In the second approach, right_node also walks all the way from head to tail, and left_node also walks from the head to the target node.

So in both cases, we have two pointers taking the same steps through our list. The only difference is the order in which the steps are taken. The number of steps is the same either way.

However, the second approach might still be slightly faster, due to some caching and other optimizations that modern processors and memory have.

Let’s focus on caching. Usually when we grab some data from memory (for example, info about a linked list node), we also store that data in a small cache right on the processor. If we need to use that same data again soon after, we can quickly grab it from the cache. But if we don’t use that data for a while, we’re likely to replace it with other stuff we’ve used more recently (this is called a “least recently used” replacement policy).

Both of our algorithms access a lot of nodes in our list twice, so they could exploit this caching. But notice that in our second algorithm there’s a much shorter time between the first and second times that we access a given node (this is sometimes called “temporal locality of reference”). Thus it seems more likely that our second algorithm will save time by using the processor’s cache! But this assumes our processor’s cache uses something like a “least recently used” replacement policy—it might use something else. Ultimately the best way to really know which algorithm is faster is to implement both and time them on a few different inputs!
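
If we wanted to actually time them, a rough sketch might look like this (the function names kth_to_last_node_two_pass and kth_to_last_node_stick are ours, standing in for the two implementations above, and we’re reusing the LinkedListNode class):

import timeit

# Build a long list to test with
head = LinkedListNode(0)
current = head
for i in xrange(1, 100000):
    current.next = LinkedListNode(i)
    current = current.next

# Time each approach over many runs
print timeit.timeit(lambda: kth_to_last_node_two_pass(10, head), number=100)
print timeit.timeit(lambda: kth_to_last_node_stick(10, head), number=100)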

Bonus

Can we do better? What if we expect n to be huge and k to be pretty small? In this case, our target node will be close to the end of the list…so it seems a waste that we have to walk all the way from the beginning twice.

Can we trim down the number of steps in the “second trip”? One pointer will certainly have to travel all the way from head to tail in the list to get the total length…but can we store some “checkpoints” as we go so that the second pointer doesn’t have to start all the way at the beginning? Can we store these “checkpoints” in constant space? Note: this approach only saves time if we know that our target node is towards the end of the list (in other words, n is much larger than k).

What We Learned

We listed two good solutions. One seemed to solve the problem in one pass, while the other took two passes. But the single-pass approach didn’t take half as many steps, it just took the same steps in a different order.

So don’t be fooled: “one pass” isn’t always fewer steps than “two passes.” Always ask yourself, “Have I actually changed the number of steps?”

Find a duplicate, Space Edition™ BEAST MODE

In Find a duplicate, Space Edition™, we were given a list of integers where:

  1. the integers are in the range 1…n
  2. the list has a length of n + 1

These properties mean the list must have at least 1 duplicate. Our challenge was to find a duplicate number, while optimizing for space. We used a divide and conquer approach, iteratively cutting the list in half to find a duplicate integer in O(n lg n) time and O(1) space (sort of a modified binary search).

But we can actually do better. We can find a duplicate integer in O(n) time while keeping our space cost at O(1).

This is a tricky one to derive (unless you have a strong background in graph theory), so we’ll get you started:

Imagine each item in the list as a node in a linked list. In any linked list, each node has a value and a “next” pointer. In this case:

  • The value is the integer from the list.
  • The “next” pointer points to the value-eth node in the list (numbered starting from 1). For example, if our value was 3, the “next” node would be the third node.

Here’s a full example:

A list [2, 3, 1, 3], so 2 is in the first position and points to 3 in the second position.

Notice we’re using “positions” and not “indices.” For this problem, we’ll use the word “position” to mean something like “index,” but different: indices start at 0, while positions start at 1. More rigorously: position = index + 1.
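
As a tiny sketch of that convention, following a “next” pointer just means converting a position to an index (the helper name next_position is ours):

def next_position(int_list, position):
    # The node at position p lives at index p - 1, and its value
    # is the position of the node it points to
    return int_list[position - 1]

# For the example list above:
next_position([2, 3, 1, 3], 1)  # 2 -- the first node points to the second position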

Using this, find a duplicate integer in O(n) time while keeping our space cost at O(1).

Drawing pictures will help a lot with this one. Grab some paper and pencil (or a whiteboard, if you have one).

Gotchas

We don’t need any new data structures. Your final space cost must be O(1).

We can do this without destroying the input.

Breakdown

Here are a few sample lists. Try drawing them out as linked lists:

[3, 4, 2, 3, 1, 5]
[3, 1, 2, 2]
[4, 3, 1, 1, 4]

Look for patterns. Then think about how we might use those patterns to find a duplicate number in our list.

When a value is repeated, how will that affect the structure of our linked list?

If two nodes have the same value, their next pointers will point to the same node!

So if we can find a node with two incoming next pointers, we know the position of that node is a duplicate integer in our list.

For example, if there are two 2s in our list, the node in the 2nd position will have two incoming pointers.

A list [3, 1, 2, 2], so the 2s in the third and fourth position both point to the 1 in the second position.

Alright, we’re on to something. But hold on—creating a linked list would take O(n) space, and we don’t want to change our space cost from O(1).

No problem—turns out we can just think of the list as a linked list, and traverse it without actually creating a new data structure.

If you’re stuck on figuring out how to traverse the list like a linked list, don’t sweat it too much. Just use a real linked list for now, while we finish deriving the algorithm.

Ok, so we figured out that the position of a node with multiple incoming pointers must be a duplicate. If we can find a node with multiple incoming pointers in a constant number of walks through our list, we can find a duplicate value in O(n) time.

How can we find a node with multiple incoming pointers?

Let’s look back at those sample lists and their corresponding linked lists, which we drew out to look for patterns:

A list [3, 4, 2, 3, 1, 5], so 3 is in the first position and points to 2 in the third position.

A list [3, 1, 2, 2], so 3 is in the first position and points to 2 in the third position.

A list [4, 3, 1, 1, 4], so 4 is in the first position and points to 1 in the fourth position.

Are there any patterns that might help us find a node with two incoming pointers?

Here’s a pattern: the last node never has any incoming pointers. This makes sense—since the list has a length of n + 1 and all the values are n or less, there can never be a pointer to the last position. If n is 5, the length of the list is 6 but there can’t be a value 6, so no pointer will ever point to the 6th node. Since it has no incoming pointers, we should treat the last position in our list as the “head” (starting point) of our linked list.

Here’s another pattern: there’s never an end to our list. No pointer ever points to None. Every node has a value in the range 1…n, so every node points to another node (or to itself). Since the list goes on forever, it must have a cycle (a loop). Here are the cycles in our example lists:

The list [3, 4, 2, 3, 1, 5] has a loop where 4 in the second position points to 3 in the fourth position, which points to 2 in the third position, which points back to 4 in the second position.

The list [3, 1, 2, 2] has a loop where 3 in the first position points to 2 in the third position, which points to 1 in the second position, which points back to 3 in the first position.

The list [4, 3, 1, 1, 4] has a loop where 4 in the first position points to 1 in the fourth position, which points back to 4 in the first position.

Can we use these cycles to find a duplicate value?

If we walk through our linked list, starting from the head, at some point we will enter our cycle. Try tracing that path on the example lists above. Notice anything special about the first node we hit when we enter the cycle?

The first node in the cycle always has at least two incoming pointers!

We’re getting close to an algorithm for finding a duplicate value. How can we find the beginning of a cycle?

Again, drawing an example is helpful here:

A linked list with 9 nodes represented by circles and arrows, with a cycle because node 9 links back to node 6.

If we were traversing this list and wanted to know if we were inside a cycle, that would be pretty easy—we could just remember our current position and keep stepping ahead to see if we get to that position again.

But our problem is a little trickier—we need to know the first node in the cycle.

What if we knew the length of the cycle?

If we knew the length of the cycle, we could use the “stick approach” to start at the head of the list and find the first node. We use two pointers. One pointer starts at the head of the list:

We put a pointer above node 1, and we know the cycle has 4 steps because it uses nodes 6, 7, 8, and 9.

Then we lay down a “stick” with the same length as the cycle, by starting the second pointer at the end. So here, for example, the second pointer is starting 4 steps ahead because the cycle has 4 steps:

We put a second pointer above node 4 and connected it to the first pointer at node 1, forming a "stick" 4 nodes long.

Then we move the stick along the list by advancing the two pointers at the same speed (one node at a time).

The "stick" advancing down the linked list, now with the first pointer at node 4 and the second pointer at node 8.

When the first pointer reaches the first node in the cycle, the second pointer will have circled around exactly once. The stick wraps around the cycle, and the two ends are in the same place: the start of the cycle.

The "stick" wrapped around the cycle with both pointers on the start of the cycle, node 6. The pointers are on the same node because the cycle and stick are the same length.

We already know where the head of our list is (the last position in the list) so we just need the length of the cycle. How can we find the length of a cycle?

If we can get inside the cycle, we can just remember a position and count how many steps it takes to get back to that position.

How can we make sure we’ve gotten inside a cycle?

Well, there has to be a cycle in our list, and at the latest, the cycle is just the last node we hit as we traverse the list from the head:

A linked list with 7 nodes represented by circles and arrows, with a cycle because the last node links back to itself.

So we can just start at the head and walk n steps. By then we’ll have to be inside a cycle.

Alright, we’ve pieced together an entire strategy to find a duplicate integer! Working backward:

  1. We know the position of a node with multiple incoming pointers is a duplicate in our list because the nodes that pointed to it must have the same value.
  2. We find a node with multiple incoming pointers by finding the first node in a cycle.
  3. We find the first node in a cycle by finding the length of the cycle and advancing two pointers: one starting at the head of the linked list, and the other starting ahead as many steps as there are steps in the cycle. The pointers will meet at the first node in the cycle.
  4. We find the length of a cycle by remembering a position inside the cycle and counting the number of steps it takes to get back to that position.
  5. We get inside a cycle by starting at the head and walking n steps. We know the head of the list is at position n + 1.

Can you implement this? And remember—we won’t want to actually create a linked list. Here’s how we can traverse our list as if it were a linked list.

To get inside a cycle (step 5 above), we identify n, start at the head (the node in position n + 1), and walk n steps.

def find_duplicate(int_list):
    n = len(int_list) - 1

    # STEP 1: GET INSIDE A CYCLE
    # Start at position n+1 and walk n steps to
    # find a position guaranteed to be in a cycle
    position_in_cycle = n + 1
    for _ in xrange(n):
        position_in_cycle = int_list[position_in_cycle - 1]

Now we’re guaranteed to be inside a cycle. To find the cycle’s length (step 4), we remember the current position and step ahead until we come back to that same position, counting the number of steps.

def find_duplicate(int_list):
    n = len(int_list) - 1

    # STEP 1: GET INSIDE A CYCLE
    # Start at position n+1 and walk n steps to
    # find a position guaranteed to be in a cycle
    position_in_cycle = n + 1
    for _ in xrange(n):
        position_in_cycle = int_list[position_in_cycle - 1]

    # STEP 2: FIND THE LENGTH OF THE CYCLE
    # Find the length of the cycle by remembering a position in the cycle
    # and counting the steps it takes to get back to that position
    remembered_position_in_cycle = position_in_cycle
    current_position_in_cycle = int_list[position_in_cycle - 1]  # 1 step ahead
    cycle_step_count = 1

    while current_position_in_cycle != remembered_position_in_cycle:
        current_position_in_cycle = int_list[current_position_in_cycle - 1]
        cycle_step_count += 1

Now we have the head and the length of the cycle. We need to find the first node in the cycle (step 3). We set up 2 pointers: 1 at the head, and 1 ahead as many steps as there are nodes in the cycle. These two pointers form our “stick.”

# STEP 3: FIND THE FIRST NODE OF THE CYCLE
# Start two pointers
#   (1) at position n+1
#   (2) ahead of position n+1 as many steps as the cycle's length
pointer_start = n + 1
pointer_ahead = n + 1
for _ in xrange(cycle_step_count):
    pointer_ahead = int_list[pointer_ahead - 1]

Alright, we just need to advance to the first node in the cycle (step 2), and return a duplicate value (step 1)!

Solution

We treat the input list as a linked list, like we described at the top of the problem.

To find a duplicate integer:

  1. We know the position of a node with multiple incoming pointers is a duplicate in our list because the nodes that pointed to it must have the same value.
  2. We find a node with multiple incoming pointers by finding the first node in a cycle.
  3. We find the first node in a cycle by finding the length of the cycle and advancing two pointers: one starting at the head of the linked list, and the other starting ahead as many steps as there are nodes in the cycle. The pointers will meet at the first node in the cycle.
  4. We find the length of a cycle by remembering a position inside the cycle and counting the number of steps it takes to get back to that position.
  5. We get inside a cycle by starting at the head and walking n steps. We know the head of the list is at position n + 1.

We want to think of our list as a linked list but we don’t want to actually use up all that space, so we traverse our list as if it were a linked list by converting positions to indices.

def find_duplicate(int_list):
    n = len(int_list) - 1

    # STEP 1: GET INSIDE A CYCLE
    # Start at position n+1 and walk n steps to
    # find a position guaranteed to be in a cycle
    position_in_cycle = n + 1
    for _ in xrange(n):
        position_in_cycle = int_list[position_in_cycle - 1]
        # we subtract 1 from the current position to step ahead:
        # the 2nd *position* in a list is *index* 1

    # STEP 2: FIND THE LENGTH OF THE CYCLE
    # Find the length of the cycle by remembering a position in the cycle
    # and counting the steps it takes to get back to that position
    remembered_position_in_cycle = position_in_cycle
    current_position_in_cycle = int_list[position_in_cycle - 1]  # 1 step ahead
    cycle_step_count = 1

    while current_position_in_cycle != remembered_position_in_cycle:
        current_position_in_cycle = int_list[current_position_in_cycle - 1]
        cycle_step_count += 1

    # STEP 3: FIND THE FIRST NODE OF THE CYCLE
    # Start two pointers
    #   (1) at position n+1
    #   (2) ahead of position n+1 as many steps as the cycle's length
    pointer_start = n + 1
    pointer_ahead = n + 1
    for _ in xrange(cycle_step_count):
        pointer_ahead = int_list[pointer_ahead - 1]

    # Advance until the pointers are in the same position
    # which is the first node in the cycle
    while pointer_start != pointer_ahead:
        pointer_start = int_list[pointer_start - 1]
        pointer_ahead = int_list[pointer_ahead - 1]

    # Since there are multiple values pointing to the first node
    # in the cycle, its position is a duplicate in our list
    return pointer_start
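
Running the function on the sample lists from the breakdown, each call returns one of the duplicated values:

find_duplicate([3, 4, 2, 3, 1, 5])  # 3
find_duplicate([3, 1, 2, 2])        # 2
find_duplicate([4, 3, 1, 1, 4])     # 4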

Complexity

O(n) time and O(1) space.

Our space cost is O(1) because all of our additional variables are integers, which each take constant space.

For our runtime, we iterate over the array a constant number of times, and each iteration takes O(n) time in its worst case. So we traverse the linked list more than once, but it’s still a constant number of times—about 3.

Bonus

There’s another approach using randomized algorithms that is O(n) time and O(1) space. Can you come up with that one? (Hint: You’ll want to focus on the median.)

What We Learned

This one’s pretty crazy. It’s hard to imagine an interviewer expecting you to get all the way through this question without help.

But just because it takes a few hints to get to the answer doesn’t mean a question is “too hard.” Some interviewers expect they’ll have to offer a few hints.

So if you get a hint in an interview, just relax and listen. The most impressive thing you can do is drop what you’re doing, fully understand the hint, and then run with it.