Completing the Google Foo.bar Challenge (part 2)

This is the second question for the Google foobar interactive coding challenge.

Numbers Station Coded Messages

When you went undercover in Commander Lambda's organization, you set up a coded messaging system with Bunny Headquarters to allow them to send you important mission updates. Now that you're here and promoted to Henchman, you need to make sure you can receive those messages - but since you need to sneak them past Commander Lambda's spies, it won't be easy!

Bunny HQ has secretly taken control of two of the galaxy's more obscure numbers stations, and will use them to broadcast lists of numbers. They've given you a numerical key, and their messages will be encrypted within the first sequence of numbers that adds up to that key within any given list of numbers.

Given a non-empty list of positive integers l and a target positive integer t, write a function answer(l, t) which verifies if there is at least one consecutive sequence of positive integers within the list l (i.e. a contiguous sub-list) that can be summed up to the given target positive integer t (the key) and returns the lexicographically smallest list containing the smallest start and end indexes where this sequence can be found, or returns the array [-1, -1] in the case that there is no such sequence (to throw off Lambda's spies, not all number broadcasts will contain a coded message).

For example, given the broadcast list l as [4, 3, 5, 7, 8] and the key t as 12, the function answer(l, t) would return the list [0, 2] because the list l contains the sub-list [4, 3, 5] starting at index 0 and ending at index 2, for which 4 + 3 + 5 = 12, even though there is a shorter sequence that happens later in the list (5 + 7). On the other hand, given the list l as [1, 2, 3, 4] and the key t as 15, the function answer(l, t) would return [-1, -1] because there is no sub-list of list l that can be summed up to the given target value t = 15.

To help you identify the coded broadcasts, Bunny HQ has agreed to the following standards:

Each list l will contain at least 1 element but never more than 100. Each element of l will be between 1 and 100. It will be a positive integer, not exceeding 250. The first element of the list l has index 0. For the list returned by answer(l, t), the start index must be equal or smaller than the end index. Remember, to throw off Lambda's spies, Bunny HQ might include more than one contiguous sublist of a number broadcast that can be summed up to the key. You know that the message will always be hidden in the first sublist that sums up to the key, so answer(l, t) should only return that sublist.

Test Cases
Inputs: (int list) l = [4, 3, 10, 2, 8] (int) t = 12
Output: (int list) [2, 3]
Inputs: (int list) l = [1, 2, 3, 4] (int) t = 15
Output: (int list) [-1, -1]

This was an interesting challenge. My initial attempt was "correct", but since we're iterating through the list twice, time complexity is O(n^2) . As the input list size increased, performance suffered. I discovered the challenge would timeout before completing based on the hidden input tests.

def answer(l, t):
    for x in range(len(l)):
        for y in range(len(l)):
            if sum(l[x:y+1]) == t:
                return [x, y]
    return [-1, -1]

def test_case_1():
    assert answer([4, 3, 5, 7, 8], 12) == [0, 2]

def test_case_2():
    assert answer([4, 3, 10, 2, 8], 12) == [2, 3]

def test_case_3():
    assert answer([1, 2, 3, 4], 15) == [-1, -1]

One of the most useful optimization tools is a hash table. Python's dictionary data type implements hash tables under the hood, which can significantly improve our performance. Jessica Yung wrote a great series of posts explaining the tradeoffs of lists versus dicts. Since we're only iterating through the list once, time complexity is now O(n).

def optimized_answer(l, t):
    # Store key values in dictionary with sum:index
    dict = {}
    # Initialize with 0: -1 in case sublist starts from index 0
    dict[0] = -1
    sum = 0
    # traverse the given list
    for i in range(len(l)):
        # sum of elements so far
        sum += l[i]
        # If the sum hasn't been recorded, add to dict
        if sum not in dict:
            dict[sum] = i
        # If sum - t exists in dict, we've found a valid solution
        if sum - t in dict:
            # Subtract index of sum - t
            length = i - dict[sum - t]
            return [i - length + 1, i]
    # No valid values found
    return [-1, -1]

def test_case_4():
    assert optimized_answer([4, 3, 5, 7, 8], 12) == [0, 2]

def test_case_5():
    assert optimized_answer([4, 3, 10, 2, 8], 12) == [2, 3]

def test_case_6():
    assert optimized_answer([1, 2, 3, 4], 15) == [-1, -1]

Using the line-profiler tool, we can see how much better the optimized_answer function performs:

$ kernprof -v -l numbers_station.py 
Wrote profile results to numbers_station.py.lprof
Timer unit: 1e-06 s

Total time: 3.5e-05 s
File: numbers_station.py
Function: answer at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def answer(l, t):
     3         3         20.0      6.7     57.1      for x in range(len(l)):
     4        16          6.0      0.4     17.1          for y in range(len(l)):
     5        14          9.0      0.6     25.7              if sum(l[x:y+1]) == t:
     6         1          0.0      0.0      0.0                  return [x, y]
     7                                               return [-1, -1]

Total time: 9e-06 s
File: numbers_station.py
Function: optimized_answer at line 18

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18                                           @profile
    19                                           def optimized_answer(l, t):
    20                                               # Store key values in dictionary with sum:index
    21         1          0.0      0.0      0.0      dict = {}
    22                                               # Initialize with 0: -1 in case sublist starts from index 0
    23         1          0.0      0.0      0.0      dict[0] = -1
    24         1          0.0      0.0      0.0      sum = 0
    25                                               # traverse the given list
    26         4          2.0      0.5     22.2      for i in range(len(l)):
    27                                                   # sum of elements so far
    28         4          1.0      0.2     11.1          sum += l[i]
    29                                                   # If the sum hasn't been recorded, add to dict
    30         4          0.0      0.0      0.0          if sum not in dict:
    31         4          2.0      0.5     22.2              dict[sum] = i
    32                                                   # If sum - t exists in dict, we've found a valid solution
    33         4          2.0      0.5     22.2          if sum - t in dict:
    34                                                       # Subtract index of sum - t
    35         1          1.0      1.0     11.1              length = i - dict[sum - t]
    36         1          1.0      1.0     11.1              return [i - length + 1, i]
    37                                               # No valid values found
    38                                               return [-1, -1]