Finding the maximum profit in a shareprice array

The following is a classic programming problem: find the best time to buy / sell a share, given an array of past share prices.

Well, there is one obvious answer: buy low, sell high. The question is: given a (potentially huge) array of share prices over time, at which point should you have bought, at which point should you have sold.

Ask yourself: Do I understand the problem?

• Input is an array of N values. We don't know N yet, but obviously the algorithm, whatever it is, shouldn't depend on a particular N.
• Without loss of generality, let's assume all values are integers. Shareprices will typically be modelled more like floating point numbers: but floating point numbers are slow, and have rounding errors. So imagine you get the share prices in cents, or even fractions of cents: as integers. We will examine further down below what changes if the datatype changes.
• Output is the index at which to buy, the index at which to sell, and the profit achieved thereby. Note that it can be possible that within a time range, the same profit can be made twice, so while the maximum profit is unique, the best buy/sell time is not necessarily unique.

Setting up a test harness

As usual, our first step is to prepare a testbed. Our code strategy for the test will be this:

• We will implement a couple of functions for the same problem. As usual, we are going to store them in a dictionary along with their name, so that we can call them, time their performance and print out results
• We will verify that all algorithms return the same output for the same input.

Therefor, our test code looks like this:

#! -*- Encoding: Latin-1 -*-

def selftest():
import random
import time

A = [random.random() for i in range(5000)]
expected_result = None

methods = { 'using brute force' : fmax_brute_force,
'using divide and conquer' : fmax_dandq,
'using one-pass sweep' : fmax_sequential,
}

for method_name in methods.keys():
method = methods[method_name]
t0 = time.time()
result = method(A)
print "%25s: %r after %r" % (method_name, result, time.time() - t0, )
if expected_result is None:
expected_result = result

else:
assert result == expected_result

if __name__ == "__main__":
selftest()

Brute-Force approach

Let's try brute-force first:

• For each possible buying times:
• calculate the maximum profit that can be reached if the shares were bought at this point
• If this profit is greater than the best profit seen so far, update the buy/sell positions

This clearly separates into two logical functions:

# given a strategy that buys the share at position index, what is the maximum profit that can be achieved?
max_profit = 0
sell_at = index
for i in range(index+1, len(A)):
selling_price = A[i]
if profit > max_profit:
max_profit = profit
sell_at = i

return max_profit, sell_at

# check all possible buying times
def fmax_brute_force(A):
max_profit = None
sell = None

for index, item in enumerate(A):
if (max_profit is None) or (profit > max_profit):
max_profit = profit
sell = sell_at

There is a small pythonesque optimization that we can do with calculate_profit_when_buying_now, by using the builtin max/index functions on arrays, like this:

sell_value = max(A[index:])
sell_at = A.index(sell_value, index)
return sell_value - A[index], sell_at

Some notes on this implementation:

• The time complexity of this solution is O(N2), because we loop through all items in the array twice.
• The space complexity of this solution is O(1) for the non-optimized version, and O(N) for the optimized one

Using Divide & Conquer

This strategy proceeds by dividing the problem into smaller subproblems, that can be solved more easy. Let's think about how we could do this:

• Split the array in two halfs: lower and upper.
• Find the best strategy in the lower half
• Find the best strategy in the upper half
• Find the best strategy by buying at the minimum price in the lower half, and selling at the maximum price in the upper half

The code for this is much more complex: we need to keep track of edge cases, ensure that the recursion terminates, and handle the "cross-border" selling part. And in Python, complex code often equals ugly code. But here we go:

def fmax_dandq_recursive(A, start, stop):
n = stop - start

# edge case 1: start == stop: buy and sell immediately = no profit at all
if n == 0:
return 0, start, start

# edge case 2: two consequtive timestamps. buy on the first day, sell on the second.
# this actually can result in a loss, so maybe buying and selling on the same day would be
# preferable (becaue at least profit would be nil, rather than a loss), but this will sort
# itself out further down the road, so we can keep the code simple here:
if n == 1:
return A[stop] - A[start], start, stop

mid = start + n/2

# the "divide" part in Divide & Conquer: try both halfs of the array
max_profit1, buy1, sell1 = fmax_dandq_recursive(A, start, mid-1)
max_profit2, buy2, sell2 = fmax_dandq_recursive(A, mid, stop)

# the "cross-border sales strategy" part: find the minium buy price in the lower half,
# and the maximum sell price in the upper half
for k in range(start+1, mid):

mp_sell_idx = mid
mp_sell_val = A[mid]
for k in range(mid+1, stop+1):
if A[k] > mp_sell_val:
mp_sell_val = A[k]
mp_sell_idx = k

# those two points generate the maximum cross border profit

# and now compare our three options and find the best one
if max_profit2 > max_profit1:
if max_profit_cross_border > max_profit2:
else:
else:
if max_profit_cross_border > max_profit1:
else:

def fmax_dandq(A):
return fmax_dandq_recursive(A, 0, len(A)-1)

I did tell you the code is ugly, didn't I?

Some notes on this implementation:

• The time complexity of this solution is O(N Log N).
• The space complexity of this solution is O(Log N) because of the recursions we are going to need

Using One-Pass Sweep

Once you've reached a N-Log-N-ish time complexity, often you can call it a day. But for this particular problem, there actually exists a more beautiful algorithm - that is also much faster: O(N). What could it be? Well, O(N) gives you a hint: it means we somehow will have to visit every item in the array once. So the basic structure will be something like this:

def fmax_one_pass_sweep(A):
max_profit = None
sell = None

for index, item in enumerate(A):
# ... MAGICALLY COMPUTE THE BEST BUY/SELL DATES HERE ...

When do we buy? When prices are low. That means, we're always going to buy at the lowest price we've seen so far. So we can update the logic like this:

def fmax_one_pass_sweep(A):

# minimum price seen so far
min = None

max_profit = None
sell = None

for index, item in enumerate(A):

# sell at the minimum - update minimum seen so far
if (min is None) or (item < min):
min = item

else:
# ... MAGICALLY COMPUTE THE BEST SELL DATES HERE ...

When do we sell? When prices are high. That means, we're always going to buy at the maximum profit possible so far. This completes our logic:

def fmax_one_pass_sweep(A):

# minimum price seen so far
min = None

max_profit = None
sell = None

for index, item in enumerate(A):

# sell at the minimum - update minimum seen so far
if (min is None) or (item < min):
min = item

# buy at the maximum - update maximum seen so far
elif (max_profit is None) or (item - min) > max_profit:
max_profit = item - min
sell = index

Beautiful and fast.

Other strategies

Let's briefly revisit other common strategys and see if they could help with our problem.

• Could hashing help? Not really. Hashing is a way to speed up data lookup - but data lookup is not a concern we have in our code here. We know we need to look at each element at least once (minimum), and we are already only looking at each element once (maximum), so that leaves little room for optimization right there.
• Could sorting help? Not really: unless part of our financial strategy is inventing a time machine and using that to optimize our financial investments, we must keep the original time order as it was.
• Could a greedy algorithm help? Not really. Greedy algorithms are about making choices when you are unsure if a certain choice is optimal. And we already know that some choices are suboptimal.
• Could a graph algorithm help? Hard to say. Probably there exists an ingenious mapping of a complex graph structure to this problem - but it will most certainly increase at least time complexity, if not space complexity.

But how does it actually perform?

The results on my machine for an array of 10.000 integers values were this:

using brute force: (9995, 315, 8336) after 1.7940 secs
using divide and conquer: (9995, 315, 8336) after 0.0200 secs
using one-pass sweep: (9995, 315, 8336) after 0.0010 secs

Abstraction time: from int to arbitrary objects

Replacing integer values by floats or decimals is trivial and works the same way. But could we relax the input conditions even further? I honestly think we ought to sit down calmly, take a stress pill, and think things over.

Think about the following modification of the original problem: you have an array of complex numbers, and you are to compute the maximum distance between two complex numbers.

• The distance between two complex numbers is well defined (if you think of them as points in a cartesian plane, then the distance is the length of the line between the two points).
• But there is no comparison defined: you cannot say which of any given two points is smaller than the other.

The brute-force algorithm we settled on above, using our python implementation, won't work:

# won't work, because there is no maximum defined for complex numbers
sell_at = A.index(sell_value, index)
return sell_value - A[index], sell_at

But the one we presented at the very beginning can work, with a slight modification:

# find best combination starting at index
max_profit = 0
sell_at = index
for i in range(index+1, len(A)):
selling_price = A[i]

# won't work, because the result of the subtraction of two complex numbers is another complex number,
# and you cannot say if two complex numbers are bigger or smaller
# profit = selling_price - buying_price
# if profit > max_profit:

# will work: you look at the absolute distance between two complex numbers
if profit > max_profit:

max_profit = profit
sell_at = i

return max_profit, sell_at

# check all possible buying times
def fmax_brute_force(A):
max_profit = None
sell = None

for index, item in enumerate(A):
if (max_profit is None) or (profit > max_profit):
max_profit = profit
sell = sell_at

This tells us that we really need is two functions:

1. a function that ranks possible solutions, and
2. a function that tells us if a rank is better than another. For example, so far we've implied that the bigger the rank, the better. But what if we want to compute the minimum distance between two (distinct) points?

So, a generalized brute-force solution looks like this:

def get_best_range_from_starting_pos(A, starting_pos, get_rank, evaluate_rank):
best_rank = None
ending_pos = None
for current_pos in range(starting_pos+1, len(A)):
current_rank = get_rank(A, starting_pos, current_pos)
if (best_rank is None) or evaluate_rank(current_rank, best_rank):
best_rank = current_rank
ending_pos = current_pos

return best_rank, ending_pos

def get_best_range_by_rank_brute_force(A, get_rank, evaluate_rank):
best_rank = None
best_starting_pos = None
best_ending_pos = None

for starting_pos in range(len(A)):
this_rank, ending_pos = get_best_range_from_starting_pos(A, starting_pos, get_rank, evaluate_rank)
if (best_rank is None) or (this_rank > best_rank):
best_rank = this_rank
best_starting_pos = starting_pos
best_ending_pos = ending_pos

return best_rank, best_starting_pos, best_ending_pos

For example, to find the two complex numbers closest together, you'd use something like this:

def rank_distance(A, starting_pos, current_pos):
return abs(A[current_pos] - A[starting_pos])

def get_min_distance(current_rank, best_rank):
return current_rank < best_rank

And to find the maximum distance of two complex numbers, you'd keep the rank and simply change the rank evaluation function to

def get_max_distance(current_rank, best_rank):
return current_rank > best_rank

Generalizing the One-Pass Sweep function

Now, can we do this for the one-pass sweep? Let's revisit the code:

def fmax_one_pass_sweep(A):

# minimum price seen so far
min = None

max_profit = None
sell = None

for index, item in enumerate(A):

# sell at the minimum - update minimum seen so far
if (min is None) or (item < min):
min = item

# buy at the maximum - update maximum seen so far
elif (max_profit is None) or (item - min) > max_profit:
max_profit = item - min
sell = index