Data Types¶

Up to this point, we have been using several data types that are part of Python (integers, strings, lists, dictionaries, etc.). However, programmers often need to define their own data types. We will be building up to defining custom data types in an object-oriented manner, but we will start with something simpler: defining an API (Application Programmer Interface) for a data type. To do this, we will use two classical data structures: stacks and queues (these data structures are also very useful in their own right, so they are more than just representative examples).

Stacks¶

A stack is essentialy a list of values, but with a more constrained set of operations available on that list. Most notably, we can only access and manipulate one element in the list: the last element in the list (which, in stacks, is known as the top of the stack). This data structure has a number of uses, including:

Undo\redo operation in word processors
Expression evaluation and syntax parsing
Function evaluation

A common visualization of stacks is (nonsurprisingly) as values stacked on top of each other. For example:

  TOP

|  10 |
|  56 |
| 105 |
|  42 |
|  5  |
-------

 BOTTOM

In the above stack, we can only interact with the value 10: we can see its value, we can remove it, or we can stack another value on top of it. However, we cannot interact with the other values in the stack (unless, for example, I remove value 10, and then 56 becomes the top of the stack.

The main operations on a stack are the following:

Creating an empty stack
Pushing a value into the stack. If a stack was empty, then the stack will contain just the pushed value. If the stack already had values, we "stack" the new value on the top of stack.
Popping a value from the stack. This takes the value at the top of the stack and removes it (we also get to see what that value is)
Peeking at the value at the top of the stack. This tells us what the value at the top of the stack is but unlike popping does not remove it from the stack.
Checking whether a stack is empty.

ASK: How can we implement a stack?

ANSWER: Using a list (follow-up: what is the top/bottom of the stack in a list?)

For example, the above stack could be represented like this:

s = [ 5, 42, 105, 56, 10 ]

ASK: How do we implement pushing?

# We can "push" onto the stack using the append method
s.append(56)

s

[5, 42, 105, 56, 10, 56]

ASK: How do we implement popping?

# We can "pop" with the list's pop method
s.pop()

56

s

[5, 42, 105, 56, 10]

So far, so good. However, there's a problem: nothing is stopping me from performing non-stack operations on the list:

# I shouldn't be able to do this in a stack!
s[2] = 37

s

[5, 42, 37, 56, 10]

I want to make sure that I only interact with the stack using the operations I've defined earlier.

ASK: Why would we want to constrain what we can do with a list?

ANSWER: Although stacks are lists of values (which, yes, we could manipulate with the full set of operations available with lists), there are many cases where it is desirable to have a stack, ensuring that the programmer who uses that stack cannot manipulate it in non-stack ways. Furthermore, there are many algorithms that can be easily and elegantly implemented in terms of stacks instead of lists (we will see one such example soon).

So, it makes sense to define a new stack data type that ensures that programmers only interact with the data type in acceptable ways. When it comes to data types, there are two roles for programmers:

The data type developer, who is aware of all the internal details of the data type.
The data type user, not in the sense of end-user of an application, but in the sense of another programmer that is using that data type.

Take into account that, up to this point, you have all been users of the Python data types (integers, lists, dictionaries, etc.). Notice how you have been able to use these data types without knowing the internal details of the data types (much less being able to manipulate the internal details of the data type). When you created a new dictionary, one was provided for you, and you were given a well-defined set of operations that you could use to interact with that dictionary. You are blissfully unaware of what happens under the hood and, more importantly, you can't use dictionaries except in the ways intended by the Python developers (e.g., you can't use a list as a key in a dictionary).

This "public interface" to the data type (the set of operations and attributes that the data type developer wants the user to be able to access) is called the data types API (Application Programmer Interface). You will now switch roles to being a data type developer although, by necessity, you will also be using the data types you develop (but it's important that you understand that there is a "barrier" between the two roles, and that the data type users should only interact with the data type through the API provided by the data type developer).

For now, we will define APIs as a collection of functions. Later on, we will see how to define APIs in an object-oriented manner.

So, we will need functions for the operations we described earlier. Creating an empty stack is simple enough:

def stack_create():
    return []

s = stack_create()

Notice how the function returns a list but, conceptually, it returns a stack. The data type user doesn't need to know that s is actually a list (even though in Python this is easy enough to find out). In fact, a very important principle of API design is that it should be possible for the data type developer to change the internal implementation without affecting the data type users (which should be treating the value returned by create as an opaque type). When you are both the data type developer and user, this can be a bit confusing but, again, think about how you use dictionaries: when you do d = {} you are blisfully unaware that variable d actually refers to a hash table and, if the Python developers decided to switch to a different internal representation, you would keep using dictionaries the same way.

Pushing, popping, peeking, and checking emptiness are similarly straightforward:

def stack_push(stack, value):
    stack.append(value)

def stack_pop(stack):
    return stack.pop()

def stack_top(stack):
    return stack[-1]

def stack_is_empty(stack):
    return len(stack) == 0

Let's also add a function that creates a string representation of the stack:

def stack_to_string(stack):
    s  = " TOP OF THE STACK\n"
    s += "-------------------\n"
    
    for v in reversed(stack):
        s += str(v).center(20) + "\n"

    s += "-------------------\n"
    s += "BOTTOM OF THE STACK\n"
    return s

Now, we can work with stacks using only these functions:

stack_push(s, 10)
stack_push(s, 27)
stack_push(s, 5)
stack_push(s, 9)
stack_push(s, 7)

print(stack_to_string(s))

 TOP OF THE STACK
-------------------
         7          
         9          
         5          
         27         
         10         
-------------------
BOTTOM OF THE STACK

stack_pop(s)

7

print(stack_to_string(s))

 TOP OF THE STACK
-------------------
         9          
         5          
         27         
         10         
-------------------
BOTTOM OF THE STACK

The Stock Span problem¶

--> See http://www.geeksforgeeks.org/the-stock-span-problem for problem description (note: the code in that page is in Java, but the problem description should be easy to understand regardless)

Given a list of prices $P$ for a stock, where each element $P_i$ corresponds to the price of the stock on day $i$, the span $S_i$ is the number of consecutive days before and including $i$, for which the price of the stock is less than or equal to $P_i$.

So let's say we have this list of prices:

p = [100, 80, 60, 70, 60, 75, 85]

The values of S should be [1, 1, 1, 2, 1, 4, 6]. Work through this on the blackboard.

The stock span problem is an example of a problem that can be solved more efficiently with stacks. For now, let's look at a solution without stacks:

def compute_span(prices):
    s = []
    
    # Traverse the list of prices
    for i, price in enumerate(prices):
        # For each price, traverse all the previous prices,
        # incrementing S_i until we encounter a price that
        # is larger than the current price.
        s_i = 0
        for prev_price in prices[i::-1]:           
            if prev_price > price:
                break
            else:
                s_i += 1
                
        s.append(s_i)
                
    return s

compute_span(p)

[1, 1, 1, 2, 1, 4, 6]

ASK: What is wrong with this solution?

ANSWER: It is $O(n^2)$

We can use stacks to produce a more efficient implementation.

# Stack-based implementation of the stock span problem.
# Notice how we only use the stack functions when interacting
# with the stack. We never access the values in the 'st' list
# directly.
def compute_span_stacks(prices):
    s = []
    
    # Create a stack
    st = stack_create()
    
    # Traverse the list of prices
    for i, price in enumerate(prices):
        
        # Pop elements from stack until we encounter a price that
        # is greater than the current price, or until the stack
        # is empty
        while not stack_is_empty(st):
            prev_index, prev_price = stack_top(st)
            
            if prev_price > price:
                break
            else:
                stack_pop(st)
    
        if stack_is_empty(st):
            # If the stack is empty, then the current price
            # is greater than all the elements to the left of
            # it (note: this is trivially true for the first
            # price in the list)
            s_i = i + 1
        else:
            # Otherwise, the current price is only greater than the
            # elements after the index at the top of the stack.
            top_index, top_price = stack_top(st)
            s_i = i - top_index
               
        s.append(s_i)
                
        # Push the index of this price into the stack
        stack_push(st, (i, price))
        
    return s

compute_span_stacks(p)

[1, 1, 1, 2, 1, 4, 6]

The above algorithm is $O(n)$. We can also see this just by timing both algorithms.

import random

def gen_random_prices(n):
    return random.sample(range(n*1000), n)

prices = gen_random_prices(1000)
%timeit compute_span(prices)
%timeit compute_span_stacks(prices)

1.65 ms ± 4.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
852 µs ± 8.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

prices = gen_random_prices(2000)
%timeit compute_span(prices)
%timeit compute_span_stacks(prices)

7.11 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.75 ms ± 24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

prices = gen_random_prices(3000)
%timeit compute_span(prices)
%timeit compute_span_stacks(prices)

15.5 ms ± 36.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.64 ms ± 29.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But:

prices = gen_random_prices(100)
%timeit compute_span(prices)
%timeit compute_span_stacks(prices)

48.9 µs ± 445 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
83.7 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using stacks does come with a price, but that price is worth it as soon as we start dealing with large lists of prices.

Python modules¶

Once we've defined an API, we can place those functions in a Python module (which takes the form of a Python file). For example mystack.py contains the stack functions we just defined. I can then import that module from the interpreter or from other Python files.

Show this in a separate IPython session, to avoid confusing the functions defined in the notebook with the functions in the mystack module.

Queues¶

Now that we've seen stacks, explaining queues is relatively straightforward. Like stacks, queues represent a list of values, where only the following operations are allowed:

Creating an empty queue
Enqueueing a value at the back of the queue
Dequeueing a value at the front of the queue
Peeking at the value at the front of the queue
Checking the size of queue, including whether it is empty

ASK: At this point, it should be obvious that we are going to implement a queue as a list. But, what is the front/back of the queue?

ANSWER: Depends! In Python, it doesn't really matter. As it turns out, appending/deleting at the end of the list is an $O(1)$ operation, and inserting/deleting at the start of the list is $O(n)$ (because all the elements have to be shifted forward). See https://wiki.python.org/moin/TimeComplexity (note that deletion is listed as $O(n)$ in all cases, but presumably deleting from the end of the list, which is internally an array, does not involve shifting any values).

If we make the end of the list be the front of the queue, enqueueing is $O(n)$ (because we're inserting at the start of the list) but dequeueing is $O(1)$ (because we're deleting at the end). If we make the end of the list be the back of the queue, the complexities are swapped (enqueueing is $O(1)$ and dequeueing is $O(n)$).

Assuming that the number of enqueue/dequeue operations is roughly the same (because in many applications, what gets enqueued eventually gets dequeued), there's really no difference between using the start or end of the list as the front (and viceversa for the back of the queue). These are the kind of issues that data type implementors have to deal with, but which data type users should not care about (except that, sometimes, the documentation will tell you the complexity of certain operations).

def queue_create():
    return []

def queue_is_empty(queue):
    return len(queue) == 0

def queue_length(queue):
    return len(queue)

def queue_enqueue(queue, value):
    queue.append(value)

def queue_dequeue(queue):
    return queue.pop(0)

def queue_front(queue):
    return queue[0]

def queue_to_string(queue):
    s  = "FRONT OF THE QUEUE\n"
    s += "------------------\n"
    
    for v in queue:
        s += str(v).center(19) + "\n"

    s += "------------------\n"
    s += "BACK OF THE QUEUE \n"
    return s

q1 = queue_create()
queue_enqueue(q1, 10)
queue_enqueue(q1, 27)
queue_enqueue(q1, 5)
queue_enqueue(q1, 9)
queue_enqueue(q1, 7)

print(queue_to_string(q1))

FRONT OF THE QUEUE
------------------
         10        
         27        
         5         
         9         
         7         
------------------
BACK OF THE QUEUE

queue_front(q1)

10

print(queue_to_string(q1))

FRONT OF THE QUEUE
------------------
         10        
         27        
         5         
         9         
         7         
------------------
BACK OF THE QUEUE

queue_dequeue(q1)

10

print(queue_to_string(q1))

FRONT OF THE QUEUE
------------------
         27        
         5         
         9         
         7         
------------------
BACK OF THE QUEUE