Testing Your Code

Once you’ve written a piece of code, you will want to test that it works as expected. There are broadly two ways of doing this:

  • Manual testing: This just involves running the code you wrote with some sample values, to check whether it behaves as expected. For example, if you were writing an expression to compute whether a year is a leap year, you may try running your code in IPython with a few leap years and a few non-leap years, to see whether your code correctly identifies the leap years.

  • Automated testing: There are automated testing frameworks that allow you to specify a series of tests you want to run on your code, and which make it easy to automatically re-run all those tests. For example, following the leap year example, you wouldn’t have to manually test each year value one by one; instead, the testing framework would test all these year values for you, and would report back how many produced the expected result.
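
To make the difference concrete, here is a minimal sketch of what automated checking amounts to. The names is_leap_year, CASES, and run_cases are our own, chosen for illustration; they are not part of any exercise, and a real framework such as pytest handles this bookkeeping for you.

```python
def is_leap_year(year):
    """Return True if year is a leap year in the Gregorian calendar."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Each pair is (input, expected result). These sample years are illustrative.
CASES = [(2000, True), (1900, False), (2024, True), (2023, False)]


def run_cases(cases):
    """Check every case and return how many produced the expected result."""
    passed = 0
    for year, expected in cases:
        if is_leap_year(year) == expected:
            passed += 1
    return passed


print("{} of {} cases passed".format(run_cases(CASES), len(CASES)))
```

Manual testing would mean typing each of those calls into IPython yourself and checking each answer by eye.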

All the exercises and programming assignments include a suite of automated tests that you can use to check whether your code is working correctly. Furthermore, these tests also factor into your score for an exercise or programming assignment.

Because the automated tests are easy to run (and affect your score for the assignment), you may be tempted to do the following: write some code, immediately try running the automated tests to find a test that fails, make a guess as to how to modify your code, and then repeat the process until all of the tests pass.

This is not a good way to test your code. Instead, you should start by doing some manual testing to get a sense of whether your code is working before you try the automated tests. On this page, we describe how to do this.

Manual testing

Let’s say you’re working on Exercises #1 and, specifically, on the add_one_and_multiply exercise. You can start by informally testing your expression in IPython. For example, you could do this:

In [1]: a = 5

In [2]: x = 2

In [3]: a + 1 * x
Out[3]: 7

That doesn’t seem quite right: the result should’ve been 12 (if we add 1 to 5, that gives us 6, which multiplied by 2 gives us 12). Looks like you need to experiment a bit more in IPython!
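
The 7 comes from operator precedence: Python performs the multiplication before the addition. Here is a small sketch of the kind of experiment you might try next; the variable names without_parens, same_as, and with_parens are ours.

```python
a = 5
x = 2

# Multiplication binds more tightly than addition, so Python
# evaluates 1 * x first and then adds a.
without_parens = a + 1 * x
same_as = a + (1 * x)

# Parentheses force the addition to happen first.
with_parens = (a + 1) * x

print(without_parens, same_as, with_parens)  # prints: 7 7 12
```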

Let’s say you’ve figured out what you believe is the correct expression, and you’ve added the code to the se1.py file. You should then try making some sample calls to the add_one_and_multiply function:

In [1]: %load_ext autoreload

In [2]: %autoreload 2

In [3]: import se1

In [4]: se1.add_one_and_multiply(5, 2)
Out[4]: 12

In [5]: se1.add_one_and_multiply(7, 0)
Out[5]: 0

If you get the wrong answer for some sample input, stop to reason why your code is behaving the way it is and think about how to modify it to get the correct result.

In short exercises like the one above, it is sometimes enough to look at the code and figure out why it is not working. However, for more complex code, especially once you move on to the programming assignments, you will want to follow a more rigorous approach. You should make a hypothesis about what might be wrong and use print statements to print out key values to help you verify or disprove your hypothesis. You can also find a lot of tips on how to debug your code in The Debugging Guide.
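
As a sketch of this hypothesis-driven approach, consider the following hypothetical function (it is not from any exercise). It is meant to return the mean of a list, but average([1, 2, 3]) returns 1.0 instead of 2.0. Our hypothesis is that total is not accumulating, so we print it on every pass through the loop.

```python
def average(values):
    """Hypothetical example: intended to return the mean of a list."""
    total = 0
    for v in values:
        total = v  # Bug: overwrites the running total instead of adding to it
        # Hypothesis: total is not accumulating. Print key values to check.
        print("v =", v, "total =", total)
    return total / len(values)


result = average([1, 2, 3])
print("result =", result)
```

The printed totals are 1, 2, 3 rather than the running sums 1, 3, 6, which confirms the hypothesis and points to the fix: replace total = v with total += v.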

After you’ve done some manual testing and have a sense that your function works for at least a few simple inputs, you should try running the automated tests. The tests could reveal that there are still issues with your function and, at that point, you could repeat the same process we described above. The important thing is that you always have an idea in your head about how your code works, and don’t make random changes that you don’t fully understand.

Automated testing

Now on to the automated tests. We will be using the pytest Python testing framework for this and subsequent assignments. Pytest is available on the CS machines. To run our automated tests, you will use the py.test command from the Linux command line (not from within ipython3). We recommend opening a new terminal window for running this command, which will allow you to go back and forth easily between testing code by hand in ipython3 in one terminal window and running the test suite using py.test in the other. (When we work on assignments, we usually have three windows open: one for editing, one for experimenting in ipython3, and one for running the automated tests.)

For example, to run all the tests for add_one_and_multiply, you can run the following command from the Linux command-line:

$ py.test -v -x -k add_one_and_multiply test_se1.py

(Recall that the $ represents the prompt and is not included in the command.)

Here is what each part of this command means:

  • py.test indicates that we want to run pytest.

  • test_se1.py is the name of the file that contains the testing code. (If you look at this file, you may find some of the syntax unfamiliar; this is okay for now.)

  • In between these, we specify three options:

    • The flag -v means run in verbose mode; this gives us a more detailed readout of the test results.

    • The flag -x means that pytest should stop running tests after a single test failure.

    • The option -k add_one_and_multiply restricts pytest to only running the tests for add_one_and_multiply. The way the -k option works is actually a bit more elaborate but, for now, you can assume that providing the name of the function you’re testing will run only the tests for that function.

Here is (slightly-modified) output from using this command to test our reference implementation of add_one_and_multiply:

$ py.test -v -x -k add_one_and_multiply test_se1.py
====================================== test session starts =======================================
platform linux -- Python 3.8.5, pytest-3.9.1, py-1.10.0, pluggy-0.13.1 -- /bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.8.5', 'Platform': 'Linux-5.8.0-59-generic-x86_64-with-glibc2.29',
'Packages': {'pytest': '3.9.1', 'py': '1.10.0', 'pluggy': '0.13.1'}, 'Plugins': {'timeout':
'1.3.2', 'json': '0.4.0', 'metadata': '1.7.0', 'html': '1.19.0'}}
rootdir: /home/username/cmsc12100/short-exercises/se1, inifile: pytest.ini
plugins: timeout-1.3.2, json-0.4.0, metadata-1.7.0, html-1.19.0
collected 26 items / 20 deselected

::test_se1.py::test_add_one_and_multiply_1 PASSED                                          [ 16%]
::test_se1.py::test_add_one_and_multiply_2 PASSED                                          [ 33%]
::test_se1.py::test_add_one_and_multiply_3 PASSED                                          [ 50%]
::test_se1.py::test_add_one_and_multiply_4 PASSED                                          [ 66%]
::test_se1.py::test_add_one_and_multiply_5 PASSED                                          [ 83%]
::test_se1.py::test_add_one_and_multiply_6 PASSED                                          [100%]

- generated json report: /home/username/cmsc12100/short-exercises/se1/tests.json -
============================ 6 passed, 20 deselected in 0.04 seconds =============================

This output shows that our code passed all six tests for add_one_and_multiply. It also shows that there were 20 tests that were deselected (that is, were not run) because they did not match the test selection criteria specified by the argument to -k.

If you fail a test, pytest will print out some information about what went wrong. For example, let’s say you specified the following expression, which we saw earlier was incorrect:

a + 1 * x

This would pass the first test, but would fail the second one:

$ py.test -v -x -k add_one_and_multiply test_se1.py
====================================== test session starts =======================================
platform linux -- Python 3.8.5, pytest-3.9.1, py-1.10.0, pluggy-0.13.1 -- /bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.8.5', 'Platform': 'Linux-5.8.0-59-generic-x86_64-with-glibc2.29',
'Packages': {'pytest': '3.9.1', 'py': '1.10.0', 'pluggy': '0.13.1'}, 'Plugins': {'timeout':
'1.3.2', 'json': '0.4.0', 'metadata': '1.7.0', 'html': '1.19.0'}}
rootdir: /home/username/cmsc12100/short-exercises/se1, inifile: pytest.ini
plugins: timeout-1.3.2, json-0.4.0, metadata-1.7.0, html-1.19.0
collected 26 items / 20 deselected

::test_se1.py::test_add_one_and_multiply_1 PASSED                                          [ 16%]
::test_se1.py::test_add_one_and_multiply_2 FAILED                                          [ 33%]

============================================ FAILURES ============================================
__________________________________ test_add_one_and_multiply_2 ___________________________________

    def test_add_one_and_multiply_2():
>       do_test_add_one_and_multiply(a=5, x=2, expected=12)

test_se1.py:17:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_se1.py:185: in do_test_add_one_and_multiply
    check_equals(actual, expected, recreate_msg)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

actual = 7, expected = 12
recreate_msg = 'To recreate this test in ipython3 run:\n  se1.add_one_and_multiply(5, 2)'

    def check_equals(actual, expected, recreate_msg=None):
        msg = "Actual ({}) and expected ({}) values do not match.".format(actual, expected)
        if recreate_msg is not None:
            msg += "\n" + recreate_msg

>       assert actual == expected, msg
E       AssertionError: Actual (7) and expected (12) values do not match.
E         To recreate this test in ipython3 run:
E           se1.add_one_and_multiply(5, 2)
E       assert 7 == 12

test_se1.py:158: AssertionError
- generated json report: /home/username/cmsc12100/short-exercises/se1/tests.json -
======================= 1 failed, 1 passed, 20 deselected in 0.10 seconds ========================

The volume of output can be a bit overwhelming. You should focus on the lines towards the end that start with E. These lines will usually contain a helpful message telling you why the test failed:

E       AssertionError: Actual (7) and expected (12) values do not match.
E         To recreate this test in ipython3 run:
E           se1.add_one_and_multiply(5, 2)

This information can help you narrow down the issue with your code. This error message, in particular, tells you that, as in the manual testing example we saw earlier, the test code expected a return value of 12 but got a return value of 7. It also shows you how to run this test in ipython3. At this point, you should switch back to testing your function in ipython3 until you have fixed the problem.
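
If you are curious how the pieces in that output fit together, here is a rough sketch. The check_equals function is copied from the failure output above. The body of do_test_add_one_and_multiply is our guess, since the output shows only its call to check_equals, and the add_one_and_multiply defined here is a stand-in for the function in se1.py so that the sketch runs on its own.

```python
def add_one_and_multiply(a, x):
    """Stand-in for the function in se1.py (an assumption, for illustration)."""
    return (a + 1) * x


def check_equals(actual, expected, recreate_msg=None):
    """Copied from the failure output above."""
    msg = "Actual ({}) and expected ({}) values do not match.".format(actual, expected)
    if recreate_msg is not None:
        msg += "\n" + recreate_msg

    assert actual == expected, msg


def do_test_add_one_and_multiply(a, x, expected):
    """Assumed body: the output above shows only the call to check_equals."""
    recreate_msg = "To recreate this test in ipython3 run:\n"
    recreate_msg += "  se1.add_one_and_multiply({}, {})".format(a, x)
    actual = add_one_and_multiply(a, x)
    check_equals(actual, expected, recreate_msg)


def test_add_one_and_multiply_2():
    # pytest collects this function because its name starts with test_
    do_test_add_one_and_multiply(a=5, x=2, expected=12)


# Reproduce the failure by hand, using the value the incorrect
# expression a + 1 * x produces for a=5, x=2.
try:
    check_equals(5 + 1 * 2, 12,
                 "To recreate this test in ipython3 run:\n"
                 "  se1.add_one_and_multiply(5, 2)")
    failure_message = None
except AssertionError as err:
    failure_message = str(err)

print(failure_message)
```

The text printed at the end is the message that pytest displays on the lines starting with E.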

A few more notes on the py.test command and its options:

  • By default, if you do not supply the name of a specific test file (such as test_se1.py), pytest will look in the current directory tree for Python files that have names that start with test_.

  • Because we specified the -x option, pytest exited as soon as the second test failed (without running the remaining tests). Omitting the -x option makes sense when you want to get a sense of which tests are passing and which ones aren’t; however, when debugging your code, you should always use the -x option so that you can focus on one error at a time.

  • If you don’t use the -k option, pytest will run every function whose name starts with test_. You can limit the tests that get run by using the -k option along with any string that uniquely identifies the desired tests. The string is not required to be a prefix. For example, if you specify -k add, pytest will run the test functions whose names start with test_ and contain the string add.

  • Pytest has many other options that we did not use here. You can see the rest of the options by running the command py.test -h.
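
As a mental model for the -k option, you can think of it as a substring filter over test names. The sketch below is a deliberate simplification written by us, not pytest's actual code: the real option also accepts expressions built with and, or, and not. The function select_tests and the third test name are hypothetical.

```python
def select_tests(names, keyword):
    """Simplified stand-in for -k: keep the names that contain keyword."""
    return [name for name in names if keyword in name]


# Only the add_one_and_multiply names appear in the output above;
# the last name is made up for this example.
names = [
    "test_add_one_and_multiply_1",
    "test_add_one_and_multiply_2",
    "test_some_other_exercise_1",
]

selected = select_tests(names, "add")
print(selected)
```

This model also explains why the string should uniquely identify the tests you want: if another exercise had add in its name, -k add would select its tests as well.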

In general, we will leave out the name of the file with the test code (test_se1.py), use short substrings to describe the desired tests, and combine the option flags (-v -x -k) into a single string (-xvk). For example, the tests for add_one_and_multiply can also be run with the following command:

$ py.test -xvk add

Obtaining your test score

Your score on the exercises, as well as the “Completeness” portion of the programming assignments, is determined by the automated tests. To get your score for the automated tests, simply run the following from the Linux command-line. (Remember to leave out the $ prompt when you type the command.)

For the short exercises:

$ py.test
$ ../common/grader.py

For the programming assignments:

$ py.test
$ ./grader.py

Notice that we’re running py.test without the -k or -x options: we want it to run all the tests. If you’re still failing some tests, and don’t want to see the output from all the failed tests, you can add the --tb=no option when running py.test:

$ py.test --tb=no

Keep in mind that the grader.py program looks at the results from the last time you ran py.test, so if you make any changes to your code, you need to re-run py.test before running the grader.

You can also just run py.test followed by the grader on one line by running this:

$ py.test --tb=no; ../common/grader.py

If you run this inside your se1 directory, you should see something like this (of course, your actual scores may be different!):

Category                                                       Passed / Total       Score  / Points
----------------------------------------------------------------------------------------------------
Exercise 1                                                     6      / 6           15.00  / 15.00
Exercise 2                                                     4      / 4           15.00  / 15.00
Exercise 3                                                     3      / 3           15.00  / 15.00
Exercise 4                                                     5      / 5           15.00  / 15.00
Exercise 5                                                     4      / 4           20.00  / 20.00
Exercise 6                                                     4      / 4           20.00  / 20.00
----------------------------------------------------------------------------------------------------
                                                                            TOTAL = 100.00 / 100
====================================================================================================