M4: Data Cleaning and Wrangling
This fourth module provides an overview of data cleaning and wrangling. We will explore techniques that use regular expressions and pandas to clean and wrangle data before analyzing it.
Pre-recorded Lectures
The pre-recorded lectures are available here. You can also find the videos under the “Panopto” tab on the CAPP 30122 Canvas site.
The lectures are a series of videos, each roughly 5 to 20 minutes long, divided into five sections:
4.1 - Motivation for Regular Expressions
4.2 - Regex Language
4.3 - Group Matching
4.4 - Data Cleaning using Pandas (Part 1)
4.5 - Data Cleaning using Pandas (Part 2)
The Jupyter notebooks for the data cleaning module are located in the upstream repository under the m4/resources directory.
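As a rough illustration of the group matching and cleaning topics listed above, here is a minimal sketch using Python's standard `re` module. The sample strings, the pattern, and the `parse_date` helper are hypothetical and are not taken from the lectures or the notebooks.

```python
import re

# Hypothetical messy date strings; not from the course materials.
RAW_DATES = ["2022-02-12", " 2022/2/12 ", "Feb 12, 2022"]

# Three capturing groups (year, month, day) separated by "-" or "/".
DATE_PATTERN = re.compile(r"(\d{4})[-/](\d{1,2})[-/](\d{1,2})")


def parse_date(text):
    """Return (year, month, day) as ints, or None if the text does not match."""
    # Strip surrounding whitespace first, then require the whole string to match.
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    return int(year), int(month), int(day)


cleaned = [parse_date(raw) for raw in RAW_DATES]
print(cleaned)  # [(2022, 2, 12), (2022, 2, 12), None]
```

Returning `None` for strings that do not match keeps bad rows visible so they can be inspected or dropped later. In pandas, a comparable extraction can be done on a whole column with `Series.str.extract`.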
Supplementary Resources
Lab
Labs provide additional practice problems for topics covered in a module. You may work on them individually or with your peers.
Regular Expressions Lab (You can find the distribution code in the upstream repository in the /labs/reg/dist directory. The solutions are located in labs/reg/soln.)
This time, most but not all of the solutions are provided. Please ask on Ed if you have difficulty with exercises whose solutions are not provided.
Programming Assignment
Programming Assignment #3, due Saturday, February 12th at 4:30pm CST