In-class exercise
The in-class exercise is distributed via a GitHub Classroom repository. To get access to your group’s git repository, you can follow this link.
The first person to follow the link creates a group and sets up the repository for it. The others join that group and land directly on the webpage of your group's repository. All group members can then clone the repository.
Introduction
In this exercise you will use pytask for the first time. You will reuse code from in-class exercise 6 (the one on functional data management and plotting). That way you don't have to write much new code and can focus on the mechanics of using pytask.
Important
You will find the solutions to all tasks in the folder solution. You will write your own solutions in the folder exercise. Before executing pytask, cd to exercise or solution (depending on which one you want to look at). Do not execute pytask in the root of this repository.
Task 1
Copy all function definitions from the solution to in-class exercise 6, task 1 into a file called task_clean_election_data.py.
Add a task function at the top of the file (after the import statements but before the function definitions). It should load the original dataset and the metadata, call the clean_data function you copied, and save the result under a suitable name in a bld folder.
Run pytask from a terminal in the exercise directory and verify that the task is executed correctly.
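A minimal sketch of task_clean_election_data.py could look as follows. It assumes pytask 0.4 or later. In that version, default arguments holding Path objects are treated as dependencies and the argument called produces marks the product. With older pytask versions the same information goes into the @pytask.mark.depends_on and @pytask.mark.produces decorators. The folder layout, the file names, the CSV and JSON formats and the body of clean_data are all hypothetical stand-ins for what exercise 6 actually uses.

```python
import json
from pathlib import Path

import pandas as pd

# Hypothetical layout: raw files in exercise/data, outputs in exercise/bld.
HERE = Path(__file__).parent
SRC = HERE / "data"
BLD = HERE / "bld"


def task_clean_election_data(
    raw_data: Path = SRC / "election_data.csv",  # dependency (hypothetical name)
    metadata: Path = SRC / "election_metadata.json",  # dependency (hypothetical name)
    produces: Path = BLD / "election_data_clean.pkl",  # product
) -> None:
    """Load the inputs, call the pure cleaning function and save its result."""
    raw = pd.read_csv(raw_data)
    info = json.loads(metadata.read_text())
    clean = clean_data(raw, info)
    # Make sure the output folder exists when the function is called directly.
    produces.parent.mkdir(parents=True, exist_ok=True)
    clean.to_pickle(produces)


# The function definitions copied from exercise 6 go below the task function.
def clean_data(raw: pd.DataFrame, info: dict) -> pd.DataFrame:
    """Placeholder for the copied clean_data; its real signature may differ."""
    # Hypothetical: keep the columns listed in the metadata and rename them.
    renames = info["column_names"]
    return raw[list(renames)].rename(columns=renames)
```

The task function only loads, calls and saves. You can therefore check clean_data on small DataFrames without touching any files.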
Task 2
The purpose of this task is to make sure that your dependencies and products are configured correctly.
Open a shell in the directory exercise and execute pytask.
Without making any changes, run pytask again. All tasks should be skipped because nothing has changed. If this is not the case, ask for help.
Now delete the created dataset and run pytask again. Your task should run again and store the created dataset on disk. If this is not the case, you have a problem in your product specification (in the produces statement). Debug it!
Now add a comment or docstring in task_clean_election_data.py and run pytask again. The task should be executed again. If this is not the case, you have a problem in your dependency specification (in the depends_on statement). Debug it!
Task 3
In task 5 of in-class exercise 6 we used the gapminder data. Create the file task_prepare_gapminder_data.py and add the following functions:
A function called _reduce_gapminder_data that takes the raw gapminder data as a DataFrame and returns a reduced DataFrame containing only the countries Algeria, Egypt, Sudan and South Africa.
A function called task_prepare_gapminder_data that downloads the gapminder data, calls the _reduce_gapminder_data function and saves the result under a suitable name in the bld folder.
No matter how little there is to do, implement your main logic in a function that works on Python objects (here DataFrames). Delegate all loading and saving to the task function.
Task 4
Add a file called task_create_simple_lineplot.py and create the simple line plot from in-class exercise 6, task 5.
Again, you can define all functions you need inside the task file. However, you should define a plotting function that only works on Python objects (and returns Python objects).
Save the generated plot under a suitable name in the bld folder.
Task 5
Add a file called task_create_highlighted_lineplot.py and add a function called create_plot that creates the line plot from tasks 8, 9 and 10 in in-class exercise 6.
Add a corresponding task function that loads the data, calls the create_plot function and saves the generated plot under a suitable name in the bld folder.
Task 6
Add a file called task_create_scatterplots.py that defines a task function inside a loop to create the three scatterplots from in-class exercise 6, task 4.
Task 7 (Bonus)
Refactor your plotting code in task_create_highlighted_lineplot.py so that there is one main function that calls several private functions that do the actual plotting.
Note that when writing code for complex plots, you are allowed to use functions that modify a figure in place, i.e. that have a side effect.