Grading and Final Projects#

The following page contains everything you need to know about final projects and grading. Please read it carefully before approaching us with questions.

Grading#

The grade is determined by a final project and by bonus points you can collect by submitting assignments throughout the term. You can achieve a perfect grade by submitting the final project alone, but it is much easier to do so if you have bonus points.

The final project is worth between 0 and 150 points. Each of the five assignments is worth between 0 and 3 points.

Bonus points cannot help you achieve a passing grade if you would not achieve a passing grade with the final project alone. For example, a project with 70 points is a fail even with 15 bonus points, because the project alone stays below the 75-point threshold for a 4.0.

Points translate to grades as follows:

Grade    Minimum number of points
1.0      147
1.3      139
1.7      131
2.0      123
2.3      115
2.7      107
3.0      99
3.3      91
3.7      83
4.0      75

How do we grade the final project#

The grade will be a holistic assessment of your project with a focus on code quality.

We look at the following criteria:

  • Does the project satisfy our definition of reproducibility?

  • Did you use git effectively throughout the project?

  • Does your code reflect the best practices we discuss in the software engineering chapter?

  • Did you test your code effectively?

  • How challenging is the task you solve?

A final project will be about as much work as two or three assignments.

How the project will be submitted#

Final projects are submitted in a GitHub repository created by GitHub Classroom. Follow this invitation to create the repository, which you can then clone to your computer.

The deadline is Thursday, March 6, 11:59pm. Changes that are pushed after the deadline will automatically be ignored when the submissions are downloaded via GitHub Classroom.

If you do not know yet what git is and how it works, don’t worry. You will learn this during the class.

Which programming languages and libraries can you use#

The final projects have to be in Python.

You can use any Python library you want. If you need packages that are not installed in the course environment, your project needs to contain an environment.yml file that lists all packages you use.

If you use libraries that are not part of the course environment and do not provide an environment file, we will deduct points.
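
To give an idea of what such a file can look like, here is a minimal, purely illustrative sketch of an environment.yml; the environment name and the package list are hypothetical:

```yaml
# Hypothetical environment.yml; the name and package list are illustrative only.
name: final-project
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - statsmodels
  - pip
  - pip:
      - stargazer
```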

Example projects from previous years#

The goal is not to produce projects that are similar to the example projects. We only describe them here to give you a better feeling for our expectations.

Project 1#

For this project, I contacted a professor I wanted to work with and asked whether they needed help with any of their projects. I specified that my help would be on a programming-related part of the project (data cleaning and data analysis).

I joined a project in which the authors document individuals’ preferences about how to split the cost of climate change within a society and across societies.

  • Upsides of this kind of project

    • show your skills to a professor while working on the exam

    • experience first-hand what it means to work with data in an empirical project

    • the topic to work on is already defined, since it is provided by a third party

  • Downsides of this kind of project

    • extra work related to the project involvement (in my case data gathering)

    • extra work to make a good impression on the professor you are working with

    • possible extension of your involvement in the project beyond the end of the final project

Contents#

For this project, I contributed to the data gathering process (designing the survey on Qualtrics; note: this was outside the scope of the EPP final project). I then wrote a script to clean the data, a script to generate summary statistics and run preliminary regressions, a script to perform some (very) elementary text analyses (e.g. parsing, word cleaning, and word counts), and finally a script to produce outputs (LaTeX-readable tables and graphs). The main output of my final project was a LaTeX document that briefly described the project, the data, and the results of the analyses.
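
To illustrate what "elementary text analysis" can mean in this context, here is a minimal sketch of a word-cleaning and word-count step using nltk; the function names and the exact cleaning rules are hypothetical and not taken from the actual project:

```python
from collections import Counter

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Requires the nltk resources "punkt" and "stopwords", e.g. via nltk.download("punkt").


def clean_words(text):
    """Lower-case, tokenize, and drop stopwords and non-alphabetic tokens."""
    stops = set(stopwords.words("english"))
    tokens = word_tokenize(text.lower())
    return [tok for tok in tokens if tok.isalpha() and tok not in stops]


def word_counts(texts):
    """Aggregate word counts over an iterable of free-text answers."""
    counts = Counter()
    for text in texts:
        counts.update(clean_words(text))
    return counts
```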

Size of the Project#

This is a summary of the lines of code in my project:

Language   Files   Blank Lines   Comment Lines   Code Lines
Python     24      498           467             1435
YAML       7       64            17              703
TeX        4       46            14              189
Markdown   6       58            4               148

Tests#

All code was written in functional form and I tried to write at least one test per function in the data cleaning and analysis parts.
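
To make "at least one test per function" concrete, here is a minimal pytest-style sketch for a hypothetical cleaning function; the function and column names are illustrative and not the project's actual code:

```python
import pandas as pd


def drop_incomplete_answers(df):
    """Hypothetical cleaning step: keep only rows where all survey answers are present."""
    return df.dropna()


def test_drop_incomplete_answers_removes_rows_with_missing_values():
    raw = pd.DataFrame({"answer_a": [1, None, 3], "answer_b": [4, 5, 6]})
    cleaned = drop_incomplete_answers(raw)
    assert len(cleaned) == 2
    assert cleaned.notna().all().all()
```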

Libraries Used#

In addition to the libraries that come with the project template, I used:

  • nltk - to analyze human language data

  • seaborn - to visualize statistical data

  • stargazer - to produce LaTeX-formatted tables

Grade#

The project was awarded 148 points, resulting in a 1.0 (even without bonus points).

Project 2#

For this project, I wanted to learn more about Natural Language Processing (NLP) in order to use it in my thesis. I decided to work on a relatively simple NLP task: I used a dataset containing tweets about climate change (from kaggle.com) to train four classification models and evaluate their performance in predicting features of the tweets (e.g. whether a tweet supports the belief in man-made climate change).

Upsides of this project:

  • It was a useful learning opportunity that went beyond the scope of the course.

  • I got acquainted with different libraries that came in useful later for my thesis, such as pytorch.

  • Because of the nature of the project, I had to deal with various unexpected issues, such as libraries with limited support or datasets and models that did not fit into my RAM. This was very useful, as it helped me develop and improve my approach to problem-solving and debugging.

Downsides of this project:

  • Because of the various unexpected issues, the project was sometimes a bit frustrating, and I ended up focusing less than I would have wanted on good coding practices.

  • Learning new methods took extra time compared to a project in which I would have done something I was already familiar with.

Contents#

In the first part of the project, I wrote a script to clean the text data and preprocess it for the classification task. In the second part, I trained the four models and tried out different fine-tuning strategies. In the last part, I wrote a script to evaluate the performance of the models and produce a LaTeX table with (some of) the results.
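
As a rough illustration of one such training and evaluation step, here is a minimal scikit-learn sketch; the file name, column names, and model choice are assumptions for illustration, and the actual project used four models and additional libraries:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical data layout: "text" holds the tweet, "label" the stance on man-made climate change.
tweets = pd.read_csv("tweets.csv")
X_train, X_test, y_train, y_test = train_test_split(
    tweets["text"], tweets["label"], test_size=0.2, random_state=0
)

# Bag-of-words features plus a simple linear classifier as a baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```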

Size of the Project#

This is a summary of the lines of code in my project:

Language   Files   Blank Lines   Comment Lines   Code Lines
Python     32      347           387             1180
Markdown   6       64            0               208
YAML       4       4             4               186
TeX        1       25            1               62

Tests#

I tested all key functions in all parts to check that they return the expected outputs. For some of the more complex functions, I also tested specific features (e.g. for the preprocessing function, I tested a sizeable set of text inputs to check that it handles them as I expected).
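
To illustrate how checking a sizeable set of text inputs can be organized, here is a hedged sketch using pytest.mark.parametrize; the preprocessing function and the expected outputs are hypothetical:

```python
import pytest


def preprocess_tweet(text):
    """Hypothetical preprocessing step: lower-case the tweet and strip mentions and URLs."""
    tokens = text.lower().split()
    kept = [tok for tok in tokens if not tok.startswith("@") and not tok.startswith("http")]
    return " ".join(kept)


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Climate change is real", "climate change is real"),
        ("@user totally agree", "totally agree"),
        ("read this http://example.com now", "read this now"),
    ],
)
def test_preprocess_tweet_handles_typical_inputs(raw, expected):
    assert preprocess_tweet(raw) == expected
```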

Libraries Used#

In addition to the standard libraries used in the course, I used a number of libraries specific to NLP tasks and to the fine-tuning strategies I applied:

  • nltk

  • scikit-learn

  • bayesian-optimization

  • imbalanced-learn

  • keras

  • tensorflow

Grade#

The project got 140 points, which corresponds to a 1.3 (before bonus points).