Project PP 2022

Project team

  • Mihai-Valentin DUMITRU
  • Vlad-Andrei BĂDOIU
  • Teodora-Andreea ION
  • Mihai UDUBAȘA
  • Matei POPOVICI
  • Your Teaching Assistant
    • You are highly encouraged to talk with your TA about the project and to ask questions during lab. You may also have sessions outside lab hours to check your code with them (either requested by you or by them).

Points

  • You can get a maximum of 4.2 points for the project, over the entire semester.
  • You need a minimum of 2.5 points from the project in order to qualify for the exam, along with a minimum of 3 points accumulated during the semester (lab, project, lecture).
  • Your final score will be computed at the end of the semester.
  • There will be automated testing, but the score given by the checker is not necessarily the final score. Points may be deducted for either:
    1. not complying with the task requirements (e.g. using library functions instead of implementing a required function), in which case the task is considered not completed and the points for that task are removed.
    2. not meeting a deadline (see below).

Task Sets & Deadlines:

  • The project will be divided into 3 steps, each with its own task set, submission assignment and number of points. The task sets will progressively increase in difficulty and will follow the lecture progress.
  • Each task set has its own deadline; there is also a hard deadline for the project, on the 15th of May.
  • You can submit solutions to each task set until the project deadline.
  • If you miss a task set deadline, you will lose 0.7 points from that task set (to a minimum of zero).
  • To meet a task set deadline, you have to make a submission worth at least 0.7 points by that deadline; the grading team might deduct some points after the manual review, but unless this is done in extreme circumstances (e.g. hardcoding answers to automated tests), it will not count as missing the deadline.

We are working for the Haskell Health company that builds fitness trackers. For this project we want to build a basic data science analysis framework similar to pandas from Python.

The exercises are based on and tested with the tables that you can find in the file Dataset.hs:

  • emails: contains the connection between the name of a person and their email
  • eight_hours: contains the number of steps during 8 hours on a specific day
  • physical_activity: contains the total steps, total distance, very active minutes, fairly active minutes, lightly active minutes of a person on a specific day
  • sleep_min: contains the number of minutes slept in a week

When working on the project, we suggest parsing the table cells in a single function and passing appropriate data types to helper functions (e.g. for an eight_hours table, a helper function might work on a (String, [Float])).
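As an illustration of this suggestion, here is a minimal sketch that parses an eight_hours row once and hands a typed value to a helper. The names parseStepsRow and averageSteps are hypothetical (they are not part of the skeleton), and the sketch assumes every cell after the name holds a valid number.

```haskell
type Value = String
type Row = [Value]

-- Hypothetical parser: the first cell is the name, the rest are step counts.
-- Assumes every cell after the name can be read as a number.
parseStepsRow :: Row -> (String, [Float])
parseStepsRow (name : cells) = (name, map read cells)
parseStepsRow [] = ("", [])

-- Hypothetical helper that works on already-parsed data.
-- Assumes the list of steps is non-empty.
averageSteps :: (String, [Float]) -> Float
averageSteps (_, steps) = sum steps / fromIntegral (length steps)
```

Keeping the parsing in one place means the helpers never deal with Strings, so a malformed cell can only fail in a single function.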

During the whole project we will consider the following types:

type CSV = String
type Value = String
type Row = [Value]
type Table = [Row]

At the beginning of the Tasks.hs file, the functions split_by, read_csv and write_csv are already implemented. You can use them to solve the tasks, but you can also implement your own.

Important: Use the function (printf "%.2f") when you have to convert a Float to a String.
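A small sketch of how this conversion could be wrapped. The helper name showFloat is hypothetical; printf comes from the standard Text.Printf module, and the type signature is what tells printf to produce a String.

```haskell
import Text.Printf (printf)

-- Hypothetical helper: format a Float with exactly two decimals.
-- The explicit signature selects the String-returning variant of printf.
showFloat :: Float -> String
showFloat = printf "%.2f"
```

For instance, showFloat 2 evaluates to "2.00" and showFloat 0.5 to "0.50".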

Your module with the solution will have to be called Tasks.hs.

Checker

We offer you an automated checker and a test suite. We also encourage you to add your own tests.

To run the checker, simply run runhaskell main.hs in your shell (if you only wish to run certain tasksets, you can give it additional arguments: runhaskell main.hs 1 3 4).

The checker is written in Haskell and can be easily extended to add new testcases. Check main.hs for more.

Task 1 - 0.2p

We would first like to compute everyone's average number of steps per given day, based on the formula: $(10+11+12+13+14+15+16+17)/8$, where each number stands for the steps recorded during that hour. You will have to write a function which takes an eight_hours table and returns a table with columns “Name”, “Average Number of Steps”.

Task 2 - 0.2p

Based on their number of steps, check:

  • how many people have achieved their daily goal (at least 1000 steps in total over the 8 hours in the eight_hours table),
  • what is the percentage of people who have achieved their goal and
  • what is the average number of daily steps for all people.

For this you will have to implement 3 functions which take an eight_hours table and return an Int.

Task 3 - 0.2p

We want to find out which hours are the hardest to be active in. We want to compute the average steps for each hour. The result will be a table with columns “H10”, “H11”, “H12”, “H13”, “H14”, “H15”, “H16”, “H17”, where the “Hx” column will represent the average number of steps for hour x in the eight_hours table.

Task 4 - 0.2p

Similar to the previous task, we want to know how many minutes were spent being active based on their intensity. You will use the physical_activity table and compute a table with 3 rows (“VeryActiveMinutes”, “FairlyActiveMinutes”, “LightlyActiveMinutes”) and 3 columns (“range1”, “range2”, “range3”), each column covering a range of minutes: the first column represents the range $[0, 50)$, the second $[50, 100)$ and the third $[100, 500)$. The value found at the intersection of a row and a column is the number of people who have spent, at the intensity given by the row, a number of minutes included in the range given by the column.
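The counting step for one intensity could be sketched as below. The helper name countRanges is hypothetical, and the sketch assumes the minutes for that intensity have already been parsed into Ints; each range includes its lower bound and excludes its upper bound.

```haskell
-- Hypothetical helper: for one intensity, count how many values fall into
-- each of the half-open ranges [0, 50), [50, 100) and [100, 500).
countRanges :: [Int] -> [Int]
countRanges minutes = map count [(0, 50), (50, 100), (100, 500)]
  where
    -- Lower bound included, upper bound excluded.
    count (lo, hi) = length (filter (\m -> lo <= m && m < hi) minutes)
```

With this sketch, a value of exactly 50 is counted in the second range, and a value of 500 falls outside all three.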

Task 5 - 0.2p

At this point, we would like to see the overall leaderboard. You'll have to sort the people by their total number of steps (ascending). If two people have the same number of steps, they will be listed in alphabetical name order. The function for this will take a physical_activity table and return a table with columns “Name”, “Total steps”.

Hint: Use the Ordering data type
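To illustrate the hint, here is a minimal sketch of a comparator that returns an Ordering and falls back to the name on ties. The names compareEntries and rank are hypothetical, the entries are assumed to be already parsed into (name, total steps) pairs, and sortBy from Data.List is used only for illustration; check the task requirements before relying on library sorting.

```haskell
import Data.List (sortBy)

-- Hypothetical comparator on (name, total steps) pairs:
-- fewer steps come first; equal steps are ordered alphabetically by name.
compareEntries :: (String, Int) -> (String, Int) -> Ordering
compareEntries (name1, steps1) (name2, steps2)
  | steps1 < steps2 = LT
  | steps1 > steps2 = GT
  | otherwise = compare name1 name2

-- Hypothetical wrapper that sorts a list of entries with the comparator above.
rank :: [(String, Int)] -> [(String, Int)]
rank = sortBy compareEntries
```

The same pattern (compare the main key, then fall back to the name) applies to any task that asks for alphabetical order on ties.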

Task 6 - 0.2p

Then we will compute the difference between two parts of the day regarding the number of steps based on the table eight_hours. We will create a table with 4 columns: “Name”, “Average first 4h”, “Average last 4h”, “Difference”. This table will contain the people sorted ascending by the “Difference” column. If 2 people have the same difference, they will be listed in alphabetical name order.

Task 7 - 0.1p

Implement a function which applies a function to all values in a table.

vmap :: (Value -> Value) -> Table -> Table

An example use of this would be:

correct_table = vmap (\x -> if x == "" then "0" else x) table

Task 8 - 0.1p

Implement a function which applies a function to all entries (rows) in a table. The new column names are given. Additionally, you have to implement a function which takes a Row from sleep_min table and returns a Row with 2 values: name, total number of minutes slept that week.

rmap :: (Row -> Row) -> [String] -> Table -> Table
get_sleep_total :: Row -> Row

For the get_sleep_total function, print the number of minutes using (printf "%.2f").