This is an old revision of the document!


Task Set 4

For the final tasks of this project, we want to merge the information about students' class situation from the 3 tables (lecture, exam, homework). The issue with joining these 3 tables is that table lecture_grades is mapped using students' emails, whereas the other 2 tables are mapped using students' names. Fortunately, we have another table, email_map, that maps students' names to their emails. The bad news is that this tables contains typos.

Your job is to fix the typos in email_map table, generate the correct table and then use that to join the information about students' lecture, exam and homework grades.

Column Name from table email_map contains typos. Example: Zane Kayle is misspelled as Zan Kayle or Amelia Caden is misspelled as Amelia Camdden.

The goal is to write function correct_table, which receives the name of the column containing typos, the table containing typos, a reference table (a table where the values from that column are correct, but not necessarily in the same order) and returns the first table where the typos have been fixed. All parameters are strings in CSV-format.

correct_table :: String -> CSV -> CSV -> CSV
-- this will be tested using: correct_table "Nume" email_map_csv hw_grades_csv

So in order to fix the typos, you will use a reference table, where the specified column has correct values.

  • extract the necessary column from the faulty table and the reference table (let's call these T and Ref);
  • filter out only the values from T and Ref which don't have a perfect match in the other table - these are the problematic entries (this will help improve time performance);
  • calculate the distance between each value from T and each value from Ref (distance = how similar the 2 strings are - you decide how to formally define this distance);
  • for every value from T, its correct form is the value from Ref with the shortest distance to it;
  • lastly, restore the original table, replacing the incorrect values from T with the correct values from Ref.

Timeout

For this task, we will set a timeout of TBA s (suggestion: 1 min?). Your implementation must succeed in that time in order to receive the points. The most time consuming process should be calculating the distance between each value from T and each value from Ref. We suggest Lazy Dynamic Programming.

Note

  • Your implementation must be generic! It can't depend on the table structure or rows order. You can't assume that the rows in the the faulty table and in the reference table have the same order. Similarity, you can't choose a distance function that only works for these tables.
  • The steps above are only recommended. If you find another implementation that works and respects the first condition (generic implementation), that's great!
  • Also recommended is that you use your previously implemented Query Language, but it's not a restriction. You might find some queries really helpful, such as Cartesian, Projection or Filter.

Note: Task Typos is a prerequisite to these tasks. You need a correct email_map table in order to implement them.

Grades Table

Now that you can have the correct mapping of each student's name to their email, we can join all tables (hw_grades, lecture_grades, exam_grades). Write function grades:

grades :: CSV -> CSV -> CSV -> CSV -> CSV
-- tested with: grades email_map_csv hw_grades_csv exam_grades_csv lecture_grades_csv

The result will be a table with the following column names: grades_schema = [“Nume”, “Punctaj Teme”, “Punctaj Curs”, “Punctaj Exam”, “Punctaj Total”]. The table is sorted by the first column (column “Nume”).

The formulas for computing each grade are:

  • hw_grade = sum of all columns, excluding “Nume” from table hw_grades (Lab, T1, T2, T3, Ex1, Ex2, Ex3, Ex4) ⇒ Punctaj Teme;
  • lecture_grade = 2 * (sum of all columns, excluding “Email” from table lecture_grades) / (number of columns, excluding “Email”) ⇒ Punctaj Curs;
  • exam_grade = (Q1 + Q2 + Q3 + Q4 + Q6) / 4 + Ex. scris ⇒ Punctaj Exam;
  • total = min(hw_grade + lecture_grade, 5) + exam_grade or 4 if hw_grade + lecture_grade < 2.5 or exam_grade < 2.5 ⇒ Punctaj Total.

The resulting table must be sorted by column “Nume”.

Note

  • Don't forget to use correct_table on email_map_csv! If this takes too long to test, you can save the output of correct_table “Nume” email_map_csv hw_grades_csv and use it directly, but only while testing.
  • Table lecture_grade has several entries without a value for column “Nume”. Ignore those.

Deadline: 30.05, 23:50. Vmchecker: TBA.