Task Set 4

For the final tasks of this project, we want to merge the information about students' class situation from the 3 tables (lecture, exam, homework) !!! Please add links to csvs!!! . The issue with joining these 3 tables is that table lecture_grades is mapped using students' emails, whereas the other 2 tables are mapped using students' names.

Fortunately, we have another table, email_map, that maps students' names to their emails. The bad news is that this table contains typos.

Your job is to fix the typos in email_map table, generate the correct table and then use that to join the information about students' lecture, exam and homework grades.

Column Name from table email_map contains typos. Example: Zane Kayle is misspelled as Zan Kayle or Amelia Caden is misspelled as Amelia Camdden. Fortunately, the name Zane Kayle contains only a few misspelled letters and it is guaranteed to appear in the correct form, in the name column of the exam or homework tables. Hence, we can use the latter names as reference to correct the typos.

The goal is to write function correct_table, which receives the name of the column containing typos, the table containing typos, a reference table (a table where the values from that column are correct, but not necessarily in the same order) and returns the first table where the typos have been fixed. All parameters are strings in CSV-format.

correct_table :: String -> CSV -> CSV -> CSV
-- this will be tested using: correct_table "Nume" email_map_csv hw_grades_csv

So in order to fix the typos, you will use a reference table, where the specified column has correct values.

  • extract the necessary column from the table with typos and the reference table (let's call these T and Ref);
  • filter out only the values from T and Ref which don't have a perfect match in the other table - these are the problematic entries (this will help improve time performance);
  • calculate the distance between each value from T and each value from Ref (distance = how similar the 2 strings are - you decide how to formally define this distance);
  • for every value from T, its correct form is the value from Ref with the shortest distance to it;
  • lastly, restore the original table, replacing the incorrect values from T with the correct values from Ref.

Timeout

For this task, we will set a timeout of 30 seconds. Your implementation must succeed in that time in order to receive the points. The most time consuming process should be calculating the distance between each value from T and each value from Ref. We suggest Lazy Dynamic Programming.

Note

  • Your implementation must be generic! It can't depend on the table structure or rows order. You can't assume that the rows in the the faulty table and in the reference table have the same order. Similarly, you can't choose a distance function that only works for these tables.
  • The steps above are only recommended. If you find another implementation that works and respects the first condition (generic implementation), that's great!
  • Also recommended is that you use your previously implemented Query Language, but it's not a restriction. You might find some queries really helpful, such as Cartesian, Projection or Filter.

Note: Task Typos is a prerequisite to this task. You need a correct email_map table in order to implement them.

Grades Table

Now that you can have the correct mapping of each student's name to their email, we can join all tables (hw_grades, lecture_grades, exam_grades). Write function grades:

grades :: CSV -> CSV -> CSV -> CSV -> CSV
-- tested with: grades email_map_csv hw_grades_csv exam_grades_csv lecture_grades_csv

The result will be a table with the following column names: grades_schema = [“Nume”, “Punctaj Teme”, “Punctaj Curs”, “Punctaj Exam”, “Punctaj Total”]. The table is sorted by the first column (column “Nume”).

The formulas for computing each grade are:

  • hw_grade = sum of all columns, excluding “Nume” from table hw_grades (Lab, T1, T2, T3, Ex1, Ex2, Ex3, Ex4) ⇒ Punctaj Teme;
  • lecture_grade = 2 * (sum of all columns, excluding “Email” from table lecture_grades) / (number of columns, excluding “Email”) ⇒ Punctaj Curs;
  • exam_grade = (Q1 + Q2 + Q3 + Q4 + Q6) / 4 + Ex. scris ⇒ Punctaj Exam;
  • total = min(hw_grade + lecture_grade, 5) + exam_grade or 4.00 if hw_grade + lecture_grade < 2.5 or exam_grade < 2.5 ⇒ Punctaj Total.

The resulting table must be sorted by column “Nume”.

Note

  • Don't forget to use correct_table on email_map_csv! If this takes too long to test, you can save the output of correct_table “Nume” email_map_csv hw_grades_csv and use it directly, but only while testing.
  • Table lecture_grade has several entries without a value for column “Email”. Ignore those. Also if a name is missing in any of the tables, just complete those columns with “” and consider value to be 0.
  • Use printf “%.2f” for printing Float values.
  • Unlike the previous task, your implementation for grades can depend on the tables' known structures.

Deadline: 30.05, 23:50. Vmchecker: TBA.