====== Task Set 4 ======

For the final tasks of this project, we want to merge the information about each student's class standing from the 3 tables (lecture, exam, homework). The issue with joining these 3 tables is that table ''lecture_grades'' is keyed by students' emails, whereas the other 2 tables are keyed by students' names. Fortunately, we have another table, ''email_map'', that maps students' names to their emails. The **bad news** is that this table contains **typos**. Your job is to fix the typos in the ''email_map'' table, generate the corrected table, and then use it to join the information about students' lecture, exam and homework grades.

===== Typos (0.5p) =====

The name column (''Nume'') of table ''email_map'' contains typos. Example: //Zane Kayle// is misspelled as //Zan Kayle//, and //Amelia Caden// is misspelled as //Amelia Camdden//. Fortunately, each misspelled name differs from the correct one by only a few letters, and the correct form is guaranteed to appear in the name column of the ''exam'' or ''homework'' tables. Hence, we can use the latter names as a **reference** to correct the typos.

The goal is to write function ''correct_table'', which receives the name of the column containing typos, the table containing typos, and a reference table (a table where the values of that column are correct, but not necessarily in the same order), and returns the first table with the typos fixed. Both tables are given as strings in CSV format.

  correct_table :: String -> CSV -> CSV -> CSV
  -- this will be tested using: correct_table "Nume" email_map_csv hw_grades_csv

In this call, the homework table serves as the **reference table**: its ''Nume'' column holds the correct values.
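To make "the closest reference value" concrete, here is a minimal sketch of one possible distance: the Levenshtein edit distance, computed with a lazily filled array. This is only an illustration, not a required definition. The names ''editDistance'' and ''closest'' are hypothetical, and the sketch assumes the ''array'' package that ships with GHC.

```haskell
import Data.Array (Array, listArray, range, (!))
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Levenshtein distance between two strings (hypothetical helper).
-- table ! (i, j) holds the distance between the suffixes starting at
-- position i of xs and position j of ys. The array elements are lazy
-- thunks, so each cell is evaluated at most once, and only on demand.
editDistance :: String -> String -> Int
editDistance xs ys = table ! (0, 0)
  where
    m = length xs
    n = length ys
    xa = listArray (0, m - 1) xs :: Array Int Char
    ya = listArray (0, n - 1) ys :: Array Int Char
    bnds = ((0, 0), (m, n))
    table :: Array (Int, Int) Int
    table = listArray bnds [cell i j | (i, j) <- range bnds]
    cell i j
      | i == m = n - j                      -- xs exhausted: insert the rest of ys
      | j == n = m - i                      -- ys exhausted: delete the rest of xs
      | xa ! i == ya ! j = table ! (i + 1, j + 1)
      | otherwise =
          1 + minimum [ table ! (i + 1, j)      -- deletion
                      , table ! (i, j + 1)      -- insertion
                      , table ! (i + 1, j + 1)  -- substitution
                      ]

-- Reference value with the smallest distance to t.
-- Assumes refs is non-empty (minimumBy fails on an empty list).
closest :: [String] -> String -> String
closest refs t = minimumBy (comparing (editDistance t)) refs
```

With the examples above, ''editDistance "Zan Kayle" "Zane Kayle"'' is 1 and ''editDistance "Amelia Camdden" "Amelia Caden"'' is 2. Any other distance that works on arbitrary strings is equally acceptable.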
=== Recommended steps ===

  * extract the required column from both the table with typos and the reference table (let's call these //T// and //Ref//);
  * keep only the values from T and Ref that have no exact match in the other table; these are the problematic entries (this improves running time);
  * compute the distance between each value from T and each value from Ref (distance = a measure of how different the 2 strings are; //you decide how to formally define this distance//);
  * for every value from T, its correct form is the value from Ref at the shortest distance from it;
  * lastly, rebuild the original table, replacing the incorrect values from T with the correct values from Ref.

=== Timeout ===

For this task, we will set a timeout of 30 seconds. Your implementation must finish within that time in order to receive the points. The most time-consuming step is likely to be computing the distance between each value from T and each value from Ref. We suggest **Lazy Dynamic Programming**.

=== Note ===

  * **Your implementation must be generic! It can't depend on the table structure or row order. You can't assume that the rows in the faulty table and in the reference table are in the same order. Similarly, you can't choose a distance function that only works for these tables.**
  * The steps above are only //recommended//. If you find another implementation that works and respects the first condition (generic implementation), that's great!
  * We also recommend using your previously implemented Query Language, but this is not a requirement. You might find some queries really helpful, such as Cartesian, Projection or Filter.

===== Project Summary (0.5p) =====

Note: Task **Typos** is a prerequisite for this task. You need a correct ''email_map'' table in order to implement it.

==== Grades Table ====

Now that you have the correct mapping from each student's name to their email, we can join all the tables (''hw_grades'', ''lecture_grades'', ''exam_grades'').
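As a rough illustration of the join, the sketch below follows the name to email mapping to reach a lecture score and renders missing values as empty cells. It models tables as association lists rather than CSV, and all names (''lectureScore'', ''showScore'', ''joinRow'') as well as the sample data are hypothetical. Your own solution will more likely work on your table type or Query Language.

```haskell
import Text.Printf (printf)

type Name  = String
type Email = String

-- Follow name -> email -> lecture score; Nothing if either step fails.
lectureScore :: [(Name, Email)] -> [(Email, Float)] -> Name -> Maybe Float
lectureScore emailMap lectures name =
  lookup name emailMap >>= \email -> lookup email lectures

-- Render a possibly missing score: "" when absent, two decimals otherwise.
showScore :: Maybe Float -> String
showScore = maybe "" (printf "%.2f")

-- One joined row: name, homework score, lecture score.
joinRow :: [(Name, Email)] -> [(Name, Float)] -> [(Email, Float)] -> Name -> [String]
joinRow emailMap hw lectures name =
  [ name
  , showScore (lookup name hw)
  , showScore (lectureScore emailMap lectures name)
  ]

-- Made-up sample data, for illustration only.
sampleRow :: [String]
sampleRow =
  joinRow [("Ann Example", "ann@example.com")]
          [("Ann Example", 3.5)]
          [("ann@example.com", 1.25)]
          "Ann Example"
```

Here ''sampleRow'' evaluates to ''["Ann Example", "3.50", "1.25"]'', while a name absent from every table yields empty strings in the score columns.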
Write function ''grades'':

  grades :: CSV -> CSV -> CSV -> CSV -> CSV
  -- tested with: grades email_map_csv hw_grades_csv exam_grades_csv lecture_grades_csv

The result is a table with the following column names: ''grades_schema = ["Nume", "Punctaj Teme", "Punctaj Curs", "Punctaj Exam", "Punctaj Total"]''.

The formulas for computing each grade are:

  * ''hw_grade = sum of all columns, excluding "Nume", from table hw_grades'' //(Lab, T1, T2, T3, Ex1, Ex2, Ex3, Ex4) => Punctaj Teme//;
  * ''lecture_grade = 2 * (sum of all columns, excluding "Email", from table lecture_grades) / (number of columns, excluding "Email")'' //=> Punctaj Curs//;
  * ''exam_grade = (Q1 + Q2 + Q3 + Q4 + Q6) / 4 + Ex. scris'' //=> Punctaj Exam//;
  * ''total = min(hw_grade + lecture_grade, 5) + exam_grade'', or **4.00** if ''hw_grade + lecture_grade < 2.5'' or ''exam_grade < 2.5'' //=> Punctaj Total//.

The resulting table must be **sorted by its first column, "Nume"**.

=== Note ===

  * Don't forget to use ''correct_table'' on ''email_map_csv''! If this takes too long to test, you can save the output of ''correct_table "Nume" email_map_csv hw_grades_csv'' and use it directly, but only while testing.
  * Table ''lecture_grades'' has several entries without a value for column "Email". Ignore those. Also, if a name is missing from any of the tables, fill those columns with "" and treat the value as 0.
  * Use ''printf "%.2f"'' for printing Float values.
  * Unlike the previous task, your implementation of ''grades'' may depend on the tables' known structures.

===== Submit =====

**Deadline**: 30.05, 23:50.

**Vmchecker**: TBA.