project-2-draft [2021/05/07 16:30] roxana_elena.stiuca [Typos (0.5p)]
project-2-draft [2021/05/16 21:53] (current) roxana_elena.stiuca [Typos (0.5p)]
====== Task Set 4 ======
For the final tasks of this project, we want to merge the information about students' grades from the 3 tables (lecture, exam, homework) !!! Please add links to csvs!!!. The issue with joining these 3 tables is that table ''lecture_grades'' is keyed by students' emails, whereas the other 2 tables are keyed by students' names.

Fortunately, we have another table, ''email_map'', that maps students' names to their emails. The **bad news** is that this table contains **typos**.

Your job is to fix the typos in the ''email_map'' table, generate the corrected table, and then use it to join the information about students' lecture, exam and homework grades.
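To make the join concrete, here is a minimal Haskell sketch of re-keying lecture rows by student name through the corrected name-to-email table. It assumes the tables have already been parsed into simple pairs; the type aliases and the functions ''emailToName'' and ''lectureByName'' are hypothetical illustrations, not part of the project skeleton.

```haskell
import qualified Data.Map as Map

-- Illustrative aliases; the real tables are CSV strings.
type Name  = String
type Email = String

-- Turn the corrected email_map rows (name, email) into an email -> name index.
emailToName :: [(Name, Email)] -> Map.Map Email Name
emailToName pairs = Map.fromList [ (e, n) | (n, e) <- pairs ]

-- Re-key lecture rows (email, grade) by student name.
-- Rows whose email is not in the map are dropped in this sketch.
lectureByName :: [(Name, Email)] -> [(Email, Float)] -> [(Name, Float)]
lectureByName emailMap lectureRows =
  [ (n, g) | (e, g) <- lectureRows, Just n <- [Map.lookup e nameOf] ]
  where
    nameOf = emailToName emailMap
```

A ''Map'' keeps each lookup logarithmic, so the join stays cheap even if the lecture table is large.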
===== Typos (0.5p) =====
Column ''Nume'' from table ''email_map'' contains typos. For example, //Zane Kayle// is misspelled as //Zan Kayle// and //Amelia Caden// as //Amelia Camdden//. Fortunately, each misspelled name differs from its correct form by only a few letters, and the correct form is guaranteed to appear in the name column of the ''exam'' or ''homework'' tables. Hence, we can use those names as a **reference** to correct the typos.
The goal is to write the function ''correct_table'', which receives the name of the column containing typos, the table containing typos and a reference table (a table in which the values of that column are correct, but not necessarily in the same order), and returns the first table with the typos fixed. All parameters are strings in CSV format.
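Since all parameters of ''correct_table'' are CSV strings, a solution first has to turn them into rows and pick out a column by name. A minimal sketch, assuming plain comma-separated fields with no quoting and a header on the first line; ''Table'', ''readCSV'', ''writeCSV'' and ''column'' are hypothetical helper names (your Query Language from the previous tasks may already provide equivalents).

```haskell
import Data.List (elemIndex, intercalate)

-- Assumed representation: a list of rows, the first row being the header.
type Row   = [String]
type Table = [Row]

-- Split a line on a separator character (no support for quoted fields).
splitBy :: Char -> String -> [String]
splitBy sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitBy sep rest

-- Parse a CSV string: one row per line, fields separated by commas.
readCSV :: String -> Table
readCSV = map (splitBy ',') . lines

-- Serialize a table back into a CSV string (no trailing newline).
writeCSV :: Table -> String
writeCSV = intercalate "\n" . map (intercalate ",")

-- Values of the column with the given header name, or Nothing if absent.
-- A row that is too short yields "" for that column.
column :: String -> Table -> Maybe [String]
column _ [] = Nothing
column name (header : rows) = do
  i <- elemIndex name header
  return [ fieldAt i row | row <- rows ]
  where
    fieldAt i row = case drop i row of
      (v : _) -> v
      []      -> ""
```

Looking the column up by header name, rather than by a fixed position, keeps the code independent of the table structure, as the notes below require.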
=== Recommended steps ===
  * extract the relevant column from the table with typos and from the reference table (let's call these //T// and //Ref//);
  * keep only the values from //T// and //Ref// that don't have an exact match in the other table; these are the problematic entries (discarding exact matches early improves running time);
  * calculate the distance between each value from //T// and each value from //Ref// (the distance measures how different the 2 strings are; //you decide how to formally define this distance//);
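The filtering step, followed by pairing each leftover value with its closest reference value, can be sketched as below. This is one possible approach under stated assumptions: the distance function is passed in as a parameter (for example, Levenshtein edit distance), and ''unmatched'' and ''closestMatches'' are hypothetical names.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import qualified Data.Set as Set

-- Values of xs that have no exact match anywhere in ys.
-- A Set makes each membership test O(log n) instead of O(n).
unmatched :: [String] -> [String] -> [String]
unmatched xs ys = filter (`Set.notMember` others) xs
  where
    others = Set.fromList ys

-- Pair each problematic value of t with the closest problematic value of ref,
-- according to the given distance (smaller = more similar).
closestMatches :: (String -> String -> Int) -> [String] -> [String] -> [(String, String)]
closestMatches dist t ref
  | null candidates = []
  | otherwise =
      [ (x, minimumBy (comparing (dist x)) candidates) | x <- unmatched t ref ]
  where
    candidates = unmatched ref t
```

Because only the unmatched values on both sides are compared, the expensive distance computation runs on a small fraction of the rows.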
=== Timeout ===
For this task, we will set a timeout of 30 seconds. Your implementation must finish within that time in order to receive the points.
The most time-consuming step should be computing the distance between each value from //T// and each value from //Ref//. We suggest **Lazy Dynamic Programming**.
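One way to apply lazy dynamic programming here is the Levenshtein (edit) distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below stores the DP table in a lazy ''Data.Array'' whose cells are defined in terms of each other, so each cell is evaluated at most once, on demand. Levenshtein is only our example choice; the assignment leaves the definition of the distance up to you.

```haskell
import Data.Array

-- Levenshtein distance via lazy dynamic programming: 'table' is a boxed
-- (lazy) array whose elements refer to other elements of the same array.
levenshtein :: String -> String -> Int
levenshtein a b = table ! (m, n)
  where
    m = length a
    n = length b
    aArr = listArray (1, m) a
    bArr = listArray (1, n) b
    -- table ! (i, j) = distance between the first i chars of a
    -- and the first j chars of b
    table = array ((0, 0), (m, n))
      [ ((i, j), cell i j) | i <- [0 .. m], j <- [0 .. n] ]
    cell 0 j = j                                   -- insert j chars
    cell i 0 = i                                   -- delete i chars
    cell i j = minimum
      [ table ! (i - 1, j) + 1                     -- deletion
      , table ! (i, j - 1) + 1                     -- insertion
      , table ! (i - 1, j - 1) + cost i j          -- substitution or match
      ]
    cost i j = if aArr ! i == bArr ! j then 0 else 1
```

For the examples above, ''levenshtein "Zan Kayle" "Zane Kayle"'' is 1 and ''levenshtein "Amelia Camdden" "Amelia Caden"'' is 2. Each call takes O(m * n) time for strings of lengths m and n.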
=== Note ===
  * **Your implementation must be generic! It can't depend on the table structure or on the row order. You can't assume that the rows in the table with typos and in the reference table are in the same order. Similarly, you can't choose a distance function that only works for these tables.**
  * The steps above are only //recommended//. If you find another implementation that works and respects the first condition (generic implementation), that's great!
  * Using your previously implemented Query Language is also recommended, but not required. You might find some queries really helpful, such as Cartesian, Projection or Filter.
===== Project Summary (0.5p) =====
Note: Task **Typos** is a prerequisite for this task. You need a correct ''email_map'' table in order to implement it.
==== Grades Table ====
  * ''lecture_grade = 2 * (sum of all columns of table lecture_grades, excluding "Email") / (number of columns, excluding "Email")'' //=> Punctaj Curs//;
  * ''exam_grade = (Q1 + Q2 + Q3 + Q4 + Q6) / 4 + Ex. scris'' //=> Punctaj Exam//;
  * ''total = min(hw_grade + lecture_grade, 5) + exam_grade'' or **4.00** if ''hw_grade + lecture_grade < 2.5'' or ''exam_grade < 2.5'' //=> Punctaj Total//.
The resulting table must be **sorted by column "Nume"**.
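The grade formulas translate almost directly into code. A minimal sketch, assuming the relevant columns have already been read as ''Float'' values; ''hw_grade'' comes from the homework table and is taken as an input here, and the question scores for ''exam_grade'' are passed as a list holding exactly the Q columns named in the formula. The function names are hypothetical.

```haskell
-- lecture_grade: 2 * (sum of the lecture columns) / (number of lecture columns).
-- Returns 0 for an empty list instead of dividing by zero.
lectureGrade :: [Float] -> Float
lectureGrade [] = 0
lectureGrade xs = 2 * sum xs / fromIntegral (length xs)

-- exam_grade: (sum of the question scores from the formula) / 4 + written exam.
examGrade :: [Float] -> Float -> Float
examGrade questions written = sum questions / 4 + written

-- total: min(hw + lecture, 5) + exam, or 4.00 if either part is below 2.5.
totalGrade :: Float -> Float -> Float -> Float
totalGrade hw lecture exam
  | hw + lecture < 2.5 || exam < 2.5 = 4.00
  | otherwise                         = min (hw + lecture) 5 + exam
```

Checking the 4.00 condition first in a guard mirrors the "or **4.00** if" clause of the formula.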
=== Note ===
  * Don't forget to use ''correct_table'' on ''email_map_csv''! If this takes too long to test, you can save the output of ''correct_table "Nume" email_map_csv hw_grades_csv'' and use it directly, but only while testing.
  * Table ''lecture_grades'' has several entries with no value in column "Email". Ignore those. Also, if a name is missing from any of the tables, fill the corresponding columns with "" and treat their value as 0.
  * Use ''printf "%.2f"'' to print Float values.
  * Unlike the previous task, your implementation of ''grades'' can depend on the tables' known structure.
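The notes on missing values and number formatting can be handled by two small helpers. A sketch, assuming grade fields are plain numeric strings such as "1.5"; ''readGrade'' and ''showGrade'' are hypothetical names, while ''printf'' is the standard ''Text.Printf'' function.

```haskell
import Text.Printf (printf)

-- Read a grade field; an empty field (missing entry) counts as 0.
-- Non-empty fields are assumed to be valid numbers such as "1.5" or "2".
readGrade :: String -> Float
readGrade "" = 0
readGrade s  = read s

-- Format a grade with exactly two decimals, e.g. 4 -> "4.00".
showGrade :: Float -> String
showGrade = printf "%.2f"
```

For instance, ''showGrade (readGrade "")'' yields "0.00" and ''showGrade 4'' yields "4.00".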
===== Submit =====
**Deadline**: 30.05, 23:50.
**Vmchecker**: TBA.