Differences
This shows you the differences between two versions of the page.
pp:project2021 [2022/05/11 00:24] vbadoiu [3.6 Typos (1p)] |
pp:project2021 [2022/05/11 18:45] (current) dmihai [3.6 Typos (1p)] |
||
---|---|---|---|
Line 198: | Line 198: | ||
Deadline is on the **12th of May, 23:55** | Deadline is on the **12th of May, 23:55** | ||
+ | **Note**: the checker will not run until you have implemented the first tasks, because ''eval'' is undefined until then. | ||
For the last part of our work on the data science framework at Haskell Health, we will need two important functionalities: a query language and graph queries. These will enable users of our framework to quickly develop a product based on it. Moreover, to showcase the power of our framework to the marketing team, we will do typo corrections on the data using a distance function. | For the last part of our work on the data science framework at Haskell Health, we will need two important functionalities: a query language and graph queries. These will enable users of our framework to quickly develop a product based on it. Moreover, to showcase the power of our framework to the marketing team, we will do typo corrections on the data using a distance function. | ||
Line 374: | Line 375: | ||
**Task 3.4.**: Implement ''eval'' for Graph queries. | **Task 3.4.**: Implement ''eval'' for Graph queries. | ||
+ | **Note**: The result should be in alphabetical order between the two ends of the interval. | ||
==== Similarities graph, using queries ==== | ==== Similarities graph, using queries ==== | ||
We want to check the similarities between users’ hours slept. | We want to check the similarities between users’ hours slept. | ||
- | * For that, we want to obtain a graph where "From" and "To" are users’ emails and "Value" is the distance between the 2 users’ hours slept. | + | * For that, we want to obtain a graph where "From" and "To" are users’ names and "Value" is the distance between the 2 users’ hours slept. |
- | * We define the distance between user1 and user2 as "the sum of intervals where they both slept an equal amount" (e.g. same value for TotalMinutesAsleep1). Keep only the rows with ''distance >= 5''. | + | * We define the distance between user1 and user2 as "the sum of intervals where they both slept an equal amount" (e.g. same value for "10" in ''eight_hours''). Keep only the rows with ''distance >= 5''. | ||
* The edges in the resulting graph (the rows in the resulting table) should be sorted by the "Value" column. If email is missing, don't include that entry. | * The edges in the resulting graph (the rows in the resulting table) should be sorted by the "Value" column. If email is missing, don't include that entry. | ||
**Task 3.5.**: Your task is to write ''similarities_query'' as a **sequence of queries**, that once evaluated results in the graph described above. | **Task 3.5.**: Your task is to write ''similarities_query'' as a **sequence of queries**, that once evaluated results in the graph described above. | ||
+ | |||
+ | The input table is ''eight_hours''. | ||
**Note**: ''similarities_query'' is a Query. The checker applies ''eval'' on it. | **Note**: ''similarities_query'' is a Query. The checker applies ''eval'' on it. | ||
Line 398: | Line 402: | ||
Your job is to fix the typos in ''email_map'' table, generate the correct table and then use that to join the information about user’s stats. | Your job is to fix the typos in ''email_map'' table, generate the correct table and then use that to join the information about user’s stats. | ||
- | ==== 3.6 Typos (1p) ==== | + | ==== 3.6 Typos (0.4p) ==== |
Column ''Name'' from table ''email_map'' contains typos. Example: //Zane Kayle// is misspelled as //Zan Kayle// or //Amelia Caden// is misspelled as //Amelia Camdden//. Fortunately, the name //Zane Kayle// contains only a few misspelled letters and it is guaranteed to appear in the correct form, in the name column of the ''8h sleep'' or ''physical_activity_in_a_day'' tables. Hence, we can use the latter names as **reference** to correct the typos. | Column ''Name'' from table ''email_map'' contains typos. Example: //Zane Kayle// is misspelled as //Zan Kayle// or //Amelia Caden// is misspelled as //Amelia Camdden//. Fortunately, the name //Zane Kayle// contains only a few misspelled letters and it is guaranteed to appear in the correct form, in the name column of the ''8h sleep'' or ''physical_activity_in_a_day'' tables. Hence, we can use the latter names as **reference** to correct the typos. | ||
Line 404: | Line 408: | ||
The goal is to write the function ''correct_table'', which receives the name of the column containing typos, the table containing typos, and a reference table (a table where the values from that column are correct, but not necessarily in the same order), and returns the first table with the typos fixed. All parameters are strings in CSV format. | The goal is to write the function ''correct_table'', which receives the name of the column containing typos, the table containing typos, and a reference table (a table where the values from that column are correct, but not necessarily in the same order), and returns the first table with the typos fixed. All parameters are strings in CSV format. | ||
- | The input table is `eight_hours` | ||
<code haskell> | <code haskell> | ||
- | correct_table :: String -> CSV -> CSV -> CSV | + | correct_table :: String -> Table -> Table -> Table |
- | -- this will be tested using: correct_table "Nume" email_map_csv hw_grades_csv | + | -- this will be tested using: correct_table "Nume" emails physical_activity |
</code> | </code> | ||
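The page does not say which distance function to use for typo correction. As a rough illustration only, the sketch below assumes the Levenshtein edit distance and a table represented as a list of rows whose first row is the header. The names ''editDistance'', ''closestMatch'' and ''correctColumn'' are hypothetical and are not part of the assignment skeleton.

```haskell
import Data.List (elemIndex, minimumBy)
import Data.Ord (comparing)

-- Assumed representation: a table is a list of rows, the first row is the header.
type Row = [String]
type Table = [Row]

-- Levenshtein edit distance (an assumed choice of distance function),
-- computed one row of the dynamic-programming matrix at a time.
editDistance :: String -> String -> Int
editDistance s t = last (foldl nextRow [0 .. length t] s)
  where
    nextRow [] _ = []
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        -- left: deletion/insertion along the row, up: the cell above,
        -- diag: substitution (free when the characters match).
        step left (tc, diag, up) =
          minimum [left + 1, up + 1, diag + fromEnum (tc /= c)]

-- Pick the reference value closest to the given (possibly misspelled) value.
closestMatch :: [String] -> String -> String
closestMatch [] name = name
closestMatch refs name = minimumBy (comparing (editDistance name)) refs

-- Replace every value in column `col` of the first table with its closest
-- match from the same column of the reference table. If the column is
-- missing from either table, the first table is returned unchanged.
correctColumn :: String -> Table -> Table -> Table
correctColumn _ [] _ = []
correctColumn col typos@(header : rows) ref =
  case (elemIndex col header, refColumn) of
    (Just i, Just refs) -> header : map (fixAt i refs) rows
    _ -> typos
  where
    refColumn = case ref of
      [] -> Nothing
      (refHeader : refRows) -> do
        j <- elemIndex col refHeader
        pure [r !! j | r <- refRows, length r > j]
    fixAt i refs row =
      [ if k == i then closestMatch refs cell else cell
      | (k, cell) <- zip [0 ..] row
      ]
```

Under these assumptions, ''Zan Kayle'' is at distance 1 from ''Zane Kayle'' and ''Amelia Camdden'' is at distance 2 from ''Amelia Caden'', so both would be mapped back to the correct reference names. Values that are already correct have distance 0 to themselves and stay unchanged.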