====== Task Set 3 (Query Language & Graphs) ======

===== Query Language overview =====

The objective of this task set is to:
  * allow programmers to **combine** table processing functions in a flexible way
  * create a **separation** between what functions **do** and how they are implemented. This is helpful for a number of reasons:
    * **integration** of new table functionality later on
    * continuous code **upgrade** (new algorithms for different table processing functions can be implemented without altering the rest of the project)
    * debugging and testing

You will implement a **query language** which **represents** a variety of table transformations, some of which you have already implemented as table functions. A ''Query'' is a **sequence** (in some cases, a **combination**) of such transformations. You will also implement an evaluation function which performs a ''Query'' on a given table.

The query language to be implemented is shown below:
<code haskell>
data Query =
    FromCSV CSV
    | ToCSV Query
    | AsList String Query
    | Sort String Query
    | ValueMap (Value -> Value) Query
    | RowMap (Row -> Row) [String] Query
    | VUnion Query Query
    | HUnion Query Query
    | TableJoin String Query Query
    | Cartesian (Row -> Row -> Row) [String] Query Query
    | Projection [String] Query
    | forall a. FEval a => Filter (FilterCondition a) Query
    | Graph EdgeOp Query

-- where EdgeOp is defined:
type EdgeOp = Row -> Row -> Maybe Value
</code>

**Don't worry about Graph or Filter queries yet.**

==== Prerequisite ====

Add the following lines at the beginning of your .hs files:
<code haskell>
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
</code>

The first line allows ''forall a''. The second allows ''instance FEval String''.

===== Query Evaluation =====

While most queries take a ''Table'' and return a transformed ''Table'', there are some queries which evaluate to a ''String'' or a list of ''String''.
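To make the three kinds of result concrete, here is a minimal sketch that uses a cut-down copy of ''Query'' (four constructors only). The type aliases are assumptions about what task set 1 defines, and ''resultKind'' and ''namesQuery'' are hypothetical names introduced only for this illustration; they are not part of the required interface.

```haskell
-- Assumed aliases; the real ones come from task set 1.
type CSV   = String
type Value = String
type Row   = [Value]
type Table = [Row]

-- Cut-down copy of Query, for illustration only.
data Query
  = FromCSV CSV
  | ToCSV Query
  | AsList String Query
  | Sort String Query

-- Hypothetical helper: names the kind of result that the
-- outermost constructor of a query evaluates to.
resultKind :: Query -> String
resultKind (ToCSV _)    = "CSV"    -- a String in CSV format
resultKind (AsList _ _) = "List"   -- a list of String
resultKind _            = "Table"  -- every other query yields a table

-- Queries nest inside-out: read the CSV, sort by "Name",
-- then extract the "Name" column as a list.
namesQuery :: Query
namesQuery =
  AsList "Name" (Sort "Name" (FromCSV "Name,Grade\nMihai,9\nAna,9\n"))
```

Only the outermost constructor decides which kind of result comes out; every inner query evaluates to a table that the next transformation consumes.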
Thus we define the type ''QResult'', which describes any of these three possible query result types:
<code haskell>
data QResult = CSV CSV | Table Table | List [String]
-- don't confuse first 'CSV' and second 'CSV': first refers to constructor name,
-- second refers to type CSV (defined in taskset 1); same with 'Table'.
</code>

**Task 3.1.**: Enroll ''QResult'' in class ''Show''. For ''Table'', use ''write_csv'' (see task set X). For ''List'' and ''CSV'' (a ''String''), use the default ''show''.
<code haskell>
instance Show QResult where
    ...
</code>

In order to ensure separation between queries and their evaluation, we define class ''Eval'', which offers function ''eval''. Your job is to enroll ''Query'' in this class.
<code haskell>
class Eval a where
    eval :: a -> QResult

instance Eval Query where
    ...
</code>

We explain below how each data constructor from ''Query'' should be evaluated:
  - ''FromCSV str'': converts string ''str'' to a ''Table''.
  - ''ToCSV query'': converts the table obtained from the evaluation of ''query'' to a ''String'' in CSV format.
  - ''AsList colname query'': returns the values from column ''colname'' as a list.
  - ''Sort colname query'': sorts the table by column ''colname''.
  - ''ValueMap op query'': maps all values from the table, using ''op''.
  - ''RowMap op colnames query'': maps all rows from the table, using ''op''.
  - ''VUnion query1 query2'': vertical union of the 2 tables obtained through the evaluations of ''query1'' and ''query2''.
  - ''HUnion query1 query2'': horizontal union of the 2 tables.
  - ''TableJoin colname query1 query2'': table join with respect to column ''colname''.
  - ''Cartesian op colnames query1 query2'': cartesian product.
  - ''Projection colnames query'': extracts the specified columns from the table.

===== Filters & filter conditions =====

You may have noticed that the filter query is commented out. You can **uncomment** it at this stage. Filtering will receive special treatment. Because filter conditions are usually complex, instead of performing successive filter queries it is better to build a single complex filter condition.
For this reason, we define type ''FilterCondition a'', illustrated below:
<code haskell>
data FilterCondition a =
    Eq String a
    | Lt String a
    | Gt String a
    | In String [a]
    | FNot (FilterCondition a)
    | FieldEq String String
</code>

**Remark:** the type ''FilterCondition a'' is **polymorphic** because such conditions may be expressed over (in this homework) two types:
  * ''Float'' and
  * ''String''

We briefly explain what each condition expresses:
  - ''Eq colname ref'': checks if the value from column ''colname'' is equal to ''ref''.
  - ''Lt colname ref'': checks if the value from column ''colname'' is less than ''ref''.
  - ''Gt colname ref'': checks if the value from column ''colname'' is greater than ''ref''.
  - ''In colname list'': checks if the value from column ''colname'' is in ''list''.
  - ''FNot cond'': negates the condition ''cond''.
  - ''FieldEq colname1 colname2'': checks if the values from columns ''colname1'' and ''colname2'' are equal.

=== FilterCondition Evaluation ===

A ''FilterCondition'' must evaluate to an actual filtering function, which has type:
<code haskell>
type FilterOp = Row -> Bool
</code>

Since such filtering functions work differently for ''FilterCondition Float'' and ''FilterCondition String'', we need a class ''FEval'' which contains function ''feval''. The latter is used to evaluate a ''FilterCondition a'' to a function of type ''FilterOp''. In order to do so, ''feval'' needs information about the column names (the table head), hence its type is the one shown below.
<code haskell>
class FEval a where
    feval :: [String] -> (FilterCondition a) -> FilterOp
</code>

**Task 3.x.**: Your task is to write the instances for ''(FEval Float)'' and ''(FEval String)''.
<code haskell>
instance FEval Float where
    ...

instance FEval String where
    ...
</code>

Now you can write the evaluation for the data constructor ''Filter cond query'' (see function **eval** from the previous section).

===== Graph queries =====

A **graph** is a special kind of table which has precisely the following column names: ''["From", "To", "Value"]''.
Each row defines a **weighted edge** between node ''From'' and node ''To''.

The query ''Graph edgeop query'' creates such a table starting from the result of the evaluation of ''query''. Suppose the query evaluates to a table **T**.
  * The nodes are the **rows** in table **T**.
  * The weight of an edge between 2 nodes is given by ''edgeop'', which returns a ''Maybe Value''. If ''edgeop row1 row2'' returns ''Nothing'', then we don't have an edge between those 2 nodes. If it returns ''Just val'', then we have an edge between ''row1'' and ''row2'' of weight ''val''.
  * In the resulting table, each row describes an edge between node_i and node_j and will have the values:
    * "From" = first column from node_i
    * "To" = first column from node_j
    * "Value" = edgeop node_i node_j

The edge //node_i-node_j// is the same as //node_j-node_i//, so it should only appear once (graphs are undirected). The "From" value should be lexicographically before the "To" value.

**Example:** Suppose **T** is the table shown below:
<code>
Name    Grade  Class
Mihai   9      321
Andrei  8      322
Stefan  10     321
Ana     9      322
</code>

If we would like to build a graph that connects all students in the same class, then:
<code haskell>
edgeop [_,_,z] [_,_,c]
    | z == c = Just c
    | otherwise = Nothing
</code>

and the resulting graph will be:
<code>
From   To      Value
Mihai  Stefan  321
Ana    Andrei  322
</code>

If we would like to build a graph that connects students whose grades are equal or differ by **at most** one point, then:
<code haskell>
edgeop [_,x,_] [_,y,_]
    -- parenthesize the difference: `abs $ a - b <= 1` would apply abs to a Bool
    | abs ((read x :: Int) - (read y :: Int)) <= 1 = Just "similar"
    | otherwise = Nothing
</code>

and the resulting graph is:
<code>
From    To      Value
Andrei  Mihai   similar
Mihai   Stefan  similar
Ana     Mihai   similar
Ana     Andrei  similar
Ana     Stefan  similar
</code>

==== Similarities graph, using queries ====

We want to check the similarities between students' lecture points.
  * For that, we want to obtain a graph where "From" and "To" are students' emails and "Value" is the distance between the 2 students' points.
  * We define the distance between stud1 and stud2 as ''the number of questions on which they both received the same points''. Keep only the rows with ''distance >= 5''.
  * The edges in the resulting graph (the rows in the resulting table) should be sorted by the "Value" column. If an email is missing, don't include that entry.

Your task is to write ''similarities_query'' as a **sequence of queries** that, once evaluated, results in the graph described above.

**Note**: ''similarities_query'' is a ''Query''. The checker applies ''eval'' on it.

===== TL;DR Tasks =====

  - Enroll ''Query'' in class ''Eval'' (without ''Filter'' or ''Graph''). **0.3p**
  - Enroll ''FilterCondition'' in class ''FEval'' and implement ''eval'' for the ''Filter'' query. **0.2p**
  - Implement ''eval'' for the ''Graph'' query. **0.2p**
  - Extract the similarity graph. **0.3p**

===== Checker =====

===== Submit =====

**Deadline**: 16.05, 23:50.

**Vmchecker**: TBA.