{{:fp2025:hw2-word-tree.zip|}}
====== Homework 2. Sets as trees ======

In this homework, you will implement a **binary search tree**, which you will use to gather statistics about the **words** of a given text.

Generally, in a [[https://en.wikipedia.org/wiki/Binary_search_tree|binary search tree]]:
  * each non-empty node contains **exactly one value** and **two children**
  * all values from the **left** sub-tree are smaller than or equal to that of the current node
  * all values from the **right** sub-tree are larger than or equal to that of the current node

In your project, the **value** of each node will be represented by a ''Token'' object. The class ''Token'' is already implemented for you:

  case class Token(word: String, freq: Int)

A token stores:
  * the **number of occurrences**, or **frequency** ''freq'', of a string ''word'' in a text.

Your binary search tree will use **frequencies** as the ordering criterion. For instance, the text ''All for one and one for one'' may be represented by the tree:

          for (2)
         /       \
     and (1)    one (3)
     /
  all (1)

Notice that several different binary search trees can represent the same text; you do not need to take this into account in this homework.

==== Keeping the tree balanced ====

Adding new nodes to a BST may require changing the current tree structure, in order to (i) preserve the binary-search property stated above and (ii) keep the height minimal. Fortunately, a ''rebalance'' method that ensures both (i) and (ii) has already been implemented, and, for our limited set of tests, you will not need it in the code you write.

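To make the frequency ordering more concrete, here is a minimal sketch of a plain BST insertion ordered by ''freq''. It is an illustration only: it performs no rebalancing, it is not the template's ''ins'', and all names (''SketchTree'', ''SEmpty'', ''SNode'', ''insert'', ''inOrder'') are hypothetical. Sending equal frequencies to the left is an assumption made for this sketch.

```scala
// Illustrative sketch only; names are hypothetical and deliberately differ
// from the project's WTree / Empty / Node.
case class Token(word: String, freq: Int)

sealed trait SketchTree
case object SEmpty extends SketchTree
case class SNode(value: Token, left: SketchTree, right: SketchTree) extends SketchTree

// Plain BST insertion ordered by frequency; NO rebalancing is performed.
// Assumption: tokens with an equal frequency go to the left sub-tree.
def insert(t: SketchTree, tok: Token): SketchTree = t match {
  case SEmpty => SNode(tok, SEmpty, SEmpty)
  case SNode(v, l, r) =>
    if (tok.freq <= v.freq) SNode(v, insert(l, tok), r)
    else SNode(v, l, insert(r, tok))
}

// In-order traversal: yields tokens in non-decreasing frequency order.
def inOrder(t: SketchTree): List[Token] = t match {
  case SEmpty => Nil
  case SNode(v, l, r) => inOrder(l) ::: (v :: inOrder(r))
}

// Tokens of "All for one and one for one", inserted in this order.
val tokens = List(Token("for", 2), Token("and", 1), Token("one", 3), Token("all", 1))
val tree = tokens.foldLeft(SEmpty: SketchTree)(insert)
```

With this insertion order the sketch reproduces the shape of the example tree above; a different insertion order would generally give a different (but still valid) tree.
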
==== Constructing the tree ====

Our tree is called ''WTree'', and is implemented by the following case classes:

  case object Empty extends WTree
  case class Node(word: Token, left: WTree, right: WTree) extends WTree

''WTree'' implements the following trait:

  trait WTreeInterface {
    def isEmpty: Boolean
    def filter(pred: Token => Boolean): WTree
    def ins(w: Token): WTree
    def contains(s: String): Boolean
    def size: Int
  }

The method ''ins'' is already implemented, but the rest must be implemented by you. The project has two parts:
  * **building a WTree** from a text, and
  * **using a WTree** to gather information about that particular text.

In the next section you will find implementation details about each of the above.

===== Implementation =====

**1.** Write a function ''split'' which splits a text, using the whitespace character as a separator. Multiple consecutive whitespaces should be treated as a single separator. If the list contains only whitespaces, ''split'' should return the empty list. (//Hints: Your implementation must be recursive, but do not try to make it tail-recursive, as this would make your code unnecessarily complicated. Several patterns over lists, written in the proper order, will make the implementation cleaner.//)

  /* split(List('h','i',' ','t','h','e','r','e')) = List(List('h','i'), List('t','h','e','r','e')) */
  def split(text: List[Char]): List[List[Char]] = ???

**2.** Write a function ''computeTokens'' which computes a list of ''Token'' objects from a list of strings. Recall that tokens keep track of the frequency of each string. Use an auxiliary function ''insWord'' which inserts a new string into a list of tokens. If the string is already a token, its frequency is incremented; otherwise it is added as a new token. (//Hint: the cleanest way to implement ''aux'' is to use one of the two folds.//)

  def computeTokens(words: List[String]): List[Token] = {
    /* insert a new string in a list of tokens */
    def insWord(s: String, acc: List[Token]): List[Token] = ???
    def aux(rest: List[String], acc: List[Token]): List[Token] = ???
    ???
  }

**3.** Write a function ''tokensToTree'' which creates a ''WTree'' from a list of tokens. Use the insertion function ''ins'', which is already implemented. (//Hint: you can implement it as a single fold call, but you have to choose the right one.//)

  def tokensToTree(tokens: List[Token]): WTree = ???

**4.** Write a function ''makeTree'' which takes a string and builds a ''WTree''. ''makeTree'' relies on all the previous functions you implemented. You should use ''_.toList'', which converts a ''String'' to a ''List[Char]''. You can also use ''andThen'', which allows for a concise and clear implementation. ''andThen'' is explained in detail in the next section.

  def makeTree(s: String): WTree = ???

**5.** Implement the member method ''size'', which must return the number of non-empty nodes in the tree.

**6.** Implement the member method ''contains'', which must check whether a string is a member of the tree (regardless of its frequency).

**7.** Implement the ''filter'' method in the abstract class ''WTree''. ''filter'' will rely on the tail-recursive ''filterAux'' method, which must be implemented in the case classes ''Empty'' and ''Node''.

**8.** In the code template you will find a string, ''scalaDescription''. Compute the number of occurrences of the keyword "Scala" in ''scalaDescription''. Use word-trees and any of the previous functions you have defined.

  def scalaFreq: Int = ???

**9.** Find how many programming languages are referenced in the same text. You may consider that a programming language is any keyword which starts with an uppercase character. To reference the character at index ''i'' in a string ''s'', use ''s(i)''. You can also use the method ''_.isUpper''.

  def progLang: Int = ???

**10.** Find how many words which are not prepositions or conjunctions appear in the same text. You may consider that a preposition or conjunction is any word whose length is less than or equal to 3.

  def wordCount: Int = ???

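The hints for exercises 2 and 3 mention "the two folds", that is ''foldLeft'' and ''foldRight''. The following sketch, on a toy list of integers rather than on tokens, shows how the two differ in traversal direction and in the position of the accumulator. It is not a solution to any exercise; the names ''xs'', ''viaLeft'' and ''viaRight'' are hypothetical.

```scala
val xs = List(1, 2, 3)

// foldLeft walks the list from left to right; the accumulator is the FIRST
// argument of the combining function. Prepending each element to the
// accumulator therefore reverses the list.
val viaLeft: List[Int] = xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)

// foldRight walks the list from right to left; the accumulator is the SECOND
// argument. Prepending each element rebuilds the list in its original order:
// 1 :: (2 :: (3 :: Nil)).
val viaRight: List[Int] = xs.foldRight(List.empty[Int])((x, acc) => x :: acc)
```

Which fold is "the right one" depends on the order in which elements should reach the accumulator and on the argument order of the function being folded.
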
**Note:** In order to be graded, exercises 5 to 9 must rely on a correct implementation of the previous parts of the homework.

===== Using andThen =====

Suppose you want to apply a **sequence** of transformations to an object ''o''. Some of them may be functions (''f'', ''g''), while others may be member functions (''m1'', ''m2''). Instead of writing an expression such as ''g(f(o).m1).m2'', which reflects the sequence ''f'', ''m1'', ''g'', ''m2'' of transformations on object ''o'', you can use ''andThen'':

  val sequence = (x => f(x)) andThen (_.m1) andThen (x => g(x)) andThen (_.m2)

which is more legible and easier to debug.

===== Submission rules =====

  * Please follow the [[fp2025:submission-guidelines|Submission guidelines]], which are the same for all homeworks.
  * To solve your homework, download the {{:fp2025:hw2-word-tree.zip|Project template}}, import it into IntelliJ, and you are all set. Do not rename the project manually, as this may cause problems with IntelliJ.
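
As a concrete companion to the ''andThen'' section above, here is a minimal sketch with toy stand-ins for ''f'', ''g'', ''m1'' and ''m2''. The definitions are assumptions chosen only to make the chain compile: ''f'' trims a string, ''m1'' is ''length'', ''g'' builds a range, and ''m2'' is ''sum''. Note that the first function literal in the chain needs an explicit parameter type, since the compiler cannot infer it.

```scala
// Toy stand-ins (hypothetical) for the f and g of the text.
def f(s: String): String = s.trim
def g(n: Int): List[Int] = List.range(0, n)

// The chain f, m1 (= length), g, m2 (= sum), written with andThen.
// Only the first literal needs a type annotation; the rest are inferred.
val sequence: String => Int =
  ((x: String) => f(x)) andThen (_.length) andThen (x => g(x)) andThen (_.sum)

// The equivalent nested expression, for comparison.
val nested: String => Int = o => g(f(o).length).sum
```

Both values describe the same function; the ''andThen'' form lists the steps in the order in which they are applied.
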