====== Homework 2. Sets as trees ======
In this homework, you will implement a **binary search tree**, which you will use to gather statistics about the **words** of a particular text. Generally, in a [[https://en.wikipedia.org/wiki/Binary_search_tree| binary search tree]]:
* each non-empty node contains **exactly one value** and **two children**
* all values in the **left** subtree are smaller than or equal to the value of the current node
* all values in the **right** subtree are larger than or equal to the value of the current node
In your project, the **value** of each node will be represented by ''Token'' objects. The class ''Token'' is already implemented for you:
```scala
case class Token(word: String, freq: Int)
```
A token stores a string ''word'' together with its **number of occurrences**, or **frequency**, ''freq'', in a text.
Your binary search tree will use **frequencies** as its ordering criterion. For instance, the text ''All for one and one for one'' may be represented by the tree:
```
        for (2)
       /       \
   and (1)    one (3)
   /
All (1)
```
Notice that several BSTs can represent the same text; however, you do not need to take this into account in this homework.
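To make the ordering concrete, here is a standalone sketch that rebuilds the tree above. It redefines ''Token'', ''Empty'' and ''Node'' in simplified form (''WTree'' here is a bare sealed trait without the interface methods), and the helpers ''insertByFreq'' and ''inorder'' are hypothetical: they are not the template's ''ins'', and they perform no rebalancing. Ties go to the left subtree, which is one valid reading of the "smaller than or equal" rule above.

```scala
// Standalone sketch: simplified re-definitions, not the project template.
case class Token(word: String, freq: Int)

sealed trait WTree
case object Empty extends WTree
case class Node(word: Token, left: WTree, right: WTree) extends WTree

object BstSketch {
  // Hypothetical helper: plain BST insertion keyed on freq, no rebalancing.
  // Ties (equal frequencies) go into the left subtree.
  def insertByFreq(t: WTree, w: Token): WTree = t match {
    case Empty => Node(w, Empty, Empty)
    case Node(v, l, r) =>
      if (w.freq <= v.freq) Node(v, insertByFreq(l, w), r)
      else Node(v, l, insertByFreq(r, w))
  }

  // In-order traversal: left subtree, then the node, then the right subtree.
  def inorder(t: WTree): List[Token] = t match {
    case Empty => Nil
    case Node(v, l, r) => inorder(l) ++ (v :: inorder(r))
  }

  // Insertion order chosen so that the result has the same shape as the tree above.
  val example: WTree =
    List(Token("for", 2), Token("one", 3), Token("and", 1), Token("All", 1))
      .foldLeft(Empty: WTree)(insertByFreq)

  def main(args: Array[String]): Unit =
    println(inorder(example).map(t => s"${t.word}(${t.freq})").mkString(" "))
}
```

An in-order traversal of any tree built this way lists the tokens by non-decreasing frequency; for this example it prints ''All(1) and(1) for(2) one(3)''.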
==== Keeping the tree balanced ====
Adding new nodes to a BST may require changing the tree's structure in order to preserve (i) the binary-search property stated above and (ii) minimal height. Fortunately, a ''rebalance'' method that preserves both (i) and (ii) is already implemented, and, given our limited set of tests, you will not need it in the code you write.
==== Constructing the tree ====
Our tree is called ''WTree'', and is implemented by the following case classes:
```scala
case object Empty extends WTree
case class Node(word: Token, left: WTree, right: WTree) extends WTree
```
''WTree'' implements the following trait:
```scala
trait WTreeInterface {
  def isEmpty: Boolean
  def filter(pred: Token => Boolean): WTree
  def ins(w: Token): WTree
  def contains(s: String): Boolean
  def size: Int
}
```
The method ''ins'' is already implemented; you must implement the rest. The project has two parts:
* **building a WTree** from a text, and
* **using a WTree** to gather information about that particular text.
In the next section you will find implementation details about each of the above.
===== Implementation =====
**1.** Write a function that splits a text, using the whitespace character as a separator. Multiple consecutive whitespace characters should be treated as a single separator. If the list contains only whitespace, ''split'' should return the empty list. (//Hints: Your implementation must be recursive, but do not try to make it tail-recursive; that would make your code unnecessarily complicated. Several patterns over lists, placed in the proper order, will make the implementation cleaner.//)
```scala
/* Example:
 * split(List('h','i',' ','t','h','e','r','e')) = List(List('h','i'), List('t','h','e','r','e'))
 */
def split(text: List[Char]): List[List[Char]] = ???
```
**2.** Write a function that computes a list of ''Token'' from a list of strings. Recall that tokens keep track of each string's frequency. Use an auxiliary function ''insWord'' that inserts a new string into a list of tokens: if the string is already a token, its frequency is incremented; otherwise, it is added as a new token. (//Hint: the cleanest way to implement ''aux'' is to use one of the two folds.//)
```scala
def computeTokens(words: List[String]): List[Token] = {
  /* insert a new string in a list of tokens */
  def insWord(s: String, acc: List[Token]): List[Token] = ???
  def aux(rest: List[String], acc: List[Token]): List[Token] = ???
  ???
}
```
**3.** Write a function ''tokensToTree'' which creates a ''WTree'' from a list of tokens. Use the insertion function ''ins'' which is already implemented. (//Hint: you can implement it as a single fold call, but you have to choose the right one//)
```scala
def tokensToTree(tokens: List[Token]): WTree = ???
```
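The hints for items 2 and 3 both point to folds. The sketch below, on plain integers rather than tokens, shows the property that usually decides between them: ''foldLeft'' feeds elements to the accumulator starting from the head of the list, ''foldRight'' starting from the end, and the two take their combining-function arguments in opposite order. The object name ''FoldOrder'' is only for illustration.

```scala
object FoldOrder {
  val xs: List[Int] = List(1, 2, 3)

  // foldLeft computes op(op(op(z, 1), 2), 3): the accumulator is the FIRST argument.
  val viaLeft: List[Int] = xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)

  // foldRight computes op(1, op(2, op(3, z))): the accumulator is the SECOND argument.
  val viaRight: List[Int] = xs.foldRight(List.empty[Int])((x, acc) => x :: acc)

  def main(args: Array[String]): Unit = {
    println(viaLeft)  // List(3, 2, 1)
    println(viaRight) // List(1, 2, 3)
  }
}
```

Prepending with ''::'' reverses the order under ''foldLeft'' but preserves it under ''foldRight''; whenever the result depends on the order in which elements are inserted, this difference matters.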
**4.** Write a function ''makeTree'' that takes a string and builds a ''WTree''. ''makeTree'' relies on all the previous functions you implemented. You should use ''_.toList'', which converts a ''String'' to a ''List[Char]''. You can also use ''andThen'', which allows a concise and clear implementation; ''andThen'' is explained in detail in the section //Using andThen// below.
```scala
def makeTree(s: String): WTree = ???
```
**5.** Implement the member method ''size'', which must return the number of non-empty nodes in the tree.
**6.** Implement the member method ''contains'', which must check whether a string is a member of the tree (regardless of its frequency).
**7.** Implement the ''filter'' method in the abstract class ''WTree''. ''filter'' relies on the tail-recursive ''filterAux'' method, which must be implemented in both ''Empty'' and ''Node''.
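''filterAux'' must be tail-recursive. As a reminder of the general technique only (the exact ''filterAux'' signature is the one given in the template), here is a sketch of a tail-recursive filter over a ''List'' using an accumulator. The ''@tailrec'' annotation from ''scala.annotation'' makes the compiler reject the method if the recursive call is not in tail position. ''TailRecSketch'' and ''filterList'' are hypothetical names.

```scala
import scala.annotation.tailrec

object TailRecSketch {
  // Hypothetical list version of a filter, written with an accumulator.
  def filterList[A](xs: List[A])(pred: A => Boolean): List[A] = {
    @tailrec
    def loop(rest: List[A], acc: List[A]): List[A] = rest match {
      case Nil => acc.reverse // acc was built back to front
      case x :: tail => loop(tail, if (pred(x)) x :: acc else acc) // tail call
    }
    loop(xs, Nil)
  }
}
```

The accumulator is built in reverse, which is why the ''Nil'' case returns ''acc.reverse''.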
**8.** In the code template, you will find a string ''scalaDescription''. Compute the number of occurrences of the keyword "Scala" in ''scalaDescription''. Use word trees and any of the functions you have previously defined.
```scala
def scalaFreq: Int = ???
```
**9.** Find how many programming languages are referenced in the same text. You may consider a programming language to be any keyword that starts with an uppercase character. To access the character at index ''i'' in a string ''s'', use ''s(i)''. You can also use the method ''_.isUpper''.
```scala
def progLang: Int = ???
```
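Items 4 and 9 mention a few standard ''String'' and ''Char'' operations. The sketch below shows them on sample strings; ''CharSketch'' and ''startsWithUpper'' are hypothetical names, and the ''nonEmpty'' guard is there because indexing an empty string with ''s(0)'' throws an exception.

```scala
object CharSketch {
  val s: String = "Scala is concise"

  // s(i) is sugar for s.apply(i): the character at index i (0-based).
  val first: Char = s(0)

  // Char#isUpper tests whether a character is an uppercase letter.
  val firstIsUpper: Boolean = first.isUpper

  // _.toList converts a String to a List[Char].
  val chars: List[Char] = "hi there".toList

  // Hypothetical helper: "" has no character at index 0, so check nonEmpty first.
  def startsWithUpper(w: String): Boolean = w.nonEmpty && w(0).isUpper
}
```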
**10.** Find how many words that are not prepositions or conjunctions appear in the same text. You may consider a preposition or conjunction to be any word whose length is less than or equal to 3.
```scala
def wordCount: Int = ???
```
**Note:** In order to be graded, exercises 5 to 9 must rely on a correct implementation of the previous parts of the homework.
===== Using andThen =====
Suppose you want to apply a **sequence** of transformations to an object ''o''. Some of them may be functions (''f'', ''g''), while others may be member methods (''m1'', ''m2''). Instead of writing expressions such as ''g(f(o).m1).m2'', which reflects the sequence ''f'', ''m1'', ''g'', ''m2'' of transformations applied to ''o'', you can use ''andThen'':
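```scala
// O is the type of o; f, g are functions and m1, m2 are member methods.
// The outer parentheses keep the multi-line chain a single expression.
val sequence = (
  ((x: O) => f(x))
    andThen (_.m1)
    andThen (x => g(x))
    andThen (_.m2)
)
// sequence(o) computes the same value as g(f(o).m1).m2
```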
which is more legible and easier to debug.
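Here is a concrete, runnable instance of the pattern above. The functions ''f'' and ''g'' and the object name ''AndThenSketch'' are hypothetical stand-ins, and the member methods ''m1'' and ''m2'' are played by ''split'' and ''length''. Both values compute the same result; the ''andThen'' version lists the steps in the order in which they are applied.

```scala
object AndThenSketch {
  // Hypothetical stand-ins for the functions f and g.
  def f(s: String): String = s.trim
  def g(ws: Array[String]): List[String] = ws.toList

  // Nested form, read inside-out: f, then split, then g, then length.
  val nested: String => Int = o => g(f(o).split(" ")).length

  // Same transformations with andThen, read top to bottom.
  // The outer parentheses keep the multi-line chain a single expression.
  val sequence: String => Int = (
    ((x: String) => f(x))
      andThen (_.split(" "))
      andThen (x => g(x))
      andThen (_.length)
  )
}
```

Note that ''split(" ")'' produces empty strings between consecutive spaces, so this sketch is not a substitute for the ''split'' function of item 1.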
===== Submission rules =====
* Please follow the [[fp2025:submission-guidelines| Submission guidelines]], which are the same for all homework assignments.
* To solve your homework, download the {{:fp2025:hw2-word-tree.zip|Project template}}, import it into IntelliJ, and you are all set. Do not rename the project manually, as this may cause problems with IntelliJ.