====== Homework 2. Sets as trees ======

In this homework, you will implement a **binary search tree**, which you will use to gather statistics about the **words** of a particular text. Generally, in a [[https://en.wikipedia.org/wiki/Binary_search_tree|binary search tree]]:
  * each non-empty node contains **exactly one value** and **two children**
  * all values from the **left** sub-tree are smaller than or equal to the value of the current node
  * all values from the **right** sub-tree are larger than or equal to the value of the current node

In your project, the **value** of each node will be a ''Token'' object. The class ''Token'' is already implemented for you:
<code scala>
case class Token(word: String, freq: Int)
</code>
A token stores a string ''word'' together with its **number of occurrences**, or **frequency** ''freq'', in a text.

Your binary search tree will use **frequencies** as the ordering criterion. For instance, the text ''All for one and one for one'' may be represented by the tree:
<code>
      for (2)
      /   \
 and (1)  one (3)
  /            
all (1)            
</code>

Notice that several different binary search trees can represent the same text. You do not need to take this into account in this homework.

==== Keeping the tree balanced ====
Adding new nodes to a BST might require a change in the current tree structure, in order to (i) preserve the binary-search property stated above and (ii) keep the height of the tree minimal. Fortunately, a ''rebalance'' method that preserves both (i) and (ii) has already been implemented. Given our limited set of tests, you will not need to call it in the code you write.

==== Constructing the tree ====

Our tree is called ''WTree'', and is implemented by the following case object and case class:
<code scala>
case object Empty extends WTree
case class Node(word: Token, left: WTree, right: WTree) extends WTree 
</code>
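
As an illustration, the example tree from the introduction can be written out by hand with these constructors. The sketch below is self-contained, so it declares a simplified ''WTree'' as a plain sealed trait (an assumption; the template's ''WTree'' has more members). The helpers ''freqs'' and ''isOrdered'' are hypothetical names that are not part of the template; they only check the ordering property on frequencies.

<code scala>
object TreeExample {
  case class Token(word: String, freq: Int)

  // Simplified stand-in for the template's WTree (assumption).
  sealed trait WTree
  case object Empty extends WTree
  case class Node(word: Token, left: WTree, right: WTree) extends WTree

  // The tree for "All for one and one for one", built by hand.
  val example: WTree =
    Node(Token("for", 2),
      Node(Token("and", 1),
        Node(Token("all", 1), Empty, Empty),
        Empty),
      Node(Token("one", 3), Empty, Empty))

  // Hypothetical helper: all frequencies of a tree, left to right.
  def freqs(t: WTree): List[Int] = t match {
    case Empty => Nil
    case Node(tok, l, r) => freqs(l) ::: (tok.freq :: freqs(r))
  }

  // Hypothetical helper: checks the binary-search property on frequencies.
  def isOrdered(t: WTree): Boolean = t match {
    case Empty => true
    case Node(tok, l, r) =>
      freqs(l).forall(_ <= tok.freq) &&
        freqs(r).forall(_ >= tok.freq) &&
        isOrdered(l) && isOrdered(r)
  }
}
</code>

Here ''TreeExample.isOrdered(TreeExample.example)'' holds, because every frequency on the left of a node is at most the frequency of that node, and every frequency on the right is at least as large.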

''WTree'' implements the following trait:
<code scala>
trait WTreeInterface {
  def isEmpty: Boolean
  def filter(pred: Token => Boolean): WTree
  def ins(w: Token): WTree
  def contains(s:String): Boolean
  def size: Int
}
</code>

The method ''ins'' is already implemented; you must implement the rest. The project has two parts:
  * **building a WTree** from a text, and
  * **using a WTree**, to gather info about that particular text.

In the next section you will find implementation details about each of the above.

===== Implementation =====

**1.** Write a function which splits a text, using a single whitespace character as the separator. Consecutive whitespace characters should be treated as one separator. If the text contains only whitespace, ''split'' should return the empty list. (//Hints: Your implementation must be recursive, but do not try to make it tail-recursive, as this will make your code unnecessarily complicated. Several patterns over lists, written in the proper order, will make the implementation cleaner.//)
<code scala>
/*  split(List('h','i',' ','t','h','e','r','e')) = List(List('h','i'), List('t','h','e','r','e'))
*/
def split(text: List[Char]): List[List[Char]] = ???
</code>
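
To illustrate the hint about pattern order on an unrelated problem, here is a minimal sketch of a recursive (not tail-recursive) function that drops consecutive duplicates from a list. The function ''compress'' is a hypothetical example and is not part of the template. The more specific pattern is listed before the general one, which keeps each case short.

<code scala>
object PatternOrderDemo {
  // Hypothetical example: remove consecutive duplicate characters.
  def compress(l: List[Char]): List[Char] = l match {
    // empty list: nothing to do
    case Nil => Nil
    // specific case first: two equal neighbours, keep only one of them
    case x :: y :: rest if x == y => compress(y :: rest)
    // general case last: keep the head and continue with the tail
    case x :: rest => x :: compress(rest)
  }
}
</code>

If the last two cases were swapped, the general pattern would match every non-empty list and the duplicate-removing case would never be reached.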

**2.** Write a function which computes a list of ''Token'' objects from a list of strings. Recall that a ''Token'' keeps track of the frequency of its string. Use the auxiliary function ''insWord'', which inserts a new string into a list of tokens: if the string is already a token, its frequency is incremented; otherwise it is added as a new token. (//Hint: the cleanest way to implement ''aux'' is to use one of the two folds.//)
<code scala>
def computeTokens(words: List[String]): List[Token] = {
  /* insert a new string in a list of tokens */
  def insWord(s: String, acc: List[Token]): List[Token] = ???
  def aux(rest: List[String], acc: List[Token]): List[Token] = ???
  ???
}
</code>

**3.** Write a function ''tokensToTree'' which creates a ''WTree'' from a list of tokens. Use the insertion function ''ins'' which is already implemented. (//Hint: you can implement it as a single fold call, but you have to choose the right one//)

<code scala>
def tokensToTree(tokens: List[Token]): WTree = ???
</code>
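
As a reminder of how the two folds differ, the sketch below applies both to a plain list of integers (a hypothetical example, deliberately unrelated to tokens and trees). The two folds visit the elements in opposite orders and take the accumulator in different argument positions.

<code scala>
object FoldDemo {
  val xs: List[Int] = List(1, 2, 3)

  // foldLeft goes from the first element to the last;
  // the accumulator is the FIRST argument of the operation.
  val left: List[Int] =
    xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)

  // foldRight goes from the last element to the first;
  // the accumulator is the SECOND argument of the operation.
  val right: List[Int] =
    xs.foldRight(List.empty[Int])((x, acc) => x :: acc)
}
</code>

With the same operation ''x :: acc'', ''left'' is ''List(3, 2, 1)'' while ''right'' is ''List(1, 2, 3)''. The order in which elements are processed matters whenever the result depends on insertion order.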

**4.** Write a function ''makeTree'' which takes a string and builds a ''WTree''. ''makeTree'' relies on all the previous functions you implemented. You should use ''_.toList'', which converts a ''String'' to a ''List[Char]''. You can also use ''andThen'', which allows writing a concise and clear implementation. ''andThen'' is explained in detail in the section //Using andThen// below.

<code scala>
def makeTree(s:String): WTree = ???
</code>

**5.** Implement the member method ''size'', which must return the number of non-empty nodes in the tree.

**6.** Implement the member method ''contains'', which must check if a string is a member of the tree (no matter its frequency).

**7.** Implement the ''filter'' method in the abstract class ''WTree''. ''filter'' relies on the tail-recursive ''filterAux'' method, which must be implemented in both ''Empty'' and ''Node''.

**8.** In the code template you will find a string: ''scalaDescription''.

Compute the number of occurrences of the keyword "Scala" in ''scalaDescription''. Use word-trees and any of the previous functions you have defined.
<code scala>
def scalaFreq: Int = ??? 
</code>

**9.** Find how many programming languages are referenced in the same text. You may consider that a programming language is any keyword which starts with an uppercase character. To access the character at index ''i'' of a string ''s'', use ''s(i)''. You can also use the method ''_.isUpper''.

<code scala>
def progLang: Int = ???
</code>
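
The two operations mentioned above behave as in the following minimal sketch. The helper ''startsWithUpper'' is a hypothetical name, not part of the template. Note that ''s(0)'' throws an exception on the empty string, so the sketch checks ''nonEmpty'' first.

<code scala>
object CharDemo {
  val s: String = "Scala"

  // s(i) returns the character at index i (indices start at 0).
  val first: Char = s(0)

  // isUpper is a method on Char.
  val firstIsUpper: Boolean = s(0).isUpper

  // Hypothetical helper: safe on the empty string.
  def startsWithUpper(w: String): Boolean =
    w.nonEmpty && w(0).isUpper
}
</code>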

**10.** Find how many words which are not prepositions or conjunctions appear in the same text. You may consider that a preposition or conjunction is any word whose length is less than or equal to 3.

<code scala>
def wordCount: Int = ???
</code>

**Note:** In order to be graded, exercises 5 to 9 must rely on a correct implementation of the previous parts of the homework.

===== Using andThen =====

Suppose you want to apply a **sequence** of transformations to an object ''o''. Some of them may be functions (''f'', ''g''), while others may be member functions (''m1'', ''m2''). Instead of writing an expression such as ''g(f(o).m1).m2'', which applies the transformations ''f'', ''m1'', ''g'', ''m2'' to ''o'' in that order, you can use ''andThen'':

<code scala>
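// A stands for the type of o; the first lambda needs an explicit
// parameter type, and each andThen is kept at the end of its line
// so that the expression is parsed as a single chain.
val sequence =
  ((x: A) => f(x)) andThen
    (_.m1) andThen
    (x => g(x)) andThen
    (_.m2)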
</code>

which is more legible and easier to debug.
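
A concrete, runnable version of such a chain might look as follows. The functions ''f'' and ''g'' below are placeholders chosen only for this illustration, and the member methods ''toLowerCase'' and ''length'' play the roles of ''m1'' and ''m2''.

<code scala>
object AndThenDemo {
  // Placeholder functions standing in for f and g.
  val f: String => String = _.trim
  val g: String => List[String] = s => s.split(" ").toList

  // Reads top to bottom in the order the steps are applied.
  val sequence: String => Int =
    f andThen
      (_.toLowerCase) andThen // member method, in the role of m1
      g andThen
      (_.length)              // member method, in the role of m2
}
</code>

''AndThenDemo.sequence(o)'' computes the same value as the nested expression ''g(f(o).toLowerCase).length''. For example, ''AndThenDemo.sequence("  All for ONE  ")'' evaluates to ''3''.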


===== Submission rules =====

  * Please follow the [[fp2024:submission-guidelines| Submission guidelines]] which are the same for all homework.
  * To solve your homework, download the {{:fp2024:h2-word-tree.zip|Project template}}, import it into IntelliJ, and you are all set. Do not rename the project manually, as this may cause problems with IntelliJ.