--- fp:homework02-draft [2022/03/28 12:08]
pdmatei
+++ fp:homework02-draft [2022/03/31 13:38] (current)
pdmatei
@@ Line 3: / Line 3: @@
 ===== Problem statement =====
+In this homework, you will implement a **binary search tree**, which you will use to gather statistics about the **words** of a given text. Generally, in a [[https://en.wikipedia.org/wiki/Binary_search_tree| binary search tree]]:
+  * each non-empty node contains **exactly one value** and **two children**
+  * all values in the **left** sub-tree are smaller than or equal to the value of the current node
+  * all values in the **right** sub-tree are larger than or equal to the value of the current node
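As an illustration of this invariant only (this is not the ''WTree'' of the homework), here is a minimal sketch of a binary search tree of ''Int'' values. The names ''IntTree'', ''Leaf'', ''Branch'', ''IntBST'', ''insert'' and ''inOrder'' are hypothetical. Equal values are sent to the left sub-tree, which is one of the choices the definition above allows; an in-order traversal then lists the values in non-decreasing order.
<code scala>
// Hypothetical Int tree, used only to illustrate the BST invariant.
sealed trait IntTree
case object Leaf extends IntTree
case class Branch(value: Int, left: IntTree, right: IntTree) extends IntTree

object IntBST {
  // Insert x while keeping: left values <= node value <= right values.
  // Equal values go left here; going right would also satisfy the definition.
  def insert(t: IntTree, x: Int): IntTree = t match {
    case Leaf => Branch(x, Leaf, Leaf)
    case Branch(v, l, r) =>
      if (x <= v) Branch(v, insert(l, x), r)
      else Branch(v, l, insert(r, x))
  }

  // In-order traversal (left, node, right) yields the values in non-decreasing order.
  def inOrder(t: IntTree): List[Int] = t match {
    case Leaf            => Nil
    case Branch(v, l, r) => inOrder(l) ++ (v :: inOrder(r))
  }
}
</code>
For instance, inserting ''2, 1, 3, 1'' (in this order) with ''List(2, 1, 3, 1).foldLeft(Leaf: IntTree)(IntBST.insert)'' and traversing the result in order gives ''List(1, 1, 2, 3)''.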
-===== Implementation details =====
+In your project, the **value** of each node will be represented by ''Token'' objects. The class ''Token'' is already implemented for you:
+<code scala>
+case class Token(word: String, freq: Int)
+</code>
+A token stores:
+  * a string ''word'', and
+  * the **number of occurrences**, or **frequency**, ''freq'' of ''word'' in a text.
+Your binary search tree will use **frequencies** as the ordering criterion. For instance, the text ''All for one and one for one'' may be represented by the tree:
+<code>
+      for (2)
+      /   \
+ and (1)  one (3)
+  /
+All (1)
+</code>
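Assuming minimal versions of the case classes introduced below (without the ''WTreeInterface'' methods), the tree drawn above could be written as a value like the one in this sketch. The object ''Example'' and its helper ''inOrder'' are hypothetical. The token keeps the capitalization ''All'' from the text, and an in-order traversal lists the frequencies in non-decreasing order.
<code scala>
// Assumed minimal definitions; in the template, WTree also implements WTreeInterface.
case class Token(word: String, freq: Int)

sealed abstract class WTree
case object Empty extends WTree
case class Node(word: Token, left: WTree, right: WTree) extends WTree

object Example {
  // One possible tree for the text "All for one and one for one".
  val tree: WTree =
    Node(Token("for", 2),
      Node(Token("and", 1),
        Node(Token("All", 1), Empty, Empty),
        Empty),
      Node(Token("one", 3), Empty, Empty))

  // Hypothetical helper: tokens in in-order (left, node, right) order.
  def inOrder(t: WTree): List[Token] = t match {
    case Empty           => Nil
    case Node(tok, l, r) => inOrder(l) ++ (tok :: inOrder(r))
  }
}
</code>
Here ''Example.inOrder(Example.tree).map(_.freq)'' is ''List(1, 1, 2, 3)''.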
+Notice that the same text can be represented by several different binary search trees; you do not need to take this into account in this homework.
+Our tree is called ''WTree'' and is implemented by the following case classes:
+<code scala>
+case object Empty extends WTree
+case class Node(word: Token, left: WTree, right: WTree) extends WTree
+</code>
+''WTree'' implements the following trait:
+<code scala>
+trait WTreeInterface {
+  def isEmpty: Boolean
+  def filter(pred: Token => Boolean): WTree
+  def ins(w: Token): WTree
+  def contains(s: String): Boolean
+  def size: Int
+}
+</code>
+The method ''ins'' is already implemented; you must implement the rest. The project has two parts:
+  * **building a WTree** from a text, and
+  * **using a WTree** to gather information about that text.
-About ''andThen''
+In the next section you will find implementation details about each of the above.
 ===== Implementation =====
-Write a function which splits a text using the single whitespace character as a separator. Multiple whitespaces should be treated as a single separator. If the list contains only whitespaces, ''split'' should return the empty list.
+**1.** Write a function ''split'' which splits a text, given as a list of characters, using the whitespace character as a separator. Consecutive whitespaces should be treated as a single separator. If the list contains only whitespaces, ''split'' should return the empty list. (//Hints: your implementation must be recursive, but do not try to make it tail-recursive; that would make your code unnecessarily complicated. Several patterns over lists, in the proper order, will make the implementation cleaner.//)
 <code scala>
 /*  split(List('h','i',' ','t','h','e','r','e')) = List(List('h','i'), List('t','h','e','r','e'))
@@ Line 17: / Line 57: @@
 </code>
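The exact ''split'' is left to you, but the hint about pattern order can be illustrated on a simpler, hypothetical function ''squeeze'', which collapses every run of consecutive spaces into a single space. It is recursive but not tail-recursive, and the most specific pattern must come first: if the general ''c :: rest'' case were listed first, the two-space case would never be reached.
<code scala>
object ListPatterns {
  // Hypothetical example: collapse runs of consecutive spaces into one space.
  // Cases are tried top to bottom, so the most specific pattern comes first.
  def squeeze(cs: List[Char]): List[Char] = cs match {
    case ' ' :: ' ' :: rest => squeeze(' ' :: rest) // drop one space of a run
    case c :: rest          => c :: squeeze(rest)    // keep any other character
    case Nil                => Nil
  }
}
</code>
For example, ''ListPatterns.squeeze("a   b ".toList)'' returns ''"a b ".toList''.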
-Write a function which computes a list of ''Token'' from a list of strings. Recall that Tokens keep track of the string frequency. Use an auxiliary function
+**2.** Write a function ''computeTokens'' which computes a list of ''Token''s from a list of strings. Recall that tokens keep track of the frequency of each string. Use an auxiliary function
- ''insWord'' which inserts a new string in a list of Tokens. If the string is already a token, its frequency is incremented, otherwise it is added as a new token.
+ ''insWord'', which inserts a new string into a list of tokens. If the string already has a token, its frequency is incremented; otherwise, it is added as a new token. (//Hint: the cleanest way to implement ''aux'' is to use one of the two folds.//)
 <code scala>
 def computeTokens(words: List[String]): List[Token] = {
@@ Line 28: / Line 68: @@
 </code>
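The counting idea behind ''insWord'' can be tried first on plain ''Int'' values. In the sketch below, the hypothetical helper ''insCount'' adds one occurrence of a value to a list of ''(value, count)'' pairs, and ''counts'' processes a whole list with a single ''foldRight''. This is only one possible design; a ''foldLeft'' would also work, provided the helper's arguments are passed in the order that fold expects.
<code scala>
object FoldCount {
  // Hypothetical helper: record one more occurrence of x in a list of (value, count) pairs.
  def insCount(x: Int, acc: List[(Int, Int)]): List[(Int, Int)] = acc match {
    case Nil                      => List((x, 1))          // first occurrence
    case (y, n) :: rest if y == x => (y, n + 1) :: rest    // already present: increment
    case p :: rest                => p :: insCount(x, rest) // keep searching
  }

  // A single fold over the input, starting from an empty list of counts.
  def counts(xs: List[Int]): List[(Int, Int)] =
    xs.foldRight(List.empty[(Int, Int)])(insCount)
}
</code>
For example, ''FoldCount.counts(List(1, 2, 1))'' yields ''List((1, 2), (2, 1))''.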
-Write a function ''tokensToTree'' which creates a ''WTree'' from a list of tokens. Use the insertion function ''ins'' which is already implemented. (//Hint: you can implement it as a single fold call, but you have to choose the right one//)
+**3.** Write a function ''tokensToTree'' which creates a ''WTree'' from a list of tokens. Use the insertion method ''ins'', which is already implemented. (//Hint: you can implement it as a single fold call, but you have to choose the right fold.//)
 <code scala>
@@ Line 34: / Line 74: @@
 </code>
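Choosing the right fold mostly comes down to argument order. The sketch below (the object and value names are hypothetical) makes both orders visible on a list of ''Int'': ''foldLeft'' calls its function as ''op(accumulator, element)'', visiting elements from left to right, while ''foldRight'' calls it as ''op(element, accumulator)'', combining from right to left. Compare these shapes with how ''ins'' is called on a tree.
<code scala>
object FoldDirection {
  val xs: List[Int] = List(1, 2, 3)

  // foldLeft: op(accumulator, element), elements visited left to right.
  val left: String = xs.foldLeft("z")((acc, x) => s"($acc+$x)")

  // foldRight: op(element, accumulator), elements combined right to left.
  val right: String = xs.foldRight("z")((x, acc) => s"($x+$acc)")
}
</code>
Here ''left'' is ''"(((z+1)+2)+3)"'' and ''right'' is ''"(1+(2+(3+z)))"''.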
-Write a function ''makeTree'' which takes a string and builds a ''WTree''. ''makeTree'' relies on all the previous functions you implemented. You should use ''_.toList'', which converts a ''String'' to ''List[Char]''. You can also use ''andThen'', which allows writing a concise and clear implementation.
+**4.** Write a function ''makeTree'' which takes a string and builds a ''WTree''. ''makeTree'' should rely on all the previous functions you implemented. You should use ''_.toList'', which converts a ''String'' to a ''List[Char]''. You can also use ''andThen'', which lets you write a concise and clear implementation; ''andThen'' is explained in detail in the next section.
 <code scala>
@@ Line 40: / Line 80: @@
 </code>
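As a small, self-contained illustration of ''andThen'' (unrelated to ''WTree''; all names are hypothetical), the sketch below chains three function values into a pipeline that counts the non-space characters of a string. ''f andThen g'' is the function ''x => g(f(x))'', so the stages run left to right, in the order they are written.
<code scala>
object Pipeline {
  // Each stage is an ordinary function value.
  val toChars: String => List[Char]         = _.toList
  val dropSpaces: List[Char] => List[Char]  = _.filter(_ != ' ')
  val count: List[Char] => Int              = _.length

  // Equivalent to s => count(dropSpaces(toChars(s))).
  val nonSpaceChars: String => Int = toChars andThen dropSpaces andThen count
}
</code>
For instance, ''Pipeline.nonSpaceChars("a b  c")'' is ''3''.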
-Implement a ''filter'' method in the
+**5.** Implement the member method ''size'', which must return the number of non-empty nodes in the tree.
-In the code template you will find a string: ''scalaDescription''.
+**6.** Implement the member method ''contains'', which must check whether a string appears in the tree (regardless of its frequency).
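Member methods such as ''size'' and ''contains'' can be implemented separately in each case class, instead of with a single pattern match. The sketch below shows this style on a hypothetical binary tree of ''Int'' values (''IntTreeOps'', ''ITree'', ''ILeaf'' and ''INode'' are not part of the template). Its ''contains'' searches both sub-trees, so it works whether or not the tree is ordered by the value being searched.
<code scala>
// Hypothetical interface and tree, mirroring the "member method" style.
trait IntTreeOps {
  def size: Int
  def contains(x: Int): Boolean
}

sealed abstract class ITree extends IntTreeOps

case object ILeaf extends ITree {
  def size: Int = 0                        // an empty tree has no nodes
  def contains(x: Int): Boolean = false    // nothing to find
}

case class INode(v: Int, left: ITree, right: ITree) extends ITree {
  // This node plus all non-empty nodes below it.
  def size: Int = 1 + left.size + right.size
  // Check the current value, then both sub-trees.
  def contains(x: Int): Boolean = v == x || left.contains(x) || right.contains(x)
}
</code>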
-Compute the number of occurrences of the keyword "Scala" in ''scalaDescription'':
+**7.** Implement the ''filter'' method in the abstract class ''WTree''. ''filter'' will rely on the tail-recursive method ''filterAux'', which must be implemented in the case classes ''Empty'' and ''Node''.
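Tail recursion over a tree needs some extra bookkeeping, since a node has two sub-trees to visit. The sketch below uses one common technique, an explicit stack of pending sub-trees plus an accumulator, on a hypothetical tree of ''Int'' values. It returns a ''List'' rather than a tree and is not the ''filterAux'' design required by the template (where the method is split across ''Empty'' and ''Node''). ''@tailrec'' makes the compiler reject the helper if any recursive call is not in tail position.
<code scala>
import scala.annotation.tailrec

// Hypothetical tree, used only to illustrate tail recursion with a worklist.
sealed abstract class BTree
case object BLeaf extends BTree
case class BNode(v: Int, left: BTree, right: BTree) extends BTree

object TailFilter {
  // Collect the values satisfying pred; every recursive call is a tail call.
  def collect(t: BTree, pred: Int => Boolean): List[Int] = {
    @tailrec
    def loop(pending: List[BTree], acc: List[Int]): List[Int] = pending match {
      case Nil            => acc                // nothing left to visit
      case BLeaf :: rest  => loop(rest, acc)    // empty sub-tree: skip it
      case BNode(v, l, r) :: rest =>
        // Schedule both children, keep v only if it satisfies pred.
        loop(l :: r :: rest, if (pred(v)) v :: acc else acc)
    }
    loop(List(t), Nil)
  }
}
</code>
Values are prepended in visiting order, so for a tree with root ''2'' and children ''1'' and ''3'', ''TailFilter.collect(tree, _ >= 2)'' returns ''List(3, 2)''.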
+**8.** In the code template you will find a string, ''scalaDescription''.
+Compute the number of occurrences of the keyword "Scala" in ''scalaDescription''. Use word trees and any of the functions you have previously defined.
 <code scala>
 def scalaFreq: Int = ???
 </code>
+**9.** Find how many programming languages are referenced in the same text. You may assume that a programming language is any keyword which starts with an uppercase character. To access the character at index ''i'' of a string ''s'', use ''s(i)''. You can also use the method ''_.isUpper'', which is defined on characters.
+<code scala>
+def progLang: Int = ???
+</code>
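A hedged sketch of the test suggested above: the hypothetical helper ''startsUpper'' checks the first character with ''s(0)'' and ''isUpper'', guarding against the empty string first, since ''s(0)'' throws an exception on ''""''. The ''Token'' list is made-up sample data, only there to exercise the predicate.
<code scala>
case class Token(word: String, freq: Int)

object Capitals {
  // s(0) throws StringIndexOutOfBoundsException on "", hence the nonEmpty guard.
  def startsUpper(s: String): Boolean = s.nonEmpty && s(0).isUpper

  // Made-up sample tokens.
  val sample: List[Token] = List(Token("Scala", 3), Token("is", 2), Token("Java", 1))

  // Number of sample tokens whose word starts with an uppercase character.
  val capitalized: Int = sample.count(t => startsUpper(t.word))
}
</code>
Here ''Capitals.capitalized'' is ''2''.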
+**10.** Find how many words in the same text are not prepositions or conjunctions. You may assume that a preposition or conjunction is any word whose length is less than or equal to 3.
+<code scala>
+def wordCount: Int = ???
+</code>
+**Note:** In order to be graded, exercises 5 to 9 must rely on a correct implementation of the previous parts of the homework.
 ===== Using andThen =====
@@ Line 62: / Line 119: @@
 function1(function2(function3(x)))
 </code>
-which may become less intuitive as the number of applied function grows, we may use:
+which may become less intuitive as the number of applied functions grows, we may use:
 <code scala>
 function1
@@ Line 74: / Line 131: @@
 ==== Project format ====
-  * **You should not change any other files of the project, except for the //template-file//**. For this homework, the //template-file// is ''FSets.scala''. **Warning:** if a submission has changes in other files, it **may not be graded**.
+  * **You should not change any file of the project except the //template-file//**. For this homework, the //template-file// is ''Main.scala''. **Warning:** if a submission has changes in other files, it **may not be graded**.
-  * To solve your homework, download the {{:fp:h1-fsets.zip| Homework project}} and **rename** it using the following convention: ''HX_<LastName>_<FirstName>'', where X is the homework number. (Example: ''H1_Popovici_Matei''). If your project name disregards this convention, it **may not be graded**.
+  * To solve your homework, download the {{:fp:H2-word-tree.zip| Homework project}} and **rename** it using the following convention: ''HX_<LastName>_<FirstName>'', where X is the homework number (example: ''H2_Popovici_Matei''). If your project name does not follow this convention, it **may not be graded**.
   * Each project file contains a ''profileID'' definition, which you must fill in with the token ID you received via email for this lecture. Make sure the token ID is defined correctly (grades will be assigned automatically by token ID).
   * In order to be graded, **the homework must compile**. If a homework has **compilation errors** (does not compile), it **will not be graded**. Make sure to remove any code that does not compile, by replacing the corresponding function bodies with ''???'' (or by keeping their original ''???'' bodies).