Differences

ep:labs:03:contents:tasks:ex3 [2019/10/15 08:18]
radu.mantu [02. [30p] You vs grep]
ep:labs:03:contents:tasks:ex3 [2025/03/17 20:53] (current)
radu.mantu [03. [15p] Zip with compression levels]
Line 1:
-==== 02. [30p] You vs grep ====
-
-=== [15p] Task A - Your grep ===
-
-Write a python3 script that receives a list of files from //stdin// (one per line) and an arbitrary number of words as command line arguments. The script must search for these words in each file and output each line that contains at least one of them in this format:
-
-<code>
-<file_name>:<line_number>:<line>
-</code>
-
-Note that if a line contains more than one word, it still must appear only once in your output.
-Your program should be run like this:
+==== 03. [15p] Zip with compression levels ====
+The **zip** command is used for compression and file packaging under Linux/Unix operating systems. It provides 10 levels of compression, where:
+  * **level 0** provides no compression, only packaging
+  * **level 6** is used as the default compression level
+  * **level 9** provides maximum compression
 
 <code bash>
-find . -type f | ./my_grep.py import for sys
+zip -5 file.zip file.txt
 </code>
 
-=== [10p] Task B - Compare to grep ===
-
-Use any commands that you have learned in this or any other lab to compare your implementation to that of **grep** (at least one must be related to I/O). Test cases:
-
-<code bash>
-$ find /usr/include/ -type f | ./my_grep.py int include define for
-$ grep -rn "int\|include\|define\|for" /usr/include/
-$ grep -Frn -f <(echo "int\ninclude\ndefine\nfor") /usr/include
-</code>
-
-What algorithm does **grep** use? How does **grep -F** differ from **fgrep**?
-
-<solution -hidden>
-grep => Boyer Moore : good prefix/bad suffix heuristics, works with patterns (i.e. regex) \\
-fgrep => Aho-Corasick : creates prefix tree (trie) and uses it as a finite automaton to find matches; deprecated (use grep -F) \\
-grep -F => Commentz-Walter : don't know much about this one; will read up on it
-</solution>
-
-=== [5p] Task C - Aho-Corasick ===
-
-Read up a bit about [[https://www.geeksforgeeks.org/aho-corasick-algorithm-pattern-searching/|Aho-Corasick]]. Explain to your assistant how it works.
+=== [10p] Task A - Measurements ===
+Write a script to measure the compression rate and the time required for each level. You have a few large files in the code skeleton but feel free to add more. If you do add new files, make sure that they are not random data!
 
+=== [5p] Task B - Plot ===
+Generate a plot illustrating the compression rate, size decrease, etc. as a function of **zip** compression level. Make sure that your plot is //understandable// (i.e., has labels, a legend, etc.). Make sure to average multiple measurements for each compression level.
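The measurement task added in the current revision (compression rate and time per level, averaged over several runs) could be approached roughly as follows. This is a minimal sketch, not the lab's reference solution: it assumes Python's standard ''zipfile'' module as a stand-in for the **zip** command (both use DEFLATE with levels 0 to 9, but sizes and timings will not match **zip** exactly). The names ''make_sample'', ''measure_level'' and ''sample.txt'' are hypothetical, and the synthetic input only replaces the skeleton's large files for illustration.

```python
import io
import time
import zipfile


def make_sample(lines: int = 20000) -> bytes:
    """Build repetitive, non-random text so that compression has something to do."""
    return b"".join(b"line %d: the quick brown fox jumps over the lazy dog\n" % i
                    for i in range(lines))


def measure_level(data: bytes, level: int, repeats: int = 3) -> dict:
    """Compress `data` in memory at one DEFLATE level; average the time over `repeats` runs."""
    total_seconds = 0.0
    zip_bytes = 0
    for _ in range(repeats):
        buf = io.BytesIO()
        start = time.perf_counter()
        # compresslevel is accepted by ZipFile since Python 3.7
        with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                             compresslevel=level) as archive:
            archive.writestr("sample.txt", data)  # hypothetical member name
        total_seconds += time.perf_counter() - start
        zip_bytes = len(buf.getvalue())
    return {
        "level": level,
        "seconds": total_seconds / repeats,
        "zip_bytes": zip_bytes,
        "ratio": zip_bytes / len(data),  # archive size relative to input size
    }


if __name__ == "__main__":
    sample = make_sample()
    print("level  seconds   zip_bytes  ratio")
    for lvl in range(10):
        r = measure_level(sample, lvl)
        print(f"{r['level']:>5}  {r['seconds']:.5f}  {r['zip_bytes']:>9}  {r['ratio']:.3f}")
```

The resulting list of dictionaries can be passed to any plotting tool for the plot task. To measure the real **zip** binary instead, the body of the loop would have to run the command on files on disk and read the archive size afterwards.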
ep/labs/03/contents/tasks/ex3.1571116716.txt.gz · Last modified: 2019/10/15 08:18 by radu.mantu
CC Attribution-Share Alike 3.0 Unported