This shows you the differences between two versions of the page.
ep:labs:03:contents:tasks:ex3 [2019/10/13 18:56] radu.mantu |
ep:labs:03:contents:tasks:ex3 [2025/03/17 20:53] (current) radu.mantu [03. [15p] Zip with compression levels] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ==== 02. [30p] You vs grep ==== | + | ==== 03. [15p] Zip with compression levels ==== |
- | + | The **zip** command compresses and packages files on Linux/Unix operating systems. It provides 10 compression levels (0 through 9), where: | |
- | === [15p] Task A - Your grep === | + | * **level 0** : provides no compression, only packaging |
- | + | * **level 6** : used as default compression level | |
- | Write a python3 script that receives a list of files from //stdin// (one per line) and an arbitrary number of words as command line arguments. The script must search for these words in each file and output each line that contains at least one of them in this format: | + | * **level 9** : provides maximum compression |
- | + | ||
- | <code> | + | |
- | <file_name>:<line_number>:<line> | + | |
- | </code> | + | |
- | + | ||
- | Note that if a line contains more than one word, it still must appear only once in your output. | + | |
- | Your program should be run like this: | + | |
<code bash> | <code bash> | ||
- | $ find . | ./my_grep.py import for sys | + | $ zip -5 file.zip file.txt |
</code> | </code> | ||
- | === [10p] Task B - Compare to grep === | + | === [10p] Task A - Measurements === |
- | + | Write a script to measure the compression rate and the time required for each level. You have a few large files in the code skeleton, but feel free to add more. If you do add new files, make sure that they are not random data! Random data is essentially incompressible, so every level would produce about the same size. | |
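One possible starting point for Task A is sketched below. It is not the reference solution: instead of invoking the **zip** binary, it assumes Python's ''zipfile'' module with DEFLATE as a stand-in, so absolute numbers (and the exact behaviour of level 0) may differ from what **zip -0** ... **zip -9** report. The function name ''measure_level'', the member name ''sample.txt'' and the generated sample text are all made up for illustration; replace the sample with the files from the code skeleton.

```python
import io
import time
import zipfile


def measure_level(data, level, repeats=3):
    """Return (compressed_size / original_size, mean seconds) for one level.

    Assumption: zipfile + DEFLATE stands in for the zip command line tool.
    """
    total_time = 0.0
    compressed_size = 0
    for _ in range(repeats):
        buffer = io.BytesIO()  # keep the archive in memory, no disk I/O
        start = time.perf_counter()
        with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED,
                             compresslevel=level) as archive:
            archive.writestr("sample.txt", data)
            # Size of the compressed member, excluding archive headers.
            compressed_size = archive.getinfo("sample.txt").compress_size
        total_time += time.perf_counter() - start
    return compressed_size / len(data), total_time / repeats


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample; use the skeleton files instead.
    sample = b"the quick brown fox jumps over the lazy dog\n" * 50_000
    for level in range(10):
        ratio, seconds = measure_level(sample, level)
        print(f"level {level}: ratio={ratio:.4f} time={seconds * 1000:.2f} ms")
```

Timing the real **zip** command (for example through ''subprocess'') would follow the same loop; the in-memory variant is used here only to keep the sketch self-contained.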
- | Use any commands that you have learned in this or any other lab to compare your implementation to that **grep** (at least one must be related to I/O). Test case: | + | |
- | + | ||
- | <code bash> | + | |
- | $ find /usr/include/ | ./my_grep.py int include define for | + | |
- | $ grep -rn "int\|include\|define\|for" /usr/include/ | + | |
- | $ grep -Frn -f <(echo "int\ninclude\ndefine\nfor") /usr/include | + | |
- | </code> | + | |
- | + | ||
- | What algorithm does **grep** use? How does **grep -F** differ from **fgrep**? | + | |
- | + | ||
- | === [5p] Task C - Aho-Corasick === | + | |
- | + | ||
- | Read up a bit about [[https://www.geeksforgeeks.org/aho-corasick-algorithm-pattern-searching/|Aho-Corasick]]. Explain to your assistant how it works. | + | |
- | <solution> | + | === [5p] Task B - Plot === |
- | grep => Boyer Moore \\ | + | Generate a plot illustrating the compression rate, size decrease, etc. as a function of the **zip** compression level. Make sure that your plot is //understandable// (e.g., it has labels and a legend), and average multiple measurements for each compression level. | |
- | fgrep => Aho-Corasick \\ | + | |
- | grep -F => Commentz-Walter | + | |
- | </solution> | + |
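For Task B (Plot) in the new revision, the averaging and plotting steps could look roughly like the sketch below. It assumes that the measurements were collected into dictionaries mapping each compression level to a list of samples, and that the third-party ''matplotlib'' package is installed. The names ''average_by_level'', ''plot_levels'' and the output file ''zip_levels.png'' are hypothetical.

```python
from statistics import mean


def average_by_level(samples):
    """Collapse {level: [measurement, ...]} into sorted levels and mean values."""
    levels = sorted(samples)
    return levels, [mean(samples[level]) for level in levels]


def plot_levels(ratios, times, output_path="zip_levels.png"):
    """Plot mean size ratio and mean time against the compression level."""
    # matplotlib is imported here so that the averaging helper above
    # stays usable on machines where it is not installed.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display needed
    import matplotlib.pyplot as plt

    levels, mean_ratios = average_by_level(ratios)
    _, mean_times = average_by_level(times)

    fig, ratio_axis = plt.subplots()
    time_axis = ratio_axis.twinx()  # second y-axis sharing the same x-axis

    ratio_line, = ratio_axis.plot(levels, mean_ratios, "o-", color="tab:blue",
                                  label="compressed / original size")
    time_line, = time_axis.plot(levels, mean_times, "s--", color="tab:red",
                                label="mean time (s)")

    ratio_axis.set_xlabel("zip compression level")
    ratio_axis.set_ylabel("compressed size / original size")
    time_axis.set_ylabel("mean time (s)")
    ratio_axis.set_xticks(levels)
    ratio_axis.set_title("zip: size ratio and time per compression level")
    # One legend covering the lines from both axes.
    ratio_axis.legend(handles=[ratio_line, time_line], loc="upper center")

    fig.tight_layout()
    fig.savefig(output_path)
```

Two y-axes are used because the size ratio and the elapsed time have different units; plotting them on one shared axis would flatten whichever series has the smaller range.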