ep:labs:03:contents:tasks:ex3 [2019/10/13 19:00] radu.mantu
==== 02. [30p] You vs grep ====

=== [15p] Task A - Your grep ===

Write a python3 script that receives a list of files from //stdin// (one per line) and an arbitrary number of words as command-line arguments. The script must search each file for these words and print every line that contains at least one of them, in this format:

<code>
<file_name>:<line_number>:<line>
</code>

Note that a line containing more than one of the words must still appear only once in your output.
Your program should be run like this:

<code bash>
$ find . | ./my_grep.py import for sys
</code>
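
A minimal sketch of my_grep.py follows. It rests on a few assumptions the task does not state. Words are matched as plain substrings, as **grep** does without -w. Entries printed by **find** that are not regular files (such as directories) are skipped. Files are decoded as UTF-8, with undecodable bytes replaced. The helper names grep_file() and my_grep() are hypothetical.

<code python>
#!/usr/bin/env python3
"""my_grep.py: a minimal sketch of Task A (plain substring matching assumed)."""
import os
import sys
from typing import Iterable, Iterator, Sequence


def grep_file(path: str, words: Sequence[str]) -> Iterator[str]:
    """Yield '<file_name>:<line_number>:<line>' for every line of `path`
    that contains at least one of `words`."""
    # errors="replace" keeps undecodable bytes from aborting the scan
    with open(path, encoding="utf-8", errors="replace") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            # any() stops at the first word found, so a line is emitted at most once
            if any(word in line for word in words):
                yield f"{path}:{line_number}:{line}"


def my_grep(paths: Iterable[str], words: Sequence[str]) -> Iterator[str]:
    """Search every path (one per item, trailing newline allowed)."""
    for path in paths:
        path = path.rstrip("\n")
        # find also prints directories (and possibly special files): skip them
        if not os.path.isfile(path):
            continue
        try:
            yield from grep_file(path, words)
        except OSError as err:  # e.g. permission denied
            print(f"my_grep.py: {path}: {err.strerror}", file=sys.stderr)


def main() -> None:
    words = sys.argv[1:]
    if not words:
        print("usage: my_grep.py WORD [WORD ...]  (file names on stdin)", file=sys.stderr)
        return
    for match in my_grep(sys.stdin, words):
        print(match)


if __name__ == "__main__":
    main()
</code>

Make the script executable with chmod +x my_grep.py and run it as shown above. A line such as "import sys" matches two of the words but is printed once, because the any() check ends at the first hit.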

=== [10p] Task B - Compare to grep ===

Use any commands you have learned in this or any other lab to compare your implementation with **grep** (at least one of them must be related to I/O). Test cases:

<code bash>
$ find /usr/include/ | ./my_grep.py int include define for
$ grep -rn "int\|include\|define\|for" /usr/include/
# printf expands \n itself; bash's builtin echo only does so with -e
$ grep -Frn -f <(printf "int\ninclude\ndefine\nfor\n") /usr/include
</code>
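
Below is one possible way to collect comparable numbers, sketched in Python. It is only an assumption; the shell tools from the labs are just as valid. The script runs each test command through bash and takes the difference of two resource.getrusage(resource.RUSAGE_CHILDREN) snapshots. That difference gives wall time, CPU time and block I/O counts, and the block counts cover the I/O-related part. The script name measure.py and the Usage fields are hypothetical. Note that ru_inblock only counts reads that actually reach the block device. A command run against a warm page cache may therefore report 0, so it is worth comparing a cold-cache run with a warm one.

<code python>
#!/usr/bin/env python3
"""measure.py (hypothetical name): report wall time, CPU time and block I/O
for each shell command given as a command-line argument."""
import resource
import subprocess
import sys
import time
from typing import NamedTuple


class Usage(NamedTuple):
    wall_s: float    # wall-clock time of the whole pipeline
    cpu_s: float     # user + system CPU time of all waited-for descendants
    blocks_in: int   # block input operations (reads that reached the device)
    blocks_out: int  # block output operations


def measure(cmd: str) -> Usage:
    # RUSAGE_CHILDREN accumulates over terminated, waited-for children,
    # so the difference of two snapshots isolates this one command
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    # bash (not sh) is required for the <(...) process substitution;
    # stdout is discarded so printing to the terminal does not skew results
    subprocess.run(["bash", "-c", cmd], stdout=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return Usage(
        wall_s=wall,
        cpu_s=(after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime),
        blocks_in=after.ru_inblock - before.ru_inblock,
        blocks_out=after.ru_oublock - before.ru_oublock,
    )


if __name__ == "__main__":
    for cmd in sys.argv[1:]:
        u = measure(cmd)
        print(f"{u.wall_s:8.3f}s wall {u.cpu_s:8.3f}s cpu "
              f"{u.blocks_in:8d} blk-in {u.blocks_out:8d} blk-out  {cmd}")
</code>

Pass each test command as a single quoted argument, e.g. ./measure.py 'grep -rn "int\|include\|define\|for" /usr/include/'.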

What algorithm does **grep** use? How does **grep -F** differ from **fgrep**?

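<solution>
grep => Boyer-Moore: bad character / good suffix heuristics; for patterns (i.e. regex), GNU grep uses it to find a fixed string that every match must contain and then checks candidate lines with its regex matcher \\
fgrep => Aho-Corasick: builds a prefix tree (trie) of the patterns and uses it as a finite automaton to find all matches in one pass; deprecated (use grep -F), GNU's fgrep simply runs grep -F \\
grep -F => Commentz-Walter: combines an Aho-Corasick-style trie (built from the reversed patterns) with Boyer-Moore-style shifts, so it can skip ahead in the text while matching many patterns at once
</solution>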
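
Boyer-Moore compares the pattern right to left. On a mismatch, it shifts the pattern by the larger of the values given by its bad-character and good-suffix rules. The sketch below implements the bad-character rule alone, i.e. the simplified Boyer-Moore-Horspool variant. It is swapped in for brevity and is not GNU grep's code; horspool_search() is a hypothetical name.

<code python>
from typing import Dict, List


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return every start index of `pattern` in `text` using the
    Boyer-Moore-Horspool bad-character rule (no good-suffix table)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # shift[c]: distance from the last occurrence of c in pattern[:-1]
    # to the pattern's last position (later occurrences overwrite earlier ones)
    shift: Dict[str, int] = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches: List[int] = []
    pos = 0
    while pos <= n - m:
        # compare right to left, as Boyer-Moore does
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # shift by the text character under the pattern's last cell;
        # a character absent from the pattern allows a full shift of m
        pos += shift.get(text[pos + m - 1], m)
    return matches
</code>

Long shifts on characters that do not occur in the pattern are what let this family skip most of the input. Full Boyer-Moore would also consult a good-suffix table and take the larger shift.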

=== [5p] Task C - Aho-Corasick ===

Read up a bit about [[https://www.geeksforgeeks.org/aho-corasick-algorithm-pattern-searching/|Aho-Corasick]]. Explain to your assistant how it works.
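
The three steps of Aho-Corasick can be sketched in a few lines of Python. This is a minimal sketch, assuming non-empty words and dict-based trie nodes. The names build_automaton() and search() are hypothetical; this is not the linked article's code. The first step inserts every word into a trie (the goto function). The second computes failure links breadth-first. The third scans the text once, following failure links on mismatches and reporting every word in the output list of each state reached.

<code python>
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(words: List[str]) -> Tuple[List[Dict[str, int]], List[int], List[List[str]]]:
    goto: List[Dict[str, int]] = [{}]  # goto[state][char] -> next state; 0 is the root
    fail: List[int] = [0]               # failure link of each state
    out: List[List[str]] = [[]]         # words recognized on reaching each state
    for word in words:                  # step 1: build the trie
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # step 2: BFS, so every shallower state is finished before its descendants;
    # depth-1 states keep their failure link to the root
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            # failure link = longest proper suffix that is also a trie prefix
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # inherit words ending at the failure state (e.g. "he" inside "she")
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text: str, words: List[str]) -> List[Tuple[int, str]]:
    """Return (start_index, word) for every occurrence of every word in `text`."""
    goto, fail, out = build_automaton(words)
    state = 0
    hits: List[Tuple[int, str]] = []
    for i, ch in enumerate(text):  # step 3: a single left-to-right pass
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits
</code>

For example, search("ushers", ["he", "she", "his", "hers"]) reports "she" at 1, then "he" and "hers" at 2. The scan never moves backwards in the text, so it runs in time linear in the text length plus the number of matches. Copying output lists is the simplest choice here; fuller implementations usually keep a separate output (dictionary suffix) link instead.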