This shows you the differences between two versions of the page.

ep:labs:05 [2019/10/27 16:59] emilian.radoi |
ep:labs:05 [2020/11/10 16:44] (current) ioan_adrian.cosma [Python Scientific Computing Resources] |
||
---|---|---|---|

Line 1: | Line 1: | ||

- | ====== Lab 05 - Plotting ====== | + | ====== Lab 05 - Plotting (Numpy & Matplotlib) ====== |

===== Objectives ===== | ===== Objectives ===== | ||

- | * Offer an introduction to Gnuplot | + | * Offer an introduction to Numpy & matplotlib |

- | * Get you familiarised with basic plots in Gnuplot | + | * Get you familiarised with the numpy API |

+ | * Understand basic plotting with matplotlib | ||

- | ===== Contents ===== | ||

- | {{page>:ep:labs:05:meta:nav&nofooter&noeditbutton}} | ||

- | ===== Gnuplot Introduction ===== | + | ===== Python Scientific Computing Resources ===== |

- | Gnuplot is a free, command-driven, interactive, function and data plotting program. It can be downloaded at https://sourceforge.net/projects/gnuplot/. The official Gnuplot documentation can be found at http://gnuplot.sourceforge.net/documentation.html. | + | In this lab, we will study a new Python library that offers fast, memory-efficient manipulation of vectors, matrices and tensors: **numpy**. We will also study basic data plotting using one of the most popular data visualization libraries in the Python ecosystem: **matplotlib**. |

- | It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications such as Octave. | + | For scientific computing we need an environment that is easy to use and provides tools for manipulating data and visualizing results. |

+ | Python is very easy to use, but the downside is that it is not fast at numerical computing. Luckily, we have very efficient libraries for all our use cases. | ||
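To make the point about efficient libraries concrete, here is a minimal sketch (not taken from the lab notebook) that compares an explicit Python loop with the equivalent whole-array numpy expression. The sample data and all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data: the integers 0..9 as a 1-D array.
values = np.arange(10)

# Pure-Python approach: square every element with an explicit loop.
squares_loop = [v * v for v in values.tolist()]

# Vectorized approach: one expression applied to the whole array at once.
squares_vec = values ** 2

# Reductions and reshaping also operate on whole arrays.
total = squares_vec.sum()           # sum of all squares
matrix = values.reshape(2, 5)       # view the same data as a 2x5 matrix
column_means = matrix.mean(axis=0)  # mean of each column

print(squares_vec)
print(total)
print(column_means)
```

Both approaches produce the same values. The vectorized form hands the loop to numpy's compiled code, which is where the speed advantage on large arrays usually comes from.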

- | The command language of Gnuplot is case sensitive, i.e. commands and function names written in lowercase are not the same as those written in capitals. All command names may be abbreviated as long as the abbreviation is not ambiguous. Any number of commands may appear on a line, separated by semicolons (;). Strings may be set off by either single or double quotes, although there are some subtle differences. See syntax (p. 44) and quotes (p. 44) for more details in the Gnuplot documentation (http://gnuplot.sourceforge.net/docs_5.0/gnuplot.pdf). | + | **Core computing libraries** |

- | Commands may extend over several input lines by ending each line but the last with a backslash (\). The backslash must be the last character on each line. The effect is as if the backslash and newline were not there. That is, no white space is implied, nor is a comment terminated. | + | * numpy and scipy: scientific computing |

+ | * matplotlib: plotting library | ||
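As an illustration of how these core libraries fit together, the sketch below plots a sine curve with matplotlib from data generated with numpy. It is an assumption-laden example rather than the lab's own code: the data, the title, the axis labels and the output file name `example1.png` are all hypothetical, and the `Agg` backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 100 evenly spaced points on [0, 2*pi] and their sine.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve on it.
fig, ax = plt.subplots()
ax.plot(x, y, color="black", label="sin(x)")

# Title, axis labels and legend.
ax.set_title("Example 1")
ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.legend()

# Write the plot to a PNG file (hypothetical file name).
fig.savefig("example1.png")
```

In a Jupyter or Colab notebook the `matplotlib.use("Agg")` line is normally unnecessary, since figures are rendered inline.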

- | For built-in help on any topic, type **help** followed by the name of the topic or **help ?** to get a menu of available topics. | + | **Machine Learning** |

+ | * sklearn: machine learning toolkit | ||

+ | * tensorflow: deep learning framework developed by Google | ||

+ | * keras: deep learning framework on top of `tensorflow` for easier implementation | ||

+ | * pytorch: deep learning framework developed by Facebook | ||

- | ===== Tutorial ===== | ||

+ | **Statistics and data analysis** | ||

- | ==== 01. [10p] Basic Plotting ==== | + | * pandas: very popular data analysis library |

+ | * statsmodels: statistics | ||

- | Download the following sets of data: {{:ep:laboratoare:data1.txt|}} and {{:ep:laboratoare:data2.txt|}}, and update gnuplot: | + | We also have advanced interactive environments: |

- | <code> | + | |

- | $ sudo apt-get install gnuplot | + | |

- | </code> | + | |

- | Start gnuplot by using the command: | + | * IPython: advanced python console |

+ | * Jupyter: notebooks in the browser | ||

- | <code> | + | There are many more scientific libraries available. |

- | $ gnuplot | + | |

- | </code> | + | |

- | The default terminal type is dependent on your environment. One of the recommended terminal types in terms of flexibility and functionality is //wxt enhanced//. To set the terminal type use: | ||

- | ''gnuplot> set terminal wxt enhanced'' | + | Check out these cheat sheets for quick reference to the common libraries: |

- | If setting the terminal to //wxt enhanced// doesn't work, use the default terminal. If it is desired to display the terminal parameters at any point, use: | + | **Cheat sheets:** |

- | ''gnuplot> show terminal'' | + | - [[https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf|python]] |

+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf|numpy]] | ||

+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf|matplotlib]] | ||

+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf|sklearn]] | ||

+ | - [[https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf|pandas]] | ||

- | Plot the sets of data found in //data1.txt// and //data2.txt// on the same graph: | + | **Other:** |

- | ''gnuplot> plot 'data1.txt', 'data2.txt''' | + | - [[https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics|Probabilities & Stats Refresher]] |

+ | - [[https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus|Algebra]] | ||

- | In case the window plot does not appear, you might be missing gnuplot-x11. Try installing it: | ||

- | |||

- | ''$ sudo apt-get install gnuplot-x11'' | ||

- | |||

- | Set a title for the plot: | ||

- | |||

- | ''gnuplot> set title 'Example 1''' | ||

- | |||

- | Plot the data again for the title to appear. Notice that Gnuplot automatically selects different colours for each dataset. Change the colour for both datasets to black. This can be done with the lc parameter (stands for line colour). The command below changes the colour of the data from data2.txt to black. | ||

- | |||

- | ''gnuplot> plot 'data1.txt', 'data2.txt' lc rgb 'black''' | ||

- | |||

- | The next step is to assign label names to the two axes: | ||

- | |||

- | ''gnuplot> set xlabel 'X Label' \\ | ||

- | gnuplot> set ylabel 'Y Label' \\ | ||

- | gnuplot> plot 'data1.txt' lc rgb 'black', 'data2.txt' lc rgb 'black''' | ||

- | |||

- | The top-right corner of the plot area displays the names of the files containing the data, along with the symbol type associated with each data file. This is usually not something you would want to have on a plot. To remove it use: | ||

- | |||

- | ''gnuplot> unset key \\ | ||

- | gnuplot> plot 'data1.txt' lc rgb 'black', 'data2.txt' lc rgb 'black''' | ||

- | |||

- | If you want to set it back you can use the set key command: | ||

- | |||

- | ''gnuplot> set key'' | ||

- | |||

- | The text of the keys can be changed when plotting the data like this: | ||

- | |||

- | ''gnuplot> plot 'data1.txt' lc rgb 'black' title 'Data 1', 'data2.txt' lc rgb 'black' title 'Data 2''' | ||

- | |||

- | This is how the plot should look so far: | ||

- | |||

- | {{ :ep:laboratoare:ep_l3_p1.png?500 |}} | ||

- | |||

- | Notice that the data obscures the keys. To move the data clear of the keys, we can extend the Y axis up to 900. The X and Y ranges can be set as follows, using square brackets and separating the low and high limits with a colon: | ||

- | |||

- | ''gnuplot> plot [:] [:900] 'data1.txt' lc rgb 'black' title 'Data 1', 'data2.txt' lc rgb 'black' title 'Data 2''' | ||

- | |||

- | There is a gap showing on the right-hand side of the graph. This can be eliminated by setting the high range for the X axis to the number of rows in the data files which is 1024: | ||

- | |||

- | ''gnuplot> plot [:1024] [:900] 'data1.txt' lc rgb 'black' title 'Data 1', 'data2.txt' lc rgb 'black' title 'Data 2''' | ||

- | |||

- | Saving the plot to an Encapsulated PostScript file (EPS is a vector graphics format that can be used in LaTeX documents) can be done as follows: | ||

- | |||

- | ''gnuplot> set terminal postscript eps enhanced "Helvetica" 24 \\ | ||

- | gnuplot> set output 'exercise1.eps' \\ | ||

- | gnuplot> replot'' | ||

- | |||

- | Exit gnuplot, and notice that exercise1.eps was created. Open it and check how it looks. | ||

- | |||

- | |||

- | ==== 02. [10p] Fitting ==== | ||

- | |||

- | This exercise is aiming to familiarise you with using Gnuplot to do fitting. | ||

- | Start Gnuplot, change terminal to //wxt enhanced// and plot the data from {{:ep:laboratoare:data3.txt|}}. | ||

- | The aim here is to fit a straight line between 800 and 1500. Plot the data from 800 to 1500 for a better look. This will give you the section that you will be trying to fit on a straight line (use //w l// (//with line//) to connect the points). | ||

- | |||

- | ''gnuplot> plot [800:1500] 'data3.txt' \\ | ||

- | gnuplot> plot [800:1500] 'data3.txt' w l'' | ||

- | |||

- | You need to specify the straight line: | ||

- | |||

- | ''gnuplot> f(x) = a + b*x'' | ||

- | |||

- | Fit the data between 800 and 1500 using the function that was just defined //f(x)//, by varying the parameters //a// and //b//: | ||

- | |||

- | ''gnuplot> fit [800:1500] f(x) 'data3.txt' via a,b'' | ||

- | |||

- | {{ :ep:laboratoare:ep_l3_p2.png?500 |}} | ||

- | |||

- | The fit information shows the reduced chi-squared statistic which is used in the goodness of fit testing (https://en.wikipedia.org/wiki/Reduced_chi-squared_statistic), along with the slope parameter (//b//) with its uncertainty and the uncertainty percentage, the offset parameter (//a//) with the uncertainty in the offset and the uncertainty percentage, and the correlation matrix of the fit parameters. | ||

- | |||

- | Plot the fit on top of the initial data (use //lw// (//line width//) to set the thickness of the line): | ||

- | |||

- | ''gnuplot> plot [800:1500] 'data3.txt' w l, f(x) lw 3'' | ||

- | |||

- | {{ :ep:laboratoare:ep_l3_p3.png?500 |}} | ||

- | |||

- | Look again at the data from //data3.txt//, and fit a polynomial curve between 0 and 1300: | ||

- | |||

- | ''gnuplot> f(x) = a+ b*x+ c*x*x \\ | ||

- | gnuplot> fit [0:1300] f(x) 'data3.txt' via a,b,c \\ | ||

- | gnuplot> plot [0:1300] 'data3.txt' w l, f(x) lw 3'' | ||

- | |||

- | {{ :ep:laboratoare:ep_l3_p4.png?500 |}} | ||

- | |||

- | ==== 03. [10p] Gnuplot Scripts ==== | ||

- | |||

- | The benefit of using a script is that you do not have to retype everything every time you want to make a change or want to reproduce your plot. The following is an example of a Gnuplot script that plots two graphs on a single plot. | ||

- | |||

- | <code> | ||

- | reset # flush all the variables | ||

- | |||

- | set size 1,1 # use default pallet size (100% of width and height) | ||

- | set multiplot | ||

- | |||

- | #Graph 1 | ||

- | set size 0.5,1 #half the width and full height | ||

- | set origin 0,0 #x,y | ||

- | plot 'data2.txt' w l lw 0.5 | ||

- | |||

- | #Graph 2 | ||

- | set size 0.5,1 | ||

- | set origin 0.5,0 | ||

- | plot 'data3.txt' w l lw 0.5 | ||

- | |||

- | unset multiplot | ||

- | </code> | ||

- | |||

- | Run the script: | ||

- | |||

- | <code> | ||

- | $ gnuplot | ||

- | gnuplot> load 'script_name' | ||

- | </code> | ||

- | |||

- | ==== 04. [10p] Animations ==== | ||

- | |||

- | For very basic animations in Gnuplot you would need to set up a loop, decide what commands to give to Gnuplot and use the pipe utility to pipe that command into Gnuplot. | ||

- | |||

- | Example 1: | ||

- | <code> | ||

- | $ for ((i=-70; i<70; i++)); do echo -e "set sample 50000; set yrange [-40:40]; plot $i*sin(x)*cos(x) \n"; done | gnuplot | ||

- | </code> | ||

- | |||

- | Example 2: | ||

- | |||

- | <code> | ||

- | $ for ((i=-100; i<100; i++)); do echo -e "set isosample 100; spl [:] [:] [-100:100] $i*(sin(sqrt(x**2+y**2))/sqrt(x**2+y**2)) \n"; done | gnuplot -persist | ||

- | </code> | ||

+ | <note>This lab is organized as a Jupyter Notebook hosted on Google Colab. There you will find some intuitions and applications for numpy and matplotlib. Check out the Tasks section below.</note> | ||

===== Tasks ===== | ===== Tasks ===== | ||

- | | + | {{namespace>:ep:labs:05:contents:tasks&nofooter&noeditbutton}} |

- | ==== 05. [20p] Gnuplot graphs ==== | + | |

- | | + | |

- | Using the Gnuplot documentation, implement a script that plots four graphs. Use the data from {{:ep:laboratoare:data4.txt|}} as follows: the first graph should plot columns 1 and 2, the second columns 1 and 3, the third one columns 1 and 4, and the fourth one should be a 3D graph plotting columns 1, 2 and 3. | + | |

- | * Use different colours for the data in each graph. | + | |

- | * Remove the keys for each graph. | + | |

- | * Give a title to each graph. | + | |

- | * Give names to each of the axes: X or Y (or Z for the 3D graph). In the case of the 3D graph: Make the numbers on the axes readable, and correct the position of the names of the axes if these are displayed over the axes numbers. | + | |

- | * Make the script generate the .eps file containing your plot. | + | |

- | | + | |

- | <note tip> | + | |

- | **Hint:** Consider the following commands: set key, xlabel, set title, set xtics, set terminal, set output, using. | + | |

- | </note> | + | |

- | | + | |

- | <solution -hidden> | + | |

- | <code> | + | |

- | | + | |

- | reset #flush all variables | + | |

- | | + | |

- | set term postscript color eps enhanced | + | |

- | set output 'myplot.eps' | + | |

- | set size 1,1 #use default pallet size(100% of width and height) | + | |

- | set multiplot | + | |

- | unset key | + | |

- | | + | |

- | #Graph 1 | + | |

- | set size 0.5,0.5 #half the width and height | + | |

- | set origin 0,0.5 #x,y | + | |

- | set title 'First' | + | |

- | plot 'data4.txt' using 1:2 w l lw 0.5 | + | |

- | | + | |

- | #Graph 2 | + | |

- | set size 0.5,0.5 | + | |

- | set origin 0.5,0.5 | + | |

- | set xlabel 'X' | + | |

- | set ylabel 'Y' | + | |

- | set title 'Second' | + | |

- | plot 'data4.txt' using 1:3 w l lw 0.5 | + | |

- | | + | |

- | #Graph 3 | + | |

- | set size 0.5,0.5 | + | |

- | set origin 0,0 | + | |

- | set xlabel 'X' | + | |

- | set ylabel 'Y' | + | |

- | set title 'Third' | + | |

- | plot 'data4.txt' using 1:4 w l lw 0.5 | + | |

- | | + | |

- | #Graph 4 | + | |

- | set size 0.5,0.5 | + | |

- | set origin 0.5,0 | + | |

- | set view 60,15 | + | |

- | set xtics 0,1000 | + | |

- | set ytics 1.8,0.3 | + | |

- | set ztics 1,0.2 | + | |

- | set xlabel 'X' offset 0,-1 | + | |

- | set ylabel 'Y' | + | |

- | set zlabel 'Z' | + | |

- | set title 'Fourth' | + | |

- | splot 'data4.txt' using 1:2:3 w l lw 0.5 lc rgb 'black' | + | |

- | set xtics auto | + | |

- | set ytics auto | + | |

- | set ztics auto | + | |

- | | + | |

- | unset multiplot | + | |

- | | + | |

- | </code> | + | |

- | </solution> | + | |

- | | + | |

- | ==== 06. [30p] Gnuplot bar graphs ==== | + | |

- | | + | |

- | Use Gnuplot and the data from {{:ep:labs:ep_lab5_autodata.txt|autoData.txt}} to generate separate **bar** graphs for the following: | + | |

- | * The "//MidPrice//" of all the "//small//" cars. | + | |

- | * The average fuel consumption (MPG - miles per gallon) for all the "//large//" cars. | + | |

- | * The "//MaxPrice//" over the average fuel consumption for all "//chevrolet//" and "//ford//" cars. | + | |

- | | + | |

- | The graphs should be as complete as possible (title, axes names, etc.) | + | |

- | | + | |

- | <note tip> | + | |

- | **Hint:** Gnuplot conditional plotting. | + | |

- | </note> | + | |

- | | + | |

- | <solution -hidden> | + | |

- | <code> | + | |

- | | + | |

- | with boxes - for a bar chart (set style fill solid - so that the bars are filled) | + | |

- | | + | |

- | reset | + | |

- | | + | |

- | set size 1, 1 | + | |

- | set multiplot layout 2,2 rowsfirst | + | |

- | | + | |

- | set title 'Graph Small Cars' | + | |

- | set xlabel 'Car' | + | |

- | set ylabel 'MidPrice' | + | |

- | unset key | + | |

- | plot "data5.txt" using (strcol(4) eq "small" ? $6 : 0) w l lw 0.5 lc rgb 'red' | + | |

- | | + | |

- | set title 'Graph Fuel Consumption' | + | |

- | set xlabel 'Car' | + | |

- | set ylabel 'Fuel' | + | |

- | unset key | + | |

- | plot "data5.txt" using (strcol(4) eq "large" ? ($8 + $9)/2 : 0) w l lw 0.5 lc rgb 'blue' | + | |

- | | + | |

- | set title 'Graph Avg' | + | |

- | set xlabel 'X' | + | |

- | set ylabel 'Y' | + | |

- | unset key | + | |

- | plot "data5.txt" using (strcol(2) eq 'chevrolet' || strcol(2) eq "ford" ? $7 : 0) w l lw 0.5 lc rgb 'green', \ | + | |

- | "data5.txt" using (strcol(2) eq 'chevrolet' || strcol(2) eq "ford" ? ($8 + $9)/2 : 0) w l lw 0.5 lc rgb 'orange' | + | |

- | | + | |

- | unset multiplot | + | |

- | </code> | + | |

- | </solution> | + | |

- | | + | |

- | ==== 07. [10p] Feedback ==== | ||

- | Please take a minute to fill in the **[[https://docs.google.com/forms/d/e/1FAIpQLSfsMBl2EFu10jJG2qHEiSsR-qYr3wkzQPfDwjhChKnjRtDT_w/viewform | feedback form]]** for this lab. | ||