This shows you the differences between two versions of the page.
ep:labs:061 [2019/09/27 06:30] andreea.alistar |
ep:labs:061 [2023/10/07 21:54] (current) emilian.radoi [[10p] Feedback] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Lab 06 - Advanced plotting ====== | + | ====== Lab 06 - Advanced plotting (seaborn & pandas) ====== |
- | + | ||
- | == You’ve got the basics, now let’s unleash the power! == | + | |
===== Objectives ===== | ===== Objectives ===== | ||
- | * Conditional plotting | + | * Introduction to pandas |
- | * Time-based data when plotting in gnuplot | + | * Easy data manipulations with pandas |
- | * Advanced plotting concepts: Histograms, animations, heatmaps, three-dimensional plots | + | * Introduction to seaborn |
- | * Insertion of graphics in the .tex file | + | * More types of cool-looking plots with seaborn |
- | + | * Apply what you learned to explore COVID data for Romania | |
- | ===== Contents ===== | + | |
- | + | ||
- | {{page>:ep:labs:061:meta:nav&nofooter&noeditbutton}} | + | |
- | + | ||
- | ===== Introduction ===== | + | |
- | + | ||
- | A quick plot is enough when you are exploring a data set or a function. But when you present your results to others you need to prepare the plots much more carefully so that they give the information to someone who does not know all the background you do. | + | |
- | + | ||
- | **Using PostScript plots with LaTeX** | + | |
- | + | ||
- | - Make sure all the individual image files are properly trimmed EPS files. | + | |
- | - Create a LaTeX document. | + | |
- | - Process this document using LaTeX. | + | |
- | - Use the dvips utility with the -E flag to turn the resulting DVI file into Encapsulated PostScript. | + | |
- | + | ||
- | ===== Summary from the previous laboratory ===== | + | |
- | + | ||
- | <code> | + | |
- | scatter plot: | + | |
- | plot ’data.txt’ using 1:2 | + | |
- | plot ’data.txt’ using 1:2 with points | + | |
- | + | ||
- | example for the short format: | + | |
- | p ’data.txt’ u 1:2 w p pt 1 lt 2 lw 2 | + | |
- | notitle | + | |
- | + | ||
- | line plot: | + | |
- | plot ’data.txt’ using 1:2 with lines | + | |
- | + | ||
- | multiple data series: | + | |
- | use replot or separate by commas | + | |
- | plot ’data.txt’ using 1:2, ’data.csv’ using 1:3 | + | |
- | + | ||
- | set key: | + | |
- | plot ’data.txt’ using 1:2 title "key" | + | |
- | </code> | + | |
- | + | ||
- | ===== Tutorial ===== | + | |
- | + | ||
- | {{namespace>:ep:labs:061:contents:tutorial&nofooter&noeditbutton}} | + | |
- | + | ||
- | ===== Exercises ===== | + | |
- | + | ||
- | == Exercise 01. [10p] Tutorials == | + | |
- | + | ||
- | * Go through tutorials. | + | |
- | + | ||
- | == Exercise 02. [10p] Conditional plotting== | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:conditional_plotting.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | Using Gnuplot, generate two separate bar graphs for the following: | + | |
- | * **calories_consumed/km-ran**. | + | |
- | * **sugar_consumed/km-ran**. | + | |
- | * **ratio = too high? colour the ticks in red : colour the ticks in green**. | + | |
- | The ratio is considered to be high enough when $6/$4 > 1. This will help you spot the people who live less healthy. The graphs should be as complete as possible (title, axes names, etc.). | + | |
- | + | ||
- | <solution -hidden> | + | |
- | set multiplot | + | |
- | plot "data.txt" using 1:($6 / $4 > 1? $5 : 1/0) lt rgb "red" | + | |
- | plot "data.txt" using 1:($6 / $4 < 1? $5 : 1/0) lt rgb "green" | + | |
- | + | ||
- | File generated with: | + | |
- | + | ||
- | <code python> | + | |
- | import random | + | |
- | nume = ["Cazan", "Ionescu", "Popescu", "Mateescu", "Pop", "Stancu", "Almas", "Bucur", "Ghelbea", "Rusu", "Toncu", "Bogza", "Avram", "Nicolae", "Bibescu"] | + | |
- | prenume = ["Andrei", "George", "Adrian", "Alexandra", "Mircea", "Andreea", "Ioana", "Dana", "Iulia", "Horia", "Vlad"] | + | |
- | out = open("data.txt", "w") | + | |
- | out.write("Idx\tName\tSurname\tKm_ran\tCalories_consumed\tSugar_consumed\n") | + | |
- | for i in range(0, 1000): | + | |
- | numei = random.choice(nume) | + | |
- | prenumei = random.choice(prenume) | + | |
- | km = round(random.uniform(0, 40), 1) | + | |
- | kcal = random.randint(1200, 7000) | + | |
- | sugar = random.randint(0, 100) | + | |
- | out.write(str(i) + "\t" + numei + "\t"+ prenumei + "\t" + str(km) + "\t" + str(kcal) + "\t" + str(sugar) + "\n") | + | |
- | out.close() | + | |
- | </code> | + | |
- | </solution> | + | |
- | + | ||
- | == Exercise 03. [10p] Stats == | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:health.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | Use Gnuplot to generate the following graphs: | + | |
- | * Using the 'stats' command, find out the mean and standard deviation value for the “Temperature” and “Heart Rate” columns. | + | |
- | * Create a rectangle that contains all the data points considered to be in the average normal values (assume that the “normal” values should be in the interval [mean-stddev, mean+stddev]). | + | |
- | * Create a multiplot containing 3 plots using the “Temperature” and “Heart Rate” columns: one for all genders, one for males and one for females. | + | |
- | * The graphs should be as complete as possible (title, axes names, etc.) | + | |
- | + | ||
- | <solution -hidden> | + | |
- | <code bash> | + | |
- | reset #flush all variables | + | |
- | + | ||
- | set size 1, 1 | + | |
- | set multiplot layout 2,2 rowsfirst | + | |
- | + | ||
- | stats 'health.txt' using 2:4 nooutput | + | |
- | + | ||
- | set object 1 rect from STATS_mean_x -STATS_stddev_x,STATS_mean_y - STATS_stddev_y to STATS_mean_x + STATS_stddev_x, STATS_mean_y + STATS_stddev_y lw 2 | + | |
- | + | ||
- | set title 'All genders' | + | |
- | set xlabel 'Temperature(F)' | + | |
- | set ylabel 'Heart Rate' | + | |
- | unset key | + | |
- | plot 'health.txt' using 2:4 | + | |
- | + | ||
- | set title 'Male' | + | |
- | set xlabel 'Temperature(F)' | + | |
- | set ylabel 'Heart Rate' | + | |
- | unset key | + | |
- | plot 'health.txt' using (strcol(3) eq "male" ? $2: 1/0):4 | + | |
- | + | ||
- | set title 'Female' | + | |
- | set xlabel 'Temperature(F)' | + | |
- | set ylabel 'Heart Rate' | + | |
- | unset key | + | |
- | plot 'health.txt' using (strcol(3) eq "female" ? $2: 1/0):4 | + | |
- | </code> | + | |
- | </solution> | + | |
- | + | ||
- | == Exercise 04. [10p] Time-based data when plotting in gnuplot == | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:time_data.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | Using the code provided in “Tutorial 02. Time-based data when plotting in gnuplot”, use the histogram style, and format the xtic labels using strftime and timecolumn. | + | |
- | + | ||
- | <code> | + | |
- | set timefmt "%H:%S" | + | |
- | set style fill solid 0.6 border -1 | + | |
- | set style data histogram | + | |
- | set style histogram clustered gap 1 plot 'data.dat' using 2:xtic(strftime('%H', timecolumn(1))), \ '' using ($2*0.5), \ '' using ($2*0.7) | + | |
- | </code> | + | |
- | + | ||
- | == Exercise 05. [10p] Plot histograms == | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:histograms.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | == [5p] Task A - Multiple histograms == | + | |
- | + | ||
- | Using Gnuplot, create multiple histograms with '**set style histogram**' and '**boxes**'. | + | |
- | + | ||
- | == [5p] Task B - Bar graphs == | + | |
- | + | ||
- | Create a simple bar graph. Remember to make the lines solid. | + | |
- | * Style your bars differently (set a different color for every bar). | + | |
- | * Do multiple bars for each entry. | + | |
- | * Use a function to pick the colors you want. Remember to set width and fill. | + | |
- | + | ||
- | == Exercise 05. [10p] Animations == | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:animations.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | * Create a script that animates a trajectory. Set a circle in the centre as a green filled circle. | + | |
- | * **Hint:** Check the code from “Tutorial 04. Animations” and adjust. | + | |
- | + | ||
- | <solution -hidden> | + | |
- | <code bash> | + | |
- | reset | + | |
- | + | ||
- | # Plot setting | + | |
- | # ------------------ | + | |
- | set xrange [-1:1] | + | |
- | set yrange [-1:1] | + | |
- | + | ||
- | set xlabel "x" font ", 18" | + | |
- | set ylabel "y" font ", 18" | + | |
- | set ylabel "z" font ", 18" | + | |
- | + | ||
- | unset key | + | |
- | + | ||
- | set pointsize 2 # symbol size | + | |
- | set style line 2 lc rgb '#0060ad' pt 7 # circle | + | |
- | + | ||
- | set object circle at first 0,0 size scr 0.01 fillcolor rgb 'green' fillstyle solid | + | |
- | + | ||
- | do for [ii=1:3762] { | + | |
- | title = sprintf ("Step = %d",ii) | + | |
- | set title title | + | |
- | plot 'data0.txt' using 2:3 every ::ii::ii linestyle 2 | + | |
- | pause 0.02 | + | |
- | } | + | |
- | </code> | + | |
- | </solution> | + | |
- | + | ||
- | == Exercise 06. [20p] Heatmaps == | + | |
- | + | ||
- | <note warning> | + | |
- | Datafile: {{:ep:labs:heatmaps.txt|}} | + | |
- | </note> | + | |
- | + | ||
- | == [10p] Task A - With image/pm3d/dgrid3d == | + | |
- | + | ||
- | Using Gnuplot, create heatmaps using: | + | |
- | * **“with image”** | + | |
- | * **“pm3d/dgrid3d”** and **“splot”** | + | |
- | + | ||
- | == [10p] Task B - Interpolation == | + | |
- | + | ||
- | Create heatmap WITHOUT interpolation; | + | |
- | * As default, pm3d uses a color map which varies from black to yellow via blue and red. Change the pallete! | + | |
- | * Double the number of visible points. | + | |
- | * Question: Have Gnuplot choose the correct number of interpolation points by itself. | + | |
- | + | ||
- | <solution -hidden> | + | |
- | <code> | + | |
- | set pm3d map | + | |
- | splot ‘map_data.txt’ matrix | + | |
- | set palette rgbformulae 33,13,10 | + | |
- | OR | + | |
- | set palette negative | + | |
- | OR | + | |
- | set palette grey | + | |
- | </code> | + | |
- | or | ||
- | <code> | ||
- | set view map | ||
- | set yrange [0.4:0.8] | ||
- | set xrange [0.2:0.8] | ||
- | set dgrid3d 100,100,4 | ||
- | splot 'map_data.txt' u 1:2:3 w pm3d | ||
- | </code> | ||
- | </solution> | ||
- | <solution --hidden> | + | ===== Resources ===== |
- | <code> | + | |
- | set pm3d map interpolate 2,2 | + | |
- | splot ‘map_data.txt’ matrix | + | |
- | </code> | + | |
- | </solution> | + | |
- | <solution --hidden> | + | In this lab, we will study the basic pandas API for easier data manipulation, and seaborn for more advanced, visually appealing plots that are also easy to produce. |
- | <code> | + | |
- | set pm3d map interpolate 0,0 | + | |
- | </code> | + | |
- | </solution> | + | |
- | == Exercise 07. [20p] Latex == | + | For the exercises, you will explore the evolution of the COVID pandemic in Romania, using the information learned in this lab. |
- | <note warning> | + | Scientific computing calls for an environment that is easy to use and provides tools for manipulating data and visualizing results. We will use Google Colab, which comes with a variety of useful tools already installed. |
- | Datafile: {{:ep:labs:heat_map_data.txt|}} | + | |
- | </note> | + | |
- | == [10p] Task A - 2D maps == | + | Check out these cheat sheets for quick reference to the common libraries: |
- | Use Gnuplot to create three 2D maps in a single 3D graph. Export the result as a .pdf file (using gnuplottex package) and include also a \caption{Describe how you did the exercise}. Hint: You have to give the splot command 4 pieces of information: the x, y, and the z coordinate,and the value for the color. | + | **Cheat sheets:** |
- | <code> | + | - [[https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf|python]] |
- | set view 55,110 | + | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf|numpy]] |
- | splot "heat_map_data.txt" matrix u 1:2:(-0.5):3 w image, \ | + | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf|matplotlib]] |
- | "" matrix u 1:(-0.5):2:3 w image, \ | + | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf|sklearn]] |
- | "" matrix u (-0.5):1:2:3 w image | + | - [[https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf|pandas]] |
- | </code> | + | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf|seaborn]] |
- | == [5p] Task B - Generate pdf == | + | <note>This lab is organized in a Jupyter Notebook hosted on Google Colab. There you will find some intuitions and applications for pandas and seaborn. Check out the Tasks section below.</note> |
- | Create myscript.tex and add the lines below. You should put in your 'begin{gnuplot}…end{gnuplot}' your solution for plotting. The main advantage for using gnuplottex is that you are allowed to use gnuplot directly inside the .tex file. | + | ===== Tasks ===== |
- | <code> | + | ==== Google Colab Notebook ==== |
- | \documentclass[a4paper]{article} | + | |
- | \usepackage{gnuplottex} | + | |
- | + | ||
- | \begin{document} | + | |
- | + | ||
- | \begin{gnuplot}[terminal=pdf,terminaloptions={font ",10" linewidth 3}] | + | |
- | plot sin(x), cos(x) | + | |
- | \end{gnuplot} | + | |
- | + | ||
- | \begin{gnuplot}[scale=0.8] | + | |
- | set grid | + | |
- | set title 'gnuplottex test $e^x$' | + | |
- | set ylabel '$y$' | + | |
- | set xlabel '$x$' | + | |
- | plot exp(x) with linespoints | + | |
- | \end{gnuplot} | + | |
- | + | ||
- | \end{document} | + | |
- | </code> | + | |
- | == [5p] Task C - Compile == | ||
- | Compile it! Your final result should look like this: myscript.pdf. | + | For this lab, we will use Google Colab for exploring pandas and seaborn. Please solve your tasks [[https://github.com/cosmaadrian/ml-environment/blob/master/EP_Plotting_II.ipynb|here]] by clicking "**Open in Colaboratory**". |
- | <code> | + | You can then export this Python notebook as a PDF (**File -> Print**) and upload it to **Moodle**. |
- | #compile with | + | |
- | pdflatex --shell-escape myscript.tex | + | |
- | </code> | + | |
- | Observations: If gnuplottex is missing, here is gnuplottex.sty | + | ==== [10p] Feedback ==== |
- | === 05 - Feedback === | + | Please take a minute to fill in the **[[https://forms.gle/NpSRnoEh9NLYowFr5 | feedback form]]** for this lab. |
- | * Please take a minute to fill in the **[[https://docs.google.com/forms/d/e/1FAIpQLSfsMBl2EFu10jJG2qHEiSsR-qYr3wkzQPfDwjhChKnjRtDT_w/viewform | feedback form]]** for this lab. |