Differences

ep:labs:05 [2019/10/27 16:59]
emilian.radoi
ep:labs:05 [2025/04/01 10:33] (current)
cezar.craciunoiu Move introduction to spoiler
Line 1: Line 1:
-====== Lab 05 - Plotting ======
+====== Lab 05 - I/O Monitoring ======
  
 ===== Objectives ===== ===== Objectives =====
  
-  * Offer an introduction to Gnuplot
+  * Offer an introduction to I/O monitoring.
-  * Get you familiarised with basic plots in Gnuplot
+  * Get you acquainted with a few standard Linux monitoring tools and their outputs, for monitoring the impact of I/O on the system.
+  * Give you an intuition for comparing two relatively similar systems that differ in their I/O behaviour.
  
 ===== Contents ===== ===== Contents =====
Line 11: Line 11:
 {{page>:​ep:​labs:​05:​meta:​nav&​nofooter&​noeditbutton}} {{page>:​ep:​labs:​05:​meta:​nav&​nofooter&​noeditbutton}}
  
 +===== Proof of Work =====
  
-===== Gnuplot Introduction =====
+Before you start, create a [[http://docs.google.com/|Google Doc]]. Here, you will add screenshots / code snippets / comments for each exercise. Whatever you decide to include, it must prove that you managed to solve the given task (so don't show just the output, but how you obtained it and what conclusion can be drawn from it). If you decide to complete the feedback for bonus points, include a screenshot with the form submission confirmation, but not with its contents.
  
-Gnuplot is a free, command-driven, interactive, function and data plotting program. It can be downloaded at https://sourceforge.net/projects/gnuplot/. The official Gnuplot documentation can be found at http://gnuplot.sourceforge.net/documentation.html.
+When done, export the document as //pdf// and upload it to the appropriate assignment on [[https://curs.upb.ro/2023/course/view.php?id=4631#section-4|moodle]]. The deadline is 23:55 on Friday.
  
-It was originally created to allow scientists and students to visualize mathematical functions and data interactively,​ but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications such as Octave.+===== Introduction =====
  
-The command language of Gnuplot is case sensitive, i.e. commands and function names written in lowercase are not the same as those written in capitals. All command names may be abbreviated as long as the abbreviation is not ambiguous. Any number of commands may appear on a line, separated by semicolons (;). Strings may be set off by either single or double quotes, although there are some subtle differences. See syntax (p. 44) and quotes (p. 44) for more details in the Gnuplot documentation (http://​gnuplot.sourceforge.net/​docs_5.0/​gnuplot.pdf).+<​spoiler>​
  
-Commands may extend over several input lines by ending each line but the last with a backslash (\). The backslash must be the last character on each line. The effect is as if the backslash and newline were not there. That is, no white space is implied, nor is a comment terminated.
+<note important>Disk I/O subsystems are the slowest part of any Linux system. This is mainly due to their distance from the CPU and, for older HDDs, the fact that the disk relies on physical movement to work (rotation and seek). If the time taken to access disk as opposed to memory were converted into days and minutes, it would be the difference between 7 days and 7 minutes. As a result, it is essential that the Linux kernel minimises the number of I/O operations it generates on disk. </note>
+The following subsections describe the different ways the kernel processes data I/O from disk to memory and back.
  
-For built-in help on any topic, type **help** followed by the name of the topic or **help ?** to get a menu of available topics.+**01. Reading and Writing Data - Memory Pages**
  
+The Linux kernel breaks disk I/O into pages. The default page size on most Linux systems is **4K**. It reads and writes disk blocks in and out of memory in 4K page sizes. You can check the page size of your system with **getconf** (the time command in verbose mode also reports the page size):
  
-===== Tutorial =====+//# getconf PAGESIZE//
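As a small illustration of the page size check, the sketch below stores the value reported by ''getconf'' and converts it to KiB. The variable names are chosen only for this example.

```shell
# Query the kernel page size in bytes.
page_size=$(getconf PAGESIZE)

# Express it in KiB for easier comparison with the 4K figure above.
page_kib=$((page_size / 1024))
echo "Page size: ${page_size} bytes (${page_kib} KiB)"
```

On most x86 systems this is expected to report 4096 bytes, but other architectures and kernel configurations may use larger pages.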
  
 +**02. Major and Minor Page Faults**
  
-==== 01. [10p] Basic Plotting ====
+Linux, like most UNIX systems, uses a **virtual memory layer** that maps into physical address space. This mapping is **"on-demand"** in the sense that when a process starts, the kernel only maps what is required. When an application starts, the kernel searches the CPU caches and then physical memory. If the data does not exist in either, the kernel issues a **Major Page Fault** (MPF). An MPF is a request to the disk subsystem to retrieve pages off the disk and buffer them in RAM.
  
-Download the following sets of data: {{:ep:laboratoare:data1.txt|}} and {{:ep:laboratoare:data2.txt|}}, and update gnuplot:
+Once memory pages are mapped into the buffer cache, the kernel will attempt to use these pages, resulting in a **Minor Page Fault** (MnPF). An MnPF saves the kernel time by reusing a page already in memory as opposed to fetching it from the disk again.
-<​code>​ +
-$ sudo apt-get install gnuplot +
-</​code>​+
  
-Start gnuplot by using the command:+<​note>​ 
 +To find out how many MPF and MnPF occurred when an application starts, ​the time command ​can be used:
  
-<code>
+<code bash>
-$ gnuplot
+# /usr/bin/time -v evolution
 </​code>​ </​code>​
  
-The default terminal type is dependent on your environment. One of the recommended terminal types in terms of flexibility and functionality is //wxt enhanced//. To set the terminal type use:+As an alternative,​ a more elegant solution for a specific pid is:
  
-''gnuplot> set terminal wxt enhanced''
+<code bash>
-
+# ps -o min_flt,maj_flt ${pid}
-If setting the terminal to //wxt enhanced// doesn't work, use the default terminal. If it is desired to display the terminal parameters at any point, use:
- +
-''​gnuplot>​ show terminal''​ +
- +
-Plot the sets of data found in //​data1.txt//​ and //​data2.txt//​ on the same graph: +
- +
-''​gnuplot>​ plot '​data1.txt',​ '​data2.txt'''​ +
- +
-In case the window plot does not appear, you might be missing gnuplot-x11. Try installing it: +
- +
-''​sudo apt-get install gnuplot-x11''​ +
- +
-Set a title for the plot: +
- +
-''​gnuplot>​ set title '​Example 1'''​ +
- +
-Plot the data again for the title to appear. Notice that Gnuplot automatically selects different colours for each dataset. Change the colour for both datasets to black. This can be done with the lc parameter (stands for line colour). The command below changes the colour of the data from data2.txt to black. +
- +
-''​gnuplot>​ plot '​data1.txt',​ '​data2.txt'​ lc rgb '​black'''​ +
- +
-The next step is to assign label names to the two axis: +
- +
-''​gnuplot>​ set xlabel 'X Label' \\  +
-gnuplot> set ylabel 'Y Label' \\  +
-gnuplot> plot '​data1.txt'​ lc rgb '​black',​ '​data2.txt'​ lc rgb '​black'''​ +
- +
-The top-right corner of the plot area displays the names of the files containing the data, along with the symbol type associated with each data file. This is usually not something you would want to have on a plot. To remove it use: +
- +
-''​gnuplot>​ unset key \\  +
-gnuplot> plot '​data1.txt'​ lc rgb '​black',​ '​data2.txt'​ lc rgb '​black'''​ +
- +
-If you want to set it back you can use the set key command: +
- +
-''​gnuplot>​ set key''​ +
- +
-The text of the keys can be changed when plotting the data like this: +
- +
-''​gnuplot>​ plot '​data1.txt'​ lc rgb '​black'​ title 'Data 1', '​data2.txt'​ lc rgb '​black'​ title 'Data 2'''​ +
- +
-This is how the plot should look like so far: +
- +
-{{ :​ep:​laboratoare:​ep_l3_p1.png?​500 |}+
- +
-It can be noticed that the data is obscuring the keys. In order to get the data off the keys, we can increase the Y axis to go all the way to 900. The X and Y ranges can be set as follows, using hard brackets and separating the low and high ranges by semicolon:​ +
- +
-''​gnuplot>​ plot [:] [:900] '​data1.txt'​ lc rgb '​black'​ title 'Data 1', '​data2.txt'​ lc rgb '​black'​ title 'Data 2'''​ +
- +
-There is a gap showing on the right-hand side of the graph. This can be eliminated by setting the high range for the X axis to the number of rows in the data files which is 1024: +
- +
-''​gnuplot>​ plot [:1024] [:900] '​data1.txt'​ lc rgb '​black'​ title 'Data 1', '​data2.txt'​ lc rgb '​black'​ title 'Data 2'''​ +
- +
-Saving the plot to an encapsulated postscript (eps is a vector graphic that can used in Latex documents) can be done as follows: +
- +
-''​gnuplot>​ set terminal postscript eps enhanced “Helvetica” 24 \\  +
-gnuplot> set output '​exercise1.eps'​ \\  +
-gnuplot> replot''​ +
- +
-Exit gnuplot, and notice that exercise1.eps was created. Open it and check how it looks. +
- +
- +
-==== 02. [10p] Fitting ==== +
- +
-This exercise is aiming to familiarise you with using Gnuplot to do fitting. +
-Start Gnuplot, change terminal to //wxt enhanced// and plot the data from {{:​ep:​laboratoare:​data3.txt|}}. +
-The aim here is to fit a straight line between 800 and 1500. Plot the data from 800 to 1500 for a better look. This will give you the section that you will be trying to fit on a straight line (use //w l// (//with line//) to connect the points). +
- +
-''​gnuplot>​ plot [800:1500] '​data3.txt'​ \\  +
-gnuplot> plot [800:1500] '​data3.txt'​ w l''​ +
- +
-You need to specify the straight line: +
- +
-''​gnuplot>​ f(x) = a + b*x''​ +
- +
-Fit the data between 800 and 1500 using the function that was just defined //f(x)//, by varying the parameters //a// and //b//: +
- +
-''​gnuplot>​ fit [800:1500] f(x) '​data3.txt'​ via a,​b''​ +
- +
-{{ :​ep:​laboratoare:​ep_l3_p2.png?​500 |}} +
- +
-The fit information shows the reduced chi-squared statistic which is used in the goodness of fit testing (https://​en.wikipedia.org/​wiki/​Reduced_chi-squared_statistic),​ along with the slope parameter (//b//) with its uncertainty and the uncertainty percentage, the offset parameter (//a//) with the uncertainty in the offset and the uncertainty percentage, and the correlation matrix of the fit parameters. +
- +
-Plot the fit on top of the initial data (use //lw// (//line width//) to set the thickness of the line): +
- +
-''​gnuplot>​ plot [800:1500] '​data3.txt'​ w l, f(x) lw 3''​ +
- +
-{{ :​ep:​laboratoare:​ep_l3_p3.png?​500 |}} +
- +
-Look again at the data from //​data3.txt//,​ and fit a polynomical line between 0 and 1300: +
- +
-''​gnuplot>​ f(x) = a+ b*x+ c*x*x \\  +
-gnuplot> fit [0:1300] f(x) '​data3.txt'​ via a,b,c \\  +
-gnuplot> plot [0:1300] '​data3.txt'​ w l, f(x) lw 3''​ +
- +
-{{ :​ep:​laboratoare:​ep_l3_p4.png?​500 |}} +
- +
-==== 03. [10p] Gnuplot Scripts ==== +
- +
-The benefit of using a script is that you do not have to retype everything every time you want to make a change or want to reproduce your plot. The following is an example of a Gnuplot script that plots two graphs on a single plot. +
- +
-<​code>​ +
-reset # flush all the variables +
- +
-set size 1,1 # use default pallet size (100% of width and height) +
-set multiplot ​  +
- +
-#Graph 1 +
-set size 0.5,1 #half the width and full height  +
-set origin 0,0 #x,y  +
-plot '​data2.txt'​ w l lw 0.5 +
- +
-#Graph 2 +
-set size 0.5,1  +
-set origin 0.5,0 +
-plot '​data3.txt'​ w l lw 0.5 +
- +
-unset multiplot+
 </​code>​ </​code>​
 +</​note>​
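The per-process fault counters can also be read directly from ''/proc''. The following is a minimal sketch, assuming a Linux system with ''/proc'' mounted; it inspects the current shell (''$$'') purely for illustration, and the variable names are not part of the lab.

```shell
# PID to inspect; the current shell is used only as an example.
pid=$$

# Fields 10 and 12 of /proc/<pid>/stat are minflt and majflt (see proc(5)).
# Assumes the process name contains no spaces, which would shift the fields.
min_flt=$(awk '{print $10}' "/proc/${pid}/stat")
maj_flt=$(awk '{print $12}' "/proc/${pid}/stat")

echo "PID ${pid}: minor faults=${min_flt}, major faults=${maj_flt}"
```

A process whose files are already in the buffer cache will typically show many minor faults and few or no major faults.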
  
-Run the script:+**03. The File Buffer Cache**
  
-<code>
+The **file buffer cache** is used by the kernel to **minimise MPFs and maximise MnPFs**. As a system generates I/O over time, this buffer cache will continue to grow, because the system leaves these pages in memory until memory gets low and the kernel needs to "**free**" some of them for other uses. The result is that many system administrators see low amounts of free memory and become concerned when, in reality, the system is just making good use of its caches ;-)
-$ gnuplot  +
-gnuplot> load '​script_name'​ +
-</code>+
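To see how much memory is really free versus held in caches, the relevant counters can be read from ''/proc/meminfo''. This is a sketch under the assumption of a standard Linux ''/proc''; the variable names are illustrative.

```shell
# Print total, free, buffer and page cache memory (values are in kB).
grep -E '^(MemTotal|MemFree|Buffers|Cached):' /proc/meminfo

# Extract two of the values for a quick side-by-side comparison.
mem_free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "free=${mem_free_kb} kB, page cache=${cached_kb} kB"
```

A low ''MemFree'' combined with a large ''Cached'' value is the situation described above: the memory is in use by the cache, not lost.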
  
-==== 04. [10p] Animations ==== 
  
-For very basic animations in Gnuplot you would need to set up a loop, decide what commands to give to Gnuplot and use the pipe utility to pipe that command into Gnuplot.
+**04. Types of Memory Pages**
  
-Example 1:
+There are **3** types of memory pages in the Linux kernel:
-<code>
+  * **Read Pages** – Pages of data read in via disk (MPF) that are read-only and backed on disk. These pages exist in the Buffer Cache and include **static files**, **binaries**, and **libraries** that do not change. The kernel will continue to page these into memory as it needs them. If the system becomes short on memory, the kernel will "steal" these pages and place them back on the free list, causing an application to have to MPF to bring them back in.
-$ for ((i=-70; i<70; i++)); do echo -e "set sample 50000; set yrange [-40:40]; plot $i*sin(x)*cos(x) \n"; done | gnuplot
+  * **Dirty Pages** – Pages of data that have been modified by the kernel while in memory. These pages need to be synced back to disk at some point by the pdflush daemon. In the event of a memory shortage, kswapd (along with pdflush) will write these pages to disk in order to make room in memory.
-</code>
+  * **Anonymous Pages** – Pages of data that do belong to a process, but do not have any file or backing store associated with them. They can't be synchronised back to disk. In the event of a memory shortage, kswapd writes these to the swap device as temporary storage until more RAM is free ("swapping" pages).
  
-Example 2:+**05. Writing Data Pages Back to Disk**
  
-<​code>​ +Applications themselves may choose to write **dirty pages** back to disk immediately using the **fsync()** or **sync()** system calls. These system calls issue a direct request to the **I/O scheduler**. If an application does not invoke these system calls, the pdflush kernel daemon runs at periodic intervals and writes pages back to disk.
-$ for ((i=-100; i<100; i++)); do echo -e "set isosample 100; spl [:] [:] [-100:100] $i*(sin(sqrt(x**2+y**2))/sqrt(x**2+y**2)) \n"; done | gnuplot -persist +
-</code>+
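The effect of an explicit flush can be tried from the shell. The sketch below assumes a ''dd'' that supports ''conv=fsync'' (GNU coreutils does) and writes to a temporary file created only for this example.

```shell
# Write 1 MiB (256 pages of 4 KiB) to a temporary file.
# conv=fsync makes dd call fsync() on the output file before exiting,
# so the dirty pages are flushed instead of waiting for periodic writeback.
tmp_file=$(mktemp)
dd if=/dev/zero of="$tmp_file" bs=4096 count=256 conv=fsync 2>/dev/null

file_bytes=$(wc -c < "$tmp_file" | tr -d ' ')
echo "Wrote ${file_bytes} bytes and synced them to disk"

# System-wide dirty and in-flight writeback memory, in kB.
grep -E '^(Dirty|Writeback):' /proc/meminfo

rm -f "$tmp_file"
```

Running the same ''dd'' without ''conv=fsync'' leaves the flush to the kernel, in which case the ''Dirty'' counter may stay elevated for a short while afterwards.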
  
 +**Monitoring I/O**
  
-===== Tasks =====+<note important>​Certain conditions occur on a system that may create I/O bottlenecks. These conditions may be identified by using a standard set of system monitoring tools. These tools include **top**, **vmstat**, **iostat**, and **sar**. There are some similarities between the outputs of these commands, but for the most part, each offers a unique set of output that provides a different aspect on performance. The following subsections describe conditions that cause **I/O bottlenecks**.</​note>​
  
 +**Calculating IOs Per Second**
  
-==== 05. [20p] Gnuplot graphs ====
+Every I/O __request__ to a disk takes a certain amount of time. This is due primarily to the fact that a //disk must spin// and //a head must seek//. The spinning of a disk is often referred to as "**rotational delay**" (RD) and the moving of the head as a "**disk seek**" (DS). The time it takes for each I/O request is calculated by __adding__ DS and RD. A disk's RD is fixed based on the RPM of the drive. An RD is considered half a revolution around a disk.
  
-Using the Gnuplot documentation, implement a script that plots four graphs. Use the data from {{:ep:laboratoare:data4.txt|}} as follows: the first graph should plot columns 1 and 2, the second columns 1 and 3, the third one columns 1 and 4, and the fourth one should be a 3D graph plotting columns 1, 2 and 3.
+Each time an application issues an I/O, it takes an average of 8 ms to service that I/O on a 10K RPM disk. Since this is a fixed time, it is imperative that the disk be as efficient as possible with the time it will spend reading and writing to the disk. The amount of I/O requests is often measured in I/Os Per Second (IOPS). The 10K RPM disk has the ability to push 120 to 150 (burst) IOPS. To measure the effectiveness of IOPS, divide the amount of data read or written per second by the number of IOPS; the result is the amount of data (KB) moved per I/O.
-  * Use different colours for the data in each graph. +
-  * Remove ​the keys for each graph. +
-  * Give a title to each graph. +
-  * Give names to each of the axes: X or Y (or Z for the 3D graph). In the case of the 3D graph: Make the numbers on the axes readableand correct ​the position ​of the names of the axes if these are displayed over the axes numbers. +
-  * Make the script generate the .eps file containing your plot.+
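The 8 ms figure for a 10K RPM disk can be reproduced with a short calculation. This is a sketch only: the 3 ms seek time and the 2 ms data transfer time are assumed values, chosen here so that the total matches the 8 ms quoted above; real drives differ.

```shell
rpm=10000        # drive speed from the text
seek_ms=3        # assumed average seek time (DS)
transfer_ms=2    # assumed data transfer time

iops_report=$(awk -v rpm="$rpm" -v seek="$seek_ms" -v xfer="$transfer_ms" 'BEGIN {
    rotation_ms = 60000 / rpm      # time for one full revolution
    rd_ms = rotation_ms / 2        # RD: half a revolution on average
    io_ms = seek + rd_ms + xfer    # total service time per I/O
    printf "RD = %.1f ms, time per I/O = %.1f ms, IOPS = %.0f", rd_ms, io_ms, 1000 / io_ms
}')
echo "$iops_report"
# prints: RD = 3.0 ms, time per I/O = 8.0 ms, IOPS = 125
```

The resulting 125 IOPS falls inside the 120 to 150 range mentioned above.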
  
-<note tip>
+**Random vs Sequential I/O**
- **Hint:** Consider the following commands: set key, xlabel, set title, set xtics, set terminal, set output, using.
+The relevance of KB per I/O depends on the __workload__ of the system. There are two different types of workload categories on a system: sequential and random.
-</​note>​+
  
-<solution -hidden>
+**Sequential I/O** - The **iostat** command provides information on IOPS and the amount of data processed during each I/O. Use the **-x** switch with **iostat** (//iostat -x 1//). **Sequential workloads** require large amounts of data to be read sequentially and at once. These include applications such as enterprise databases executing large queries and streaming media services capturing data. With sequential workloads, the KB per I/O ratio should be high. Sequential workload performance relies on the ability to move large amounts of data as fast as possible. If each I/O costs time, it is imperative to get as much data out of that I/O as possible.
-<​code>​+
  
-reset #flush all variables+**Random I/O** - Random access workloads do not depend as much on size of data. They depend primarily on the amount of IOPS a disk can push. Web and mail servers are examples of random access workloads. The I/O requests are rather small. Random access workload relies on how many requests can be processed at once. Therefore, the amount of IOPS the disk can push becomes crucial.
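The KB per I/O ratio that separates the two workload types can be computed from the throughput and request rate columns reported by **iostat**. In the sketch below, ''kb_per_io'' is a hypothetical helper and the input numbers are illustrative values, not measurements from this lab.

```shell
# kb_per_io THROUGHPUT_KB_PER_S IOPS -> whole KB moved per I/O request
kb_per_io() {
    echo $(( $1 / $2 ))
}

# Illustrative sequential workload: high throughput, moderate request rate.
seq_kb=$(kb_per_io 53040 105)

# Illustrative random workload: similar request rate, far less data.
rand_kb=$(kb_per_io 2640 102)

echo "sequential: ${seq_kb} KB per I/O, random: ${rand_kb} KB per I/O"
# prints: sequential: 505 KB per I/O, random: 25 KB per I/O
```

With a comparable number of requests per second, the sequential case moves roughly twenty times more data per request, which is why IOPS alone does not describe disk efficiency.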
  
-set term postscript color eps enhanced +**When Virtual Memory Kills I/O**
-set output '​myplot.eps'​ +
-set size 1,1 #use default pallet size(100% of width and height) +
-set multiplot +
-unset key+
  
-#Graph 1
+If the system does not have enough **RAM** to accommodate all requests, it must start to use the **SWAP** device. As with file system I/Os, writes to the SWAP device are just as costly. If the system is extremely deprived of RAM, it is possible that it will create a __paging storm__ to the SWAP disk. If the SWAP device is on the same file system as the data being accessed, the system will enter into contention for the I/O paths. This will cause a complete **performance breakdown** on the system. If pages can't be read or written to disk, they will stay in RAM longer. If they stay in RAM longer, the kernel will need to free the RAM. The problem is that the __I/O channels__ are so __clogged__ that nothing can be done. In the worst case, this leads to a __kernel panic and crash of the system__.
-set size 0.5,0.5 #half the width and height +
-set origin 0,0.5 #x,+
-set title '​First'​ +
-plot 'data4.txt' ​using 1:2  w l lw 0.5+
  
-#Graph 2
+The following **vmstat** output demonstrates a system under memory distress. It is writing data out to the swap device:
-set size 0.5,0.5  +
-set origin 0.5,0.5 +
-set xlabel '​X'​ +
-set ylabel '​Y'​ +
-set title '​Second'​ +
-plot '​data4.txt'​ using 1:3 w l lw 0.5+
  
-#Graph 3 +{{ :​ep:​laboratoare:ep2_poz1.png?550 |}}
-set size 0.5,0.5 +
-set origin 0,0 +
-set xlabel '​X'​ +
-set ylabel '​Y'​ +
-set title '​Third'​ +
-plot '​data4.txt'​ using 1:4 w l lw 0.5+
  
-#Graph 4
+<note tip>The previous output demonstrates a __large amount of read requests__ into memory (**bi**). The requests are so many that the system is short on memory (**free**). This is causing the __system to send blocks to the swap device__ (**so**) and the size of swap keeps growing (**swpd**). Also notice a large percentage of WIO time (**wa**). This indicates that the __CPU is starting to slow down__ because of I/O requests. Furthermore, **id** represents the time spent idle; idle time during which the CPU is waiting for I/O is counted under **wa**. </note>
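The same reading of **vmstat** can be automated. The sketch below defines a hypothetical helper, ''flag_swapping'', that locates the **so** and **wa** columns by their header names and prints every sample with swap-out activity. The sample input is made up for illustration and is not the output shown in the screenshot.

```shell
# flag_swapping: read vmstat output on stdin and print the samples that
# show swap-out activity (so > 0), together with their I/O wait (wa).
flag_swapping() {
    awk '
        $1 == "r" {                          # header row: remember column positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("so" in col) && $1 ~ /^[0-9]+$/ && $(col["so"]) > 0 {
            printf "swap-out=%s wa=%s%%\n", $(col["so"]), $(col["wa"])
        }
    '
}

# Made-up sample in the vmstat column layout, for illustration only.
sample_output=' r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 500000  20000 300000    0    0    10    20  100  200  5  2 92  1  0
 2  3 250000   4000    800  12000  120 3400 15000  3600 1200 2500  3  7  0 90  0'

flagged=$(printf '%s\n' "$sample_output" | flag_swapping)
echo "$flagged"
# prints: swap-out=3400 wa=90%
```

On a live system the helper could be fed directly, for example with ''vmstat 1 5 | flag_swapping''.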
-set size 0.5,0.+
-set origin 0.5,0 +
-set view 60,15 +
-set xtics 0,1000 +
-set ytics 1.8,0.+
-set ztics 1,0.2 +
-set xlabel '​X'​ offset 0,-1 +
-set ylabel '​Y'​ +
-set zlabel '​Z'​ +
-set title '​Fourth'​ +
-splot '​data4.txt'​ using 1:2:3 w l lw 0.5 lc rgb '​black'​ +
-set xtics auto +
-set ytics auto +
-set ztics auto+
  
-unset multiplot+To see the effect the swapping to disk is having on the system, check the swap partition on the drive using **iostat**.
  
-</​code>​ +{{ :​ep:​laboratoare:​ep2_poz2.png?​650 |}}
-</​solution>​+
  
-==== 06. [30p] Gnuplot bar graphs ====
+<note tip>Both the swap device (///dev/sda1//) and the file system device (///dev/sda3//) are contending for I/O. Both have __high amounts of write requests per second__ (//w/s//) and __high wait times__ (//await//) relative to __low service times__ (//svctm//). This indicates that there is **contention** between the two partitions, causing both to **underperform**.</note>
  
-Use Gnuplot and the data from {{:​ep:​labs:​ep_lab5_autodata.txt|autoData.txt}} to generate separate ​**bar** graphs for the following:​ +**Takeaways**
-  * The "//​MidPrice//"​ of all the "//​small//"​ cars. +
-  * The average fuel consumption (MPG - miles per gallon) for all the "//​large//"​ cars. +
-  * The "//​MaxPrice//"​ over the average fuel consumption for all "//​chevrolet//"​ and "//​ford//"​ cars.+
  
-The graphs should be as complete as possible (title, axes names, etc.)
+<note important>
-
+  * Any time the **CPU is waiting** on I/O, the **disks are overloaded**.
-<note tip> +  * Calculate the amount of **IOPS** your disks can sustain. 
-**Hint:** Gnuplot conditional plotting.+  * Determine whether your applications require **random** or **sequential** disk access. 
 +  * Monitor slow disks by comparing **wait times** and **service times**. 
 +  * Monitor the swap and file system partitions to make sure that **virtual memory** is not contending for **filesystem I/O**.
 </​note>​ </​note>​
  
-<​solution -hidden>​ +</spoiler>
-<​code>​ +
- +
-with boxes - pentru bar chart (set style fill solid - sa fie pline) +
- +
-reset +
- +
-set size 1, 1 +
-set multiplot layout 2,2 rowsfirst +
- +
-set title 'Graph Small Cars'​ +
-set xlabel '​Car'​ +
-set ylabel '​MidPrice'​ +
-unset key +
-plot "​data5.txt"​ using (strcol(4) eq "​small"​ ? $6 : 0) w l lw 0.5 lc rgb '​red'​ +
- +
-set title 'Graph Fuel Consumption'​ +
-set xlabel '​Car'​ +
-set ylabel '​Fuel'​ +
-unset key +
-plot "​data5.txt"​ using (strcol(4) eq "​large"​ ? ($8 + $9)/2 : 0) w l lw 0.5 lc rgb '​blue'​ +
- +
-set title 'Graph Avg' +
-set xlabel '​X'​ +
-set ylabel '​Y'​ +
-unset key +
-plot "​data5.txt"​ using (strcol(2) eq '​chevrolet'​ || strcol(2) eq "​ford"​ ? $7 : 0) w l lw 0.5 lc rgb '​green',​ \ +
-     "​data5.txt"​ using (strcol(2) eq '​chevrolet'​ || strcol(2) eq "​ford"​ ? ($8 + $9)/2 : 0) w l lw 0.5 lc rgb '​orange'​ +
- +
-unset multiplot +
-</​code>​ +
-</solution> +
- +
- +
  
 +===== Tasks =====
  
-==== 07. [10p] Feedback ====+{{namespace>:​ep:​labs:​05:​contents:​tasks&​nofooter&​noeditbutton}}
  
-Please take a minute to fill in the **[[https://​docs.google.com/​forms/​d/​e/​1FAIpQLSfsMBl2EFu10jJG2qHEiSsR-qYr3wkzQPfDwjhChKnjRtDT_w/​viewform | feedback form]]** for this lab.+===== References =====
  
 +  * These examples are from Darren Hoch’s [[http://​ufsdump.org/​papers/​oscon2009-linux-monitoring.pdf|Linux System and Performance Monitoring]].
ep/labs/05.1572188384.txt.gz · Last modified: 2019/10/27 16:59 by emilian.radoi
CC Attribution-Share Alike 3.0 Unported