====== Lab 03 - I/O Monitoring (Linux) ======
  
===== Objectives =====
  * Get you acquainted with a few Linux standard monitoring tools and their outputs, for monitoring the impact of I/O on the system.
  * Give you the intuition needed to compare two relatively similar systems that differ in their I/O behaviour.
  
===== Contents =====
{{page>:ep:labs:03:meta:nav&nofooter&noeditbutton}}
  
===== Proof of Work =====

Before you start, create a [[http://docs.google.com/|Google Doc]]. Here, you will add screenshots / code snippets / comments for each exercise. Whatever you decide to include, it must prove that you managed to solve the given task (so don't show just the output, but how you obtained it and what conclusion can be drawn from it). If you decide to complete the feedback for bonus points, include a screenshot with the form submission confirmation, but not with its contents.

When done, export the document as a //pdf// and upload it to the appropriate assignment on [[https://curs.upb.ro/2023/course/view.php?id=4631#section-4|moodle]]. The deadline is 23:55 on Friday.
  
===== Introduction =====
  
==== 01. Reading and Writing Data - Memory Pages ====
The Linux kernel breaks disk I/O into pages. The default page size on most Linux systems is **4K**: the kernel reads and writes disk blocks in and out of memory in 4K chunks. You can check the page size of your system with the **getconf** command:
  
<code bash>
# getconf PAGESIZE
</code>
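The same value can also be checked programmatically (a sketch assuming a Linux ''/proc'' filesystem; the value is typically 4096 on x86-64, but other architectures may use larger pages):

```shell
# Print the kernel page size in bytes (typically 4096 on x86-64).
getconf PAGESIZE

# The per-mapping page size is also visible in /proc for any process;
# here we look at the shell's own address space.
grep KernelPageSize /proc/self/smaps | head -n 1
```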
  
==== 02. Major and Minor Page Faults ====
Linux, like most UNIX systems, uses a **virtual memory layer** that maps into physical address space. This mapping is **"on-demand"**: when a process starts, the kernel only maps what is required. When an application needs a page, the kernel searches the CPU caches and then physical memory. If the data exists in neither, the kernel issues a **Major Page Fault** (MPF). An MPF is a request to the disk subsystem to retrieve pages from the disk and buffer them in RAM.
  
Once memory pages are mapped into the buffer cache, the kernel will attempt to reuse these pages, resulting in a **Minor Page Fault** (MnPF). An MnPF saves the kernel time by reusing a page that is already in memory instead of fetching it from disk again.
  
<note>
To find out how many MPFs and MnPFs occur when an application starts, the time command can be used:

<code bash>
# /usr/bin/time -v evolution
</code>

As an alternative, a more elegant solution for a specific pid is:

<code bash>
# ps -o min_flt,maj_flt ${pid}
</code>
</note>
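As an illustration (a sketch assuming a Linux ''/proc'' filesystem), the counters that **ps** prints can also be read straight from ''/proc/<pid>/stat'', where the minor and major fault counts are the 10th and 12th fields:

```shell
# Fields 10 and 12 of /proc/<pid>/stat are min_flt and maj_flt;
# here we inspect the awk process itself via /proc/self.
awk '{print "min_flt:", $10, "maj_flt:", $12}' /proc/self/stat
```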
  
==== 03. The File Buffer Cache ====
The **file buffer cache** is used by the kernel to **minimise MPFs and maximise MnPFs**. As a system generates I/O over time, this buffer cache will continue to grow, as the system leaves these pages in memory until memory gets low and the kernel needs to "**free**" some of them for other uses. The result is that many system administrators see low amounts of free memory and become concerned, when in reality the system is just making good use of its caches ;-)
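This is easy to observe (a sketch assuming a Linux ''/proc/meminfo''; these are the raw counters behind the ''buff/cache'' column of ''free -m''). ''MemAvailable'' estimates free-plus-reclaimable memory, so it is a better health indicator than ''MemFree'' alone:

```shell
# Buffers + Cached make up the file buffer cache; MemAvailable estimates
# how much memory applications can still claim without swapping.
grep -E '^(MemFree|MemAvailable|Buffers|Cached):' /proc/meminfo
```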
  
==== 04. Types of Memory Pages ====
There are **3** types of memory pages in the Linux kernel:
  * **Read Pages** – Pages of data read in via disk (MPF) that are read-only and backed on disk. These pages exist in the buffer cache and include **static files**, **binaries**, and **libraries** that do not change. The kernel will continue to page these into memory as it needs them. If the system becomes short on memory, the kernel will "steal" these pages and place them back on the free list, causing an application to trigger an MPF to bring them back in.
  * **Dirty Pages** – Pages of data that have been modified by the kernel while in memory. These pages need to be synced back to disk at some point by the pdflush daemon. In the event of a memory shortage, kswapd (along with pdflush) will write these pages to disk in order to make room in memory.
  * **Anonymous Pages** – Pages of data that do belong to a process, but do not have any file or backing store associated with them. They can't be synchronised back to disk. In the event of a memory shortage, kswapd writes these to the swap device as temporary storage until more RAM is free ("swapping" pages).
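System-wide totals for these page types are visible in ''/proc/meminfo'' (a sketch; the counter names assume a reasonably recent Linux kernel):

```shell
# Dirty     - modified pages waiting to be written back to disk
# Writeback - dirty pages currently being flushed
# AnonPages - anonymous pages with no backing file (swap candidates)
# Mapped    - file-backed pages currently mapped into processes
grep -E '^(Dirty|Writeback|AnonPages|Mapped):' /proc/meminfo
```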
==== 05. Writing Data Pages Back to Disk ====
  
Applications themselves may choose to write **dirty pages** back to disk immediately using the **fsync()** or **sync()** system calls. These system calls issue a direct request to the **I/O scheduler**. If an application does not invoke these system calls, the pdflush kernel daemon runs at periodic intervals and writes pages back to disk.
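This can be watched directly (a sketch assuming a Linux ''/proc/meminfo''; ''/tmp/dirty.bin'' is just a scratch file, and note that on modern kernels the pdflush role is handled by per-device flusher threads): dirty some pages, then force a writeback with ''sync'' and watch the ''Dirty'' counter fall:

```shell
# Create some dirty pages in the page cache...
dd if=/dev/zero of=/tmp/dirty.bin bs=4k count=256 2>/dev/null
grep '^Dirty:' /proc/meminfo

# ...then force writeback; Dirty should drop back towards its baseline.
sync
grep '^Dirty:' /proc/meminfo
rm -f /tmp/dirty.bin
```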
===== Monitoring I/O =====
  
=== Calculating IOs Per Second ===
  
Every I/O __request__ to a disk takes a certain amount of time. This is due primarily to the fact that a //disk must spin// and //a head must seek//. The spinning of a disk is often referred to as "**rotational delay**" (RD) and the moving of the head as a "**disk seek**" (DS). The time it takes for each I/O request is calculated by __adding__ DS and RD. A disk's RD is fixed based on the RPM of the drive, and is taken to be half a revolution around the disk.
  
Each time an application issues an I/O, it takes an average of 8 ms to service that I/O on a 10K RPM disk. Since this is a fixed time, it is imperative that the disk be as efficient as possible with the time it spends reading and writing. The amount of I/O requests is often measured in I/Os Per Second (IOPS). A 10K RPM disk can push 120 to 150 (burst) IOPS. To measure the effectiveness of IOPS, divide the amount of IOPS by the amount of data read or written for each I/O.
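The 8 ms figure can be reproduced with a little arithmetic (a sketch; the 5 ms average seek time is an assumed, typical value for a 10K RPM drive, not something the text specifies):

```shell
# Rotational delay: half a revolution of a 10,000 RPM disk.
# Disk seek: ~5 ms is an assumed average for a 10K RPM drive.
awk 'BEGIN {
    rpm = 10000
    rd  = (60.0 / rpm / 2) * 1000   # 3 ms rotational delay
    ds  = 5                         # 5 ms assumed average seek
    t   = rd + ds                   # ~8 ms per I/O, as stated above
    printf "per-I/O time: %.0f ms, sustained IOPS: %.0f\n", t, 1000 / t
}'
# -> per-I/O time: 8 ms, sustained IOPS: 125
```

The 125 IOPS result sits right inside the "120 to 150 (burst)" range quoted above.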
  
=== Random vs Sequential I/O ===
The relevance of KB per I/O depends on the __workload__ of the system. There are two different types of workload categories on a system: sequential and random.
  
**Sequential I/O** - The **iostat** command provides information on IOPS and the amount of data processed during each I/O. Use the **-x** switch with **iostat** (//iostat -x 1//). **Sequential workloads** require large amounts of data to be read sequentially and at once. These include applications such as enterprise databases executing large queries and streaming media services capturing data. With sequential workloads, the KB per I/O ratio should be high. Sequential workload performance relies on the ability to move large amounts of data as fast as possible. If each I/O costs time, it is imperative to get as much data out of that I/O as possible.
  
**Random I/O** - Random access workloads do not depend as much on the size of the data. They depend primarily on the amount of IOPS a disk can push. Web and mail servers are examples of random access workloads. The I/O requests are rather small. Random access workloads rely on how many requests can be processed at once. Therefore, the amount of IOPS the disk can push becomes crucial.
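To see a sequential profile in practice (a sketch; ''/tmp/seq.bin'' is just a scratch file name, and ''iostat'' comes from the sysstat package), generate a sequential stream with ''dd'' while watching ''iostat -x 1'' in another terminal; the sequential run shows few, large requests, so the KB per I/O ratio is high:

```shell
# Sequential workload: large block size, few I/Os, high KB per I/O.
dd if=/dev/zero of=/tmp/seq.bin bs=1M count=64 conv=fsync 2>&1 | tail -n 1

# Meanwhile, in another terminal:
#   iostat -x 1
# and compare rkB/s and wkB/s against r/s and w/s to get KB per I/O.
rm -f /tmp/seq.bin
```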
  
=== When Virtual Memory Kills I/O ===
  
If the system does not have enough **RAM** to accommodate all requests, it must start to use the **SWAP** device. Like file system I/Os, writes to the SWAP device are just as costly. If the system is extremely deprived of RAM, it is possible that it will create a __paging storm__ to the SWAP disk. If the SWAP device is on the same file system as the data being accessed, the system will enter into contention for the I/O paths. This will cause a complete **performance breakdown** on the system. If pages can't be read or written to disk, they will stay in RAM longer. If they stay in RAM longer, the kernel will need to free the RAM. The problem is that the __I/O channels__ are so __clogged__ that nothing can be done. This inevitably leads to a __kernel panic and crash of the system__.
  
The following **vmstat** output demonstrates a system under memory distress. It is writing data out to the swap device:
{{ :ep:laboratoare:ep2_poz1.png?550 |}}
  
<note tip>The previous output demonstrates a __large amount of read requests__ into memory (**bi**). The requests are so many that the system is short on memory (**free**). This is causing the __system to send blocks to the swap device__ (**so**) and the size of swap keeps growing (**swpd**). Also notice a large percentage of WIO time (**wa**). This indicates that the __CPU is starting to slow down__ because of I/O requests. Furthermore, **id** represents the time spent idle; on kernels prior to 2.5.41, I/O-wait time was included in **id** rather than reported separately in **wa**.</note>
  
To see the effect the swapping to disk is having on the system, check the swap partition on the drive using **iostat**.
<note tip>Both the swap device (///dev/sda1//) and the file system device (///dev/sda3//) are contending for I/O. Both have __high amounts of write requests per second__ (//w/s//) and __high wait time__ (//await//) to __low service time__ (//svctm//) ratios. This indicates that there is **contention** between the two partitions, causing both to **underperform**.</note>
  
==== Takeaways ====

<note important>
  * Any time the **CPU is waiting** on I/O, the **disks are overloaded**.
  * Calculate the amount of **IOPS** your disks can sustain.
</note>
ep/labs/03 · Last modified: 2023/10/21 10:27 by andrei.mirciu
CC Attribution-Share Alike 3.0 Unported