This shows you the differences between two versions of the page.
ep:labs:02 [2020/08/18 18:57] radu.mantu [Objectives] |
ep:labs:02 [2025/02/12 00:00] (current) cezar.craciunoiu |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Lab 02 - Memory Monitoring (Linux) ====== | + | ====== Lab 02 - Advanced Plotting ====== |
===== Objectives ===== | ===== Objectives ===== | ||
- | * Offer an introduction to Virtual Memory. | + | * Introduction to pandas |
- | * Get you acquainted with relevant commands and their outputs for monitoring memory-related aspects. | + | * Easy data manipulations with pandas |
- | * Introduce the concept of page deduplication. | + | * Introduction to seaborn |
- | * Present a step-by-step guide to Intel PIN for dynamic instrumentation. | + | * More types of cool looking plots with seaborn |
- | ===== Contents ===== | + | * Apply what you learned on exploring COVID data for Romania |
- | {{page>:ep:labs:02:meta:nav&nofooter&noeditbutton}} | ||
- | ===== Introduction ===== | + | ===== Resources ===== |
- | ==== 01. Virtual Memory ==== | + | In this lab, we will study the basic API of pandas for easier data manipulations, and seaborn for some more advanced and visually appealing plots that are also easy to produce. |
- | Virtual memory uses a disk as an extension of RAM so that the effective size of usable memory grows correspondingly. The kernel will write the contents of a currently unused block of memory to the hard disk so that the memory can be used for another purpose. When the original contents are needed again, they are read back into memory. This is all made completely transparent to the user; programs running under Linux only see the larger amount of memory available and don't notice that parts of them reside on the disk from time to time. Of course, reading and writing the hard disk is slower (on the order of a thousand times slower) than using real memory, so the programs don't run as fast. The part of the hard disk that is used as virtual memory is called the swap space. | + | For the exercises, you will explore the evolution of the COVID pandemic in Romania, using the information learned in this lab. |
- | ==== 02. Virtual Memory Pages ==== | + | For scientific computing we need an environment that is easy to use and provides tools for manipulating data and visualizing results. We will use Google Colab, which comes with a variety of useful tools already installed. |
- | Virtual memory is divided into pages. Each virtual memory page on the x86 architecture is 4 KB. When the kernel moves memory to and from disk, it does so in pages. The kernel writes memory pages to both the swap device and the file system. | + | Check out these cheat sheets for fast reference to the common libraries: |
- | ==== 03. Kernel Memory Paging ==== | + | **Cheat sheets:** |
- | Memory paging is a normal activity not to be confused with memory swapping. Memory paging is the process of syncing memory back to disk at normal intervals. Over time, applications will grow to consume all of memory. At some point, the kernel must scan memory and reclaim unused pages to be allocated to other applications. | + | - [[https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf|python]] |
+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Numpy_Python_Cheat_Sheet.pdf|numpy]] | ||
+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Matplotlib_Cheat_Sheet.pdf|matplotlib]] | ||
+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Scikit_Learn_Cheat_Sheet_Python.pdf|sklearn]] | ||
+ | - [[https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf|pandas]] | ||
+ | - [[https://s3.amazonaws.com/assets.datacamp.com/blog_assets/Python_Seaborn_Cheat_Sheet.pdf|seaborn]] | ||
- | ==== 04. The Page Frame Reclaim Algorithm (PFRA) ==== | + | <note>This lab is organized in a Jupyter Notebook hosted on Google Colab. There you will find some intuitions and applications for pandas and seaborn. Check out the Tasks section below.</note> |
- | + | ||
- | The PFRA is responsible for freeing memory. The PFRA selects which memory pages to free by page type. Page types are listed below: | + | |
- | * **Unreclaimable** – locked, kernel, reserved pages | + | |
- | * **Swappable** – anonymous memory pages | + | |
- | * **Syncable** – pages backed by a disk file | + | |
- | * **Discardable** – static pages, discarded pages | + | |
- | + | ||
- | All but the “unreclaimable” pages may be reclaimed by the PFRA. There are two main functions in the PFRA. These include the kswapd kernel thread and the “Low On Memory Reclaiming” function. | + | |
- | + | ||
- | ==== 05. Kswapd ==== | + | |
- | + | ||
- | The **kswapd** daemon is responsible for ensuring that memory stays free. It monitors the **pages_high** and **pages_low** watermarks in the kernel. If the amount of free memory is below **pages_low**, the **kswapd** process starts a scan to attempt to free 32 pages at a time. It repeats this process until the amount of free memory is above the **pages_high** watermark. | + | |
- | + | ||
- | The **kswapd** thread performs the following actions: | + | |
- | * If the page __is unmodified__, it places the page on the free list. | + | |
- | * If the page __is modified and backed by a file system__, it writes the contents of the page to disk. | + | |
- | * If the page __is modified and not backed up by any file system (anonymous)__, it writes the contents of the page to the swap device. | + | |
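The watermark loop and the three per-page actions described above can be sketched as a toy model. This is only an illustration of the rules as stated in the text, not kernel code: the names ''Page'', ''reclaim_action'' and ''kswapd_scan'' are hypothetical, and the model assumes that every scan actually frees all 32 pages.

```python
from dataclasses import dataclass

PAGES_PER_SCAN = 32  # batch size quoted in the text


@dataclass
class Page:
    modified: bool     # has the page been written to since it was loaded?
    file_backed: bool  # is the page backed by a file on a filesystem?


def reclaim_action(page: Page) -> str:
    """Return the action the text describes for a single page."""
    if not page.modified:
        return "free"           # unmodified: goes straight to the free list
    if page.file_backed:
        return "write_to_file"  # modified and file-backed: written to disk
    return "write_to_swap"      # modified and anonymous: written to swap


def kswapd_scan(free_pages: int, pages_low: int, pages_high: int) -> int:
    """Toy watermark loop: return how many 32-page scans are needed."""
    scans = 0
    if free_pages < pages_low:
        # Keep scanning until free memory is above the high watermark.
        while free_pages <= pages_high:
            free_pages += PAGES_PER_SCAN  # assumption: each scan frees 32 pages
            scans += 1
    return scans
```

For example, ''kswapd_scan(100, 128, 256)'' starts below the low watermark and needs five scans to climb above 256 free pages, while ''kswapd_scan(200, 128, 256)'' does nothing because the low watermark was never crossed.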
- | + | ||
- | ==== 06. Kernel Paging with pdflush ==== | + | |
- | + | ||
- | * The **pdflush** daemon is responsible for synchronizing any pages associated with a file on a filesystem back to disk. In other words, when a file is modified in memory, the **pdflush** daemon writes it back to disk. | + | |
- | * The **pdflush** daemon starts synchronizing dirty pages back to the filesystem when 10% of the pages in memory are dirty. This is due to a kernel tuning parameter called **vm.dirty_background_ratio**. | + | |
- | * The **pdflush** daemon works independently of the PFRA under most circumstances. When the kernel invokes the LMR (Low on Memory Reclaiming) algorithm, the LMR specifically forces **pdflush** to flush dirty pages in addition to other page freeing routines. | + | |
- | * The **vmstat** utility reports on virtual memory usage in addition to CPU usage. The following fields in the **vmstat** output are relevant to virtual memory: **swpd**, **free**, **buff**, **cache**, **so**, **si**, **bo**, **bi** (use //man vmstat// to read their description). | + | |
- | + | ||
- | The following **vmstat** output demonstrates heavy utilization of virtual memory during an I/O application spike. The following observations can be made based on this output: | + | |
- | + | ||
- | * A large number of disk blocks are paged in (//bi//) from the filesystem. This is evident in the fact that the cache of data in process address spaces (//cache//) grows. | + | |
- | * During this period, the amount of free memory (//free//) remains steady at 17MB even though data is paging in from the disk to consume free RAM. | + | |
- | * To maintain the free list, **kswapd** steals memory from the read/write buffers (//buff//) and assigns it to the free list. This is evident in the gradual decrease of the buffer cache (//buff//). | + | |
- | * The **kswapd** process then writes dirty pages to the swap device (//so//). This is evident in the fact that the amount of virtual memory utilized gradually increases (//swpd//). | + | |
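Observations like the ones above can also be checked mechanically. The following is a minimal sketch that parses text in the standard **vmstat** column layout and reports how a field changed between the first and last sample. The helper names ''parse_vmstat'' and ''trend'' are hypothetical, and the numbers in ''SAMPLE'' are made up purely to exercise the parser; they are not measurements from a real system.

```python
# Made-up sample in the usual vmstat layout (values are illustrative only).
SAMPLE = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  2 809192  17200  90000 200000    0  480 21000   300 1200 2500  8 12 40 40  0
 2  3 810100  17100  85000 215000    0  520 23000   280 1250 2600  9 13 38 40  0
 4  2 811500  17150  79000 232000    0  610 22500   310 1230 2550  8 12 39 41  0
"""


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Turn vmstat text into one dict per sample row, keyed by column name."""
    rows = [line.split() for line in text.strip().splitlines()]
    # The column header is the line that contains the field name "swpd".
    header_idx = next(i for i, cols in enumerate(rows) if "swpd" in cols)
    header = rows[header_idx]
    return [dict(zip(header, map(int, cols))) for cols in rows[header_idx + 1:]]


def trend(samples: list[dict[str, int]], field: str) -> int:
    """Difference between the last and the first sample for one field."""
    return samples[-1][field] - samples[0][field]


samples = parse_vmstat(SAMPLE)
for field in ("free", "buff", "cache", "swpd"):
    print(f"{field}: {trend(samples, field):+d}")
```

With this illustrative input, //buff// shrinks while //cache// and //swpd// grow and //free// stays roughly constant, which is the pattern the bullets describe.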
- | + | ||
- | Conclusions: | + | |
- | * The fewer major page faults a system incurs, the better its response times, because the system is leveraging memory caches over disk caches. | + | |
- | * Low amounts of free memory are a good sign that caches are effectively used unless there are sustained writes to the swap device and disk. | + | |
- | * If a system reports any sustained activity on the swap device, it means there is a memory shortage on the system. | + | |
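The last conclusion can be turned into a simple check. The sketch below flags "sustained" swap activity when //si// or //so// is non-zero for several consecutive samples. The function name ''has_sustained_swap'' and the default threshold of three consecutive samples are assumptions chosen for illustration; the text does not define how long activity must last to count as sustained.

```python
def has_sustained_swap(samples: list[tuple[int, int]], min_consecutive: int = 3) -> bool:
    """Return True if swap-in or swap-out is non-zero in at least
    `min_consecutive` samples in a row.

    Each sample is an (si, so) pair as reported by vmstat.
    The threshold is an assumption, not a value taken from the text.
    """
    streak = 0
    for si, so in samples:
        # Extend the streak on any swap traffic, reset it on an idle sample.
        streak = streak + 1 if (si > 0 or so > 0) else 0
        if streak >= min_consecutive:
            return True
    return False
```

A single burst such as ''[(0, 0), (0, 120), (0, 0), (5, 0)]'' is not flagged, whereas ''[(0, 10), (0, 20), (3, 0)]'' is, since swap traffic appears in three samples in a row.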
===== Tasks ===== | ===== Tasks ===== |