Performance monitoring is the process of regularly checking a set of metrics and tracking the overall health of a specific system. Monitoring is tightly coupled with performance tuning, and a Linux system administrator should be proficient in both, as one of their main responsibilities is to identify bottlenecks and find solutions to overcome them. Pinpointing a bottleneck on a Linux system requires a deep understanding of how the various components of the operating system work (e.g. how processes are scheduled on the CPU, how memory is managed, how I/O interrupts are handled, the details of the network layer implementation, etc.). From a high level, the main subsystems to think of when tuning are CPU, Memory, I/O and Network.
These four subsystems depend heavily on each other, and tuning the whole system means keeping them in harmony. To quote a famous idiom, “a chain is no stronger than its weakest link”. Thus, when investigating a system performance issue, all the subsystems must be checked and analysed.
Being able to discover the bottleneck in a system also requires an understanding of what types of processes are running on it. The application stack of a system can be broken down into two categories:
Before going further with the CPU-specific metrics and tools, here is a methodical approach that can guide you when tuning the performance of a system:
Before looking at the numerous performance measurement tools present in the Linux operating system, it is important to understand some key concepts and metrics, along with their interpretation regarding the performance of the system.
The kernel contains a scheduler that is in charge of scheduling two types of resources: interrupts and threads. The scheduler assigns different priorities to these resources. The following list presents the priorities:
While executing a process, the necessary set of data is stored in registers on the processor and in its cache. This group of information is called a context. Each thread is allotted a time quantum to spend on the CPU; when that time expires, or the thread is preempted by a higher-priority task, a new ready-to-run process is scheduled. When the next process is scheduled to run, the context of the current one is saved and the context of the new one is restored to the registers; this operation is called a context switch. A high volume of context switching is undesirable because the CPU has to flush its registers and cache each time to make room for the new process, which leads to performance issues.
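As a concrete illustration, the kernel exposes the total number of context switches since boot in the ctxt line of /proc/stat. The following minimal Python sketch (an illustration for this lab, not one of the standard monitoring tools; the one-second interval and helper name are arbitrary choices) samples the counter twice to estimate the context-switch rate:

```python
import time

def read_context_switches():
    """Return the total number of context switches since boot,
    taken from the 'ctxt' line of /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt"):
                return int(line.split()[1])
    return 0

# Sample the counter twice, one second apart, to estimate the rate.
before = read_context_switches()
time.sleep(1)
after = read_context_switches()
print(f"context switches per second: {after - before}")
```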
Each CPU maintains its own run queue of threads. In an ideal scenario, the scheduler would be constantly executing threads. A thread can be in different states: runnable, meaning it is ready to be executed, or sleeping, meaning it is blocked while waiting for I/O. If the system has performance issues or is overloaded, the run queue starts to fill up and each thread takes longer to execute.
The same concept is also known as “load”. It is measured by the load average, a rolling average of the sum of the processes waiting to run and the processes in uninterruptible sleep (typically waiting for I/O). Unix systems traditionally present the load average as 1-minute, 5-minute and 15-minute averages.
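As a small illustration of where these numbers come from, Python can read them through os.getloadavg() or directly from /proc/loadavg, whose fourth field also shows the number of currently runnable entities over the total number of scheduling entities. The snippet below is a sketch for this lab, not part of the monitoring tools discussed later:

```python
import os

# 1-, 5- and 15-minute load averages, as reported by the kernel.
one, five, fifteen = os.getloadavg()
print(f"load average: {one:.2f} {five:.2f} {fifteen:.2f}")

# /proc/loadavg exposes the same averages, plus the number of currently
# runnable entities over the total number of scheduling entities.
with open("/proc/loadavg") as f:
    fields = f.read().split()
runnable, total = fields[3].split("/")
print(f"runnable: {runnable} out of {total} scheduling entities")
```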
CPU utilisation is a meaningful metric for observing how the running processes make use of the given processing resources. You can find the following categories in the vast majority of performance monitoring tools:
Linux distributions have various monitoring tools available. Some of the utilities cover several metrics in a single tool and provide well-formatted output that eases the understanding of system performance. Other tools specialize in more specific metrics and give us detailed information.
Some of the most important Linux CPU performance monitoring tools:
| Tool | Most useful function |
|---|---|
| vmstat | System activity |
| top | Process activity |
| uptime, w | Average system load |
| ps, pstree | Displays the processes |
| iostat | Average CPU load |
| sar | Collect and report system activity |
| mpstat | Multiprocessor usage |
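To make the utilisation categories above concrete, here is a minimal Python sketch (an illustration for this lab, not a replacement for the tools in the table) that derives the percentage of time spent in user, system, idle and I/O-wait from two samples of the aggregate cpu line in /proc/stat:

```python
import time

def cpu_times():
    """Read the aggregate 'cpu' line from /proc/stat.
    Fields: user, nice, system, idle, iowait, irq, softirq, steal, ..."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    return [int(x) for x in fields]

# Take two samples one second apart and look at the deltas.
t1 = cpu_times()
time.sleep(1)
t2 = cpu_times()
delta = [b - a for a, b in zip(t1, t2)]
total = sum(delta)

user, nice, system, idle, iowait = delta[:5]
print(f"user:   {100 * (user + nice) / total:.1f}%")
print(f"system: {100 * system / total:.1f}%")
print(f"idle:   {100 * idle / total:.1f}%")
print(f"iowait: {100 * iowait / total:.1f}%")
```

Taking the difference between two samples is necessary because /proc/stat reports cumulative tick counts since boot, not instantaneous percentages.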
Understanding how well a CPU is performing is a matter of interpreting the run queue, its utilisation, and the amount of context switching performed. Although performance should be judged against baseline statistics, in the absence of such statistics the following general performance expectations of a system can be used as a guideline:
The following two examples give interpretations of the outputs generated by vmstat.
The following observations can be made based on this output:
The following observations can be made based on this output:
These examples are from Darren Hoch’s Linux System and Performance Monitoring.
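The original vmstat outputs are not reproduced here, but you can generate similar data on your own machine. The sketch below is an assumption of this lab (vmstat must be installed and the column layout can vary slightly between versions); it captures a few samples and prints the columns most relevant to this section, namely the run queue (r), context switches (cs) and the CPU utilisation breakdown (us, sy, id, wa):

```python
import subprocess

# Run vmstat for 5 one-second samples (the first sample reports
# averages since boot).
out = subprocess.run(["vmstat", "1", "5"],
                     capture_output=True, text=True).stdout
lines = out.splitlines()

# Line 0 is the group banner, line 1 holds the column names.
columns = lines[1].split()
for line in lines[2:]:
    row = dict(zip(columns, line.split()))
    print(f"r={row['r']}  cs={row['cs']}  "
          f"us={row['us']}  sy={row['sy']}  id={row['id']}  wa={row['wa']}")
```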
For this lab, we will use Google Colab for exploring numpy and matplotlib. Please solve your tasks here by clicking “Open in Colaboratory”.
You can then export this python notebook as a PDF (File → Print) and upload it to Moodle.
Please take a minute to fill in the feedback form for this lab.