Lab 01 - Plotting (Numpy & Matplotlib)

Objectives

  • Offer an introduction to Numpy & matplotlib
  • Get you familiarised with the numpy API
  • Understand basic plotting with matplotlib

Python Scientific Computing Resources

In this lab, we will study a new Python library that offers fast, memory-efficient manipulation of vectors, matrices and tensors: numpy. We will also study basic plotting of data using one of the most popular data visualization libraries in the Python ecosystem: matplotlib.

For scientific computing we need an environment that is easy to use and provides tools for manipulating data and visualizing results. Python is very easy to use, but the downside is that, on its own, it is not fast at numerical computing. Luckily, we have very efficient libraries for all our use cases.
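As a small illustration of the point above, the following sketch computes the same sum of squares twice: once with an explicit Python loop and once with a single vectorized numpy expression. The variable names and the sample data are assumptions made for this example only.

```python
import numpy as np

# Hypothetical sample data: the values 0.0, 1.0, ..., 9.0
samples = np.arange(10, dtype=np.float64)

# Pure Python: an explicit loop that handles one element per iteration.
loop_result = 0.0
for x in samples:
    loop_result += x * x

# numpy: one vectorized expression, evaluated in compiled code.
vector_result = float(np.sum(samples ** 2))

print(loop_result, vector_result)
```

Both approaches give the same result; the vectorized form is the one that scales to large arrays.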

Core computing libraries

  • numpy and scipy: scientific computing
  • matplotlib: plotting library

Machine Learning

  • sklearn: machine learning toolkit
  • tensorflow: deep learning framework developed by Google
  • keras: deep learning framework on top of tensorflow for easier implementation
  • pytorch: deep learning framework developed by Facebook

Statistics and data analysis

  • pandas: very popular data analysis library
  • statsmodels: statistics

We also have advanced interactive environments:

  • IPython: advanced Python console
  • Jupyter: notebooks in the browser

There are many more scientific libraries available.

Check out these cheat sheets for fast reference to the common libraries:

Cheat sheets:

Other:

This lab is organized in a Jupyter Notebook hosted on Google Colab. There you will find some intuitions and applications for numpy and matplotlib. Check out the Tasks section below.

Tasks

01. [10p] Rotational delay - IOPS calculations

Every disk in your storage system has a maximum theoretical IOPS value that is based on a formula. Disk performance and IOPS are based on three key factors:

  • Rotational speed. Measured in RPM, mostly 7,200, 10,000 or 15,000 RPM. A higher rotational speed is associated with a higher-performing disk.
  • Average latency. The time it takes for the sector of the disk being accessed to rotate into position under a read/write head.
  • Average seek time. The time (in ms) it takes for the hard drive’s read/write head to position itself over the track being read or written.
  • Average IOPS. Derived from the two times above: divide 1000 (the number of milliseconds in one second) by the sum of the average latency in ms and the average seek time in ms.

To calculate the average IOPS, divide 1000 by the sum of the average latency in ms and the average seek time in ms. The formula is:

average IOPS = 1000 / (average latency in ms + average seek time in ms)

Let's calculate the Rotational Delay - RD for a 10K RPM drive:

  • Divide 10000 RPM by 60 seconds: 10000/60 = 166 RPS (rotations per second)
  • Take the reciprocal of 166: 1/166 = 0.006 seconds per rotation
  • Multiply the seconds per rotation by 1000 to get milliseconds: 6 ms per rotation
  • Divide the total in half (RD is considered half a revolution around a disk): 6/2 = 3 ms
  • Add an average of 3 ms for seek time: 3 ms + 3 ms = 6 ms
  • Add 2 ms for latency (internal transfer): 6 ms + 2 ms = 8 ms
  • Divide 1000 ms by 8 ms per I/O: 1000/8 = 125 IOPS
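
The steps above can be sketched in Python as follows. The function name and its parameters are assumptions made for this example; the default 3 ms seek time and 2 ms internal transfer latency are the values used in the worked example for the 10K RPM drive.

```python
def average_iops(rpm, seek_ms=3.0, transfer_ms=2.0):
    """Estimate the average IOPS of a rotating disk (illustrative sketch)."""
    rotations_per_second = rpm / 60.0
    ms_per_rotation = 1000.0 / rotations_per_second
    # Rotational delay is taken as half a revolution.
    rotational_delay_ms = ms_per_rotation / 2.0
    # Total time per I/O: rotational delay + seek time + internal transfer.
    time_per_io_ms = rotational_delay_ms + seek_ms + transfer_ms
    return 1000.0 / time_per_io_ms


iops_10k = average_iops(10_000)
print(round(iops_10k))  # about 125, as in the worked example
```

Because the function keeps full precision instead of rounding at every step, results for other drives may differ slightly from a calculation done by hand.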

[10p] Task A - Calculate rotational delay

Add in your archive the operations and the result you obtained. (Screenshot, picture of calculations made by hand on paper)

Calculate the Rotational Delay, and then the IOPS for a 5400 RPM drive.

02. [30p] iostat & iotop

[15p] Task A - Monitoring the behaviour with Iostat

Parameters for iostat:

  • -x for extended statistics
  • -d to display device statistics only
  • -m for displaying r/w in MB/s
$ iostat -xdm

Use iostat with -p for specific device statistics:

$ iostat -xdm -p sda

  • Run iostat -x 1 5.
  • Considering the last two outputs provided by the previous command, calculate the efficiency of IOPS for each of them. Does the amount of data written per I/O increase or decrease?

Add in your archive screenshot or pictures of the operations and the result you obtained, also showing the output of iostat from which you took the values.

How to do:

  • Divide the kilobytes read (rkB/s) and written (wkB/s) per second by the reads per second (r/s) and the writes per second (w/s).
  • If you happen to have quite a few loop devices in your iostat output, find out what they are exactly:
$ df -kh /dev/loop*
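
The division described above can be sketched as below. The numbers are hypothetical and only stand in for values copied from two consecutive iostat reports; the helper name kb_per_io is likewise an assumption made for this example.

```python
def kb_per_io(kb_per_second, ios_per_second):
    """Average amount of data moved per I/O request, in kB."""
    if ios_per_second == 0:
        return 0.0  # idle device: no requests were issued in this interval
    return kb_per_second / ios_per_second


# Hypothetical write statistics from two consecutive iostat reports.
reports = [
    {"w/s": 20.0, "wkB/s": 160.0},
    {"w/s": 50.0, "wkB/s": 300.0},
]

sizes = [kb_per_io(r["wkB/s"], r["w/s"]) for r in reports]
trend = "increases" if sizes[1] > sizes[0] else "does not increase"
print(sizes, "-> data written per I/O", trend)
```

The same helper works for reads when given rkB/s and r/s.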

[15p] Task B - Monitoring the behaviour with Iotop

iotop is a utility similar to the top command that interfaces with the kernel to provide per-thread/process I/O usage statistics.

Install iotop on Debian/Ubuntu Linux:
$ sudo apt-get install iotop

How to use the iotop command:
$ sudo iotop OR $ iotop

Options supported by the iotop command:

Options                 Description
--version               show program’s version number and exit
-h, --help              show this help message and exit
-o, --only              only show processes or threads actually doing I/O
-b, --batch             non-interactive mode
-n NUM, --iter=NUM      number of iterations before ending [infinite]
-d SEC, --delay=SEC     delay between iterations [1 second]
-p PID, --pid=PID       processes/threads to monitor [all]
-u USER, --user=USER    users to monitor [all]
-P, --processes         only show processes, not all threads
-a, --accumulated       show accumulated I/O instead of bandwidth
-k, --kilobytes         use kilobytes instead of a human friendly unit
-t, --time              add a timestamp on each line (implies --batch)
-q, --quiet             suppress some lines of header (implies --batch)

  • Run iotop (install it if you do not already have it) in a separate shell showing only processes or threads actually doing I/O.
  • Inspect the script code (dummy.sh) to see what it does.
  • Monitor the behaviour of the system with iotop while running the script.
  • Identify the PID and PPID of the process running the dummy script and kill the process using command line from another shell (sending SIGINT signal to both parent & child processes).

Provide a screenshot of iotop showing only the active processes, one of them being the running script. Then provide another screenshot taken after you have successfully killed it.

03. [30p] RAM disk

Linux allows you to use part of your RAM as a block device, viewing it as a hard disk partition. The advantage of using a RAM disk is the extremely low latency (even when compared to SSDs). The disadvantage is that all contents will be lost after a reboot.

There are two main types of RAM disks:

  • ramfs - cannot be limited in size and will continue to grow until you run out of RAM. Its size cannot be determined precisely with tools like df. Instead, you have to estimate it by looking at the “cached” entry in the output of free.
  • tmpfs - newer than ramfs. A size limit can be set. It behaves like a hard disk partition but can't be monitored through conventional means (e.g. iostat). Its size can be determined precisely using df.

[15p] Task A - Create RAM Disk

Before getting started, let's find out the file system that our root partition uses. Run the following command (T - print file system type, h - human readable):

$ df -Th

The result should look like this:

Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs  1.1G     0  1.1G   0% /dev
tmpfs          tmpfs     214M  3.8M  210M   2% /run
/dev/sda1      ext4      218G  4.1G  202G   2% / <- root partition
tmpfs          tmpfs     1.1G  252K  1.1G   1% /dev/shm
tmpfs          tmpfs     5.0M  4.0K  5.0M   1% /run/lock
tmpfs          tmpfs     1.1G     0  1.1G   0% /sys/fs/cgroup
/dev/sda2      ext4      923M   73M  787M   9% /boot
/dev/sda4      ext4      266G   62M  253G   1% /home

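From the results, the root partition here uses ext4. Note that in the mount command below, the argument before the mount point (ext4 in this case) is only a label for the source field shown by df: a tmpfs mount is not backed by a device, and its file system type is always tmpfs. If your root partition uses something else, you can replace the label with what you have: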

$ sudo mkdir /mnt/ramdisk
$ sudo mount -t tmpfs -o size=1G ext4 /mnt/ramdisk

If you want the RAM disk to persist after a reboot, you can add the following line to /etc/fstab. Remember that its contents will still be lost.

tmpfs     /mnt/ramdisk     tmpfs     rw,nodev,nosuid,size=1G     0  0

That's it. We just created a 1 GB tmpfs RAM disk and mounted it at /mnt/ramdisk. Use df again to check this yourself; the Type column for /mnt/ramdisk will show tmpfs.
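
Besides df, the size of the mounted RAM disk can also be checked from Python with shutil.disk_usage from the standard library. This is only a minimal sketch: it assumes the mount point /mnt/ramdisk used above, and falls back to the root directory so that it still runs on a machine where the RAM disk has not been created.

```python
import os
import shutil

mount_point = "/mnt/ramdisk"
if not os.path.isdir(mount_point):
    # Fallback so the sketch still runs before the RAM disk exists.
    mount_point = "/"

usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free (bytes)
total_gib = usage.total / 1024 ** 3
free_gib = usage.free / 1024 ** 3
print(f"{mount_point}: {total_gib:.2f} GiB total, {free_gib:.2f} GiB free")
```

If /mnt/ramdisk exists as a directory but nothing is mounted on it, the numbers reported are those of the file system that contains the directory.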

[15p] Task B - Pipe View & RAM Disk

As we mentioned before, you can't get I/O statistics for tmpfs since it is not a real partition. One solution to this problem is using pv to monitor the progress of data transfer through a pipe. This is a valid approach only if disk I/O is the bottleneck.

Next, we will generate 512 MB of random data and place it in /mnt/ramdisk/rand first and then in /home/student/rand. The transfer is done using dd with 2048-byte blocks.

$ pv /dev/urandom | dd of=/mnt/ramdisk/rand  bs=2048 count=$((512 * 1024 * 1024 / 2048))
$ pv /dev/urandom | dd of=/home/student/rand bs=2048 count=$((512 * 1024 * 1024 / 2048))

Look at the elapsed time and average transfer speed. What conclusion can you draw?
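
The same comparison can be approximated without pv. The sketch below is not the lab's method: it writes one reused random block repeatedly to a scratch file and reports the write rate. The function name, the scratch file name and the small demo size are assumptions made for this example, and the demo writes to a temporary directory so that it runs anywhere.

```python
import os
import tempfile
import time


def write_throughput(directory, total_bytes, block_size=2048):
    """Write total_bytes to a scratch file in directory; return bytes per second."""
    block = os.urandom(block_size)  # one random block, reused for every write
    path = os.path.join(directory, "throughput_test.bin")  # hypothetical name
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_bytes // block_size):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out of its cache
    elapsed = time.perf_counter() - start
    os.remove(path)
    return total_bytes / elapsed


# Number of 2048-byte blocks in 512 MiB, as computed in the dd commands.
blocks = 512 * 1024 * 1024 // 2048

# Small demo run (4 MiB) in a temporary directory.
with tempfile.TemporaryDirectory() as scratch:
    rate = write_throughput(scratch, 4 * 1024 * 1024)
print(f"{rate / 1024 ** 2:.1f} MiB/s")
```

To compare the two targets, the function can be called once with /mnt/ramdisk and once with /home/student, using a larger total size.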

Include one screenshot showing the tmpfs partition in the df output and one screenshot of both pv commands, and write your conclusion.

04. [30p] GPU Monitoring

a. [0p] Clone Repository and Build Project

Clone the repository containing the tasks and change to this lab's task 04. Follow the instructions to install the dependencies and build the project from the README.md.

$ git clone https://github.com/cs-pub-ro/EP-labs.git
$ cd EP-labs/lab_05/task_04

b. [10p] Run Project and Collect Measurements

To run the project, simply run the binary generated by the build step. This will render a scene with a sphere. Follow the instructions in the terminal and progressively increase the number of vertices. Upon exiting the simulation with Esc, two .csv files will be created. You will use these measurements to generate plots.

The simulation runs with unbounded FPS, which means it will use your whole GPU. Careful!

Also pay close attention to your RAM!

Every time you modify the number of vertices, wait at least a couple of seconds so the FPS becomes stable.

For good results, keep increasing the number of vertices until the FPS drops below 10.

c. [10p] Generate Plot

We want to interpret the recorded results. To do this, we need to visualize them. Plot the results so that they are clear and easy to understand.

We recommend the following specifications for the plot:

  • One single plot for all results
  • Left OY axis shows FPS as a continuous variable
  • Right OY axis shows time spent per event in ms
  • OX axis follows the time of the simulation without any time ticks
  • OX axis has ticks showing the number of vertices for each event that happens
  • Every event marked with ticks on the OX axis has one stacked bar chart made of two components:
      a. a bottom component showing the time spent copying buffers
      b. a top component showing the remaining time (the total without the time spent copying buffers)
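
One possible layout following these specifications is sketched below with matplotlib. All the data here is hypothetical and generated inline: the actual column names and values come from the two .csv files, whose format is not described here. The variable names are also assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical measurements standing in for the contents of the .csv files.
fps_time = np.linspace(0.0, 30.0, 300)            # seconds since start
fps = np.interp(fps_time, [0, 10, 20, 30], [240, 120, 40, 9])
event_time = np.array([5.0, 15.0, 25.0])          # when the vertex count changed
event_vertices = [10_000, 100_000, 1_000_000]
copy_ms = np.array([0.5, 4.0, 35.0])              # time spent copying buffers
total_ms = np.array([1.5, 10.0, 80.0])            # total time per event
rest_ms = total_ms - copy_ms

fig, ax_fps = plt.subplots(figsize=(9, 4))
ax_fps.plot(fps_time, fps, color="tab:blue")
ax_fps.set_ylabel("FPS")

# Second OY axis (right side) sharing the same OX axis.
ax_ms = ax_fps.twinx()
ax_ms.bar(event_time, copy_ms, width=1.5, color="tab:orange",
          label="copying buffers")
ax_ms.bar(event_time, rest_ms, width=1.5, bottom=copy_ms, color="tab:green",
          label="rest")
ax_ms.set_ylabel("time per event (ms)")
ax_ms.legend(loc="upper center")

# OX ticks sit at the event times but are labelled with the vertex counts.
ax_fps.set_xticks(event_time)
ax_fps.set_xticklabels([f"{v:,}" for v in event_vertices])
ax_fps.set_xlabel("number of vertices")

fig.tight_layout()
```

The bottom argument of bar is what stacks the second component on top of the first. Calling fig.savefig with a file name of your choice writes the figure to disk.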

d. [10p] Interpret Results

Explain the results you have plotted. Answer the following questions:

  • Why does the FPS plot look like a descending staircase as the number of vertices increases?
  • Why does the FPS decrease more initially and then stabilize at a higher value?
  • What takes longer: generating the vertices, or copying them to VRAM?
  • What is the correlation between the number of vertices and the time to copy the Vertex Buffer?
  • Why is the program less responsive at a lower frame rate?

e. [10p] Bonus Dedicated GPU

Go back to step b., rerun the binary, and make it run on your dedicated GPU. Redo the plot with the new measurements. You do not need to answer the questions again.

If you use Nvidia, you can use prime-run.

https://gist.github.com/abenson/a5264836c4e6bf22c8c8415bb616204a

If you use AMD, you can use the DRI_PRIME=1 environment variable.

05. [10p] Feedback

Please take a minute to fill in the feedback form for this lab.

ep/labs/01.1739307892.txt.gz · Last modified: 2025/02/11 23:04 by cezar.craciunoiu
CC Attribution-Share Alike 3.0 Unported