Assignment

01. Overview

The goal of this assignment is to implement a tool based on Linux Perf Events that is able to monitor main memory accesses performed by another process.

For this assignment you will work in pairs. You will also need an Intel CPU capable of recording MEM_INST_RETIRED events; anything newer than Nehalem should do.

02. Requirements

Partner up

Select a partner for this assignment and submit your choice via this form.
If you can't find a partner, try advertising on the assignment forum.

Only one student needs to complete the form on behalf of the team.
Likewise, only one student (not necessarily the same one) needs to upload the assignment to Moodle.
You are required to work with a partner on this assignment.

Usage

Your application should be implemented in C/C++ and take the command line of the program under test as its positional arguments. For example, ./my_tracer curl http://example.com will launch the tracer, which will then fork() and exec() curl and monitor its memory accesses from the moment it starts. If your application needs flags of its own, separate them from the child's command line with --.
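A minimal sketch of the launch step, assuming a hypothetical convention in which tracer flags, if present, come first and end at the first --. The child blocks on a pipe until the tracer has opened its perf events on the child's PID (for instance with enable_on_exec set, so that counting starts at exec()), and only then calls execvp(). The helper names child_argv_start, spawn_gated, release_child and wait_child are illustrative, not part of any API.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical convention: if argv[1] starts with '-', tracer flags come
 * first and end at "--"; otherwise everything after argv[0] is the child. */
int child_argv_start(int argc, char **argv)
{
    if (argc > 1 && argv[1][0] != '-')
        return 1;
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "--") == 0)
            return i + 1;
    return argc; /* flags given but no child command */
}

/* Fork a child that waits on a pipe before exec'ing child_argv.
 * Returns the child's PID and stores the pipe's write end in *release_fd. */
pid_t spawn_gated(char **child_argv, int *release_fd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        char go;
        close(p[1]);
        /* Block until the tracer has attached its perf events. */
        if (read(p[0], &go, 1) != 1)
            _exit(127); /* tracer closed the pipe without releasing us */
        close(p[0]);
        execvp(child_argv[0], child_argv);
        _exit(127); /* exec failed */
    }
    close(p[0]);
    *release_fd = p[1];
    return pid;
}

/* Let the child proceed to exec(); call once perf_event_open() succeeded. */
int release_child(int release_fd)
{
    char go = 'g';
    ssize_t n = write(release_fd, &go, 1);
    close(release_fd);
    return n == 1 ? 0 : -1;
}

/* Wait for the child; returns its exit status, or -1 on abnormal exit. */
int wait_child(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In main(), the tracer would call spawn_gated(argv + child_argv_start(argc, argv), &fd), open its events on the returned PID, and only then call release_child(fd). This closes the race in which the child runs part of its code before the events exist.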

Memory access tracing

Once the child process is up and running, you will have to monitor its read and write operations separately. Specifically, you will have to determine which address has been accessed and which instruction performed the access. This can be achieved using Intel Processor Event Based Sampling (PEBS), a mode of operation in which the CPU writes detailed sample information into a memory buffer whenever the event counter overflows. You will not be required to interact with this mechanism directly; instead, use the sampled mode of Linux Perf Events.
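A sketch of the sampled-mode configuration, assuming a Skylake-or-later core where MEM_INST_RETIRED.ALL_LOADS and MEM_INST_RETIRED.ALL_STORES are event 0xD0 with umasks 0x81 and 0x82. Older microarchitectures (Nehalem included) use different encodings, so check perf list or Intel's event tables for your CPU. Opening one event for loads and another for stores keeps reads and writes separate. glibc provides no perf_event_open() wrapper, hence the raw syscall(). The helper names and the example sample period are illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* ASSUMPTION: Skylake+ raw encodings, (umask << 8) | event.
 * MEM_INST_RETIRED.ALL_LOADS = 0xD0/0x81, ALL_STORES = 0xD0/0x82. */
#define RAW_MEM_LOADS  0x81d0ULL
#define RAW_MEM_STORES 0x82d0ULL

/* Fill attr for PEBS sampling of one memory event. */
void init_mem_attr(struct perf_event_attr *attr, uint64_t raw_config,
                   uint64_t period, int want_mmap_records)
{
    assert(period > 0);
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_RAW;
    attr->config = raw_config;
    attr->sample_period = period;          /* one sample every `period` events */
    attr->sample_type = PERF_SAMPLE_IP     /* address of the instruction */
                      | PERF_SAMPLE_TID
                      | PERF_SAMPLE_TIME
                      | PERF_SAMPLE_ADDR;  /* accessed data address */
    attr->precise_ip = 2;   /* PEBS; perf_event_open() fails if unsupported */
    attr->disabled = 1;
    attr->enable_on_exec = 1; /* start counting when the child calls exec() */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
    attr->sample_id_all = 1;  /* TID/TIME on non-sample records as well */
    /* Ask the kernel for PERF_RECORD_MMAP2 for code and data mappings. */
    attr->mmap = want_mmap_records ? 1 : 0;
    attr->mmap_data = want_mmap_records ? 1 : 0;
    attr->mmap2 = want_mmap_records ? 1 : 0;
}

/* Open the event on `pid`, following the task on any CPU (cpu = -1). */
int open_mem_event(struct perf_event_attr *attr, pid_t pid)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, -1, -1,
                        PERF_FLAG_FD_CLOEXEC);
}
```

Each returned fd is then mmap()ed with 1 + 2^n pages: the first page holds a struct perf_event_mmap_page, the rest is the ring buffer of records. The sample period is a trade-off: a small period gives denser histograms but adds overhead and makes PERF_RECORD_LOST records more likely when the buffer fills up.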

Mapping addresses to objects

Once this task is complete, your next objective is to map both the accessed address and the instruction's address to a memory-mapped object (where appropriate). For instance, you will have to distinguish between memory accesses performed by code belonging to libc and those performed by code belonging to libz. Additionally, you must identify whether the accessed address belongs to a data segment of a memory-mapped object, or to the heap / stack instead. To solve this task, note that in sampled mode Linux Perf can generate other records besides PMC samples. In fact, the kernel can be configured to report every mmap() that the program under test performs. This is how perf record embeds object information into the sample file, so that perf report can later translate those samples into “hot” functions, even with ASLR enabled.
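A sketch of the consumer side, assuming each event fd has been mmap()ed with one metadata page followed by a power-of-two data area, and that the event was opened with mmap, mmap_data and mmap2 set so that the kernel interleaves PERF_RECORD_MMAP2 records with the samples. drain_ring() copies each record out (a record can wrap around the end of the buffer), hands it to a callback, and then publishes the new data_tail so the kernel can reuse the space. The example callback only tallies record types; a real tracer would decode the sample fields (laid out in the order implied by sample_type) and the MMAP2 address, length and file name into its region table. Apart from the perf_event structures and constants, all names are hypothetical.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*record_cb)(const struct perf_event_header *hdr, void *ctx);

/* Copy len bytes starting at ring offset off, wrapping at size. */
static void copy_wrapped(void *dst, const uint8_t *data, uint64_t size,
                         uint64_t off, size_t len)
{
    size_t first = (off + len > size) ? (size_t)(size - off) : len;
    memcpy(dst, data + off, first);
    memcpy((uint8_t *)dst + first, data, len - first);
}

/* Consume every complete record between data_tail and data_head.
 * data_size must be a power of two. Not thread-safe (static scratch).
 * Returns the number of records handed to cb. */
size_t drain_ring(struct perf_event_mmap_page *meta, uint8_t *data,
                  uint64_t data_size, record_cb cb, void *ctx)
{
    static uint64_t scratch[65536 / sizeof(uint64_t)]; /* max record size */
    assert(data_size && (data_size & (data_size - 1)) == 0);

    /* Acquire: record contents must be visible before we read them. */
    uint64_t head = __atomic_load_n(&meta->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = meta->data_tail;
    size_t count = 0;

    while (tail < head) {
        struct perf_event_header hdr;
        uint64_t off = tail & (data_size - 1);
        copy_wrapped(&hdr, data, data_size, off, sizeof hdr);
        if (hdr.size < sizeof hdr)
            break; /* corrupt record: avoid looping forever */
        copy_wrapped(scratch, data, data_size, off, hdr.size);
        cb((const struct perf_event_header *)scratch, ctx);
        tail += hdr.size;
        count++;
    }
    /* Release: finish reading before the kernel may overwrite the space. */
    __atomic_store_n(&meta->data_tail, tail, __ATOMIC_RELEASE);
    return count;
}

struct record_counts { size_t samples, mmaps, lost, other; };

/* Example callback: dispatch on the record type. */
void count_records(const struct perf_event_header *hdr, void *ctx)
{
    struct record_counts *c = ctx;
    switch (hdr->type) {
    case PERF_RECORD_SAMPLE: c->samples++; break;
    case PERF_RECORD_MMAP2:  c->mmaps++;   break;
    case PERF_RECORD_LOST:   c->lost++;    break;
    default:                 c->other++;   break;
    }
}
```

The tracer can poll() the event fds (wakeup_events or wakeup_watermark in perf_event_attr control how often they become readable) and call drain_ring() on each readable buffer.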

Memory accesses may also be performed by instructions located in non-file-backed regions. Examples include JavaScript code JIT-compiled by V8 in Chromium or SpiderMonkey in Firefox, and code generated by LuaJIT for Neovim plugins or World of Warcraft addons.

Plotting

The final implementation task is to create a dynamic visualization interface that shows, live, the number of memory reads and writes performed, as well as the locations being accessed and the objects performing the accesses. Note that you must provide a fine-grained view of each object. For example, if you implement this feature as a histogram, you will have to create multiple buckets for each object, so that for a micro-benchmark with a linear memory access pattern in the heap, your visualization shows the buckets covering the heap region filling up one by one.
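A minimal sketch of the bookkeeping behind a per-object histogram, assuming a hypothetical region table (sorted by start address, non-overlapping) built from the mmap records, with names such as [heap] for non-file-backed areas. find_region() identifies the object an address belongs to via binary search, and bucket_index() splits that object into a fixed number of equal-width buckets, which is what makes a linear sweep over the heap appear as buckets filling one by one. Replacing an older mapping that a new one overlaps (e.g. after munmap() and mmap()) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region table entry, e.g. filled from mmap records. */
struct region {
    uint64_t start, len;
    const char *name; /* "libc.so.6", "[heap]", ... */
};

/* Binary search over regions sorted by start and non-overlapping.
 * Returns the region containing addr, or NULL if there is none. */
const struct region *find_region(const struct region *r, size_t n,
                                 uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < r[mid].start)
            hi = mid;
        else if (addr - r[mid].start >= r[mid].len)
            lo = mid + 1;
        else
            return &r[mid];
    }
    return NULL;
}

/* Bucket of addr inside r, using nbuckets equal-width buckets.
 * Rounding the width up guarantees the result is < nbuckets. */
size_t bucket_index(const struct region *r, uint64_t addr, size_t nbuckets)
{
    assert(nbuckets > 0 && addr >= r->start && addr - r->start < r->len);
    uint64_t width = (r->len + nbuckets - 1) / nbuckets;
    return (size_t)((addr - r->start) / width);
}
```

The same lookup serves both addresses of a sample: the data address selects the object and bucket being accessed, while the instruction pointer selects the object performing the access.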

You are free to implement this feature in any way you desire. For example, you can pass the data to be plotted to a Python3 script that generates an interactive matplotlib figure. Or you can build an in-process frontend using ImGui or ncurses. Or you can write an HTTP server that accepts state updates over the network and displays the plots in your browser. These are just a few ideas; feel free to use whatever you're most comfortable with.
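If you go the Python route, the tracer and the script need to agree on some wire format. The sketch below assumes a hypothetical tab-separated, newline-terminated line per sample (timestamp in ns, R or W, code object, data object, bucket), written into a caller-supplied buffer; the tracer could then write these lines to the stdin of a python3 process started with popen(). Tabs are used instead of spaces because file paths may contain spaces, and names containing a tab or newline are rejected.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical line format:
 *   "<time_ns>\t<R|W>\t<code_obj>\t<data_obj>\t<bucket>\n"
 * Returns the line length, or -1 if it does not fit or a name is unsafe. */
int format_sample_line(char *buf, size_t cap, uint64_t time_ns, int is_write,
                       const char *code_obj, const char *data_obj,
                       size_t bucket)
{
    /* A tab or newline inside a name would break the line protocol. */
    if (strpbrk(code_obj, "\t\n") || strpbrk(data_obj, "\t\n"))
        return -1;
    int n = snprintf(buf, cap, "%" PRIu64 "\t%c\t%s\t%s\t%zu\n", time_ns,
                     is_write ? 'W' : 'R', code_obj, data_obj, bucket);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

One line per sample is simple but chatty; aggregating counts in the tracer and sending a snapshot a few times per second keeps the script responsive at high sample rates.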

A small bonus is available if you can limit the displayed samples to a user-specified time window. In other words, show the memory access distribution for the past N seconds while continuously updating the plot. Whether a sample belongs to the window should be decided by the time it was taken, not by when you consumed it from the record ring buffer. Perf can also attach a timestamp to each record.
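A sketch of the time window, assuming samples carry PERF_SAMPLE_TIME timestamps in nanoseconds and are fed in timestamp order (if loads and stores arrive through separate ring buffers, they would first need to be merged by timestamp). Each sample increments a histogram bucket and is queued; once it falls out of the window it is dequeued and its increment undone. To expire samples while the child is idle, the tracer needs the current time on the same clock as the records; setting use_clockid and clockid (e.g. CLOCK_MONOTONIC) in perf_event_attr is one way to get that. The struct and function names are hypothetical, and in this sketch a full queue simply drops its oldest entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct window_entry { uint64_t time_ns; size_t bucket; };

struct window {
    struct window_entry *q; /* circular queue of in-window samples */
    size_t cap, head, len;
    uint64_t span_ns;       /* window length N, in ns */
    uint64_t *counts;       /* per-bucket counts shown by the plot */
    size_t nbuckets;
};

int window_init(struct window *w, size_t cap, size_t nbuckets, uint64_t span_ns)
{
    if (cap == 0 || nbuckets == 0)
        return -1;
    w->q = calloc(cap, sizeof *w->q);
    w->counts = calloc(nbuckets, sizeof *w->counts);
    if (!w->q || !w->counts) {
        free(w->q);
        free(w->counts);
        return -1;
    }
    w->cap = cap;
    w->head = w->len = 0;
    w->span_ns = span_ns;
    w->nbuckets = nbuckets;
    return 0;
}

void window_free(struct window *w)
{
    free(w->q);
    free(w->counts);
}

/* Remove the oldest sample and undo its histogram increment. */
static void window_pop(struct window *w)
{
    w->counts[w->q[w->head].bucket]--;
    w->head = (w->head + 1) % w->cap;
    w->len--;
}

/* Drop every sample taken more than span_ns before now_ns. */
void window_expire(struct window *w, uint64_t now_ns)
{
    while (w->len > 0 && w->q[w->head].time_ns + w->span_ns < now_ns)
        window_pop(w);
}

/* Account one sample; samples must arrive in timestamp order. */
void window_add(struct window *w, uint64_t time_ns, size_t bucket)
{
    assert(bucket < w->nbuckets);
    if (w->len == w->cap)
        window_pop(w); /* queue full: sacrifice the oldest sample */
    w->q[(w->head + w->len) % w->cap] = (struct window_entry){ time_ns, bucket };
    w->len++;
    w->counts[bucket]++;
    window_expire(w, time_ns);
}
```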

Documentation

Implementation aside, your last task is to test and document your project. Your documentation should be in PDF format and describe your design choices, which tasks you found most difficult, how you solved those problems, and how you tested your tracer. Naturally, this implies including plots generated by tracing multiple benchmark programs. Explain how you chose these benchmarks and what observations you made.
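As one example of a micro-benchmark whose expected trace is known in advance, the sketch below sweeps a buffer linearly, first with stores only and then with loads only, touching one byte per 64-byte cache line (the line size is an assumption). Phases that contain a single kind of access make it easy to check that reads and writes are both reported and attributed to the right object. Keep in mind that with glibc, allocations above the mmap threshold (128 KiB by default) are served by mmap() rather than the brk heap, so a large buffer may appear as an anonymous mapping instead of [heap]; also, at -O0 the loop counter may live on the stack and generate stack traffic of its own. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE 64 /* assumed cache-line size: one access per line */

/* Store-only sweep in increasing address order. */
void write_pass(volatile uint8_t *buf, size_t n, uint8_t v)
{
    for (size_t i = 0; i < n; i += LINE)
        buf[i] = v;
}

/* Load-only sweep; volatile keeps the compiler from eliding the loads. */
uint64_t read_pass(const volatile uint8_t *buf, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += LINE)
        sum += buf[i];
    return sum;
}

/* Alternate the two phases `rounds` times over a malloc()ed buffer.
 * Returns the sum of all loaded bytes (0 if allocation fails). */
uint64_t run_benchmark(size_t bytes, int rounds)
{
    uint8_t *buf = malloc(bytes);
    if (!buf)
        return 0;
    uint64_t total = 0;
    for (int r = 0; r < rounds; r++) {
        write_pass(buf, bytes, (uint8_t)r);
        total += read_pass(buf, bytes);
    }
    free(buf);
    return total;
}
```

Calling run_benchmark() from main() with, for example, a 64 KiB buffer keeps the allocation below the default threshold, so the accesses should land in [heap].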

The goal of this documentation is to convince the reader of the soundness of your design and implementation. Try to pose and answer questions such as: “What guarantee do we have that the sampling is uniform?”, “Is it possible to have a burst of localized samples followed by a period of PMC inactivity?” or “How did we verify that both read and write accesses have been reported, and not just one type?”

Such issues will arise naturally as you implement the assignment, so don't give them much thought beforehand, but remember to address them in the end. And, needless to say, don't limit yourselves to these examples.

Grading

The deadline for this assignment is 11 May. Upload a zip archive containing the source code, Makefile, documentation and any micro-benchmarks used in testing (don't go and include redis in your submission). The archive should be uploaded to this Moodle assignment.

This assignment is worth 1.5p of your final grade. The breakdown by task is as follows:

  • Memory access tracing (30%): If nothing else, the application can provably monitor memory accesses by printing the relevant information to stdout.
  • Mapping addresses to objects (30%): The application should be able to generate statistics for both accessed data regions and code regions performing the accesses. Reads and writes must be treated separately.
  • Plotting (10%): Live illustration of the statistics mentioned in the previous task. Be creative and include even more data if you can.
  • Documentation (30%): Adequately explains the design and implementation. Can convincingly prove that both are sound. Describes the testing methodology and presents the results in a concise but thorough manner. In other words: “Someone has to read this so be considerate and don't waste their time. Improves your chances of not pissing them off.”

The first pair that submits an assignment that receives full marks will automatically pass the exam with a maximum grade.


ep/teme/01.txt · Last modified: 2026/03/04 14:35 by radu.mantu
CC Attribution-Share Alike 3.0 Unported