</code>

Because both synchronized blocks are connected by the same lock, only one of them is executed at a given time and the result is zero, as expected.

<note tip>
advantage of [[http://ddili.org/ders/d.en/concurrency_shared.html|atomic operations]].
</note>

==== Fibers ====

As we've previously discussed, modern operating systems implement multitasking through threads and context switching, also known as preemptive multitasking.

A thread is given, by the kernel, a slice of time to run on the physical core; when its time has elapsed, or when the thread performs a blocking operation (waiting for an I/O operation to complete), the thread is preempted and the kernel chooses another thread to run.

Ideally, from a thread's point of view, it would run until its time slice has elapsed. For HPC applications this might very well be the case, but for applications and services that interact with users and/or the disk, this means a lot of blocking I/O operations that result in early context switches. Since every thread is competing with all the other threads in the system for its time slice, being preempted a third of the way through your slice is not ideal: it might take significantly more time until the thread gets scheduled again than it took for the I/O operation to complete. To mitigate this problem, developers use asynchronous operating system APIs to achieve [[http://vibed.org/features#aio|Asynchronous I/O operations]].

With the asynchronous I/O model (AIO), writing sequences of operations (e.g. performing multiple consecutive database queries) can become tedious and confusing: each step introduces a new callback with a new scope, and error callbacks often have to be handled separately. The latter, in particular, is a reason why it is tempting to just perform lax error handling. Another consequence of asynchronous callbacks is that there is no meaningful call stack. Not only can this make debugging more difficult, but features such as exceptions cannot be used effectively in such an environment.

A success story in the D community is the vibe.d framework, which achieves AIO through a simple interface. The approach of vibe.d is to use asynchronous I/O under the hood, but at the same time make it seem as if all operations were synchronous and blocking, just like ordinary I/O.

What makes this possible is D's support for so-called fibers (also often called coroutines). Fibers behave a lot like threads, except that they all run within the same thread. As soon as a running fiber calls a special **yield()** function, it returns control to the function that started the fiber. The fiber can later be resumed at exactly the position, and with the same state, it had when it called **yield()**. This way fibers can be multiplexed together, running quasi-parallel and using each thread's capacity as much as possible.

A fiber is a thread of execution enabling a single thread to achieve multiple tasks. Compared to regular threads, which are commonly used in parallelism and concurrency, it is more efficient to switch between fibers. Fibers are similar to //coroutines// and //green threads//.

Fibers are a form of cooperative multitasking. As the name implies, cooperative multitasking requires some help from the user functions. A function runs up to a point where the developer decides it would be a good place to run another task. Usually, a library function named yield() is called, which continues the execution of another function. This is best shown with an example. Here is a simplified version of the classic producer-consumer pattern:

<code d>
import core.thread : Fiber, Thread;
import core.time : msecs;
import std.stdio : writefln;

private int goods;
private bool exit;

void producerFiber()
{
  foreach (i; 0 .. 3)
  {
    goods = i ^^ 2;
    writefln("Produced %s", goods);
    Thread.sleep(500.msecs);
    Fiber.yield(); // suspend; control goes back to main()
  }
}

void consumerFiber()
{
  while (!exit)
  {
    /* do something */
    writefln("Consumed %s", goods);
    Thread.sleep(500.msecs);
    Fiber.yield(); // suspend; control goes back to main()
  }
}

void main()
{
  auto producer = new Fiber(&producerFiber);
  auto consumer = new Fiber(&consumerFiber);
  while (producer.state != Fiber.State.TERM)
  {
    producer.call(); // resume the producer until its next yield()
    exit = producer.state == Fiber.State.TERM;
    consumer.call(); // resume the consumer until its next yield()
  }
}
</code>

We know this looks like a lot to process, but it's actually not that complicated to understand. First, we create two fiber instances, **producer** and **consumer**, each receiving a **function** or **delegate** with the code it will execute. When **main()** issues the **producer.call()** method, control is passed to the producer and the code from **producerFiber** starts executing. Control is transferred back to **main()** by the **Fiber.yield()** call inside **producerFiber**; when a future **producer.call()** is made, the code resumes right after the **Fiber.yield()** call. Next, **main()** checks whether the producer has finished executing and then passes control to the **consumer** fiber through the same API.
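
To make these state transitions concrete, here is a minimal, self-contained sketch (not part of the lab code) that inspects a fiber's **Fiber.State** around each **call()**:

<code d>
import core.thread : Fiber;
import std.stdio : writeln;

void main()
{
    // A fiber starts out suspended (HOLD), runs inside call() (EXEC),
    // and reaches TERM once its function returns.
    auto fib = new Fiber({
        writeln("first call");
        Fiber.yield();           // suspend; control returns to main()
        writeln("second call");  // a later call() resumes right here
    });

    assert(fib.state == Fiber.State.HOLD);
    fib.call();                  // prints "first call", stops at yield()
    assert(fib.state == Fiber.State.HOLD);
    fib.call();                  // prints "second call", function returns
    assert(fib.state == Fiber.State.TERM);
}
</code>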

<note tip>
For a detailed and thorough discussion about fibers, have a read [[http://ddili.org/ders/d.en/fibers.html|here]].
</note>

==== Exercises ====

The lab can be found at this [[https://github.com/RazvanN7/D-Summer-School/tree/master/lab-05|link]].

=== 1. Parallel programming ===

Navigate to the 1-parallel directory. Read and understand the source file students.d. Compile and run the program, and explain the behaviour.

  - What is the issue, if any?
  - We want to fix the issue, but we want to continue using **Task**s (a short refresher on the **Task** API follows below).
  - Do we really have to manage all of this ourselves? I think we can do a better **parallel** job.
  - Increase the number of students by a factor of 10, then 100. Does the code scale?
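
If you need that refresher, here is a minimal sketch of the basic **Task** workflow from **std.parallelism**; **expensiveSum** is a made-up placeholder for real work:

<code d>
import std.parallelism : task, taskPool;
import std.stdio : writeln;

// Made-up CPU-bound work, standing in for a real computation.
long expensiveSum(long n)
{
    long s = 0;
    foreach (i; 0 .. n)
        s += i;
    return s;
}

void main()
{
    auto t = task!expensiveSum(10_000_000L);
    taskPool.put(t);        // start the task on a worker thread
    // ... do other useful work here ...
    writeln(t.yieldForce);  // block until the result is available
}
</code>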

=== 2. Getting functional with parallel programming ===

Navigate to the 2-parallel directory. Read and understand the source file students.d.

  - The code looks simple enough, but always ask yourselves: can we do better? Can we change the **foreach** into a one-liner?
  - Increase the number of students by a factor of 10, then 100. Does the code scale?
  - Depending on the size of our data, we might gain performance by tweaking the **workUnitSize** parameter (see the sketch below). Let's try it out.
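
For reference, a small sketch of how **workUnitSize** can be passed to **parallel** and to the one-liner-friendly **amap**; the data and the lambda are made up:

<code d>
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    auto scores = new double[1_000];

    // Parallel foreach with an explicit workUnitSize of 100:
    // each worker grabs 100 consecutive elements at a time.
    foreach (i, ref s; taskPool.parallel(scores, 100))
        s = i * 0.5;

    // The same idea as a one-liner: amap with a workUnitSize of 100.
    auto halved = taskPool.amap!(x => x * 0.5)(iota(1_000), 100);
    writeln(halved[0 .. 5]);
}
</code>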

=== 3. Heterogeneous tasks ===

Until now we've been using **std.parallelism** on sets of homogeneous tasks.
Q: What happens when we want to perform parallel computations on distinct, unrelated tasks?
A: We can use [[https://dlang.org/phobos/std_parallelism.html#.TaskPool|taskPool]] to run our tasks on a pool of worker threads.

Navigate to the 3-taskpool directory. Write a program that performs three tasks in parallel:
  - One reads the contents of **in.txt** and writes to stdout the total number of lines in the file
  - One calculates the average from the previous exercise
  - One does a task of your choice

To submit tasks to the **taskPool**, use [[https://dlang.org/phobos/std_parallelism.html#.TaskPool.put|put]] (a short sketch follows below).
<note>
Don't forget to wait for your tasks to finish.
</note>
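
Here is a minimal sketch of submitting two unrelated tasks with **put** and waiting on both; **countLines** and the delegate body are made-up placeholders:

<code d>
import std.file : readText;
import std.parallelism : task, taskPool;
import std.stdio : writeln;
import std.string : splitLines;

// Made-up helper: counts the lines of a text file.
size_t countLines(string path)
{
    return readText(path).splitLines.length;
}

void main()
{
    auto lineTask  = task!countLines("in.txt");
    auto otherTask = task({ writeln("some unrelated work"); });

    taskPool.put(lineTask);   // both tasks run on the pool's workers
    taskPool.put(otherTask);

    writeln("lines: ", lineTask.yieldForce); // wait for each task
    otherTask.yieldForce;
}
</code>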

=== 4. I did it My way ===

Let's implement our own concurrent **map** function.
Navigate to the 4-concurrent-map directory. Starting from the serial implementation found in **mymap.d**, modify the code such that the call to the **mymap** function executes on multiple threads. You are required to use the **std.concurrency** module for this task.

Creating a thread implies some overhead, so we don't want to create a thread for each element, but rather have a thread process chunks of elements; basically, we need a **workUnitSize**.
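
One possible shape of such a solution, sketched with made-up names (**worker**, squaring as the mapped operation) and hard-coded data, just to illustrate the **spawn**/**send**/**receiveOnly** chunking pattern:

<code d>
import std.algorithm : min;
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.stdio : writeln;

// Made-up worker: maps one chunk (here: squaring) and sends the
// result back together with the chunk's offset in the input.
void worker(Tid owner, immutable(int)[] chunk, size_t offset)
{
    auto result = new int[chunk.length];
    foreach (i, x; chunk)
        result[i] = x * x;
    send(owner, offset, cast(immutable) result);
}

void main()
{
    immutable int[] data = [1, 2, 3, 4, 5, 6];
    enum workUnitSize = 2;

    auto output = new int[data.length];
    size_t nChunks;
    for (size_t off = 0; off < data.length; off += workUnitSize, ++nChunks)
        spawn(&worker, thisTid, data[off .. min(off + workUnitSize, data.length)], off);

    // Collect the mapped chunks; they may arrive in any order.
    foreach (_; 0 .. nChunks)
    {
        auto msg = receiveOnly!(size_t, immutable(int)[])();
        output[msg[0] .. msg[0] + msg[1].length] = msg[1][];
    }
    writeln(output);
}
</code>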

=== 5. Don't stop me now ===

Since we just got started, let's implement our own concurrent **reduce** function. **reduce** must take the initial accumulator value as its first parameter, and then the list of elements to reduce.

<note>
Be careful about those race conditions (the sketch below shows one way to avoid them).
</note>
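
One way to sidestep races entirely is to keep each accumulation local to its thread and combine the partial results through messages. A rough sketch, with made-up names and a fixed worker count:

<code d>
import std.concurrency : receiveOnly, send, spawn, thisTid, Tid;
import std.stdio : writeln;

// Made-up worker: folds its own chunk locally, then message-passes
// the partial result back -- no shared mutable state, no race.
void partialSum(Tid owner, immutable(int)[] chunk)
{
    int acc = 0;
    foreach (x; chunk)
        acc += x;
    send(owner, acc);
}

void main()
{
    immutable int[] data = [1, 2, 3, 4, 5, 6, 7, 8];
    enum nWorkers = 2;
    immutable chunkSize = data.length / nWorkers;

    foreach (w; 0 .. nWorkers)
        spawn(&partialSum, thisTid, data[w * chunkSize .. (w + 1) * chunkSize]);

    int total = 0; // the initial accumulator value
    foreach (_; 0 .. nWorkers)
        total += receiveOnly!int();
    writeln(total); // 36
}
</code>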

=== 6. Under pressure ===

The implementations we wrote at ex. 4 and ex. 5 are great and all, but they have the following shortcoming: they will each spawn a number of threads (most likely equal to the number of physical cores), so calling them both in parallel will spawn twice the number of threads that can run in parallel.

Change your implementations to use a thread pool. The worker threads will consume jobs from a queue. The map and reduce implementations will push job abstractions into the queue.
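
A rough sketch of the idea, treating each worker's **std.concurrency** mailbox as the job queue; all names and the fixed pool size are made up:

<code d>
import std.concurrency : receive, receiveOnly, send, spawn, thisTid, Tid;
import std.stdio : writeln;

// Made-up worker: treats its mailbox as a job queue, handling chunk
// jobs until a "stop" message arrives.
void poolWorker(Tid owner)
{
    bool done;
    while (!done)
    {
        receive(
            (immutable(int)[] chunk) {
                int sum = 0;
                foreach (x; chunk)
                    sum += x;
                send(owner, sum); // report the job's result
            },
            (string cmd) { done = cmd == "stop"; }
        );
    }
}

void main()
{
    enum poolSize = 4; // in practice: the number of physical cores

    Tid[] workers;
    foreach (_; 0 .. poolSize)
        workers ~= spawn(&poolWorker, thisTid);

    immutable int[] data = [1, 2, 3, 4, 5, 6, 7, 8];
    immutable chunkSize = data.length / poolSize;

    // Push one job per worker; map and reduce would both enqueue here.
    foreach (i, w; workers)
        send(w, data[i * chunkSize .. (i + 1) * chunkSize]);

    int total = 0;
    foreach (_; 0 .. workers.length)
        total += receiveOnly!int();
    writeln(total); // 36

    foreach (w; workers)
        send(w, "stop"); // let the pool shut down cleanly
}
</code>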

Now we're talking!