If the majority of its threads are I/O bound, then a program can afford to start more threads than the number of cores without any degradation of performance. As it should be in every design decision that concerns program performance, one must take actual measurements to be exactly sure whether that really is the case.
</note>

=== Message passing ===

**send()** sends messages and **receiveOnly()** waits for a message of a particular type. (There are also **prioritySend()**, **receive()** and **receiveTimeout()**, which we encourage you to read about in the docs.)

The owner in the following program sends its worker a message of type **int** and waits for a message of type **double** from the worker. The threads continue sending messages back and forth until the owner sends a negative **int**. This is the owner thread:

<code d>
import std.stdio;
import std.concurrency;

void main() {
  Tid worker = spawn(&workerFunc);
  foreach (value; 1 .. 5) {
    worker.send(value);
    double result = receiveOnly!double();
    writefln("sent: %s, received: %s", value, result);
  }
  /* Sending a negative value to the worker so that it
   * terminates. */
  worker.send(-1);
}
</code>

The return value of **spawn()** is the id of the worker thread. **main()** stores the return value of **spawn()** under the name **worker** and uses that variable when sending messages to the worker.

On the other side, the worker receives the message that it needs as an **int**, uses
that value in a calculation and sends the result as type **double** to its owner:
<code d>
import std.concurrency;
import std.conv;

void workerFunc() {
  int value = 0;

  while (value >= 0) {
    value = receiveOnly!int();
    double result = to!double(value) / 5;
    ownerTid.send(result);
  }
}
</code>
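
As a side note, **receive()** (mentioned above) lets a thread wait for several possible message types at once. The following is a minimal sketch of ours, not part of the original lab text, showing how **receive()** dispatches on the type of the incoming message:

<code d>
import std.stdio;
import std.concurrency;

void workerFunc() {
  // Blocks until a message matching one of the handlers arrives.
  receive(
    (int i)    { writeln("received an int: ", i); },
    (string s) { writeln("received a string: ", s); }
  );
}

void main() {
  Tid worker = spawn(&workerFunc);
  worker.send("hello");
}
</code>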

<note tip>
We strongly encourage you to read more about message passing concurrency in [[http://ddili.org/ders/d.en/concurrency.html|this chapter]] from Ali's book.
</note>

=== Data sharing ===

We gave you a small insight into data sharing in D in [[https://ocw.cs.pub.ro/courses/dss/laboratoare/03#shared|lab03]].

Unlike most other programming languages, data is not automatically shared in D; data is thread-local by default.
Although module-level variables may give the impression of being accessible by all threads, each thread actually gets its own
copy:

<code d>
import std.stdio;
import std.concurrency;
import core.thread;

int variable;

void printInfo(string message)
{
  writefln("%s: %s (@%s)", message, variable, &variable);
}

void worker()
{
  variable = 42;
  printInfo("Before the worker is terminated");
}

void main()
{
  spawn(&worker);
  thread_joinAll();
  printInfo("After the worker is terminated");
}
</code>

The variable that is modified inside **worker()** is not the same variable that is seen by **main()**.

**spawn()** does not allow passing references to thread-local variables.
Attempting to do so will result in a compilation error:

<code d>
void worker(bool* isDone) { /* ... */ }

void main() {
  bool isDone = false;
  spawn(&worker, &isDone); // Error: Aliases to mutable thread-local data not allowed.
}
</code>
Mutable variables that need to be shared must be defined with the **shared** keyword:

<code d>
void worker(shared(bool)* isDone) { /* ... */ }

void main() {
  shared(bool) isDone = false;
  spawn(&worker, &isDone); // OK: the data pointed to is shared
}
</code>

On the other hand, since **immutable** variables cannot be modified, there is no
problem with sharing them directly. For that reason, **immutable** implies **shared**:

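As a quick illustration (a sketch of ours, not from the original lab text), an **immutable** value can be passed to **spawn()** directly, with no pointers or casts involved:

<code d>
import std.stdio;
import std.concurrency;
import core.thread;

void worker(immutable(int)[] data) {
  // Reading immutable data from another thread is always safe:
  // nobody can ever modify it.
  writeln("worker sees: ", data);
}

void main() {
  immutable int[] numbers = [1, 2, 3];
  spawn(&worker, numbers); // accepted, because immutable implies shared
  thread_joinAll();
}
</code>
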
== shared is transitive ==

As you may remember, in the D programming language the **const** and **immutable** type qualifiers are transitive.
The same is true for the **shared** type qualifier.

<code d>
shared int* pInt;
shared(int*) pInt;
</code>

The two declarations above are equivalent.
The correct meaning of **pInt** is "the pointer is shared and the data pointed to by the pointer is also shared".

There is also a notion of an "unshared pointer to shared data" that does hold water: some thread holds a private pointer, and the pointer "looks" at shared data. That is easily expressible syntactically as
<code d>
shared(int)* pInt;
</code>

== Race conditions ==

The correctness of the program requires extra attention when mutable data is shared between threads.

<code d>
import std.concurrency;
import core.thread;

void inc(shared(int)* val) {
  ++*val;
}

void main() {
  shared int x = 0;
  foreach (i; 0 .. 10) {
    spawn(&inc, &x);
  }
  thread_joinAll();
}
</code>

The code above exemplifies a simple race condition: it is called a race because any thread can access (read and/or write) the shared variable at any given time. As the threads run in a nondeterministic order, the result of the operation is also nondeterministic. Although it is possible that the program produces the expected result (10, one increment per thread), most of
the time the actual outcome will be wrong (corrupted).

== synchronized ==

The incorrect program behavior above is due to more than one thread accessing the same mutable data (and at least one of them modifying it). One way of avoiding these race conditions is to mark the common code with the **synchronized** keyword. The program would work correctly with the following change:
<code d>
void inc(shared(int)* val) {
  synchronized {
    ++*val;
  }
}
</code>

A synchronized block will create an anonymous lock and use it to serialize the critical section.
If we need to synchronize access to a shared variable in multiple **synchronized** blocks, we need to create a lock object and pass it to the **synchronized** statement.

There is no need for a special lock type in D because any class object can be used as a **synchronized** lock. The following program defines an empty class named **Lock** to use its objects as locks:

<code d>
import std.stdio;
import std.concurrency;
import core.thread;

enum count = 1000;

class Lock {}

void incrementer(shared(int)* value, shared(Lock) lock) {
  foreach (i; 0 .. count) {
    synchronized (lock) {
      *value = *value + 1;
    }
  }
}

void decrementer(shared(int)* value, shared(Lock) lock) {
  foreach (i; 0 .. count) {
    synchronized (lock) {
      *value = *value - 1;
    }
  }
}

void main() {
  shared(Lock) lock = new shared(Lock)();
  shared(int) number = 0;

  foreach (i; 0 .. 100) {
    spawn(&incrementer, &number, lock);
    spawn(&decrementer, &number, lock);
  }

  thread_joinAll();
  writeln("Final value: ", number);
}
</code>

Because both **synchronized** blocks are connected by the same lock, only one of them is executed at a given time and the result is zero, as expected.

<note tip>
It is a relatively expensive operation for a thread to wait for a lock, which may slow down the execution of the program noticeably. Fortunately, in some cases program correctness can be ensured without the use of a synchronized block, by taking
advantage of [[http://ddili.org/ders/d.en/concurrency_shared.html|atomic operations]].
</note>

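For example, here is a small sketch of ours (not part of the original lab text) that rewrites the earlier **inc()** function using **core.atomic**, so that no explicit lock is needed:

<code d>
import core.atomic;
import std.concurrency;
import core.thread;

void inc(shared(int)* val) {
  // Performs *val += 1 as a single atomic operation.
  atomicOp!"+="(*val, 1);
}

void main() {
  shared int x = 0;
  foreach (i; 0 .. 10) {
    spawn(&inc, &x);
  }
  thread_joinAll();
}
</code>
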
==== Fibers ====

As we've previously discussed, modern operating systems implement multitasking through the use of threads and context switching, also known as preemptive multitasking.

A thread is given, by the kernel, a slice of time to run on the physical core; when its time has elapsed, or if the thread is doing a blocking operation (waiting for an I/O operation to complete), the thread is preempted and the kernel chooses another thread to run.

Ideally, from a thread's point of view, it would run until its time slice has elapsed. For HPC applications this might very well be the case, but for applications and services that interact with users and/or the disk, this means a lot of blocking I/O operations that will result in an early context switch. Since every thread is competing with all the other threads in the system for its time slice, being preempted at a third of its slice is not ideal: it might take significantly more time until the thread gets scheduled again than it took for the I/O operation to complete. To mitigate this problem, developers use asynchronous operating system APIs to achieve [[http://vibed.org/features#aio|Asynchronous I/O operations]].

Working with the asynchronous I/O model (AIO) can become tedious and confusing when writing sequences of operations (e.g. performing multiple consecutive database queries). Each step will introduce a new callback with a new scope, and error callbacks often have to be handled separately. Especially the latter is a reason why it is tempting to just perform lax error handling. Another consequence of asynchronous callbacks is that there is no meaningful call stack. Not only can this make debugging more difficult, but features such as exceptions cannot be used effectively in such an environment.

A success story in the D community is the vibe.d framework, which achieves AIO through a simple interface. The approach of vibe.d is to use asynchronous I/O under the hood, but at the same time make it seem as if all operations were synchronous and blocking, just like ordinary I/O.

What makes this possible is D's support for so-called fibers (also often called co-routines). Fibers behave a lot like threads, except that they all actually run in the same thread. As soon as a running fiber calls a special **yield()** function, it returns control to the function that started the fiber. The fiber can then later be resumed at exactly the position and with the same state it had when it called **yield()**. This way fibers can be multiplexed together, running quasi-parallel and using each thread's capacity as much as possible.

A fiber is a thread of execution that enables a single thread to achieve multiple tasks.
Compared to regular threads, which are commonly used in parallelism and concurrency, it is more efficient to switch between fibers. Fibers are similar to //coroutines// and //green threads//.

Fibers are a form of cooperative multitasking. As the name implies,
cooperative multitasking requires some help from the user functions. A function
runs up to a point where the developer decides would be a good place to run
another task. Usually, a library function named **yield()** is called, which continues
the execution of another function. This is best shown with an example. Here is a
simplified version of the classic producer-consumer pattern:

<code d>
import std.stdio;
import core.thread;
import core.time;

private int goods;
private bool exit;

void producerFiber()
{
  foreach (i; 0 .. 3)
  {
    goods = i ^^ 2;
    writefln("Produced %s", goods);
    Thread.sleep(500.msecs);
    Fiber.yield();
  }
}

void consumerFiber()
{
  while (!exit)
  {
    /* do something with the goods */
    writefln("Consumed %s", goods);
    Thread.sleep(500.msecs);
    Fiber.yield();
  }
}

void main()
{
  auto producer = new Fiber(&producerFiber);
  auto consumer = new Fiber(&consumerFiber);

  while (producer.state != Fiber.State.TERM)
  {
    producer.call();
    exit = producer.state == Fiber.State.TERM;
    consumer.call();
  }
}
</code>

We know this looks like a lot to process, but it's actually not that complicated to understand.
First, we create two fiber instances, **producer** and **consumer**, that receive a **function** or **delegate** to the code they will execute. When **main()** issues the **producer.call()** method, control is passed to the producer and the code from
**producerFiber** starts executing. Control is transferred back to **main()** by the **Fiber.yield()** call from **producerFiber**; when a future **producer.call()** is made, the code resumes right after the **Fiber.yield()** call.
Next, **main()** checks whether the producer has finished executing and then passes control to the **consumer** fiber through the same API.

<note tip>
For a detailed and thorough discussion about fibers, have a read [[http://ddili.org/ders/d.en/fibers.html|here]].
</note>

==== Exercises ====

The lab can be found at this [[https://github.com/RazvanN7/D-Summer-School/tree/master/lab-05|link]].

=== 1. Parallel programming ===

Navigate to the 1-parallel directory. Read and understand the source file students.d. Compile and run the program, and explain the behaviour.

  - What is the issue, if any?
  - We want to fix the issue, but we want to continue using **Task**s.
  - Do we really have to manage all of this ourselves? I think we can do a better **parallel** job.
  - Increase the number of students by a factor of 10, then 100. Does the code scale?

=== 2. Getting functional with parallel programming ===

Navigate to the 2-parallel directory. Read and understand the source file students.d.

  - The code looks simple enough, but always ask yourselves: can we do better? Can we change the **foreach** into a one-liner?
  - Increase the number of students by a factor of 10, then 100. Does the code scale?
  - Depending on the size of our data, we might gain performance by tweaking the **workUnitSize** parameter. Let's try it out.

=== 3. Heterogeneous tasks ===

Until now we've been using **std.parallelism** on sets of homogeneous tasks.
Q: What happens when we want to perform parallel computations on distinct, unrelated tasks?
A: We can use [[https://dlang.org/phobos/std_parallelism.html#.TaskPool|taskPool]] to run our tasks on a pool of worker threads.

Navigate to the 3-taskpool directory. Write a program that performs three tasks in parallel:
  - One reads the contents of **in.txt** and writes to stdout the total number of lines in the file
  - One calculates the average from the previous exercise
  - One does a task of your choice

To submit tasks to the **taskPool** use [[https://dlang.org/phobos/std_parallelism.html#.TaskPool.put|put]]; a small sketch of the API is shown after the note below.
<note>
Don't forget to wait for your tasks to finish.
</note>

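The following is a rough sketch of ours (it uses a made-up **countWords** helper and does not solve the exercise) showing how a task is created, submitted to **taskPool** with **put**, and waited for:

<code d>
import std.stdio;
import std.parallelism;

// Hypothetical helper used only to have some work to run.
size_t countWords(string text) {
  import std.array : split;
  return text.split.length;
}

void main() {
  // Wrap the call in a Task and hand it to the shared worker pool.
  auto t = task!countWords("one two three");
  taskPool.put(t);

  // yieldForce() blocks until the task has finished and returns its result.
  writeln("words: ", t.yieldForce());
}
</code>
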
=== 4. I did it My way ===

Let's implement our own concurrent **map** function.
Navigate to the 4-concurrent-map directory. Starting from the serial implementation found in **mymap.d**, modify the code such that
the call to the **mymap** function executes on multiple threads. You are required to use the **std.concurrency** module for this task.

Creating a thread implies some overhead, thus we don't want to create a thread for each element, but rather have a thread process chunks of elements; basically, we need a **workUnitSize**.

=== 5. Don't stop me now ===

Since we just got started, let's implement our own concurrent **reduce** function. **reduce** must take the initial accumulator value as its first parameter, and then the list of elements to reduce.

<note>
Be careful about those race conditions.
</note>

=== 6. Under pressure ===

The implementations we did at ex. 4 and ex. 5 are great and all, but they have the following shortcoming: they will each spawn a number of threads (most likely equal to the number of physical cores), so calling them both in parallel will spawn twice the number of threads that can actually run in parallel.

Change your implementations to use a thread pool. The worker threads will consume jobs from a queue. The map and reduce implementations will push job abstractions into the queue.

Now we're talking!