

Lab 05: Multithreading

Concurrency

Most modern microprocessors consist of more than one core, each of which can operate as an individual processing unit. They can execute different parts of different programs at the same time.

A flow of execution through certain parts of a program is called a thread of execution, or simply a thread. Programs can consist of multiple threads that are actively executed at the same time. The operating system starts and executes each thread on a core, then suspends it to execute other threads; each thread therefore competes with the other threads in the system for computational time on the processor. The execution of each thread may involve many cycles of starting and suspending.

All of the threads of all of the programs that are active at a given time are executed on the same cores of the microprocessor. The operating system decides when, and under what conditions, each thread is started and suspended. The act of suspending one thread and resuming another on a core is called a context switch.

The features of the std.parallelism module make it possible for programs to take advantage of all of the cores in order to run faster.

std.parallelism.Task

Operations that are executed in parallel with other operations of a program are called tasks. Tasks are represented by the type std.parallelism.Task.

Task represents the fundamental unit of work. A Task may be executed in parallel with any other Task. Using this struct directly allows future/promise parallelism. In this paradigm, a function (or delegate or other callable) is executed in a thread other than the one it was called from. The calling thread does not block while the function is being executed.

For simplicity, the std.parallelism.task and std.parallelism.scopedTask functions are generally used to create an instance of the Task struct.

Using the Task struct involves three steps:

1. First, we need to create a task instance.

import core.thread : Thread;
import core.time : seconds;
import std.parallelism : task;
import std.stdio : writefln;
 
int anOperation(string id) {
  writefln("Executing %s", id);
  Thread.sleep(1.seconds);
  return 42;
}
 
void main() {
  /* Construct a task object that will execute
   * anOperation(). The function arguments that are
   * specified here are passed to the task function as its
   * parameters. */
  auto theTask = task!anOperation("theTask");
  /* the main thread continues to do other work */
}

2. We have created a new Task instance, but the task is not running yet. Next, we launch its execution.

  /* ... */
  auto theTask = task!anOperation("theTask");
 
  theTask.executeInNewThread(); // start task execution
  /* ... */

3. At this point the operation has been started, but we cannot know whether theTask has completed its execution. yieldForce() waits for the task to complete; it returns only when the task has finished. Its return value is the return value of the task function, i.e. anOperation().

  /* ... */
  immutable taskResult = theTask.yieldForce();
  writefln("All finished; the result is %s\n", taskResult);
  /* ... */

The Task struct has two other methods, workForce and spinForce, that are also used to ensure that the Task has finished executing and to obtain its return value, if any. Read their documentation to discover the differences in behaviour and when each one is preferred.
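
Putting the three steps together, a minimal complete program might look like this (the extra writefln call in main is ours, added to show that the main thread keeps running while the task executes; the output order of the two threads may vary):

import core.thread : Thread;
import core.time : seconds;
import std.parallelism : task;
import std.stdio : writefln;
 
int anOperation(string id) {
  writefln("Executing %s", id);
  Thread.sleep(1.seconds);
  return 42;
}
 
void main() {
  // Step 1: construct the task object.
  auto theTask = task!anOperation("theTask");
 
  // Step 2: start executing it in a new thread.
  theTask.executeInNewThread();
 
  // The main thread is free to do other work in the meantime.
  writefln("The main thread keeps working while the task runs");
 
  // Step 3: wait for the task to finish and fetch its result.
  immutable taskResult = theTask.yieldForce();
  writefln("All finished; the result is %s", taskResult);
}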

std.parallelism.TaskPool

As we stated previously, all of the threads of all of the programs that are active at a given time are executed on the same cores of the microprocessor, competing with each other for computational time.

This observation has the following implication: on a system with N cores, at most N threads can run in parallel at a given time. Consequently, our application should create at most N worker threads that execute tasks (from a task queue) for us. These N worker threads form a thread pool, a common pattern in concurrent applications.

The std.parallelism.TaskPool gives us access to a task pool implementation to which we can submit std.parallelism.Tasks to be executed by the worker threads.

The std.parallelism module gives us access to a ready-to-use std.parallelism.TaskPool instance, named std.parallelism.taskPool. std.parallelism.taskPool has totalCPUs - 1 worker threads, where totalCPUs is the total number of CPU cores available on the current machine, as reported by the operating system. The minus 1 is there because the main thread is also available to do work.
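
To sketch how submitting work to the shared pool looks (the compute function and its argument below are made up for illustration), a task can be created as before and handed to taskPool with put() instead of being run on a dedicated new thread:

import std.parallelism : task, taskPool, totalCPUs;
import std.stdio : writefln;
 
int compute(int x) {
  return x * x;
}
 
void main() {
  writefln("%s cores, %s worker threads in taskPool", totalCPUs, taskPool.size);
 
  // Create the task, then submit it to the pool's queue;
  // one of the worker threads will pick it up and execute it.
  auto theTask = task!compute(6);
  taskPool.put(theTask);
 
  // yieldForce() waits for the pooled task to finish and returns its result.
  writefln("The result is %s", theTask.yieldForce());
}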

std.parallelism.taskPool.parallel

Let's start with a simple example:

import core.thread : Thread;
import core.time : seconds;
import std.stdio : writefln;
 
struct Student {
  int number;
  void aSlowOperation() {
    writefln("The work on student %s has begun", number);
    // Wait for a while to simulate a long-lasting operation
    Thread.sleep(1.seconds);
    writefln("The work on student %s has ended", number);
  }
}
 
void main() {
  auto students = [ Student(1), Student(2), Student(3), Student(4) ];
 
  foreach (student; students) {
    student.aSlowOperation();
  }
}

In the code above, the foreach loop operates on the elements one after the other, so aSlowOperation() is called for each student sequentially. However, in many cases it is not necessary for the operations on preceding students to be completed before starting the operations on successive students. If the operations on the Student objects are truly independent, it is wasteful to ignore the other microprocessor cores, which might be sitting idle on the system.

Meet taskPool.parallel. This function can also be called simply as parallel(). parallel() accesses the elements of a range in parallel. An effective usage is with foreach loops. Merely importing the std.parallelism module and replacing students with parallel(students) in the program above is sufficient to take advantage of all of the cores of the system.

This simple change

  /* ... */
  foreach (student; parallel(students)) {
  /* ... */

is enough to drop our application's total running time from 4 seconds to just 1 second (on a machine with at least four cores).
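
For reference, the complete parallel version of the program is shown below; the only changes from the sequential version are the std.parallelism import and the call to parallel():

import core.thread : Thread;
import core.time : seconds;
import std.parallelism : parallel;
import std.stdio : writefln;
 
struct Student {
  int number;
  void aSlowOperation() {
    writefln("The work on student %s has begun", number);
    // Wait for a while to simulate a long-lasting operation
    Thread.sleep(1.seconds);
    writefln("The work on student %s has ended", number);
  }
}
 
void main() {
  auto students = [ Student(1), Student(2), Student(3), Student(4) ];
 
  // The operations on the four students now run at the same time.
  foreach (student; parallel(students)) {
    student.aSlowOperation();
  }
}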

In Ali's foreach for structs and classes chapter we can see that the body of a foreach block is passed to the opApply() member function as a delegate. parallel() returns a range object that knows how to distribute the execution of that delegate to a separate core for each element.

parallel() constructs a new Task object for every worker thread and starts that task automatically. parallel() then waits for all of the tasks to be completed before finally exiting the loop. parallel() is very convenient as it constructs, starts, and waits for the tasks automatically.

We strongly encourage you to have a look at taskPool.map, taskPool.amap and taskPool.reduce to unlock your full concurrent potential.
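
As a small taste of what they offer, here is a sketch that uses amap and reduce (the numeric ranges below are made up for illustration): amap eagerly applies a function to every element of a range in parallel and returns the results as an array, while reduce combines the elements in parallel.

import std.math : sqrt;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;
 
void main() {
  // Compute the square root of each element in parallel;
  // the results come back as an array.
  auto roots = taskPool.amap!sqrt(iota(1.0, 5.0));
  writeln(roots); // [1, 1.41421, 1.73205, 2]
 
  // Sum the numbers 1.0 .. 100.0 in parallel.
  auto sum = taskPool.reduce!"a + b"(iota(1.0, 101.0));
  writeln(sum); // 5050
}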

std.concurrency
