Learn How To Parallelize Scala Programs Using Functions



Creating programs that run in parallel is difficult because of the challenges posed by data races and deadlocks. These challenges arise from sharing mutable state, which can be avoided by using immutable state. Doing so removes many of the risks associated with concurrency. Because of these challenges, and because of the benefits that immutable data structures provide, functional programming is an excellent choice for the parallel programmer. This tutorial requires a good understanding of immutability, pure functions and functional data structures. Previous tutorials have explained these concepts, so please refer to them for a review.
The main objective in parallel programming is to develop computations that run in parallel. To make this clear, let us first consider a simple example: a function that computes the sum of a collection of integers.
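
A minimal sketch of such a function, assuming an IndexedSeq[Int] as input. The names SumExample and sum are illustrative, and the parameter name as is chosen to match the expression discussed next.

```scala
object SumExample {
  // Divide-and-conquer sum: split the collection in half,
  // sum each half, then add the two partial results.
  def sum(as: IndexedSeq[Int]): Int =
    if (as.length <= 1)
      as.headOption.getOrElse(0) // base case: empty or single element
    else {
      val (l, r) = as.splitAt(as.length / 2)
      sum(l) + sum(r)
    }
}
```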

In the function above, the most important part for our discussion is the expression val (l,r) = as.splitAt(as.length/2). It divides the collection into two halves of (nearly) equal size. Each half is summed independently, and the two results are then combined into a single result. If the machine has two CPU cores, each core could handle the summation of one half of the collection. This example does not yet gain anything from parallelization; it is only meant to lay a good foundation for how computations can be parallelized.
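
One way the two halves could actually be handed to different cores is sketched below. It uses scala.concurrent.Future from the standard library, which is a swapped-in technique and not the actor mechanism this tutorial goes on to describe. The names ParallelSum and parSum and the ten-second timeout are assumptions made for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSum {
  // Plain sequential sum, used for each half.
  def sum(as: IndexedSeq[Int]): Int = as.foldLeft(0)(_ + _)

  // Split once, sum each half in its own Future, then combine.
  def parSum(as: IndexedSeq[Int]): Int = {
    val (l, r) = as.splitAt(as.length / 2)
    // Both futures are created before either result is requested,
    // so the two halves are free to run at the same time.
    val left  = Future(sum(l))
    val right = Future(sum(r))
    val total = for { a <- left; b <- right } yield a + b
    Await.result(total, 10.seconds) // block only at the very end
  }
}
```

For a collection this small the cost of scheduling the futures will likely outweigh any gain; the sketch only shows the shape of the computation.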

To understand how parallel programming is implemented here, you first need to understand the actor model. An actor is the basic unit of computation. An actor can send messages to other actors, create new actors, and decide how to handle the messages it receives. These actions can be performed in any order. Actors are completely separate from each other and share no memory. Each actor has a mailbox that stores incoming messages while they wait to be processed, and the actor processes them in the order they arrive.
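
The mailbox idea can be sketched with nothing but the standard library. This is a toy illustration and not how any actor library is implemented; the names MailboxSketch, Counter, Add, Stop and awaitTotal are all hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

object MailboxSketch {
  sealed trait Msg
  final case class Add(n: Int) extends Msg
  case object Stop extends Msg

  // A toy "actor": private state, a mailbox, and one thread that
  // handles messages one at a time in the order they arrived.
  final class Counter {
    private val mailbox = new LinkedBlockingQueue[Msg]()
    private val done    = new CountDownLatch(1)
    private var total   = 0 // touched only by the worker until `done` is released

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var running = true
        while (running) {
          mailbox.take() match { // blocks until a message is available
            case Add(n) => total += n
            case Stop   => running = false
          }
        }
        done.countDown() // publish the final state to waiting callers
      }
    })
    worker.start()

    // Asynchronous send: enqueue the message and return immediately.
    def !(msg: Msg): Unit = mailbox.put(msg)

    // Wait until Stop has been processed, then read the result.
    def awaitTotal(): Int = { done.await(); total }
  }
}
```

Senders never touch total directly; the only way to influence the counter is to put a message in its mailbox.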

In Scala, the scala.actors library runs actors on JVM threads. There are two kinds of actors, differentiated by the way they wait for messages. The receive syntax is thread-based and dedicates one JVM thread to each actor. The react syntax is event-based and lets many actors share a small pool of threads, typically sized to the number of CPU cores. With receive you get better response time, while with react you are able to handle many more actors.
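
The two styles can be put side by side in a short sketch. It assumes the legacy scala.actors library is on the classpath (it is not part of current Scala distributions). The object name ReceiveVsReact and the "stop" convention are illustrative choices.

```scala
import scala.actors.Actor
import scala.actors.Actor._

object ReceiveVsReact {
  // Thread-based: `receive` blocks this actor's thread until a
  // message arrives and then returns normally, so an ordinary
  // while loop can be used to wait for the next message.
  val threadBased: Actor = actor {
    var running = true
    while (running) {
      receive {
        case "stop"      => running = false
        case msg: String => reply("receive: " + msg)
      }
    }
  }

  // Event-based: `react` never returns to its caller, so the
  // actor uses the `loop` combinator instead of a while loop.
  val eventBased: Actor = actor {
    loop {
      react {
        case "stop"      => exit()
        case msg: String => reply("react: " + msg)
      }
    }
  }
}
```

Because react does not return, any code placed after it in the same block is never run, which is why the looping has to be expressed with loop.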

To avoid data races without having to copy data for every message, take the following measures for every message sent.

  • Return new data instead of modifying existing data
  • When an actor needs to use data, pass it a reference to that data
  • Prevent the actor that receives the message from modifying the data

By using immutable messages, you eliminate the possibility of data races on the message contents.
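
The measures above can be sketched with a case class whose fields are all immutable. The names ImmutableMessages, Reading and withValue are hypothetical and exist only for this illustration.

```scala
object ImmutableMessages {
  // An immutable message: the fields are vals and the
  // collection it carries is itself immutable.
  final case class Reading(sensor: String, values: Vector[Double])

  // "Return new data": build a new message rather than
  // changing the one that was received.
  def withValue(r: Reading, v: Double): Reading =
    r.copy(values = r.values :+ v)
}
```

A receiver that calls withValue gets its own new Reading, while the sender's original is left exactly as it was, so the same reference can be shared safely between actors.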

Consider how an actor is defined. We define two messages using case classes because we will rely on pattern matching. With pattern matching, the first case that matches the input is the one that is executed.
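
A minimal sketch of such a definition, assuming the legacy scala.actors library is on the classpath. The message names Start and Stop and the class name Controller are hypothetical stand-ins.

```scala
import scala.actors.Actor

object ControllerExample {
  // Two messages, defined as case classes so that they can be
  // taken apart with pattern matching.
  case class Start(job: String)
  case class Stop(reason: String)

  class Controller extends Actor {
    // act() is the body of the actor; it runs once start() is called.
    def act(): Unit = {
      loop {
        react {
          // Cases are tried from top to bottom; the first match wins.
          case Start(job) =>
            println("starting " + job)
          case Stop(reason) =>
            println("stopping: " + reason)
            exit() // terminate this actor
        }
      }
    }
  }
}
```

An instance does nothing until its start() method is called, for example with new ControllerExample.Controller().start().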

Sending messages to a controller is done using the syntax shown below.
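
A short sketch of both the asynchronous and the synchronous send operators, again assuming the legacy scala.actors library. The stand-in controller and the message names Log, Ask and Quit are hypothetical.

```scala
import scala.actors.Actor._

object SendExample {
  case class Log(text: String)     // fire-and-forget message
  case class Ask(question: String) // message that expects a reply
  case object Quit

  // A stand-in controller created and started by the `actor` factory.
  val controller = actor {
    loop {
      react {
        case Log(text) => println("controller logged: " + text)
        case Ask(q)    => reply("answer to " + q)
        case Quit      => exit()
      }
    }
  }

  def main(args: Array[String]): Unit = {
    controller ! Log("started")               // asynchronous: returns at once
    val answer = controller !? Ask("status")  // synchronous: waits for the reply
    println(answer)
    controller ! Quit
  }
}
```

The ! operator only places the message in the controller's mailbox, whereas !? blocks the caller until the controller replies.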

To make the use of actors clearer, consider a complete example that consists of two actors. One actor sends a “hello” message and the other actor responds with a “bonjour” message.
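
A sketch of such a program is given below, assuming the legacy scala.actors library is on the classpath. Only the message names Hello and Bonjour come from the text; the text field, the class names FrenchActor and EnglishActor, and the wrapper object HelloBonjour are assumptions.

```scala
object HelloBonjour {
  // Messages are case classes so that they can be pattern matched.
  case class Hello(text: String)
  case class Bonjour(text: String)

  import scala.actors.Actor

  // Waits for one Hello and answers its sender with a Bonjour.
  class FrenchActor extends Actor {
    def act(): Unit = {
      react {
        case Hello(text) =>
          println("French actor received: " + text)
          sender ! Bonjour("bonjour")
      }
    }
  }

  // Sends a Hello to the other actor, then waits for the Bonjour.
  class EnglishActor(other: Actor) extends Actor {
    def act(): Unit = {
      other ! Hello("hello")
      react {
        case Bonjour(text) =>
          println("English actor received: " + text)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val french = new FrenchActor
    french.start()
    new EnglishActor(french).start()
  }
}
```

Each actor handles a single message and then finishes, so neither needs an explicit loop or exit call.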

We are going to briefly discuss this example. At the top of our code we define the messages Bonjour and Hello as case classes because we will be using pattern matching. Next we perform imports from the scala.actors package, which contains the classes, objects and traits that implement the actor model. Actors are ordinarily created from the Actor trait found in the scala.actors package.

In this tutorial we looked at parallel computations and how they benefit from the immutable data structures available in Scala. We looked at a simple example that splits a collection into two halves so that each half can be summed on a different CPU core. The actor model, in which actors communicate by sending messages, was introduced, and we discussed how actors send and receive messages. Finally, we discussed two examples that demonstrated how message passing is implemented in Scala.

