Multithreading and Synchronization

A thread, often considered the smallest unit of processing that can be scheduled by an operating system, is a component of a process. Multiple threads can exist within the same process, sharing certain resources while operating independently in execution. Here's a detailed breakdown to clarify these concepts and how they relate to one another:

What is a Thread?

  • Definition: A thread, also known as a lightweight process, is a path of execution within a process. It's the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.

  • Characteristics: Threads within the same process run in a shared memory space, allowing for efficient communication and data exchange. They are more lightweight than processes because they share resources and have less overhead for context switching.

How Does it Compare with a Process?

  • Process Definition: A process is a program in execution. It is a self-contained execution environment and includes a unique set of resources such as memory space, file handles, and system resources.

  • Key Differences:

    • Resource Isolation: Processes are fully isolated from each other, with each process having its own separate memory space. In contrast, threads share the memory and resources of their parent process.

    • Creation Overhead: Creating a new process is more resource-intensive and time-consuming than creating a new thread, due to the need for allocating separate memory and resources.

    • Communication: Inter-process communication (IPC) mechanisms are required for processes to communicate with each other, which can be complex and slower. Threads can directly communicate through shared memory within the same process.

    • Failure Impact: A fault in one process does not affect other processes directly, due to the isolation. However, a fault in one thread can affect all threads within the same process because of shared resources.

What do Threads Share?

Within the same process, threads share several resources, which facilitates efficient communication and resource utilization:

  1. Memory Space: Threads share the heap and static memory segments, making data exchange and access to shared data straightforward.

  2. System Resources: File handles, network connections, and other system resources are shared among threads.

  3. Program Code: Threads execute the same program code but can operate on different data or perform different tasks concurrently.

  4. Process Context: While threads have their individual stack and registers (for local variables and thread-specific execution context), they share the process's global variables, heap memory, and system state.

pthreads stands for POSIX Threads. It is a standardized C library for multi-threading that allows for the creation and control of threads in a UNIX/Linux environment. It is widely supported on UNIX-like operating systems, including Linux, macOS, and BSD variants, making it a portable and versatile choice for developing multi-threaded applications in C and C++. The pthreads library defines a set of functions, data types, and constants to facilitate threading in software development. Using pthreads, developers can create multiple threads for concurrent execution within the same process. This concurrency enables more efficient use of CPU resources and can significantly improve the performance of applications by allowing them to do multiple operations simultaneously. Here are some key concepts and features of pthreads:

  • Thread Creation and Termination: Functions like pthread_create() and pthread_exit() are used to create and terminate threads.

  • Synchronization: pthreads provides mechanisms such as mutexes (mutual exclusion locks), condition variables, and read-write locks to manage synchronization among threads and prevent race conditions.

  • Thread-specific Data: Allows each thread to create and use data that is private to that thread, which helps avoid unintended sharing (a short sketch appears after this list).

  • Joining and Detaching Threads: Threads can be joined using pthread_join(), which allows one thread to wait for another to finish. Alternatively, threads can be detached (pthread_detach()) to allow them to run independently and have their resources automatically reclaimed upon completion.

  • Scheduling and Management: The library offers functions to set and get thread scheduling policies and parameters, along with thread attributes (such as stack size and scheduling behavior).
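
As a quick aside illustrating the thread-specific data feature before we look at full examples, the sketch below uses pthread_key_create(), pthread_setspecific(), and pthread_getspecific() to give each thread its own private value under a shared key. The key name, worker function, and values are illustrative only and are not used elsewhere in these notes.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

pthread_key_t idKey; // One key, but each thread stores its own value under it

void* worker(void* arg) {
    int* myId = malloc(sizeof(int)); // Per-thread storage; freed by the key's destructor
    *myId = *(int*)arg;
    pthread_setspecific(idKey, myId);

    // Later (possibly deep inside other functions) the thread retrieves its own value;
    // other threads calling pthread_getspecific(idKey) see their own values instead.
    int* mine = pthread_getspecific(idKey);
    printf("This thread's private value: %d\n", *mine);
    pthread_exit(NULL);
}

int main(void) {
    pthread_key_create(&idKey, free); // 'free' is called on each thread's value at thread exit
    pthread_t t1, t2;
    int a = 1, b = 2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_key_delete(idKey);
    return 0;
}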

Let's start with a very basic pthreads program that sums the numbers from 1 to N by splitting the task across n threads. This program demonstrates how to create and join threads, how to pass arguments to them, and how to protect a shared global variable with a mutex.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int totalSum = 0; // Global variable to store the total sum
pthread_mutex_t mutexSum; // Mutex for updating the total sum

void* sumRange(void* arg) {
    int start = ((int*)arg)[0];
    int end = ((int*)arg)[1];
    free(arg); // Release the bounds array allocated in main
    int partialSum = 0;
    for (int i = start; i <= end; i++) {
        partialSum += i;
    }
    pthread_mutex_lock(&mutexSum);
    totalSum += partialSum;
    pthread_mutex_unlock(&mutexSum);
    printf("Partial sum: %d\n", partialSum);
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    if (argc < 3) {
        printf("Usage: %s <N> <n>\n", argv[0]);
        return 1;
    }

    int N = atoi(argv[1]);
    int n = atoi(argv[2]);
    pthread_t threads[n];
    pthread_mutex_init(&mutexSum, NULL);

    int range = N / n;
    for (int i = 0; i < n; i++) {
        int* bounds = malloc(sizeof(int) * 2); // Each thread gets its own bounds array; the thread frees it
        bounds[0] = i * range + 1;
        bounds[1] = (i == n - 1) ? N : (i + 1) * range;
        pthread_create(&threads[i], NULL, sumRange, (void*)bounds);
    }

    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
    }

    printf("Total sum: %d\n", totalSum);
    pthread_mutex_destroy(&mutexSum);
    return 0;
}

To compile programs that use pthreads, invoke gcc with the -pthread flag, as in the following:

gcc -pthread your_file.c -o executable_name
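
For example, if the summation program above is saved as sum.c (the file name is just for illustration), it can be compiled and run as follows. With N = 100 and n = 4, each thread sums a block of 25 numbers, so the program prints four partial sums (325, 950, 1575, and 2200, in whatever order the threads happen to finish) followed by Total sum: 5050.

gcc -pthread sum.c -o sum
./sum 100 4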

Memory Layout

+----------------+       +-----------------------+
|     Stack      |       |       Stack (Main)    |
| for Thread 1   |       |  - argc, argv         |
+----------------+       |  - N, n               |
| Local Vars     |       |  - threads[n]         |
| - start, end   |       |  - range, i, bounds   |
| - partialSum   |       +-----------------------+
+----------------+       
|     Stack      |       
| for Thread 2   |       +-----------------------+
+----------------+       |       Heap            |
| Local Vars     |       |  - Dynamic variables  |
| - start, end   |       |    (e.g., bounds)     |
| - partialSum   |       +-----------------------+
+----------------+       
|       ...      |       +-----------------------+
+----------------+       |   Data Segment        |
                         |  - totalSum           |
+----------------+       |  - mutexSum           |
|     Stack      |       +-----------------------+
| for Thread n   |       
+----------------+       +-----------------------+
| Local Vars     |       |   Code Segment        |
| - start, end   |       |  - Program code       |
| - partialSum   |       +-----------------------+
+----------------+
  • Global Variables: totalSum and mutexSum live in the data segment, shared across all threads.

  • Thread Stacks: Each thread has its own stack, containing local variables like start, end, and partialSum. These illustrate the threads' work on different parts of the summation task.

  • Main Stack: The main function's stack contains its local variables, including argc, argv, N, n, threads, range, i, and bounds.

  • Heap: Dynamically allocated memory (e.g., for the bounds arrays passed to threads) is stored here, accessible by any thread.

  • Code and Heap Sharing: All threads share the program's code segment and the heap, allowing them to execute the same code and access dynamically allocated memory.
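
The following self-contained sketch (separate from the summation program) makes the layout concrete: both threads print the same address for the global variable, but different addresses for their own local variables, since each thread has a private stack.

#include <pthread.h>
#include <stdio.h>

int sharedGlobal = 0; // Data segment: one copy shared by every thread

void* showAddresses(void* arg) {
    int localVar = 0; // Each thread's copy lives on its own stack
    printf("Thread %ld: &sharedGlobal = %p, &localVar = %p\n",
           (long)arg, (void*)&sharedGlobal, (void*)&localVar);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, showAddresses, (void*)1L);
    pthread_create(&t2, NULL, showAddresses, (void*)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}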

The POSIX Threads (pthreads) library provides a rich set of functions for creating, managing, and synchronizing threads. Here's a detailed look at some of the most important pthread functions and their usage:

1. pthread_create()

  • Purpose: Used to create a new thread.

  • Prototype: int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);

  • Parameters:

    • thread: A pointer to pthread_t where the ID of the newly created thread is stored.

    • attr: Attributes for the thread (NULL for default attributes).

    • start_routine: The function that the thread will execute once it is created.

    • arg: A single argument that can be passed to the start_routine.

  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      pthread_t threadID;
      int status = pthread_create(&threadID, NULL, threadFunction, (void*)arg);
      if (status != 0) {
          // Handle error
      }
    

2. pthread_join()

  • Purpose: Waits for a thread to terminate and optionally obtains the return value of the start_routine.

  • Prototype: int pthread_join(pthread_t thread, void **retval);

  • Parameters:

    • thread: Thread ID of the thread to wait for.

    • retval: Pointer to the location where the return value of the start_routine (if any) will be stored.

  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      void *returnValue;
      pthread_join(threadID, &returnValue);
    

3. pthread_exit()

  • Purpose: Terminates the calling thread.

  • Prototype: void pthread_exit(void *retval);

  • Parameters:

    • retval: The return value that the thread should pass back. This can be accessed by other threads using pthread_join.
  • Usage Example:

      pthread_exit((void*)status);
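
To connect pthread_exit() and pthread_join(), here is a minimal sketch (not taken from any program above) in which a thread returns a heap-allocated result that the joining thread reads and frees. Returning the address of a local variable would be a bug, since the thread's stack is gone once it exits; the function and variable names here are illustrative.

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      void* computeSquare(void* arg) {
          int* result = malloc(sizeof(int)); // Heap storage outlives the thread
          *result = (*(int*)arg) * (*(int*)arg);
          pthread_exit(result); // Same effect as 'return result;' from the start routine
      }

      int main(void) {
          pthread_t tid;
          int input = 7;
          void* retval;
          pthread_create(&tid, NULL, computeSquare, &input);
          pthread_join(tid, &retval); // Receives the pointer passed to pthread_exit()
          printf("Result: %d\n", *(int*)retval);
          free(retval);
          return 0;
      }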
    

4. pthread_mutex_init() and pthread_mutex_destroy()

  • Purpose: Initializes and destroys a mutex object, respectively.

  • Prototype:

    • Init: int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);

    • Destroy: int pthread_mutex_destroy(pthread_mutex_t *mutex);

  • Parameters:

    • mutex: Pointer to the mutex.

    • attr: Mutex attributes (NULL for default).

  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      pthread_mutex_t mutex;
      pthread_mutex_init(&mutex, NULL);
      // Use the mutex
      pthread_mutex_destroy(&mutex);
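
As an aside, a mutex with default attributes and static storage duration can also be initialized without a call to pthread_mutex_init(), using the POSIX static initializer:

      pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; // Static initialization with default attributes
      // Lock and unlock it exactly as with a dynamically initialized mutex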
    

5. pthread_mutex_lock() and pthread_mutex_unlock()

  • Purpose: Locks and unlocks a mutex, respectively.

  • Prototype:

    • Lock: int pthread_mutex_lock(pthread_mutex_t *mutex);

    • Unlock: int pthread_mutex_unlock(pthread_mutex_t *mutex);

  • Parameters:

    • mutex: Pointer to the mutex.
  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      pthread_mutex_lock(&mutex);
      // Critical section
      pthread_mutex_unlock(&mutex);
    

6. pthread_cond_wait() and pthread_cond_signal()

  • Purpose: These functions are used for waiting on a condition variable (blocking until another thread signals it) and for waking up one waiting thread, respectively. pthread_cond_wait() must be called with the associated mutex locked; it atomically releases the mutex while waiting, re-acquires it before returning, and should be called in a loop that re-checks the condition to guard against spurious wakeups.

  • Prototype:

    • Wait: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);

    • Signal: int pthread_cond_signal(pthread_cond_t *cond);

  • Parameters:

    • cond: Pointer to the condition variable.

    • mutex: Mutex that is locked by the calling thread.

  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      // In the waiting thread (with the mutex already locked):
      while (!conditionMet)
          pthread_cond_wait(&cond, &mutex); // Releases the mutex while waiting
      // In the signaling thread, after changing the shared state:
      pthread_cond_signal(&cond);
    

7. pthread_attr_init() and pthread_attr_destroy()

  • Purpose: Initializes and destroys thread attributes object, respectively.

  • Prototype:

    • Init: int pthread_attr_init(pthread_attr_t *attr);

    • Destroy: int pthread_attr_destroy(pthread_attr_t *attr);

  • Parameters:

    • attr: Attributes object.
  • Return Value: Returns 0 on success; an error number on failure.

  • Usage Example:

      pthread_attr_t attr;
      pthread_attr_init(&attr);
      // Configure attributes
      pthread_attr_destroy(&attr);
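
As one concrete use of a thread attributes object, the sketch below requests a larger stack for a new thread with pthread_attr_setstacksize(). threadFunction is a placeholder (as in the earlier usage examples), and the 4 MiB figure is arbitrary; the requested size must be at least PTHREAD_STACK_MIN.

      pthread_attr_t attr;
      pthread_attr_init(&attr);
      pthread_attr_setstacksize(&attr, 4 * 1024 * 1024); // Request a 4 MiB stack

      pthread_t tid;
      pthread_create(&tid, &attr, threadFunction, NULL);
      pthread_attr_destroy(&attr); // Safe to destroy once the thread has been created
      pthread_join(tid, NULL);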
    

Examples

This example demonstrates the classic producer-consumer problem using pthreads with a bounded buffer. The producer produces items (in this case, integers) and adds them to the buffer, while the consumer removes items from the buffer and consumes them. The buffer has a fixed capacity of MAX_ITEMS entries. We'll use a mutex to protect access to the buffer and two condition variables to signal the state of the buffer (empty or full).

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_ITEMS 10 // Maximum items a producer can produce or a consumer can consume

int buffer[MAX_ITEMS];
int count = 0; // Number of items in the buffer
int in = 0; // Next insert position in the buffer
int out = 0; // Next remove position from the buffer

pthread_mutex_t mutex;
pthread_cond_t full, empty;

void* producer(void* param) {
    int item;
    for (int i = 0; i < MAX_ITEMS; i++) {
        item = i; // Produce an item
        pthread_mutex_lock(&mutex);
        while (count == MAX_ITEMS) // Buffer is full
            pthread_cond_wait(&empty, &mutex);

        buffer[in] = item;
        in = (in + 1) % MAX_ITEMS;
        count++;
        printf("Produced: %d\n", item);

        pthread_cond_signal(&full);
        pthread_mutex_unlock(&mutex);
        sleep(1); // Slow down the producer for demonstration purposes
    }
    pthread_exit(NULL);
}

void* consumer(void* param) {
    int item;
    for (int i = 0; i < MAX_ITEMS; i++) {
        pthread_mutex_lock(&mutex);
        while (count == 0) // Buffer is empty
            pthread_cond_wait(&full, &mutex);

        item = buffer[out];
        out = (out + 1) % MAX_ITEMS;
        count--;
        printf("Consumed: %d\n", item);

        pthread_cond_signal(&empty);
        pthread_mutex_unlock(&mutex);
        sleep(1); // Slow down the consumer for demonstration purposes
    }
    pthread_exit(NULL);
}

int main(int argc, char* argv[]) {
    pthread_t tid_producer, tid_consumer;
    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&full, NULL);
    pthread_cond_init(&empty, NULL);

    pthread_create(&tid_producer, NULL, producer, NULL);
    pthread_create(&tid_consumer, NULL, consumer, NULL);

    pthread_join(tid_producer, NULL);
    pthread_join(tid_consumer, NULL);

    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&full);
    pthread_cond_destroy(&empty);

    return 0;
}
  • Buffer and Synchronization Primitives: We define a global buffer buffer of integers, a count to keep track of the number of items in the buffer, and two indexes in and out to indicate where to insert and remove items. We also define a mutex mutex and two condition variables full and empty.

  • Producer Function:

    • The producer generates items (here, simply integers) and adds them to the buffer. If the buffer is full (i.e., count == MAX_ITEMS), the producer waits on the empty condition variable, which is signaled by the consumer when it removes an item from the buffer.

    • After adding an item, it signals the full condition variable to indicate that the buffer is not empty anymore, allowing the consumer to consume.

  • Consumer Function:

    • The consumer waits if the buffer is empty (i.e., count == 0), waiting on the full condition variable which is signaled by the producer after adding an item to the buffer.

    • After consuming an item, it signals the empty condition variable to indicate that there is at least one free space in the buffer, allowing the producer to produce more items.

  • Main Function:

    • Initializes the mutex and condition variables, creates the producer and consumer threads, waits for them to finish, and then cleans up by destroying the mutex and condition variables.
  • Mutex and Condition Variables:

    • The mutex ensures that only one thread modifies the buffer at any time, preventing data corruption.

    • The condition variables full and empty are used to block the producer when the buffer is full and the consumer when the buffer is empty, respectively. They are signaled by the opposite party when the condition changes (i.e., an item is added or removed).

For the reader-writer problem, where one writer and multiple readers access a shared variable, synchronization must ensure that readers can read the variable concurrently while the writer has exclusive access when writing. This example demonstrates such synchronization using pthreads: the writer repeatedly increments a shared variable while the readers read its current value.

The number of readers N is specified by the user. We'll use a mutex to protect the reader count and the shared variable during updates, and a condition variable to let the writer wait until no readers are active. A POSIX read-write lock could express this more directly (a read-write-lock version is sketched after this example), but building the coordination from a mutex and a condition variable makes the mechanics explicit.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int sharedVar = 0; // Shared variable
int activeReaders = 0; // Active readers count
pthread_mutex_t mutex; // Mutex to protect shared variable and reader count
pthread_cond_t writerCond; // Condition variable for writer

void* writer(void* param) {
    for (int i = 0; i < 5; i++) { // Let's say writer wants to write 5 times
        pthread_mutex_lock(&mutex);
        // Wait for all readers to finish before writing
        while (activeReaders > 0) {
            pthread_cond_wait(&writerCond, &mutex);
        }
        sharedVar++; // Write operation
        printf("Writer updated sharedVar to %d\n", sharedVar);
        pthread_mutex_unlock(&mutex);
        // Optional: notify readers or another writer
        sleep(1); // Sleep to simulate writing duration
    }
    pthread_exit(NULL);
}

void* reader(void* param) {
    int id = *((int*)param);
    for (int i = 0; i < 5; i++) {
        pthread_mutex_lock(&mutex);
        activeReaders++;
        pthread_mutex_unlock(&mutex);

        // Read operation
        printf("Reader %d reads sharedVar as %d\n", id, sharedVar);

        pthread_mutex_lock(&mutex);
        activeReaders--;
        if (activeReaders == 0) {
            // Notify writer if it's waiting
            pthread_cond_signal(&writerCond);
        }
        pthread_mutex_unlock(&mutex);
        sleep(1); // Sleep to simulate reading duration
    }
    pthread_exit(NULL);
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        printf("Usage: %s <number_of_readers>\n", argv[0]);
        return 1;
    }
    int numReaders = atoi(argv[1]);
    pthread_t writerThread;
    pthread_t readers[numReaders];
    int readerIds[numReaders];

    pthread_mutex_init(&mutex, NULL);
    pthread_cond_init(&writerCond, NULL);

    // Create writer thread
    pthread_create(&writerThread, NULL, writer, NULL);

    // Create reader threads
    for (int i = 0; i < numReaders; i++) {
        readerIds[i] = i + 1; // Assign reader ID
        pthread_create(&readers[i], NULL, reader, &readerIds[i]);
    }

    // Wait for writer thread to finish
    pthread_join(writerThread, NULL);

    // Wait for all reader threads to finish
    for (int i = 0; i < numReaders; i++) {
        pthread_join(readers[i], NULL);
    }

    pthread_mutex_destroy(&mutex);
    pthread_cond_destroy(&writerCond);

    return 0;
}
  • Shared Variable and Active Readers Count: sharedVar is the variable both readers and writer will access. activeReaders tracks the number of readers currently reading the shared variable.

  • Writer Function: The writer acquires the mutex and waits for all readers to finish (i.e., activeReaders to be 0) before updating sharedVar. After writing, it releases the mutex and optionally notifies other threads (readers or another writer) that it's done.

  • Reader Function: Each reader increases the activeReaders count before reading sharedVar to signal their reading status. After reading, they decrease the activeReaders count and signal the writer if they're the last reader.

  • Mutex and Condition Variable: The mutex ensures exclusive access to the shared variable and the reader count. The condition variable writerCond is used by the writer to wait for a signal that there are no active readers before it begins writing.
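
For comparison, the read-write locks mentioned earlier (pthread_rwlock_t) express the same policy directly, because the lock itself tracks active readers. Below is a sketch, under the assumption that it replaces the reader and writer routines in the program above and reuses its sharedVar, headers, and loop counts; activeReaders and writerCond would no longer be needed.

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;

void* rwlockReader(void* param) {
    int id = *((int*)param);
    for (int i = 0; i < 5; i++) {
        pthread_rwlock_rdlock(&rwlock); // Many readers may hold the lock at once
        printf("Reader %d reads sharedVar as %d\n", id, sharedVar);
        pthread_rwlock_unlock(&rwlock);
        sleep(1);
    }
    pthread_exit(NULL);
}

void* rwlockWriter(void* param) {
    for (int i = 0; i < 5; i++) {
        pthread_rwlock_wrlock(&rwlock); // Exclusive: waits until all readers have released
        sharedVar++;
        printf("Writer updated sharedVar to %d\n", sharedVar);
        pthread_rwlock_unlock(&rwlock);
        sleep(1);
    }
    pthread_exit(NULL);
}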

The Dining Philosophers problem is a classic synchronization problem involving N philosophers sitting around a circular table with one chopstick between each pair. A philosopher alternates between thinking and eating. To eat, a philosopher must pick up the two nearest chopsticks, one to their left and one to their right. After eating, they put down the chopsticks and start thinking again.

To solve this problem while avoiding deadlocks, one common approach is to ensure that all philosophers except for one try to pick up the left chopstick before the right one, while the remaining philosopher picks up the right chopstick before the left one. This prevents the formation of a circular wait condition, which is a necessary condition for deadlock.

Below is a deadlock-free solution for the Dining Philosophers problem using pthreads, where N is the number of philosophers (and chopsticks) taken from the user. Each chopstick is represented by a mutex.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int N; // Number of philosophers and chopsticks
pthread_mutex_t *chopsticks;

void think(int philosopher) {
    printf("Philosopher %d is thinking.\n", philosopher);
    sleep(rand() % 2);
}

void eat(int philosopher) {
    printf("Philosopher %d is eating.\n", philosopher);
    sleep(rand() % 2);
}

void* philosopher(void* num) {
    int id = *(int*)num;
    int left = id;
    int right = (id + 1) % N;

    while (1) {
        think(id);

        if (id == N - 1) { // Last philosopher picks right chopstick first
            pthread_mutex_lock(&chopsticks[right]);
            pthread_mutex_lock(&chopsticks[left]);
        } else {
            pthread_mutex_lock(&chopsticks[left]);
            pthread_mutex_lock(&chopsticks[right]);
        }

        eat(id);

        pthread_mutex_unlock(&chopsticks[left]);
        pthread_mutex_unlock(&chopsticks[right]);
    }
    pthread_exit(NULL);
}

int main(int argc, char* argv[]) {
    if (argc != 2) {
        printf("Usage: %s <number_of_philosophers>\n", argv[0]);
        return 1;
    }

    N = atoi(argv[1]);
    pthread_t philosophers[N];
    chopsticks = malloc(N * sizeof(pthread_mutex_t));
    int ids[N];

    // Initialize chopsticks
    for (int i = 0; i < N; i++) {
        pthread_mutex_init(&chopsticks[i], NULL);
    }

    // Create philosopher threads
    for (int i = 0; i < N; i++) {
        ids[i] = i;
        pthread_create(&philosophers[i], NULL, philosopher, &ids[i]);
    }

    // Wait for philosopher threads to finish
    for (int i = 0; i < N; i++) {
        pthread_join(philosophers[i], NULL);
    }

    // Cleanup
    for (int i = 0; i < N; i++) {
        pthread_mutex_destroy(&chopsticks[i]);
    }
    free(chopsticks);

    return 0;
}
  • Initialization: The program dynamically allocates an array of N mutexes, each representing a chopstick. N is the number of philosophers passed by the user.

  • Philosopher Routine:

    • Each philosopher thinks for a random period, then attempts to eat by picking up the left and right chopsticks.

    • To avoid deadlocks, all philosophers except for the last one (with id N - 1) pick up the left chopstick first, then the right. The last philosopher picks up the right chopstick first, then the left. This breaks the circular wait condition and prevents deadlock.

    • After eating, the philosopher releases both chopsticks and repeats the cycle.

  • Main Function: Initializes mutexes, creates philosopher threads, waits for them to finish (though in this infinite loop example, they never do), and cleans up resources.

This solution prevents deadlocks by ensuring that at least one philosopher attempts to pick up chopsticks in a different order than the others, thus avoiding the circular-wait condition necessary for a deadlock. However, it does not rule out starvation: a philosopher can repeatedly lose the race for a chopstick to faster neighbors. Various strategies, such as introducing randomness or limiting eating times, can help mitigate starvation.
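
One common variation, shown here only as a sketch, replaces the fixed acquisition order with pthread_mutex_trylock(): a philosopher who cannot obtain the second chopstick puts the first one back and retries after a random pause. This also avoids deadlock and adds the randomness mentioned above, though an unlucky philosopher can still wait a long time. The routine below assumes the same N, chopsticks, think(), and eat() as the program above and would be used in place of the philosopher() function.

void* tryLockPhilosopher(void* num) {
    int id = *(int*)num;
    int left = id;
    int right = (id + 1) % N;

    while (1) {
        think(id);

        // Keep trying until both chopsticks are held
        while (1) {
            pthread_mutex_lock(&chopsticks[left]);
            if (pthread_mutex_trylock(&chopsticks[right]) == 0)
                break; // Got both chopsticks
            pthread_mutex_unlock(&chopsticks[left]); // Put the first one back
            sleep(rand() % 2); // Random back-off before retrying
        }

        eat(id);

        pthread_mutex_unlock(&chopsticks[right]);
        pthread_mutex_unlock(&chopsticks[left]);
    }
    pthread_exit(NULL);
}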

Implementing a parallel merge sort involves dividing the data into smaller segments, sorting those segments concurrently in separate threads, and then merging the segments back together. This approach can significantly reduce sorting time for large datasets when using multiple cores.

Below is a simplified example of how you might implement parallel merge sort using pthreads. The example assumes you have a basic understanding of merge sort, pthreads creation, and joining. Given the complexity of correctly implementing and managing memory, especially with over a million integers, this example is kept conceptual and simplified for clarity.

Step 1: Define the Structure for Sorting

First, define a structure that holds information about a segment of the array to be sorted or merged:

typedef struct {
    int *array;
    int left;
    int right;
} SortSegment;

Step 2: Merge Function

Implement a merge function that merges two sorted halves of an array:

void merge(int *array, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    // Heap allocation: stack-allocated temporaries (VLAs) could overflow a
    // thread's stack when merging very large segments.
    int *L = malloc(n1 * sizeof(int));
    int *R = malloc(n2 * sizeof(int));

    for (int i = 0; i < n1; i++)
        L[i] = array[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = array[mid + 1 + j];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            array[k] = L[i];
            i++;
        } else {
            array[k] = R[j];
            j++;
        }
        k++;
    }

    while (i < n1) {
        array[k] = L[i];
        i++;
        k++;
    }

    while (j < n2) {
        array[k] = R[j];
        j++;
        k++;
    }

    free(L);
    free(R);
}

Step 3: Parallel Merge Sort Function

Define the merge sort function that a thread will execute. This function recursively splits the array segment and creates new threads to sort each half, falling back to sequential recursion once a segment drops below a size threshold so that the total number of threads stays bounded:

void* parallelMergeSort(void* arg) {
    SortSegment *segment = (SortSegment*) arg;
    int left = segment->left;
    int right = segment->right;
    if (left < right) {
        int mid = left + (right - left) / 2;

        SortSegment seg1 = {segment->array, left, mid};
        SortSegment seg2 = {segment->array, mid + 1, right};

        // Only spawn new threads for large segments; small segments are sorted
        // sequentially to avoid creating an excessive number of threads.
        int segmentThreshold = 100000; // Example value; tune for your hardware and input size

        if (right - left > segmentThreshold) {
            pthread_t thread1, thread2;
            pthread_create(&thread1, NULL, parallelMergeSort, &seg1);
            pthread_create(&thread2, NULL, parallelMergeSort, &seg2);

            pthread_join(thread1, NULL);
            pthread_join(thread2, NULL);
        } else { // Sort small segments sequentially with plain recursive calls
            parallelMergeSort(&seg1);
            parallelMergeSort(&seg2);
        }

        merge(segment->array, left, mid, right);
    }
    // Return instead of calling pthread_exit(): when this function is invoked
    // recursively (not as a thread start routine), pthread_exit() would
    // terminate the calling thread prematurely.
    return NULL;
}

Step 4: Main Function

In your main function, you would initialize the array, create the initial SortSegment, and start the first thread for sorting:

#include <pthread.h>
#include <stdlib.h>

// Assume the SortSegment struct, merge(), and parallelMergeSort() from the previous steps are defined here

int main(int argc, char **argv) {
    int N = 1000000; // Example: Sort 1,000,000 integers
    int *array = malloc(N * sizeof(int));

    // Populate the array with random numbers
    for (int i = 0; i < N; i++) {
        array[i] = rand() % N;
    }

    SortSegment segment = {array, 0, N - 1};
    pthread_t initialThread;
    pthread_create(&initialThread, NULL, parallelMergeSort, &segment);
    pthread_join(initialThread, NULL);

    // Use the sorted array
    // ...

    free(array);
    return 0;
}
  • The SortSegment struct allows us to pass the segment of the array being sorted or merged to the thread function.

  • The merge function combines two sorted halves of an array into a single sorted sequence.

  • The parallelMergeSort function decides whether to continue creating new threads for sorting subsegments or to sort them sequentially, based on a threshold. This threshold helps avoid excessive thread creation, which can be costly and counterproductive.

  • The example uses a simple strategy for limiting the creation of threads: segments smaller than segmentThreshold are sorted sequentially instead of spawning new threads. Adjust this value based on the number of processors and the size of the dataset for optimal performance (one way to query the processor count is sketched after this list).

  • The main function initializes the array, starts the sorting process with a single thread, and then joins this thread.
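
One way to pick such limits at run time instead of hard-coding them is to ask the system how many processors are online. The helper below is a sketch: chooseSegmentThreshold is a made-up name, _SC_NPROCESSORS_ONLN is widely available on Linux, macOS, and the BSDs (though it is an extension rather than core POSIX), and the one-segment-per-core policy is just one reasonable heuristic.

#include <unistd.h>

int chooseSegmentThreshold(int totalElements) {
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN); // Processors currently online
    if (ncpu < 1)
        ncpu = 1; // Fall back to a single thread if the query fails
    // Aim for roughly one large segment per core; smaller segments sort sequentially
    return totalElements / (int)ncpu;
}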

Matrix multiplication can be parallelized by dividing the work among multiple threads, where each thread is responsible for computing a portion of the result matrix. For an n x n matrix multiplication using N threads (with N provided by the user), each thread can be assigned a specific set of rows of the result matrix to compute. The approach here divides the task row-wise among the threads, but keep in mind that other strategies, such as block decomposition, can be more efficient but also more complex to implement.

Step 1: Structure for Thread Data

First, define a structure to pass necessary data to each thread:

typedef struct {
    int id; // Thread ID
    int numThreads; // Total number of threads sharing the work
    int n; // Dimension of the matrices
    int** A; // First matrix
    int** B; // Second matrix
    int** C; // Result matrix
} ThreadData;

Step 2: Matrix Multiplication Function for Each Thread

Each thread will execute this function. It calculates a portion of the matrix multiplication, specifically the rows assigned to it:

void* multiplyRow(void* arg) {
    ThreadData* data = (ThreadData*)arg;
    int n = data->n;
    int rowStart = (data->id * n) / data->numThreads; // First row assigned to this thread
    int rowEnd = ((data->id + 1) * n) / data->numThreads; // One past the last assigned row

    for (int i = rowStart; i < rowEnd; ++i) {
        for (int j = 0; j < n; ++j) {
            data->C[i][j] = 0;
            for (int k = 0; k < n; ++k) {
                data->C[i][j] += data->A[i][k] * data->B[k][j];
            }
        }
    }

    pthread_exit(NULL);
}

Step 3: Main Function

The main function initializes the matrices, creates threads, and assigns each thread a portion of the matrix multiplication task:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

// Assume definitions of ThreadData and multiplyRow() are here

int main(int argc, char** argv) {
    if (argc < 3) {
        printf("Usage: %s <n> <num_threads>\n", argv[0]);
        return 1;
    }

    int n = atoi(argv[1]);
    int numThreads = atoi(argv[2]);
    pthread_t* threads = malloc(numThreads * sizeof(pthread_t));
    ThreadData* threadData = malloc(numThreads * sizeof(ThreadData));

    // Initialize matrices A, B, and C
    int** A = malloc(n * sizeof(int*));
    int** B = malloc(n * sizeof(int*));
    int** C = malloc(n * sizeof(int*));
    for (int i = 0; i < n; ++i) {
        A[i] = malloc(n * sizeof(int));
        B[i] = malloc(n * sizeof(int));
        C[i] = malloc(n * sizeof(int));
        // Assume matrices A and B are filled with valid integers
    }

    // Create threads, each responsible for a contiguous block of rows
    for (int i = 0; i < numThreads; ++i) {
        threadData[i] = (ThreadData){ .id = i, .numThreads = numThreads, .n = n, .A = A, .B = B, .C = C };
        pthread_create(&threads[i], NULL, multiplyRow, &threadData[i]);
    }

    // Join threads
    for (int i = 0; i < numThreads; ++i) {
        pthread_join(threads[i], NULL);
    }

    // The result is in matrix C
    // Clean up
    for (int i = 0; i < n; ++i) {
        free(A[i]);
        free(B[i]);
        free(C[i]);
    }
    free(A);
    free(B);
    free(C);
    free(threads);
    free(threadData);

    return 0;
}
  • Thread Data Structure: Each thread receives a ThreadData structure containing pointers to the matrices involved in the operation, its unique thread ID, the total number of threads, and the dimension n of the matrices.

  • Matrix Multiplication Function (multiplyRow): This function calculates the block of rows of the result matrix C assigned to the thread, based on its ID and the total thread count. For each assigned row it iterates over the columns of B, computing the dot product with the corresponding row of A to fill in the result.

  • Main Function:

    • It reads the matrix dimension n and the number of threads from the command line.

    • Initializes matrices A, B, and C with dynamic memory allocation. For simplicity, the code for filling matrices A and B with values is omitted.

    • Creates the requested number of threads, each assigned a contiguous block of rows. Rows are distributed as evenly as possible; when n is not divisible by the thread count, some threads compute one more row than others.

    • Waits for all threads to complete using pthread_join.

    • Cleans up allocated memory.