# Concurrency Exercises

Practice atomic operations, lock-free programming, and OpenMP parallelization.
## Exercise 1: Debug a Data Race

### Objective

Use ThreadSanitizer to find and fix data races.
### Setup

```cpp
// File: race_exercise.cpp
#include <thread>
#include <vector>
#include <iostream>

int counter = 0; // BUG: Not atomic!

void increment(int times) {
    for (int i = 0; i < times; ++i) {
        counter++; // Data race!
    }
}

int main() {
    const int threads = 4;
    const int increments = 1000000;

    std::vector<std::thread> t;
    for (int i = 0; i < threads; ++i) {
        t.emplace_back(increment, increments);
    }
    for (auto& thread : t) {
        thread.join();
    }

    std::cout << "Expected: " << threads * increments << "\n";
    std::cout << "Actual:   " << counter << "\n";
    return 0;
}
```

### Tasks
1. Compile and run:

   ```bash
   g++ -O2 -pthread race_exercise.cpp -o race_exercise
   ./race_exercise
   ```

   Note that the result is wrong!

2. Detect with ThreadSanitizer:

   ```bash
   g++ -O2 -pthread -fsanitize=thread race_exercise.cpp -o race_exercise_tsan
   ./race_exercise_tsan
   ```

3. Fix the data race
4. Verify the fix
### Questions

Q: What are three ways to fix this?

- Use `std::atomic<int>` with appropriate memory ordering
- Use a mutex (`std::mutex`)
- Use thread-local counters and combine at the end
## Exercise 2: Memory Ordering

### Objective

Understand the performance difference between memory orderings.
### Setup

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<int> counter{0};

void increment_relaxed(int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}

void increment_seq_cst(int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_seq_cst);
    }
}
```

### Tasks
1. Benchmark both versions
2. Compare performance
3. Explain the difference
### Questions

Q: When should you use each ordering?

- `relaxed`: Counters and statistics where exact ordering doesn't matter
- `acquire`/`release`: Producer-consumer patterns
- `seq_cst`: Default; use when you're unsure or need a total ordering
## Exercise 3: Implement a Thread-Safe Counter

### Objective

Implement a thread-safe counter with cache line padding.
Setup
// TODO: Implement this
class ThreadSafeCounter {
public:
void increment();
int get() const;
private:
// Your members here
};Requirements
- Support concurrent increments
- Prevent false sharing between multiple counters
- Support fast reads
### Tasks

1. Implement the class
2. Test with multiple threads
3. Verify no false sharing
### Hint

<details>
<summary>Click for hint</summary>

Use `alignas(64)` to prevent false sharing:

```cpp
struct alignas(64) CacheLineAligned {
    std::atomic<int> value{0};
};
```

</details>

## Exercise 4: SPSC Queue Verification
### Objective

Verify the correctness of a lock-free queue.

### Setup

Use the queue from `examples/05-concurrency/src/lock_free_queue.cpp`.
### Tasks

1. Write a test that:
   - Producer pushes N items
   - Consumer pops N items
   - Verifies all items are received
2. Test under load:

   ```bash
   # Run with ThreadSanitizer
   cmake --preset=tsan
   cmake --build build/tsan
   ./build/tsan/your_test
   ```

3. Verify no data races
### Questions

Q: What invariants must the queue maintain?

- No element is lost
- No element is duplicated
- FIFO ordering is preserved
- The queue never enters an invalid state (memory safety)
## Exercise 5: OpenMP Scaling

### Objective

Measure thread scaling efficiency with OpenMP.
Setup
#include <omp.h>
#include <vector>
#include <iostream>
void parallel_sum(const std::vector<int>& data) {
long long sum = 0;
#pragma omp parallel for reduction(+:sum)
for (size_t i = 0; i < data.size(); ++i) {
sum += data[i];
}
std::cout << "Sum: " << sum << "\n";
}Tasks
1. Measure with different thread counts:

   ```bash
   export OMP_NUM_THREADS=1
   time ./parallel_sum
   export OMP_NUM_THREADS=2
   time ./parallel_sum
   export OMP_NUM_THREADS=4
   time ./parallel_sum
   ```

2. Calculate efficiency:
   - Efficiency = (Actual Speedup / Thread Count) × 100%
   - Ideal: 100% efficiency

3. Identify bottlenecks:
   - Memory bandwidth?
   - Load imbalance?
   - False sharing?
### Questions

Q: Why does efficiency decrease with more threads?

- Amdahl's Law: serial portions don't parallelize
- Memory bandwidth saturation
- Synchronization overhead
- Cache contention
- NUMA effects on multi-socket systems
## Challenge: Parallelize a Real Algorithm

### Objective

Parallelize a matrix multiplication using OpenMP.
Setup
void matrix_mult_scalar(const float* A, const float* B, float* C,
size_t N) {
for (size_t i = 0; i < N; ++i) {
for (size_t j = 0; j < N; ++j) {
float sum = 0.0f;
for (size_t k = 0; k < N; ++k) {
sum += A[i * N + k] * B[k * N + j];
}
C[i * N + j] = sum;
}
}
}Tasks
1. Parallelize with OpenMP
2. Optimize cache usage (loop tiling)
3. Combine with SIMD
4. Measure scaling efficiency
### Hint

<details>
<summary>Click for hint</summary>

```cpp
void matrix_mult_openmp(const float* A, const float* B, float* C,
                        size_t N) {
    #pragma omp parallel for
    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            #pragma omp simd reduction(+:sum)
            for (size_t k = 0; k < N; ++k) {
                sum += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
        }
    }
}
```

</details>

## Solutions
See `solutions.md` for complete solutions.
## Next Steps

- Review the Concurrency Module
- Read the Memory Utilities API for cache line padding
- Explore the Profiling Guide for analyzing multi-threaded performance