# Concurrency Exercises

Practice atomic operations, lock-free programming, and OpenMP parallelization.
## Exercise 1: Debug a Data Race

### Objective

Use ThreadSanitizer to find and fix data races.
### Setup

```cpp
// File: race_exercise.cpp
#include <thread>
#include <vector>
#include <iostream>

int counter = 0; // BUG: Not atomic!

void increment(int times) {
    for (int i = 0; i < times; ++i) {
        counter++; // Data race!
    }
}

int main() {
    const int threads = 4;
    const int increments = 1000000;

    std::vector<std::thread> t;
    for (int i = 0; i < threads; ++i) {
        t.emplace_back(increment, increments);
    }
    for (auto& thread : t) {
        thread.join();
    }

    std::cout << "Expected: " << threads * increments << "\n";
    std::cout << "Actual:   " << counter << "\n";
    return 0;
}
```

### Tasks
1. Compile and run:

   ```bash
   g++ -O2 -pthread race_exercise.cpp -o race_exercise
   ./race_exercise
   ```

   Note that the result is wrong!

2. Detect with ThreadSanitizer:

   ```bash
   g++ -O2 -pthread -fsanitize=thread race_exercise.cpp -o race_exercise_tsan
   ./race_exercise_tsan
   ```

3. Fix the data race
4. Verify the fix
### Questions

Q: What are three ways to fix this?

- Use `std::atomic<int>` with appropriate memory ordering
- Use a mutex (`std::mutex`)
- Use thread-local counters and combine at the end
## Exercise 2: Memory Ordering

### Objective

Understand the performance difference between memory orderings.
### Setup

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<int> counter{0};

void increment_relaxed(int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_relaxed);
    }
}

void increment_seq_cst(int times) {
    for (int i = 0; i < times; ++i) {
        counter.fetch_add(1, std::memory_order_seq_cst);
    }
}
```

### Tasks
1. Benchmark both versions
2. Compare performance
3. Explain the difference
### Questions

Q: When should you use each ordering?

- `relaxed`: Counters and statistics where exact ordering doesn't matter
- `acquire`/`release`: Producer-consumer patterns
- `seq_cst`: Default; use when you're unsure or need a total ordering
## Exercise 3: Implement a Thread-Safe Counter

### Objective

Implement a thread-safe counter with cache line padding.
Setup
// TODO: Implement this
class ThreadSafeCounter {
public:
void increment();
int get() const;
private:
// Your members here
};Requirements
- Support concurrent increments
- Prevent false sharing between multiple counters
- Support fast reads
### Tasks

1. Implement the class
2. Test with multiple threads
3. Verify no false sharing
### Hint

<details>
<summary>Click for hint</summary>

Use `alignas(64)` to prevent false sharing:

```cpp
struct alignas(64) CacheLineAligned {
    std::atomic<int> value{0};
};
```

</details>

## Exercise 4: SPSC Queue Verification
### Objective

Verify the correctness of a lock-free queue.

### Setup

Use the queue from `examples/05-concurrency/src/lock_free_queue.cpp`.
### Tasks

1. Write a test that:
   - Producer pushes N items
   - Consumer pops N items
   - Verifies all items are received
2. Test under load:

   ```bash
   # Run with ThreadSanitizer
   cmake --preset=tsan
   cmake --build build/tsan
   ./build/tsan/your_test
   ```

3. Verify no data races
### Questions

Q: What invariants must the queue maintain?

- No element is lost
- No element is duplicated
- FIFO ordering is preserved
- The queue never enters an invalid state (memory safety)
## Exercise 5: OpenMP Scaling

### Objective

Measure thread scaling efficiency with OpenMP.
Setup
#include <omp.h>
#include <vector>
#include <iostream>
void parallel_sum(const std::vector<int>& data) {
long long sum = 0;
#pragma omp parallel for reduction(+:sum)
for (size_t i = 0; i < data.size(); ++i) {
sum += data[i];
}
std::cout << "Sum: " << sum << "\n";
}Tasks
1. Measure with different thread counts:

   ```bash
   export OMP_NUM_THREADS=1
   time ./parallel_sum
   export OMP_NUM_THREADS=2
   time ./parallel_sum
   export OMP_NUM_THREADS=4
   time ./parallel_sum
   ```

2. Calculate efficiency:
   - Efficiency = (Actual Speedup / Thread Count) × 100%
   - Ideal: 100% efficiency

3. Identify bottlenecks:
   - Memory bandwidth?
   - Load imbalance?
   - False sharing?
### Questions

Q: Why does efficiency decrease with more threads?

- Amdahl's Law: serial portions don't parallelize
- Memory bandwidth saturation
- Synchronization overhead
- Cache contention
- NUMA effects on multi-socket systems
## Challenge: Parallelize a Real Algorithm

### Objective

Parallelize a matrix multiplication using OpenMP.
Setup
void matrix_mult_scalar(const float* A, const float* B, float* C,
size_t N) {
for (size_t i = 0; i < N; ++i) {
for (size_t j = 0; j < N; ++j) {
float sum = 0.0f;
for (size_t k = 0; k < N; ++k) {
sum += A[i * N + k] * B[k * N + j];
}
C[i * N + j] = sum;
}
}
}Tasks
1. Parallelize with OpenMP
2. Optimize cache usage (loop tiling)
3. Combine with SIMD
4. Measure scaling efficiency
### Hint

<details>
<summary>Click for hint</summary>

```cpp
void matrix_mult_openmp(const float* A, const float* B, float* C,
                        size_t N) {
    #pragma omp parallel for
    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float sum = 0.0f;
            #pragma omp simd reduction(+:sum)
            for (size_t k = 0; k < N; ++k) {
                sum += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
        }
    }
}
```

</details>

## Solutions
See `solutions.md` for complete solutions.
## Next Steps

- Review the Concurrency Module
- Read the Memory Utilities API for cache line padding
- Explore the Profiling Guide for analyzing multi-threaded performance