📚 API Reference
Complete GPU SpMV public API documentation.
Table of Contents
- Header Files Overview
- Error Handling
- Data Structures
- SpMV Computation Interface
- RAII Memory Management
- Performance Benchmarking
- PageRank Algorithm
- Best Practices
Header Files Overview
| Header | Purpose |
|---|---|
| <spmv/common.h> | Error codes and basic definitions |
| <spmv/cuda_buffer.h> | RAII GPU memory management |
| <spmv/csr_matrix.h> | CSR sparse matrix |
| <spmv/ell_matrix.h> | ELL sparse matrix |
| <spmv/spmv.h> | SpMV computation interface |
| <spmv/benchmark.h> | Performance benchmarking framework |
| <spmv/pagerank.h> | PageRank algorithm |
Error Handling
SpMVError Enum
enum class SpMVError {
    SUCCESS = 0,             // Operation successful
    INVALID_DIMENSION = -1,  // Matrix or vector dimension mismatch
    CUDA_MALLOC = -2,        // GPU memory allocation failed
    CUDA_MEMCPY = -3,        // GPU memory copy failed
    KERNEL_LAUNCH = -4,      // CUDA kernel launch/execution failed
    INVALID_FORMAT = -5,     // Invalid sparse matrix format
    FILE_IO = -6,            // File read/write error
    OUT_OF_MEMORY = -7,      // Host/device out of memory
    INVALID_ARGUMENT = -8    // Invalid argument provided
};
Error String
const char* spmv_error_string(SpMVError err);
Returns a human-readable string describing the error.
Example:
SpMVResult result = spmv_csr(csr, d_x, d_y, &config, n);
if (result.error != SpMVError::SUCCESS) {
    fprintf(stderr, "Error: %s\n", spmv_error_string(result.error));
}
Data Structures
CSRMatrix
CSR (Compressed Sparse Row) format sparse matrix.
struct CSRMatrix {
    int num_rows;       // Number of rows
    int num_cols;       // Number of columns
    int nnz;            // Total non-zero elements

    float* values;      // Non-zero values array [nnz]
    int* col_indices;   // Column indices array [nnz]
    int* row_ptrs;      // Row pointers array [num_rows + 1]

    // GPU device pointers
    float* d_values;
    int* d_col_indices;
    int* d_row_ptrs;

    bool owns_host_memory;
    bool owns_device_memory;
};
Invariants:
- row_ptrs[0] == 0
- row_ptrs[num_rows] == nnz
- row_ptrs[i] <= row_ptrs[i+1] (for all i)
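The invariants above can be checked on the host before a matrix is transferred to the GPU. The following is a minimal sketch; `csr_row_ptrs_valid` is a hypothetical helper for illustration and is not part of the documented API.

```cpp
#include <cassert>

// Hypothetical helper (not part of the documented API): returns true when a
// host-side row-pointer array satisfies the three CSR invariants.
bool csr_row_ptrs_valid(const int* row_ptrs, int num_rows, int nnz) {
    assert(row_ptrs != nullptr && num_rows >= 0);
    if (row_ptrs[0] != 0) return false;           // row_ptrs[0] == 0
    if (row_ptrs[num_rows] != nnz) return false;  // row_ptrs[num_rows] == nnz
    for (int i = 0; i < num_rows; ++i) {
        // Row pointers must be non-decreasing.
        if (row_ptrs[i] > row_ptrs[i + 1]) return false;
    }
    return true;
}
```

A check like this is cheap (one pass over num_rows + 1 integers) and can catch malformed input before it reaches a kernel.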
CSRMatrix API
Create Matrix
CSRMatrix* csr_create(int num_rows, int num_cols, int nnz);
Create an empty CSR matrix structure.
Parameters:
- num_rows: Number of rows
- num_cols: Number of columns
- nnz: Number of non-zero elements
Returns: Newly allocated CSRMatrix pointer
Convert from Dense
void csr_from_dense(CSRMatrix* csr, const float* dense,
                    int num_rows, int num_cols);
Convert dense matrix to CSR format.
Parameters:
- csr: Target CSR matrix
- dense: Source dense matrix (row-major)
- num_rows: Number of rows
- num_cols: Number of columns
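To illustrate what a dense-to-CSR conversion produces, here is a standalone host-side sketch. It does not call the library: `HostCsr` and `dense_to_csr` are hypothetical names, and the assumption that exact zeros are the entries dropped is ours, not something this reference states about csr_from_dense.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three CSR arrays, using std::vector.
struct HostCsr {
    int num_rows = 0;
    int num_cols = 0;
    std::vector<float> values;     // [nnz]
    std::vector<int> col_indices;  // [nnz]
    std::vector<int> row_ptrs;     // [num_rows + 1]
};

// Scans a row-major dense matrix and keeps every entry that is not exactly 0.
HostCsr dense_to_csr(const float* dense, int num_rows, int num_cols) {
    assert(dense != nullptr && num_rows >= 0 && num_cols >= 0);
    HostCsr out;
    out.num_rows = num_rows;
    out.num_cols = num_cols;
    out.row_ptrs.push_back(0);
    for (int r = 0; r < num_rows; ++r) {
        for (int c = 0; c < num_cols; ++c) {
            const float v = dense[static_cast<std::size_t>(r) * num_cols + c];
            if (v != 0.0f) {
                out.values.push_back(v);
                out.col_indices.push_back(c);
            }
        }
        // After each row, record how many non-zeros have been stored so far.
        out.row_ptrs.push_back(static_cast<int>(out.values.size()));
    }
    return out;
}
```

For the 3x3 matrix with rows (1, 0, 2), (0, 0, 0), (0, 3, 4), this yields row_ptrs = {0, 2, 2, 4}: the empty middle row shows up as two equal consecutive row pointers.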
Transfer to GPU
void csr_to_gpu(CSRMatrix* csr);
Transfer CSR matrix data to GPU.
Parameters:
csr: CSR matrix to transfer
Destroy Matrix
void csr_destroy(CSRMatrix* csr);
Free all memory used by CSR matrix.
Parameters:
csr: CSR matrix to destroy
ELLMatrix
ELLPACK format sparse matrix.
struct ELLMatrix {
    int num_rows;         // Number of rows
    int num_cols;         // Number of columns
    int max_nnz_per_row;  // Maximum non-zero elements per row
    int nnz;              // Total non-zero elements

    float* values;        // Values array [num_rows * max_nnz_per_row]
    int* col_indices;     // Column indices array [num_rows * max_nnz_per_row]

    // GPU device pointers
    float* d_values;
    int* d_col_indices;

    bool owns_host_memory;
    bool owns_device_memory;
};
ELLMatrix API
Create Matrix
ELLMatrix* ell_create(int num_rows, int num_cols, int max_nnz_per_row);
Create an empty ELL matrix structure.
Convert from CSR
void ell_from_csr(ELLMatrix* ell, const CSRMatrix* csr);
Convert CSR matrix to ELL format.
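The sketch below shows one common way a CSR-to-ELL conversion can work. It is standalone and makes assumptions this reference does not confirm: column-major storage (entry k of row r lives at index k * num_rows + r), and padding slots marked with value 0 and column index -1. `HostEll` and `csr_to_ell` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side ELL container.
struct HostEll {
    int num_rows = 0;
    int max_nnz_per_row = 0;
    std::vector<float> values;     // [num_rows * max_nnz_per_row]
    std::vector<int> col_indices;  // [num_rows * max_nnz_per_row], -1 = padding
};

HostEll csr_to_ell(const std::vector<int>& row_ptrs,
                   const std::vector<int>& col_indices,
                   const std::vector<float>& values) {
    assert(!row_ptrs.empty());
    HostEll out;
    out.num_rows = static_cast<int>(row_ptrs.size()) - 1;
    // The longest row determines the width of every row in ELL.
    for (int r = 0; r < out.num_rows; ++r) {
        out.max_nnz_per_row =
            std::max(out.max_nnz_per_row, row_ptrs[r + 1] - row_ptrs[r]);
    }
    const std::size_t slots =
        static_cast<std::size_t>(out.num_rows) * out.max_nnz_per_row;
    out.values.assign(slots, 0.0f);   // assumed padding value
    out.col_indices.assign(slots, -1);  // assumed padding sentinel
    for (int r = 0; r < out.num_rows; ++r) {
        const int len = row_ptrs[r + 1] - row_ptrs[r];
        for (int k = 0; k < len; ++k) {
            // Column-major slot: k-th entry of row r.
            const std::size_t slot =
                static_cast<std::size_t>(k) * out.num_rows + r;
            out.values[slot] = values[row_ptrs[r] + k];
            out.col_indices[slot] = col_indices[row_ptrs[r] + k];
        }
    }
    return out;
}
```

Because every row is padded to max_nnz_per_row, ELL storage grows with the longest row, so matrices with a few very long rows waste most of the allocated slots.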
Transfer to GPU
void ell_to_gpu(ELLMatrix* ell);
Transfer ELL matrix data to GPU.
Destroy Matrix
void ell_destroy(ELLMatrix* ell);
Free all memory used by ELL matrix.
SpMV Computation Interface
SpMVConfig
SpMV computation configuration.
struct SpMVConfig {
    KernelType kernel_type;  // Kernel type
    bool auto_select;        // Auto-select kernel
};
KernelType Enum
enum class KernelType {
    SCALAR_CSR,  // Scalar CSR kernel
    VECTOR_CSR,  // Vector CSR kernel
    MERGE_PATH,  // Merge Path kernel
    ELL          // ELL kernel
};
Auto Configuration
SpMVConfig spmv_auto_config(const CSRMatrix* csr);
Automatically select optimal kernel based on matrix characteristics.
Parameters:
csr: CSR matrix
Returns: Optimized SpMVConfig
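This reference does not describe how the kernel is chosen. As an illustration of the kind of "matrix characteristics" a selector might inspect, the sketch below picks a kernel from row-length statistics. Everything in it is an assumption: `KernelChoice` merely mirrors the KernelType names, `choose_kernel` is hypothetical, and the thresholds (4, 2x, 8x) are illustrative values, not the library's.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mirrors the KernelType names for a standalone example.
enum class KernelChoice { SCALAR_CSR, VECTOR_CSR, MERGE_PATH, ELL };

// Hypothetical heuristic based on mean and maximum row length.
KernelChoice choose_kernel(const std::vector<int>& row_ptrs) {
    assert(row_ptrs.size() >= 2);
    const int num_rows = static_cast<int>(row_ptrs.size()) - 1;
    int max_len = 0;
    for (int r = 0; r < num_rows; ++r) {
        max_len = std::max(max_len, row_ptrs[r + 1] - row_ptrs[r]);
    }
    const double mean_len = static_cast<double>(row_ptrs[num_rows]) / num_rows;

    // A few very long rows: balance work across threads.
    if (max_len > 8.0 * mean_len) return KernelChoice::MERGE_PATH;
    // Short rows: one thread per row is usually enough.
    if (mean_len < 4.0) return KernelChoice::SCALAR_CSR;
    // Near-uniform rows: ELL padding overhead stays small.
    if (max_len <= 2.0 * mean_len) return KernelChoice::ELL;
    // Longer, moderately irregular rows: several threads per row.
    return KernelChoice::VECTOR_CSR;
}
```

The general idea is that row-length skew drives the choice: uniform rows suit padded formats, while highly irregular rows favor load-balanced schemes.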
Execute SpMV (CSR)
SpMVResult spmv_csr(const CSRMatrix* csr,
                    const float* d_x,
                    float* d_y,
                    const SpMVConfig* config,
                    int n);
Execute CSR format SpMV computation: y = A * x
Parameters:
- csr: CSR matrix (must be on GPU)
- d_x: Input vector (GPU pointer)
- d_y: Output vector (GPU pointer)
- config: SpMV configuration
- n: Vector dimension
Returns: SpMVResult containing execution time and error code
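When validating GPU output, it helps to have a host reference for y = A * x. The sketch below computes it directly from the CSR arrays; `spmv_csr_reference` is a hypothetical helper written for this example, not a library function.

```cpp
#include <cassert>
#include <vector>

// Host reference for y = A * x, with A given as CSR arrays.
std::vector<float> spmv_csr_reference(int num_rows,
                                      const std::vector<int>& row_ptrs,
                                      const std::vector<int>& col_indices,
                                      const std::vector<float>& values,
                                      const std::vector<float>& x) {
    assert(static_cast<int>(row_ptrs.size()) == num_rows + 1);
    std::vector<float> y(num_rows, 0.0f);
    for (int r = 0; r < num_rows; ++r) {
        float sum = 0.0f;
        // Non-zeros of row r occupy [row_ptrs[r], row_ptrs[r + 1]).
        for (int k = row_ptrs[r]; k < row_ptrs[r + 1]; ++k) {
            sum += values[k] * x[col_indices[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

Copy d_y back to the host and compare it element by element against this result, using a small tolerance, since GPU kernels may accumulate in a different order.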
Execute SpMV (ELL)
SpMVResult spmv_ell(const ELLMatrix* ell,
                    const float* d_x,
                    float* d_y,
                    int n);
Execute ELL format SpMV computation: y = A * x
Parameters:
- ell: ELL matrix (must be on GPU)
- d_x: Input vector (GPU pointer)
- d_y: Output vector (GPU pointer)
- n: Vector dimension
Returns: SpMVResult containing execution time and error code
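A host reference for the ELL product can look like the sketch below. It assumes column-major storage (entry k of row r at index k * num_rows + r) and a column index of -1 for padding slots; neither convention is confirmed by this reference, and `spmv_ell_reference` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host reference for y = A * x, with A in (assumed) column-major ELL layout.
std::vector<float> spmv_ell_reference(int num_rows, int max_nnz_per_row,
                                      const std::vector<int>& col_indices,
                                      const std::vector<float>& values,
                                      const std::vector<float>& x) {
    assert(values.size() ==
           static_cast<std::size_t>(num_rows) * max_nnz_per_row);
    assert(col_indices.size() == values.size());
    std::vector<float> y(num_rows, 0.0f);
    for (int r = 0; r < num_rows; ++r) {
        for (int k = 0; k < max_nnz_per_row; ++k) {
            const std::size_t slot = static_cast<std::size_t>(k) * num_rows + r;
            const int col = col_indices[slot];
            if (col >= 0) {  // skip padding slots (assumed sentinel: -1)
                y[r] += values[slot] * x[col];
            }
        }
    }
    return y;
}
```

With a column-major layout, threads handling consecutive rows read consecutive memory addresses for the same k, which is why that layout is a frequent choice for GPU ELL kernels.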
SpMVResult
struct SpMVResult {
    SpMVError error;  // Error code
    float time_ms;    // Execution time (milliseconds)
};
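The time_ms field can be turned into a throughput figure. One SpMV performs one multiply and one add per stored non-zero, so it costs roughly 2 * nnz floating-point operations. The helper below is a hypothetical convenience function, not part of the API.

```cpp
#include <cassert>

// Converts an SpMV execution time into GFLOP/s.
// flops = 2 * nnz; seconds = time_ms / 1e3; GFLOP/s = flops / seconds / 1e9.
double spmv_gflops(long long nnz, double time_ms) {
    assert(time_ms > 0.0);
    return (2.0 * static_cast<double>(nnz)) / (time_ms * 1.0e6);
}
```

For example, a matrix with 1,000,000 non-zeros processed in 1 ms corresponds to 2 GFLOP/s.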
RAII Memory Management
CudaBuffer
Template class for automatic GPU memory management.
template<typename T>
class CudaBuffer {
public:
    explicit CudaBuffer(size_t count);
    ~CudaBuffer();

    T* data();
    const T* data() const;
    size_t size() const;

    // Copy disabled
    CudaBuffer(const CudaBuffer&) = delete;
    CudaBuffer& operator=(const CudaBuffer&) = delete;

    // Move enabled
    CudaBuffer(CudaBuffer&& other) noexcept;
    CudaBuffer& operator=(CudaBuffer&& other) noexcept;
};
Example:
{
    CudaBuffer<float> buffer(1000);
    // Access GPU pointer with buffer.data()
    // Automatically freed when leaving scope
}
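To show how the move-only ownership model behaves, here is a host-memory analogue that runs without a GPU. `HostBuffer` is a hypothetical class written for this example: it swaps in std::malloc/std::free where a GPU buffer would presumably use cudaMalloc/cudaFree, and it is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

// Hypothetical host-memory analogue of a move-only RAII buffer.
template <typename T>
class HostBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw allocation: no constructors are run");
public:
    explicit HostBuffer(std::size_t count)
        : ptr_(static_cast<T*>(std::malloc(count * sizeof(T)))), count_(count) {
        assert(ptr_ != nullptr || count == 0);  // allocation must succeed
    }
    ~HostBuffer() { std::free(ptr_); }  // free(nullptr) is a no-op

    // Copying would create two owners of one allocation, so it is disabled.
    HostBuffer(const HostBuffer&) = delete;
    HostBuffer& operator=(const HostBuffer&) = delete;

    // Moving transfers ownership and leaves the source empty.
    HostBuffer(HostBuffer&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)),
          count_(std::exchange(other.count_, 0)) {}

    HostBuffer& operator=(HostBuffer&& other) noexcept {
        if (this != &other) {
            std::free(ptr_);  // release what this object currently owns
            ptr_ = std::exchange(other.ptr_, nullptr);
            count_ = std::exchange(other.count_, 0);
        }
        return *this;
    }

    T* data() { return ptr_; }
    const T* data() const { return ptr_; }
    std::size_t size() const { return count_; }

private:
    T* ptr_;
    std::size_t count_;
};
```

After `HostBuffer<float> b(std::move(a));`, `b` owns the allocation and `a.data()` is null, so exactly one destructor frees the memory.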
Performance Benchmarking
BenchmarkConfig
struct BenchmarkConfig {
    int iterations;      // Number of iterations
    bool warmup;         // Enable warmup
    bool print_details;  // Print detailed information
};
Run Benchmark
void spmv_benchmark(const CSRMatrix* csr,
                    const BenchmarkConfig* config);
Run SpMV benchmark.
Parameters:
- csr: CSR matrix
- config: Benchmark configuration
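The iterations and warmup settings follow a familiar timing pattern: optionally run once untimed, then average over repeated runs. The generic harness below sketches that pattern with std::chrono; `mean_time_ms` is a hypothetical function, and how spmv_benchmark measures time internally is not stated in this reference.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timing harness: mean wall-clock milliseconds per call of fn.
template <typename Fn>
double mean_time_ms(Fn&& fn, int iterations, bool warmup) {
    assert(iterations > 0);
    if (warmup) {
        fn();  // one untimed call to absorb first-run costs
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

When timing GPU work with a host clock, the callable must wait for the device to finish (for example with cudaDeviceSynchronize) before returning, because kernel launches are asynchronous; CUDA events are the usual alternative.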
PageRank Algorithm
PageRankConfig
struct PageRankConfig {
    float damping;       // Damping factor (default 0.85)
    float tolerance;     // Convergence threshold (default 1e-6)
    int max_iterations;  // Maximum iterations (default 100)
};
Execute PageRank
SpMVResult spmv_pagerank(const CSRMatrix* csr,
                         float* d_rank,
                         const PageRankConfig* config);
Execute PageRank computation.
Parameters:
- csr: CSR matrix (adjacency matrix)
- d_rank: Rank vector (GPU pointer)
- config: PageRank configuration
Returns: SpMVResult containing convergence information
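The three configuration fields map onto the standard power-iteration form of PageRank, where each iteration is one SpMV. The host sketch below illustrates that loop under stated assumptions: the CSR arrays already hold a column-stochastic transition matrix (entry (i, j) is 1/outdegree(j) when j links to i), there are no dangling nodes, and convergence is measured with the L1 norm. `pagerank_reference` is a hypothetical function; the library's normalization and convergence test may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Host sketch of PageRank by power iteration. Returns the iteration count.
int pagerank_reference(int n,
                       const std::vector<int>& row_ptrs,
                       const std::vector<int>& col_indices,
                       const std::vector<float>& values,
                       float damping, float tolerance, int max_iterations,
                       std::vector<float>& rank) {
    assert(n > 0 && static_cast<int>(row_ptrs.size()) == n + 1);
    rank.assign(n, 1.0f / n);  // uniform starting distribution
    std::vector<float> next(n);
    for (int it = 1; it <= max_iterations; ++it) {
        float delta = 0.0f;
        for (int r = 0; r < n; ++r) {
            float sum = 0.0f;  // one row of the SpMV: (M * rank)[r]
            for (int k = row_ptrs[r]; k < row_ptrs[r + 1]; ++k) {
                sum += values[k] * rank[col_indices[k]];
            }
            next[r] = (1.0f - damping) / n + damping * sum;
            delta += std::fabs(next[r] - rank[r]);  // L1 change
        }
        rank.swap(next);
        if (delta < tolerance) return it;  // converged
    }
    return max_iterations;
}
```

For a three-node graph with links 0→1, 0→2, 1→2, 2→0, node 2 receives rank from both other nodes and ends up ranked above node 1.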
Best Practices
1. Resource Management
Use RAII pattern for automatic resource management:
// ✅ Recommended
void process() {
    CudaBuffer<float> x(1000);
    CudaBuffer<float> y(1000);
    // Automatic cleanup
}

// ❌ Avoid
void process() {
    float* x;
    cudaMalloc(&x, 1000 * sizeof(float));
    // Easy to forget cudaFree
}
2. Error Handling
Always check returned error codes:
SpMVResult result = spmv_csr(csr, d_x, d_y, &config, n);
if (result.error != SpMVError::SUCCESS) {
    // Handle error
    fprintf(stderr, "SpMV failed: %s\n",
            spmv_error_string(result.error));
    return result.error;
}
3. Performance Optimization
Reuse execution context for best performance:
SpMVExecutionContext ctx;
for (int i = 0; i < 100; i++) {
    // Texture objects created once, reused thereafter
    spmv_csr(csr, d_x, d_y, &config, n, &ctx);
}
For the complete API specification, see specs/api/public-api.md.