
Thank you for your interest in Mini-Inference Engine! This document describes how to contribute to the project.

Code of Conduct

  • Respect all contributors
  • Maintain professional and constructive discussions
  • Accept constructive criticism
  • Act in the best interest of the project

How to Contribute

Reporting Bugs

  1. Check Issues for existing reports
  2. Create a new issue including:
    • Description: Clear problem description
    • Reproduction steps: How to reproduce
    • Expected behavior: What should happen
    • Actual behavior: What actually happened
    • Environment: GPU model, CUDA version, OS

Feature Requests

  1. Describe the feature need and use case
  2. Explain why this feature is valuable
  3. Provide possible implementation approach (optional)

Submitting Code

1. Fork the Repository

```bash
git clone https://github.com/<your-username>/mini-inference-engine.git
cd mini-inference-engine
git remote add upstream https://github.com/LessUp/mini-inference-engine.git
```

2. Create Branch

```bash
# Feature branch
git checkout -b feature/my-feature

# Or fix branch
git checkout -b fix/bug-description
```
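If you want to enforce the naming convention locally, a small helper can check a branch name before you create it. This is a hypothetical sketch, not an existing project script; the `valid_branch` name is illustrative:

```shell
# Hypothetical helper: accept only branch names of the form feature/<slug> or fix/<slug>.
valid_branch() {
    echo "$1" | grep -Eq '^(feature|fix)/[a-z0-9._-]+$'
}

valid_branch "feature/my-feature" && echo "ok: feature/my-feature"
valid_branch "my-random-branch" || echo "rejected: my-random-branch"
```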

3. Write Code

Follow the Code Style guidelines described below.

4. Test

```bash
# Debug + tests
cmake --preset default
cmake --build --preset default
ctest --preset default

# Release build
cmake --preset release
cmake --build --preset release
```
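The `default` and `release` preset names above are defined in a `CMakePresets.json` at the repository root. The fragment below is an illustrative sketch of what such a file typically looks like, not the project's actual configuration; check the real file in the repo:

```json
{
  "version": 3,
  "configurePresets": [
    { "name": "default", "binaryDir": "build/debug",
      "cacheVariables": { "CMAKE_BUILD_TYPE": "Debug" } },
    { "name": "release", "binaryDir": "build/release",
      "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" } }
  ],
  "buildPresets": [
    { "name": "default", "configurePreset": "default" },
    { "name": "release", "configurePreset": "release" }
  ],
  "testPresets": [
    { "name": "default", "configurePreset": "default" }
  ]
}
```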

5. Commit

```bash
git add .
git commit -m "feat: add new feature"
```

Commit message format follows Conventional Commits:

| Type | Description |
|------|-------------|
| feat | New feature |
| fix | Bug fix |
| docs | Documentation update |
| perf | Performance improvement |
| refactor | Code refactoring |
| test | Test-related changes |
| chore | Build/tool changes |
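A lightweight way to enforce this format is a check on the commit message's type prefix, for example in a `commit-msg` hook. The sketch below is hypothetical (the repo does not necessarily ship such a hook) and `valid_commit_msg` is an illustrative name:

```shell
# Hypothetical check: accept only messages that start with one of the
# Conventional Commits types above, with an optional scope in parentheses.
valid_commit_msg() {
    echo "$1" | grep -Eq '^(feat|fix|docs|perf|refactor|test|chore)(\([a-z0-9_-]+\))?: .+'
}

valid_commit_msg "feat: add new feature" && echo "accepted"
valid_commit_msg "updated some files" || echo "rejected: missing type prefix"
```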

6. Push and Create PR

```bash
git push origin feature/my-feature
```

Then create a Pull Request on GitHub.


Development Setup

Requirements

| Dependency | Minimum | Recommended |
|------------|---------|-------------|
| CUDA Toolkit | 11.0 | 12.0+ |
| CMake | 3.18 | 3.25+ |
| C++ Compiler | GCC 9 / Clang 10 | GCC 11+ |
| GPU Compute Capability | 7.0 | 8.0+ |

Verify Environment

```bash
# Check CUDA
nvcc --version
nvidia-smi

# Check CMake
cmake --version

# Check Compiler
gcc --version
```
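To compare the reported versions against the minimums in the table above, a version-aware comparison helps (plain string comparison gets `3.9` vs `3.18` wrong). A minimal sketch, assuming GNU `sort -V` is available; `version_ge` is an illustrative name, not a project script:

```shell
# Hypothetical helper: succeed if ACTUAL >= MINIMUM under version ordering.
version_ge() {  # usage: version_ge ACTUAL MINIMUM
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

version_ge "3.25.1" "3.18" && echo "CMake 3.25.1 meets the 3.18 minimum"
version_ge "3.16.0" "3.18" || echo "CMake 3.16.0 is too old"
```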

Code Style

C++ Style

```cpp
// Naming conventions
class ClassName;                 // Class: PascalCase
void function_name();            // Function: snake_case
int variable_name;               // Variable: snake_case
const int CONSTANT_NAME = 256;   // Constant: UPPER_SNAKE_CASE (const must be initialized)
int member_variable_;            // Member: snake_case + trailing underscore

// Indentation: 4 spaces
void function() {
    if (condition) {
        // code
    }
}

// Braces: opening brace on the same line
if (condition) {
    // code
} else {
    // code
}
```

CUDA Style

```cuda
// Kernel naming: snake_case
__global__ void my_kernel(const float* in, float* out, int n);

// Template parameters: UPPER_SNAKE_CASE
template<int BLOCK_SIZE, bool USE_SHARED>
__global__ void templated_kernel(const float* in, float* out, int n);

__global__ void example_kernel(const float* in, float* out, int n) {
    // Shared memory: s_ prefix
    __shared__ float s_data[256];

    // Register variables: r_ prefix
    float r_sum = 0.0f;
}
```

Code Formatting

Use clang-format:

```bash
clang-format --style=file -i <file>
```

Testing

Unit Tests

Every new feature needs tests:

```cpp
#include <gtest/gtest.h>
#include "feature.h"

class FeatureTest : public ::testing::Test {
protected:
    void SetUp() override {
        CUDA_CHECK(cudaSetDevice(0));
    }
};

TEST_F(FeatureTest, BasicFunctionality) {
    // Compare the feature's output against a known-good expected value.
    EXPECT_EQ(expected, actual);
}
```

Performance Tests

For performance-related changes:

```cpp
TEST_F(FeatureTest, Performance) {
    GpuTimer timer;

    // Warmup
    for (int i = 0; i < 5; i++) {
        function_under_test();
    }

    // Benchmark
    timer.start();
    for (int i = 0; i < 20; i++) {
        function_under_test();
    }
    timer.stop();

    float avg_time = timer.elapsed_ms() / 20;
    printf("Average time: %.3f ms\n", avg_time);
}
```

Documentation

Code Comments

```cpp
/// @brief Execute optimized GEMM operation
/// @param A Input matrix A (M x K)
/// @param B Input matrix B (K x N)
/// @param C Output matrix C (M x N)
/// @param M Rows of A
/// @param N Columns of B
/// @param K Columns of A / Rows of B
/// @param stream CUDA stream (optional)
void launch_optimized_gemm(const float* A, const float* B, float* C,
                           int M, int N, int K, cudaStream_t stream = 0);
```

README Updates

When adding new features, update README.md:

  • Feature list
  • Usage examples
  • API documentation

Submission Process

Review Process

  1. Automatic checks
    • Compilation passes
    • Tests pass
    • Code style check
  2. Manual review
    • Code quality
    • Design rationale
    • Documentation completeness
  3. Performance validation (if applicable)
    • No performance regression
    • New optimizations are effective

Merge Requirements

  • All CI checks pass
  • At least 1 reviewer approval
  • No unresolved review comments

*Last Updated: 2025-04-16 Document Version: v1.1.0*


MIT License | A learning project for the CUDA community