API Specifications

This directory contains API interface definitions for the Mini-Inference Engine.

Purpose

API specifications serve as the contract between components. They define:

  • Function signatures
  • Data types and structures
  • Error codes and handling
  • Interface contracts

Current Status

The project’s API is defined in header files (include/) with inline documentation. Future API specifications may include:

  • OpenAPI/Swagger definitions (for potential REST API)
  • GraphQL schemas (for query interfaces)
  • Protocol buffer definitions (for serialization)

API Documentation

For complete API documentation, see the inline comments in the header files under include/.

Key Interfaces

GEMM Operations

// Basic GEMM kernels
void launch_naive_matmul(const float* A, const float* B, float* C,
                         int M, int N, int K, cudaStream_t stream = 0);

void launch_tiled_gemm(const float* A, const float* B, float* C,
                       int M, int N, int K, cudaStream_t stream = 0);

// Optimized GEMM
void launch_optimized_gemm(const float* A, const float* B, float* C,
                           int M, int N, int K, cudaStream_t stream = 0);

// Fused operations
void launch_fused_gemm(const float* A, const float* B, float* C,
                       const float* bias, int M, int N, int K,
                       bool add_bias, bool apply_relu, cudaStream_t stream = 0);

Inference Engine

class InferenceEngine {
public:
    void init(int device_id = 0);
    bool load_weights(const std::string& path);
    void forward(const float* input, float* output, int batch_size);
    void cleanup();
};

See AGENTS.md for the development workflow.



MIT License | A learning project for the CUDA community