API Specifications

This directory contains API interface definitions for the Mini-Inference Engine.

Purpose

API specifications serve as the contract between components. They define:

  • Function signatures
  • Data types and structures
  • Error codes and handling
  • Interface contracts

Current Status

The project’s API is defined in header files (include/) with inline documentation. Future API specifications may include:

  • OpenAPI/Swagger definitions (for potential REST API)
  • GraphQL schemas (for query interfaces)
  • Protocol buffer definitions (for serialization)

API Documentation

For complete API documentation, see the inline comments in the header files under include/.

Key Interfaces

GEMM Operations

// Basic GEMM kernels
void launch_naive_matmul(const float* A, const float* B, float* C,
                         int M, int N, int K, cudaStream_t stream = 0);

void launch_tiled_gemm(const float* A, const float* B, float* C,
                       int M, int N, int K, cudaStream_t stream = 0);

// Optimized GEMM
void launch_optimized_gemm(const float* A, const float* B, float* C,
                           int M, int N, int K, cudaStream_t stream = 0);

// Fused operations
void launch_fused_gemm(const float* A, const float* B, float* C,
                       const float* bias, int M, int N, int K,
                       bool add_bias, bool apply_relu, cudaStream_t stream = 0);

Inference Engine

class InferenceEngine {
public:
    void init(int device_id = 0);
    bool load_weights(const std::string& path);
    void forward(const float* input, float* output, int batch_size);
    void cleanup();
};

See AGENTS.md for the development workflow.



MIT License | A learning project for the CUDA community