BitForge: Run LLMs on Microcontrollers

DEV Community·Aman Sachan·about 1 month ago
#llm #esp32 #iot #python #quantization #tokens

I got GPT-2 running on an Arduino! Here's the quantization pipeline.

Process:

  1. Q4_K_M quantization via llama.cpp
  2. Memory-mapped flash for weight storage
  3. Optimized matvec for ARM Cortex-M
  4. KV cache quantization
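
To make steps 1 and 3 more concrete, here is a minimal Python sketch of 4-bit block quantization and a matrix-vector product that reads the packed weights directly. It is an illustration only, not the BitForge code: the real Q4_K_M format in llama.cpp is more elaborate, and the block size of 32, the symmetric [-7, 7] code range, and the function names `quantize_block`, `dequantize_block`, and `matvec_q4` are all assumptions made for this example.

```python
BLOCK = 32  # weights per block (assumed; not the actual Q4_K_M layout)


def quantize_block(weights):
    """Quantize 32 floats into one float scale plus 16 packed bytes."""
    assert len(weights) == BLOCK
    amax = max(abs(w) for w in weights)
    scale = amax / 7.0 if amax > 0 else 1.0  # map values onto [-7, 7]
    # Clamp to [-7, 7], then shift by 8 so every code fits in an unsigned nibble.
    codes = [max(-7, min(7, round(w / scale))) + 8 for w in weights]
    # Pack two 4-bit codes per byte: even index in the low nibble, odd in the high.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, BLOCK, 2))
    return scale, packed


def dequantize_block(scale, packed):
    """Recover 32 approximate floats from a (scale, packed) block."""
    out = []
    for b in packed:
        out.append(((b & 0x0F) - 8) * scale)
        out.append(((b >> 4) - 8) * scale)
    return out


def matvec_q4(rows, x):
    """Compute y = W @ x, where each row of W is a list of (scale, packed) blocks."""
    y = []
    for row in rows:
        acc = 0.0
        for bi, (scale, packed) in enumerate(row):
            base = bi * BLOCK
            block_acc = 0.0
            for j, b in enumerate(packed):
                block_acc += ((b & 0x0F) - 8) * x[base + 2 * j]
                block_acc += ((b >> 4) - 8) * x[base + 2 * j + 1]
            acc += scale * block_acc  # apply the scale once per block
        y.append(acc)
    return y


# Usage: quantize a single 32-weight row and multiply it by a vector of ones.
weights = [(i - 16) / 16 for i in range(BLOCK)]
scale, packed = quantize_block(weights)
y = matvec_q4([[(scale, packed)]], [1.0] * BLOCK)
```

Applying the scale once per block rather than once per weight keeps the inner loop to small-integer multiplies, which is the kind of structure an optimized microcontroller kernel would typically exploit.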

Results:

  • Arduino Nano 33 BLE: 3 tokens/sec
  • ESP32-S3: 15 tokens/sec
  • Raspberry Pi Pico: 8 tokens/sec

Code: github.com/AmSach/bitforge

Hardware requirements: 512 KB RAM, 2 MB flash.
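
For a rough feel of what those limits allow, the sketch below estimates how many weights fit in flash at a given bit width and how much RAM a KV cache would take. Everything here is an assumption for illustration: the helper names, the simplified layout of 4-bit codes plus one 16-bit scale per 32 weights (which is not the actual Q4_K_M layout), reading 2 MB as 2 × 1024 × 1024 bytes, and the example model dimensions, which are hypothetical and not taken from the post.

```python
def bits_per_weight(block_size=32, code_bits=4, scale_bits=16):
    """Average storage cost per weight, including the per-block scale."""
    return code_bits + scale_bits / block_size


def max_params(flash_bytes, bpw):
    """Largest number of weights that fits in flash at bpw bits per weight."""
    return int(flash_bytes * 8 // bpw)


def kv_cache_bytes(n_layers, n_ctx, d_model, bytes_per_value):
    """Keys and values: two n_ctx x d_model tensors per layer."""
    return 2 * n_layers * n_ctx * d_model * bytes_per_value


bpw = bits_per_weight()                       # 4.5 bits per weight
budget = max_params(2 * 1024 * 1024, bpw)     # 3728270 weights in 2 MB of flash
cache = kv_cache_bytes(4, 128, 256, 1)        # 262144 bytes (256 KB) with 8-bit entries
```

With these hypothetical dimensions, an 8-bit KV cache already takes half of a 512 KB RAM budget, which is one reason quantizing the cache matters on this class of hardware.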