#Tokens

227 posts

China's Push for AI Token Futures Signals New Front in U.S. Tech Rivalry

WebProNews·Juan Vasquez·2 days ago

China's Shanghai Futures Exchange is designing futures contracts for AI tokens, the basic units powering large language models. Daily token usage has surged 1,000-fold to over 140 trillion. The project diverges from U.S.…

GitHub - jmaczan/tiny-vllm: Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

Hacker News·Hacker News·2 days ago
#github#include#define#need#model#number

Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM - jmaczan/tiny-vllm

LFM2.5-8B-A1B: an Even Better on-Device Mixture-of-Experts | Liquid AI

#liquid#lfm2#model#tokens#token#models

Today, we’re releasing LFM2.5-8B-A1B, a high-throughput edge model optimized for fast, reliable tool calling and complex instruction following on consumer hardware, delivering compressed performance competitive with much larger models and day-one support…

Real-time LLM Inference on Standard Datacenter GPUs (3,000 tokens/s per request)

#blog#speed#model#inference#memory#tokens

Today, Kog AI launches a tech preview of the Kog Inference Engine (KIE): 3,000 output tokens/s per request on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200 (FP16, no speculative decoding).…
