Articles

CUDA: Why Modern GPUs and LLMs Depend on It

This article examines CUDA as the execution contract between software and GPU hardware.

January 30, 2026

Tensor Cores, Memory Coalescing, and Constraints

Tensor cores deliver extreme compute density, but their throughput is bounded by scheduling, tiling, and memory movement long before peak FLOPs are reached. This article examines tensor cores as fixed hardware constraints.

January 25, 2026

gpu hardware tensor

GPU Execution for ML

An overview of how GPUs run ML workloads in practice, with attention to occupancy, latency hiding, and memory bottlenecks, and to why throughput depends more on data movement than on FLOPs.

January 22, 2026

gpu hardware tech