Llama4S is a simple yet practical Llama 3.x inference engine written in Scala 3. It is inspired by the llama3.java project, which in turn draws from llama2.c by Andrej Karpathy and his excellent lectures on large language models. Llama4S supports fast general matrix-vector multiplication over quantized tensors through the Java Vector API (JEP 469) and leverages the advanced optimizations of the Graal compiler for best performance.
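To illustrate the core operation being accelerated, here is a hedged, scalar sketch of a quantized matrix-vector product (int8 weights with one float scale per row, a common symmetric quantization scheme). This is not the project's actual code: Llama4S vectorizes this inner loop with the Java Vector API (JEP 469), and the class and method names below are purely illustrative.

```java
// Hypothetical scalar reference for a quantized matrix-vector product.
// Real engines (including Llama4S) vectorize the inner loop with JEP 469;
// this sketch only shows the arithmetic being accelerated.
public class QuantizedMatVec {
    // w: rows*cols int8 weights (row-major), scales: one float per row,
    // x: dense float input vector of length cols.
    static float[] matVec(byte[] w, float[] scales, float[] x, int rows, int cols) {
        float[] y = new float[rows];
        for (int r = 0; r < rows; r++) {
            float acc = 0f;
            for (int c = 0; c < cols; c++) {
                acc += w[r * cols + c] * x[c]; // int8 weight times float input
            }
            y[r] = acc * scales[r]; // dequantize once per output row
        }
        return y;
    }

    public static void main(String[] args) {
        byte[] w = {1, 2, 3, 4};        // a 2x2 weight matrix
        float[] scales = {0.5f, 2.0f};  // per-row dequantization scales
        float[] x = {1f, 1f};
        float[] y = matVec(w, scales, x, 2, 2);
        System.out.println(y[0] + " " + y[1]); // (1+2)*0.5 and (3+4)*2.0
    }
}
```

Keeping the weights in int8 and dequantizing once per row is what makes the memory-bandwidth-bound matvec fast; the Vector API then lets the JIT emit SIMD instructions for the accumulation loop.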