llamafile v0.8.13 Release Notes
llamafile 0.8.13 supports the latest models, includes various quality improvements, and adds new commands to try, such as whisperfile (speech to text and translation) and sdfile (image generation). The new HTTP server for embeddings is now three times faster.
Other updates include:
- Support for more new model architectures, e.g. OpenELM, GPT-NeoX, Arctic, DeepSeek2, ChatGLM, BitNet, T5, JAIS, Poro, Viking, Tekken, and CodeShell.
- Mistral Nemo compatibility. Get the fresh llamafiles here.
- You can now use Gemma 2B. This model was released by Google a few weeks ago. It's very snappy, even on CPU, thanks to the new high-quality vectorized GeLU implementation.
- Better llamafiles for LLaMA v3.1 have been uploaded. The model can now scale to the full 128k context window, so your prompt can be a whole book that you then ask questions about.
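For readers unfamiliar with the GeLU activation mentioned in the Gemma item above, here is a minimal Python sketch of the function itself. It is not llamafile's implementation, which is vectorized native code. It only shows the standard mathematical definition next to the widely used tanh approximation. The function names `gelu_exact` and `gelu_tanh` are made up for this illustration.

```python
import math


def gelu_exact(x: float) -> float:
    """GeLU defined through the Gaussian CDF: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Common tanh-based approximation of GeLU."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Compare the two forms at a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
```

A vectorized implementation evaluates the same function on many values at once using SIMD instructions. Its quality depends on how closely the result tracks the exact form.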
Join the conversation in Discord!