How to Pick an LLM Quantization?
Generally speaking, more bits are better, and among the size variants in a quantization name, L > M > S (e.g. Q3_K_L > Q3_K_M > Q3_K_S). So when choosing a model, pick the largest one that fits into your RAM. If you have a GPU, you can “offload” some of the model’s layers to its VRAM and let the CPU handle the rest; the more layers you offload to the GPU, the faster the LLM runs. The only reason I wrote this blog post is that it took me too long to piece this together from online sources.
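To make the offloading idea concrete, here is a rough back-of-the-envelope sketch in Python. This is not llama.cpp’s actual allocation logic; the even per-layer split and all the numbers below are assumptions you would replace with your own model’s figures.

```python
# Rough heuristic for how many transformer layers to offload to the GPU.
# All numbers here are placeholders; substitute your model's real values.

def layers_to_offload(model_bytes: int, n_layers: int, vram_bytes: int,
                      vram_reserve_bytes: int = 1 * 1024**3) -> int:
    """Estimate a GPU layer count, assuming the model's weight memory
    is split roughly evenly across its layers."""
    per_layer = model_bytes / n_layers                 # crude even split
    usable = max(vram_bytes - vram_reserve_bytes, 0)   # keep some VRAM free
    return min(n_layers, int(usable // per_layer))

# Example: a ~4.4 GB Q4_K_M file with 32 layers on an 8 GB GPU.
print(layers_to_offload(model_bytes=4_400_000_000,
                        n_layers=32,
                        vram_bytes=8 * 1024**3))
```

With llama.cpp, the resulting count is roughly what you would pass to the `-ngl` / `--n-gpu-layers` flag.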
When choosing the model size, make sure the file is at least 3 GB smaller than your total RAM, so there is headroom for the OS and any other software you are running.
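As a quick sanity check, a sketch like the following compares the model file’s size against total RAM minus a headroom margin. `psutil` is an assumed third-party dependency (`pip install psutil`), the file path is hypothetical, and the 3 GB margin is just the rule of thumb from above.

```python
import os
import psutil  # assumed dependency: pip install psutil

HEADROOM_BYTES = 3 * 1024**3  # leave ~3 GB for the OS and other software

def fits_in_ram(model_path: str) -> bool:
    """Check that the model file is at least HEADROOM_BYTES smaller than total RAM."""
    model_bytes = os.path.getsize(model_path)
    total_ram = psutil.virtual_memory().total
    return model_bytes <= total_ram - HEADROOM_BYTES

# Hypothetical path; point this at the .gguf file you downloaded.
print(fits_in_ram("models/llama-3-8b.Q4_K_M.gguf"))
```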