SamSuka
GET GOING FAST

Llama.cpp - Quantize Models to Run Faster! (even on older GPUs!)

Hey guys! Have you ever found a model that you really want to play with, but it's just too slow? This is the tutorial for you - we walk through taking a large language model (13B parameters in this case) and making it run fairly fast on our computer.

How does it work? By removing precision from the weights of the larger models, AKA quantizing, we are able to get almost the same quality at about half the resource cost, and the conversion itself takes just moments! That's a game changer, especially if we don't have great computers.
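To make "removing precision" a bit more concrete, here is a minimal sketch in Python. It is not how Llama.cpp implements its quantization formats; it assumes a very simple scheme where a block of weights shares one scale and each weight is rounded to a small integer. The function names and the sample weights are made up for illustration, and the size estimate ignores any per-block overhead.

```python
# Illustrative sketch only: simple symmetric block quantization.
# This is NOT Llama.cpp's actual storage format; names and values are hypothetical.


def quantize_block(values, bits=4):
    """Map a block of floats to small signed integers plus one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest > 0 else 1.0
    quants = [round(v / scale) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the stored integers and the scale."""
    return [q * scale for q in quants]


def approx_size_gb(n_params, bits_per_weight):
    """Rough weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.45, 0.02]
    scale, quants = quantize_block(weights, bits=4)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized ints:", quants)
    print("worst rounding error:", round(worst, 4))
    for bits in (16, 8, 4):
        print(f"13B weights at {bits}-bit: ~{approx_size_gb(13e9, bits):.1f} GB")
```

The restored weights are close to the originals but not identical, which is the quality we give up. In exchange, going from 16 bits to 8 bits per weight halves the storage for the weights, and fewer bits shrink it further.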

This is an easy process that takes just minutes once you've figured it out.

For this tutorial you need Llama.cpp installed. If you haven't installed it yet, follow our tutorials below:

Installing Llama.cpp - https://youtu.be/r-05yuXTEPE
Converting Safetensors to GGUF - https://youtu.be/-VJz7V5MyIM
Finding models on Huggingface.co - https://youtu.be/V5A496JEqbo
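
Once Llama.cpp is installed and the model is converted to GGUF, the quantize step is a single command. The sketch below only builds and prints that command so you can check it before running it. It assumes a recent Llama.cpp build where the tool is called `llama-quantize` (older builds named it `quantize`) and that it sits in the current folder. The file names are placeholders, and `Q4_K_M` is just one commonly used quantization type, not necessarily the one used in the video.

```python
import shlex


def build_quantize_command(input_gguf, output_gguf, quant_type="Q4_K_M",
                           binary="./llama-quantize"):
    """Return the argument list for Llama.cpp's quantize tool.

    Assumed argument order: input file, output file, quantization type.
    The binary path and the default type are assumptions, adjust for your build.
    """
    return [binary, input_gguf, output_gguf, quant_type]


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    cmd = build_quantize_command("my-model-13b-f16.gguf",
                                 "my-model-13b-Q4_K_M.gguf")
    print(shlex.join(cmd))
    # To actually run it, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list makes it easy to swap in a different quantization type and compare the resulting file sizes and speed.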
