The 1.5B version can be run on basically anything. My friend runs it on his shitty laptop with a 512MB iGPU and 8GB of RAM (inference takes 30 seconds).
You don’t even need a GPU with a lot of VRAM, since you can offload the model to system RAM (inference is slower, though).
I’ve run the 14B version on my AMD RX 6700 XT and it only takes ~9GB of VRAM (inference over 1k tokens takes 20 seconds). The 8B version takes around 5-6GB of VRAM (inference over 1k tokens takes 5 seconds).
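If you want to sanity-check those figures yourself, here is a rough back-of-the-envelope sketch. It assumes the models are 4-bit quantized, which is my guess and not something stated above, and it only counts the weights. Context cache and runtime buffers come on top, which would fit the ~9GB and 5-6GB I actually see. The function names are made up for illustration.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores the context cache and runtime buffers, so real usage is higher.
    """
    return params_billions * bits_per_weight / 8


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput for a run of `tokens` tokens taking `seconds` seconds."""
    return tokens / seconds


if __name__ == "__main__":
    # ASSUMPTION: 4 bits per weight (typical 4-bit quantization); not stated in the post.
    for size in (1.5, 8, 14):
        gb = estimate_weight_memory_gb(size, bits_per_weight=4)
        print(f"{size}B model: roughly {gb:.2f} GB for weights alone")

    # Throughput implied by the timings quoted above (1k tokens in 20 s and in 5 s).
    print(f"14B: {tokens_per_second(1000, 20):.0f} tokens/s")
    print(f"8B: {tokens_per_second(1000, 5):.0f} tokens/s")
```

At 4 bits the 14B weights come out to about 7GB and the 8B weights to about 4GB, so the measured numbers are in the right ballpark once overhead is added.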
The numbers in your second link are waaaaaay off.