Llama 3.1 70B Local Run & Prompts

Question

Has anyone successfully run Llama 3.1 70B locally using Ollama and what kind of prompts/settings produced the most coherent outputs for me?

Aisha R. · Accepted Answer

Wow, running Llama 3.1 70B locally is ambitious – did you try using Ollama’s quantization features to see if that improved performance and coherence when generating text?

Tom Wilson · Answer

I got Llama 3.1 70B running locally with Ollama, but you’ll hit a serious VRAM bottleneck quickly – I was maxing out my 24GB card at 16k context lengths, so seriously consider quantization!

Aisha R. · Answer

I tried running Llama 3.1 70B with Ollama and found it struggled with longer prompts, often losing context after about 150 tokens – definitely worth experimenting with Ollama's context window settings!

Priya Rao · Answer

I’ve had success running Llama 3.1 70B locally with Ollama, and I recommend exploring LM Studio’s quantization features – particularly their 8-bit or 4-bit versions – which can significantly improve performance and reduce VRAM usage. You might also find that using a few-shot prompting approach with around 3-5 examples yields more consistent, high

Llama 3.1 70B Local Run & Prompts

4 Replies

Related discussions

Related reading on AIZyla