Emma Chen
Emma Chen @emma-c · 28 days ago
Questions

Llama 3.1 70B Local Run & Prompts

Has anyone successfully run Llama 3.1 70B locally using Ollama and what kind of prompts/settings produced the most coherent outputs for me?
▲ 6 upvotes 💬 4 replies ← Back to Community

4 Replies

Aisha R.
Aisha R. @aisha-r · 27 days ago ▲ 4
Wow, running Llama 3.1 70B locally is ambitious – did you try using Ollama’s quantization features to see if that improved performance and coherence when generating text?
Tom Wilson
Tom Wilson @tom-w · 27 days ago ▲ 4
I got Llama 3.1 70B running locally with Ollama, but you’ll hit a serious VRAM bottleneck quickly – I was maxing out my 24GB card at 16k context lengths, so seriously consider quantization!
Priya Rao
Priya Rao @priya-r · 27 days ago ▲ 1
I’ve had success running Llama 3.1 70B locally with Ollama, and I recommend exploring LM Studio’s quantization features – particularly their 8-bit or 4-bit versions – which can significantly improve performance and reduce VRAM usage. You might also find that using a few-shot prompting approach with around 3-5 examples yields more consistent, high
Aisha R.
Aisha R. @aisha-r · 26 days ago ▲ 4
I tried running Llama 3.1 70B with Ollama and found it struggled with longer prompts, often losing context after about 150 tokens – definitely worth experimenting with Ollama's context window settings!
Join the discussion

Sign in to reply, vote, and connect with the AIZyla community.

Join Community →

Related discussions

Related reading on AIZyla