How to Start Using Large Language Models on NVIDIA RTX PCs

Large language models (LLMs) are a type of artificial intelligence that can understand and generate human-like text. Many people want to run these models on their own computers for better privacy and more control. In the past, running LLMs locally often meant settling for lower-quality results, but new open-source models like OpenAI’s gpt-oss and Alibaba’s Qwen 3 now make it easy to get high-quality outputs right on your own PC.

This change creates new opportunities for students, hobbyists and developers to build and use AI tools without relying on cloud services. NVIDIA RTX PCs are especially well suited for this because their GPUs accelerate AI workloads so everything runs quickly and smoothly.

To make things easier, NVIDIA has worked with several popular LLM applications to get the most out of RTX graphics cards. One of the simplest tools to try is Ollama. Ollama lets you run LLMs, chat with them, and even drop PDF documents into your prompts. You can use it to create study helpers, chatbots, or AI assistants that work with text alone or with images too. Thanks to NVIDIA’s optimizations, Ollama now runs faster and more efficiently on RTX PCs, and works especially well with larger models like gpt-oss-20B and Google’s Gemma models. Multi-GPU setups are also better supported.
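Once Ollama is installed, the quickest way to talk to a model is the command line (for example, `ollama run gpt-oss:20b`), but Ollama also runs a local REST API that your own programs can call. The sketch below, in Python using only the standard library, builds and sends a request to Ollama’s `/api/generate` endpoint. It assumes the default local port 11434 and that you have already pulled the model you name — substitute any model from your local library.

```python
import json
from urllib import request

# Ollama's local endpoint; 11434 is the default port, adjust if you changed it.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete JSON reply instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running and a model pulled (`ollama pull gpt-oss:20b`), a call like `ask("gpt-oss:20b", "Make me three flashcards about photosynthesis.")` returns the model’s answer as a string — everything stays on your own PC.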

Ollama can also connect to other apps. For example, you can use AnythingLLM to build your own personal AI assistant that gets all of Ollama’s performance boosts. AnythingLLM lets you upload documents, set up custom knowledge bases, and chat naturally with your AI, making it great for handling study materials or big research projects.

Another easy app for running LLMs locally is LM Studio, which is built on the llama.cpp framework. LM Studio’s friendly interface helps you load different AI models, chat with them in real time, or even call them from your own coding projects. NVIDIA has fine-tuned llama.cpp for RTX GPUs, making it faster and more memory-efficient. It now supports the latest NVIDIA Nemotron Nano v2 9B model, and Flash Attention is turned on by default for even quicker responses.
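Using LM Studio from a coding project works through its local server, which speaks the OpenAI-style chat-completions protocol. The Python sketch below builds and sends such a request; port 1234 is LM Studio’s usual default, but check the server tab in the app, and the model name is a placeholder for whichever model you have loaded.

```python
import json
from urllib import request

# LM Studio's local server; 1234 is the usual default port.
LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"

def build_chat_request(model: str, user_message: str,
                       temperature: float = 0.7) -> dict:
    """Build an OpenAI-style chat-completions body for the local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }

def chat(model: str, user_message: str) -> str:
    """POST a chat request to LM Studio and return the assistant's reply."""
    body = json.dumps(build_chat_request(model, user_message)).encode("utf-8")
    req = request.Request(
        LMSTUDIO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the request format matches the OpenAI API, code written against LM Studio’s local server can often be pointed at other compatible backends just by changing the URL.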

LLMs running locally on RTX PCs are a game changer for students. You can make your own AI-powered study buddy in AnythingLLM by loading your lecture slides, assignments, or textbooks. The AI can then create flashcards, answer questions based on your own notes, write and grade quizzes, and explain tough problems step by step. It is a flexible tool that helps students learn in the way that works best for them and can also help professionals or hobbyists prepare for exams or certifications.

Another cool tool is Project G-Assist, an experimental AI assistant that helps you control and optimize your PC with simple voice or text commands. The latest update now adds the ability to change laptop settings, like switching app modes to save battery, turning on BatteryBoost to make your battery last longer, or using WhisperMode to cut fan noise in half. G-Assist is customizable, so you can even add your own special commands or connect other apps easily. NVIDIA provides guides and sample plug-ins to help you get started.

Recent NVIDIA updates have made all of these tools better. Ollama and llama.cpp now run faster and use your GPU’s memory more efficiently. Windows ML with NVIDIA TensorRT is now available for Windows 11 PCs, delivering up to 50 percent faster AI inference. The NVIDIA Nemotron collection offers open AI models and resources for developers who want to create new tools or apps.

If you want to learn more, follow NVIDIA and their AI PC team on social media such as Facebook, Instagram, TikTok and X. You can also sign up for their newsletter to stay updated on the latest AI PC news and tips.

Original article and image: https://blogs.nvidia.com/blog/rtx-ai-garage-how-to-get-started-with-llms/
