Hermes Agent and Qwen 3.6: Local AI Supercharged by NVIDIA RTX and DGX

Hermes Agent: Smarter Local AI On Your PC

Hermes Agent is an open source AI agent framework from Nous Research that has rapidly become one of the most popular agent systems in the world. It is designed from the ground up to run locally, which makes your hardware extremely important. For PC users with NVIDIA RTX GPUs or workstations, Hermes offers a powerful way to turn a desktop into an always on AI assistant.

Like other modern agents, Hermes can plug into messaging apps, access local files and applications, and stay running around the clock. What makes it stand out is how it handles reliability and self improvement.

Hermes focuses on four key capabilities.

Self evolving skills: Whenever Hermes faces a complex task or receives feedback, it can turn that experience into a reusable skill. Over time it builds up a toolkit that lets it handle similar tasks faster and more accurately.
Contained sub agents: Instead of one agent trying to juggle everything, Hermes spins up short lived sub agents for specific subtasks. Each sub agent has its own focused context and tools. This keeps things organized, reduces confusion and allows Hermes to work well even with smaller context windows, which is ideal for local models on consumer GPUs.
Reliability by design: The skills, tools and plugins that ship with Hermes are curated and stress tested by Nous Research. The goal is an agent that just works without endless debugging, even when you are using 30 billion parameter class models on a local machine.
Better results with the same model: Developers comparing identical language models across different agent frameworks have found that Hermes often gets stronger results. The reason is that Hermes is built as an active orchestration layer rather than a thin wrapper. It is designed for persistent on device agents instead of one off prompt execution.

Because both Hermes and the models it uses are meant to run locally, GPU performance matters a lot. This is where NVIDIA RTX graphics cards and NVIDIA RTX PRO workstations come in. They are tailored for AI inference tasks and can keep an agent like Hermes responsive even under heavy workloads.

Qwen 3.6: Big Model Performance in a Local Friendly Package

Hermes Agent is model agnostic, but one of the most impressive pairings is with the new Qwen 3.6 series from Alibaba. These are open weight large language models that aim to bring data center level intelligence to local hardware.

The Qwen 3.6 lineup includes models such as:

Qwen 3.6 35B: This model runs at around 20 GB of memory usage while beating older 120 billion parameter models that can need more than 70 GB of memory. That makes it far more practical for high end RTX GPUs and compact AI systems.
Qwen 3.6 27B: A dense model with more active parameters that can match the accuracy of huge models like Qwen 3.5 397B, but at roughly one sixteenth of the size. For local users, this means much less VRAM while still getting serious reasoning ability.

On NVIDIA RTX GPUs and NVIDIA DGX Spark systems, these models gain a major boost from Tensor Cores, which are specialized hardware blocks for AI inference. The result is higher throughput and lower latency, so an agent like Hermes can plan multistep tasks, update its own skills or respond to your prompts in seconds instead of minutes.

For enthusiasts and developers running AI alongside other workloads, the efficiency of Qwen 3.6 is a big deal. It lets you run frontier class models without needing enormous data center gear, especially when paired with modern gaming or workstation GPUs.

NVIDIA DGX Spark and Getting Started on Your Own Hardware

For users who want a dedicated always on AI machine, NVIDIA offers DGX Spark. It is a compact personal AI supercomputer designed for sustained agentic workloads and local models.

DGX Spark includes:

128 GB of unified memory that can comfortably handle very large models and multiple concurrent tasks.
Up to 1 petaflop of AI performance, enough to run 120 billion parameter mixture of experts models continuously.
Efficient support for Qwen 3.6 35B, which can deliver intelligence similar to 120 billion parameter models but with a much leaner footprint, freeing up capacity for parallel workloads.

NVIDIA provides a DGX Spark playbook for Hermes that walks through setup and optimization, making it easier to get a stable local agent environment.

If you are ready to try Hermes on your own PC or workstation, the process is straightforward:

Go to the Hermes Agent GitHub repository and follow the setup instructions.
Choose a local model such as Qwen 3.6 and a runtime like llama.cpp, LM Studio or Ollama.
Hermes has built in integrations with LM Studio and Ollama, which simplifies configuration and management of models.

This setup is ideal for power users, PC enthusiasts and developers who want a personal AI that stays on their machine, uses their GPU and can be customized around their own workflows.

NVIDIA continues to push local AI performance further with RTX PRO GPUs and new model formats. Examples include:

RTX PRO GPUs delivering up to three times faster token generation for Qwen 3.6 models with llama.cpp, improving real time responsiveness.
Google Gemma 4 26B and 31B models available as NVFP4 checkpoints for Blackwell GPUs, combined with Multi Token Prediction to triple inference speed at the same quality.
Updated Mistral Medium 3.5 support in llama.cpp and Ollama, allowing it to run efficiently on RTX PRO and DGX Spark systems.

Alongside Hermes, NVIDIA has also introduced NemoClaw, an open source stack that optimizes OpenClaw style agent experiences on NVIDIA devices with greater security and local model support. NemoClaw now works with Windows Subsystem for Linux 2 which makes it more accessible for Windows PC users.

For gamers, creators and PC hardware enthusiasts, this ecosystem points to a future where powerful AI agents run directly on your own GPU. With frameworks like Hermes, efficient models like Qwen 3.6 and hardware such as RTX GPUs and DGX Spark, local AI becomes fast, reliable and ready to plug into your everyday workflows.

Original article and image: https://blogs.nvidia.com/blog/rtx-ai-garage-hermes-agent-dgx-spark/

Nova Series

Nova Series

Hermes Agent and Qwen 3.6: Local AI Supercharged by NVIDIA RTX and DGX Spark

Hermes Agent: Smarter Local AI On Your PC

Qwen 3.6: Big Model Performance in a Local Friendly Package

NVIDIA DGX Spark and Getting Started on Your Own Hardware

GeForce NOW Update: Faster Gaijin Logins and New RTX 5080 Powered Games