From f43040117fbec355a48b3162e2c22e176f6e0176 Mon Sep 17 00:00:00 2001
From: James Devine
Date: Sat, 2 May 2026 22:50:54 +0200
Subject: [PATCH] Update README with NPU optimization details

Added information about the hardware and the optimization strategies
tried for running LLMs on the NPU.
---
 README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README.md b/README.md
index be23abcea..49071bd64 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,7 @@
+# About this repo
+I recently bought a Radxa Dragon Wing Q6A and have been using Claude Code and Hermes (with DeepSeek V4 Flash under the hood) to explore different strategies for optimising LLM inference on the NPU. I got a long way with llama.cpp, but I'm now switching to LiteRT-LM to see whether that fares better. TL;DR: prompt ingestion is much faster on the NPU, but I couldn't get token generation (TG) any faster than on the CPU, so it's not really a win overall. The hardware is fairly limited, but I was hoping for at least some benefit over the CPU on my 12 GB board. Check out the guide I had the harness write here:
+https://github.com/pingud98/llamacpp_on_dragon_wing_q6a_guide/
+
 # llama.cpp
 
 ![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)
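
The pp/tg split described above can be measured with llama.cpp's llama-bench tool, which reports prompt-processing (pp) and token-generation (tg) throughput separately. A minimal sketch; the model path and thread count are placeholders, and how an NPU-offloaded build is selected depends on how llama.cpp was compiled:

    # CPU baseline: time a 512-token prompt (pp512) and 128 generated tokens (tg128)
    ./llama-bench -m model.gguf -p 512 -n 128 -t 4
    # Repeat with the NPU-enabled build and compare the pp512/tg128 t/s rows

Each phase is reported in tokens per second, so one run per build gives a direct CPU-vs-NPU comparison for both ingestion and generation.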