llamacpp_on_dragon_wing_q6a_guide

2 commits 1 branch 0 tags 45 KiB

Author	SHA1	Message	Date

Jimmy Devine

627236a505

Update with full NPU analysis and benchmarks

Adds:
- Detailed explanation of why Hexagon NPU doesn't accelerate inference
  - offload_op callback is NULL in ggml-hexagon.cpp
  - 2048 MiB limit is hardcoded, not hardware-queried
  - Q4_K_M not supported by HTP kernels (only Q4_0, Q8_0, IQ4_NL, MXFP4)
- Full benchmark table: 1B and 7B models, 2K/32K/64K context, CPU vs NPU
  - All results show CPU and NPU identical within margin of error
- 7B test script (test-7b.sh)
- Updated deploy script with password handling for DSP .so
- Performance baseline in AGENTS.md
- Cross-compile pitfalls (CMAKE_SYSROOT, rpcmem_init)

2026-05-02 12:42:42 +02:00
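
The quantization constraint noted in commit 627236a505 (Q4_K_M not supported by the HTP kernels; only Q4_0, Q8_0, IQ4_NL and MXFP4 listed) could be turned into a pre-flight check before picking a model file. The following is a minimal sketch under that assumption: the set contents come from the commit message, while the names `HTP_SUPPORTED_QUANTS` and `htp_supports` are hypothetical and not part of llama.cpp or the repository.

```python
# Quantization types the commit message lists as supported by the HTP kernels.
# Assumption: this list is taken verbatim from the commit text, not queried
# from llama.cpp or the hardware.
HTP_SUPPORTED_QUANTS = frozenset({"Q4_0", "Q8_0", "IQ4_NL", "MXFP4"})


def htp_supports(quant_type: str) -> bool:
    """Return True if the quantization name appears in the list above.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    return quant_type.strip().upper() in HTP_SUPPORTED_QUANTS


if __name__ == "__main__":
    for q in ("Q4_K_M", "Q4_0", "Q8_0"):
        status = "listed as supported" if htp_supports(q) else "not listed as supported"
        print(f"{q}: {status}")
```

A check like this only reflects the list in the commit message; it says nothing about whether a supported type actually runs faster on the NPU, which the benchmarks in the same commit report it does not.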

Jimmy Devine

18970e3258

Initial commit: Q6A Hexagon v68 + llama.cpp guide

Complete documentation for running llama.cpp with the Qualcomm Hexagon
CDSP v68 NPU backend on a Radxa Dragon Q6A (SA8775P) board.

Includes:
- Corrected FastRPC test harness (libcdsprpc handles INIT_CREATE)
- Minimal DSP stub library
- Cross-compile build script for llama.cpp
- Deploy and test scripts for Q6A
- Kernel FastRPC header for reference
- Comprehensive README with lessons learned

Key findings:
- Do NOT call FASTRPC_IOCTL_INIT_CREATE manually
- Must link against Q6A system libcdsprpc (not SDK cross-compiled)
- Build verified: 32 t/s prompt, 4.5 t/s generation on 1B model

2026-05-02 10:28:51 +02:00