llamacpp_on_dragon_wing_q6a.../AGENTS.md
Jimmy Devine 18970e3258 Initial commit: Q6A Hexagon v68 + llama.cpp guide (2026-05-02 10:28:51 +02:00)

Q6A Hexagon Guide — AGENTS.md

This repo documents how to get llama.cpp running with the Qualcomm Hexagon CDSP v68 backend on a Radxa Dragon Q6A board (SA8775P).

Key Rules

  1. Do NOT call FASTRPC_IOCTL_INIT_CREATE manually. Let libcdsprpc handle it.
  2. Always link against Q6A system libcdsprpc (/usr/lib/libcdsprpc.so.1), not the SDK's cross-compiled version.
  3. Do NOT set CMAKE_SYSROOT in the cross-compile — it conflicts with Ubuntu's cross-compiler linker scripts.
  4. Use rpcmem_alloc for DSP compute buffers. Stack arrays only work for tiny buffers (~4 KB) and go through a fragile slow path.

Build Command

cd ~/llama.cpp
bash scripts/build-hexagon.sh
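For orientation, the script presumably wraps a cross-compile cmake invocation along these lines (a sketch, not the script's actual contents; the toolchain triplet is Ubuntu's stock aarch64 cross-compiler, and any backend-enable flag is omitted because its exact name lives in the script). Per Key Rule 3, note that no -DCMAKE_SYSROOT is passed, since it conflicts with Ubuntu's cross-compiler linker scripts:

```shell
cd ~/llama.cpp
cmake -B build-hexagon \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++
cmake --build build-hexagon -j
```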

Deploy Command

Q6A=radxa@192.168.1.11 bash scripts/deploy-to-q6a.sh

Test Command

bash scripts/test-on-q6a.sh

File Reference

  • src/test_fastrpc_fixed.c — Correct init sequence (reference for how to open HTP handles)
  • src/htp_minimal_impl.c — Minimal DSP stub (for testing; the full library can be used instead)
  • scripts/build-hexagon.sh — llama.cpp cmake build for aarch64 + Hexagon
  • scripts/deploy-to-q6a.sh — Deploy to Q6A
  • scripts/test-on-q6a.sh — Run inference test on Q6A
  • references/fastrpc.h — FastRPC ioctl definitions from Q6A kernel
  • README.md — Full guide with troubleshooting

Performance Baseline

  • Prompt processing: ~32 t/s (on 8 CPU cores)
  • Generation: ~4.5 t/s
  • Model: llama-3.2-1b-q4km.gguf (1B params, Q4_K_M)