llamacpp_on_dragon_wing_q6a.../scripts
Jimmy Devine e6fa9052b3 Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup
- offload_op callback now implemented (MUL_MAT/MUL_MAT_ID)
- Memory raised to 10 GiB
- Direct compute mode bypasses broken dspqueue on this board
- Q8_0 1B model: 115 t/s prompt (4.3x vs CPU 27 t/s)
- Generation 9.6 t/s (27% slower than CPU, expected)
- dspqueue path fails with error 0x0000002e
- llama-cli renamed to llama-simple in current build
- Updated scripts for direct-compute mode
- Docs updated with new findings and instructions
2026-05-02 14:17:27 +02:00
build-hexagon.sh Initial commit: Q6A Hexagon v68 + llama.cpp guide 2026-05-02 10:28:51 +02:00
deploy-to-q6a.sh Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup 2026-05-02 14:17:27 +02:00
test-7b.sh Update with full NPU analysis and benchmarks 2026-05-02 12:42:42 +02:00
test-on-q6a.sh Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup 2026-05-02 14:17:27 +02:00