llamacpp_on_dragon_wing_q6a.../scripts
Jimmy Devine e6fa9052b3 Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup
- offload_op callback now implemented (MUL_MAT/MUL_MAT_ID)
- Memory raised to 10 GiB
- Direct compute mode bypasses broken dspqueue on this board
- Q8_0 1B model: 115 t/s prompt (4.3x vs CPU 27 t/s)
- Generation 9.6 t/s (27% slower than CPU, expected)
- dspqueue path fails with error 0x0000002e
- llama-cli renamed to llama-simple in current build
- Updated scripts for direct-compute mode
- Docs updated with new findings and instructions
2026-05-02 14:17:27 +02:00
build-hexagon.sh Initial commit: Q6A Hexagon v68 + llama.cpp guide 2026-05-02 10:28:51 +02:00
deploy-to-q6a.sh Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup 2026-05-02 14:17:27 +02:00
test-7b.sh Update with full NPU analysis and benchmarks 2026-05-02 12:42:42 +02:00
test-on-q6a.sh Add NPU offload results: offload_op, direct-compute, 10GB, Q8_0 4.3x prompt speedup 2026-05-02 14:17:27 +02:00