- offload_op callback now implemented (MUL_MAT/MUL_MAT_ID)
- Memory raised to 10 GiB
- Direct compute mode bypasses the broken dspqueue on this board
- Q8_0 1B model: 115 t/s prompt (4.3x vs. CPU at 27 t/s)
- Generation: 9.6 t/s (27% slower than CPU, expected)
- dspqueue path fails with error 0x0000002e
- llama-cli renamed to llama-simple in current build
- Updated scripts for direct-compute mode
- Docs updated with new findings and instructions

| File | | |
|---|---|---|
| build-hexagon.sh | ||
| deploy-to-q6a.sh | ||
| test-7b.sh | ||
| test-on-q6a.sh | ||
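
The throughput figures in the commit notes can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and it assumes "27% slower" means the offloaded generation rate equals the CPU rate times (1 - 0.27). The CPU generation rate it derives is implied by that assumption, not a measured value from the notes.

```python
# Figures quoted in the commit notes (tokens per second).
HEXAGON_PROMPT_TPS = 115.0   # prompt processing with offload
CPU_PROMPT_TPS = 27.0        # prompt processing on CPU
HEXAGON_GEN_TPS = 9.6        # generation with offload
GEN_SLOWDOWN = 0.27          # "27% slower than CPU"


def speedup(offload_tps: float, cpu_tps: float) -> float:
    """Ratio of offloaded throughput to CPU throughput."""
    return offload_tps / cpu_tps


def implied_cpu_tps(offload_tps: float, slowdown: float) -> float:
    """CPU rate implied by a relative slowdown.

    Assumption: offload_tps = cpu_tps * (1 - slowdown).
    """
    return offload_tps / (1.0 - slowdown)


prompt_speedup = speedup(HEXAGON_PROMPT_TPS, CPU_PROMPT_TPS)
cpu_gen_tps = implied_cpu_tps(HEXAGON_GEN_TPS, GEN_SLOWDOWN)

print(f"prompt speedup: {prompt_speedup:.1f}x")
print(f"implied CPU generation rate: {cpu_gen_tps:.2f} t/s")
```

Under that reading, the prompt speedup rounds to the 4.3x quoted in the notes.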