# Q6A Hexagon Guide — AGENTS.md

This repo documents how to get llama.cpp running with the Qualcomm Hexagon CDSP v68 backend on a Radxa Dragon Q6A board (SA8775P).

## Key Rules

1. **Do NOT call `FASTRPC_IOCTL_INIT_CREATE` manually.** Let libcdsprpc handle it when the HTP handle is opened (see `src/test_fastrpc_fixed.c`).
2. **Always link against the Q6A system libcdsprpc** (`/usr/lib/libcdsprpc.so.1`), not the SDK's cross-compiled version.
3. **Do NOT set `CMAKE_SYSROOT`** when cross-compiling; it conflicts with Ubuntu's cross-compiler linker scripts.
4. **Use `rpcmem_alloc` for DSP compute buffers.** Stack arrays work only for tiny buffers (~4 KB), and even then they go through a fragile, slow path.
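
Rule 2 can be verified on the board after deploying. The following is a hypothetical helper, not part of this repo: `uses_system_cdsprpc` reads `ldd` output on stdin and succeeds only if libcdsprpc resolves to `/usr/lib/libcdsprpc.so.1`. It assumes glibc's usual `ldd` line format (`name => path (address)`); the binary path in the usage comment is also an assumption.

```bash
# Hypothetical check for rule 2: succeed only if libcdsprpc resolves to the
# Q6A system library. Example (binary path is an assumption):
#   ldd ./build/bin/llama-cli | uses_system_cdsprpc && echo "libcdsprpc OK"

SYSTEM_CDSPRPC=/usr/lib/libcdsprpc.so.1

uses_system_cdsprpc() {
    local line path
    # Expected ldd line: "libcdsprpc.so.1 => /usr/lib/libcdsprpc.so.1 (0x...)"
    line=$(grep -m1 'libcdsprpc' || true)
    if [ -z "$line" ]; then
        echo "libcdsprpc is not linked" >&2
        return 1
    fi
    # The resolved path is the third whitespace-separated field.
    path=$(printf '%s\n' "$line" | awk '{ print $3 }')
    if [ "$path" != "$SYSTEM_CDSPRPC" ]; then
        echo "libcdsprpc resolves to '$path', expected $SYSTEM_CDSPRPC" >&2
        return 1
    fi
    return 0
}
```

Checking the resolved path, rather than only the library name, also catches an SDK copy picked up through `LD_LIBRARY_PATH` at run time, since `ldd` honours that variable.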
## Build Command

```bash
cd ~/llama.cpp
bash scripts/build-hexagon.sh
```
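To catch a stray sysroot (rule 3) after a build, a small check like the following can help. It is a sketch, not something `scripts/build-hexagon.sh` is known to do: `check_no_sysroot` is a hypothetical helper that greps the given files for a non-empty `CMAKE_SYSROOT` cache entry or an uncommented `set(CMAKE_SYSROOT ...)` call. Pass it the build's `CMakeCache.txt` and, if one is used, the toolchain file; a sysroot set inside a toolchain file is an ordinary variable and does not appear in the cache. The `build` directory name in the usage comment is assumed.

```bash
# Hypothetical post-build check for rule 3. Example (paths are assumptions):
#   check_no_sysroot ~/llama.cpp/build/CMakeCache.txt
# Returns 0 if clean, 1 if CMAKE_SYSROOT is set, 2 on usage error or missing file.
check_no_sysroot() {
    local f found=0 missing=0
    if [ "$#" -eq 0 ]; then
        echo "usage: check_no_sysroot FILE..." >&2
        return 2
    fi
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "no such file: $f" >&2
            missing=1
            continue
        fi
        # Match a cache entry with a value (CMAKE_SYSROOT:TYPE=value) or an
        # uncommented set(CMAKE_SYSROOT ...) call in a CMake/toolchain file.
        if grep -Eiq '^CMAKE_SYSROOT(:[A-Z]+)?=[^[:space:]]|^[[:space:]]*set[[:space:]]*\([[:space:]]*CMAKE_SYSROOT[[:space:]]' "$f"; then
            echo "CMAKE_SYSROOT is set in $f" >&2
            found=1
        fi
    done
    if [ "$found" -eq 1 ]; then return 1; fi
    if [ "$missing" -eq 1 ]; then return 2; fi
    return 0
}
```

Exit status 2 is kept separate from 1 so that a missing cache (for example, a build directory with a different name) is not mistaken for a clean result.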
## Deploy Command

```bash
Q6A=radxa@192.168.1.11 bash scripts/deploy-to-q6a.sh
```
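The deploy step takes its target from the `Q6A` environment variable in `user@host` form. A guard along these lines, run before any `ssh`/`scp` step, fails early on a missing or malformed value. Both functions are hypothetical and are not taken from `scripts/deploy-to-q6a.sh`.

```bash
# Hypothetical guard for the deploy step: Q6A must look like user@host.
#   require_q6a_target && ssh -o BatchMode=yes "$Q6A" true
require_q6a_target() {
    case "${Q6A:-}" in
        ?*@?*) return 0 ;;
        *)
            echo "set Q6A=user@host, e.g. Q6A=radxa@192.168.1.11" >&2
            return 1
            ;;
    esac
}

# Host part of the target (text after the first '@'), e.g. for a ping check.
q6a_host() {
    printf '%s\n' "${Q6A#*@}"
}
```

`q6a_host` strips everything up to the first `@`, so `radxa@192.168.1.11` yields `192.168.1.11`.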
## Test Command

```bash
bash scripts/test-on-q6a.sh
```
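To compare a test run with the performance baseline, the tokens-per-second figures can be pulled out of the log. This sketch assumes the run ends with llama.cpp's timing summary, whose lines look roughly like `prompt eval time = ... ms / N tokens (... ms per token, X tokens per second)` and `eval time = ... ms / N runs (...)`; the wording may differ between llama.cpp versions, so treat the patterns as assumptions. `extract_tps` is a hypothetical helper, and the summary is assumed to go to stderr, hence `2>&1` in the usage comment.

```bash
# Hypothetical helper: print the tokens-per-second value from llama.cpp's
# timing summary. KIND is "prompt" (prompt eval) or "eval" (generation).
#   bash scripts/test-on-q6a.sh 2>&1 | tee run.log
#   extract_tps prompt < run.log
#   extract_tps eval   < run.log
extract_tps() {
    awk -v kind="$1" '
        /tokens per second/ && /eval time/ {
            is_prompt = ($0 ~ /prompt eval time/)
            if ((kind == "prompt" && is_prompt) || (kind == "eval" && !is_prompt)) {
                # Grab the number immediately before "tokens per second".
                if (match($0, /[0-9.]+ tokens per second/)) {
                    split(substr($0, RSTART, RLENGTH), parts, " ")
                    print parts[1]
                    exit
                }
            }
        }'
}
```

Matching on the number right before `tokens per second` keeps the parser independent of the space padding llama.cpp uses to align its columns.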
## File Reference

- `src/test_fastrpc_fixed.c` — Correct init sequence (reference for how to open HTP handles)
- `src/htp_minimal_impl.c` — Minimal DSP stub for testing; the full library also works, so the stub is optional
- `scripts/build-hexagon.sh` — llama.cpp CMake build for aarch64 + Hexagon
- `scripts/deploy-to-q6a.sh` — Deploy to the Q6A
- `scripts/test-on-q6a.sh` — Run an inference test on the Q6A
- `references/fastrpc.h` — FastRPC ioctl definitions from the Q6A kernel
- `README.md` — Full guide with troubleshooting
## Performance Baseline

- Prompt processing: ~32 t/s (on 8 CPU cores)
- Generation: ~4.5 t/s
- Model: llama-3.2-1b-q4km.gguf (1B params, Q4_K_M)
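
These numbers can serve as a simple regression threshold. In the sketch below the baseline values come from the list above, while the 10% tolerance and the helper names are assumptions; the measured rates can come from any test log.

```bash
# Baseline figures from the list above; TOLERANCE is an assumed 10% margin.
BASELINE_PROMPT_TPS=32
BASELINE_GEN_TPS=4.5
TOLERANCE=0.10

# below_baseline MEASURED BASELINE: succeed if MEASURED < BASELINE * (1 - TOLERANCE).
below_baseline() {
    awk -v m="$1" -v b="$2" -v t="$TOLERANCE" \
        'BEGIN { exit !((m + 0) < (b + 0) * (1 - t)) }'
}

# check_run PROMPT_TPS GEN_TPS: report any rate that fell below the baseline.
check_run() {
    local rc=0
    if below_baseline "$1" "$BASELINE_PROMPT_TPS"; then
        echo "prompt processing regressed: $1 t/s (baseline ~$BASELINE_PROMPT_TPS t/s)" >&2
        rc=1
    fi
    if below_baseline "$2" "$BASELINE_GEN_TPS"; then
        echo "generation regressed: $2 t/s (baseline ~$BASELINE_GEN_TPS t/s)" >&2
        rc=1
    fi
    return "$rc"
}
```

For example, `check_run 30.1 4.4` passes, while `check_run 30.1 3.9` reports a generation regression, because 3.9 is below 4.5 * 0.9 = 4.05.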