llama_cpp_for_radxa_dragon_wing_q6a

ddh0 f6dcda3900 server : context checkpointing for hybrid and recurrent models (#16382)	2025-10-03 21:34:51 +03:00

* initial commit for branch 3
* generalize `swa_checkpoint` to `ctx_checkpoint`
  this extends `llama-server`'s SWA checkpointing logic to include hybrid/recurrent models such as Jamba, Granite
* oops
* disable debug prints
* keep backwards compat with `--swa-checkpoints`
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update prompt re-processing message
* fix off-by-one error per GG
* keep `seq_rm` log per GG
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server : fix checkpoint logic to support recurrent caches
* server : cleanup and fixes
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

..
batched-bench
cvector-generator
export-lora
gguf-split
imatrix
llama-bench
main
mtmd
perplexity
quantize
rpc
run	common: introduce http.h for httplib-based client (#16373)	2025-10-01 20:22:18 +03:00
server	server : context checkpointing for hybrid and recurrent models (#16382)	2025-10-03 21:34:51 +03:00
tokenize
tts	model : Apertus model implementation (#15852)	2025-10-02 20:43:22 +03:00
CMakeLists.txt