llama_cpp_for_radxa_dragon_wing_q6a

Gabe Goodhart c08002a198 chat : Granite Docling stopping (#16438)		2025-10-06 18:59:40 +02:00

* fix: Fix duplicate fake image before token on first slice
Branch: GraniteDoclingStopping
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Use double-newline before overview image
Branch: GraniteDoclingStopping
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove incorrect newline at the end of granite chat template gen prompt
There should not be one, even for the language models.
Branch: GraniteDoclingStopping
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* tests: Remove bad newline from granite chat template test (legacy)
Branch: GraniteDoclingStopping
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

..
batched-bench
cvector-generator
export-lora
gguf-split
imatrix
llama-bench	rpc : add support for multiple devices (#16276)	2025-10-04 12:49:16 +03:00
main
mtmd	chat : Granite Docling stopping (#16438)	2025-10-06 18:59:40 +02:00
perplexity
quantize
rpc	rpc : add support for multiple devices (#16276)	2025-10-04 12:49:16 +03:00
run
server	server: update readme to mention n_past_max metric (#16436)	2025-10-06 10:53:31 +03:00
tokenize
tts
CMakeLists.txt