llama_cpp_for_radxa_dragon_wing_q6a

Georgi Gerganov 080b161995 completion : fix prompt cache for recurrent models (#19045)		2026-01-25 09:12:50 +02:00
batched-bench	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
cli	common : use two decimal places for float arg help messages (#19048)	2026-01-25 07:31:42 +01:00
completion	completion : fix prompt cache for recurrent models (#19045)	2026-01-25 09:12:50 +02:00
cvector-generator	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
export-lora
fit-params	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
gguf-split
imatrix	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
llama-bench	Setting mmap and direct_io to false as default in llama-bench.cpp (#18841)	2026-01-16 09:46:51 +01:00
mtmd	mtmd : update docs to use llama_model_n_embd_inp (#18999)	2026-01-22 14:36:32 +01:00
perplexity	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
quantize	quantize: prevent input/output file collision (#18451)	2025-12-31 23:29:03 +08:00
rpc
server	common : use two decimal places for float arg help messages (#19048)	2026-01-25 07:31:42 +01:00
tokenize
tts	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
CMakeLists.txt	cmake: only build cli when server is enabled (#18670)	2026-01-09 16:43:26 +01:00