llama_cpp_for_radxa_dragon_.../common
Yes You Can Have Your Own 50e0ad08fb
server: save and clear idle slots on new task (--clear-idle) (#20993)
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

* server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

* server: add --kv-clear-idle flag, enable by default

* server: skip clearing last idle slot, clear on launch

* server: test --no-kv-clear-idle flag

* server: simplify on-release clearing loop

* server: remove on-release KV clearing, keep launch-only

* cont : clean-up

* tests: update log strings after --clear-idle rename

* tests: use debug tags instead of log message matching

* test: fix Windows CI by dropping temp log file unlink

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-03 19:02:27 +02:00
jinja jinja: coerce input for string-specific filters (#21370) 2026-04-03 15:03:33 +02:00
arg.cpp server: save and clear idle slots on new task (--clear-idle) (#20993) 2026-04-03 19:02:27 +02:00
arg.h
base64.hpp
build-info.cpp.in
chat-auto-parser-generator.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-auto-parser-helpers.cpp common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) 2026-03-23 22:21:47 -05:00
chat-auto-parser-helpers.h chat : avoid including json in chat.h (#21306) 2026-04-03 09:07:59 +03:00
chat-auto-parser.h common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-diff-analyzer.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-peg-parser.cpp fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00
chat-peg-parser.h fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00
chat.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat.h common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
CMakeLists.txt common : add standard Hugging Face cache support (#20775) 2026-03-24 07:30:33 +01:00
common.cpp tests: allow exporting graph ops from HF file without downloading weights (#21182) 2026-04-02 18:19:20 +02:00
common.h server: save and clear idle slots on new task (--clear-idle) (#20993) 2026-04-03 19:02:27 +02:00
console.cpp
console.h
debug.cpp
debug.h
download.cpp common : cleanup logs and modernize the progress bar (#21215) 2026-03-31 16:18:00 +02:00
download.h common : add standard Hugging Face cache support (#20775) 2026-03-24 07:30:33 +01:00
hf-cache.cpp common : add getpwuid fallback for HF cache when HOME is not set (#21035) 2026-03-26 20:34:23 +01:00
hf-cache.h common : fix split model migration (#21019) 2026-03-26 12:04:37 +01:00
http.h
json-partial.cpp
json-partial.h
json-schema-to-grammar.cpp common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124) 2026-03-28 17:55:38 +01:00
json-schema-to-grammar.h
llguidance.cpp
log.cpp
log.h
ngram-cache.cpp
ngram-cache.h
ngram-map.cpp
ngram-map.h fix: correct misspellings in code comments (#21217) 2026-03-31 13:50:51 +02:00
ngram-mod.cpp
ngram-mod.h
peg-parser.cpp common : fix tool call type detection for nullable and enum schemas (#21327) 2026-04-03 17:51:23 +02:00
peg-parser.h
preset.cpp
preset.h
reasoning-budget.cpp common : inhibit lazy grammar sampler while reasoning is active (#20970) 2026-03-27 18:30:40 +01:00
reasoning-budget.h common : inhibit lazy grammar sampler while reasoning is active (#20970) 2026-03-27 18:30:40 +01:00
regex-partial.cpp common : fix iterator::end() dereference (#20445) 2026-03-16 08:50:38 +02:00
regex-partial.h
sampling.cpp common : Disable backend sampling if reasoning budget is enabled (#21209) 2026-03-31 10:14:01 +03:00
sampling.h
speculative.cpp
speculative.h
unicode.cpp
unicode.h