llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

History

Daniel Bevenius 657b8a77bd chat: handle gpt-oss return/end token inconsistency (#15421 ) This commit addresses an inconsistency during inference by adding a new member to the `templates_params` struct to indicate whether the chat is in inference mode. This allows the gpt-oss specific function `common_chat_params_init_gpt_oss` to check this flag and the `add_generation_prompt` flag to determine if it should replace the `<\|return\|>` token with the `<\|end\|>` token in the prompt. The motivation for this change is to ensure that the formatted prompt of past messages in `common_chat_format_single` matches the output of the formatted new message. The issue is that the gpt-oss template returns different end tags: `<\|return\|>` when `add_generation_prompt` is false, and `<\|end\|>` when `add_generation_prompt` is true. This causes the substring function to start at an incorrect position, resulting in tokenization starting with 'tart\|>' instead of '<\|start\|>'. Resolves: https://github.com/ggml-org/llama.cpp/issues/15417		2025-08-20 14:26:01 +02:00
..
arg.cpp	common : fix context shift help message (#15448 )	2025-08-20 13:33:30 +03:00
arg.h
base64.hpp
build-info.cpp.in
chat-parser.cpp	chat : support Granite model reasoning and tool call (#14864 )	2025-08-06 20:27:30 +02:00
chat-parser.h
chat.cpp	chat: handle gpt-oss return/end token inconsistency (#15421 )	2025-08-20 14:26:01 +02:00
chat.h	chat : include kwargs in template example (#15309 )	2025-08-14 10:28:29 -07:00
CMakeLists.txt
common.cpp	finetune: SGD optimizer, more CLI args (#13873 )	2025-08-14 12:03:57 +02:00
common.h	common : fix context shift help message (#15448 )	2025-08-20 13:33:30 +03:00
console.cpp
console.h
json-partial.cpp
json-partial.h
json-schema-to-grammar.cpp
json-schema-to-grammar.h
llguidance.cpp
log.cpp
log.h
ngram-cache.cpp
ngram-cache.h
regex-partial.cpp
regex-partial.h
sampling.cpp
sampling.h
speculative.cpp	server : implement universal assisted decoding (#12635 )	2025-07-31 14:25:23 +02:00
speculative.h	server : implement universal assisted decoding (#12635 )	2025-07-31 14:25:23 +02:00