pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

Jeff Bolz f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627) When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.	2025-04-02 14:25:08 -05:00
.gitignore
CMakeLists.txt
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-arg-parser.cpp	common : refactor downloading system, handle mmproj with -hf option (#12694)	2025-04-01 23:44:05 +02:00
test-autorelease.cpp
test-backend-ops.cpp	vulkan: Implement split_k for coopmat2 flash attention. (#12627)	2025-04-02 14:25:08 -05:00
test-barrier.cpp
test-c.c
test-chat-template.cpp	llama-chat : Add Yandex instruct model template support (#12621)	2025-03-30 20:12:03 +02:00
test-chat.cpp	`server`: extract <think> tags from qwq outputs (#12297)	2025-03-10 10:59:03 +00:00
test-double-float.cpp
test-gguf.cpp
test-grammar-integration.cpp
test-grammar-llguidance.cpp	upgrade to llguidance 0.7.10 (#12576)	2025-03-26 11:06:09 -07:00
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)	2025-03-05 13:05:13 +00:00
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-opt.cpp
test-quantize-fns.cpp	tests : fix test-quantize-fns to init the CPU backend (#12306)	2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp
test-rope.cpp
test-sampling.cpp
test-tokenizer-0.cpp
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py