llama_cpp_for_radxa_dragon_.../tests
Gaurav Garg 517b5ddbf0
CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183)
- Find the number of active blocks per SM using the cudaOccupancyMaxActiveBlocksPerMultiprocessor API, and use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over the MMA kernel for BS=1

Fixes Issue: #12182
---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-03-19 20:52:06 +01:00
.gitignore
CMakeLists.txt sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-arg-parser.cpp
test-autorelease.cpp
test-backend-ops.cpp CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183) 2025-03-19 20:52:06 +01:00
test-barrier.cpp
test-c.c
test-chat-template.cpp tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) 2025-02-18 18:03:23 +00:00
test-chat.cpp server: extract <think> tags from qwq outputs (#12297) 2025-03-10 10:59:03 +00:00
test-double-float.cpp
test-gguf.cpp cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
test-grammar-integration.cpp sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
test-grammar-llguidance.cpp sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) 2025-03-05 13:05:13 +00:00
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-opt.cpp
test-quantize-fns.cpp tests : fix test-quantize-fns to init the CPU backend (#12306) 2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp
test-rope.cpp
test-sampling.cpp sampling: add Top-nσ sampler (#11223) 2025-02-13 08:45:57 +02:00
test-tokenizer-0.cpp
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py