llama_cpp_for_radxa_dragon_.../tests
Oliver Simons 36f0132464
CUDA: Factor out and re-use block_reduce function (#18785)
* CUDA: Refactor and expose two_stage_warp_reduce_* function

* Use `two_stage_warp_reduce` also in softmax kernel, move smem out of it

Moving smem out of the `__device__` function and into the `__global__` function
allows for explicit smem reuse, as either the compiler or the CUDA runtime
seems not to free it afterwards (`cudaFuncSetAttribute` fails unless smem is
accounted for once per call to `two_stage_warp_reduce`)

* Update ggml/src/ggml-cuda/common.cuh

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Use two_stage_warp_reduce in group_norm_f32

* Use two_stage_warp_reduce in rms_norm_f32

* Fix smem calculation which expects bytes

* Make `two_stage_warp_reduce` accept all values warp_reduce accepts

Also integrate it into norm_f32 function

* Use two_stage_warp_reduce in l2_norm_f32

* Use type traits for block reduction for better legibility

Also address other requests by @am17an such as variable renaming

* Make norm tests cover all cuda paths

* Mark `columns % WARP_SIZE != 0` as supported for RMS_NORM_BACK

Unit-tests passed locally, let's see if they pass in the CI as well

* Use `enum class` for `block_reduce_method`

This is more type-safe than plain enum

* Rename variables as suggested in code review by @am17an

* Rename two_stage_warp_reduce -> block_reduce

* Fix trailing whitespace in common.cuh

* Make condition of static_assert type-dependent

This delays evaluation until the template is actually instantiated.
Otherwise, some compilers may evaluate the assert when parsing the
template, resulting in build errors as observed here:

https://github.com/ggml-org/llama.cpp/actions/runs/20960323123/job/60235530068?pr=18785

* Inline definitions
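
The `enum class`, type-trait, and type-dependent `static_assert` items above can be illustrated with a small host-side C++ sketch. This is not the code in `common.cuh`: the names `reduce_method`, `dependent_false`, `reduce_traits`, and `reduce_array` are hypothetical, and a plain loop stands in for the warp and block reduction. It only shows why an assert whose condition depends on a template parameter is checked when the template is instantiated rather than when it is parsed.

```cpp
#include <limits>

// Hypothetical names for illustration only; not the definitions in common.cuh.
// A scoped enum does not convert implicitly to int, unlike a plain enum.
enum class reduce_method { MAX, SUM };

// Always false, but the value depends on M, so the compiler can only
// evaluate it once a concrete M is supplied (i.e. on instantiation).
template <reduce_method M>
struct dependent_false {
    static constexpr bool value = false;
};

// Primary template: only reached for a method without a specialization.
// Writing static_assert(false, ...) here instead could be rejected by some
// compilers while parsing the template, even if it is never instantiated.
template <reduce_method M, typename T>
struct reduce_traits {
    static_assert(dependent_false<M>::value, "unsupported reduce_method");
};

// Traits for a sum reduction: neutral element and combine step.
template <typename T>
struct reduce_traits<reduce_method::SUM, T> {
    static T identity() { return T(0); }
    static T combine(T a, T b) { return a + b; }
};

// Traits for a max reduction.
template <typename T>
struct reduce_traits<reduce_method::MAX, T> {
    static T identity() { return std::numeric_limits<T>::lowest(); }
    static T combine(T a, T b) { return a > b ? a : b; }
};

// Sequential stand-in for the reduction: the traits select the operation
// at compile time, so a single function body serves every method.
template <reduce_method M, typename T>
T reduce_array(const T * x, int n) {
    T acc = reduce_traits<M, T>::identity();
    for (int i = 0; i < n; ++i) {
        acc = reduce_traits<M, T>::combine(acc, x[i]);
    }
    return acc;
}
```

With these definitions, `reduce_array<reduce_method::SUM>(v, n)` and `reduce_array<reduce_method::MAX>(v, n)` compile, while passing a plain integer as the method is rejected because a scoped enum has no implicit conversion from `int`.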

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-01-15 10:44:54 +08:00
peg-parser
.gitignore
CMakeLists.txt ci, tests : use cmake to download models and remove libcurl dependency (#18791) 2026-01-14 07:46:27 +01:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-alloc.cpp
test-arg-parser.cpp ci, tests : use cmake to download models and remove libcurl dependency (#18791) 2026-01-14 07:46:27 +01:00
test-autorelease.cpp
test-backend-ops.cpp CUDA: Factor out and re-use block_reduce function (#18785) 2026-01-15 10:44:54 +08:00
test-backend-sampler.cpp tests : refactor test-backend-sampler (#18753) 2026-01-11 17:31:03 +02:00
test-barrier.cpp
test-c.c
test-chat-parser.cpp
test-chat-peg-parser.cpp
test-chat-template.cpp
test-chat.cpp chat: make tool description and parameters optional per OpenAI spec (#18478) 2025-12-31 17:21:37 -06:00
test-double-float.cpp
test-gbnf-validator.cpp
test-gguf.cpp
test-grammar-integration.cpp
test-grammar-llguidance.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp
test-json-partial.cpp
test-json-schema-to-grammar.cpp
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-mtmd-c-api.c
test-opt.cpp
test-peg-parser.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp
test-quantize-stats.cpp
test-regex-partial.cpp common/grammar : replace problematic backtracking regex [\s\S]* (#18342) 2026-01-03 16:02:43 -06:00
test-rope.cpp
test-sampling.cpp
test-state-restore-fragmented.cpp
test-thread-safety.cpp
test-tokenizer-0.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-random.py
test-tokenizers-repo.sh