llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

History

Aman Gupta 077c94d0ca CUDA: add a fused top-K MoE kernel (#16130 ) * CUDA: add a fused top-K MoE kernel This kernel does the following: 1. softmax over the logits per token [n_experts, n_tokens] 2. argmax reduce over the top-k (n_experts_used) logits 3. write weights + ids to global memory It is intended as fusion of softmax->top-k->get_rows pipeline for MoE models * Refactor into ggml_cuda_should_use_topk_moe * Review: Use better coalescing pattern, use WARP_SIZE, store logits into registers before * Review: format + micro-optimizations * Fix bug: fix tie breakers * Add optional norm + clean-up code * Use smem for final write * Add bounds check * Use better memory pattern for writeback		2025-09-25 16:35:05 +02:00
..
.gitignore
CMakeLists.txt	ggml : split graph allocations according to backend max buffer size (#15815 )	2025-09-24 16:17:49 +02:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-alloc.cpp	ggml : split graph allocations according to backend max buffer size (#15815 )	2025-09-24 16:17:49 +02:00
test-arg-parser.cpp
test-autorelease.cpp
test-backend-ops.cpp	CUDA: add a fused top-K MoE kernel (#16130 )	2025-09-25 16:35:05 +02:00
test-barrier.cpp
test-c.c
test-chat-parser.cpp
test-chat-template.cpp
test-chat.cpp
test-double-float.cpp
test-gbnf-validator.cpp
test-gguf.cpp
test-grammar-integration.cpp
test-grammar-llguidance.cpp
test-grammar-parser.cpp
test-json-partial.cpp
test-json-schema-to-grammar.cpp
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-mtmd-c-api.c
test-opt.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp	ci: run the x64 and arm ci on the github machines instead (#16183 )	2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp
test-regex-partial.cpp
test-rope.cpp
test-sampling.cpp
test-thread-safety.cpp
test-tokenizer-0.cpp
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py
test-tokenizers-repo.sh