llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

Piotr Wilkin (ilintar) 96fe9badfc Add support for CUMSUM and TRI for CUDA. (#17584)	2025-12-04 22:19:51 +01:00

* Add support for CUMSUM and TRI for CUDA.
* Minor optimizations.
* Correct warp_prefix_inclusive_sum in float2 variant to return float2
* Optimize TRI
* Whitespace
* Fix strides.
* Implement double loop
* Whitespace
* Fix HIP compilation bugs
* Optimizations + big case performance tests
* Implement using CUB with fallback to custom kernel
* Remove error message.
* Fixes from code review
* Comment out CPU-unsupported F16/BF16 cases to fix CI
* Fine, you win :P
* Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS
* Vary warp-size based on physical warp size
* Add GGML_UNUSED_VARS in tri as well
* Use constexpr and call prefix_inclusive with warp_size template param
* Update ggml/src/ggml-cuda/cumsum.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Apply suggestions from code review Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Change to tid % warp_size
* Fix strides; hardcode mask; add ggml_lane_mask_t
* Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()
* Too hasty...

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

peg-parser	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
.gitignore	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
CMakeLists.txt	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-alloc.cpp
test-arg-parser.cpp
test-autorelease.cpp
test-backend-ops.cpp	Add support for CUMSUM and TRI for CUDA. (#17584)	2025-12-04 22:19:51 +01:00
test-barrier.cpp
test-c.c
test-chat-parser.cpp
test-chat-peg-parser.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
test-chat-template.cpp
test-chat.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
test-double-float.cpp
test-gbnf-validator.cpp
test-gguf.cpp
test-grammar-integration.cpp
test-grammar-llguidance.cpp
test-grammar-parser.cpp
test-json-partial.cpp
test-json-schema-to-grammar.cpp	Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572)	2025-12-02 17:33:50 +01:00
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-mtmd-c-api.c
test-opt.cpp
test-peg-parser.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
test-quantize-fns.cpp
test-quantize-perf.cpp
test-quantize-stats.cpp	server: introduce API for serving / loading / unloading multiple models (#17470)	2025-12-01 19:41:04 +01:00
test-regex-partial.cpp
test-rope.cpp	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)	2025-11-11 13:33:24 +02:00
test-sampling.cpp
test-thread-safety.cpp	server : support unified cache across slots (#16736)	2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py
test-tokenizers-repo.sh