llama_cpp_for_radxa_dragon_wing_q6a

Johannes Gäßler 5fa07c2f93 CUDA: optimize FA for GQA + large batches (#12014)	2025-02-22 12:20:17 +01:00
.gitignore
CMakeLists.txt
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs
test-arg-parser.cpp
test-autorelease.cpp
test-backend-ops.cpp	CUDA: optimize FA for GQA + large batches (#12014)	2025-02-22 12:20:17 +01:00
test-barrier.cpp
test-c.c
test-chat-template.cpp	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)	2025-02-18 18:03:23 +00:00
test-chat.cpp	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)	2025-02-18 18:03:23 +00:00
test-double-float.cpp
test-gguf.cpp
test-grammar-integration.cpp
test-grammar-llguidance.cpp
test-grammar-parser.cpp
test-json-schema-to-grammar.cpp
test-llama-grammar.cpp
test-log.cpp
test-lora-conversion-inference.sh
test-model-load-cancel.cpp
test-opt.cpp
test-quantize-fns.cpp
test-quantize-perf.cpp
test-rope.cpp
test-sampling.cpp
test-tokenizer-0.cpp
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp
test-tokenizer-1-spm.cpp
test-tokenizer-random.py