llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

Jeff Bolz 716bd6dec3 vulkan: optimize mul_mat for small values of N (#10991)	2024-12-30 18:27:11 +01:00
Make the mul_mat_vec shaders support N>1 (as a spec constant, NUM_COLS) where the batch_strides are overloaded to hold the row strides. Put the loads from the B matrix in the innermost loop because it should cache better.
Share some code for reducing the result values to memory in mul_mat_vec_base.
.gitignore
CMakeLists.txt	tests: add tests for GGUF (#10830)	2024-12-17 19:09:35 +01:00
get-model.cpp
get-model.h
run-json-schema-to-grammar.mjs	server : revamp chat UI with vuejs and daisyui (#10175)	2024-11-07 17:31:10 -04:00
test-arg-parser.cpp	speculative : refactor and add a simpler example (#10362)	2024-11-25 09:58:41 +02:00
test-autorelease.cpp
test-backend-ops.cpp	vulkan: optimize mul_mat for small values of N (#10991)	2024-12-30 18:27:11 +01:00
test-barrier.cpp	ggml : move CPU backend to a separate file (#10144)	2024-11-03 19:34:08 +01:00
test-c.c
test-chat-template.cpp	llama : support InfiniAI Megrez 3b (#10893)	2024-12-23 01:35:44 +01:00
test-double-float.cpp
test-gguf.cpp	tests: disable GGUF test for bad value size (#10886)	2024-12-19 08:53:58 +01:00
test-grammar-integration.cpp	llama : minor grammar refactor (#10897)	2024-12-19 17:42:13 +02:00
test-grammar-parser.cpp	llama : refactor sampling v2 (#9294)	2024-09-07 15:16:19 +03:00
test-json-schema-to-grammar.cpp	grammar : fix JSON Schema for string regex with top-level alt. (#9903)	2024-10-16 19:03:24 +03:00
test-llama-grammar.cpp	llama : minor grammar refactor (#10897)	2024-12-19 17:42:13 +02:00
test-log.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh	Fix HF repo commit to clone lora test models (#10649)	2024-12-04 10:45:48 +01:00
test-model-load-cancel.cpp
test-opt.cpp	ggml : inttypes.h -> cinttypes (#0)	2024-11-17 08:30:29 +02:00
test-quantize-fns.cpp	tests : fix compile warning	2024-11-25 15:17:32 +02:00
test-quantize-perf.cpp	ggml : inttypes.h -> cinttypes (#0)	2024-11-17 08:30:29 +02:00
test-rope.cpp	llama : add Qwen2VL support + multimodal RoPE (#10361)	2024-12-14 14:43:46 +02:00
test-sampling.cpp	sampling : refactor + optimize penalties sampler (#10803)	2024-12-16 12:31:14 +02:00
test-tokenizer-0.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-tokenizer-0.py
test-tokenizer-0.sh
test-tokenizer-1-bpe.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-tokenizer-1-spm.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-tokenizer-random.py