llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

History

Ivy233 02082f1519 clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566 ) * [Fix] Compiling clip-quantize-cli and running it in a CUDA environment will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize. After the fix, it will automatically run in the CPU backend and will no longer be bound to CUDA. * [Fix]Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.		2025-03-26 15:06:04 +01:00
..
batched
batched-bench	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
batched.swift	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
convert-llama2c-to-ggml
cvector-generator	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
deprecation-warning
embedding	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
eval-callback
export-lora	common : refactor '-o' option (#12278 )	2025-03-10 13:34:13 +02:00
gbnf-validator
gen-docs
gguf
gguf-hash
gguf-split
gritlm	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
imatrix	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
infill	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
jeopardy
llama-bench	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
llama.android	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
llama.swiftui	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
llava	clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566 )	2025-03-26 15:06:04 +01:00
lookahead	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
lookup	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
main	docs : bring llama-cli conversation/template docs up-to-date (#12426 )	2025-03-17 21:14:32 +01:00
parallel	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
passkey	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
perplexity	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
quantize	ggml : portability fixes for VS 2017 (#12150 )	2025-03-04 18:53:26 +02:00
quantize-stats	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
retrieval	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
rpc
run	run: de-duplicate fmt and format functions and optimize (#11596 )	2025-03-25 18:46:11 +01:00
save-load-state	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
server	server : Add verbose output to OAI compatible chat endpoint. (#12246 )	2025-03-23 19:30:26 +01:00
simple
simple-chat	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
simple-cmake-pkg
speculative	speculative : fix seg fault in certain cases (#12454 )	2025-03-18 19:35:11 +02:00
speculative-simple	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 )	2025-03-13 12:35:44 +02:00
sycl	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035 )	2025-02-24 22:33:23 +08:00
tokenize
tts	llama-tts : avoid crashes related to bad model file paths (#12482 )	2025-03-21 11:12:45 +02:00
chat-13B.bat
chat-13B.sh
chat-persistent.sh
chat-vicuna.sh
chat.sh
CMakeLists.txt
convert_legacy_llama.py
json_schema_pydantic_example.py
json_schema_to_grammar.py	`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034 )	2025-03-05 13:05:13 +00:00
llama.vim
llm.vim
Miku.sh
pydantic_models_to_grammar.py
pydantic_models_to_grammar_examples.py
reason-act.sh
regex_to_grammar.py
server-llama2-13B.sh
server_embd.py
ts-type-to-grammar.sh