llama_cpp_for_radxa_dragon_.../examples
Ivy233 02082f1519
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)
* [Fix] When clip-quantize-cli is compiled and run in a CUDA environment, ggml_fp16_to_fp32 reports an error because it tries to access video memory; quantization has to run on the CPU backend.
After the fix, quantization automatically runs on the CPU backend and is no longer bound to CUDA.

* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to use clip_init.
2025-03-26 15:06:04 +01:00
batched
batched-bench
batched.swift
convert-llama2c-to-ggml
cvector-generator
deprecation-warning
embedding
eval-callback
export-lora
gbnf-validator
gen-docs
gguf
gguf-hash
gguf-split
gritlm
imatrix
infill
jeopardy
llama-bench
llama.android
llama.swiftui
llava | clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566) | 2025-03-26 15:06:04 +01:00
lookahead
lookup
main
parallel
passkey
perplexity
quantize
quantize-stats
retrieval
rpc
run | run: de-duplicate fmt and format functions and optimize (#11596) | 2025-03-25 18:46:11 +01:00
save-load-state
server | server : Add verbose output to OAI compatible chat endpoint. (#12246) | 2025-03-23 19:30:26 +01:00
simple
simple-chat
simple-cmake-pkg
speculative
speculative-simple
sycl
tokenize
tts
chat-13B.bat
chat-13B.sh
chat-persistent.sh
chat-vicuna.sh
chat.sh
CMakeLists.txt
convert_legacy_llama.py
json_schema_pydantic_example.py
json_schema_to_grammar.py
llama.vim
llm.vim
Miku.sh
pydantic_models_to_grammar.py
pydantic_models_to_grammar_examples.py
reason-act.sh
regex_to_grammar.py
server-llama2-13B.sh
server_embd.py
ts-type-to-grammar.sh