llama_cpp_for_radxa_dragon_.../src
JJJYmmm (commit d261223d24)
model: add support for qwen3vl series (#16780)
* support qwen3vl series.

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>

* bugfix: fix the arch check for qwen3vl-moe.

* use build_ffn

* optimize deepstack structure (see the deepstack sketch below)

* optimize deepstack feature saving

* Revert "optimize deepstack feature saving" as a temporary fix

This reverts commit f321b9fdf13e59527408152e73b1071e19a87e71.

* code cleanup

* use fused qkv in clip (see the fused-QKV sketch below)

* clean up / remove is_deepstack_layers for simplification

* add test model

* move test model to "big" section

* fix imrope check

* remove trailing whitespace

* fix rope failure

* metal : add imrope support (interleaved M-RoPE; see the sketch below)

* add imrope support for sycl

* vulkan: add imrope w/o check

* fix vulkan

* webgpu: add imrope w/o check

* Update gguf-py/gguf/tensor_mapping.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix tensor mapping

---------

Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com>
Co-authored-by: yairpatch <yairpatch@users.noreply.github.com>
Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-30 16:19:14 +01:00
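
For context on the deepstack bullets above: in the qwen3vl series, visual features from a few intermediate vision-encoder layers are kept alongside the final visual features and merged into the hidden states of the earliest language-model layers. The following is a minimal C++ sketch of that shape only; names like DeepstackFeatures, decoder_layer and run_decoder are hypothetical, and the real graph-building code lives in llama-model.cpp and the clip implementation.

```cpp
#include <cstddef>
#include <vector>

using Hidden = std::vector<float>;

// hypothetical container: features captured from intermediate ViT layers
struct DeepstackFeatures {
    std::vector<Hidden> per_level; // one entry per deepstack level
};

// stand-in for a real transformer decoder layer
static Hidden decoder_layer(const Hidden & in, int /*il*/) { return in; }

// h += extra (the real model touches only the image-token positions)
static void add_inplace(Hidden & h, const Hidden & extra) {
    for (size_t i = 0; i < h.size() && i < extra.size(); ++i) h[i] += extra[i];
}

static Hidden run_decoder(Hidden h, int n_layers, const DeepstackFeatures & ds) {
    for (int il = 0; il < n_layers; ++il) {
        h = decoder_layer(h, il);
        // merge deepstack features into the earliest decoder layers only
        if (il < (int) ds.per_level.size()) {
            add_inplace(h, ds.per_level[il]);
        }
    }
    return h;
}

int main() {
    DeepstackFeatures ds;
    ds.per_level = { Hidden(8, 0.1f), Hidden(8, 0.2f), Hidden(8, 0.3f) };
    Hidden out = run_decoder(Hidden(8, 1.0f), /*n_layers=*/ 24, ds);
    (void) out;
    return 0;
}
```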
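The "use fused qkv in clip" step is a standard graph optimization: one matmul against a packed QKV weight replaces three separate Q/K/V matmuls, and zero-copy views split the result. A minimal ggml sketch of the pattern, assuming illustrative names (w_qkv, d_model, n_tokens) rather than the actual clip.cpp symbols:

```cpp
#include "ggml.h"

int main() {
    struct ggml_init_params params = { /*mem_size=*/ 16*1024*1024, /*mem_buffer=*/ NULL, /*no_alloc=*/ false };
    struct ggml_context * ctx = ggml_init(params);

    const int64_t d_model = 8, n_tokens = 4;
    struct ggml_tensor * x     = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, d_model, n_tokens);
    struct ggml_tensor * w_qkv = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, d_model, 3*d_model);

    // one matmul yields Q, K and V packed along dim 0: [3*d_model, n_tokens]
    struct ggml_tensor * qkv = ggml_mul_mat(ctx, w_qkv, x);

    // zero-copy views carve the packed result into the three pieces
    const size_t d_bytes = d_model * ggml_element_size(qkv);
    struct ggml_tensor * q = ggml_view_2d(ctx, qkv, d_model, n_tokens, qkv->nb[1], 0*d_bytes);
    struct ggml_tensor * k = ggml_view_2d(ctx, qkv, d_model, n_tokens, qkv->nb[1], 1*d_bytes);
    struct ggml_tensor * v = ggml_view_2d(ctx, qkv, d_model, n_tokens, qkv->nb[1], 2*d_bytes);
    (void) q; (void) k; (void) v;

    ggml_free(ctx);
    return 0;
}
```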
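Several bullets add imrope (interleaved M-RoPE) kernels across the Metal, SYCL, Vulkan and WebGPU backends. The difference from classic M-RoPE is which position component (time, height, width, extra) drives each rotary-dim pair: contiguous blocks versus an interleaved t,h,w cycle. A minimal sketch of the selection logic, assuming the {t, h, w, extra} section order; this illustrates the idea and is not the actual ggml kernel code:

```cpp
#include <cstdio>

enum Pos { T = 0, H = 1, W = 2, E = 3 };

// classic M-RoPE: contiguous blocks of rotary-dim pairs per section
static Pos mrope_component(int pair, const int sec[4]) {
    if (pair < sec[0])                   return T;
    if (pair < sec[0] + sec[1])          return H;
    if (pair < sec[0] + sec[1] + sec[2]) return W;
    return E;
}

// interleaved M-RoPE ("imrope"): components cycle t,h,w across pairs
// until each section's budget is exhausted, then fall back to extra
static Pos imrope_component(int pair, const int sec[4]) {
    if (pair % 3 == 1 && pair < 3 * sec[1]) return H;
    if (pair % 3 == 2 && pair < 3 * sec[2]) return W;
    if (pair % 3 == 0 && pair < 3 * sec[0]) return T;
    return E;
}

int main() {
    const int sec[4] = {2, 2, 2, 0}; // toy split of 6 rotary pairs
    for (int p = 0; p < 6; ++p) {
        printf("pair %d: mrope=%d imrope=%d\n", p, mrope_component(p, sec), imrope_component(p, sec));
    }
    return 0;
}
```

With sections {2, 2, 2, 0} the classic order comes out t,t,h,h,w,w while imrope yields t,h,w,t,h,w.
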
CMakeLists.txt
llama-adapter.cpp
llama-adapter.h
llama-arch.cpp model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-arch.h model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-batch.cpp llama: fix ASAN error with M-RoPE (#16848) 2025-10-29 20:11:39 +01:00
llama-batch.h llama: store mrope data in KV cell (#16825) 2025-10-29 18:09:18 +01:00
llama-chat.cpp model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
llama-chat.h model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
llama-context.cpp llama : disable pipeline parallelism if compute buffer allocation fails (#16748) 2025-10-27 21:51:28 +01:00
llama-context.h
llama-cparams.cpp
llama-cparams.h
llama-grammar.cpp
llama-grammar.h
llama-graph.cpp llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
llama-graph.h graph : support cacheless embeddings with FA and iSWA (#16528) 2025-10-13 22:42:37 +03:00
llama-hparams.cpp model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-hparams.h model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-impl.cpp
llama-impl.h
llama-io.cpp
llama-io.h
llama-kv-cache-iswa.cpp
llama-kv-cache-iswa.h
llama-kv-cache.cpp model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-kv-cache.h memory : remove KV cache size padding (#16812) 2025-10-28 20:19:44 +02:00
llama-kv-cells.h llama: store mrope data in KV cell (#16825) 2025-10-29 18:09:18 +01:00
llama-memory-hybrid.cpp
llama-memory-hybrid.h
llama-memory-recurrent.cpp llama: consistent ctx <-> buf order for KV cache (#16746) 2025-10-28 11:23:54 +01:00
llama-memory-recurrent.h llama: consistent ctx <-> buf order for KV cache (#16746) 2025-10-28 11:23:54 +01:00
llama-memory.cpp
llama-memory.h
llama-mmap.cpp
llama-mmap.h
llama-model-loader.cpp
llama-model-loader.h
llama-model-saver.cpp
llama-model-saver.h
llama-model.cpp model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-model.h model: Add support for CogVLM model (#15002) 2025-10-30 12:18:50 +01:00
llama-quant.cpp llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
llama-quant.h
llama-sampling.cpp
llama-sampling.h
llama-vocab.cpp model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
llama-vocab.h
llama.cpp llama-quant: add support for mmproj (#16592) 2025-10-15 14:48:08 +02:00
unicode-data.cpp
unicode-data.h
unicode.cpp
unicode.h