llama_cpp_for_radxa_dragon_wing_q6a

Gaurav Garg bcdcc1044f ggml : reduce CPU overhead in meta backend (#22041)		2026-04-19 12:48:35 +03:00

* cache subgraph splits when cgraph is unchanged

  Skip per-call subgraph construction in ggml_backend_meta_graph_compute when the same ggml_cgraph is used consecutively. Assign uid to every sub-graph so that CUDA's fast uid check path hits too.

* Address review comments
* Keep the scope as is
* Rename last_uid and last_n_subgraphs field. Remove last_max_tmp_size field. Refactor code.
* Address review comments
* Update ggml/src/ggml-backend-meta.cpp

  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Update ggml/src/ggml-backend-meta.cpp

  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
cmake	ggml: backend-agnostic tensor parallelism (experimental) (#19378)	2026-04-09 16:42:19 +02:00
include	CUDA: manage NCCL communicators in context (#21891)	2026-04-15 15:58:40 +02:00
src	ggml : reduce CPU overhead in meta backend (#22041)	2026-04-19 12:48:35 +03:00
.gitignore
CMakeLists.txt	cmake: remove CMP0194 policy to restore MSVC builds (#21934)	2026-04-19 10:25:05 +03:00