llama_cpp_for_radxa_dragon_.../common
Max Krasnyansky 053b1539c0
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)
* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

Recent Windows ARM64 releases started parking (offlining) the CPU cores
more aggressively, which results in suboptimal performance with n_threads > 4.
To deal with that, we now disable Power Throttling for our threads at the NORMAL
and higher priorities.
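
As a rough illustration of the throttling opt-out described above, here is a minimal sketch using the Win32 SetThreadInformation() call with the ThreadPowerThrottling information class. The helper name disable_power_throttling_for_current_thread is hypothetical and is not taken from this commit; the actual change may guard the call differently, for example by Windows version as the second bullet below suggests. On non-Windows builds the sketch is a no-op.

```cpp
#if defined(_WIN32)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical helper (name is an assumption, not from the commit):
// asks Windows not to apply execution-speed power throttling to the
// calling thread. Returns true on success, or when there is nothing to do
// (non-Windows builds, or SDK headers that lack the throttling API).
static bool disable_power_throttling_for_current_thread() {
#if defined(_WIN32) && defined(THREAD_POWER_THROTTLING_CURRENT_VERSION)
    THREAD_POWER_THROTTLING_STATE state = {};
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    // ControlMask selects which mechanism we want to control ...
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    // ... and a cleared bit in StateMask turns that mechanism off.
    state.StateMask   = 0;
    return SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                                &state, sizeof(state)) != 0;
#else
    return true; // nothing to disable on this platform
#endif
}
```

A worker thread would call this once at startup, and, following the description above, only when it runs at NORMAL or higher priority, so that LOW priority threads stay eligible for throttling.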

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-05-31 15:39:19 -07:00
cmake
arg.cpp threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) 2025-05-31 15:39:19 -07:00
arg.h common : add common_remote_get_content (#13123) 2025-04-26 22:58:12 +02:00
base64.hpp
build-info.cpp.in
chat-parser.cpp server: allow unclosed thinking tags (#13931) 2025-05-31 08:26:10 -07:00
chat-parser.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
chat.cpp sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
chat.h server: fix streaming crashes (#13786) 2025-05-26 16:03:57 +01:00
CMakeLists.txt sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
common.cpp threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) 2025-05-31 15:39:19 -07:00
common.h server: --offline mode (#13804) 2025-05-26 22:34:27 +01:00
console.cpp
console.h
json-partial.cpp sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
llguidance.cpp llguidance : set tokenizer slices to default (#13424) 2025-05-10 17:19:52 +02:00
log.cpp Fix: Compile failure due to Microsoft STL breaking change (#11836) 2025-02-12 21:36:11 +01:00
log.h cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
regex-partial.cpp common: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h common: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp server: streaming of tool calls and thoughts when --jinja is on (#12379) 2025-05-25 01:48:08 +01:00
sampling.h sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
speculative.cpp llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
speculative.h speculative : update default params (#11954) 2025-02-19 13:29:42 +02:00