llama_cpp_for_radxa_dragon_.../examples
Sascha Rogmann 72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
| Name | Last commit | Date |
|---|---|---|
| batched | context : reserve new scheduler when graph topology changes (#18547) | 2026-01-15 16:39:17 +02:00 |
| batched.swift | | |
| convert-llama2c-to-ggml | | |
| debug | Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) | 2026-01-14 20:29:35 +01:00 |
| deprecation-warning | | |
| diffusion | llama : add use_direct_io flag for model loading (#18166) | 2026-01-08 08:35:30 +02:00 |
| embedding | model : add LFM2-ColBert-350M (#18607) | 2026-01-05 19:52:56 +01:00 |
| eval-callback | tests : download models only when running ctest (#18843) | 2026-01-15 09:47:29 +01:00 |
| gen-docs | gen-docs: automatically update markdown file (#18294) | 2025-12-22 19:30:19 +01:00 |
| gguf | | |
| gguf-hash | | |
| idle | metal : add residency sets keep-alive heartbeat (#17766) | 2025-12-05 19:38:54 +02:00 |
| llama.android | refactor : remove libcurl, use OpenSSL when available (#18828) | 2026-01-14 18:02:47 +01:00 |
| llama.swiftui | | |
| lookahead | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| lookup | spec : add self‑speculative decoding (no draft model required) + refactor (#18471) | 2026-01-28 19:42:42 +02:00 |
| model-conversion | model-conversion : use BUILD_DIR variable in all scripts (#19015) | 2026-01-23 09:01:36 +01:00 |
| parallel | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| passkey | | |
| retrieval | model : add LFM2-ColBert-350M (#18607) | 2026-01-05 19:52:56 +01:00 |
| save-load-state | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| simple | | |
| simple-chat | | |
| simple-cmake-pkg | examples : add missing code block end marker [no ci] (#17756) | 2025-12-04 14:17:30 +01:00 |
| speculative | spec : add self‑speculative decoding (no draft model required) + refactor (#18471) | 2026-01-28 19:42:42 +02:00 |
| speculative-simple | spec : add self‑speculative decoding (no draft model required) + refactor (#18471) | 2026-01-28 19:42:42 +02:00 |
| sycl | refactor : remove libcurl, use OpenSSL when available (#18828) | 2026-01-14 18:02:47 +01:00 |
| training | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| CMakeLists.txt | examples : add debug utility/example (#18464) | 2026-01-07 10:42:19 +01:00 |
| convert_legacy_llama.py | | |
| json_schema_pydantic_example.py | | |
| json_schema_to_grammar.py | common : fix json schema with '\' in literals (#17307) | 2025-11-29 17:06:32 +01:00 |
| llama.vim | | |
| pydantic_models_to_grammar.py | | |
| pydantic_models_to_grammar_examples.py | | |
| reason-act.sh | | |
| regex_to_grammar.py | | |
| server-llama2-13B.sh | | |
| server_embd.py | | |
| ts-type-to-grammar.sh | | |