llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

92fc86582f

Delete .github/workflows/build-cache.yml main main-b8992-92fc865 James Devine 2026-05-06 00:04:03 +0200
f43040117f

Update README with NPU optimization details main-b8991-f430401 James Devine 2026-05-02 22:50:54 +0200
094f7aaf18

Add Q6A build artifacts: cross-compiled llama-cli + Hexagon NPU DSP libraries pingud98 2026-05-02 20:44:11 +0000
c20c44514a

spec: fix argument typo (#22552) Ben Guidarelli 2026-04-30 10:32:32 -0400
6118c043b1

ci : bump ty to 0.0.33 (#22535) Sigbjørn Skjæret 2026-04-30 15:15:54 +0200
5f0ab726f7

vendor : update cpp-httplib to 0.43.2 (#22548) Adrien Gallouët 2026-04-30 15:04:39 +0200
e82aaf2587

CUDA: fix tile FA kernel on Pascal (#22541) Johannes Gäßler 2026-04-30 13:04:50 +0200
27aef3dd91

scripts : add wc2wt.sh - create worktree from current HEAD (#22513) Georgi Gerganov 2026-04-30 09:20:26 +0300
45155597aa

add fast matmul iquants (#22504) Rithik Sharma 2026-04-29 22:58:32 -0700
80afa33aad

spec : fix draft model checkpoints (#22521) Georgi Gerganov 2026-04-30 08:32:18 +0300
b42c7fa5b8

spec : fix vocab compat checks in spec example (#22426) Peter Sideris 2026-04-30 08:18:25 +0300
d77599234e

common : do not pass prompt tokens to reasoning budget sampler (#22488) Aldehir Rojas 2026-04-29 14:10:58 -0500
41a63be28e

hexagon: make vmem and buffer-size configurable (#22487) Max Krasnyansky 2026-04-29 11:51:21 -0700
098705a29e

CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478) Anav Prasad 2026-04-29 11:39:56 -0700
683c5acb90

spec : disacard last drafted token with low prob (#22506) Georgi Gerganov 2026-04-29 17:00:00 +0300
b1d5f5b449

sync : ggml Georgi Gerganov 2026-04-29 16:43:08 +0300
4b221b7f1e

ggml : bump version to 0.10.1 (ggml/1469) Georgi Gerganov 2026-04-29 16:41:45 +0300
59237bfbbc

webui: fix slow mic stop and WAV encode (#22480) Pascal 2026-04-29 12:58:35 +0200
1cbc846eba

ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (#22293) shalinib-ibm 2026-04-29 16:02:40 +0530
3142f1dbb9

ggml-cuda: refactor fusion code (#22468) Aman Gupta 2026-04-29 16:19:33 +0800
b5c4227dc6

ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317) qiurui144 2026-04-29 15:59:21 +0800
d6a5094004

ggml-webgpu: Fix bug in FlashAttention support check (#22492) Reese Levine 2026-04-29 00:59:00 -0700
7b95ea5d11

common: Intentionally leak logger instance to fix hanging on Windows (#22273) Masato Nakasaka 2026-04-29 16:58:43 +0900
bdc9c743a5

ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (#21916) hrushitfujitsu 2026-04-29 13:27:37 +0530
739393beeb

TP: fix delayed AllReduce + zero-sized slices (#22489) Johannes Gäßler 2026-04-29 08:55:07 +0200
fc2b0053ff

ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196) Michael Wand 2026-04-28 15:47:42 -0700
7b8443ac78

ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (… (#22286) lnigam 2026-04-29 01:07:35 +0530
5d56effdee

convert : add support for Nemotron Nano 3 Omni (#22481) Daniel Bevenius 2026-04-28 19:17:57 +0200
52e5f0a5c1

common : re-arm reasoning budget after DONE on new <think> (#22323) Jillis ter Hove 2026-04-28 19:15:36 +0200
f9f33654a6

vulkan: Coalesce Q4_K/Q5_K scale loads (#21751) Matt Corallo 2026-04-28 15:31:04 +0000
98bb57916a

ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing logic (#22456) Reese Levine 2026-04-28 07:27:17 -0700
f42e29fdf1

webui: Server tools (#21237) Aleksander Grygier 2026-04-28 14:35:49 +0300
19821178be

vulkan: add barrier after writetimestamp (#21865) Jeff Bolz 2026-04-28 12:28:12 +0200
698d19b93c

ggml: improve SPIR-V headers detection with __has_include (#21918) Emil Askerov 2026-04-28 13:19:06 +0300
50494a2800

ggml : skip already registered backends and devices (#22296) Adrien Gallouët 2026-04-28 09:02:32 +0200
d530d6e7a2

ggml : revert to -lm linking instead of find_library (#22355) Adrien Gallouët 2026-04-28 08:56:02 +0200
c3e08f4700

CANN: add new ops, optimize existing ops (#21204) hipudding 2026-04-28 14:27:22 +0800
14e733e36f

spec : refactor params (#22397) Georgi Gerganov 2026-04-28 09:07:33 +0300
516e8d7a8a

server: use pos_next instead of n_tokens for m-rope (#22439) Aman Gupta 2026-04-28 13:41:00 +0800
434b2a1ff6

ggml-webgpu: add Q1_0 support (#22374) Rithik Sharma 2026-04-27 15:50:59 -0700
983ca8992e

server: (router) Forward form-data to model server (Fixes #22044) (#22118) tha80 2026-04-27 23:55:00 +0200
665abc6097

add fast mat-vec kernels for i-quants (#22344) Rithik Sharma 2026-04-27 08:25:45 -0700
4414c04b9a

Additional test for common/gemma4 : handle parsing edge cases (#22420) Igor Rudenko 2026-04-27 17:36:59 +0300
ceaf47c4b1

fix: rpc-server cache may not work in Windows environments (#22394) unraido 2026-04-27 23:25:09 +0900
42401c72b8

Fix type casting for unaccounted memory calculation (#22424) rankaiyx 2026-04-27 20:31:13 +0800
e940b3d468

download : prefer q8_0 when q4_k not available (#22428) Georgi Gerganov 2026-04-27 15:30:29 +0300
0f1bb602dd

model : remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) (#22421) ynankani 2026-04-27 07:58:48 +0000
d13540becd

convert : remove input_scale for dequantized fp8 modelopt (#22356) Sigbjørn Skjæret 2026-04-27 08:45:01 +0200
f84270ea10

ggml : use 64 bytes aligned tile buffers (#21058) Adrien Gallouët 2026-04-27 08:30:55 +0200
5594d13224

common: fix missing exports in llama-common (#22340) Max Krasnyansky 2026-04-26 22:06:39 -0700
f535774325

pr2wt : symlink .pi (#22386) Georgi Gerganov 2026-04-26 19:49:26 +0300
06a811d085

add performance-portable tuning for register-tile and subgroup matmul (#22241) Rithik Sharma 2026-04-26 09:26:28 -0700
78433f606f

Fix recurrent state serialization for partial reads and writes (#22362) Gaurav Garg 2026-04-26 17:04:40 +0530
7ec36aa861

Github: set meta backend code owner (#22388) Johannes Gäßler 2026-04-26 13:34:13 +0200
b1a5bd4e0c

CUDA: better coalesce data-access for contiguous concat (#22330) Oliver Simons 2026-04-26 09:21:45 +0200
0c6ee1cade

ggml-cpu : re-enable fast gelu_quick_f16 (#22339) Sigbjørn Skjæret 2026-04-26 08:28:14 +0200
2dd84169d1

ggml-cpu: optimize avx2 q6_k (#22345) Eve 2026-04-26 06:27:50 +0000
f454bd7eb8

opencl: add iq4_nl support (#22272) lhez 2026-04-25 21:21:58 -0700
b760272f1a

hexagon: guard HMX clock request for v75+ platforms (#22377) Trivikram Reddy 2026-04-25 19:58:26 -0500
dcad77cc3b

chat: fix handling of space in reasoning markers (#22353) Piotr Wilkin (ilintar) 2026-04-25 21:24:13 +0200
98dc1418ea

spec : fix vocab compat checks (#22358) Georgi Gerganov 2026-04-25 20:11:35 +0300
9725a313be

CUDA: reduce MMQ stream-k overhead (#22298) Johannes Gäßler 2026-04-25 14:15:03 +0200
d1649047a3

metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962) Developer-Ecosystem-Engineering 2026-04-25 05:14:28 -0700
9d34231bb8

llama-quant : default ftype param Q5_1 --> Q8_0 (#20828) ddh0 2026-04-25 01:25:35 -0500
8ea8fee966

gitignore : add .pi + personal SYSTEM.md (#22316) Georgi Gerganov 2026-04-25 09:20:45 +0300
eddd7a13a5

[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291) Neo Zhang 2026-04-25 14:20:14 +0800
dd2914dc81

ggml-webgpu: support for SSM_SCAN and disable set_rows error checking (#22327) Reese Levine 2026-04-24 23:18:15 -0700
0adede866d

parser: fix structured output bug (#22302) Piotr Wilkin (ilintar) 2026-04-24 23:19:55 +0200
361fe72acb

Hexagon: Bump HMX Frequency to Max Corner (#22334) Trivikram Reddy 2026-04-24 15:55:17 -0500
a702f39597

CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303) Shreya Jain 2026-04-24 12:21:36 -0700
13d36cf891

ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix (#22199) Zheyuan Chen 2026-04-24 10:39:09 -0700
f65bc34c68

hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306) Mengsheng Wu 2026-04-25 00:21:33 +0800
15fa3c493b

metal : print GPU description (#22318) Georgi Gerganov 2026-04-24 13:56:03 +0300
dc80c5252a

common : fix jinja warnings with clang 21 (#22313) Adrien Gallouët 2026-04-24 12:36:02 +0200
e583f3b4f5

ggml : minor coding style (#22308) Georgi Gerganov 2026-04-24 11:02:00 +0300
017f090442

jinja : remove unused header (#22310) Georgi Gerganov 2026-04-24 11:01:46 +0300
ffdd983fb8

server : fix swa-full logic (#22288) Georgi Gerganov 2026-04-24 10:17:37 +0300
793d0a7931

server: rename debug tags to match --cache-idle-slots naming (#22292) Yes You Can Have Your Own 2026-04-24 09:28:44 +0300
8bc492ebb4

hexagon: add SOLVE_TRI op (#21974) Mengsheng Wu 2026-04-24 09:39:13 +0800
e5f070a1dc

fix(shader): handle the buffer aliasing for rms fuse (#22266) Chen Yuan 2026-04-23 19:32:59 -0400
fa0b8a70a8

cli: Remove redundant local sampling variables (#20429) (#22264) Ethan Turner 2026-04-23 15:53:23 -0700
5d2b52d80d

hexagon: add support for basic and extended Op profiling (#22269) Max Krasnyansky 2026-04-23 14:17:21 -0700
187a456370

Enable testing on Snapdragon devices (#21051) Shreya Jain 2026-04-23 13:08:10 -0700
185cbff6f1

server : convert_anthropic_to_oai: also copy chat_template_kwargs (#22154) srkizer 2026-04-24 03:32:46 +0900
c78fb909b2

server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21869) (#22267) Song Li 2026-04-23 12:39:07 -0400
12568ca8c8

vendor : update LibreSSL to 4.3.1 (#22285) Adrien Gallouët 2026-04-23 17:45:56 +0200
c807c6e3b0

server: (anthropic API) fix prefix caching (#21793) kvc0 2026-04-23 08:45:02 -0700
0949beb5a3

fix build number for sycl release (#22283) Sigbjørn Skjæret 2026-04-23 15:38:58 +0200
9012c50fc8

model-conversion : fix mmproj output file name [no ci] (#22274) Daniel Bevenius 2026-04-23 15:07:38 +0200
0dd7f915fd

cli : cleanup auto-completion code (#21745) Matthias Straka 2026-04-23 15:03:28 +0200
550d684bd1

server: Enable transcriptions API for LFM2-Audio (#22000) Tarek Dakhran 2026-04-23 10:47:26 +0200
8635e221c8

metal : fix event synchronization (#22260) Georgi Gerganov 2026-04-23 08:22:49 +0300
930e0210d1

gitignore: add AGENTS.local.md (#22246) Georgi Gerganov 2026-04-23 08:22:24 +0300
96c1db26c4

ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (#22239) Georgi Gerganov 2026-04-23 08:22:08 +0300
4ead6fd957

[SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24 package. (#22078) Neo Zhang Jianyu 2026-04-23 13:21:36 +0800
5eaee65384

convert : Handle ModelOpt produced mixed precision model during convert to GGUF (#22247) ynankani 2026-04-23 05:19:51 +0000
60b68a6279

sycl : fused MoE mul_mat_vec_q for TG (#21920) abotsis 2026-04-22 23:18:56 -0600
b76429a69c

ggml-webgpu: add support for im2col (#22259) Chen Yuan 2026-04-22 23:17:41 -0400
86db42e97f

CUDA: fuse relu + sqr (#22249) Anav Prasad 2026-04-23 02:28:56 +0000
6217b49583

HIP: flip GGML_HIP_GRAPHS to default on (#22254) uvos 2026-04-23 02:34:31 +0200
