ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

* Update register tiling matmul to use f32 accumulation

* fix profiling code

* Fix register tiling matmul for Chrome, I'm blaming Dawn

* Update batch tuning value for iOS

* compile fix

* Fix use of new load function

* Move to a single query set for GPU profiling

* Move to batching compute passes when not profiling

* Refactor build_multi

* remove iOS throttling now that we're batching compute passes
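The diff itself is suppressed on this page, so the following is only a minimal sketch of one plausible reading of "batching compute passes when not profiling". It relies on the fact that WebGPU attaches timestamp writes to pass boundaries, so per-kernel timing needs a pass per dispatch, while untimed work can share a pass. `MockEncoder` and `encode_ops` are hypothetical names invented for illustration; they are not taken from ggml-webgpu.cpp or from the Dawn/WebGPU API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a GPU command encoder. It performs no GPU work
// and only counts how many passes and dispatches were recorded.
struct MockEncoder {
    std::size_t passes_begun = 0;
    std::size_t dispatches   = 0;
    bool        pass_open    = false;

    void begin_pass() {
        assert(!pass_open && "pass already open");
        pass_open = true;
        ++passes_begun;
    }
    void dispatch() {
        assert(pass_open && "dispatch recorded outside a pass");
        ++dispatches;
    }
    void end_pass() {
        assert(pass_open && "no pass to end");
        pass_open = false;
    }
};

// Record n_ops dispatches (assumed policy, not the commit's actual code):
//  - profiling:     one pass per dispatch, so each can be timed on its own
//  - not profiling: every dispatch shares a single pass
inline void encode_ops(MockEncoder & enc, std::size_t n_ops, bool profiling) {
    if (n_ops == 0) {
        return;  // nothing to record, so do not open an empty pass
    }
    if (profiling) {
        for (std::size_t i = 0; i < n_ops; ++i) {
            enc.begin_pass();
            enc.dispatch();
            enc.end_pass();
        }
    } else {
        enc.begin_pass();
        for (std::size_t i = 0; i < n_ops; ++i) {
            enc.dispatch();
        }
        enc.end_pass();
    }
}
```

Under this sketch, recording 8 operations begins 8 passes with profiling enabled and 1 pass without it, which is the kind of per-pass overhead the batching bullet appears to target.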


Reese Levine, 2026-04-16 01:12:19 -07:00, committed by GitHub

parent 8612ed18b7

commit 82677a6ede

No known key found for this signature in database

GPG key ID: B5690EEEBB952194

1 changed file with 349 additions and 452 deletions

ggml/src/ggml-webgpu/ggml-webgpu.cpp: 801 lines changed (diff suppressed because it is too large)
