ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

* Update register tiling matmul to use f32 accumulation

* Fix profiling code

* Fix register tiling matmul for Chrome, I'm blaming Dawn

* Update batch tuning value for iOS

* Compile fix

* Fix use of new load function

* Move to a single query set for GPU profiling

* Move to batching compute passes when not profiling

* Refactor build_multi

* Remove iOS throttling now that we're batching compute passes
Reese Levine 2026-04-16 01:12:19 -07:00 committed by GitHub
parent 8612ed18b7
commit 82677a6ede
GPG key ID: B5690EEEBB952194