ggml-webgpu: compute pass batching and removing profiling overhead (#21873)
* Update register tiling matmul to use f32 accumulation
* Fix profiling code
* Fix register tiling matmul for Chrome (I'm blaming Dawn)
* Update batch tuning value for iOS
* Compile fix
* Fix use of new load function
* Move to a single query set for GPU profiling
* Move to batching compute passes when not profiling
* Refactor build_multi
* Remove iOS throttling now that we're batching compute passes
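The bullet about batching compute passes when not profiling can be pictured with a small model. This is a hedged sketch, not the ggml-webgpu code. `ComputeBatcher`, `Submit`, and the `max_ops_per_batch` threshold are hypothetical names. The real backend records WebGPU command encoders and a timestamp query set, not lists of strings. In the model, profiling mode gives every op its own pass and two timestamp slots in one shared query set. Otherwise, ops accumulate in a single open pass until the threshold triggers a submit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a "pass" is a list of op names; a "submit" is a list of passes.
struct Submit {
    std::vector<std::vector<std::string>> passes;
};

class ComputeBatcher {
public:
    ComputeBatcher(std::size_t max_ops_per_batch, bool profiling)
        : max_ops_(max_ops_per_batch), profiling_(profiling) {}

    void dispatch(const std::string& op) {
        if (profiling_) {
            // One pass per op, so each op can be bracketed by its own
            // begin/end timestamp pair inside a single shared query set.
            current_.passes.push_back(std::vector<std::string>{op});
            query_count_ += 2;
        } else {
            // Batched mode: append every op to the one open pass.
            if (current_.passes.empty()) current_.passes.emplace_back();
            current_.passes.back().push_back(op);
        }
        // Submit once the (hypothetical) batch threshold is reached.
        if (++pending_ops_ >= max_ops_) flush();
    }

    // Submit whatever is pending; a no-op when nothing is queued.
    void flush() {
        if (pending_ops_ == 0) return;
        submits_.push_back(std::move(current_));
        current_ = Submit{};
        pending_ops_ = 0;
    }

    const std::vector<Submit>& submits() const { return submits_; }
    std::size_t query_count() const { return query_count_; }

private:
    std::size_t max_ops_;
    bool profiling_;
    std::size_t pending_ops_ = 0;
    std::size_t query_count_ = 0;
    Submit current_;
    std::vector<Submit> submits_;
};
```

With a threshold of 4 and 10 dispatched ops, both modes produce 3 submits. In batched mode each submit holds a single pass. In profiling mode the first submit holds 4 one-op passes, and 20 timestamp slots are used in total.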
parent 8612ed18b7
commit 82677a6ede
1 changed file with 349 additions and 452 deletions