llama_cpp_for_radxa_dragon_wing_q6a

Bizhao Shi 2d38b6e400 CANN: Add the basic supports of Flash Attention kernel (#13627)	2025-05-26 10:20:18 +08:00
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constrints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
backend	CANN: Add the basic supports of Flash Attention kernel (#13627)	2025-05-26 10:20:18 +08:00
development
multimodal
android.md
build.md
docker.md	musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)	2025-05-21 09:58:49 +08:00
function-calling.md	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)	2025-05-25 01:48:08 +01:00
install.md
llguidance.md
multimodal.md	mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)	2025-05-25 14:06:32 +02:00