llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a


Henri Vasserman 20568fe60f [Fix] Reenable server embedding endpoint (#1937) * Add back embedding feature * Update README		2023-06-20 01:12:39 +03:00
baby-llama	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
benchmark	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
embedding	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
jeopardy	hooks : setting up flake8 and pre-commit hooks (#1681)	2023-06-17 13:32:48 +03:00
main	minor : warning fixes	2023-06-17 20:24:11 +03:00
metal	examples : fix examples/metal (#1920)	2023-06-18 10:52:10 +03:00
perplexity	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
quantize	Allow "quantizing" to f16 and f32 (#1787)	2023-06-13 04:23:23 -06:00
quantize-stats	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
save-load-state	build : fix and ignore MSVC warnings (#1889)	2023-06-16 21:23:53 +03:00
server	[Fix] Reenable server embedding endpoint (#1937)	2023-06-20 01:12:39 +03:00
simple	examples : add "simple" (#1840)	2023-06-16 21:58:09 +03:00
train-text-from-scratch	train : get raw text instead of page with html (#1905)	2023-06-17 09:51:54 +03:00
alpaca.sh
chat-13B.bat
chat-13B.sh
chat-persistent.sh
chat-vicuna.sh	examples : add chat-vicuna.sh (#1854)	2023-06-15 21:05:53 +03:00
chat.sh
CMakeLists.txt	llama : fix kv_cache `n` init (close #1903)	2023-06-17 19:31:20 +03:00
common.cpp	Only one CUDA stream per device for async compute (#1898)	2023-06-17 19:15:02 +02:00
common.h	CUDA full GPU acceleration, KV cache in VRAM (#1827)	2023-06-14 19:47:19 +02:00
gpt4all.sh
Miku.sh
reason-act.sh