llama_cpp_for_radxa_dragon_wing_q6a

pingu_98/llama_cpp_for_radxa_dragon_wing_q6a

howlger 1e13987fba	2024-03-27 13:15:44 +02:00
embedding : show full embedding for single prompt (#6342)

* embedding : show full embedding for single prompt

To support the use case of creating an embedding for a given prompt, the entire embedding and not just the first part needed to be printed. Also, show cosine similarity matrix only if there is more than one prompt, as the cosine similarity matrix for a single prompt is always `1.00`.

* Update examples/embedding/embedding.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

baby-llama
batched	metal : pad n_ctx by 32 (#6177)	2024-03-22 09:36:03 +02:00
batched-bench	llama : add pipeline parallelism support (#6017)	2024-03-13 18:54:21 +01:00
batched.swift
beam-search
benchmark	ggml : remove old quantization functions (#5942)	2024-03-09 15:53:59 +02:00
convert-llama2c-to-ggml	llama2c : open file as binary (#6332)	2024-03-27 09:16:02 +02:00
embedding	embedding : show full embedding for single prompt (#6342)	2024-03-27 13:15:44 +02:00
export-lora
finetune
gguf	gguf : fix resource leaks (#6061)	2024-03-14 20:29:32 +02:00
gguf-split	common: llama_load_model_from_url split support (#6192)	2024-03-23 18:07:00 +01:00
gritlm	gritlm : add initial README.md (#6086)	2024-03-16 17:46:29 +02:00
imatrix	llama : greatly reduce output buffer memory usage (#6122)	2024-03-26 16:46:41 +02:00
infill
jeopardy
llama-bench	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
llama.android	android : fix utf8 decoding error (#5935)	2024-03-10 22:03:17 +02:00
llama.swiftui	llama : add pipeline parallelism support (#6017)	2024-03-13 18:54:21 +01:00
llava	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
lookahead
lookup	lookup: complement data from context with general text statistics (#5479)	2024-03-23 01:24:36 +01:00
main	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
main-cmake-pkg	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
parallel	llama : greatly reduce output buffer memory usage (#6122)	2024-03-26 16:46:41 +02:00
passkey
perplexity	llama : greatly reduce output buffer memory usage (#6122)	2024-03-26 16:46:41 +02:00
quantize	IQ1_M: 1.75 bpw quantization (#6302)	2024-03-26 15:21:27 +01:00
quantize-stats
retrieval	examples : add "retrieval" (#6193)	2024-03-25 09:38:22 +02:00
save-load-state
server	server: public: use relative routes for static files (#6325)	2024-03-27 06:55:29 +01:00
simple
speculative	llama : greatly reduce output buffer memory usage (#6122)	2024-03-26 16:46:41 +02:00
sycl	[SYCL] fix SYCL backend build on windows is break by LOG() error (#6290)	2024-03-25 15:52:41 +08:00
tokenize
train-text-from-scratch	gguf : fix resource leaks (#6061)	2024-03-14 20:29:32 +02:00
alpaca.sh
base-translate.sh
chat-13B.bat
chat-13B.sh
chat-persistent.sh
chat-vicuna.sh
chat.sh
CMakeLists.txt	examples : add "retrieval" (#6193)	2024-03-25 09:38:22 +02:00
gpt4all.sh
json-schema-pydantic-example.py	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
json-schema-to-grammar.py	json-schema-to-grammar : fix order of props + non-str const/enum (#6232)	2024-03-22 15:07:44 +02:00
llama.vim
llama2-13b.sh
llama2.sh
llm.vim
make-ggml.py
Miku.sh
pydantic-models-to-grammar-examples.py
pydantic_models_to_grammar.py
reason-act.sh
regex-to-grammar.py	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
server-embd.py
server-llama2-13B.sh
ts-type-to-grammar.sh	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00