llama_cpp_for_radxa_dragon_.../scripts
Latest commit: jaime-m-p, 37bef89433 (2024-06-18 18:40:52 +02:00)
tokenizer : BPE fixes (#7530)
* Random test: add_bos_token, add_eos_token
* Random test: add BPE models for testing
* Custom regex split fails with codepoint 0 (see the first sketch after this list)
* Fix falcon punctuation regex
* Refactor llm_tokenizer_bpe: move code to constructor
* Move 'add_special_bos/eos' logic to llm_tokenizer_bpe
* Move tokenizer flags to vocab structure
* Default values for special_add_bos/eos (see the second sketch after this list)
* Build vocab.special_tokens_cache using vocab token types (see the third sketch after this list)
* Generalize 'jina-v2' per-token attributes
* Fix unicode whitespaces (deepseek-coder, deepseek-llm)
* Skip missing byte tokens (falcon)
* Better unicode data generation
* Replace char32_t with uint32_t
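On the codepoint-0 bullet: a minimal sketch of the underlying idea, not the actual llama.cpp implementation. It splits text held as a sized buffer of codepoints, so an embedded U+0000 is preserved, whereas a split built on NUL-terminated C strings would silently truncate at the first codepoint 0. `split_codepoints` is an illustrative name.

```cpp
#include <cstdint>
#include <vector>

// Split a codepoint buffer on a delimiter. Because the buffer carries an
// explicit length, codepoint 0 inside the input is kept like any other value.
static std::vector<std::vector<uint32_t>> split_codepoints(
        const std::vector<uint32_t> & cps, uint32_t delim) {
    std::vector<std::vector<uint32_t>> parts(1);
    for (uint32_t cp : cps) {
        if (cp == delim) {
            parts.emplace_back();        // start a new fragment at each delimiter
        } else {
            parts.back().push_back(cp);  // U+0000 does not terminate anything here
        }
    }
    return parts;
}
```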
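On the add_bos/add_eos bullets: a minimal sketch, assuming the flags live on a vocab-owned struct with defaults and are applied inside the tokenizer rather than at every call site. `vocab_flags` and `add_special` are hypothetical names, not llama.cpp API.

```cpp
#include <cstdint>
#include <vector>

struct vocab_flags {
    bool add_bos = true;   // default used when the model file sets no value
    bool add_eos = false;
};

// Apply the vocab-level flags to an already-tokenized sequence.
static std::vector<int32_t> add_special(const vocab_flags & flags,
                                        std::vector<int32_t> ids,
                                        int32_t bos_id, int32_t eos_id) {
    if (flags.add_bos) ids.insert(ids.begin(), bos_id);
    if (flags.add_eos) ids.push_back(eos_id);
    return ids;
}
```

Keeping the flags next to the vocabulary means every caller gets consistent behavior without re-deciding whether to prepend BOS or append EOS.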
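On the special_tokens_cache bullet: a minimal sketch of deriving the cache from per-token type attributes instead of matching on token text. The structs below are illustrative stand-ins, not the real llama.cpp vocab structures.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

enum class token_type { normal, control, user_defined, byte, unknown };

struct vocab {
    std::vector<std::string>    tokens;
    std::vector<token_type>     types;  // one type attribute per token id
    std::unordered_set<int32_t> special_tokens_cache;
};

// Rebuild the cache by scanning token types; control and user-defined
// tokens are treated as special, regardless of their text.
static void build_special_tokens_cache(vocab & v) {
    v.special_tokens_cache.clear();
    for (int32_t id = 0; id < (int32_t) v.tokens.size(); ++id) {
        if (v.types[id] == token_type::control ||
            v.types[id] == token_type::user_defined) {
            v.special_tokens_cache.insert(id);
        }
    }
}
```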
build-info.cmake
build-info.sh
check-requirements.sh
ci-run.sh
compare-commits.sh
compare-llama-bench.py
convert-gg.sh
debug-test.sh
gen-authors.sh
gen-build-info-cpp.cmake
gen-unicode-data.py
get-flags.mk
get-hellaswag.sh
get-pg.sh
get-wikitext-2.sh
get-wikitext-103.sh
get-winogrande.sh
hf.sh
install-oneapi.bat
LlamaConfig.cmake.in
pod-llama.sh
qnt-all.sh
run-all-perf.sh
run-all-ppl.sh
run-with-preset.py
server-llm.sh
sync-ggml-am.sh
sync-ggml.last
sync-ggml.sh
verify-checksum-models.py
xxd.cmake