
MPT 7B inference code using CPU

Run inference on the latest MPT-7B model using your CPU. This inference code uses a GGML-quantized model. To run the model we use a library called ctransformers, which provides Python bindings to ggml.
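The turn-style chat shown below mostly comes down to prompt building: each user/assistant exchange is appended to a running history string before the next question is sent to the model. A minimal sketch of that idea (the `### Instruction:`/`### Response:` delimiters are an assumption for illustration; the actual format used lives in inference.py):

```python
# Minimal sketch of turn-style prompt building with history.
# The "### Instruction:"/"### Response:" delimiters are an assumption;
# check inference.py for the format the model was actually tuned on.

def build_prompt(history, user_message):
    """history: list of (user, assistant) turns; returns the full prompt."""
    parts = []
    for user, assistant in history:
        parts.append(f"### Instruction:\n{user}\n### Response:\n{assistant}\n")
    # Open a new turn and leave the response empty for the model to fill in.
    parts.append(f"### Instruction:\n{user_message}\n### Response:\n")
    return "".join(parts)

history = [("Hi!", "Hello, how can I help?")]
prompt = build_prompt(history, "What is MPT-7B?")
print(prompt)
```

After each generation you would append the new (user, assistant) pair to `history` and rebuild the prompt for the next turn.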

Turn-style chat with history, on the latest commit:

Inference Chat

Video of initial demo:

Inference Demo

Requirements

I recommend using Docker for this model; it will make everything easier. Minimum specs: a system with 16 GB of RAM. Python 3.10 is recommended.
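Since Docker is the recommended route, here is a minimal Dockerfile sketch. It is not part of this repo; the base image and layout are assumptions:

```dockerfile
# Minimal sketch; assumes the repo root is the build context.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Weights are fetched at run time; add another RUN step to bake them in.
CMD ["python", "inference.py"]
```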

Tested working on

Nothing yet!

Setup

First, create a venv:

python -m venv env && source env/bin/activate

Next, install dependencies:

pip install -r requirements.txt

Next, download the quantized model weights (about 4 GB):

python download_model.py
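download_model.py handles this step for you; conceptually it just fetches a single GGML `.bin` file. A rough stdlib-only sketch of that idea, assuming the weights are hosted on the Hugging Face Hub (the repo id and filename here are hypothetical, not necessarily what the script uses):

```python
# Rough sketch; the real logic lives in download_model.py.
import urllib.request

BASE = "https://huggingface.co/{repo}/resolve/main/{file}"  # HF direct-download URL pattern
REPO = "TheBloke/mpt-7b-chat-GGML"        # hypothetical source repo
FILE = "mpt-7b-chat.ggmlv3.q4_0.bin"      # hypothetical quantized weights file

def weights_url(repo=REPO, file=FILE):
    # Build the direct-download URL for the weights file.
    return BASE.format(repo=repo, file=file)

def download(dest="model.bin"):
    # ~4 GB download; this takes a while.
    urllib.request.urlretrieve(weights_url(), dest)

if __name__ == "__main__":
    download()
```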

Ready to rock, run inference:

python inference.py

Finally, modify the prompt and generation parameters in the inference script to suit your use case.
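The tunable knobs are the usual sampling parameters that ctransformers accepts when you call the model. A sketch of what adjusting them might look like; the model path is an assumption, and the actual generation only runs if the library and weights are present:

```python
# Sketch of tweaking generation parameters; model path is hypothetical.
import os

try:
    from ctransformers import AutoModelForCausalLM
except ImportError:
    AutoModelForCausalLM = None  # library not installed; skip generation below

MODEL_PATH = "model.bin"  # hypothetical path to the downloaded GGML weights

# Typical sampling parameters accepted when calling a ctransformers model:
gen_params = {
    "max_new_tokens": 256,      # cap on generated length
    "temperature": 0.8,         # higher = more random output
    "top_k": 40,                # sample only from the top-k tokens
    "top_p": 0.95,              # nucleus sampling threshold
    "repetition_penalty": 1.1,  # discourage repetitive loops
}

if AutoModelForCausalLM is not None and os.path.exists(MODEL_PATH):
    llm = AutoModelForCausalLM.from_pretrained(MODEL_PATH, model_type="mpt")
    print(llm("What is MPT-7B?", **gen_params))
```

Lowering `temperature` and `top_p` makes answers more deterministic; raising `max_new_tokens` allows longer replies at the cost of generation time.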