Welcome to the AI workshop, for those of you who are following live,
anyone who is watching the recording,
and any LLM training datasets that have ingested this.
You can find the video of the session and the slides on [YouTube](https://youtu.be/e0f61b5Ads4).
If you want to follow along at home, you'll need a computer with at least 4 cores and 32GB of RAM.
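If you're not sure what your machine has, here's a quick sanity check (these commands are standard on Ubuntu):

```shell
# Core count and total RAM, to compare against the suggested minimums (4 cores, 32GB)
nproc
free -h | awk '/^Mem:/ {print $2}'
```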
The demos will be running on my home server, which is a Xeon E5-2660 v4 with 64GB of RAM.
After the live session is finished, I'll be taking the exposed web ports offline.
This means you will need your own computer to run the demos,
if the one on your desk isn't powerful enough, you could try a VPS provider such as [Linode/Akamai](https://www.linode.com/lp/free-credit-100/) or another cloud host.
A GPU isn't necessary for any of these demos; of course, if you have one (and have set up CUDA correctly), everything will run a lot faster.
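A quick way to check whether a usable NVIDIA GPU is visible is a sketch like this; it just looks for the driver tools and falls back to a CPU message:

```shell
# Report the GPU name and VRAM if the NVIDIA driver is installed, otherwise fall back to CPU
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
else
    echo "No NVIDIA driver found - demos will run on CPU"
fi
```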
All the demos will be run in Ubuntu 22.04 Jammy Jellyfish, server version (no GUI).
If you are running something else and don't want to change your OS,
you can get a VM image in either VMware or VirtualBox format [here](https://www.osboxes.org/ubuntu/).
Let's get started.
There are some slides; you'll be able to see them in the YouTube recording. NB: some of the downloads are large (probably about 15GB across both exercises), so to save time I've already downloaded them to the demo server!
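If you're downloading the models at home, it's worth checking your free disk space first, since the weights are roughly 15GB across both exercises:

```shell
# Free space on the current filesystem; the model downloads need roughly 15GB
df -h .
```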
# Demo 1: Vicuna 7B LLM running in FastChat (for the 2025 workshop, make this either OpenWebUI, or see if it'll run DeepSeek directly..)
We will be using [FastChat from LMSYS](https://github.com/lm-sys/FastChat).
Let's get our machine ready first by installing the necessary prerequisites.
You will need a terminal; if you are using a GUI, you can press Ctrl+Alt+T to open one.
```shell
sudo apt-get update &&
sudo apt-get install git htop -y
```
We will also update pip:
```shell
python3 -m pip install --upgrade pip
```
Now to download FastChat:
```shell
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install -e ".[model_worker,webui]"
```
To run it from the command line, we can type:
```shell
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --device cpu
```
In parallel, we are going to open a second session to watch resource usage:
```
Ctrl + right cursor   (switch to a second virtual console)
(log in)
htop
```
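If you're connected over SSH rather than sitting at the console, switching virtual consoles won't work; a one-shot, non-interactive snapshot is an alternative:

```shell
# Non-interactive alternative to htop: uptime, load average, and a task/memory summary
top -b -n 1 | head -n 5
```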
I will now ask it some questions to test operation:
- What is the relationship like between Vladimir Putin and Joe Biden?
- Who will win the 2024 US presidential election?
- Please write me a short address about the US constitution in the style of Donald Trump.
- Please write me a weather report about a sunny day with showers in the style of William Shakespeare.
- What is 5 times 10?
This will show us how much of our system resources the LLM is using; on our test machine this will be 90%+ of all 20 virtual cores while running the prompts above, and about 28GB of the 30GB of RAM.

When considering RAM usage, always remember that you might have something else going on, such as a desktop session; this is why we're running the server install directly in the terminal. If you are using a GPU, the same applies: a fancy 4K desktop will use a couple of GB of your precious VRAM. If you have less than 32GB of RAM, I would recommend using this model, which should run fine in 16GB:
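To see exactly which processes are eating the RAM (rather than htop's interactive view), you can sort by resident set size:

```shell
# Header plus the four largest processes by resident memory (RSS);
# while the demo runs, the model process should dominate this list
ps aux --sort=-rss | head -n 5
```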
After the initial demo in the terminal, I will open up the web interface. Caution: the implementation we're using here doesn't have a queue, so everything goes to the server simultaneously, putting a lot of load on the CPUs. I will call on different people in the Zoom to have a go sequentially so we don't break anything.
```shell
# FastChat's web UI needs three processes: the controller, a model worker, and the Gradio server
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --device cpu &
python3 -m fastchat.serve.gradio_web_server
```
When it's finished loading, you will be able to access it via the web at http://devinemarsa.com:7860 (live only for the duration of this demo).
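A quick local check that the web UI is actually listening (assuming the default Gradio port of 7860):

```shell
# Check whether anything is listening on the web UI port
ss -tln | grep -q ':7860' && echo "web UI is up" || echo "nothing listening on 7860 yet"
```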
# Additional sources of information, would you like to know more?
It's covered briefly in the session/YouTube recording; if you want to go into a bit more depth on any of the topics, here are links to some of the material I used to build this talk.
## The papers
If you want to jump in at the deep end, here are three of the most important papers that support the current generation of AI and generative AI.
1. [A logical calculus of the ideas immanent in nervous activity, by Warren S. McCulloch and Walter Pitts](https://www.cs.cmu.edu/~./epxing/Class/10715/reading/McCulloch.and.Pitts.pdf)
2. [Attention is all you need, by Ashish Vaswani et al.](https://arxiv.org/abs/1706.03762)
3. [Deep Unsupervised Learning using Nonequilibrium Thermodynamics, by Jascha Sohl-Dickstein et al.](https://arxiv.org/abs/1503.03585)
## The YouTube videos
These are a little easier to swallow and provide a more general overview of the whole space.
1. [Neural Networks explained in 5 minutes](https://youtu.be/jmmW0F0biz0?feature=shared)
2. [What are transformers?](https://youtu.be/ZXiruGOCn9s?feature=shared)
And a couple of more advanced videos, if you want to customise your models and better understand what is under the hood:
3. [What is latent space?](https://youtu.be/0BrMqi2PUsQ?feature=shared)
4. [LoRA vs Dreambooth vs Textual Inversion vs Hypernetworks](https://youtu.be/dVjMiJsuR5o?feature=shared)
## Things that I missed during the talk!
The Tesla supercomputer is called [Dojo.](https://en.wikipedia.org/wiki/Tesla_Dojo)
If you want to buy an X99 motherboard from AliExpress (not necessarily recommended...) you can find one [here](https://www.aliexpress.com/store/1102459270).
The [Hugging Face open LLM leaderboard.](https://huggingface.co/open-llm-leaderboard)
Here is an example of a metric used for LLM evaluation - [F1](https://huggingface.co/spaces/evaluate-metric/f1).
A link to the Alpaca paper from Stanford is [here.](https://crfm.stanford.edu/2023/03/13/alpaca.html)
A nice [article](https://medium.com/@1kg/cuda-vs-rocm-the-ongoing-battle-for-gpu-computing-supremacy-82eb916fbe18) on the differences between CUDA from Nvidia and ROCm from AMD.