Hi r/LocalLLaMA! I built a 5M-parameter Llama model with HF Transformers on 2x T4s in Kaggle to see whether it could get as good as my previous Apex 350M model (https://huggingface.co/LH-Tech-AI/Apex-1.6-Instruct-350M). Link to the research site: https://lh-tech.de/ai/sub-5m-research.html

It turns out that if you optimize the model enough and train it on a lot of data, it can get nearly as good as a model 70 times heavier (like Apex 350M, which uses the GPT-2 architecture). There's a rough config sketch at the end of this post to give a sense of the scale.

Tell me what you think about it! Spark v5 coming soon... Expect it to be good 😃
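For anyone curious what ~5M parameters looks like in Transformers, here's a minimal sketch of one way to define a Llama model at that scale. These hyperparameters are illustrative guesses, not the exact config I used:

```python
# Hypothetical ~5M-parameter Llama config; all values below are
# illustrative assumptions, not the actual model's hyperparameters.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=8192,               # small tokenizer vocab (assumption)
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=5,
    num_attention_heads=4,
    num_key_value_heads=2,         # grouped-query attention trims KV params
    max_position_embeddings=512,
    tie_word_embeddings=True,      # share the input/output embedding matrix
)

model = LlamaForCausalLM(config)   # randomly initialized, ready to train
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # ~5M
```

At this size the tied embedding matrix alone is ~2M of the budget, so a small vocab and grouped-query attention are the main levers for staying under 5M.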