Hi r/LocalLLaMA! I built a 5M-parameter Llama model with HF Transformers on 2x T4s in Kaggle to see whether it could get as good as my previous Apex 350M model (https://huggingface.co/LH-Tech-AI/Apex-1.6-Instruct-350M). Link to the research site: https://lh-tech.de/ai/sub-5m-research.html

It turns out that if you optimize the model enough and train it on a lot of data, it can get nearly as good as a model 70 times heavier (like Apex 350M, which uses the GPT-2 architecture). There's a rough config sketch at the end of this post to give a sense of the scale.

Tell me what you think about it! Spark v5 coming soon... Expect it to be good 😃
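For anyone curious what ~5M parameters looks like in Transformers, here's a minimal sketch of one way to define a Llama model at that scale. These hyperparameters are illustrative guesses, not the exact config I used:

```python
# Hypothetical ~5M-parameter Llama config; all values below are
# illustrative assumptions, not the actual model's hyperparameters.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=8192,               # small tokenizer vocab (assumption)
    hidden_size=256,
    intermediate_size=512,
    num_hidden_layers=5,
    num_attention_heads=4,
    num_key_value_heads=2,         # grouped-query attention trims KV params
    max_position_embeddings=512,
    tie_word_embeddings=True,      # share the input/output embedding matrix
)

model = LlamaForCausalLM(config)   # randomly initialized, ready to train
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # ~5M
```

At this size the tied embedding matrix alone is ~2M of the budget, so a small vocab and grouped-query attention are the main levers for staying under 5M.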