Are there any realistic avenues to decentralised model training?

It seems like our free lunch is slowly eroding, with hints that some open-source model providers are moving away from releasing as much, and fair enough; but I think we all here value the stability, privacy, and, let's be honest, the cool factor and fun of local models.

What are the big barriers to a community growing a system for decentralised training?

I can see a few off the top of my head:

GPU Brand Mismatch

Nvidia is hands down the best thanks to CUDA, but to utilise decentralised compute you'd likely need a brand-agnostic framework, maybe Vulkan? Though I'm sure Vulkan is terrible for training too.
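To make the brand-mismatch problem concrete, here is a minimal sketch of the kind of backend fallback a community trainer would need. The backend names and the preference order are illustrative assumptions; real code would probe torch.cuda, ROCm/HIP, or a Vulkan compute layer rather than take a set of strings.

```python
# Toy sketch of brand-agnostic backend selection (assumed backend names;
# a real trainer would probe the actual runtimes, not a set of strings).
BACKEND_PREFERENCE = ["cuda", "rocm", "vulkan", "cpu"]

def pick_backend(available):
    """Return the first preferred backend a contributor's machine supports."""
    for backend in BACKEND_PREFERENCE:
        if backend in available:
            return backend
    raise RuntimeError("no usable compute backend")

# An AMD box without ROCm installed would fall through to Vulkan:
print(pick_backend({"vulkan", "cpu"}))  # vulkan
```

The awkward part isn't the selection itself but that each backend needs its own kernels, which is exactly why a volunteer fleet of mixed Nvidia/AMD/Intel cards is hard to use efficiently.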

Data Curation and Quality

We'd need to build our own datasets across a variety of tasks, scrub them for PII, and check quality, which would take experts for each given task. We'd also need somewhere to store that data and a repeatable process covering curation, PII removal, and quality checks.
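The PII-scrubbing step at least has a well-trodden shape. Here's a hedged sketch using a few regex patterns; the patterns below are simplistic assumptions for illustration, and a serious pipeline would layer an NER-based tool (e.g. Microsoft Presidio) on top.

```python
import re

# Minimal PII scrubbing pass. These patterns are illustrative only --
# real pipelines combine regexes with NER models to catch names, addresses, etc.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\b(?:\+?\d{1,3}[ -]?)?(?:\(\d{2,4}\)[ -]?)?\d{3,4}[ -]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(scrub_pii("Contact jane.doe@example.com or 555-123-4567"))
# Contact [EMAIL] or [PHONE]
```

The harder, unsolved part is the expert quality review per task; that doesn't compress into code.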

Decentralised Compute Usage

Assuming we can solve the two above, we'd then need to train in high-latency, small-compute environments, checkpoint constantly, and live with the lack of ECC memory. I can't even imagine how we'd slice the work up and deal with GPUs whose uptime is inconsistent.
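The work-slicing-with-flaky-uptime problem can be sketched as a coordinator that hands out shards and simply re-queues anything whose worker drops mid-shard. Everything here is a toy assumption (three named workers, a fixed dropout probability); real systems like Hivemind or DiLoCo-style training handle this with DHTs and asynchronous optimisers.

```python
import random

def run_round(shards, workers, offline_prob, rng):
    """Assign shards round-robin; return completed shards and ones to retry."""
    done, retry = [], []
    for i, shard in enumerate(shards):
        worker = workers[i % len(workers)]
        if rng.random() < offline_prob:  # worker dropped mid-shard
            retry.append(shard)
        else:
            done.append((worker, shard))
    return done, retry

rng = random.Random(0)                   # seeded so the toy run is repeatable
pending = list(range(8))                 # 8 work shards for this training step
completed = []
while pending:                           # keep re-queuing until the step closes
    done, pending = run_round(pending, ["gpu-a", "gpu-b", "gpu-c"], 0.3, rng)
    completed.extend(done)
print(f"step complete, {len(completed)} shards done")
```

Even this toy shows the cost: a step can't close until its slowest/flakiest shard lands, which is where high latency really bites.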

Defining what types of models to build

You'll have super users wanting 400B+, which seems right as a baseline to distill from, but the community might be heavily torn across the 30B-200B range over what they actually want built.

Getting people who actually know how to train.


All this seems like a lot, but I think it should be discussed more, because we can't expect our free lunch to last forever, and it's worth finding out whether there's even a chance of a community-driven path to this.

Any thoughts? I'm sure I've missed plenty more issues and challenges, or misunderstood some.

submitted by /u/ROS_SDN
