Dataset of 150k+ stool images and not sure how to fully use it [D]

I have a dataset of around 150k stool images, and I’m trying to better understand the “right” way to use it for training a computer vision model.

Right now, our process is pretty manual. We initially trained on about 5k images that were individually verified by a human. For every image, we checked/corrected the Bristol type, consistency, color, mucus/blood indicators, etc. Then we trained the model on those verified annotations.

As we continue training, we keep doing the same thing: manually reviewing and correcting images before feeding them back into the model.
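To make the loop concrete, here is a minimal sketch of how I think about it. Everything in it is a placeholder, not our actual code: the `Annotation` fields, the `review` and `training_set` helpers, and the example values are all hypothetical names chosen to mirror the attributes listed above. The only point is that a record reaches training only after a human has checked it.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Annotation:
    """One image's labels (hypothetical schema, for illustration only)."""
    image_id: str
    bristol_type: int       # Bristol stool scale, 1-7
    consistency: str
    color: str
    mucus: bool
    blood: bool
    verified: bool = False  # True only after a human has checked the record


def review(records: Iterable[Annotation],
           reviewer: Callable[[Annotation], Annotation]) -> List[Annotation]:
    """Pass each record through a human reviewer and mark the result verified."""
    return [replace(reviewer(r), verified=True) for r in records]


def training_set(records: Iterable[Annotation]) -> List[Annotation]:
    """Keep only records that a human has verified."""
    return [r for r in records if r.verified]


def example_reviewer(a: Annotation) -> Annotation:
    # Stand-in for the human step: corrects one made-up mislabel.
    if a.image_id == "img_002":
        return replace(a, bristol_type=4)
    return a


# Made-up records standing in for labels proposed before review.
proposed = [
    Annotation("img_001", 3, "firm", "brown", False, False),
    Annotation("img_002", 6, "soft", "brown", False, False),
]
verified = review(proposed, example_reviewer)

# An unreviewed record is filtered out before training.
unreviewed = Annotation("img_003", 5, "soft", "yellow", True, False)
pool = training_set(verified + [unreviewed])
```

Here `pool` ends up holding only the two reviewed records, which is the manual gate we currently apply to every image.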

My question is basically: does this workflow make sense from an ML perspective? Is this how people normally approach building a solid vision dataset/model, especially in a domain where annotation quality matters a lot? Or is there a smarter/more scalable approach people usually move toward once they have a large dataset?

I’m mainly trying to understand best practices around dataset quality, human verification, iterative training, and scaling annotation without introducing bad labels.

submitted by /u/SamePersonality5183