Hello LocalLLaMA, I'd like to share a fine-tuned model I've been working on:

Model: https://huggingface.co/lolzinventor/Qwen3.5-4B-Base-ZitGen-V1

I thought some of you might find it interesting. It is an image-captioning fine-tune optimized for Stable Diffusion prompt generation (i.e., image-to-prompt).

**What Makes This Unique**

What makes this fine-tune unique is that the dataset (images + prompts) was generated entirely by LLMs tasked with regenerating a target image.

**The Process**

The process is as follows:

1. An LLM captions the target image to produce an initial Stable Diffusion prompt.
2. The prompt is rendered with the SD model.
3. The LLM compares the generated image against the target and corrects the prompt.
4. Steps 2 and 3 repeat for several rounds, converging the generated image toward the target.
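The loop described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual pipeline: `caption`, `render`, and `compare` are stand-ins for the VLM captioner, the Stable Diffusion backend, and the VLM comparison step, implemented here as trivial stubs so the control flow is visible.

```python
# Hedged sketch of the closed-loop prompt refinement. All three helpers are
# stubs standing in for real model calls (VLM captioner, SD renderer, VLM
# comparator); only the loop structure reflects the process described above.

def caption(target, feedback=None):
    """Stub VLM: produce an SD prompt, optionally applying a correction."""
    base = f"photo of {target}"
    return base if feedback is None else f"{base}, {feedback}"

def render(prompt):
    """Stub SD model: 'render' an image from the prompt."""
    return f"image[{prompt}]"

def compare(target, image):
    """Stub VLM comparator: return a correction, or None if close enough."""
    return None if "detailed" in image else "more detailed"

def refine_prompt(target, rounds=6):
    """Iteratively adapt the prompt to shrink the target/render mismatch."""
    prompt = caption(target)
    for _ in range(rounds):
        feedback = compare(target, render(prompt))
        if feedback is None:  # generated image matches the target well enough
            break
        prompt = caption(target, feedback)
    return prompt

print(refine_prompt("a red barn"))  # → photo of a red barn, more detailed
```

In the real pipeline each iteration is a model round trip, so the round count (4 to 6 here) trades dataset quality against generation cost.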
**Training Details**

The system employed between 4 and 6 rounds of comparison and correction to generate each prompt-image pair. In theory, this process adapts the prompt to minimize the difference between the target image and the generated image, thereby tailoring the prompt to the specific SD model being used. The prompts were then ranked and filtered to remove occasional LLM errors, such as residuals from the original prompt or undesirable artifacts (e.g., watermarks). Finally, the prompts and images were formatted into the ShareGPT dataset format and used to train Qwen 3.5 4B.

**Dataset**

Given that all the data used to create the fine-tune was generated synthetically, is it free from any copyright issues?
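For reference, the ShareGPT formatting step mentioned above might look something like this. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention; the instruction text and the image-reference style are assumptions, not taken from the actual dataset.

```python
import json

# Hypothetical example of packing one image/prompt pair into a ShareGPT-style
# record. Keys follow the usual ShareGPT schema; the <image> tag and the
# instruction wording are illustrative assumptions.

def to_sharegpt(image_path, prompt):
    return {
        "conversations": [
            {"from": "human",
             "value": f"<image>{image_path}</image>\n"
                      "Describe this image as a Stable Diffusion prompt."},
            {"from": "gpt", "value": prompt},
        ]
    }

record = to_sharegpt("images/0001.png",
                     "a red barn at sunset, golden hour, 35mm")
print(json.dumps(record, indent=2))
```

One record per pair, written as JSON lines, is enough for most ShareGPT-compatible trainers to consume directly.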