%0 Journal Article
%A Barrientos-Espillco, Fredy
%A Pajares Martínsanz, Gonzalo
%A López Orozco, José Antonio
%A Besada Portas, Eva
%T Customization of the text-to-image diffusion model by fine-tuning for the generation of synthetic images of cyanobacterial blooms in lentic water bodies
%D 2025
%@ 0957-4174
%U https://hdl.handle.net/20.500.14352/122408
%X Cyanobacterial blooms emerge unpredictably on the surface of lentic water bodies, posing both ecological threats and public health risks. To effectively monitor these events, this study introduces the use of Machine Vision Systems (MVS) integrated into Autonomous Surface Vehicles (ASVs). These ASVs are capable of autonomous and safe navigation, enabling them to detect cyanobacterial blooms while avoiding obstacles. Convolutional Neural Networks (CNNs) are employed for early detection and continuous monitoring, but their effectiveness hinges on access to large, high-quality training datasets. Due to the sporadic and uncontrollable nature of bloom occurrences, acquiring sufficient real-world images for training and validating CNN models is a significant challenge. To overcome this, the Stable Diffusion XL (SDXL) text-to-image generative model is used to produce realistic synthetic images, ensuring a sufficient dataset for training. However, SDXL alone struggles to accurately depict cyanobacterial blooms. To address this limitation, DreamBooth is used to fine-tune SDXL with a small set of real bloom-specific image patches. To ensure the diversity of the synthetic dataset, detailed prompts for SDXL are generated using a Large Language Model (LLM). The combination of SDXL fine-tuning with LLM-driven prompt design, applied to environmental monitoring and autonomous navigation in lentic environments, represents the core innovation of this work. A dual-task CNN model is then trained on the synthetic dataset to simultaneously detect blooms and obstacles. Experimental results demonstrate the effectiveness and novelty of the proposed approach, showing improvements of up to 15.74% in object detection and 6.48% in semantic segmentation compared to the baseline dataset.
%~