Customization of the text-to-image diffusion model by fine-tuning for the generation of synthetic images of cyanobacterial blooms in lentic water bodies
Publication date
2025
Publisher
Elsevier
Citation
Barrientos-Espillco, F., Pajares, G., López-Orozco, J. A., & Besada-Portas, E. (2025). Customization of the text-to-image diffusion model by fine-tuning for the generation of synthetic images of cyanobacterial blooms in lentic water bodies. Expert Systems with Applications, 287, 128169.
Abstract
Cyanobacterial blooms emerge unpredictably on the surface of lentic water bodies, posing both ecological threats and public health risks. To effectively monitor these events, this study introduces the use of Machine Vision Systems (MVS) integrated into Autonomous Surface Vehicles (ASVs). These ASVs are capable of autonomous and safe navigation, enabling them to detect cyanobacterial blooms while avoiding obstacles. Convolutional Neural Networks (CNNs) are employed for early detection and continuous monitoring, but their effectiveness hinges on access to large, high-quality training datasets. Due to the sporadic and uncontrollable nature of bloom occurrences, acquiring sufficient real-world images for training and validating CNN models is a significant challenge. To overcome this, the Stable Diffusion XL (SDXL) text-to-image generative model is utilized to produce realistic synthetic images, ensuring a sufficient dataset for training. However, SDXL alone struggles to accurately depict cyanobacterial blooms. To address this limitation, DreamBooth is used to fine-tune SDXL with a small set of real bloom-specific image patches. To ensure the diversity of the synthetic dataset, detailed prompts for SDXL are generated using a Large Language Model (LLM). The combination of SDXL fine-tuning with LLM-driven prompt design, applied to environmental monitoring and autonomous navigation in lentic environments, represents the core innovation of this work. A dual-task CNN model is then trained on the synthetic dataset to simultaneously detect blooms and obstacles. Experimental results demonstrate the effectiveness and novelty of the proposed approach, showing improvements of up to 15.74% in object detection and 6.48% in semantic segmentation compared to the baseline dataset.
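The generation step summarized above can be illustrated with a minimal sketch, assuming the Hugging Face diffusers library and a DreamBooth/LoRA fine-tuned SDXL checkpoint; the base model name, local weight path, rare-token identifier ("sks"), and prompt wording are illustrative assumptions, not the configuration reported in the paper. In the actual pipeline the prompts would be produced by an LLM rather than hard-coded.

import torch
from diffusers import StableDiffusionXLPipeline

# Load the public SDXL base model in half precision on the GPU.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Attach DreamBooth/LoRA weights obtained by fine-tuning on a small set of
# real bloom image patches (hypothetical local path).
pipe.load_lora_weights("./sdxl-dreambooth-cyanobloom-lora")

# In the paper's pipeline, diverse prompts are generated by an LLM; two fixed
# examples are used here for illustration only.
prompts = [
    "a photo of sks cyanobacterial bloom on the surface of a calm lake, aerial view",
    "a photo of sks cyanobacterial bloom near a wooden pier, overcast sky",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save(f"synthetic_bloom_{i:03d}.png")

Images produced this way would then be annotated and combined with the baseline data to train the dual-task CNN for simultaneous bloom detection and obstacle segmentation.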
Description
This work has been supported by the Research Projects IA-GES-BLOOM-CM (Y2020/TCS-6420) of the Synergic program of the Comunidad Autónoma de Madrid, SMART-BLOOMS (TED2021-130123B-I00) funded by the Spanish Ministry of Science and Innovation and the European Union NextGenerationEU, and INSERTION (PID2021-127648OB-C33) of the Knowledge Generation Programs of the Spanish Ministry of Science and Innovation. The first author, Fredy Barrientos-Espillco, is supported by a scholarship from PRONABEC, Ministry of Education of Peru.













