Abstract
The rapid spread of fake content across digital platforms, including text and images, poses significant challenges to the integrity of information. While recent advances in multimodal fake content detection have shown promise, existing models often focus on a single modality or lack scalability and transferability to unseen events. This study addresses these limitations by developing a novel fake content detection model based on the CLIP and BLIP architectures, designed to scale across text and image modalities. The proposed architecture uses a BLIP model to compute image-text similarity and a CLIP model to extract image and text embeddings; a hybrid fusion technique then combines these features. The proposed model was evaluated against existing bimodal approaches and demonstrated superior performance, with an accuracy of 90.63%, precision of 92.04%, recall of 90.73%, and an F1-score of 91.38%. These results highlight the model’s effectiveness in accurately detecting fake content across diverse modalities while maintaining a balance between precision and recall. This research advances multimodal fake content detection by providing a scalable and comprehensive approach, paving the way for future work on combating misinformation across digital media.
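The abstract does not give implementation details, but a minimal sketch of the described pipeline can be wired up in PyTorch with Hugging Face `transformers`: CLIP supplies the image and text embeddings, BLIP's image-text matching (ITM) head supplies the similarity signal, and the features are concatenated for a classifier. The checkpoint names (`openai/clip-vit-base-patch32`, `Salesforce/blip-itm-base-coco`), the concatenation-based fusion, and the classifier head sizes are all illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
from transformers import (
    BlipForImageTextRetrieval,
    BlipProcessor,
    CLIPModel,
    CLIPProcessor,
)

class HybridFusionFakeDetector(nn.Module):
    """Illustrative sketch: CLIP image/text embeddings concatenated with
    BLIP image-text matching (ITM) logits, fed to a small MLP classifier.
    The fusion layout and head sizes are assumptions, not the paper's."""

    def __init__(self, hidden_dim: int = 256):
        super().__init__()
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.blip = BlipForImageTextRetrieval.from_pretrained(
            "Salesforce/blip-itm-base-coco"
        )
        # 512-d CLIP image features + 512-d CLIP text features + 2-d BLIP ITM logits
        self.classifier = nn.Sequential(
            nn.Linear(512 + 512 + 2, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # two classes: real vs. fake
        )

    def forward(self, clip_inputs: dict, blip_inputs: dict) -> torch.Tensor:
        image_emb = self.clip.get_image_features(
            pixel_values=clip_inputs["pixel_values"]
        )
        text_emb = self.clip.get_text_features(
            input_ids=clip_inputs["input_ids"],
            attention_mask=clip_inputs["attention_mask"],
        )
        # BLIP's ITM head scores how well the image and the caption match
        itm_logits = self.blip(**blip_inputs).itm_score
        # Feature-level ("hybrid") fusion by concatenation
        fused = torch.cat([image_emb, text_emb, itm_logits], dim=-1)
        return self.classifier(fused)

# Usage sketch: each backbone's processor prepares its own inputs.
# clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# blip_proc = BlipProcessor.from_pretrained("Salesforce/blip-itm-base-coco")
# clip_inputs = clip_proc(text=[caption], images=image,
#                         return_tensors="pt", padding=True)
# blip_inputs = blip_proc(images=image, text=caption, return_tensors="pt")
# logits = HybridFusionFakeDetector()(clip_inputs, blip_inputs)
```

Concatenation is only one plausible reading of "hybrid fusion"; the paper may combine decision-level and feature-level signals differently.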
Original language | English |
---|---|
Pages | 54-59 |
Number of pages | 6 |
DOIs | |
Publication status | Published - 11 Mar 2025 |
Event | 17th International Conference on Development in eSystem Engineering (DeSE), University of Sharjah, Dubai, United Arab Emirates; 6 Nov 2024 → 8 Nov 2024; https://dese.ai/dese-2024/ |
Conference
Conference | 17th International Conference on Development in eSystem Engineering (DeSE) |
---|---|
Country/Territory | United Arab Emirates |
City | Dubai |
Period | 6/11/24 → 8/11/24 |
Internet address | https://dese.ai/dese-2024/ |
Bibliographical note
Publisher Copyright: © 2024 IEEE.