Abstract
This paper proposes a novel algorithm, called CAPTION, for identifying and correcting errors in automatically generated image captions. The algorithm combines Deep Learning (DL) for object detection in images with Natural Language Processing (NLP) techniques. CAPTION was tested on three tasks: (1) classifying a caption as correct or incorrect; (2) detecting wrong words in the caption; and (3) suggesting text corrections. Results show that our method outperforms the other methods evaluated on the same data set in the error correction task. Those methods are generally based exclusively on DL models. This work shows that, although semantics has not yet been exploited to its fullest in this type of task, combining DL with NLP tools yields better overall performance than using DL methods alone.
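To make the three tasks concrete, the following is a minimal toy sketch and not the CAPTION algorithm itself, whose internals the abstract does not describe. It assumes an object detector has already produced a set of labels for the image; the names `KNOWN_OBJECTS`, `Review`, and `review_caption` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy vocabulary of object nouns that a detector could emit.
KNOWN_OBJECTS = {"dog", "cat", "ball", "frisbee", "person", "car"}


@dataclass
class Review:
    is_correct: bool        # task 1: is the caption correct?
    wrong_words: list[str]  # task 2: which words are wrong?
    suggestion: str         # task 3: proposed corrected caption


def review_caption(caption: str, detected: set[str]) -> Review:
    """Compare object words in a caption with (assumed) detector labels."""
    tokens = caption.lower().split()
    # Object words mentioned in the caption but not detected in the image.
    wrong = [t for t in tokens if t in KNOWN_OBJECTS and t not in detected]
    # Detected objects the caption does not mention: replacement candidates.
    unused = sorted(detected - set(tokens))
    corrected = []
    for t in tokens:
        if t in wrong and unused:
            corrected.append(unused.pop(0))  # naive substitution
        else:
            corrected.append(t)
    return Review(not wrong, wrong, " ".join(corrected))


result = review_caption("a dog chasing a ball", {"dog", "frisbee"})
print(result)
```

The sketch matches words by exact string comparison and ignores punctuation and word meaning. A realistic system would need proper NLP tooling for tokenization and semantic similarity, which is the kind of combination the abstract argues for.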
Original language | English |
---|---|
Article number | 390 |
Number of pages | 16 |
Journal | SN Computer Science |
Volume | 3 |
Publication status | Published - 23 Jul 2022 |
Keywords
- Computer vision
- Image captioning
- Machine learning
- NLP