TY - GEN
T1 - Semantic Plagiarism Detection of Figures in Scholarly Documents
T2 - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024
AU - Batool, Hafsa
AU - Islam, Syed M.S.
AU - Janjua, Naeem
PY - 2024
Y1 - 2024
N2 - Scholarly full-text documents on machine learning typically include numerous result figures that convey valuable information, such as experimental outcomes, assessments, and comparisons between models. However, research work often carries a great risk of plagiarism, which can affect figures as well as text. The existing literature largely explores plagiarism in text, that is, any degree of similarity between the texts of scholarly documents, and thus ignores figures. This study builds on the previous literature and brings new insights by proposing a conceptual framework for a system that detects plagiarism in the result figures of scholarly documents. The framework would generate semantically enriched summaries specific to result figures by extracting relevant information from the figures themselves, including the area under the curve (AUC), as well as from their associated captions in the full-text documents. To accomplish this, the study proposes to classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model such as ResNet50 that is pre-trained on 1.2 million images from ImageNet. The specialized candidate figure summaries would then be evaluated against the specialized actual figure summaries using Jaccard similarity and edit distance metrics, thus addressing the challenging task of detecting plagiarism of figures.
AB - Scholarly full-text documents on machine learning typically include numerous result figures that convey valuable information, such as experimental outcomes, assessments, and comparisons between models. However, research work often carries a great risk of plagiarism, which can affect figures as well as text. The existing literature largely explores plagiarism in text, that is, any degree of similarity between the texts of scholarly documents, and thus ignores figures. This study builds on the previous literature and brings new insights by proposing a conceptual framework for a system that detects plagiarism in the result figures of scholarly documents. The framework would generate semantically enriched summaries specific to result figures by extracting relevant information from the figures themselves, including the area under the curve (AUC), as well as from their associated captions in the full-text documents. To accomplish this, the study proposes to classify the extracted figures and analyze them by parsing the figure text, legends, and data plots, using a convolutional neural network classification model such as ResNet50 that is pre-trained on 1.2 million images from ImageNet. The specialized candidate figure summaries would then be evaluated against the specialized actual figure summaries using Jaccard similarity and edit distance metrics, thus addressing the challenging task of detecting plagiarism of figures.
KW - Figure based plagiarism
KW - machine learning
KW - result figure parsing
KW - specialized summaries
KW - text mining
UR - http://www.scopus.com/inward/record.url?scp=85219749634&partnerID=8YFLogxK
U2 - 10.1109/FMLDS63805.2024.00023
DO - 10.1109/FMLDS63805.2024.00023
M3 - Conference contribution
AN - SCOPUS:85219749634
T3 - Proceedings - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024
SP - 69
EP - 74
BT - Proceedings - 2024 IEEE International Conference on Future Machine Learning and Data Science, FMLDS 2024
A2 - Al-Jumaily, Adel
A2 - Islam, Md Rafiqul
A2 - Islam, Syed Mohammad Shamsul
A2 - Bashar, Md Rezaul
PB - Institute of Electrical and Electronics Engineers
Y2 - 20 November 2024 through 23 November 2024
ER -