TY - GEN
T1 - Detection of change frequency in web pages to optimize server-based scheduling
AU - Meegahapola, Lakmal
AU - Alwis, Roshan
AU - Nimalarathna, Eranga
AU - Mallawaarachchi, Vijini
AU - Meedeniya, Dulani
AU - Jayarathna, Sampath
PY - 2018/1/15
Y1 - 2018/1/15
N2 - The Internet has become vast and dynamic with the ever-increasing number of web pages. These web pages change when more content is added to them. With the availability of change detection and notification systems, keeping track of the changes occurring in web pages has become simpler and more straightforward. However, most of these change detection and notification systems work on predefined crawling schedules with static time intervals. This can be inefficient if no relevant changes are being made to the web pages, wasting both time and computational resources. Conversely, if the web pages are not crawled frequently enough, some important changes may be missed and there may be delays in notifying subscribed users. This paper proposes a methodology to detect the frequency of change in web pages in order to optimize server-side scheduling of change detection and notification systems. The proposed method is based on a dynamic detection process in which the crawling schedule is adjusted according to the detected change frequency, resulting in a more efficient server-based scheduler for detecting changes in web pages.
AB - The Internet has become vast and dynamic with the ever-increasing number of web pages. These web pages change when more content is added to them. With the availability of change detection and notification systems, keeping track of the changes occurring in web pages has become simpler and more straightforward. However, most of these change detection and notification systems work on predefined crawling schedules with static time intervals. This can be inefficient if no relevant changes are being made to the web pages, wasting both time and computational resources. Conversely, if the web pages are not crawled frequently enough, some important changes may be missed and there may be delays in notifying subscribed users. This paper proposes a methodology to detect the frequency of change in web pages in order to optimize server-side scheduling of change detection and notification systems. The proposed method is based on a dynamic detection process in which the crawling schedule is adjusted according to the detected change frequency, resulting in a more efficient server-based scheduler for detecting changes in web pages.
KW - Change detection and notification systems
KW - Change frequency
KW - Crawling
KW - Internet
KW - Server-side scheduling
KW - Web page
UR - http://www.scopus.com/inward/record.url?scp=85048515251&partnerID=8YFLogxK
U2 - 10.1109/ICTER.2017.8257791
DO - 10.1109/ICTER.2017.8257791
M3 - Conference contribution
AN - SCOPUS:85048515251
T3 - 17th International Conference on Advances in ICT for Emerging Regions, ICTer 2017 - Proceedings
SP - 165
EP - 171
BT - 17th International Conference on Advances in ICT for Emerging Regions, ICTer 2017 - Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 17th International Conference on Advances in ICT for Emerging Regions, ICTer 2017
Y2 - 7 September 2017 through 8 September 2017
ER -