An efficient method for extracting web news content

J. Sun, Luyang Tang, Dan Liao, Victor Chang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Web news extraction is an important step in intelligent Web information processing. It underpins research on, and applications of, online public opinion monitoring, heterogeneous Web data source integration, and information retrieval, so methods for extracting Web news content have both research and practical value. Drawing on statistics-based and structure-based approaches to Web information extraction, this paper improves an existing webpage text extraction algorithm, ERBDF, and designs a Web news text extraction algorithm based on statistics and DOM tree structure (EETD). The two algorithms are then tested and compared on the accuracy and speed of text extraction, and the results show that EETD has better overall performance.
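
The abstract does not give the details of ERBDF or EETD, so the following is only a minimal sketch of the general idea it names: combining simple text statistics with DOM structure to locate the main news text. It is not the authors' algorithm. The block tag set, the scoring rule (non-link characters per block), and every name in it (`BlockScorer`, `score`, `extract_main_text`, `SAMPLE`) are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate content containers.
BLOCK_TAGS = {"div", "article", "section", "td"}
# Text inside these tags is never page content.
SKIP_TAGS = {"script", "style"}


class BlockScorer(HTMLParser):
    """Collects, for each block element, its text and its link-text length."""

    def __init__(self):
        super().__init__()
        self.stack = []       # block elements that are currently open
        self.blocks = []      # block elements that have been closed
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.link_depth = 0   # > 0 while inside <a>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append({"text": [], "link_chars": 0})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in BLOCK_TAGS and self.stack:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.stack:
            return
        # Text is credited to the innermost open block only.
        block = self.stack[-1]
        block["text"].append(text)
        if self.link_depth:
            block["link_chars"] += len(text)


def score(block):
    """Hypothetical statistic: number of characters that are not link text."""
    chars = sum(len(t) for t in block["text"])
    return chars - block["link_chars"]


def extract_main_text(html):
    """Return the text of the highest-scoring block, or '' if none exists."""
    parser = BlockScorer()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ""
    best = max(parser.blocks, key=score)
    return " ".join(best["text"])


SAMPLE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/world">World</a></div>
<div id="story"><p>The city council approved the new transit plan on Tuesday.</p>
<p>Construction is expected to begin next year, officials said.</p></div>
<div id="footer"><a href="/about">About us</a> Copyright</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE))
```

On the sample page, the navigation and footer blocks consist mostly of link text and score low, so the story block is selected. A practical extractor would likely need further statistics (for example punctuation counts or text-to-tag ratios) and would have to handle articles split across several sibling blocks.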
    Original language: English
    Title of host publication: Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
    ISBN (Electronic): 9781538619490
    Publication status: Published - 8 Mar 2018
    Event: 2017 International Conference on Engineering and Technology - UniversityAntalya, Antalya, Turkey
    Duration: 21 Aug 2017 to 23 Aug 2017

    Conference

    Conference: 2017 International Conference on Engineering and Technology
    Abbreviated title: ICET 2017
    Country/Territory: Turkey
    City: Antalya
    Period: 21/08/17 to 23/08/17
