Last modified on March 3, 2015, at 16:36

A Persian web page classifier applying a combination of content-based and context-based features

Authors: Mojgan Farhoodi, Alireza Yari, Maryam Mahmoudi
Citation: International Journal of Information Studies 1 (4): 263-71. 2009 October.
Publication type: Journal article
Peer-reviewed: Yes
Added by Wikilit team: Added on initial load
A Persian web page classifier applying a combination of content-based and context-based features is a publication by Mojgan Farhoodi, Alireza Yari, Maryam Mahmoudi.


Abstract

Many automatic classification methods and algorithms have been proposed that use content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of them to improve the accuracy of Persian web page classification. We suggest a linear combination of different features, with the optimum weighting adjusted during application. To show the outcome of this approach, we conducted various experiments on a dataset consisting of all Persian Wikipedia pages in the field of computing. These experiments demonstrate the usefulness of combining content-based and context-based web page features in a linear weighted combination.
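
The linear weighted combination mentioned in the abstract can be illustrated with a minimal Python sketch. This record does not list the paper's actual features, scoring functions, or weights, so the feature names (`body_text`, `title`, `anchor_text`), the per-category scores, and the weight values below are all hypothetical and serve only to show the general shape of the idea.

```python
from typing import Dict

# Hypothetical per-feature scores for one page: feature -> {category: score}.
# The feature names are illustrative stand-ins, not the paper's feature list.
Scores = Dict[str, Dict[str, float]]


def combine_scores(feature_scores: Scores, weights: Dict[str, float]) -> Dict[str, float]:
    """Return, for each category, the weighted sum of the per-feature scores."""
    combined: Dict[str, float] = {}
    for feature, per_category in feature_scores.items():
        w = weights.get(feature, 0.0)  # features without a weight contribute nothing
        for category, score in per_category.items():
            combined[category] = combined.get(category, 0.0) + w * score
    return combined


def classify(feature_scores: Scores, weights: Dict[str, float]) -> str:
    """Pick the category with the highest combined score."""
    combined = combine_scores(feature_scores, weights)
    return max(combined, key=combined.get)


# Made-up scores for a single page and made-up weights (assumptions).
page = {
    "body_text": {"hardware": 0.6, "software": 0.4},    # content-based
    "title": {"hardware": 0.2, "software": 0.8},        # content-based
    "anchor_text": {"hardware": 0.3, "software": 0.7},  # context-based
}
weights = {"body_text": 0.5, "title": 0.3, "anchor_text": 0.2}

predicted = classify(page, weights)
```

In this sketch the body text alone favors one category, while the weighted sum over all three features can favor another, which is the kind of effect a combined classifier relies on.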

Research questions

"There are many automatic classification methods and algorithms that have been propose for content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of features to improve categorization accuracy of Persian web page classification. In this work we have suggested a linear combination of different features and adjusting the optimum weighing during application."

Research details

Topics: Text classification
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: Persian

Conclusion

"We have proposed a method of classifying the Persian web page documents by linear combination of different features and adjusting the optimum weighting during classification. The results achieved with the current approach are quite encouraging. In most cases, the algorithm was able to categorize each page in the most appropriate category. The few exceptions appeared due to limitations of the linguistic tools we used for extracting the words."
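
The record does not say how the authors adjusted the weighting. As an illustration only, and not as the paper's procedure, one simple option is a grid search over weights that sum to 1, scored by accuracy on labeled pages. The feature names `content` and `context`, the scores, and the labels below are invented for this sketch.

```python
from itertools import product


def predict(page, weights):
    """Return the category with the highest weighted sum of feature scores."""
    categories = next(iter(page.values())).keys()
    return max(categories, key=lambda c: sum(weights[f] * page[f][c] for f in page))


def accuracy(pages, labels, weights):
    """Fraction of pages whose predicted category matches the label."""
    hits = sum(predict(p, weights) == y for p, y in zip(pages, labels))
    return hits / len(pages)


def grid_search(pages, labels, features, steps=10):
    """Try every weight vector on a grid with spacing 1/steps that sums to 1."""
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(features)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = {f: k / steps for f, k in zip(features, combo)}
        acc = accuracy(pages, labels, weights)
        if acc > best_acc:  # the first best-scoring vector wins ties
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Made-up validation pages (assumption): feature -> {category: score}.
pages = [
    {"content": {"a": 0.9, "b": 0.1}, "context": {"a": 0.4, "b": 0.6}},
    {"content": {"a": 0.6, "b": 0.4}, "context": {"a": 0.1, "b": 0.9}},
]
labels = ["a", "b"]

best_weights, best_acc = grid_search(pages, labels, ["content", "context"])
```

In practice the weights would be tuned on held-out pages rather than on the pages used to report accuracy, since selecting weights on the evaluation set overstates the result.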

Comments

"Experiment: method: linear combination of different features and adjusting the optimum weighting during classification."

