A Persian web page classifier applying a combination of content-based and context-based features

Authors: Mojgan Farhoodi, Alireza Yari, Maryam Mahmoudi
Citation: International Journal of Information Studies 1 (4): 263-71. 2009 October.
Publication type: Journal article
Peer-reviewed: Yes
A Persian web page classifier applying a combination of content-based and context-based features is a publication by Mojgan Farhoodi, Alireza Yari, Maryam Mahmoudi.


Abstract

Many automatic classification methods and algorithms have been proposed that use content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of them to improve the accuracy of Persian web page classification. We suggest a linear combination of different features, with the optimum weighting adjusted during application. To show the outcome of this approach, we conducted various experiments on a dataset consisting of all Persian Wikipedia pages in the computer field. These experiments demonstrate the usefulness of using content-based and context-based web page features in a linear weighted combination.

Research questions

"There are many automatic classification methods and algorithms that have been proposed for content-based or context-based features of web pages. In this paper we analyze these features and try to exploit a combination of features to improve categorization accuracy of Persian web page classification. In this work we have suggested a linear combination of different features and adjusting the optimum weighting during application."

Research details

Topics: Text classification
Domains: Computer science
Theory type: Design and action
Wikipedia coverage: Sample data
Theories: "Undetermined"
Research design: Experiment
Data source: Experiment responses, Wikipedia pages
Collected data time dimension: Cross-sectional
Unit of analysis: Article
Wikipedia data extraction: Live Wikipedia
Wikipedia page type: Article
Wikipedia language: Persian

Conclusion

"We have proposed a method of classifying the Persian web page documents by linear combination of different features and adjusting the optimum weighting during classification. The results achieved with the current approach are quite encouraging. In most cases, the algorithm was able to categorize each page in the most appropriate category. The few exceptions appeared due to limitations of the linguistic tools we used for extracting the words."

Comments

"Experiment: method: linear combination of different features and adjusting the optimum weighting during classification."
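
The method named in the comment above, a linear weighted combination of feature scores, can be illustrated with a minimal Python sketch. This record does not say which features the authors used or how they tuned the weights, so everything concrete here is an assumption made for illustration: the feature names `body_text` (content-based) and `anchor_text` (context-based), the category labels, the example scores, and the coarse grid search over weights that sum to 1.

```python
from itertools import product


def combine_scores(feature_scores, weights):
    """Linearly combine per-feature category scores.

    feature_scores: {feature_name: {category: score}}
    weights:        {feature_name: weight}
    """
    combined = {}
    for feature, scores in feature_scores.items():
        w = weights.get(feature, 0.0)
        for category, score in scores.items():
            combined[category] = combined.get(category, 0.0) + w * score
    return combined


def classify(feature_scores, weights):
    """Return the category with the highest combined score."""
    combined = combine_scores(feature_scores, weights)
    return max(combined, key=combined.get)


def tune_weights(labeled_pages, feature_names, step=0.25):
    """Grid search (an assumed tuning strategy) over weights summing to 1.

    labeled_pages: list of (feature_scores, true_category) pairs.
    Returns the first best-scoring weight setting and its accuracy.
    """
    steps = int(round(1 / step))
    best_weights, best_acc = None, -1.0
    for combo in product(range(steps + 1), repeat=len(feature_names)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = {name: c * step for name, c in zip(feature_names, combo)}
        correct = sum(classify(fs, weights) == label for fs, label in labeled_pages)
        acc = correct / len(labeled_pages)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Hypothetical scores for two pages; not data from the paper.
pages = [
    ({"body_text": {"hardware": 0.6, "software": 0.4},
      "anchor_text": {"hardware": 0.2, "software": 0.8}}, "software"),
    ({"body_text": {"hardware": 0.7, "software": 0.3},
      "anchor_text": {"hardware": 0.55, "software": 0.45}}, "hardware"),
]

equal = {"body_text": 0.5, "anchor_text": 0.5}
predictions = [classify(fs, equal) for fs, _ in pages]
best_weights, best_acc = tune_weights(pages, ["body_text", "anchor_text"])
```

In this toy data the content-based score alone would misclassify the first page, while the equally weighted combination labels both pages correctly, which is the kind of effect the abstract attributes to combining the two feature types.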


Further notes

URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5273915&tag=1