CLASSIFICATION OF WEB PAGES BASED ON SOFM
Abstract (translated from Chinese): This paper proposes a classification algorithm that combines SOFM (self-organizing feature map) with LVQ (learning vector quantization). A new web page representation is used to form feature vectors, which are then applied to web page classification. The method takes full advantage of the self-organizing property of SOFM and uses LVQ to resolve the overlap between test samples that arises in clustering. Experiments show that it trains efficiently and also achieves good recall and precision.
Abstract: Web classification is the problem of automatically assigning electronic text documents to pre-specified categories. In this paper, we focus on an SOFM algorithm whose input features are derived automatically using a technique based on the frequencies of titles and the frequencies of keywords, and we investigate the effect of adding such features on text classification performance. Our investigation of keywords selected on the basis of frequency confirms that adding keywords gives better accuracy and, moreover, that the larger the proportion of keyword features added, the larger the gain. We adopt an unsupervised SOFM network to classify the web pages approximately. After that, a modified LVQ method is used to classify the overlap area of each class clearly. The results are quite promising.
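The two-stage scheme described in the abstract (an unsupervised SOFM pass followed by an LVQ pass over the labelled samples) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the 1-D map, the learning-rate and neighbourhood schedules, the initialisation from evenly spaced samples, the toy two-dimensional feature vectors, and the class names. The abstract mentions a modified LVQ method without describing it, so plain LVQ1 is used in its place.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(weights, x):
    """Index of the best-matching unit (BMU) for sample x."""
    return min(range(len(weights)), key=lambda i: dist2(weights[i], x))


def train_sofm(data, n_units=4, epochs=40, lr0=0.5, seed=0):
    """Unsupervised stage: 1-D SOFM with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    step = len(data) / n_units
    # Assumption: initialise units from evenly spaced training samples.
    weights = [list(data[int(i * step)]) for i in range(n_units)]
    sigma0 = n_units / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood width, floored at 0.5
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            b = nearest(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - b) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def label_units(weights, data, labels):
    """Label each unit by majority vote of the samples it wins; drop unused units."""
    votes = [{} for _ in weights]
    for x, y in zip(data, labels):
        v = votes[nearest(weights, x)]
        v[y] = v.get(y, 0) + 1
    kept = [(w, max(v, key=v.get)) for w, v in zip(weights, votes) if v]
    return [w for w, _ in kept], [lab for _, lab in kept]


def refine_lvq1(weights, unit_labels, data, labels, epochs=20, lr0=0.1):
    """Supervised stage: plain LVQ1 (attract on a label match, repel otherwise)."""
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        for x, y in zip(data, labels):
            b = nearest(weights, x)
            sign = 1.0 if unit_labels[b] == y else -1.0
            for k in range(len(x)):
                weights[b][k] += sign * lr * (x[k] - weights[b][k])
    return weights


def classify(weights, unit_labels, x):
    """Assign x the label of its nearest labelled unit."""
    return unit_labels[nearest(weights, x)]


# Made-up feature vectors: normalised frequencies of two hypothetical keywords.
data = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.1], [0.95, 0.15],
        [0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.1, 0.95]]
labels = ["sports"] * 4 + ["finance"] * 4

weights = train_sofm(data)
weights, unit_labels = label_units(weights, data, labels)
weights = refine_lvq1(weights, unit_labels, data, labels)
print(classify(weights, unit_labels, [0.9, 0.1]),
      classify(weights, unit_labels, [0.1, 0.9]))
```

In this sketch the SOFM stage only groups pages roughly. The labelled LVQ1 pass then moves each unit toward samples of its own class and away from samples of other classes, which is one common way to sharpen boundaries where clusters overlap.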
Keywords:
- classification
- self-organized feature map
- feature extraction
- neural network