Research on Web Page Classification Based on Self-Organizing Feature Maps

CLASSIFICATION OF WEB PAGES BASED ON SOFM

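  • Summary: This paper proposes a classification algorithm that combines SOFM (self-organizing feature map) with LVQ (learning vector quantization). It uses a new web page representation to form feature vectors, which are then applied to web page classification. The method takes full advantage of the self-organizing property of SOFM, while using LVQ to resolve the overlap of test samples that arises in clustering. Experiments show that it not only trains efficiently but also achieves good recall and precision.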

     

    Abstract: Web page classification is the problem of automatically assigning electronic text documents to pre-specified categories. In this paper, we focus on the SOFM algorithm applied to features that are derived automatically using a technique based on title frequencies and keyword frequencies, and we investigate the effect of adding such features on text classification performance. Our investigation into keywords, selected on the basis of frequency, confirms that adding keywords does give better accuracy and, moreover, that the larger the proportion of keyword features added, the larger the gain. We adopt an unsupervised SOFM network to classify the web pages approximately. After that, a modified LVQ method is used to classify clearly the overlapping area of each class. The results show that the approach is quite promising.
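    The two-stage idea described above (a coarse unsupervised SOFM pass followed by a supervised LVQ refinement) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the modified LVQ variant, so plain LVQ1 is used instead, and the 1-D map, the learning-rate and neighbourhood schedules, the two toy features (standing in for title and keyword frequencies), and the category names "sports" and "news" are all assumptions made for illustration.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(weights, x):
    """Index of the best-matching unit (closest prototype) for x."""
    return min(range(len(weights)), key=lambda i: dist2(weights[i], x))

def train_sofm(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Unsupervised stage: 1-D SOFM with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    sigma0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                # assumed linear decay
        sigma = sigma0 * (1.0 - frac) + 0.5    # assumed schedule, never zero
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            win = bmu(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - win) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

def label_units(weights, data, labels):
    """Give each unit the majority label of the samples it wins (None if none)."""
    votes = [{} for _ in weights]
    for x, y in zip(data, labels):
        v = votes[bmu(weights, x)]
        v[y] = v.get(y, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]

def refine_lvq1(weights, unit_labels, data, labels, epochs=20, lr0=0.1):
    """Supervised stage: plain LVQ1 (attract on a match, repel on a mismatch)."""
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        for x, y in zip(data, labels):
            win = bmu(weights, x)
            if unit_labels[win] is None:
                continue  # unlabelled units are left untouched
            sign = 1.0 if unit_labels[win] == y else -1.0
            w = weights[win]
            for k in range(len(w)):
                w[k] += sign * lr * (x[k] - w[k])
    return weights

def classify(weights, unit_labels, x):
    """Label of the nearest labelled prototype."""
    labelled = [i for i, lab in enumerate(unit_labels) if lab is not None]
    best = min(labelled, key=lambda i: dist2(weights[i], x))
    return unit_labels[best]

# Toy data: two hypothetical features per page, two hypothetical categories.
rng = random.Random(1)
sports = [[rng.gauss(0.2, 0.05), rng.gauss(0.8, 0.05)] for _ in range(20)]
news = [[rng.gauss(0.8, 0.05), rng.gauss(0.2, 0.05)] for _ in range(20)]
data = sports + news
labels = ["sports"] * 20 + ["news"] * 20

weights = train_sofm(data, n_units=4)
unit_labels = label_units(weights, data, labels)
refine_lvq1(weights, unit_labels, data, labels)
print(classify(weights, unit_labels, [0.2, 0.8]))
print(classify(weights, unit_labels, [0.8, 0.2]))
```

    In this sketch the SOFM stage needs no labels; labels enter only when units are named by majority vote and when LVQ1 nudges prototypes near class boundaries, which is where overlapping samples would otherwise be misassigned.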

     
