Classification of Web Pages Based on SOFM

ZHANG Yi-zhong, ZHAO Ming-sheng, LIANG Jiu-zhen

Citation: ZHANG Yi-zhong, ZHAO Ming-sheng, LIANG Jiu-zhen. Classification of Web Pages Based on SOFM[J]. Information and Control, 2003, 32(2): 108-112,117.

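Funding: Tsinghua University 985 Project; National Natural Science Foundation of China (60003014)
    About the authors:

    ZHANG Yi-zhong (1972- ), male, Ph.D. Research areas: development of and research on computer information networks, artificial intelligence, and Internet technology.
    ZHAO Ming-sheng (1968- ), male, Ph.D. Research area: signal and information processing.
    LIANG Jiu-zhen (1968- ), male, Ph.D. Research area: signal and information processing.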

  • Chinese Library Classification (CLC) number: TP393

  • Abstract (translated from the Chinese): This paper proposes a classification algorithm that combines SOFM (self-organizing feature map) with LVQ (learning vector quantization). It uses a new web page representation to form feature vectors and applies them to web page classification. The method takes full advantage of the self-organizing property of SOFM, and it uses LVQ to resolve the overlap of test samples between clusters. Experiments show that it trains efficiently and also achieves relatively good recall and precision.
    Abstract (published English version): Web page classification is the problem of automatically assigning electronic text documents to pre-specified categories. In this paper, we focus on an SOFM algorithm whose input features are derived automatically using a technique based on title frequencies and keyword frequencies, and we investigate the effect of adding these features on text classification performance. Our investigation of keywords selected on the basis of frequency confirms that adding keywords gives better accuracy and, moreover, that the larger the proportion of keyword features added, the larger the gain. We adopt an unsupervised SOFM network to classify the web pages approximately. After that, a modified LVQ method is used to clearly classify the overlap area of each class. The results show that the approach is quite promising.
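The two-stage idea in the abstract (a coarse unsupervised SOFM pass, then a supervised LVQ pass that sharpens the boundaries between overlapping classes) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the map is one-dimensional, the refinement step is plain LVQ1 rather than the authors' modified LVQ (which this page does not describe), the two-component feature vectors stand in for a title-term frequency and a keyword frequency, and all function names, topic labels, and parameter values are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, x, allowed=None):
    """Index of the unit closest to x, optionally restricted to `allowed` indices."""
    idx = range(len(weights)) if allowed is None else allowed
    return min(idx, key=lambda i: sq_dist(weights[i], x))

def train_sofm(samples, n_units, epochs=50, lr0=0.5, seed=0):
    """Unsupervised 1-D SOFM: pull the winner and its map neighbours toward each sample."""
    rng = random.Random(seed)
    weights = [list(rng.choice(samples)) for _ in range(n_units)]
    sigma0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
        for x in rng.sample(samples, len(samples)):
            win = best_unit(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - win) ** 2) / (2.0 * sigma ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def label_units(weights, samples, labels):
    """Give each unit the majority label of the samples it wins (None if it wins none)."""
    votes = [{} for _ in weights]
    for x, y in zip(samples, labels):
        v = votes[best_unit(weights, x)]
        v[y] = v.get(y, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]

def refine_lvq1(weights, unit_labels, samples, labels, epochs=20, lr0=0.1):
    """Plain LVQ1: attract the winning unit if its label matches, repel it otherwise."""
    labelled = [i for i, lab in enumerate(unit_labels) if lab is not None]
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        for x, y in zip(samples, labels):
            win = best_unit(weights, x, labelled)
            sign = 1.0 if unit_labels[win] == y else -1.0
            for d in range(len(x)):
                weights[win][d] += sign * lr * (x[d] - weights[win][d])
    return weights

def classify(weights, unit_labels, x):
    """Label of the nearest labelled unit."""
    labelled = [i for i, lab in enumerate(unit_labels) if lab is not None]
    return unit_labels[best_unit(weights, x, labelled)]

# Toy data: hypothetical (title-term frequency, keyword frequency) pairs.
pages = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
topics = ["sports", "sports", "sports", "finance", "finance", "finance"]

w = train_sofm(pages, n_units=4)
unit_labels = label_units(w, pages, topics)
w = refine_lvq1(w, unit_labels, pages, topics)
pred = classify(w, unit_labels, [0.9, 0.9])
print(pred)
```

In this sketch the SOFM stage never sees the labels; they enter only when units are labelled by majority vote and during the LVQ1 pass, which mirrors the division of labour the abstract describes. A real system would use far higher-dimensional feature vectors and a two-dimensional map.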
Publication history
  • Received:  2002-04-17
  • Published:  2003-04-19
