Research on Content-Based Automatic Classification of Chinese Web Pages

  • Abstract: This paper presents a content-based system for automatically classifying Chinese web pages. It describes how the category dictionary is built, how category words (keywords) are segmented from web page hypertext, and the automatic classification algorithm for Chinese web pages. The algorithm determines, by statistics, the membership function of each indexing descriptor in each category, and then classifies page text using the fuzzy relation between category words and web pages. In tests on tourism web pages, automatic classification accuracy reached 93.37% or higher, and precision and recall, the two main factors used to evaluate an automatic classification algorithm, were both effectively improved.
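The abstract does not give the membership function or the fuzzy-relation formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pages are already segmented into category words, estimates membership as the share of a word's training occurrences that fall in each category, and scores a page by the frequency-weighted mean membership of its known words. The names `build_membership` and `classify`, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict


def build_membership(labeled_docs):
    """Estimate membership[word][category] from labeled, pre-segmented pages.

    Assumption (not from the paper): membership is the fraction of a word's
    occurrences in the training data that belong to the category.
    """
    counts = defaultdict(Counter)
    for category, words in labeled_docs:
        for word in words:
            counts[word][category] += 1
    membership = {}
    for word, per_category in counts.items():
        total = sum(per_category.values())
        membership[word] = {c: n / total for c, n in per_category.items()}
    return membership


def classify(words, membership):
    """Return (best_category, scores) for a segmented page.

    Each category score is the frequency-weighted mean membership of the
    page's words that appear in the category dictionary; unknown words
    are ignored. Returns (None, {}) if no word is known.
    """
    freq = Counter(w for w in words if w in membership)
    total = sum(freq.values())
    if total == 0:
        return None, {}
    scores = defaultdict(float)
    for word, n in freq.items():
        for category, mu in membership[word].items():
            scores[category] += mu * n / total
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Toy training data (hypothetical): category label plus segmented category words.
training = [
    ("travel", ["hotel", "ticket", "scenery", "hotel"]),
    ("sports", ["match", "ticket", "team"]),
]
membership = build_membership(training)
label, scores = classify(["hotel", "ticket", "map"], membership)
print(label, scores)
```

In this toy run, "map" is not in the dictionary and is skipped, while "ticket" splits its membership evenly between the two categories, so the page is assigned to "travel". A real system would also need the dictionary-based word segmentation step that the abstract mentions, which is outside this sketch.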

     
