Research on Social Intent Recognition and Classification Techniques for Weibo

Abstract (Chinese)
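With the growth of Internet users and the rapid development of the Weibo platform, the Weibo user base has grown quickly and produced a large amount of social network data, which provides a data foundation for analyzing Weibo users. As a result, the analysis of Weibo users' relationships, interests, and behaviors has gradually become a hot topic in academic research.

Most posts published by Weibo users carry some social intent. Social intent analysis means mining the intent that a user wants to express through text. Our analysis of a large number of Weibo posts found that social intents on Weibo fall broadly into several major categories: marketing recommendation, news commentary, knowledge dissemination, mood and reflection, and daily sharing. Building on social intent recognition, classifying these intents accurately can provide foundational support for related applications.

For example, the habitual social intents of a user's posts can reveal the user's identity background and social purpose. A user who constantly broadcasts marketing advertisements can be inferred to be a merchant whose goal is to sell products to followers for profit. A user who frequently posts information related to academic research can be classified as a researcher whose goal is to maintain an academic reputation and disseminate knowledge. Weibo is now a mainstream social medium. Beyond judging a user's identity background and social purpose, social intent recognition and classification can also serve as a basis for judging a user's taste, providing a behavioral dimension for taste profiles of Weibo users. For instance, a poster who constantly pushes advertisements and solicits likes in their circle of friends is unlikely to rank high in taste. It can also add a new dimension to Weibo user classification and further support the precise recommendation of Weibo posts.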
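The heuristic of inferring a user's role from habitual intents can be pictured as a simple majority rule over the predicted intents of that user's posts. The following is an illustrative sketch only. The category identifiers, the `ROLE_BY_INTENT` table, the `infer_role` function, and the 0.6 share threshold are all hypothetical and are not taken from the thesis. In particular, mapping "knowledge dissemination" to "researcher" is an assumption based loosely on the academic-research example.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical mapping from a user's dominant intent to a coarse role,
# following the two examples in the abstract; other intents map to no role.
ROLE_BY_INTENT = {
    "marketing_recommendation": "merchant",
    "knowledge_dissemination": "researcher",  # assumption, see lead-in
}


def infer_role(post_intents: Iterable[str], min_share: float = 0.6) -> Optional[str]:
    """Return a role if one intent dominates the user's posts, else None.

    post_intents: predicted intent label for each of the user's posts.
    min_share: hypothetical minimum fraction of posts the dominant intent
    must cover before a role is assigned.
    """
    counts = Counter(post_intents)
    total = sum(counts.values())
    if total == 0:
        return None  # no posts, nothing to infer
    intent, n = counts.most_common(1)[0]
    if n / total < min_share:
        return None  # no clearly dominant intent
    return ROLE_BY_INTENT.get(intent)
```

With this rule, a user whose posts are 80% marketing recommendations is labeled a merchant. A user whose posts are split evenly across intents gets no role, which keeps the heuristic conservative.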
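To mine the social intents of Weibo users more comprehensively and accurately, this thesis considers users' original posts and reposted content together. It recasts intent recognition as a combination of semantic mining and text classification. This technical simplification reduces complexity while having little effect on downstream applications, which makes it a pragmatic way to handle the problem.

From a text-processing perspective, social intent recognition operates at the sentence level and classifies a post by the semantic features of its sentences, while the keywords within a sentence help define the intent label. We therefore propose an encoder-decoder model with joint word- and sentence-level training to recognize the intent of short Weibo texts. In Weibo text, the sentence as a whole reflects the intent category, and the nouns and verbs within it also provide guidance. Joint modeling of words and sentences therefore identifies the social intent category of short Weibo texts more precisely. Finally, to identify users' social intent categories more accurately, we built an intent category library by analyzing a large amount of Weibo user information.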
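One way to read "joint word-sentence training" is as a multi-task objective. It combines a sentence-level cross-entropy over the intent categories with a weighted word-level term that supervises individual tokens, for example by tagging the nouns and verbs that signal intent:

L = L_sent + λ · (1/T) · Σ_t L_word,t

The following is a minimal sketch under exactly that assumption. The `joint_loss` function, the binary keyword tags, and the weight `lam` (λ) are hypothetical, and the thesis's actual encoder-decoder loss may be defined differently. It uses plain Python on precomputed logits so that only the shape of the objective is shown.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under softmax(logits)."""
    return -log_softmax(logits)[target]


def joint_loss(
    sentence_logits: Sequence[float],
    sentence_label: int,
    word_logits: Sequence[Sequence[float]],
    word_labels: Sequence[int],
    lam: float = 0.5,
) -> float:
    """Hypothetical joint objective: sentence-level intent loss plus lam
    times the mean word-level loss (e.g. keyword vs. non-keyword tags)."""
    if not word_labels or len(word_logits) != len(word_labels):
        raise ValueError("need one non-empty label per word")
    sent_loss = cross_entropy(sentence_logits, sentence_label)
    word_loss = sum(
        cross_entropy(l, t) for l, t in zip(word_logits, word_labels)
    ) / len(word_labels)
    return sent_loss + lam * word_loss
```

With uniform (all-zero) logits over five intent categories and three two-way word tags, the loss is log 5 + 0.5 · log 2. Averaging the word term over tokens keeps its scale independent of post length, so λ alone controls how much the word-level signal guides the shared encoder.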
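We collected a large dataset of real Weibo users' posts with a Python crawler, and our experiments verify the effectiveness of the proposed encoder-decoder model with joint word-sentence training. The results show that the model accurately identifies the social intent category of users' posts. We also compare the proposed method with the BERT pre-trained language model on this task. The experiments show that our method achieves the best accuracy and the best F1 score on our dataset.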
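The abstract reports accuracy and F1 but does not state how F1 is averaged over the intent categories. The following sketch assumes macro-averaged F1, the unweighted mean of per-class F1. The function names are illustrative. Libraries such as scikit-learn offer equivalent metrics (`accuracy_score`, and `f1_score` with `average="macro"`).

```python
from typing import Hashable, List, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of posts whose predicted intent equals the gold intent."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> float:
    """Unweighted mean of per-class F1 over the given intent labels."""
    if not labels or len(y_true) != len(y_pred):
        raise ValueError("need labels and equally long y_true / y_pred")
    scores: List[float] = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is defined as 0 when both precision and recall are 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Macro averaging weighs every intent category equally, so a rare category counts as much as a frequent one. For single-label classification, micro-averaged F1 equals accuracy, which is one reason to report a macro variant alongside it.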
Keywords: Weibo; social intent; encoder-decoder model; joint word-sentence training; intent recognition
Abstract
With the growing number of Internet users and the rapid development of the Weibo platform, the Weibo user base has grown rapidly and generated a large amount of online social data, which provides a data foundation for analyzing Weibo users. The analysis of Weibo users' relationships, interests, and behaviors has gradually become a hot topic in academic research.

Most posts published by Weibo users carry some social intent. Social intent analysis refers to mining the intent that a user wants to express through text. Our analysis of a large number of Weibo posts found that social intents on Weibo fall broadly into several categories: marketing recommendation, news commentary, knowledge dissemination, mood and reflection, and daily sharing. Building on social intent recognition, classifying these intents accurately can provide basic support for related applications.

For example, the habitual intents of a user's posts can be used to determine the user's identity background and social purpose. Users who constantly broadcast marketing advertisements can be inferred to be merchants whose purpose is to sell products to followers for profit. Users who frequently post information related to academic research can be classified as researchers whose purpose is to maintain their academic reputation and disseminate knowledge. Since Weibo is now a mainstream social medium, social intent recognition and classification can do more than judge a user's identity background and social purpose. They can also serve as a basis for judging a user's taste, providing a behavioral dimension for taste profiles of Weibo users. For example, a publisher who constantly pushes advertisements and solicits likes in their circle of friends is unlikely to rank high in taste. Intent classification can also add new dimensions to the classification of Weibo users and further support the precise recommendation of Weibo posts.
To mine the social intents of Weibo users more fully and accurately, this thesis considers users' original posts and reposted content together. It recasts recognition as a technical simplification that combines semantic mining with text classification. This reduces complexity while having little effect on downstream applications, which makes it a pragmatic way to handle the problem.

From the perspective of text processing, social intent recognition operates at the sentence level and classifies a post by the semantic features of its sentences, while the keywords in a sentence help define the intent label. Therefore, we propose a method that identifies the intents of short Weibo texts with an encoder-decoder model trained jointly on words and sentences. In microblog text, sentences reflect the intent category, and the nouns and verbs within them also provide some guidance for it. We therefore model words and sentences jointly to identify the social intent categories of short Weibo texts more accurately. Finally, to identify the social intent categories of Weibo users more accurately, we built a library of intent categories by analyzing a large amount of Weibo user information.
We collected a large number of real Weibo users' posts as a dataset with a Python crawler. Our experiments verify the effectiveness of the proposed encoder-decoder model based on joint word-sentence training. The results show that this model can accurately identify the social intent categories of Weibo users' posts. We also compare the proposed method with the BERT pre-trained language model on this task. The experiments show that our method obtains the best accuracy and the best F1 score on our dataset.
Key words: Micro-blogs; Social Intent; Encoder-Decoder Model; Word-Sentence Joint Training; Intent Recognition
Contents
Abstract (Chinese)
Abstract
1 Introduction
1.1 Research Background and Significance
1.2 Current Research on Intent Recognition in China and Abroad
1.3 Main Research Content of This Thesis
1.4 Organization of This Thesis
2 Overview of Related Models and Techniques
2.1 Web Crawler Technology
2.2 Data Preprocessing and Word Segmentation
2.2.1 Weibo Data Preprocessing
2.2.2 Chinese Word Segmentation Techniques
2.3 Word Embedding Models
2.3.1 Introduction to the Word2Vec Model
2.3.2 Basic Principles of the CBOW Model
2.3.3 Basic Principles of the Skip-Gram Model
2.4 Pre-trained Language Models
2.4.1 The Idea Behind the BERT Model
2.5 Chapter Summary
3 Methods for Modeling Social Intent Recognition
3.1 Model Architecture
3.2 Encoder
3.2.1 Introduction to the BLSTM Model