Structural research and system implementation of medical CT text

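University code *****
Student ID ************
Classification number TP391
Confidentiality level Public
Master's Thesis
Structural research and system implementation of medical CT text
Degree candidate Mingke Jiang
Supervisor Weihong Zeng, Senior Engineer
College College of Information Engineering
Discipline Computer Technology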
Research direction Knowledge Processing and Intelligent Systems
June 4, 2018
Structural research and system implementation of medical CT text
Candidate Mingke Jiang
Supervisor Senior Engineer Weihong Zeng
College College of Information Engineering
Program Computer Technology
Specialization Knowledge Processing and Intelligent Systems
Degree Master of Engineering
University Xiangtan University
Date June 4, 2018
Abstract
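With the rapid development of the medical industry and information technology, major hospitals now generate large volumes of unstructured medical text data in the course of providing medical services. The CT text report is an important kind of medical text data: as a carrier of medical data, it records the morphology of organs as seen on CT and related information such as whether lesions are present. The CT text report matters greatly to a doctor's diagnosis of a patient's disease and is an important basis for the final diagnosis.
At present, medical text is processed mainly by relying on the personal knowledge and experience of the treating doctor. Manual processing is time-consuming and labor-intensive, and accuracy is hard to guarantee when doctors process text for long periods. Mining the value of medical text with intelligent methods therefore has practical significance for the industry.
Centered on medical CT text mining, this paper does the following work toward extracting value from unstructured CT text:
(1) Because wording in the medical domain is relatively specialized, this paper proposes a named entity recognition method for CT text that combines conditional random fields (CRF) with derivation from rules inherent to the medical domain. After Chinese word segmentation and part-of-speech tagging of the original CT text, the CRF++ tool is used for component sequence labeling and sentence sequence labeling. Combined with manually summarized derivation rules and with word2vec-based normalization training of the extracted entities, the method effectively extracts entities from the text and mines the relations between them.
(2) This paper proposes a novel methodology for structuring unstructured text data, applied to medical text data: a method for structuring CT text that consists of three stages, namely CT text report preprocessing, CT text report attribute extraction, and instant structuring of the CT text report.
(3) A prototype system for structuring medical imaging CT text reports was built and tested with data. The experiments show that the structuring method proposed in this paper can reach an accuracy of 86.7%, which meets the system's design expectations and provides one approach to structuring CT text.
Keywords: natural language processing; CT text; structuring; conditional random field; word vector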
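As a rough illustration of the sequence-labeling step in (1), the sketch below serializes segmented, POS-tagged tokens into the column format that CRF++ reads for training: one token per line, whitespace-separated columns with the label in the last column, and a blank line between sentences. The sample sentence, the POS tags, the label set (B-ORGAN, B-ATTR, B-VALUE, O), and the function name are assumptions made for illustration; the abstract does not give the actual tag scheme or feature templates.

```python
from typing import List, Tuple

# Each token is (word, part-of-speech tag, label).
# The POS tags and label names here are hypothetical examples.
Token = Tuple[str, str, str]


def to_crfpp_rows(sentences: List[List[Token]]) -> str:
    """Serialize labeled sentences into CRF++ column format."""
    blocks = []
    for sentence in sentences:
        # One token per line; columns are tab-separated, label last.
        rows = ["\t".join(token) for token in sentence]
        blocks.append("\n".join(rows))
    # A blank line separates sentences; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


# Hypothetical segmented sentence: "肝脏形态正常。"
sample = [[
    ("肝脏", "n", "B-ORGAN"),
    ("形态", "n", "B-ATTR"),
    ("正常", "a", "B-VALUE"),
    ("。", "w", "O"),
]]

print(to_crfpp_rows(sample), end="")
```

A file written this way could be passed to CRF++ training together with a feature template; the template itself depends on design choices the abstract does not describe.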
Abstract
With the rapid development of the medical industry and information technology, major hospitals now produce large amounts of unstructured medical text data in the course of providing medical services. The CT text report is an important kind of medical text data: as a carrier of medical data, it records the morphology of organs as seen on CT, the presence or absence of lesions, and related information. The CT text report is very important for a doctor's diagnosis of a patient's illness and is an important basis for the final diagnosis.
Medical text is currently processed mainly by relying on the personal knowledge and experience of doctors. This manual approach is time-consuming and labor-intensive, and accuracy is hard to ensure when doctors process text for long periods. Mining medical texts with intelligent methods therefore has practical significance for the industry.
Centered on medical CT text mining, this paper does the following work toward extracting value from unstructured CT text:
(1) Because wording in the medical field is relatively specialized, this paper proposes a named entity recognition method for CT text that combines conditional random fields (CRF) with rules inherent to the medical domain. After Chinese word segmentation and part-of-speech tagging of the original CT text report, the CRF++ tool is used to label the component sequence and the sentence sequence. Combined with manually summarized derivation rules and with word2vec-based normalization training of the extracted entities, the method effectively extracts the entities in the text and mines the relationships between them.
(2) This paper proposes a novel methodology for structuring unstructured text data, applied to medical text data: a method for structuring CT text that consists of three phases, namely the CT text report preprocessing phase, the CT text report attribute extraction phase, and the CT text report instant structuring phase.
(3) A prototype system for structuring medical imaging CT text reports was built and tested with data. The experimental results show that the structuring method proposed in this paper can reach an accuracy of 86.7%, which meets the system's design expectations and provides one approach to structuring CT text.
Key words: natural language processing; CT text; structuring; conditional random field; word vector
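The three-phase method in (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the thesis: the lexicons, the alias table, and all function names are hypothetical, a dictionary lookup stands in for the CRF model and derivation rules, and a fixed alias table stands in for word2vec-based normalization.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy lexicons (stand-ins for CRF + rule-based extraction).
ORGANS = ["肝脏", "胆囊", "脾脏"]
FINDINGS = ["未见异常", "增大", "结石"]
# Hypothetical alias table (stand-in for word2vec-based normalization).
ALIASES = {"未见明显异常": "未见异常"}


def preprocess(report: str) -> List[str]:
    """Phase 1: drop whitespace and split the report into sentences."""
    text = re.sub(r"\s+", "", report)
    return [s for s in re.split(r"[。；;]", text) if s]


def extract_attributes(sentence: str) -> Optional[Dict[str, object]]:
    """Phase 2: find the organ and its findings in one sentence."""
    # Map alternative wordings onto a canonical form first.
    for alias, canonical in ALIASES.items():
        sentence = sentence.replace(alias, canonical)
    organ = next((o for o in ORGANS if o in sentence), None)
    if organ is None:
        return None  # no known organ mentioned; skip the sentence
    findings = [f for f in FINDINGS if f in sentence]
    return {"organ": organ, "findings": findings}


def structure_report(report: str) -> List[Dict[str, object]]:
    """Phase 3: assemble one structured record per organ sentence."""
    records = []
    for sentence in preprocess(report):
        record = extract_attributes(sentence)
        if record is not None:
            records.append(record)
    return records


if __name__ == "__main__":
    for row in structure_report("肝脏形态增大。胆囊内见结石；脾脏未见明显异常。"):
        print(row)
```

Keeping the phases as separate functions means the lookup in `extract_attributes` could later be replaced by a trained sequence labeler without touching the preprocessing or record-assembly code.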
