Detailed Description
The cloud-edge-end architecture-based multi-center medical equipment big data cloud platform of the present application is described in detail below with reference to the accompanying drawings.
The software and hardware architecture of the medical big data cloud service analysis platform is described as follows:
1. The system adopts a cloud-edge-end structure
As shown in fig. 1, edge servers are deployed at each center according to geographical distribution to ensure low-latency data transmission and to perform real-time business data analysis and display. A cloud platform is deployed to issue services and models uniformly, ensuring edge-cloud collaboration and forming an architecture with unified management, flexible deployment, and highly automated operation and maintenance.
2. Principles of system design
1) Equipment operation information and clinical information are given equal weight: all relevant data generated by the terminal equipment are collected, and the hospital information system is connected so that all equipment-related data are covered.
2) The system fully supports the required functionality, supports continuous 24/7 operation, provides sufficient disk capacity, sustains the processing speed required for large volumes of real-time business, manages database tables with complex relations, and offers security, fault tolerance, and a friendly user interface design.
3) The system operating environment and system architecture are flexible, scalable, extensible, and open; they must not only fully consider and meet current requirements but also facilitate later expansion and protect the existing investment over the long term.
4) Deployment of, and data synchronization interfaces with, third-party products are supported.
As shown in fig. 1, the multi-center medical equipment big data cloud platform according to an embodiment of the present invention comprises: terminal medical equipment, a terminal medical equipment data acquisition module, an edge data stream cluster, an edge data display module, a cloud platform data lake cluster, a cloud platform technology platform, a cloud platform data AI platform, and a cloud platform medical equipment Internet of things service center.
First, acquiring the underlying raw data of the medical equipment
Specifically, emergency/ICU-type medical devices include monitors, ventilators, anesthesia machines, and the like, and large-scale imaging medical equipment includes CT, MR, ultrasound, and the like. The raw data comprise sign data, waveform data, alarm data, and log data. The platform is compatible with various acquisition and transmission protocols such as TCP, UDP, HTTP, and local file systems, and supports acquisition of structured, semi-structured, and unstructured data such as HL7, XML, JSON, binary files, pictures, and videos.
Raw data from emergency/ICU medical equipment are acquired by a data acquisition terminal that integrates the device communication protocol; the medical equipment is connected through a serial port or a network port.
With appropriate authorization, large-scale imaging equipment can transmit equipment operation logs, equipment alarm logs, and the like directly over the network port in log-file form.
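Purely as an illustration of network-port acquisition (the listening address, port, and framing below are hypothetical and would depend on the actual device protocol), a minimal Python sketch of a terminal-side receiver might look as follows:

import socket

HOST, PORT = "0.0.0.0", 9000  # hypothetical listening address for a device network port

# Accept one device connection and read raw bytes in chunks; parsing happens downstream.
with socket.create_server((HOST, PORT)) as server:
    conn, addr = server.accept()
    with conn:
        buffer = b""
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buffer += chunk  # raw sign/waveform/alarm/log bytes
        print(f"received {len(buffer)} bytes from {addr}")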
Second, edge server cluster
As shown in fig. 2, the edge server cluster deploys a big data streaming platform covering a data acquisition service (NiFi), a distributed message engine (Kafka), a streaming real-time analysis engine (Spark Streaming), and the like, implementing real-time acquisition, parsing and aggregation, stable transmission, and efficient analysis of data.
The application layer displays service management and statistical chart information in real time on a data dashboard (large screen).
1. Real-time management of data streams
The data flow platform deploys a distributed data acquisition service (NiFi) whose key functions include flow management, availability, security, an extensible architecture, and a flexible scaling model; through visual configuration it dynamically establishes connections with the acquisition clients of terminals, realizing multi-path concurrent acquisition of massive data;
the platform deploys a data flow monitoring and management system to track in real time the basic information and data transmission status of the medical equipment bound to each terminal, presenting data flow statistics, terminal states, alarm information, and the like. Each node in the NiFi cluster performs the same tasks on the data, but each node operates on a different set of data. ZooKeeper elects one node as the cluster coordinator, and failover is handled automatically by ZooKeeper. All cluster nodes report heartbeat and status information to the cluster coordinator, which is responsible for disconnecting and connecting nodes. In addition, the cluster has a primary node, also elected by ZooKeeper. As a dataflow manager, a user may interact with the NiFi cluster through the user interface (UI) of any node; any change made is replicated to all nodes in the cluster, allowing multiple entry points.
Data stream deployment and orchestration: a flow-orchestration deployment model addresses terminal deployment and edge acquisition for IoT applications. Visual orchestration and publication of raw flows are supported, simplifying deployment and orchestration at the IoT application edge.
Data flow monitoring: the data flow platform provides full-chain data tracking and flow monitoring from source to end; the source of every piece of data can be traced, and terminal equipment can be located automatically when transmission faults, data quality problems, and the like occur.
2. Parsing and aggregation
Data from emergency/ICU medical equipment are parsed and structured according to the HL7 standard; raw HL7 messages can be disassembled with the Java framework HAPI, as illustrated by the sketch below.
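The platform itself uses the Java HAPI framework; purely to illustrate the structuring step, the following Python sketch splits a simplified, hypothetical HL7 v2 message (segments separated by carriage returns, fields by '|') into a structured record:

# Illustrative only; the platform disassembles HL7 with the Java HAPI framework.
raw = "MSH|^~\\&|MONITOR|ICU|||20240101120000||ORU^R01|1|P|2.3\rPID|1||12345\rOBX|1|NM|HR||72|bpm"

record = {}
for segment in raw.split("\r"):      # HL7 segments are carriage-return separated
    fields = segment.split("|")      # fields are pipe separated
    if fields[0] == "PID":
        record["patient_id"] = fields[3]
    elif fields[0] == "OBX":
        record["observation"] = fields[3]
        record["value"] = fields[5]
        record["unit"] = fields[6]

print(record)  # {'patient_id': '12345', 'observation': 'HR', 'value': '72', 'unit': 'bpm'}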
In accordance with manufacturer specifications, the log data of large-scale imaging medical equipment can be parsed into structured data and transmitted to the message middleware for transfer and aggregation.
Data aggregation is the first step by which data from different data sources enter the big data system, and its performance directly determines how much data the big data system can process in a given time period. The aggregation process depends on the specific requirements of the system, but commonly performed steps include parsing incoming data, performing necessary validations, and data cleansing, e.g., deduplication and format conversion.
Transmission from the different data sources is asynchronous; it may use files or message-oriented middleware. Because transmission is asynchronous, the throughput of the data acquisition process can be much higher than the processing capacity of the big data system. Asynchronous transmission likewise decouples the big data system from the different data sources, so the infrastructure can scale dynamically and peak acquisition traffic poses no risk to the system. In testing, the data platform processed on the order of hundreds of millions of records per second, guaranteeing platform throughput and preventing message blocking.
3. Data transmission
After being acquired and parsed by the data acquisition service, data enter a distributed message engine (Kafka). The message engine provides distributed caching and parallel transmission of streaming data and implements exactly-once semantics, guaranteeing message integrity and uniqueness. The engine offers message caching, message distribution, low-latency delivery, and data-oriented distribution, and alleviates speed mismatches between upstream producers and downstream consumers, as sketched below.
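A minimal producer sketch using the kafka-python client is given below; the broker address and topic name are assumptions, and production-grade exactly-once delivery would additionally rely on Kafka's idempotence and transaction settings:

import json
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker address and topic for parsed device messages.
producer = KafkaProducer(
    bootstrap_servers="edge-kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",   # wait for full replication before acknowledging
    retries=5,    # retry transient send failures
)

producer.send("device-vitals", {"device_id": "mon-01", "hr": 72, "ts": "2024-01-01T12:00:00"})
producer.flush()  # block until buffered messages are delivered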
4. Efficient analysis
The distributed message engine interfaces with a downstream real-time data analysis system (Spark Streaming), which provides parallel consumption and real-time computation of streaming data and pushes results downstream for dynamic presentation, as sketched below.
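As a hedged sketch of this consumption path (the broker and topic names are the same hypothetical ones as above; Spark Structured Streaming is used here as the modern form of Spark Streaming):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edge-realtime-analysis").getOrCreate()

# Subscribe to the hypothetical Kafka topic of parsed device messages.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "edge-kafka:9092")
          .option("subscribe", "device-vitals")
          .load())

# Kafka delivers key/value as binary; cast the value for downstream parsing.
values = stream.selectExpr("CAST(value AS STRING) AS json_value")

# Push results downstream (console sink here; a dashboard sink in production).
query = values.writeStream.outputMode("append").format("console").start()
query.awaitTermination()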
5. Application display
Deploying and configuring Superset provides visualization for real-time queries and statistical analysis of equipment time series data and medical service data. For complex statistical analysis and for visualizing machine learning results, interactive presentation of Spark, PySpark, SparkR, and Python machine learning tasks is achieved through Zeppelin, Jupyter Notebook, and their extensions. Visualizations requiring a high degree of customization are developed and deployed separately with Zeppelin + JavaScript. The front end connects to the data dashboard, and real-time display is realized based on the web and the like.
Third, cloud platform
Specifically, the cloud platform comprises a data lake cluster, a technical platform, a data AI platform and a medical equipment Internet of things service center.
1. Data lake cluster
Specifically, the data lake cluster maintains all platform data, constructs a data warehouse, completes the ETL work of the data, and stores the data.
1) The applied technologies are as follows:
hive: a data warehouse tool based on Hadoop is used for data extraction, transformation and loading, and is a mechanism capable of storing, inquiring and analyzing large-scale data stored in Hadoop. The hive data warehouse tool can map the structured data file into a database table, provide SQL query function and convert SQL sentences into MapReduce tasks for execution. For accessing structured device clinical data and device operational data.
Druid: the Druid is a distributed data processing system that supports real-time multidimensional OLAP analysis. The method supports high-speed real-time data intake processing and real-time and flexible multi-dimensional data analysis and query.
HDFS (Hadoop Distributed File System): implements reliable distributed reading and writing of large-scale data. HDFS targets write-once, read-many usage scenarios: it ensures that a file is written by only one caller at a time but can be read by multiple callers. It carries the reading, writing, and storage of unstructured data.
sparkSQL: spark SQL is a module used by Spark to process structured data that provides a programming abstraction called DataFrame and acts as a distributed SQL query engine. Has the following characteristics: 1. easy integration 2. unified data access 3. compatible with Hive 4. standard data connection;
2) Big data ETL
Referring to fig. 6, the big data platform ETL covers three parts: data collection, data storage, and data conversion.
The data acquisition layer connects to business systems and other associated external data, supporting heterogeneous data sources such as relational databases, NoSQL databases, files, and streaming data. This layer is responsible for efficient and stable data acquisition and transmission, guaranteeing data integrity and consistency as far as possible.
The data acquisition layer adopts different technical schemes according to differences among the data sources.
For business data with a relatively complex structure that is synchronized periodically, scheduled acquisition tasks are launched through a data ETL tool (such as Sqoop or Kettle).
For data with higher real-time synchronization requirements, a streaming data acquisition scheme (such as NiFi/Flume/Logstash + Kafka, or StreamSets) is adopted.
The big data platform stores data in layers, dividing data storage into a source data layer, a data warehouse layer, and a data application layer according to different data usage scenarios. The source data layer stores data reported by the acquisition service, keeps the data isomorphic with the source system, and synchronizes with source system data periodically/in real time via incremental/full loading. The data warehouse layer cleans and processes the source data layer's data, extracting and converting it according to coarse-grained business scenarios to form a standardized, subject-oriented data structure. The data application layer organizes data for specific business applications (such as data retrieval, statistical analysis, and iterative computation) on top of the warehouse layer; business applications interact directly with the corresponding data application layer tables through an access-layer API. A sketch of this layering appears below.
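As a sketch of this layering (database and table names are hypothetical), a PySpark job might promote data from the source data layer through the warehouse layer to the application layer as follows:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Source data layer: raw records synchronized from the business system.
ods = spark.table("ods.device_orders")  # hypothetical source table

# Data warehouse layer: cleaned, standardized, subject-oriented structure.
dw = (ods.dropDuplicates(["order_id"])              # deduplicate
         .filter(F.col("cost_money").isNotNull())   # drop incomplete records
         .withColumn("etl_date", F.current_date()))
dw.write.mode("overwrite").saveAsTable("dw.fact_device_order")

# Data application layer: aggregates consumed directly through the access-layer API.
app = dw.groupBy("order_type").agg(F.sum("cost_money").alias("total_cost"))
app.write.mode("overwrite").saveAsTable("app.order_cost_summary")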
The data storage mode adopts different technical schemes according to the structure, application, and source of the data and requirements such as real-time performance, integrity, and consistency.
Structured business data for offline computation (such as HIS system data) are stored in HDFS as Hive tables; time series data for real-time query and statistical analysis (such as equipment data) are stored in Druid; data with real-time query and real-time update requirements (such as intermediate results of computing tasks) are stored in HDFS; and Hive handles join operations across the heterogeneous data stores.
The data conversion model mainly undertakes data extraction, cleaning, and processing between the layers of the layered storage model. An extensible, plug-in-based data conversion model is designed: the model component provides a general data conversion process, and for the conversion requirements of different business types, custom rules can be developed against the interface specification provided by the model and plugged into the conversion model.
The technical scheme of the data conversion model is: Sqoop custom functions or Kettle extension plug-ins.
After the data pass through the ETL, data warehouses or databases for different subjects are formed, including a medical device business database comprising: an ICU equipment clinical database, a large-scale imaging log database, a large-scale imaging fault maintenance database, an electronic medical record document library, an operation management database, and the like for each department.
The following description refers to a specific example. The structured data of the ICU-type devices in this example are fused with the hospital information system to form an emergency ICU data warehouse and data model; a flowchart is shown in fig. 8.
Spark SQL handles I/O access to the multiple data sources, which comprise the raw HL7 message data stream of a monitor and the XML messages of the hospital information system;
data cleaning, conversion, and extraction are completed using distributed computation, and an ETL tool is formed through code configuration; data cleaning mainly filters data by thresholds and outlier rules and screens out and discards null values;
structured loading of the data is realized through Spark SQL; data from the two data sources are matched and then structured;
Hive provides metadata and data warehouse operations, with all data stored in HDFS.
The specific configuration is as follows:
1) Data extraction
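A minimal extraction sketch (the paths and storage formats are hypothetical): the parsed monitor HL7 records and the hospital information system extract are loaded as DataFrames for the conversion step below:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("icu-etl").enableHiveSupport().getOrCreate()

# Hypothetical inputs: parsed monitor HL7 records and a HIS order extract.
df = spark.read.json("hdfs:///ods/monitor_hl7/")
his = spark.read.parquet("hdfs:///ods/his_orders/")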
2) Conversion
Select column names
Null-value handling and outlier handling
Data type conversion
# Cast string-typed fee columns to integers for downstream aggregation.
from pyspark.sql.types import IntegerType

df = df.withColumn('order_type', df.order_type.cast(IntegerType()))
df = df.withColumn('cost_count', df.cost_count.cast(IntegerType()))
df = df.withColumn('is_comb', df.is_comb.cast(IntegerType()))
# df = df.withColumn('is_insurup', df.is_insurup.cast(IntegerType()))
df = df.withColumn('tsort', df.tsort.cast(IntegerType()))
df = df.withColumn('tstatus', df.tstatus.cast(IntegerType()))
df = df.withColumn('cost_tstatus', df.cost_tstatus.cast(IntegerType()))
df = df.withColumn('payment_tstatus', df.payment_tstatus.cast(IntegerType()))
df = df.withColumn('is_append', df.is_append.cast(IntegerType()))
# df = df.withColumn('send_mtl_flag', df.send_mtl_flag.cast(IntegerType()))
df = df.withColumn('comb_cost_count', df.comb_cost_count.cast(IntegerType()))
Aggregation
# Aggregation: group by cost business type.
from pyspark.sql import functions as F

df_order = df.groupBy("order_type").agg(
    F.sum(df.cost_money - df.prefer_money).alias("sum_group_order"),
    F.max(df.cost_money - df.prefer_money).alias("max_group_order"))
df_order.show(200, truncate=False)
# Aggregation: group by requesting department.
df_depart = df.groupBy("op_depart_code").agg(
    F.sum(df.cost_money - df.prefer_money).alias("sum_group_depart"),
    F.max(df.cost_money - df.prefer_money).alias("max_group_depart"))
df_depart.show(200, truncate=False)
3) Store and load
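For instance (table names hypothetical), the aggregated results can be persisted as Hive-managed tables whose data land on HDFS, from which dashboards and reports read:

# Persist aggregates to the warehouse; Hive keeps the metadata, HDFS stores the data.
df_order.write.mode("overwrite").saveAsTable("dw.icu_cost_by_order_type")
df_depart.write.mode("overwrite").saveAsTable("dw.icu_cost_by_department")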
2. Technical platform
The technical platform provides the technical services the platform requires in a micro-service manner, mainly comprising data governance and security control.
Data governance is mainly accomplished by deploying a data governance platform, and the framework is shown in fig. 9.
Through data governance, an open and universal data acquisition interface can be constructed, improving data acquisition efficiency; data standards are unified, making data easy to fuse; cross-platform data extraction and data lineage are established, achieving open sharing and breaking down information silos; and private data are protected, building trustworthy data.
Referring to fig. 10, the platform integrates LDAP, Kerberos (KDC), and Ranger to implement user and service account management, authorization, authentication, service cluster protection, data access permission control, and the like.
Transmitted data are encrypted based on the KMS key management service.
Secure data transmission and access control are realized by combining authorization and authentication policies with secure channels, ensuring the safe use of data and services.
Referring to fig. 11, the platform security architecture comprises: Kerberos/LDAP for identity authentication, Ranger for authorization and auditing, and Knox for cluster security. After integration, a single account is shared across systems (for example, user1 can be used in Linux, Ambari, Ranger, Kerberos, and the like). LDAP maintains all user and administrator information; Ranger can synchronize users from LDAP and perform unified user permission management, and an LDAP user can be configured as a Linux system user. Knox serves as an API security gateway alternative, shares the users created in LDAP and Kerberos, and intercepts illegal host access. Together these constitute the security architecture of the data stream and data lake platform.
3. Data AI platform
The platform integrates the ModelArts platform service. ModelArts is a one-stop development platform that supports the full development process from data to AI application for developers. As shown in fig. 12, it provides operations including data processing, model training, model management, and model deployment, and offers an AI market function through which models can be shared with other developers. ModelArts supports multiple AI application scenarios such as image classification, object detection, video analysis, speech recognition, product recommendation, and anomaly detection.
4. Medical equipment Internet of things service center
The medical equipment IoT service center integrates big data analysis of the medical equipment Internet of things into a unified supervision system, including: cost effectiveness, service supervision, guarantee analysis, quality safety, and usage analysis.
1. Single machine benefit analysis
Representative equipment is selected for single-machine benefit analysis, including: first, high-value equipment with a large influence on hospital revenue, such as CT, DSA, DR, MR, color Doppler ultrasound, and biochemical analyzers in outpatient medical technology departments; second, equipment that is numerous and widely distributed across the hospital, such as multi-parameter monitors, generally analyzed as a whole for a selected department; and third, equipment with low utilization that is nonetheless essential for rescuing patients, such as defibrillators and ventilators. The following indices are calculated: average charging standard, average unit variable cost, average unit income, periodic break-even income, break-even point (i.e., the minimum business volume that must be reached each year to avoid losing money on the equipment), payback period in years, the net income created by the equipment in a specified year, annual return on investment, and the like, displayed in tabular form on the data dashboard. A sketch of the break-even and payback arithmetic follows.
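As a simple illustration of the break-even and payback arithmetic (all figures below are hypothetical):

# Hypothetical single-machine benefit figures for a CT scanner.
purchase_cost = 6_000_000.0    # equipment investment
annual_fixed_cost = 800_000.0  # depreciation + maintenance per year
avg_charge = 300.0             # average charge per examination
unit_variable_cost = 80.0      # consumables, power, etc. per examination

unit_income = avg_charge - unit_variable_cost

# Break-even point: minimum examinations per year to avoid a loss on the equipment.
break_even_exams = annual_fixed_cost / unit_income

# Payback period: years for net income to repay the investment at a given volume.
annual_exams = 12_000
annual_net_income = annual_exams * unit_income - annual_fixed_cost
payback_years = purchase_cost / annual_net_income

print(f"break-even: {break_even_exams:.0f} exams/year, payback: {payback_years:.1f} years")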
2. Medical equipment performance prediction model based on data mining technology
Data mining technology is used to fuse the operation-log and maintenance-log data generated by large-scale imaging equipment with data such as equipment maintenance cost, revenue and cost, and depreciation cost. A decision tree algorithm constructs a predictive performance model for each medical device, yielding a device performance score, as sketched below.
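A hedged sketch of such a decision-tree model using scikit-learn; the feature columns and training values are invented stand-ins, not the platform's actual schema:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical fused features per device: [maintenance cost, revenue,
# depreciation cost, error-log count, usage hours]; target: performance score.
X = np.array([
    [50_000, 900_000, 120_000, 14, 2900],
    [80_000, 700_000, 150_000, 40, 2500],
    [30_000, 1_100_000, 110_000, 5, 3100],
    [95_000, 600_000, 160_000, 55, 2300],
])
y = np.array([82.0, 61.0, 93.0, 48.0])  # illustrative performance scores

model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

new_device = np.array([[60_000, 850_000, 130_000, 20, 2700]])
print(f"predicted performance score: {model.predict(new_device)[0]:.1f}")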
3. Statistical analysis of reservations/examinations for large-scale imaging equipment
The current reservation counts and examination counts of large-scale imaging medical equipment such as CT and MR are monitored in real time, including count statistics, department rankings, monthly reservation/examination trend lines, and monthly reservation waiting time statistics.
4. Intelligent reports for large-scale imaging medical equipment
Through scientific and reasonable evaluation and analysis of equipment operation logs, maintenance records, and key-component scanning data, report statistics on the operating condition of the medical equipment are formed, which can be displayed directly through system calls. The main indices comprise: monthly error-report counts for key components, statistical analysis of medical equipment utilization, daily/monthly statistics of changes in examined parts per device, monthly maintenance counts, and key-component damage rates.
5. Precise preventive maintenance of hemodialysis machines
Operating data of hemodialysis machines are predicted based on a BP neural network algorithm, equipment maintenance is prompted in advance, and a preventive maintenance scheme is provided, as sketched below.
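A minimal sketch of such a BP (backpropagation-trained) neural network with scikit-learn's MLPRegressor; the telemetry fields and labels below are hypothetical:

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical operating data: [dialysate pressure, conductivity, pump hours];
# target: remaining days until maintenance is required.
X = np.array([[110, 14.0, 1200], [135, 13.6, 2600], [120, 13.9, 1800], [150, 13.2, 3400]])
y = np.array([90.0, 30.0, 60.0, 10.0])

scaler = StandardScaler().fit(X)

# A multilayer perceptron trained by backpropagation, i.e., a BP neural network.
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000, random_state=0)
model.fit(scaler.transform(X), y)

machine = np.array([[140, 13.4, 3000]])
days = model.predict(scaler.transform(machine))[0]
if days < 21:
    print(f"prompt preventive maintenance: about {days:.0f} days of margin predicted")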
6. Health evaluation standard system for large-scale imaging equipment
Combining maintenance engineers' experience with fault tree analysis and failure mode and effects analysis, FTA and FMEA charts of the equipment are obtained, an equipment fault experience database is established, the type of the equipment health distribution is determined, the distribution parameters are estimated, and a health measurement analysis with a high confidence level is completed.
7. Patient abnormal state detection system
A patient abnormal-state detection algorithm is developed for the vital sign monitoring data of department patients. As shown in fig. 13, a real-time patient state score is obtained and a threshold is set; when the score exceeds the threshold or shows a rising trend, an alarm is generated to assist medical personnel in taking rescue measures. A simplified sketch follows.
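A simplified illustration of threshold-based state scoring (the ranges, weights, and threshold below are invented for the sketch and are not the clinical algorithm):

# Hypothetical scoring: weighted deviation of each vital sign from a normal range,
# summed into a state score; alarm on threshold crossing or a rising trend.
NORMALS = {"hr": (60, 100), "spo2": (94, 100), "resp": (12, 20)}
WEIGHTS = {"hr": 1.0, "spo2": 2.0, "resp": 1.5}
THRESHOLD = 0.5

def state_score(vitals: dict) -> float:
    score = 0.0
    for key, (low, high) in NORMALS.items():
        value = vitals[key]
        if value < low:
            score += WEIGHTS[key] * (low - value) / low
        elif value > high:
            score += WEIGHTS[key] * (value - high) / high
    return score

history = [state_score(v) for v in (
    {"hr": 88, "spo2": 97, "resp": 16},    # stable reading
    {"hr": 118, "spo2": 91, "resp": 24},   # deteriorating reading
)]
if history[-1] > THRESHOLD or history[-1] > history[0]:
    print(f"ALARM: patient state score {history[-1]:.2f}, rising trend")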
8. Prediction and early warning model based on key components in large-scale imaging equipment logs
Based on the fusion of large-scale imaging equipment log data and equipment maintenance system (MEIS) data, the feature data are trained with a machine learning algorithm to obtain a classification result determining whether the equipment will fail in the T+1 time period, enabling timely early warning, as shown in fig. 14 and sketched below.
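A hedged classification sketch for the T+1 failure prediction, using a random forest as a stand-in for the platform's machine learning algorithm; the fused log/MEIS features and labels below are invented:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical fused features per device-day: [error count, tube scan seconds,
# days since last maintenance, alarm count]; label: failed in period T+1?
X = np.array([[2, 4000, 10, 1], [15, 9000, 95, 7], [1, 3500, 20, 0], [22, 9800, 120, 11]])
y = np.array([0, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

today = np.array([[18, 9200, 100, 8]])
if clf.predict(today)[0] == 1:
    print("early warning: failure predicted in period T+1; schedule inspection")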
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples set forth in this application are illustrative only and not intended to be limiting.
Although the present invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the teachings of this application and yet remain within the scope of this application.