Developing a Graphical, Open Bioinformatics Analysis System, Part 1: Requirements Analysis and Technology Basics
- Origin / background
- A few screenshots
- Requirements analysed and summarised from personal experience:
- Practical problem 1: a graphical alternative to command-line scripts
- Practical problem 2: solving deployment and migration
- Practical problem 3: solving environment setup and software installation
- Requirement: analysis pipelines can be deployed and migrated quickly
- Technical implementation: use virtualization technology:
- Practical problem 4: full-process automation to improve efficiency and reduce costs
- Final software architecture design:
Origin / background
About three years ago, through work, I came into contact with NGS (next-generation sequencing) technology and the related bioinformatics analysis. Turning it into clinical applications raised a series of technical problems, and I fell into plenty of pits along the way. Out of this came the idea of developing a general-purpose bioinformatics production system. The first version is now complete, so it is worth recording and reviewing the work. This is the first article in the series.
A few screenshots
- Analysis pipeline design: graphical workflow design, based on the input and output files of every analysis step
- Result storage: once the data structure of an output file is configured, its contents can be stored directly in the database
- Automatic pipeline runs: you can configure the polling interval and the time at which execution is triggered
- Result filtering: for more complex analyses, results can be filtered manually, and the filtered results are exported as the analysis report
- Pipeline execution: a run can be stopped midway and resumed from where it stopped, with statistics on the run results
- Server performance monitoring: CPU, memory, network, disk space
- Report templates and output: Word-format templates are easy to customize.
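As a rough illustration of the result-storage idea above (configure an output file's data structure, then load the file into the database), here is a minimal Python sketch. It is not the system's actual code: the `OUTPUT_CONFIG` layout, the table and column names, and the sample rows are all hypothetical, and SQLite stands in for PostgreSQL only so that the example is self-contained.

```python
import csv
import io
import sqlite3

# Hypothetical configuration for one output file: table name plus column types.
# Identifiers are interpolated into SQL, so they must come from trusted config.
OUTPUT_CONFIG = {
    "table": "variants",
    "columns": {"sample": "TEXT", "gene": "TEXT", "depth": "INTEGER"},
}


def store_output(conn, config, tsv_text):
    """Create the configured table if needed and load a tab-separated file into it."""
    names = list(config["columns"])
    column_defs = ", ".join(f"{n} {t}" for n, t in config["columns"].items())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {config['table']} ({column_defs})")
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep only the configured columns, in the configured order.
    rows = [tuple(record[n] for n in names) for record in reader]
    placeholders = ", ".join("?" for _ in names)
    conn.executemany(
        f"INSERT INTO {config['table']} VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)


# Made-up output file content, for illustration only.
tsv = "sample\tgene\tdepth\nS1\tEGFR\t812\nS2\tKRAS\t455\n"
conn = sqlite3.connect(":memory:")
inserted = store_output(conn, OUTPUT_CONFIG, tsv)
deep = conn.execute("SELECT sample FROM variants WHERE depth > 500").fetchall()
```

Because the column list lives in configuration rather than code, a new output file type would only need a new configuration entry, which is the point of the feature in the screenshot.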
Requirements analysed and summarised from personal experience:
Practical problem 1: a graphical alternative to command-line scripts
- On the technical side, our team completed more than a dozen projects and dozens of pipelines one after another. The 500-line shell scripts written by the bioinformatics experts basically require operators with a certain skill level (familiar with Linux, shell, perl, python, and R programming), which limits who can use them. The company later added partial automation on top of the scripts, but this still could not satisfy the following:
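- Product registration: Burning Rock (Guangzhou), Novogene, AmoyDx (Xiamen), Geneseeq (Nanjing), and others are pursuing or have completed registration of NGS-based IVD products (in vitro diagnostic reagents, a subcategory of medical devices). The NMPA (formerly the CFDA, China's National Medical Products Administration) requires that such software products have a friendly interactive interface, that they be rigorously tested (unit testing > integration testing > functional testing > performance and stability testing), and that the analysis results be validated in clinical trials. IVD products are used in the clinic and need a rigorous validation process. Whether each company's pipeline can pass this hurdle is an open question.
- Product delivery: our company and many of our peers deliver the finished reagent kit, experimental protocol, and analysis software to hospital departments as one complete solution. From the user's point of view, the whole solution should be automated as far as possible so that it is easy to use. I once heard a story about a peer company that asked a user to type in a single fully automated analysis command; the user got it wrong three times, and the company blamed the user for being stupid.
- Internal operations: if running the software does not require specialist skills such as familiarity with Linux, shell, perl, python, and R programming, could the number of specialist staff be reduced and costs lowered? This can also be achieved through automation scripts.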
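- Putting the above together, the requirements are:
- A graphical user interface (UI) is better than command-line scripts, and for the interface a B/S (browser/server) architecture is better than a C/S (client/server) one, because it is easier to upgrade and maintain.
- A general-purpose graphical interface is better than a special-purpose one, because it avoids repeated development. The graphical program and the analysis pipelines are in a one-to-many relationship: one graphical program can quickly assemble many pipelines, turning script tools into software products. Abstractly, a bioinformatics analysis pipeline is a file-based workflow. If a workflow designer can be built on a B/S architecture, and the graphical workflow can be converted into a pipeline script and run, the goal of generality is essentially achieved.
- Automation is better than manual runs, and configuring the automatic-run parameters graphically is better than configuring them in scripts.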
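- Based on these requirements and my own background, the technology choices are as follows:
- In the IT world next door, B/S applications increasingly separate the front end from the back end. Front end (browser side): the approachable options are vue + element / iview or react + ant. vue has a gentle learning curve, so vue is chosen here. The front end needs a persistent connection to the back end, so websocket is introduced.
- Back end (server side): the most common Java microservice stack, springboot2 + mybatis + mysql/postgresql, which has many users, complete documentation, and frequent updates. I know pg better than mysql, so pg is chosen here.
- The graphical pipeline designer has to be implemented in front-end JavaScript. It will be covered in detail later; see Figure 1.
- Spring Boot provides scheduled tasks. Here this is implemented with a vue + iview form on the front end and Spring Boot's built-in Scheduling on the back end.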
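The claim that a pipeline is really a file-based workflow, and that a graphical workflow can be converted into a runnable script, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation (which uses a JavaScript designer and a Java back end): the `Step` structure, the step names, and the commands `align_reads`, `sort_alignments`, and `call_variants` are hypothetical placeholders. Each step declares the files it reads and writes, the steps are ordered so that every file is produced before it is consumed, and the result is rendered as a shell script.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Step:
    """One node of the workflow: a command plus the files it reads and writes."""
    name: str
    command: str
    inputs: tuple = ()
    outputs: tuple = ()


def order_steps(steps):
    """Order steps so that each file is produced before any step consumes it."""
    producer = {}
    for step in steps:
        for path in step.outputs:
            producer[path] = step.name
    # Map each step to the set of steps that produce its input files.
    # Inputs nobody produces (e.g. raw data) create no dependency.
    graph = {
        step.name: {producer[p] for p in step.inputs if p in producer}
        for step in steps
    }
    by_name = {step.name: step for step in steps}
    # static_order() raises graphlib.CycleError if the workflow has a cycle.
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]


def to_shell_script(steps):
    """Render the ordered steps as a shell script, one command per step."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for step in order_steps(steps):
        lines.append(f"# step: {step.name}")
        lines.append(step.command)
    return "\n".join(lines) + "\n"


# Hypothetical three-step workflow, deliberately listed out of order.
workflow = [
    Step("call", "call_variants sample.sorted.bam > sample.vcf",
         inputs=("sample.sorted.bam",), outputs=("sample.vcf",)),
    Step("align", "align_reads sample.fastq > sample.bam",
         inputs=("sample.fastq",), outputs=("sample.bam",)),
    Step("sort", "sort_alignments sample.bam > sample.sorted.bam",
         inputs=("sample.bam",), outputs=("sample.sorted.bam",)),
]

script = to_shell_script(workflow)
print(script)
```

In a designer, the nodes and edges drawn by the user would play the role of the `workflow` list. Deriving the order from the files rather than from the drawing order is what lets one graphical program serve many pipelines.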
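Practical problem 2: solving deployment and migration
- When I first joined the company, an analysis pipeline written by a once-revered expert on the company's US team was deployed on Ubuntu 14.04. Migrating it to Ubuntu 16.04 ran into problems: some low-level code or libraries were incompatible, and the exact cause was never established. In short, deployment and migration were costly.
Practical problem 3: solving environment setup and software installation
- Every analysis pipeline requires installing a pile of tools, such as bwa, samtools, gatk, annovar, snpeff, and so on. Installation and configuration are quite painful.
Requirement: analysis pipelines can be deployed and migrated quickly
Technical implementation: use virtualization technology: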
A. Virtual machine technology: VMware, VirtualBox
B. Docker
Both A and B meet the requirement. By comparison, VMware and VirtualBox are heavier, while Docker is now widely used in the IT world next door and offers a small footprint and high runtime efficiency, so Docker is recommended. Whether a deployed pipeline is packaged as a virtual machine or as a Docker image, it can be deployed and migrated without reinstalling and reconfiguring every time. Importing a Docker or virtual machine image is far easier than a fresh installation. Were it not for the large reference files, several hundred GB in size, distribution and deployment could be done fully automatically.
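One common way to live with the large reference files is to keep them out of the image and mount them from the host at run time. The sketch below only builds the `docker run` command line and does not execute it. The image name, host paths, and `run_pipeline.sh` are hypothetical placeholders, not names from the system described here.

```python
import shlex


def docker_run_command(image, reference_dir, data_dir, pipeline_cmd):
    """Build the argv for running one pipeline inside a container.

    The large reference files stay on the host and are bind-mounted
    read-only, so the image itself stays small enough to distribute.
    """
    return [
        "docker", "run", "--rm",            # remove the container when it exits
        "-v", f"{reference_dir}:/ref:ro",   # reference files, read-only
        "-v", f"{data_dir}:/data",          # sample input and analysis output
        image,
        *pipeline_cmd,
    ]


# All names below are hypothetical placeholders.
argv = docker_run_command(
    image="example/pipeline:1.0",
    reference_dir="/mnt/reference",
    data_dir="/mnt/run_001",
    pipeline_cmd=["run_pipeline.sh", "/data/sample.fastq", "/ref"],
)
print(shlex.join(argv))
```

On a host with Docker installed, the list could be passed to `subprocess.run(argv, check=True)`. With this split, only the reference directory has to be copied to each site once, while the image carrying bwa, samtools, and the other tools can be pulled automatically.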
Practical problem 4: full-process automation to improve efficiency and reduce costs
The company had previously invested heavily in its information systems, both hardware and software. But several points in the overall process were still done manually:
- Demultiplexing the sequencer output, because the integration between the sequencer and the sample system had not been completed
- After demultiplexing, the analysis pipeline had to be started by hand, and someone had to decide manually which project's analysis to run
- After the analysis finished and the report was generated, the report format had to be adjusted by hand, which consumed a lot of manpower, particularly when the Life sequencing system was used
Requirement: after sample entry, automate the full process: demultiplexing the sequencer data, starting the analysis pipeline, storing the analysis results, and exporting the analysis report
Implementation: based on the requirements above, the automatic operation structure (for Illumina models) is as shown in the figure:
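The "start the analysis automatically once the sequencer is done" step can be sketched as a polling cycle. This is a simplified illustration under stated assumptions, not the structure in the figure: it assumes the sequencer writes a completion marker file into each run folder (Illumina instruments commonly write `RTAComplete.txt`, but check your model), and the `analysis.started` marker and the `start_analysis` callback are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: the sequencer writes this marker file when a run has finished.
DONE_MARKER = "RTAComplete.txt"
# Hypothetical marker written by this sketch so a run is never started twice.
STARTED_MARKER = "analysis.started"


def find_ready_runs(output_root):
    """Return run folders that are complete but not yet handed to analysis."""
    ready = []
    for run_dir in sorted(Path(output_root).iterdir()):
        if not run_dir.is_dir():
            continue
        done = (run_dir / DONE_MARKER).exists()
        started = (run_dir / STARTED_MARKER).exists()
        if done and not started:
            ready.append(run_dir)
    return ready


def poll_once(output_root, start_analysis):
    """One polling cycle: claim each ready run, then hand it to the pipeline."""
    started = []
    for run_dir in find_ready_runs(output_root):
        (run_dir / STARTED_MARKER).touch()
        start_analysis(run_dir)
        started.append(run_dir.name)
    return started


# Demonstration on a temporary directory with made-up run names.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "run_A").mkdir()
    (Path(root) / "run_A" / DONE_MARKER).touch()
    (Path(root) / "run_B").mkdir()  # still sequencing: no marker yet
    first = poll_once(root, start_analysis=lambda d: print("starting", d.name))
    second = poll_once(root, start_analysis=lambda d: print("starting", d.name))
```

A scheduler would call `poll_once` at the configured polling interval. The second call above starts nothing, because `run_A` has already been claimed and `run_B` is not finished.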
Final software architecture design:
You can download the PPT, or join QQ group 853 718 264 for discussion.