Free Medical Data

https://eklitzke.org/free-medical-data

Mar 14, 2016

A lot has been made of “free” software: what it is, how it differs from “open source” software, the merits of copyleft vs. non-copyleft free software, and so on.

One issue that has come to my attention recently is that many medical data sets are proprietary, and this leads to worse patient treatment options.

Here’s an example. Let’s say you have some sort of cancer, and there are several treatment options available (e.g. radiation therapy, chemotherapy, surgery). There is something called a “nomogram”: doctors take a large number of historical (anonymized) cases, each with its pre-treatment data points, the treatment option chosen, and the outcome. Based on these numbers they give you an answer like “option X has an A% chance of curing you”, “option Y has a B% chance of curing you”, etc.

To make this concrete, let’s say you have prostate cancer, which has been detected by measuring your blood PSA levels and confirmed by a prostate biopsy. Based on these factors, and any other relevant factors (age, weight, etc.), they’re able to create a nomogram. For your specific numbers, the nomogram gives the estimated chance that you’ll be fully cured of prostate cancer (measured after 5 years) under different treatment choices, e.g. if you choose radical prostatectomy instead of radiation therapy.

I’m not sure of the math behind this, but I believe they use some sort of clustering algorithm to find similar patients and calculate a score based on their treatment results and how similar you were to those patients.
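
To illustrate the idea guessed at here, the following is a minimal Python sketch of a similarity-weighted estimate: each historical patient counts toward a treatment's cure rate in proportion to how closely they resemble the new patient. It is only an illustration, not how any real nomogram is computed (published nomograms are often derived from regression models instead). The records, the two features (PSA and age), the `scales` values, and the function names are all made up for the example.

```python
import math

# Made-up illustrative records, NOT real patient data:
# (PSA in ng/mL, age in years, treatment chosen, cured at 5 years)
HISTORY = [
    (4.2, 58, "surgery", True),
    (6.8, 63, "surgery", True),
    (15.0, 71, "surgery", False),
    (5.1, 60, "radiation", True),
    (14.2, 70, "radiation", True),
    (18.5, 74, "radiation", False),
]


def similarity(a, b, scales):
    """Gaussian weight: 1.0 for identical patients, smaller as they differ."""
    squared = sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales))
    return math.exp(-squared / 2)


def estimate_cure_rates(patient, history, scales=(5.0, 10.0)):
    """Similarity-weighted cure rate per treatment for patient = (psa, age)."""
    totals = {}  # treatment -> [weighted cures, total weight]
    for psa, age, treatment, cured in history:
        weight = similarity(patient, (psa, age), scales)
        acc = totals.setdefault(treatment, [0.0, 0.0])
        acc[0] += weight * cured  # True counts as 1, False as 0
        acc[1] += weight
    return {t: cures / total for t, (cures, total) in totals.items() if total > 0}


if __name__ == "__main__":
    # Hypothetical new patient: PSA 14.0 ng/mL, 69 years old.
    for treatment, rate in estimate_cure_rates((14.0, 69), HISTORY).items():
        print(f"{treatment}: {rate:.0%} estimated chance of cure")
```

The `scales` tuple (an assumption of this sketch) controls how quickly a difference in each feature makes two patients count as dissimilar. With only a handful of records the estimates are close to meaningless, which is the point made later about quality depending on the number of data points.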

This is really cool, and it lets doctors choose the best treatment option for patients based on statistics from thousands of previous patients. In many cases there is some treatment option that is usually best, but under various special circumstances an alternative is better; this system lets the doctor really choose the best option. For instance, in my father’s case, a radical prostatectomy would normally be the treatment option for prostate cancer, but his nomogram showed that radiation therapy had a much better success rate.

Unfortunately, basically all of these nomogram databases are proprietary. The way it works is that a hospital internally collects these numbers, and may share this data with other hospitals (I’m not sure under what IP terms). Then as a hospital you have to choose which nomogram database to use. Typically you’d be paying for such access, and the quality of a nomogram depends on how many data points are in it.

Fox Chase Cancer Center has a large, free online nomogram database for various cancers. In addition to their own data, which is significant, Fox Chase has a way for other hospitals to submit their own nomogram data, which increases the total information and helps doctors make more accurate predictions. I don’t know what the data licensing terms are; presumably you cannot directly download the Fox Chase cancer nomogram data. But at least you can use their online nomogram tools for free.

There are a bunch of studies of techniques like this that you can find at the National Center for Biotechnology Information, which is part of the NIH. For instance, here’s a study on intramedullary rods vs. plate and screw fixation to fix humerus fractures.

However, there are a number of problems with this:

  • While there may have been a few studies on the matter (there are a dozen or so articles on plate fixation vs. the intramedullary rod technique), the data isn’t tagged or aggregated openly in a free way that would allow one to build a nomogram from as much information as possible
  • The research articles that do exist tend to have small sample sets because they only include information from a single hospital or group of related hospitals
  • The health records that hospitals collect have some amount of intellectual property value, and there’s no monetary reason for them to share it for free
  • Most of these research articles are published through a for-profit publisher like Elsevier, which means that as a normal citizen I cannot read the results of the study without paying the publisher a large fee for access to the article

The Department of Health has a lot of issues on its hands, but this is one that I think they should focus on seriously. Consider the following class of medical conditions:

  • There is some way to collect numerical pre-treatment data
  • There are multiple treatment options
  • Efficacy of the treatment can be evaluated somehow

In every such case I believe the NIH should build an open (anonymized) database of the pre-treatment data, the treatment option chosen, and the efficacy of the treatment. In some cases (say, for very rare conditions) it may not be possible to do this while respecting privacy concerns, but surely we can find common ground where we take common medical problems (many forms of cancer, bone fractures, etc.) and then use these databases to make medical treatment decisions.

Hospitals can be made to submit such data to the NIH as a result of these treatments (in fact, I wouldn’t be surprised if they already do). The NIH can enforce this by making this type of data sharing a condition of the funding it gives to hospitals.

I sincerely hope that an effort like this happens in the future. It could save millions of lives and spare people unnecessary pain, and I think it frames the current hot-button debate over “intellectual property” in a good and reasonable way.
