(Supplement) A script that scrapes The Atlantic and calls the Caiyun Xiaoyi translation API

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission. https://blog.csdn.net/lzw2016/article/details/82952235

Introduction

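The previous post covered how to scrape news articles from The Atlantic for studying English. This post is a supplement to it: while scraping the article paragraphs, the script also calls a translation API, so that the result looks like the figure below.
[Figure: sample output, with each paragraph followed by its translation]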

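As the figure shows, the translation quality is very good. It comes from Caiyun Xiaoyi (彩云小译), billed as the translator every programmer knows. The rest of this post focuses on how to capture its network requests and use the Caiyun Xiaoyi API as a third party.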


Writing to a file

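Here the output is written directly to Markdown, with a translate() call added to translate the paragraphs. For everything else, see the previous post.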

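def to_MarkDown(header, meta, time, p_list):
    # header, meta and time are lists of strings scraped from the page;
    # p_list is a list of lists of paragraph strings.
    title = header[0].strip()
    path = './《Atlantic》__{}.md'.format(title)
    with open(path, 'w', encoding='utf-8') as f:
        f.write('## {}\n'.format(title))
        # '爬取自《The Atlantic》' means "scraped from The Atlantic"
        f.write('**{}**爬取自《The Atlantic》\n\n'.format(time[0].strip()))
        # '导读' means "introduction"
        f.write('> 导读:**{}**\n\n'.format(meta[0].strip()))
        f.write('\n  ')
        # Flatten the nested paragraph lists into one flat list
        source = [para for group in p_list for para in group]
        # Translate all paragraphs in a single request
        p_trans = translate(source)
        # Write each paragraph followed by its translation as a blockquote
        for original, translated in zip(source, p_trans):
            f.write('  {}\n'.format(original))
            f.write('>   {}\n\n'.format(translated.strip()))
    # '写入成功' means "written successfully"
    print('{} | 写入成功'.format(path))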
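The flatten-and-pair step in to_MarkDown can be tried on its own, without the network. The following is a minimal sketch, assuming p_list is a list of lists of paragraph strings (as the nested loop suggests). render_bilingual is a hypothetical helper name, not part of the original script, and the translations are passed in rather than fetched.

```python
from itertools import chain


def render_bilingual(p_list, translations):
    """Return Markdown text pairing each paragraph with its translation.

    Hypothetical helper for illustration: p_list is assumed to be a list of
    lists of paragraph strings, translations a flat list of the same length.
    """
    # Flatten [[p1, p2], [p3]] into [p1, p2, p3]
    source = list(chain.from_iterable(p_list))
    # zip() would silently drop unmatched items, so check the lengths first
    if len(source) != len(translations):
        raise ValueError('expected {} translations, got {}'.format(
            len(source), len(translations)))
    parts = []
    for original, translated in zip(source, translations):
        parts.append('  {}\n'.format(original))
        parts.append('>   {}\n\n'.format(translated.strip()))
    return ''.join(parts)


if __name__ == '__main__':
    print(render_bilingual([['Hello.', 'World.']], ['你好。', '世界。 ']))
```

The explicit length check is a design choice: if the API returned fewer translations than paragraphs, zip() alone would quietly truncate the output file.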
        

Adding the Caiyun Xiaoyi translation API

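Open the browser developer tools (F12) and capture the network traffic. It is simple: once you try it, you will find there is only a single translator endpoint, as shown in the figure. From it you can read off that this is a POST request, along with its URL.
[Figure: the captured translator request in the developer tools]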

Next, look more closely at how the POST request is put together, focusing on the parts boxed in red in the figure:

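  • X-Authorization is required. You need to apply for a token; up to 1 million characters per month are free, which is plenty.
  • content-type: note that this differs from the data form parameters of older POST requests. Most sites now pass parameters as a request payload, so the thing to change is this content-type.
  • Payload: see the code for how the parameters are put together, or analyze the request yourself.
    • Note that trans_type supports not only English to Chinese (en2zh) and Chinese to English (zh2en), but also Japanese.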
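The difference between a form-encoded body and a JSON payload can be seen with the standard library alone. A minimal sketch follows; the field names mirror the ones used in this post, the two paragraphs are made-up sample data, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

fields = {
    'source': ['First paragraph.', 'Second paragraph.'],
    'trans_type': 'en2zh',
}

# Traditional form body (content-type: application/x-www-form-urlencoded).
# doseq=True repeats the key once per list item, so the list is flattened.
form_body = urlencode(fields, doseq=True)

# JSON payload (content-type: application/json), the style used for the
# translator request in this post. The list survives as a real JSON array.
json_body = json.dumps(fields)

print(form_body)
print(json_body)
```

With a JSON body, a whole list of paragraphs can travel in one source field, which is why the content-type header has to say application/json.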

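As for X-Authorization, there is also a shared public token with essentially no request limit; you can find it by inspecting the request.

[Figure: the captured request headers]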

import json
import time

import requests


def translate(source):
    # source is a list of strings; 'target' in the response holds the translations
    payload = {
        'source': source,
        'media': 'text',
        'detect': 'true',
        'trans_type': 'en2zh',
        'request_id': 'demo',
    }
    headers = {
        # Replace YOUR_TOKEN with your own token
        'X-Authorization': 'token YOUR_TOKEN',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0',
        'content-type': 'application/json',
        'Referer': 'http://fanyi.caiyunapp.com/',
    }

    url = 'https://api.interpreter.caiyunai.com/v1/translator'
    # Send the payload as a JSON string in the request body
    response = requests.post(url, data=json.dumps(payload), headers=headers)
    # Pause after each call to avoid hitting the API too often
    time.sleep(5)
    # Raise on HTTP errors instead of silently returning None
    response.raise_for_status()
    return response.json()['target']
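If an article has many paragraphs, one option is to send them in smaller batches rather than in a single request. This is a sketch only: translate_in_batches and batch_size are hypothetical names, the post does not state any per-request limit, and translate_fn stands in for the translate() function above so that the example runs without a network connection.

```python
def translate_in_batches(paragraphs, translate_fn, batch_size=10):
    """Translate a flat list of paragraphs in batches of batch_size.

    Hypothetical helper: translate_fn takes a list of strings and returns a
    list of translated strings of the same length (like translate() above).
    """
    if batch_size < 1:
        raise ValueError('batch_size must be at least 1')
    results = []
    for start in range(0, len(paragraphs), batch_size):
        batch = paragraphs[start:start + batch_size]
        translated = translate_fn(batch)
        # Fail loudly if a batch comes back missing or incomplete
        if translated is None or len(translated) != len(batch):
            raise RuntimeError(
                'translation failed for batch starting at {}'.format(start))
        results.extend(translated)
    return results


def fake_translate(batch):
    # Stand-in for the real API call: upper-cases the text instead of translating it
    return [text.upper() for text in batch]


print(translate_in_batches(['a', 'b', 'c'], fake_translate, batch_size=2))
```

In real use, translate would be passed as translate_fn; the per-batch check keeps the paragraphs and translations aligned when they are later paired with zip().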
    
response.status_code
200

translate_p = json.loads(response.text)['target']
len(translate_p)
18
p_s = []
for i in p:
    for j in i:
        p_s.append(j)
for i,j in zip(p_s,translate_p):
    print('  {}\n'.format(i))
    print('>   {}\n\n'.format(j.strip()))
  Last year, the world learned that researchers led by David Evans from the University of Alberta had resurrected a virus called horsepox. The virus hasn’t been seen in nature for decades, but Evans’s team assembled it using genetic material that they ordered from a company that synthesizes DNA.

>   去年,全世界都知道,由大卫 · 埃文斯领导的阿尔伯塔大学研究人员复活了一种叫水痘的病毒。 这种病毒在自然界已经有几十年没有出现过了,但是埃文斯的研究小组用他们从合成 DNA 的公司订购的基因材料进行组装。


  The work caused a huge stir. Horsepox is harmless to people, but its close cousin, smallpox, killed hundreds of millions before being eradicated in 1980. Only two stocks of smallpox remain, one held by Russia and the other by the U.S. But Evans’s critics argued that his work makes it easier for others to recreate smallpox themselves, and, whether through accident or malice, release it. That would be horrific: Few people today are immunized against smallpox, and vaccine reserves are limited. Several concerned parties wrote letters urging scientific journals not to publish the paper that described the work, but PLOS One did so in January.

>   这项工作引起了巨大的轰动。 马瘟对人体无害,但它的近亲天花在1980年被根除之前已经杀死了上亿人。 只有两种天花存在,一种由俄罗斯持有,另一种由美国持有。 但是埃文斯的批评者认为,他的工作使得其他人更容易自己重建天花,并且,无论是意外还是恶意,都会释放出来。 这将是可怕的: 今天很少有人接种天花疫苗,而且疫苗储备有限。 一些有关方面写信敦促科学期刊不要发表描述这项研究的论文,但是《公共科学图书馆 · 综合》一月份就这样做了。


  This controversy is the latest chapter in an ongoing debate around “dual-use research of concern”—research that could clearly be applied for both good and ill. More than that, it reflects a vulnerability at the heart of modern science, where small groups of researchers and reviewers can make virtually unilateral decisions about experiments that have potentially global consequences, and that everyone else only learns about after the fact. Cue an endlessly looping GIF of Jurassic Park’s Ian Malcolm saying, “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”

>   这一争议是围绕"双重用途的关注性研究"正在进行的辩论中的最新一章,这项研究显然可以用于善与弊。 更重要的是,它反映了现代科学核心的脆弱性,在这里,一小群研究人员和评论家可以对具有潜在全球影响的实验作出几乎是单方面的决定,而其他人只能在事后才知道。 《侏罗纪公园》的伊恩 · 马尔科姆无休止地循环着,他说:"你们的科学家们一心想着他们是否能做到,他们没有停下来思考是否应该这样做。"


  Except Evans did think about whether he should, and clearly came down on yes. In one of several new opinion pieces that reflect on the controversy, he and his colleague Ryan Noyce argue that recreating horsepox has two benefits. First, Tonix, the company that funded the research, hopes to use horsepox as the basis of a safer smallpox vaccine, should that extinct threat ever be itself resurrected. Second, the research could help scientists to more efficiently repurpose poxviruses into vaccines against other diseases, or even weapons against cancer. (Evans politely declined a request for an interview, noting that he’d “rather let [his] piece speak for itself.”)

>   除了埃文斯确实考虑过他是否应该这样做,而且很明显是的。 他和他的同事瑞恩 · 诺伊斯在一篇反思这场争论的新观点中提出,重建马瘟有两个好处。 首先,为这项研究提供资金的公司 Tonix 希望使用水痘作为一个更安全的人痘接种术的基础,如果已经灭绝的威胁自身复活的话。 其次,这项研究可以帮助科学家更有效地将痘病毒重新用于其他疾病的疫苗中,甚至可以用来对抗癌症。 (埃文斯礼貌地拒绝了采访的要求,指出他"宁愿让他的文章自己说话。")


  Tom Inglesby, a health-security expert at the Johns Hopkins Bloomberg School of Public Health, doesn’t buy it. He says these purported benefits are hypothetical, and could be achieved in safer ways that don’t involve horsepox at all. Even if you want to use that particular virus, the CDC has specimens in its freezers; Evans didn’t ask for those because he thought Tonix couldn’t have commercialized the naturally occurring strain into a vaccine, according to reporting from NPR’s Nell Greenfieldboyce.

>   布隆博格公共卫生学院的健康安全专家汤姆•英格斯比(Tom Inglesby)不买账。 他说,这些所谓的好处是假设的,可以通过更安全的方式来实现,而且根本不涉及水痘。 根据美国国家公共电台的 Nell Greenfieldboyce 的报道,即使你想使用这种特殊的病毒,疾病控制中心的冰箱里也有标本; 埃文斯并没有要求这些样本,因为他认为 Tonix 无法将自然产生的毒株转化为疫苗。


  “I was a little surprised that the issue caused so much controversy,” says Gigi Gronvall, who has written extensively on biosecurity and also works at Johns Hopkins. Other researchers had already synthesized smaller viruses like polio, and bigger entities like bacteria; they’ve even made a start on far larger organisms like yeast. Given such milestones, one should just assume that all viruses are within reach—but only to those with the right expertise, equipment, and money. Evans didn’t just order horsepox in the mail; it took years to refine the process of making and assembling it. “It’s not like anybody could synthesize horsepox,” says Gronvall.

>   "我对这个问题引发如此大的争议感到有点惊讶,"Gigi Gronvall 说,她曾经写过大量关于生物安全的文章,同时也在约翰霍普金斯大学工作。 其他研究人员已经合成了小型病毒,比如脊髓灰质炎,还有更大的细菌,比如细菌; 他们甚至已经开始研究像酵母这样的大得多的生物体。 考虑到这些里程碑式的事件,人们应该假设所有的病毒都可以接触到——但是只有那些拥有正确的专业知识、设备和金钱的病毒。 埃文斯不仅在邮件中订购马瘟,还花了数年的时间来完善制作和组装过程。 "这不像是任何人都可以合成马瘟病毒的,"格隆维尔说。


  True, says Kevin Esvelt from MIT, but that feat is now technically easier because Evans’s paper spelled out several details of how to do so. It’s conceptually easier to weaponize because his paper explicitly connected the dots to smallpox. And it will become logistically easier to carry out with time, as the underlying tech becomes cheaper. “In the long run, I’m worried about the technology being accessible enough,” Esvelt says.

>   的确如此,来自麻省理工学院的凯文 · 埃斯维特说,但是从技术上来说,这一壮举在技术上变得更加容易,因为埃文斯的论文详细阐述了如何做到这一点。 因为他的论文明确地将这些点与天花联系在一起,所以在概念上更容易武器化。 随着基础技术变得更加廉价,随着时间的推移,这将变得更加容易。 "从长远来看,我担心的是技术是否足够容易获得,"Esvelt 说。


  There are ways of mitigating that risk. Most groups can’t make DNA themselves, and must order sequences from companies. Esvelt thinks that all such orders should be screened against a database of problematic sequences, as a bulwark against experiments that are unknowingly or deliberately dangerous. Such screening already occurs, but only on a voluntary basis. A mandatory, universal process could work if publishers or funders boycott work that doesn’t abide by it, or if companies build the next generation of DNA synthesizers to lock if a screening step is fixed.

>   有一些方法可以减轻这种风险。 大多数团队不能自己制造 DNA,而且必须从公司订购序列。 认为所有这些命令都应该在一个有问题的序列数据库中进行筛选,以防止不知不觉或故意危险的实验。 这种筛查已经发生,但只是在自愿的基础上进行。 如果出版商或出资者抵制不遵守的工作,或者如果公司建立下一代 DNA 合成器来锁定筛选步骤,那么一个强制性的、普遍的过程就可以奏效。


  But these technological fixes do little to address the underlying debate about how society decides what kinds of experiments should be done in the first place, let alone published. Few countries have clear procedures for reviewing dual-use research. The U.S. has perhaps the strongest policy, but it still has several loopholes. It only covers 15 big, bad pathogens, and horsepox, though related to one, isn’t one itself. It also only covers federally funded research, and Evans’s research was privately funded. He did his work in Canada, but he could just as easily have done so in the U.S.

>   但是这些技术上的解决办法并没有解决社会如何决定应该做什么样的实验的潜在争论,更不用说发表了。 很少有国家有审查双重用途研究的明确程序。 美国也许有最强硬的政策,但它仍然有一些漏洞。 它只覆盖了15个大的、不好的病原体和马瘟,尽管与其中一种病原体有关,但它本身并不存在。 它也只涉及联邦资助的研究,埃文斯的研究是私人资助的。 他在加拿大做了他的工作,但他在美国也可以轻而易举地这样做。


  Absent clearer guidelines, the burden falls on the scientific enterprise to self-regulate—and it isn’t set up to do that well. Academia is intensely competitive, and “the drivers are about getting grants and publications, and not necessarily about being responsible citizens,” says Filippa Lentzos from Kings College London, who studies biological threats. This means that scientists often keep their work to themselves for fear of getting scooped by their peers. Their plans only become widely known once they’ve already been enacted, and the results are ready to be presented or published. This lack of transparency creates an environment where people can almost unilaterally make decisions that could affect the entire world.

>   如果没有更明确的指导方针,科研企业将承担起自我监管的责任,而且它并不能很好地做到这一点。 伦敦大学国王学院研究生物威胁的 Filippa Lentzos 说,学术界竞争激烈,"驱动因素是获得奖学金和出版物,而不一定是要成为负责任的公民。"他研究生物威胁。 这意味着科学家常常把他们的工作留给自己,以免被同龄人挖走。 他们的计划只有在已经颁布之后才会广为人知,而且结果已经准备就绪,可以提交或公布。 这种缺乏透明度的做法创造了一种环境,人们几乎可以单方面作出可能影响整个世界的决定。


  Take the horsepox study. Evans was a member of a World Health Organization committee that oversees smallpox research, but only told his colleagues about the experiment after it was completed. He sought approval from biosafety officers at his university, and had discussions with Canadian federal agencies, but it’s unclear if they had enough ethical expertise to fully appreciate the significance of the experiment. “It’s hard not to feel like he opted for agencies that would follow the letter of the law without necessarily understanding what they were approving,” says Kelly Hills, a bioethicist at Rogue Bioethics.

>   以马瘟研究为例。 埃文斯是世界卫生组织的一个委员会的成员,该委员会负责监督天花的研究,但是他只是在实验完成后才告诉他的同事。 他寻求大学生物安全官员的批准,并与加拿大联邦机构进行了讨论,但目前还不清楚他们是否具备足够的道德专业知识,以充分认识到这项实验的重要性。 罗格生物伦理学院的生物伦理学家凯利•希尔斯(Kelly Hills)表示:"我们很难不觉得他选择了那些遵循法律条文的机构,而不一定了解他们所批准的内容。"。


  She also sees a sense of impulsive recklessness in the interviews that Evans gave earlier this year. Science reported that he did the experiment “in part to end the debate about whether recreating a poxvirus was feasible.” And he told NPR that “someone had to bite the bullet and do this.” To Hills, that sounds like: I did it because I could do it. “We don’t accept those arguments from anyone above age six,” she says.

>   她还在今年早些时候的采访中看到了一种冲动的鲁莽感。 科学报道说,他做这个实验的部分原因是为了结束关于重建痘病毒是否可行的争论。" 他告诉美国国家公共广播电台,"必须有人咬紧牙关才能做到这一点。" 对于希尔斯来说,这听起来像是: 我这么做是因为我能做到。 "我们不接受任何六岁以上人士的观点,"她表示。


  Even people who are sympathetic to Evans’s arguments agree that it’s problematic that so few people knew about the work before it was completed. “I can’t emphasize enough that when people in the security community feel like they’ve been blindsided, they get very concerned,” says Diane DiEuliis from National Defense University, who studies dual-use research.

>   即使是那些同情埃文斯论点的人也认为,在工作完成之前,很少有人知道这项工作是有问题的。 国防大学研究双重用途研究的黛安•迪尤利斯(Diane DiEuliis)表示:"我再怎么强调也不过分,当安全部门的人觉得自己被暗算时,他们会非常担心。"。


  The same debates played out in 2002, when other researchers synthesized poliovirus in a lab. And in 2005, when another group resurrected the flu virus behind the catastrophic 1918 pandemic. And in 2012, when two teams mutated H5N1 flu to be more transmissible in mammals, in a bid to understand how that might happen in the wild. Many of the people I spoke with expressed frustration over this ethical Möbius strip. “It’s hard not to think that we’re moving in circles,” Hills says. “Can we stop saying we need to have a conversation and actually get to the conversation?”

>   同样的辩论发生在2002年,当时其他研究人员在实验室合成了脊髓灰质炎病毒。 2005年,当另一个团体在1918年灾难性的大流行病背后复活了流感病毒。 而在2012年,当两个团队将 H5N1病毒变异为哺乳动物更容易传播,以期了解野生动物中可能发生的情况。 与我交谈过的许多人对这个伦理道德条款表示失望。 "很难不认为我们在绕圈子,"希尔斯说。 "我们能不能不要再说我们需要进行一次谈话,而是真正地开始谈话吗?"


  The problem is that scientists are not trained to reliably anticipate the consequences of their work. They need counsel from ethicists, medical historians, sociologists, and community representatives—but these groups are often left out from the committees that currently oversee dual-use research. “The peer group who is weighing in on these decisions is far too narrow, and these experiments have the potential to affect such a large swath of society,” says Lentzos. “I’m not saying we should flood committees with people off the streets, but there are a lot of professionals who are trained to think ethically or from a security perspective. Scientists don’t have that and it’s actually unfair that they’re being asked to make judgment calls on security issues.”

>   问题在于,科学家没有接受过可靠预测他们工作后果的训练。 他们需要来自伦理学家、医学历史学家、社会学家和社区代表的建议,但这些团体往往被排除在目前负责双重用途研究的委员会之外。 "参与这些决策的同行群体太过狭隘,这些实验有可能影响到如此大的社会群体,"Lentzos 说。 "我并不是说我们应该让委员会里的人走上街头,但是有很多专业人士接受过道德或安全方面的培训。 科学家们没有这种能力,而且他们被要求对安全问题做出判断是不公平的。"


  More broadly, Hills says, there’s a tendency for researchers to view ethicists and institutional reviewers as yet more red tape, or as the source of unnecessary restrictions that will stifle progress. Esvelt agrees. “Science is built to ascend the tree of knowledge and taste its fruit, and the mentality of most scientists is that knowledge is always good,” he says. “I just don’t believe that that’s true. There are some things that we are better off not knowing.” He thinks the scientific enterprise needs better norms around potentially dangerous information. First: Don’t spread it. Second: If someone tells you that your work represents an information hazard, “you should seriously respect their call.”

>   更广泛地说,希尔斯说,研究人员倾向于将伦理学家和机构评论家视为更多的繁文缛节,或者将其视为不必要限制的来源,而这些限制将会扼杀进步。 埃斯维特对此表示赞同。 他说:"科学的建立是为了提升知识树,品尝它的果实,而大多数科学家的心态是,知识永远是好的。"。 "我只是不相信这是真的。 有些事情我们最好不要知道。" 他认为科学企业需要围绕潜在危险信息制定更好的规范。 首先: 不要分散它。 第二: 如果有人告诉你你的工作是一种信息危害,"你应该严肃地尊重他们的呼吁。"


  Lentzos adds that scientists should be trained on these topics from the earliest stages of their careers. “It needs to start at the undergrad level, and be continually done for active researchers,” she says. There is a lot of talk about educating society about science. Perhaps what is more needed is educating scientists about society.

>   他补充说,科学家应该从他们职业生涯的最初阶段就开始接受有关这些主题的培训。 "它需要从本科生开始,并且不断地为活跃的研究人员做,"她说。 有很多关于科学教育社会的讨论。 也许更需要的是对科学家进行社会教育。


  We want to hear what you think about this article. Submit a letter to the editor or write to [email protected].

>   我们想听听你对这篇文章的看法。 向编辑提交一封信,或写信至 [email protected]

The above is the debugging output; the actual code is shown above as well.



Reposted from blog.csdn.net/lzw2016/article/details/82952235