How to Become a Contributor to Top Open Source Projects

Overview

For programmers, contributing to a top open source project is a meaningful achievement, though it is by no means easy. If you work in artificial intelligence, you almost certainly know open source projects such as Google's TensorFlow and Facebook's PyTorch. This article explains how to become a contributor to top open source projects like these.

Preparation

1. First, you need a GitHub account and should be familiar with the basics of hosting code on GitHub.

2. Top-level open source projects generally require you to sign a Contributor License Agreement (CLA) before your code can be accepted. For TensorFlow, individuals sign the TF individual CLA and companies sign the TF corporate CLA; some PyTorch projects require the Facebook CLA.

3. Make the code you write conform to a recognized style. Many open source projects require the Google Python Style Guide; even PyTorch follows it, to say nothing of Google's own TensorFlow.

4. Contributed code usually consists of classes or functions (documentation contributions aside), so it needs unit tests. Like code comments, tests are an essential part of sharing code. Without them, your code will not be merged even if it is correct; you will be asked to provide unit tests first.

5. Many open source projects require every Python file to begin with a license header. TensorFlow is one of them; its Python license header is shown below, and it is simple to add.

# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
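A check for this header is easy to automate. The following is a minimal sketch, not part of any project's tooling: the marker string, the 20-line window, and the function names are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed marker: one line of the Apache 2.0 notice shown above.
LICENSE_MARKER = 'Licensed under the Apache License, Version 2.0'


def has_license_header(source, max_lines=20):
    """Returns True if the marker appears in the first `max_lines` lines."""
    head = source.splitlines()[:max_lines]
    return any(LICENSE_MARKER in line for line in head)


def files_missing_header(root):
    """Lists .py files under `root` whose leading lines lack the marker."""
    missing = []
    for path in sorted(Path(root).rglob('*.py')):
        if not has_license_header(path.read_text(encoding='utf-8')):
            missing.append(str(path))
    return missing
```

Calling files_missing_header('.') from the root of your working copy lists the Python files that still need the header.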

Tools

This section introduces the tools that help with the preparatory work before contributing, namely code style checking and unit testing.

Code style tools

To make the code meet Google Style, we first need a style checker. Here we use the officially recommended pylint.

Install:

pip install pylint

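Usage:

# Check the script with pylint; by default it follows the PEP 8 conventions
# Here we pass a configuration file so that it checks against Google Style
# myfile.py is the Python script you have written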
pylint --rcfile=pylintrc myfile.py

For pylintrc content, please refer to: pylintrc
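If you prefer to drive the check from a script, the same command line can be wrapped in Python. This is a sketch under assumptions: the function names are hypothetical, and it only shells out to the pylint executable with the --rcfile option used above.

```python
import shutil
import subprocess


def build_pylint_command(target, rcfile='pylintrc'):
    """Builds the command line shown above as an argument list."""
    return ['pylint', '--rcfile={}'.format(rcfile), target]


def run_pylint(target, rcfile='pylintrc'):
    """Runs pylint and returns its exit code, or None if pylint is missing."""
    if shutil.which('pylint') is None:
        return None
    completed = subprocess.run(build_pylint_command(target, rcfile))
    return completed.returncode
```

An exit code of 0 from run_pylint('myfile.py') generally means pylint had nothing to report.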

Code written without a style guide in mind often contains so many violations that running pylint on it directly produces an overwhelming list of changes, which can do serious damage to a beginner's morale. For this reason, here is another tool recommended by many open source projects: black. It automatically fixes the basic formatting problems in your code. (Many problems cannot be fixed automatically and still need to be detected with pylint.)

Install:

pip install black

Usage:

# -l sets the maximum length of each line of code
# The default is 88, but Google Style requires 80
# so we set it to 80 here
black myfile.py -l 80
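A formatter can still leave some lines over the limit (long string literals, for example), so it is useful to know which lines remain too long. The helper below is a hypothetical sketch of that check, using the 80-character limit mentioned above.

```python
def lines_over_limit(source, limit=80):
    """Returns (line_number, length) pairs for lines longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# A short line followed by a 96-character line.
sample = 'x = 1\n' + 'y = "' + 'a' * 90 + '"\n'
print(lines_over_limit(sample))  # [(2, 96)]
```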

Code style example:

def my_op(tensor_in, other_tensor_in, my_param, other_param=0.5,
          output_collections=(), name=None):
  """My operation that adds two tensors with given coefficients.

  Args:
    tensor_in: `Tensor`, input tensor.
    other_tensor_in: `Tensor`, same shape as `tensor_in`, other input tensor.
    my_param: `float`, coefficient for `tensor_in`.
    other_param: `float`, coefficient for `other_tensor_in`.
    output_collections: `tuple` of `string`s, name of the collection to
                        collect result of this op.
    name: `string`, name of the operation.

  Returns:
    `Tensor` of same shape as `tensor_in`, sum of input values with coefficients.

  Example:
    >>> my_op([1., 2.], [3., 4.], my_param=0.5, other_param=0.6,
              output_collections=['MY_OPS'], name='add_t1t2')
    [2.3, 3.4]
  """
  with tf.name_scope(name or "my_op"):
    tensor_in = tf.convert_to_tensor(tensor_in)
    other_tensor_in = tf.convert_to_tensor(other_tensor_in)
    result = my_param * tensor_in + other_param * other_tensor_in
    tf.add_to_collection(output_collections, result)
    return result

# Example call (t1 and t2 are tensors defined elsewhere):
output = my_op(t1, t2, my_param=0.5, other_param=0.6,
               output_collections=['MY_OPS'], name='add_t1t2')

Unit testing tool

Unit testing is essential in team development and is an important basis for judging code quality, so every piece of code you submit must come with unit tests. Here we use unittest, Python's mainstream unit testing tool.

Install:

unittest is part of the Python standard library, so no installation is needed.

Usage: only the core usage is demonstrated here; refer to the unittest documentation for details.

# Import the unittest package
import unittest

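# First create a test class that holds all of your test functions
# The class does not use __init__(self); use setUp(self) to define shared setup
# It must inherit from unittest.TestCase; a class name starting with Test is recommended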
class TestStringMethods(unittest.TestCase):
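    # Inside the class are the test functions, one after another
    # Their names should start with test_ (unittest only collects methods starting with test)
    # They normally take no extra parameters; the values they need are set inside the function
    # You can also write a function that takes parameters and call it with concrete values
    # Each test function must contain assert... calls that check the result
    # Common ones include: assertEqual, assertTrue, assertFalse,
    # assertRaises, assertIn, assertNotIn, assertIs, assertIsNot...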
    def test_upper(self,):
        # Use assertEqual to check that two strings are equal
        self.assertEqual(
            "foo".upper(), "FOO",
        )

    def test_isupper(self,):
        # Use assertTrue/assertFalse to assert that a condition is true/false
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    def test_split(self,):
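        # Any input will do
        s = "hello world"
        # Use assertIn to assert membership in a list
        self.assertIn(
            s.split(), [["hello", "world"]],
        )
        # Note: use `with self.assertRaises(...)` to assert that an exception is raised
        # split() raises TypeError when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)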


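# This is the main entry point; it is required if you run the script with python
# It can be omitted if you use pytest (introduced later)
if __name__ == "__main__":
    # unittest.main() runs all classes that inherit from unittest.TestCase
    unittest.main()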
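The comments above mention setUp(self) for shared setup. The following sketch shows it in use; weighted_sum is a hypothetical function under test, not part of any project, and the tests are run programmatically rather than through unittest.main() so that the result can be inspected.

```python
import unittest


def weighted_sum(a, b, a_weight, b_weight=0.5):
    """Hypothetical function under test: element-wise weighted sum of lists."""
    if len(a) != len(b):
        raise ValueError('inputs must have the same length')
    return [a_weight * x + b_weight * y for x, y in zip(a, b)]


class TestWeightedSum(unittest.TestCase):
    def setUp(self):
        # Runs before every test method and holds the shared inputs.
        self.a = [1.0, 2.0]
        self.b = [3.0, 4.0]

    def test_values(self):
        result = weighted_sum(self.a, self.b, a_weight=0.5, b_weight=0.5)
        self.assertEqual(result, [2.0, 3.0])

    def test_length_mismatch(self):
        with self.assertRaises(ValueError):
            weighted_sum(self.a, [1.0], a_weight=0.5)


# Load and run the test class without calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestWeightedSum)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```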

Using decorators: class and function decorators are among the most commonly used features of unittest.

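# Use the decorator below on test classes/functions that must always be skipped; you must give a reason
# @unittest.skip("too good-looking to need testing, skip it!")

# Skip the test if the condition is true, e.g. depending on whether a GPU is available
# @unittest.skipIf(TEST_CUDA, "CUDA available")

# Skip the test unless the condition is true, e.g. checking whether a dependency is installed
# @unittest.skipUnless(has_unittest, "unittest dependencies are not installed")

# Marks a test that is expected to fail: if the test fails, it is reported as an expected failure
# This is coarser-grained than the assertRaises check inside a test shown earlier
# @unittest.expectedFailure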

import torch
try:
    import unittest
except ImportError:
    has_unittest = False
else:
    has_unittest = True

# True when a CUDA device is available
TEST_CUDA = torch.cuda.is_available()

# The condition is true, so the class is not skipped
@unittest.skipUnless(has_unittest, "unittest dependencies are not installed")
# Skipped if the condition is true; not skipped if it is false
@unittest.skipIf(TEST_CUDA, "CUDA available")
class TestStringMethods(unittest.TestCase):
    def test_upper(self,):
        self.assertEqual(
            "foo".upper(), "FOO",
        )

    @unittest.skip("too good-looking to need testing, skip it!")
    def test_isupper(self,):
        self.assertTrue("FOO".isupper())
        self.assertFalse("Foo".isupper())

    @unittest.expectedFailure
    def test_split(self,):
        s = "hello world"
        self.assertIn(
            s.split(), [["hello", "world"]],
        )
        # An exception is expected here, but none is actually raised, so the assertion fails
        # That failure is what @unittest.expectedFailure anticipates
        with self.assertRaises(TypeError):
            s.split("ZMZ")


if __name__ == "__main__":
    unittest.main()
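The script above needs torch to run. The sketch below shows the same decorators without that dependency. It is an illustration under assumptions: numpy stands in as an arbitrary optional dependency, and the class and test names are hypothetical.

```python
import importlib.util
import unittest

# Assumption for illustration: treat numpy as an optional dependency.
HAS_NUMPY = importlib.util.find_spec('numpy') is not None


class TestDecorators(unittest.TestCase):
    def test_runs(self):
        self.assertEqual('foo'.upper(), 'FOO')

    @unittest.skip('demonstrates an unconditional skip')
    def test_skipped(self):
        self.fail('never executed')

    @unittest.expectedFailure
    def test_expected_failure(self):
        # str.split with a string separator does not raise TypeError, so
        # assertRaises fails and the test counts as an expected failure.
        with self.assertRaises(TypeError):
            'hello world'.split('ZMZ')

    @unittest.skipUnless(HAS_NUMPY, 'numpy is not installed')
    def test_optional_dependency(self):
        import numpy
        self.assertEqual(int(numpy.add(1, 2)), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDecorators)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped), len(result.expectedFailures))
```

Skipped tests and expected failures do not count as failures, so result.wasSuccessful() is still True for this run.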

Run your test script:

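# Running the tests with pytest is recommended; it is a third-party package (pip install pytest if it is missing)
# With pytest you do not need the main entry point, and its output is easier to read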
pytest test_myfile.py

Output:

======================== test session starts =========================
platform linux -- Python 3.7.3, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /root
plugins: remotedata-0.3.1, celery-4.3.0, doctestplus-0.3.0, arraydiff-0.3, openfiles-0.3.2
collected 3 items

test_myfile.py sx.                                             [100%]

=========== 1 passed, 1 skipped, 1 xfailed in 0.34 seconds ===========

For real-world unit test scripts, refer to PyTorch Tests and TensorFlow Tests.

Process

Before you set out to become a contributor, make sure you are comfortable using the project. Then decide what kind of contribution you want to make: fixing a bug or implementing a new feature. For a novice contributor, fixing a bug is the recommended choice. Unless you have already identified a specific contribution through your own work with the project, it is recommended that you follow the steps below:

Step one:

Look for open issues in the project's GitHub Issues (for example, TensorFlow Issues or PyTorch Issues). Read the reported issues carefully; this saves a lot of time in finding a problem to work on. The technical discussion under each issue and any PRs already submitted help you decide whether you should take part in solving it. (In many open source projects, some issues carry the label "contributions welcome", so look at those first.)
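Labelled issues can also be listed through the GitHub REST API. The sketch below only builds the request URL and makes no network call. The function name is hypothetical, and the exact label text differs between projects, so check the project's label list before relying on it.

```python
from urllib.parse import urlencode


def open_issues_url(owner, repo, label, per_page=20):
    """Builds a GitHub REST API URL listing open issues with `label`."""
    query = urlencode({'state': 'open', 'labels': label, 'per_page': per_page})
    return 'https://api.github.com/repos/{}/{}/issues?{}'.format(
        owner, repo, query)


# The label below is taken from the text above and may not match exactly.
url = open_issues_url('tensorflow', 'tensorflow', 'contributions welcome')
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the matching issues as JSON.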

Step two:

Once you know which problem you want to solve, fork the open source project to your own GitHub account before you start writing code, then clone your fork to the machine you work on. This is what allows you to submit a PR at the end.

# For example:
git clone https://github.com/AITutorials/tensorflow.git

At this point, running git remote -v inside the cloned directory shows that we are connected only to our own remote repository (origin/master).

We also need to connect to the remote repository of the open source project itself (upstream/master):

# Add the connection, using tensorflow as the example
git remote add upstream https://github.com/tensorflow/tensorflow.git

# upstream now appears in the list
git remote -v

Next, create your own branch. You can list the existing branches, including the remote ones, first:

# List local and remote branches
git branch -a

# Create and switch to your own branch, cnsync
git checkout -b cnsync
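The commands of step two can be collected into a small script. The sketch below is one possible arrangement, not a required workflow: the helper names are hypothetical, and it makes explicit that everything after git clone runs inside the cloned directory.

```python
import subprocess


def repo_dir_from_url(url):
    """Derives the directory name that `git clone` creates by default."""
    name = url.rstrip('/').rsplit('/', 1)[-1]
    return name[:-4] if name.endswith('.git') else name


def setup_steps(fork_url, upstream_url, branch):
    """Returns (working_directory, command) pairs for step two."""
    repo_dir = repo_dir_from_url(fork_url)
    return [
        ('.', ['git', 'clone', fork_url]),
        (repo_dir, ['git', 'remote', 'add', 'upstream', upstream_url]),
        (repo_dir, ['git', 'remote', '-v']),
        (repo_dir, ['git', 'checkout', '-b', branch]),
    ]


def run_steps(steps):
    """Runs each command in its directory and stops at the first failure."""
    for cwd, command in steps:
        subprocess.run(command, cwd=cwd, check=True)


steps = setup_steps(
    'https://github.com/AITutorials/tensorflow.git',
    'https://github.com/tensorflow/tensorflow.git',
    'cnsync',
)
# Print the plan only; call run_steps(steps) to actually execute it.
for cwd, command in steps:
    print(cwd, ' '.join(command))
```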

Step three:

After step two, you have the project's source code and your own branch. Now it is time to do the actual work, coding and reviewing your changes, and the code style and unit testing tools you prepared earlier come in handy.
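One way to put those tools to work is to run them in a fixed order before every commit. The sketch below is an illustration under assumptions: the file names are placeholders, run_checks is a hypothetical helper, and black's --check option (which reports problems without rewriting files) does not appear earlier in this article.

```python
import subprocess
import sys


def run_checks(checks):
    """Runs (name, command) pairs in order; returns the first failing name.

    Returns None when every command exits with status 0. A command whose
    executable cannot be found is treated as a failure.
    """
    for name, command in checks:
        try:
            if subprocess.run(command).returncode != 0:
                return name
        except FileNotFoundError:
            return name
    return None


# Placeholder file names; adjust them to the files you changed.
CHECKS = [
    ('format', ['black', '--check', '-l', '80', 'myfile.py']),
    ('lint', ['pylint', '--rcfile=pylintrc', 'myfile.py']),
    ('tests', [sys.executable, '-m', 'pytest', 'test_myfile.py']),
]
```

Calling run_checks(CHECKS) returns None when everything passes, or the name of the first stage that needs attention.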

Step four:

Commit your code and create a PR on GitHub.

# Stage the changes
git add .

# Commit the changes
git commit -m "Describe your change here"

# Push to your own remote repository
git push origin cnsync

Note: although you push only to your own remote repository here, that repository is linked to the source project's repository. This means you can now create a PR against the source project from your own remote repository. All of this can be done on the page of the project you forked, including filling in the PR title and description; sometimes you also need to add a prefix to the title, such as [Draft], [WIP], or [RFR].

Step five:

Wait patiently. Once your PR is in the Ready For Review state, it goes through automated testing and is then examined by reviewers. You will receive feedback before long: your changes may be accepted, or you may be asked for further revisions or tests.