8 Practical Tips for Machine Learning Modeling Projects in Python: Helping Beginners Avoid Common Pitfalls

Many people start using Python for machine learning modeling projects soon after learning the language, and everyone develops their own habits for managing project files. I have my own approach too, summed up from the pitfalls I have hit myself, and I am sharing it here. I hope it saves you some detours!

Contents

  • Organize project files in advance
  • Never modify the source data by hand, and keep backups
  • Configure paths properly
  • Add comments and notes where the code needs them
  • Speed up your Python loops
  • Visualize the progress of your loops
  • Use efficient tools to track down exceptions
  • Pay more attention to code robustness

1. Organize project files in advance

Whenever I started a new project, I used to dump code, data, and documents into a single folder for convenience. It looked like a mess, and retracing my steps later was painful. After switching computers, nothing would run until I fixed every path by hand.

After some trial and error, I now split each project into several subfolders and keep the code in the main folder.
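
One possible layout, as a minimal sketch: only data/processed appears in this article (in tip 3); the other folder names and the create_project helper are assumptions for illustration, so rename them to suit your own project.

```python
import tempfile
from pathlib import Path

# Hypothetical layout; only data/processed is mentioned in the article.
SUBFOLDERS = [
    "data/raw",        # untouched source data
    "data/processed",  # cleaned or derived data
    "docs",            # notes and reports
    "output",          # models, figures, results
]


def create_project(root):
    """Create the project root and its subfolders; existing folders are left alone."""
    root = Path(root)
    for sub in SUBFOLDERS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Demo in a throwaway temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = create_project(Path(tmp) / "my_project")
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )
    print(created)
```

In a real project you would call create_project once with the project's own path; scripts then live directly in the root folder.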

2. Never modify the source data by hand, and keep backups

Keep a good backup of the source data so that you can retrace your steps later, whether to redo the next operation or to change an intermediate step. Code and other documents should be backed up as well to avoid accidental loss.

An article from Xu Liang Linux recommends four tools:

  • Git, a version control system
  • Rsync, for backing up files
  • Dropbox, for cloud storage
  • Time Machine, the macOS backup tool

I won't go into how to use each tool here; you can look them up yourself.
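
Even without any of these tools, a few lines of standard-library Python can take a timestamped copy of a raw file before you start working on it. This is only a sketch: the backup_file helper, the backup folder, and the file names are assumptions for illustration, and it does not replace a proper backup tool.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_file(source, backup_dir):
    """Copy source into backup_dir under a timestamped name and return the new path."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file metadata where possible
    return target


# Demo with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "test1.csv"
    raw.write_text("id,value\n1,10\n")
    backup_copy = backup_file(raw, Path(tmp) / "backup")
    backup_matches = backup_copy.read_text() == raw.read_text()
```

All later processing should then read the raw file and write its results somewhere else, so the original is never overwritten.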

3. Configure paths properly

Many people hard-code absolute paths when writing code. That usually works on their own machine, but when the code is shared with others to study or run, it often will not run as-is.

Two recommendations:

  • Use relative paths: keep the script in the main directory and other resources (data, third-party packages, and so on) in the same or a lower directory, for example ./data/processed/test1.csv
  • Use a global path configuration variable:
    import pandas as pd

    # Set the project's home directory
    HOME_PATH = r'E:ML90615- PROJECT1'
    # Read the data
    data = pd.read_csv(HOME_PATH + '/data/processed/test1.csv')
    data.head()
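
The two recommendations can also be combined so that nobody has to edit the home directory by hand. The sketch below uses the standard-library pathlib module, which is my substitution and not something the original code uses; it reuses the name HOME_PATH and the test1.csv path from above, while DATA_FILE is a name introduced for illustration.

```python
from pathlib import Path

# Directory containing this script. In an interactive session, where
# __file__ is not defined, fall back to the current working directory.
if "__file__" in globals():
    HOME_PATH = Path(__file__).resolve().parent
else:
    HOME_PATH = Path.cwd()

# Build paths from parts instead of concatenating strings.
DATA_FILE = HOME_PATH / "data" / "processed" / "test1.csv"

if DATA_FILE.exists():
    print(f"found {DATA_FILE}")
else:
    print(f"missing {DATA_FILE}")
```

Because HOME_PATH is derived from the script's own location, the project folder can be moved or copied to another machine without touching the code. DATA_FILE can be passed straight to pd.read_csv.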

4. Add comments and notes where the code needs them

I think most people will agree with this one. Not convinced? Go back to code you wrote a month ago and see how much of it you still understand (assuming you left no comments).

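5. Speed up your Python loops

Here I recommend an article by Yun Ge (云哥) of Python与算法之美: "24式加速你的python" (24 ways to speed up your Python).

Bookmark it and reread it a few times to build good habits, and your code will keep getting faster.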
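
As one common example of this kind of speed-up (chosen by me for illustration, not quoted from the recommended article): when a loop repeatedly checks membership in a list, converting the list to a set once, outside the loop, usually makes each check much cheaper. The function names and data below are made up.

```python
def count_known_slow(words, vocabulary):
    """Membership test against a list: each `in` scans the list."""
    count = 0
    for word in words:
        if word in vocabulary:
            count += 1
    return count


def count_known_fast(words, vocabulary):
    """Same result, but each `in` is a hash lookup in a set."""
    known = set(vocabulary)  # built once, outside the loop
    return sum(1 for word in words if word in known)


vocabulary = [f"w{i}" for i in range(2000)]
words = [f"w{i}" for i in range(0, 4000, 2)]

slow_result = count_known_slow(words, vocabulary)
fast_result = count_known_fast(words, vocabulary)
```

Both functions return the same count; timing them with the standard timeit module on your own data is the reliable way to see how much the change actually helps.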

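6. Visualize the progress of your loops

Here is a Python library called tqdm. Install it first: pip install tqdm

It displays the progress of a loop, so you always know how far along a long run is.

Typical usage is to wrap the iterable of a for loop in tqdm().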
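
A minimal sketch of that usage follows. The slow_square function is a made-up stand-in for real work, and the try/except fallback is only there so the snippet still runs on a machine without tqdm (in which case no bar is shown).

```python
import time

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no progress bar is shown.
    def tqdm(iterable, **kwargs):
        return iterable


def slow_square(x):
    """Stand-in for an expensive step, such as fitting one model."""
    time.sleep(0.01)
    return x * x


results = []
# Wrapping the iterable in tqdm() shows a live progress bar while the loop runs.
for value in tqdm(range(50), desc="processing"):
    results.append(slow_square(value))
```

The desc argument sets the label shown in front of the bar; the loop body itself does not change.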

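7. Use efficient tools to track down exceptions

When hunting bugs, I used to rely on print() all the way through. That works, but it is slow. Then I discovered a decorator called PySnooper, and it felt like discovering a new continent.

When debugging, we usually add print statements wherever we suspect a problem, look at what is actually printed, and then work out what went wrong. That means editing the code, and editing it carefully, which is far more tedious than simply adding a decorator.

Take a look at this example: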


    import pysnooper

    # Log every executed line and variable change of the function to ./file.log
    @pysnooper.snoop('./file.log')
    def number_to_bits(number):
        if number:
            bits = []
            while number:
                number, remainder = divmod(number, 2)
                bits.insert(0, remainder)
            return bits
        else:
            return [0]

    number_to_bits(6)

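The output of every step of the function is saved to file.log, so we can look directly at where things went wrong.

8. Pay more attention to code robustness

What is code robustness? As the name suggests, it means the code can withstand testing under all kinds of abnormal scenarios. Exception handling consists of two parts: "catching" and "raising". "Catching" means wrapping specific statements in try ... except and handling the error flow properly. Using raise appropriately to actively "raise" exceptions is just as essential to elegant code. Here are a few points for reference:

1) Know what arguments will be passed in: their types and their number (exception handling, logic checks)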

    def add(a, b):
        # Only add when both arguments are ints
        if isinstance(a, int) and isinstance(b, int):
            return a + b
        else:
            return 'invalid argument type'

    print(add(1, 2))
    print(add(1, 'a'))
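
Returning an error string means the caller has to remember to check for it. Since this section also recommends using raise, here is a hedged variant of the same check that raises TypeError instead; add_strict is a name made up for this sketch.

```python
def add_strict(a, b):
    """Add two ints; raise TypeError for anything else."""
    for name, value in (("a", a), ("b", b)):
        if not isinstance(value, int):
            raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    return a + b


print(add_strict(1, 2))
try:
    add_strict(1, 'a')
except TypeError as e:
    print(f"rejected: {e}")
```

With this version a wrong argument can no longer be mistaken for a valid result, and the caller decides how to handle the error.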

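2) Catch only the most precise exceptions

Sometimes we feel that getting the script to work is all that matters, so we wrap the whole block in one big try...except without a second thought. But this can easily swallow an AttributeError that should have been raised, which adds unnecessary trouble to debugging.

So we should only ever wrap the statements that may actually raise, and catch precise exception types wherever possible rather than a vague Exception.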

    import re

    import requests
    from requests.exceptions import RequestException

    def save_website_title(url, filename):
        try:
            resp = requests.get(url)
        except RequestException as e:
            print(f'save failed: unable to get page content: {e}')
            return False

        # This regex operation should never raise an exception by itself, so there
        # is no need to wrap it in a try block. If group were mistyped as grop,
        # the program would tell us right away through an AttributeError.
        obj = re.search(r'<title>(.*)</title>', resp.text)
        if not obj:
            print('save failed: title tag not found in page content')
            return False
        title = obj.group(1)

        try:
            with open(filename, 'w') as fp:
                fp.write(title)
        except IOError as e:
            print(f'save failed: unable to write to file {filename}: {e}')
            return False
        else:
            return True

3) Exception handling should not steal the show

As noted above, exceptions should be caught precisely. But if every catch is that precise, the code fills up with try ... except blocks that clutter the core logic and hurt overall readability.

Here we can use a context manager to improve exception handling and factor out the handling logic that keeps repeating.

    # error_codes is assumed to be a project module that defines the API error exceptions
    class raise_api_error:
        """captures specified exception and raise ApiErrorCode instead

        :raises: AttributeError if code_name is not valid
        """
        def __init__(self, captures, code_name):
            self.captures = captures
            self.code = getattr(error_codes, code_name)

        def __enter__(self):
            # Called when entering the context
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            # Called when leaving the context.
            # exc_type, exc_val, exc_tb are the type, value, and traceback
            # of the exception raised inside the context, if any
            if exc_type is None:
                return False
            # issubclass also matches subclasses of captures, so passing
            # Exception catches any ordinary error
            if issubclass(exc_type, self.captures):
                raise self.code from exc_val
            return False

The code above defines a context manager named raise_api_error. It does nothing on entering the context. On exit, it checks whether an exception of type self.captures was raised inside the context and, if so, replaces it with the APIErrorCode exception class.

With the context manager in place, the concise code looks like this:

    def upload_avatar(request):
        """Handle a user uploading a new avatar"""
        with raise_api_error(KeyError, 'AVATAR_FILE_NOT_PROVIDED'):
            avatar_file = request.FILES['avatar']

        with raise_api_error(ResizeAvatarError, 'AVATAR_FILE_INVALID'), \
                raise_api_error(FileTooLargeError, 'AVATAR_FILE_TOO_LARGE'):
            resized_avatar_file = resize_avatar(avatar_file)

        with raise_api_error(Exception, 'INTERNAL_SERVER_ERROR'):
            request.user.avatar = resized_avatar_file
            request.user.save()
        return HttpResponse({})
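
The snippets above rely on names from the surrounding project (error_codes, resize_avatar, HttpResponse, and the error classes), so they cannot be run on their own. As a self-contained sketch of the same idea, the version below uses contextlib.contextmanager instead of a class; ApiError and get_avatar are hypothetical names introduced only for this example.

```python
from contextlib import contextmanager


class ApiError(Exception):
    """Hypothetical API-level error carrying a short error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


@contextmanager
def raise_api_error(captures, code):
    """Re-raise any `captures` exception from the with block as ApiError(code)."""
    try:
        yield
    except captures as exc:
        # "from exc" keeps the original exception as __cause__ for debugging
        raise ApiError(code) from exc


def get_avatar(files):
    # Hypothetical stand-in for reading request.FILES['avatar']
    with raise_api_error(KeyError, "AVATAR_FILE_NOT_PROVIDED"):
        return files["avatar"]


try:
    get_avatar({})
except ApiError as err:
    print(err.code, "caused by", type(err.__cause__).__name__)
```

The core function stays a single readable line inside the with block, while the translation from a low-level exception to an API error lives in one reusable place.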

That's all for this article. If you found it useful, bookmark it and read through it at your own pace. Suggestions and comments are welcome in the comments section!

Origin blog.51cto.com/14568144/2444301