Three ways to parse parameters in Python

The main purpose of our sharing today is to improve the efficiency of the code by using the command line and configuration files in Python

Let’s go!

We practice the parameter tuning process in machine learning, and there are three ways to choose from. The first option is to use argparse, a popular Python module dedicated to command-line parsing; another is to read a JSON file where we can put all hyperparameters; the third is also lesser known The way to do it is to use a YAML file! Curious, let's get started!

insert image description here

prerequisites

In the code below, I'll use Visual Studio Code, a very productive integrated Python development environment. The beauty of this tool is that it supports every programming language by installing extensions, integrates the terminal and allows to process a large number of Python scripts and Jupyter notebooks simultaneously

The data set, using the shared bicycle data set on Kaggle, can be downloaded here or obtained at the end of the article

https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset

use argparse

insert image description here
As shown in the image above, we have a standard structure to organize our small projects:

  • A folder named data containing our dataset
  • train.py file
  • options.py file for specifying hyperparameters

First, we can create a file train.py where we have the basic program to import the data, train the model on the training data and evaluate it on the test set:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

from options import train_options

df = pd.read_csv('data\hour.csv')
print(df.head())
opt = train_options()

X=df.drop(['instant','dteday','atemp','casual','registered','cnt'],axis=1).values
y =df['cnt'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

if opt.normalize == True:
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    
rf = RandomForestRegressor(n_estimators=opt.n_estimators,max_features=opt.max_features,max_depth=opt.max_depth)
model = rf.fit(X_train,y_train)
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_pred, y_test))
mae = mean_absolute_error(y_pred, y_test)
print("rmse: ",rmse)
print("mae: ",mae)

In the code, we also import the train_options function contained in the options.py file. The latter file is a Python file from which we can change the hyperparameters considered in train.py:

import argparse

def train_options():
    parser = argparse.ArgumentParser()
    parser.add_argument("--normalize", default=True, type=bool, help='maximum depth')
    parser.add_argument("--n_estimators", default=100, type=int, help='number of estimators')
    parser.add_argument("--max_features", default=6, type=int, help='maximum of features',)
    parser.add_argument("--max_depth", default=5, type=int,help='maximum depth')
    opt = parser.parse_args()
    return opt

In this example, we used the argparse library, which is very popular for parsing command line arguments. First, we initialize the parser, then, we can add the parameters we want to access.

Here is an example of running the code:

python train.py

insert image description here
To change the default values ​​of hyperparameters, there are two ways. The first option is to set different defaults in the options.py file. Another option is to pass hyperparameter values ​​from the command line:

python train.py --n_estimators 200

We need to specify the name and corresponding value of the hyperparameter to change.

python train.py --n_estimators 200 --max_depth 7

Working with JSON files

insert image description here
As before, we can keep a similar file structure. In this case, we replaced the options.py file with a JSON file. In other words, we want to specify the values ​​of the hyperparameters in the JSON file and pass them to the train.py file. JSON files can be a fast and intuitive alternative to the argparse library, utilizing key-value pairs to store data. Below we create an options.json file that contains the data we need to pass to other code later.

{
"normalize":true,
"n_estimators":100,
"max_features":6,
"max_depth":5 
}

As you can see above, it is very similar to a Python dictionary. But unlike a dictionary, it contains data in text/string format. In addition, there are some common data types with slightly different syntax. For example, boolean values ​​are false/true, and Python recognizes False/True. Other possible values ​​in JSON are arrays, which are represented by square brackets as Python lists.

The beauty of working with JSON data in Python is that it can be converted to a Python dictionary with the load method:

f = open("options.json", "rb")
parameters = json.load(f)

To access a specific item, we just need to quote its key name inside square brackets:

if parameters["normalize"] == True:
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
rf=RandomForestRegressor(n_estimators=parameters["n_estimators"],max_features=parameters["max_features"],max_depth=parameters["max_depth"],random_state=42)
model = rf.fit(X_train,y_train)
y_pred = model.predict(X_test)

Using YAML files

insert image description here
The last option is to exploit the potential of YAML. As with the JSON file, we read the YAML file in the Python code as a dictionary to access the values ​​of the hyperparameters. YAML is a human-readable data representation language in which hierarchies are represented using double space characters instead of parentheses like in JSON files. Below we show what the options.yaml file will contain:

normalize: True 
n_estimators: 100
max_features: 6
max_depth: 5

In train.py, we open the options.yaml file, which will always be converted to a Python dictionary using the load method, this time imported from the yaml library:

import yaml
f = open('options.yaml','rb')
parameters = yaml.load(f, Loader=yaml.FullLoader)

As before, we can access the values ​​of hyperparameters using the syntax required for dictionaries.

final thoughts

The configuration file compiles very quickly, whereas argparse requires writing a single line of code for each parameter we want to add.

Therefore, we should choose the most suitable method according to our different situations.

For example, if we need to add comments to parameters, JSON is not suitable because it doesn't allow comments, whereas YAML and argparse might be a good fit.

Finally: Can be on the public account: Sad spicy strips! Get a 216-page software test engineer interview collection document by yourself [free]. And the corresponding video learning tutorials are shared for free! , including basic knowledge, Linux essentials, Shell, Internet program principles, Mysql database, packet capture tool topics, interface testing tools, advanced testing-Python programming, Web automated testing, APP automated testing, interface automated testing, testing Advanced continuous integration, testing architecture development testing framework, performance testing, security testing, etc.

I recommend a [Python automated testing exchange group: 746506216 ], everyone can discuss and communicate software testing together, learn software testing techniques, interviews and other aspects of software testing together, to help you quickly advance Python automated testing/test development, and move towards a high salary.

Friends who like software testing, if my blog is helpful to you, if you like my blog content, please "Like", "Comment" and "Favorite" with one click!

Guess you like

Origin blog.csdn.net/wx17343624830/article/details/125727866