Summary of Python problems from the first semester

unpacking operator

In Python, ** is called the unpacking operator (or keyword argument unpacking operator).
It unpacks a dictionary into key-value pairs and passes them as keyword arguments to a function or method.
Specifically, ** is used in two different contexts:

1. Unpacking operation during function call:
When ** is used in a function call, it unpacks a dictionary and passes its key-value pairs to the function as keyword arguments.
Example:

   def my_func(param1, param2):
       print(param1, param2)

   my_dict = {'param1': 10, 'param2': 20}
   my_func(**my_dict)  # Equivalent to my_func(param1=10, param2=20)

In the above example, the my_dict dictionary is unpacked and its key-value pairs are passed as keyword arguments to the my_func function.

2. Dictionary merge operation:
When ** is used in dictionary merge, it can unpack the key-value pairs of one dictionary and merge them into another dictionary.
Example:

   dict1 = {'a': 1}
   dict2 = {'b': 2, **dict1}  # Unpack dict1 and merge it into dict2
   print(dict2)  # Output: {'b': 2, 'a': 1}

In the above example, ** unpacks the dict1 dictionary and merges its key-value pairs into the dict2 dictionary.
To summarize, the unpacking operator is used in a function call to unpack a dictionary into keyword arguments, and in a dictionary literal to unpack and merge one dictionary's key-value pairs into another. It is a useful operator that simplifies code and provides more flexibility in how arguments are passed.

It should be noted that using ** inside a dictionary literal (and using multiple ** unpackings in a single function call) was only introduced in Python 3.5 by PEP 448; unpacking a single dictionary into keyword arguments in a function call has been supported since much earlier versions.
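A brief sketch of the PEP 448 additions (assuming Python 3.5+; the function and dictionaries are illustrative):

```python
def describe(name, age, city):
    return f"{name}, {age}, {city}"

base = {'name': 'Alice', 'age': 30}
extra = {'city': 'Paris'}

# Multiple ** unpackings in one call (PEP 448, Python 3.5+)
summary = describe(**base, **extra)

# ** unpacking inside a dict literal (also PEP 448)
merged = {**base, **extra}

print(summary)  # Alice, 30, Paris
print(merged)   # {'name': 'Alice', 'age': 30, 'city': 'Paris'}
```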

keyword arguments

In Python, keyword arguments (Keyword Arguments) are a method of passing parameters in a function call, where arguments are passed to the function in the form key=value. They allow you to specify an argument's value explicitly by the parameter's name, rather than by its position.
Keyword arguments provide the following advantages:

1. Optionality: You can pass only the required parameters instead of all parameters. This makes function calls more flexible.
2. Legibility: By specifying parameter names, function calls become clearer, easier to read and understand, because the purpose and meaning of the parameters become apparent at the time of the call.

Here's an example using keyword arguments:
def greet(name, age):
    print(f"Hello, {name}! You are {age} years old.")

# Function call with keyword arguments
greet(name="Alice", age=30)

In the above example, the function greet accepts two parameters name and age. By using keyword arguments, we explicitly specify the name of the parameter and the corresponding value. This makes function calls more readable and does not depend on the positional order of the parameters.
Keyword arguments are supported both when defining and when calling a function. When defining a function, you can give a parameter a default value, making it optional. When calling a function, you can supply values for specific parameters by name.
It should be noted that keyword arguments must follow certain syntactic rules: they must come after positional arguments, and the same parameter cannot be specified more than once.
To sum up, keyword arguments pass values by specifying parameter names, which provides optionality and readability, making function calls more flexible and clear.
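A small sketch of default values and the rules above (the function name and defaults are illustrative):

```python
def connect(host, port=5432, timeout=10):
    # port and timeout have defaults, so they are optional at the call site
    return f"{host}:{port} (timeout={timeout})"

# Positional arguments first, then keyword arguments
info = connect("db.example.com", timeout=5)
print(info)  # db.example.com:5432 (timeout=5)

# connect(host="a", "b")   # SyntaxError: positional argument after keyword
# connect("a", host="b")   # TypeError: got multiple values for 'host'
```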

positional parameters

In Python, positional arguments (Positional Arguments) are a way of passing parameters in a function call: argument values are passed to the function in the order in which the parameters appear in the function definition.
The positional parameters are characterized as follows:

Positional order: Positional parameters need to be passed in the positional order of the parameters when the function is defined. That is, the value of the first positional parameter should be passed to the first parameter, the value of the second positional parameter to the second, and so on.

Necessity: Positional parameters are required; that is to say, a corresponding argument value must be provided for each positional parameter in the function call, otherwise the call raises an error (TypeError).

Here is an example using positional parameters:

def greet(name, age):
    print(f"Hello, {name}! You are {age} years old.")

# Function call with positional arguments
greet("Alice", 30)

In the above example, the function greet accepts two positional parameters, name and age. When the function is called, we pass the corresponding argument values to the function in positional order. That is, "Alice" corresponds to the name parameter, and 30 corresponds to the age parameter.

The values of positional arguments are bound by position, so the order of the arguments is very important. When a function is defined, the order of its parameters determines the order in which values must be passed when it is called.
It should be noted that if the function definition includes a parameter with a default value and no value is provided for it in the call, the default is used. However, the positional arguments that are passed must still be in the correct order.
In summary, positional arguments pass values in the order in which the parameters were defined. They are required in a function call, and their order is very important.
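A minimal sketch of default values interacting with positional order (the function and values are illustrative):

```python
def greet(name, age=18, city="Paris"):
    return f"{name}, {age}, {city}"

# Only the required positional argument is supplied; defaults fill the rest
a = greet("Alice")

# Positional values are matched left to right: 30 binds to age, not city
b = greet("Bob", 30)

print(a)  # Alice, 18, Paris
print(b)  # Bob, 30, Paris
```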

The difference between ordered and unordered collections

The difference between an ordered collection and an unordered collection lies in the order and uniqueness of the data elements.

  1. Ordered Collection (Ordered Collection):
    An ordered collection refers to a collection type in which elements are arranged in a specific order.
    Elements in an ordered collection can be accessed and iterated based on their position in the collection.
    Elements in an ordered collection can be repeated.
    In Python, the representative types of ordered collections include lists (list) and tuples (tuple).

  2. Unordered Collection (Unordered Collection):

    An unordered collection is a type of collection whose elements are arranged in no particular order.
    Elements in an unordered collection cannot be accessed and iterated by position because they are not in a fixed order.
    Elements in an unordered collection are unique and do not appear repeatedly.
    In Python, representative types of unordered collections include sets (set) and dictionaries (dict). (Note that since Python 3.7, dictionaries preserve insertion order, although their elements are still accessed by key rather than by position.)
    A set is a collection of unique elements used in mathematical set operations.
    A dictionary is a collection of key-value pairs used to represent mapping relationships.

It should be noted that although a list (list) is an ordered collection, its order is variable and can be modified by indexing operations. A tuple is an immutable ordered collection, and the order of its elements cannot be changed once created.
To sum up, an ordered set is a set arranged in a specific order and can be repeated; an unordered set is a set whose elements are not arranged in a specific order, and the elements are unique.
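A short sketch of these differences:

```python
# Ordered collections: position matters, duplicates allowed
lst = [3, 1, 3, 2]
tup = (3, 1, 3, 2)
print(lst[0], tup[0])  # 3 3

# Unordered collection: duplicates removed, no positional access
s = {3, 1, 3, 2}
print(len(s))  # 3 -- the duplicate 3 is gone
# s[0] would raise TypeError: 'set' object is not subscriptable

# Dictionary: unique keys mapping to values, accessed by key
d = {'a': 1, 'b': 2}
print(d['a'])  # 1
```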

Backslashes are interpreted as escape characters

Error message:

File "C:\Users\Administrator\Desktop\data\data\counts.py", line 6
    dirs=["C:\Users\Administrator\Desktop\data\data\benign_processed","C:\Users\Administrator\Desktop\data\data\malware_processed"]
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

Solution: use a raw string (Raw String): add the letter r in front of the string literal to create a raw string, in which backslashes are not treated as escape characters. For example:

dirs = [r"C:\Users\Administrator\Desktop\data\data\benign_processed",
        r"C:\Users\Administrator\Desktop\data\data\malware_processed"]
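Two alternative fixes for the same problem (the paths are the same illustrative ones): doubling each backslash, or building the path with pathlib, which accepts forward slashes and avoids the escape issue entirely:

```python
from pathlib import Path

# Option 1: escape each backslash
dirs = ["C:\\Users\\Administrator\\Desktop\\data\\data\\benign_processed",
        "C:\\Users\\Administrator\\Desktop\\data\\data\\malware_processed"]

# Option 2: pathlib with forward slashes (also works on Windows)
base = Path("C:/Users/Administrator/Desktop/data/data")
dirs2 = [base / "benign_processed", base / "malware_processed"]

print(dirs[0])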

Read and save tensor

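The original screenshot for this section is missing. As a hedged sketch (assuming PyTorch is installed), tensors are typically saved and loaded with torch.save and torch.load:

```python
import os
import tempfile

import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Save the tensor to disk
path = os.path.join(tempfile.mkdtemp(), "t.pt")
torch.save(t, path)

# Load it back
t2 = torch.load(path)
print(torch.equal(t, t2))  # True
```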

The distribution of categories in the dataset is not balanced

When encountering an imbalanced distribution of categories in a dataset, there are some strategies you can adopt to solve this problem. Some common methods are listed below:

1. Resampling (Resampling): Resampling adjusts the number of samples of each category in the dataset. It can be divided into two types:
   Oversampling (Oversampling): balance the dataset by increasing the number of samples in the minority class. Commonly used oversampling methods include randomly duplicating samples, SMOTE (Synthetic Minority Oversampling Technique), etc.
   Undersampling (Undersampling): balance the dataset by reducing the number of samples in the majority class. Commonly used undersampling methods include randomly deleting samples, clustering-based selection, etc.
   The resampling method should be chosen carefully for the specific situation: excessive oversampling can lead to overfitting, while undersampling can lose information. You can try different resampling methods, or combinations of them, and evaluate the performance of the model on the balanced dataset.

2. Synthetic Sample Generation: This is a method of generating new synthetic samples from existing samples in the dataset. SMOTE (Synthetic Minority Oversampling Technique) is a commonly used method that generates new synthetic samples by linear interpolation between minority-class samples. The generated synthetic samples can help augment the training data and improve minority-class representations.
3. Class Weights (Class Weights): When training the model, you can adjust the sample weights of different classes so that the model pays more attention to the minority classes during training. This can be achieved by setting class weights in the loss function or the optimizer. A common approach is to set class weights inversely proportional to their relative frequencies in the dataset, or to use other weighting strategies based on class importance.
4. Model Ensemble (Model Ensemble): Combining the predictions of multiple models can improve predictive performance on minority classes. Ensemble methods such as voting, weighted averaging, or stacking can be used. Because each model may perform differently on different classes, combining several models can improve overall predictive performance.
5. Data Augmentation: For minority-class samples, various data augmentation techniques can be applied to generate new samples. For example, in image classification tasks, operations such as random cropping, rotation, flipping, and scaling can increase the diversity of samples. This increases the number of minority-class samples and can also improve the robustness and generalization of the model.

It is necessary to choose the appropriate method or their combination according to the specific situation. When trying different approaches, care should be taken to conduct sufficient evaluation and validation after implementation to determine whether the performance of the model has been improved, and make adjustments as appropriate.
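A minimal pure-Python sketch of the simplest of these methods, random oversampling (the toy dataset and labels are illustrative; libraries such as imbalanced-learn provide SMOTE and other techniques):

```python
import random
from collections import Counter

random.seed(0)

# Imbalanced toy dataset: label 1 is the minority class
data = [(x, 0) for x in range(90)] + [(x, 1) for x in range(10)]

counts = Counter(label for _, label in data)
majority = max(counts.values())

# Randomly duplicate samples of each class until all classes are balanced
balanced = list(data)
for label, n in counts.items():
    class_samples = [s for s in data if s[1] == label]
    balanced += random.choices(class_samples, k=majority - n)

print(Counter(label for _, label in balanced))  # Counter({0: 90, 1: 90})
```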

f-string usage

In Python, prefixing a string with f is the usage of f-string. f-string is a convenient string interpolation syntax for embedding variables into strings.
In f-string, you can use curly braces {} to insert variables or expressions in the string, and it will be automatically replaced with the corresponding value at runtime. Inside the curly braces, you can use variable names, expressions, and call functions.
Here are some examples:
name = "Alice"
age = 25

embedded variable

greeting = f"Hello, {name}!"
print(greeting) # Output: Hello, Alice!

embedded expression

message = f"{name} is {age + 1} years old."
print(message) # Output: Alice is 26 years old.

embedded function call

uppercase_name = f"{name.upper()}"
print(uppercase_name) # Output: ALICE

Using f-string can embed variables into strings more concisely, improving the readability and maintainability of the code. In f-string, you can also combine formatting options to control the display format of variables, such as specifying the number of decimal places, padding characters, etc.
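As mentioned, f-strings also accept format specifications after a colon inside the braces; a brief sketch:

```python
pi = 3.14159
count = 7

print(f"{pi:.2f}")     # 3.14 -- two decimal places
print(f"{count:05d}")  # 00007 -- zero-padded to width 5
print(f"{'hi':>6}")    # right-aligned in a field of width 6
```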
Please note that f-string was introduced in Python 3.6, you need to ensure that your Python version is higher than or equal to 3.6 to use this feature.

lazy loading

Lazy loading (lazy loading) is a strategy to delay loading data, that is, loading data when needed, rather than loading the entire data set at once. This strategy can improve memory efficiency and reduce initialization time, especially when dealing with large datasets or requiring high memory consumption.
In machine learning and deep learning, datasets can be very large and difficult to load into memory all at once. Also, some tasks (such as training or prediction) may only require access to a portion of the dataset without loading all of it. In these cases, there are some benefits to using lazy loading.
Lazy loading can be implemented in the following ways:

1. Data set division: Divide the entire data set into multiple small batches (batches) or data blocks (chunks), and only load the currently required batches or blocks each time. This way, data is loaded only when needed, rather than the entire dataset at once.
2. Iterator or generator: Use an iterator or generator to generate data samples one by one, instead of returning all samples at once. An iterator or generator provides a sample at each iteration and reads the next sample as needed. This allows data to be loaded on demand, reducing memory footprint and initialization time.
3. Distributed loading: For distributed systems, data can be distributed and loaded, and data sets can be distributed on multiple nodes for parallel loading and processing. This approach can increase the speed of data loading and processing.

Lazy loading has important advantages for processing large datasets and saving memory, especially in environments with limited memory and in tasks that must process large-scale data efficiently. However, when using lazy loading, pay attention to the ordering and shuffling of data, as well as the data-loading and batching logic during iteration and training, to ensure correctness and efficiency.
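The generator approach described above can be sketched as follows (the data and batch size are illustrative):

```python
def lazy_batches(items, batch_size):
    """Yield batches one at a time instead of materializing the whole dataset."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

# Only one batch is held in memory at a time during iteration
batches = list(lazy_batches(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```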

Extract hidden features

In some tasks, some classes of features may be relatively rare or difficult to capture. Since these features occur less frequently in the dataset, the model may not be able to learn them sufficiently, resulting in weaker discrimination for these categories. To solve this problem, providing more samples can increase the number of training samples for these categories, thereby helping the model to better learn these hidden features.

By increasing the number of samples in the minority category, more samples can be provided to enhance the learning ability of the model for hidden features. This may include techniques such as data collection, data synthesis, or the use of generative models in order to create more samples.

It is worth noting that providing more samples is not just about increasing the number of samples in the dataset, but also ensuring that the added samples accurately represent the latent features of these classes. Therefore, when collecting additional samples or generating synthetic samples, careful selection of data sources and generation methods is required to ensure sample quality and representativeness.


Origin blog.csdn.net/m0_51312071/article/details/131469162