How to split a Python list into custom groups and repeat each group twice

1. The origin of the problem

Previously, I wrote an article about calling the Flush Machine Translation API to translate subtitles in batches.

While calling the machine translation API, I ran into a problem: the Python sample code provided by the website only accepts a list whose total character length is less than 5000. Sending sentences to the server one at a time would make the program very slow, whereas sending a list of 30 to 50 strings per request would be far more efficient.

However, grouping list elements while keeping the total string length under 5000 was a problem I struggled with for a long time. Today I finally found an answer.

2. Problem solving

I reduced the problem above to the following: split the elements of a list into groups of a chosen size, then check whether the joined string of each group is shorter than a given length.

To keep the algorithm simple, we first define a function that accepts a list and an integer.

def chunks(lst: list, n: int):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

lst is the list to be grouped, and n is the maximum number of elements in each group. The annotations on the parameters (lst: list, n: int) document the expected types; they do not change how fast the code runs. Using yield turns the function into a generator: it hands back one group at a time instead of building the complete list of groups first, so the groups never all have to sit in memory at once.
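To make the generator behaviour concrete, here is a minimal sketch that pulls groups out of chunks one at a time with next(). The variable names gen, first, second and third are only for illustration.

```python
def chunks(lst: list, n: int):
    """Yield successive groups of at most n elements from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

gen = chunks(['a', 'b', 'c', 'd', 'e'], 3)

# Calling chunks() has not sliced anything yet; each next() produces one group.
first = next(gen)    # ['a', 'b', 'c']
second = next(gen)   # ['d', 'e']

# The generator is now exhausted; passing a default avoids StopIteration.
third = next(gen, None)  # None

print(first, second, third)
```

In practice you rarely call next() yourself; a for loop or list() consumes the generator in the same way.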

Then we apply the function to a sample list:

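def chunks(lst: list, n: int):
    # Step through lst in strides of n and yield one slice per stride.
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# The list[str] annotation requires Python 3.9 or later.
sample: list[str] = ['a', 'b', 'c', 'd', 'e']

# list() consumes the generator and collects every group.
new = list(chunks(sample, 3))

print(new)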

Finally, we convert the returned generator into a list, and the result is:

[['a', 'b', 'c'], ['d', 'e']]

3. Going deeper into the problem

Once the grouping succeeds, we can iterate over the new list, new, and send each group to the remote server for machine translation.

To also check that the character length of a group is below 5000, you can use the following code.

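for li in new:
    # Only groups whose joined text is shorter than 5000 characters are sent;
    # longer groups are skipped here and would need to be split further.
    if len(" ".join(li)) < 5000:
        translate(api, li)  # assumes the translation function is named translate()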
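A fixed group size only checks the length after the fact. As an alternative, the following sketch builds the groups directly from a character budget. It is not the author's method: chunks_by_length, max_chars and sep are hypothetical names, and the space separator simply mirrors the " ".join(li) check above.

```python
def chunks_by_length(sentences, max_chars=5000, sep=" "):
    """Yield groups whose sep-joined length stays below max_chars.

    Hypothetical helper for illustration; not part of any translation API.
    A single sentence that is itself max_chars or longer is still yielded,
    alone in its own group, so the caller must handle that case.
    """
    group = []
    length = 0  # running value of len(sep.join(group))
    for s in sentences:
        # Joining adds one separator before every element except the first.
        added = len(s) + (len(sep) if group else 0)
        if group and length + added >= max_chars:
            # Adding s would reach the limit: emit the group and start a new one.
            yield group
            group, length = [], 0
            added = len(s)
        group.append(s)
        length += added
    if group:
        yield group

# Small demonstration with a limit of 7 characters instead of 5000.
sentences = ['aaa', 'bb', 'cccc', 'd']
groups = list(chunks_by_length(sentences, max_chars=7))
print(groups)  # [['aaa', 'bb'], ['cccc', 'd']]
```

With this approach the number of elements per group varies, but every group made of more than one sentence is guaranteed to stay under the limit.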

With that done, we can take the analysis one step further. Suppose we want to repeat the elements of each group twice. Put simply, the following list:

[['a', 'b', 'c'], ['d', 'e']]

becomes the list below:

['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'd', 'e']

How do we repeat each group and flatten the result into a single list? We can reuse the chunks function from before and combine it with a list comprehension.

In the end we get the following code:

def chunks(lst: list, n: int):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

sample: list[str] = ['a', 'b', 'c', 'd', 'e']

new = list(chunks(sample, 3))

# i * 2 repeats the elements of each sublist; the outer loop flattens them.
print([elm for i in new for elm in i * 2])

Generated result:

['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'd', 'e']

In the above code, a list comprehension repeats the elements of each sublist twice and flattens all sublists into one list. If you instead want each sublist repeated twice as a whole, change the i in the list comprehension to [i]:

print([elm for i in new for elm in [i]*2])
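For comparison, here is a sketch of both variants side by side, written with a configurable repeat count. The names times, flat and nested are assumptions for illustration, and itertools.chain.from_iterable is used as an alternative to the nested comprehension for flattening.

```python
from itertools import chain

new = [['a', 'b', 'c'], ['d', 'e']]
times = 2  # hypothetical repeat count

# Repeat the elements of each sublist, then flatten into one list.
flat = list(chain.from_iterable(group * times for group in new))

# Repeat each sublist as a whole, keeping the nesting.
nested = [group for group in new for _ in range(times)]

print(flat)    # ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'e', 'd', 'e']
print(nested)  # [['a', 'b', 'c'], ['a', 'b', 'c'], ['d', 'e'], ['d', 'e']]
```

Note that nested holds repeated references to the same sublist objects, just as [i]*2 does, so copy the sublists first if you plan to modify them independently.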

4. Reflections

  1. Generators and list comprehensions keep the code short and avoid building intermediate lists. Type annotations make the intent clearer, but they do not by themselves make the program faster.
  2. I learned a more precise way to annotate variables. For a list variable, write sample: list[str] = ['1', '2', '3'], which states both that the variable is a list and that its elements are strings (the list[str] form requires Python 3.9 or later).
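As a small illustration of how annotations behave at run time, the sketch below inspects the chunks function from this article. It relies only on standard Python behaviour: the interpreter stores annotations as metadata and does not enforce them.

```python
def chunks(lst: list, n: int):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

# Annotations are stored as metadata on the function object...
print(chunks.__annotations__)  # {'lst': <class 'list'>, 'n': <class 'int'>}

# ...but they are not checked when the function is called:
# a str is accepted even though lst is annotated as list.
print(list(chunks("abcde", 3)))  # ['abc', 'de']
```

Static checkers such as mypy can use these annotations to report mismatches before the program runs.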

Reposted from: blog.csdn.net/henanlion/article/details/131253110