[CSDN Editor's Note] Let's find bugs together.
Original link: https://dwrodri.gitlab.io/can-you-spot-the-bug-in-this-python-code/
This article has been authorized by the author and may not be reproduced without permission!
Author | Derek Rodriguez
Translator | Crescent Moon
Editor | Xia Meng
Produced by | CSDN (ID: CSDNnews)
Recently, I ran into a very interesting problem while parsing text. Before we get into it, a little backstory: my task was to parse some comma-separated data from a text file like this:
The file contains variable-width hexadecimal values, with at least three fields per line, and I only cared about the first and third fields. As I saw it, the parsing could be broken into three steps:
- Read the file one line at a time;
- Split each line on commas into a list;
- Pick the first and third elements and convert them to integers.
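The three steps map directly onto a plain loop. The sketch below is an illustration, not the author's code: it uses made-up sample lines (the hypothetical `SAMPLE` string and `parse_first_and_third` helper) in place of the real file and converts each field with `int(field, 16)`, which accepts the `0x` prefix.

```python
import io

# Hypothetical sample input: variable-width hex values, at least three fields per line.
SAMPLE = "0x1f,0xabc,0x2,0x7\n0xdeadbeef,0x0,0x10\n"

def parse_first_and_third(lines):
    """Return (first, third) integer pairs from comma-separated hex lines."""
    rows = []
    for line in lines:                       # step 1: read each line
        fields = line.strip().split(",")     # step 2: split on commas
        first, third = fields[0], fields[2]  # step 3: pick fields 1 and 3...
        rows.append((int(first, 16), int(third, 16)))  # ...and convert them
    return rows

print(parse_first_and_third(io.StringIO(SAMPLE)))  # → [(31, 2), (3735928559, 16)]
```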
It seemed simple: a few lines of code building a pandas DataFrame should have been enough.
Below is the code I wrote:
Did you spot the bug? I certainly didn't. Now let me walk through this code in detail and dig into where I went wrong.
The code in detail
A CSV file is a list of lists
I thought of CSV data simply as a list of lists, so I could treat each line as a nested list. I found code for flattening a nested list in a post online and copy-pasted it:
nested_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened_list = [element for sublist in nested_lists for element in sublist]
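The `for` clauses in a comprehension nest exactly like ordinary `for` statements written in the same order, outermost first. This short check (the name `flattened_loops` is just for illustration) makes that equivalence concrete:

```python
nested_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# The comprehension: "for" clauses read left to right, outermost loop first.
flattened_list = [element for sublist in nested_lists for element in sublist]

# The same logic as explicit nested loops, in the same order.
flattened_loops = []
for sublist in nested_lists:
    for element in sublist:
        flattened_loops.append(element)

print(flattened_list == flattened_loops)  # → True
```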
I used C and C++ before learning Python, so when I learned nested comprehensions, Python felt like pseudocode that the machine happened to understand. This nested comprehension compiles to the following bytecode:
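To inspect that bytecode yourself, the standard-library `dis` module can disassemble a compiled snippet. The exact opcodes differ between CPython versions (since 3.12, PEP 709 inlines list comprehensions into the enclosing code instead of compiling them to a separate function), so this sketch, with a hypothetical helper `opnames`, walks any nested code objects and simply counts the `FOR_ITER` instructions, one per `for` clause.

```python
import dis

SOURCE = "flattened_list = [element for sublist in nested_lists for element in sublist]"
code = compile(SOURCE, "<example>", "exec")

# Human-readable disassembly; the listing varies by CPython version.
dis.dis(code)

def opnames(code_obj):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = [ins.opname for ins in dis.get_instructions(code_obj)]
    for const in code_obj.co_consts:
        if hasattr(const, "co_code"):  # nested code object, e.g. a pre-3.12 <listcomp>
            names.extend(opnames(const))
    return names

# One FOR_ITER per "for" clause in the comprehension.
print(opnames(code).count("FOR_ITER"))
```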
I then adapted this pattern to my own problem and ended up with the following code:
The mistake
It turns out that Python doesn't combine iterable unpacking (destructuring) with comprehensions the way I imagined: you have to wrap the .split(",") call in another list:
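A minimal sketch of that fix, assuming the comprehension unpacks each line with a starred target such as `a, _, c, *_` (the target names and sample data are hypothetical). Wrapping `line.split(",")` in a one-element list makes the inner `for` clause run exactly once per line, binding the whole list of fields to the target:

```python
import io

# Hypothetical sample input with at least three hex fields per line.
SAMPLE = "0x1f,0xabc,0x2,0x7\n0xdeadbeef,0x0,0x10\n"

rows = [
    (int(a, 16), int(c, 16))
    for line in io.StringIO(SAMPLE)
    # [line.split(",")] has exactly one element, the list of fields,
    # so a, _, c, *_ unpacks the fields rather than one field's characters.
    for a, _, c, *_ in [line.strip().split(",")]
]
print(rows)  # → [(31, 2), (3735928559, 16)]
```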
This threw me, because .split(",") already returns a list. Wouldn't wrapping it in another list just produce a doubly nested list? I didn't understand, so I went looking for the answer in Compiler Explorer. The image below shows the difference between the correct generator expression and the code I wrote:
Do you see the problem? In my code, the for clause iterates over the list returned by .split(), so the unpacking target is applied to each field string in turn instead of to the list of fields as a whole. This follows from how list comprehensions were defined when they were first proposed: each for clause behaves exactly like a nested for loop written in the same order.
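To see what the unpacking target receives without the wrapper, here is a sketch using the same hypothetical `a, _, c, *_` target and a made-up sample line. Each field is a string, and strings are iterable, so Python unpacks the field's characters: a field with at least three characters silently yields wrong numbers (one row per field instead of one per line), and a shorter field raises `ValueError`.

```python
line = "0x1f,0xabc,0x2,0x7"

# Buggy shape: the for clause walks the list of fields, so each field
# string is unpacked into a, _, c, *_ character by character.
rows = [(int(a, 16), int(c, 16)) for a, _, c, *_ in line.split(",")]
print(rows)  # → [(0, 1), (0, 10), (0, 2), (0, 7)]

# A field shorter than three characters cannot fill a, _, c at all.
raised = False
try:
    [(int(a, 16), int(c, 16)) for a, _, c, *_ in "0x1f,7".split(",")]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```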
Finally, I got to the bottom of this with the help of CPython contributor Crowthebird, who demonstrated the problem by rewriting the code without comprehensions.
Incorrect version:
Correct version:
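A sketch of that idea, assuming the same hypothetical `a, _, c, *_` target and sample data rather than Crowthebird's exact code, writes both comprehensions out as the nested `for` loops they stand for. The only difference between the two functions is the one-element list around `line.split(",")`:

```python
# Hypothetical sample lines (no trailing newlines, so no strip() is needed).
SAMPLE_LINES = ["0x1f,0xabc,0x2,0x7", "0xdeadbeef,0x0,0x10"]

def parse_wrong(lines):
    rows = []
    for line in lines:
        # Iterates over the fields; each field string is unpacked character by character.
        for a, _, c, *_ in line.split(","):
            rows.append((int(a, 16), int(c, 16)))
    return rows

def parse_right(lines):
    rows = []
    for line in lines:
        # Iterates exactly once; the whole list of fields is unpacked.
        for a, _, c, *_ in [line.split(",")]:
            rows.append((int(a, 16), int(c, 16)))
    return rows

print(parse_wrong(SAMPLE_LINES))  # → [(0, 1), (0, 10), (0, 2), (0, 7), (0, 13), (0, 0), (0, 1)]
print(parse_right(SAMPLE_LINES))  # → [(31, 2), (3735928559, 16)]
```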
Can this problem be solved?
The real cause was my wrong mental model of the Python interpreter; there is nothing wrong with the interpreter itself. I don't think changing the language to match my intuition would be an improvement: with nested containers it is very hard to decide when the container itself should be destructured and when its elements should be. On top of that, PEP 202, which introduced list comprehensions, already rules out ambiguous forms such as an unparenthesized tuple as the element expression ([x, y for ...] has to be written [(x, y) for ...]).
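A small illustration of that ambiguity, using a hypothetical `pairs` list: it has two items, so the target `a, b` would "fit" both the container and each element. Python's rule is to always apply the target to each element; wrapping the container in a one-element list is how you opt into destructuring the container instead.

```python
pairs = [(1, 2), (3, 4)]

# Today's rule: the target a, b is applied to each element of the iterable.
firsts = [a for a, b in pairs]
print(firsts)  # → [1, 3]

# Wrapping the container in a one-element list applies the target to the container.
whole = [(a, b) for a, b in [pairs]]
print(whole)  # → [((1, 2), (3, 4))]
```

If Python instead guessed which level to destructure based on lengths, the everyday `for a, b in pairs` pattern would change meaning whenever the container happened to have the same length as the target.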