While Python is my favorite programming language, it's not without its flaws. Every language has shortcomings (some more than others), and Python is no exception. New Python programmers must learn to avoid some common "gotchas." Programmers usually pick up this kind of knowledge at random, from experience, but this chapter collects it in one place. Knowing the reasons behind these gotchas can help you understand why Python sometimes behaves strangely.
This chapter explains how mutable objects, such as lists and dictionaries, can behave unexpectedly when you modify their contents. You'll learn how the sort() method doesn't sort items in exact alphabetical order and how floating-point numbers can have rounding errors. The inequality operator != behaves unusually when you chain several of them together. And when you write a tuple containing a single item, you must use a trailing comma. This chapter tells you how to avoid these common pitfalls.
Don't add or remove items while iterating over the list
Adding or removing items from a list while looping (that is, iterating) over it with a for or while loop is likely to cause bugs. Consider a scenario where you want to iterate over a list of strings describing clothes and ensure there is an even number of socks by inserting a matching sock each time you find one in the list. The task seems simple: loop through the strings in the list, and when you find 'sock' in a string, such as 'red sock', append another 'red sock' string to the list.
But this code doesn't work. It gets stuck in an infinite loop, and you have to interrupt it by pressing Ctrl+C:
>>> clothes = ['skirt', 'red sock']
>>> for clothing in clothes:  # Iterate over the list.
...     if 'sock' in clothing:  # Find strings with 'sock'.
...         clothes.append(clothing)  # Add the sock's pair.
...         print('Added a sock:', clothing)  # Inform the user.
...
Added a sock: red sock
Added a sock: red sock
Added a sock: red sock
--snip--
Added a sock: red sock
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
KeyboardInterrupt
You can see the visual execution of this code at autbor.com/addingloop.
The problem is that when you append 'red sock' to the clothes list, the list now has a new third item that the loop must iterate over: ['skirt', 'red sock', 'red sock']. The for loop reaches the second 'red sock' on the next iteration, so it appends another 'red sock' string. This makes the list ['skirt', 'red sock', 'red sock', 'red sock'], giving Python yet another string to iterate over. This continues to happen, as shown in Figure 8-1, which is why we see a never-ending stream of 'Added a sock.' messages. The loop stops only when the computer runs out of memory and crashes the Python program, or when you interrupt it by pressing Ctrl+C.
Figure 8-1: On each iteration of the for loop, a new 'red sock' is appended to the list, which clothing refers to on the next iteration. This cycle repeats forever.
The takeaway is not to add items to a list while you're iterating over it. Instead, use a separate list for the contents of the new, modified list, such as newClothes in this example:
>>> clothes = ['skirt', 'red sock', 'blue sock']
>>> newClothes = []
>>> for clothing in clothes:
...     if 'sock' in clothing:
...         print('Appending:', clothing)
...         newClothes.append(clothing)  # We change the newClothes list, not clothes.
...
Appending: red sock
Appending: blue sock
>>> print(newClothes)
['red sock', 'blue sock']
>>> clothes.extend(newClothes) # Appends the items in newClothes to clothes.
>>> print(clothes)
['skirt', 'red sock', 'blue sock', 'red sock', 'blue sock']
The visual execution of this code is at autbor.com/addingloopfixed.
Our for loop iterates over the items in the clothes list but doesn't modify clothes inside the loop. Instead, it changes a separate list, newClothes. Then, after the loop, we modify clothes by extending it with the contents of newClothes. You now have a clothes list with matching socks.
Likewise, you shouldn't delete items from a list while iterating over it. Consider code in which we want to remove any string that isn't 'hello' from a list. The naive approach is to iterate over the list, deleting the items that don't match:
>>> greetings = ['hello', 'hello', 'mello', 'yello', 'hello']
>>> for i, word in enumerate(greetings):
...     if word != 'hello':  # Remove everything that isn't 'hello'.
...         del greetings[i]
...
>>> print(greetings)
['hello', 'hello', 'yello', 'hello']
The visual execution of this code is at autbor.com/deletingloop.
It seems that 'yello' is left in the list. The reason is that when the for loop was examining index 2, it deleted 'mello' from the list. But this shifted all the remaining items in the list down one index, moving 'yello' from index 3 to index 2. The next iteration of the loop examines index 3, which is now the last 'hello', as shown in Figure 8-2. The 'yello' string slipped by unexamined! Don't remove items from a list while iterating over that list.
Figure 8-2: When the loop removes 'mello', the items in the list shift down one index, causing i to skip over 'yello'.
Instead, create a new list, copy all items except the one you want to remove, and replace the original list. For the error-free equivalent of the previous example, enter the following code in an interactive shell.
>>> greetings = ['hello', 'hello', 'mello', 'yello', 'hello']
>>> newGreetings = []
>>> for word in greetings:
...     if word == 'hello':  # Copy everything that is 'hello'.
...         newGreetings.append(word)
...
>>> greetings = newGreetings # Replace the original list.
>>> print(greetings)
['hello', 'hello', 'hello']
The visual execution of this code is at autbor.com/deletingloopfixed.
Remember, because this code is just a simple loop that creates a list, you can replace it with a list comprehension. A list comprehension doesn't run faster or use less memory, but it's shorter without losing much readability. Enter the following into the interactive shell, which is equivalent to the code in the previous example:
>>> greetings = ['hello', 'hello', 'mello', 'yello', 'hello']
>>> greetings = [word for word in greetings if word == 'hello']
>>> print(greetings)
['hello', 'hello', 'hello']
Not only is the list comprehension more succinct, it also avoids the gotchas that occur when changing a list while iterating over it.
References, memory usage, and sys.getsizeof()
It might seem like creating a new list instead of modifying the original one wastes memory. But remember that, just as variables technically contain references to values rather than the actual values, lists also contain references to values. The newGreetings.append(word) line shown earlier doesn't copy the string in the word variable, just the reference to the string, which is much smaller.
You can see this by using the sys.getsizeof() function, which returns the number of bytes that the object passed to it takes up in memory. In this interactive shell example, we can see that the short string 'cat' takes up 52 bytes, while a longer string takes up 85 bytes:
>>> import sys
>>> sys.getsizeof('cat')
52
>>> sys.getsizeof('a much longer string than just "cat"')
85
(In the version of Python I'm using, the overhead of the string object takes 49 bytes, and each actual character in the string takes 1 byte.) But a list containing any of these strings takes 72 bytes, no matter how long the string is:
>>> sys.getsizeof(['cat'])
72
>>> sys.getsizeof(['a much longer string than just "cat"'])
72
The reason is that, technically, a list doesn't contain the strings, but rather just references to the strings, and a reference is the same size no matter how large the data it refers to. Code like newGreetings.append(word) doesn't copy the string in word, but rather a reference to the string. If you want to find out how much memory an object and all the objects it refers to take up, Python core developer Raymond Hettinger has written a function for this, which you can access at code.activestate.com/recipes/577504-compute-memory-footprint-of-an-object-and-its-cont.
So you shouldn't feel like it's a waste of memory to create a new list instead of modifying the original while iterating. Even if your list-modifying code appears to work, it can be the source of subtle bugs that take a long time to find and fix. Wasting a programmer's time is far more expensive than wasting a computer's memory.
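To check the point about references for yourself, here is a minimal sketch. The variable names are made up for illustration, and the exact byte counts depend on your Python version, so the code only compares sizes rather than assuming particular numbers:

```python
import sys

shortString = 'cat'
longString = 'cat' * 1000

# Each list holds exactly one reference, whatever the string's length.
shortList = [shortString]
longList = [longString]

stringsDiffer = sys.getsizeof(longString) > sys.getsizeof(shortString)
listsMatch = sys.getsizeof(shortList) == sys.getsizeof(longList)
print(stringsDiffer, listsMatch)

# Appending to a new list stores a reference to the same string object,
# not a copy of the string.
newList = []
newList.append(longString)
sameObject = newList[0] is longString
print(sameObject)
```

The is operator confirms that the appended item is the very same string object, so building a new list costs one reference per item, not one string per item.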
Although you shouldn't add or remove items from a list while iterating over it (or any iterable), it's fine to modify the list's contents. For example, say we have a list of numbers as strings: ['1', '2', '3', '4', '5']. We can convert this list of strings into a list of integers [1, 2, 3, 4, 5] while iterating over the list:
>>> numbers = ['1', '2', '3', '4', '5']
>>> for i, number in enumerate(numbers):
...     numbers[i] = int(number)
...
>>> numbers
[1, 2, 3, 4, 5]
The visual execution of this code is at autbor.com/covertstringnumbers. Modifying the items in the list is fine; it's changing the number of items in the list that is bug prone.
Another possible way to add or delete items in a list safely is to iterate backward from the end of the list to the beginning. This way, you can delete items from the list as you iterate over it, or add items to the list as long as you add them to the end of the list. For example, enter the following code, which removes even integers from the someInts list:
>>> someInts = [1, 7, 4, 5]
>>> for i in range(len(someInts)):
...     if someInts[i] % 2 == 0:
...         del someInts[i]
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IndexError: list index out of range
>>> someInts = [1, 7, 4, 5]
>>> for i in range(len(someInts) - 1, -1, -1):
...     if someInts[i] % 2 == 0:
...         del someInts[i]
...
>>> someInts
[1, 7, 5]
This code works because none of the items that the loop will iterate over in the future ever have their index changed. But the repeated shifting up of values after the deleted value makes this technique inefficient for long lists. The visual execution of this code is at autbor.com/iteratebackwards1. You can see the difference between iterating forward and backward in Figure 8-3.
Figure 8-3: Removing even numbers from a list when iterating forward (left) and backward (right)
Similarly, you can add items to the end of the list as you iterate backward through it. Enter the following into the interactive shell, which appends a copy of any even integer in the someInts list to the end of the list:
>>> someInts = [1, 7, 4, 5]
>>> for i in range(len(someInts) - 1, -1, -1):
...     if someInts[i] % 2 == 0:
...         someInts.append(someInts[i])
...
>>> someInts
[1, 7, 4, 5, 4]
The visual execution of this code is at autbor.com/iteratebackwards2. By iterating backward, we can append items to or remove items from the list. But this can be difficult to get right, because slight changes to this basic technique could end up introducing bugs. It's much simpler to create a new list rather than modify the original list. As Python core developer Raymond Hettinger put it:
- Q: What is the best practice for modifying a list while looping through it?
- A: Don't do it.
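If other variables refer to the same list and you need them to see the change, one option is to build the new list first and then assign it back into the original list object with slice assignment. Slice assignment isn't covered in this chapter, and withoutEvens() and alias are hypothetical names, so treat this as a sketch rather than the recommended approach:

```python
def withoutEvens(numbers):
    """Return a new list holding only the odd integers in numbers."""
    return [number for number in numbers if number % 2 != 0]

someInts = [1, 7, 4, 5]
alias = someInts  # A second name for the same list object.

# The loop inside withoutEvens() never touches someInts itself. The slice
# assignment then replaces the contents of the existing list object, so
# every variable that refers to it sees the change.
someInts[:] = withoutEvens(someInts)

print(someInts)
print(alias is someInts)
```

A plain someInts = withoutEvens(someInts) would instead make someInts refer to a new list and leave alias pointing at the old, unchanged one.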
Don't copy mutable values without using copy.copy() and copy.deepcopy()
It's better to think of variables as labels or name tags that refer to objects rather than as boxes that contain objects. This mental model is especially useful when it comes to modifying mutable objects: objects such as lists, dictionaries, and sets whose value can mutate (that is, change). A common gotcha occurs when copying one variable that refers to a mutable object to another variable and thinking that the actual object is being copied. In Python, assignment statements never copy objects; they only copy the references to an object. (Python developer Ned Batchelder gave a great PyCon 2015 talk on this idea titled "Facts and Myths about Python Names and Values." Watch it at youtu.be/_AEJHKGk9ns.)
For example, enter the following code into the interactive shell, and note that even though we change only the spam variable, the cheese variable changes as well:
>>> spam = ['cat', 'dog', 'eel']
>>> cheese = spam
>>> spam
['cat', 'dog', 'eel']
>>> cheese
['cat', 'dog', 'eel']
>>> spam[2] = 'MOOSE'
>>> spam
['cat', 'dog', 'MOOSE']
>>> cheese
['cat', 'dog', 'MOOSE']
>>> id(cheese), id(spam)
(2356896337288, 2356896337288)
The visual execution of this code is at autbor.com/listcopygotcha1. If you thought that cheese = spam copied the list object, you might be surprised that cheese seems to have changed even though we only modified spam. But assignment statements never copy objects, only references to objects. The assignment statement cheese = spam makes cheese refer to the same list object in the computer's memory as spam. It doesn't duplicate the list object. This is why changing spam also changes cheese: both variables refer to the same list object.
The same principle applies to mutable objects passed to a function call. Enter the following into the interactive shell, and note that the global variable eggs and the local parameter theList (remember, parameters are variables defined in the function's def statement) both refer to the same object:
>>> def printIdOfParam(theList):
...     print(id(theList))
...
>>> eggs = ['cat', 'dog', 'eel']
>>> print(id(eggs))
2356893256136
>>> printIdOfParam(eggs)
2356893256136
The visual execution of this code is at autbor.com/listcopygotcha2. Notice that the identities returned by id() for eggs and theList are the same, meaning these variables refer to the same list object. The eggs variable's list object wasn't copied to theList; rather, the reference was copied, which is why both variables refer to the same list. A reference is only a few bytes in size, but imagine if Python copied the entire list instead of just the reference. If eggs contained a billion items instead of just three, passing it to the printIdOfParam() function would require copying this giant list. This would eat up gigabytes of memory just to do a simple function call! That's why Python assignment only copies references and never copies objects.
One way to prevent this gotcha is to make a copy of the list object (not just the reference) with the copy.copy() function. Enter the following into the interactive shell:
>>> import copy
>>> bacon = [2, 4, 8, 16]
>>> ham = copy.copy(bacon)
>>> id(bacon), id(ham)
(2356896337352, 2356896337480)
>>> bacon[0] = 'CHANGED'
>>> bacon
['CHANGED', 4, 8, 16]
>>> ham
[2, 4, 8, 16]
>>> id(bacon), id(ham)
(2356896337352, 2356896337480)
The visual execution of this code is at autbor.com/copycopy1. The ham variable refers to a copied list object rather than the original list object referred to by bacon, so it doesn't suffer from this gotcha.
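For lists specifically, Python offers a few other ways to make the same kind of copy as copy.copy(). The following sketch compares them; the variable names are invented for illustration, and each technique copies only the list's references, just as copy.copy() does:

```python
import copy

bacon = [2, 4, 8, 16]

# Each of these creates a new list object holding the same references.
hamFromCopy = copy.copy(bacon)
hamFromMethod = bacon.copy()
hamFromList = list(bacon)
hamFromSlice = bacon[:]

bacon[0] = 'CHANGED'

copies = [hamFromCopy, hamFromMethod, hamFromList, hamFromSlice]
for ham in copies:
    # Every copy keeps its original first item and is a separate object.
    print(ham, ham is bacon)
```

Which one you choose is mostly a matter of taste; copy.copy() has the advantage of also working on dictionaries, sets, and many other objects.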
But just as variables are like labels or name tags rather than boxes that contain objects, lists also contain labels or name tags that refer to objects rather than the actual objects. If your list contains other lists, copy.copy() copies only the references to these inner lists. Enter the following into the interactive shell to see this problem:
>>> import copy
>>> bacon = [[1, 2], [3, 4]]
>>> ham = copy.copy(bacon)
>>> id(bacon), id(ham)
(2356896466248, 2356896375368)
>>> bacon.append('APPENDED')
>>> bacon
[[1, 2], [3, 4], 'APPENDED']
>>> ham
[[1, 2], [3, 4]]
>>> bacon[0][0] = 'CHANGED'
>>> bacon
[['CHANGED', 2], [3, 4], 'APPENDED']
>>> ham
[['CHANGED', 2], [3, 4]]
>>> id(bacon[0]), id(ham[0])
(2356896337480, 2356896337480)
The visual execution of this code is at autbor.com/copycopy2. Although bacon and ham are two different list objects, they refer to the same [1, 2] and [3, 4] inner lists, so changes to these inner lists are reflected in both variables, even though we used copy.copy(). The solution is to use copy.deepcopy(), which makes copies of any list objects inside the list object being copied (and any list objects inside those list objects, and so on). Enter the following into the interactive shell:
>>> import copy
>>> bacon = [[1, 2], [3, 4]]
>>> ham = copy.deepcopy(bacon)
>>> id(bacon[0]), id(ham[0])
(2356896337352, 2356896466184)
>>> bacon[0][0] = 'CHANGED'
>>> bacon
[['CHANGED', 2], [3, 4]]
>>> ham
[[1, 2], [3, 4]]
The visual execution of this code is at autbor.com/copydeepcopy. Although copy.deepcopy() is slightly slower than copy.copy(), it's safer to use if you don't know whether the list being copied contains other lists (or other mutable objects like dictionaries or sets). My general advice is to always use copy.deepcopy(): it might prevent subtle bugs, and the slowdown in your code probably won't be noticeable.
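The same distinction applies to other mutable containers. As a rough illustration, the sketch below uses a made-up inventory dictionary whose values are lists, and compares a shallow copy with a deep copy after one of the inner lists changes:

```python
import copy

inventory = {'socks': ['red', 'blue'], 'hats': ['fedora']}

shallow = copy.copy(inventory)   # New dictionary, same inner lists.
deep = copy.deepcopy(inventory)  # New dictionary and new inner lists.

inventory['socks'].append('green')

print(shallow['socks'])  # Shares the inner list, so it sees 'green'.
print(deep['socks'])     # Has its own inner list, so it doesn't.
```

The shallow copy still shares the inner lists with the original, while the deep copy is fully independent.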
Don't use mutable values as default arguments
Python allows you to set default arguments for parameters in the functions you define. If a caller doesn't explicitly pass an argument, the function executes using the default argument. This is useful when most calls to the function use the same argument, because default arguments make the parameter optional. For example, passing None to the split() method makes it split on whitespace characters, but None is also the default argument: calling 'cat dog'.split() does the same thing as calling 'cat dog'.split(None). The function uses the default argument for the parameter unless the caller passes one in.
But you should never set a mutable object, such as a list or dictionary, as a default argument. To see how this causes bugs, look at the following example, which defines an addIngredient() function that adds an ingredient string to a list that represents a sandwich. Because the first and last items of this list are usually 'bread', the mutable list ['bread', 'bread'] is used as a default argument:
>>> def addIngredient(ingredient, sandwich=['bread', 'bread']):
...     sandwich.insert(1, ingredient)
...     return sandwich
...
>>> mySandwich = addIngredient('avocado')
>>> mySandwich
['bread', 'avocado', 'bread']
But using a mutable object, such as the ['bread', 'bread'] list, as a default argument has a subtle problem: the list is created when the function's def statement executes, not each time the function is called. This means that only one ['bread', 'bread'] list object gets created, because we define the addIngredient() function only once. But each function call to addIngredient() reuses this list. This leads to unexpected behavior, like the following:
>>> mySandwich = addIngredient('avocado')
>>> mySandwich
['bread', 'avocado', 'bread']
>>> anotherSandwich = addIngredient('lettuce')
>>> anotherSandwich
['bread', 'lettuce', 'avocado', 'bread']
The function call addIngredient('lettuce') returns ['bread', 'lettuce', 'avocado', 'bread'] instead of ['bread', 'lettuce', 'bread'] because it uses the same default argument list as the previous call, which already had 'avocado' added to it. The 'avocado' string appears again because the list for the sandwich parameter is the same one as in the last function call. Only one ['bread', 'bread'] list was created, because the function's def statement executes only once, not each time the function is called. The visual execution of this code is at autbor.com/sandwich.
If you need to use a list or dictionary as a default argument, the Pythonic solution is to set the default argument to None. Then have code that checks for this and supplies a new list or dictionary whenever the function is called. This ensures that the function creates a new mutable object each time it's called, instead of just once when the function is defined, as in the following example:
>>> def addIngredient(ingredient, sandwich=None):
...     if sandwich is None:
...         sandwich = ['bread', 'bread']
...     sandwich.insert(1, ingredient)
...     return sandwich
...
>>> firstSandwich = addIngredient('cranberries')
>>> firstSandwich
['bread', 'cranberries', 'bread']
>>> secondSandwich = addIngredient('lettuce')
>>> secondSandwich
['bread', 'lettuce', 'bread']
>>> id(firstSandwich) == id(secondSandwich)
False
Notice that firstSandwich and secondSandwich don't share the same list reference, because sandwich = ['bread', 'bread'] creates a new list object each time addIngredient() is called rather than just once when addIngredient() is defined.
Mutable data types include lists, dictionaries, sets, and objects made from the class statement. Don't put objects of these types as default arguments in a def statement.
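The same gotcha and the same fix apply to dictionaries. In this sketch, countWord() and countWordFixed() are hypothetical functions written only to illustrate the point. A function's __defaults__ attribute holds its default argument objects, so you can inspect it to see the shared dictionary accumulate data between calls:

```python
def countWord(word, counts={}):  # Gotcha: one dictionary shared by all calls.
    counts[word] = counts.get(word, 0) + 1
    return counts

def countWordFixed(word, counts=None):
    if counts is None:
        counts = {}  # A new dictionary for each call that omits counts.
    counts[word] = counts.get(word, 0) + 1
    return counts

countWord('cat')
sharedCounts = countWord('cat')
print(sharedCounts)            # The count carried over from the first call.
print(countWord.__defaults__)  # The default object itself has been mutated.

countWordFixed('cat')
freshCounts = countWordFixed('cat')
print(freshCounts)             # Each call started with an empty dictionary.
```

Callers who do want to accumulate counts across calls can still pass their own dictionary to countWordFixed() explicitly.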
Don't use string concatenation to build strings
In Python, strings are immutable objects. This means that string values can't change, and any code that seems to modify the string is actually creating a new string object. For example, each of the following operations changes the content of the spam variable, not by changing the string value, but by replacing it with a new string value that has a new identity:
>>> spam = 'Hello'
>>> id(spam), spam
(38330864, 'Hello')
>>> spam = spam + ' world!'
>>> id(spam), spam
(38329712, 'Hello world!')
>>> spam = spam.upper()
>>> id(spam), spam
(38329648, 'HELLO WORLD!')
>>> spam = 'Hi'
>>> id(spam), spam
(38395568, 'Hi')
>>> spam = f'{spam} world!'
>>> id(spam), spam
(38330864, 'Hi world!')
Note that each call to id(spam) returns a different identity from the call before it, because the string object in spam isn't being changed: it's being replaced by a whole new string object with a different identity. (An identity is unique only among objects that exist at the same time, so Python can reuse the identity of a discarded string, as the last call shows.) Creating new strings by using f-strings, the format() string method, or the %s format specifiers also creates new string objects, just like string concatenation.
Normally, this technical detail doesn't matter. Python is a high-level language that handles many of these details for you, so you can focus on creating your programs.
But building a string through a large number of string concatenations can slow down your programs. Each iteration of the loop creates a new string object and discards the old string object: in code, this looks like concatenations inside a for or while loop, as in the following:
>>> finalString = ''
>>> for i in range(100000):
...     finalString += 'spam '
...
>>> finalString
spam spam spam spam spam spam spam spam spam spam spam spam --snip--
Because the finalString += 'spam ' line runs 100,000 times inside the loop, Python performs 100,000 string concatenations. The CPU has to create these intermediate string values by concatenating the current finalString with 'spam ', put them into memory, and then almost immediately discard them on the next iteration. This is a lot of wasted effort, because we care only about the final string.
The Pythonic way to build strings is to append the smaller strings to a list and then join the list together into one string. This method still creates 100,000 string objects, but it performs only one string concatenation, when it calls join(). For example, the following code produces the equivalent finalString, but without the intermediate string concatenations:
>>> finalString = []
>>> for i in range(100000):
...     finalString.append('spam ')
...
>>> finalString = ''.join(finalString)
>>> finalString
spam spam spam spam spam spam spam spam spam spam spam spam --snip--
When I measured the runtime of these two pieces of code on my machine, the list appending approach was 10 times faster than the string concatenation approach. (Chapter 13 describes how to measure how fast your programs run.) The more iterations the loop makes, the bigger this difference becomes. But when you change range(100000) to range(100), the speed difference is negligible, although concatenation remains slower than list appending. You don't need to obsessively avoid string concatenation, f-strings, the format() string method, or %s format specifiers in every case. The speed improves significantly only when you're doing large numbers of string concatenations.
Python frees you from having to think about many low-level details. This allows programmers to write software quickly, and as mentioned earlier, programmer time is more valuable than CPU time. But there are cases where it pays to understand the details, like the difference between immutable strings and mutable lists, so you don't get bogged down in things like building strings by concatenation.
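If you'd like to compare the two approaches on your own machine, here is one possible sketch using the standard library's timeit module. The function names are hypothetical, and the timings depend heavily on your computer and Python version (some versions of the interpreter can optimize simple += concatenation), so don't expect to reproduce any particular ratio:

```python
import timeit

def buildByConcatenation(count):
    finalString = ''
    for i in range(count):
        finalString += 'spam '  # Creates a new string on each iteration.
    return finalString

def buildByJoin(count):
    pieces = []
    for i in range(count):
        pieces.append('spam ')  # Only stores a reference in the list.
    return ''.join(pieces)      # One concatenation at the end.

count = 100000
concatSeconds = timeit.timeit(lambda: buildByConcatenation(count), number=10)
joinSeconds = timeit.timeit(lambda: buildByJoin(count), number=10)
print('Concatenation:', concatSeconds)
print('Join:', joinSeconds)
```

Both functions return the same string, so any difference you see is purely in how long the work takes.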
Don't expect sort() to sort alphabetically
Understanding sorting algorithms, which systematically arrange values in some established order, is an important foundation for a computer science education. But this isn't a computer science book; we don't need to know these algorithms, because we can just call Python's sort() method. However, you'll notice that sort() has some odd sorting behavior that puts a capital Z before a lowercase a:
>>> letters = ['z', 'A', 'a', 'Z']
>>> letters.sort()
>>> letters
['A', 'Z', 'a', 'z']
American Standard Code for Information Interchange (ASCII, pronounced "ask-ee") is a mapping between numeric codes (called code points or ordinals) and text characters. The sort() method uses ASCII-betical sorting (a general term meaning sorted by ordinal number) rather than alphabetical sorting. In the ASCII system, A is represented by code point 65, B by 66, and so on, up to Z by 90. The lowercase a is represented by code point 97, b by 98, and so on, up to z by 122. When sorting by ASCII, uppercase Z (code point 90) comes before lowercase a (code point 97).
Although ASCII was nearly universal in Western computing before and throughout the 1990s, it was only an American standard: the dollar sign, $, had a code point (code point 36), but the British pound sign, £, had none. ASCII has largely been replaced by Unicode, which contains all of ASCII's code points and more than 100,000 others.
You can get the code point, or ordinal, of a character by passing it to the ord() function. You can do the reverse by passing an ordinal integer to the chr() function, which returns a string of the character. For example, enter the following into the interactive shell:
>>> ord('a')
97
>>> chr(97)
'a'
If you want to sort alphabetically, pass the str.lower method as the key argument. This sorts the list as if the lower() string method had been called on the values:
>>> letters = ['z', 'A', 'a', 'Z']
>>> letters.sort(key=str.lower)
>>> letters
['A', 'a', 'z', 'Z']
Note that the actual strings in the list aren't converted to lowercase; they're only sorted as if they were. Ned Batchelder provides more information about Unicode and code points in his talk "Pragmatic Unicode, or, How Do I Stop the Pain?" at nedbatchelder.com/text/unipain.html.
By the way, the sorting algorithm that Python's sort() method uses is Timsort, which was designed by Tim Peters, a Python core developer and the author of "The Zen of Python." It's a hybrid of the merge sort and insertion sort algorithms and is described at en.wikipedia.org/wiki/Timsort.
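The key argument also works with the built-in sorted() function, which returns a new sorted list and leaves the original alone. The following sketch uses a made-up list of words to compare the default ordering with the case-insensitive one:

```python
words = ['zebra', 'Apple', 'apple', 'Zoo', 'banana']

# Default sorting compares code points, so uppercase letters come first.
byCodePoint = sorted(words)

# Sorting as if every string were lowercase gives alphabetical order.
alphabetical = sorted(words, key=str.lower)

# The code points of each word's first letter explain the default order.
firstOrdinals = [ord(word[0]) for word in words]

print(byCodePoint)
print(alphabetical)
print(firstOrdinals)
```

Because Python's sort is stable, strings that compare equal under the key, such as 'Apple' and 'apple', keep their original relative order.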
Don't assume floating-point numbers are perfectly accurate
Computers can store only the digits of the binary number system, which are 1 and 0. To represent the decimal numbers we're familiar with, we need to translate a number like 3.14 into a series of binary ones and zeros. Computers do this according to the IEEE 754 standard, published by the Institute of Electrical and Electronics Engineers (IEEE, pronounced "eye-triple-ee"). For simplicity, these details are hidden from the programmer, allowing you to type numbers with decimal points and ignore the decimal-to-binary conversion process:
>>> 0.3
0.3
The IEEE 754 representation of a floating-point number won't always exactly match the decimal number, although the details of specific cases are beyond the scope of this book. One well-known example is 0.1:
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
>>> 0.3 == (0.1 + 0.1 + 0.1)
False
This bizarre, slightly inaccurate sum is the result of rounding errors caused by how computers represent and process floating-point numbers. This isn't a Python gotcha; the IEEE 754 standard is a hardware standard implemented directly in a CPU's floating-point circuits. You'll get the same results in C++, JavaScript, and every other language that runs on a CPU that uses IEEE 754 (which is effectively every CPU in the world).
The IEEE 754 standard, again for technical reasons beyond the scope of this book, also can't represent all whole number values greater than 2 ** 53. For example, 2 ** 53 and 2 ** 53 + 1, as float values, both round to 9007199254740992.0:
>>> float(2**53) == float(2**53) + 1
True
As long as you use the floating-point data type, there's no workaround for these rounding errors. But don't worry. Unless you're writing software for a bank, a nuclear reactor, or a bank's nuclear reactor, rounding errors are small enough that they'll likely not be an important issue for your program. Often, you can resolve them by using integers with smaller denominations: for example, 133 cents instead of 1.33 dollars, or 200 milliseconds instead of 0.2 seconds. This way, 10 + 10 + 10 adds up to 30 cents or milliseconds, rather than 0.1 + 0.1 + 0.1 adding up to 0.30000000000000004 dollars or seconds.
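Here is a small sketch of both ideas. The math.isclose() function isn't mentioned in this chapter; it's a standard library function that checks whether two floats are equal within a tolerance, which is often what you want instead of ==. The price variables are invented for illustration:

```python
import math

total = 0.1 + 0.1 + 0.1
exactlyEqual = total == 0.3                # Fails because of rounding error.
closeEnough = math.isclose(total, 0.3)     # Compares within a tolerance.
print(exactlyEqual, closeEnough)

# Storing money as an integer number of cents avoids the rounding error.
priceInCents = 133
totalInCents = priceInCents * 3
formatted = f'${totalInCents // 100}.{totalInCents % 100:02d}'
print(totalInCents, formatted)
```

Note that math.isclose() doesn't remove the rounding error; it only makes comparisons tolerant of it.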
But if you need exact precision, say for scientific or financial calculations, use Python's built-in decimal module, which is documented at docs.python.org/3/library/decimal.html. Although they're slower, Decimal objects are precise replacements for float values. For example, decimal.Decimal('0.1') creates an object that represents the exact number 0.1 without the imprecision that a 0.1 float value would have.
Passing the float value 0.1 to decimal.Decimal() creates a Decimal object that has the same imprecision as the float value, which is why the resulting Decimal object isn't exactly Decimal('0.1'). Instead, pass a string of the float value to decimal.Decimal(). To illustrate this point, enter the following into the interactive shell:
>>> import decimal
>>> d = decimal.Decimal(0.1)
>>> d
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> d = decimal.Decimal('0.1')
>>> d
Decimal('0.1')
>>> d + d + d
Decimal('0.3')
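Integers don't have rounding errors, so it's always safe to pass them to decimal.Decimal() or to combine them with Decimal objects. Float values, on the other hand, can't be mixed with Decimal objects in arithmetic. Enter the following into the interactive shell: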
>>> 10 + d
Decimal('10.1')
>>> d * 3
Decimal('0.3')
>>> 1 - d
Decimal('0.9')
>>> d + 0.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'decimal.Decimal' and 'float'
But Decimal objects don't have unlimited precision; they simply have a predictable, well-established level of precision. For example, consider the following operations:
>>> import decimal
>>> d = decimal.Decimal(1) / 3
>>> d
Decimal('0.3333333333333333333333333333')
>>> d * 3
Decimal('0.9999999999999999999999999999')
>>> (d * 3) == 1 # d is not exactly 1/3
False
The expression decimal.Decimal(1) / 3 evaluates to a value that isn't exactly one-third. But by default, it'll be precise to 28 significant digits. You can find out how many significant digits the decimal module uses by accessing the decimal.getcontext().prec attribute. (Technically, prec is an attribute of the Context object returned by getcontext(), but it's convenient to put it on one line.) You can change this attribute so that all Decimal objects created afterward use this new level of precision. The following interactive shell example lowers the precision from the original 28 significant digits to 2:
>>> import decimal
>>> decimal.getcontext().prec
28
>>> decimal.getcontext().prec = 2
>>> decimal.Decimal(1) / 3
Decimal('0.33')
The decimal module gives you fine-grained control over how numbers interact with each other. The decimal module is fully documented at https://docs.python.org/3/library/decimal.html.
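Setting decimal.getcontext().prec changes the precision for all later calculations in the current thread. If you want a different precision only for a few lines, one option is the decimal module's localcontext() function, which this chapter doesn't cover. The sketch below assumes the surrounding context still has a precision higher than 2 (the default is 28), and its variable names are made up:

```python
import decimal

one = decimal.Decimal(1)

# Inside the with block, calculations use a temporary copy of the context.
with decimal.localcontext() as context:
    context.prec = 2
    roughThird = one / 3

# After the block, the previous precision applies again.
usualThird = one / 3

print(roughThird)
print(usualThird)
```

This keeps the lower precision from leaking into unrelated calculations elsewhere in your program.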
Don't chain inequality != operators
Chaining comparison operators like 18 < age < 35 or chaining assignment operators like six = halfDozen = 6 are handy shortcuts for (18 < age) and (age < 35) and six = 6; halfDozen = 6, respectively.
But don't chain the != comparison operator. You might think the following code checks that all three variables have different values from each other, because the following expression evaluates to True:
>>> a = 'cat'
>>> b = 'dog'
>>> c = 'moose'
>>> a != b != c
True
But this chain is actually equivalent to (a != b) and (b != c). This means that a could still be the same as c and the a != b != c expression would still be True:
>>> a = 'cat'
>>> b = 'dog'
>>> c = 'cat'
>>> a != b != c
True
This bug is subtle and the code is misleading, so it's best to avoid using chained != operators altogether.
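If you do need to check that several values all differ from each other, you have to compare every pair explicitly. The sketch below shows two possible ways to do it; the set-based version is an assumption on my part that your values are hashable (as strings and integers are), and the variable names are only for illustration:

```python
a = 'cat'
b = 'dog'
c = 'cat'

# The chain only compares neighbors: (a != b) and (b != c).
chained = a != b != c

# Comparing every pair catches that a and c are the same.
pairwise = a != b and b != c and a != c

# A set discards duplicates, so three distinct values give a length of 3.
allDistinct = len({a, b, c}) == 3

print(chained, pairwise, allDistinct)
```

The pairwise version works for any values that support !=, while the set version scales more easily to longer lists of hashable values.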
Don't forget commas in single-item tuples
When writing tuple values in your code, keep in mind that you'll still need a trailing comma even if the tuple contains only a single item. Although the value (42, ) is a tuple that contains the integer 42, the value (42) is just the integer 42. The parentheses in (42) are similar to those used in the expression (20 + 1) * 2, which evaluates to the integer value 42. Forgetting the comma can lead to this:
>>> spam = ('cat', 'dog', 'moose')
>>> spam[0]
'cat'
>>> spam = ('cat')  # No comma: this is just the string 'cat'.
>>> spam[0]
'c'
>>> spam = ('cat', )  # Trailing comma: this is a one-item tuple.
>>> spam[0]
'cat'
Without a comma, ('cat') evaluates to the string value, which is why spam[0] evaluates to the first character of the string, 'c'. The trailing comma is required for the parentheses to be recognized as a tuple value. In Python, the commas make a tuple more than the parentheses do.
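You can confirm which values are tuples by checking their types. In this short sketch, the variable names are invented for illustration; the last assignment shows that a comma alone, with no parentheses at all, is enough to create a tuple:

```python
notATuple = (42)      # Parentheses only: the integer 42.
oneItemTuple = (42,)  # Trailing comma: a tuple containing 42.
alsoATuple = 42,      # The comma makes the tuple, even without parentheses.

print(type(notATuple))
print(type(oneItemTuple))
print(type(alsoATuple))
print(oneItemTuple == alsoATuple)
```

That last form is easy to type by accident at the end of a line, which can turn a value into a tuple when you didn't intend it.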
Summary
Miscommunication occurs in every language, even in programming languages. Python has several pitfalls for the unwary. Even if they occur rarely, it's good to know about them so you can quickly identify and debug problems they might cause.
Although it is possible to add or remove items from a list while iterating over it, this is a potential source of bugs. It's safer to iterate over a copy of the list and then make modifications to the original list. When you copy a list (or any other mutable object), remember that an assignment statement copies only the reference to the object, not the actual object. You can use the copy.deepcopy() function to make a copy of an object (and copies of any objects it references).
You shouldn't use mutable objects as default arguments in def statements, because they're created once when the def statement runs, not each time the function is called. A better idea is to make the default argument None, and then add code that checks for None and creates a mutable object when the function is called.
A subtle gotcha is concatenating several smaller strings with the + operator in a loop. For a small number of iterations, this syntax is fine. But behind the scenes, Python is constantly creating and destroying string objects on each iteration. A better approach is to append the smaller strings to a list and then call the join() method to create the final string.
The sort() method sorts by numeric code points, which isn't the same as alphabetical order: uppercase Z comes before lowercase a.
Floating-point numbers have slight rounding errors as a side effect of how they represent numbers. For most programs, this doesn't matter. But if it does matter for your program, you can use Python's decimal module.
Never chain != operators together, because expressions like 'cat' != 'dog' != 'cat' will, confusingly, evaluate to True.
Although this chapter describes the Python gotchas you're most likely to encounter, they don't occur very often in most real-world code. Python does a great job of minimizing the surprises you might find in your programs. In the next chapter, we'll cover some gotchas that are even rarer and downright bizarre. It's almost impossible to encounter these Python language oddities unless you go looking for them, but it's interesting to explore why they exist.