The most important thing you need to know about optimizing Python code is that you should never write your own timing function.
Timing a very short piece of code is complicated. How much processor time does your computer devote to running this code? Is anything running in the background? Every modern computer runs background programs, some continuously and some intermittently. Small oversights can ruin your best-laid plans: background services occasionally "wake up" to do meaningful things like check for mail, connect to instant messaging servers, check for application updates, scan for viruses, or see whether a disk has been inserted into an optical drive in the last thousandth of a second. Turn everything off and disconnect from the network before you start a timing test. Then turn off everything you missed the first time, then turn off the services that keep checking whether the network is back up, and so on.
Next is the variability introduced by the timing framework itself. Does the Python interpreter cache method name lookups? Does it cache compiled code blocks? What about regular expressions? Does your code have side effects when it runs more than once? Don't forget that you are dealing with fractions of a second, so small mistakes in your timing framework will irreparably skew your results.
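The side-effect question can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the book's sample code: the list of random numbers, the two hypothetical helper functions, and the use of a current Python 3 interpreter (the module-level timeit.repeat() function and passing a callable to it are newer than Python 2.3).

```python
import random
import timeit

random.seed(0)  # fixed seed so every run starts from the same data
original = [random.random() for _ in range(2000)]
data = list(original)

def sort_in_place():
    # Side effect: data stays sorted after the first call, so every
    # later call measures sorting an already-sorted list.
    data.sort()

def sort_fresh_copy():
    # No side effect: sorted() builds a new list and leaves original alone.
    # The cost of building that new list is part of the measurement.
    sorted(original)

# Take the best of 3 measurements, each running the function 50 times.
in_place = min(timeit.repeat(sort_in_place, repeat=3, number=50))
fresh = min(timeit.repeat(sort_fresh_copy, repeat=3, number=50))

print("in-place sort:   %.6f seconds for 50 calls" % in_place)
print("fresh-copy sort: %.6f seconds for 50 calls" % fresh)
```

Both functions appear to time "sorting the list", but only the second does the same work on every call. The first mostly measures how quickly Python handles input that is already sorted, which is probably not the question you meant to ask.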
The Python community has a saying: "Python comes with batteries included." Don't write your own timing framework. Python 2.3 comes with a perfectly good one called timeit.
Example 18.2. Timeit introduction
If you have not downloaded the sample programs that accompany this book, you can download this and other sample programs.
>>> import timeit
>>> t = timeit.Timer("soundex.soundex('Pilgrim')",
...     "import soundex")
>>> t.timeit()
8.21683733547
>>> t.repeat(3, 2000000)
[16.48319309109, 16.46128984923, 16.44203948912]
You can use the timeit module on the command line to test an existing Python program without modifying its code. See the documentation for command line options at http://docs.python.org/lib/node396.html.
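As a rough illustration of the command-line interface, the sketch below launches "python -m timeit" from a script so that it stays self-contained. The statement being timed (math.sqrt(2)) is a stand-in rather than the book's soundex example, and the script assumes Python 3.5 or later for subprocess.run(). The -n, -r and -s options are the documented timeit switches for loop count, repeat count and setup code.

```python
import subprocess
import sys

# Equivalent shell command (statement and setup are stand-ins):
#   python -m timeit -n 1000 -r 3 -s "import math" "math.sqrt(2)"
command = [
    sys.executable, "-m", "timeit",
    "-n", "1000",         # execute the statement 1000 times per measurement
    "-r", "3",            # repeat the measurement 3 times
    "-s", "import math",  # setup code, not included in the timing
    "math.sqrt(2)",       # the statement being timed
]

# Run the command-line tool and capture the one-line report it prints.
result = subprocess.run(command, stdout=subprocess.PIPE,
                        universal_newlines=True)
report = result.stdout.strip()
print(report)
```

The report states the number of loops, the number of repeats, and the best time per loop. Nothing in the code being timed has to change for it to be measured this way.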
Note that repeat() returns a list of times. The times will almost never be identical, because the Python interpreter gets a slightly different amount of processor time on each run (and because of those pesky background processes you couldn't get rid of). Your first thought might be to say, "Let's take the average and call that the true number."
In fact, that's almost certainly wrong. The runs that took longer did not take longer because of variations in your code or in the Python interpreter; they took longer because of those pesky background processes, or other factors outside the Python interpreter, that you could not fully eliminate. If the timing results differ by more than a few percent, there is still too much variability to trust them. Otherwise, take the minimum and discard the rest.
Python has a handy min function that returns the smallest value in the input list:
>>> min(t.repeat(3, 1000000))
8.22203948912
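The rule of thumb above (distrust results that differ by more than a few percent, otherwise keep the minimum) could be wrapped up roughly as follows. The summarize() helper, its 5% tolerance, and the string-joining statement being timed are all assumptions made for illustration; "a few percent" is not a precise threshold.

```python
import timeit

def summarize(times, tolerance=0.05):
    """Return (best, spread, trusted) for a list of timing results.

    best    -- the minimum time, the value to keep
    spread  -- how far the slowest run is above the best, as a fraction
    trusted -- False when the spread exceeds the assumed tolerance
    """
    best = min(times)
    spread = (max(times) - best) / best
    return best, spread, spread <= tolerance

# Stand-in statement; replace it with the code you actually want to time.
t = timeit.Timer("'-'.join(words)", "words = ['timing'] * 100")
best, spread, trusted = summarize(t.repeat(3, 10000))

print("best: %.6f seconds, spread: %.1f%%" % (best, spread * 100))
if not trusted:
    print("Too much variability; shut down more processes and try again.")
```

Note that the helper never averages. A slow run tells you about interference from outside the interpreter, so it is only used to judge whether the measurement as a whole can be trusted.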
The timeit module only helps if you already know which piece of code you need to optimize. If you have a large Python program and don't know where your performance problems lie, check out the hotshot module.