Ned's post about "human sorting" lists lead to the inevitable comments from people who want to write the most succinct solution to the problem. Among them was the remark that "exception handling is slow." Ned asked me why it was assumed that exception handling was so slow, and I replied that it isn't (based on some of the remarks about improvements in exception handling in Python 2.5). I had no real evidence of the overhead from catching exceptions, so I thought I'd benchmark the different sorting methods
def tryint(s): try: return int(s) except: return s def ak1(s): return [ tryint(c) for c in re.split('([0-9]+)', s) ]
Then, using str.isdigit instead:
def safeint(s): if s.isdigit(): return int(s) else: return s def ak2(s): return [ safeint(c) for c in re.split('([0-9]+)', s) ]
Somebody mentioned the new syntax in 2.5 that allows one-liner conditionals, thus saving a function call:
def ak3(s): return [ int(c) if c.isdigit() else c for c in re.split('([0-9]+)', s) ]
Of course, there's the easy optimization of compiling the regular expression beforehand:
NUM_RE = re.compile('([0-9]+)') def ak4(s): return [ int(c) if c.isdigit() else c for c in NUM_RE.split(s) ]
Ok, so what were the results? Here, I'm sorting a 5000 item list of strings created like so:
def make_list(sise): samp = string.letters + string.digits return [''.join(random.sample(samp, 4)) for i in xrange(size)]
for each function:
timeit.py -s "import sorting; td=sorting.make_list(5000)" "td.sort(key=sorting.ak1)"Results:
ak1: 80.7 msec
ak2: 37 msec
ak3: 34 msec
ak4: 18.1 msec
It turns out the conventional wisdom isn't just an old pythonic wive's tale: exception handling is slow. Even though ak1 would've been my first thought for the function (hanging around with Ned too much?) because it seems more pythonic to just try to convert the value to an int and catch the exception if it's not an integer, that function is 4 times slower than the fastest case. Also, this benchmark shows that there isn't much of a gain from the new conditional syntax, even though a function call is saved.
As usual, pre-compiling the regex makes lots of sense for performance-critical code. For giggles, I have a function (that I suppose works...) which avoids the regular expression:
def ak5(s): new_s =  b = '' for c in s: oc = ord(c) if oc > 57 or oc < 48: if b: new_s.append(int(b)) b = '' new_s.append(c) else: b += c return new_s
It's not as slow as I thought, averaging 25.5 msecs on my machine.
What have we learned today?
- Python exception handling isn't nearly as lightweight as I thought. Especially avoid it in tight loops.
- Function calls should be minimized. (duh)
- Sometimes the more succinct solution isn't faster.
- Compile common regular expressions. (duh)