Runcible Blog

removing dupes

I came across a post about the quickest way to remove duplicates from a Python list. This particular problem seems to crop up often, and I've noticed a huge slowdown when doing something like:

new_list = []
for i in old_list:
    # "in" on a list is a linear scan, so this loop is O(n^2) overall
    if i not in new_list:
        new_list.append(i)
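Replacing the list membership check with a set lookup keeps the same first-seen order, but each check becomes an average constant-time hash lookup instead of a scan of `new_list`. A minimal sketch of that version; the function name `dedupe_keep_order` is my own, and it assumes every item is hashable:

```python
def dedupe_keep_order(old_list):
    """Return a new list without duplicates, keeping first occurrences in order.

    Assumes every item is hashable, since it is stored in a set.
    """
    seen = set()
    new_list = []
    for i in old_list:
        # Set membership is an average O(1) hash lookup, unlike the
        # element-by-element scan that `i not in new_list` does on a list.
        if i not in seen:
            seen.add(i)
            new_list.append(i)
    return new_list


print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```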
I never took the time to benchmark different techniques like Peter did, though. It's interesting to see that:
{}.fromkeys(seq).keys()
is the fastest method, but only a hair faster than the conceptually simpler:
list(set(seq))
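To rerun a comparison like this yourself, the standard `timeit` module is enough. A rough sketch follows; the test data and repeat counts are arbitrary choices, not the setup from Peter's post, and no timings are shown because they depend on the machine and Python version. On Python 3, `dict.keys()` returns a view rather than a list, so the sketch wraps `dict.fromkeys(seq)` in `list()` to compare like with like:

```python
import timeit

# Hypothetical test data: 5,000 ints with only 500 distinct values.
seq = [i % 500 for i in range(5_000)]


def loop_dedupe(old_list):
    # The quadratic list-based loop from the start of the post.
    new_list = []
    for i in old_list:
        if i not in new_list:
            new_list.append(i)
    return new_list


candidates = {
    "dict.fromkeys": lambda: list(dict.fromkeys(seq)),
    "set": lambda: list(set(seq)),
    "loop": lambda: loop_dedupe(seq),
}

for name, fn in candidates.items():
    # Best of 3 runs of 10 calls each; taking the minimum reduces noise
    # from whatever else the machine is doing.
    best = min(timeit.repeat(fn, number=10, repeat=3))
    print(f"{name:>14}: {best:.4f} s per 10 calls")
```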

Most of the time, I use list(set(seq)) or just leave the sequence as a set without turning it back into a list. Also, the results present a strong argument against preserving order unless you actually need it: even the fastest order-preserving method took 2.5 times as long as the quickest non-order-preserving method.
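Since Python 3.7, plain dicts are guaranteed to keep insertion order, so on modern versions the `fromkeys` trick also returns items in order of first appearance, while `set` still makes no ordering promise. A small illustration with made-up sample data; whether this changes the 2.5x gap on a given machine is something to measure rather than assume:

```python
seq = ["b", "a", "b", "c", "a"]

# Sets make no ordering guarantee, so sort before comparing or printing.
unordered = list(set(seq))

# Python 3.7+: dicts remember insertion order, so the keys come out
# in the order each value first appeared.
ordered = list(dict.fromkeys(seq))

print(sorted(unordered))  # ['a', 'b', 'c']
print(ordered)            # ['b', 'a', 'c']
```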

good to know.