Program optimization
A while back I came across http://critticall.com/
This program optimizes code by making random changes, then checking whether the result still works and how fast it runs. Would this sort of approach work for optimizing assembly code? Currently a compiler emits one particular version for each type of CPU, but there are potentially more optimizations that could be done for a particular system. Experimentation would surely give better results.
It would have been funny, if it wasn't so sad to watch...

Here's how Critticall would solve the following problem.
One of the lights in the bathroom just went out. Solve it.
<...a few hours later...>
All the walls are drilled in random places, the toilet has been taken off and broken to pieces, all the fuses are blown by shorts, the water pipes are disconnected (but not shut off)... you get the idea. The light is still off.
_______________
(Genetic algorithms have been around for more than a decade. I do agree that there are some problems that suit them. I've exaggerated a little bit with this example.)
Last fiddled with by Batalov on 2012-09-15 at 19:05
The basic problem is that the random genetic evolution algorithm needs 4 billion years to work properly.

The code is first designed and it is then analysed and annotated with information that specifies how far each individual instruction can be moved relative to the others around it without changing the final answer (using knowledge of the target microarchitecture). The code is then run in a simulator that tests the different instruction orders to find the order that gives the highest speed. 
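
To make the idea concrete, here is a minimal sketch in Python of that kind of search. It is not the actual tool described above. The five-instruction program, its latencies and the single-issue in-order timing model are all made up for illustration, and each register is written only once, so the only constraint on moving an instruction is that it must come after the instructions producing its inputs.

```python
from itertools import permutations

# Hypothetical toy program: (name, destination, sources, latency in cycles).
PROGRAM = [
    ("load_a", "a", (), 3),
    ("load_b", "b", (), 3),
    ("add", "c", ("a", "b"), 1),
    ("load_d", "d", (), 3),
    ("mul", "e", ("c", "d"), 2),
]


def is_legal(order):
    """An order is legal if every source is produced before it is used."""
    produced = set()
    for _name, dest, sources, _latency in order:
        if any(src not in produced for src in sources):
            return False
        produced.add(dest)
    return True


def simulated_cycles(order):
    """Assumed timing model: one instruction issues per cycle, in order,
    and an instruction stalls until all of its sources are ready."""
    ready_at = {}
    next_issue = 0
    for _name, dest, sources, latency in order:
        issue = max([next_issue] + [ready_at[src] for src in sources])
        ready_at[dest] = issue + latency
        next_issue = issue + 1
    return max(ready_at.values())


def best_order(program):
    """Try every legal reordering and keep the fastest one."""
    legal = (p for p in permutations(program) if is_legal(p))
    return min(legal, key=simulated_cycles)


if __name__ == "__main__":
    best = best_order(PROGRAM)
    print("original:", simulated_cycles(PROGRAM), "cycles")
    print("best:    ", simulated_cycles(best), "cycles",
          [name for name, *_ in best])
```

In this toy model, hoisting the third load above the add hides its latency. Trying every permutation is only feasible for a handful of instructions, so a real tool would have to limit the search to the movement range recorded for each instruction.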

The problem I foresee here is:
Modern CPUs are very complex things. There are both internal and external states and conditions that occur under real application usage that a simulator cannot hope to duplicate, or even model closely. Also, with the sheer number of movable instructions in any nontrivial program, I suspect that finding the optimal order is a rather hard problem. Is there a travelling salesman in the house?

Nobody pretends that any of this is perfect; it doesn't have to be. The overall speed improvements are both measurable and significant.

