Caches are, of course, a major hog of real estate. A cache, like any IC (integrated circuit) based memory device, must have many wires running to and from its read and write ports. For on-chip caches, the load/store unit in the CPU must be able to access every location in the cache from both the read and write ports. The situation is even worse when there is more than one load/store unit. That's a lot of wires! In the Pentium Pro, for example, a single package includes both the CPU chip and an L2 cache chip; the CPU chip has about 5 million transistors, while the cache chip has about 15 million transistors.
However, even accounting for caches, there is still a large increase in the number of transistors in today's chips compared to the 4004. Clearly, microprocessors are becoming increasingly complex. This growing complexity makes sense, since chip designers want to create processors that are fast yet still affordable. As process technology improved and more transistors could be fitted into the same die area, it became cost-effective to add new or improved features to the processor in an attempt to increase its effective speed. One of these improvements is dynamic scheduling.
Dynamic scheduling, as its name implies, is a method in which the hardware determines the order in which instructions execute, as opposed to a statically scheduled machine, in which the compiler determines the order of execution. In essence, the processor executes instructions out of order. Dynamic scheduling is akin to a data flow machine, in which instructions execute not in the order in which they appear, but as soon as their source operands are available. Of course, a real processor also has to take into account its limited resources. Thus instructions execute based on the availability of both their source operands and the functional units they require.
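The issue rule just described (an instruction executes once its source operands and a suitable functional unit are both available) can be illustrated with a toy simulation. The Python sketch below is not a model of any real processor. The `Instr` record, the `schedule` function, and all register names, unit names, and latencies are made up for illustration. It also assumes that each register is written at most once, so only true data dependences matter, and that functional units are not pipelined.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str       # label used in the result
    dest: str       # destination register
    srcs: tuple     # source registers
    unit: str       # kind of functional unit required
    latency: int    # cycles until the result is available (>= 1)

def schedule(program, units, ready_regs):
    """Return {instruction name: cycle in which it starts executing}.

    program    -- instructions in program order
    units      -- {unit kind: number of units of that kind}
    ready_regs -- registers whose values are available at cycle 0
    Assumes every register is written at most once (hypothetical simplification).
    """
    ready_at = {r: 0 for r in ready_regs}                 # register -> cycle its value is ready
    busy_until = {u: [0] * n for u, n in units.items()}   # per unit: cycle it becomes free
    limit = sum(ins.latency for ins in program)           # upper bound for a well-formed program
    pending, issued, cycle = list(program), {}, 0
    while pending:
        if cycle > limit:
            raise ValueError("some source operand is never produced")
        for ins in list(pending):                         # oldest instruction gets first pick
            operands_ready = all(ready_at.get(s, math.inf) <= cycle for s in ins.srcs)
            free = [i for i, t in enumerate(busy_until[ins.unit]) if t <= cycle]
            if operands_ready and free:
                issued[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                busy_until[ins.unit][free[0]] = cycle + ins.latency
                pending.remove(ins)
        cycle += 1
    return issued

program = [
    Instr("i1", "f0",  ("f2", "f4"),  "div", 4),   # slow divide
    Instr("i2", "f6",  ("f0", "f8"),  "add", 1),   # needs the result of i1
    Instr("i3", "f10", ("f2", "f8"),  "mul", 2),   # independent of i1 and i2
    Instr("i4", "f12", ("f10", "f4"), "add", 1),   # needs the result of i3
]
issue_cycle = schedule(program, {"div": 1, "mul": 1, "add": 1}, ["f2", "f4", "f8"])
print(issue_cycle)   # {'i1': 0, 'i3': 0, 'i4': 2, 'i2': 4}
```

In this run i3 starts alongside i1, and i4 starts before i2 even though it appears later in the program, because i2 is still waiting for the divide to finish. Reducing the number of units of a given kind delays instructions that compete for it, which is the resource constraint mentioned above.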
Dynamically scheduled machines can take advantage of parallelism that is not visible at compile time. They are also more versatile: code does not necessarily have to be recompiled to run efficiently, since the hardware takes care of much of the scheduling. In a statically scheduled machine, code would have to be recompiled to take advantage of the machine's particular hardware. (All of this assumes the machines use the same instruction set architecture. Of course, the code would have to be recompiled no matter what if the machines used different ISAs.)