Based on what I remember from the maintenance outages, I would say they had each server carrying 2 or 3 complete worlds. They always seemed to be down in pairs or triples.
Now, I like the idea of the servers as VMs, since in theory you could spin up extra VMs for high-load situations (provided you had the hardware capacity). You could even set it up so that a fixed number of public mapservers run only the public maps. When someone starts a private map, the players on the team transfer to a private mapserver VM that handles just that map, or perhaps a variable number of maps up to a fixed maximum. These private mapservers would handle private maps for all of the public mapservers.
Basically, keep 2 private mapservers with no work in a hot-standby pool. When someone starts a private map and none of the currently working private mapservers can take another map, grab a server from that pool to start their mission and spin up another VM to replace it. That way the pool always has, say, 2 idle private mapservers waiting. If servers freeing up would push the pool above 2, spin down or pause the extras so that only 2 stay on hot standby, if desired.
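To make the standby pool idea a bit more concrete, here is a rough Python sketch of the bookkeeping. It is only an illustration, not how any real game backend does it: the names (MapServer, PrivateMapPool, MAX_MAPS_PER_SERVER, STANDBY_TARGET) are all made up, the cap of 3 maps per server is an assumed number, and "spinning up a VM" is just creating an object.

```python
from dataclasses import dataclass, field
from itertools import count

MAX_MAPS_PER_SERVER = 3  # assumed cap on private maps per VM (hypothetical)
STANDBY_TARGET = 2       # idle private mapservers kept on hot standby


@dataclass
class MapServer:
    server_id: int
    maps: list = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.maps) < MAX_MAPS_PER_SERVER


class PrivateMapPool:
    """Tracks working private mapservers and a fixed-size hot-standby pool."""

    def __init__(self):
        self._ids = count(1)
        self.active = []   # servers currently hosting at least one private map
        self.standby = []  # idle servers waiting for work
        self._rebalance()

    def _spin_up(self) -> MapServer:
        # Stand-in for actually starting a VM.
        return MapServer(next(self._ids))

    def _rebalance(self) -> None:
        # Top the standby pool up to the target, or trim any extras.
        while len(self.standby) < STANDBY_TARGET:
            self.standby.append(self._spin_up())
        while len(self.standby) > STANDBY_TARGET:
            self.standby.pop()  # stand-in for spinning down or pausing a VM

    def start_private_map(self, map_name: str) -> MapServer:
        # Prefer a working server with room; otherwise take one from standby.
        server = next((s for s in self.active if s.has_room()), None)
        if server is None:
            server = self.standby.pop(0)
            self.active.append(server)
        server.maps.append(map_name)
        self._rebalance()
        return server

    def end_private_map(self, map_name: str) -> None:
        for server in self.active:
            if map_name in server.maps:
                server.maps.remove(map_name)
                if not server.maps:
                    # An emptied server goes back to the standby pool.
                    self.active.remove(server)
                    self.standby.append(server)
                break
        self._rebalance()
```

With these numbers, the first three private maps land on one server, the fourth pulls a server out of standby, and the pool is topped back up to 2 right away. When a server empties out it returns to standby and any extra beyond 2 is dropped.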
Where I work, we actually do that in a semi-automated way with our testing environments. A set of programmers can order environments for testing code. They order one, it's cloned from libraries, configured as needed, and in their hands in 30 minutes or less. We could have it done in 2 minutes were it not for all the configuration work. But none of that work is done by us. Our automation handles all of it, from submitting the order, through the life cycle, to the expire/release-resources cycle. We're even considering setting up automated code testing, where the programmer submits their code to a testing server along with a battery of desired tests. The testing server will order an environment, run the tests, kill the environment, and report the test results in a completely automated way. No humans are involved beyond the programmer submitting the work for test.
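For what it's worth, the order/run/kill/report loop could look roughly like the Python sketch below. This is not our actual tooling. EnvironmentService and run_submission are hypothetical names, the "environment" is just an in-memory object, and each test is assumed to be a plain function that takes the submitted code and the environment and returns pass or fail.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    env_id: int
    alive: bool = True


class EnvironmentService:
    """Hypothetical stand-in for the automation that orders and releases environments."""

    def __init__(self):
        self._next_id = 1
        self.live = {}  # env_id -> Environment currently holding resources

    def order(self) -> Environment:
        # Stand-in for clone-from-library plus configuration.
        env = Environment(self._next_id)
        self._next_id += 1
        self.live[env.env_id] = env
        return env

    def release(self, env: Environment) -> None:
        # Stand-in for the expire/release-resources step.
        env.alive = False
        self.live.pop(env.env_id, None)


def run_submission(service, code, tests):
    """Order an environment, run every test, always release it, return results."""
    env = service.order()
    results = {}
    try:
        for name, test in tests.items():
            try:
                results[name] = bool(test(code, env))
            except Exception:
                # A crashing test counts as a failure rather than aborting the run.
                results[name] = False
    finally:
        # The environment is killed even if something unexpected goes wrong.
        service.release(env)
    return results
```

The try/finally is the part that matters: the environment gets released whether the tests pass, fail, or blow up, so nothing is left holding resources after the report goes back to the programmer.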