"It isn't virtual reality until you can mount a coup d'etat in it."
This is the information age, so our definition of a coup consequently varies from the traditional one (involving guns and colonels in sunglasses). But, for the sake of argument, let us posit that it's a de-facto coup if you can fool all of the people all of the time — and controlling their perception of reality is a good start. But how much reality do you need to control?
We can approach the problem by estimating the total afferent sensory bandwidth of the target. If you can control someone's senses completely, you can present them with stimuli and watch them respond — voluntary cooperation is optional. (Want them to jump to the left? Make them see a train approaching from the right.) How stimuli are generated is left as an exercise for the world-domination obsessed AI; the question I'm asking is, what is the maximum bandwidth that may have to be controlled and filtered?
I'm picking on Scotland as an example because it's big enough to be meaningful, small enough to be unthreatening, and generally innocuous. And we can either consider Scotland as a body politic, or as a collection of approximately five million human beings.
First, the body politic. The UK currently has approximately one public CCTV camera per ten people. In addition, the population in general have cameraphones: perhaps one per person (optimistically) with an average resolution of 1 Mpixel. Let's wave a magic wand and make all of them video cameraphones, and always on. (Yes, this would swamp the phone network. I'm looking for an upper bound, not a lower.) At 25 frames/sec on every camera and 8-bit colour, that gives us 5.5M cameras each generating 25 MB/sec, for 137.5 x 10^12 bytes/sec. Reality is liable to be an order of magnitude (or more) lower ...
Let's also give every household a home broadband internet connection (this dwarfs the bandwidth of their POTS connection) running at, say, 40 Mbps (an order of magnitude above the current average for broadband). Households: say they average 2.5 people, which means we have 2 million of them. So we have another 10 x 10^12 bytes/sec.
Thus, the combined internet traffic, phone traffic, and video surveillance that's going on in Scotland tops out at about 1.5 x 10^14 bytes/sec, and in practice is probably somewhere below 10^14 bytes/sec.
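For anyone who wants to check the envelope, here is a minimal Python sketch of the body-politic arithmetic. The variable names are mine, and the inputs are just the round numbers quoted above (one CCTV camera per ten people, one 1 Mpixel cameraphone each, 8-bit colour taken as one byte per pixel, 40 Mbit/sec per household), so treat it as an upper-bound estimate under those assumptions rather than a measurement.

```python
# Upper-bound traffic estimate for Scotland as a body politic.
# Every input is a round number taken from the text, not measured data.

POPULATION = 5_000_000

# Cameras: one CCTV camera per ten people, plus one cameraphone per person.
cctv_cameras = POPULATION // 10
phone_cameras = POPULATION
cameras = cctv_cameras + phone_cameras

# Assumption: 1 Mpixel, 8-bit colour (1 byte per pixel), 25 frames/sec.
bytes_per_camera = 1_000_000 * 1 * 25
camera_bytes_per_sec = cameras * bytes_per_camera

# Broadband: 2.5 people per household, 40 Mbit/sec per household.
households = round(POPULATION / 2.5)
broadband_bytes_per_sec = households * 40_000_000 // 8

total_bytes_per_sec = camera_bytes_per_sec + broadband_bytes_per_sec

print(f"cameras:   {camera_bytes_per_sec:.3e} bytes/sec")
print(f"broadband: {broadband_bytes_per_sec:.3e} bytes/sec")
print(f"total:     {total_bytes_per_sec:.3e} bytes/sec")
```

The cameras dominate: the always-on video feeds outweigh the broadband connections by more than an order of magnitude.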
Now for the second half: the people. Eyeballs first: going by Hans Moravec's estimate, the human eyeball processes about ten one-million-point images per second. I think that's low: a rough estimate of the retina gives me about 40M pixels at 17 images/second, and the pixels take more than 32 bits to encode colour and hue information fully (let's say 64 bits). So, the ten million eyeballs in Scotland would take approximately 2,720 x 10^13 bits/sec (call it roughly 10^15 bytes/sec) to fool.
We've got more senses than just eyeballs, of course. But human skin isn't a brilliantly discriminative sensory organ in comparison: we can only distinguish between stimuli that are more than a centimetre apart over most of our bodies. (Hands, lips, and a few other places are exceptions.) Assuming 2 square metres of skin per person, that gives us 20K sensors. Given a firing rate of 10 per second, and 32 bits for encoding the inputs (heat and pressure, not just touch), that still approximates to less than 1 MB/sec per person, or under 5 x 10^12 bytes per second per Scotland, which is basically lost in the noise compared to the optics, or even the spam'n'web surfing.
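The skin estimate is easy to reproduce. The sketch below assumes one distinguishable touch point per square centimetre (from the one-centimetre resolution above), ten samples per second, and 32 bits per sample; the names are illustrative only.

```python
# Rough tactile bandwidth, using the round figures from the text.

POPULATION = 5_000_000

# 2 square metres of skin, expressed in square centimetres.
skin_area_cm2 = 2 * 100 * 100

# Assumption: one distinguishable sensor per square centimetre.
sensors = skin_area_cm2

FIRING_RATE_HZ = 10    # samples per sensor per second
BITS_PER_SAMPLE = 32   # heat and pressure, not just touch

bits_per_person = sensors * FIRING_RATE_HZ * BITS_PER_SAMPLE
bytes_per_person = bits_per_person // 8
bytes_per_scotland = bytes_per_person * POPULATION

print(f"sensors per person: {sensors}")
print(f"per person:   {bytes_per_person:,} bytes/sec")
print(f"per Scotland: {bytes_per_scotland:.1e} bytes/sec")
```

That works out to a little under a megabyte per second per person, a rounding error next to the visual estimate.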
Sound ... hell, let's just throw in CD-quality audio times five million and have done with it. That's another 10Mb/sec per ear, or 2 x 10^13 bits/sec/Scotland.
Now. Let's suppose you can plug everyone in Scotland into a Matrix-style tank and feed them real-time hallucinations. What's the infrastructure like?
We can see that the ceiling is 10^15 bytes/sec. (It'd only be 10^17 bytes/sec if you wanted to do this to the USA or India; don't get cocky over there!) A single high quality optical fibre can, with wavelength division multiplexing, carry about 2 x 10^12 bits/sec. So we'd need to run one fibre to every couple of hundred people. The combined trunk to carry the sensory bandwidth of Scotland would need on the order of ten thousand fibres. It's going to be a bit fatter than my thigh; not terribly impressive.
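The fibre count can be sketched the same way. This assumes the visual load of 2,720 x 10^13 bits/sec estimated earlier dominates everything else, and takes the 2 x 10^12 bits/sec per-fibre capacity at face value; both are this article's round numbers, and the variable names are mine.

```python
# How many fibres does the sensory bandwidth of Scotland need?
# Integer arithmetic throughout, to avoid floating-point rounding.

POPULATION = 5_000_000

# Assumption: vision dominates, at the text's estimate for ten million eyes.
visual_bits_per_sec = 2_720 * 10**13

# Assumption: one fibre carries about 2 x 10^12 bits/sec.
fibre_bits_per_sec = 2 * 10**12

# Ceiling division: a partly used fibre still has to be laid.
fibres = -(-visual_bits_per_sec // fibre_bits_per_sec)
people_per_fibre = POPULATION / fibres

print(f"fibres needed:    {fibres:,}")
print(f"people per fibre: {people_per_fibre:.0f}")
```

Under those assumptions the trunk comes out at the order of ten thousand fibres, with each fibre serving a few hundred people.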
Of course, for this weird thought-experiment to be relevant you'd have to be cramming content into that pipe and monitoring the subject's responses. But encoding human efferent output (gestures and speech) is cheap and easy compared to their sensory inputs: we are net informational sinks, outputting far less than we take in.
I've also completely ignored the issue of redundancy, in assuming that everyone in Scotland has a unique and separate experience of reality. (Call it virtual qualia.) In practice, a lot of folks will see exactly the same thing (or as near as makes no difference) at the same time. If a million people are watching a football match on TV, they will see the same image, subject to a fairly simple pixel-based transformation to modify the angle and distance they're sitting at from the TV screen (and possibly its brightness and contrast). Again, we all spend an average of 30% of our time sleeping, during which we're not doing a hell of a lot with our external visual field. So, conceivably, I'm over-estimating by a couple of orders of magnitude.
Finally: we've got a modern telecoms infrastructure that provides fibre to the kerb. Obviously, our current infrastructure isn't providing terabits per second on every fibre — but the glass is in the ground. Of more interest is the question of whether or not the available wireless bandwidth would support this sort of large-scale subversion. Right now it wouldn't, but with UWB estimated to top off around the terabit/second mark across distances of under ten metres, I wouldn't bet against it in the future.
Now if you'll excuse me, I'm going to go drink my morning cup of tea and start making myself a nice tinfoil hat ...