I think 128 bytes is probably on the high side of the average, but not a bad place to start for conservative estimates.
Here's an example worst-case update message, taken from a player who has just reached the apex of a jump, is headed downward, and has triggered a non-predictable animation transition, all while spinning around in midair.
I say worst case because attributes that don't change are omitted from update messages. If I weren't spinning around, for example, the orientation ("o") would remain constant and not be included. The same happens when the position and animation state are judged to be trivially predictable.
<message type="groupchat" to="test1@server.nice.try/Nomad"
from="novapraetoria_meta@conference.server.nice.try/Conan the Librarian">
<u xmlns="xx:u" p="-4742.7 40 -240.8" v="0 -0.6 0" o="0 -1.77 0" m="JUMPFALL"/>
</message>
That's the inbound message as received by another client. I chose it because it's slightly longer: outgoing stanzas to the server can omit the 'from'.
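To make the "unchanged attributes are omitted" rule concrete, here is a minimal Python sketch of how the <u/> payload could be built from two snapshots of a player's state. The function name build_update and the dict-based state are assumptions for illustration, not the actual client code, and the prediction check (skipping position or animation when they can be trivially predicted) is left out.

```python
from xml.sax.saxutils import quoteattr


def build_update(previous, current):
    """Return a <u/> element holding only the attributes whose values changed.

    `previous` and `current` are hypothetical dicts mapping the short
    attribute names (p, v, o, m) to their string values.
    """
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    if not changed:
        return None  # nothing differs, so no update needs to be sent
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in changed.items())
    return f'<u xmlns="xx:u"{attrs}/>'


# Falling without spinning: orientation is identical, so "o" is dropped.
before = {"p": "-4742.7 41 -240.8", "v": "0 0.2 0", "o": "0 -1.77 0", "m": "JUMP"}
after = {"p": "-4742.7 40 -240.8", "v": "0 -0.6 0", "o": "0 -1.77 0", "m": "JUMPFALL"}
update = build_update(before, after)
```

With these two snapshots the result carries p, v and m but no o, which is the common case and shorter than the worst-case stanza above.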
That's 221 bytes for the worst case. Notice that the majority of the data is actually the boilerplate XMPP message wrapper around the payload. One thing that figure doesn't take into account, however, is that we use TLS, and TLS can apply DEFLATE compression to its data stream when both ends negotiate it. I haven't measured the effect directly, but it should make the messages considerably shorter, since the tags and addresses repeat very often and are prime candidates to be turned into back-references.
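A rough way to get a feel for the back-reference effect is to push a few near-identical stanzas through a single shared DEFLATE stream and look at how many bytes each one adds. The sketch below uses Python's zlib module directly rather than a real TLS stack, so it only approximates what happens on the wire; the helper names and the sample positions are made up for illustration.

```python
import zlib


def make_stanza(position):
    """Build the example stanza with a different position value."""
    return (
        '<message type="groupchat" to="test1@server.nice.try/Nomad" '
        'from="novapraetoria_meta@conference.server.nice.try/Conan the Librarian">'
        f'<u xmlns="xx:u" p="{position}" v="0 -0.6 0" o="0 -1.77 0" m="JUMPFALL"/>'
        "</message>"
    ).encode("utf-8")


def compressed_sizes(stanzas):
    """Compress stanzas on one shared raw DEFLATE stream.

    Returns the number of compressed bytes emitted for each stanza. A sync
    flush after every stanza forces the output out, as a live connection would.
    """
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)  # negative wbits = raw DEFLATE
    sizes = []
    for stanza in stanzas:
        out = compressor.compress(stanza) + compressor.flush(zlib.Z_SYNC_FLUSH)
        sizes.append(len(out))
    return sizes


stanzas = [make_stanza(p) for p in ("-4742.7 40 -240.8", "-4742.7 39.4 -240.8", "-4742.7 38.2 -240.8")]
raw_sizes = [len(s) for s in stanzas]
deflated_sizes = compressed_sizes(stanzas)
print(list(zip(raw_sizes, deflated_sizes)))
```

The first stanza pays full price because the compressor has seen nothing yet; later ones mostly point back at it, so their compressed size should be a small fraction of the raw size.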
We rate-limit outbound messages to prevent flooding, and we add a small delay before sending so that changes that happen within a couple of ticks of each other are batched into a single update. Very small changes in the values also lengthen the wait before an update is sent.
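The batching and rate limiting could look something like the following Python sketch. The class name UpdateBatcher, the millisecond thresholds and the polling interface are all assumptions picked for illustration rather than our real settings, and the "tiny changes wait longer" rule is not modeled here.

```python
class UpdateBatcher:
    """Coalesce attribute changes and release them at a limited rate."""

    def __init__(self, batch_delay_ms=50, min_interval_ms=100):
        self.batch_delay_ms = batch_delay_ms      # wait after the first pending change
        self.min_interval_ms = min_interval_ms    # minimum spacing between sent updates
        self.pending = {}
        self.first_change_at = None
        self.last_sent_at = float("-inf")

    def record(self, attr, value, now_ms):
        """Note a changed attribute; later values overwrite earlier ones."""
        if not self.pending:
            self.first_change_at = now_ms
        self.pending[attr] = value

    def poll(self, now_ms):
        """Return the batched changes if it is time to send, else None."""
        if not self.pending:
            return None
        if now_ms - self.first_change_at < self.batch_delay_ms:
            return None  # still inside the batching window
        if now_ms - self.last_sent_at < self.min_interval_ms:
            return None  # rate limit: too soon after the previous update
        batch, self.pending = self.pending, {}
        self.first_change_at = None
        self.last_sent_at = now_ms
        return batch


batcher = UpdateBatcher()
batcher.record("p", "1 0 0", now_ms=0)
batcher.record("v", "0 1 0", now_ms=20)   # lands in the same batch
too_early = batcher.poll(now_ms=30)       # None: batching window still open
first_batch = batcher.poll(now_ms=60)     # both changes in one update
batcher.record("p", "2 0 0", now_ms=70)
rate_limited = batcher.poll(now_ms=130)   # None: only 70 ms since the last send
second_batch = batcher.poll(now_ms=200)
```

Keeping the two checks separate means the batching window and the flood limit can be tuned independently.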
The zones will initially be capped at 30, to be on the conservative side. That's a setting on the XMPP server itself, so it's something we can tweak in real time if it looks like we can go higher.
That reminds me, I need to implement instancing when a zone fills up. BRB.