The current system uses 16x16x16 = 4,096 blocks (voxels) per chunk, which consumes 12,288 bytes per chunk (assuming SM still uses 3-byte voxels). On a 10Mbps connection that takes roughly 10ms to transfer. For the desktop/GPU, smaller chunks also add significant rendering overhead. There's also AABB (e.g. collision) checks: a collision typically touches more chunks when they're smaller, so more calculation work is required when there are more, smaller chunks.
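To put a rough number on the AABB point, here is a minimal sketch that counts how many chunks an axis-aligned box overlaps for a given chunk edge length. The function name `chunks_touched` and the example box are made up for illustration, and it assumes chunks are aligned to a grid starting at the origin; it is not how SM actually does its collision checks.

```python
def chunks_touched(box_min, box_max, chunk_size):
    """Count the chunks overlapped by an axis-aligned box.

    box_min is inclusive and box_max is exclusive, both given as
    integer block coordinates (x, y, z). Chunks are assumed to be
    cubes of chunk_size blocks aligned to a grid at the origin.
    """
    count = 1
    for lo, hi in zip(box_min, box_max):
        first = lo // chunk_size        # index of the first chunk on this axis
        last = (hi - 1) // chunk_size   # index of the last chunk on this axis
        count *= last - first + 1
    return count


# A hypothetical 40x40x40 block bounding box sitting at the origin.
box_min, box_max = (0, 0, 0), (40, 40, 40)
for edge in (16, 32, 64):
    print(edge, chunks_touched(box_min, box_max, edge))
```

For that example box the count drops from 27 chunks at 16 blocks per edge to 8 at 32 and 1 at 64, which is the effect described above: bigger chunks mean fewer chunks per collision check.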
The upcoming system uses 32x32x32 = 32,768 blocks per chunk, which consumes 98,304 bytes per chunk. That takes roughly 79ms to transfer, and should help improve mesh generation for the desktop/GPU, while also reducing the number of chunks involved in many types of collisions (therefore making AABB checks less painful).
A jump to 64x64x64 = 262,144 blocks per chunk means 786,432 bytes per chunk at 3 bytes per voxel, which takes roughly 630ms to transfer. From a server perspective, that's a point of diminishing returns, primarily because any time a single block is changed, the entire chunk (plus any colliding/interacting chunks) must be recomputed/resent. That would require a lot of bandwidth, and I assume would introduce too much lag.
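For anyone who wants to check the arithmetic, here is a small sketch of the calculation. It assumes 3 bytes per voxel (same assumption as above), treats 10Mbps as 10,000,000 bits per second, and ignores compression and protocol overhead, so real numbers will differ somewhat. The helper names are just for illustration.

```python
BYTES_PER_VOXEL = 3                 # assumption: SM still uses 3-byte voxels
LINK_BITS_PER_SECOND = 10_000_000   # assumption: 10Mbps, decimal megabits


def chunk_bytes(edge):
    """Raw size in bytes of a cubic chunk with the given edge length."""
    return edge ** 3 * BYTES_PER_VOXEL


def transfer_ms(num_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Time in milliseconds to send num_bytes over the link, no overhead."""
    return num_bytes * 8 / bits_per_second * 1000


for edge in (16, 32, 64):
    size = chunk_bytes(edge)
    print(f"{edge}^3: {edge ** 3} blocks, {size} bytes, {transfer_ms(size):.1f} ms")
```

Under those assumptions each doubling of the edge length multiplies both the chunk size and the transfer time by 8, so a 64x64x64 chunk at 3 bytes per voxel is 786,432 bytes and well over half a second on the wire.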