In general, going by what you described, I'd say:
In terms of bandwidth and data size, it would work like, or alongside, player skins.
A client-only approach, using a standard sound that can be enhanced by downloading an externally hosted sound file, can work well.
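To make the client-only idea a bit more concrete, here is a rough sketch of the fallback logic, not an actual implementation from any game client. The function name `pick_sound`, the cache directory layout, and the 512 KiB size cap are all assumptions made up for illustration. The download itself is left out, and the sketch only shows how the client could choose between an already downloaded file and the standard sound.

```python
from pathlib import Path

# Assumed size cap for a downloaded sound, in the same spirit as keeping
# player skins small. The value is hypothetical.
MAX_SOUND_BYTES = 512 * 1024


def pick_sound(cache_dir: Path, sound_name: str, default_sound: Path,
               max_bytes: int = MAX_SOUND_BYTES) -> Path:
    """Return the downloaded sound if it is usable, else the standard sound."""
    candidate = cache_dir / sound_name
    # Use the downloaded file only if it exists, is not empty,
    # and stays within the size cap.
    if candidate.is_file() and 0 < candidate.stat().st_size <= max_bytes:
        return candidate
    # Otherwise fall back to the sound that ships with the client.
    return default_sound
```

With this, a client that never downloads anything still plays the standard sound, so nothing depends on the external host being reachable.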
Server-side synchronisation and logic interaction, for...