Yeah, it will be local first for sure. But I've pretty much got the idea for the network setup. Essentially it will flow like this.
->Player connects to server.
<-Server messages all current players that a user has joined.
<-Server messages all current players the new player's tracker data.
--Players play the tracker data on their side, along with their own tracker, all on beat.
->Player changes his tracker.
<-Server messages all current players the player's new tracker data.
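Here's a rough in-memory sketch of the join/change relay, just to pin down who gets what. It's not real networking. `RelayServer`, the outbox lists, and the message tuples are all hypothetical names. Two choices here are my own assumptions: the newcomer is also sent every existing tracker (so it has something to play), and a tracker change isn't echoed back to the player who made it.

```python
class RelayServer:
    """Hypothetical in-memory model of the server's relay logic (no sockets)."""

    def __init__(self):
        self.trackers = {}  # player id -> latest tracker bytes
        self.outboxes = {}  # player id -> messages queued for that player

    def _broadcast(self, msg, exclude=None):
        # Queue msg for every connected player except `exclude`.
        for pid, box in self.outboxes.items():
            if pid != exclude:
                box.append(msg)

    def on_join(self, pid, tracker):
        # Tell everyone already connected that a player joined, then send its tracker.
        self._broadcast(("joined", pid))
        self._broadcast(("tracker", pid, tracker))
        # Assumption: the newcomer also receives every existing player's tracker.
        self.outboxes[pid] = [("tracker", other, data) for other, data in self.trackers.items()]
        self.trackers[pid] = tracker

    def on_tracker_change(self, pid, tracker):
        # Store the new tracker and relay it to everyone else.
        self.trackers[pid] = tracker
        self._broadcast(("tracker", pid, tracker), exclude=pid)
```

So if A joins, then B joins, A gets "B joined" plus B's tracker, and B gets A's tracker straight away.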
And then a sort of constant loop:
->Players send their positions to the server.
<-Server relays all their positions back to them.
--Players update positions and sound volumes based on those updates.
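For the volume part, something like a simple distance falloff could work on the client side. This is only a sketch: it assumes the server relays a snapshot dict of player id -> (x, y), and `MAX_HEARING_DISTANCE` plus the linear falloff are made-up choices, not anything settled.

```python
import math

MAX_HEARING_DISTANCE = 20.0  # hypothetical: beyond this distance a tracker is silent


def volume_for(listener, source, max_dist=MAX_HEARING_DISTANCE):
    """Linear falloff: 1.0 at the listener's position, 0.0 at max_dist or farther."""
    dist = math.dist(listener, source)
    return max(0.0, 1.0 - dist / max_dist)


def update_volumes(me, snapshot):
    """Given the relayed position snapshot, compute a volume for every other player's tracker."""
    mine = snapshot[me]
    return {pid: volume_for(mine, pos) for pid, pos in snapshot.items() if pid != me}
```

E.g. with A at (0, 0), B at (10, 0) and C at (30, 0), A hears B at half volume and C not at all. An inverse-square curve would sound more natural, but linear is easier to tune early on.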
Tracker data will probably be as follows.
[1 Byte][Tempo]
[1 Byte][Number of Channels]
[1 Byte][Channel 1 Step Length]
[1 Byte][Channel 1 Step 1 Sound]
[1 Byte][Channel 1 Step 1 Volume]
[etc...]
Max Channels for now will be 4.
Max Steps will be 32.
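A quick sketch of packing/unpacking that layout, assuming every value fits in one unsigned byte (so tempo tops out at 255) and each channel stores exactly as many steps as its step length says. The function names are hypothetical.

```python
import struct

MAX_CHANNELS = 4
MAX_STEPS = 32


def pack_tracker(tempo, channels):
    """channels: list of channels, each a list of (sound, volume) steps; all values 0-255."""
    if len(channels) > MAX_CHANNELS:
        raise ValueError("too many channels")
    out = bytearray(struct.pack("BB", tempo, len(channels)))  # [Tempo][Number of Channels]
    for steps in channels:
        if len(steps) > MAX_STEPS:
            raise ValueError("too many steps")
        out.append(len(steps))                                # [Channel N Step Length]
        for sound, volume in steps:
            out += struct.pack("BB", sound, volume)           # [Step Sound][Step Volume]
    return bytes(out)


def unpack_tracker(data):
    """Inverse of pack_tracker: returns (tempo, channels)."""
    tempo, n_channels = data[0], data[1]
    pos = 2
    channels = []
    for _ in range(n_channels):
        n_steps = data[pos]
        pos += 1
        steps = []
        for _ in range(n_steps):
            steps.append((data[pos], data[pos + 1]))
            pos += 2
        channels.append(steps)
    return tempo, channels
```

A full tracker (4 channels × 32 steps) packs to 262 bytes, and `struct.pack` will raise if a value doesn't fit in a byte.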
So that's at most 262 bytes at a time (2 header bytes + 4 channels × (1 step-length byte + 32 steps × 2 bytes)). And it only needs to be transferred when a player joins. All other changes, such as to the tempo or to individual steps, will be sent to the server as individual changes of roughly 5~8 bytes max each.
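For the individual changes, a tiny opcode-prefixed message would keep things in that 5~8 byte range. The opcodes and byte layouts below are hypothetical, just one way to do it: a tempo change is [op][tempo] (2 bytes) and a step change is [op][channel][step][sound][volume] (5 bytes). The tracker here is a plain dict so the snippet stands alone.

```python
import struct

# Hypothetical opcodes; only the "send small individual changes" idea is fixed so far.
OP_SET_TEMPO = 1
OP_SET_STEP = 2

# Worst-case full snapshot: tempo + channel count + 4 * (step length + 32 * (sound + volume)).
FULL_SNAPSHOT_BYTES = 2 + 4 * (1 + 2 * 32)


def encode_set_tempo(tempo):
    return struct.pack("BB", OP_SET_TEMPO, tempo)  # 2 bytes


def encode_set_step(channel, step, sound, volume):
    return struct.pack("5B", OP_SET_STEP, channel, step, sound, volume)  # 5 bytes


def apply_change(tracker, msg):
    """tracker: {"tempo": int, "channels": [[(sound, volume), ...], ...]}; mutated in place."""
    op = msg[0]
    if op == OP_SET_TEMPO:
        tracker["tempo"] = msg[1]
    elif op == OP_SET_STEP:
        _, channel, step, sound, volume = struct.unpack("5B", msg)
        tracker["channels"][channel][step] = (sound, volume)
    else:
        raise ValueError(f"unknown opcode {op}")
```

If the server prefixes a player id byte when relaying, a step change grows to 6 bytes, which is still under the 8-byte budget.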