Thoughts about the region chunkdata and blockdata in smd2 files

    Joined
    Sep 29, 2015
    Messages
    13
    Reaction score
    2
    • Purchased!
    Hello everyone, I am a long time reader and finally got around to my first post.
    I hope it's the right place to post it here.
    Caution! Tech ahead.

Over the last few days I did some research on the smd2 files, with the eventual goal of building a layer editor for StarMade.
The smd2 files are the region files for StarMade ships: each region describes 16x16x16 chunks, and each chunk holds 16*16*16 blocks of blockdata.

The smd2 (binary) file starts with an index of 16x16x16 entries describing the further content of the file.
Each entry contains a chunkId and a chunk size, which makes 8 bytes of data per entry (two 4-byte ints). The position of the chunk can be determined from the current index while iterating over the entries.
The chunkId is important for identifying valid data later in the file.

typedef struct
{
    int segmentId;   /* -1 if the slot is unused */
    int segmentSize; /* size of the stored segment */
} SegmentIndex;
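Reading that index in Java might look like this; a rough sketch, where the method and field names are mine and the big-endian byte order is an assumption (it is simply what DataInputStream reads):

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of parsing the 16*16*16 index entries at the start of an smd2 file.
// Names and the big-endian byte order are assumptions, not confirmed format facts.
public class Smd2Index {

    // Returns the segmentId of each of the 4096 index slots (-1 = unused).
    static int[] readSegmentIds(DataInputStream in) throws IOException {
        int[] ids = new int[16 * 16 * 16];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = in.readInt(); // segmentId, -1 marks an empty slot
            in.readInt();          // segmentSize, skipped in this sketch
        }
        return ids;
    }

    // The chunk position follows from the slot index alone (axis order assumed).
    static int[] chunkPos(int index) {
        return new int[] { index % 16, (index / 16) % 16, index / 256 };
    }
}
```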

The next block of 16*16*16 entries contains one timestamp per chunk: a number in the long format that holds the elapsed time since Jan 01 1970 (most likely in milliseconds, the usual Java convention, rather than seconds).
My guess is that this value is used to decide when a chunk has to be synchronized with the server.

Now that we have seen all the bookkeeping data, the interesting part is ahead of us.
In the first block of data we found the ids of the chunks. All ids that were not -1 are interesting for us, because those are the valid ids in this data block.
Even if an id is not valid and therefore does not describe any data of our ship, its segment is still saved, so we need to skip over it somehow.
For every id we have the following data:
The first 8 bytes contain another timestamp.
The next 3x4 bytes (three integer values of 4 bytes each) are the x, y and z position of the chunk. (This position seems to be invalid in the current savefile builds, but it can be recovered from the segmentId and the position calculated from the first block.)
After that we get 2 chars (2 bytes each) for the chunk type and the compressed size (both seem to be rather useless currently).
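Put together, one per-chunk header could be read like this; a sketch following the field order described above, with field names of my own choosing and big-endian byte order assumed:

```java
import java.io.DataInputStream;
import java.io.IOException;

// Sketch of the header in front of each segment's blockdata, in the field
// order described above. Names and byte order are assumptions.
public class SegmentHeader {
    long timestamp;      // 8 bytes
    int x, y, z;         // 3 * 4 bytes, chunk position
    char type;           // 2 bytes, purpose unclear so far
    char compressedSize; // 2 bytes

    static SegmentHeader read(DataInputStream in) throws IOException {
        SegmentHeader h = new SegmentHeader();
        h.timestamp = in.readLong();
        h.x = in.readInt();
        h.y = in.readInt();
        h.z = in.readInt();
        h.type = in.readChar();
        h.compressedSize = in.readChar();
        return h;
    }
}
```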

The last part is the 5094 bytes of compressed data that contain the blockdata.
Every block is described by 3 bytes (have a look at the links at the end for additional info), so the inflated blockdata should always have 3*16*16*16 = 12288 bytes.
The links at the end describe that the compressed data gets padded with 0s up to the 5094-byte limit, which leaves some questions open:

-How do we know that 5094 bytes is the maximum number of bytes the compressed data can have? (Truly random data could not be compressed at all, so there must be some pattern that guarantees it fits into 5094 bytes.)
-Why is it padded with 0s instead of storing the compressed size as an integer in front of the compressed block?
(Most probably to be able to jump to whatever blockdata we want to read without an index lookup, and to be able to change a chunk's data without shifting the starting points of the data blocks behind it.)
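Since the deflate stream itself knows where it ends, the zero padding can simply be handed to the inflater together with the payload; the trailing zeros are never consumed. A sketch using java.util.zip (the 5094/12288 sizes are taken from the description above):

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch: inflating one zero-padded segment back into the 12288 raw block
// bytes. The Inflater stops at the end of the deflate stream, so the
// zero padding at the end of the 5094-byte buffer is simply ignored.
public class SegmentData {
    static byte[] inflate(byte[] padded) throws DataFormatException {
        Inflater inf = new Inflater();
        inf.setInput(padded);
        byte[] out = new byte[3 * 16 * 16 * 16]; // 12288 bytes
        int n = inf.inflate(out);
        inf.end();
        if (n != out.length) throw new DataFormatException("short chunk: " + n);
        return out;
    }
}
```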

So a lot of 16*16*16-entry blocks are used in the savefiles. That could just be a size that happens to fit, because 8*8*8 = 512 seems too small and 32*32*32 = 32768 seems too big, but I guess it has more to do with the 4K sectors on HDDs and the read/write performance of reading consecutive data from the same sector.

I hope my introduction to the savefiles was interesting (for all the readers that made it to the end :) ).
If there are errors in my description or something is unclear, feel free to criticise constructively.

    For further reading and kudos to the writers of the wiki:

    https://starmadepedia.net/wiki/Blueprint_File_Formats
    http://www.starmadewiki.com/wiki/File_format
    https://github.com/StarMade/SMTools

    ----------------------------------------------------
For everyone who is not tired yet:

Sorry, I can't stop writing, so let's dive into the in-game data layout of the chunks and blocks :). From here on it gets a bit tech- and Java-heavy.
I am really curious about how the data is handled in the game. The blockdata is a 3-byte structure with 3 different definitions:
    Type 1: Orientation[3] Active[1] Hit Points[9] Block ID[11]
    Type 2: Orientation[4] Hit Points[9] Block ID[11]
    Type 3: Orientation[5] Hit Points[8] Block ID[11]
(the brackets contain the number of bits)
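Unpacking the type-1 layout in Java could look like this. Which of the three bytes is the most significant one is an assumption here, as are the method names; only the bit widths come from the table above:

```java
// Sketch of unpacking the 3-byte "type 1" layout above: low 11 bits =
// block ID, next 9 bits = hit points, 1 active bit, top 3 bits = orientation.
// Byte order in combine() is an assumption.
public class BlockData {
    static int id(int v)          { return v & 0x7FF; }            // bits 0-10
    static int hp(int v)          { return (v >> 11) & 0x1FF; }    // bits 11-19
    static boolean active(int v)  { return ((v >> 20) & 1) != 0; } // bit 20
    static int orientation(int v) { return (v >> 21) & 0x7; }      // bits 21-23

    // Combine three raw bytes into one 24-bit value.
    static int combine(byte b0, byte b1, byte b2) {
        return ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
    }
}
```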

That means there would be some waste when using an int[4096] array to store the data (4 bytes per int, so 1 byte would be lost or left over for other data), but using a byte[4096*3] would require multiple array lookups to get the block id and hitpoints, and probably a conversion to a more convenient type.
It would also waste a lot of memory if only one block is placed in a chunk and a full array of whatever type is allocated to hold that single block.
Up to about 340 blocks, a hashmap would be the more economical choice, with an access complexity of O(1) (with high probability, not worst case), compared to an array of ints, even with the overhead of references, hashes and Integer autoboxing. In ships with a lot of interior there might be many chunks with only a few blocks, so this could save a lot of memory.
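A minimal sketch of that sparse alternative, keyed by the block's linear index inside the chunk (class and method names are mine; the per-entry cost is a rough estimate):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a sparse chunk: maps the block's linear index (0..4095) to its
// packed 3-byte value. At very roughly ~48 bytes per boxed entry this beats
// a flat int[4096] (16 KiB) up to around 16384 / 48 ~ 340 stored blocks,
// which matches the break-even estimate above.
public class SparseChunk {
    private final Map<Integer, Integer> blocks = new HashMap<>();

    static int index(int x, int y, int z) { return (z * 16 + y) * 16 + x; }

    void set(int x, int y, int z, int packed) {
        blocks.put(index(x, y, z), packed);
    }

    int get(int x, int y, int z) { // 0 = empty, like the flat array default
        return blocks.getOrDefault(index(x, y, z), 0);
    }
}
```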

Many thanks to all the brave readers that made it this far (this got a bit longer than I expected).
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    Why did you write a description of the smd2 filetype, when you could've just linked to an anchor in one of the wikis[or all of them], and then just asked the questions you had? (as for the guesses, at least for me there is nothing new or surprising there)
    (This position seems to be invalid in the current savefiles build but can be retained by the segmentId and the position calculated in the first block).
The position is stored in blocks. As each chunk is 16×16×16 blocks, you would have to divide each coordinate by 16, or right-shift each by 4.
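In Java the shift is also the safer of the two for coordinates that go below zero, since an arithmetic right shift floors while integer division truncates toward zero; a quick sketch:

```java
// Sketch: converting a block coordinate to a chunk coordinate. For negative
// values an arithmetic right shift floors (-1 >> 4 == -1), while integer
// division truncates (-1 / 16 == 0), so the shift is the correct conversion
// when coordinates extend below zero.
public class Coords {
    static int blockToChunk(int blockCoord) {
        return blockCoord >> 4; // same as floor(blockCoord / 16.0)
    }
}
```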
    (they seem to be rather useless currently)
    The chunk-type might become useful later. The additional length info is for when a segment is read without consulting the index at the start of the file.
    -How do we know that 5094 bytes is the maximum number of bytes that the compressed data can have (truly randomized data could not be compressed at all so here should be a pattern 5094 bytes)
Most blocks do not use the orientation bits, and usually most blocks of the same type will also have full HP, so the HP bits usually don't have random values either.
Also, less than half of all possible block IDs are used currently, so even the combinations expressible in the 11 block-ID bits aren't all in use.
     
    Joined
    Sep 29, 2015
    Messages
    13
    Reaction score
    2
    • Purchased!
Thanks for the answer, that's some interesting input.

    Why did you write a description of the smd2 filetype, when you could've just linked to an anchor in one of the wikis[or all of them], and then just asked the questions you had? (as for the guesses, at least for me there is nothing new or surprising there)
Well... I just wanted to write it down in my own words; linking would surely have been the faster way.
Nice to hear you came to similar conclusions regarding the guesses.

The position is stored in blocks. As each chunk is 16×16×16 blocks, you would have to divide each coordinate by 16, or right-shift each by 4.
That's interesting. Until now I thought it was the position in chunk steps, not block steps.
After trying the right shift I still get very strange values for the position (I get the correct output when using the positions derived from the first block). Maybe I got something wrong in my code.

When I first read a long for the timestamp and then 3 integers for x, y, z with a right shift of 4, I get
    X:-25165824 Y:16777216 Z:0 instead of X:1 Y:0 Z:0
    X:-112197633 Y:-16777216 Z:1048575 instead of X:-1 Y:0 Z:-2
    All block positions and orientations are read correctly without a problem.

    The additional length info is for when a segment is read without consulting the index at the start of the file.
Sorry, I don't understand why this would help to read or interpret the segment. Could you give me some more hints?

Most blocks do not use the orientation bits, and usually most blocks of the same type will also have full HP, so the HP bits usually don't have random values either.
Also, less than half of all possible block IDs are used currently, so even the combinations expressible in the 11 block-ID bits aren't all in use.
Well... those are the obvious reasons. I thought there might be a hard, calculable compression limit that is always reached for every expected combination, rather than just "it's expected that the data will compress well".
Actually, if just one byte per block holds pseudo-random numbers (generated with java.util.Random), e.g. random damage on every block (very unlikely in the game, but still a valid block state), the Java DeflaterOutputStream can't compress the data to less than 5094 bytes. My test produced around 5700 bytes (4096 * one random byte plus two fixed bytes). I really wonder what would happen in-game if this state occurred :) (Of course HP is not needed in blueprints, so it might always be the maximum for all blocks. I wonder if the savefiles have the same layout.)
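For reference, that experiment can be reproduced roughly like this (seeded so it is repeatable; exact output sizes will vary with the data and zlib settings):

```java
import java.util.Random;
import java.util.zip.Deflater;

// Sketch reproducing the experiment above: one pseudo-random byte plus two
// fixed bytes per block, deflated with java.util.zip. The random byte alone
// carries ~8 bits of entropy per block, so 4096 blocks can never compress
// below ~4096 bytes, already in the neighbourhood of the 5094-byte limit.
public class CompressionTest {
    static int compressedSize(long seed) {
        Random rnd = new Random(seed);
        byte[] raw = new byte[3 * 4096];
        for (int i = 0; i < 4096; i++) {
            raw[3 * i]     = (byte) rnd.nextInt(256); // e.g. random damage
            raw[3 * i + 1] = 0x01;                    // fixed
            raw[3 * i + 2] = 0x7F;                    // fixed
        }
        Deflater def = new Deflater();
        def.setInput(raw);
        def.finish();
        byte[] out = new byte[2 * raw.length]; // generous output buffer
        int n = def.deflate(out);
        def.end();
        return n;
    }
}
```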
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    X:-25165824 Y:16777216 Z:0 instead of X:1 Y:0 Z:0
    X:-112197633 Y:-16777216 Z:1048575 instead of X:-1 Y:0 Z:-2
This looks like you read a big-endian value as little-endian (or vice versa).
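Two common ways to fix that in Java, as a sketch: swap the bytes after reading, or read through a little-endian ByteBuffer in the first place.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the two usual fixes when an int was written in the other byte
// order than it is being read in.
public class Endian {
    // Swap after the fact, e.g. after DataInputStream.readInt().
    static int swap(int value) {
        return Integer.reverseBytes(value);
    }

    // Or read with the right order from the start.
    static int readLittleEndian(byte[] bytes, int offset) {
        return ByteBuffer.wrap(bytes, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
    }
}
```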
    Sorry, i don't understand why this would help to read or interpret the segment. Could you give me some more hints
During random access, a chunk is looked up in the index/header of the file, the location and length of the segment are read from the header, the segment is loaded into memory and then parsed. When you want to read ALL chunks, however, it is easier to skip the header and just load each following 5 KB segment in sequence. But as the compressed data is padded with zeros, you still need to tell how many of the trailing zeros are padding and how many are compression output. Therefore, the length info also exists within each chunk's segment.
     
    Joined
    Sep 29, 2015
    Messages
    13
    Reaction score
    2
    • Purchased!
The endianness might indeed be part of the problem. After reversing the bytes I get far more reasonable values (though still not what I would expect). Do you know what the origin of the block offset is (the ship core, or something different)?
For my chunk at 0,0,0 I get 121,0,0, for 0,0,-1 I get 251,0,-256, and for 0,0,-2 I get 150,0,-256.
That still doesn't follow a pattern that I recognize.

For now I will try to create a dummy smd2 file, load it in StarMade and see what the outcome is for different position values.
If there is no difference between the values, I will continue to use the calculated positions (position by block index).

Also, loading files only partially will not be needed, so I will always read the whole file. With about 4 million blocks I stay below 350 MB of RAM, and loading takes just a few seconds.
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
The endianness might indeed be part of the problem. After reversing the bytes I get far more reasonable values (though still not what I would expect). Do you know what the origin of the block offset is (the ship core, or something different)?
For my chunk at 0,0,0 I get 121,0,0, for 0,0,-1 I get 251,0,-256, and for 0,0,-2 I get 150,0,-256.
That still doesn't follow a pattern that I recognize.
    The shipcore is at blockCoords 8,8,8 in chunk 0,0,0 in region 0,0,0
     

    therimmer96

    The Cake Network Staff Senior button unpusher
    Joined
    Jun 21, 2013
    Messages
    3,603
    Reaction score
    1,053
    The shipcore is at blockCoords 8,8,8 in chunk 0,0,0 in region 0,0,0
    I get that there is a reason for it, but am I the only person who is annoyed that the core is not in the exact center of the chunk >_>
     
    Joined
    May 5, 2014
    Messages
    375
    Reaction score
    77
    • Legacy Citizen
    • Purchased!
    I get that there is a reason for it, but am I the only person who is annoyed that the core is not in the exact center of the chunk >_>
.... because that is impossible? With an even edge length of 16 there is no single center block; the middle of a 16^3 cube falls between coordinates 7 and 8.