Hello everyone, I am a long-time reader and finally got around to my first post.
I hope this is the right place to post it.
Caution! Tech ahead.
Over the last few days I did some research on the smd2 files, with the goal of eventually building a layer editor for StarMade.
The smd2 files are the region files for StarMade ships. Each one describes 16x16x16 chunks and, for every chunk, its 16x16x16 block data.
The smd2 (binary) file starts with a block of 16x16x16 entries describing the further content of the file.
Each entry contains a chunkId and a chunk size, that's 8 bytes of data per entry (two 4-byte integers). The position of the chunk can be determined from the current index of the entry while iterating over the block.
The chunkId is important to identify valid data later in the file.
typedef struct
{
    int segmentId;   /* chunk id, -1 if the entry is unused */
    int segmentSize; /* chunk size */
} SegmentIndex;
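To make the "position from index" part a bit more concrete, here is a minimal sketch. It assumes the 4096 entries are stored with x changing fastest, then y, then z. That ordering is only my assumption and is not confirmed anywhere in this post, and the names `ChunkPos`, `index_to_pos` and `pos_to_index` are made up for illustration.

```c
#define REGION_DIM 16 /* 16 chunks per axis in one smd2 file */

typedef struct {
    int x, y, z;
} ChunkPos;

/* Split a flat index (0..4095) into grid coordinates.
   Assumed order: x fastest, then y, then z. */
ChunkPos index_to_pos(int index)
{
    ChunkPos p;
    p.x = index % REGION_DIM;
    p.y = (index / REGION_DIM) % REGION_DIM;
    p.z = index / (REGION_DIM * REGION_DIM);
    return p;
}

/* Inverse of index_to_pos under the same assumed order. */
int pos_to_index(ChunkPos p)
{
    return p.x + REGION_DIM * (p.y + REGION_DIM * p.z);
}
```

If the real files use a different axis order, only the three lines inside `index_to_pos` would have to be swapped around.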
The next block of 16*16*16 entries contains timestamps. Each one is a number in the long format (8 bytes) that should contain the elapsed seconds since Jan 01 1970.
My guess is that this value is used to control when a chunk has to be synchronized with the server.
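Reading one of these timestamps could look roughly like the sketch below. It assumes the values are stored high byte first (big-endian), which is how Java's DataOutputStream writes a long. Whether the game really writes the table that way is my assumption, and `timestamp_at` is a hypothetical helper name.

```c
#include <stdint.h>

#define TIMESTAMP_BYTES 8 /* one Java long per entry */

/* Read the timestamp of entry `index` from the raw timestamp table.
   Assumes big-endian byte order (high byte first). */
int64_t timestamp_at(const unsigned char *table, int index)
{
    const unsigned char *b = table + (long)index * TIMESTAMP_BYTES;
    uint64_t v = 0;
    for (int i = 0; i < TIMESTAMP_BYTES; i++) {
        v = (v << 8) | b[i];
    }
    return (int64_t)v;
}
```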
Now that we have seen all the bookkeeping data, the interesting part is ahead of us.
In the first block of data we found the ids of the chunks. All ids that are not -1 are now interesting for us, because those are the valid ids in this data block.
Even if an id is not valid and therefore does not describe any data of our ship, its data is still saved, so we need to skip over it somehow.
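Picking out the valid entries is simple once the first block is loaded. The following is only a sketch: it reuses the `SegmentIndex` struct from above, and `collect_valid` and `UNUSED_ID` are names I made up.

```c
#include <stddef.h>

#define UNUSED_ID (-1) /* id of an entry that holds no ship data */

typedef struct {
    int segmentId;
    int segmentSize;
} SegmentIndex;

/* Write the positions of all entries whose id is not -1 into `out`.
   `out` must have room for `count` values. Returns how many were found. */
size_t collect_valid(const SegmentIndex *entries, size_t count, int *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (entries[i].segmentId != UNUSED_ID) {
            out[n++] = (int)i;
        }
    }
    return n;
}
```

The stored positions are the entry indices, so they can be turned into chunk positions later.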
For every id we have the following data:
The first 8 bytes contain another timestamp.
The next 3x4 bytes (3 integer values of 4 bytes each) are the x, y and z position of the chunk (this position seems to be invalid in the current save file build, but it can be reconstructed from the segmentId and the position calculated in the first block).
After that come 2 chars (two bytes each) for the chunk type and the compressed size (they seem to be rather useless currently).
The last part is the 5094 bytes of compressed data that contain the block data.
Every block is described by 3 bytes (have a look at the links at the end for additional info), so the inflated block data should always have 3*16*16*16 = 12288 bytes.
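The record layout described above can be sketched as a small parser. The offsets simply follow my description (8 + 12 + 2 + 2 bytes in front of the compressed data), so they are only as correct as that description. Big-endian byte order is assumed, and `ChunkHeader`, `read_be` and `parse_header` are hypothetical names.

```c
#include <stdint.h>

/* Byte offsets inside one chunk record, following the description above. */
enum {
    OFF_TIMESTAMP = 0,  /* 8 bytes */
    OFF_POS       = 8,  /* 3 x 4 bytes: x, y, z */
    OFF_TYPE      = 20, /* 2 bytes */
    OFF_COMP_SIZE = 22, /* 2 bytes */
    OFF_DATA      = 24  /* 5094 bytes of compressed block data follow */
};

typedef struct {
    int64_t  timestamp;
    int32_t  x, y, z;
    uint16_t type;
    uint16_t compressedSize;
} ChunkHeader;

/* Read n bytes (n <= 8) as an unsigned value, first byte highest. */
static uint64_t read_be(const unsigned char *b, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        v = (v << 8) | b[i];
    }
    return v;
}

/* Decode the fixed-size fields in front of the compressed data. */
ChunkHeader parse_header(const unsigned char *rec)
{
    ChunkHeader h;
    h.timestamp      = (int64_t)read_be(rec + OFF_TIMESTAMP, 8);
    h.x              = (int32_t)(uint32_t)read_be(rec + OFF_POS, 4);
    h.y              = (int32_t)(uint32_t)read_be(rec + OFF_POS + 4, 4);
    h.z              = (int32_t)(uint32_t)read_be(rec + OFF_POS + 8, 4);
    h.type           = (uint16_t)read_be(rec + OFF_TYPE, 2);
    h.compressedSize = (uint16_t)read_be(rec + OFF_COMP_SIZE, 2);
    return h;
}
```

Inflating the 5094 bytes after the header would need a zlib-style library and is left out here.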
The links at the end describe that the compressed data is padded with 0s up to the 5094-byte limit, and that leaves some questions open.
- How do we know that 5094 bytes is the maximum number of bytes the compressed data can have? (Truly random data could not be compressed at all, so there must be some pattern in the block data that keeps it below 5094 bytes.)
- Why is it padded with 0s instead of saving the compressed size as an integer in front of the compressed block?
(Most probably to be able to jump to whatever block data we want to read without indexing, and to be able to change the data without moving the starting points of the data blocks behind the changed one.)
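The "jump without indexing" idea boils down to one multiplication. A minimal sketch, assuming that every record has the same size and that the chunk id from the first block is the record number (both are my assumptions); `record_offset` is a made-up name.

```c
#include <stdint.h>

/* Byte offset of record number `slot` when all records have the same size.
   data_start is the offset of the first record in the file. */
uint64_t record_offset(uint64_t data_start, uint64_t slot, uint64_t record_size)
{
    return data_start + slot * record_size;
}
```

With the numbers from this post, a record would be 8 + 12 + 2 + 2 + 5094 = 5118 bytes and the two bookkeeping blocks would take 2 * 4096 * 8 = 65536 bytes, so `record_offset(65536, 2, 5118)` gives 75772. Treat these values as an illustration only, since the real files may contain additional header bytes.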
So a lot of 16*16*16-sized blocks are used in the save files. That could be just an arbitrary size that fits, because 8*8*8=512 seems too small and 32*32*32=32768 seems too big, but I guess it has more to do with the 4k sectors on HDDs and the read/write performance when reading consecutive data from the same sector.
I hope my introduction to save files was interesting (for all the readers that made it to the end).
If there are errors in my description or something is unclear feel free to criticise constructively.
For further reading and kudos to the writers of the wiki:
https://starmadepedia.net/wiki/Blueprint_File_Formats
http://www.starmadewiki.com/wiki/File_format
https://github.com/StarMade/SMTools
----------------------------------------------------
For everyone who is not tired yet:
Sorry, I can't stop writing, so let's dive into the in-game data layout of the chunks and blocks. From now on it gets a bit tech and Java heavy.
I am really curious about how the data is handled in the game. The block data is a 3-byte structure with 3 different definitions:
Type 1: Orientation[3] Active[1] Hit Points[9] Block ID[11]
Type 2: Orientation[4] Hit Points[9] Block ID[11]
Type 3: Orientation[5] Hit Points[8] Block ID[11]
(the brackets contain the number of bits)
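The three layouts above can be unpacked with a few shifts and masks. This sketch assumes that the three bytes form one 24-bit value with the first byte highest, and that the fields are listed from the highest bits to the lowest (so the Block ID sits in the low 11 bits). Both points are assumptions on my side, and all names are made up.

```c
#include <stdint.h>

typedef struct {
    unsigned orientation;
    unsigned active;    /* only used by type 1, otherwise 0 */
    unsigned hitPoints;
    unsigned blockId;
} BlockInfo;

/* Combine three bytes into one 24-bit value, first byte highest. */
uint32_t pack24(const unsigned char *b)
{
    return ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | (uint32_t)b[2];
}

/* Type 1: Orientation[3] Active[1] Hit Points[9] Block ID[11] */
BlockInfo unpack_type1(uint32_t v)
{
    BlockInfo r;
    r.blockId     =  v        & 0x7FF;
    r.hitPoints   = (v >> 11) & 0x1FF;
    r.active      = (v >> 20) & 0x1;
    r.orientation = (v >> 21) & 0x7;
    return r;
}

/* Type 2: Orientation[4] Hit Points[9] Block ID[11] */
BlockInfo unpack_type2(uint32_t v)
{
    BlockInfo r;
    r.blockId     =  v        & 0x7FF;
    r.hitPoints   = (v >> 11) & 0x1FF;
    r.active      = 0;
    r.orientation = (v >> 20) & 0xF;
    return r;
}

/* Type 3: Orientation[5] Hit Points[8] Block ID[11] */
BlockInfo unpack_type3(uint32_t v)
{
    BlockInfo r;
    r.blockId     =  v        & 0x7FF;
    r.hitPoints   = (v >> 11) & 0xFF;
    r.active      = 0;
    r.orientation = (v >> 19) & 0x1F;
    return r;
}
```

Which of the three types applies to a block presumably depends on its Block ID, which this sketch does not try to decide.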
That means there would be some waste when using an int[4096] array to store the data (4 bytes per integer, so 1 byte would be lost or available for other data), but using a byte[4096*3] array would require 2 array lookups to get the information for block id or hit points, and probably a conversion to a more convenient type.
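To show what the two lookups look like in practice, here is a small sketch that reads the fields straight from a byte array with 3 bytes per block. It assumes the same bit order as before (first byte highest, Block ID in the low 11 bits), which is my assumption, and the function names are made up. The int variant needs 4096 * 4 = 16384 bytes per chunk, the byte variant 4096 * 3 = 12288 bytes.

```c
#define BYTES_PER_BLOCK 3

/* Block id (low 11 bits): touches the second and third byte of the block. */
unsigned block_id_at(const unsigned char *data, int block)
{
    const unsigned char *b = data + block * BYTES_PER_BLOCK;
    return ((unsigned)(b[1] & 0x07) << 8) | b[2];
}

/* 9-bit hit points (types 1 and 2): touches the first and second byte. */
unsigned hit_points_at(const unsigned char *data, int block)
{
    const unsigned char *b = data + block * BYTES_PER_BLOCK;
    return ((unsigned)(b[0] & 0x0F) << 5) | (unsigned)(b[1] >> 3);
}
```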
Also it would be a lot of wasted memory if only one block is placed in a chunk and a full array of whatever type is generated to hold that single block data.
Up to about 340 blocks, a HashMap would be a much more economical choice than an array of integers, with an access complexity of O(1) (with high probability, not in the worst case), even with the overhead of references, hashes and Integer autoboxing. In ships with a lot of interior there might be many chunks with only a few blocks, so this might save a lot of memory.
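The break-even point is just a division. In the sketch below the 48 bytes per map entry are a guess that roughly reproduces the figure of about 340 blocks. The real overhead depends on the JVM and its settings, so treat the number as an assumption. `break_even_blocks` is a made-up name.

```c
#include <stddef.h>

/* Largest number of stored blocks for which a map costing
   per_entry_bytes per block does not exceed a full array of array_bytes. */
size_t break_even_blocks(size_t array_bytes, size_t per_entry_bytes)
{
    return array_bytes / per_entry_bytes;
}
```

For an int[4096] array (4096 * 4 = 16384 bytes) and the assumed 48 bytes per entry, `break_even_blocks(4096 * 4, 48)` returns 341.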
Many thanks to all the brave readers that made it this far (it got a bit longer than I expected).