How to cure logic lag

    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    Right now my assumption is that you are using what equates to an interpreter for logic.
    You could write a converter that creates a Java file instead, then use event handlers to deal with things like buttons being pressed. You could deal with timing issues better and get rid of problems like the one you had with the docked beam system creating massive lag.

    In short, you would be creating a plugin that is stored with the ship's file.

    This shouldn't be tremendously difficult. You obviously already have the parser and interpreter part written.
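    A minimal sketch of what such a converter could look like, written in C++ to match the benchmark later in the thread. Everything here is hypothetical: the `Gate` layout, the `emit_java` name and the generated `update(boolean[] s)` method are assumptions made for illustration, not Starmade's actual data model or API. It also assumes the gates are already ordered so that every input is computed before it is used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical gate record: out = op(a, b); b is ignored for Not.
enum class Op { And, Or, Not };
struct Gate { Op op; std::size_t a; std::size_t b; std::size_t out; };

// Emits Java source text for a hypothetical method `void update(boolean[] s)`.
// Assumes `gates` is ordered so inputs are written before they are read.
std::string emit_java(const std::vector<Gate>& gates) {
    std::string src = "void update(boolean[] s) {\n";
    for (const Gate& g : gates) {
        const std::string a = "s[" + std::to_string(g.a) + "]";
        const std::string b = "s[" + std::to_string(g.b) + "]";
        std::string rhs;
        switch (g.op) {
            case Op::And: rhs = a + " && " + b; break;
            case Op::Or:  rhs = a + " || " + b; break;
            case Op::Not: rhs = "!" + a;        break;
        }
        src += "    s[" + std::to_string(g.out) + "] = " + rhs + ";\n";
    }
    return src + "}\n";
}
```

    The generated text would still have to be compiled and loaded by the game, which is the part the replies below argue about.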
     

    Olxinos

    French fry. Caution: very salty!
    Joined
    May 7, 2015
    Messages
    151
    Reaction score
    88
    I don't think it's a good idea. In fact, I think it's a very bad one, for the following reasons:
    - logic systems don't cause that much lag (except maybe if you want to build a 16+ bit calculator, but Starmade isn't supposed to be a computer simulator); there are far more important things to optimize first
    - this doesn't handle the partial destruction of logic systems well
    - this is overly complicated: you're suggesting to generate Java files, compile them, and "link" to them while the game is still running... I'm not even sure that'd lead to speedups (I'm however sure that, done badly, it'd lead to horrible slowdowns; I'm talking about two orders of magnitude here)
    - you underestimate the difficulty of writing a correct code generation tool (even in such a simple case, the mere fact that they have to prevent feedback loops from causing infinite execution loops, for instance, makes it prone to bugs), and overestimate the compiler's optimization opportunities (sure, you could try to generate good and relatively easily optimizable code, but now it really becomes a non-trivial issue)
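
    To make the feedback-loop point concrete, here is a minimal sketch of the check a code generator would need before emitting straight-line code: a depth-first search that reports whether any gate feeds back into itself. The `Circuit` representation and function names are assumptions for illustration, and every index in the circuit is assumed to be valid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: c[g] lists the gates whose outputs feed gate g.
using Circuit = std::vector<std::vector<std::size_t>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching a gate that is still InProgress means the
// circuit feeds back into itself.
static bool reaches_cycle(const Circuit& c, std::size_t g, std::vector<Mark>& mark) {
    if (mark[g] == Mark::InProgress) return true;
    if (mark[g] == Mark::Done) return false;
    mark[g] = Mark::InProgress;
    for (std::size_t in : c[g])
        if (reaches_cycle(c, in, mark)) return true;
    mark[g] = Mark::Done;
    return false;
}

// True if any gate depends, directly or indirectly, on its own output.
bool has_feedback_loop(const Circuit& c) {
    std::vector<Mark> mark(c.size(), Mark::Unvisited);
    for (std::size_t g = 0; g < c.size(); ++g)
        if (reaches_cycle(c, g, mark)) return true;
    return false;
}
```

    Detecting the loop is the easy half; a generator would still have to decide what to emit for circuits that contain one (clocks and latches are feedback loops by design).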

    In general, I think you make wrong assumptions about what is costly. For instance, since you mention "docked beams", I don't think the logic clocks were what caused lag in docked reactors... recurrent ship/beam collision checks were far more likely to blame. As for "timing issues", I'm not sure what you're talking about, so I'll give you the benefit of the doubt.
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    Apparently, according to the devs, it does create a lot of lag. It was the reason given for removing the beam transfer systems.
    The beam transfer itself is no different from any other weapon, but the logic that fires it continually creates lag.
    They also mentioned they didn't want to create small logic systems, so that people wouldn't try to attach every light and door in the ship to logic, because of the lag it would create.

    Which is why I just posted how they could actually solve that issue.
    I've written an OS from scratch, which included writing an assembler and a compiler to start with. In fact, the book that inspired me to try is "Developing Your Own 32-Bit Operating System".
     

    Nauvran

    Cake Build Server Official Button Presser
    Joined
    Jun 30, 2013
    Messages
    2,343
    Reaction score
    1,194
    • Master Builder Bronze
    • Competition Winner - Small Fleets
    • Legacy Citizen 10
    What kind of monstrosities are you making that you get lag from logic?
    With all the amazing logic creations I have seen on CBS (and NFDB), I have never had any real lag from the logic itself. If anything lagged, it was the amount of outputs hitting an entity, entity collision checks, or the sheer size of the logic entity, not the logic systems themselves.
    But getting some of the people that do a lot of logic tinkering in here would probably be a good idea.
     

    Valiant70

    That crazy cyborg
    Joined
    Oct 27, 2013
    Messages
    2,189
    Reaction score
    1,167
    • Thinking Positive
    • Purchased!
    • Legacy Citizen 4
    there are far more important things to optimize first
    ...Which is why this or something similar should be slated for somewhere down the line, not thrown out.

    this doesn't handle the partial destruction of logic systems well
    If each interlinked logic system is one file, just disable that file when one of the linked blocks is damaged. Cite power surges, etc. for scifi reasoning. Re-compile when the ship core reboots.

    - you underestimate the difficulty of writing a correct code generation tool (even in such a simple case, the mere fact that they have to prevent feedback loops from causing infinite execution loops, for instance, makes it prone to bugs), and overestimate the compiler's optimization opportunities (sure, you could try to generate good and relatively easily optimizable code, but now it really becomes a non-trivial issue)
    This is true, but depending on future logic features and the magnitude of the speedup, it might be worthwhile. In any case, running a single function with "IF this, that, and something else, THEN these things" is faster than the object-oriented way that I assume logic must use currently.

    - this is overly complicated: you're suggesting to generate Java files, compile them, and "link" to them while the game is still running... I'm not even sure that'd lead to speedups (I'm however sure that, done badly, it'd lead to horrible slowdowns; I'm talking about two orders of magnitude here)
    It would increase speed when it matters and slow down when it doesn't matter as much. I'm not very familiar with Java's multithreading and so forth yet, but I imagine you could prioritize what matters and let the compilation code do its thing in the background during a ship's rebooting process.

    I'm not sure how logic is done currently, but I would wager it's pretty inefficient and too closely tied to the blocks themselves. There might be a better solution, but who knows.
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    Personally, I haven't gotten any significant lag from logic that has ever hindered me. That was the devs' reason, and from reading posts over the last 2 years I would say they are correct. It probably does cause a bit of lag if you have several ships with large amounts of power systems running on logic.
    You are dead on, Valiant70.
    As for how much of a difference it will make:
    The magnitude of the speedup would be quite a bit. Right now they treat them as blocks, so each time the logic has to go through a parser and then be interpreted. If you are running it as native code, there is no need to go through a parser. Java's JIT compiler can be nearly as fast as C, sometimes faster, depending on the libraries used and what is being done.

    You are also correct that this isn't something I am saying needs to be done immediately; it is simply a suggestion for down the road.
     

    Olxinos

    French fry. Caution: very salty!
    Joined
    May 7, 2015
    Messages
    151
    Reaction score
    88
    Well, it's certainly impressive that you programmed an operating system and tool suite on your own. I get that you're trying to say "I've done it, and it wasn't that complicated"; that, however, doesn't mean this is an efficient or overall good solution.

    If logic really is that costly for Starmade (which I still find dubious; I'd appreciate a source), there are certainly simpler and/or better ways to optimize it. If worst comes to worst, they can use something like libjit (or another equivalent library) to achieve the same goal, with probably fewer pitfalls, by creating functions on the fly if Java's JIT compiler isn't smart enough to do it... (that's the "correct way" of doing it)

    ...but even then I think it's overkill and this raises the issue of recompilation overheads whenever a block is added or destroyed.
    I mean, there aren't that many things which can be saved by compiling lazily. More on that in a second.


    ...Which is why this or something similar should be slated for somewhere down the line, not thrown out.
    Point taken. Let's assume optimizing logic will matter later.


    If each interlinked logic system is one file, just disable that file when one of the linked blocks is damaged. Cite power surges, etc. for scifi reasoning. Re-compile when the ship core reboots.
    I doubt people would like that. That'd also need a ship reboot whenever you add or remove a logic block. Sure, you could say that while you build you use the old interpreter and recompile when you exit build mode, but then they'd have to maintain both systems. Not to mention that with that system, if you lose a single door button and you made the error of having some kind of global lock system, BAM, all your doors are broken until you reboot.

    This is true, but depending on future logic features and the magnitude of the speedup, it might be worthwhile. In any case, running a single function with "IF this, that, and something else, THEN these things" is faster than the object-oriented way that I assume logic must use currently.
    The problem is here. I have a good idea of the magnitude of the speedup because I have already implemented similar algorithms and optimizations myself... and this one is not worth the trouble.

    It would increase speed when it matters and slow down when it doesn't matter as much. I'm not very familiar with Java's multithreading and so forth yet, but I imagine you could prioritize what matters and let the compilation code do its thing in the background during a ship's rebooting process.
    Well, if we don't assume the logic simply shuts down until reboot whenever it's hit, like you did before, it's actually the opposite: you get speedups when it doesn't matter and slowdowns when it does. Think about it: the efficiency loss comes from recompilations, and recompilations would be caused by changes to the logic. Those changes are likely to happen in 2 situations, building or fighting, and fighting is probably the situation where you put the most stress on Starmade's engine (projectile/ship collisions, block updates, docked entities flying off...).
    If you shut down the logic upon a hit, sure, there won't be efficiency problems: you've turned off logic. But in that case, you don't actually need to optimize it...

    I'm not sure how logic is done currently, but I would wager it's pretty inefficient and too closely tied to the blocks themselves. There might be a better solution, but who knows.
    That might be right; I didn't read that part of their code, so I don't know. But to be frank, if it's not optimized, I don't think it's because they don't know how to do it, but because they don't need to do it.

    [...]As for how much of a difference it will make.
    The magnitude of the speedup would be quite a bit. Right now they treat them as blocks, so each time the logic has to go through a parser and then be interpreted. If you are running it as native code, there is no need to go through a parser. Java's JIT compiler can be nearly as fast as C, sometimes faster, depending on the libraries used and what is being done.
    You're misusing "parser": a parser is a tool which analyzes a sequence of tokens (and usually outputs an abstract syntax tree). You're right in that they're interpreting logic formulas, but that doesn't necessarily make it especially slow; it depends on the interpreter and on what is interpreted. I mean, in some sense, Starmade is interpreted too, by the JRE.
    As for the speedup, "quite a bit" would be about "twice as fast" at best (well, of course, if you also add other optimisations in the mix, and depending on what the current state of the code is, you might gain more, but precompiling alone shouldn't get you much more than that). There isn't that much to save by pre-compiling actually, mostly branch mispredictions caused by the logic operator selection (those are a bit expensive indeed, but that's about all you can save).
    Now if you factor in the already small time used to compute logic states, you'll probably get a 1 to 2% faster game? And that's without factoring in the overheads introduced by recompilations.
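
    A small sketch of what "operator selection" means here, under assumed names (`Gate`, `run_interpreted` and `run_precompiled` are made up for illustration). The interpreted version takes a data-dependent `switch` once per gate; the "precompiled" version is what a generator could emit for one fixed circuit (a half adder plus an inverted carry), with the dispatch gone but the circuit baked in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Op { And, Or, Xor, Not };

// One gate: s[out] = op(s[a], s[b]); b is ignored for Not.
struct Gate { Op op; std::size_t a; std::size_t b; std::size_t out; };

// Interpreted: the switch target depends on the data in `gates`, which is
// where the branch mispredictions mentioned above come from.
void run_interpreted(const std::vector<Gate>& gates, std::vector<bool>& s) {
    for (const Gate& g : gates) {
        const bool a = s[g.a];
        const bool b = s[g.b];
        bool r = false;
        switch (g.op) {
            case Op::And: r = a && b; break;
            case Op::Or:  r = a || b; break;
            case Op::Xor: r = a != b; break;
            case Op::Not: r = !a;     break;
        }
        s[g.out] = r;
    }
}

// "Precompiled": straight-line code for one specific circuit. No dispatch,
// but it must be regenerated whenever a block is added or destroyed.
void run_precompiled(std::vector<bool>& s) {
    const bool x = s[0];
    const bool y = s[1];
    const bool carry = x && y;
    s[2] = x != y;   // sum
    s[3] = carry;    // carry
    s[4] = !carry;   // inverted carry
}
```

    Both functions compute the same signals for that circuit; the only thing the second one saves is the per-gate dispatch and the indirection through `gates`.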

    Seriously, if you think logic is that slow, here's an example of an extra dirty toy interpreter which interprets "random code". I intentionally "deoptimized" it (a random permutation to cause cache misses, casts from int to bool and vice-versa which force comparisons to zero to normalize booleans, lazy operators where bitwise ones would have worked and certainly been more efficient, an indirection for variables too...). Sure, this computes garbage, but it is close enough to a simple bad logic interpreter to be representative imho. I doubt Starmade's system is slower than that (if it is, I'm worried for them).
    Code:
    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <cassert>
    #include <chrono>
    #include <random>
    #include <cstdlib>  // atol
    
    using instr_t = unsigned int;
    using addr_t = unsigned int;
    using size_t = std::size_t;
    using boolean_t = bool;
    
    using namespace std::chrono;
    using clk_t = high_resolution_clock;
    
    constexpr size_t OpCodeBits = 3;
    constexpr instr_t OpCodeMask = instr_t(~(~instr_t(0) << OpCodeBits));
    constexpr size_t AddrBits = 15;
    constexpr size_t EnvSize = 1 << AddrBits;
    
    enum op_t : instr_t {
        Mov,
        And,
        Or,
        Not,
        Xor,
        Nand,
        Nor,
        Nxor,
        LENGTH_
    };
    
    static_assert(LENGTH_ == 1 << OpCodeBits, "missing opcodes");
    
    int main(int argc, char* argv[]) {
        std::random_device seeder;
        std::default_random_engine rng(seeder());
        std::uniform_int_distribution<instr_t> dist;
      
        if(argc != 2) {
            std::cerr << "usage ./a.out n" << std::endl;
            return 1;
        }
        const size_t n = atol(argv[1]);
        std::vector<instr_t> random_code(n);
        for(instr_t& instruction : random_code)
            instruction = dist(rng);
        std::vector<std::size_t> permutation(n);
        for(size_t i = 0 ; i < n ; ++i)
            permutation[i] = i;
        std::shuffle(std::begin(permutation), std::end(permutation), rng);
        std::vector<boolean_t> environment(EnvSize);
        boolean_t eax = false;  // initialized so a leading Mov never reads an indeterminate value
      
        const auto start = clk_t::now();
        for(size_t idx : permutation) {
            const instr_t instruction = random_code[idx];
            const op_t opcode = static_cast<op_t>(instruction & OpCodeMask);
            const addr_t address[2] =
                {static_cast<addr_t>((random_code[idx] >> OpCodeBits) % EnvSize)
                ,static_cast<addr_t>
                    ((random_code[idx] >> (OpCodeBits + AddrBits)) % EnvSize)
                };
            std::vector<boolean_t>::reference op1 = environment[address[0]];
            const auto op2 = environment[address[1]];
            switch(opcode) {
                case Mov:
                    op1 = eax;
                    break;
                case And:
                    eax = op1 && op2;
                    break;
                case Or:
                    eax = op1 || op2;
                    break;
                case Not:
                    eax = !op1;
                    break;
                case Xor:
                    eax = (op1 && !op2) || (!op1 && op2);
                    break;
                case Nand:
                    eax = !(op1 && op2);
                    break;
                case Nor:
                    eax = !(op1 || op2);
                    break;
                case Nxor:
                    eax = (op1 && op2) || (!op1 && !op2);
                    break;
                default:
                    assert(false);
            }
        }
        const auto end = clk_t::now();
        auto us = duration_cast<microseconds>(end - start).count();
        std::cout << "Took " << (us / int(1e6)) << 's';
        us %= int(1e6);
        std::cout << (us / int(1e3)) << "ms";
        us %= int(1e3);
        std::cout << us << "us\n";
        std::cout << std::boolalpha << eax << std::endl;
        return 0;
    }
    Now, let's feed it a big enough n so that my permutation causes cache misses everywhere (I even compiled it without optimisations, clang 3.9):
    Code:
    user$ ./a.out 10
    Took 0s0ms1us
    true
    No...

    Code:
    user$ ./a.out 10000
    Took 0s0ms544us
    false
    user$ ./a.out 100000
    Took 0s5ms531us
    false
    Still not there yet...


    Code:
    user$ ./a.out 1000000
    Took 0s107ms451us
    true
    Ah?

    Code:
    user$ ./a.out 10000000
    Took 1s578ms938us
    false
    user$ ./a.out 100000000
    Took 17s509ms72us
    false
    Ah! Okay, so let's assume this is linear (reasonable enough); that gives us about 200 nanoseconds for one gate (17.5 s / 10^8 ≈ 175 ns, rounded up). I'd say most ships have about 2000-gate-evaluation circuits tops and will reevaluate their circuits about 10 times per second if they're using some fast rail clock. Let's say there are 25 such ships on the server. That means that each second, with those (pessimistic) numbers, Starmade would need to allocate 100ms to logic (200 ns × 2000 × 10 × 25). That's a lot, sure, but that's still not enough to warrant such a weird optimisation. I mean, in that case you'd only get a 5% speedup overall (halving those 100ms saves 50ms out of every second)...
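
    The estimate above is plain arithmetic, so it can be written down as a small sketch. The struct and function names are made up for illustration, and the inputs are the post's own pessimistic guesses (with the measured 17.5 s / 10^8 ≈ 175 ns rounded up to 200 ns), not measurements of Starmade.

```cpp
#include <cassert>

// Inputs of the back-of-envelope estimate; all values are guesses.
struct LogicLoad {
    double ns_per_gate;      // cost of one gate evaluation, in nanoseconds
    double gates_per_ship;   // gate evaluations per circuit update
    double updates_per_sec;  // circuit re-evaluations per second
    double ships;            // ships running such circuits
};

// Milliseconds of CPU time spent on logic per second of server time.
constexpr double logic_ms_per_second(const LogicLoad& l) {
    return l.ns_per_gate * l.gates_per_ship * l.updates_per_sec * l.ships / 1e6;
}

// Fraction of each second saved if logic alone becomes `factor` times faster.
constexpr double overall_saving(double logic_ms, double factor) {
    return (logic_ms - logic_ms / factor) / 1000.0;
}
```

    With `LogicLoad{200, 2000, 10, 25}` the first function gives 100 ms per second, and a 2x speedup of logic alone gives an overall saving of 0.05, i.e. the 5% quoted above. Plugging in the unrounded 175 ns only makes the saving smaller.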

    You are also correct that this isn't something I am saying needs to be done immediately; it is simply a suggestion for down the road.
    Ok with that.
     

    Ithirahad

    Arana'Aethi
    Joined
    Nov 14, 2013
    Messages
    4,150
    Reaction score
    1,329
    • Purchased!
    • Top Forum Contributor
    • Legacy Citizen 8
    So, it has to recompile and re-plug in code every time someone accidentally deletes and replaces a logic block? o_O
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    Criss brings up the issue here. He uses this term: denser logic = performance impact.
    Starmade just got 0wned...
    DukeofRealms specifically lists docked reactors in our discussion here as not being performance friendly.
    No kidding the power system is broke just realized why!
    Those are the two most recent. If you go back through the posts over the last year, you should be able to find more.
    But as you can see, they may use a term other than lag.
    The issue with docked reactors started getting more negative attention after I put in a bug report about them missing a tick in the firing function and the power or shield transfer.
    Here is where lancake and I found that bug. Can the devs give us an official ruling on PS transfer Not sure the bug report.

    When you go through logic, you still need to combine and separate functions. Even in the ladder logic compilers we have written, there is a parser.

    Java is actually designed to handle dynamic changes at run time; it is one of its greatest benefits. It doesn't compile the code to machine language until just before using it, so you can change it on the fly.
    But yes, you will probably see some performance loss, primarily because it will be pulling from outside the cache to grab the new code. That should still be really small compared to the cost of interpreting. But I could be wrong, and Java's supposedly biggest asset could be a lie.

    As for the performance difference, I think you will be surprised.
    Compiler vs. Interpreter
    They don't really give a figure on the site, but they have it correct that there are essentially three types of programs when looking at this: pure interpreters like BASIC, or whatever the logic on here is being handled by; compilers like C and C++; and finally ones like Java, which are a cross between the two. You can't call Java an interpreter, because it actually uses pre-compiled code to run each program. It doesn't simply interpret it and handle running the code under its own program.

    The speed difference can be a lot more than simply doubling the effective speed; try 10 or even 50 times faster. But that depends on how well the interpreter and the compiler are written. Well, in this case you really aren't writing a compiler, more of a cross compiler switching from one language to another. The actual compiler that Java uses is, as we know, extremely well written. Now consider that in Minecraft people have used redstone to build 16-bit computers. That is thousands of times more logic than has probably ever been put on any of these servers. I would say that is either a very, very well performing interpreter in comparison to this one, or they have done what I am suggesting and cross compiled it to Java to get that performance gain. No matter how you slice it, there is that much more performance that can be gotten out of logic under Java.
     

    Olxinos

    French fry. Caution: very salty!
    Joined
    May 7, 2015
    Messages
    151
    Reaction score
    88
    Criss brings up the issue here. he uses this term. Denser logic = performance impact.
    Starmade just got 0wned...
    DukeofRealms specifically lists docked reactors in our discussion here as not being performance friendly
    No kidding the power system is broke just realized why!
    Those are the two most recent. You can go back through the posts over the last year you should be able to find more.
    But as you can see they may use another term other than lagg.
    The issue with docked reactors started getting more negative after I put a bug report on them missing a tick in the firing function and power or shield transfer.
    Here is where lancake and I found that bug. Can the devs give us an official ruling on PS transfer Not sure the bug report.
    OK for Criss' quote (although I think he's referring to some players' wishes to have something as powerful as arbitrary Lua code in blocks). Duke's quote, however, doesn't mention logic (the cause of the docked reactors' inefficiency might just as well be something else). I'll look for older quotes later.

    When you go through logic you still need to combine and separate functions. Even the compilers we have written for ladder logic circuits have a parser in them.
    I don't get what you're trying to say here. Sure, if you write a circuit description language you'll need a parser to translate it into some kind of internal graph representation (even if it's just some kind of "ASCII art description", which is actually more annoying to do than a C-like language for circuit description, because its semantics may be intuitive but are hard to define formally). I still don't see how it applies to Starmade: they might also need to build some kind of logic graph representation internally, but it's not built by a parser (in fact their linking system certainly gives them the graph representation "for free").

    Java is actually designed to handle dynamic changes at run time. It is one of its greatest benefits. It doesn't compile the code to machine language until just before using it. You can change it on the fly.
    But yes, you will probably see some performance loss, primarily because it will be pulling from outside of cache to grab that new code and change. That should still be really small compared to the interpreting speed. But I could be wrong, and Java's supposedly biggest asset could be a lie.
    I don't know what you have in mind when you speak of "dynamic changes at runtime". If you're thinking about some kind of "hotlinking", this is neither specific to Java nor one of its greatest assets. If you're talking about JIT compilation, this is more of an argument against your proposal, since you're essentially trying to bypass Java's JIT compiler to perform some naive JIT compilation of your own which you think will give you juicy benefits. I don't get why you're telling me the second part though (the one about cache and performance). Maybe you misunderstood something I said, since I never said anything about Java's behaviour with cache or its overall efficiency.

    As for performance difference I think you will be surprised.
    Compiler vs. Interpreter
    The site doesn't really give a figure, but it is correct that there are essentially three types of programs when looking at this: pure interpreters like BASIC, or whatever is handling the logic here; compilers like C and C++; and finally ones like Java, which are a cross between the two. You can't call Java an interpreter, because it actually uses pre-compiled code to run each program. It doesn't simply interpret the code and run it under its own program.
    I know the difference between a compiler and an interpreter (and the link you gave isn't extremely informative, it's pretty obvious stuff). I also know compiling a program can lead to significant speedups, but not on such a simple problem. A compiler isn't a magical program which squeezes every last drop of inefficiency out of a program for free. A compiler is a program which tries its best to efficiently translate a high-level language into a lower-level language, and it usually does a pretty good job, but it's nothing more than a clever translator. You can't expect significant improvements from taking such a simple language, mapping it into a very high-level language (Java) and mapping that back to a low-level language (Java bytecode) which will then be JIT-compiled to machine code, even if the Java compiler is maintained by a group of experts… that sounds as absurd to me as "optimizing" an algorithm for a problem in P by reducing it to an equivalent SAT instance and feeding it to a SAT solver (yes, if you had a very bad algorithm you may see improvements, there are a lot of expertly written SAT solvers out there after all, but still… wtf).
    The JRE is an interpreter in some respects: it translates a higher-level language (Java bytecode) to a lower-level language (machine code) on the fly and executes it. Sure, it does it in a clever way and it'd be more accurate to talk about JIT compilation, but a JIT compiler is still a clever and glorified interpreter.

    The speed difference can be a lot more than simply doubling the effective speed; try 10 or even 50 times faster. But that depends on how well the interpreter and the compiler are written. Well, in this case you really aren't writing a compiler, more of a cross compiler switching from one language to another. The actual compiler that Java uses is, as we know, extremely well written. Now consider that in Minecraft people have used redstone to build 16-bit computers. That is thousands of times more logic than has probably ever been put on any of these servers. I would say that is either a very, very well performing interpreter in comparison to this one, or they have done what I am suggesting and cross compiled it to Java to get that performance gain. No matter how you slice it, there is that much more performance that can be gotten out of logic under Java.
    Ok, maybe I was a bit stingy when I said twice as fast. Maybe it'll be something like thrice as fast if you have a good special-purpose logic formula JIT-compiler, but you'll never reach speedups as great as 10x or 50x for pre-compiling something as simple as that. You can certainly get those speedups in some cases, for instance when you compare naively interpreted very high level languages with compiled ones (and even then sometimes there's just nothing to do), but Starmade isn't interpreting a high level language here, it runs an extension of the circuit value problem.
    As I said, compilers aren't magic: if there isn't a lot of room for improvement, you won't get a lot of improvement, period.
    In that case, the main improvement which could only be achieved by pre-compilation is cutting out some conditional structures (if/switch/whatever mechanism they use for dynamic call dispatch, depending on how it's made), which certainly isn't negligible but isn't enough to give you such incredible speedups. In fact, if Java's JIT compiler is clever enough, it might even be able to optimize most of those away already.
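    To make the "conditional structures" point concrete, here is a minimal sketch of what interpreted gate evaluation could look like, next to what generated code for one fixed gate might reduce to. Everything in it (`GateInterpreter`, `GateType`, `evaluate`, `compiledAnd2`) is a hypothetical name made up for illustration, and the NOT semantics are an assumption. It is not StarMade's actual code.

```java
import java.util.List;

// Hypothetical sketch, NOT StarMade's real implementation.
final class GateInterpreter {
    enum GateType { AND, OR, NOT }

    // "Interpreted" evaluation: the gate type is looked up at run time,
    // so every evaluation pays for the switch dispatch and the input loop.
    static boolean evaluate(GateType type, List<Boolean> inputs) {
        switch (type) {
            case AND:
                for (boolean in : inputs) if (!in) return false;
                return true;
            case OR:
                for (boolean in : inputs) if (in) return true;
                return false;
            case NOT:
                // Assumption: a NOT block is ON when none of its inputs is ON.
                for (boolean in : inputs) if (in) return false;
                return true;
            default:
                throw new IllegalArgumentException("unknown gate " + type);
        }
    }

    // What generated code for one specific two-input AND could boil down to:
    // the dispatch and the loop are gone, but the actual work is the same.
    static boolean compiledAnd2(boolean a, boolean b) {
        return a && b;
    }
}
```

    The only thing the "compiled" variant saves is the dispatch and loop overhead around a handful of boolean operations, which is the kind of saving a JIT compiler may already get part of the way to on its own.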
    There aren't many other things to do. You can't really rewrite the formulas, because Starmade needs all intermediate results to update the blocks' textures (and this wouldn't require such a complicated process anyway). Maybe you can avoid pushing/popping a couple of bytes on and off the stack… maybe you could inline a couple of calls too… but that alone won't give you significant improvements, not to mention that similar improvements may actually be performed to a lesser extent by Java's JIT compiler (I don't know the specifics so I can't be sure, but it could be done).
    And, again, you're overestimating the performance hit of simulating logic circuits. You don't need such a weird "optimisation" to simulate a 16-bit computer… heck, I remember a CS project where I had to write the netlist of a minimalist 32-bit processor + an interpreter parsing and simulating a netlist file (allowing for some basic I/O) + a simple assembly program running on that processor (some kind of watch program outputting signals for an arrangement of 7-segment displays of our choice). I botched it in 2 days (and nights, let's be honest) at the very last moment and yet it still ran smoothly. (To be fair, theoretically we were supposed to do it in groups of two to four… but obviously we were all so busy with our other projects and exams that I ended up doing it alone, except for the assembly program that I asked another member to do, so I haven't really done everything alone… I still had to debug the assembly program afterwards though x'])
    Anyway, here's the point: a licence ("undergraduate" in the English system, I guess) student can write in 2 days a netlist simulator able to smoothly simulate a minimalist 32-bit processor (granted, with a pretty low clock speed, something like 100kHz~10MHz, but this is more than enough for games like Starmade or Minecraft). You don't need extremely optimized code to do that.
    You're also misusing "cross-compilation" (cross-compilation is compiling for another platform/architecture than the one you're currently compiling on) but I get what you mean, and I highly doubt Minecraft is using that kind of hackery.

    I'm not saying logic mustn't be optimized, but not now and certainly not that way. The speedup wouldn't be nearly as high as you imagine, it would be hell to maintain, and it would cause other annoying issues which aren't worth the trouble. If the time comes when logic is so time-consuming that it absolutely needs to be optimized, there are certainly other options to consider first which will give better benefits for less effort.
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    I appreciate your time and effort replying, so please understand when I say this: the following statement isn't about anything you said or did.
    I won't be offering this company or group any further information. If they want information on programming from this point forward, they can hire someone, look it up in a book, or simply try it and find out. It is no longer my concern. I'm not going to go into why that is or point fingers.
    I didn't want to simply not reply. Best of luck.
     

    Criss

    Social Media Director
    Joined
    Jun 25, 2013
    Messages
    2,187
    Reaction score
    1,772
    • Master Builder Bronze
    • Video Genius
    • Competition Winner - Stations
    Denser logic = performance impact.
    Hello. I want to clear something up. I will preface this by saying I am not knowledgeable on the technical side of things. I couldn't tell you the first thing about how our game works. So I will not be getting into specifics.

    With the suggestion to include denser logic, there often follow suggestions for denser detailing and block sizes. Logic on its own is not really performance heavy. We have optimized it quite a bit over time, and structures that took minutes to prepare logic systems are now ready to work immediately after loading.

    But as I said, these suggestions often come with denser everything-else. Players want smaller blocks for detailing like in Skywanderers. Currently, Saber is not willing to create furniture or the myriad of assets for those details as they are not a high priority.

    We aim to be consistent. If we have small logic blocks, why not small detailing blocks, smaller rails, smaller doors, etc. It would be great, sure, but as it currently stands, the game is not built for that. StarMade can render millions of blocks at the high end, but if we are condensing shapes, our smaller ships will have the performance impact of a kilometer long hull. Players will detail their builds to that degree. I would!

    If there is a solution for condensed logic, I would like to work with it but that is likely as far as we will go when it comes to smaller shapes.
     
    Joined
    Sep 30, 2013
    Messages
    23
    Reaction score
    6
    I do agree with this. Making a Java compiler would make the logic system much more streamlined and cut a great portion of the lag created by complex interconnected systems.

    On the issue of combat damage, if the compiled logic was attached to a file inside the ship's blueprint with metadata that links it to the logic blocks, then the system could stop running the compiled version if any one of the blocks becomes damaged.

    However, this would become problematic if you made a highly complex system to run smaller logic groups elsewhere on your ship. If one system goes down, all systems would go down.
    So it is a tough call on how that should be handled.
    You could add a tag to each of the functions in the generated Java code, something along the lines of: If(2,3,0= 1)and(2,3,1!=damaged). But that might cause just as much lag as the current system, as it doubles or even triples the number of IF statements, especially in the case of xnor and xnand.
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    How exactly will the compiled code work? Will it [at its core] be a function that takes the current state of all gates in a circuit, and returns the subsequent state of all gates in that circuit?
    If so, it will likely create more lag than the current system, especially when idle [assuming the circuit does not change, so we can ignore any compilation overhead]. Let me explain why I believe so.
    The current system [if it was not changed since my last experiments] is event based:
    When a logic block changes state [either due to a manual change, or due to input], it sends an update event to all blocks linked to it.
    When a (non-delay) logic block receives an update event, it evaluates its inputs, setting its state accordingly. If this results in its state changing, it will send an update event to all blocks linked to it.
    When a delay block receives an update event, it waits 0.5 seconds, and then evaluates its state according to the state of its inputs after that time has passed. Note that when it changes state, the update event it sends will NOT reference the update event that caused it to evaluate its input. This allows delay blocks to avoid the recursion limit that prevents delayless unstable looped circuits from causing full load.
    This has the advantage that when no state changes occur, there is absolutely no load to run the logic [rendering it is a different story, but as rendering is not addressed here we can ignore it]. And when a state change does occur, only the blocks that would potentially change state are evaluated, reducing the number of unnecessary evaluations drastically.
    This is unlike the compiled version, which would evaluate everything every X ms [assuming a tick based system].
    The use of an event based system as described can be verified experimentally by checking for the edge cases and race conditions it will have.
    E.g. the following circuit:
    A1→AND1→NOT1→NOT2
    A2→AND1
    Set it up such that in the initial state, all gates and activators are off. [If placed NOTs default to ON on placement, connect them to an activator, activate said activator. Then hook the NOTs up, and destroy the activator without deactivating it.(last I checked destroyed blocks won't send an update OFF event)]
    Now activate A1. If the system is event based as I described, AND1 won't change state, and thus the event won't ripple down further.
    Now activate A2. If the system is event based as I described, AND1 will change state, thus updating NOT1. As NOT1 was already OFF, however, it won't change state, leaving NOT2 un-updated at OFF.
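    The event-based scheme and this first experiment can be sketched as follows. This is only a model of the behaviour described above, under assumptions: all names (`EventLogic`, `Block`, `set`, `link`) are hypothetical, gates read their inputs' current states directly for brevity, delay blocks are left out, and the step budget is an arbitrary stand-in for the recursion limit. It is not StarMade's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the event-based scheme; NOT StarMade's real code.
final class EventLogic {
    enum Type { ACTIVATOR, AND, NOT }

    static final class Block {
        final String name;
        final Type type;
        boolean on;                                    // current state, OFF by default
        final List<Block> inputs = new ArrayList<>();
        final List<Block> outputs = new ArrayList<>();
        Block(String name, Type type) { this.name = name; this.type = type; }
    }

    int evaluations = 0;                               // counts gate evaluations performed

    static void link(Block from, Block to) {
        from.outputs.add(to);
        to.inputs.add(from);
    }

    // Manual state change, e.g. a player toggling an activator.
    void set(Block block, boolean value) {
        if (block.on == value) return;                 // nothing changed: no events at all
        block.on = value;
        Deque<Block> pending = new ArrayDeque<>(block.outputs);
        int budget = 10_000;                           // arbitrary cap (assumed recursion limit)
        while (!pending.isEmpty() && budget-- > 0) {
            Block b = pending.poll();
            evaluations++;
            boolean next = compute(b);
            if (next != b.on) {                        // only a state change ripples further
                b.on = next;
                pending.addAll(b.outputs);
            }
        }
    }

    private static boolean compute(Block b) {
        switch (b.type) {
            case AND:
                for (Block in : b.inputs) if (!in.on) return false;
                return !b.inputs.isEmpty();
            case NOT:
                for (Block in : b.inputs) if (in.on) return false;
                return true;
            default:                                   // activator driven by another block
                for (Block in : b.inputs) if (in.on) return true;
                return false;
        }
    }
}
```

    With A1→AND1→NOT1→NOT2 and A2→AND1 all starting OFF, activating A1 costs one evaluation (AND1 stays OFF), and activating A2 costs two more (AND1 turns ON, NOT1 is evaluated but stays OFF). NOT2 is never evaluated and stays OFF, even though a full re-evaluation of the circuit would turn it ON.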

    Secondly a race condition:
    A1→NOT1→AND1→FLIPFLOP1
    A1→AND1
    NOT1→AND2→FLIPFLOP2
    A1→A2→A3→AND2
    Set it up such that in the initial state, all gates and activators, except for the NOT, are OFF.
    Activate A1. If the system is event based as I described, the following will happen:
    1. NOT1(A1,ON), AND1(A1,ON), A2(A1,ON) will receive update events. Regardless of the order they are evaluated in, AND1 will be evaluated before the update event from NOT1 reaches it, thus all inputs for AND1 will briefly be ON. This will take much less than a frame, so the only way to really notice is by introducing a FLIPFLOP.
    2. AND1(NOT1,OFF), FLIPFLOP1(AND1,ON), AND2(NOT1,OFF), A3(A2,ON) will receive update events. This will cause AND1 to turn off again, while the flipflop changes state.
    3. FLIPFLOP1(AND1,OFF), AND2(A3,ON) will receive update events. This will cause no further state changes.
    Now deactivate A1. If the system is event based as I described, the following will happen:
    1. NOT1(A1,OFF), AND1(A1,OFF), A2(A1,OFF) will receive update events, and thus change state accordingly, if necessary. AND1 is already off, so no event will be fired from it.
    2. AND1(NOT1,ON), A3(A2,OFF), AND2(NOT1,ON) will receive update events. As A3's state was ON, all inputs for AND2 will briefly be ON, as described for AND1 after A1 was activated.
    3. AND2(A3,OFF), FLIPFLOP2(AND2,ON) will receive update events. AND2 will thus turn OFF, and FLIPFLOP2 will change state.
    4. FLIPFLOP2(AND2,OFF) will receive an update event, and change state accordingly.
    You should be able to construct more edge cases and race conditions. Granted, not all are unique to the event based system I described, but the more tests you run, the more you will be able to narrow it down.
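    The race condition above can be reproduced in a small model. The sketch below assumes that each update event carries the sender's new value (matching the TARGET(SOURCE,VALUE) notation used in the steps), that a gate remembers the last value it received from each input, that events are processed in FIFO order, and that a flipflop toggles whenever it receives an ON event. All names (`GlitchDemo`, `Block`, `Event`) are hypothetical. It is not StarMade's actual code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the described race condition; NOT StarMade's real code.
final class GlitchDemo {
    enum Type { ACTIVATOR, AND, NOT, FLIPFLOP }

    static final class Block {
        final Type type;
        boolean on;
        final Map<Block, Boolean> seen = new HashMap<>(); // last value received per input
        final List<Block> outputs = new ArrayList<>();
        Block(Type type, boolean on) { this.type = type; this.on = on; }
    }

    // One update event: "target, your input 'source' now has 'value'".
    private static final class Event {
        final Block target;
        final Block source;
        final boolean value;
        Event(Block target, Block source, boolean value) {
            this.target = target; this.source = source; this.value = value;
        }
    }

    // Link after the initial states are set: the target starts out
    // "seeing" the source's initial state.
    static void link(Block from, Block to) {
        from.outputs.add(to);
        to.seen.put(from, from.on);
    }

    static void set(Block activator, boolean value) {
        if (activator.on == value) return;
        activator.on = value;
        Deque<Event> queue = new ArrayDeque<>();
        emit(queue, activator);
        while (!queue.isEmpty()) {                     // FIFO: events are handled wave by wave
            Event e = queue.poll();
            Block b = e.target;
            b.seen.put(e.source, e.value);
            boolean next;
            switch (b.type) {
                case AND:      next = !b.seen.containsValue(false); break;
                case NOT:      next = !b.seen.containsValue(true);  break;
                case FLIPFLOP: next = e.value ? !b.on : b.on;       break; // toggle on ON
                default:       next = b.seen.containsValue(true);   break; // chained activator
            }
            if (next != b.on) {
                b.on = next;
                emit(queue, b);
            }
        }
    }

    private static void emit(Deque<Event> queue, Block source) {
        for (Block out : source.outputs) {
            queue.add(new Event(out, source, source.on));
        }
    }
}
```

    In this model AND1 computes "A1 AND NOT A1", which is never true in a settled circuit, yet activating A1 flips FLIPFLOP1, and deactivating A1 flips FLIPFLOP2 through the delayed A1→A2→A3 path, as in the step lists above.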
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    I'm not going to provide new information but since I already answered this concern I will elaborate a little.
    Remember in the original post where I mentioned the event handler being used to handle stuff like when a button is pressed. Well, you use that same system for state changes to blocks. By using an event handler, if nothing changes then no code is run. It is basically the same thing as an event handler in a game for the keyboard. If no one pushes a button, no one releases a button, no one breaks a block, ... no code related to an event is run. You use a timer to determine your 0.5 seconds, if that is the interval you want to use. On each loop you compare the last time against the current time, which is done by looking at CPU ticks. Once the difference exceeds 0.5 seconds, you run an update of the states. Blocks that have no delay would simply update when the logic leading to them does. There is always a state block prior to no-delay blocks.
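    A minimal sketch of the timer check described here, assuming a game loop that calls `tick` once per iteration. The names `DelayScheduler`, `schedule` and `tick` are hypothetical, and the time is passed in explicitly (in a real loop it could come from `System.nanoTime()`). This illustrates the idea only and is not the game's code.

```java
import java.util.PriorityQueue;

// Hypothetical sketch of the "check the clock each loop" approach.
final class DelayScheduler {
    static final long DELAY_NANOS = 500_000_000L;      // 0.5 seconds

    private static final class Pending implements Comparable<Pending> {
        final long dueNanos;
        final Runnable update;
        Pending(long dueNanos, Runnable update) {
            this.dueNanos = dueNanos;
            this.update = update;
        }
        @Override
        public int compareTo(Pending other) {
            return Long.compare(dueNanos, other.dueNanos);
        }
    }

    private final PriorityQueue<Pending> queue = new PriorityQueue<>();

    // Called when a delay block receives an event at time nowNanos.
    void schedule(long nowNanos, Runnable update) {
        queue.add(new Pending(nowNanos + DELAY_NANOS, update));
    }

    // Called once per game loop; runs every update whose 0.5 s has elapsed
    // and returns how many were run. With an empty queue it does no work
    // beyond one emptiness check.
    int tick(long nowNanos) {
        int ran = 0;
        while (!queue.isEmpty() && queue.peek().dueNanos <= nowNanos) {
            queue.poll().update.run();
            ran++;
        }
        return ran;
    }
}
```

    The per-loop cost when nothing is pending is a single check, which is what keeps an idle circuit cheap whether the gate logic itself is interpreted or compiled.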

    An event based system has to exist for both compiled and script based systems. Both require it. The difference is that when a compiled system reaches an event it runs its own code, while a script system that gets an event mimics the script in its own program.
    I posted this link above: Compiler vs. Interpreter
    It shows the difference in how compilers, interpreters, and byte code systems work.
    The big thing it seems to miss out on is how these programs run on a CPU.
    A compiled program can make heavy use of L1 and L2 cache. Java, while a byte code system, is still a compiler. Byte code is translated to machine code and can, depending on the program and optimization, run as fast as C. In short, it can make heavy use of L1 and L2 cache.
    An interpreter makes very poor use of L1 and L2 cache. It generates cache misses because it pulls the program code from outside the cache, and even if it could pull the script entirely into the cache, the program is more complex because it is doing the interpreting on top of running the simulated functions of the script.

    If you wonder how much difference it makes for a program to run optimized in L1 and L2 versus not: I have a program that is now at the point where it generates 40,000 rooms and connects them all together with paths. Without this optimization it takes about 7 minutes to run. Optimized, it takes 0.98 seconds on average. That is a speed improvement of roughly 430 times (420 s / 0.98 s). Are you going to get a speed improvement that large here? Probably not, but I wouldn't be surprised to see 50 or more.

    As for my program that was cut from 7 minutes down to less than a second, understand that the 7 minute version was already massively optimized. Functions that generally grow at an exponential rate, such as the number of paths to check, the potential room positions and the collision detection, had all been reduced to a fixed length linear cost. Inline systems were already specified, and so on. Before I did that it would take hours to complete. At one point, when I was using pure random room space selection, it would only get to 30% capacity and stop progressing because of the number of collisions. The reason I am discussing this is so you understand that by the point I moved on to optimizing it to run in the cache, the program had already been optimized a hell of a lot.
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    So the intent of the suggestion is not to change the code of logic evaluation, but simply to compile to machine code instead of Java bytecode? (Doing so will not actually require compilation during runtime.)
    If so, yes, there will be performance benefits, but since logic stands still the majority of the time, I suspect that it won't make much of an impact.
    Even with 100 times better performance on the logic, if only 0.1% of processing time was dedicated to logic beforehand, the overall improvement would only be 0.099%.
    Granted, I don't know how much of the overall processing time for StarMade is used by logic, and I also don't know how one could currently measure that, as one can't simply disable logic evaluation without also removing the logic blocks from all entities, thus also reducing the amount of processing time needed elsewhere and skewing the result. It would be a good metric to know though.
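    The 0.099% figure above is an application of Amdahl's law, and it can be checked with a few lines. The class and method names are made up for this example, and the 0.1% share is the same hypothetical figure as in the text, not a measurement.

```java
// Worked check of the estimate above (Amdahl's law); names are illustrative only.
final class Amdahl {
    // Fraction of total run time saved when a part that takes
    // 'fraction' of the time becomes 'speedup' times faster.
    static double timeSaved(double fraction, double speedup) {
        return fraction - fraction / speedup;
    }

    // Resulting overall speedup of the whole program.
    static double overallSpeedup(double fraction, double speedup) {
        return 1.0 / (1.0 - fraction + fraction / speedup);
    }

    public static void main(String[] args) {
        // Logic takes 0.1% of the time and becomes 100x faster.
        System.out.printf("time saved: %.3f%%%n", 100 * timeSaved(0.001, 100));
        System.out.printf("overall speedup: %.5fx%n", overallSpeedup(0.001, 100));
    }
}
```

    Even an infinitely fast logic system could save at most the 0.1% it used to take, which is why knowing the real share of processing time matters more than the size of the speedup.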
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    The simplest way to look at it is that you are making a jar file out of the logic. That jar file would be stored with the entity it belongs to. Each circuit would be like a small add-on module to the game, run when the game needs it or when the ship is loaded. So it would be bytecode. But going from bytecode to machine code is vastly faster than interpreting the logic on the fly, and the result can be nearly as fast as running machine code itself. Consider it this way: one process could handle the conversion from bytecode to machine code and another could then run it, so there would be effectively zero time spent waiting on the conversion.

    This really isn't something players need to worry about. The suggestion was for the programmers; they are the only ones who can measure it or implement it. If users want to learn from it, great; otherwise there isn't much else they can do with it.

    To understand how large the impact could be, consider this. The game performs all sorts of tests to display millions of blocks, and it does so several times a second. The GPU handles most of the display work, but the game still has to provide information to the GPU, such as when a ship moves or turns. If it currently takes 0.1 seconds to handle all the logic in the game and they cut that to 0.001 seconds, how much more could they handle? At 30 program cycles a second, that would free up about 3 cycles' worth of time.
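    As a rough check of that estimate, here is a small sketch of the calculation. It assumes the 0.1 s and 0.001 s figures are logic time spent per second of play, which the post does not state, and the function name is made up for illustration. With those numbers the saved 0.099 s works out to 2.97 cycles, so roughly 3.

```cpp
#include <cassert>
#include <cmath>

// Number of cycles' worth of time freed per second when the logic cost
// drops from `secondsBefore` to `secondsAfter` (both per second of play).
double cyclesFreed(double secondsBefore, double secondsAfter, double cyclesPerSecond) {
    double cycleLength = 1.0 / cyclesPerSecond;   // seconds per cycle
    return (secondsBefore - secondsAfter) / cycleLength;
}

// Checks the numbers used in the post: 0.1 s cut to 0.001 s at 30 cycles/s.
void checkCycleEstimate() {
    assert(std::fabs(cyclesFreed(0.1, 0.001, 30.0) - 2.97) < 1e-9);
}
```

    Whether the logic really costs anything close to 0.1 s is exactly the figure that would have to be measured first.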
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    Thank you for clarifying the intent of the suggestion for me.
    However, I still fail to see what benefits packing the logic evaluation as a program with the blueprint/entity provides, aside from those provided by compiler optimizations and by running machine code directly (both of which could be achieved without packing the logic with the blueprint).
    Assuming a player would still be able to see the logic blocks change their state accordingly, what benefits does packing the logic with the blueprint provide? Because unnecessary or inefficient intermediate results must still be displayed, it would not allow optimizing those away, and the optimizations provided by compiling and running machine code don't require that packing in order to be applied.

    And removing the visible state changes of internal logic blocks would remove a debug/analysis feature.
     
    Joined
    Dec 14, 2014
    Messages
    745
    Reaction score
    158
    • Community Content - Bronze 1
    • Purchased!
    • Legacy Citizen 2
    They could put all the logic for all the ships and all the bases in one file. That would mean, though, that each time they wanted to make a change to any object they would have to take the file offline and change it, and it would affect all of them at once. Packing it and putting it in the entity's directory means that when a change needs to take place, only the one entity is affected. To the user, the only noticeable difference would be that they could run a lot more logic, or the devs could use the freed-up processing power for something like the new planets.
     
    Joined
    Jul 21, 2013
    Messages
    2,932
    Reaction score
    460
    • Hardware Store
    Oh, my question was more along the lines of:
    What is the major performance difference between the following C-ish pseudocode snippets (assuming both are compiled with the same compiler)?
    Code:
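    // AND, OR, etc. are values of the enum gatetypeenum
    template<gatetypeenum type> inline bool evalStatusOnUpdate(bool newstate, int activeInputCount, int connectedInputCount) {
        switch(type) {
            case AND: return activeInputCount==connectedInputCount; // on only when every connected input is active
            case OR: return activeInputCount>0; // on when at least one input is active
            //...
            default: error(); return false; // unknown gate type
        }
    }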
    Code:
    inline bool evalStatusOnUpdate(bool newstate, int activeInputCount) {
        return activeInputCount==5; // at compile time, the gate type is known to be AND, and connectedInputCount is known to be 5
    }
    Specifically, the former represents general logic evaluation code, which does not need to be shipped with blueprints because it is identical for all of them (despite being a template, the number of concrete specializations is sufficiently limited), while the latter represents a specifically compiled instance of logic evaluation code for a part of a circuit consisting of an AND gate with 5 inputs.

    I simply don't see any sufficiently significant performance gain that can be achieved by compiling for each specific scenario on demand, as opposed to compiling a general evaluator once and having it work for all.
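    For anyone who wants to try the comparison, here is a self-contained sketch of the two variants. The names GateType, evalGeneral and evalAnd5 are made up for this example and are not from the game's code. It only shows that both evaluators give the same answers, and that a compiler which knows the gate type and input count can evaluate the general version at compile time (the static_assert lines). It does not measure any speed difference, and it needs C++14 or later because of the switch inside a constexpr function.

```cpp
#include <cassert>

enum class GateType { And, Or };

// General evaluator: one function shared by every gate in every circuit.
constexpr bool evalGeneral(GateType type, int activeInputCount, int connectedInputCount) {
    switch (type) {
        case GateType::And: return activeInputCount == connectedInputCount;
        case GateType::Or:  return activeInputCount > 0;
    }
    return false; // not reached for the gate types listed above
}

// Circuit-specific evaluator: an AND gate with its 5 inputs baked in.
constexpr bool evalAnd5(int activeInputCount) {
    return activeInputCount == 5;
}

// With constant arguments, the general version is evaluated at compile time.
static_assert(evalGeneral(GateType::And, 5, 5) == evalAnd5(5), "AND with all inputs active");
static_assert(evalGeneral(GateType::And, 4, 5) == evalAnd5(4), "AND with one input inactive");

// Runtime check that both versions agree for every possible input count.
void checkAgreement() {
    for (int active = 0; active <= 5; ++active) {
        assert(evalGeneral(GateType::And, active, 5) == evalAnd5(active));
    }
}
```

    The general function costs one extra branch on the gate type per update when the type is not known at compile time, which is the kind of difference being asked about.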

    Additionally, although this is beside the main point, packing executable code for the logic with a blueprint poses a security risk. A malicious player could make a blueprint and replace the compiled logic code with malware. When the CPU directly executes code provided by an outside source, maintaining sufficient security becomes very hard. Granted, this is not an issue in singleplayer unless the player downloads other people's blueprints, but StarMade features multiplayer.