RegEx compilation thread

    Fellow Starmadian

    Oh cool so thats what this is
    Joined
    Jun 7, 2014
    Messages
    227
    Reaction score
    87
    • Community Content - Bronze 1
    • Wired for Logic
    • Legacy Citizen 2
    Don't. Use. Spaces. It will drive you cray cray. Use something else, like a semicolon (;). It isn't part of the regex syntax, so it's as easy to use as a space, but MUCH easier to see. I put a notice on the main post, but basically it's better in the long run to do this.

    My string parser (yes, it's called a string parser) splits a sentence across a bunch of different display blocks. I'm working on one now that, instead of having a predetermined length, keeps going until there is nothing left to separate. The current one will work for your purposes; you just need to extend the instant pulse chain. I guess I should have made a template for that... lol. In a few hours I'll hopefully have my dynamic-length string parser done.
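The dynamic-length parser itself isn't posted here, so this is only a rough Python sketch of the idea, assuming semicolon-delimited words as recommended above: keep peeling the first word off the front (roughly what a REPLACEFIRST on a display block would do) until nothing is left. `split_words` is a hypothetical name.

```python
import re

# The first word (everything up to the first ";") plus that ";" if present.
FIRST_WORD = re.compile(r"^([^;]*);?")


def split_words(sentence: str) -> list:
    """Peel words off the front of a ;-delimited string until nothing is left."""
    words = []
    rest = sentence
    while rest:
        match = FIRST_WORD.match(rest)  # always matches and consumes at least 1 char
        if match.group(1):  # skip empty chunks from doubled semicolons
            words.append(match.group(1))
        rest = rest[match.end():]
    return words


print(split_words("spool;warp;coil"))  # ['spool', 'warp', 'coil']
```

In plain Python, `sentence.split(";")` does the same job; the loop only mirrors the one-word-at-a-time display-block approach.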

    Also, the point of the thread was to put all the regex code at the top of the thread, so look there and you will find what you seek :P

    edit: switched to tahoma font.
    edit2: moar explainin!

    I invite people to discuss this with me; I'm quite willing to retract my "semicolons are better than spaces" statement if proven wrong. All I know is that we need to decide on a format that is compatible with other people's projects, not for collaboration, but for easy understanding.
     
    Joined
    Jul 30, 2013
    Messages
    398
    Reaction score
    282
    • Wired for Logic Gold
    • Legacy Citizen 8
    • Purchased!
    For my Nasometer, I needed to increment in decimal and in hexadecimal, so I decided to figure out Snapems' counter. TBH, modifying it is p. easy.

    As a base, the original is
    [REPLACEALL]\b([0-9]+)\b[WITH]0$1~01234567890
    [REPLACEALL]\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*[WITH]$2
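To see what the two passes do, here is a Python sketch that runs the same patterns with `re.sub`. It assumes StarMade's `[REPLACEALL]pattern[WITH]replacement` behaves like an ordinary replace-all (the game runs on Java, so presumably `java.util.regex`); Python writes `$1`/`$2` as `\g<1>`/`\g<2>` and, like Java, substitutes an empty string for a group that didn't match. Pass 1 turns `129` into `0129~01234567890`; pass 2 then looks each digit up in the ring after the `~`.

```python
import re

# Pass 1: pad each number with a leading 0 and append "~" plus the digit ring.
PASS1 = (r"\b([0-9]+)\b", r"0\g<1>~01234567890")
# Pass 2, three alternatives:
#   \b0(?!9*~)         drop the padding 0 unless every digit after it is a 9
#   ([0-9])(?=9*~...)  a digit followed only by 9s (or nothing) before "~" is
#                      replaced by the digit that follows it in the ring
#   ~[0-9]*            delete the "~ring" suffix
PASS2 = (r"\b0(?!9*~)|([0-9])(?=9*~[0-9]*?\1([0-9]))|~[0-9]*", r"\g<2>")


def increment(text: str) -> str:
    """Add 1 to every decimal number in text, using the two REPLACEALL passes."""
    for pattern, replacement in (PASS1, PASS2):
        text = re.sub(pattern, replacement, text)
    return text


print(increment("129"))  # 130
print(increment("99"))  # 100
print(increment("count 41 and 9"))  # count 42 and 10
```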

    My modified hexadecimal counter is
    [REPLACEALL]\b([0-F]+)\b[WITH]0$1~0123456789ABCDEF0
    [REPLACEALL]\b0(?!F*~)|([0-F])(?=F*~[0-F]*?\1([0-F]))|~[0-F]*[WITH]$2

    I simply replaced every instance of 9 with F and added the letters A through F to the digit string to get where I wanted to go. Just for gits and shiggles, though, I wanted to play around and make a binary counter:
    [REPLACEALL]\b([0-1]+)\b[WITH]0$1~010
    [REPLACEALL]\b0(?!1*~)|([0-1])(?=1*~[0-1]*?\1([0-1]))|~[0-1]*[WITH]$2
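The same template generalizes to any base once the digit class, the maximum digit, and the ring are parameters. This Python sketch (`make_counter` and the counter names are hypothetical) rebuilds the hex and binary versions. One caveat worth knowing: `[0-F]` is a range of character codes, 0x30 to 0x46, so it also includes `: ; < = > ? @`. With semicolon-delimited data, that lets pass 1 treat `1F;2A` as a single number, so only the last digit changes; `[0-9A-F]` avoids that.

```python
import re


def make_counter(digit_class: str, max_digit: str, ring: str):
    """Build a two-pass incrementer from the same template as the decimal counter.

    digit_class: regex class matching one digit, e.g. "[0-F]"
    max_digit:   the digit that rolls over and carries, e.g. "F"
    ring:        every digit in order, then the digit that follows the maximum
    """
    pass1 = (rf"\b({digit_class}+)\b", rf"0\g<1>~{ring}")
    pass2 = (
        rf"\b0(?!{max_digit}*~)"
        rf"|({digit_class})(?={max_digit}*~{digit_class}*?\1({digit_class}))"
        rf"|~{digit_class}*",
        r"\g<2>",
    )

    def increment(text: str) -> str:
        for pattern, replacement in (pass1, pass2):
            text = re.sub(pattern, replacement, text)
        return text

    return increment


hex_increment = make_counter("[0-F]", "F", "0123456789ABCDEF0")
bin_increment = make_counter("[0-1]", "1", "010")
strict_hex_increment = make_counter("[0-9A-F]", "F", "0123456789ABCDEF0")

print(hex_increment("1F"), bin_increment("11"))  # 20 100
print(hex_increment("1F;2A"))  # 1F;2B  ([0-F] swallows the ";")
print(strict_hex_increment("1F;2A"))  # 20;2B
```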

    Which shows a few things, really. Every time I've listed 9, F, or 1, that's the maximum digit, and the string of digits at the end is every possible digit, in order, followed by the first digit again. For more shiggles, I fucked around with that last string. This one counts up in hexadecimal using only odd digits, so it skips every even digit (F + 1 = 11, 1F + 1 = 31):
    [REPLACEALL]\b([0-F]+)\b[WITH]0$1~013579BDF1
    [REPLACEALL]\b0(?!F*~)|([0-F])(?=F*~[0-F]*?\1([0-F]))|~[0-F]*[WITH]$2

    This one just does weird shit I don't even understand
    [REPLACEALL]\b([0-F]+)\b[WITH]0$1~0FEDCBA9876543210
    [REPLACEALL]\b0(?!F*~)|([0-F])(?=F*~[0-F]*?\1([0-F]))|~[0-F]*[WITH]$2
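Running the odd-digit and reversed rings through the same template (again a Python stand-in, not the in-game setup) shows what's going on. The odd ring reproduces F + 1 = 11 and 1F + 1 = 31. The reversed ring maps each digit to the one after it in the ring, which is one lower, so it mostly counts down. But the carry logic still looks for trailing F's, so a 0 wraps to F without borrowing from the digit to its left, and a lone F keeps its padding 0, which the ring turns into another F.

```python
import re


def increment(text: str, ring: str, max_digit: str = "F", digit_class: str = "[0-F]") -> str:
    """Run the two-pass counter template with an arbitrary digit ring."""
    text = re.sub(rf"\b({digit_class}+)\b", rf"0\g<1>~{ring}", text)
    return re.sub(
        rf"\b0(?!{max_digit}*~)"
        rf"|({digit_class})(?={max_digit}*~{digit_class}*?\1({digit_class}))"
        rf"|~{digit_class}*",
        r"\g<2>",
        text,
    )


ODD = "013579BDF1"
REVERSED = "0FEDCBA9876543210"

print(increment("F", ODD), increment("1F", ODD))  # 11 31
print(increment("5", REVERSED))  # 4
print(increment("10", REVERSED))  # 1F: the 0 wraps to F, nothing borrows from the 1
print(increment("F", REVERSED))  # FE: padding 0 becomes F, and F becomes E
```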

    Anyways, all of this is just to say that modifying this piece of Regex is easy and can do powerful things.
    I have found a limitation in the regex commands: [a-z], [A-Z], and \w do not match the letters "ñ" and "Ñ", so they have to be added to the range by hand. Therefore it should be:

    Code:
    *This counts UP: A, B, C, D...
    First display block:  [REPLACEALL]\b([\wÑ]+)\b[WITH]A$1~ABCDEFGHIJKLMNÑOPQRSTUVWXYZA
    Second display block: [REPLACEALL]\b\w(?!Z*~)|([\wÑ])(?=Z*~[\wÑ]*?\1([\wÑ]))|~[\wÑ]*[WITH]$2
    Code:
    *This counts DOWN: Z, Y, X, W...
    First display block:  [REPLACEALL]\b([\wÑ]+)\b[WITH]$1~AZYXWVUTSRQPOÑNMLKJIHGFEDCBA
    Second display block: [REPLACEALL]\bAÑ|([\wÑ])(?=A*~[\wÑ]*?\1([\wÑ]))|~[\wÑ]*[WITH]$2
    *Remember, \w is the same as [a-zA-Z0-9_], so we can use it as shorthand when we want to match that whole range.
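A Python check of the count-up and count-down pairs above. Note a dialect difference: Python's `re` treats `\w` as Unicode by default, so Ñ already counts as a word character there, unlike the engine described in the post; the explicit `Ñ` in each class is harmless. The patterns are copied from the post with `$1`/`$2` written as `\g<1>`/`\g<2>`; `run`, `count_up`, and `count_down` are hypothetical names.

```python
import re

ALPHABET_UP = "ABCDEFGHIJKLMNÑOPQRSTUVWXYZA"
ALPHABET_DOWN = "AZYXWVUTSRQPOÑNMLKJIHGFEDCBA"

# Count-up pair from the post.
UP = [
    (r"\b([\wÑ]+)\b", r"A\g<1>~" + ALPHABET_UP),
    (r"\b\w(?!Z*~)|([\wÑ])(?=Z*~[\wÑ]*?\1([\wÑ]))|~[\wÑ]*", r"\g<2>"),
]
# Count-down pair from the post, copied verbatim apart from the group syntax.
DOWN = [
    (r"\b([\wÑ]+)\b", r"\g<1>~" + ALPHABET_DOWN),
    (r"\bAÑ|([\wÑ])(?=A*~[\wÑ]*?\1([\wÑ]))|~[\wÑ]*", r"\g<2>"),
]


def run(passes, text: str) -> str:
    """Apply each (pattern, replacement) pair as a replace-all, in order."""
    for pattern, replacement in passes:
        text = re.sub(pattern, replacement, text)
    return text


def count_up(text: str) -> str:
    return run(UP, text)


def count_down(text: str) -> str:
    return run(DOWN, text)


print(count_up("N"), count_up("Z"))  # Ñ BA
print(count_down("O"), count_down("BA"))  # Ñ AZ
```

In the count-down version there is no padding letter, so a leading A (which plays the role of 0) is not stripped: BA counts down to AZ.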
     

    Fellow Starmadian

    That's because Ñ (N with a tilde) is a special character. A bracket range like [A-Z] is built from the Unicode values of its endpoints (A is U+0041, Z is U+005A), and Ñ is U+00D1, so it falls outside the range. If you look up an extended Unicode chart, it should show what I mean. Also, this link describes how to add special characters to an [a-z]-style range: Regex to match only letters
    \p{L} matches any Unicode letter, so I guess use this?
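A quick Python check of the code-point explanation: A and Z are U+0041 and U+005A, while Ñ is U+00D1, so it sits outside [A-Z]. Python's built-in `re` has no `\p{L}` (Java's `java.util.regex` does); the class `[^\W\d_]` is a common stand-in that matches letters, though it also lets through a few non-decimal numeric characters such as ².

```python
import re
import unicodedata

N_TILDE = "Ñ"

# Bracket ranges compare code points: A = U+0041, Z = U+005A, Ñ = U+00D1.
print(hex(ord("A")), hex(ord("Z")), hex(ord(N_TILDE)))  # 0x41 0x5a 0xd1
print(unicodedata.name(N_TILDE))  # LATIN CAPITAL LETTER N WITH TILDE
print(re.fullmatch(r"[A-Z]", N_TILDE))  # None: U+00D1 is outside A..Z

# Stand-in for \p{L} in Python's re: word characters minus digits and "_".
LETTER = r"[^\W\d_]"
print(bool(re.fullmatch(LETTER + "+", "AÑO")))  # True
print(bool(re.fullmatch(LETTER + "+", "A1")))  # False
```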
     

    Fellow Starmadian

    So I've been rethinking my format choices for regex (and arguing with GalactusX). I think GalactusX succeeded in convincing me that semicolons all over the place would be counterproductive. I was thinking in terms of my string parsing system, where to understand what a string said, you'd need to separate the words. But now, with GalactusX's very useful regex for singling out variables, I've decided that my format was inefficient. I'll post my new opinion here so we can discuss its effectiveness compared to other solutions. Here's an example of my new data storage format:

    ;variable1[text];variable2[this is a string];variable3[1234];variable4[1234.56];variable5[true];

    There is one reason to use semicolons in these situations: they make writing the regex easier. You could potentially write a very complex pattern that separates the names outside the brackets from the values inside them, but with semicolons as delimiters that pattern is much simpler. It also makes things easier to read. It isn't necessary, but it is useful.
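The regex for singling out variables mentioned above isn't posted in this thread, so here is an independent Python sketch of reading the `;name[value];` format. Because every record starts with `;` and the value is wrapped in `[...]`, a lookup for `variable1` can't accidentally hit `variable10`. Values come back as strings; `parse_records` and `read_var` are hypothetical names.

```python
import re

# One ";name[value]" record; the value may contain spaces but not "]".
RECORD = re.compile(r";(\w+)\[([^\]]*)\]")


def parse_records(store: str) -> dict:
    """Turn ';name[value];name[value];' into a {name: value} dict."""
    return dict(RECORD.findall(store))


def read_var(store: str, name: str):
    """Pull a single value out by name, or None if the variable is absent."""
    match = re.search(";" + re.escape(name) + r"\[([^\]]*)\]", store)
    return match.group(1) if match else None


store = ";variable1[text];variable2[this is a string];variable3[1234];variable4[1234.56];variable5[true];"
print(parse_records(store)["variable2"])  # this is a string
print(read_var(store, "variable4"))  # 1234.56
```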

    Second point of discussion: can we agree that there needs to be a common format, both for the regex here and for all regex posted elsewhere? Can't we create the regex equivalent of the universal docking port? It would certainly make creating a complex computer easier if we could just plug in regex modules from multiple creators. Think of them like ICs, tiny integrated circuits that can be connected together to do amazing things.
     
    Joined
    Jul 30, 2013
    Messages
    398
    Reaction score
    282
    • Wired for Logic Gold
    • Legacy Citizen 8
    • Purchased!
    A new contribution from me: if you've wondered how to "detect" when a display block can no longer hold more lines of text, or more characters, here are the solutions I found:
    Code:
    This detects when a line reaches the character limit (currently 240)
    [REPLACEALL]^.{240,}$()[WITH]$1
    Regex101 - online regex editor and debugger
    Code:
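    This detects the number of lines (current limit 10). It matches once the text has at least 10 line breaks (11 or more lines), and only if every line after the first starts with a word character (\w).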
    [REPLACEALL]^(?>\V*\n\w){10,}\V*$()[WITH]$1
    Regex101 - online regex editor and debugger

    * I used a special regex syntax, (?>...), called an "atomic group". See this link for more information.

    The {min,max} syntax sets how many times the element before it must repeat; {240,} means "240 or more times". So to count characters, we use "." (the dot matches any character except a line break).

    We use ^ and $ to anchor the START and END of the string, so that the regex takes the entire text into account.
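A Python sketch that checks both patterns. Two assumptions: the limits (240 characters, 10 lines) are taken from the post rather than verified, and Python's `re` lacks `\V`, so `[^\n]` stands in for it (`\V` also excludes other vertical whitespace such as `\r`). The atomic group is dropped as well, since `(?>...)` needs Python 3.11; here it doesn't change what matches, because `[^\n]*` can only stop at the line break. Because `()` captures nothing, replacing with group 1 clears the text whenever the limit is hit. `replace_all` is a hypothetical helper.

```python
import re

# Character limit: the whole text is a single line of 240 or more characters.
CHAR_LIMIT = r"^.{240,}$()"
# Line limit: 10+ line breaks, each followed by a word character.
# [^\n] stands in for \V, and (?:...) replaces the atomic (?>...) group.
LINE_LIMIT = r"^(?:[^\n]*\n\w){10,}[^\n]*$()"


def replace_all(pattern: str, text: str) -> str:
    """Mimic [REPLACEALL]pattern[WITH]$1; the empty group 1 clears any match."""
    return re.sub(pattern, r"\g<1>", text)


short_line = "x" * 239
full_line = "x" * 240
ten_lines = "\n".join(f"line{i}" for i in range(10))  # 9 line breaks
eleven_lines = "\n".join(f"line{i}" for i in range(11))  # 10 line breaks

print(replace_all(CHAR_LIMIT, short_line) == short_line)  # True: untouched
print(replace_all(CHAR_LIMIT, full_line) == "")  # True: cleared
print(replace_all(LINE_LIMIT, ten_lines) == ten_lines)  # True: untouched
print(replace_all(LINE_LIMIT, eleven_lines) == "")  # True: cleared
```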

    I think these commands will be a great help
     

    Jaaskinal

    ¯\_(ツ)_/¯
    Joined
    Jan 19, 2014
    Messages
    1,377
    Reaction score
    646
    • Legacy Citizen 4
    • Wired for Logic Gold
    • Thinking Positive
    just making this shorter -Jaas
    I think that a universal data storage method may be difficult. Some applications require different solutions to more effectively use space. This is kind of a bad example, but we can look at the differences between Galactus' storage needs, in his latest thing, and my storage needs, in the Nasometer.

    Galactus needs to have named variables with assigned values. The logical conclusion of this is exactly what you have suggested, I believe. :Player1[###]:Player2[###]:Player3[###]: is perfect for this application.

    Now, you're going to hate this, and I'm probably going to change it soon, but the Nasometer needed a ton of compression and needed to store data in hexadecimal; it only needed the correct order and amount, and readability was not a concern. In this situation, I went with just having number after number, storing up to 240 bytes in a single display module.

    I think, at a minimum, there are going to be two standards: the one you describe, since it's extremely good at keeping things tidy and easy for regex to find, and a raw data standard, which would still probably be useful. However, I think the raw standard should more closely resemble your first storage method than this new one. :)#:#:#:)

    -

    In regards to standardizing the behind-the-scenes regex, I think that may be even more difficult. There are people like you, Galactus and Snapems, all of whom know regex pretty well, and then there are people like me, who just bang stuff together until it works.
     
    Joined
    Jan 11, 2017
    Messages
    168
    Reaction score
    83
    In addition to what Jaaskinal said, there's also the fact that if you're pasting together computer bits from multiple players into one big system, then either A) You've already coordinated it and thus this shouldn't be a problem. Or B) You're chunking together bits that were likely never intended to be used that way, and should expect to have to make a few 'adapters' that take the output of one machine and format it before it goes into the second machine.
     

    Fellow Starmadian

    Lol, I would describe myself as a "banger" more than a knowledgeable person. And that's actually a very good point! Raw data is far superior to stored variables. That actually gives me an idea, which I'll have to work on. We could essentially make an assembly-like language, with pointers up the wazoo. Maybe even a C-level language, with a compiler. All you need to do is use memory storage the way it's used IRL: have a graphical representation of the variables, which are actually pointers to a position in memory where the actual value lives. Since we aren't limited to binary, it could be done in whatever base we want. I think building a system where we could do this would be fairly easy, yes? For reference, I'm going to put a list of things that need to be invented for this to work:
    1: a data compiler that takes values and dynamically stores them.
    2: a data type flag at the beginning of each memory location.
    3: each variable needs a pointer that points to a data location in a different display.
    Hmm... is this going overboard again? I think I'd need to look at your Nasometer again to see how you handled it. I need to remind myself that this is StarMade, and in StarMade things work differently.

    ex:

    & takes the address of (references) a variable, * dereferences it.
    print var1& returns memLocation(0), the memory location that var1 points to
    print var1 returns "string"
    Dereferencing is confusing, but basically, in C++, if you want val1 to equal the value that the pointer val2 points to, you would do this:
    val1 = *val2;

    For StarMade's purposes, at least for now, all we need is the ability to do three things: assign a memory location to a variable, store the variable's value in that location, and return the proper value when we call that variable.
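The three operations listed above (bind a variable to a memory location, store a value there, read it back through the variable) can be sketched in Python with a flat list standing in for the memory display and a dict as the symbol table. Everything here (`Memory`, `declare`, `store`, `load`, the one-letter type flags) is hypothetical and only illustrates the idea, including a per-cell type flag like item 2 of the list.

```python
class Memory:
    """Toy model: a flat list of cells plus a symbol table of pointers."""

    def __init__(self, size: int):
        self.cells = [None] * size  # each used cell holds (type_flag, value)
        self.symbols = {}  # variable name -> cell index (the "pointer")
        self.next_free = 0

    def declare(self, name: str) -> int:
        """Assign the next free memory location to a variable (like &var)."""
        if self.next_free >= len(self.cells):
            raise MemoryError("out of cells")
        self.symbols[name] = self.next_free
        self.next_free += 1
        return self.symbols[name]

    def store(self, name: str, value) -> None:
        """Write a value, tagged with a type flag, to the variable's location."""
        # Unsupported types raise KeyError here.
        flag = {int: "I", float: "F", bool: "B", str: "S"}[type(value)]
        self.cells[self.symbols[name]] = (flag, value)

    def load(self, name: str):
        """Dereference: follow the pointer and return the stored value (like *ptr)."""
        _flag, value = self.cells[self.symbols[name]]
        return value


mem = Memory(size=8)
print(mem.declare("var1"))  # 0, i.e. memLocation(0)
mem.store("var1", "string")
print(mem.load("var1"))  # string
```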

    why do I do this to myself :<
    Jaaskinal, is this the storage you were referring to?

    I had found it in your earlier logic, but I can't seem to find it this time. It also seems you're storing it in two display blocks instead of one.
    That's a real possibility, but I'm convinced we won't need adapters if we all agree on a specific format. Just a quick example: Jaaskinal makes a data storage block out of display modules, with a control, an input, and an output display block. Why can't a small text-based game use the control block to do what it wants with the data storage? Or maybe link a multiplier module up to a game module that needs to calculate damage? If we decide on the format, the only conversion we would need would be possibly changing the names of the commands we send to the modules.
     

    Jaaskinal

    Those screens have the label "METADATA" above them; that's the raw data after it has been processed into something somewhat meaningful.
    Typically hex data ends up looking more like this:


    And when the Nasometer's storage is full, it should look somewhat like this:

    That takes a long time to observe, though, because it would require 1,920 transactions before it looked like this.
     
    Joined
    Jul 30, 2013
    Messages
    398
    Reaction score
    282
    • Wired for Logic Gold
    • Legacy Citizen 8
    • Purchased!
    Metadata is good, but not for certain tasks. For example, I have worked with automata in real life, and much of the programming, if not all, is based on scheduling point-to-point movement: a start point, an intermediate step, and a final one. The final program is much clearer and easier to access and read if variables are used.

    In my opinion it is better to use START[...], STEP[...], FINAL[...] instead of their HEX equivalents (53 54 41 52 54 5b 2e 2e 2e 5d), (53 54 45 50 5b 2e 2e 2e 5d), (46 49 4e 41 4c 5b 2e 2e 2e 5d). In my personal opinion, using variables to access the information is much faster, because you only need the name of the variable to read the information it contains, instead of having to convert the HEX code.

    We are limited by the display blocks' current character limit, and I think the best way to store data is as it is, without converting it to another format.
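A small Python check of the hex above: each ASCII character becomes one two-digit byte, so the spaced hex form of `START[...]` takes 29 characters against 10 as text, nearly three times the space (exactly twice without separators). `to_hex` is a hypothetical helper.

```python
def to_hex(text: str) -> str:
    """Spell out each ASCII character as a two-digit hex byte, separated by spaces."""
    # bytes.hex() accepts a separator argument from Python 3.8 on.
    return text.encode("ascii").hex(" ")


for name in ("START[...]", "STEP[...]", "FINAL[...]"):
    encoded = to_hex(name)
    print(f"{name}: {len(name)} chars as text, {len(encoded)} chars as spaced hex")
    print("   ", encoded)
```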
     

    Jaaskinal

    Metadata is data about data. I agree that it's not useful for certain tasks, however I would like to state that it is useful for other tasks. In terms of raw data storage, the type the nasometer uses, it is unnecessary to assign variable names and waste all of that space when all that is required is to store a byte of information, and where exactly in the sequence it is.

    The two methods of storage are equally useful for different tasks.
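For the raw-data side, a byte only needs a fixed position. A minimal Python sketch, assuming (this is not taken from the Nasometer itself) that every byte is written as exactly two uppercase hex characters with no separators, so byte i lives at characters 2i and 2i + 1 and no names or delimiters take up space. `pack`, `read_byte`, and `write_byte` are hypothetical names.

```python
def pack(data: bytes) -> str:
    """Write each byte as two hex characters, back to back."""
    return data.hex().upper()


def read_byte(store: str, index: int) -> int:
    """Byte number index occupies characters 2*index and 2*index + 1."""
    return int(store[2 * index : 2 * index + 2], 16)


def write_byte(store: str, index: int, value: int) -> str:
    """Return a copy of store with byte number index replaced by value (0-255)."""
    return store[: 2 * index] + f"{value:02X}" + store[2 * index + 2 :]


store = pack(bytes([10, 255, 0]))
print(store)  # 0AFF00
print(read_byte(store, 1))  # 255
print(write_byte(store, 2, 16))  # 0AFF10
```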
     
    Joined
    Jul 30, 2013
    Messages
    398
    Reaction score
    282
    • Wired for Logic Gold
    • Legacy Citizen 8
    • Purchased!
    Something I was working on...
    It doesn't appear in the gif, but I'm writing in the "WRITE MESSAGE" display.
    When the first display block is full, it copies the text into the second (upper) display and clears the first for new text input.
     

    Jaaskinal

    This is awesome! Thank you!
    This regex allowed me to make more flexible entry commands for my logic core.

    With my old command towers, you had to type in the exact phrase to trigger any functions. So {SPOOL WARP COIL} would start your chain drive but "Turn warp drive on" would get you the "I don't recognize that command" prompt.
    With THIS, I can now set up a tower to not only recognize certain words in a sentence but also recognize when they're used together! So while the tower on the right will only work with {SPOOL WARP COIL}, the one on the left will trigger if your command contains the word "warp" together with the words "activate" or "on". I can then make another 'warp' tower that recognizes "warp" with "deactivate" or "off". :)

    The only issue is that this tower takes up a LOT more space, so this level of responsiveness might be limited to stations. Otherwise, my logic core could quickly become absolutely HUGE, and thus unreasonable for the average ship. As it is, my logic core is already something like 40*30*9...
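Detecting two words used together can also be done in a single pattern with lookaheads, one per required word. A Python sketch, assuming the in-game engine supports lookahead (the counters earlier in this thread rely on it); the word lists come from the post, and `classify` is a hypothetical name. `re.IGNORECASE` makes "Turn Warp Drive ON" work too.

```python
import re

# Both lookaheads must succeed somewhere in the command, in any order.
WARP_ON = re.compile(r"^(?=.*\bwarp\b)(?=.*\b(?:activate|on)\b)", re.IGNORECASE)
WARP_OFF = re.compile(r"^(?=.*\bwarp\b)(?=.*\b(?:deactivate|off)\b)", re.IGNORECASE)


def classify(command: str) -> str:
    if WARP_ON.search(command):
        return "warp on"
    if WARP_OFF.search(command):
        return "warp off"
    return "I don't recognize that command"


print(classify("Turn warp drive on"))  # warp on
print(classify("Deactivate the warp drive"))  # warp off
print(classify("Turn shields on"))  # I don't recognize that command
```

The `\b` boundaries matter here: without them, "Deactivate" would also satisfy the `activate` lookahead.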
    So I figured out another interesting thing. I'm not sure whether you're doing this, but just in case: if you're making one sensor/display block command for every different keyword, that's somewhat inefficient. To be more efficient, we can use alternation (the | operator) in commands like [replacefirst](on|activate|spool|allume+(r|s|z|nt)|allons)[with]temp to detect more with less.
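A Python sketch comparing the posted pattern with a stricter variant. As posted, `on` also matches inside longer words ("second"), matching is case-sensitive, and `allume+` means "allum followed by one or more e". The hypothetical `STRICT` version adds `\b` word boundaries and `(?i)` for case-insensitivity, and drops the `+`. `replace_first` stands in for `[replacefirst]...[with]`, assuming it replaces only the first match.

```python
import re

POSTED = r"(on|activate|spool|allume+(r|s|z|nt)|allons)"
# Same words, but only as whole words and in any letter case.
STRICT = r"(?i)\b(on|activate|spool|allume(r|s|z|nt)|allons)\b"


def replace_first(pattern: str, text: str, replacement: str = "temp") -> str:
    """Mimic [replacefirst]pattern[with]replacement."""
    return re.sub(pattern, replacement, text, count=1)


print(replace_first(POSTED, "turn warp on"))  # turn warp temp
print(replace_first(POSTED, "second shield"))  # sectempd shield (substring hit)
print(replace_first(STRICT, "second shield"))  # second shield
print(replace_first(STRICT, "Allumez le moteur"))  # temp le moteur
```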
     
    Joined
    Jan 11, 2017
    Messages
    168
    Reaction score
    83
    *Sees post and leans in with interest*
    *Reads new code excitedly, trying to parse out what it does*
    *Jaw hits floor.*

    Holy Frak, if that works... Jesus the space I'll save in the mk 2 would be HUGE!
     

    Jaaskinal

    Yeah, lol. IDK why, but I decided to be stupid and add French in there, but then I realized that this sort of regex would actually be nearly required to make what you want in another language.
     
    Joined
    Jan 11, 2017
    Messages
    168
    Reaction score
    83
    In theory, if this works, then a few extra columns could be added back into the design with alternate languages. If any of them trigger, it could flip the computer over to that language for responses.
    I'm not about to try and BUILD that, but I think it could totally be done! XD
    Wait. Problem.
    It might detect all those words, but each of those words needs to have a separate function/output...
    Otherwise On Off and Charge all do the same thing. :(
     

    Jaaskinal

    Yeah, it's meant to detect all the words with similar functions. On and off can not go together like this, but on and activate can.
     

    Fellow Starmadian

    Your post is gonna look silly when I retcon the heck out of that post XD. I had to copy some text over from another computer and didn't want to leave it, because I haven't actually finished building the regex to go along with the text...