Analysis Summary
Worth Noting
Positive elements
- This video provides an excellent technical deep-dive into the specific bitmasks and POSIX specifications that cause real-world bugs in shell scripting.
Be Aware
Cautionary elements
- The use of 'imposter syndrome' rhetoric at the start is a common engagement tactic that frames technical confusion as a psychological burden to increase viewer loyalty.
Influence Dimensions
Knowing about these techniques makes them visible, not powerless. The ones that work best on you are the ones that match beliefs you already hold.
This analysis is a tool for your own thinking — what you do with it is up to you.
Transcript
All your life you've been told that you're an impostor just because you don't know regular expressions. But no matter what you do, your regular expressions never seem to match, even after you try adding a backslash before every single character. But I'm here to tell you today that it's not your fault. It's not your fault.

In this video I'm going to expand on something that I said about regular expressions in an earlier video. My overall goal is to raise awareness about the confusing mess of different regular expression implementations that exist out there. I think very few people are aware of just how many subtle differences there are in the flavors of regular expressions that you'll commonly encounter in command-line tools. People will often say that it's the sign of a novice when someone blames their tools instead of just learning how to use them properly. However, as someone who has spent many hundreds of hours thoroughly learning all of these command-line tools, I can say with confidence that basically no one actually takes the time to learn all of these different regular expression quirks, and in this video I'm going to show you some examples to justify my claim.

A good place to start the discussion would be with a video that I made several years ago about some unexpected behavior that I saw with grep. At the time, I was working on writing an article to discuss how regular expression quantifiers work, and I was doing some tests on the command line to verify my understanding of quantifiers. Many of you will probably recognize that this regular expression matches an "a" character repeated three times, and as you can see, every instance of an "a" character repeated three times is listed in the output. In addition, you might also recognize that the question mark character makes this quantifier lazy, and in this similar command the plus character makes the quantifier possessive. However, the output that I see here is not what I expected. Since I know that the -E flag uses extended regular
expressions, I decided to also try it out with Perl-compatible regular expressions, which gives the following result. This prompted me to eventually start investigating the specification to figure out what was going on, and the conclusion that I came to was that the behavior with the -E flag was undefined behavior. In fact, it turns out that POSIX-based regular expressions don't support non-greedy repetition operators at all. That's a problem, because it means that if you do try to use a non-greedy regular expression quantifier, you'll just end up with silently undefined behavior according to the specification.

So it's clear that there's undefined behavior in the specification. Is there more? Checking with Ctrl+F, it also says that the interpretation of an ordinary character preceded by an unescaped backslash is undefined, except for these cases. I think that's also an interesting case. Usually, the way this is handled when you escape something that doesn't need to be escaped is to just interpret it as the original character, and when you don't know the escaping rules, I think it's quite common to just try escaping stuff and see what works. So this is probably another common source of undefined behavior. In an ideal world these commands would produce warnings, so that you would at least have an opportunity to change the regex. As you can see, most of the time the undefined behavior does the right thing, but trying to rely on that is a dangerous game.

So now that we've established that POSIX regular expressions don't support non-greedy quantifiers, let's see what other features they lack. You might be surprised to learn that BRE regular expressions don't even support alternation. This is interesting because BRE is the default mode for grep, and it's important not to confuse BRE with ERE, which does support alternation. In general, BRE supports fewer features than ERE, and by default it requires some extra escaping for special characters, which is not the case for ERE. So now that we know how crippled
ERE and BRE regular expressions are, the important question is: which flavor do our tools actually use? According to the man page for GNU grep, the default is BRE; however, this version of grep also supports ERE as well as Perl-compatible regular expressions. Now let's see what the documentation says for sed. According to this, the default is BRE; however, sed also supports ERE. It currently doesn't appear to support Perl-compatible regular expressions. The man page for sed also includes this gem: "POSIX.2 BREs should be supported, but they aren't completely because of performance problems." That's reassuring, especially since BRE is the default. Now what about GNU awk? The man page doesn't say anything about BRE or ERE. It turns out that there's a POSIX specification specifically for awk, and to make things more interesting, the specification specifically references ERE regular expressions. However, it then goes on to specify a list of exceptions specifically for awk.

This is just a few tools, but of course many command-line tools use regular expressions, so if you wanted to actually master regular expressions, you'd need to memorize every feature that every tool supports. And if you start digging into the source code, you'll find that a lot of these differences have been formalized as features that can be switched on or off with a bitmask. Here are a few examples: we've got an awk syntax, a POSIX awk syntax, a grep syntax, egrep, POSIX egrep, and a few more. So if you truly want to master regular expressions, you have to memorize all of these. And it gets even more fun than that: if you go digging around in the source code for coreutils, you'll find tons of references to those bitmasks, and here you can see a few other commands that use regular expressions.

So it's worth stepping back a bit and asking: how could the situation be improved? I think at minimum it would be great if the GNU tools supported a flag that would turn on warnings when invoking undefined behavior in a regular expression. I think it would also be
great if other GNU tools like sed or awk also had a flag that turned on Perl-compatible expressions. In my opinion, and I think the opinion of many others, Perl-compatible regular expressions are far superior to BRE or ERE. Now, BRE and ERE still need to be supported. Any kind of official government or corporate technology stack will want to stick with the original historical standards rather than the most elegant solution. For these systems, consistency and stability are far more important than efficiency or elegance. For any kind of new tools, I think that Perl-compatible regular expressions should be the default. I also don't think there's any need to invent a new canonical flavor of regular expressions. It would be best if we could just pick a well-defined subset of Perl-compatible regular expressions and cross-compile everything to that. If you compare how alternation works in PCRE versus POSIX, there are a few incompatible differences, but maybe you could implement something like a simple switch in the regular expression engine.

There's a lot more that I could say on this topic, but I think that regular expressions have a tragically unrealized potential. If you're interested in learning more, you can read my blog post on my regular expression visualizer tool. This page talks about how you can effectively compile any regular expression down to a handful of operations, almost like a tiny set of assembly instructions. There are some even deeper connections that you can make between regular expressions and computation in general. You could even think about a regular expression matcher as though it were some kind of abstract computation machine: without any branching or loop instructions, each atomic item in the regular expression would just map to an individual assembly instruction. I could say a lot more about this topic, but I'll have to save that for another video.
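Editorial illustration (not part of the spoken transcript): the two behaviors the transcript singles out, non-greedy quantifiers and escaping of ordinary characters, can be sketched in Python. This is a minimal sketch under stated assumptions: Python's `re` module is used only as a stand-in for a Perl-style engine (it is neither PCRE nor the engine that grep, sed, or awk use), the helper name `compiles` is hypothetical, and the escape check assumes Python 3.7 or later.

```python
import re

text = "aaaa"

# Greedy: each match takes as many "a" characters as allowed (up to three).
greedy = re.findall(r"a{1,3}", text)

# Lazy: the trailing "?" makes the quantifier take as few as possible.
# POSIX ERE does not define this syntax; Perl-style engines do.
lazy = re.findall(r"a{1,3}?", text)

print(greedy)  # ['aaa', 'a']
print(lazy)    # ['a', 'a', 'a', 'a']


def compiles(pattern):
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Escaping punctuation that did not need escaping is accepted quietly.
print(compiles(r"\-"))

# An unknown letter escape is rejected instead of being silently
# interpreted, which is close to the "warning" the transcript asks for.
print(compiles(r"\q"))
```

The point of the sketch is only the contrast: an engine with a defined lazy quantifier returns a predictable result, and an engine that rejects unknown escapes surfaces the mistake instead of leaving it to undefined behavior.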
Video description
Become A Channel Member: https://www.youtube.com/channel/UCOmCxjmeQrkB5GmCEssbvxg/join
SOCIALS
----------------
Patreon: https://www.patreon.com/RobertElderSoftware?utm_source=yt&utm_medium=desc&utm_campaign=ytchannel&utm_content=ys7yUyyQA-Y
Tiktok: https://www.tiktok.com/@roberteldersoftware
Linkedin: https://linkedin.com/company/robert-elder-software
Blog: https://blog.robertelder.org/?utm_source=yt&utm_medium=desc&utm_campaign=ytchannel&utm_content=ys7yUyyQA-Y
Twitter: https://twitter.com/RobertElderSoft
Twitch: https://www.twitch.tv/roberteldersoftware
Github: https://github.com/RobertElderSoftware
Facebook: https://www.facebook.com/RobertElderSoftware
Instagram: https://www.instagram.com/roberteldersoftware/
Merch: https://store.robertelder.org/?utm_source=yt&utm_medium=desc&utm_campaign=ytchannel&utm_content=ys7yUyyQA-Y