Some of the most common things people fuck up in regular expressions:

1. Not escaping the . in domain names: /abc.stuff.com/ when you mean /abc\.stuff\.com/

This will accidentally match things like abcxstuffxcom

Seems obvious when you see it here, but this is probably the most common mistake I see in practice.
Test for negative cases as well as positive!
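
A quick sketch of that negative test in Python's re module (the variable names are mine, just for illustration):

```python
import re

# An unescaped . matches any character (except a newline by default).
unescaped = re.compile(r"abc.stuff.com")
escaped = re.compile(r"abc\.stuff\.com")

# Negative case: this is not your domain, but the sloppy pattern accepts it.
print(bool(unescaped.search("abcxstuffxcom")))  # True
print(bool(escaped.search("abcxstuffxcom")))    # False

# Positive case still works once the dots are escaped.
print(bool(escaped.search("abc.stuff.com")))    # True

# If the domain comes from a variable, re.escape does the escaping for you.
built = re.compile(re.escape("abc.stuff.com"))
print(bool(built.search("abcxstuffxcom")))      # False
```
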
2. Not anchoring domain names

Things can get past your ACL by making your domain a subdomain of theirs, like http://abc.stuff.com.<their-malicious-domain>.com

Instead you probably want to anchor with \.com$ at the end. Maybe with ^ at the start, too.
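
Roughly what that looks like in Python, assuming you are checking a bare hostname. The attacker.example host is made up for the demo:

```python
import re

unanchored = re.compile(r"abc\.stuff\.com")
anchored = re.compile(r"^abc\.stuff\.com$")

good = "abc.stuff.com"
evil = "abc.stuff.com.attacker.example"  # hypothetical malicious host

print(bool(unanchored.search(evil)))  # True: the ACL lets it through
print(bool(anchored.search(evil)))    # False
print(bool(anchored.search(good)))    # True

# Python-specific caveat: $ also matches just before a trailing newline.
# re.fullmatch (or \Z in place of $) is stricter.
print(bool(anchored.search(good + "\n")))                   # True
print(bool(re.fullmatch(r"abc\.stuff\.com", good + "\n")))  # False
```
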
3. ^abc|xyz|def$ means (^abc)|xyz|(def$)

but you probably think it means ^(abc|xyz|def)$

If you do this in an ACL, you may as well not have an ACL.
✨ test for negative cases as well as positive ✨
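
Here is a small Python sketch of the difference. The sample strings are invented to hit each branch of the alternation:

```python
import re

loose = re.compile(r"^abc|xyz|def$")        # (^abc)|xyz|(def$)
strict = re.compile(r"^(?:abc|xyz|def)$")   # what you probably meant

# Negative cases: none of these should pass an ACL for abc, xyz or def.
negatives = ["abc-plus-junk", "junk-xyz-junk", "junk-plus-def"]

for s in negatives:
    print(s, bool(loose.search(s)), bool(strict.search(s)))
# abc-plus-junk True False
# junk-xyz-junk True False
# junk-plus-def True False

# Positive cases still match the strict pattern.
print(all(strict.search(s) for s in ["abc", "xyz", "def"]))  # True
```
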
4. Matching more than you want.

Instead of /abc="(.*)"/ you probably mean /abc="([^"]*)"/

Think about reading characters as they come, left to right.

I know you're thinking about the ending quote, because as humans we see "..." and then only look at the ... part after.
Write exactly what you're matching. Don't think of it as "whatever" followed by something specific. Spell out the "whatever" part.
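
To see the two patterns diverge, try them on a line with more than one quoted value (the input line is my own example):

```python
import re

line = 'abc="one" other="two"'

# .* is greedy: it runs to the end of the line, then backs up to the LAST quote.
greedy = re.search(r'abc="(.*)"', line).group(1)

# [^"]* spells out the "whatever" part: any run of characters that are not a quote.
exact = re.search(r'abc="([^"]*)"', line).group(1)

print(greedy)  # one" other="two
print(exact)   # one
```
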
5. Escaping things you don't need to escape: \- is common. This is usually harmless, but messy and confusing.
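
A small illustration in Python, where the escaped and unescaped forms behave the same. Other regex dialects may differ, so treat this as a sketch for one engine:

```python
import re

text = "foo-bar"

# Outside a character class, - has no special meaning, so \- is just noise.
print(bool(re.fullmatch(r"foo\-bar", text)))  # True
print(bool(re.fullmatch(r"foo-bar", text)))   # True

# Inside a character class, - only means "range" between two characters.
# Put it first or last and it is a literal hyphen with no escape needed.
print(bool(re.fullmatch(r"[a-z-]+", text)))   # True
print(bool(re.fullmatch(r"[-a-z]+", text)))   # True
print(bool(re.fullmatch(r"[a-z\-]+", text)))  # True, same thing but noisier
```
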
That's all for now.

Yes, I have seen the "don't parse HTML with regexps" thread.
Yes, I have seen the "all valid email addresses" regexp.
Yes, I have seen Russ Cox's articles.

Be kind to your security people.
You can follow @thingskatedid.