Some of the most common things people fuck up in regular expressions:
1. Not escaping \. in domain names: /abc.stuff.com/ when you mean /abc\.stuff\.com/
This will accidentally match things like abcxstuffxcom
Seems obvious when you see it here, but this is probably the most common mistake I see in practice.
Test for negative cases as well as positive!
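Here's a rough sketch of what that negative test looks like in Python. The helper name `matches` is made up for illustration, and the patterns are just the ones from the example above.

```python
import re

# The buggy pattern: an unescaped "." matches any character.
unescaped = re.compile(r"abc.stuff.com")
# The intended pattern: "\." matches only a literal dot.
escaped = re.compile(r"abc\.stuff\.com")

def matches(pattern, text):
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

# Positive case: both patterns accept the real domain.
assert matches(unescaped, "abc.stuff.com")
assert matches(escaped, "abc.stuff.com")

# Negative case: only the escaped pattern rejects the lookalike.
assert matches(unescaped, "abcxstuffxcom")
assert not matches(escaped, "abcxstuffxcom")

# If the domain comes from a variable, re.escape does the escaping for you.
built = re.compile(re.escape("abc.stuff.com"))
assert not matches(built, "abcxstuffxcom")
```

A positive-only test suite passes for both patterns, which is why this mistake survives review so often.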
2. Not anchoring domain names
Things can get past your ACL by using a subdomain like http://abc.stuff.com.<their-malicious-domain>.com
Instead you probably want to anchor \.com$ at the end. Maybe at the start, too.
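A minimal sketch of the difference, assuming the ACL checks a bare hostname string. The host `abc.stuff.com.evil.example` and the helper `allowed` are invented for illustration.

```python
import re

# Unanchored: matches if the trusted name appears anywhere in the host.
unanchored = re.compile(r"abc\.stuff\.com")
# Anchored at both ends: the whole host must be exactly the trusted name.
anchored = re.compile(r"\Aabc\.stuff\.com\Z")

def allowed(pattern, host):
    """Hypothetical ACL check: True if the host passes the pattern."""
    return pattern.search(host) is not None

evil_host = "abc.stuff.com.evil.example"

assert allowed(unanchored, evil_host)        # slips past the ACL
assert not allowed(anchored, evil_host)      # rejected
assert allowed(anchored, "abc.stuff.com")    # the real host still passes

# Python-specific caveat: "$" also matches just before a trailing newline,
# so \Z (or re.fullmatch) is the stricter choice there.
assert re.search(r"abc\.stuff\.com$", "abc.stuff.com\n") is not None
assert re.fullmatch(r"abc\.stuff\.com", "abc.stuff.com\n") is None
```

Other regex engines treat `$` differently, so check what yours does before relying on it.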
3. ^abc|xyz|def$ means (^abc)|xyz|(def$)
but you probably think it means ^(abc|xyz|def)$
If you do this in an ACL, you may as well not have an ACL.
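To see the precedence problem concretely, here is a small Python sketch. The test strings are made up; the two patterns are the ones from the tip above, with a non-capturing group in the fixed version.

```python
import re

# What people write: the anchors bind to the first and last alternatives only.
loose = re.compile(r"^abc|xyz|def$")
# What people mean: the anchors apply to the whole alternation.
strict = re.compile(r"^(?:abc|xyz|def)$")

# The loose pattern accepts anything starting with abc, containing xyz,
# or ending with def.
assert loose.search("abc-and-then-anything")
assert loose.search("anything xyz anything")
assert loose.search("anything-and-then-def")

# The strict pattern accepts only the three exact strings.
assert strict.search("abc") and strict.search("xyz") and strict.search("def")
assert not strict.search("abc-and-then-anything")
assert not strict.search("anything xyz anything")
assert not strict.search("anything-and-then-def")
```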


4. Matching more than you want.
Instead of /abc="(.*)"/ you probably mean /abc="([^"]*)"/
Think about reading characters as they come, left to right.
I know you're thinking about the ending quote, because as humans we see "..." and then only look at the ... part after.
Write exactly what you're matching. Don't think of it as "whatever" followed by something specific. Spell out the "whatever" part.
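A quick Python illustration of the over-match, using a made-up input line with two quoted attributes:

```python
import re

# Hypothetical input: two quoted values on one line.
line = 'abc="one" def="two"'

# Greedy ".*" runs to the end of the line, then backs up to the LAST quote.
greedy = re.search(r'abc="(.*)"', line)
# '[^"]*' spells out the "whatever": any run of characters that are not quotes.
precise = re.search(r'abc="([^"]*)"', line)

assert greedy.group(1) == 'one" def="two'
assert precise.group(1) == "one"
```

A non-greedy `.*?` would also give `one` on this input, but the negated character class states the intent directly and can't be pushed past a quote by whatever follows in the pattern.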
5. Escaping things you don't need to escape: \- is common. This is usually harmless, but messy and confusing.
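For what it's worth, here is a sketch of where the hyphen escape is and isn't needed, as Python's `re` module behaves. Other engines may differ on the details.

```python
import re

# Outside a character class, "-" is an ordinary character; the escape is noise.
assert re.fullmatch(r"a\-b", "a-b")
assert re.fullmatch(r"a-b", "a-b")

# Inside a class, a hyphen placed first or last is literal without escaping.
label = re.compile(r"[a-z0-9-]+")
assert label.fullmatch("my-host-1")

# The one place it matters: between two characters in a class it forms a range.
assert re.fullmatch(r"[a-c]", "b")        # range a..c
assert not re.fullmatch(r"[a-c]", "-")    # the hyphen itself is not in the range
assert re.fullmatch(r"[a\-c]", "-")       # escaped: the set is a, "-", c
assert not re.fullmatch(r"[a\-c]", "b")
```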
That's all for now.
Yes, I have seen the "don't parse HTML with regexps" thread.
Yes, I have seen the "all valid email addresses" regexp.
Yes, I have seen Russ Cox's articles.
Be kind to your security people.