
# Regex for URL validation

A URL, or Uniform Resource Locator, is a string of text that specifies the location of a resource on the internet. It is typically used to identify web pages and other resources such as images and videos.

A URL consists of a protocol, a domain name, and sometimes a path to a specific resource. For example, the URL "https://www.example.com/page1.html" uses the HTTPS protocol, points to the domain "www.example.com", and requests the file "page1.html".
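As a quick illustration, the example URL can be split into these parts with the built-in `URL` class, which is available in browsers and in Node.js. This is only a sketch for inspecting the pieces; the variable names are illustrative and it is separate from the regex approach this article describes.

```javascript
// Split the example URL into its parts with the built-in URL class.
const url = new URL("https://www.example.com/page1.html");

const parts = {
  protocol: url.protocol, // includes the trailing colon
  hostname: url.hostname, // the domain name
  pathname: url.pathname, // the path to the specific resource
};

console.log(parts);
```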

Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.
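The three uses mentioned above (finding, validating, and manipulating text) can be sketched in JavaScript as follows. The sample strings and variable names are made up for this illustration.

```javascript
// Hypothetical sample text for this illustration.
const text = "Order 66 shipped on 2024-01-15, order 67 on 2024-02-03.";

// Find: collect every date written as YYYY-MM-DD.
const dates = text.match(/\d{4}-\d{2}-\d{2}/g);

// Validate: check that a whole string consists of digits only.
const isNumeric = /^\d+$/.test("12345");

// Manipulate: mask every run of digits in a string.
const masked = "card 1234 5678".replace(/\d+/g, "****");

console.log(dates, isNumeric, masked);
```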

# Structure of a Website URL

A website URL should have the following structure:

• it starts with a protocol: http, https, or ftp (the regex in the next section also accepts protocol-relative URLs that begin with just //)
• then it has to be followed by ://
• then it may or may not contain www.
• then it must be followed by a domain name
• then it is followed by a top-level domain (TLD) such as .com, .net, or .io
• then it can optionally have a port, a path, query params, or a fragment
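The criteria above can be turned into a pattern piece by piece. The sketch below is a deliberately simplified version written for illustration, assuming ASCII-only domain names; it is not the full regex discussed in this article, and all constant names are hypothetical.

```javascript
// Simplified, illustrative pieces (assumption: ASCII-only host names).
const scheme = "(?:https?|ftp)";                 // protocol
const separator = "://";                         // must follow the protocol
const www = "(?:www\\.)?";                       // optional www.
const domain = "[a-z0-9-]+(?:\\.[a-z0-9-]+)*";   // one or more domain labels
const tld = "\\.[a-z]{2,}";                      // TLD such as .com, .net, .io
const rest = "(?:[/?#]\\S*)?";                   // optional path, query params, or fragment

// Join the pieces and anchor them so the whole string has to match.
const simpleUrlPattern = new RegExp(
  "^" + scheme + separator + www + domain + tld + rest + "$",
  "i"
);

console.log(simpleUrlPattern.test("https://www.example.com/page1.html"));
console.log(simpleUrlPattern.test("debugpointer.com"));
```

Building the pattern from named parts keeps each rule readable on its own, at the cost of covering far fewer cases (IP addresses, credentials, ports, international domains) than a production-grade pattern.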

# Regex for checking if URL is valid or not

Regular expression:

/^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[\/?#]\S*)?$/igm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| .as10 | does not match |
| http://www.google.com | matches |
| #@$some .qwq.eras | does not match |
| https://www.debugpointer.com?name=something | matches |
| debugpointer.com | does not match |
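The test strings above can be checked in JavaScript with a sketch like the following. Two things here are assumptions of the example rather than part of the article: the helper name `isValidUrl` is hypothetical, and the pattern is written with a single backslash in `\u00a1-\uffff` and with only the `i` flag, since a yes/no check on one string does not need `g` or `m`.

```javascript
// The article's pattern as a regex literal, with only the "i" flag
// (assumption of this sketch: one string is validated at a time).
const urlPattern = /^(?:(?:(?:https?|ftp):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[\/?#]\S*)?$/i;

// Hypothetical helper name, chosen for this example.
function isValidUrl(candidate) {
  return urlPattern.test(candidate);
}

// Input strings taken from the test string examples.
const samples = [
  ".as10",
  "http://www.google.com",
  "#@$some .qwq.eras",
  "https://www.debugpointer.com?name=something",
  "debugpointer.com",
];

const results = samples.map((input) => [input, isValidUrl(input)]);
for (const [input, ok] of results) {
  console.log(`${input} -> ${ok ? "matches" : "does not match"}`);
}
```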

Here is a detailed explanation of the above regex:

• ^ asserts position at the start of a line
• Non-capturing group (?:(?:(?:https?|ftp):)?\/\/) matches the optional protocol and the mandatory //
  • Non-capturing group (?:(?:https?|ftp):)?
    • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    • Non-capturing group (?:https?|ftp)
      • 1st Alternative https?
        • http matches the characters http literally (case insensitive)
        • s matches the character s with index 115₁₀ (73₁₆ or 163₈) literally (case insensitive)
        • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
      • 2nd Alternative ftp
        • ftp matches the characters ftp literally (case insensitive)
    • : matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally
  • \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally
  • \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally
• Non-capturing group (?:\S+(?::\S*)?@)? matches optional credentials such as user:password@
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • \S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
  • + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
  • Non-capturing group (?::\S*)?
    • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    • : matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally
    • \S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
    • * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  • @ matches the character @ with index 64₁₀ (40₁₆ or 100₈) literally
• Non-capturing group for the host, which has two alternatives
  • 1st Alternative: an IPv4 address outside the private and reserved ranges
    • Negative Lookahead (?!(?:10|127)(?:\.\d{1,3}){3}) asserts that the regex below does not match, which rejects 10.x.x.x and 127.x.x.x
      • Non-capturing group (?:10|127)
        • 1st Alternative 10 matches the characters 10 literally
        • 2nd Alternative 127 matches the characters 127 literally
      • Non-capturing group (?:\.\d{1,3}){3}
        • {3} matches the previous token exactly 3 times
        • \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally
        • \d matches a digit (equivalent to [0-9])
    • Negative Lookahead (?!(?:169\.254|192\.168)(?:\.\d{1,3}){2}) asserts that the regex below does not match, which rejects 169.254.x.x and 192.168.x.x
      • Non-capturing group (?:169\.254|192\.168)
      • Non-capturing group (?:\.\d{1,3}){2}
    • Negative Lookahead (?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2}) asserts that the regex below does not match, which rejects 172.16.x.x through 172.31.x.x
      • 172 matches the characters 172 literally
      • \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally
      • Non-capturing group (?:1[6-9]|2\d|3[0-1])
      • Non-capturing group (?:\.\d{1,3}){2}
    • Non-capturing group (?:[1-9]\d?|1\d\d|2[01]\d|22[0-3]) matches the first octet, from 1 to 223
      • 1st Alternative [1-9]\d?
      • 2nd Alternative 1\d\d
      • 3rd Alternative 2[01]\d
      • 4th Alternative 22[0-3]
    • Non-capturing group (?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2} matches the second and third octets, from 0 to 255
      • {2} matches the previous token exactly 2 times
      • \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally
      • Non-capturing group (?:1?\d{1,2}|2[0-4]\d|25[0-5])
    • Non-capturing group (?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])) matches the last octet, from 1 to 254
      • \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally
      • Non-capturing group (?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])
  • 2nd Alternative: a domain name
    • Non-capturing group (?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+ matches one or more labels, each followed by a dot
    • Non-capturing group (?:[a-z\u00a1-\uffff]{2,}\.?) matches the TLD, which has at least two letters and an optional trailing dot
• Non-capturing group (?::\d{2,5})? matches an optional port
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • : matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally
  • \d matches a digit (equivalent to [0-9])
• Non-capturing group (?:[\/?#]\S*)? matches an optional path, query string, or fragment
  • ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  • [\/?#] matches a single character present in the list
  • \S matches any non-whitespace character (equivalent to [^\r\n\t\f\v ])
• $ asserts position at the end of a line

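To see what the three negative lookaheads and the octet ranges do, the IPv4 alternative of the host group can be lifted out and tried on bare addresses. The sketch below assumes the sub-pattern is anchored with ^ and $ on its own; the constant names and the sample addresses are chosen for this illustration.

```javascript
// The IPv4 alternative of the host group, anchored with ^ and $ so it can be
// tested on bare addresses (hypothetical constant name).
const publicIpv4Pattern = /^(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))$/;

// Sample addresses picked for this sketch.
const addresses = [
  "8.8.8.8",       // ordinary public address
  "10.0.0.1",      // blocked by the first lookahead
  "127.0.0.1",     // blocked by the first lookahead
  "169.254.10.20", // blocked by the second lookahead
  "192.168.1.1",   // blocked by the second lookahead
  "172.16.0.1",    // blocked by the third lookahead
  "172.32.0.1",    // just outside the 172.16 to 172.31 range
  "224.0.0.1",     // first octet above 223
  "8.8.8.255",     // last octet above 254
];

const ipResults = Object.fromEntries(
  addresses.map((ip) => [ip, publicIpv4Pattern.test(ip)])
);
console.log(ipResults);
```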
Global pattern flags:

• i modifier: insensitive. Case-insensitive match (ignores case of [a-zA-Z])
• g modifier: global. All matches (don't return after first match)
• m modifier: multi line. Causes ^ and $ to match the beginning/end of each line (not only the beginning/end of the string)
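The effect of each flag can be observed with a small JavaScript experiment. To keep it readable, the sketch uses a short stand-in pattern rather than the full regex; that pattern and all variable names are assumptions made for this illustration.

```javascript
// Three lines of text; the last one uses an upper-case scheme.
const text = [
  "http://www.google.com",
  "debugpointer.com",
  "HTTPS://www.debugpointer.com?name=something",
].join("\n");

// g + m + i: ^ and $ work per line, and match() returns every matching line.
const perLine = text.match(/^https?:\/\/\S+$/gim);

// Without m, ^ and $ only match at the start and end of the whole string.
const wholeString = /^https?:\/\/\S+$/i.test(text);

// Without i, the upper-case scheme is rejected.
const caseSensitive = /^https?:\/\/\S+$/.test("HTTPS://www.debugpointer.com");

// With g, test() is stateful: a successful match moves lastIndex to the end
// of the match, so the next call on the same regex starts there and fails.
const stateful = /^https?:\/\/\S+$/gi;
const firstCall = stateful.test("http://www.google.com");
const secondCall = stateful.test("http://www.google.com");

console.log(perLine, wholeString, caseSensitive, firstCall, secondCall);
```

Because of the stateful behaviour of `g`, a regex object that is reused with `test()` to validate one string at a time is usually safer without that flag.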


Hope this article was useful for checking whether a string is a valid URL.