DebugPointer

Regex for GitHub URL

GitHub is a hosting platform for Git repositories. Git is a version control system used for tracking changes in computer files and coordinating work on those files among multiple people. It is primarily used for source code management in software development, but it can keep track of changes in any set of files. In this article, let's look at how to write a regex for a GitHub URL and how that regex matches GitHub URL values.

Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.

Conditions to match a GitHub URL

The value has to satisfy all of the following conditions-

  • It should start with the following protocols - ssh:, git:, git@, http:, https:
  • It should end with .git
  • It should not contain any special characters other than . @ : / - ~ and _ (the characters the regex allows after the protocol)
  • It should not contain any spaces
  • It should have host name and repository name
  • It should have a valid username
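
As a rough illustration, the conditions above that can be tested with plain string operations are sketched here in Python. The helper name `failed_conditions` and its messages are hypothetical, not part of any library, and the host name, repository name, and username conditions are left out because they need more than a prefix or suffix test.

```python
from typing import List

# Prefixes taken from the first condition in the list above.
ALLOWED_PREFIXES = ("ssh:", "git:", "git@", "http:", "https:")


def failed_conditions(url: str) -> List[str]:
    """Return a message for every basic condition the value fails.

    Hypothetical helper for illustration only: it covers the prefix,
    suffix, and whitespace conditions and nothing else.
    """
    failures = []
    lowered = url.lower()
    if not lowered.startswith(ALLOWED_PREFIXES):
        failures.append("does not start with an allowed prefix")
    # Assumption: one trailing slash after .git is tolerated.
    trimmed = lowered[:-1] if lowered.endswith("/") else lowered
    if not trimmed.endswith(".git"):
        failures.append("does not end with .git")
    if any(ch.isspace() for ch in url):
        failures.append("contains whitespace")
    return failures


print(failed_conditions("https://github.com/user/project.git"))
print(failed_conditions("google.com"))
```

An empty list means the value passed these three checks. It does not mean the URL points at a real repository.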

Regex for checking if it's a valid GitHub URL

Regular Expression-

/((git|ssh|http(s)?)|(git@[\w\.]+))(:(\/\/)?)([\w\.@\:\/\-~]+)(\.git)(\/)?/igm

Test string examples for the above regex-

Input String                                  Match Output
git.com/hello                                 does not match
google.com                                    does not match
git@github.com:user/project.git               matches
ssh://user@github.com:port/path/to/repo.git/  matches
http://github.com/user/project.git            matches
https://github.com/user/project.git           matches
git://github.com/path/to/repo.git/            matches
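
The test strings above can be checked with a short script. The following is a minimal sketch that assumes Python's built-in `re` module; the names `GITHUB_URL_RE` and `matches` are illustrative. The pattern is copied unchanged, and the `i` and `m` flags become `re.IGNORECASE` and `re.MULTILINE`.

```python
import re

# The article's pattern as a Python raw string. The g flag has no direct
# equivalent here: re.search() simply reports the first match it finds.
GITHUB_URL_RE = re.compile(
    r"((git|ssh|http(s)?)|(git@[\w\.]+))(:(\/\/)?)([\w\.@\:\/\-~]+)(\.git)(\/)?",
    re.IGNORECASE | re.MULTILINE,
)


def matches(value: str) -> bool:
    """Return True if the pattern is found anywhere in the value."""
    return GITHUB_URL_RE.search(value) is not None


samples = [
    "git.com/hello",
    "google.com",
    "git@github.com:user/project.git",
    "ssh://user@github.com:port/path/to/repo.git/",
    "http://github.com/user/project.git",
    "https://github.com/user/project.git",
    "git://github.com/path/to/repo.git/",
]

for sample in samples:
    print(sample, "->", "matches" if matches(sample) else "does not match")
```

Note that the pattern never mentions github.com, so it accepts Git-style URLs on any host.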

Here is a detailed explanation of the above regex-

/((git|ssh|http(s)?)|(git@[\w\.]+))(:(\/\/)?)([\w\.@\:\/\-~]+)(\.git)(\/)?/igm

1st Capturing Group ((git|ssh|http(s)?)|(git@[\w\.]+))
  1st Alternative (git|ssh|http(s)?)
    2nd Capturing Group (git|ssh|http(s)?)
      1st Alternative git
        git matches the characters git literally (case insensitive)
      2nd Alternative ssh
        ssh matches the characters ssh literally (case insensitive)
      3rd Alternative http(s)?
        http matches the characters http literally (case insensitive)
        3rd Capturing Group (s)?
          ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
          s matches the character s with index 115₁₀ (73₁₆ or 163₈) literally (case insensitive)
  2nd Alternative (git@[\w\.]+)
    4th Capturing Group (git@[\w\.]+)
      git@ matches the characters git@ literally (case insensitive)
      Match a single character present in the list below [\w\.]
        + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
        \w matches any word character (equivalent to [a-zA-Z0-9_])
        \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
5th Capturing Group (:(\/\/)?)
  : matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case insensitive)
  6th Capturing Group (\/\/)?
    ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
    \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
    \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
7th Capturing Group ([\w\.@\:\/\-~]+)
  Match a single character present in the list below [\w\.@\:\/\-~]
    + matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
    \w matches any word character (equivalent to [a-zA-Z0-9_])
    \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
    @ matches the character @ with index 64₁₀ (40₁₆ or 100₈) literally (case insensitive)
    \: matches the character : with index 58₁₀ (3A₁₆ or 72₈) literally (case insensitive)
    \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
    \- matches the character - with index 45₁₀ (2D₁₆ or 55₈) literally (case insensitive)
    ~ matches the character ~ with index 126₁₀ (7E₁₆ or 176₈) literally (case insensitive)
8th Capturing Group (\.git)
  \. matches the character . with index 46₁₀ (2E₁₆ or 56₈) literally (case insensitive)
  git matches the characters git literally (case insensitive)
9th Capturing Group (\/)?
  ? matches the previous token between zero and one times, as many times as possible, giving back as needed (greedy)
  \/ matches the character / with index 47₁₀ (2F₁₆ or 57₈) literally (case insensitive)
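
To see what the capturing groups described above hold for a concrete input, here is a small sketch assuming Python's `re` module. The variable names are illustrative, and the sample URLs come from the test table.

```python
import re

PATTERN = re.compile(
    r"((git|ssh|http(s)?)|(git@[\w\.]+))(:(\/\/)?)([\w\.@\:\/\-~]+)(\.git)(\/)?",
    re.IGNORECASE,
)

https_match = PATTERN.search("https://github.com/user/project.git")
# Groups are numbered 1 to 9 by the position of their opening parenthesis.
https_groups = {n: https_match.group(n) for n in range(1, 10)}
for number, value in https_groups.items():
    print(number, repr(value))

# Group 1 is 'https', group 5 is '://', group 7 is 'github.com/user/project'
# and group 8 is '.git'. Groups that took no part in the match, such as
# group 4 (the git@ alternative) and group 9 (the trailing slash), are None.

ssh_match = PATTERN.search("git@github.com:user/project.git")
print(repr(ssh_match.group(4)))  # 'git@github.com'
print(repr(ssh_match.group(7)))  # 'user/project'
```

Group 7 is greedy, so it first consumes the whole remainder of the URL and then gives back just enough characters for group 8 to match the final .git.
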
Global pattern flags
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
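
The effect of the flags can be sketched in Python, again assuming the built-in `re` module. Here `finditer()` stands in for the `g` flag, and the sample text is made up for illustration. Because the pattern contains neither ^ nor $, the `m` flag changes nothing, and the pattern will also match a URL embedded in a longer string. `fullmatch()` is one way to require that the whole value is a URL.

```python
import re

PATTERN = re.compile(
    r"((git|ssh|http(s)?)|(git@[\w\.]+))(:(\/\/)?)([\w\.@\:\/\-~]+)(\.git)(\/)?",
    re.IGNORECASE | re.MULTILINE,
)

text = (
    "clone HTTPS://GitHub.com/user/project.git\n"
    "or git@github.com:user/project.git"
)

# g flag: collect every match, not only the first one.
# i flag: the upper-case HTTPS scheme is still accepted.
found = [m.group(0) for m in PATTERN.finditer(text)]
print(found)

# No anchors in the pattern: search() succeeds on a substring,
# while fullmatch() requires the entire string to match.
sentence = "see https://github.com/user/project.git now"
found_inside = PATTERN.search(sentence) is not None
whole_sentence = PATTERN.fullmatch(sentence) is not None
whole_url = PATTERN.fullmatch("https://github.com/user/project.git") is not None
print(found_inside, whole_sentence, whole_url)
```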

I hope this article helps you check the validity of a GitHub URL using regex.