DebugPointer

Regex for GSTIN validation

GSTIN stands for Goods and Services Tax Identification Number, used in India. It is the standard registration number assigned to a person or business registered under the Goods and Services Tax (GST). Registration becomes necessary once turnover crosses a threshold. In this article, let's see how to build a regex for GSTIN and how to match a GSTIN against it.

Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.

Structure of GSTIN

  • It should be 15 characters long.
  • The first 2 characters should be digits (the state code).
  • The next 10 characters should be the PAN (Permanent Account Number) of the taxpayer: 5 uppercase letters, 4 digits, then 1 uppercase letter.
  • The 13th character (entity code) should be a digit from 1-9 or an uppercase letter.
  • The 14th character should be Z.
  • The 15th character should be an uppercase letter or a digit.
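
The positions listed above can be illustrated with a small Python sketch that slices a 15-character string into its parts. The names `GstinParts` and `split_gstin` are hypothetical, chosen only for this example, and the function checks nothing beyond the length.

```python
from typing import NamedTuple


class GstinParts(NamedTuple):
    """Hypothetical container for the pieces of a GSTIN."""
    leading_digits: str  # characters 1-2
    pan: str             # characters 3-12
    entity_code: str     # character 13
    fixed_z: str         # character 14
    last_char: str       # character 15


def split_gstin(gstin: str) -> GstinParts:
    """Slice a 15-character string by position; no other validation is done."""
    if len(gstin) != 15:
        raise ValueError("a GSTIN must be exactly 15 characters long")
    return GstinParts(gstin[0:2], gstin[2:12], gstin[12], gstin[13], gstin[14])


parts = split_gstin("06AADCV1460H1ZI")
print(parts.pan)  # AADCV1460H
```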

Regex for checking if GSTIN is valid

Regular expression:

/^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z]{1}[1-9A-Z]{1}Z[0-9A-Z]{1}$/gm
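
As a minimal sketch, here is one way the pattern above might be used from Python. The surrounding slashes and the `gm` flags belong to the regex-literal notation and are dropped here; `re.fullmatch` takes the place of the `^` and `$` anchors for a single value. The helper name `is_valid_gstin` is an assumption made for this example.

```python
import re

# The character classes are copied from the pattern above; the redundant {1}
# quantifiers are omitted, which does not change what the pattern matches.
GSTIN_PATTERN = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def is_valid_gstin(value: str) -> bool:
    """Return True if the whole string has the GSTIN shape described above."""
    # fullmatch succeeds only when the entire string matches the pattern.
    return GSTIN_PATTERN.fullmatch(value) is not None


print(is_valid_gstin("13BZWCV3512J1ZB"))  # True
print(is_valid_gstin("13bzwcv3512j1zb"))  # False, lowercase is not accepted
```

This checks the format only; it does not tell you whether the number has actually been issued.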

Test string examples for the above regex:

Input String       Match Output
06AAD2V1160H122    does not match
13BZWCV3512J1ZB    matches
222222222222222    does not match
06AADCV1460H1ZI    matches
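
The four examples above can be checked mechanically. The following Python sketch pairs each input with its expected outcome and compares that against the regex; the names `CASES` and `results` are illustrative only.

```python
import re

# Same pattern as in the article, without the regex-literal delimiters and flags.
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")

# (input string, expected to match?) pairs taken from the examples above.
CASES = [
    ("06AAD2V1160H122", False),  # digit where a PAN letter is expected, and no Z
    ("13BZWCV3512J1ZB", True),
    ("222222222222222", False),  # digits only
    ("06AADCV1460H1ZI", True),
]

results = {text: GSTIN_RE.fullmatch(text) is not None for text, _ in CASES}

for text, expected in CASES:
    outcome = "matches" if results[text] else "does not match"
    print(f"{text}  {outcome}  (as expected: {results[text] == expected})")
```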

Here is a detailed explanation of the above regex:

/^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z]{1}[1-9A-Z]{1}Z[0-9A-Z]{1}$/gm

^ asserts position at start of a line
Match a single character present in the list below [0-9]
{2} matches the previous token exactly 2 times
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
Match a single character present in the list below [A-Z]
{5} matches the previous token exactly 5 times
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [0-9]
{4} matches the previous token exactly 4 times
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
Match a single character present in the list below [A-Z]
{1} matches the previous token exactly one time (meaningless quantifier)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Match a single character present in the list below [1-9A-Z]
{1} matches the previous token exactly one time (meaningless quantifier)
1-9 matches a single character in the range between 1 (index 49) and 9 (index 57) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
Z matches the character Z with index 90 in decimal (5A in hexadecimal, 132 in octal) literally (case sensitive)
Match a single character present in the list below [0-9A-Z]
{1} matches the previous token exactly one time (meaningless quantifier)
0-9 matches a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
A-Z matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
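
To see what the two flags do in practice, here is a rough Python equivalent. Python has no `g` flag; `findall` returns all matches, which plays the same role, and `re.MULTILINE` corresponds to `m`. The sample text and variable names are made up for this sketch.

```python
import re

PATTERN = r"^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$"

# Three candidate values, one per line.
text = "13BZWCV3512J1ZB\n222222222222222\n06AADCV1460H1ZI"

# With MULTILINE, ^ and $ anchor at every line, so each line is tested separately.
per_line = re.findall(PATTERN, text, flags=re.MULTILINE)

# Without it, ^ and $ anchor only at the start and end of the whole string,
# so a multi-line input yields no match at all.
whole_string = re.findall(PATTERN, text)

print(per_line)      # ['13BZWCV3512J1ZB', '06AADCV1460H1ZI']
print(whole_string)  # []
```

When validating a single input field rather than scanning a block of text, neither flag is needed.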

I hope this article helps you validate GSTINs with a regex pattern.