DebugPointer

Regex for File Extension validation

A file is a collection of data stored on a computer or other electronic device. It can contain any type of information, such as text, numbers, images, audio, or video. There are many different types of files, including text files, data files, audio files, video files, and more. The type of a file is usually indicated by its file extension, the short code (often three or four characters) that follows the last period in the file name. For example, a file with the extension ".txt" is a text file, while a file with the extension ".mp3" is an audio file. In this article, let's look at how to write a regex that matches file names with specific extensions.

Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.

Structure of file name

  • A complete file name consists of two parts: the name of the file and its extension, separated by a period
  • The file name can contain any character except the following (the characters reserved on Windows): \ / : * ? " < > |
  • The file extension can contain any character except the same set: \ / : * ? " < > |
  • The maximum length of the file extension depends on the operating system
  • The maximum length of the file name depends on the operating system
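
The structure above can be sketched in Python. This is a minimal illustration, not a full validator: the helper names `split_file_name` and `has_forbidden_chars` are made up for this example, and the forbidden set is simply the character list from the bullets above.

```python
from typing import Tuple

# Characters the list above names as not allowed in a file name or extension.
FORBIDDEN_CHARS = set('\\/:*?"<>|')


def split_file_name(file_name: str) -> Tuple[str, str]:
    """Split at the last period and return (name, extension without the dot)."""
    name, dot, extension = file_name.rpartition(".")
    if not dot:
        # No period at all: the whole string is the name, with no extension.
        return file_name, ""
    return name, extension


def has_forbidden_chars(file_name: str) -> bool:
    """True if the file name contains any character from the forbidden list."""
    return any(ch in FORBIDDEN_CHARS for ch in file_name)


print(split_file_name("report.final.pdf"))
print(has_forbidden_chars("bad:name.txt"))
```

Splitting at the last period mirrors what the regexes in the following sections do: only the final extension is considered.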

Regex for checking if File Extension is valid or not

Regular Expression for document file extensions - .doc, .docx, .pdf, .txt, .rtf, .odt, .wps, .wpd, .pages

/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| donald-trump.is.from.usa.pdf | matches |
| 213245 | does not match |
| mydoc.docx | matches |
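
As a rough sketch, the same document pattern can be tried in Python, whose `re` module accepts the scoped `(?i:...)` flag (Python 3.6 or later). The function name `is_document_file` is an assumption made for this example.

```python
import re

# Same pattern as above, without the /.../gm delimiters and flags.
DOCUMENT_RE = re.compile(
    r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)"
)


def is_document_file(file_name: str) -> bool:
    """True if the file name ends with one of the listed document extensions."""
    return DOCUMENT_RE.match(file_name) is not None


# The test strings from the table above.
for name in [
    "something.wrongext",
    "donald-trump.is.from.usa.pdf",
    "213245",
    "mydoc.docx",
]:
    print(name, "->", "matches" if is_document_file(name) else "does not match")
```

Because the pattern is case insensitive, a name such as `REPORT.PDF` is accepted as well.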

Regular Expression for spreadsheet file extensions - .xls, .xlsx, .csv, .ods, .fods, .ots, .gnumeric, .numbers

/(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| finance-report.xlsx | matches |
| 213245 | does not match |
| email-list.csv | matches |

Regular Expression for presentation file extensions - .ppt, .pptx, .pps, .ppsx, .odp, .fodp, .otp, .key

/(?i:^.*\.(ppt|pptx|pps|ppsx|odp|fodp|otp|key)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| finance-report.odp | matches |
| 213245 | does not match |
| pitch-deck.pptx | matches |

Regular Expression for image file extensions - .jpg, .jpeg, .png, .gif, .bmp, .tiff, .psd, .raw, .cr2, .nef, .orf, .sr2

Regex supporting standard image file formats like .jpg, .jpeg, .png, .gif

/(?i:^.*\.(jpg|jpeg|png|gif)$)/gm

Regex supporting all the image formats listed above

/(?i:^.*\.(jpg|jpeg|png|gif|bmp|tiff|psd|raw|cr2|nef|orf|sr2)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| an-image.png | matches |
| 213245 | does not match |
| image-raw-file.psd | matches |
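
Since the "standard" and "all formats" patterns differ only in their extension list, the pattern can be generated from a list instead of being written by hand. The sketch below assumes a hypothetical helper `build_extension_regex`; it uses `re.IGNORECASE` in place of the inline `(?i:...)` group, which has the same effect here.

```python
import re
from typing import Iterable, Pattern


def build_extension_regex(extensions: Iterable[str]) -> Pattern[str]:
    """Compile a case-insensitive 'anything.ext' pattern for the given extensions."""
    # re.escape guards against extensions that contain regex metacharacters;
    # a leading dot in the input (".png") is tolerated and stripped.
    alternatives = "|".join(re.escape(ext.lstrip(".")) for ext in extensions)
    return re.compile(rf"^.*\.({alternatives})$", re.IGNORECASE)


STANDARD_IMAGE_RE = build_extension_regex(["jpg", "jpeg", "png", "gif"])
ALL_IMAGE_RE = build_extension_regex(
    ["jpg", "jpeg", "png", "gif", "bmp", "tiff", "psd", "raw", "cr2", "nef", "orf", "sr2"]
)

print(bool(STANDARD_IMAGE_RE.match("image-raw-file.psd")))
print(bool(ALL_IMAGE_RE.match("image-raw-file.psd")))
```

Note that `image-raw-file.psd` is accepted only by the longer pattern, so the choice of list decides which uploads pass validation.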

Regular Expression for audio file extensions - .mp3, .wav, .wma, .aac, .flac, .ogg, .m4a, .aiff, .alac, .amr, .ape, .au, .mpc, .tta, .wv, .opus

Regex supporting standard audio file formats like .mp3, .wav, .m4a

/(?i:^.*\.(mp3|wav|m4a)$)/gm

Regex supporting all the audio formats listed above

/(?i:^.*\.(mp3|wav|wma|aac|flac|ogg|m4a|aiff|alac|amr|ape|au|mpc|tta|wv|opus)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| my-song.wav | matches |
| 4532.png | does not match |
| audio-raw-file.aiff | matches |

Regular Expression for video file extensions - .mp4, .avi, .wmv, .mov, .flv, .mkv, .webm, .vob, .ogv, .m4v, .3gp, .3g2, .mpeg, .mpg, .m2v, .svi, .3gpp, .3gpp2, .mxf, .roq, .nsv, .f4v, .f4p, .f4a, .f4b

Regex supporting standard video file formats like .mp4, .avi, .wmv, .mov, .flv, .mkv

/(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv)$)/gm

Regex supporting all the video formats listed above

/(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv|webm|vob|ogv|m4v|3gp|3g2|mpeg|mpg|m2v|svi|3gpp|3gpp2|mxf|roq|nsv|f4v|f4p|f4a|f4b)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| my-video.mp4 | matches |
| 4532.png | does not match |
| video-file.webm | matches |

Regular Expression for compressed file extensions - .zip, .rar, .7z, .tar, .gz, .bz2, .xz, .iso, .dmg

Regex supporting standard compressed file formats like .zip, .rar, .7z, .tar

/(?i:^.*\.(zip|rar|7z|tar)$)/gm

Regex supporting all the compressed file formats listed above

/(?i:^.*\.(zip|rar|7z|tar|gz|bz2|xz|iso|dmg)$)/gm

Test string examples for the above regex-

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| backup.zip | matches |
| 4532.png | does not match |
| archive.tar | matches |
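
With one pattern per category, a file name can be classified by trying each pattern in turn. The following is only a sketch: the `CATEGORY_PATTERNS` table and the `classify_file` function are names invented for this example, and the patterns use the "standard" extension lists described in the sections above.

```python
import re
from typing import Optional

# One compiled pattern per category, taken from the sections above.
CATEGORY_PATTERNS = {
    "document": re.compile(r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)"),
    "spreadsheet": re.compile(r"(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)"),
    "presentation": re.compile(r"(?i:^.*\.(ppt|pptx|pps|ppsx|odp|fodp|otp|key)$)"),
    "image": re.compile(r"(?i:^.*\.(jpg|jpeg|png|gif)$)"),
    "audio": re.compile(r"(?i:^.*\.(mp3|wav|m4a)$)"),
    "video": re.compile(r"(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv)$)"),
    "compressed": re.compile(r"(?i:^.*\.(zip|rar|7z|tar)$)"),
}


def classify_file(file_name: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None if none does."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.match(file_name):
            return category
    return None


for name in ["pitch-deck.pptx", "my-song.wav", "backup.ZIP", "something.wrongext"]:
    print(name, "->", classify_file(name))
```

Returning `None` for unknown extensions lets the caller decide whether to reject the file or fall back to some other check.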

Explanation of Regex

Here is a detailed explanation of the document file extension regex-


/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm

Non-capturing Group (?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$). Matches the tokens it contains with the following effective flags: gmi
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
^ asserts position at start of a line
. matches any character (except for line terminators)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\. matches the character . with index 46 in decimal (2E in hexadecimal, 56 in octal) literally
1st Capturing Group (doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)
1st Alternative doc
doc matches the characters doc literally (case insensitive)
2nd Alternative docx
docx matches the characters docx literally (case insensitive)
3rd Alternative pdf
pdf matches the characters pdf literally (case insensitive)
4th Alternative txt
txt matches the characters txt literally (case insensitive)
5th Alternative rtf
rtf matches the characters rtf literally (case insensitive)
6th Alternative odt
odt matches the characters odt literally (case insensitive)
7th Alternative wps
wps matches the characters wps literally (case insensitive)
8th Alternative wpd
wpd matches the characters wpd literally (case insensitive)
9th Alternative pages
pages matches the characters pages literally (case insensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
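
To see the g and m flags and the capturing group in action, here is a small Python sketch. Python has no `g` flag, so this example assumes `finditer` as its equivalent, and passes `re.MULTILINE` for `m`. The sample text is made up for illustration.

```python
import re

# re.MULTILINE plays the role of the m flag: ^ and $ match at every line.
DOCUMENT_LINES_RE = re.compile(
    r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)", re.MULTILINE
)

# One candidate file name per line.
text = "mydoc.docx\nsomething.wrongext\nNOTES.TXT\n213245"

# finditer returns every match (the g flag); group(0) is the whole line,
# group(1) is the 1st capturing group, i.e. the extension that matched.
matches = [(m.group(0), m.group(1)) for m in DOCUMENT_LINES_RE.finditer(text)]
print(matches)
# → [('mydoc.docx', 'docx'), ('NOTES.TXT', 'TXT')]
```

The capturing group returns the extension exactly as written in the input, so `NOTES.TXT` yields `TXT` rather than `txt`.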

I hope this article helped you write regex patterns that match file extensions.