
# Regex for File Extension validation

A file is a collection of data stored on a computer or other electronic device. It can hold any type of information, such as text, numbers, images, audio, or video. The type of a file is conventionally indicated by its extension, the short code (often three or four letters) that follows the last period in the file name. For example, a file with the extension ".txt" is a text file, while a file with the extension ".mp3" is an audio file. This article shows how to write a regex that validates a file name by its extension, with ready-made patterns for common file types.

Regex (short for regular expression) is a powerful tool used for searching and manipulating text. It is composed of a sequence of characters that define a search pattern. Regex can be used to find patterns in large amounts of text, validate user input, and manipulate strings. It is widely used in programming languages, text editors, and command line tools.

# Structure of file name

• A complete file name consists of two parts: the name of the file and its extension, separated by a period
• On Windows, the file name can contain any character except the following: \ / : * ? " < > |
• The same restriction applies to the file extension
• The maximum length of the file extension depends on the operating system
• The maximum length of the file name depends on the operating system
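As a minimal sketch of the structure described above, the following Python snippet splits a file name into its name and extension and checks it against the forbidden characters from the list. The helper names `split_file_name` and `has_forbidden_chars` are hypothetical, chosen only for this illustration.

```python
import os
from typing import Tuple

# Characters the list above says may not appear in a name or an extension.
FORBIDDEN_CHARS = set('\\/:*?"<>|')


def split_file_name(file_name: str) -> Tuple[str, str]:
    """Return (name, extension); the extension has no leading dot and is '' if absent."""
    name, extension = os.path.splitext(file_name)
    return name, extension[1:]


def has_forbidden_chars(file_name: str) -> bool:
    """Return True if any forbidden character appears anywhere in the file name."""
    return any(ch in FORBIDDEN_CHARS for ch in file_name)
```

Only the text after the last period counts as the extension, so `donald-trump.is.from.usa.pdf` splits into the name `donald-trump.is.from.usa` and the extension `pdf`.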

# Regex for checking if File Extension is valid or not

## Regular Expression for document file extensions - .doc, .docx, .pdf, .txt, .rtf, .odt, .wps, .wpd, .pages

/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| donald-trump.is.from.usa.pdf | matches |
| 213245 | does not match |
| mydoc.docx | matches |

## Regular Expression for spreadsheet file extensions - .xls, .xlsx, .csv, .ods, .fods, .ots, .gnumeric, .numbers

/(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)/gm


Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| finance-report.xlsx | matches |
| 213245 | does not match |
| email-list.csv | matches |
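The document and spreadsheet patterns can be tried from Python as a quick check. This is a sketch under a few assumptions: the `/.../gm` delimiters are dropped because Python has no such syntax, the scoped `(?i:...)` group requires Python 3.6 or later, and `fullmatch` is used here (a choice of this example, not of the original patterns) so that a single file name is validated as a whole. The function names are hypothetical.

```python
import re

# Same patterns as above, without the /.../gm delimiters.
DOCUMENT_RE = re.compile(r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)")
SPREADSHEET_RE = re.compile(r"(?i:^.*\.(xls|xlsx|csv|ods|fods|ots|gnumeric|numbers)$)")


def is_document(file_name: str) -> bool:
    """Return True if the file name ends in one of the document extensions."""
    return DOCUMENT_RE.fullmatch(file_name) is not None


def is_spreadsheet(file_name: str) -> bool:
    """Return True if the file name ends in one of the spreadsheet extensions."""
    return SPREADSHEET_RE.fullmatch(file_name) is not None


for name in ["something.wrongext", "donald-trump.is.from.usa.pdf", "213245", "mydoc.docx"]:
    print(name, "matches" if is_document(name) else "does not match")
```

The loop reproduces the outcomes listed for the document pattern's test strings.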

## Regular Expression for presentation file extensions - .ppt, .pptx, .pps, .ppsx, .odp, .fodp, .otp, .key

/(?i:^.*\.(ppt|pptx|pps|ppsx|odp|fodp|otp|key)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| finance-report.odp | matches |
| 213245 | does not match |
| pitch-deck.pptx | matches |

## Regular Expression for image file extensions - .jpg, .jpeg, .png, .gif, .bmp, .tiff, .psd, .raw, .cr2, .nef, .orf, .sr2

Regex supporting standard image file formats like .jpg, .jpeg, .png, .gif

/(?i:^.*\.(jpg|jpeg|png|gif)$)/gm


Regex supporting all the image formats listed above

/(?i:^.*\.(jpg|jpeg|png|gif|bmp|tiff|psd|raw|cr2|nef|orf|sr2)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| an-image.png | matches |
| 213245 | does not match |
| image-raw-file.psd | matches |

## Regular Expression for audio file extensions - .mp3, .wav, .wma, .aac, .flac, .ogg, .m4a, .aiff, .alac, .amr, .ape, .au, .mpc, .tta, .wv, .opus

Regex supporting standard audio file formats like .mp3, .wav, .m4a

/(?i:^.*\.(mp3|wav|m4a)$)/gm


Regex supporting all the audio formats listed above

/(?i:^.*\.(mp3|wav|wma|aac|flac|ogg|m4a|aiff|alac|amr|ape|au|mpc|tta|wv|opus)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| my-song.wav | matches |
| 4532.png | does not match |
| audio-raw-file.aiff | matches |

## Regular Expression for video file extensions - .mp4, .avi, .wmv, .mov, .flv, .mkv, .webm, .vob, .ogv, .m4v, .3gp, .3g2, .mpeg, .mpg, .m2v, .svi, .3gpp, .3gpp2, .mxf, .roq, .nsv, .f4v, .f4p, .f4a, .f4b

Regex supporting standard video file formats like .mp4, .mov, .avi, .mkv, .flv

/(?i:^.*\.(mp4|mov|avi|mkv|flv)$)/gm


Regex supporting all the video formats listed above

/(?i:^.*\.(mp4|avi|wmv|mov|flv|mkv|webm|vob|ogv|m4v|3gp|3g2|mpeg|mpg|m2v|svi|3gpp|3gpp2|mxf|roq|nsv|f4v|f4p|f4a|f4b)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| my-video.mp4 | matches |
| 4532.png | does not match |
| video-file.webm | matches |

## Regular Expression for compressed file extensions - .zip, .rar, .7z, .tar, .gz, .bz2, .xz, .iso, .dmg

Regex supporting standard compressed file formats like .zip, .rar, .7z, .tar

/(?i:^.*\.(zip|rar|7z|tar)$)/gm
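Every pattern in this article has the same shape and differs only in its list of extensions, so the pattern can also be generated from a list. The sketch below assumes Python 3.6 or later; `extension_pattern` is a hypothetical helper name, not part of any library.

```python
import re
from typing import Iterable, Pattern


def extension_pattern(extensions: Iterable[str]) -> Pattern[str]:
    """Build a case-insensitive pattern matching names that end in one of the extensions."""
    # Strip an optional leading dot and escape each extension so it is matched literally.
    alternatives = "|".join(re.escape(ext.lstrip(".")) for ext in extensions)
    if not alternatives:
        raise ValueError("at least one extension is required")
    return re.compile(rf"(?i:^.*\.({alternatives})$)")


# The standard compressed formats named above.
ARCHIVE_RE = extension_pattern([".zip", ".rar", ".7z", ".tar"])
print(ARCHIVE_RE.pattern)  # (?i:^.*\.(zip|rar|7z|tar)$)
```

Escaping the extensions matters only if a list ever contains regex metacharacters; for plain alphanumeric extensions the generated pattern is identical to the handwritten one.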


Regex supporting all the compressed file formats listed above

/(?i:^.*\.(zip|rar|7z|tar|gz|bz2|xz|iso|dmg)$)/gm

Test string examples for the above regex:

| Input String | Match Output |
| --- | --- |
| something.wrongext | does not match |
| backup.tar.gz | matches |
| 4532.png | does not match |
| archive.zip | matches |

# Explanation of Regex

Here is a detailed explanation of the document file extension regex:

/(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)/gm

Non-capturing Group (?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$). Matches the tokens contained with the following effective flags: gmi

• i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
• ^ asserts position at start of a line
• . matches any character (except for line terminators)
• * matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
• \. matches the character . with index 46 in decimal (2E in hexadecimal or 56 in octal) literally (case insensitive)
• 1st Capturing Group (doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)
  • 1st Alternative doc: matches the characters doc literally (case insensitive)
  • 2nd Alternative docx: matches the characters docx literally (case insensitive)
  • 3rd Alternative pdf: matches the characters pdf literally (case insensitive)
  • 4th Alternative txt: matches the characters txt literally (case insensitive)
  • 5th Alternative rtf: matches the characters rtf literally (case insensitive)
  • 6th Alternative odt: matches the characters odt literally (case insensitive)
  • 7th Alternative wps: matches the characters wps literally (case insensitive)
  • 8th Alternative wpd: matches the characters wpd literally (case insensitive)
  • 9th Alternative pages: matches the characters pages literally (case insensitive)
• $ asserts position at the end of a line

Global pattern flags

• g modifier: global. All matches (don't return after first match)
• m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
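To see the capturing group and the two flags at work, here is a small Python sketch. It assumes Python 3.6 or later; `re.MULTILINE` stands in for the m flag, and `finditer` plays the role of the g flag because Python has no global flag. The function name `document_extensions` is hypothetical.

```python
import re
from typing import List

DOCUMENT_RE = re.compile(
    r"(?i:^.*\.(doc|docx|pdf|txt|rtf|odt|wps|wpd|pages)$)",
    re.MULTILINE,  # the "m" flag: ^ and $ match at every line boundary
)


def document_extensions(text: str) -> List[str]:
    """Return the captured extension of every line that is a document file name."""
    # finditer keeps scanning after the first match, like the "g" flag.
    return [match.group(1).lower() for match in DOCUMENT_RE.finditer(text)]


listing = "something.wrongext\ndonald-trump.is.from.usa.pdf\n213245\nMyDoc.DOCX"
print(document_extensions(listing))  # ['pdf', 'docx']
```

`group(1)` returns the text matched by the 1st Capturing Group, which is the extension itself; lowercasing it is an extra step of this example so that `DOCX` and `docx` compare equal.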