
The Regular Expressions

What is it ?

A regular expression (also written "reg. exp." or "regexp") is a pattern-matching syntax used to define more precise and complex filters.

With a regexp you can filter filenames or domains that would be difficult to match with the usual asterisk '*'.

Note 1 : All of Automatic Save Folder's filters are case insensitive.
Note 2 : In this plugin, a regexp must be enclosed between two slashes / /. All the examples below are written without the / / at the beginning and the end of the filter to keep them readable. When you check the 'regexp' checkbox in the filter window, the / / are added automatically.
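As a rough illustration of the two notes above, the sketch below assumes that the filters behave like JavaScript regular expressions compiled with the case-insensitive flag. The helper name asfRegexp is hypothetical and is not part of the plugin.

```javascript
// Hypothetical helper (not the plugin's code): accepts a filter written
// with or without the surrounding / /, and compiles it as a
// case-insensitive regexp.
// Assumption: the plugin's regexps behave like JavaScript's RegExp.
function asfRegexp(filter) {
  const wrapped =
    filter.length >= 2 && filter.startsWith("/") && filter.endsWith("/");
  const body = wrapped ? filter.slice(1, -1) : filter;
  return new RegExp(body, "i"); // "i" = case insensitive (Note 1)
}

console.log(asfRegexp("/mozilla/").test("MoZiLLa Firefox")); // true
console.log(asfRegexp("mozilla").test("MOZILLA"));           // true
```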



The key words

Just as normal filters use the wildcard character * (asterisk) to stand for an undefined number of characters, regular expressions have many key words, or wildcards, that stand for specific sets of characters.
This page lists some of the most commonly used regexp wildcards.
For detailed information, a full wildcard list, tutorials and examples, you can visit this website.



Special character : .
The dot . matches any single character (a letter, a digit, a space, a symbol, etc.).
mo..lla matches mozilla and any other word starting with 'mo' followed by 2 characters and finishing with 'lla'.


Special character : *
The * repeats the preceding character zero or more times.
mozil*a matches mozia, mozila, mozilla, mozillllllllla, etc.
.* repeats the dot . any number of times, so it matches any sequence of characters, including an empty one.
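The dot and the asterisk can be tried out with a short sketch. It assumes the filters behave like case-insensitive JavaScript regexps. The ^ and $ anchors (start and end of the string) are added only so that the whole word has to match.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const dot = /^mo..lla$/i;
console.log(dot.test("mozilla")); // true
console.log(dot.test("mollla"));  // false, only 1 character between 'mo' and 'lla'

const star = /^mozil*a$/i;
const words = ["mozia", "mozila", "mozilla", "mozillllla", "mozixa"];
// Keeps every word of the list except 'mozixa'.
console.log(words.filter((word) => star.test(word)));

const anything = /^mo.*a$/i;
console.log(anything.test("moa"));     // true, .* may match nothing at all
console.log(anything.test("mozilla")); // true
```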


Special character : ^
^ means the beginning of the string.
^http only matches http at the very beginning of the string. It matches http://test.com but not ftp://http_test.com


Special character : $
$ means the end of the string.
com$ only matches com at the very end of the string. It matches http://test.com but not http://computer.net
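The effect of the two anchors can be checked with a minimal sketch, again assuming the filters behave like case-insensitive JavaScript regexps.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const startsWithHttp = /^http/i;
console.log(startsWithHttp.test("http://test.com"));     // true
console.log(startsWithHttp.test("ftp://http_test.com")); // false

const endsWithCom = /com$/i;
console.log(endsWithCom.test("http://test.com"));     // true
console.log(endsWithCom.test("http://computer.net")); // false

// Without the $ anchor, 'com' is found inside 'computer'.
console.log(/com/i.test("http://computer.net")); // true
```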


Special character : ( )
Parentheses are used to group characters. Their main use is to apply another wildcard to the whole group.
m(oz)*illa matches 'oz' repeated any number of times : milla, mozilla, mozozilla, etc.


Special character : ( ) and |
Parentheses combined with the pipe, as in (aaaa|bbbb|cccc), match any one of the alternatives : either aaaa, or bbbb, or cccc, etc.
m(o|z)illa matches either moilla or mzilla, but not mozilla.
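Grouping and alternatives can be compared side by side in a small sketch. It assumes the filters behave like case-insensitive JavaScript regexps. The ^ and $ anchors are added so that the whole word has to match.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const repeatedGroup = /^m(oz)*illa$/i;
console.log(repeatedGroup.test("milla"));     // true, 'oz' zero times
console.log(repeatedGroup.test("mozozilla")); // true, 'oz' twice
console.log(repeatedGroup.test("mozzilla"));  // false, the second 'z' is not part of 'oz'

const eitherOne = /^m(o|z)illa$/i;
console.log(eitherOne.test("moilla"));  // true
console.log(eitherOne.test("mzilla"));  // true
console.log(eitherOne.test("mozilla")); // false, only one of 'o' or 'z' is allowed
```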


Special character : { }
{x} repeats the previous character exactly x times.
tor{2}ent is equivalent to torrent (r{2} stands for rr), so it matches torrent.

{x,y} repeats the previous character from x to y times.
moz{1,2}illa matches either mozilla or mozzilla (z{1,2} stands for z or zz).


Special character : ?
? makes the previous character (or group) optional. It is the same as writing {0,1}
mpeg video files can end in .mpg or .mpeg, so the regexp would be mpe?g
Nov(ember)? matches Nov or November
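The repetition counts { } and the optional marker ? can be verified with a short sketch, assuming the filters behave like case-insensitive JavaScript regexps. The ^ and $ anchors are added so that the whole word has to match.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const exactly = /^tor{2}ent$/i;
console.log(exactly.test("torrent")); // true
console.log(exactly.test("torent"));  // false, only one 'r'

const between = /^moz{1,2}illa$/i;
console.log(between.test("mozilla"));   // true
console.log(between.test("mozzilla"));  // true
console.log(between.test("mozzzilla")); // false, three 'z'

const optionalLetter = /^mpe?g$/i;
console.log(optionalLetter.test("mpg"));  // true
console.log(optionalLetter.test("mpeg")); // true

const optionalGroup = /^Nov(ember)?$/i;
console.log(optionalGroup.test("Nov"));      // true
console.log(optionalGroup.test("November")); // true
console.log(optionalGroup.test("Novem"));    // false, the group is all or nothing
```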


Special character : [ ]
Brackets [ ] define a set of allowed characters for a single position.
r[aio]se matches the words rase, rise and rose, but not raise.

The minus sign - inside the brackets defines a range of characters :
[a-z] means any character from a to z.
r[a-z]se matches rase, rbse, rcse, ..., rzse.

You can use several ranges in the same bracket, like [a-zA-Z] to match all alphabetic characters, and [a-zA-Z0-9] to match all alphanumeric characters.
[a-cn-o] matches any letter from a to c and from n to o.

If you need to match the minus - character itself, place it at the beginning of the bracket. Placed between two characters, it defines a range.
[-a-z] matches - (minus) and any letter from a to z.


A bracket matches only one character.
To apply it to several characters, combine it with a repetition wildcard :
t[se]*t matches 'tet', 'tst', 'test', 'tset', 'tessest', etc.
t[es]{0,2} matches only these possibilities : 't', 'te', 'ts', 'tee', 'tss', 'tes', 'tse'
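The bracket rules can be tested with a minimal sketch, assuming the filters behave like case-insensitive JavaScript regexps. The ^ and $ anchors are added so that the whole word has to match.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const oneOf = /^r[aio]se$/i;
console.log(["rase", "rise", "rose"].every((w) => oneOf.test(w))); // true
console.log(oneOf.test("raise")); // false, the bracket stands for one character only

// A leading minus is a literal minus, the other minus defines the range a to z.
const minusOrLetters = /^[-a-z]*$/i;
console.log(minusOrLetters.test("save-folder")); // true
console.log(minusOrLetters.test("save_folder")); // false, '_' is not in the bracket

const repeated = /^t[se]*t$/i;
console.log(repeated.test("tessest")); // true

const limited = /^t[es]{0,2}$/i;
const candidates = ["t", "te", "ts", "tee", "tss", "tes", "tse", "tess"];
// Keeps the first seven candidates, 'tess' has three letters after the 't'.
console.log(candidates.filter((w) => limited.test(w)));
```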


Special character : \
\d matches any decimal digit (0 to 9)
\s matches any whitespace character (space, tab, etc.)
\w matches any word character (a letter, a digit or the underscore _)
\b matches a word boundary ( \barc\b = only the word arc, not the arc in parc, nor the arc in arctic)
\uFFFF, where FFFF is a hexadecimal code, matches the Unicode character with that code. Example : \u00E0 matches à
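Each of these backslash key words can be tried with a short sketch. It assumes the filters behave like JavaScript regexps. The sample strings are made up for the illustration.

```javascript
// Assumption: a filter behaves like a JavaScript regexp.
console.log(/\d/.test("part2")); // true, '2' is a digit
console.log(/\d/.test("part"));  // false

console.log(/a\sb/.test("a\tb")); // true, a tab counts as whitespace

console.log(/^\w*$/.test("my_file2")); // true, letters, digits and '_' only
console.log(/^\w*$/.test("my file"));  // false, the space is not a word character

const arcWord = /\barc\b/i;
console.log(arcWord.test("an arc lamp")); // true
console.log(arcWord.test("parc"));        // false
console.log(arcWord.test("arctic"));      // false

// "voil\u00E0" is the string 'voilà' written with an escape sequence.
console.log(/\u00E0/.test("voil\u00E0")); // true
```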

The \ is also an escape character that cancels the special meaning of the wildcard that follows it.
All the wildcards above have a special meaning in a regular expression.
If you need one of them in your filter as part of the searched string (and not as a key word), you need to escape (prefix) that character with a \.
For example, domain.com must be written domain\.com to make the dot a real dot and not an "any character" wildcard, which would otherwise also match domainicom.

\ → \\ . → \. * → \* ? → \? ^ → \^
$ → \$ ( → \( ) → \) { → \{ } → \}
[ → \[ ] → \] / → \/ | → \|
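Escaping by hand is easy to get wrong, so here is a minimal sketch of a helper that prefixes every special character with a \. The name escapeForRegexp is hypothetical and not part of the plugin. The sketch assumes JavaScript regexps, and it also escapes | and +, which are special characters too.

```javascript
// Hypothetical helper (not the plugin's code): prefixes each special
// regexp character with a backslash so it is matched literally.
// Covers \ . * ? ^ $ ( ) { } [ ] / and also | and +
function escapeForRegexp(text) {
  return text.replace(/[\\.*?^$(){}[\]\/|+]/g, "\\$&");
}

console.log(escapeForRegexp("domain.com")); // domain\.com

const exact = new RegExp("^" + escapeForRegexp("domain.com") + "$", "i");
console.log(exact.test("domain.com")); // true
console.log(exact.test("domainicom")); // false, the dot is now a real dot

// Without escaping, the dot is an "any character" wildcard.
console.log(/^domain.com$/i.test("domainicom")); // true
```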


Other Special characters :

There are many more filtering methods and possibilities that cannot be explained here.
You can read this documentation : http://www.regular-expressions.info/tutorial.html
Or this one for beginners : http://www.javascriptkit.com/jsref/regexp.shtml



Some examples


Filter examples on filenames :

To match .rar files and numbered parts such as .r01 (any .rxx with two digits) :
r(ar|\d{2}) means r followed by ar, or r followed by 2 digits

To filter all the archive files :
.*\.(z(ip|\d{2})|r(ar|\d{2})|jar|bz2|gz|tar|rpm|7z)$

To filter all the images :
.*\.(jpe?g|jpe|gif|png|tiff?|bmp|ico)$

! This works only if the image is downloaded via the download window, not via right-click.
This is only given as an example, as right-click downloads are not implemented yet in ASF.

To filter all the videos :
.*\.(mp(eg?|[g4])|rm|avi|mov|mkv|divx|asf|qt|wmv|ram|m1v|m2v|rv|vob|asx|og(g|v)|flv)$
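The three filename filters above can be tried on sample names with the sketch below. It assumes the filters behave like case-insensitive JavaScript regexps. The classify function, the category labels and the filenames are hypothetical and only serve the illustration.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
// The three patterns are copied from the examples above.
const filters = {
  archive: /.*\.(z(ip|\d{2})|r(ar|\d{2})|jar|bz2|gz|tar|rpm|7z)$/i,
  image: /.*\.(jpe?g|jpe|gif|png|tiff?|bmp|ico)$/i,
  video: /.*\.(mp(eg?|[g4])|rm|avi|mov|mkv|divx|asf|qt|wmv|ram|m1v|m2v|rv|vob|asx|og(g|v)|flv)$/i,
};

// Hypothetical helper: returns the label of the first filter that matches.
function classify(filename) {
  for (const [label, regexp] of Object.entries(filters)) {
    if (regexp.test(filename)) return label;
  }
  return "other";
}

console.log(classify("backup.zip")); // archive
console.log(classify("movie.r01"));  // archive
console.log(classify("photo.JPG"));  // image
console.log(classify("clip.mp4"));   // video
console.log(classify("song.mp3"));   // other
console.log(classify("notes.txt"));  // other
```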




Filter examples on domains :

To match any http address ending in .com :
^http.*\.com$

To filter everything using the ftp protocol :
regexp : ^ftp:\/\/.*
non regexp : ftp://*

To match a domain, regardless of whether www is present in the address :
^http:\/\/(|www\.)domain\.com
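The three domain filters above can be checked against sample addresses. The sketch assumes the filters behave like case-insensitive JavaScript regexps. The addresses are made up for the illustration.

```javascript
// Assumption: a filter behaves like a case-insensitive JavaScript regexp.
const httpDotCom = /^http.*\.com$/i;
console.log(httpDotCom.test("http://test.com"));          // true
console.log(httpDotCom.test("http://test.com/file.zip")); // false, does not end with .com
console.log(httpDotCom.test("ftp://test.com"));           // false, does not start with http

const anyFtp = /^ftp:\/\/.*/i;
console.log(anyFtp.test("ftp://files.example.org")); // true
console.log(anyFtp.test("http://ftp.example.org"));  // false

const withOrWithoutWww = /^http:\/\/(|www\.)domain\.com/i;
console.log(withOrWithoutWww.test("http://domain.com"));         // true
console.log(withOrWithoutWww.test("http://www.domain.com"));     // true
console.log(withOrWithoutWww.test("http://the.bad.domain.com")); // false
```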



Differences between Normal and Regexp filters :

filter on a particular domain
regexp : ^http:\/\/(|www\.)domain\.com$
non regexp : http://*domain.com
(but the non regexp filter matches http://domain.com and http://www.domain.com, and also matches http://the.bad.domain.com)
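The difference can be reproduced with a rough sketch. It only approximates a normal filter: it assumes that each * stands for any sequence of characters and that the filter must match the whole address, which may not be exactly how the plugin handles normal filters. The function wildcardToRegexp is hypothetical.

```javascript
// Hypothetical approximation of a normal (non regexp) filter:
// every special character is escaped, then each * becomes .*
// Assumption: the filter has to match the whole string.
function wildcardToRegexp(filter) {
  const escaped = filter.replace(/[\\.*?^$(){}[\]\/|+]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\\\*/g, ".*") + "$", "i");
}

const normalFilter = wildcardToRegexp("http://*domain.com");
const regexpFilter = /^http:\/\/(|www\.)domain\.com$/i;

const addresses = [
  "http://domain.com",
  "http://www.domain.com",
  "http://the.bad.domain.com",
];
for (const address of addresses) {
  console.log(address, normalFilter.test(address), regexpFilter.test(address));
}
// Only the last address differs: the normal filter accepts it,
// the regexp filter rejects it.
```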


to match any domain with 'zilla' in it :
regexp : .*zilla.*
non regexp : zilla



Conclusion

Regular expressions allow more powerful filters, but it can sometimes be much faster to use the simple asterisk wildcard * in a normal filter.
Regular expressions are useful only for complex matching.