"The old one, known as regexp,
is simple and clean, but a bit slow and not POSIX-compliant.
"The one that shipped with 4.4BSD, regex,
is POSIX compliant, but big and ugly and also slow.
"The newest one, reg, currently is found only in the latest
Tcl distribution (version 8.2x)."
The latest version of Henry's code is a complete rewrite to support Unicode and Advanced Regular Expressions (AREs) as defined in Perl. AREs support things like character classes (e.g. \s and [[:space:]] both match whitespace) and non-greedy matching (e.g. "Hello.*?World" applied to "Hello, World, World, World!" matches "Hello, World", not "Hello, World, World, World"). AREs are described in the Regular Expression Syntax man page provided with this distribution, as well as in Mastering Regular Expressions from O'Reilly.
For simplicity, my current port does not support Unicode. Ultimately I'd like A and W versions of all of the regex entry points in the style of Win32, but that hasn't happened yet.
Chris Sells, csells@sellsbrothers.com, http://www.sellsbrothers.com.
See the Regular Expression Syntax man page for a description of Henry's implementation of AREs.
AREs, with their expanded functionality, are about 10x slower than BREs. If you do not need ARE functionality, specify the REG_BASIC flag (described below) to use BREs instead.
To build, open the reg.dsw project and build either the Release or Debug build, which will produce a regex.lib or regexd.lib file respectively. Upon successfully building the library, the post-build step will copy the following files into a peer directory (not a sub-directory) called common:
The file regex_class.h is also found in the common directory. The CRegex class it defines is discussed below.
// stdafx.h
#include <stdio.h>
#include <string.h>
#include <regex.h>

#ifdef _DEBUG
#pragma comment(lib, "regexd.lib")
#else
#pragma comment(lib, "regex.lib")
#endif

// recli.h
#include "stdafx.h"

const char* pszRE = "Hello.*?World";
const char* pszToMatch = "Hello, World, World, World!";

regex_t re = { 0 };
if( regcomp(&re, pszRE, REG_ADVANCED) ) return -1;

regmatch_t rgMatches[11];
int nFailed = regexec(&re, pszToMatch, sizeof(rgMatches)/sizeof(rgMatches[0]), rgMatches, 0);
regfree(&re); // free the compiled regex whether or not the match succeeded
if( nFailed ) return -1;

// Copy the overall match (slot 0) into a bounded, NUL-terminated buffer
char sz[256];
size_t cch = rgMatches[0].rm_eo - rgMatches[0].rm_so;
if( cch >= sizeof(sz) ) cch = sizeof(sz) - 1;
strncpy(sz, pszToMatch + rgMatches[0].rm_so, cch);
sz[cch] = 0;
printf("match: '%s'\n", sz);
A C++ class called CRegex, which wraps Henry's regex_t and regmatch_t structures, is included with this distribution in the regex_class.h file. It's meant to be used like so:
// stdafx.h
#include "regex_class.h" // regex.lib automatically added to linker line in VC6

// recli.h
#include "stdafx.h"

const char* pszRE = "Hello.*?World";
const char* pszToMatch = "Hello, World, World, World!";

CRegex re;
if( !re.Compile(pszRE) ) return -1;
if( !re.Match(pszToMatch) ) return -1;
printf("match: '%s'\n", re[0].c_str()); // re[0] is the overall match
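For readers curious how a wrapper of this kind can be put together, here is a minimal sketch. It is not the CRegex class from regex_class.h: MiniRegex and all of its members are hypothetical, and it is written against the POSIX <regex.h> that ships with most Unix C libraries (hence REG_EXTENDED rather than REG_ADVANCED), so treat it as an outline of the idea only.

```cpp
#include <cassert>
#include <cstddef>
#include <regex.h>   // POSIX regcomp/regexec/regfree (assumed stand-in for the port's header)
#include <string>
#include <vector>

// Hypothetical CRegex-style wrapper: owns a regex_t and the last match results.
class MiniRegex {
public:
    MiniRegex() : compiled_(false) {}
    ~MiniRegex() { Reset(); }
    MiniRegex(const MiniRegex&) = delete;             // regex_t is not copyable
    MiniRegex& operator=(const MiniRegex&) = delete;

    bool Compile(const char* pattern, int cflags = REG_EXTENDED) {
        Reset();
        compiled_ = (regcomp(&re_, pattern, cflags) == 0);
        return compiled_;
    }

    bool Match(const char* text, int eflags = 0) {
        groups_.clear();
        if (!compiled_) return false;
        // Slot 0 is the whole match; re_nsub counts the sub-expressions.
        std::vector<regmatch_t> m(re_.re_nsub + 1);
        if (regexec(&re_, text, m.size(), m.data(), eflags) != 0) return false;
        for (const regmatch_t& g : m) {
            // rm_so is -1 for a sub-expression that did not take part.
            groups_.push_back(g.rm_so < 0 ? std::string()
                                          : std::string(text + g.rm_so, text + g.rm_eo));
        }
        return true;
    }

    std::size_t Count() const { return groups_.size(); }

    const std::string& operator[](std::size_t i) const {
        assert(i < groups_.size());
        return groups_[i];
    }

private:
    void Reset() {
        if (compiled_) { regfree(&re_); compiled_ = false; }
        groups_.clear();
    }

    regex_t re_;
    bool compiled_;
    std::vector<std::string> groups_;
};
```

Copying each sub-match into a std::string costs an allocation per group, but it means the results stay valid after the input buffer goes away, which is the convenience a wrapper like this is usually after.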
As mentioned in Regular Expression Syntax, there are a number of flags you can embed into the regex string itself. If you prefer, you can pass flags separately from the regex string as defined in regex.h, as shown below:
Flag Name | Meaning |
REG_BASIC | Basic Regular Expressions (BREs) |
REG_EXTENDED | Extended Regular Expressions (EREs) |
REG_ADVF | Advanced features in EREs |
REG_ADVANCED | Advanced Regular Expressions (AREs) |
REG_QUOTE | No special characters |
REG_ICASE | Ignore case |
REG_NOSUB | Don't care about sub-expressions |
REG_EXPANDED | Expanded format, white space & comments |
REG_NLSTOP | \n doesn't match . or [^ ] |
REG_NLANCH | ^ matches after \n, $ before |
REG_NEWLINE | Newlines are line terminators |
REG_EXPECT | Report details on partial/limited matches |
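A quick way to get a feel for the compile-time flags is to compile the same pattern with different flag combinations. The sketch below assumes the POSIX <regex.h> from the system C library, which defines only REG_EXTENDED, REG_ICASE, REG_NOSUB and REG_NEWLINE out of the table above. The helper name matches is an invention for this example, and I'd expect the port to behave the same way for these four flags, but that is an assumption I have not verified.

```cpp
#include <cassert>
#include <regex.h>   // POSIX interface, assumed stand-in for the port's regex.h

// Returns true if `pattern`, compiled with `cflags`, matches anywhere in `text`.
bool matches(const char* pattern, const char* text, int cflags) {
    assert(pattern != nullptr && text != nullptr);
    regex_t re;
    if (regcomp(&re, pattern, cflags) != 0) return false;
    // No offsets are requested, so this also works for REG_NOSUB patterns.
    const bool found = (regexec(&re, text, 0, nullptr, 0) == 0);
    regfree(&re);
    return found;
}
```

For example, "hello" only matches "Hello, World" once REG_ICASE is OR-ed in, and "^World" only matches the second line of "Hello\nWorld" once REG_NEWLINE is OR-ed in.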
The following flags are passed to regexec at match time rather than to regcomp:
Flag Name | Meaning |
REG_NOTBOL | Beginning of string (BOS) is not beginning of line (BOL) |
REG_NOTEOL | End of string (EOS) is not end of line (EOL) |
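REG_NOTBOL earns its keep when you scan a string in several regexec calls, because every call after the first starts in the middle of the original text. Here is a rough sketch of that pattern, again assuming the POSIX <regex.h> from the system C library instead of the port; find_all is a hypothetical helper name.

```cpp
#include <cassert>
#include <regex.h>   // POSIX interface, assumed stand-in for the port's regex.h
#include <string>
#include <vector>

// Collects every non-overlapping match of `pattern` in `text`.
std::vector<std::string> find_all(const char* pattern, const char* text) {
    std::vector<std::string> found;
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return found;

    const char* p = text;
    regmatch_t m;
    int eflags = 0;                       // the first call starts at the real BOL
    while (regexec(&re, p, 1, &m, eflags) == 0) {
        assert(m.rm_so >= 0 && m.rm_so <= m.rm_eo);
        found.push_back(std::string(p + m.rm_so, p + m.rm_eo));
        if (m.rm_eo == m.rm_so) {         // empty match: step forward by hand
            if (p[m.rm_eo] == '\0') break;
            p += m.rm_eo + 1;
        } else {
            p += m.rm_eo;
        }
        eflags = REG_NOTBOL;              // p is no longer the start of the string
    }
    regfree(&re);
    return found;
}
```

Without REG_NOTBOL on the follow-up calls, a pattern such as "^a" would match at every restart position in "aaa" instead of only once at the true beginning.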
All modifications to the "reg" library, including extras provided for Win32 and C++, are provided under the same license as the "reg" library itself.