Make Perl regex script faster using grep

On some patterns, Perl’s backtracking implementation of regular expressions performs much worse than grep’s automaton-based implementation of regular expressions (i.e., Thompson NFA).

[Figure: time to match a?^n a^n against a^n]

I needed to write a Perl script to process numerous large log files, and Perl’s regex matching was not fast enough.

In a Unix/Linux environment, you can call the system() subroutine within Perl to execute shell commands like grep. In appropriate situations, it might be more efficient to call grep instead of using Perl to open files, read them line by line, and match each line against a regular expression. However, I needed capture groups to extract substrings from each match, and grep only supports backreferences (such as \1) inside the regular expression itself; it does not expose the captured text for use after the match.
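To make the distinction concrete, here is a minimal sketch in Perl. The log line and the variable names are invented for illustration. It shows text captured for use after the match next to a backreference used inside the pattern.

```perl
use strict;
use warnings;

# Hypothetical log line, invented for illustration.
my $line = 'ERROR 2012-03-04 disk disk full';

# Capture groups: the parenthesized parts stay available after the match.
my ($level, $date);
if ($line =~ m/^(\w+) (\d{4}-\d{2}-\d{2})/) {
    ($level, $date) = ($1, $2);
}

# Backreference: \1 refers to the first group inside the pattern itself,
# here detecting a word that is immediately repeated.
my $has_repeat = $line =~ m/\b(\w+) \1\b/ ? 1 : 0;

print "$level $date $has_repeat\n";
```

The first match is the kind of extraction that needs Perl. The second only answers yes or no, which is all a grep exit status can report.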

I managed to reduce my script execution time for a given data set from 20 seconds to 11 seconds by combining grep and native Perl regular expression matching. I used grep to find the matching lines of the log files, wrote the results to a temp file, and then used Perl regex matching to process only the lines in the temp file.
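The two-step approach can be sketched roughly as follows. This is a variation, not my original script: it reads grep's output through a pipe instead of a temp file, and the sample log contents and the `code=` field are made up so the example is self-contained. It assumes a `grep` that accepts `-E` is on the PATH.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample log so the sketch is self-contained (hypothetical data).
my ($log_fh, $log_path) = tempfile(UNLINK => 1);
print {$log_fh} "INFO start\n",
                "ERROR code=42 disk full\n",
                "INFO heartbeat\n",
                "ERROR code=7 timeout\n";
close $log_fh or die "close failed: $!";

# Step 1: let grep discard non-matching lines. The list form of open
# runs grep directly, without a shell, so no quoting is needed.
open(my $grep, '-|', 'grep', '-E', '^ERROR', $log_path)
    or die "cannot run grep: $!";

# Step 2: use Perl capture groups only on the lines that survived.
my @codes;
while (my $line = <$grep>) {
    push @codes, $1 if $line =~ m/code=(\d+)/;
}
# grep exits with status 1 when nothing matched, so close's result is ignored.
close $grep;

print "codes: @codes\n";
```

Whether this beats a temp file depends on the data. The point is the same either way: grep is started once per data set, and Perl only sees the lines that already matched.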

For another part of my script, I tried using
system("echo '$string' | egrep '$pattern' > /dev/null") == 0
instead of
$string =~ m/$pattern/
within a loop, because I thought that (e)grep would always be faster. However, each system() call starts a shell and a new egrep process, and that per-iteration overhead made my script unbearably slow.
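For matching inside a loop, staying in-process is the safer default. The sketch below uses made-up strings and a made-up pattern, and compiles the pattern once with qr// before the loop. Apart from speed, it also avoids the quoting problem in the system() version, where a single quote inside $string would break the shell command.

```perl
use strict;
use warnings;

# Hypothetical inputs, invented for illustration.
my @strings = ('alpha 1', 'beta', 'gamma 22');
my $pattern = '\d+$';

# Compile once, outside the loop; qr// returns a reusable regex object.
my $re = qr/$pattern/;

my @matched;
for my $string (@strings) {
    # In-process match: no shell and no egrep process per iteration.
    push @matched, $string if $string =~ $re;
}

print scalar(@matched), " of ", scalar(@strings), " matched\n";
```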

In summary, if you need to speed up your Perl script and your Perl script processes text files, calling grep might improve performance. However, calling grep within a loop tends to be inefficient.

