Make Perl regex script faster using grep

On some patterns, Perl’s backtracking implementation of regular expressions performs much worse than grep’s implementation of regular expressions (i.e., Thompson NFA).

Figure: time to match a?ⁿaⁿ against aⁿ (where a?ⁿ means a? repeated n times)

I needed to write a Perl script to process numerous large log files, and Perl’s regex matching was not fast enough.

In a Unix/Linux environment, you can call the system() function within Perl to execute shell commands like grep. In appropriate situations, it might be more efficient to call grep instead of using Perl to open files, read them line by line, and match each line against a regular expression. However, I needed to extract substrings from each match, and grep supports backreferences only inside the pattern itself; it cannot hand the captured groups back to the script for later use.

I reduced my script’s execution time for a given data set from 20 seconds to 11 seconds by combining grep with native Perl regular expression matching. I used grep to find the matching lines of the log files, wrote the results to a temp file, and then used Perl regex matching to process the lines in the temp file.
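The two-stage idea can be sketched as follows. This is an illustration in Python rather than the Perl script itself, and it makes some assumptions: the function name grep_then_capture and the sample patterns are hypothetical, the grep output is read through a pipe instead of a temp file, and a grep that accepts the -h and -E options is assumed to be on the PATH.

```python
import re
import subprocess


def grep_then_capture(paths, grep_pattern, capture_pattern):
    """Pre-filter files with a single grep call, then extract capture groups.

    grep does the bulk line selection; the in-process regex engine only sees
    the (hopefully much smaller) set of matching lines.
    """
    # -h: do not prefix lines with file names; -E: extended regex syntax;
    # -e: marks the next argument as the pattern; --: end of options.
    result = subprocess.run(
        ["grep", "-h", "-E", "-e", grep_pattern, "--", *paths],
        capture_output=True,
        text=True,
        check=False,
    )
    # grep exits with 0 when lines matched, 1 when none matched, >1 on error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())

    regex = re.compile(capture_pattern)
    captured = []
    for line in result.stdout.splitlines():
        match = regex.search(line)
        if match:
            captured.append(match.groups())
    return captured


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pull user and code fields out of ERROR lines.
    for groups in grep_then_capture(sys.argv[1:], r"^ERROR", r"user=(\w+) code=(\d+)"):
        print(groups)
```

The important property is that grep is started once for the whole batch of files, not once per line, so the cost of launching a process is paid a single time.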

For another part of my script, I tried using
system("echo '$string' | egrep '$pattern' > /dev/null") == 0
instead of
$string =~ m/$pattern/
within a loop, because I thought that (e)grep would always be faster. However, each system() call has to start a shell and an egrep process, and paying that cost on every iteration made my script unbearably slow.

In summary, if you need to speed up your Perl script and your Perl script processes text files, calling grep might improve performance. However, calling grep within a loop tends to be inefficient.

Posted in Development, Programming

The new TTC subway train has shiny things.

On Saturday, I walked from one end of the new TTC subway train to the other.

As I passed by, I grabbed a red handle bar at the top and pulled it down to a vertical position. When I let it go, it moved back to its default horizontal position.

As the train goes south on the Yonge-University-Spadina line, the orange station LEDs turn green as you pass them. The next stop is shown by a flashing LED. If a station connects to another subway line, its LED might not turn green after you pass it.

Subway cars are connected together with shiny plates of metal that you can step on.

Posted in Photos

Authenticity and Job Interviews

In my experience as a job seeker, I performed best in job interviews when I didn’t care about being weird. People sometimes give well-meaning advice about what not to do during interviews, but this primes the interviewee to act as average as possible and to suppress their personality.

For example, a younger peer advised me not to wear my hat, a homburg, to an interview, because it was “weird”. As I approached the interview room, I felt fake for pretending to be a normal, non-hat-wearing person when I would normally wear my hat, and I felt self-conscious and awkward.

When the interviewer asked me, “If I was blind, how would you describe the colour blue to me?”, I was more concerned with not being weird, and giving him the answer that I thought he wanted, than with answering honestly and being myself. The other interviewer thought that it would be an “easy” question for me because of my background in cognitive science and artificial intelligence. While one could argue that the question is actually “very difficult” to answer seriously because of the massive scope of the problem, one could also argue that my mind was simply not tuned to creative thinking that day. (Because I didn’t have my hat.)

Posted in Uncategorized

How to Do Well in School

Taking Tests

When taking tests, answer the easy questions first.

When I was little, I thought that all tests were intelligence tests, and that you had to write your final answer to the current question before proceeding to the next one. I thought that doing questions out of order was “cheating”, because your mark (“IQ”) would improve if you first answered all the questions that you knew. Because of this, I did very poorly on tests; I often got stuck on a question and then ran out of time before completing the test.

Subject tests, even math tests, are not intelligence tests! You can do them out of order. Answer the easy questions first, and put a star on the hard ones that you will have to come back to. This way, you will have time to think about the difficult questions and not worry as much about not having enough time to complete the test.

For Multiple Choice questions, cross out the letters of the choices that you know are wrong.

I used to think that smart people do everything inside their heads, and stupid people need to use mental aids, like making marks on the paper to show the steps in their thinking. I thought that if I depended on a pencil, it meant that I was stupid, and that it would also stunt my short term memory, preventing it from growing to match that of other people.

However, there is nothing stupid about using tools to help you think. If people had never invented writing, we would never have developed mathematics. For multiple-choice tests, visually eliminating the wrong answers sometimes leaves only one correct answer. At other times, you can focus your mental effort on deciding between two possible correct answers, instead of trying to keep track of four or more choices at once.

Posted in Programming

The new TTC subway train is like a spaceship.

The new TTC subway train looks massive on the inside, like a long spaceship.

On September 2, I happily leapt inside the front of the futuristic subway train and when I looked to the back of the train, I couldn’t believe my eyes. The train, and the people inside it, converged to a point. The horizon was made of metal, plastic, and the flesh of Torontonians. I couldn’t even see the end of the train, because it was too far away.

I finally understood that the city of Toronto is huge.

Posted in Photos

Emacs Extension: Binary-search-inspired movement

Moving forward and backward word-by-word is inefficient, so I thought it would be useful to reduce the distance between your cursor and your desired position on the line using something like binary search.

If you load this Emacs extension (in your init file, etc.), then C-M-f will move your cursor forward to the midpoint between the current position and the end of the line, and C-M-b will move it backward to the midpoint between the current position and the beginning of the line. These bindings replace the default commands on those keys (forward-sexp and backward-sexp).

(global-set-key (kbd "C-M-b") 'backward-midpoint)
(global-set-key (kbd "C-M-f") 'forward-midpoint)

(defun backward-midpoint ()
  "Move backward to midpoint between current position and beginning of line."
  (interactive)
  (backward-char (/ (- (point) (line-beginning-position)) 2)))

(defun forward-midpoint ()
  "Move forward to midpoint between current position and end of line."
  (interactive)
  (forward-char (/ (- (line-end-position) (point)) 2)))

This is not the same as binary search, because C-M-f, C-M-f, C-M-b will move the cursor to a position that is before the midpoint of the line, whereas a true binary search would never move back past a position it had already ruled out. However, it has some properties of binary search and it is practically more useful than word-by-word linear search.
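To make the difference concrete, here is a small simulation of the cursor arithmetic. It is a sketch in Python, not part of the extension: columns are plain integers, the 80-column line is an arbitrary example, and the helper names midpoint_moves and bisect_moves are made up for this illustration.

```python
def midpoint_moves(moves, line_start, line_end):
    """Apply the extension's rule: halve the distance to the line edge each time."""
    point = line_start
    for move in moves:
        if move == "f":
            # forward-midpoint: go halfway from point to the end of the line
            point += (line_end - point) // 2
        else:
            # backward-midpoint: go halfway from point to the start of the line
            point -= (point - line_start) // 2
    return point


def bisect_moves(moves, line_start, line_end):
    """True bisection: each move narrows a [low, high] window around the target."""
    low, high = line_start, line_end
    point = line_start
    for move in moves:
        if move == "f":
            low = point   # target is to the right of the current position
        else:
            high = point  # target is to the left of the current position
        point = (low + high) // 2
    return point


if __name__ == "__main__":
    # C-M-f, C-M-f, C-M-b from the start of an 80-column line.
    print(midpoint_moves("ffb", 0, 80))  # 0 -> 40 -> 60 -> 30
    print(bisect_moves("ffb", 0, 80))    # 0 -> 40 -> 60 -> 50
```

The extension's commands keep no memory of earlier moves, so the final backward step halves the distance to the start of the line (60 to 30) instead of halving the remaining window between 40 and 60.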

Posted in Programming

WordPress Plugins: ShushThatNoise and HumansNotBots

I wrote two WordPress Plugins, and both of them are available for download. The first one (ShushThatNoise) is for bloggers who want to reduce comment noise without “censoring”/deleting undesirable comments. It’s a better alternative to disemvoweling, in my opinion.

The second one (HumansNotBots) is the basic version of my email obfuscation method in WordPress Plugin form.

HumansNotBots – Easy, Accessible Email Cloaker

This email cloaking method:

  • is accessible for people browsing with screen readers (e.g., blind people);
  • degrades gracefully for browsers without JavaScript; and
  • works just like a normal, clickable email address for browsers with JavaScript enabled.

Email addresses in the form email AT address DOT com are converted to a clickable version, email@address.com, if JavaScript is enabled. If JavaScript is not enabled (such as for screen readers), then the email address in the form email AT address DOT com is still readable to humans.
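The rewrite rule described above can be sketched in a few lines. The plugin does this in the browser with JavaScript; the Python below only illustrates the text transformation, and the pattern, the function names, and the exact HTML produced are assumptions made for this example, not the plugin's actual code.

```python
import re
from html import escape

# Hypothetical pattern for "local AT domain DOT tld", allowing several " DOT " parts.
# It is deliberately simple and would also match ordinary prose such as
# "look AT this DOT now", so a real implementation would need to be stricter.
CLOAKED = re.compile(r"\b([\w.+-]+) AT ([\w-]+(?: DOT [\w-]+)+)\b")


def _uncloak(match):
    """Turn one cloaked address into a mailto link."""
    local_part, domain = match.group(1), match.group(2)
    address = f"{local_part}@{domain.replace(' DOT ', '.')}"
    return f'<a href="mailto:{escape(address)}">{escape(address)}</a>'


def make_clickable(text):
    """Replace every cloaked address in text; other text is left untouched."""
    return CLOAKED.sub(_uncloak, text)


if __name__ == "__main__":
    print(make_clickable("Write to email AT address DOT com today"))
```

Because the readable form email AT address DOT com is what is stored in the page, a visitor whose browser never runs the conversion still sees an address that a human can interpret.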

Download: HumansNotBots – Easy, Accessible Email Cloaker

Posted in Development, Programming