Introduction to Clojure Regular Expressions

If you need to validate user input, find and replace words or phrases in files, or extract information from documents, then you want to use regular expressions. Regular expressions, also referred to as regex, are the go-to tool for programmers for any kind of find, find-and-replace, or data extraction involving strings. I’ve worked with regular expressions for decades, and I’ve never run into a matching problem where regular expressions couldn’t come to the rescue.

Clojure regular expressions are built on Java’s regular expression API. Under the hood, Clojure uses Java’s Pattern and Matcher classes (from java.util.regex) extensively. However, you don’t need to learn Java to use Clojure’s regular expressions. This tutorial teaches the basics of using regex in Clojure.

To start with, you need to understand the #"pattern" reader macro that Clojure provides. #"pattern" compiles whatever text you put between the quotes into a regular expression pattern (a java.util.regex.Pattern object). If Clojure is the first language you’ve used regex with, never fear. Matching patterns are easy to learn.
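As a quick check, the sketch below (paste it into a REPL) shows that a #"..." literal evaluates to a java.util.regex.Pattern, and that clojure.core’s re-pattern builds the same kind of object from an ordinary string, which is handy when the pattern is only known at runtime. The names waldo-literal and waldo-built are just illustrative.

```clojure
;; A #"..." literal is read as a compiled java.util.regex.Pattern.
(def waldo-literal #"Waldo")

;; re-pattern compiles an ordinary string into a Pattern at runtime.
(def waldo-built (re-pattern "Waldo"))

(println (class waldo-literal))   ; prints java.util.regex.Pattern
(println (class waldo-built))     ; prints java.util.regex.Pattern

;; Pattern objects don't define value equality, so compare their source text.
(println (= (str waldo-literal) (str waldo-built)))  ; prints true

;; Both patterns behave the same way with re-find.
(println (re-find waldo-literal "People here include Waldo!"))  ; prints Waldo
(println (re-find waldo-built "People here include Waldo!"))    ; prints Waldo
```

Comparing with (= waldo-literal waldo-built) would return false even though the two patterns are identical, which is why the sketch compares the strings instead.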

When using regular expressions, you need a string to search, and a pattern to search for. In its simplest form, a pattern is just a word like you’d use when using the find feature in your favorite browser or word processor.

(defn my-regex
  "just doing Clojure regex"
  []
  (let 
    [result 
     (re-find 
       #"Waldo" 
       "People here include Waldo!")]
    (println "Who did you find? ... " result)))


(my-regex)

Here is the resulting printed line from the Clojure code above.

Who did you find? ...  Waldo

We’re looking to find Waldo, and we found him!

All the Clojure program above does is define a function called “my-regex” that calls re-find, a function from Clojure’s core library. “re-find” stands for “regular expression find.” re-find takes a pattern and a string and searches the string for the pattern. If the pattern matches something in the string, re-find returns the first match it finds. Finally, the program prints a line of text telling us the result of the match.
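To see that re-find really stops at the first match, the sketch below searches a string that contains “Waldo” twice. re-find hands back a single string, not a list of matches. The helper first-match-start is hypothetical (it is not part of clojure.core); it reads the position of the first match from the java.util.regex.Matcher that clojure.core’s re-matcher returns.

```clojure
;; A string with two matches for the same pattern.
(def crowd "Look, Waldo! And over there, another Waldo.")

;; re-find returns only the text of the first match.
(println (re-find #"Waldo" crowd))  ; prints Waldo

(defn first-match-start
  "Hypothetical helper: returns the 0-based index where the first match of
  re begins in s, or nil when there is no match."
  [re s]
  (let [^java.util.regex.Matcher m (re-matcher re s)]
    (when (.find m)
      (.start m))))

(println (first-match-start #"Waldo" crowd))     ; prints 6, the first Waldo
(println (first-match-start #"Waldo" "No one"))  ; prints nil
```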

Our pattern was just the word “Waldo”. Finding a word in a long string is the most basic use of regular expressions, so we’ll look at patterns again in a moment.

If you’ve ever built a form for users to fill in and submit, or needed to find a specific word or phrase in a big collection of files, you can probably see how useful regular expressions are by now.

Back to the re-find function. When a match is found, re-find returns the first match of the pattern in the provided string. What about when no match is found? In that case, re-find returns nil. So re-find returns either a string matching the pattern or nil (or a vector, if the pattern contains capturing groups). This matters because in Clojure only nil and false are logically false, and everything else is logically true. That means you can use the return value of re-find directly as a true-or-false test.

Here’s an example.

(defn my-regex
  "just doing Clojure regex"
  []
  (let 
    [result 
     (re-find 
       #"Waldo" 
       "No one here but us chickens!")]
    (if result
      (println "Who did you find? ... " result)
      (println "No luck. Try again."))))


(my-regex)

We check the result with an if expression in the code above. This time the result is nil, which counts as false, so the Clojure code prints the following message.

No luck. Try again.
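Because re-find returns either a string (logically true) or nil (logically false), you can also pass it straight to functions that expect a test, such as filter. The sentences below are made up for illustration; boolean turns the result into a real true or false when you need one.

```clojure
;; Made-up sample sentences to search.
(def sentences
  ["People here include Waldo!"
   "No one here but us chickens!"
   "Waldo is hiding again."])

;; Keep only the sentences where re-find returns a match (not nil).
(println (filter #(re-find #"Waldo" %) sentences))
;; prints (People here include Waldo! Waldo is hiding again.)

;; Convert the match result into an actual boolean.
(println (boolean (re-find #"Waldo" "No one here but us chickens!")))  ; prints false
(println (boolean (re-find #"Waldo" "Waldo is hiding again.")))        ; prints true
```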

So that is an example of doing a find on a string in Clojure. Now let’s take a closer look at writing patterns with the #"pattern" macro.

What if you want to look for any one of several words? Any of the words will do, but at least one of them needs to be there. In that case, write a pattern that lists the words separated by the pipe character, which looks like a vertical line: |. Don’t put spaces around the pipe, because any space becomes part of the word you’re searching for.

Here’s an example:

(defn my-regex
  "just doing Clojure regex"
  []
  (let 
    [result 
     (re-find 
       #"Waldo|Bob" 
       "Walter and Bob are here.")]
    (println "Who did you find? ... " result)))


(my-regex)

As you can see, we look for Bob or Waldo, and happen to find Bob.
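Two details of the pipe are worth checking in a REPL. First, re-find scans the string from left to right, so it returns whichever alternative appears earliest in the string, not whichever comes first in the pattern. Second, spaces around the pipe really are part of the pattern, which is why the text above says to leave them out. The sample strings here are made up for illustration.

```clojure
;; The earliest match in the string wins, regardless of the order in the pattern.
(println (re-find #"Waldo|Bob" "Bob and Waldo are here."))  ; prints Bob

;; Spaces are not ignored: this pattern means "Waldo followed by a space"
;; OR "a space followed by Bob". pr-str shows the leading space in the match.
(println (pr-str (re-find #"Waldo | Bob" "Walter and Bob are here.")))  ; prints " Bob"

;; Here Bob starts the string, so there is no space before it and nothing matches.
(println (re-find #"Waldo | Bob" "Bob is here."))  ; prints nil
```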

This simple search is easily thrown off by missing or mixed-up capitalization. It won’t find “bob”, only “Bob”.
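Here is that limitation in a quick REPL sketch, using the same pattern as the previous example against a few made-up spellings of the same sentence. Without any flags, every character in the pattern must match the case used in the string exactly.

```clojure
;; Default matching is case sensitive.
(println (re-find #"Waldo|Bob" "Walter and Bob are here."))  ; prints Bob
(println (re-find #"Waldo|Bob" "Walter and bob are here."))  ; prints nil
(println (re-find #"Waldo|Bob" "Walter and BOB are here."))  ; prints nil
```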

It’s easy enough to make the search case insensitive by adding the (?i) flag to your pattern. The flag goes at the beginning of the pattern, as in #"(?i)pattern". There are quite a few other flags you can add to your Clojure regular expressions as well.

Here’s an example of case insensitive searching in Clojure with regex.

(defn my-regex
  "just doing Clojure regex"
  []
  (let 
    [result 
     (re-find 
       #"(?i)waldo|bob" 
       "Walter and BOB are here.")]
    (println "Who did you find? ... " result)))


(my-regex)
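Running the example above prints Who did you find? ...  BOB. Notice that the match comes back as BOB, in the casing used by the searched string, not the lowercase casing of the pattern. The short sketch below checks that, and also shows that a (?i) flag at the front of the pattern applies to the alternatives on both sides of the pipe. The sample strings are made up for illustration.

```clojure
;; (?i) at the start applies to the whole pattern, both waldo and bob.
;; The returned match keeps the casing found in the searched string.
(println (re-find #"(?i)waldo|bob" "Walter and BOB are here."))  ; prints BOB
(println (re-find #"(?i)waldo|bob" "Look, WALDO!"))              ; prints WALDO

;; Without the flag, the same lowercase pattern finds nothing.
(println (re-find #"waldo|bob" "Walter and BOB are here."))      ; prints nil
```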

We’ve barely scratched the surface of Clojure regular expression patterns. Check my next tutorial for more on Clojure regular expression patterns.