Clojure Regex Patterns

Note: If you’re new to regex, check out my Clojure regex introduction.

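Entire books have been written on regular expression patterns. Regular expressions were first described by the mathematician Stephen Kleene in the 1950s, and Ken Thompson later built them into Unix text tools such as ed and grep. Larry Wall, the creator of the Perl programming language, made them a core part of Perl. Perl's extended syntax became the model for the regex flavors of most modern languages, including Java. Wall famously counts laziness among a programmer's virtues, and Perl itself grew out of his wish to make report processing and other text manipulation easier. Together, these tools revolutionized finding and replacing text across the entire realm of computer programming and word processing.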

Clojure, like most languages worth learning, supports a full suite of regex patterns. In my previous tutorial, I explained the basics of finding words in a string using regular expressions and the Clojure language. However, regex doesn't show its real power until you know only roughly what you are looking for, such as "some digits" or "a word that starts with a capital letter." That's when regex shines brightest.

Under the hood, Clojure uses Java's java.util.regex.Pattern class. The #"pattern" reader macro creates a compiled Pattern object when your code is read. This tutorial focuses on common patterns used in Clojure.
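The following REPL sketch suggests that a regex literal is just an ordinary Java Pattern. It also uses re-pattern, Clojure's core function for building a Pattern from a string at runtime. The var names digits and digits-from-string are only for illustration.

```clojure
;; The #"..." literal produces a compiled java.util.regex.Pattern
(def digits #"\d+")

(class digits)
;; => java.util.regex.Pattern

;; re-pattern builds the same kind of object from an ordinary string.
;; Inside a normal string the backslash must be doubled: "\\d+" is the text \d+
(def digits-from-string (re-pattern "\\d+"))

(class digits-from-string)
;; => java.util.regex.Pattern

;; Pattern objects don't define value equality, so two patterns with the
;; same source are not =. Compare their source text with str instead.
(= digits digits-from-string)             ;; => false
(= (str digits) (str digits-from-string)) ;; => true
```

The literal form is usually preferred because it needs no double escaping. re-pattern is handy when the pattern is assembled from data at runtime.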

#"pattern" can take a single word, or multiple words separated by a pipe (|) if you need to find any one of a list of words. See my previous tutorial for an example of using the pipe in regex patterns.
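As a quick reminder of how the pipe behaves, here is a small sketch using Clojure's re-find and re-seq. The word list is made up for the example.

```clojure
;; The pipe lets one pattern match any word from a list
(def pets #"cat|dog|bird")

;; re-find returns the first match
(re-find pets "My neighbor has a dog.")
;; => "dog"

;; re-seq returns every match, from left to right
(re-seq pets "The cat chased the bird, then the dog chased the cat.")
;; => ("cat" "bird" "dog" "cat")

;; When nothing matches, re-find returns nil
(re-find pets "I have a goldfish.")
;; => nil
```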

First, let's introduce the basic patterns you should know. The first three, "?", "*", and "+", are quantifiers. They control how many times whatever comes just before them must match. The last one, ".", is a wildcard: it matches any character without caring which one it is. There are other quantifiers (such as {n,m}), but these are the most common.

? Match the preceding item zero or one time
* Match the preceding item zero or more times
+ Match the preceding item one or more times
. Match any single character (except a newline, by default)
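A small sketch of each of these in action follows. It uses re-matches, which only succeeds when the pattern matches the entire string, so it is easy to see exactly what each quantifier allows. The sample words are arbitrary.

```clojure
;; ? : the "u" is optional
(re-matches #"colou?r" "color")   ;; => "color"
(re-matches #"colou?r" "colour")  ;; => "colour"

;; * : zero or more "o"s
(re-matches #"go*gle" "ggle")     ;; => "ggle"
(re-matches #"go*gle" "gooogle")  ;; => "gooogle"

;; + : one or more "o"s, so zero is not enough
(re-matches #"go+gle" "ggle")     ;; => nil
(re-matches #"go+gle" "google")   ;; => "google"

;; . : any single character, but exactly one
(re-matches #"c.t" "cat")         ;; => "cat"
(re-matches #"c.t" "cart")        ;; => nil

;; . does not match a newline unless the (?s) flag is set
(re-matches #"a.b" "a\nb")        ;; => nil
(re-matches #"(?s)a.b" "a\nb")    ;; => "a\nb"
```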

The following match specific kinds of characters, such as digits, letters, or whitespace.

\d Match any single digit, that is 0-9
\D Match any single character that isn't a digit
\s Match a whitespace character, such as a space, tab, or newline
\S Match any character that is not whitespace
\w Match a word character: a-z, A-Z, 0-9, and the underscore (_)
\W Match any character that is not a word character
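Running each character class against the same string with re-seq makes the differences visible. Notice in particular that \w keeps ships_to together, because the underscore counts as a word character. The sample string is made up for this sketch.

```clojure
(def sample "Order 66 ships_to: Room 101")

(re-seq #"\d+" sample)        ;; => ("66" "101")
(re-seq #"\D+" sample)        ;; => ("Order " " ships_to: Room ")
(re-seq #"\w+" sample)        ;; => ("Order" "66" "ships_to" "Room" "101")
(re-seq #"\W" sample)         ;; => (" " " " ":" " " " ")
(re-seq #"\S+" sample)        ;; => ("Order" "66" "ships_to:" "Room" "101")
(count (re-seq #"\s" sample)) ;; => 4
```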

A few special patterns you should know are as follows.

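[] Match exactly one of the characters listed inside the brackets
| Match either the pattern before it or the pattern after it
^ Match the beginning of the input (or of each line, with the (?m) flag)
$ Match the end of the input (or of each line, with the (?m) flag)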
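The sketch below tries each special pattern in a REPL. The last two lines show that, in Java's regex engine (and therefore Clojure's), ^ only matches at the start of the whole string unless the (?m) multiline flag is turned on. The sample strings are made up for illustration.

```clojure
;; [] : exactly one character from the set, so "bog" is skipped
(re-seq #"b[aeiu]g" "bag beg big bog bug")  ;; => ("bag" "beg" "big" "bug")

;; | : whichever side matches first
(re-find #"Mon|Tues" "Due Tuesday")         ;; => "Tues"

;; ^ and $ anchor to the start and end of the input
(re-find #"^Hello" "Hello, world")          ;; => "Hello"
(re-find #"^world" "Hello, world")          ;; => nil
(re-find #"world$" "Hello, world")          ;; => "world"

;; With (?m), ^ also matches right after each newline
(def text "first line\nsecond line")
(re-seq #"^\w+" text)                       ;; => ("first")
(re-seq #"(?m)^\w+" text)                   ;; => ("first" "second")
```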

These patterns are made to be mixed and matched. Here are a few examples.

A common use is checking a response from a user, such as whether they answered "yes".

#"[Yy]es"

That pattern looks for the word “yes” with either a capital or lowercase “Y”.
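One thing to watch for: re-find looks for the pattern anywhere in the string, so #"[Yy]es" also matches the start of "yesterday". The sketch below compares re-find with re-matches, which requires the whole string to match. The yes? helper is hypothetical, written just for this example, and it uses clojure.string/trim to ignore stray spaces.

```clojure
(require '[clojure.string :as str])

(re-find #"[Yy]es" "Yes, please")   ;; => "Yes"
(re-find #"[Yy]es" "yesterday")     ;; => "yes" (probably not what you wanted)
(re-matches #"[Yy]es" "yesterday")  ;; => nil
(re-matches #"[Yy]es" "yes")        ;; => "yes"

;; Hypothetical helper: true only when the whole trimmed answer is "yes" or "Yes"
(defn yes? [answer]
  (boolean (re-matches #"[Yy]es" (str/trim answer))))

(yes? " Yes ")      ;; => true
(yes? "yesterday")  ;; => false
```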

If you are looking for a number, such as an age, you might do the following.

#"\d+"

The "\d" matches a single digit. The "+" means match one or more of whatever came just before it, taking as many as possible. So "\d+" matches a whole run of digits, such as "23".
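To see that "+" is greedy, compare \d with \d+ on a string that contains a multi-digit number. This is a small sketch with a made-up string; re-seq collects every run of digits rather than just the first.

```clojure
;; \d alone matches a single digit
(re-find #"\d" "Room 2048, floor 3")   ;; => "2"

;; \d+ grabs the whole run of digits
(re-find #"\d+" "Room 2048, floor 3")  ;; => "2048"

;; re-seq returns every run of digits
(re-seq #"\d+" "Room 2048, floor 3")   ;; => ("2048" "3")
```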

The “\d+” regex might look like the following when used in a Clojure program.

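(defn my-pattern
  "Finds the first run of digits in a string and prints it."
  []
  (let [result (re-find #"\d+" "My age is 23.")]
    ;; result is the matched text "23" (a string), or nil if there were no digits
    (println "Age: " result)))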


(my-pattern)

The resulting printed line would look like the following.

Age:  23
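Note that re-find returns the match as a string, or nil when nothing matches. If you need the age as a number, one option is to parse the match with Java's Long/parseLong. The find-age function below is a hypothetical variation on my-pattern, sketched for illustration. It uses when-let so that text without digits simply returns nil instead of throwing.

```clojure
;; Hypothetical variation: return the age as a number instead of printing it
(defn find-age [text]
  (when-let [digits (re-find #"\d+" text)]
    (Long/parseLong digits)))

(find-age "My age is 23.")        ;; => 23
(find-age "I'd rather not say.")  ;; => nil
```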

If you’re looking for a name, but it might be spelled multiple ways, regex comes in handy again.

#"Ter+[ea]nce"

The pattern above matches Terrance, Terrence, Terance, and Terence. "r+" allows one or more r's, and "[ea]" allows either an "e" or an "a". As you can imagine, Clojure's regular expression patterns come in very handy, not only for alternate spellings but also for searching text that might contain misspellings.
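The sketch below checks the name pattern against a made-up list with re-seq. It also shows a common pitfall: inside square brackets, the pipe is an ordinary character, not "or". A class written as [e|a] would therefore also accept a literal "|", while [ea] accepts only "e" or "a".

```clojure
(def names-pattern #"Ter+[ea]nce")

(re-seq names-pattern "Terrance, Terrence, Terance, Terence, Torrance")
;; => ("Terrance" "Terrence" "Terance" "Terence")

;; Inside [], | is just another character to match
(re-matches #"Ter+[e|a]nce" "Ter|nce")  ;; => "Ter|nce"
(re-matches names-pattern "Ter|nce")    ;; => nil
```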

If you want to check that a string does not contain any digits, you might try the following.

#"^\D*$"
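Here is one way the no-digits check might be wrapped as a predicate. The function name no-digits? is hypothetical. [\D] and \D mean the same thing here, so the brackets are optional. Because \D also matches newlines, the check covers the whole string, not just one line. An empty string counts as having no digits.

```clojure
;; Hypothetical predicate: true when the string contains no digits at all
(defn no-digits? [s]
  (boolean (re-find #"^\D*$" s)))

(no-digits? "Hello there")    ;; => true
(no-digits? "Hello\nthere")   ;; => true
(no-digits? "Room 101")       ;; => false
(no-digits? "")               ;; => true
```

An equivalent approach is (re-matches #"\D*" s), since re-matches already requires the whole string to match.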

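Also, if you want to make sure that a string contains only digits, with no letters or whitespace, try the following.

#"^\d+$"

The "^" and "$" anchors force the whole string to be digits, and the "+" requires at least one digit. Without anchors, #"\d*" would match an empty string inside any text, so it would never reject anything.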
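The sketch below compares an unanchored #"\d*" with the anchored digits-only pattern, using made-up inputs. It also shows a subtlety of Java's regex engine: "$" can match just before a trailing newline. That is why re-matches, which requires the entire string to match, is the stricter choice for validation.

```clojure
;; Unanchored \d* happily matches zero digits, so it finds "" in any text
(re-find #"\d*" "abc")        ;; => ""

;; Anchored, with + so at least one digit is required
(re-find #"^\d+$" "12345")    ;; => "12345"
(re-find #"^\d+$" "123 45")   ;; => nil
(re-find #"^\d+$" "")         ;; => nil

;; $ also matches before a final newline, so this still "passes"
(re-find #"^\d+$" "123\n")    ;; => "123"

;; re-matches must consume the whole string, newline included
(re-matches #"\d+" "12345")   ;; => "12345"
(re-matches #"\d+" "123\n")   ;; => nil
```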

Suppose you have a form and need to know what was entered in the third column. \w is very useful here, but it needs to be combined with a powerful feature called groups. Groups are the next step in learning Clojure regular expressions, so check out my next tutorial to learn about matching groups with Clojure.