Clojure Regex

Note: If you’re new to regex, check out my Clojure regex introduction.

Clojure supports regular expressions, often just called regex. Under the hood, Clojure uses Java’s java.util.regex package, so it isn’t surprising that Clojure’s regular expressions closely resemble Java’s.

Now of course, you could write regular expression code directly using Java from Clojure with Java-Interop. For example,

(def some-quote 
  (str "It was the best of times. "
  "It was the worst of times. It was Friday "
  "night and it was late."))

(def day-pattern "\\w*day")

(defn java-interop-regex
  "just doing Clojure regex with Java APIs"
  []
  (let [pat (java.util.regex.Pattern/compile day-pattern) 
        mat (.matcher pat some-quote)
        day-found (.find mat)]
    (println "Is there a day? ... " day-found)))


(java-interop-regex)

would print out,

Is there a day? ...  true

because “Friday” is found in the quote:

It was the best of times. 
It was the worst of times. 
It was Friday night and it 
was late.

All that Java-interop code does is check whether the given string contains any word ending in “day”. The plain Java version would look something like the following.

package com.genedavis.testing;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTest {

  public static void main(String[] args) {
    
    String s = "It was the best of times. "+
        "It was the worst of times. It was Friday "+
        "night and it was late.";
    
    String mp = "\\w*day";
    
    Pattern p = Pattern.compile(mp);
    Matcher m = p.matcher(s);
    boolean b = m.find();
    System.out.println("Is there a day? ... "+b);
  }

}

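Clojure has shortcuts built in for dealing with regex. The first Clojure shortcut for using regex is the #"pattern" regex literal, which is a reader macro. The text between the quotes is the regex pattern itself, and the reader compiles it into a java.util.regex.Pattern for you, so there is no separate compile step.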
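As a quick sketch, you can check at the REPL that the literal already is a compiled pattern. The name literal-pattern is just mine for illustration.

```clojure
;; The #"..." literal is compiled when the code is read,
;; so the value is already a java.util.regex.Pattern.
(def literal-pattern #"\w*day")

(class literal-pattern)   ;; java.util.regex.Pattern
(str literal-pattern)     ;; the pattern source as text: \w*day
```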

Another shortcut that Clojure offers for using regular expressions is the (re-find <pattern> <string>) function. (I’m so used to referring to methods in Java that I almost called re-find a method, but it’s a function, because this is Clojure.) Not too surprisingly, the re-find function searches the provided string for the pattern provided. It returns the first match, or nil if nothing matches.

For example,

(def some-quote 
  (str "It was the best of times. "
  "It was the worst of times. It was Friday "
  "night and it was late."))

(def day-pattern #"\w*day")

(defn clojure-regex
  "doing the same regex check with Clojure's own regex functions"
  []
  (let [day-found (re-find day-pattern some-quote)]
    (println "Is there a day? ... " day-found)))


(clojure-regex)

would find the same match as our first example above, although the output looks different, so you may not realize that the result is equivalent. The output would be as follows.

Is there a day? ...  Friday

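Notice that the first match, “Friday”, is returned instead of true. Remember that in Clojure nil and false are logically false, and everything else is logically true. So by returning the first match, Clojure not only saves you the trouble of pulling the match out of a Matcher yourself, but also returns the logical equivalent of true. (If the pattern contains capturing groups, re-find returns a vector holding the whole match followed by the groups.)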
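Because the return value is either a match or nil, re-find drops straight into a conditional. Here is a minimal sketch; describe-day is a hypothetical helper of my own, not something built into Clojure.

```clojure
(defn describe-day
  "Hypothetical helper: reports the first word ending in \"day\", if any."
  [text]
  ;; if-let binds day only when re-find returns something other than nil
  (if-let [day (re-find #"\w*day" text)]
    (str "Found " day)
    "No day found"))

(describe-day "It was Friday night.")   ;; "Found Friday"
(describe-day "It was late.")           ;; "No day found"
```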

Another important difference to note between the Java-Interop and pure Clojure approach to writing regex patterns is that in the Java-Interop version, the pattern is an ordinary string, so you have to escape the \ so that it looks like \\. Escaping backslashes in Java regular expressions is a huge pain, so it is nice that with Clojure’s #"pattern" literal, you don’t have to bother with doubling your backslashes.
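To see that the two spellings describe the same pattern, here is a small sketch that builds it both ways and compares the pattern text. The names from-string and from-literal are only for illustration.

```clojure
;; The same pattern written two ways.
(def from-string  (java.util.regex.Pattern/compile "\\w*day")) ; string: backslash doubled
(def from-literal #"\w*day")                                   ; literal: single backslash

;; Both carry the same pattern source.
(= (str from-string) (str from-literal))   ;; true
```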

In case you are curious, the “re” in re-find stands for regular expression. Clojure has several regular expression specific functions, and they all start with “re”. Because they share that prefix, they sit together in the alphabetically sorted Clojure function documentation, which makes them easy to find.

In addition to re-find, Clojure provides re-groups, re-matcher, re-matches, re-pattern, and re-seq.

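The re-pattern function returns a Pattern like the #"pattern" literal, but it takes the pattern as a string. For a fixed pattern, most people just write #"pattern". The re-pattern function is mainly useful when the pattern has to be built at runtime.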
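As a sketch of the runtime case, the hypothetical helper below builds a pattern from a suffix passed in as a string. I am assuming the suffix should be matched literally, so it goes through Java’s Pattern/quote first.

```clojure
(defn word-ending-in
  "Hypothetical helper: builds a pattern for words ending in suffix."
  [suffix]
  ;; This is an ordinary string, so the backslash is doubled here.
  (re-pattern (str "\\w*" (java.util.regex.Pattern/quote suffix))))

(re-find (word-ending-in "day") "It was Friday night")   ;; "Friday"
(re-find (word-ending-in "ght") "It was Friday night")   ;; "night"
```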

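The re-matcher function returns a java.util.regex.Matcher given a regular expression and a string. The Matcher is a mutable Java object, and calling re-find on it repeatedly steps through the successive matches.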
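Here is a small sketch of walking a Matcher by hand. The var name day-matcher and the sample string are mine. Each call to re-find advances the same Matcher, so the order of the calls matters.

```clojure
(def day-matcher (re-matcher #"\w*day" "Monday, Tuesday, and Friday"))

;; The Matcher remembers where the previous match ended.
(re-find day-matcher)   ;; "Monday"
(re-find day-matcher)   ;; "Tuesday"
(re-find day-matcher)   ;; "Friday"
(re-find day-matcher)   ;; nil, nothing left to find
```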

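The re-matches function (not to be confused with the re-matcher function) is the Clojure counterpart to Matcher.matches(). It succeeds only if the entire string matches the pattern, and like re-find it returns the match, or nil if there is none, rather than a boolean.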
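The difference from re-find is easiest to see side by side. This is just a sketch using the same day pattern as before.

```clojure
;; re-find looks for a match anywhere in the string.
(re-find    #"\w*day" "It was Friday night")   ;; "Friday"

;; re-matches requires the whole string to match.
(re-matches #"\w*day" "It was Friday night")   ;; nil
(re-matches #"\w*day" "Friday")                ;; "Friday"
```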

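The re-groups function takes a Matcher and returns the groups from its most recent match. If the pattern has no capturing groups, it returns the whole match as a string. If it does have groups, it returns a vector with the whole match first, followed by each group.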
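A minimal sketch of both cases follows. The helper groups-of-first-match is hypothetical, and it calls the Matcher’s find method through interop so that re-groups has a match to report on.

```clojure
(defn groups-of-first-match
  "Hypothetical helper: finds the first match of pattern in text,
  then asks re-groups what the Matcher captured."
  [pattern text]
  (let [m (re-matcher pattern text)]
    (when (.find m)        ; advance the Matcher to its first match
      (re-groups m))))

;; No capturing groups: a plain string.
(groups-of-first-match #"\w*day"   "It was Friday night")   ;; "Friday"

;; One capturing group: [whole-match group-1].
(groups-of-first-match #"(\w*)day" "It was Friday night")   ;; ["Friday" "Fri"]
```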

The re-seq function returns a lazy sequence of the successive matches found in the string.
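When you want every match rather than just the first, re-seq is usually the shortest route. Here is a sketch with a made-up sample string; the var names are mine.

```clojure
(def all-days (re-seq #"\w*day" "Monday, Tuesday, and Friday"))
;; all-days is ("Monday" "Tuesday" "Friday")

;; With a capturing group, each element is a vector like the one re-groups returns.
(def days-with-prefix (re-seq #"(\w*)day" "Monday and Friday"))
;; days-with-prefix is (["Monday" "Mon"] ["Friday" "Fri"])
```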

Enjoy your use of regex in Clojure, and if you need more help, check out some of my other Clojure regex tutorials.