Building search results and highlighting matches with regex

17.03.2017 5 comments

3398 days since last revision. Details are possibly out of date.

This post is a practical step-by-step. We will be writing some JavaScript that allows us to highlight user-entered strings in text. Think of something like a ‘find’ function in a text editor.

Here’s the demo; just enter a string like ‘Manchester’ or ‘Manc JcT’:

http://benfrain.com/playground/highlight.html

The focus of this post is the JavaScript. The HTML and CSS are very basic. As with prior posts, I’m using ECSS for naming conventions.

Let’s make a start. Here is our HTML (note that in the linked demo I have inlined the CSS and JS):

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Highlighting text demo by Ben Frain</title>
    <style type="text/css">
        /* Styles */
    </style>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<div class="sch-Search">
    <div class="sch-Header">
        <input id="schInput" type="text" class="sch-Input"/>
    </div>
    <div id="schResults" class="sch-Results">
        
    </div>
</div>

<script>
    // JS Here
</script>
</body>
</html>

The CSS is largely irrelevant, so I’m not listing it out here. The only key thing worth mentioning is that I have added overflow-x: hidden; overflow-y: scroll; to the body element. This is because the page will always be quite long and I dislike the scroll bar appearing and disappearing when the input is emptied.

The goal

A user enters a string of text and we use that string to parse the target text and wrap any matches in a mark element. The class on this mark element (sch-Result_Highlight) then allows us to add our highlighting styles.

We could obviously do this on a bunch of ‘lorem ipsum’ text but we are also going to use the string of user input text to decide what text to display in the first place. This is so we can show relevant results but then also highlight what they matched.

So, in plain terms, if I search for ‘Manche’ I want to display any records of data that include ‘Manche’ and then also highlight the ‘Manche’ string in the resultant data.

Get some data

I wanted a large dataset to play with so I opted for data.gov.uk. Ideally, we could just use their API and grab the dataset on page load. For example:

var dataSource = "//data.gov.uk/data/api/service/transport/planned_road_works/road?road=M62";
var data; // populated once the request succeeds
var request = new XMLHttpRequest();
request.open("GET", dataSource, true);

request.onload = function() {
    if (request.status >= 200 && request.status < 400) {
        // Success: the response body is a JSON string
        data = JSON.parse(request.responseText);
    } else {
        console.warn("We reached our target server, but it returned an error");
    }
};

request.onerror = function() {
    console.error("There was a connection error of some sort");
};

request.send();

Note: I didn’t use Fetch as support is poor as I write this.

However, I found the above API a little flaky and I was worried about how long it would hang around. So instead I just loaded a subset of that data at the bottom of the JS file in a ‘here’s some data I saved earlier’ style.

var data = [
    {
        traffic_management: "Lane Closure",
        status: "Firm",
        start_date: "26/10/2011 21:00:00",
        road: "M62",
        reference_number: "1819373",
        published_date: "25/8/2011 11:25:23",
        location: "Jct 19 Westbound",
        local_authority: "Rochdale",
        expected_delay: "No Delay",
        end_date: "27/10/2011 05:00:00",
        description: "Nightime hardshoulder closure westbound for electrical testing",
        closure_type: "Planned Works",
        centre_northing: "408723",
        centre_easting: "386220",
    },
    // More
]

As an aside, my second thought was to do an import statement, ES2015 style, to keep the main JS file cleaner like this:

import * as data from './data.js';

However, native support for import is poor and I didn’t want to get into transpiling with TypeScript or Babel et al.

Get the data relating to input (debounced)

We have some data ready to go, so let’s start actually doing something. First off, I want to respond to input in the input field:

schInput.addEventListener("input", function(e) {
    debouncedBuildResults(e);
}, false);

Our listener waits for input in our schInput element and then invokes debouncedBuildResults. Right, let’s look at what debouncedBuildResults does:

var debouncedBuildResults = debounce(function(e) {
    schResults.innerHTML = "";
    if (e.target.value.length < 3) {
        return;
    }
    for (var i = 0; i < SETTINGS.resultsLimit; i++) {
        buildResults(e.target.value, data[i]);
    }
}, 250);

The inner part of this function is what we actually want to do, but wrapped inside a debounce function. I just grabbed the debounce function off the shelf (Lodash, I think) so I won’t detail that here. The only point worth mentioning is that without debounce we would be rebuilding the results on every single input event, which might choke things up a little. The debounce allows a little breathing room (250 milliseconds in this example).

If you aren’t sure what a debounce is, or whether you want a debounce or a throttle, Chris Coyier has you covered.
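For anyone who wants to see the idea in code, here is a minimal sketch of a debounce helper. It is not the Lodash implementation (which has more options); it only assumes the same debounce(fn, wait) call shape used above, and the calls array and record function are made up for the demonstration.

```javascript
// Minimal debounce: returns a wrapper that only calls `fn` once
// `wait` milliseconds have passed without another call.
function debounce(fn, wait) {
    var timeoutId = null;
    return function() {
        var context = this;
        var args = arguments;
        // Each new call cancels the pending one and restarts the timer
        clearTimeout(timeoutId);
        timeoutId = setTimeout(function() {
            timeoutId = null;
            fn.apply(context, args);
        }, wait);
    };
}

// Demonstration: three rapid calls collapse into a single call
// that receives the arguments of the last one.
var calls = [];
var record = debounce(function(value) {
    calls.push(value);
}, 50);

record("M");
record("Ma");
record("Man");
```

Nothing is pushed into calls straight away. Once 50 milliseconds pass without another call, the wrapped function runs once with "Man".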

In terms of what happens on input: first of all we empty the DOM node that contains any existing results. Then, if there are fewer than three characters, we return out of the function. Otherwise, we loop through the data, building out the results. SETTINGS.resultsLimit caps how many records the loop examines, so it needs to be no larger than the number of records in data:

for (var i = 0; i < SETTINGS.resultsLimit; i++) {
    buildResults(e.target.value, data[i]);
}
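If the dataset could ever hold fewer records than SETTINGS.resultsLimit, data[i] would be undefined and buildResults would throw when it reads a key from it. Here is a minimal sketch of one way to guard against that. The SETTINGS value, the two sample records and the recordsToScan helper are hypothetical stand-ins rather than anything from the demo.

```javascript
// Hypothetical stand-ins for the demo's SETTINGS object and dataset
var SETTINGS = { resultsLimit: 5 };
var data = [
    { local_authority: "Rochdale" },
    { local_authority: "Manchester" }
];

// Never visit more records than actually exist
function recordsToScan(limit, records) {
    return Math.min(limit, records.length);
}

var visited = [];
var scanCount = recordsToScan(SETTINGS.resultsLimit, data);
for (var i = 0; i < scanCount; i++) {
    // In the demo this is where buildResults(e.target.value, data[i]) would run
    visited.push(data[i].local_authority);
}
```

Here the limit is 5 but only two records exist, so the loop stops after two iterations instead of reading past the end of the array.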

Building a list of results

For each iteration of the loop, we run the buildResults function and pass as parameters the value that has been input and the data result from this iteration of the loop (the data is an array of objects so the square bracket notation: data[i] lets us pick the next one each time). So let’s look now at what buildResults does. Here is the complete function:

function buildResults(query, itemdata) {
    // Make an array from the input string, dropping the empty strings left by stray spaces
    query = query.split(" ").filter(function(item) {
        return item !== "";
    });

    // Check whether at least one search string appears in either key we are interested in
    var hasMatch = query.some(function(item) {
        var reg = "(" + item + ")(?![^<]*>|[^<>]*</)"; // explanation: http://stackoverflow.com/a/18622606/1147859
        var regex = new RegExp(reg, "i");
        return regex.test(itemdata.local_authority) || regex.test(itemdata.description);
    });

    // If none of the search strings are found, bail
    if (!hasMatch) {
        return;
    }

    // Build the result once, however many search strings matched
    var aResult = document.createElement("div");
    aResult.classList.add("sch-Result");
    var authority = document.createElement("h1");
    authority.classList.add("sch-Result_Title");
    authority.innerHTML = highlightMatchesInString(itemdata.local_authority, query);
    aResult.appendChild(authority);
    var detail = document.createElement("p");
    detail.classList.add("sch-Result_Detail");
    detail.innerHTML = highlightMatchesInString(itemdata.description, query);
    aResult.appendChild(detail);
    schResults.appendChild(aResult);
}

The first thing we do is make an array out of whatever the user has entered. This is so we can check if any of the individual strings match. So, say I enter “Manc closure”, the function will create an array in memory that looks like this: ["Manc", "closure"]. Splitting on spaces leaves empty strings behind whenever there are leading, trailing or doubled spaces, so we filter those out. Next we use some to check whether at least one item in the array matches. Because some stops at the first hit, a record that matches two search strings is still only added to the results once. I believe in returning early when you can, so if nothing matches, we return. To test each item, we first create a string using the search term.

var reg = "(" + item + ")(?![^<]*>|[^<>]*</)";

We are going to use this string as a regular expression. I’m familiar with the adage:

Some people, when confronted with a problem, think
“I know, I’ll use regular expressions.” Now they have two problems.
Jamie Zawinski

But the more often I use Regular Expressions, the more impressed I am by their versatility.

Now, I can’t take any credit for this regex string. I’d spent hours trying to figure it out until I found the answer here: http://stackoverflow.com/a/18622606/1147859. What this regex ultimately does is prevent any matches that are within HTML tags, or within text we have already wrapped in a highlight. This is important because otherwise one string could be found within another that has already been highlighted. For example, suppose a search was made for ‘Manchester Chester’. Without that safeguard, we’d end up with this kind of HTML:

<h1 class="sch-Result_Title"><mark class="sch-Result_Highlight">Man<mark class="sch-Result_Highlight">chester</mark></mark></h1>

Thanks to Alex in the comments below, I’ve now switched to using a mark element here instead of a span. More on the mark element here: https://developer.mozilla.org/en/docs/Web/HTML/Element/mark
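To see the lookahead doing its job, here is a small sketch that runs the same pattern against a string that has already been highlighted. The buildTermRegex helper and the variable names are just for illustration; the pattern itself is the one from the Stack Overflow answer.

```javascript
// Build the same pattern the post uses for a single search term
function buildTermRegex(term) {
    return new RegExp("(" + term + ")(?![^<]*>|[^<>]*</)", "i");
}

var highlighted = '<mark class="sch-Result_Highlight">Manchester</mark>';

// "chester" only occurs inside the existing <mark>, so it is rejected
var insideMark = buildTermRegex("chester").test(highlighted); // false

// "class" only occurs inside the tag itself, so it is rejected too
var insideTag = buildTermRegex("class").test(highlighted); // false

// Plain text with no markup matches as usual
var plainText = buildTermRegex("chester").test("Manchester"); // true
```

The first alternative in the lookahead, [^<]*>, rules out text that sits inside a tag. The second, [^<>]*</, rules out text that runs straight into a closing tag, which is exactly the situation inside an existing highlight.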

So, now we have built a string, we can pass it to a new RegExp:

var regex = new RegExp(reg, "i");

Notice that the second parameter passed is "i", so that we match case-insensitively (i.e. we aren’t bothered about capitalisation).
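One thing to be aware of: because the search term is dropped straight into the pattern, characters such as ( or . are treated as regex syntax. A lone ( will make new RegExp throw, and a . will match any character. A possible way around that, sketched here with two hypothetical helpers (escapeRegExp and buildSafeTermRegex are not part of the demo), is to escape the term first:

```javascript
// Escape characters that have a special meaning in a regular expression
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same pattern as before, but with the search term escaped
function buildSafeTermRegex(term) {
    return new RegExp("(" + escapeRegExp(term) + ")(?![^<]*>|[^<>]*</)", "i");
}

// A bracket in the search term is now matched literally
var matchesBracket = buildSafeTermRegex("(M62").test("Works (M62) westbound"); // true

// A dot no longer acts as a wildcard
var dotIsLiteral = buildSafeTermRegex("M.2").test("M62"); // false
```

With this in place a user can type brackets, dots or question marks into the input without breaking the search.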

So, we next use our regex to check for matches in either of the keys we are interested in (itemdata.local_authority or itemdata.description). If we don’t get a match in either key we return out of the function. Otherwise, we proceed to build the result HTML. Here’s an example result:

<div class="sch-Result">
    <h1 class="sch-Result_Title">
        <mark class="sch-Result_Highlight">Roch</mark>dale
    </h1>
    <p class="sch-Result_Detail">Nightime hardshoulder closure westbound for electrical testing</p>
</div>

You’ll notice that for the innerHTML of the title and detail we are calling the highlightMatchesInString function. This is the final piece of the puzzle. Let’s look at that next.

Highlighting matches

We set the innerHTML of the sch-Result_Title and sch-Result_Detail elements by passing our query alongside the key to the highlightMatchesInString function like this:

authority.innerHTML = highlightMatchesInString(itemdata.local_authority, query);

So this will set the HTML to be whatever is returned out of that function. Here is the highlightMatchesInString function itself:

function highlightMatchesInString(string, query) {
    // Start from the string that was passed in; each matching term adds its highlight to this copy
    var completedString = string;
    query.forEach(function(item) {
        var reg = "(" + item + ")(?![^<]*>|[^<>]*</)"; // explanation: http://stackoverflow.com/a/18622606/1147859
        var regex = new RegExp(reg, "i");
        // If the regex doesn't match the string just exit
        if (!string.match(regex)) {
            return;
        }
        // Otherwise, get to highlighting
        var matchStartPosition = string.match(regex).index;
        var matchEndPosition = matchStartPosition + string.match(regex)[0].toString().length;
        var originalTextFoundByRegex = string.substring(matchStartPosition, matchEndPosition);
        completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);
    });
    return completedString;
}

First of all we create a variable for the string we want to ultimately return. It starts out as the original string that was passed in, and each matching term then adds its highlight to it.

var completedString = string;

Then, for each item in the array of strings to match against (remember, the user may have searched for two or more space-separated things) we make a regex, just as we did in the prior function. If we don’t have a match, we return out of the callback and move on to the next item; otherwise we highlight the text. Here’s how we can do that:

var matchStartPosition = string.match(regex).index;
var matchEndPosition = matchStartPosition + string.match(regex)[0].toString().length;
var originalTextFoundByRegex = string.substring(matchStartPosition, matchEndPosition);
completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);

We find the start position with the index. Then we find the end position of the match within the text by taking the start position and adding the length of the matched string. Now, before we wrap this range we need to store the original text in a variable. Because the regex is case-insensitive, the text in the data may be capitalised differently to what the user typed, and it is the original text we want to put back. We do this by passing our matchStartPosition and matchEndPosition values to the substring method.
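As a worked example of that arithmetic, suppose the user typed ‘roch’ in lower case and the record says ‘Rochdale’. The variable names here are shortened for the example and the pattern is simplified to leave out the tag-avoiding lookahead:

```javascript
var text = "Rochdale";
var match = text.match(/(roch)/i);

var start = match.index;                     // 0
var end = start + match[0].length;           // 0 + 4 = 4
var original = text.substring(start, end);   // "Roch", not "roch"
```

The substring gives us the text exactly as it appears in the data, capital R included, which is what ends up inside the highlight.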

Now we can set our completedString to be itself, with the first match for the regex replaced by our wrapped text. I’m using an ES2015 template literal for ease.

completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);

Finally, we have our new string so we return that back out of our function:

return completedString;

With any needed highlights applied to the strings, back at the end of the buildResults function we append that result into the DOM.

schResults.appendChild(aResult);
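As an aside, the highlighting function above only wraps the first occurrence of each term. If you wanted every occurrence wrapped, one possible variation (a sketch, not what the demo does) is to add the "g" flag and let replace put the matched text back itself using $1, which refers to the first capture group. The highlightAllMatches name is hypothetical, and the sketch assumes the terms contain no regex special characters.

```javascript
// Variation: wrap every occurrence of every term, keeping the original capitalisation
function highlightAllMatches(text, terms) {
    return terms.reduce(function(result, term) {
        var regex = new RegExp("(" + term + ")(?![^<]*>|[^<>]*</)", "gi");
        // $1 is whatever the capture group matched, exactly as it appears in the text
        return result.replace(regex, '<mark class="sch-Result_Highlight">$1</mark>');
    }, text);
}

var twice = highlightAllMatches("Manchester to manchester", ["manc"]);
// '<mark class="sch-Result_Highlight">Manc</mark>hester to <mark class="sch-Result_Highlight">manc</mark>hester'

var nested = highlightAllMatches("Manchester", ["Manchester", "chester"]);
// '<mark class="sch-Result_Highlight">Manchester</mark>' (no mark inside a mark)
```

Because $1 carries the matched text, there is no need to work out start and end positions by hand, and the lookahead still stops a second term from being highlighted inside an existing mark.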

Conclusion

If you’ve made it here, it’s worth looking back on what we’ve done. A little DOM building, a little looping and some array work. We’ve even gotten our hands dirty with some regular expressions, even if the hard work was done for us.


5 comments:

Alex Bondarev says:

March 21, 2017 at 4:07 pm

Hi Ben, thanks for the post, one tiny improvement will be to use the <mark class="sch-Result_Highlight"> element instead of a <span>.

- Ben Frain says:
  
  March 21, 2017 at 4:22 pm
  
  Hi Alex, excellent thought! Thanks, I will amend!
  
Adam Patterson says:

June 8, 2019 at 5:04 pm

Hey Ben,

Do you happen to have working links to the playground examples?

Cheers!

- Ben Frain says:
  
  June 9, 2019 at 10:03 am
  
  Sorry Adam, I’d switched hosts a month or so back and neglected to copy them over.
  
  I’ve copied them back now. Sorry about that!
  
  - Adam Patterson says:
    
    June 9, 2019 at 4:37 pm
    
    Thanks so much!