Building search results and highlighting matches with regex
This post is a practical step-by-step walkthrough. We will write some JavaScript that highlights user-entered strings in text. Think of something like the ‘find’ function in a text editor.
Here’s the demo; just enter a string like ‘Manchester’ or ‘Manc JcT’:
http://benfrain.com/playground/highlight.html
The focus of this post is the JavaScript. The HTML and CSS are very basic. As with prior posts, I’m using ECSS for naming conventions.
Let’s make a start. Here is our HTML (note in the linked demo I have in-lined the CSS and JS):
<!DOCTYPE html>
<html lang="en">
<head>
<title>Highlighting text demo by Ben Frain</title>
<style type="text/css">
/* Styles */
</style>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<div class="sch-Search">
<div class="sch-Header">
<input id="schInput" type="text" class="sch-Input"/>
</div>
<div id="schResults" class="sch-Results">
</div>
</div>
<script>
// JS Here
</script>
</body>
</html>
The CSS is largely irrelevant, so I’m not listing it here. The only thing worth mentioning is that I have added overflow-x: hidden; overflow-y: scroll; to the body element. The page will always be quite long, and I dislike the scroll bar appearing and disappearing when the input is emptied.
The goal
A user enters a string of text and we use that string to search the target text and wrap any matches in a span tag. The class on this span tag (sch-Result_Highlight) then allows us to add our highlighting styles.
We could obviously do this on a bunch of ‘lorem ipsum’ text but we are also going to use the string of user input text to decide what text to display in the first place. This is so we can show relevant results but then also highlight what they matched.
So, in plain terms, if I search for ‘Manche’ I want to display any records that include ‘Manche’ and then also highlight the ‘Manche’ string in the resulting data.
Get some data
I wanted a large dataset to play with so I opted for data.gov.uk. Ideally, we could just use their API and grab the dataset on page load. For example:
var dataSource = "//data.gov.uk/data/api/service/transport/planned_road_works/road?road=M62";
var request = new XMLHttpRequest();
request.open("GET", dataSource, true);
request.onload = function() {
if (request.status >= 200 && request.status < 400) {
data = JSON.parse(request.responseText);
} else {
console.warn("// We reached our target server, but it returned an error");
}
};
request.onerror = function() {
console.error("// There was a connection error of some sort");
};
request.send();
Note: I didn’t use Fetch as support is poor as I write this.
However, I found the above API a little flaky, and I was worried about how long it would stay available. Instead, I loaded a subset of that data at the bottom of the JS file in a ‘here’s some data I saved earlier’ style.
data = [
{
traffic_management: "Lane Closure",
status: "Firm",
start_date: "26/10/2011 21:00:00",
road: "M62",
reference_number: "1819373",
published_date: "25/8/2011 11:25:23",
location: "Jct 19 Westbound",
local_authority: "Rochdale",
expected_delay: "No Delay",
end_date: "27/10/2011 05:00:00",
description: "Nightime hardshoulder closure westbound for electrical testing",
closure_type: "Planned Works",
centre_northing: "408723",
centre_easting: "386220",
},
// More
]
As an aside, my second thought was to do an import statement, ES2015 style, to keep the main JS file cleaner like this:
import * as data from './data.js';
However, native support for import is poor and I didn’t want to get into transpiling with TypeScript, Babel et al.
Get the data relating to input (debounced)
We now have some data ready to go, so let’s start actually doing something. First off, I want to respond to input in the input field:
schInput.addEventListener("input", function(e) {
debouncedBuildResults(e);
}, false);
Our listener waits for input in our schInput element and then invokes debouncedBuildResults. Let’s look at what debouncedBuildResults looks like:
var debouncedBuildResults = debounce(function(e) {
schResults.innerHTML = "";
if (e.target.value.length < 3) {
return;
}
for (var i = 0; i < SETTINGS.resultsLimit; i++) {
buildResults(e.target.value, data[i]);
}
}, 250);
The inner part of this function is what we actually want to do, but it is wrapped inside a debounce function. I just grabbed the debounce function off the shelf (from Lodash, I think) so I won’t detail it here. The only point worth mentioning is that without debounce we would run the search on every input event, which might choke things up a little. The debounce allows a little breathing room (250 milliseconds in this example).
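For readers who want to see roughly what such a helper does, here is a minimal debounce sketch. It is not the off-the-shelf function used in the demo, just an illustration of the idea: each call cancels the pending timer and starts a new one, so the wrapped function only runs once input has paused for the wait period.

```javascript
// A minimal debounce sketch (illustrative; not the library version used in the demo).
function debounce(fn, wait) {
    var timer = null;
    return function() {
        var context = this;
        var args = arguments;
        // Cancel any call that is still waiting to run
        clearTimeout(timer);
        // Schedule a fresh call once `wait` milliseconds pass without another call
        timer = setTimeout(function() {
            timer = null;
            fn.apply(context, args);
        }, wait);
    };
}
```

Calling the returned function three times in quick succession results in a single call to fn, made after the last of the three.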
In terms of what happens on input: first of all we empty the DOM node that contains any existing results. Then, if there are fewer than three characters, we return out of the function. Otherwise, we loop through the data building out results, checking as many records as the limit set in SETTINGS.resultsLimit:
for (var i = 0; i < SETTINGS.resultsLimit; i++) {
buildResults(e.target.value, data[i]);
}
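One caveat with the loop above: if data holds fewer records than SETTINGS.resultsLimit, data[i] will be undefined and buildResults will throw when it reads a property from it. Here is a small sketch of one way to guard against that. The helper name recordsToSearch and the stand-in dataset are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: return at most `limit` records, never reading past the end.
function recordsToSearch(records, limit) {
    return records.slice(0, Math.max(0, limit));
}

// Usage sketch with a tiny stand-in dataset
var sampleData = [{ local_authority: "Rochdale" }, { local_authority: "Leeds" }];
recordsToSearch(sampleData, 50).forEach(function(record) {
    // buildResults(query, record) would be called here
});
```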
Building a list of results
For each iteration of the loop, we run the buildResults function, passing in the value that has been input and the data record for this iteration (the data is an array of objects, so the square bracket notation data[i] lets us pick the next one each time). So let’s look now at what buildResults does. Here is the complete function:
function buildResults(query, itemdata) {
// Make an array from the input string
query = query.split(" ");
query.forEach(function(item) {
// Bail early if we just have a space or a space and then nothing
if (item === " " || item === "") {
return;
}
var reg = "(" + item + ")(?![^<]*>|[^<>]*</)"; // explanation: http://stackoverflow.com/a/18622606/1147859
var regex = new RegExp(reg, "i");
// If the search string(s) aren't found in either key we are interested in, bail
if (!itemdata.local_authority.match(regex) && !itemdata.description.match(regex)) {
return;
}
var aResult = document.createElement("div");
aResult.classList.add("sch-Result");
var authority = document.createElement("h1");
authority.classList.add("sch-Result_Title");
authority.innerHTML = highlightMatchesInString(itemdata.local_authority, query);
aResult.appendChild(authority);
var detail = document.createElement("p");
detail.classList.add("sch-Result_Detail");
detail.innerHTML = highlightMatchesInString(itemdata.description, query);
aResult.appendChild(detail);
schResults.appendChild(aResult);
});
}
The first thing we do is make an array out of whatever the user has entered. This is so we can check if any of the individual strings match. So, say I enter “Manc closure”, the function will create an array in memory that looks like this: ["Manc", "closure"]. Now we want to iterate over each item in the array using forEach. I believe in returning early when you can, so if the item is empty or just whitespace, we return. Otherwise, we first create a string using the search term.
var reg = "(" + item + ")(?![^<]*>|[^<>]*</)";
We are going to use this string as a regular expression. I’m familiar with the adage:
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Jamie Zawinski
But the more often I use Regular Expressions, the more impressed I am by their versatility.
Now, I can’t take any credit for this regex string. I’d spent hours trying to figure it out until I found the answer here: http://stackoverflow.com/a/18622606/1147859. What this regex ultimately does is prevent any matches inside HTML tags or inside text that has already been wrapped in an element. This is important; otherwise one string could be found within another that has already been highlighted. For example, suppose a search was made for ‘Manchester Chester’. Without that guard we’d end up with this kind of HTML:
<h1 class="sch-Result_Title"><mark class="sch-Result_Highlight">Man<mark class="sch-Result_Highlight">chester</mark></mark>dale</h1>
Note that I’m using a mark element here instead of a span. More on the mark element here: https://developer.mozilla.org/en/docs/Web/HTML/Element/mark
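To make the effect of that lookahead concrete, here is a small sketch that runs the same pattern against a string that already contains a highlight. The helper name buildTermRegex and the sample string are mine, for illustration only.

```javascript
// Hypothetical helper wrapping the pattern from the post
function buildTermRegex(term) {
    return new RegExp("(" + term + ")(?![^<]*>|[^<>]*</)", "i");
}

var alreadyHighlighted = '<mark class="sch-Result_Highlight">Manchester</mark> and Chester';
var found = alreadyHighlighted.match(buildTermRegex("chester"));

// The "chester" inside the existing <mark> is skipped; the later, unwrapped one is found
console.log(found[0], found.index);

// Text that only appears inside the tag itself (such as "mark") is not matched at all
console.log(alreadyHighlighted.match(buildTermRegex("mark"))); // null
```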
So, now that we have built a string, we can pass it to a new RegExp:
var regex = new RegExp(reg, "i");
Notice that the second parameter passed is "i", so that we match case-insensitively (i.e. we aren’t bothered about capitalisation).
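One thing the pattern does not handle is user input that contains regex metacharacters: typing something like ‘(M62’ would make new RegExp throw. A common approach is to escape the term before building the pattern. The helper below is a sketch of that idea; escapeRegExp is not part of the original code.

```javascript
// Escape characters that have special meaning in a regular expression
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var term = "(M62";
var safeRegex = new RegExp("(" + escapeRegExp(term) + ")(?![^<]*>|[^<>]*</)", "i");
console.log(safeRegex.test("Closure on (M62) westbound")); // true
```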
So, we next use our regex to check for matches in either of the keys we are interested in (itemdata.local_authority or itemdata.description). If we don’t get a match in either key, we return out of the callback for that search term. Otherwise, we proceed to build the result HTML. Here’s an example result:
<div class="sch-Result">
<h1 class="sch-Result_Title">
<mark class="sch-Result_Highlight">Roch</mark>dale
</h1>
<p class="sch-Result_Detail">Nightime hardshoulder closure westbound for electrical testing</p>
</div>
You’ll notice that for the innerHTML of the title and detail we are calling the highlightMatchesInString function. This is the final piece of the puzzle. Let’s look at that next.
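A side effect of building the result inside the forEach is that a record matching two search terms is appended once per matching term. If that is not what you want, one option is to decide up front whether any term matches and then build the result a single time. The sketch below shows only that check. The names splitQuery and recordMatchesQuery are hypothetical, and for brevity it uses a plain case-insensitive substring test rather than the tag-aware regex.

```javascript
// Split the raw input into non-empty terms, tolerating repeated spaces
function splitQuery(rawQuery) {
    return rawQuery.split(" ").filter(function(term) {
        return term !== "";
    });
}

// True if at least one term appears (case-insensitively) in either field
function recordMatchesQuery(record, terms) {
    var haystack = (record.local_authority + " " + record.description).toLowerCase();
    return terms.some(function(term) {
        return haystack.indexOf(term.toLowerCase()) !== -1;
    });
}

var record = {
    local_authority: "Rochdale",
    description: "Nightime hardshoulder closure westbound for electrical testing"
};
console.log(recordMatchesQuery(record, splitQuery("roch  closure"))); // true
console.log(recordMatchesQuery(record, splitQuery("leeds")));         // false
```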
Highlighting matches
We set the innerHTML of the sch-Result_Title and sch-Result_Detail elements by passing the relevant key alongside our query to the highlightMatchesInString function like this:
authority.innerHTML = highlightMatchesInString(itemdata.local_authority, query);
So this will set the HTML to be whatever is returned from that function. Here is the highlightMatchesInString function itself:
function highlightMatchesInString(string, query) {
// the completed string will be itself if already set, otherwise, the string that was passed in
var completedString = completedString || string;
query.forEach(function(item) {
var reg = "(" + item + ")(?![^<]*>|[^<>]*</)"; // explanation: http://stackoverflow.com/a/18622606/1147859
var regex = new RegExp(reg, "i");
// If the regex doesn't match the string just exit
if (!string.match(regex)) {
return;
}
// Otherwise, get to highlighting
var matchStartPosition = string.match(regex).index;
var matchEndPosition = matchStartPosition + string.match(regex)[0].toString().length;
var originalTextFoundByRegex = string.substring(matchStartPosition, matchEndPosition);
completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);
});
return completedString;
}
First of all we create a variable for the string we want to ultimately return. Because completedString is declared afresh with var on every call, it is always undefined at this point, so the || falls through and the variable simply starts out as the original string that was passed in.
var completedString = completedString || string;
Then, for each item in the array of strings to match against (remember, the user may have searched for two or more space-separated things) we make a regex, just as we did in the prior function. If we don’t have a match, we return from the callback; otherwise we highlight the text. Here’s how we can do that:
var matchStartPosition = string.match(regex).index;
var matchEndPosition = matchStartPosition + string.match(regex)[0].toString().length;
var originalTextFoundByRegex = string.substring(matchStartPosition, matchEndPosition);
completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);
We find the start position with the index property of the match. Then we find the end position of the match within the text by taking the start position and adding the length of the matched string. Now, before we wrap this range, we need to store the original text in a variable. We do this by passing our matchStartPosition and matchEndPosition values to the substring method.
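As a worked example of those steps, here is a sketch using a sample string of my own (‘Greater Manchester’ is not from the dataset shown earlier). The function name locateMatch is hypothetical.

```javascript
// Return where a regex first matches and the text exactly as it appears in the source
function locateMatch(text, regex) {
    var match = text.match(regex);
    if (!match) {
        return null;
    }
    var start = match.index;
    var end = start + match[0].length;
    return { start: start, end: end, original: text.substring(start, end) };
}

console.log(locateMatch("Greater Manchester", /manc/i));
// { start: 8, end: 12, original: 'Manc' }
```

Note that match[0] already holds the text with its original capitalisation, so the substring call returns the same value as reading match[0] directly.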
Finally, we can set our completedString to be itself, but with the first match of the regex replaced by our wrapped text. I’m using an ES2015 template literal for ease.
completedString = completedString.replace(regex, `<mark class="sch-Result_Highlight">${originalTextFoundByRegex}</mark>`);
With that done, we have our new string, so we return it from our function:
return completedString;
With any needed highlights applied to the strings, back at the end of the buildResults function, we append that result to the DOM.
schResults.appendChild(aResult);
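As written, the highlighting function wraps only the first occurrence of each term, because the regex has no g flag. If you wanted every occurrence wrapped, one possible variation (a sketch, not the code behind the demo) is to add the g flag and let replace insert the matched text itself via the $& replacement pattern, which also preserves the original capitalisation. The names highlightAllMatches and escapeForRegex are hypothetical.

```javascript
// Escape regex metacharacters so user input is treated literally
function escapeForRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every occurrence of each term in a <mark>, skipping text inside tags
// or inside elements that have already been wrapped
function highlightAllMatches(text, terms) {
    return terms.reduce(function(result, term) {
        if (term === "") {
            return result;
        }
        var regex = new RegExp("(" + escapeForRegex(term) + ")(?![^<]*>|[^<>]*</)", "gi");
        // "$&" stands for whatever text the regex actually matched
        return result.replace(regex, '<mark class="sch-Result_Highlight">$&</mark>');
    }, text);
}

console.log(highlightAllMatches("Rochdale to Rochester", ["roch"]));
// <mark class="sch-Result_Highlight">Roch</mark>dale to <mark class="sch-Result_Highlight">Roch</mark>ester
```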
Conclusion
If you’ve made it here, it’s worth looking back on what we’ve done. A little DOM building, a little looping and some array work. We’ve even gotten our hands dirty with some regular expressions, even if the hard work was done for us.
Hi Ben, thanks for the post. One tiny improvement would be to use the <mark class="sch-Result_Highlight"> element instead of a <span>.