moscardino.net

Adding Search to the Blog

How to add search to a static site.

There’s something new on the site. Up in the header there is a search box. You can now search through the posts I’ve made. Go ahead and try it out; it works pretty well. Come back if you want an explanation of how I did it.

Search isn’t a difficult thing to implement, but getting it right is very hard. I’ve worked on many kinds of searches over the last few years, and I never (ever) want to do search by hand. Always use a dedicated search tool or service.

When you have a server powering the site, search tools are plentiful. I’ve used Solr, which is tricky to work with and get configured but extremely powerful. And recently I’ve converted a project at work from Solr to Azure Search. Azure Search is a bit more simplistic, but a lot easier to deal with since it’s mostly managed by Azure.

But this site doesn’t have a server powering it. It’s all static. There are files on my computer, I run hexo generate, and there are files that can be put on any webserver and served up as-is. No per-request processing required. It makes the site super fast.

Someday I’ll do a post about how I host this site and others on Netlify.

Then last week I saw a blog post about searching static sites and decided to try it for myself. After all, I’ve got quite a few posts spanning 5 years now.

Enter: Lunr

Lunr is a search tool, like Solr, but meant to be run client-side. In a browser. All in JavaScript. That seems like it’ll be complex and slow. That’s what I initially thought, but it performs very well.

All you need is some data to feed it to build an index, and then you can search it and get results. Sounds easy, right?

Getting JSON from Hexo

So we need data from the site. But Hexo publishes HTML, not JSON. Well, with the hexo-generator-json-content plugin, it can publish JSON too!

The plugin has lots of options. After some trial and error, here’s the config I ended up using:

_config.yml
jsonContent:
  file: posts.json
  dateFormat: YYYY-MM-DD
  meta: false
  pages: false
  posts:
    title: true
    date: true
    text: true
    description: true
    tags: true
    image: true
    path: true
    link: false
    raw: false
    content: false
    slug: false
    updated: false
    comments: false
    permalink: false
    categories: false
    author: false

The resulting JSON file looks something like this:

posts.json
[
    {
        "title": "Post Title",
        "date": "2020-01-25",
        "text": "The full text of the post, without any HTML!",
        "description": "Post summary line",
        "path": "2020/01/post",
        "tags": [
            {
                "name": "Tag",
                "slug": "Tag",
                "permalink": "https://moscardino.net/tags/Tag/"
            }
        ]
    }
]

Wonderful! It has all the data I’d like to search and all the data I’d like to display.

You can see the entire generated file at moscardino.net/posts.json. It’s not small, around 220KB, but it compresses very well.

Building the Index

The next step is to load the JSON file and create a Lunr index from it. This part is actually quite easy:

search.js
// Load the posts from the json file
// (await only works inside an async function or a module script)
let response = await fetch('/posts.json');
let posts = await response.json();

// Create the lunr index
let index = lunr(function () {
    this.ref('id');
    this.field('title');
    this.field('text');
    this.field('description');
    this.field('keywords');

    posts.forEach(function (post, i) {
        post.id = i;                                    // unique reference for each document
        post.keywords = post.tags.map(tag => tag.name); // flatten tag objects to their names

        this.add(post);
    }, this); // pass the builder through as `this` for the callback
});

There are two key things here:

  1. The index needs a reference field, declared with this.ref(), and its value must be unique per document. In this case, we are using the index of the post in the JSON file. All other searchable fields are declared with this.field().
  2. Our tags array holds objects rather than text, so we extract just the name of each tag into a keywords field.
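
If you’d rather not modify the posts array while indexing, the same two steps can be pulled out into a small pure function. This is a sketch, and toSearchDocuments is a hypothetical name, not part of Lunr or Hexo; each object it returns is what you would pass to this.add() inside the lunr() builder.

```javascript
// Hypothetical helper (not part of Lunr or Hexo): turn the posts from
// posts.json into flat documents for the index without modifying them.
function toSearchDocuments(posts) {
    return posts.map((post, i) => ({
        id: i, // unique per document: the post's position in the array
        title: post.title,
        text: post.text,
        description: post.description,
        // Keep only the tag names; Lunr would just stringify the tag objects
        keywords: (post.tags || []).map(tag => tag.name),
    }));
}
```

Inside the lunr() builder, the loop would then become toSearchDocuments(posts).forEach(doc => this.add(doc)). The arrow function keeps the builder as this, so the extra thisArg isn’t needed.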

Now that we have an index, we can make our search:

search.js
let urlParams = new URLSearchParams(window.location.search);
let term = urlParams.get('term') || '';
let results = index.search(term);
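
One thing to watch for: index.search() parses its input as a Lunr query (field:term, wildcards, ~ for fuzzy matching, ^ for boosts), and malformed input, such as a field name Lunr doesn’t know like foo:bar or a bare title:, can make it throw a lunr.QueryParseError instead of returning results. Since the term comes straight from the URL, a small wrapper keeps the page from breaking. This is a hedged sketch; safeSearch is a hypothetical name.

```javascript
// Hypothetical wrapper around a Lunr index's search() method.
// Returns an empty array for blank input or for queries Lunr can't parse.
function safeSearch(index, term) {
    const query = (term || '').trim();
    if (!query) {
        return [];
    }
    try {
        return index.search(query);
    } catch (err) {
        // Lunr throws (a lunr.QueryParseError) on malformed query syntax
        console.warn(`Could not parse search query "${query}": ${err.message}`);
        return [];
    }
}
```

With that in place, the search line becomes let results = safeSearch(index, term);.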

Cool! What does the results array look like, and what do we do with it? The most important thing to know is that the results don’t contain the documents to display. Each result has a ref value, which you use to retrieve the document from the original array of posts. It also has a score and details about what matched, but I’m not using those.
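
To make that concrete: each result is an object with ref, score, and matchData properties, results come back sorted by score, and ref is a string even though we indexed numbers. The sketch below uses made-up results (illustrative values, not real output from this site) and a hypothetical lookupPosts helper to turn them back into posts, keeping the top N and skipping any ref that doesn’t match a post.

```javascript
// Hypothetical helper: map Lunr search results back to the posts they refer to.
function lookupPosts(results, posts, limit = 10) {
    return results
        .slice(0, limit)                          // results are already sorted by score
        .map(result => posts[Number(result.ref)]) // ref is a string such as "1"
        .filter(post => post !== undefined);
}

// Illustrative data in the shape index.search() returns (values are made up)
const posts = [{ title: 'First post' }, { title: 'Second post' }];
const exampleResults = [
    { ref: '1', score: 2.4, matchData: {} }, // matchData details omitted
    { ref: '0', score: 0.9, matchData: {} },
];

console.log(lookupPosts(exampleResults, posts).map(post => post.title));
// → [ 'Second post', 'First post' ]
```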

Displaying the Results

Here’s my code for showing the results:

search.js
// term comes straight from the URL, so escape it before putting it into HTML
let safeTerm = term
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');

if (results.length) {
    updateTitleBox(`Showing search results for <strong>${safeTerm}</strong>.`);

    // Display the results
    results
        .slice(0, 10) // Top 10 results only
        .forEach(result => {
            let post = posts[result.ref];

            if (post) {
                let html = createPostItemHtml(post);

                document.querySelector('main').insertAdjacentHTML('beforeend', html);
            }
        });
}
else {
    updateTitleBox(`No results found for <strong>${safeTerm}</strong>.`);
}

updateTitleBox is a helper method to update the text of the box at the top of the results page. createPostItemHtml uses JS template strings to generate the HTML for each result. I don’t like generating HTML in JS, but loading a templating library for this is overkill.
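
The post doesn’t show createPostItemHtml, so here is a hypothetical sketch of what a template-string version could look like. It assumes the title, date, description, and path fields from posts.json; the markup and class names are made up for illustration, not the site’s real ones. It escapes every value first, since insertAdjacentHTML parses whatever string it is given.

```javascript
// Escape characters that have special meaning in HTML text and attributes
function escapeHtml(value) {
    return (value == null ? '' : String(value))
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical version of createPostItemHtml; markup and class names are illustrative
function createPostItemHtml(post) {
    const url = '/' + post.path; // e.g. "2020/01/post" -> "/2020/01/post"
    return `
        <article class="post-item">
            <h2 class="post-item__title"><a href="${escapeHtml(url)}">${escapeHtml(post.title)}</a></h2>
            <time datetime="${escapeHtml(post.date)}">${escapeHtml(post.date)}</time>
            <p class="post-item__description">${escapeHtml(post.description)}</p>
        </article>`;
}
```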

There is some more stuff in the rest of search.js, but it’s not very relevant to this post.

Wiring It Up

The last piece is a page that serves as the results page. This part is very specific to Hexo.

  1. Create a new layout template in the theme folder. I called mine search.ejs. This template needs some HTML for our results to be inserted into, and it needs to load Lunr and our search.js file. I saved lunr.js from GitHub into my project so I don’t have to rely on some 3rd-party host.
  2. Create a search page. Not a post, but a page. I put mine at source/search/index.md. The file is empty except for two front-matter properties: layout: search to use the layout we made, and title: Search for the page title.
  3. Add a search form somewhere on the site. I put mine in the header. It’s really simple, as the label and button are only shown to screen readers:
     <form action="/search" method="GET" class="header-search__form">
         <label for="term" class="u-sr-only">
             Search Term
         </label>

         <input type="search" name="term" id="term" class="header-search__input" placeholder="Search" />

         <button type="submit" class="u-sr-only">
             Search
         </button>
     </form>

A note about debugging: hexo server won’t work for testing search, because it strips query strings from URLs. I got around this by running hexo generate -f and serving the generated public folder with http-server. hexo generate -w doesn’t work consistently, I think because it doesn’t watch the JS files.

Conclusion

Lunr is awesome. It’s simple and fast. It’s also easy to integrate with anything that can output JSON. Implementing search took very little time and I think I get a lot of benefits from it.

There can be improvements. As I add more posts, the JSON file will get larger and the index will take more time to build. Lunr does offer a way to pre-build the index and load that directly, but I would need to find a way to build that into the Hexo build pipeline. Maybe some day.
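
For reference, here is a minimal sketch of that pre-building step, which this site doesn’t do yet. It relies on Lunr’s serialization support: a built index can be passed to JSON.stringify (lunr.Index implements toJSON()), and lunr.Index.load() restores it. buildSerializedIndex is a hypothetical name, and lunr is passed in as a parameter so the function can be called from a Node build script with require('lunr'); how to hook it into Hexo’s pipeline is still the open question.

```javascript
// Hypothetical build-time helper: build the same index as search.js and
// serialize it so the browser can load it instead of rebuilding it.
function buildSerializedIndex(lunr, posts) {
    const index = lunr(function () {
        this.ref('id');
        this.field('title');
        this.field('text');
        this.field('description');
        this.field('keywords');

        // Arrow function keeps the builder as `this`
        posts.forEach((post, i) => {
            this.add({
                id: i,
                title: post.title,
                text: post.text,
                description: post.description,
                keywords: (post.tags || []).map(tag => tag.name),
            });
        });
    });

    // lunr.Index implements toJSON(), so this produces a loadable snapshot
    return JSON.stringify(index);
}
```

In the browser, search.js would then replace the lunr(...) call with lunr.Index.load(JSON.parse(serialized)), where serialized is the fetched file’s contents (the file name is up to you). posts.json would still be needed to display results, since the index only hands back refs.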

Photo by João Silas on Unsplash.