Adding Search to the Blog

How to add search to a static site.

There’s something new on the site. Up in the header there is a search box. You can now search through the posts I’ve made. Go ahead and try it out; it works pretty well. Come back if you want an explanation of how I did it.

Search isn’t a difficult thing to implement, but getting it right is very hard. I’ve worked on many kinds of search over the last few years, and I never (ever) want to build search by hand. Always use a dedicated search tool or service.

When you have a server powering the site, search tools are plentiful. I’ve used Solr, which is tricky to configure and work with, but extremely powerful. And recently I converted a project at work from Solr to Azure Search. Azure Search is a bit more limited, but a lot easier to deal with since it’s mostly managed by Azure.

But this site doesn’t have a server powering it. It’s all static. I have files on my computer, I run hexo generate, and I get files that can be put on any web server and served as-is. No per-request processing is required, which makes the site super fast.

Someday I’ll do a post about how I host this site and others on Netlify.

Then last week I saw a blog post about searching static sites and decided to try it for myself. After all, I’ve got quite a few posts spanning 5 years now.

Enter: Lunr

Lunr is a search tool, like Solr, but meant to run client-side: in the browser, all in JavaScript. That sounds like it would be complex and slow. That’s what I initially thought, but it performs very well.

All you need is some data to feed it to build an index, and then you search and get results. Sounds easy, right?
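To make "build an index, then search it" a little more concrete, here is a toy inverted index in plain JavaScript. It is only an illustration of the general idea, not Lunr's implementation or API: Lunr also does stemming, stop-word filtering and proper relevance scoring, while this sketch just counts how many query words each document contains. The names tokenize, buildToyIndex and toySearch are made up for this example.

```javascript
// Toy illustration only; NOT how Lunr is implemented.
// Split text into lowercase words (letters and digits only).
function tokenize(text) {
    return String(text).toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build step: map each word to the set of document ids that contain it.
function buildToyIndex(docs) {
    const index = new Map();
    docs.forEach((doc, id) => {
        for (const word of tokenize(doc.text)) {
            if (!index.has(word)) index.set(word, new Set());
            index.get(word).add(id);
        }
    });
    return index;
}

// Search step: score each document by how many query words it contains,
// then return { ref, score } objects, best match first.
function toySearch(index, query) {
    const scores = new Map();
    for (const word of tokenize(query)) {
        for (const id of index.get(word) || []) {
            scores.set(id, (scores.get(id) || 0) + 1);
        }
    }
    return [...scores]
        .map(([ref, score]) => ({ ref, score }))
        .sort((a, b) => b.score - a.score);
}

const docs = [
    { text: 'Adding search to a static site' },
    { text: 'Hosting a static site on Netlify' },
    { text: 'Something unrelated' },
];
const toyIndex = buildToyIndex(docs);
console.log(toySearch(toyIndex, 'static search'));
// logs doc 0 (score 2) first, then doc 1 (score 1)
```

The expensive work (tokenizing every document) happens once at build time, so each search is just a few map lookups. That is the same trade-off that makes client-side search fast enough.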

Getting JSON from Hexo

So we need data from the site. But Hexo publishes HTML, not JSON. Well, with the hexo-generator-json-content plugin, it can publish JSON, too!

The generator has lots of options. After some trial and error, here’s the config I ended up using in the site’s _config.yml:

jsonContent:
    file: posts.json
    dateFormat: YYYY-MM-DD
    meta: false
    pages: false
    posts:
        title: true
        date: true
        text: true
        description: true
        tags: true
        image: true
        path: true
        link: false
        raw: false
        content: false
        slug: false
        updated: false
        comments: false
        permalink: false
        categories: false
        author: false

The resulting JSON file looks something like this:

[
    {
        "title": "Post Title",
        "date": "2020-01-25",
        "text": "The full text of the post, without any HTML!",
        "description": "Post summary line",
        "path": "2020/01/post",
        "tags": [
            {
                "name": "Tag",
                "slug": "Tag",
                "permalink": "https://moscardino.net/tags/Tag/"
            }
        ]
    }
]

Wonderful! It has all the data I’d like to search and all the data I’d like to display.

You can see the entire generated file here. It’s not small, around 220KB, but it compresses very well.

Building the Index

The next step is to load the JSON file and create a Lunr index from it. This part is actually quite easy:

// Load the posts from the json file
let response = await fetch('/posts.json');
let posts = await response.json();

// Create the lunr index
let index = lunr(function () {
    this.ref('id');
    this.field('title');
    this.field('text');
    this.field('description');
    this.field('keywords');

    posts.forEach(function (post, i) {
        post.id = i;
        post.keywords = post.tags.map(tag => tag.name);

        this.add(post);
    }, this);
});

There are two key things here:

  1. Lunr needs a reference field, set with this.ref(), and each document’s value for that field must be unique. In this case, we use the post’s position in the JSON array. All other searchable fields are added with this.field().
  2. Lunr can index a field that holds an array of strings, but our tags are objects, so we extract just the name of each tag into a keywords field.
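The two rules above can be pulled out into a small, testable step. This is a sketch, not part of the original search.js: toSearchDocuments and assertUniqueRefs are hypothetical helper names. It builds plain documents for Lunr without mutating the posts array, and checks that ref values really are unique before indexing. Using the array position as the ref works because the index is rebuilt from the same posts.json on every page load; a stable field like path should also be unique per post.

```javascript
// Hypothetical helper: turn posts from posts.json into flat documents for Lunr.
// The original post objects are left untouched.
function toSearchDocuments(posts) {
    return posts.map((post, i) => ({
        id: String(i), // Lunr hands refs back as strings in search results anyway
        title: post.title,
        text: post.text,
        description: post.description,
        // Tag objects can't be indexed usefully, so keep just their names
        keywords: (post.tags || []).map(tag => tag.name),
    }));
}

// Hypothetical check: throw if two documents share a ref value.
function assertUniqueRefs(docs, refField) {
    const seen = new Set();
    for (const doc of docs) {
        const ref = doc[refField];
        if (seen.has(ref)) {
            throw new Error(`Duplicate ref "${ref}" in field "${refField}"`);
        }
        seen.add(ref);
    }
}

// Usage inside the lunr() builder from above (not run here):
//   const docs = toSearchDocuments(posts);
//   assertUniqueRefs(docs, 'id');
//   docs.forEach(doc => this.add(doc));
```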

Perform the Search

Now that we have an index, we can make our search:

let urlParams = new URLSearchParams(window.location.search);
let term = urlParams.get('term') || '';
let results = index.search(term);
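Two edge cases are worth guarding against here: an empty term (someone opens the page without a query) and query syntax errors. Lunr's query syntax gives special meaning to characters such as : (field), ^ (boost), ~ (fuzzy match) and * (wildcard), and a malformed query, such as one naming a field that does not exist, makes index.search throw a lunr.QueryParseError. The wrapper below is a hypothetical sketch, not part of the original search.js; it only assumes index has a search(query) method, so it can be tried with a stand-in object.

```javascript
// Hypothetical wrapper around a Lunr index's search(); `index` is assumed to be
// anything with a search(queryString) method.
function safeSearch(index, term) {
    const query = (term || '').trim();

    // Nothing to search for: skip Lunr entirely
    if (!query) return [];

    try {
        return index.search(query);
    } catch (err) {
        // lunr.QueryParseError sets name to 'QueryParseError'; treat bad syntax as "no results"
        if (err && err.name === 'QueryParseError') return [];
        throw err; // anything else is a real bug, so don't hide it
    }
}

// In search.js this would replace the direct call:
//   let results = safeSearch(index, term);
```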

Cool! What does the results array look like, and what do we do with it? The most important thing to know is that the results do not contain the documents to display. Each result has a ref value, which you use to look up the document in the original array of posts. Results come back sorted by score, best match first. Each result also has a score and a matchData object with details about what matched, but I’m not using those.
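To make that shape concrete: each Lunr result is an object with ref, score and matchData properties, and ref comes back as a string even when the indexed value was a number. The sketch below maps results back to posts and keeps only the first few. resultsToPosts is a hypothetical helper name, and the sample results are hand-written stand-ins, not real Lunr output.

```javascript
// Hypothetical helper: turn Lunr results into the matching posts, best first.
// Assumes refs are array positions, as in the index code above.
function resultsToPosts(results, posts, limit = 10) {
    return results
        .slice(0, limit)                           // results are already sorted by score
        .map(result => posts[Number(result.ref)])  // ref is a string like "3"
        .filter(post => post !== undefined);       // skip refs with no matching post
}

// Hand-written stand-ins shaped like Lunr results (not real output)
const samplePosts = [{ title: 'First' }, { title: 'Second' }, { title: 'Third' }];
const sampleResults = [
    { ref: '2', score: 1.8, matchData: {} },
    { ref: '0', score: 0.4, matchData: {} },
];
console.log(resultsToPosts(sampleResults, samplePosts).map(post => post.title));
// logs [ 'Third', 'First' ]
```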

Displaying the Results

Here’s my code for showing the results:

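// term comes straight from the URL and is inserted as HTML, so escape it first
let safeTerm = term
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');

if (results.length) {
    updateTitleBox(`Showing search results for <strong>${safeTerm}</strong>.`);

    // Display the top 10 results (Lunr returns them best match first)
    results
        .slice(0, 10)
        .forEach(result => {
            let post = posts[result.ref];

            if (post) {
                let html = createPostItemHtml(post);

                document.querySelector('main').insertAdjacentHTML('beforeend', html);
            }
        });
}
else
    updateTitleBox(`No results found for <strong>${safeTerm}</strong>.`);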

updateTitleBox is a helper function that updates the text of the box at the top of the results page. createPostItemHtml uses JS template strings to generate the HTML for each result. I don’t like generating HTML in JS, but loading a templating library for this is overkill.
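The post does not show createPostItemHtml, so here is a hedged guess at what such a function could look like. The markup and class names are invented for illustration; only the post fields (title, date, description, path) come from posts.json. The important part is escapeHtml: titles and descriptions end up in insertAdjacentHTML, so any <, &, or quote characters in them should be escaped first. The same helper also suits the search term passed to updateTitleBox, since that comes straight from the URL.

```javascript
// Escape text so it can be placed inside HTML content or a quoted attribute.
function escapeHtml(value) {
    return String(value ?? '')
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// Hypothetical version of createPostItemHtml; the real markup is not shown in the post.
function createPostItemHtml(post) {
    return `
<article class="post-item">
    <h2 class="post-item__title">
        <a href="/${escapeHtml(post.path)}">${escapeHtml(post.title)}</a>
    </h2>
    <time datetime="${escapeHtml(post.date)}">${escapeHtml(post.date)}</time>
    <p>${escapeHtml(post.description)}</p>
</article>`;
}
```

Escaping every interpolated value, rather than trusting that post titles are "safe", keeps the template correct even if a title someday contains an ampersand or angle brackets.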

There is some more stuff in the rest of search.js, but it’s not very relevant to this post.

Wiring It Up

The last piece is a page that serves as the results page. This part is very specific to Hexo.

  1. Create a new layout template in the theme folder. I called mine search.ejs. This template needs some HTML for our results to be inserted into, and it needs to load Lunr and our search.js file. I saved lunr.js from GitHub into my project so I don’t have to rely on a 3rd-party host.
  2. Create a search page. Not a post, but a page. I put mine at source/search/index.md. The file is empty except for 2 front-matter properties: layout: search to use the layout we made, and title: Search for the page title.
  3. Add a search form somewhere on the site. I put mine in the header. It’s really simple, as the label and button are only shown to screen readers:
<form action="/search" method="GET" class="header-search__form">
    <label for="term" class="u-sr-only">
        Search Term
    </label>

    <input type="search" name="term" id="term" class="header-search__input" placeholder="Search" />

    <button type="submit" class="u-sr-only">
        Search
    </button>
</form>

A note about debugging: using hexo server will not work for testing search, as it strips query strings from URLs. I got around this by using http-server and hexo generate -f together. hexo generate -w will not work consistently, because I don’t think it watches the JS files.

Conclusion

Lunr is awesome. It鈥檚 simple and fast. It鈥檚 also easy to integrate with anything that can output JSON. Implementing search took very little time and I think I get a lot of benefits from it.

There is room for improvement. As I add more posts, the JSON file will get larger and the index will take longer to build. Lunr does offer a way to pre-build the index (serialize it with JSON.stringify and load it back with lunr.Index.load), but I would need to find a way to build that into the Hexo build pipeline. Maybe someday.

Update: I figured out pre-building. Read about it here.


Photo by João Silas on Unsplash.