I love messing with this site. Recently I was working on a new search feature for the AWH website and happened to read a blog post from Mozilla about how MDN’s search works. MDN, for those not familiar, is now a static site with all of its content hosted on GitHub. They hook into their own generator’s pipeline to create a JSON file of searchable content and use FlexSearch to power the search in the browser.
That all made me re-evaluate the search on my own site.
To be fair, there was nothing wrong with the old search. It was perfectly functional but pretty boring. I wanted to see if I could make it better.
In the old setup, the content was fed to a Hexo generator that created both a Lunr index and an array of post data (just enough to display results). The search page then took the query from the URL query string, loaded the pre-built index, ran the search, and showed the results. The search page was reachable through a search box in the header, which was a plain HTML form (no JS fanciness there) pointing at the search page. Since there is no server back-end to this site, and the form was configured to use GET, it would simply load the search page with the single input as a query string parameter.
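The query-string step can be sketched roughly like this. It is a minimal illustration, not the site's actual code, and the parameter name `q` is an assumption (it would be whatever `name` the form input used).

```javascript
// Read the search term from a URL query string such as "?q=hexo+search".
// `param` defaults to "q", which is a hypothetical input name.
function getQuery(search, param = 'q') {
  const value = new URLSearchParams(search).get(param);
  // A missing parameter yields null; treat it as an empty query.
  return value === null ? '' : value.trim();
}

// In the browser this would be called as getQuery(window.location.search).
```

`URLSearchParams` decodes both `+` and `%20` as spaces, so the form's encoding does not need any special handling.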
Looking back, there were two problems with this. The first was that I was missing the actual content of the posts from the index (whoops) and the second was that if you didn’t find what you wanted on your first search, you would have to reload the page again (and maybe again and again and again). This would mean re-downloading the Lunr index and posts each time. Sure, that file wasn’t that big and the browser cache should have helped, but it still wasn’t great.
The updated generator function is still pretty much the same. I fixed the missing content in the index first, and the size of the JSON file grew substantially, so I split the index and posts into separate files. Hexo lets you return an array of objects from a generator, and each object becomes a file. Splitting up that data means the files can be downloaded in parallel in the browser, gaining back some of the performance lost to the size increase.
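A generator that returns two files might look something like the sketch below. The file paths, the fields kept for each post, and the `buildSearchFiles` helper are all assumptions for illustration; the real code is in the Gist. Post content in Hexo is rendered HTML, so a real version would probably strip tags before indexing.

```javascript
// Pure helper: turn post records into the two files the generator returns.
// `buildIndex` is injected so this sketch runs without Lunr installed.
function buildSearchFiles(posts, buildIndex) {
  // Keep only what the results list needs to render (assumed fields).
  const summaries = posts.map(({ path, title, date }) => ({ path, title, date }));
  return [
    // Hypothetical output paths; each { path, data } object becomes one file.
    { path: 'search/index.json', data: JSON.stringify(buildIndex(posts)) },
    { path: 'search/posts.json', data: JSON.stringify(summaries) },
  ];
}

// Inside a Hexo script the global `hexo` exists; anywhere else this is skipped.
if (typeof hexo !== 'undefined') {
  const lunr = require('lunr');
  hexo.extend.generator.register('search', (locals) =>
    buildSearchFiles(locals.posts.toArray(), (posts) =>
      lunr(function () {
        this.ref('path');
        this.field('title');
        this.field('content'); // the field that was missing from the old index
        posts.forEach((post) => this.add(post));
      })
    )
  );
}
```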
Compression helps those files greatly. The index file is 700 KB uncompressed, but Brotli gets it down to about 130 KB. The posts file goes from 76 to 44 KB. 🤯
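For anyone who wants to check numbers like these on their own files, Node's built-in `zlib` module can do it. This is just a measuring sketch with a hypothetical `compressionReport` helper; the exact sizes depend on the Brotli settings your host uses.

```javascript
const zlib = require('node:zlib');

// Compare the raw size of some text with its Brotli-compressed size, in bytes.
function compressionReport(text) {
  const raw = Buffer.from(text, 'utf8');
  const brotli = zlib.brotliCompressSync(raw); // default Brotli parameters
  return { rawBytes: raw.length, brotliBytes: brotli.length };
}

// Usage (the path is an assumption):
// const fs = require('node:fs');
// console.log(compressionReport(fs.readFileSync('public/search/index.json', 'utf8')));
```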
The search box in the site header is now gone, replaced with a link to the search page.
The search page is all new. Since the header search form is gone, there’s now a nice, big search field. As you type, results start appearing. Loading the index and posts is the only part that might be slow; once that is done, searching is really fast. I’m using the same mechanism for getting and showing results as before, but this time tied to the input event on the search box instead of searching on page load.
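The loading and wiring could be sketched along these lines. The URLs, the `#search-input` selector, and the helper names are assumptions, and the results are only logged rather than rendered; the Gist has the real thing.

```javascript
// Start both downloads at once. `fetchJson` is injected so the helper
// can be exercised outside a browser; the URLs are hypothetical.
async function loadSearchData(fetchJson) {
  const [indexData, posts] = await Promise.all([
    fetchJson('/search/index.json'),
    fetchJson('/search/posts.json'),
  ]);
  return { indexData, posts };
}

// Lunr results look like [{ ref, score, ... }]; map each ref back to its post.
function resolveResults(results, posts) {
  const byPath = new Map(posts.map((post) => [post.path, post]));
  return results.map((result) => byPath.get(result.ref)).filter(Boolean);
}

// Browser-only wiring, skipped when there is no DOM or no global `lunr`.
if (typeof document !== 'undefined' && typeof lunr !== 'undefined') {
  const input = document.querySelector('#search-input'); // hypothetical id
  loadSearchData((url) => fetch(url).then((res) => res.json())).then(({ indexData, posts }) => {
    const index = lunr.Index.load(indexData);
    input.addEventListener('input', () => {
      const matches = resolveResults(index.search(input.value), posts);
      console.log(matches.map((post) => post.title));
    });
  });
}
```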
There’s a debounce function in there, too, so that I don’t waste resources searching too much while you are still typing.
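A debounce helper is usually only a few lines. This is a generic sketch rather than the exact function from the Gist, and the 200 ms delay in the usage comment is an arbitrary choice.

```javascript
// Return a wrapper that waits until calls stop for `delayMs` before
// invoking `fn` once, with the arguments of the most recent call.
function debounce(fn, delayMs) {
  let timer;
  return function debounced(...args) {
    clearTimeout(timer); // cancel the previously scheduled call, if any
    timer = setTimeout(() => fn.apply(this, args), delayMs);
  };
}

// Usage: input.addEventListener('input', debounce(runSearch, 200));
// where `runSearch` is whatever function performs the search.
```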
But wait, this is still using Lunr? MDN uses FlexSearch and it’s supposed to be a lot faster, so why not use that? I’m sure it is a lot faster, but I can’t figure out how to make the export function work with a Hexo generator. Lunr has the benefit that once you have created your index, you can simply use JSON.stringify to export it and reload it easily with lunr.Index.load(). FlexSearch uses an export function on the index which takes a callback. I can’t figure out how to get the data from that callback to be part of the return value of the generator. If you know how I could make that work, please let me know.
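For reference, the Lunr round trip really is that small. In this sketch the helper names are made up, and `lunr` is passed in as an argument so the snippet does not assume how the library is loaded.

```javascript
// Build time: JSON.stringify picks up the index's own toJSON() method.
function serializeIndex(index) {
  return JSON.stringify(index);
}

// Browser: parse the downloaded text and hand the object to lunr.Index.load().
function restoreIndex(lunr, json) {
  return lunr.Index.load(JSON.parse(json));
}
```

Because the export is synchronous, the string can go straight into the generator's return value, which is the part that is awkward with a callback-based export.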
So that’s the new search. Results as you type! It’s very quick and nice to use. I hope you enjoy it.
The code for the search isn’t crazy huge, but I didn’t want to paste it all into this post so I created a Gist where you can see how it all works.