2020-09-28

String Compression in Blazor

Trying some things out for a future project.

Blazor is awesome and something you should care about. So awesome, in fact, that I’m constantly trying to use it for various side projects.

One of those site projects is to recreate my old game, LexiTiles, in Blazor. But there’s something I want to do with it that I haven’t been able to figure out until now: how to do it entirely in Blazor.

Specifically, I want someone to be able to play a game and share their results using a URL. But doing that requires that the board and results be stored somewhere. When I implemented sharing in the old game, I had a web service that stored game results in a database so they could be retrieved. For this new version, I want to do the same thing, but without that web service or database.

Storing Data in URLs

As I was thinking of this, a tweet scrolled by with a link to itty.bitty.site. For those too lazy to click, that site lets you enter some HTML or text, and get a URL that will render that page out. The kicker is that the entire page you create is stored in that URL. There’s no web service or database because everything is done in the browser.

How it works is by taking the text, compressing it, then adding it to the URL hash. When you load up a page, the JavaScript reads the hash, decompresses the text, then renders it to the page. But how much text could you realistically fit into a URL? Quite a bit, as it turns out.

At work, we use 2000 characters as a de-facto limit on long URLs. It’s not a hard limit, but you are generally safe if you stay under it. If you really need a number, 2000 is a safe one to use.

Game Results and Strings

So how can I use that URL limit to save game results? Easy, we just need to pare down the data to the bare minimum required to render the results. Let’s break down how the game works and what the results looks like.

The basic portion of the game consists of a board of 25 letters (5x5 grid). How the board is generated is not important, but it’s mostly random with some weights to alter the frequency of each letter. During the game, the player has 60 seconds to use those letters to create as many words as possible. Once the time is up, the words are scored using Scrabble letter scores with a bonus for longer words. Incorrect words are worth negative points.

I’m planning on using the Enable1 word list this time. I previously used the English Open Word List, but Enable1 seems to be more commonly used for word games.

Breaking down the results, we only really need to know the words that were saved in order to recreate the results. If we have the list of words, we can calculate the score on the fly. For good measure, we will also save the letters that make up the board so we can show the board that created the results, too. If we want that as a string, we could use JSON but it’s a lot more space efficient to concatenate them to a single string (putting the board letters as the first word).

Figuring Out Lengths

Now that we know how we can store results as a string, how long do we think results will end up being? Let’s do some math. Our board will be 25 characters, no getting around that, and each word will need a single character separator. Using some basic word sizes, we can fit the following number of words in 2000 characters:

10 letters: ~178 words
8 letters: ~218 words
6 letters: ~283 words
5 letters: ~330 words
4 letters: ~395 words

Even if someone came up with only 10 letter words, 178 words in 60 seconds will probably break Blazor in some way. Just saving the results as a string would work just fine, but I want to go further.

I’m aware that I can shorten the data string by a bit by using the seed value in place of the Board Letters. The savings would be around 20 characters. Unfortunately, if I tweak how the board is generated (by changing the letter frequency, for instance) then the seed is not enough to recreate the board exactly as it was.

Compression in Blazor

itty.bitty.site uses LZMA compression, but there are other options out there. Specifically, I wanted to look at GZip, BZip2, LZ4, and Deflate. All of those options are easy to implement in .NET Core using either SharpZipLib or EasyCompressor.

System.IO.Compression doesn’t seem to work in WebAssembly. Maybe .NET 5 will address that. Until then, EasyCompressor works for all of the above except for BZip2, which SharpZipLib provides.

To test this, I built a Blazor app that will generate random results and run the string through various compressors to compare the results. You can try it out for yourself here and view the source if you want.

Note that enabling all the various algorithms at once will probably crash the app, which is why there are toggles shown. I think this is due to memory limits either with Mono or WASM. Either way, keep it to a couple at a time and it will work. This is only a proof-of-concept test app.

Based on playing with that app for a while, my conclusion is this:

Most games will be between 100-400 characters long.
At that length, compression isn’t really necessary, but Deflate generally results in data that is smaller and it takes effectively no time to perform.
Even bumping the numbers up to results over 2000 characters, Deflate generally wins though GZip starts to be more useful around that size.
BZip2 tends to be a bit slower and produce longer strings than GZip, though not by much. I’m guessing BZip2 would work better with more data.
And LZMA and LZ4 are not suited for this at all, taking a long time (relatively) and generally producing larger strings than the source, especially at smaller sizes. Like BZip2, I’m guessing they are more efficient with more data.

So whenever I get around to finally remaking LexiTiles in Blazor (I’ve started on it but it’s still very early on), I know I can do what I want. Results can be shared as a URL with no web service or database required. Thanks to some simple string manipulation and Deflate.