May 04, 2024 | Engineering

Use What Works

When we redesigned GhostRemix, our talented designer suggested that we add a RapidRead button to offer article summaries for readers in a hurry. I thought this was a great idea. At the time, Google had just released Gemma 2B, a smaller model that promised usable performance on modest hardware. I had also recently taken an interest in Go, so this presented a fantastic opportunity to gain experience building and deploying a small, useful Go application.

The idea was simple: GoGemma would be a basic Go server that accepted a token, a command, and some text. It would return the model’s response and use Redis to cache it for subsequent requests. Despite some tricky bits around crafting the Dockerfile and getting Gemma CPP to compile in production, I got it working.

My first attempts at running on Fly hit out-of-memory errors. After tweaking the configuration, I eventually found some setups that worked better. But I couldn't get the performance I wanted without marshaling more resources than I'd intended to use. And then Anthropic, the Amazon-backed OpenAI competitor, released their latest series of Claude models. They were fast, cheap, and offered great results. And using Claude also meant one less service I had to manage myself. The choice was clear.

GoGemma gave me the full Go experience. I learned the basics of the style and philosophy of Go programs, how to build a cozy local development environment, and how to deploy. It’s a great little project worth peeking at or playing with for those early in their Go journey. But at the end of the day, we use what works.

So, we’ve updated RapidRead to use Claude. Grab an Anthropic key and download the latest code here to add it to your project.