Jerry Ng

Building What Michelin Wouldn’t: Its Awards History

Jerry Ng — Thu, 17 Jul 2025 00:00:58 GMT

A few years back, I decided to gather Michelin restaurant data for fun. Over the years, something unexpected happened. My DMs and inbox started flooding with requests I kept having to turn down. Email after email asking the same thing: "Hey, I love your project! Do you have historical Michelin data? Can you tell me when Restaurant Z lost its star?"

Peek into my Gmail

My response was always the same awkward deflection: "Um, no... no way. You can try to get it from Wayback Machine or the old Michelin Guide books from Internet Archive". I must have copy-pasted some variation of that reply dozens of times over the years.

But here's the thing — every time I hit reply on one of those emails, it bugged me a little more. The more I thought about it, the more I realized I was sitting on an interesting problem that apparently no one else was solving.

I started looking into it

I spent way too much time Googling, scouring Reddit threads, Wikipedia, and food blogs. I was looking (and hoping) for someone, anyone, who was systematically tracking Michelin star changes over time.

The results were... not great. Even Michelin themselves don't have this old data published. Sure, you can find scattered blog posts about individual restaurants, but comprehensive historical tracking? Nothing.

Hey, I thought, why not? I’ve got a bit of free time before I start my new job, so I decided to go for it. Honestly, I was really curious to see how restaurants earn and lose their stars over the years.

Using git-history (...and failing at it)

Getting historical Michelin data turned out to be way more complex than I initially thought. My initial idea was straightforward: since I had been collecting data and committing it to GitHub since July 2022, I figured I could just use git-history, a tool that reads through the entire commit history of a file and generates an SQLite database reflecting changes over time.

However, it wasn't as simple as it seemed. When I tried it out, things didn't go as smoothly as I hoped. Over the years, some snapshots of my own generated data had faulty entries — restaurants with missing star counts, pricing data that said null for months, and addresses that were just empty strings. I had to manually exclude commits from the history, which made the entire process even more complicated than it needed to be.

One night, after the tool crashed for the fifth time, I realized I was fighting a losing battle. I did spend days trying to patch and fix the tool for my own use, but it was too much work, and I decided this wasn't worth my time.

Trying to fix things for my own use

Wayback Machine to the Rescue!

Clearly, I needed a completely different approach. So I pivoted to Plan B: use the Wayback Machine API to get snapshots of restaurant pages, then visit each available snapshot to extract the data. Sounds simple, right?

Wrong.
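For reference, the "simple" half of that plan (listing the archived captures of a page) might look roughly like the sketch below. It assumes the Wayback Machine's CDX endpoint with JSON output; the post doesn't say which endpoint was actually used. The helper names cdxQueryURL and snapshotURLs and the example.com page are hypothetical, and the response is canned so the example runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// cdxQueryURL builds a CDX API query listing successful captures of pageURL as JSON.
func cdxQueryURL(pageURL string) string {
	q := url.Values{}
	q.Set("url", pageURL)
	q.Set("output", "json")
	q.Set("filter", "statuscode:200")
	return "https://web.archive.org/cdx/search/cdx?" + q.Encode()
}

// snapshotURLs converts a CDX JSON response (a header row followed by one
// row per capture) into Wayback Machine replay URLs.
func snapshotURLs(body []byte) ([]string, error) {
	var rows [][]string
	if err := json.Unmarshal(body, &rows); err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, nil // no captures
	}
	// Find the columns we need by name instead of hard-coding positions.
	tsCol, origCol := -1, -1
	for i, name := range rows[0] {
		switch name {
		case "timestamp":
			tsCol = i
		case "original":
			origCol = i
		}
	}
	if tsCol < 0 || origCol < 0 {
		return nil, fmt.Errorf("unexpected CDX header: %v", rows[0])
	}
	var urls []string
	for _, row := range rows[1:] {
		if tsCol >= len(row) || origCol >= len(row) {
			continue // skip malformed rows
		}
		urls = append(urls, "https://web.archive.org/web/"+row[tsCol]+"/"+row[origCol])
	}
	return urls, nil
}

func main() {
	// Canned response so the example runs offline; the values are made up.
	sample := []byte(`[
		["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
		["com,example)/restaurant/foo", "20200115000000", "https://example.com/restaurant/foo", "text/html", "200", "ABC", "1234"]
	]`)
	urls, err := snapshotURLs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(cdxQueryURL("https://example.com/restaurant/foo"))
	fmt.Println(urls)
}
```

Each replay URL would then be fetched and parsed, which is where the real trouble starts.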

The ever-changing website layout problem

The first problem hit me immediately — Michelin's website structure changes almost every year. My existing HTML parsing code that worked perfectly? Completely useless.

Each year's snapshot had different CSS/XPath selectors, different div structures. Different everything. What started as a simple data extraction job turned into building a time-traveling HTML parser.

Missing publication dates

Left: snapshot from 2020; Right: snapshot from 2025

Here's where things get a bit tricky. Take a look at the 2020 snapshot (the picture on the left) — it clearly says "MICHELIN GUIDE 2020," so you know the award is for that year. You'd think every year's guide would include that info, right? But surprisingly, after 2021, the Michelin Guide website stopped mentioning the year on the restaurant pages altogether, and honestly, I’m not sure why they made that change.

Now you're probably thinking, "Oh, simple! If the snapshot is from 2025, then the 3-star award must be for 2025."

Nope.

For example, this snapshot was taken on February 7, 2025; the Michelin Guide for the year hadn’t even been released yet (it was only announced on February 10, 2025).

Luckily, when I inspected the HTML behind the page, I found the publication date tucked away deep inside one of the script tags (though the older snapshots do not have this):

See the webpage source for the 2025 snapshot for yourself

So, even if the page looks like it’s from 2025, the 3-star award might actually be for 2024! The website just… assumes you know.

The final extraction logic looks something like:

flowchart TD
    A[Start: HTML Snapshot] --> B{JSON-LD Script Present?}
    B -->|Yes| C[Parse JSON-LD for review.datePublished]
    C -->|Found| F[Return Published Date]
    C -->|Not Found| D
    B -->|No| D["Try another text selector"]
    D --> E{Found any regex match for date?}
    E -->|Yes| F
    E -->|No| D
    E -->|Still No| G["Give up"]
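In Go, that fallback chain could be sketched roughly as follows. This is not the project's actual code: it assumes review is a single JSON object, pulls the script tag out with a regular expression instead of a real HTML parser, and uses a hypothetical fallback pattern based on the "MICHELIN GUIDE 2020" text from the older snapshot. The function name publishedDate and the sample dates are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

var (
	// jsonLDRe captures the body of <script type="application/ld+json"> tags.
	jsonLDRe = regexp.MustCompile(`(?is)<script[^>]*type=["']application/ld\+json["'][^>]*>(.*?)</script>`)
	// guideYearRe is a hypothetical fallback for older pages that print the guide year.
	guideYearRe = regexp.MustCompile(`(?i)MICHELIN\s+GUIDE\s+(\d{4})`)
)

// jsonLD models only the field needed here; real documents carry many more.
type jsonLD struct {
	Review struct {
		DatePublished string `json:"datePublished"`
	} `json:"review"`
}

// publishedDate returns the publication date (or just the guide year) found
// in an HTML snapshot, and false if nothing matched.
func publishedDate(html string) (string, bool) {
	// Step 1: look for review.datePublished in any JSON-LD script.
	for _, m := range jsonLDRe.FindAllStringSubmatch(html, -1) {
		var doc jsonLD
		if err := json.Unmarshal([]byte(m[1]), &doc); err != nil {
			continue // not the shape we expect; try the next script
		}
		if doc.Review.DatePublished != "" {
			return doc.Review.DatePublished, true
		}
	}
	// Step 2: fall back to a text match for the guide year.
	if m := guideYearRe.FindStringSubmatch(html); m != nil {
		return m[1], true
	}
	// Step 3: give up.
	return "", false
}

func main() {
	newer := `<script type="application/ld+json">{"review":{"datePublished":"2024-03-18T17:00"}}</script>`
	older := `<div class="label">MICHELIN Guide 2020</div>`
	fmt.Println(publishedDate(newer))
	fmt.Println(publishedDate(older))
	fmt.Println(publishedDate(`<p>no date here</p>`))
}
```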

Pricing data nightmare

The pricing data was also all over the place.

Back in 2019, a restaurant might just list prices like "125 - 280 USD" in plain text:

Example snapshot from 2019

By 2023, the same range was represented with a bunch of "$$$$" symbols and totally different HTML markup:

Example snapshot from 2023

I spent hours trying to clean and normalize all those prices across different years, only to realize that even Michelin's own categories kept changing and weren’t consistent. In the end, I decided just to store everything as a string and leave the prices as they were:

A screenshot of the SQLite table (browsed using TablePlus)

The Infrastructure

The last step of this entire project was to ensure that it wouldn't break things for existing users who depend on the generated CSV (which my Kaggle dataset is also linked directly to). This means the existing GitHub workflow that generates and publishes the latest dataset must continue to work.

Secondly, the architecture is deliberately simple. The entire system must be simple to maintain so that I won't hate myself after 6 months of maintaining this.

Basically, here’s what the final thing looks like:

Backfilling marathon

First of all, I let the backfilling process run for about 3 days. Once it was done, I just uploaded the SQLite database directly to my MinIO instance via the console.

Oh, before that, I had to refactor my existing scraper to accommodate the new database schema, whereby new historical entries will always be appended to the existing restaurants.

Serving the data

Once everything was done, I deployed it on Railway.app.

Now, I can browse my SQLite database through a Datasette instance, which serves the data directly via a web interface.

And of course, like I said, I make sure to keep the freshest Michelin data updated and available as CSV on GitHub as part of the flow.

The Result

Now that I have all this historical data, it's time for some fun stuff. For example, I could quickly look up which restaurants in Spain have had the longest 3-star streaks since 2019:

Well, it’s definitely not the full story. Michelin already started including Spain in their guides back in 1910, so there’s that

The rise and fall

Take GästeHaus Klaus Erfort: it used to have 3 stars back in 2020, but has consistently stayed at 2 stars since 2021 (which is pretty impressive nevertheless):

Green stars follow the money

Another cool finding: Green Stars (for sustainability) are way more common in higher price brackets (“€€€€”), especially in Europe:

Makes sense when you think about it - sustainable sourcing and practices cost money, and higher-end restaurants have the margins to support it

On the flip side, Bib Gourmand (good value) is concentrated in “$$” and “€€” (mid-price brackets), rarely in the highest. Which is literally the point of the award, but it's satisfying to see the data confirm what we'd expect.

Top cuisines among the restaurants

In terms of cuisine, starred restaurants are mostly dominated by Creative and Japanese cuisine:

Was anyone expecting differently?

Clearly, there is much more to be discovered, but I think these will do for now.

💡

Want to explore the data yourself? Check out the demo here. Alternatively, you can download the SQLite database.

Flaws

Finally, I must say that the project isn’t perfect. The Wayback Machine only has snapshots starting from 2019, and even then, not every restaurant and year is fully archived.

Secondly, closed restaurants, like Julemont in Wittem, are missing from the current database. It was a two-star place in June 2024, but after closing, it had just disappeared by February 2025. It’d be great if we could keep their info in the records, so we have a more complete history.

💬

Update: the second flaw is addressed by PR#125

Lastly, what we have here doesn’t consider if a restaurant changes its name, switches owners, or even moves to a different location over the years. This is particularly tricky to track, especially if the Michelin guide URL changes for the restaurant.

Rest assured that the award history will still be accurate as long as the Michelin guide URL stays consistent over the years!

Accepting imperfection

At some point, I had to make a choice: spend more weeks trying to achieve perfect historical coverage, or accept that 2019 onwards is good enough.

I went with the latter.

Sure, I'm missing some earlier data, but 2019 gives me a solid base. More importantly, from now on, I have the infrastructure to capture proper historical data for the future. Every month, I'm building tomorrow's historical dataset.

Having said that, I'll probably work on addressing some of these issues in the near future (fingers crossed).

🙏

Thanks so much to everyone who emailed me and took the time to share your feedback so quickly — it truly means a lot to me!

What's Next

Now that I have all this historical data flowing, the real fun is just beginning.

I really hope to keep this project going for a while. Fingers crossed it won’t end up costing me too much. Who knows, maybe in 10 or 20 years, we’ll look back at what we’ve done with even more data — now that would be pretty fun!

How to Replicate DuckDuckGo Bangs in Firefox

Jerry Ng — Tue, 17 Jun 2025 00:00:57 GMT

I finally made the switch from Chrome to Firefox (thanks to uBlock Origin drama). If you still get ads when watching YouTube with uBlock Origin, you might have read that recent news where uBlock Origin is now forcefully disabled by Chrome.

While I still occasionally get force-fed with ads on YouTube and Gmail on Chrome, so far it's been bearable. But, I know that eventually things will actually stop working and I'll have to watch YouTube's unskippable ads.

Why I Finally Made the Switch

This wasn't the first time I attempted a browser switch. As mentioned in my previous post, I really like DuckDuckGo's Bangs feature. My previous attempts at switching browsers failed because this feature was missing from other browsers. Recently, though, I learned that I can now replicate DuckDuckGo's bangs in Firefox!

Also, I recently bought a new MacBook Air 15" with the M4 chip after using my old Dell XPS 13 for a good 5 years+ with WSL2 for development. I figured, heck, if I were to make the switch from Google Chrome to Firefox, now is the time.

How to Add Custom Search Shortcuts in Firefox

Adding the search shortcut was quite easy.

Step 1: Enable the Feature in `about:config`

Go to about:config in your Firefox address bar and set browser.urlbar.update2.engineAliasRefresh to true.

This makes the "Add" button appear — otherwise, you can't add custom search shortcuts.

If the "Add" button doesn't appear, you need to enable the browser.urlbar.update2.engineAliasRefresh setting first.

Step 2: Add Your Custom Shortcuts

Navigate to about:preferences#search and click the "Add" button to create your custom search shortcuts.

After that, just add whatever you want. For example, my favorite was !r to append site:reddit.com to my Google search to look for answers from Reddit only.

Add whatever search engine you want!

The Result

!r now works like a charm

That's all! Now I can finally browse without constantly battling Chrome's anti-adblock changes while still keeping my beloved search shortcuts. The switch feels good so far, and I'm honestly wondering why I waited this long to make the move.

My Search Shortcuts

Search Engine	Keyword	Engine URL
Reddit	!r	https://www.google.com/search?q=site:reddit.com+%s
Phind	!p	https://www.phind.com/search?q=%s
YouTube	!yt	https://www.youtube.com/results?search_query=%s
GitHub	!gh	https://github.com/search?q=%s
Hacker News	!hn	https://www.google.com/search?q=site:news.ycombinator.com+%s
Stack Overflow	!so	https://www.google.com/search?q=site:stackoverflow.com+%s
Sourcegraph	!sg	https://sourcegraph.com/search?q=context:global+%s
Perplexity	!pl	https://www.perplexity.ai/search?q=%s

Note: %s represents the search query placeholder that gets replaced with your actual search terms.
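To make the placeholder idea concrete, here is a small sketch of how a keyword plus an engine URL could be expanded into a final search URL. It only imitates the behavior described above: the resolve function and the shortcuts map are hypothetical names, and Firefox's real implementation (including exactly how it encodes spaces) may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// shortcuts maps a keyword to an engine URL template (two rows from the table above).
var shortcuts = map[string]string{
	"!r":  "https://www.google.com/search?q=site:reddit.com+%s",
	"!gh": "https://github.com/search?q=%s",
}

// resolve expands input such as "!gh datasette" into a search URL.
// It returns false when the keyword is unknown or no query was given.
func resolve(input string) (string, bool) {
	keyword, query, found := strings.Cut(input, " ")
	if !found || query == "" {
		return "", false
	}
	tmpl, ok := shortcuts[keyword]
	if !ok {
		return "", false
	}
	// Replace the %s placeholder with the URL-encoded search terms.
	return strings.ReplaceAll(tmpl, "%s", url.QueryEscape(query)), true
}

func main() {
	fmt.Println(resolve("!gh datasette"))
	fmt.Println(resolve("!r go generics"))
	fmt.Println(resolve("!unknown something"))
}
```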

Why Not Uppercase in Go Modules Name?

Jerry Ng — Tue, 18 Feb 2025 00:00:36 GMT

I work with Go modules a lot at work. Over time, I've developed a habit of quickly checking the latest version of a module by visiting the /@latest endpoint of a Go proxy, like this:

❯ curl https://proxy.golang.org/github.com/aws/aws-sdk-go/@latest | jq
{
  "Version": "v1.55.6",
  "Time": "2025-01-15T18:57:15Z",
  "Origin": {
    "VCS": "git",
    "URL": "https://github.com/aws/aws-sdk-go",
    "Hash": "e1db430efbf87c6fd64a01c3330ad7df794b8847",
    "Ref": "refs/tags/v1.55.6"
  }
}

This trick is particularly useful when you're building tooling that relies on fetching the latest version of a Go dependency.
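As a rough idea of what such tooling might do with that response, the sketch below decodes the two fields most tools care about. The latestInfo type and parseLatest function are hypothetical names, the Origin object is ignored, and the body is a trimmed copy of the output above, so no network call is made.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// latestInfo mirrors the fields of the /@latest response used here.
type latestInfo struct {
	Version string    `json:"Version"`
	Time    time.Time `json:"Time"`
}

// parseLatest decodes the JSON body returned by a proxy's /@latest endpoint.
func parseLatest(body []byte) (latestInfo, error) {
	var info latestInfo
	err := json.Unmarshal(body, &info)
	return info, err
}

func main() {
	// Trimmed copy of the response shown above.
	body := []byte(`{"Version":"v1.55.6","Time":"2025-01-15T18:57:15Z"}`)
	info, err := parseLatest(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Version, info.Time.Format("2006-01-02"))
}
```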

Similarly, you can get a list of all available versions using the /@v/list endpoint:

❯ curl https://proxy.golang.org/github.com/aws/aws-sdk-go/@v/list
v1.16.1
v1.13.54
v1.13.9
v1.50.20
# ... (omitted for brevity)

One day, I noticed a strange uptick in errors popping up in our Go proxy server log:

invalid module path encoding "git.company.net/dbops/Redis/v8"

Not a real example

The Investigation

Puzzled, I figured I’d check the Go module repository git.company.net/dbops/Redis/v8 directly, and it worked just fine! Everything resolved with a lovely 200 OK response, which left me scratching my head.

Now even more confused, I decided to double-check by hitting the Go proxy endpoint directly to see if it was actually giving me errors from the client-side:

❯ curl -I https://goproxy.company.net/git.company.net/dbops/Redis/v8/@v/list
HTTP/2 500
date: Fri, 07 Feb 2025 00:04:15 GMT
content-type: application/json; charset=utf-8
cache-control: no-cache, no-store, must-revalidate

And indeed, it was. I started wondering if there was a bug with the Go proxy server!

Running go get with -x flag

Next on my agenda was to try running the go get command on that specific module path with the -x flag.

Now, I expected the go get command to fail too.

💡

Use the -x flag to print the underlying commands as they are executed. This is useful for debugging version control commands when a module is downloaded directly from a repository.

go get -x git.company.net/dbops/Redis/v8@v8.9.10
# get https://goproxy.company.net/github.com/dgryski/go-rendezvous/@v/list
# get https://goproxy.company.net/github.com/cespare/xxhash/v2/@v/list
# get https://goproxy.company.net/git.company.net/dbops/%21redis/v8/@v/list
# get https://goproxy.company.net/github.com/dgryski/go-rendezvous/@v/list: 200 OK (0.513s)
# get https://goproxy.company.net/github.com/dgryski/go-rendezvous/@latest
# get https://goproxy.company.net/github.com/cespare/xxhash/v2/@v/list: 200 OK (0.536s)
# get https://goproxy.company.net/github.com/dgryski/go-rendezvous/@latest: 200 OK (0.385s)
# get https://goproxy.company.net/git.company.net/dbops/%21redis/v8/@v/list: 200 OK (1.645s)
go: added github.com/cespare/xxhash/v2 v2.1.2
go: added github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f
go: added git.company.net/dbops/Redis/v8 v8.9.10

But guess what? It didn’t fail!

On closer look at the requests sent by the go get command, I noticed the characters %21 — that’s when it hit me that it is the URL-encoded version of the ! character!

# get https://goproxy.company.net/git.company.net/dbops/%21redis/v8/@v/list
# get https://goproxy.company.net/git.company.net/dbops/%21redis/v8/@v/list: 200 OK (1.645s)

If you decode the URL it's git.company.net/dbops/!redis/v8/@v/list

So, why was the module name being encoded?

How Go Handles Module

So, it turns out that Go proxy servers (e.g. the official proxy.golang.org and Athens) expect uppercase letters in module paths to be escaped: each uppercase letter is replaced with a bang (!) followed by its lowercase form.

💬

I eventually found the reason why Go does this in the Go documentation.

TL;DR due to the case-insensitive nature of some file systems & web servers, Go needed a way to distinguish between modules with the same name but different casing.

For example:

github.com/Azure/azure-sdk-for-go becomes github.com/!azure/azure-sdk-for-go
github.com/GoogleCloudPlatform/cloudsql-proxy becomes github.com/!google!cloud!platform/cloudsql-proxy

Try it out yourself.

proxy.golang.org/github.com/VictoriaMetrics/VictoriaMetrics/@latest (404 Not Found)
proxy.golang.org/github.com/!victoria!metrics/!victoria!metrics/@latest (200 OK)

Oh, and the go command automatically applies this escaping and URL-encodes any special characters. In my case, Redis became !redis (which is %21redis when URL-encoded).

This explains why hitting the proxy endpoint directly was giving me errors while my go get command didn’t.
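If your tooling talks to a proxy directly, it has to apply the same escaping itself. Here is a minimal sketch of the uppercase rule only. The function name escapePath is my own; the golang.org/x/mod/module package provides an EscapePath function that also validates the path, which is probably what you'd want in real code.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath replaces every uppercase ASCII letter with '!' followed by its
// lowercase form. It is a simplified illustration and does no validation.
func escapePath(path string) string {
	var b strings.Builder
	for _, r := range path {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r - 'A' + 'a')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escapePath("github.com/Azure/azure-sdk-for-go"))
	fmt.Println(escapePath("github.com/GoogleCloudPlatform/cloudsql-proxy"))
	fmt.Println(escapePath("git.company.net/dbops/Redis/v8"))
}
```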

Closing Thoughts

The lesson here? It's best to avoid uppercase letters in module names, especially if you intend to build tooling around Go proxy. Stick to lowercase, and save yourself from some head-scratching moments.

I Built a Visa Requirement Change Tracker for Fun

Jerry Ng — Tue, 07 Jan 2025 00:00:54 GMT

Have you ever wondered how visa requirements between countries change over time? I certainly have. It all started when I was planning an international trip and needed to check if I needed a visa for the country I was visiting. A quick Google search gave me the answer, but it sparked a bigger question: How have visa requirements evolved over the years?

I was wondering if countries are getting more relaxed about international travel or if things are getting stricter. Also, I wanted to keep track of when and how visa rules shift between countries in the future, for example:

Recent Visa requirement changes for United States

Surprisingly, I couldn't find a good source for historical visa requirement data online. So, I figured I’d have fun creating something simple to track it. Hopefully, it will stick around for years to come!

Finding the Right Data Source

My first step was to find a credible source. In this case, I turned to the Henley Passport Index, the same site where I initially checked visa requirements.

A quick inspection of the network requests revealed two useful API endpoints:

Inspecting the network tab from the website

api.henleypassportindex.com/api/v3/countries
api.henleypassportindex.com/api/v3/visa-single/:country_iso_code

These APIs provided current visa requirements and historical passport strength data for each country — perfect!

I always prefer using APIs instead of scraping HTML. API data is way more organized and doesn’t tend to break when the website design changes.

Designing the Database Schema

I picked SQLite because it's super easy to use and I am familiar with it.

Based on the available data and the questions I wanted to answer, I settled on this schema:

erDiagram
    Country ||--o{ CountryRanking : has
    Country ||--o{ VisaRequirement : "issues/receives"
    Country {
        text code PK
        text name
        text region
    }
    CountryRanking {
        text country_code PK, FK
        int year PK
        int rank
        int visa_free_count
    }
    VisaRequirement {
        text from_country PK, FK
        text to_country PK, FK
        date effective_date PK
        text requirement_type
    }

I hope I don't end up regretting using natural keys here, but honestly, I think it makes sense in this case.

💬

If you tried hitting the /v3/countries endpoint above, you'd notice a field called openness. I'm not exactly sure what it does, but it seemed to have the same value for every single country, so I'm omitting it here.

Designing the System

My goals were simple:

Keep my cost as low as possible (ideally free to host)
Make it easy to maintain (e.g. simple code, minimizing the number of interacting components)
Make it easy to share with others i.e. host it on the Internet

The Cron Job

I decided to use a technique I call "GHActions Scraping", which I've detailed in my previous post.

Basically, the idea is to use:

GitHub Actions as a cron job for scraping tasks
Workflow Artifacts for storing the SQLite database, eliminating the need for a separate database server

SQLite is perfect for this use case because it's just a flat file that can be easily uploaded to and downloaded from GitHub Artifacts. Anyway, here's what the project would look like:

💡

You can see the full GitHub Actions workflow I've set up for this project here.

The Scraper

Having written several scrapers before, from finding cheap craft beers to tracking Esports schedules, I opted for just a simple Python script. No frameworks whatsoever.

The logic was straightforward:

Fetch data for all 227 ~~countries~~ travel destinations
For each country code, fetch its visa requirements
Parse and store this data in SQLite

The script is designed to update existing records if they've changed and only add new ones when necessary to avoid duplication.

You can find the entire Python script on GitHub.

Hosting and Display

For the cron job, I use GitHub Actions. Since I plan to run the job only twice a month, it's essentially free.

To display the data, I chose Datasette hosted on Railway. While Datasette may not be the fanciest looking choice, it gets a lot done without requiring extensive frontend work which I am not really good at.

Continuous Deployment (CD) With Railway Docker Image Source

While setting up CD with Railway, I ran into a little hiccup. Whenever I scrape new data and update my SQLite DB, I have to build a new Docker image. The problem is that we're using a Docker image as our deployment source.

Right now, Railway has no way of knowing when a new Docker image is published on Docker Hub, so it doesn’t automatically deploy the latest one (the railway up command doesn’t do the trick for redeploying Docker images as it tries to build with Nixpacks instead). To get around this, I had to check the last successful deploy ID and then use that ID to trigger a redeployment to make CD work properly for my project (example). If you're hosting Datasette on Vercel, it’s way easier with the datasette-publish-vercel plugin!

The Results (Some Screenshots)

If you're curious about what everything looks like on Datasette, here are some of the interesting findings that I've gathered:

Top 10 countries with the most improved passport rankings (c.a.a 2024)

Compares average visa-free counts by region for the last 5 years (c.a.a 2024)

My personal favorite is the non-reciprocal visa requirements by country:

Non-reciprocal visa relationships refer to situations where two countries have different visa requirements for each other's citizens (c.a.a 2024)
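To spell out that definition, here is a small in-memory sketch of the check. The project itself does this with SQL queries in Datasette, so this is only an illustration: the pair type, the nonReciprocal function, the country codes AA/BB/CC, and the requirement strings are all made up.

```go
package main

import (
	"fmt"
	"sort"
)

// pair identifies a directed relationship: citizens of from travelling to to.
type pair struct{ from, to string }

// nonReciprocal returns each unordered country pair (reported with from < to)
// whose requirement differs depending on the direction of travel.
func nonReciprocal(reqs map[pair]string) []pair {
	var out []pair
	for p, req := range reqs {
		if p.from >= p.to {
			continue // visit each unordered pair only once
		}
		back, ok := reqs[pair{p.to, p.from}]
		if ok && back != req {
			out = append(out, p)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].from != out[j].from {
			return out[i].from < out[j].from
		}
		return out[i].to < out[j].to
	})
	return out
}

func main() {
	// Placeholder data, not real visa policy.
	reqs := map[pair]string{
		{"AA", "BB"}: "visa free",
		{"BB", "AA"}: "visa required",
		{"AA", "CC"}: "visa free",
		{"CC", "AA"}: "visa free",
	}
	fmt.Println(nonReciprocal(reqs))
}
```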

💡

The underlying queries are stored in this metadata.json

Concerns and Caveats

I've thought about some things that could go wrong with this project. Here they are:

API Problems

I'm using an API that isn't officially documented. This means it could stop working as is at any time. Or maybe, the data structure might just change without notice. If that happens, I'll need to update the script or find a new data source, which would be a pain.

Losing Interest

I might get bored or tired of fixing this if it breaks. I've kept my old projects running so far, but it gets harder as I make more things. If it becomes too much work, I might have to shut it down — which is something I sometimes think about.

Increasing Costs

I hope the costs stay low. I could use a static site instead of Datasette to potentially save money, but Datasette is just so good! Observable Framework is such a strong contender for this.

That’s It

The website is now up and running!

I plan to come back to this every now and then to see how resilient it is and how visa policies change. Who knows what interesting patterns we might see?

💡

This project joins my other similar GHActions scraping efforts, like the California Fire History Database and the Singapore Starbucks Price Database.

4 Ways of Bumping Major Versions in Your Go Project

Jerry Ng — Tue, 05 Nov 2024 00:00:25 GMT

I've recently found myself in a rabbit hole of Go major version bumping to v2. What started as a simple task quickly turned into hours of sifting through conflicting information. Should I use a v2 directory? Create a new v2 branch? What about creating a new repository altogether?

The more I read, the more confused I became. To save future me from this headache, I've decided to jot down the differences that I've learned. I think this will be especially useful if you're maintaining a library that other developers depend on.

What I'll Cover

4 different approaches to major version bumping in Go
Real-world examples from popular Go libraries using each approach
Tradeoffs of each approach

Approach 1: Major Version Subdirectory

Let’s start with the official recommendation from the Go team (reference). In this approach, we create a new subdirectory for each major version in your repository while the root directory maintains the previous version's code (v0 or v1).

How it works

Make a new folder named after the new major version (e.g. v2/, v3/, etc.)
Copy your code into this new folder
Update the go.mod file in the new folder to include the new major version suffix in the module path
Tag the commit that represents the new major version release

Here's a simple visual:

github.com/user/project (main branch)
│
├── go.mod         # module github.com/user/project
├── main.go
├── utils.go
│
├── v2/
│   ├── go.mod     # module github.com/user/project/v2
│   ├── main.go
│   └── utils.go
│
└── v3/
    ├── go.mod     # module github.com/user/project/v3
    ├── main.go
    └── utils.go

In essence, we have a separate copy of the entire codebase for each major version in a separate subdirectory.

💡

Real-world example: github.com/googleapis/gax-go

When to use

“This approach is compatible with tools that aren’t aware of modules: file paths within the repository match the paths expected by go get in GOPATH mode. This strategy also allows all major versions to be developed together in different directories … We recommend that module authors follow this strategy as long as they have users developing in GOPATH mode.” — Go Modules: v2 and Beyond

“In other words, for every major version, we are encouraged to maintain a new copy of the entire codebase. This is also the only way to do it if you want pre-modules users to be able to use your package. I understand why for large projects this makes a ton of sense, it allows the maintainers to continue patching old versions easily while developing the new version.” — Go’s Major Versioning Sucks – From a Fanboy

Tradeoffs

Pros:

Your code will work with older Go versions (pre-Go 1.11). This is especially useful in organizations that adopted Go before modules existed and still rely on GOPATH development mode (now legacy), which depends on this specific directory structure
Allows concurrent development of multiple major versions

Cons:

Code duplication (also commonly shared code in the repository needs to be more carefully managed)
More complex repository structure

Approach 2: Major Version Branch

The second commonly known strategy for bumping major versions in Go is the branch-based approach (reference). Instead of using directories, we maintain different major versions in separate Git branches.

How it works

Create a new branch for the new major version (e.g., git checkout -b v2)
In each branch, update the go.mod file to include the new major version suffix in the module path
Git tag releases in the new branch (e.g., v2.0.0)

Here's a simple example:

github.com/user/project (v3 branch)
│
├── go.mod         # module github.com/user/project/v3
├── main.go
└── utils.go

github.com/user/project (v2 branch)
│
├── go.mod         # module github.com/user/project/v2
├── main.go
└── utils.go

github.com/user/project (v0 or v1 branch)
│
├── go.mod         # module github.com/user/project
├── main.go
└── utils.go

💡

Real-world example: github.com/go-yaml/yaml

Tradeoffs

Pros:

Clearer separation in version control
Cleaner repository structure without duplicated directories

Cons:

May complicate CI/CD pipelines
Potential confusion between branches and tags

Approach 3: Major Version Suffix

This approach offers a much simpler alternative to the previous approaches. Instead of creating new directories or branches, we simply increment the major version number suffix in the module path of your go.mod file and that’s it!

How It Works

Keep your existing directory structure
Update the go.mod file’s module path to include the new major version suffix
Tag the commit that represents the new major version release

Here's an example:

github.com/user/project (main branch)
│
├── go.mod         # module github.com/user/project/v2 (Changes with each major version)
├── main.go
└── utils.go
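The suffix rule shared by the first three approaches can be summed up in a few lines: v0 and v1 use the bare module path, while v2 and above append /vN. The helper below is just an illustrative sketch with a hypothetical name (modulePathFor); it ignores special cases such as gopkg.in paths, which follow a different .vN convention.

```go
package main

import "fmt"

// modulePathFor returns the module path to declare in go.mod for the given
// major version: unchanged for v0 and v1, suffixed with /vN from v2 onwards.
func modulePathFor(base string, major int) (string, error) {
	if major < 0 {
		return "", fmt.Errorf("invalid major version %d", major)
	}
	if major < 2 {
		return base, nil
	}
	return fmt.Sprintf("%s/v%d", base, major), nil
}

func main() {
	for _, major := range []int{1, 2, 3} {
		path, err := modulePathFor("github.com/user/project", major)
		if err != nil {
			panic(err)
		}
		fmt.Printf("v%d -> module %s\n", major, path)
	}
}
```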

💡

Real-world example: github.com/google/go-github

Tradeoffs

Pros:

Simple and straightforward
No code duplication
No need to create new directories or branches

Cons:

Potentially breaks compatibility with users who are still using GOPATH development mode

Approach 4: New Repository for Each Major Version

While not officially recommended by Go, some teams opt to create an entirely new repository for each major version. This way, they get a clear separation between versions, although it does make things a bit trickier to handle.

It might seem a little strange at first, but when you think about it, a major version (like v2) of a Go module really is like starting a whole new Go module.

How it works

Create a new repository for each major version of your module
Start fresh with the new version in the new repository
Ensure that the go.mod file module path suffix correctly reflects the new major version

Here's an example:

github.com/user/project (main branch)
│
├── go.mod         # module github.com/user/project
├── main.go
└── utils.go

github.com/user/project-v2 (main branch)
│
├── go.mod         # module github.com/user/project-v2
├── main.go
└── utils.go

github.com/user/project-v3 (main branch)
│
├── go.mod         # module github.com/user/project-v3
├── main.go
└── utils.go

💡

Real-world example: see AWS SDK for Go: github.com/aws/aws-sdk-go (v1) and github.com/aws/aws-sdk-go-v2 (v2)

Tradeoffs

Pros:

Complete isolation between versions

Cons:

Increased overhead in managing multiple repositories
Cannot reuse existing CI/CD pipelines

Summary

Consider the size of your project and whether you still have users working in GOPATH mode. Think about how your team likes to work and go with the approach that best fits your unique situation.

Closing Thoughts

Unfortunately, the complexity of major version bumping in Go has an unintended side effect. It seems to sometimes make developers hesitant to take the leap for necessary major version updates. As a result, there’s a chance breaking changes may sneak in with minor updates, which goes against the spirit of semantic versioning.

3 Easy Ways To Add Version Flag in Go

Jerry Ng — Tue, 08 Oct 2024 00:00:00 GMT

One of my favorite things about Go is its distribution process using the go install command. With go install, I don’t have to deal with the trouble of setting up brew, npm, pip, or any other package manager separately like I had to with some languages. It just works out of the box!

Now a while back, I built a command-line (CLI) app in Go. I wanted to add a simple way for users to check the app's version after installing it via go install. This led me to discover different ways to add version flags in Go.

What I Wanted to Do

Let's say we have a Go CLI app named mym. My goals were straightforward:

I wanted users to type mym --version or mym -v and see the app's current version
This should work right after they install the app with go install:

❯ go install github.com/ngshiheng/michelin-my-maps/v2/cmd/mym@v2.6.1
❯ mym --version
Version: v2.6.1

This led me to explore different ways to do this in Go. Here's what I’ve gathered:

Method 1: Build time injection

If you've done some digging online, you've probably come across this common solution. It involves using the -ldflags switch with the build command to embed version information into the binary during the build process. Here are the steps:

Step 1: Define the version variable

First, you define a version variable in your main package:

package main

import (
	"flag"
	"fmt"
)

// Version is set during build time using -ldflags
var Version = "unknown"

// printVersion prints the application version
func printVersion() {
	fmt.Printf("Version: %s\n", Version)
}

func main() {
	versionFlag := flag.Bool("version", false, "print version information")
	flag.Parse()

	if *versionFlag {
		printVersion()
		return
	}

	// ...
}

Step 2: Build with the version flag

Then, you add the -ldflags flag to your go build command to set the version dynamically:

❯ go build -ldflags "-X 'main.Version=v2.6.0'" cmd/mym/mym.go
❯ ./mym --version
Version: v2.6.0

Step 3: Host the binary somewhere

Once you've built your Go binary with the version stamped using -ldflags, the next step is to host the binary somewhere users can download it, e.g. GitHub Releases, AWS S3, or your own server.

Users who download this exact binary would then be able to run the command and get the same version information:

# (... download binary directly)
❯ mym --version
Version: v2.6.0

Downside

While this approach works, I think it has some major drawbacks for both the developer and users:

It requires a few extra steps
You can't expect users to build the app themselves with a specific ldflag
go install does accept -ldflags, but expecting users to install your app with a long, complex command is poor UX

What about CI/CD?

Here's the thing: I already have a CI in place that handles automatic releases. Implementing this method would require either:

Setting up another workflow, or
Modifying the existing workflow to include steps for building and uploading binaries with version flags.

This adds an extra layer of build process complexity. I thought we could do better!

Method 2: Read from runtime build info

Git tags play an important role in publishing and versioning Go modules.

Each version tag in Git corresponds to a specific release of the module. When you push a new tag to your repository, it creates a new version of your module.

Then it hit me:

Since my Go CLI app is a Go module, the version info is already available in the Git tag, right?

Why not reuse it by reading it at runtime?

Here's how we can implement this approach:

package main

import (
	"flag"
	"fmt"
	"os"
	"runtime/debug"
)

// printVersion prints the application version
func printVersion() {
	buildInfo, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("Unable to determine version information.")
		return
	}

	if buildInfo.Main.Version != "" {
		fmt.Printf("Version: %s\n", buildInfo.Main.Version)
	} else {
		fmt.Println("Version: unknown")
	}
}

func main() {
	versionFlag := flag.Bool("version", false, "print version information")
	flag.Parse()

	if *versionFlag {
		printVersion()
		os.Exit(0)
	}

	// ...
}

NOTE: If the binary wasn’t built using go install, the version will show up as "(devel)"

From the end user’s perspective, all they need to do is:

❯ go install github.com/ngshiheng/michelin-my-maps/v2/cmd/mym@v2.6.1
❯ mym --version
Version: v2.6.1

This means I don't have to modify my existing CI/CD workflow, and I get versioning out of the box from our automated release (git tag)!

Method 3: Use the versioninfo module

As I was writing this, I came across this blog post where someone else had the same issue and created a Go package for this. The simple package allows you to add a version flag to your CLI with just two lines of code!

package main

import (
    "flag"

    "github.com/carlmjohnson/versioninfo" // 1. import
)

func main() {
    versioninfo.AddFlag(nil) // 2. add flag
    flag.Parse()
}

However, if you really don't want to add another module to your already long list of dependencies, you should probably stick to Method 2.

Closing Thoughts

Adding version information to a Go CLI/app turned out to be more interesting than I initially thought.

While all three options are valid, Method 2 hit the sweet spot for me. It doesn't require any special build steps and works with go install. As a bonus point, it’s dependency-free, and won’t leave you scrambling if a package disappears.

Solving Canceled Meeting Rooms With Apps Script

Jerry Ng — Tue, 06 Aug 2024 00:00:41 GMT

Today, I arrived at work a bit earlier than usual. After grabbing my usual morning coffee, I opened up my Google Calendar to see what meetings I had for the day. Of course, the meeting room for our recurring daily stand-up got canceled again.

This had been bothering me for longer than I cared to remember

Whenever this happened, we ended up manually rebooking a different meeting room — not particularly time-consuming, but it was a bit annoying to do it every day. We'd be forced to book another room ad-hoc, or worse, sometimes ended up without a room.

But today, I decided enough was enough. I couldn’t stop thinking about it on the way home. Once I got back, I pulled out my laptop and got to work.

Solution? Automate this.

After some digging online, I confirmed that I wasn't alone in my struggle.

At this point I was no stranger to working with the Google Workspace API and messing around with Google Apps Script. So naturally my first thought was to use Apps Script.

💬

Apps Script is a niche tech stack. But honestly, it's super easy to use, especially if you're dealing with Google Workspace stuff (e.g. Gcal, Gmail, Gsheet, etc.) and you already know JavaScript.

On a high level, here's how I envisioned the workflow to be:

graph TD
    subgraph "Cron job (Time-driven Trigger)"
        A(Start) --> B{Check if WFH Day}
        B -->|Yes| C(End)
        B -->|No| D[1. Find daily stand-up]
        D -->|Not Found| C
        D -->|Found| E[2. Get meeting rooms]
        E --> F{3. Is room available?}
        F -->|No| C
        F -->|Yes| G[4. Book room]
        G --> C
    end

Considerations

Of course, there were some things I had to keep in mind:

Don’t book another room if there’s already one reserved for the meeting
Don’t hog meeting rooms on days we don’t need them (like work-from-home days)
Only book the room on the actual day of the meeting

Implementation

1. Finding the daily stand-up meeting

The first step was to find the stand-up meeting (event) for the current day:

const STANDUP_EVENT_NAME = "Daily Standup";

/**
 * Finds the standup event for the current day.
 */
function findStandupEvent() {
    const calendarId = "primary";
    const today = new Date();
    const startOfDay = new Date(today.getFullYear(), today.getMonth(), today.getDate());
    const endOfDay = new Date(today.getFullYear(), today.getMonth(), today.getDate() + 1);

    const events = Calendar.Events.list(calendarId, {
        timeMin: startOfDay.toISOString(),
        timeMax: endOfDay.toISOString(),
        singleEvents: true,
        orderBy: "startTime",
    });

    const standupEvent = events.items.find((event) => {
        return event.summary === STANDUP_EVENT_NAME;
    });

    if (standupEvent) {
        const startTime = new Date(standupEvent.start.dateTime).toLocaleTimeString();
        console.log(`Found standup event "${standupEvent.summary}" which starts at ${startTime}.`);
        return standupEvent;
    }

    console.log("No standup event found for today.");
    return null;
}

Here, I utilized the Calendar.Events.list API (reference) to retrieve events from the primary calendar, filtering for events with the title "Daily Standup".

By setting timeMin and timeMax, I was able to narrow down the search to events occurring within the current day.

Once I had the list of events, I simply used the find method to locate the stand-up event based on its summary (i.e. title of the meeting). If the stand-up was found, we’d just return the event object for later use.

2. Get all meeting rooms in the office building

After locating the stand-up event, the next step was to retrieve a list of meeting rooms:

/**
 * Retrieves a list of available meeting rooms (excluding phone booths and cockpits).
 */
function getAllRooms() {
    const rooms = AdminDirectory.Resources.Calendars.list("my_customer", {
        maxResults: 100,
        query: "buildingId=SG-1N AND floorName=3 AND resourceCategory=CONFERENCE_ROOM",
        orderBy: "capacity asc",
    });

    const availableRooms = rooms.items.filter((room) => {
        const resourceName = room.resourceName.toUpperCase();
        if (resourceName.includes("BOOTH") || resourceName.includes("COCKPIT")) {
            console.log(`Skipping ${room.resourceName} (phone booth or cockpit)`);
            return false;
        }

        if (room.capacity < 6 || room.capacity > 8) {
            console.log(`Skipping ${room.resourceName} (inadequate capacity)`);
            return false;
        }
        console.log(`Including ${room.resourceName}`);
        return true;
    });

    console.log(`Found ${availableRooms.length} available meeting rooms.`);
    return availableRooms;
}

"my_customer" alias tells the API that the operation should be performed within the scope of my own Google Workspace account

Through some quick Googling (with site:stackoverflow.com) on how room resources work with Google Calendar, I discovered the AdminDirectory.Resources.Calendars.list API (reference). For this to work, I had to first enable the AdminDirectory API in my Apps Script.

This API allows fetching calendars for meeting rooms based on specific criteria (e.g. building, floor, and category):

const rooms = AdminDirectory.Resources.Calendars.list("my_customer", { maxResults: 100, query: "buildingId=SG-1N AND floorName=3 AND resourceCategory=CONFERENCE_ROOM"});
console.log(rooms.items[0])

// Output:
//
// { capacity: 1,
//   etags: '"-roQ5YNyqtVnJTuIfcddtIsUY9W3r0o3wv8vGq722Ls/hrna2LzZYMJYI0OOszX_2X4iKIw"',
//   generatedResourceName: 'SG-1N-3-M09 SOLO BOOTH (1)',
//   buildingId: 'SG-1N',
//   resourceId: '10744518274',
//   kind: 'admin#directory#resources#calendars#CalendarResource',
//   resourceEmail: 'some-placeholder-resourceEmail@resource.calendar.google.com',
//   featureInstances: [ { feature: [Object] } ],
//   resourceName: 'M09 SOLO BOOTH',
//   floorName: '3',
//   resourceCategory: 'CONFERENCE_ROOM' }

[Not part of the main code] Example of a room object

The getAllRooms function filters out rooms that don't meet my criteria, such as phone booths, cockpits, and rooms with inadequate capacity for my use case.

💬

The AdminDirectory.Resources.Calendars.list API doesn't provide us with any information about a room's availability. To determine that, we'll need to take an additional step, which I'll cover next.

3. Checking Room Availability

With the list of potential meeting rooms in hand, the next step was to filter out those that were unavailable during the stand-up event's scheduled time:

/**
 * Checks the availability of a room during a given time range.
 */
function isRoomAvailable(roomGeneratedResourceName, roomEmail, startTime, endTime) {
    const freebusy = Calendar.Freebusy.query({
        timeMin: startTime.toISOString(),
        timeMax: endTime.toISOString(),
        items: [{ id: roomEmail }],
    });

    const busyTimes = freebusy.calendars[roomEmail].busy;
    const isAvailable = busyTimes.length === 0;

    console.log(`${roomGeneratedResourceName} is ${isAvailable ? "available" : "not available"} during the specified time range.`);
    return isAvailable;
}

To achieve this, I use the Calendar.Freebusy.query API (reference), which allows checking if a room is free or busy during a specific time range.

The response provides an array of busy time slots for the specified room(s), for example:

const freebusy = Calendar.Freebusy.query({
  timeMin: startOfDay.toISOString(),
  timeMax: endOfDay.toISOString(),
  items: [{ id: "some-placeholder-resourceEmail@resource.calendar.google.com" }],
});

console.log(freebusy.calendars["some-placeholder-resourceEmail@resource.calendar.google.com"].busy);

// Output:
//
// [ { end: '2024-07-18T04:00:00Z', start: '2024-07-18T03:00:00Z' },
// { end: '2024-07-18T07:00:00Z', start: '2024-07-18T06:00:00Z' },
// { end: '2024-07-18T10:00:00Z', start: '2024-07-18T09:00:00Z' } ]

[Not part of the main code] Example data showing the room is busy during these time windows

So, if the array is empty, it means the room is available during the given time range. Otherwise, it's considered unavailable (busy).

Using this, I could filter out the meeting rooms that were already booked during the stand-up event's scheduled time, leaving me with a list of available options.

4. Book the meeting room

Once I was able to find the stand-up event, retrieve all meeting rooms in the office building, and check their availability during the desired time range, the last step was to actually book a meeting room:

💡

It turns out that to book a room, you can add it to the event's attendee list, treating it just like any other participant in the meeting!

/**
 * Books an available room for the standup event.
 */
function bookMeetingRoom(event) {
    console.log(`Attempting to book a room for the standup event "${event.summary}"...`);

    // Check if a room is already booked
    const existingRoomAttendee = event.attendees?.find((attendee) => attendee.resource);
    if (existingRoomAttendee) {
        console.warn(`A room "${existingRoomAttendee.displayName}" is already booked for this event.`);
        return;
    }

    const availableRooms = getAllRooms();
    const startTime = event.start.dateTime;
    const endTime = event.end.dateTime;

    const availableRoomsForEvent = availableRooms.filter((room) => {
        return isRoomAvailable(room.generatedResourceName, room.resourceEmail, new Date(startTime), new Date(endTime));
    });

    if (availableRoomsForEvent.length === 0) {
        console.warn(`No available rooms found for the standup event "${event.summary}".`);
        return;
    }

    // OR: randomly select a room instead
    // const randomIndex = Math.floor(Math.random() * availableRoomsForEvent.length);
    // const selectedRoom = availableRoomsForEvent[randomIndex];

    const selectedRoom = availableRoomsForEvent[0]; // FIXME: come up with a preferred room algorithm

    console.log(`Selected "${selectedRoom.generatedResourceName}" for the standup event.`);

    const attendees = event.attendees || [];
    attendees.push({
        resource: true,
        responseStatus: "accepted",
        displayName: selectedRoom.generatedResourceName,
        email: selectedRoom.resourceEmail,
    });

    // Spread the original event first so the updated attendee list always wins
    const updatedEvent = {
        ...event,
        attendees,
    };

    Calendar.Events.update(updatedEvent, "primary", event.id); // NOTE: comment out this line if you do not want to actually book the room during testing
    console.log(`Booked "${selectedRoom.generatedResourceName}" for the standup event "${event.summary}".`);
}

To keep things simple for now, I just pick the first available room for us

The bookMeetingRoom function first checks if a room is already booked for the event. Again, we don’t want to accidentally double-book a room!

Here, I retrieve the list of available rooms and filter them based on their availability during the stand-up event's time range. If there are no available rooms, then oh well.

Finally, I simply call the Calendar.Events.update API (reference) to update the event with the new attendee list, effectively booking the selected room for the stand-up event by including the meeting room as an attendee.

Putting everything together

The final step was to integrate everything into a single file (e.g. Code.gs) with an entry point:

/**
 * Entry point function to be triggered for booking a room for the standup event.
 * This function orchestrates the process of finding the standup event,
 * checking if a room is already booked, and booking a new available room if needed.
 */
function bookStandupRoom() {
    const today = new Date();
    const dayOfTheWeek = today.getDay();
    const isWorkFromHomeDays = [0, 1, 2, 6].includes(dayOfTheWeek);

    if (isWorkFromHomeDays) {
        console.log(`Skipping job as today is WFH day.`);
        return;
    }

    const standupEvent = findStandupEvent();
    if (standupEvent) {
        bookMeetingRoom(standupEvent);
    }
}

This function first checks if the current day is a work-from-home day. If so, it skips the entire booking process. Otherwise, it proceeds to book a meeting room for the daily stand-up that day.

Simple!

Run this daily

With everything in place, the final step was to set up a time-driven trigger daily at 8 AM. The trigger would call the bookStandupRoom function, automating the entire process and ensuring that an available meeting room is booked for the stand-up event each day.

Here's what it would look like every day from the execution log:

Fun fact: "Steady" is a Singaporean/Malaysian expression that praises someone for a job well done

The Result

Honestly, it felt pretty cool to see the meeting room getting booked for real every day as I walked by before the meeting. I know it sounds kinda weird, but it was one of those awesome moments that brought back the joy of coding and fixing my own problems.

Meeting room is secured for the daily stand-up!

How I Saved Scraped Data in an SQLite Database on GitHub

Jerry Ng — Tue, 09 Jul 2024 00:00:50 GMT

Some time back, I stumbled upon this amazing idea of Git scraping made popular by Simon Willison. It involves using GitHub Actions to scrape data and save it directly into a Git repository. Essentially, for each new set of data scraped, a new commit is created in your repository.

The beauty of this is that you don’t need an application or database server since the VCS host (e.g. GitHub) runs and stores everything for you (for free!). Plus, Git inherently offers historical records of the data. Each commit serves as a time-stamped snapshot, enabling you to track data changes over time.

This is absolutely brilliant! I knew I had to try it.

“Drawbacks” of Git scraping

After trying it myself, one downside (although not a big one) is that you'll end up with a lot of Git commits over time depending on how often your job runs.

Also, querying data from Git commits and history is much more annoying than from a regular database like Postgres or SQLite.

Embracing SQLite

Around this time, I was also experimenting with SQLite. I usually used CSV or JSON to store data for small projects. However, I realized that SQLite has many advantages over these file formats. If I need the data in CSV format (which most people outside tech prefer), I can easily convert it to CSV anyway.

Moreover, having your database in a file format (like *.db, *.sqlite3) has many benefits. For example, you can store it anywhere easily: on Google Drive, Dropbox, AWS S3, or even in Git.

The Idea: "GHActions Scraping"

One day, it hit me:

"What if I could store the scraped data in an SQLite database within GitHub Artifacts?"

This way, I could run a scraping job periodically using GitHub Actions while keeping my data in a proper database like SQLite, stored on GitHub Artifacts instead of, say, AWS S3.

Here's what I had in mind:

graph TB
    subgraph GitHub
        subgraph Actions
            scraper[Scrape Job]
            class scraper actions;
        end
        subgraph Artifacts
            db[(SQLite)]
            class db artifacts;
        end
    end
    subgraph Web
        html[HTML/API]
        class api Web;
    end
    db --> |1. Download| scraper
    html --> |2. Fetch Data| scraper
    scraper --> |3. Upload| db
    %% Apply dotted line styles
    style GitHub stroke-dasharray: 5 5;
    style Web stroke-dasharray: 5 5;

With this setup, I can run the entire scraping system without needing a dedicated server (at least not in the traditional sense of a self-managed one).

Proof of Concept

To test this out, I needed an idea to prove what I proposed. I ended up adapting work from simonw/ca-fires-history.

For this, I’ve written a very simple Python script to fetch fire incident data from the California Department of Forestry and Fire Protection web API and store it in SQLite. Next, we need to write a GitHub Actions workflow file. Let's call it scrape.yml.

Requirements

We need to create a fine-grained personal access token (PAT) with Actions (read-only) permission
Then we need to store the token under the repository's Actions secrets and variables settings

This PAT is needed by the actions/download-artifact@v4 step so that we can download artifacts (which is our SQLite DB) from our last workflow run

Here's how to create a scrape.yml containing the jobs and steps that mimic what we’ve drawn above. I've added some notes to the important parts:

name: Scrape latest data

on:
    push:
    workflow_dispatch: # This allows us to trigger manually from the GitHub Actions UI
    schedule:
        - cron: "6,26,46 * * * *" # Scheduled to run 3 times/hour (at minute 6, 26, and 46)

jobs:
    scheduled:
        runs-on: ubuntu-latest

        steps:
            - name: Check out this repo
              uses: actions/checkout@v4

            - name: Set up python
              uses: actions/setup-python@v5
              with:
                  python-version: 3.11

            # Step to get the latest artifact run ID
            # Fetch the latest artifact run ID using GitHub API and jq
            # Save the run ID as an environment variable
            # If your repository is set to private, an OAuth app token or personal access token (classic) with repo scope is required
            - name: Get latest artifact run id
              run: |
                  ARTIFACT_RUN_ID=$(curl -s "https://api.github.com/repos/${{ github.repository }}/actions/artifacts?per_page=1" | jq '.artifacts[0].workflow_run.id')
                  echo "artifact_run_id=$ARTIFACT_RUN_ID" >> $GITHUB_ENV

            # Download the artifact (our SQLite DB!) from the last run
            - name: Download artifact
              uses: actions/download-artifact@v4
              with:
                  name: ca-fires-history-db
                  path: ./data/
                  run-id: ${{ env.artifact_run_id }} # Run ID of the artifact (SQLite DB) uploaded from the last run
                  github-token: ${{ secrets.GH_PAT  }} # REQUIRED. See https://github.com/actions/download-artifact?tab=readme-ov-file#download-artifacts-from-other-workflow-runs-or-repositories
              continue-on-error: true # Set this to false after the first run

            - name: Display downloaded file
              run: ls data/

            - name: Run scrape.py
              run: python3 scrape.py

            - name: Upload updated artifact
              uses: actions/upload-artifact@v4
              with:
                  name: ca-fires-history-db # Name of the artifact to upload, make sure to match the name in the download step
                  path: ./data/fires.db
                  if-no-files-found: error

Remember to create a new repository secret named GH_PAT at github.com/username/repository/settings/secrets/actions!

With this, we have an entire automated web scraping system where the data is collected periodically and stored in an SQLite database, all of it within the GitHub Actions ecosystem!

Caveats

Long-running jobs

In my experience, this setup is great for handling smaller, shorter-running jobs. However, it doesn't work well for longer tasks if you're sticking with the free tier. You might end up seeing your runners getting killed or running out of memory or CPU if the job is too intensive.

90 days retention limit

As of now, GitHub Artifacts are retained for a maximum of 90 days for public repositories before they are automatically deleted (reference).

However, our scraper downloads the SQLite database before each job, updates it, and re-uploads it as a new artifact far more often than every 90 days. This means we won't lose any of the data collected from previous runs.

Artifacts upload limit

Watch out for the upload limit on artifacts. GitHub doesn't specify a strict limit, but discussions suggest it's around 5GB (2019).

💬

Update (19 Nov 2024): Just a quick note that if your project goes inactive for too long (e.g. no new commits/PR), your workflow will be disabled.

One option is to set up Renovate Bot. It can help you with automated dependency updates and keep your project "active" so you won't have to worry about your workflow getting disabled.

Overall, I wanted to share that this setup has required minimal oversight and I am super happy about it!

Bonus: Visualizing our data with Datasette

graph TB
    subgraph Vercel
        deployment[Datasette]
        class deployment vercel;
    end
    subgraph GitHub
        subgraph Actions
            scraper[Scrape Job]
            class scraper actions;
        end
        subgraph Artifacts
            db[(SQLite)]
            class db artifacts;
        end
    end
    subgraph Web
        html[HTML/API]
        class api Web;
    end
    db --> |1. Download| scraper
    html --> |2. Fetch Data| scraper
    scraper --> |3. Upload| db
    scraper --> |4. Publish| deployment
    deployment --> |5. View/Access Data| client[User]
    %% Apply dotted line styles
    style GitHub stroke-dasharray: 5 5;
    style Web stroke-dasharray: 5 5;

With our data stored in an SQLite database, we can easily visualize or interact with it using Datasette, which is another tool I recently fell in love with.

Datasette is super easy to use. It allows you to easily publish data and share it with the world. If your site usage is low, you can even run it for free on Vercel.

To start, create a Vercel API token under Account Settings.

Then, we simply have to append the following step at the very end of our previous scrape.yml workflow:

# ... (omitted for brevity)
            - name: Install datasette
              run: |
                  pipx install datasette

            - name: Deploy to vercel
              env:
                  VERCEL_TOKEN: ${{ secrets.VERCEL_TOKEN }} # Vercel token for authentication
              run: |-
                  datasette install datasette-publish-vercel
                  datasette publish vercel data/fires.db  --project=cafireshistorydb --install=datasette-hashed-urls --install=datasette-cluster-map --token="$VERCEL_TOKEN" --metadata data/metadata.json --setting allow_download off  --setting allow_csv_stream off --extra-options "-i"

Remember to create another new repository secret named VERCEL_TOKEN

Datasette also has a bunch of plugins that you can use right out of the box! Here, I use the datasette-cluster-map plugin to visualize fire incidents on a map!

Palisades Fire is currently one of the most devastating wildfires in Los Angeles County

It’s pretty cool, right?

You can find the complete version of our scrape.yml file here on GitHub.

Closing Thoughts

💡

Feel free to interact with the final demo here. The source code is available on GitHub.

That's it! Unlike Git scraping, GHActions scraping doesn't create a new Git commit for each new piece of data. Instead, we store the data in an SQLite database (within GitHub Artifacts), download it in the next run, add/update rows as needed, and then re-upload it for future runs.

Is this approach scalable? Probably not. But after keeping it running for a few months, I really think it requires very little to zero effort to maintain! It really depends on the scale that you're running. If your data is in the range of gigabytes or petabytes, you might want to stick with other solutions.

With that said, I had a lot of fun experimenting and building this. Cheers!

All You Need To Know About CORS

Jerry Ng — Tue, 04 Jun 2024 00:00:10 GMT

I’ll never forget the first time I stumbled across a Cross-Origin Resource Sharing (CORS) error. Trying to digest terms like the Same-origin policy, preflight requests, CORS policy, and Access-Control-Allow-Origin header all at once left me with more questions than answers.

Today, let’s dive into the world of CORS and why it’s the bane of every developer’s existence. I promise it’ll only take 5 minutes to get a good grasp on it!

If you're just looking to resolve your CORS errors, skip right ahead to the solution!

What the heck is CORS?

So, what is CORS, and why does it keep insisting on ruining our day?!

Imagine this: you’re working on a fancy new website, let’s call it example.com. You’ve built a sleek frontend, and you’re ready to hook it up with your backend API from api.example.com.

You’ve tested the API a million times on Postman, and everything looks perfect. The response structure is flawless, and you’re feeling super confident. But then, you send a request from your webpage to the API, and bam — you're hit with the dreaded CORS error:

Access to fetch at 'https://api.example.com/posts' from origin 'https://example.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

Talk about a buzzkill!

Basically, browsers enforce the Same-origin policy, which stops a web page from reading responses from an origin different from the one that served it. CORS is the mechanism that lets a server opt in to being called from other origins. Think of it like having a bouncer at a club who only lets in people from the same neighborhood (or origin, in this case), unless the club has put you on the guest list.

💡

Wait a minute, aren't example.com and api.example.com the same domain?
Not quite! Even though they share the same domain name, they are considered different origins because they have different subdomains. In the world of CORS, the scheme/protocol, domain, and port must match exactly for two URLs to be considered the same origin.

See this determination rules table for the Same-origin policy.
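To make the "must match exactly" rule concrete, here's a rough sketch that reduces two URLs to their scheme, host, and port and compares them. The origin and sameOrigin helpers are names I made up for illustration, and the default-port table only covers http and https; browsers follow the fuller rules in the URL and HTML standards:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// defaultPort maps a scheme to the port assumed when the URL omits one.
var defaultPort = map[string]string{"http": "80", "https": "443"}

// origin reduces a URL to "scheme://host:port". (Hypothetical helper.)
func origin(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	port := u.Port()
	if port == "" {
		port = defaultPort[u.Scheme]
	}
	return u.Scheme + "://" + strings.ToLower(u.Hostname()) + ":" + port, nil
}

// sameOrigin reports whether both URLs share scheme, host, and port.
func sameOrigin(a, b string) bool {
	oa, errA := origin(a)
	ob, errB := origin(b)
	return errA == nil && errB == nil && oa == ob
}

func main() {
	page := "https://example.com/index.html"
	targets := []string{
		"https://example.com/about",     // true: only the path differs
		"https://example.com:443/about", // true: 443 is the https default
		"https://api.example.com/posts", // false: different host (subdomain)
		"http://example.com/",           // false: different scheme
		"https://example.com:8443/",     // false: different port
	}
	for _, t := range targets {
		fmt.Printf("%-32s same origin: %t\n", t, sameOrigin(page, t))
	}
}
```

Notice that the path never matters; only the scheme, host, and port do.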

When do CORS errors happen?

Here's the kicker: CORS errors only happen in the browser, when a web page makes a request to an API hosted on a different origin.

If you're testing your API with Postman or Insomnia (or any other API development/testing tool), you'll never see this error because those tools don't give a damn about CORS rules.

Okay, what’s going on? Why is my browser doing this?

Well, your browser is just trying to protect you from potential vulnerabilities (e.g. CSRF attacks).

How? When your website makes a request to an API on a different domain, the browser sometimes automatically performs a series of steps to ensure the request is safe.

This process involves making a preflight request (which is an OPTIONS request) sent by the browser to the API server before the actual request is made.

Preflight Requests

Here's an example of preflight requests in the network tab:

Preflight requests include headers like Access-Control-Request-Method and Access-Control-Request-Headers to indicate the intended request method and headers

The preflight request (automatically issued by the browser) checks if the server allows the cross-origin request and includes the necessary CORS headers.

If the server responds with the appropriate CORS headers, the browser proceeds with the actual request. Otherwise, the request is blocked.

sequenceDiagram
    participant Browser
    participant API
    Browser->>Browser: Check if preflight request is needed
    Note right of Browser: Preflight request is needed for certain types of requests (e.g., a POST request with custom headers)
    Browser->>API: OPTIONS preflight request
    API-->>Browser: Server responds with CORS headers
    Browser->>Browser: Check if CORS is allowed
    Note right of Browser: If CORS is allowed, proceed with the actual request
    Browser->>API: Actual fetch("https://api.example.com/posts") POST request
    API-->>Browser: Response

Wait, you said sometimes?

Preflight requests are only triggered for certain cross-origin requests.

Yep, you heard that right. The browser will only send a preflight request if the actual request meets any of the following conditions:

The request method is anything other than GET, HEAD, or POST, e.g. PUT, PATCH, DELETE
Your request includes custom headers (e.g., X-Custom-Header)
The Content-Type header has a value other than application/x-www-form-urlencoded, multipart/form-data, or text/plain. A common trigger is application/json 👀
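These conditions can be sketched as a small Python function. This is a simplified approximation of the browser's decision, assuming the Fetch standard's rules where GET, HEAD, and POST count as "simple" methods; the real algorithm has extra checks (such as header value limits) that are left out, and `needs_preflight` is a hypothetical name.

```python
# Simplified lists; the real browser rules contain a few more edge cases
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, headers=None):
    """Return True if a cross-origin request would trigger a preflight."""
    headers = {name.lower(): value for name, value in (headers or {}).items()}

    # Condition 1: the method is not one of the simple methods
    if method.upper() not in SIMPLE_METHODS:
        return True

    # Condition 2: the request carries a header outside the safelist
    if any(name not in SAFELISTED_HEADERS for name in headers):
        return True

    # Condition 3: Content-Type is not one of the three simple values
    content_type = headers.get("content-type")
    if content_type is not None:
        media_type = content_type.split(";")[0].strip().lower()
        if media_type not in SIMPLE_CONTENT_TYPES:
            return True

    return False


print(needs_preflight("GET"))                                         # False
print(needs_preflight("DELETE"))                                      # True
print(needs_preflight("POST", {"Content-Type": "application/json"}))  # True
print(needs_preflight("GET", {"X-Custom-Header": "1"}))               # True
```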

How to fix CORS errors

Alright, now that you understand what's actually going on, let's talk about how to fix your CORS mess. Here’s how you can typically go about it:

Scenario: You’re making an API request in the browser from example.com to api.example.com/posts and you ran into a CORS error.

flowchart TD
    Start([CORS Error Encountered]) --> OwnServer{Do you own/control the server?}
    OwnServer --Yes--> ImplementCORS[Implement CORS middleware]
    ImplementCORS --> AllowOrigin{Allow specific origin or *?}
    AllowOrigin --"Specific Origin"--> SetOrigin[Set Access-Control-Allow-Origin to your website's origin e.g. https://example.com]
    AllowOrigin --"All Origins"--> SetAsterisk[Set Access-Control-Allow-Origin to *]
    SetOrigin --> Success([CORS Issue Resolved])
    SetAsterisk --> Success
    OwnServer --No--> Workaround{Consider workarounds}
    Workaround --Proxy--> ProxyRequest[Proxy requests through a server you control]
    Workaround --"Third-Party"--> ThirdPartyService[Use a third-party service that handles CORS]
    ProxyRequest --> Success
    ThirdPartyService --> Success

CORS errors should primarily be handled on the backend side, not with client-side workarounds.

CORS Configuration

The first thing you should check is if you/your team owns or controls the API server (i.e. api.example.com in this case). If you do — awesome! Here's how you can make it happen:

To enable CORS on the server side, configure your API server to include the necessary CORS headers in its responses
This is typically done by setting up CORS middleware or configuring the server framework to handle CORS

What CORS response headers should I add?

For example, these headers are to be added to your API server's HTTP response to enable CORS for requests originating from https://example.com:

Header	Description	Must-Have	Example Value
Access-Control-Allow-Origin	Specifies the allowed origin.	Yes	`https://example.com`
Access-Control-Allow-Methods	Specifies the allowed HTTP methods.	Yes	`GET`, `POST`, `OPTIONS`
Access-Control-Allow-Headers	Specifies the allowed headers.	Yes	`Content-Type`, `Authorization`, `X-Custom-Header`
Access-Control-Allow-Credentials	Indicates if credentials are allowed.	No	`true`
Access-Control-Max-Age	Specifies the cache duration for preflight responses.	No	`86400` (24 hours)

Here is an example of how you might modify the code to add CORS headers to a response from a Node.js Express server:

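const express = require('express');
const bodyParser = require('body-parser');
const app = express();

// Parse JSON request bodies
app.use(bodyParser.json());

// Attach the CORS headers to a response
function setCorsHeaders(res) {
  res.setHeader('Access-Control-Allow-Origin', 'https://example.com');
  res.setHeader('Access-Control-Allow-Methods', 'POST, OPTIONS');
  res.setHeader('Access-Control-Allow-Headers', 'Content-Type');
}

// Answer the preflight (OPTIONS) request the browser sends before a JSON POST
app.options('/posts', (req, res) => {
  setCorsHeaders(res);
  res.sendStatus(204);
});

app.post('/posts', (req, res) => {
  setCorsHeaders(res);
  // Send the actual response data here
  res.json({
    "userId": 1,
    "id": 1,
    "title": "My title",
    "body": "My post body"
  });
});

app.listen(3000, () => {
  console.log('Server listening on port 3000');
});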

This is only a simple demonstration. A more scalable/maintainable approach would be to use middleware to handle CORS globally.

💡

Server-side frameworks and libraries often provide built-in CORS support or plugins to simplify CORS configuration. See the example of how to do it in Express.js.
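To illustrate what such a middleware does under the hood, here is a rough, framework-agnostic sketch written as a Python WSGI wrapper. It is not the Express.js approach linked above; `cors_middleware`, `posts_app`, and the allowed values are assumptions made for the example, and a production setup would normally rely on the framework's own CORS plugin.

```python
ALLOWED_ORIGINS = {"https://example.com"}  # assumption: your website's origin


def cors_middleware(app, allowed_origins=ALLOWED_ORIGINS):
    """Wrap a WSGI app so responses to allowed origins carry CORS headers."""

    def wrapped(environ, start_response):
        origin = environ.get("HTTP_ORIGIN")
        if origin not in allowed_origins:
            # Unknown or missing origin: pass through without CORS headers
            return app(environ, start_response)

        cors_headers = [
            ("Access-Control-Allow-Origin", origin),
            ("Vary", "Origin"),
        ]

        is_preflight = (
            environ["REQUEST_METHOD"] == "OPTIONS"
            and "HTTP_ACCESS_CONTROL_REQUEST_METHOD" in environ
        )
        if is_preflight:
            # Answer the preflight here so the wrapped app never sees it
            cors_headers += [
                ("Access-Control-Allow-Methods", "GET, POST, OPTIONS"),
                ("Access-Control-Allow-Headers", "Content-Type, Authorization"),
                ("Access-Control-Max-Age", "86400"),
            ]
            start_response("204 No Content", cors_headers)
            return [b""]

        def start_with_cors(status, headers, exc_info=None):
            # Append the CORS headers to whatever the app already set
            return start_response(status, list(headers) + cors_headers, exc_info)

        return app(environ, start_with_cors)

    return wrapped


def posts_app(environ, start_response):
    """Hypothetical stand-in for the real API."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"id": 1, "title": "My title"}']


application = cors_middleware(posts_app)
```

Because the headers are added in one place, every route behind `application` gets the same CORS policy, and preflight requests are answered without touching the route handlers.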

Um… I don’t own the 3rd-party API server

If you don't control the API server, well, you do have a few choices.

Unless the API owners decide to open up their CORS policy, you'll have to find a workaround. Here are some example workarounds:

Proxy Server (Recommended): you can proxy the requests through a proxy server (which you can self-host) e.g. CORS Anywhere
CORS Browser Extensions: some browsers have extensions or plugins that allow you to bypass CORS restrictions during development. These extensions modify the browser's behavior by adding CORS headers to the responses. This should only be used for development and testing purposes

Key Takeaways

Preflight requests are automatically issued by the browser (when conditions are met)
CORS errors only happen with browser requests to different origins, not in server-side applications.
To fix CORS errors, implement CORS middleware on the server side
If you lack control over the API server, consider solutions like proxying requests or using third-party services, but keep security in mind

Happy coding, and may your requests always find their way around the CORS maze!

How I Made a Chrome Extension To Help With Rephrasing

Jerry Ng — Mon, 13 May 2024 00:00:46 GMT

I decided to teach myself how to create a Chrome extension and learn a little bit about running LLMs directly in web browsers. It was quite a fun and interesting experience, and I wanted to jot down my thought process and whatever I’ve learned.

Why I Made This

I occasionally find myself needing to change the casing of my sentences or rephrase short sentences to sound more professional, polite, or casual, depending on the context.

Typically, I would search for "how to professionally say " or copy and paste the text into another tool, manually prompt it to make the desired changes, and then copy the revised text back to its original location.

This is a bit of a hassle for me for N * MyEntireLifetime.

An example of what that looks like if I were to use How to professionally say

Then I thought, wouldn't it be much more convenient to have an extension that could do just this from where I was typing?

Instead of switching between multiple tools, I could just highlight the text I wanted to change and use a tool to rephrase (and replace) it instantly.

Immediately adjust the tone right from my context menu/keyboard shortcut with Rephrase Tools

Features I Want in My Extension

Here are the features that I want for my extension:

Change the case of the text (e.g., Title Case, UPPERCASE, lowercase)
Change the tone of text (e.g., making it sound more professional or casual or anything in between)
Should work almost everywhere on the Chrome browser, e.g., Notion, WhatsApp, Gmail, etc.
Support keyboard shortcuts and a context menu for easy access

What Did I Learn

Most of my past experiences/projects have been more focused on backend development. Working with client-side development like manipulating the browser DOM has always been an area where I feel less self-assured. So, creating a Chrome extension seemed like the perfect chance to mess around with client-side code.

The making of a Chrome Extension

manifest.json
src
|-- background.js
|-- content.js
|-- popup.html
|-- popup.js

First off, I discovered that making a Chrome extension is actually simpler than I initially thought. The key concept is pretty straightforward.

The main components are basically just these 4:

Content script (e.g. content.js): This helps you interact with the browser DOM
Background script/service worker (e.g. background.js): This lets you run things in the background
manifest.json file: This turns your JS, HTML, and CSS files into a Chrome extension
popup.js + popup.html (optional): This is for the little popup you see when you click the extension icon

It's that simple!

Recurring cost

Since part of the feature helps you convert your tone, like being more professional, it currently calls the OpenAI API for this. This means it costs me money to keep this alive. If this grows (unlikely, I know), I’m wary about keeping up with any growing cost; I'm probably just overthinking.

My previous spam attack experience after hosting a URL shortener has made me a little wary about putting things online. The last thing I want is to have my side project unexpectedly rack up a large bill.

Why LLM in the browser

I spent weeks thinking about ways to make this economically viable. One idea I had was to see if I could run LLMs directly on the client side (web browser). The appeal of running LLMs in the browser is that they run locally, which means it's much cheaper since I won’t need a server.

It also offers privacy, as no data leaves the user's browser.

Exploring LLMs

Here are the 2 notable projects that I experimented with to see if they could be integrated into my extension. These projects enable running LLMs natively in web browsers by leveraging WebGPU:

Experiments and numbers

I tested out the WebLLM demo using the Llama-3-8B-Instruct-q4f32_1-1k model with an Apple M1 Pro Chip.

💡

For comparison, Llama 3 8B has 8 billion parameters, while GPT-3.5 is commonly reported to have about 175 billion parameters (the figure published for GPT-3)! A higher parameter count generally translates to more powerful language understanding and generation capabilities.

It took me about 116s (10s if fetched from cache) to initialize (~4GB downloads), which felt okay if I only did this once. After that, the model takes roughly about 3s to 4s to respond, which is surprisingly fast!

Try it out yourself: WebLLM demo

Having said that, I experienced a significantly worse result when repeating my experiment on my older Dell XPS 13 7390 (with an integrated GPU). The initial loading from the cache took about 40s whereas responding often takes more than 70s! Oof.

While browser LLMs are cool, I feel they aren't good enough for widespread public use yet. I ultimately decided not to integrate them into the current version of Rephrase Tools as I won’t be able to provide a decent UX (i.e., sub 2-3 second init and response times, no crashes, no lag). It’s just not practical to expect most users to have a somewhat decent machine, much less one with a GPU.

💡

Update 15 Mar 2026: it appears that someone has made a website that helps you find out which AI models your machine can actually run.

Charging money

Based on my past experience, enforcing signups/charging a fee (even a small one) can act as a deterrent against spam by making it less economically viable for spammers.

Extension Popup screen with ExtPay

At the end of 2023, I wanted to make something that is able to generate some form of revenue every year. Last year, I made a small add-on that runs on a subscription model. Quite frankly, I'm not a fan of the subscription model; I personally prefer the good ol' buy-once-use-forever model. Plus, the logic for the latter is much easier to manage!

Hanging out on Hacker News, I found ExtPay, a neat tool made to monetize browser extensions. It's pretty easy to use!

💡

A little trick I discovered is that you could attach the entire README.md of ExtPay to claude.ai/ChatGPT, and it would figure out the logic for you. This made me wonder if library/package documentation should be kept on a single, large page that can be easily copy-pasted into an LLM chat window to provide the right context quickly.

Publishing the extension

Finally, publishing a Chrome extension is super easy.

The developer dashboard website guides you through it, and it's much simpler than publishing a Google Workspace add-on (which I did before). However, you do need to pay a one-time developer registration fee of $5 to verify your account and publish items.

Submit Chrome Extension for review; it will be published upon approval

You can publish the extension as "unlisted" so only you and your friends/family can use it. This crossed my mind as I didn't want to make it public at first because I was afraid of spam and unexpected costs.

Extra: Nuances of Title Casing

Did you know there are so many styles of title casing in the world? Like, AP, APA, Chicago style, etc. I initially wanted to support all of these but couldn't find a reliable/battle-tested library on GitHub for it.

You can try out different styles of title casing using this tool

I wanted to create a library myself but didn't have the time, so I gave up. I just do regular title casing that works just like some_string.title() in Python.
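For reference, this is what that Python behavior looks like. The snippet only illustrates `str.title()` itself (the extension is written in JavaScript), and `naive_title_case` is a hypothetical wrapper name used for the example.

```python
def naive_title_case(text):
    """Capitalize the first letter of every word, like Python's str.title()."""
    return text.title()


print(naive_title_case("how i made a chrome extension"))
# How I Made A Chrome Extension

print(naive_title_case("don't use the api"))
# Don'T Use The Api
```

The second result shows why style-aware title casing is harder than it looks: every word gets capitalized, small words like "a" and "the" included, and apostrophes and acronyms are handled naively.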

Maybe this is a problem I'll attempt to solve some other time.

Closing Thoughts

And that’s it! That's the journey behind the creation of Rephrase Tools, a Chrome extension made to simplify the process of rephrasing the tone of your writing.

If you're curious about trying out Rephrase Tools, you can start a free 7-day trial without entering any credit card information. Feel free to give it a spin!

When people ask me about topics related to LLMs, I'm usually pretty meh about it. But now, I'm actually really excited about the potential for browser-based LLMs to grow and develop. I think they show a lot of promise for the future.

Ideas for the Future

Here are some things I might work on next:

Incorporate semantic-release-chrome plugin in my CI/CD pipeline to publish Chrome extensions automatically
Make the extension work on Firefox too
Check out extension.js. I found this a bit later! If I were to start another browser extension project, I'd definitely give this a try. Alternatively, there’s also Plasmo, which is a full-fledged framework for building browser extensions
Rebuild the entire thing to use browser-based LLMs as they mature

10,000 Hours of Gaming

Jerry Ng — Tue, 02 Apr 2024 00:00:45 GMT

10,000 hours. That’s how long it takes for you to be good at something. Or at least that’s what they say. Well for me, that's how long I immersed myself in taking gaming rather seriously. At my peak, this pursuit placed me in the top 300 ranks within the SEA region in Dota 2, and I consistently achieved the Legend rank in Hearthstone every month for years.

Not sure what compelled me to write all this down. Perhaps it's the wistful longing for my online gaming days over the past two decades. As I was writing this, I realized how integral gaming was in my life.

Anyway, let’s rewind for a bit...

A Nostalgic Throwback

The Beginning - Ragnarok Online (RO)

I lost all my old RO screenshots from almost 20 years ago

My gaming journey began way back in the day with Starcraft and Diablo 2. I remember being only 12 years old when I first got into online gaming. I fell in love with this game called Ragnarok Online. But man, the official game required a point-based subscription. For my younger self, it felt like a door to an exciting world of online gaming was locked behind a paywall. I had no means to afford it.

So, I turned to one of the free private servers which I loved! I was part of one of the top guilds in the server and made cool friends along the way.

Now here's the intriguing bit, this was when I first dipped my toes into something that resembled "programming". I ran bots for farming in-game consumables and buffing up my team before entering the PvP arena. Honestly, I didn't know what I was doing back then! I was just following guides written on forums and fiddling with bits and pieces of code.

To date, this is still one of my fondest memories.

My Introduction to DotA

A couple of years later, when I hit 14, I got introduced to Defense of the Ancients (DotA) at school, a mod for the game Warcraft III. Pinpointing the exact DotA version that marked the start of my DotA journey might be tricky but my nostalgia hints at it being the time when Puck was the shiny new character.

I remember those days like yesterday. I’d finish all my homework at school, then rush home and dive straight into forums to learn about skill/item builds for each hero, read tutorials and guides on every way to get better, and even play 1 vs 5 against Insane difficulty bots.

Delving into Pro Games

Watching a DotA pro game replay

Ever so gradually, I found myself drawn towards the pro games (anyone remember the old MYM, SK Gaming back then?). I would spend hours breaking down every move in replays while paying attention to the meticulous detail that pros put in. It became my way of learning: pause, look, replay, and analyze every move.

Strange as it may sound, I actually enjoyed watching pro game replays as opposed to just playing the game. To me, it was more than just learning; it was the perfect mix of my love for the game and curiosity.

The Competitive Front

Eventually, I made my stride into the competitive scene amongst friends and local tournaments as a high school kid. Some games I'd triumph, others, not so much.

As I was wrapping up high school, an opportunity to play professionally came my way for around RM300/month. But, cool as it seemed to me back then, the money aspect just wasn't cutting it even as an 18-year-old. I had college on the horizon and, as much as I cherished playing DotA, I knew I was pretty decent academically.

Eventually with a scholarship in my lap, pursuing a Bachelor's degree seemed like a much more sensible option than scraping by with RM300 a month in a shared dorm room in KL.

Dota 2 - The Next Chapter

When college started, along came the sequel to my obsession: Dota 2. I landed some 'Beta Keys' and wasted no time sharing them with my friends. Being one of the first few to play Dota 2 was an exhilarating experience.

Now, as I moved out of my home, I had even more time to invest in this hobby with no restrictions. Some days, I clocked in 8-10 matches without breaking a sweat. I was the first among my gang to hit 5K MMR (Matchmaking Rating points) when ranked matches were introduced in 2013. It was quite a milestone back then.

This achievement swiftly put me on the map within the local circles.

💬

For context, back in 2016, 5K MMR put you at >95th percentile.

The College Years

I mainly play Pos 4/5 in the ranked game; and Pos 3 in tournaments (context about Pos)

Throughout my 5-year stint at college/university, I managed to bag a couple of local tourneys and, eventually, even hit 6K MMR on 3 Steam accounts.

One noteworthy tidbit from my college years was my foray into trading within Dota 2. There came a point when I found myself trading more than I was actually playing the game! Looking back, it was a fun and rewarding time as a broke student.

All those years of trading Dota 2 items have allowed me to enjoy other Steam games without spending any money to this day.

I used to have one of those super rare Pink couriers called the 'Pink Burning Animus Wardog'

Card Games: Hearthstone

This period was also when Hearthstone piqued my interest. I got engrossed in watching Hearthstone games from 'Fight Night', but never actually thought to myself about playing the game. All I sought was the joy of commentary.

But as the fascination grew, I downloaded the game. Just like climbing the MMR ladder in Dota 2, I found myself really enjoying the challenge of reaching the Legend rank in Hearthstone.

*Remembering all the crazy RNG moments from the Shudderwock Shaman deck in the October 2018 season*

My stint with Hearthstone reaffirmed the old saying — "it's free to play if you play it long enough". I am happy to admit that I never spent a single dime on it.

Found my old notes on Arcane Dust budget planning. I had to be meticulous to ensure I could afford to build a specific deck

Looking Back

Okay, so what were my takeaways from all these years? Well, it's difficult to pinpoint. But, one thing is undeniable — it has shaped me into the person I am today.

Focus On What I Control

As I climbed into the higher ranks of MMR in Dota 2, I started to notice that grief and toxicity, unfortunately, were present at every level and were not confined to lower ranks.

Playing in the top MMR bracket in the SEA region, my daily matches often included a mix of Tier 1 and Tier 2 pro players. To my surprise then, even pro players occasionally exhibited unsporty and immature attitudes. Realizing even they exhibited toxicity shattered my admiration and perceptions of some of the pro players I grew up watching. Nevertheless, it reiterated the fact that professionalism doesn't make one immune to bouts of frustration or anger. We’re all human after all.

Anyway, this revelation brought me to a fundamental understanding: ranting or battling against elements outside my control was not only futile but counterproductive. Instead, I learned to treat these unfortunate encounters as a personal challenge, focusing only on what was within my control.

Discipline > Motivation

Discipline became a fundamental part of my life. Post-university, I was determined to climb the MMR ladder, obliging myself to play a minimum of 2 games daily, disregarding losses as insignificant trifles.

Later, this instilled discipline naturally transitioned into my career. Just as with grinding MMR, I found myself coding extensively outside of work hours or spending countless hours learning about stuff related to software engineering and computer science.

The Cruel Reality of "Good Enough"

Found this old screenshot from back when I was going through calibration in one of the seasons. Immortal rank was introduced later on

In the world of Esports (perhaps by extension traditional sports), there's no room for mediocrity. The harsh truth is, you're either among the best (think in the top 16 teams worldwide) or you're not.

Unfortunately, for those who find themselves in tier 2 or 3 teams, life and pay are not exactly rosy. And to make ends meet, they might resort to rather dubious activities like boosting MMR or even participating in match-fixing schemes.

The Epilogue

But one day, just like that, I finished another match, took a look around, and had what you might call an epiphany. I kid you not, my ‘competitive’ gaming story literally ended like this in around 2018.

I guess as time ticked by, I just wasn’t enjoying the game anymore. The MMR climb, the fame, the pride, the praises — they didn't seem as alluring any more, especially seeing how they didn't contribute to my real-life progress. It seemed like my life was trapped in stagnation, not going anywhere.

As much as I loved gaming, adulthood came knocking with its mundane necessities like paying bills. So, I found myself at a crossroads — do I continue on my aimless path or take charge of my future? I chose the latter.

Childhood Dream Fulfilled

After taking a break from watching pro-Dota 2 games around 2018, I have started tuning into them again. Like most adults, life can get pretty hectic. To keep myself up-to-date with the latest upcoming matches, I made a small add-on for that.

Picture taken from my seat at TI 11 - Aster vs Liquid

In 2022, I had the pleasure of attending The International (TI) 11 in Singapore with one of my close friends. It was an absolute dream come true. Some time ago, I remember discussing with my friends about saving up RM10,000 to travel to Seattle (as TIs were exclusively held in the US back then) to watch TI in person. To think I actually lived that dream is truly incredible.

Looking back, there's not a second I would change. I've relished every bit of my journey thus far — even the frustrations of dealing with sulking players or griefing teammates wreaking havoc in my games by feeding away couriers to the enemy.

What am I playing today?

One of the most memorable runs in Slay the Spire! A20 + Heart Kill with a Pressure Points deck. Had a blast during this run!

Over time, my gaming preferences gravitated towards indie games. I really enjoy playing games like Slay the Spire, FTL, and Hades.

Currently, I'm playing Risk of Rain 2 with my friends. After years of being away from online gaming, playing with them again feels particularly nostalgic.

My only regret is not taking the time to properly save all those screenshots from back in the day. Unfortunately, I didn't back them up properly and lost everything.

To wrap up, I look forward to sharing these memories with my loved ones — every high, every low. Gaming was, and will always remain, a huge part of my life. No doubt, I loved every second of it!

Shoutout to all my crazy amazing friends — thanks for the memories!

I Explored My Z Shell History. Here’s What I Found

Jerry Ng — Tue, 05 Mar 2024 00:00:00 GMT

As I was reading another interesting blog post about popular git config options, a curious thought crept into my mind: which git commands do I find myself using the most? This also eventually led me down a journey to explore my own terminal usage pattern throughout the day by using metadata like Epoch which is readily available in my .zsh_history.

My Prediction

Let's see... there's git status, git commit, git add, git push, git pull, git checkout... Huh, what else could there be? It seems these are the ones I rely on most frequently in my day-to-day git workflow.

I'm willing to bet that my most frequently used git commands are git commit and git add. If I had to rank the top 5 git commands I use the most, it would probably look something like this:

git status
git add
git commit
git pull
git push

Inside a Command Line Interface (CLI) Tool

Just so we’re on the same page, here’s a brief breakdown of the anatomy of a command (git as an example).

git fetch origin main --depth=5

git: command name
fetch: subcommand name
origin main: arguments (there are 2 in this case)
--depth: options/flags

Verifying My Assumptions

I began pondering: wouldn’t it be fascinating if I could track my local CLI tool usage? Perhaps there are existing tools out there for this purpose. However, considering privacy and security, I wouldn't want to use a third-party hosted solution where sensitive information like passwords or API tokens might accidentally be sent to a server I don’t control.

Suddenly, it struck me – all the commands I’ve run are already recorded in our shell history. What if I could just parse through all the commands that I’ve run from history?

What is in my Zsh history

Before we dive into anything else, let's take a peek inside our shell history first. By exploring its contents, we can get a better idea of what we're working with:

❯ cat ~/.zsh_history | head -n 5
: 1705638216:0;cd wraith
: 1705638264:0;git status
: 1705638987:0;git checkout feat/add-r2-backup
: 1705639215:0;ls
: 1705639390:0;git commit -m 'feat: support sync backup to r2'

It looks like we have some useful information in there. We've got the Epoch, which tells us when each command was executed, the elapsed time in seconds (the number after the second colon), and the actual commands themselves.
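As a quick sanity check of that layout, here is a small Python sketch that splits one history line into its parts. It assumes Zsh's extended history format (`: <epoch>:<elapsed seconds>;<command>`) on a single line; `parse_history_line` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_history_line(line):
    """Split ': <epoch>:<elapsed>;<command>' into (start time, elapsed, command)."""
    # Everything before the first ';' is metadata, the rest is the command
    metadata, _, command = line.partition(";")
    _, epoch, elapsed = metadata.split(":")
    started_at = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return started_at, int(elapsed), command


started_at, elapsed, command = parse_history_line(": 1705638264:0;git status")
print(started_at.isoformat())  # 2024-01-19T04:24:24+00:00
print(elapsed)                 # 0
print(command)                 # git status
```

Commands that span multiple lines in the history file would need extra handling that this sketch leaves out.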

Alright, I think we've got a good starting point to work with!

Identifying the Top 10 Most Used Commands

Let’s start with identifying the top 10 most used commands in my terminal:

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10

Explanation

history: This command displays your command history.
awk '{print $2}': This filters the output of history to only show the second column, which contains the name of the command executed (the first column is the entry number).
sort: This sorts the list of commands alphabetically so that identical commands end up next to each other.
uniq -c: This counts the occurrences of each unique command.
sort -rn: This sorts the commands by their count in descending order (most frequent first).
head -n 10: This shows the first 10 entries, which are the top 10 most used commands. Try changing this to 20 or something.
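For comparison, roughly the same counting can be sketched in Python with `collections.Counter`. This is an illustrative equivalent rather than what I actually ran; `top_commands` is a hypothetical name, and the sample lines are the five history entries shown earlier.

```python
from collections import Counter


def top_commands(history_lines, n=10):
    """Count the first word of each command and return the n most common."""
    counts = Counter()
    for line in history_lines:
        # Drop the ': <epoch>:<elapsed>;' prefix if the line has one
        command = line.partition(";")[2] if line.startswith(": ") else line
        words = command.split()
        if words:
            counts[words[0]] += 1
    return counts.most_common(n)


sample = [
    ": 1705638216:0;cd wraith",
    ": 1705638264:0;git status",
    ": 1705638987:0;git checkout feat/add-r2-backup",
    ": 1705639215:0;ls",
    ": 1705639390:0;git commit -m 'feat: support sync backup to r2'",
]
print(top_commands(sample, n=3))
# [('git', 3), ('cd', 1), ('ls', 1)]
```

Here `Counter` plays the role of `sort | uniq -c`, and `most_common(n)` covers the final `sort -rn | head -n 10`.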

Here, I can’t help but be reminded of how remarkable Unix's philosophy is: use small programs that do one thing well, which can then be seamlessly combined or "piped" (|) together to achieve more complex functionalities!

💡

I personally find the tldr CLI tool to be handy for quickly checking command help pages. It's a friendlier alternative to the traditional man pages, making it easier to grasp commands and their usage.

Output:

❯ history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10
   3839 git
    426 npm
    409 cd
    395 docker
    278 rm
    269 clasp
    253 poetry
    249 go
    191 wrangler
    177 make

Looking at the output, it’s clear that git commands dominate my terminal activity.

Visualizing it using Mermaid.js

With these numbers in hand, it's time to visualize the data. When I think about representing this information graphically, a pie chart immediately comes to mind. Let’s draw this using Mermaid:

pie title Top 10 Most Used Commands
    "git" : 3839
    "npm" : 426
    "cd" : 409
    "docker" : 395
    "rm" : 278
    "clasp" : 269
    "poetry" : 253
    "go" : 249
    "wrangler" : 191
    "make" : 177

Not surprisingly, it appears that git commands reign supreme, comprising 59% of the commands in my top 10!
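The 59% figure is git's share of the top-10 total rather than of every command ever run. A short Python sketch of the arithmetic, using the counts from the output above (`percentage_shares` is a hypothetical helper name):

```python
def percentage_shares(counts):
    """Return each command's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


top_10 = {
    "git": 3839, "npm": 426, "cd": 409, "docker": 395, "rm": 278,
    "clasp": 269, "poetry": 253, "go": 249, "wrangler": 191, "make": 177,
}

print(sum(top_10.values()))              # 6486
print(percentage_shares(top_10)["git"])  # 59.2
```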

At the same time, it's quite intriguing to see how different tools and operations occupy varying proportions of my command usage. Here, I'm not surprised to see my Linux make command usage made it into the top 10!

Most Used Git Commands

Now, back to the original question. Let's take a closer look at the git commands I use the most.

My first thought was to simply add grep 'git' to our original command and tweak the awk part so that we can home in on the most frequently used git (sub)commands:

history | grep 'git' | awk '{print $2 " " $3}' | sort | uniq -c | sort -rn | head -n 10

Output:

❯ history | grep 'git' | awk '{print $2 " " $3}' | sort | uniq -c | sort -rn | head -n 10
   1923 git commit
    496 git status
    293 git add
    200 git push
    121 git checkout
     96 git branch
     90 git lg
     85 git pull
     79 go mod
     64 go get

Ah, it seems like some unrelated go commands are showing up in the terminal history despite using grep to filter for git-related commands.

Upon careful inspection after running only history | grep 'git', I realized that grep matches "git" anywhere in the line, including arguments, flags, and other unrelated strings!

Instead, we need to filter out only the lines starting with git:

history | awk '$2=="git" {print $2 " " $3}' | sort | uniq -c | sort -rn | head -n 10

Output:

❯ history | awk '$2=="git" {print $2 " " $3}' | sort | uniq -c | sort -rn | head -n 10
   1923 git commit
    496 git status
    293 git add
    200 git push
    121 git checkout
     96 git branch
     90 git lg
     85 git pull
     55 git remote
     51 git st

Yay! This works! Just a side note: git lg and git st are my git aliases in my git configs managed using Chezmoi.

Plotting the top 10 most used git commands

pie title Top 10 Most Used Git Commands
    "git commit" : 1923
    "git status" : 496
    "git add" : 293
    "git push" : 200
    "git checkout" : 121
    "git branch" : 96
    "git lg" : 90
    "git pull" : 85
    "git remote" : 55
    "git st" : 51

Aligned with my initial prediction, it seems like git commit (56%) is by far the command I've used the most, followed by git status (15%) and git add (9%). However, I was surprised to see that the number of git add was not as closely matched to git commit as I had expected.

I also noticed some other git commands like git push, git checkout, and git branch popping up quite frequently. This makes sense!

Overall, it's interesting to see these patterns and get a glimpse into my workflow with git.

Terminal Activity Pattern

While I was working on this, another idea popped into my head. Remember when we noticed the Epoch entry in the Zsh history? What if we could do something with that date?

💬

I think I just discovered another reason to love Zsh even more than Bash: its history has a bit more metadata like Epoch that I can use!

Now, for this task, I don’t think I am able to rely on a single one-liner command like I did before. Well, if the one-liner gets any longer, it won’t be readable anyway.

I wrote a simple Python script to parse our .zsh_history file from our home directory:

import re
from datetime import datetime
from pathlib import Path


def main():
    command_activities = []

    for log_entry in load_zsh_history():
        if not log_entry or log_entry.isspace():
            continue
        epoch_time, command = extract_activity_details(log_entry)
        if not epoch_time or not command:
            continue

        command_activities.append((epoch_time, command))

    hour_of_the_day = group_activities_by_hour(command_activities)
    graph = create_mermaidjs_graph(hour_of_the_day)
    print(graph)


def load_zsh_history():
    zsh_history_path = Path.home() / ".zsh_history"
    with open(zsh_history_path, encoding="latin-1") as file:
        for line in file:
            yield line.strip()


def extract_activity_details(log_entry):
    matches = re.match(r"^:\s(\d+):\d+;(\S+(?:\s\S+)?)", log_entry)
    if matches:
        return matches.groups()
    return None, None


def group_activities_by_hour(command_activities, specific_command=""):
    hours_of_the_day = range(24)
    grouped_activities = {h: 0 for h in hours_of_the_day}

    for epoch_time, command in command_activities:
        if len(specific_command) != 0 and specific_command not in command:
            continue

        normal_time = datetime.fromtimestamp(int(epoch_time))
        hour_of_the_day = normal_time.hour
        grouped_activities[hour_of_the_day] += 1
    return grouped_activities


def create_mermaidjs_graph(grouped_data):
    values = list(grouped_data.values())
    graph = (
        "xychart-beta\n"
        '    title "Terminal Activity by Hour of the Day"\n'
        '    x-axis "Hour of the day"\n'
        '    y-axis "No. of commands run"\n'
        f"    bar {values}\n"
        f"    line {values}\n"
    )
    return graph


if __name__ == "__main__":
    main()

Try running it in your ipython!

Basically here’s how the Python script works:

The script goes through my terminal history file
It looks at each line in the file to see when I did stuff and what I did
It then counts how many times I did each thing during different hours of the day
After it has all that data, it prints out the Mermaidjs syntax — which then allows me to make a little graph below to show me when I'm most active in the terminal
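To make the parsing step a bit more concrete, here is a small sketch that runs the same regular expression against a few made-up history lines. The sample lines and the parse_history_line helper are hypothetical; only the pattern comes from the script above:

```python
import re

# Same pattern as in the script: ": <epoch>:<duration>;<command>".
# The second group keeps at most the first two words of the command.
HISTORY_PATTERN = re.compile(r"^:\s(\d+):\d+;(\S+(?:\s\S+)?)")


def parse_history_line(line):
    """Return (epoch, shortened command), or (None, None) if the line doesn't match."""
    match = HISTORY_PATTERN.match(line)
    return match.groups() if match else (None, None)


# Made-up sample lines in the extended Zsh history format.
samples = [
    ": 1700000000:0;git commit -m 'fix typo'",
    ": 1700000060:2;ls",
    "a line without the epoch prefix",
]

parsed = [parse_history_line(line) for line in samples]
for result in parsed:
    print(result)
```

Lines that don't carry the epoch prefix (for example, the continuation of a multi-line command) simply fall through as (None, None), which is why the main loop skips them.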

Output:

❯ python3 main.py
xychart-beta
    title "Terminal Activity by Hour of the Day"
    x-axis "Hour of the day"
    y-axis "No. of commands run"
    bar [122, 42, 30, 21, 29, 61, 8, 423, 1003, 881, 624, 500, 363, 595, 412, 477, 455, 404, 351, 587, 807, 884, 920, 364]
    line [122, 42, 30, 21, 29, 61, 8, 423, 1003, 881, 624, 500, 363, 595, 412, 477, 455, 404, 351, 587, 807, 884, 920, 364]

Finally, pasting this on mermaid.live allows me to instantly visualize the breakdown of when I’m the most active in the terminal throughout the day:

xychart-beta
    title "Terminal Activity by Hour of the Day"
    x-axis "Hour of the day"
    y-axis "No. of commands run"
    bar [122, 42, 30, 21, 29, 61, 8, 423, 1003, 881, 624, 500, 363, 595, 404, 473, 455, 404, 351, 587, 807, 884, 920, 364]
    line [122, 42, 30, 21, 29, 61, 8, 423, 1003, 881, 624, 500, 363, 595, 404, 473, 455, 404, 351, 587, 807, 884, 920, 364]

From the graph, it appears that my terminal activity usually peaks in the morning (between 8 am and 10 am) and again at night around 10 pm. It's also quite apparent that my terminal activity gradually decreases in the afternoon.
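If you'd rather not eyeball the chart, the same reading can be pulled out of the numbers. This is just a quick sketch using the hourly counts from the script output above; busiest_hours is a helper name I made up:

```python
# Hourly counts copied from the script output above (list index = hour of the day).
hourly_counts = [
    122, 42, 30, 21, 29, 61, 8, 423, 1003, 881, 624, 500,
    363, 595, 412, 477, 455, 404, 351, 587, 807, 884, 920, 364,
]


def busiest_hours(counts, top_n=3):
    """Return the hours with the most commands, busiest first."""
    ranked = sorted(range(len(counts)), key=lambda hour: counts[hour], reverse=True)
    return ranked[:top_n]


top_hours = busiest_hours(hourly_counts)
quietest_hour = min(range(len(hourly_counts)), key=lambda hour: hourly_counts[hour])

print("Busiest hours:", top_hours)
print("Quietest hour:", quietest_hour)
```

Keep in mind that hour 8 here means anything run between 08:00 and 08:59 local time, since the script buckets by datetime.fromtimestamp(...).hour.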

Obviously, I don't often pull an all-nighter.

My Takeaways

I had a lot of fun digging into my terminal usage history and finding out what commands I use the most.

If there’s any takeaway, I’d like to leave you with these two commands:

# Top 10 most used commands:
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 10

# Top 10 most used git commands:
history | awk '$2=="git" {print $2 " " $3}' | sort | uniq -c | sort -rn | head -n 10

Do tweak the commands/Python script to fit your needs and learn more about your terminal usage just for fun!
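If you prefer tweaking Python over awk, here is a rough equivalent of the first one-liner. It's only a sketch: top_commands is a made-up helper, the sample lines are invented, and it assumes the extended Zsh history format (": <epoch>:<duration>;<command>"):

```python
from collections import Counter


def top_commands(history_lines, limit=10):
    """Count the first word of each command, most frequent first."""
    counter = Counter()
    for line in history_lines:
        # Drop the ": <epoch>:<duration>;" prefix when the line has one.
        if line.startswith(": ") and ";" in line:
            line = line.split(";", 1)[1]
        words = line.split()
        if words:
            counter[words[0]] += 1
    return counter.most_common(limit)


# Made-up history lines for illustration.
sample_history = [
    ": 1700000000:0;git status",
    ": 1700000010:0;git add .",
    ": 1700000020:0;ls -la",
    "",
    ": 1700000030:0;git commit -m 'wip'",
    ": 1700000040:0;ls",
]

top_two = top_commands(sample_history, limit=2)
print(top_two)
```

To run it on real data, you could feed it the lines yielded by load_zsh_history() from the script earlier.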

What's Next

While playing around with this, I stumbled upon a project called Atuin that replaces your shell history with an SQLite database, providing additional context for your commands. What's even cooler is that it supports syncing between machines (with the option to self-host it). I haven't tried it out yet, but it sounds promising!

💬

Update (19 Nov 2024): I recently started using Atuin, and I’m really loving it! Atuin has a much nicer UI for reverse-i search. One thing that I appreciate a lot is the ability to easily delete old zsh history entries where I messed up with typos. It’s nice not having to deal with those mistakes popping up in my autocompletion anymore.

I also came across several other projects that piqued my interest but that I haven't used yet:

resh (contextual shell history for Zsh and Bash)
Zsh-history-analysis (tool for analyzing zsh history)

That’s it, thanks for reading!

How to Update Python Version: The Better Way

Jerry Ng — Mon, 05 Feb 2024 00:00:26 GMT

We’ve all been there: sooner or later, we have to update our Python version to match what a project requires, be it at work or when working on personal projects.

Just a quick search about “update python version” and we will be bombarded with suggestions to run python --version followed by brew upgrade python3 or sudo apt-get update.

Okay cool. Problem solved right?

Not really. Probably 9/10 times an upgrade won’t cut it — enter another project. And guess what? It wants a different Python version, maybe an older one just to make our life a little bit more miserable.

So, here we are, stuck between a rock and a hard place, asking ourselves, "Do I downgrade Python now? But what if I need to juggle both projects? I didn't sign up for this symlink or PATH variable wrestling match!”

Let's use `pyenv`?

We know it’s not uncommon to find ourselves in yet another project that needs yet another Python version.

Now, with some quick search, you can tell most Python folks swear by pyenv (docs) for managing Python versions.

💡

Tip on a quick search: How to Google With a Bang!

Don’t get me wrong, it works but it’s just not for me.

Well, given that I work with a lot of CLI tools like go, node, terraform, git, etc., I prefer the simplicity of using a single tool – asdf.

In other words, I very much prefer to use asdf to manage all my programming language or CLI tool versions, rather than dealing with the likes of gvm, nvm, and pyenv separately.

`asdf` Python Quick Guide

Beyond Python, asdf supports various plugins. But, let's focus on Python without delving into exhaustive details.

💡

Hint: run asdf plugin list all to list all available plugins.

Installation

Easy, just follow asdf-vm.com/guide/getting-started.html based on your system specifications:

OS (e.g. linux, macOS)
Package manager (e.g. brew, apt, pacman, etc.)
Shell (e.g. zsh, bash, etc.)

For instance, on macOS with Homebrew and Zsh:

# Using Homebrew on macOS
brew install asdf
echo -e "\n. $(brew --prefix asdf)/libexec/asdf.sh" >> ${ZDOTDIR:-~}/.zshrc

Setup

Add the Python plugin:

# asdf plugin add <name>: Adds a plugin for managing a specific runtime
# e.g. <name>: python, nodejs, golang
asdf plugin add python

Basic Usage

Let’s install our first Python version:

# asdf install <name> <version>: Installs a specific version of a runtime
asdf install python 3.12.1

What if most of your projects rely on Python 3.12.1? Well, let’s set Python 3.12.1 as our global/default Python version:

# asdf global <name> <version>: Sets a global (default) version of a runtime
asdf global python 3.12.1
cd && python --version # Python 3.12.1

Next, let’s install more Python versions!

asdf install python 3.8.13
asdf install python 3.9.16
asdf install python 3.10.9
asdf install python 3.11.3

Wait, I lost track, how many different Python versions have I installed…?

asdf list python
#  3.10.9
#  3.11.3
# *3.12.1
#  3.8.13
#  3.9.16

The asterisk in "*3.12.1" in the asdf list python output indicates that 3.12.1 is the currently active Python version in the current directory (here, the global default we just set).

Now, you can easily switch between different Python versions:

# Go to my project
cd ~/github.com/ngshiheng/burplist

# Project current Python version
python --version # Python 3.12.1

# But, I need Python 3.8.13
asdf local python 3.8.13

# Yay!
python --version # Python 3.8.13

That’s it! You'll likely find yourself using this set of commands about 80% of the time.

Cheatsheet

Here's a quick refresher:

# "How to install a specific Python version?"
asdf install python 3.12.1

# "What versions have I installed?"
asdf list python

# Set version on global level
asdf global python 3.12.1

# Set version on project level
asdf local python 3.12.1

# "What is my current Python version in this dir?"
python --version

`.tool-versions`

After running asdf local, you may notice that your project directory now contains a file called .tool-versions. It's used to remember which versions of these tools each project needs (reference).
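To give a feel for what lives in that file, here is a small sketch that reads one. It assumes the plain layout of one "<name> <version>" entry per line; the example contents and the parse_tool_versions helper are made up for illustration and are not part of asdf itself:

```python
def parse_tool_versions(text):
    """Map each tool name to its list of versions, skipping blanks and comments."""
    tools = {}
    for raw_line in text.splitlines():
        # Anything after a '#' is treated as a comment in this sketch.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *versions = line.split()
        tools[name] = versions
    return tools


# Hypothetical contents of a project's .tool-versions file.
example = """\
python 3.8.13
nodejs 21.6.1
golang 1.21.6
"""

pinned = parse_tool_versions(example)
print(pinned)
```

In a real project you would pass in Path(".tool-versions").read_text() instead of the hard-coded example string.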

Should I commit this file to Git?

If having the .tool-versions file in your project helps everyone on your team use the same versions of tools, then it's a good idea to include it in your source control. It keeps things consistent for everyone.

Not Just Python

This approach isn't limited to Python; it works for managing versions of other tools like Node.js, Go, etc. All you need to do is to replace "python" with the respective tool/plugin name:

# Same examples, but in golang:
asdf plugin add golang
asdf install golang 1.21.6
asdf list golang
asdf global golang 1.21.6
asdf local golang 1.21.6
go version

# Same examples, but in nodejs:
asdf plugin add nodejs
asdf install nodejs 21.6.1
asdf list nodejs
asdf global nodejs 21.6.1
asdf local nodejs 21.6.1
node --version

Closing Thought

These days, when considering adopting a new tool, I've adopted a systematic approach:

Firstly, I check asdf to see if there's plugin support available (asdf plugin list all). Vet the plugin first!
If not, I explore whether the specific CLI tool has its own version manager like gvm, nvm, rbenv, pyenv, tfenv, etc.
If neither option is viable, only then do I resort to installing the tool via my package manager like brew or apt

Following this decision-making chain has significantly simplified version management for all my tools, saving me considerable time and pain.

References

P/S: A friend recommended an alternative to asdf called mise. I haven't had the chance to check it out yet, but it seems promising. I might give it a try in the near future.

💬

Update (19 Nov 2024): I recently made the switch to mise (pronounced as "MEEZ"), and I have to say, I'm really impressed! The UX feels much better compared to asdf. Even after years of use, I still struggle to remember the commands for asdf, but I don’t seem to have that problem with mise at all. The author of mise mentioned that asdf has quite a few quirky UX papercuts, and I can definitely relate to that! That said, I think asdf still has its place. It's still the most popular version manager out there.

My 2023 Year in Review

Jerry Ng — Tue, 02 Jan 2024 00:00:16 GMT

Making Internet money is kinda cool. As I wrap up 2023, I decided to jot down the various Internet revenue streams that I have made throughout the year. However, little did I anticipate the nuances involved.

Ironically, I usually pride myself on being a well-organized person. But honestly, I never anticipated that these inconsequential ventures would bring in any money. So, here I am, realizing I never bothered to consolidate everything into a single place. Oops.

So, I ended up finding myself jumping from one platform to another, navigating through a maze of dashboards. It felt like a digital treasure hunt just to nail down the right numbers.

Anyway, let’s see…

TL;DR

In the year 2023, I made a total profit of $920.26 (USD):

Income: $1027.66
Expenses: -$107.40

Revenue

Total: $1,027.66

Medium Partner Program

Income: $229.70

All of my latest articles find their first home on jerrynsh.com. At the same time, they are automatically cross-posted to Medium.com using Zapier:

Zapier cross-posting automation

So, here's the revenue breakdown by month in 2023 (cutoff on 28 December 2023):

Month	Revenue ($)	After Tax ($)
Jan	27.20	19.04
Feb	23.92	16.74
Mar	13.58	9.51
Apr	13.62	9.53
May	15.64	10.95
Jun	14.98	10.49
Jul	22.85	15.99
Aug	9.37	6.56
Sep	21.36	14.95
Oct	22.83	15.98
Nov	15.32	10.72
Dec	127.49	89.24
Total	328.16	229.70

As you can see, most months didn't bring in much, except for the final month when a post about my adventures in automating stuff gained some traction on Medium.

Oh, it's worth noting that I am paying a whopping 30% withholding tax! Ouch.

Screenshot from my Medium Partner Program dashboard

As I was documenting everything, I started to wonder how this compares to last year. Turns out, I’m down by 64.5%. Yeah, a substantial drop from last year. Oh well.

My Stripe dashboard (excluding December's earnings)

Overall, I think Medium is not a bad distribution channel for people who already write.

AdSense

Income: $117.73 (~$155.94 SGD)

If you've read some of the posts on jerrynsh.com, you may have come across some ads (unless you're using an ad-blocker, of course).

Just a glimpse of my AdSense dashboard

Obviously, the earnings numbers don't mean a thing on their own.

Looking at the views, I'm hitting the 11k mark every month recently. It's kind of wild to think people would want to hang out here on my little corner of the Internet.

Views (2023 vs 2022)

Beyond that, you guys are spending an average of 1 minute and 45 seconds reading stuff here on this blog.

Engagement Time (2023 vs 2022)

About 70% of this traffic comes from organic search, which I think is great.

Traffic Source (2023 vs 2022)

Now, full disclosure, I'm not a huge fan of most ads, but they pull their weight by helping cover my costs for my tiny projects.

Having said that, I'm eyeing a switch to EthicalAds or Carbon Ads next year. The plan is to bring in more relevant and less intrusive ads.

Affiliate/Referrals Rewards

Income: $635.00

This year, somehow, by some dumb luck, I've managed to pull in some decent cash through affiliate/referral links that were scattered in my blog posts a year or two ago:

Nium — $100.00
ScraperAPI — $535.00

I reached out to Nium around 2-3 years ago, but I haven't written much about them since the partnership started. Most of the referrals came from a single blog post that I wrote back in Jan 2020.

Nonetheless, I genuinely like their service for money transfers compared to the traditional banking hassle and fees for foreign transactions. Although, I'm not sure how long this revenue stream will keep flowing.

On the other hand, the money brought in by ScraperAPI is quite decent this year. Though, most of my earnings from them come from one loyal user (talk about putting all your eggs in one basket).

I'll be honest, the sustainability of this income source seems a bit iffy. We'll see how it goes.

Tournacat

Income: $40.23

Let me introduce you to Tournacat, my little brainchild from this year. It's a simple Google Calendar workspace add-on that syncs upcoming Esports matches/tournaments right to your Google Calendar.

What the Esports calendars and events created by Tournacat Pro look like.

Seriously, if you're into Esports, you should totally give it a try — you won't be disappointed!

Tournacat is free to use from the get-go. But, if users feel a bit fancy and opt for the Pro plan for just $2.50/month (at the time of writing; which comes with a purchasing power parity discount too!), they can unlock some pretty neat features.

Oh, sorry for the sales pitch — here’s the breakdown of how much it made:

Lemon Squeezy payout page

For now, I'm letting it grow organically. As I mentioned in my previous blog post, as long as the cost of growth is covered, I'm okay with giving away free stuff.

Donations

Income: $5.00

After years of setting up Ko-fi, I finally got my first donation. I actually did end up buying myself a cup of coffee that day.

Donation email from Ko-fi

Cost/Expenses

Total: -$107.40

Domains

Cost: -$29.64

So far, I've only got 3 domains in my collection: jerrynsh.com, tournacat.com, and burplist.com. Each of them set me back $9.88, summing up to $29.64 for the year.

I've hitched my domain registration and DNS wagon to Cloudflare. Their all-in-one service suits me well, making management a bit easier.

Hosting

Cost: -$77.76

After making the shift from Heroku, my projects now call three different Platform-as-a-Service (PaaS) providers home. The good news? Most of them still don't cost me anything. The hassle is well worth it, I suppose.

However, there's one exception — a Digital Ocean droplet that's currently setting me back $6.48/month. Do the math for a year, and we're looking at $77.76 (taxes included).

Looking down the road, I do expect to start paying for Cloudflare Workers because of Tournacat. I really like them, but yeah, we'll see how that unfolds.
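For the curious, here is a quick sanity check of the numbers in this post. It's only a sketch that re-adds the figures quoted in each section, using Decimal so the cents don't pick up floating-point noise; the dictionary labels are just my own shorthand:

```python
from decimal import Decimal

# Figures quoted in the Revenue sections of this post (USD).
revenue = {
    "Medium Partner Program": Decimal("229.70"),
    "AdSense": Decimal("117.73"),
    "Affiliate/Referrals": Decimal("635.00"),
    "Tournacat": Decimal("40.23"),
    "Donations": Decimal("5.00"),
}

# Figures quoted in the Cost/Expenses sections (USD).
expenses = {
    "Domains": 3 * Decimal("9.88"),   # three domains at $9.88 each
    "Hosting": 12 * Decimal("6.48"),  # one droplet at $6.48/month
}

total_revenue = sum(revenue.values())
total_expenses = sum(expenses.values())
profit = total_revenue - total_expenses

print(f"Revenue:  ${total_revenue}")
print(f"Expenses: ${total_expenses}")
print(f"Profit:   ${profit}")
```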

What are you doing with these profits?

The responsible thing would be to invest in the stock market or something. But nahhhhhh...

Well, buying coffee seems like a solid plan, right? Isn't that the point of Ko-fi? Jokes aside. Honestly, no concrete plans. Maybe snag a few more domains for some ideas at the back of my head.

On a side note, I did manage to snatch a couple of games that run well on the Steam Deck. I'm really excited to play them!

Looking into 2024

So, the big question: Do I expect profit/revenue to shoot up? Probably not.

Why? Well, most (if not all) of the revenue sources are inconsistent — take those affiliate links, for example. Without those, the profit numbers would have been down from last year.

Another chunk of the revenue is tied to this blog (plus cross-posting on Medium). Growing blog traffic is a challenge for me because I don't bother much with SEO or self-promotion on social sites like Twitter, LinkedIn or Facebook.

Closing Thoughts

This post was largely inspired by reading “Xe's blog made $2564.42 in profit last year”.

If my younger self were to ask myself about starting a blog solely for the extra cash, I'd say think twice. As you can see, it's not the most profitable idea for the most part. The return on investment (time and effort) is very tough to justify, financially.

I think what kept me going was the fun I got out of writing stuff and the opportunity to talk to random people on the Internet.

Learnings

I'm no hero, but this year taught me that I'm genuinely happy when people use or read the stuff I made — whether it's a blog, software, or projects. Even if no money is rolling in. So please keep sending your random DMs/requests my way.

Earlier in my career, I always thought everything I invested time and effort in must somehow turn into some form of financial gain; otherwise, I was just wasting my time. I'm glad and grateful that I don't hold on to that anymore.

Anyway, thanks for reading! Happy New Year!

A Look Back on 7 Years of Automating Stuff

Jerry Ng — Mon, 04 Dec 2023 00:00:34 GMT

A little bit more than 7 years into my career, I thought it would be fun to pen down a summary post about my adventure in automating various tiny aspects of my life. Most if not all of the stuff here sprouted from my own problems and itches I needed to scratch.

This post will probably read more like a personal diary of the minor nuances that I encountered and the sweet minutes/hours that I managed to snatch back through automation.

On to the first one –

Six Percent: Automating ASNB Purchases

2018 – 2022

The startup window of the bot. This was the first thing that I've made that people actually use

For context, ASNB, a unit trust management company, offers a fixed-price fund (i.e. it can never go up or down!) that promised a sweet 6% p.a. dividend back then. It’s virtually risk-free.

Well, there was a catch. The limited funds pool meant that it was selling like hotcakes — you had to keep retrying to snag those units. So, what did I do? I wrote this.

Cost

As this turns out to be a tiny Windows executable, it didn’t really cost me any money to host or anything.

Free lunch, literally

Beyond saving me countless mindless hours clicking like a headless chicken, I got a free thank-you meal out of it.

Fast forward to today, and I'm no longer actively using or maintaining the project. Over the years, there have been a few minor bug fixes here and there, but it's essentially retired. I can't guarantee that it still works, but man, it saved me so many hours.

What did I learn

Today, I’ve become quite comfortable with any form of browser automation. If I can interact with it on the web, I can automate it. The script opened doors for tackling repetitive/mundane tasks, aiding in integration testing, etc.

💡

Years later, I stumbled upon go-rod which is significantly better in terms of ease of use and developer experience.

That aside, I did also learn other cool tricks like how to solve a CAPTCHA using OCR and how to package a Python app using PyInstaller!

Todoleet: Daily Leetcode Questions in Todoist

2021 – Present

Back in early 2021, I started to solve the LeetCode Daily challenge as part of my morning routine. Quite frankly, it wasn’t exactly fun for me; it was necessary.

My personal Todoist. Nope, I'm not attempting this one.

How it works

Even though I've ditched my morning LeetCode routine, Todoleet is still up and running today.

Under the hood, it’s merely a simple JS script that talks to the undocumented LeetCode API and then creates a new to-do task using the Todoist API.

If you're looking for the nitty-gritty, I've spilled the beans on how I sync the Daily Leetcode Challenge to my Todoist.

Cost

$0. The free Cloudflare Worker tier has got me covered! This solution costs 1 request per day and I didn’t need to store anything.

Time saved

I did manage to save a few clicks (seconds) every day. They all add up, I guess?

Any learnings?

This was my introduction to Cloudflare Worker. It paved the road for all the other projects I've tinkered with in my free time!

Future plans

I did consider taking this to another level and listing it as a Todoist integration. But, to be honest, I didn't care enough to make it happen.

Burplist: Sipping on Craft Beer Savings

2021 – Present

Craft beers are delicious. There was just one hiccup: the price. Then I figured, wouldn’t it be great if I could have current and historical prices of all craft beers in Singapore, all in one place?

Now, instead of wrestling with 10+ websites for the best deals, a quick search from my database would do the trick. This saves me time and sanity.

I was looking for some Dark Ale as I was writing this

How it works

Burplist is essentially a web scraper built using Scrapy. It scours over 10 local online stores and e-commerce sites every morning (Singapore time), fetching craft beer prices and storing them in a Postgres database. As a result, I've amassed two years of historical craft beer price data in Singapore.

Cost

Other than the ~$10/year for the domain name, running Burplist has always been free. After Heroku phased out its free tier plan:

The scraper Cron job was moved to Northflank
The Postgres database is hosted on Railway
The website was moved to Koyeb

Free beer for that Christmas

So, initially, I gave it a shot to make some money out of this. All I did was put it up for sale on Gumroad, but it didn't really catch on.

I also tried reaching out to a few companies through cold emails to see if they'd be interested in partnering up with some affiliate links, but only one replied — well, it didn’t work out either.

But hey, no big deal! It's all good because something awesome still came out of it! Thirsty, this local craft beer company, actually surprised me with this amazing package of craft beer for Christmas that year! I was over the moon.

I got a box of delicious craft beers for Christmas that year

Any lesson learned?

This was quite a big one. I’ve had so much fun and learned so much from making this project. Burplist is the fanciest web scraper that I’ve built thus far. I’ve written down some of the learnings in a blog post.

😂

Oh, and here's a quirky side effect: I’m now able to figure out whether a mega-sale/promotion is the real deal or just a clever markup with a discount disguise.

Looking ahead

Future plans? Maybe migrate from Postgres to SQLite. Then, update the daily job to push the SQLite file directly to GitHub and present the data through something like this nifty method. Expected result? One less webserver to babysit. If this ever happens, I’ll probably write about it somewhere.

Wraith: Automating Ghost Blog Backup

2022 – Present

A screenshot of my terminal emulator

At the start of my blogging journey, I didn’t really think too much about what would happen if this $6/month Droplet crashed. I mean, I had all my blog entries in Notion, so I figured "Meh, it wouldn’t hurt to copy-paste ~10 of them back if anything bad happens".

As time went on, the realization hit — losing all my data and whatnot now would really suck. So, the solution? Automate the backup. With the blog serving around 10k views monthly, any downtime or a prolonged 404 or 500 is something I'd rather avoid.

What’s underneath

It’s a pretty simple Bash script that does three things:

ghost backup
mysqldump
rclone to a remote drive (e.g. Dropbox, Google Drive, etc.)

This Bash script gets a weekly run in a Cron job, and it's as easy as that.

Cost

$0. I didn’t have to pay for Dropbox; the free tier fits my needs just fine.

Time saved

I mean, the alternative would be manually SSH-ing into my Droplet and running the backup steps one by one, a process that takes roughly 5 minutes or less. So, that's a weekly saving of ~5 minutes.

What did I learn

Picked up a few best practices for writing Bash scripts along the way. Besides that, I did learn about new tools like expect, rclone, and pass (the Linux password manager).

Future plans

Not a whole lot to be honest. Perhaps a backup restore script could be handy. Thought about moving passwords from plain text to using pass. But, that means users dealing with gpg + pass CLI setup — an extra hurdle.

Tournacat: Sync Esports Schedules to Google Calendar

Started 2023

I love video games. As an adult, I got tired of missing highly anticipated Dota 2 Esports matches, dealing with wonky time zones, and the mental gymnastics of remembering it all. Since Google Calendar is practically my second brain, I figured, why not sync upcoming matches straight into it?

Month view of Google Calendar

Turning it into a micro SaaS

I started by sharing the Dota 2 calendar with friends. Then, a lightbulb moment — if it's handy for them, maybe others would pay for it. And so, D2GCal was born. Pay, and get a public link to the Google Calendar. Simple.

But wait — people are picky about their calendars. Some find it “noisy”, and some want a customized experience. Enter Tournacat, supporting 10+ Esports titles (not just limited to Dota 2!), giving users the power to own and customize their calendars.

Cost

~$10/year for now for the domain name.

Firstly, the website (tournacat.com) operates as a static site built using Hugo. It's currently hosted using Cloudflare Pages which is free.

The Google Workspace add-on is built and runs on Google Apps Script (GAS) for free.

Lastly, Tournacat has an API server running on Cloudflare Worker. It fetches the upcoming Esports events from a data source. Running on Cloudflare Worker is currently still within the free tier limit but it won't be so for long.

The neat part? Tournacat doesn't store any user info. No names, no emails — nothing at all. This means that no database is needed.

Revenue

Lemon Squeezy payout page

Today, Tournacat has about 90 users. In terms of paid subscriptions, it's made a tiny profit of $40.23 in 11 months.

Honestly, I never expected to rake in big bucks anyway. The goal was to cover growing worker usage costs. Any surplus? Well, that's coffee money.

Lessons learned

The journey with Tournacat has been a blast. The realization that a handful of people want and use what you built yourself is an indescribable feeling.

This tiny venture has taught me many valuable lessons:

Developing on the GAS platform
Publishing of a Google Calendar add-on to Workspace
Working with payments and subscriptions
Crafting a micro SaaS

Besides, the journey also introduced me to the mundane world of social media marketing, which isn't my cup of tea. On the flip side, I had an amazing time speaking to people about their feedback (feel free to ask me anything)!

Overall, I feel like the Esports landscape is shaped by a generation accustomed to abundant free entertainment and tools. Convincing people to spend money can be a tough sell.

Any future plans?

There are plenty of tasks/tickets on the project board, but I'm taking it slow. The plan? I’ll just be working on minor improvements here and there unless specific user requests pop up.

Being a solo developer has its perks — the turnaround time for new requests is pretty quick. I've been able to incorporate feedback almost immediately (as long as it's somewhat reasonable).

Overall, the product feels pretty complete as it is.

SGS Issuance Calendar: Automated T-Bill Tracking

Started 2023

Feeling the manual-checking fatigue for the Singapore MAS T-bill issuance calendar, I decided, "Why not automate this?" So, I crafted a schedule to run every month using Google Apps Script (GAS) and detailed the process in this blog post.

T-bill announcement and auction dates are on my calendar!

How it Works

Under the hood, it's a small GAS project written in TypeScript. The script gets triggered every month, pings the MAS API, and then creates important dates (e.g. announcement/auction date) as Google Calendar events. Simple as that!

Cost

$0. The best part about using GAS is that I don’t need to be bothered by the underlying infrastructure hassle. No fretting over scaling, upgrades, backups, availability — none of that. It just does its thing.

Some lessons learned

This project brought some learnings to the table. I am now able to build GAS projects in TypeScript, and I also delved into the world of writing unit tests for GAS. This newfound skillset later found a home in Tournacat's add-on UI codebase, which is also written in GAS.

What's next?

I've added support for SGS and SSB calendars as well! For now, no concrete plans unless user feedback or GitHub issues come knocking. Two months in, and it's been serving me well.

Closing Thoughts

The journey has been nothing short of fun. My approach to automating/building usually unfolds like this:

Identify my own pain point or problem (note it down immediately!); however small it may be. Look for quick wins!
Prefer leveraging what's already out there. Check if there's something in the market for it, like Zapier or IFTTT
If not, then roll up my sleeves and code/build
See if I can generalize the solution or approach
Write down the process somewhere
Rinse and repeat

“Aren’t some of these side projects?"

Well, I guess. I just think the term feels so worn out at this point. It's like, they're more about automating some parts of my life. Sure, one of them is making coffee money monthly, a bit laughable. The whole idea of “hustling” just doesn't quite vibe with me. I've realized I just want to do things just for fun. No, really.

Thanks for sticking around! 🍻 Here's to more automation to come!
