Using regex to parse frontmatter

January 11, 2024

I had a blog (before this one) that used a content SDK called contentlayer. It did all of the heavy lifting for me, such as parsing frontmatter. When I decided to rebuild my blog from the ground up, I found out that contentlayer didn't support the latest version of Next.js (read this GitHub issue).

Welp, it was a bummer alright. However, I saw this as a chance to do all of the heavy lifting myself. We can't always depend on third-party libraries or SDKs.

After searching around the web, I found out that using JavaScript regular expressions was a possibility.

What are JavaScript Regular Expressions?

JavaScript regular expressions (or regex) are used to find specific patterns within strings. Regex is commonly used to...

  • validate text (password validation when signing up)
  • search through text (find and replace features)

This was perfect, as I could write a pattern that extracts only the frontmatter of my blog and snippet posts.
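Both uses can be sketched in a few lines of TypeScript. The password rule and the helper names below (isValidPassword, americanize) are made up for illustration and are not taken from the blog's code:

```typescript
// Validation: at least 8 characters, with at least one letter and one digit.
// (This rule is a made-up example, not a recommendation.)
const passwordPattern = /^(?=.*[A-Za-z])(?=.*\d).{8,}$/;

function isValidPassword(password: string): boolean {
  return passwordPattern.test(password);
}

// Find and replace: swap every standalone "colour" for "color".
// \b is a word boundary, so "colourful" is left alone.
function americanize(text: string): string {
  return text.replace(/\bcolour\b/g, "color");
}

console.log(isValidPassword("hunter2")); // false (only 7 characters)
console.log(isValidPassword("hunter2hunter2")); // true
console.log(americanize("The colour red is colourful.")); // The color red is colourful.
```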

What is frontmatter?

The term "frontmatter" usually refers to the first section of a book, which contains the title, a preface, and much more. Frontmatter in the context of markdown is somewhat similar. Look at the lines between the two --- markers below:

---
title: Using JavaScript Regex to parse frontmatter
date: 2024-01-05
tags: JavaScript, Regex
---

# What are JavaScript Regular Expressions?

Frontmatter is the section between the two --- lines. It holds the title, date, tags, and much more. It matters because frontmatter serves as the metadata of a blog post: we can extract the title and publication date and show them to readers.

"Metadata is data that provides information about other data, but not the content of the data itself" - Wikipedia

Using regex to process frontmatter

I pass the raw content of a post (the whole thing, frontmatter plus the actual content) to a function that parses only the frontmatter. Here is the code:

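function processFrontmatter(rawContent: string) {
  // Lazily capture everything between the first pair of --- delimiters
  const frontmatterRegex = /---\s([\s\S]*?)\s---/
  const match = frontmatterRegex.exec(rawContent)
  // exec() returns null when nothing matches, so guard instead of asserting with `!`
  if (!match) throw new Error("No frontmatter found")
  const frontmatterPart = match[1]
  // One "key: value" string per line
  const frontMatter = frontmatterPart.trim().split("\n") // ['title: "Using regex to parse frontmatter"', 'date: "January 11, 2024"']
  ...
}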

The regular expression, /---\s([\s\S]*?)\s---/, is the most important part of this code. It captures the content between the first pair of --- delimiters: [\s\S] matches any character, including newlines, and the lazy *? stops at the first closing ---. We can check that a regular expression behaves as expected with online tools such as regexr.com. I tested this one there, and it selects only the frontmatter.
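To see why the lazy quantifier matters, here is a small sketch that runs the same pattern next to a greedy variant. The sample post and the captureFirst helper are made up for this illustration:

```typescript
// The pattern from the post, plus a greedy variant for comparison.
const lazyPattern = /---\s([\s\S]*?)\s---/;
const greedyPattern = /---\s([\s\S]*)\s---/;

// Hypothetical sample post whose body also contains a --- horizontal rule.
const post = [
  "---",
  "title: Using regex to parse frontmatter",
  "date: 2024-01-05",
  "---",
  "",
  "# Heading",
  "",
  "Body text with a horizontal rule below.",
  "",
  "---",
  "",
  "More body text.",
].join("\n");

// Returns the first capture group, or null when the pattern does not match.
function captureFirst(pattern: RegExp, text: string): string | null {
  const match = pattern.exec(text);
  return match ? match[1] ?? null : null;
}

const lazyCapture = captureFirst(lazyPattern, post);
const greedyCapture = captureFirst(greedyPattern, post);

console.log(lazyCapture); // only the title and date lines
console.log(greedyCapture); // runs on through "# Heading" up to the horizontal rule
console.log(captureFirst(lazyPattern, "no delimiters here")); // null
```

One caveat: the pattern is not anchored, so a post without frontmatter but with two horizontal rules would still match. If that matters, starting the pattern with ^ to pin it to the beginning of the string is one possible fix.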


From this point forward, I used the frontmatter as metadata within my blog.
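The rest of the function is elided in the code above, so the following is only a guess at one way to get from the split lines to usable metadata. The parseFrontmatterLines name and the Frontmatter type are hypothetical, and the sketch handles only flat "key: value" lines, not full YAML:

```typescript
type Frontmatter = Record<string, string>;

// Turns lines such as 'title: "Hello"' into { title: "Hello" }.
function parseFrontmatterLines(lines: string[]): Frontmatter {
  const result: Frontmatter = {};
  for (const line of lines) {
    // Split on the first colon only, so values may contain colons themselves
    const separatorIndex = line.indexOf(":");
    if (separatorIndex === -1) continue; // skip lines without a key
    const key = line.slice(0, separatorIndex).trim();
    // Strip one pair of optional surrounding quotes from the value
    const value = line
      .slice(separatorIndex + 1)
      .trim()
      .replace(/^["'](.*)["']$/, "$1");
    if (key) result[key] = value;
  }
  return result;
}

const metadata = parseFrontmatterLines([
  'title: "Using regex to parse frontmatter"',
  "date: 2024-01-05",
  "tags: JavaScript, Regex",
]);

console.log(metadata["title"]); // Using regex to parse frontmatter
// Tags arrive as one comma-separated string, so split them when needed
const tags = (metadata["tags"] ?? "").split(",").map((tag) => tag.trim());
console.log(tags); // an array holding "JavaScript" and "Regex"
```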

Using regular expressions for my project was extremely difficult. I found out that I wasn't the only one, because regex is notorious for its steep learning curve. However, this experience taught me how useful regex can be.

Thank you very much to Lee Robinson, whose blog was the inspiration for mine. I referred to his code and learned a lot :)

