Automating Blog Tagging with AI

Tagging is a Chore

Adding tags to blog posts is a surprisingly tedious task.

  • You have to consider consistency with existing tags.
  • When creating new tags, you want to maintain a consistent level of granularity (genre? theme? specific technology?).
  • You need to decide on labeling conventions for each language.
  • As the number of articles grows, reconciling tags with past entries becomes increasingly difficult.

"Rule-based classification" is an area where AI excels. I created a script to automate tagging using the Gemini API and fully automated the process by integrating it into my GitHub Actions pipeline.

tags.ts: The AI Tagging Script

I started by creating a script that runs locally. It does something simple: it passes the body of a Japanese article and the list of existing tags to Gemini, asking it to propose and apply appropriate tags. However, giving the AI total freedom to create tags would lead to chaos, so I established a few rules.

Tag Granularity Rules

I set up three levels of granularity for tags and provide this hierarchy to the AI.

Granularity   Meaning                Example
large         Genre/Category         blog, tech, entertainment
medium        Field/Theme            ai, web-development, anime
small         Specific Tech/Product  nextjs, vercel, supabase

The goal is to have at least 2 and at most 10 tags per article, with at least one tag per granularity level (though these are target goals, not hard rules, to avoid inflating tags on short articles).

Reconciliation (Preventing Duplication)

If you let AI create tags freely, duplicates like "ai" and "machine-learning" tend to emerge. I explicitly state in the prompt, "Do not create concepts that overlap with existing tags," allowing the AI to generate new tags only when the topic is truly new.
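The granularity hierarchy and the reconciliation rule both live in the prompt. A minimal sketch of how that prompt might be assembled (the helper name `buildTaggingPrompt` and the `ExistingTag` shape are assumptions, not the actual tags.ts API):

```typescript
interface ExistingTag {
  id: string;                                   // e.g. "nextjs"
  granularity: "large" | "medium" | "small";
  label_ja: string;                             // Japanese display label
}

// Assemble the tagging prompt: rules first, then existing tags, then the article.
function buildTaggingPrompt(articleBody: string, existingTags: ExistingTag[]): string {
  const tagList = existingTags
    .map((t) => `- ${t.id} [${t.granularity}] (${t.label_ja})`)
    .join("\n");

  return [
    "You are a blog tagging assistant.",
    "Granularity: large = genre, medium = field/theme, small = specific tech.",
    "Rules:",
    "- Prefer existing tags. Do not create concepts that overlap with existing tags.",
    "- Propose a new tag only when the topic is genuinely new.",
    "- Aim for 2-10 tags, ideally at least one per granularity level.",
    "",
    "Existing tags:",
    tagList,
    "",
    "Article:",
    articleBody,
  ].join("\n");
}
```

Listing each existing tag with its granularity gives the model a concrete target to reconcile against, rather than a vague "avoid duplicates" instruction.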

Japanese Labeling Conventions

I follow the conventions actually used in the Japanese tech community. I do not force transliterations for terms that are commonly used in English.

  • java → "Java" (O), "ジャバ" (X)
  • embedding → "Embedding" (O), "エンベディング" (△)
  • blog → "ブログ" (O)

(O = preferred, △ = acceptable but discouraged, X = avoid)

If I manually edit the tags.json file, the AI will respect it as the source of truth from then on. The design prioritizes human judgment at all times.
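The article does not show the schema of tags.json, but a minimal shape that supports the granularity and Japanese-label rules might look like this (the field names are assumptions):

```json
{
  "tech":   { "granularity": "large",  "label_ja": "技術" },
  "ai":     { "granularity": "medium", "label_ja": "AI" },
  "nextjs": { "granularity": "small",  "label_ja": "Next.js" }
}
```

Keeping the file flat and keyed by tag ID makes a manual edit trivial, which matters when the human is the source of truth.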

Mitigating LLM Output Variance

Since LLM output fluctuates slightly, I added validation (checking tag counts, tag ID existence, etc.) and a retry mechanism for when rules are violated. The trick is to separate rules into "hard rules" and "target goals" so as not to be overly restrictive.
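A sketch of what the validate-and-retry loop could look like, assuming a hypothetical `callGemini()` helper and a known set of existing tag IDs. Hard rules fail validation and trigger a retry; target goals would only be logged:

```typescript
interface TagResult {
  tags: string[];
  new_tags: Record<string, { granularity: string; label_ja: string }>;
}

// Hard rules only: tag count in range, and every tag ID must resolve.
function validate(result: TagResult, existingIds: Set<string>): string[] {
  const errors: string[] = [];
  if (result.tags.length < 2 || result.tags.length > 10) {
    errors.push(`tag count out of range: ${result.tags.length}`);
  }
  for (const id of result.tags) {
    if (!existingIds.has(id) && !(id in result.new_tags)) {
      errors.push(`unknown tag id: ${id}`);
    }
  }
  return errors;
}

// Re-ask the model until the hard rules pass, up to maxAttempts.
async function tagWithRetry(
  callGemini: () => Promise<TagResult>,
  existingIds: Set<string>,
  maxAttempts = 3,
): Promise<TagResult> {
  let lastErrors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await callGemini();
    lastErrors = validate(result, existingIds);
    if (lastErrors.length === 0) return result;
  }
  throw new Error(`validation failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`);
}
```

Splitting `validate` out of the retry loop keeps the "hard rule" list in one place, so loosening a rule into a target goal is a one-line change.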

Model Selection

Due to free tier constraints, I switch models based on the task. For tagging, I use a lightweight model suitable for high-volume execution (gemini-3.1-flash-lite-preview, 500 RPD). I manually review the quality of the overall tag system (checking for duplicates, granularity errors, or unnatural labels) separately, using a more capable model (gemini-2.5-flash) since this requires a broader perspective.

The Effect of Prompt Tuning

Let's compare the results of different prompts on the same article (Mechanisms for automatically verifying translation quality with AI).

Simple Prompt

When instructed simply with "Please add appropriate tags to this article":

ai, llm, translation, backtranslation, embedding,
vectorsearch, pgvector, postgresql, nlp, automation

This is full of problems:

  • Duplication: "ai", "llm", and "nlp" overlap conceptually.
  • Duplication: "pgvector" and "postgresql" are essentially the same, just differing in granularity.
  • Ignoring Existing Tags: "backtranslation" was created separately from the existing "translation".
  • Inconsistent Granularity: Genre tags and specific technology tags are listed indiscriminately.
  • Hits the Limit: Used all 10 available slots.

Prompt with Rules

When passing the existing tag list, granularity hierarchy, and reconciliation rules:

[medium] ai, translation
[small]  postgresql, embedding, gemini, supabase

Reduced to 6 tags with no duplicates. It prioritizes existing tags and proposes new ones only when necessary.

This difference exists because the nature of the task changes entirely when asking an AI to "classify freely" versus "classify within this rule system." The latter yields stable results because of the constraints.

Integration into GitHub Actions

With tags.ts running locally, the next step was integrating it into my GitHub Actions translation pipeline.

Design: Tagging → Translation → Combined Commit

My blog already had a system where pushing an article automatically triggers a translation. I added the tagging step beforehand.

push to posts/ja/
  → Detect slug of changed article
  → AI Tagging (tags.ts apply)       ← Added
  → Translation (translate.ts translate)
  → Select Best Translation (translate.ts best)
  → git commit & push all together

The key is combining the tag change and the translation change into a single commit: git add the updated frontmatter in posts/ja/, the English translation in posts/en/, and any new tags in content/tags.json, then commit once. Consolidating these prevents the commit history from becoming cluttered.
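A hedged sketch of the workflow, based on the pipeline above (step names, script paths, and the changed-slug detection are assumptions; the actual workflow may differ):

```yaml
name: tag-and-translate
on:
  push:
    paths: ["posts/ja/**"]

jobs:
  pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 2 }        # need HEAD^ to diff the changed article
      - uses: actions/setup-node@v4
        with: { node-version: 22 }
      - run: npm ci
      # Detect the changed article (detection method assumed)
      - run: echo "SLUG=$(git diff --name-only HEAD^ -- posts/ja | head -1)" >> "$GITHUB_ENV"
      - run: npx tsx tags.ts apply "$SLUG"           # AI tagging (the added step)
      - run: npx tsx translate.ts translate "$SLUG"
      - run: npx tsx translate.ts best "$SLUG"
      # Single combined commit: frontmatter + translation + tags.json
      - run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add posts/ja/ posts/en/ content/tags.json
          git commit -m "chore: auto-tag and translate" && git push
```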

This Article is the Test

This article was pushed with tags: [] (no tags). If the pipeline works correctly, the AI should read this article, automatically generate appropriate tags, and write them into the frontmatter.

If this article has tags, the automated tagging pipeline is working properly.

Bonus: Automating Meta Descriptions

Since the tagging mechanism has stabilized, I also started generating the HTML <meta name="description"> at the same time.

What is a meta description?

It is a summary of an article shown in search results or when shared on social media. It is important for SEO; Google search results display up to about 120 Japanese characters.

Usually, one either writes these manually every time or mechanically clips the beginning of the article—both are suboptimal. Manual writing is a chore, and clipping often includes headers like "## Introduction" or cuts off sentences mid-flow.

Generating Tags and Description in One API Call

I simply added, "Also, create a summary of under 120 characters" to my tagging prompt.

{
  "tags": ["tech", "ai", "gemini", "github-actions"],
  "new_tags": {},
  "description": "The 'granularity' and 'reconciliation' in blog tagging...",
  "reasoning": "..."
}

The number of API calls doesn't increase. Since I was already passing the full text for tag suggestions, adding a summary incurs zero additional cost and does not consume more of the free tier's RPD.

Respecting Manual Descriptions

If a description already exists in the frontmatter, it is not overwritten. Since there are cases where a manually written summary is better, the AI is designed to fill in the blank only when it's empty.
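The fill-only-when-blank rule is small enough to sketch directly (the `Frontmatter` shape and `mergeDescription` name are illustrative):

```typescript
interface Frontmatter {
  tags: string[];
  description?: string;
}

// A manually written description is the source of truth: never overwrite it.
// The AI-generated summary is used only when the field is absent or blank.
function mergeDescription(fm: Frontmatter, aiDescription: string): Frontmatter {
  if (fm.description && fm.description.trim() !== "") return fm;
  return { ...fm, description: aiDescription };
}
```

The same pattern applies to tags: the pipeline only fills fields a human left empty.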

Summary

  • Created an AI tagging script (tags.ts) using the Gemini API.
  • Implemented granularity rules, reconciliation, Japanese label generation, and validation with retries.
  • Integrated it into the GitHub Actions translation pipeline, consolidating everything into a single commit.
  • Automatically generate the meta description in the same API call as tagging (zero additional cost).