% cd ..

Building a Cyberpunk Blog with Next.js + Vercel

Motivation

I wanted to build a multilingual blog — not just translating content, but also automatically verifying translation quality with AI.

The approach is called "Back-Translation." A Japanese article is translated into English, then that English is translated back into Japanese. By comparing the original Japanese with the round-tripped version, a numerical measure of translation accuracy can be obtained. The comparison is automated by embedding both texts and computing cosine similarity with PostgreSQL's pgvector extension.

It would also be interesting to run multiple translation models (Gemini, Llama, DeepSeek, etc.) simultaneously and rank which model translates best.

Tech Stack

Here's the finalized stack. Everything can be run on a free tier to start.

  • Next.js 16 (App Router) — Frontend
  • Markdown — Articles managed in Git with frontmatter
  • Vercel — Auto build and deploy on GitHub push
  • Supabase — PostgreSQL + pgvector for translation data
  • Google Gemini API — Translation + embedding (free tier)
  • Tailwind CSS — Styling

Design Considerations

To CMS or Not to CMS

Originally, I planned to use Payload CMS for its WordPress-like writing experience.

However, with a GitHub push → Vercel auto-deploy workflow, adding a CMS layer felt unnecessary. The simplest approach turned out to be "write Markdown, push, done."
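That Markdown-first flow only needs frontmatter parsing at build time. As a minimal sketch of the idea (a real setup would typically use a library like gray-matter; the field names here are illustrative):

```typescript
// Minimal frontmatter parser sketch. Assumes frontmatter is delimited by
// "---" lines and contains simple "key: value" pairs — a simplification
// of what gray-matter or similar libraries handle for real.
interface Post {
  meta: Record<string, string>;
  body: string;
}

function parsePost(raw: string): Post {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: raw };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}
```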

Payload CMS is planned for a later phase, partly as a learning exercise with headless CMS. Having a working Markdown-based blog first makes it easier to appreciate the differences.

Hybrid Content Management

Translation management ended up with an interesting hybrid design.

  • Supabase (research) — Stores all translation results and scores from every model. Viewable on a dashboard for comparison
  • Git (production) — The best translation is auto-committed to posts/en/ as Markdown. This is what readers see

Readers only see the "best version" on Git. Behind the scenes, translations from multiple models accumulate, allowing quality trends to be tracked over time.
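The "best version" selection can be sketched like this. The record shape is hypothetical (the actual Supabase schema isn't shown in this post), but the logic is the hybrid design above: every row stays in Supabase, and only the top scorer gets committed to posts/en/.

```typescript
// Hypothetical shape of one translation result row in Supabase.
interface TranslationResult {
  model: string;
  translatedMarkdown: string;
  similarity: number; // cosine similarity of original vs. back-translated embedding
}

// Pick the highest-scoring translation — the one that would be committed
// to posts/en/ in Git. Assumes at least one result exists.
function pickBest(results: TranslationResult[]): TranslationResult {
  return results.reduce((best, r) => (r.similarity > best.similarity ? r : best));
}
```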

Two-Repository Structure

The project is split into a public repository (dazy-blog) and a private repository (dazy-blog-system).

  • Articles and the Next.js frontend are public — they're going to be published as a blog anyway
  • Translation pipeline, design docs, and API key configuration stay private

The translation pipeline lives in GitHub Actions on the private repo side, triggered by repository_dispatch when the public repo receives a push.
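The cross-repo trigger uses GitHub's repository_dispatch endpoint (POST /repos/{owner}/{repo}/dispatches). A sketch of the request the public repo's workflow could send — the owner name and the "article-pushed" event type are illustrative, not the actual configuration:

```typescript
// Build the repository_dispatch request that notifies the private repo.
// The caller would send it with fetch(req.url, req.init) from a workflow step.
function buildDispatchRequest(owner: string, repo: string, token: string, slug: string) {
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/dispatches`,
    init: {
      method: "POST",
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
      },
      // event_type is matched by the private repo's `on: repository_dispatch` filter
      body: JSON.stringify({
        event_type: "article-pushed",
        client_payload: { slug },
      }),
    },
  };
}
```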

Language Switching

Language switching uses URL prefixes.

/ja/posts/hello-world  → Japanese version
/en/posts/hello-world  → English version

Accessing / redirects to /ja. The [JA] [EN] buttons in the top-right header handle switching.
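The redirect rule is simple enough to express as a pure helper, which Next.js middleware could then wrap with NextResponse.redirect. The locale list and default come from the setup above; the helper itself is a sketch:

```typescript
// Locale prefix logic: /ja and /en pass through, everything else
// gets redirected under the default locale.
const LOCALES = ["ja", "en"];
const DEFAULT_LOCALE = "ja";

// Returns the redirect target, or null if the path already has a locale prefix.
function localeRedirect(pathname: string): string | null {
  const first = pathname.split("/")[1];
  if (LOCALES.includes(first)) return null;
  return `/${DEFAULT_LOCALE}${pathname === "/" ? "" : pathname}`;
}
```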

Multilingual Tags

Tags are managed by ID (lowercase English), with display names mapped per language in tags.json.

{
  "tech": { "ja": "テック", "en": "Tech" },
  "anime": { "ja": "アニメ", "en": "Anime" }
}

Frontmatter only contains the ID. Display names are resolved automatically based on the current language. Adding Chinese support in the future is just a matter of adding "zh" entries to tags.json.
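The lookup itself is a two-level dictionary read. A sketch mirroring the tags.json structure above, with a fallback to the raw ID when a language entry is missing (the fallback behavior is an assumption, not stated in the post):

```typescript
// Resolve a tag ID to its display name for the current language.
type TagTable = Record<string, Record<string, string>>;

const tags: TagTable = {
  tech: { ja: "テック", en: "Tech" },
  anime: { ja: "アニメ", en: "Anime" },
};

// Falls back to the ID itself if no mapping exists for the language.
function tagLabel(table: TagTable, id: string, lang: string): string {
  return table[id]?.[lang] ?? id;
}
```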

Translation Quality Verification Concept

The most exciting part of this blog is the automated translation quality verification.

Basic Flow

  1. Translate the original (Japanese) to English using an AI model
  2. Translate the English back to Japanese using the same model (back-translation)
  3. Generate embeddings for both the original and back-translated Japanese (Gemini Embedding)
  4. Calculate cosine similarity with pgvector
  5. Higher score = better meaning preservation = better translation

The key insight is that no reference translation is needed. Traditional metrics like BLEU require a human-made reference translation for comparison, but this approach is self-contained through the round-trip.
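Steps 3–4 boil down to one formula. In the actual pipeline pgvector computes this in SQL (its <=> operator gives cosine distance, i.e. 1 minus similarity); this TypeScript version just illustrates the math on two embedding vectors:

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// A score near 1 means the back-translation preserved the meaning well.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```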

Multi-Model Comparison

The same source text is translated by Gemini, Llama (Groq), DeepSeek, and others, with scores ranked. Casual blog writing with its free-form style should reveal each model's strengths and weaknesses quite clearly.

With three models (Gemini, Llama, and DeepSeek) available on free tiers, the plan is to start experimenting right away. The embedding model is unified to Gemini — using different "rulers" would make fair comparison impossible.

eof