How to Support Multiple Languages on Jekyll Blog with Polyglot (2) - Troubleshooting Chirpy Theme Build Failures and Search Function Errors

This post introduces the process of implementing multilingual support by applying the Polyglot plugin to a Jekyll blog based on 'jekyll-theme-chirpy'. This is the second post in the series, covering the identification and resolution of errors that occur when applying Polyglot to the Chirpy theme.

Overview

About four months ago, in early July 12024 of the Holocene calendar, I added multilingual support to this Jekyll-based blog hosted on GitHub Pages by applying the Polyglot plugin. This series shares the bugs I encountered while applying Polyglot to the Chirpy theme, their solutions, and how to write HTML headers and sitemap.xml with SEO in mind. The series consists of two posts, and this is the second.

Requirements

  • The built result (web pages) should be provided in language-specific paths (e.g., /posts/ko/, /posts/ja/).
  • To minimize the additional time and effort required for multilingual support, the system should automatically recognize the language based on the local path (e.g., /_posts/ko/, /_posts/ja/) without having to specify ‘lang’ and ‘permalink’ tags in the YAML front matter of each original markdown file.
  • The header section of each page on the site should include appropriate Content-Language meta tags and hreflang alternate tags to meet Google’s multilingual search SEO guidelines.
  • The site should provide links to all pages supporting each language without omissions in sitemap.xml, and sitemap.xml itself should exist only once in the root path without duplication.
  • All features provided by the Chirpy theme should work properly on each language page, and if not, they should be modified to work properly.
    • ‘Recently Updated’, ‘Trending Tags’ features working properly
    • No errors during the build process using GitHub Actions
    • Blog post search function in the upper right corner working properly

Before We Begin

This post is a continuation of Part 1, so if you haven’t read it yet, I recommend reading the previous post first.

Troubleshooting (‘relative_url_regex’: target of repeat operator is not specified)

After completing the previous steps, when I ran the bundle exec jekyll serve command to test the build, it failed with the error 'relative_url_regex': target of repeat operator is not specified.

...(omitted)
                    ------------------------------------------------
      Jekyll 4.3.4   Please append `--trace` to the `serve` command 
                     for any additional information or backtrace. 
                    ------------------------------------------------
/Users/yunseo/.gem/ruby/3.2.2/gems/jekyll-polyglot-1.8.1/lib/jekyll/polyglot/
patches/jekyll/site.rb:234:in `relative_url_regex': target of repeat operator 
is not specified: /href="?\/((?:(?!*.gem)(?!*.gemspec)(?!tools)(?!README.md)(
?!LICENSE)(?!*.config.js)(?!rollup.config.js)(?!package*.json)(?!.sass-cache)
(?!.jekyll-cache)(?!gemfiles)(?!Gemfile)(?!Gemfile.lock)(?!node_modules)(?!ve
ndor\/bundle\/)(?!vendor\/cache\/)(?!vendor\/gems\/)(?!vendor\/ruby\/)(?!en\/
)(?!ko\/)(?!es\/)(?!pt-BR\/)(?!ja\/)(?!fr\/)(?!de\/)[^,'"\s\/?.]+\.?)*(?:\/[^
\]\[)("'\s]*)?)"/ (RegexpError)

...(omitted)

After searching to see if similar issues had been reported, I found that exactly the same issue had already been registered in the Polyglot repository, and a solution existed.

The Chirpy theme’s _config.yml file contains the following exclude setting:

exclude:
  - "*.gem"
  - "*.gemspec"
  - docs
  - tools
  - README.md
  - LICENSE
  - "*.config.js"
  - package*.json

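The problem lies in the following two functions in Polyglot’s site.rb, which interpolate each exclude entry verbatim into a regular expression. A globbing pattern with a leading wildcard, such as "*.gem", "*.gemspec", or "*.config.js", becomes a lookahead like (?!*.gem), in which * is a repeat operator with nothing before it to repeat.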

    # a regex that matches relative urls in a html document
    # matches href="baseurl/foo/bar-baz" href="/foo/bar-baz" and others like it
    # avoids matching excluded files.  prepare makes sure
    # that all @exclude dirs have a trailing slash.
    def relative_url_regex(disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          regex += "(?!#{x})"
        end
        @languages.each do |x|
          regex += "(?!#{x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end

    # a regex that matches absolute urls in a html document
    # matches href="http://baseurl/foo/bar-baz" and others like it
    # avoids matching excluded files.  prepare makes sure
    # that all @exclude dirs have a trailing slash.
    def absolute_url_regex(url, disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          regex += "(?!#{x})"
        end
        @languages.each do |x|
          regex += "(?!#{x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end
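
The failure can be reproduced outside Jekyll. The following is a minimal sketch, not Polyglot's own code: it wraps each entry of the exclude list above in the same kind of negative lookahead and reports the entries that Ruby refuses to compile. The helper name broken_entries is made up for illustration.

```ruby
# Entries copied from the Chirpy theme's `exclude` setting shown earlier.
EXCLUDE = ["*.gem", "*.gemspec", "docs", "tools", "README.md",
           "LICENSE", "*.config.js", "package*.json"].freeze

# Hypothetical helper: returns the entries that raise RegexpError when
# interpolated verbatim into a lookahead, as the two functions above do.
def broken_entries(entries)
  entries.select do |entry|
    begin
      Regexp.new("(?!#{entry})")
      false
    rescue RegexpError
      true
    end
  end
end

puts broken_entries(EXCLUDE).inspect
```

Only the entries that start with * fail to compile. package*.json gets through because its * has a preceding character to repeat, although as a regex it no longer means what the glob meant.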

There are two ways to solve this problem.

1. Fork Polyglot and modify the problematic parts

As of the time of writing this post (11.12024), the Jekyll official documentation states that the exclude setting supports globbing patterns.

“This configuration option supports Ruby’s File.fnmatch filename globbing patterns to match multiple entries to exclude.”

In other words, the root cause is not in the Chirpy theme but in Polyglot’s relative_url_regex() and absolute_url_regex() functions, so the fundamental solution is to modify them to prevent the problem.

Since this bug has not yet been fixed in Polyglot, you can fork the Polyglot repository with reference to this blog post and the answer to the previous GitHub issue, modify the problematic parts as follows, and use it instead of the original Polyglot.

    def relative_url_regex(disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x})"
        end
        @languages.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end

    def absolute_url_regex(url, disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x})"
        end
        @languages.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end
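
What Regexp.escape changes can be checked in isolation. This sketch assumes nothing about Polyglot beyond the lookahead string built in the patch above. Note that the escaped pattern is matched literally, so the patch stops the crash but does not reproduce File.fnmatch glob semantics inside the regex.

```ruby
pattern = "*.gem"
escaped = Regexp.escape(pattern)          # backslash-escapes `*` and `.`
lookahead = Regexp.new("(?!#{escaped})")  # now compiles without RegexpError

# The escaped form matches the literal text "*.gem" only ...
literal_hit = "*.gem".match?(/\A#{escaped}\z/)
glob_like_hit = "foo.gem".match?(/\A#{escaped}\z/)

# ... whereas Jekyll's `exclude` treats the entry as a File.fnmatch glob.
fnmatch_hit = File.fnmatch(pattern, "foo.gem")

puts lookahead.source
puts [literal_hit, glob_like_hit, fnmatch_hit].inspect
```

Whether the literal match matters in practice depends on whether the site ever links to an excluded file; for files like a .gemspec that is unlikely.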

2. Replace globbing patterns with exact filenames in the Chirpy theme’s ‘_config.yml’ configuration file

The proper and ideal solution would be for the above patch to be merged into upstream Polyglot. Until then, however, you would have to use a forked version, which is cumbersome because the fork must be kept in sync with upstream Polyglot updates. Therefore, I took a different approach.

If you check the files in the root path of the Chirpy theme repository that match the patterns "*.gem", "*.gemspec", and "*.config.js", there are only 3 files:

  • jekyll-theme-chirpy.gemspec
  • purgecss.config.js
  • rollup.config.js

Therefore, you can delete the globbing patterns in the exclude section of the _config.yml file and replace them as follows so that Polyglot can process them without issues.

exclude: # Modified with reference to https://github.com/untra/polyglot/issues/204
  # - "*.gem"
  - jekyll-theme-chirpy.gemspec # - "*.gemspec"
  - tools
  - README.md
  - LICENSE
  - purgecss.config.js # - "*.config.js"
  - rollup.config.js
  - package*.json
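
Finding the exact filenames behind each glob can also be done mechanically. The sketch below is an illustration under assumptions: glob_matches is a hypothetical helper, Dir.glob is used as an approximation of the File.fnmatch matching that Jekyll applies, and a temporary directory stands in for the theme's root path.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: for each entry containing glob characters, list the
# files in `root` it matches, so they can be written out by name instead.
def glob_matches(patterns, root)
  patterns.select { |p| p.match?(/[*?\[{]/) }
          .to_h { |p| [p, Dir.glob(p, base: root).sort] }
end

# Demo against a throwaway directory standing in for the theme root.
result = Dir.mktmpdir do |root|
  %w[jekyll-theme-chirpy.gemspec purgecss.config.js rollup.config.js].each do |name|
    FileUtils.touch(File.join(root, name))
  end
  glob_matches(["*.gem", "*.gemspec", "tools", "*.config.js"], root)
end

puts result.inspect
```

A pattern that maps to an empty list, like "*.gem" here, can simply be dropped from the exclude section.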

Modifying the Search Function

After completing the previous steps, almost all site functions worked as intended. However, I later discovered that the search bar in the upper right corner of pages using the Chirpy theme did not index pages in languages other than site.default_lang (English in the case of this blog). Even when searching in a language other than English, the results still showed the English pages.

To understand the cause, let’s look at what files are involved in the search function and where the problem occurs.

‘_layouts/default.html’

Looking at the _layouts/default.html file that forms the template for all pages on the blog, we can see that it loads the contents of search-results.html and search-loader.html inside the <body> element.

  <body>
    {% include sidebar.html lang=lang %}

    <div id="main-wrapper" class="d-flex justify-content-center">
      <div class="container d-flex flex-column px-xxl-5">
        
        (...omitted...)

        {% include_cached search-results.html lang=lang %}
      </div>

      <aside aria-label="Scroll to Top">
        <button id="back-to-top" type="button" class="btn btn-lg btn-box-shadow">
          <i class="fas fa-angle-up"></i>
        </button>
      </aside>
    </div>

    (...omitted...)

    {% include_cached search-loader.html lang=lang %}
  </body>

‘_includes/search-results.html’

_includes/search-results.html creates the search-results container that holds the results for the keywords entered in the search box.

<!-- The Search results -->

<div id="search-result-wrapper" class="d-flex justify-content-center d-none">
  <div class="col-11 content">
    <div id="search-hints">
      {% include_cached trending-tags.html %}
    </div>
    <div id="search-results" class="d-flex flex-wrap justify-content-center text-muted mt-3"></div>
  </div>
</div>

‘_includes/search-loader.html’

_includes/search-loader.html is the core part that implements search based on the Simple-Jekyll-Search library. It executes JavaScript in the visitor’s browser to find matches for input keywords in the search.json index file and returns the corresponding post links as <article> elements, operating on the client side.

{% capture result_elem %}
  <article class="px-1 px-sm-2 px-lg-4 px-xl-0">
    <header>
      <h2><a href="{url}">{title}</a></h2>
      <div class="post-meta d-flex flex-column flex-sm-row text-muted mt-1 mb-1">
        {categories}
        {tags}
      </div>
    </header>
    <p>{snippet}</p>
  </article>
{% endcapture %}

{% capture not_found %}<p class="mt-5">{{ site.data.locales[include.lang].search.no_results }}</p>{% endcapture %}

<script>
  {% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
  document.addEventListener('DOMContentLoaded', () => {
    SimpleJekyllSearch({
      searchInput: document.getElementById('search-input'),
      resultsContainer: document.getElementById('search-results'),
      json: '{{ '/assets/js/data/search.json' | relative_url }}',
      searchResultTemplate: '{{ result_elem | strip_newlines }}',
      noResultsText: '{{ not_found }}',
      templateMiddleware: function(prop, value, template) {
        if (prop === 'categories') {
          if (value === '') {
            return `${value}`;
          } else {
            return `<div class="me-sm-4"><i class="far fa-folder fa-fw"></i>${value}</div>`;
          }
        }

        if (prop === 'tags') {
          if (value === '') {
            return `${value}`;
          } else {
            return `<div><i class="fa fa-tag fa-fw"></i>${value}</div>`;
          }
        }
      }
    });
  });
</script>

‘/assets/js/data/search.json’

---
layout: compress
swcache: true
---

[
  {% for post in site.posts %}
  {
    "title": {{ post.title | jsonify }},
    "url": {{ post.url | relative_url | jsonify }},
    "categories": {{ post.categories | join: ', ' | jsonify }},
    "tags": {{ post.tags | join: ', ' | jsonify }},
    "date": "{{ post.date }}",
    {% include no-linenos.html content=post.content %}
    {% assign _content = content | strip_html | strip_newlines %}
    "snippet": {{ _content | truncate: 200 | jsonify }},
    "content": {{ _content | jsonify }}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]

This file uses Jekyll’s Liquid syntax to define a JSON file containing, for every post on the site, the title, URL, categories and tags, date, a snippet of the content truncated to 200 characters, and the full content.
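
To make the resulting structure concrete, here is a small sketch that parses an index and flags entries lacking any of the fields the template emits. The helper incomplete_entries and the sample entries are invented for illustration; a real check would read the built search.json instead of the inline sample.

```ruby
require "json"

# Field names taken from the search.json template above.
REQUIRED_KEYS = %w[title url categories tags date snippet content].freeze

# Hypothetical helper: returns the entries missing at least one field.
def incomplete_entries(json_text)
  JSON.parse(json_text).reject { |entry| (REQUIRED_KEYS - entry.keys).empty? }
end

# Made-up sample data standing in for a built index file.
sample = <<~JSON
  [
    {"title": "Example", "url": "/posts/example-post/", "categories": "",
     "tags": "", "date": "2024-07-01 00:00:00 +0900",
     "snippet": "...", "content": "..."},
    {"title": "Broken", "url": "/posts/broken/"}
  ]
JSON

puts incomplete_entries(sample).map { |entry| entry["title"] }.inspect
```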

Search Function Structure and Problem Identification

To summarize, when hosting the Chirpy theme on GitHub Pages, the search function operates through the following process:

stateDiagram
  state "Changes" as CH
  state "Build start" as BLD
  state "Create search.json" as IDX
  state "Static Website" as DEP
  state "In Test" as TST
  state "Search Loader" as SCH
  state "Results" as R
    
  [*] --> CH: Make Changes
  CH --> BLD: Commit & Push origin
  BLD --> IDX: jekyll build
  IDX --> TST: Build Complete
  TST --> CH: Error Detected
  TST --> DEP: Deploy
  DEP --> SCH: Search Input
  SCH --> R: Return Results
  R --> [*]

I confirmed that search.json is created for each language by Polyglot as follows:

  • /assets/js/data/search.json
  • /ko/assets/js/data/search.json
  • /es/assets/js/data/search.json
  • /pt-BR/assets/js/data/search.json
  • /ja/assets/js/data/search.json
  • /fr/assets/js/data/search.json
  • /de/assets/js/data/search.json

Therefore, the problematic part is the “Search Loader”. The issue of non-English pages not being searchable occurs because _includes/search-loader.html statically loads only the English index file (/assets/js/data/search.json) regardless of the language of the page being visited.

  • However, unlike Markdown or HTML files, in JSON files the Polyglot wrappers for Jekyll-provided variables such as post.title and post.content work, but the Relativized Local URLs feature does not seem to.
  • Similarly, I confirmed during testing that JSON file templates cannot access the additional Liquid variables provided by Polyglot, such as {{ site.default_lang }} and {{ site.active_lang }}; only the variables provided by Jekyll itself are available.

Therefore, while values like title, snippet, and content in the index file are generated differently for each language, the url value returns the default path without considering the language, and appropriate handling needs to be added to the “Search Loader” part.
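
The rule the loader has to follow is small: no prefix for the default language, and "/<lang>" in front of the index path for every other language. The sketch below expresses that rule in Ruby purely as an illustration; search_index_path is a hypothetical helper, the language list is taken from the error log earlier in this post, and _site is assumed to be the build output directory.

```ruby
# Languages as they appear in the error log earlier in this post; "en" is
# this blog's default language.
LANGUAGES = %w[en ko es pt-BR ja fr de].freeze
DEFAULT_LANG = "en"

# Hypothetical helper mirroring the rule: no prefix for the default language,
# "/<lang>" for every other language.
def search_index_path(lang, default_lang = DEFAULT_LANG)
  prefix = lang == default_lang ? "" : "/#{lang}"
  "#{prefix}/assets/js/data/search.json"
end

paths = LANGUAGES.map { |lang| search_index_path(lang) }
puts paths

# After a local build, the same list can be checked against the output
# directory (assumed here to be Jekyll's default, `_site`).
missing = paths.reject { |path| File.exist?(File.join("_site", path)) }
puts "missing: #{missing.inspect}"
```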

Solution

To solve this, modify the content of _includes/search-loader.html as follows:

{% capture result_elem %}
  <article class="px-1 px-sm-2 px-lg-4 px-xl-0">
    <header>
      {% if site.active_lang != site.default_lang %}
      <h2><a {% static_href %}href="/{{ site.active_lang }}{url}"{% endstatic_href %}>{title}</a></h2>
      {% else %}
      <h2><a href="{url}">{title}</a></h2>
      {% endif %}

(...omitted...)

<script>
  {% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
  document.addEventListener('DOMContentLoaded', () => {
    {% assign search_path = '/assets/js/data/search.json' %}
    {% if site.active_lang != site.default_lang %}
      {% assign search_path = '/' | append: site.active_lang | append: search_path %}
    {% endif %}
    
    SimpleJekyllSearch({
      searchInput: document.getElementById('search-input'),
      resultsContainer: document.getElementById('search-results'),
      json: '{{ search_path | relative_url }}',
      searchResultTemplate: '{{ result_elem | strip_newlines }}',

(...omitted...)

  • In the {% capture result_elem %} section, I modified the Liquid syntax to add the prefix "/{{ site.active_lang }}" in front of the post URL loaded from the JSON file whenever site.active_lang (the current page language) differs from site.default_lang (the site default language).
  • Similarly, in the <script> section, the current page language is compared with the site default language at build time; search_path is set to the default path (/assets/js/data/search.json) if they are the same, or to the language-specific path (e.g., /ko/assets/js/data/search.json) if they differ.

After making these modifications and rebuilding the website, I confirmed that search results are displayed correctly for each language.

Because {url} is a placeholder for the URL value read from the JSON file, not a URL itself, Polyglot does not recognize it as a localization target, so the language prefix has to be added by hand. The catch is that "/{{ site.active_lang }}{url}" is recognized as a URL: although it is already localized, Polyglot does not know that and tries to localize it again (e.g., "/ko/ko/posts/example-post"). To prevent this, I wrapped the link in the {% static_href %} tag.
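
The double prefix can be illustrated with a simplified model. This is not Polyglot's implementation: the pattern is the one from relative_url_regex shown earlier (with an empty baseurl and no exclude entries), and the rewriting step, which prepends "/<active_lang>/" to every match, is an assumption made for the sake of the example.

```ruby
LANGS = %w[en ko es pt-BR ja fr de].freeze

# Simplified stand-in for Polyglot's relativization: uses the pattern from
# relative_url_regex (empty baseurl, no exclude entries) and ASSUMES that
# matched links are rewritten by prepending "/<active_lang>/".
def relativize(html, active_lang, languages = LANGS)
  lookaheads = languages.map { |l| "(?!#{Regexp.escape(l)}/)" }.join
  regex = %r{href="?/((?:#{lookaheads}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
  html.gsub(regex) { %(href="/#{active_lang}/#{Regexp.last_match(1)}") }
end

puts relativize('<a href="{url}">', "ko")         # no leading slash: untouched
puts relativize('<a href="/ko/posts/a/">', "ko")  # starts with "ko/": skipped
puts relativize('<a href="/ko{url}">', "ko")      # "ko{" is not "ko/": prefixed again
```

Under this model, the hand-written "/ko{url}" slips past the (?!ko/) lookahead because the character after the language code is "{" rather than "/" at build time, which is consistent with the "/ko/ko/..." links described above.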

This post is licensed under CC BY-NC 4.0 by the author.