How to Support Multiple Languages on Jekyll Blog with Polyglot (2) - Troubleshooting Chirpy Theme Build Failure and Search Function Errors

This post introduces the process of implementing multilingual support by applying the Polyglot plugin to a Jekyll blog based on 'jekyll-theme-chirpy'. This is the second post in the series, covering the identification and resolution of errors that occurred when applying Polyglot to the Chirpy theme.

Overview

About four months ago, in early July 2024, I added multilingual support to this blog, which is built with Jekyll and hosted on GitHub Pages, by applying the Polyglot plugin. This series covers the bugs I ran into while applying Polyglot to the Chirpy theme, how I resolved them, and how to write the HTML header and sitemap.xml with SEO in mind. The series consists of two posts, and this is the second.

Requirements

  • The built result (web pages) should be provided in language-specific paths (e.g., /posts/ko/, /posts/ja/).
  • To minimize the extra time and effort that multilingual support requires, the language should be detected automatically at build time from the local path of the source markdown file (e.g., /_posts/ko/, /_posts/ja/), without specifying ‘lang’ and ‘permalink’ tags in the YAML front matter of each file.
  • The header of each page on the site should include appropriate Content-Language meta tags and hreflang alternate tags to meet Google’s multilingual search SEO guidelines.
  • All page links supporting each language on the site should be provided in sitemap.xml without omission, and sitemap.xml itself should exist only once in the root path without duplication.
  • All functions provided by the Chirpy theme should work normally on each language page, and if not, they should be modified to work properly.
    • ‘Recently Updated’, ‘Trending Tags’ functions working normally
    • No errors occurring during the build process using GitHub Actions
    • Post search function in the upper right corner of the blog working normally

Before We Start

This post is a continuation of Part 1, so if you haven’t read it yet, it’s recommended to read the previous post first.

Troubleshooting (‘relative_url_regex’: target of repeat operator is not specified)

After proceeding with the previous steps, when I ran the bundle exec jekyll serve command to test the build, it failed with the error 'relative_url_regex': target of repeat operator is not specified.

...(omitted)
                    ------------------------------------------------
      Jekyll 4.3.4   Please append `--trace` to the `serve` command 
                     for any additional information or backtrace. 
                    ------------------------------------------------
/Users/yunseo/.gem/ruby/3.2.2/gems/jekyll-polyglot-1.8.1/lib/jekyll/polyglot/
patches/jekyll/site.rb:234:in `relative_url_regex': target of repeat operator 
is not specified: /href="?\/((?:(?!*.gem)(?!*.gemspec)(?!tools)(?!README.md)(
?!LICENSE)(?!*.config.js)(?!rollup.config.js)(?!package*.json)(?!.sass-cache)
(?!.jekyll-cache)(?!gemfiles)(?!Gemfile)(?!Gemfile.lock)(?!node_modules)(?!ve
ndor\/bundle\/)(?!vendor\/cache\/)(?!vendor\/gems\/)(?!vendor\/ruby\/)(?!en\/
)(?!ko\/)(?!es\/)(?!pt-BR\/)(?!ja\/)(?!fr\/)(?!de\/)[^,'"\s\/?.]+\.?)*(?:\/[^
\]\[)("'\s]*)?)"/ (RegexpError)

...(omitted)

After searching to see if a similar issue had been reported, I found that exactly the same issue had already been registered in the Polyglot repository, and a solution existed as well.

The Chirpy theme’s _config.yml file that this blog is using contains the following statement:

exclude:
  - "*.gem"
  - "*.gemspec"
  - docs
  - tools
  - README.md
  - LICENSE
  - "*.config.js"
  - package*.json

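The problem arises because the following two functions in Polyglot’s site.rb file interpolate each exclude entry directly into a regular expression. A globbing pattern that starts with a wildcard, such as "*.gem", "*.gemspec", or "*.config.js" above, then produces a fragment like (?!*.gem), in which the * repeat operator has nothing before it to repeat, so the regular expression fails to compile.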

    # a regex that matches relative urls in a html document
    # matches href="baseurl/foo/bar-baz" href="/foo/bar-baz" and others like it
    # avoids matching excluded files.  prepare makes sure
    # that all @exclude dirs have a trailing slash.
    def relative_url_regex(disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          regex += "(?!#{x})"
        end
        @languages.each do |x|
          regex += "(?!#{x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end

    # a regex that matches absolute urls in a html document
    # matches href="http://baseurl/foo/bar-baz" and others like it
    # avoids matching excluded files.  prepare makes sure
    # that all @exclude dirs have a trailing slash.
    def absolute_url_regex(url, disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          regex += "(?!#{x})"
        end
        @languages.each do |x|
          regex += "(?!#{x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end
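
The failure can be reproduced outside Jekyll. The following Ruby sketch interpolates an exclude entry into a negative lookahead the same way the two functions above do. The helper names lookahead_source and compiles? are made up for this illustration and are not part of Polyglot.

```ruby
# Build the same "(?!entry)" fragment that the functions above build.
def lookahead_source(entry)
  "(?!#{entry})"
end

# True if the fragment is a valid regular expression, false otherwise.
def compiles?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

puts compiles?(lookahead_source('tools')) # plain name: valid fragment
puts compiles?(lookahead_source('*.gem')) # leading *: invalid fragment

begin
  Regexp.new(lookahead_source('*.gem'))
rescue RegexpError => e
  puts e.message # same kind of message as in the build log
end
```

An entry with a leading * leaves the repeat operator without a target, which is what the error message in the build log reports.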

There are two ways to solve this problem.

1. Fork Polyglot and use it after modifying the problematic parts

As of the time of writing this post (November 2024), the Jekyll official documentation states that the exclude configuration supports the use of globbing patterns.

“This configuration option supports Ruby’s File.fnmatch filename globbing patterns to match multiple entries to exclude.”

In other words, the cause of the problem is not in the Chirpy theme but in the relative_url_regex() and absolute_url_regex() functions of Polyglot, so modifying them to prevent the problem from occurring is the fundamental solution.

Since this bug has not yet been fixed in Polyglot, you can fork the Polyglot repository, modify the problematic parts as follows (based on this blog post and the answer to the GitHub issue mentioned above), and use the fork in place of the original Polyglot.

    def relative_url_regex(disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x})"
        end
        @languages.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end

    def absolute_url_regex(url, disabled = false)
      regex = ''
      unless disabled
        @exclude.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x})"
        end
        @languages.each do |x|
          escaped_x = Regexp.escape(x)
          regex += "(?!#{escaped_x}\/)"
        end
      end
      start = disabled ? 'ferh' : 'href'
      %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
    end
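
To see what the escaping changes, the Ruby sketch below rebuilds the relative-URL regex with Regexp.escape for a shortened exclude and language list and tries it on a few links. The method name build_relative_url_regex, the sample lists, and the sample links are assumptions for illustration, and baseurl is taken to be empty.

```ruby
EXCLUDE   = ['*.gem', '*.gemspec', 'tools'].freeze
LANGUAGES = %w[en ko pt-BR].freeze

# Same regex shape as the patched relative_url_regex, with an empty baseurl.
def build_relative_url_regex(exclude, languages)
  lookaheads = exclude.map { |x| "(?!#{Regexp.escape(x)})" } +
               languages.map { |x| "(?!#{Regexp.escape(x)}/)" }
  %r{href="?/((?:#{lookaheads.join}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end

regex = build_relative_url_regex(EXCLUDE, LANGUAGES)

puts Regexp.escape('*.gem')                              # wildcard becomes a literal character
puts regex.match?('href="/posts/hello/"')                # ordinary link: matched
puts regex.match?('href="/tools/run.sh"')                # excluded directory: skipped
puts regex.match?('href="/ko/posts/hello/"')             # already has a language prefix: skipped
puts regex.match?('href="/jekyll-theme-chirpy.gemspec"') # glob is not expanded: still matched
```

Escaping makes the regex compile, but each entry is then compared as literal text rather than as a glob, so in this sketch the "*.gemspec" entry does not cause a link to jekyll-theme-chirpy.gemspec to be skipped.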

2. Replace globbing patterns with exact file names in the ‘_config.yml’ configuration file of the Chirpy theme

In fact, the proper and ideal method would be for the above patch to be merged into Polyglot upstream. Until then, however, you would have to use a forked version, and keeping the fork in sync every time upstream Polyglot releases a new version is cumbersome. Therefore, I used a different method.

If you look in the project root of the Chirpy theme repository for files that match the "*.gem", "*.gemspec", and "*.config.js" patterns, there are only three anyway:

  • jekyll-theme-chirpy.gemspec
  • purgecss.config.js
  • rollup.config.js

Therefore, if you delete the globbing patterns in the exclude statement of the _config.yml file and replace them as follows, Polyglot can process them without any problems.

exclude: # Modified referring to the issue https://github.com/untra/polyglot/issues/204.
  # - "*.gem"
  - jekyll-theme-chirpy.gemspec # - "*.gemspec"
  - tools
  - README.md
  - LICENSE
  - purgecss.config.js # - "*.config.js"
  - rollup.config.js
  - package*.json
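
A quick way to check an exclude list before building is to try each entry as a lookahead, as in the Ruby sketch below. The helper name entries_breaking_regex is hypothetical, and the list is copied from the modified configuration above.

```ruby
require 'yaml'

CONFIG = <<~YML
  exclude:
    - jekyll-theme-chirpy.gemspec
    - tools
    - README.md
    - LICENSE
    - purgecss.config.js
    - rollup.config.js
    - package*.json
YML

# Returns the entries that cannot be interpolated into "(?!...)" as they are.
def entries_breaking_regex(entries)
  entries.select do |entry|
    begin
      Regexp.new("(?!#{entry})")
      false
    rescue RegexpError
      true
    end
  end
end

exclude = YAML.safe_load(CONFIG).fetch('exclude')
p entries_breaking_regex(exclude)                  # modified list
p entries_breaking_regex(['*.gem', '*.config.js']) # original glob entries

# "package*.json" compiles because * follows a character, but it is then read
# as a regex: "packag", any number of "e", any one character, "json".
p Regexp.new('package*.json').match?('package.json')
p Regexp.new('package*.json').match?('package-lock.json')
p File.fnmatch('package*.json', 'package-lock.json')
```

The remaining package*.json entry therefore no longer stops the build, although as a regex fragment it does not mean the same thing as the glob.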

Modifying the Search Function

With the steps so far, almost all site functions worked as intended. However, I later discovered that the search bar in the upper right corner of pages using the Chirpy theme could not index pages in languages other than site.default_lang (English in the case of this blog), and that searching on a non-English page returned English pages as results.

To understand the cause, let’s look at what files are involved in the search function and where the problem occurs.

‘_layouts/default.html’

If you check the _layouts/default.html file that forms the framework for all pages in the blog, you can see that it loads the contents of search-results.html and search-loader.html inside the <body> element.

  <body>
    {% include sidebar.html lang=lang %}

    <div id="main-wrapper" class="d-flex justify-content-center">
      <div class="container d-flex flex-column px-xxl-5">
        
        (...omitted...)

        {% include_cached search-results.html lang=lang %}
      </div>

      <aside aria-label="Scroll to Top">
        <button id="back-to-top" type="button" class="btn btn-lg btn-box-shadow">
          <i class="fas fa-angle-up"></i>
        </button>
      </aside>
    </div>

    (...omitted...)

    {% include_cached search-loader.html lang=lang %}
  </body>

‘_includes/search-results.html’

_includes/search-results.html constructs the search-results container, which holds the results for the keyword entered in the search box.

<!-- The Search results -->

<div id="search-result-wrapper" class="d-flex justify-content-center d-none">
  <div class="col-11 content">
    <div id="search-hints">
      {% include_cached trending-tags.html %}
    </div>
    <div id="search-results" class="d-flex flex-wrap justify-content-center text-muted mt-3"></div>
  </div>
</div>

‘_includes/search-loader.html’

_includes/search-loader.html is the core part that implements search using the Simple-Jekyll-Search library. It runs on the client side: JavaScript executed in the visitor’s browser looks through the search.json index file for content that matches the input keyword and returns the link to each matching post as an <article> element.

{% capture result_elem %}
  <article class="px-1 px-sm-2 px-lg-4 px-xl-0">
    <header>
      <h2><a href="{url}">{title}</a></h2>
      <div class="post-meta d-flex flex-column flex-sm-row text-muted mt-1 mb-1">
        {categories}
        {tags}
      </div>
    </header>
    <p>{snippet}</p>
  </article>
{% endcapture %}

{% capture not_found %}<p class="mt-5">{{ site.data.locales[include.lang].search.no_results }}</p>{% endcapture %}

<script>
  {% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
  document.addEventListener('DOMContentLoaded', () => {
    SimpleJekyllSearch({
      searchInput: document.getElementById('search-input'),
      resultsContainer: document.getElementById('search-results'),
      json: '{{ '/assets/js/data/search.json' | relative_url }}',
      searchResultTemplate: '{{ result_elem | strip_newlines }}',
      noResultsText: '{{ not_found }}',
      templateMiddleware: function(prop, value, template) {
        if (prop === 'categories') {
          if (value === '') {
            return `${value}`;
          } else {
            return `<div class="me-sm-4"><i class="far fa-folder fa-fw"></i>${value}</div>`;
          }
        }

        if (prop === 'tags') {
          if (value === '') {
            return `${value}`;
          } else {
            return `<div><i class="fa fa-tag fa-fw"></i>${value}</div>`;
          }
        }
      }
    });
  });
</script>

‘/assets/js/data/search.json’

---
layout: compress
swcache: true
---

[
  {% for post in site.posts %}
  {
    "title": {{ post.title | jsonify }},
    "url": {{ post.url | relative_url | jsonify }},
    "categories": {{ post.categories | join: ', ' | jsonify }},
    "tags": {{ post.tags | join: ', ' | jsonify }},
    "date": "{{ post.date }}",
    {% include no-linenos.html content=post.content %}
    {% assign _content = content | strip_html | strip_newlines %}
    "snippet": {{ _content | truncate: 200 | jsonify }},
    "content": {{ _content | jsonify }}
  }{% unless forloop.last %},{% endunless %}
  {% endfor %}
]

This file uses Jekyll’s Liquid syntax to define a JSON file that contains, for every post on the site, the title, URL, category and tag information, creation date, a 200-character snippet from the start of the content, and the full content.

Search Function Operation Structure and Problem Identification

In summary, when hosting the Chirpy theme on GitHub Pages, the search function operates in the following process:

stateDiagram
  state "Changes" as CH
  state "Build start" as BLD
  state "Create search.json" as IDX
  state "Static Website" as DEP
  state "In Test" as TST
  state "Search Loader" as SCH
  state "Results" as R
    
  [*] --> CH: Make Changes
  CH --> BLD: Commit & Push origin
  BLD --> IDX: jekyll build
  IDX --> TST: Build Complete
  TST --> CH: Error Detected
  TST --> DEP: Deploy
  DEP --> SCH: Search Input
  SCH --> R: Return Results
  R --> [*]

Here, I confirmed that search.json is created for each language by Polyglot as follows:

  • /assets/js/data/search.json
  • /ko/assets/js/data/search.json
  • /es/assets/js/data/search.json
  • /pt-BR/assets/js/data/search.json
  • /ja/assets/js/data/search.json
  • /fr/assets/js/data/search.json
  • /de/assets/js/data/search.json

The problematic part is therefore the “Search Loader”. Pages in languages other than English cannot be searched because _includes/search-loader.html statically loads only the English index file (/assets/js/data/search.json), regardless of the language of the page currently being visited.

In addition, while values such as title, snippet, and content in the index file are generated differently for each language, the url value holds the default path without the language prefix, so appropriate handling for this also needs to be added to the “Search Loader” part.

Problem Resolution

To solve this, you need to modify the contents of _includes/search-loader.html as follows:

{% capture result_elem %}
  <article class="px-1 px-sm-2 px-lg-4 px-xl-0">
    <header>
      {% if site.active_lang != site.default_lang %}
      <h2><a {% static_href %}href="/{{ site.active_lang }}{url}"{% endstatic_href %}>{title}</a></h2>
      {% else %}
      <h2><a href="{url}">{title}</a></h2>
      {% endif %}

(...omitted...)

<script>
  {% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
  document.addEventListener('DOMContentLoaded', () => {
    {% assign search_path = '/assets/js/data/search.json' %}
    {% if site.active_lang != site.default_lang %}
      {% assign search_path = '/' | append: site.active_lang | append: search_path %}
    {% endif %}
    
    SimpleJekyllSearch({
      searchInput: document.getElementById('search-input'),
      resultsContainer: document.getElementById('search-results'),
      json: '{{ search_path | relative_url }}',
      searchResultTemplate: '{{ result_elem | strip_newlines }}',

(...omitted...)

  • I modified the Liquid syntax in the {% capture result_elem %} part so that the "/{{ site.active_lang }}" prefix is added in front of the post URL loaded from the JSON file when site.active_lang (the current page language) differs from site.default_lang (the site default language).
  • Likewise, I modified the <script> part so that, at build time, search_path is set to the default path (/assets/js/data/search.json) if the current page language matches the site default language, and to the path for that language (e.g., /ko/assets/js/data/search.json) otherwise.

After modifying as above and rebuilding the website, I confirmed that the search results are displayed correctly for each language.
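
The branching itself is small enough to restate in Ruby. This is only a sketch of the rule the Liquid code applies at build time. The method and constant names are hypothetical, and the relative_url filter (baseurl handling) is left out.

```ruby
DEFAULT_INDEX_PATH = '/assets/js/data/search.json'

# Prefix the language only when the page is not in the default language.
def search_index_path(active_lang, default_lang)
  return DEFAULT_INDEX_PATH if active_lang == default_lang

  "/#{active_lang}#{DEFAULT_INDEX_PATH}"
end

# Same rule for the result link; "{url}" stays a placeholder that the search
# library fills in later in the browser.
def result_href_template(active_lang, default_lang)
  active_lang == default_lang ? '{url}' : "/#{active_lang}{url}"
end

%w[en ko pt-BR].each do |lang|
  puts "#{lang}: #{search_index_path(lang, 'en')} #{result_href_template(lang, 'en')}"
end
```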

Because {url} is only a placeholder where the URL value read from the JSON file will be inserted later, not a URL itself, Polyglot does not recognize it as a target for localization, so the language prefix has to be added by hand. The catch is that the resulting "/{{ site.active_lang }}{url}" is recognized as a URL, and Polyglot, unaware that localization is already done, tries to localize it again (e.g., "/ko/ko/posts/example-post"). To prevent this, I wrapped the attribute in the {% static_href %} tag.
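
The double prefix can be reproduced with a simplified stand-in for the link rewriting. The Ruby sketch below reuses the regex shape of relative_url_regex quoted earlier. The relativize method and its replacement string are assumptions for illustration, not Polyglot's actual code, and treating static_href as a temporary rename of href to ferh is inferred from the disabled ? 'ferh' : 'href' line in those functions.

```ruby
SITE_LANGUAGES = %w[en ko ja].freeze

# Regex shape taken from relative_url_regex, with an empty baseurl and no excludes.
def relative_url_regex(languages)
  lookaheads = languages.map { |lang| "(?!#{Regexp.escape(lang)}/)" }.join
  %r{href="?/((?:#{lookaheads}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end

# Assumed rewriting step: put the active language in front of the captured path.
def relativize(html, active_lang, languages)
  html.gsub(relative_url_regex(languages)) do
    "href=\"/#{active_lang}/#{Regexp.last_match(1)}\""
  end
end

# "/ko{url}" does not start with "ko/", so the lookahead does not protect it.
unprotected = relativize('<a href="/ko{url}">{title}</a>', 'ko', SITE_LANGUAGES)
puts unprotected
puts unprotected.sub('{url}', '/posts/example-post')

# With the attribute renamed to "ferh", the regex no longer sees the link.
puts relativize('<a ferh="/ko{url}">{title}</a>', 'ko', SITE_LANGUAGES)
```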

This post is licensed under CC BY-NC 4.0 by the author.