How to Support Multiple Languages on Jekyll Blog with Polyglot (2) - Troubleshooting Chirpy Theme Build Failures and Search Function Errors
This post introduces the process of implementing multilingual support by applying the Polyglot plugin to a Jekyll blog based on 'jekyll-theme-chirpy'. This is the second post in the series, covering the identification and resolution of errors that occur when applying Polyglot to the Chirpy theme.
Overview
About 4 months ago, in early July 12024 of the Holocene calendar, I added multilingual support to this Jekyll-based blog hosted on GitHub Pages by applying the Polyglot plugin. This series shares the bugs I encountered while applying Polyglot to the Chirpy theme, their solutions, and how to write HTML headers and sitemap.xml with SEO in mind. The series consists of 2 posts, and this is the second.
- Part 1: Applying Polyglot Plugin & Implementing hreflang alt Tags, Sitemap, and Language Selection Button
- Part 2: Troubleshooting Chirpy Theme Build Failures and Search Function Errors (this post)
Requirements
- The built result (web pages) should be provided in language-specific paths (e.g., /posts/ko/, /posts/ja/).
- To minimize the additional time and effort required for multilingual support, the system should automatically recognize the language based on the local path (e.g., /_posts/ko/, /_posts/ja/) without having to specify ‘lang’ and ‘permalink’ tags in the YAML front matter of each original markdown file.
- The header section of each page on the site should include appropriate Content-Language meta tags and hreflang alternate tags to meet Google’s multilingual search SEO guidelines.
- The site should provide links to all pages supporting each language without omissions in sitemap.xml, and sitemap.xml itself should exist only once in the root path without duplication.
- All features provided by the Chirpy theme should work properly on each language page, and if not, they should be modified to work properly.
  - ‘Recently Updated’, ‘Trending Tags’ features working properly
  - No errors during the build process using GitHub Actions
  - Blog post search function in the upper right corner working properly
Before We Begin
This post is a continuation of Part 1, so if you haven’t read it yet, I recommend reading the previous post first.
Troubleshooting (‘relative_url_regex’: target of repeat operator is not specified)
After completing the previous steps, I ran the bundle exec jekyll serve command to test the build, and it failed with the error 'relative_url_regex': target of repeat operator is not specified.
...(omitted)
------------------------------------------------
Jekyll 4.3.4 Please append `--trace` to the `serve` command
for any additional information or backtrace.
------------------------------------------------
/Users/yunseo/.gem/ruby/3.2.2/gems/jekyll-polyglot-1.8.1/lib/jekyll/polyglot/
patches/jekyll/site.rb:234:in `relative_url_regex': target of repeat operator
is not specified: /href="?\/((?:(?!*.gem)(?!*.gemspec)(?!tools)(?!README.md)(
?!LICENSE)(?!*.config.js)(?!rollup.config.js)(?!package*.json)(?!.sass-cache)
(?!.jekyll-cache)(?!gemfiles)(?!Gemfile)(?!Gemfile.lock)(?!node_modules)(?!ve
ndor\/bundle\/)(?!vendor\/cache\/)(?!vendor\/gems\/)(?!vendor\/ruby\/)(?!en\/
)(?!ko\/)(?!es\/)(?!pt-BR\/)(?!ja\/)(?!fr\/)(?!de\/)[^,'"\s\/?.]+\.?)*(?:\/[^
\]\[)("'\s]*)?)"/ (RegexpError)
...(omitted)
After searching to see if similar issues had been reported, I found that exactly the same issue had already been registered in the Polyglot repository, and a solution existed.
The Chirpy theme’s _config.yml file contains the following syntax:
exclude:
- "*.gem"
- "*.gemspec"
- docs
- tools
- README.md
- LICENSE
- "*.config.js"
- package*.json
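The cause of the problem lies in the following two functions in Polyglot’s site.rb, which cannot properly handle globbing patterns with wildcards like "*.gem", "*.gemspec", and "*.config.js". Both functions interpolate each exclude entry into a regular expression without escaping it, so a leading * is parsed as a repeat operator with nothing to repeat.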
# a regex that matches relative urls in a html document
# matches href="baseurl/foo/bar-baz" href="/foo/bar-baz" and others like it
# avoids matching excluded files. prepare makes sure
# that all @exclude dirs have a trailing slash.
def relative_url_regex(disabled = false)
  regex = ''
  unless disabled
    @exclude.each do |x|
      regex += "(?!#{x})"
    end
    @languages.each do |x|
      regex += "(?!#{x}\/)"
    end
  end
  start = disabled ? 'ferh' : 'href'
  %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end

# a regex that matches absolute urls in a html document
# matches href="http://baseurl/foo/bar-baz" and others like it
# avoids matching excluded files. prepare makes sure
# that all @exclude dirs have a trailing slash.
def absolute_url_regex(url, disabled = false)
  regex = ''
  unless disabled
    @exclude.each do |x|
      regex += "(?!#{x})"
    end
    @languages.each do |x|
      regex += "(?!#{x}\/)"
    end
  end
  start = disabled ? 'ferh' : 'href'
  %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end
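To see which exclude entries actually break the build, here is a minimal sketch that wraps each entry in an unescaped negative lookahead, the same way the loops above do, and reports whether Ruby can compile it. The helper name lookahead_error is made up for this illustration and is not part of Polyglot.

```ruby
# Hypothetical helper (not part of Polyglot): returns the RegexpError message
# raised when an exclude entry is interpolated unescaped into a negative
# lookahead, or nil if the pattern compiles.
def lookahead_error(entry)
  Regexp.new("(?!#{entry})")
  nil
rescue RegexpError => e
  e.message
end

entries = ['*.gem', '*.gemspec', 'tools', 'README.md', '*.config.js', 'package*.json']

entries.each do |entry|
  status = lookahead_error(entry) || 'compiles'
  puts "#{entry.ljust(14)} #{status}"
end
```

Only the entries that begin with * fail, because there is nothing before the * for it to repeat. package*.json compiles because its * follows the letter e, although it is then read as a regex quantifier rather than as a glob.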
There are two ways to solve this problem.
1. Fork Polyglot and modify the problematic parts
As of the time of writing this post (November 12024), the Jekyll official documentation states that the exclude setting supports globbing patterns.
“This configuration option supports Ruby’s File.fnmatch filename globbing patterns to match multiple entries to exclude.”
In other words, the root cause is not in the Chirpy theme but in Polyglot’s relative_url_regex() and absolute_url_regex() functions, so the fundamental solution is to modify them to prevent the problem.
Since this bug has not yet been fixed in Polyglot, you can fork the Polyglot repository with reference to this blog post and the answer to the previous GitHub issue, modify the problematic parts as follows, and use it instead of the original Polyglot.
def relative_url_regex(disabled = false)
  regex = ''
  unless disabled
    @exclude.each do |x|
      escaped_x = Regexp.escape(x)
      regex += "(?!#{escaped_x})"
    end
    @languages.each do |x|
      escaped_x = Regexp.escape(x)
      regex += "(?!#{escaped_x}\/)"
    end
  end
  start = disabled ? 'ferh' : 'href'
  %r{#{start}="?#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end

def absolute_url_regex(url, disabled = false)
  regex = ''
  unless disabled
    @exclude.each do |x|
      escaped_x = Regexp.escape(x)
      regex += "(?!#{escaped_x})"
    end
    @languages.each do |x|
      escaped_x = Regexp.escape(x)
      regex += "(?!#{escaped_x}\/)"
    end
  end
  start = disabled ? 'ferh' : 'href'
  %r{(?<!hreflang="#{@default_lang}" )#{start}="?#{url}#{@baseurl}/((?:#{regex}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end
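As a rough check of what the escaping changes, the sketch below builds the lookahead prefix the same way the patched loops do and tries it against a few paths. The helper name escaped_lookaheads and the sample paths are assumptions for illustration; only Regexp.escape, Regexp.new, and match? are real Ruby APIs.

```ruby
# Hypothetical helper mirroring the patched loop: escape each entry, then wrap
# it in a negative lookahead.
def escaped_lookaheads(entries)
  entries.map { |entry| "(?!#{Regexp.escape(entry)})" }.join
end

# Anchored at the start of the string to keep the demonstration small.
pattern = Regexp.new("\\A#{escaped_lookaheads(['*.gem', 'tools'])}")

puts pattern.match?('tools/run.sh') # false: starts with the excluded entry "tools"
puts pattern.match?('posts/foo')    # true: not excluded
puts pattern.match?('*.gem')        # false: the literal string "*.gem" is excluded
puts pattern.match?('foo.gem')      # true: the escaped entry is not expanded as a glob
```

With escaping, the pattern compiles and the build error goes away, but a glob entry is then treated as a literal string rather than as a wildcard.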
2. Replace globbing patterns with exact filenames in the Chirpy theme’s ‘_config.yml’ configuration file
The proper and ideal solution would be for the above patch to be merged into upstream Polyglot. Until then, however, you would need to use a forked version, which is cumbersome because the fork has to be kept in sync with upstream Polyglot updates. Therefore, I used a different approach.
If you check the files in the root path of the Chirpy theme repository that match the patterns "*.gem", "*.gemspec", and "*.config.js", there are only 3 files:
jekyll-theme-chirpy.gemspec
purgecss.config.js
rollup.config.js
Therefore, you can delete the globbing patterns in the exclude section of the _config.yml file and replace them as follows so that Polyglot can process them without issues.
exclude: # Modified with reference to https://github.com/untra/polyglot/issues/204
# - "*.gem"
- jekyll-theme-chirpy.gemspec # - "*.gemspec"
- tools
- README.md
- LICENSE
- purgecss.config.js # - "*.config.js"
- rollup.config.js
- package*.json
Modifying the Search Function
After completing the previous steps, almost all site functions worked satisfactorily as intended. However, I later discovered that the search bar located in the upper right corner of pages using the Chirpy theme could not index pages in languages other than site.default_lang (English in the case of this blog), and when searching in languages other than English, it still displayed English pages in the search results.
To understand the cause, let’s look at what files are involved in the search function and where the problem occurs.
‘_layouts/default.html’
Looking at the _layouts/default.html file that forms the template for all pages on the blog, we can see that it loads the contents of search-results.html and search-loader.html inside the <body> element.
<body>
{% include sidebar.html lang=lang %}
<div id="main-wrapper" class="d-flex justify-content-center">
<div class="container d-flex flex-column px-xxl-5">
(...omitted...)
{% include_cached search-results.html lang=lang %}
</div>
<aside aria-label="Scroll to Top">
<button id="back-to-top" type="button" class="btn btn-lg btn-box-shadow">
<i class="fas fa-angle-up"></i>
</button>
</aside>
</div>
(...omitted...)
{% include_cached search-loader.html lang=lang %}
</body>
‘_includes/search-results.html’
_includes/search-results.html creates a search-results container to store search results for keywords entered in the search box.
<!-- The Search results -->
<div id="search-result-wrapper" class="d-flex justify-content-center d-none">
<div class="col-11 content">
<div id="search-hints">
{% include_cached trending-tags.html %}
</div>
<div id="search-results" class="d-flex flex-wrap justify-content-center text-muted mt-3"></div>
</div>
</div>
‘_includes/search-loader.html’
_includes/search-loader.html is the core part that implements search based on the Simple-Jekyll-Search library. It runs on the client side: JavaScript executed in the visitor’s browser finds matches for the input keywords in the search.json index file and returns the corresponding post links as <article> elements.
{% capture result_elem %}
<article class="px-1 px-sm-2 px-lg-4 px-xl-0">
<header>
<h2><a href="{url}">{title}</a></h2>
<div class="post-meta d-flex flex-column flex-sm-row text-muted mt-1 mb-1">
{categories}
{tags}
</div>
</header>
<p>{snippet}</p>
</article>
{% endcapture %}
{% capture not_found %}<p class="mt-5">{{ site.data.locales[include.lang].search.no_results }}</p>{% endcapture %}
<script>
{% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
document.addEventListener('DOMContentLoaded', () => {
SimpleJekyllSearch({
searchInput: document.getElementById('search-input'),
resultsContainer: document.getElementById('search-results'),
json: '{{ '/assets/js/data/search.json' | relative_url }}',
searchResultTemplate: '{{ result_elem | strip_newlines }}',
noResultsText: '{{ not_found }}',
templateMiddleware: function(prop, value, template) {
if (prop === 'categories') {
if (value === '') {
return `${value}`;
} else {
return `<div class="me-sm-4"><i class="far fa-folder fa-fw"></i>${value}</div>`;
}
}
if (prop === 'tags') {
if (value === '') {
return `${value}`;
} else {
return `<div><i class="fa fa-tag fa-fw"></i>${value}</div>`;
}
}
}
});
});
</script>
‘/assets/js/data/search.json’
---
layout: compress
swcache: true
---
[
{% for post in site.posts %}
{
"title": {{ post.title | jsonify }},
"url": {{ post.url | relative_url | jsonify }},
"categories": {{ post.categories | join: ', ' | jsonify }},
"tags": {{ post.tags | join: ', ' | jsonify }},
"date": "{{ post.date }}",
{% include no-linenos.html content=post.content %}
{% assign _content = content | strip_html | strip_newlines %}
"snippet": {{ _content | truncate: 200 | jsonify }},
"content": {{ _content | jsonify }}
}{% unless forloop.last %},{% endunless %}
{% endfor %}
]
This file uses Jekyll’s Liquid syntax to define a JSON file containing the title, URL, category and tag information, creation date, the first 200 characters of the content as a snippet, and the full content of all posts on the site.
Search Function Structure and Problem Identification
To summarize, when hosting the Chirpy theme on GitHub Pages, the search function operates through the following process:
stateDiagram
state "Changes" as CH
state "Build start" as BLD
state "Create search.json" as IDX
state "Static Website" as DEP
state "In Test" as TST
state "Search Loader" as SCH
state "Results" as R
[*] --> CH: Make Changes
CH --> BLD: Commit & Push origin
BLD --> IDX: jekyll build
IDX --> TST: Build Complete
TST --> CH: Error Detected
TST --> DEP: Deploy
DEP --> SCH: Search Input
SCH --> R: Return Results
R --> [*]
I confirmed that search.json is created for each language by Polyglot as follows:
/assets/js/data/search.json
/ko/assets/js/data/search.json
/es/assets/js/data/search.json
/pt-BR/assets/js/data/search.json
/ja/assets/js/data/search.json
/fr/assets/js/data/search.json
/de/assets/js/data/search.json
Therefore, the problematic part is the “Search Loader”. The issue of non-English pages not being searchable occurs because _includes/search-loader.html statically loads only the English index file (/assets/js/data/search.json) regardless of the language of the page being visited.
- However, unlike markdown or html format files, for JSON files, Polyglot wrappers for Jekyll-provided variables like post.title, post.content work, but the Relativized Local Urls feature does not seem to work.
- Similarly, I confirmed during testing that within JSON file templates, it’s not possible to access additional liquid tags provided by Polyglot such as {{ site.default_lang }}, {{ site.active_lang }} beyond the variables provided by Jekyll.
Therefore, while values like title, snippet, and content in the index file are generated differently for each language, the url value returns the default path without considering the language, and appropriate handling needs to be added to the “Search Loader” part.
Solution
To solve this, modify the content of _includes/search-loader.html as follows:
{% capture result_elem %}
<article class="px-1 px-sm-2 px-lg-4 px-xl-0">
<header>
{% if site.active_lang != site.default_lang %}
<h2><a {% static_href %}href="/{{ site.active_lang }}{url}"{% endstatic_href %}>{title}</a></h2>
{% else %}
<h2><a href="{url}">{title}</a></h2>
{% endif %}
(...omitted...)
<script>
{% comment %} Note: dependent library will be loaded in `js-selector.html` {% endcomment %}
document.addEventListener('DOMContentLoaded', () => {
{% assign search_path = '/assets/js/data/search.json' %}
{% if site.active_lang != site.default_lang %}
{% assign search_path = '/' | append: site.active_lang | append: search_path %}
{% endif %}
SimpleJekyllSearch({
searchInput: document.getElementById('search-input'),
resultsContainer: document.getElementById('search-results'),
json: '{{ search_path | relative_url }}',
searchResultTemplate: '{{ result_elem | strip_newlines }}',
(...omitted)
- I modified the liquid syntax in the {% capture result_elem %} section to add the prefix "/{{ site.active_lang }}" before the post URL loaded from the JSON file when site.active_lang (current page language) is different from site.default_lang (site default language).
- Similarly, I modified the <script> section to compare the current page language with the site default language during the build process, and set search_path to the default path (/assets/js/data/search.json) if they are the same, or to the language-specific path (e.g., /ko/assets/js/data/search.json) if they are different.
After making these modifications and rebuilding the website, I confirmed that search results are displayed correctly for each language.
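The branching in the modified template can be restated outside Liquid. The following is a minimal sketch in plain Ruby, not code from the theme or from Polyglot: the function names are made up for illustration, 'en' stands in for site.default_lang, and the baseurl handling done by the relative_url filter is left out.

```ruby
SEARCH_INDEX = '/assets/js/data/search.json'

# Path of the search index for the page being viewed (hypothetical helper).
def search_index_path(active_lang, default_lang)
  return SEARCH_INDEX if active_lang == default_lang

  "/#{active_lang}#{SEARCH_INDEX}"
end

# Link target for one search result, given the url stored in search.json
# (hypothetical helper).
def result_href(url, active_lang, default_lang)
  return url if active_lang == default_lang

  "/#{active_lang}#{url}"
end

%w[en ko es pt-BR ja fr de].each do |lang|
  puts search_index_path(lang, 'en')
end
puts result_href('/posts/example-post', 'ko', 'en')
```

The loop prints the same seven index paths that Polyglot generates for this blog, and the last line shows the language prefix being added to a result link on a Korean page.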
Since {url} is a placeholder for the URL value read from the JSON file and not a URL itself, it is not recognized as a localization target by Polyglot, so it needs to be handled directly according to the language. The problem is that "/{{ site.active_lang }}{url}" is recognized as a URL, and although localization has already been completed, Polyglot doesn’t know that and tries to perform localization again (e.g., "/ko/ko/posts/example-post"). To prevent this, I specified the {% static_href %} tag.
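This behavior can be reproduced with the regex shown earlier in this post. The sketch below is a simplified standalone copy of the patched relative_url_regex that takes baseurl, the language list, and the exclude list as arguments instead of reading them from the site object. It assumes that Polyglot rewrites exactly those href values that this regex matches, as the comments on the original function suggest; the sample language and exclude lists are illustrative.

```ruby
# Simplified standalone copy of the patched relative_url_regex from this post.
# baseurl, languages and excludes are passed in instead of read from the site.
def relative_url_regex(baseurl, languages, excludes)
  lookaheads = excludes.map { |x| "(?!#{Regexp.escape(x)})" }.join +
               languages.map { |x| "(?!#{Regexp.escape(x)}/)" }.join
  %r{href="?#{baseurl}/((?:#{lookaheads}[^,'"\s/?.]+\.?)*(?:/[^\]\[)("'\s]*)?)"}
end

regex = relative_url_regex('', %w[en ko ja], %w[tools README.md])

samples = [
  '<a href="{url}">',             # bare placeholder from the original template
  '<a href="/ko{url}">',          # prefixed placeholder from the modified template
  '<a href="/ko/posts/example">', # link that is already localized
  '<a href="/posts/example">'     # ordinary relative link
]

samples.each { |html| puts "#{html} -> #{regex.match?(html)}" }
```

The bare placeholder is not matched because it does not start with a slash, and the already localized link is skipped by the language lookahead. The prefixed placeholder, however, begins with "ko{" rather than "ko/", so it passes the lookahead and is matched like an ordinary relative link, which is consistent with the doubled /ko/ko/ prefix described above.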