These are detailed instructions for generating the ACL Anthology website, as seen on https://aclanthology.org/, and for contributing to it.
The Anthology website is generated using the Hugo static
site generator. However, before we can actually invoke Hugo, we need to prepare
the contents of the website. The following steps describe what happens
behind the scenes. All of these steps also have a corresponding make target. If you are on a system that uses apt to install packages, you can therefore just run the following commands:
sudo apt install jing bibutils hugo
make all
If this doesn’t work, you can instead use the following instructions to go through the process step by step and observe the expected outputs.
To build the Anthology, the packages listed in bin/requirements.txt are needed (make installs and updates them automatically):
pip install -r bin/requirements.txt
You also need to install “jing”, an XML schema checker. If you are using Homebrew on OS X, you can install it with brew install jing-trang.
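Before proceeding, it can be useful to confirm that the required tools are on your PATH; for example (this quick check is just a suggestion, not part of the official build steps):
$ hugo version
$ which jing bib2xml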
The data sources for the Anthology currently reside in the data/
directory. XML files contain the authoritative paper metadata, and additional
YAML files document information about venues and special interest groups (SIGs).
Before the Anthology website can be generated, all this information needs to be
converted and preprocessed for the static site generator. Some derived
information, such as BibTeX entries for each paper, is also added in this step.
This is achieved by calling:
$ python3 bin/create_hugo_data.py
This process should not take longer than a few minutes.
(NB: This step is skipped on preview branches.)
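To verify that the step completed, you can look at the files it produced; for example (the exact output location is an assumption here and may differ between versions of the scripts):
$ ls build/data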
In this step, we create the full consolidated BibTeX files (anthology.bib
etc.) as well as the MODS and Endnote formats. This is achieved by calling:
$ python3 bin/create_extra_bib.py
The exported files will be written to the build/data-export/ subdirectory.
For other export formats, we rely on the
bibutils suite by first
converting the generated .bib files to MODS XML, then converting the MODS XML
to Endnote. This happens within the bin/create_extra_bib.py script, which uses some performance optimizations (such as process pools) to speed up the conversion.
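For reference, the underlying conversion chain looks roughly like this when run by hand with the bibutils tools (file names are illustrative; the script handles all of this for you):
$ bib2xml anthology.bib > anthology.mods.xml    # BibTeX to MODS XML
$ xml2end anthology.mods.xml > anthology.endf   # MODS XML to Endnote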
The files generated so far are in the build/ subdirectory, in which Hugo will be invoked. Before doing this, however, you also need to copy the content of the hugo/ subdirectory into build/ so that all the configuration files and the page structure are accessible to the engine.
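One way to do this by hand (the make targets normally take care of it) is:
$ cp -r hugo/* build/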
After doing so, the website can be built by simply invoking Hugo from the build/
subdirectory. Optionally, the --minify flag can be used to create minified
HTML output:
$ hugo --minify
Generating the website is quite a resource-hungry process, but should not take
longer than a few minutes. Due to the high memory usage (approx. 18 GB
according to the output of hugo --stepAnalysis), it is possible that it will
cause swapping and consequently slow down your system for a while.
The fully generated website will be in build/anthology/ afterwards.
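One simple way to inspect the generated site locally is to serve it with any static file server; for example, assuming Python is available (links that depend on the configured base URL may not all resolve when served this way):
$ cd build/anthology
$ python3 -m http.server 8000
Then open http://localhost:8000 in your browser.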
The static site tries to follow a strict separation of content and presentation. If you need to make changes to the Anthology, the first step is to figure out where to make these changes.
Changes in content (paper metadata, information about SIGs, etc.) should
always be made in the data files under data/ or in the scripts that
interpret them; changes that only affect the presentation on the website can
be made within the Hugo templates.
The data sources of the Anthology are currently stored under data/. They
comprise:
The authoritative XML files (in xml/); these contain all paper
metadata. Their format is defined in a RelaxNG schema, schema.rnc, in the
same directory (see the validation example after this list).
YAML files for SIGs (in yaml/sigs/); these contain names,
URLs, and associated venues for all special interest groups.
YAML files that define venues (in yaml/venues/).
Each venue has its own YAML file containing venue-specific information,
such as the venue's acronym, full name, and URL.
A name variant list (name_variants.yaml) that
defines which author names should be treated as identical for purposes of
generating “author” pages.
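For example, an individual XML file can be checked against the schema with jing, where -c selects the compact RelaxNG syntax (the file name below is only illustrative):
$ jing -c data/xml/schema.rnc data/xml/2022.acl.xml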
The “acl-anthology” module under python/ is responsible
for parsing and interpreting all these data files. Some information that is not
explicitly stored in any of these files is derived automatically by this
module during Step 1 of building the website.
HTML templates for the website are found under hugo/layouts/.
The main skeleton for all HTML files is
_default/baseof.html.
The front page is index.html.
Most other pages are defined as **/single.html (e.g.,
papers/single.html defines the paper
pages).
The appearance of paper entries in lists (on proceedings pages, author pages,
etc.) is defined in
papers/list-entry.html.
CSS styling for the website is based on Bootstrap
4.3. The final CSS is compiled from
hugo/assets/css/main.scss, which defines custom styles in addition to pulling in the Bootstrap framework.
If a new year is added to the Anthology, make sure the front page
template is updated to include this new year: adjust the variable $all_years
(and $border_years, if needed), and don’t forget to update the table headers
as well! (Their colspan attributes need to match the number of years subsumed
under the header.)
The following criteria are checked automatically (via Travis CI) and enforced for all changes pushed to the Anthology:
The XML files must validate against the schema schema.rnc.
Python code must be formatted with the black tool.
Python code must be free of errors reported by the ruff tool. If there’s a good reason to ignore a rule, noqa comments can be used on an individual basis.
There are three make targets that help you check (and fix) your commits:
make check will check all files in the repository.
make check_commit will only check files currently staged for commit. This is best used as a pre-commit hook in order to help you identify problems early.
make autofix works like check_commit, except that it will also run the black code formatter to automatically make your Python files style-compliant, and the ruff linter to correct those linting errors which can be fixed automatically. This can also be used as a pre-commit hook, or run manually when you find that make check_commit complains about your files.
To easily make any of these targets work as a pre-commit hook, you can create a symlink to one of the predefined scripts as follows:
ln -s ../../.git-hooks/check_commit .git/hooks/pre-commit (for the check_commit target)
ln -s ../../.git-hooks/autofix .git/hooks/pre-commit (for the autofix target)