Lexer Plugins#

This project’s “lexer” is a lightweight Fortran parser that extracts enough structure to drive directives and cross-references (it is not intended to be a full compiler frontend).

This page explains:

  • where parsing starts in the Sphinx build,

  • what your lexer must return,

  • how to populate the Fortran object model used by this project,

  • how to register your lexer as a plugin.

Where Parsing Starts#

Parsing runs automatically during a Sphinx build:

  1. Sphinx loads extensions listed in conf.py.

  2. This extension registers an event handler on builder-inited.

  3. When the builder initializes, the handler:

    • collects Fortran source files from fortran_sources (and applies fortran_sources_exclude),

    • derives doc markers from fortran_doc_chars,

    • resolves the configured lexer name via the lexer registry,

    • calls lexer.parse(files, doc_markers=...),

    • stores the returned symbols into the f domain data.

After that, directives (e.g. .. f:module::) render by reading the stored symbols from the domain.

The Lexer Interface#

A lexer is any object implementing the FortranLexer protocol defined in sphinx_fortran_domain.lexers.

At minimum:

  • name: a string identifier

  • parse(file_paths, *, doc_markers) -> FortranParseResult

Example skeleton:

from __future__ import annotations

from typing import Sequence

from sphinx_fortran_domain.lexers import (
    FortranLexer,
    FortranParseResult,
    FortranModuleInfo,
)
from sphinx_fortran_domain.utils import read_text_utf8


class MyLexer(FortranLexer):
    name = "my-lexer"

    def parse(self, file_paths: Sequence[str], *, doc_markers: Sequence[str]) -> FortranParseResult:
        modules = {}
        submodules = {}
        programs = {}

        for path in file_paths:
            text = read_text_utf8(path)
            # ... parse text ...
            # modules["mymodule"] = FortranModuleInfo(name="mymodule", doc="...", ...)

        return FortranParseResult(modules=modules, submodules=submodules, programs=programs)

The doc_markers argument is the resolved marker list (examples: ["!>"], ["!>", "!!"]). It is your lexer’s responsibility to interpret those markers and extract doc text.

Populating the Fortran Object Model#

Your parse method returns a FortranParseResult:

  • modules: mapping module_name -> FortranModuleInfo

  • submodules: mapping submodule_name -> FortranSubmoduleInfo

  • programs: mapping program_name -> FortranProgramInfo

The domain stores these mappings and directives query them by exact key.

Naming#

Use stable, human-facing names as keys (typically the Fortran entity name as written in source). Cross-references like :f:mod:`foo` resolve against these names.

Locations#

Most object types accept an optional SourceLocation(path, lineno). Providing locations is recommended because it enables:

  • better debugging,

  • better program-source extraction in directives,

  • future improvements (source links, jump-to-definition, etc.).

Use 1-based line numbers.

Docs#

Most objects accept a doc: str | None. That string is treated as a reStructuredText fragment by this project’s directives.

Practical guidance:

  • Store plain text without the leading doc marker.

  • Keep line breaks as you want them rendered.

  • You may include Sphinx roles/directives (e.g. :math:`...` or .. math::).

This project also supports a lightweight doc convention that is normalized before parsing as reST (for example, ## Title becomes a rubric). If you extract docs from source, you can preserve those markers.

Objects You Can Produce#

Below is a simplified overview of the main dataclasses.

Module#

from sphinx_fortran_domain.lexers import FortranModuleInfo, SourceLocation

mod = FortranModuleInfo(
    name="mymodule",
    doc="""Module documentation.""",
    procedures=[...],
    types=[...],
    interfaces=[...],
    location=SourceLocation(path="/abs/or/rel/path.f90", lineno=1),
)

Procedures (functions/subroutines)#

Procedures are represented as FortranProcedure objects.

  • kind must be either "function" or "subroutine".

  • arguments is a sequence of FortranArgument.

  • For functions, optionally set result (a FortranArgument) to document the result variable.

Derived types#

Derived types are represented by FortranType.

  • components/attributes: FortranComponent

  • type-bound procedures: FortranTypeBoundProcedure

Programs#

Programs are represented by FortranProgramInfo.

If you can provide them:

  • dependencies: modules referenced via use statements

  • procedures: internal procedures after contains

  • source: raw source string for the program unit

All of these are optional; you can start with just name/doc/location.

Registering a Lexer Plugin#

A plugin is typically a small Python package that exposes a Sphinx extension. In its setup(app) function, register a lexer factory:

from sphinx_fortran_domain.lexers import register_lexer

def setup(app):
    register_lexer("my-lexer", lambda: MyLexer())
    return {"version": "0.1.0", "parallel_read_safe": True}

Then, in the consuming project’s conf.py:

extensions = [
    "my_fortran_lexer_plugin",  # registers the lexer
    "sphinx_fortran_domain",
]

fortran_lexer = "my-lexer"

Reference Plugin (Concrete Example)#

This section shows a minimal, end-to-end “reference plugin” that you can copy as a starting point.

It consists of:

  • a Python module that registers a lexer via register_lexer

  • a project conf.py that enables the plugin and selects the lexer

  • a minimal Fortran source file that the lexer parses

The plugin module#

Create a module (or package) that Sphinx can import, for example reference_plugin.py:

For a fully working (tested) reference implementation, see tests/fixtures/reference_plugin.py in this repository.

from __future__ import annotations

 import re
 from typing import Dict, Sequence

from sphinx_fortran_domain.lexers import (
    FortranLexer,
    FortranModuleInfo,
    FortranParseResult,
    SourceLocation,
    register_lexer,
)

from sphinx_fortran_domain.utils import extract_predoc_before_line, read_lines_utf8


_RE_MODULE = re.compile(r"^\s*module\s+(?!procedure\b)([A-Za-z_]\w*)\b", re.IGNORECASE)


class ReferenceLexer(FortranLexer):
    name = "reference"

    def parse(self, file_paths: Sequence[str], *, doc_markers: Sequence[str]) -> FortranParseResult:
        modules: Dict[str, FortranModuleInfo] = {}
        for file_path in file_paths:
            lines = read_lines_utf8(file_path)
            for idx, line in enumerate(lines):
                m = _RE_MODULE.match(line)
                if not m:
                    continue
                name = m.group(1)
                doc = extract_predoc_before_line(lines, idx, doc_markers=doc_markers)
                modules[name] = FortranModuleInfo(
                    name=name,
                    doc=doc,
                    location=SourceLocation(path=str(file_path), lineno=idx + 1),
                )
                break
        return FortranParseResult(modules=modules, submodules={}, programs={})


def setup(app):
    # The registry stores a factory; Sphinx will call it when parsing starts.
    register_lexer("reference", lambda: ReferenceLexer())
    return {"version": "0.1.0", "parallel_read_safe": True}

Enable it in a consuming project’s conf.py#

In the consuming project (the docs you are building), enable both the plugin and this domain:

extensions = [
    "reference_plugin",      # registers the lexer
    "sphinx_fortran_domain", # uses the registered lexer
]

fortran_lexer = "reference"

Minimal Fortran input#

With the default doc marker !>, a minimal module might look like:

!> This is a test module.
module mymod
end module mymod

When you run Sphinx, parsing starts at builder-inited and your lexer is invoked with the collected fortran_sources.