init: archlens — architecture diagram generator

Hex arch + DDD, tree-sitter parsing, Mermaid/ASCII output.
Supports Rust + Python. 92 tests. CI, diff, --check for staleness detection.

2026-06-16 16:13:04 +02:00

1.9 KiB

0002 — Tree-sitter for Source Code Parsing

Status: Accepted
Date: 2026-06-16

Context

Archlens needs to extract type-level information (classes, structs, traits, interfaces, enums, fields, inheritance) from Rust, C#, and Python source code. The tool must be language-agnostic in design, fast enough for CI, and memory-efficient for large codebases.

Decision

Use tree-sitter as the primary parsing backend, implemented as a single tree-sitter adapter crate with one internal module per language. Each language module defines tree-sitter S-expression queries that extract CodeElements and Relationships.
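
A minimal sketch of what "one crate, one module per language" could look like. All module, constant and function names here are hypothetical, and only Rust and Python modules are shown; a C# module would follow the same pattern. Running the queries needs the tree-sitter crate and the grammar crates, which are left out so the sketch stays self-contained.

The node and field names used (struct_item, impl_item, class_definition, superclasses) are believed to match the tree-sitter-rust and tree-sitter-python grammars. They should still be checked against the grammar versions archlens pins. The assumed mapping is:

- Captures such as @struct.name would become CodeElements.
- Capture pairs such as @impl.trait / @impl.type would become Relationships.

```rust
// Hypothetical layout of the tree-sitter adapter crate: each language module
// owns its file extensions and its S-expression query text.
mod rust_lang {
    pub const EXTENSIONS: &[&str] = &["rs"];
    /// Named type definitions, plus simple (non-generic) `impl Trait for Type` edges.
    pub const QUERY: &str = r#"
(struct_item name: (type_identifier) @struct.name)
(enum_item name: (type_identifier) @enum.name)
(trait_item name: (type_identifier) @trait.name)
(impl_item
  trait: (type_identifier) @impl.trait
  type: (type_identifier) @impl.type)
"#;
}

mod python_lang {
    pub const EXTENSIONS: &[&str] = &["py", "pyi"];
    /// Class names, plus base classes written as bare identifiers (`class A(B)`).
    pub const QUERY: &str = r#"
(class_definition name: (identifier) @class.name)
(class_definition
  name: (identifier) @class.child
  superclasses: (argument_list (identifier) @class.base))
"#;
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Language {
    Rust,
    Python,
}

/// Route a file to its language module by extension.
fn language_for(path: &str) -> Option<Language> {
    let ext = std::path::Path::new(path).extension()?.to_str()?;
    if rust_lang::EXTENSIONS.iter().any(|e| *e == ext) {
        Some(Language::Rust)
    } else if python_lang::EXTENSIONS.iter().any(|e| *e == ext) {
        Some(Language::Python)
    } else {
        None
    }
}

/// The query text each language module contributes.
fn query_for(lang: Language) -> &'static str {
    match lang {
        Language::Rust => rust_lang::QUERY,
        Language::Python => python_lang::QUERY,
    }
}

fn main() {
    for path in ["src/domain/order.rs", "app/models.py", "README.md"] {
        match language_for(path) {
            Some(lang) => println!(
                "{path}: {lang:?}, {} query lines",
                query_for(lang).trim().lines().count()
            ),
            None => println!("{path}: no language module, skipped"),
        }
    }
}
```

In this layout, supporting a new language means adding one module (extensions plus query) and one match arm. The rest of the crate does not change.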

Tradeoff accepted: Tree-sitter provides syntactic analysis only — no cross-file type resolution, no generics resolution, no resolved imports. This is sufficient for architecture diagrams where we care about structural relationships visible in the source text.
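
To make the tradeoff concrete, here is a small sketch. The shapes of CodeElement and Relationship are assumptions, since the real archlens types are not shown in this ADR. A syntactic relationship can only carry the target name exactly as written in the source. A lookup by that name may therefore return several candidates.

```rust
/// Hypothetical shape of an extracted type definition.
#[derive(Debug)]
struct CodeElement {
    name: String,   // simple name as written, e.g. "Foo"
    module: String, // file or module where the definition was found
}

/// Hypothetical shape of an extracted edge. The target is the textual name
/// from the source ("Foo"), never a resolved path like "crate::billing::Foo".
#[derive(Debug)]
struct Relationship {
    from: String,
    to_name: String,
}

/// Best-effort lookup by simple name. More than one hit means the edge is
/// ambiguous without semantic resolution.
fn candidates<'a>(rel: &Relationship, elements: &'a [CodeElement]) -> Vec<&'a CodeElement> {
    elements.iter().filter(|e| e.name == rel.to_name).collect()
}

fn main() {
    let elements = vec![
        CodeElement { name: "Foo".into(), module: "billing".into() },
        CodeElement { name: "Foo".into(), module: "shipping".into() },
        CodeElement { name: "Order".into(), module: "orders".into() },
    ];
    // E.g. from a field `foo: Foo` inside `struct Order`: syntax alone
    // cannot say which of the two Foo definitions is meant.
    let rel = Relationship { from: "Order".into(), to_name: "Foo".into() };
    for e in candidates(&rel, &elements) {
        println!("{} -> {}::{} (candidate)", rel.from, e.module, e.name);
    }
}
```

For a diagram, one option is to draw the ambiguous edge to every candidate. Another is to flag it. The ADR does not prescribe either.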

Semantic resolution is deferred; it would be handled by a separate adapter (e.g., roslyn-adapter for C#) implementing the same SourceAnalyzer port.
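
A sketch of how a semantic adapter could sit beside the tree-sitter adapter behind one port. The trait methods, the Analysis stand-in and the RoslynAdapter stub are all hypothetical; the ADR names only the SourceAnalyzer port and roslyn-adapter.

Adapters are tried in order and the first one that supports a file wins. Putting semantic adapters first makes tree-sitter the fallback.

```rust
use std::path::Path;

/// Stand-in for the domain's analysis result (hypothetical; it only records
/// whether names were semantically resolved).
#[derive(Debug)]
struct Analysis {
    resolved: bool,
}

/// The port: domain code depends on this trait, never on a parser library.
trait SourceAnalyzer {
    fn name(&self) -> &'static str;
    fn supports(&self, path: &Path) -> bool;
    fn analyze(&self, path: &Path, source: &str) -> Analysis;
}

/// Syntactic adapter covering every supported language.
struct TreeSitterAnalyzer;

/// Hypothetical future semantic adapter, C# only.
struct RoslynAdapter;

fn extension(path: &Path) -> Option<&str> {
    path.extension().and_then(|e| e.to_str())
}

impl SourceAnalyzer for TreeSitterAnalyzer {
    fn name(&self) -> &'static str {
        "tree-sitter"
    }
    fn supports(&self, path: &Path) -> bool {
        matches!(extension(path), Some("rs" | "py" | "cs"))
    }
    fn analyze(&self, _path: &Path, _source: &str) -> Analysis {
        // Real adapter: run the language's queries; names stay unresolved.
        Analysis { resolved: false }
    }
}

impl SourceAnalyzer for RoslynAdapter {
    fn name(&self) -> &'static str {
        "roslyn"
    }
    fn supports(&self, path: &Path) -> bool {
        extension(path) == Some("cs")
    }
    fn analyze(&self, _path: &Path, _source: &str) -> Analysis {
        // Real adapter: ask Roslyn for symbols, so names come back resolved.
        Analysis { resolved: true }
    }
}

/// The first adapter that supports the file wins, so list order encodes
/// preference: semantic adapters first, tree-sitter as the fallback.
fn pick<'a>(
    adapters: &'a [Box<dyn SourceAnalyzer>],
    path: &Path,
) -> Option<&'a Box<dyn SourceAnalyzer>> {
    adapters.iter().find(|a| a.supports(path))
}

fn main() {
    let adapters: Vec<Box<dyn SourceAnalyzer>> =
        vec![Box::new(RoslynAdapter), Box::new(TreeSitterAnalyzer)];
    for file in ["src/Billing/Invoice.cs", "src/domain/order.rs", "README.md"] {
        let path = Path::new(file);
        match pick(&adapters, path) {
            Some(a) => println!(
                "{file}: {} (resolved: {})",
                a.name(),
                a.analyze(path, "").resolved
            ),
            None => println!("{file}: skipped"),
        }
    }
}
```

Because the domain sees only the trait, adding a semantic adapter later changes the adapter list, not the domain code.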

Alternatives Considered

Custom parsers/lexers per language: Full control but enormous implementation and maintenance effort. Rejected.
LSP servers: Rich semantic information, but heavy to run and hard to orchestrate in CI, since each language needs its own server process. Rejected for the CI use case.
Native AST APIs (Roslyn, rustc_ast, Python ast): Very accurate, but each is a completely different API and ecosystem, and Roslyn (a .NET API) cannot easily be called from Rust. Rejected as the primary approach; viable as future specialized adapters.
Regex/heuristic matching: Breaks on edge cases such as declarations mentioned inside comments or string literals, or declarations split across lines. Not robust enough for a real tool. Rejected.
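
The regex/heuristic failure mode can be illustrated with a deliberately naive, line-based matcher. This function is hypothetical and uses plain std string matching rather than a regex crate, but a regex applied line by line fails in the same ways. On the sample input it reports names from a comment and from a string literal, and it misses a declaration split across two lines.

```rust
/// Naive heuristic (illustration only): any line containing "struct " is
/// treated as a definition, and the identifier after it is taken as the name.
fn naive_struct_names(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            // Everything after the first "struct " on the line, if any.
            let rest = line.split("struct ").nth(1)?;
            let name: String = rest
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            if name.is_empty() { None } else { Some(name) }
        })
        .collect()
}

fn main() {
    let source = r#"
// TODO: struct Legacy was removed
pub struct Order { id: u64 }
const HELP: &str = "struct Example { .. }";
pub struct
    Invoice;
"#;
    // Prints ["Legacy", "Order", "Example"]: two false positives, and the
    // two-line `Invoice` declaration is missed.
    println!("{:?}", naive_struct_names(source));
}
```

A tree-sitter query over struct_item nodes would report only Order and Invoice on this input. Comments and string literals are separate node types, and the line break inside a declaration does not matter to the parser.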

Consequences

Single unified parsing approach across all languages
Adding a new language means writing tree-sitter queries against its grammar (hours, not weeks)
No semantic type resolution: a "use Foo" import doesn't tell us which module's Foo is meant
Memory-efficient: parse trees are per-file and dropped after extraction
Fast: tree-sitter is designed to parse quickly enough to run on every keystroke in an editor, and it supports incremental reparsing
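
The memory consequence follows from scoping. Each file's tree lives only inside one loop iteration, and extraction returns owned data that does not borrow from the tree.

A minimal sketch, assuming a hypothetical ParseTree stand-in for tree_sitter::Tree. A toy word-based extract function stands in for running the language's query. Counters record how many trees are alive at once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// How many stand-in parse trees are alive right now, and the most ever alive.
static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a per-file syntax tree (hypothetical; a real adapter would
/// hold a tree_sitter::Tree here).
struct ParseTree {
    source: String,
}

impl ParseTree {
    fn parse(source: &str) -> ParseTree {
        let live = LIVE.fetch_add(1, Ordering::SeqCst) + 1;
        PEAK.fetch_max(live, Ordering::SeqCst);
        ParseTree { source: source.to_string() }
    }
}

impl Drop for ParseTree {
    fn drop(&mut self) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Toy extraction standing in for running the language's query. It returns
/// owned Strings, so nothing borrows from the tree once it returns.
fn extract(tree: &ParseTree) -> Vec<String> {
    let words: Vec<&str> = tree.source.split_whitespace().collect();
    words
        .windows(2)
        .filter(|w| w[0] == "struct")
        .map(|w| w[1].trim_end_matches(|c: char| !c.is_alphanumeric()).to_string())
        .collect()
}

/// One tree per file: parse, extract owned data, drop, move on.
fn analyze_all(files: &[(&str, &str)]) -> Vec<String> {
    let mut elements = Vec::new();
    for (path, source) in files {
        let tree = ParseTree::parse(source);
        for name in extract(&tree) {
            elements.push(format!("{path}::{name}"));
        }
        // `tree` is dropped here, before the next file is parsed.
    }
    elements
}

fn main() {
    let files = [
        ("src/order.rs", "pub struct Order { id: u64 }"),
        ("src/invoice.rs", "pub struct Invoice; struct Line;"),
    ];
    println!("{:?}", analyze_all(&files));
    println!("peak live trees: {}", PEAK.load(Ordering::SeqCst));
}
```

Peak memory then scales with the largest single file plus the extracted elements, rather than with the whole codebase's syntax trees.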