docs: federated hashtag indexing design spec

2026-05-16 02:39:33 +02:00
parent 78ee7b9388
commit f895503175

@@ -0,0 +1,84 @@
# Federated Hashtag Indexing Design
**Date:** 2026-05-16
**Status:** Approved
## Problem
When a remote ActivityPub Note arrives via the inbox, `accept_note` stores the thought in the `thoughts` table (`local = false`) but never attaches hashtags. As a result, federated content is invisible to tag feeds — `/tags/rust` only shows local posts even when remote servers have sent tagged notes.
## Solution
After persisting the remote thought, extract hashtags from the Note's AP `tag` array and attach them using the existing `TagRepository` infrastructure.
---
## Design
### Hashtag source: AP `tag` array
AP Notes carry a structured `tag` array:
```json
[
{ "type": "Hashtag", "name": "#rust", "href": "https://mastodon.social/tags/rust" },
{ "type": "Mention", "href": "...", "name": "@alice" }
]
```
Keep only entries where `type == "Hashtag"`, take each `name`, strip the leading `#`, and lowercase the result. Do NOT run `domain::hashtag::extract()` on the raw content: remote content is often HTML, and the char-walker would produce false positives inside anchor text.
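The normalization rule above can be sketched in isolation. This is a minimal illustration, not the project's code: `ApTag` is a hypothetical, already-deserialized view of one `tag` entry, and `hashtag_names` is an assumed helper name. The deduplication step is an extra that the spec does not require; it is included only because `#Rust` and `#rust` normalize to the same name.

```rust
use std::collections::BTreeSet;

/// Hypothetical, already-deserialized view of one entry in the AP `tag` array.
#[derive(Debug, Clone)]
pub struct ApTag {
    /// The entry's `type` field, e.g. "Hashtag" or "Mention".
    pub kind: String,
    /// The entry's `name` field, e.g. "#rust"; may be absent.
    pub name: Option<String>,
}

/// Returns normalized hashtag names in first-seen order:
/// Hashtag entries only, leading '#' stripped, lowercased,
/// empty names dropped, duplicates removed (assumption, see lead-in).
pub fn hashtag_names(tags: &[ApTag]) -> Vec<String> {
    let mut seen = BTreeSet::new();
    tags.iter()
        .filter(|t| t.kind == "Hashtag")
        .filter_map(|t| t.name.as_deref())
        .map(|name| name.trim_start_matches('#').to_lowercase())
        .filter(|name| !name.is_empty())
        // `insert` returns false when the name was already present.
        .filter(|name| seen.insert(name.clone()))
        .collect()
}
```

Note that `trim_start_matches('#')` removes every leading `#`, so a name such as `##rust` also normalizes to `rust`.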
### `accept_note` return type change
`ActivityPubRepository::accept_note` currently returns `Result<(), DomainError>`. Change it to `Result<ThoughtId, DomainError>` so the handler has the ID it needs to call `attach_to_thought`.
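A simplified sketch of the signature change, under stated assumptions: the trait here is synchronous and takes a plain `&str`, whereas the real port is presumably async and takes the project's own note type. `ThoughtId`, `DomainError`, and `InMemoryApRepo` are hypothetical stand-ins defined only so the example compiles on its own.

```rust
/// Hypothetical stand-in for the domain's thought identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThoughtId(pub u64);

/// Hypothetical stand-in for the domain error type.
#[derive(Debug, PartialEq, Eq)]
pub enum DomainError {
    Storage(String),
}

/// Simplified, synchronous shape of the port after the change.
pub trait ActivityPubRepository {
    // Before: fn accept_note(&mut self, content: &str) -> Result<(), DomainError>;
    fn accept_note(&mut self, content: &str) -> Result<ThoughtId, DomainError>;
}

/// In-memory fake that hands out sequential IDs, for illustration only.
#[derive(Default)]
pub struct InMemoryApRepo {
    stored: Vec<String>,
}

impl ActivityPubRepository for InMemoryApRepo {
    fn accept_note(&mut self, content: &str) -> Result<ThoughtId, DomainError> {
        if content.is_empty() {
            return Err(DomainError::Storage("empty note".to_string()));
        }
        self.stored.push(content.to_string());
        // The caller now receives the ID of the row that was just stored.
        Ok(ThoughtId(self.stored.len() as u64))
    }
}
```

Returning the ID from the same call avoids a second lookup by the note's remote identifier just to find the row that was inserted.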
### Handler change
In `crates/adapters/activitypub/src/handler.rs`, after calling `accept_note`:
```rust
let thought_id = ap_repo.accept_note(...).await?;

// Extract hashtags from the AP `tag` array.
let hashtag_names: Vec<String> = note["tag"]
    .as_array()
    .map(|tags| {
        tags.iter()
            .filter(|t| t["type"].as_str() == Some("Hashtag"))
            .filter_map(|t| t["name"].as_str())
            .map(|name| name.trim_start_matches('#').to_lowercase())
            .filter(|name| !name.is_empty())
            .collect()
    })
    .unwrap_or_default();

// Attach each tag on a best-effort basis; errors are deliberately ignored.
for name in hashtag_names {
    if let Ok(tag) = tag_repo.find_or_create(&name).await {
        let _ = tag_repo.attach_to_thought(&thought_id, tag.id).await;
    }
}
```
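Tag failures are silenced: a failed `find_or_create` skips that tag (`if let Ok(...)`), and a failed `attach_to_thought` is discarded (`let _ = ...`). A tag attachment failure should not cause the entire note ingestion to fail.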
### Dependency injection
The AP handler struct gains a `tag_repo: Arc<dyn TagRepository>` field. It is wired in `crates/bootstrap/src/` alongside the existing handler dependencies.
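The injection and the best-effort attachment can be sketched together with a fake repository. Everything here is an assumption made for illustration: the trait is synchronous, `find_or_create` returns a bare `TagId` rather than a tag record with an `id` field, and `ApHandler`, `attach_hashtags`, `FakeTagRepo`, and `TagError` are hypothetical names.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThoughtId(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TagId(pub u64);
#[derive(Debug)]
pub struct TagError;

/// Simplified, synchronous stand-in for the real `TagRepository` port.
pub trait TagRepository: Send + Sync {
    fn find_or_create(&self, name: &str) -> Result<TagId, TagError>;
    fn attach_to_thought(&self, thought: &ThoughtId, tag: TagId) -> Result<(), TagError>;
}

/// Hypothetical handler holding the injected dependency.
pub struct ApHandler {
    tag_repo: Arc<dyn TagRepository>,
}

impl ApHandler {
    pub fn new(tag_repo: Arc<dyn TagRepository>) -> Self {
        Self { tag_repo }
    }

    /// Attaches each name best-effort; returns how many attachments succeeded.
    /// A failure for one tag never aborts the loop.
    pub fn attach_hashtags(&self, thought: &ThoughtId, names: &[String]) -> usize {
        names
            .iter()
            .filter(|name| {
                self.tag_repo
                    .find_or_create(name.as_str())
                    .and_then(|tag| self.tag_repo.attach_to_thought(thought, tag))
                    .is_ok()
            })
            .count()
    }
}

/// Fake repository that records attachments and fails for one configured name.
pub struct FakeTagRepo {
    failing_name: String,
    names: Mutex<Vec<String>>,
    pub attached: Mutex<Vec<(ThoughtId, TagId)>>,
}

impl FakeTagRepo {
    pub fn new(failing_name: &str) -> Self {
        Self {
            failing_name: failing_name.to_string(),
            names: Mutex::new(Vec::new()),
            attached: Mutex::new(Vec::new()),
        }
    }
}

impl TagRepository for FakeTagRepo {
    fn find_or_create(&self, name: &str) -> Result<TagId, TagError> {
        if name == self.failing_name {
            return Err(TagError);
        }
        let mut names = self.names.lock().unwrap();
        // Reuse the existing index if the name is known, otherwise append it.
        let idx = match names.iter().position(|n| n == name) {
            Some(i) => i,
            None => {
                names.push(name.to_string());
                names.len() - 1
            }
        };
        Ok(TagId(idx as u64 + 1))
    }

    fn attach_to_thought(&self, thought: &ThoughtId, tag: TagId) -> Result<(), TagError> {
        self.attached.lock().unwrap().push((*thought, tag));
        Ok(())
    }
}
```

Holding the dependency as `Arc<dyn TagRepository>` lets a test pass a fake like the one above, while the bootstrap crate passes the real implementation.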
---
## Files Changed
| File | Change |
|---|---|
| `crates/domain/src/ports.rs` | `ActivityPubRepository::accept_note` return type: `Result<(), DomainError>` → `Result<ThoughtId, DomainError>` |
| `crates/adapters/postgres/src/activitypub.rs` | Return `ThoughtId` from `accept_note` impl |
| `crates/adapters/activitypub/src/handler.rs` | Add `tag_repo` field; extract + attach hashtags after `accept_note` |
| `crates/bootstrap/src/factory.rs` | Inject `TagRepository` into AP handler |
---
## What This Does Not Cover
- Backfilling existing remote thoughts already in the DB (only new incoming notes get tagged)
- Updating tags when a remote Edit activity arrives for a previously accepted note
- Federated search (search still queries local thoughts only; this only fixes tag feeds)