docs: federated hashtag indexing design spec
# Federated Hashtag Indexing Design

**Date:** 2026-05-16
**Status:** Approved

## Problem
When a remote ActivityPub Note arrives via the inbox, `accept_note` stores the thought in the `thoughts` table (`local = false`) but never attaches hashtags. As a result, federated content is invisible to tag feeds: `/tags/rust` only shows local posts, even when remote servers have sent notes tagged `#rust`.

## Solution
After persisting the remote thought, extract hashtags from the Note's AP `tag` array and attach them using the existing `TagRepository` infrastructure.

---

## Design

### Hashtag source: AP `tag` array
AP Notes carry a structured `tag` array:
```json
[
  { "type": "Hashtag", "name": "#rust", "href": "https://mastodon.social/tags/rust" },
  { "type": "Mention", "href": "...", "name": "@alice" }
]
```

Filter entries where `type == "Hashtag"`, take `name`, strip the leading `#`, and lowercase the result. Do NOT run `domain::hashtag::extract()` on the raw content: remote content is often HTML, and the char-walker would produce false positives inside anchor text.
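The filtering rules above can be sketched as a pure function, which makes them easy to unit-test in isolation. This is a minimal sketch under stated assumptions, not the handler's actual code: `ApTag` and `hashtag_names` are hypothetical names, and the sketch assumes the `tag` array has already been deserialized into a struct (the handler code in this spec reads the raw JSON instead). It also drops duplicate names such as `#Rust` and `#rust`, which the rule as stated does not spell out.

```rust
/// Hypothetical, simplified mirror of one entry in a Note's AP `tag` array.
/// The real handler reads these fields straight from the Note's JSON.
#[derive(Debug, Clone)]
pub struct ApTag {
    /// The AP `type` field, e.g. "Hashtag" or "Mention".
    pub kind: String,
    /// The AP `name` field, e.g. "#rust"; optional so a malformed entry can omit it.
    pub name: Option<String>,
}

/// Keeps only `Hashtag` entries, strips leading `#` characters, lowercases,
/// drops empty names, and removes duplicates while preserving first-seen order.
pub fn hashtag_names(tags: &[ApTag]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for tag in tags.iter().filter(|t| t.kind == "Hashtag") {
        // Skip Hashtag entries that carry no `name`.
        let Some(raw) = tag.name.as_deref() else { continue };
        let name = raw.trim_start_matches('#').to_lowercase();
        // A linear scan is fine for the handful of tags a Note carries.
        if !name.is_empty() && !out.contains(&name) {
            out.push(name);
        }
    }
    out
}

fn main() {
    let tags = vec![
        ApTag { kind: "Hashtag".into(), name: Some("#rust".into()) },
        ApTag { kind: "Mention".into(), name: Some("@alice".into()) },
        ApTag { kind: "Hashtag".into(), name: Some("#Rust".into()) },
        ApTag { kind: "Hashtag".into(), name: Some("#".into()) },
        ApTag { kind: "Hashtag".into(), name: None },
    ];
    println!("{:?}", hashtag_names(&tags)); // prints ["rust"]
}
```

Keeping first-seen order (rather than sorting) is an arbitrary choice here; either works as long as each name is attached once.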
### `accept_note` return type change

`ActivityPubRepository::accept_note` currently returns `Result<(), DomainError>`. Change to `Result<ThoughtId, DomainError>` so the handler has the ID needed for `attach_to_thought`.
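To illustrate the port change, here is a minimal synchronous sketch. `ThoughtId`, `DomainError`, `InMemoryApRepo`, and the `note_url` parameter are hypothetical stand-ins: the real port is async, and its parameters are elided in this spec. The only point shown is that the adapter hands back the ID of the row it just stored instead of `()`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the domain's thought identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThoughtId(pub u64);

/// Hypothetical stand-in for the domain error type.
#[derive(Debug)]
pub enum DomainError {
    Storage(String),
}

/// Simplified, synchronous shape of the port after the change. The real
/// trait is async and its parameters are elided in this spec; `note_url`
/// is a hypothetical parameter used only for illustration.
pub trait ActivityPubRepository {
    fn accept_note(&self, note_url: &str) -> Result<ThoughtId, DomainError>;
}

/// In-memory adapter: IDs are 1-based positions in the Vec, mimicking a
/// serial primary key handed back by the database.
pub struct InMemoryApRepo {
    rows: Mutex<Vec<String>>,
}

impl InMemoryApRepo {
    pub fn new() -> Self {
        Self { rows: Mutex::new(Vec::new()) }
    }
}

impl ActivityPubRepository for InMemoryApRepo {
    fn accept_note(&self, note_url: &str) -> Result<ThoughtId, DomainError> {
        let mut rows = self
            .rows
            .lock()
            .map_err(|e| DomainError::Storage(e.to_string()))?;
        rows.push(note_url.to_string());
        // Return the new row's ID instead of `()`, so callers can attach tags.
        Ok(ThoughtId(rows.len() as u64))
    }
}

fn main() {
    let repo = InMemoryApRepo::new();
    let id = repo
        .accept_note("https://example.social/notes/1")
        .expect("insert failed");
    println!("{id:?}"); // prints ThoughtId(1)
}
```

If the Postgres adapter stores the Note with `INSERT ... RETURNING`, the ID can come back from the same statement. One caveat, in case the adapter deduplicates redelivered Notes with `ON CONFLICT DO NOTHING`: PostgreSQL's `RETURNING` yields no row for a skipped insert, so the existing thought's ID would need a separate lookup.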
### Handler change

In `crates/adapters/activitypub/src/handler.rs`, after calling `accept_note`:

```rust
// `accept_note` now returns the stored thought's ID.
let thought_id = ap_repo.accept_note(...).await?;

// Extract hashtags from the AP `tag` array. `note` is the incoming Note JSON
// (presumably a `serde_json::Value`); a missing or non-array `tag` yields no names.
let mut hashtag_names: Vec<String> = note["tag"]
    .as_array()
    .map(|tags| {
        tags.iter()
            .filter(|t| t["type"].as_str() == Some("Hashtag"))
            .filter_map(|t| t["name"].as_str())
            .map(|name| name.trim_start_matches('#').to_lowercase())
            .filter(|name| !name.is_empty())
            .collect()
    })
    .unwrap_or_default();

// `#Rust` and `#rust` normalize to the same name; dedupe so each tag is
// attached to the thought only once.
hashtag_names.sort();
hashtag_names.dedup();

// Best effort: a failed lookup/creation or attachment skips that tag only.
for name in hashtag_names {
    if let Ok(tag) = tag_repo.find_or_create(&name).await {
        let _ = tag_repo.attach_to_thought(&thought_id, tag.id).await;
    }
}
```

Tag failures are silenced (`if let Ok(...)` and `let _ = ...`): failing to find, create, or attach a tag should not cause the entire note ingestion to fail.
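The best-effort policy can be exercised with a test double. In this minimal sketch, `TagRepository`, `Tag`, `TagId`, `ThoughtId`, and `ApHandler` are simplified synchronous stand-ins (the real port is async and its exact signatures are not given here), and `FlakyTagRepo` is a hypothetical mock that refuses to create one tag name. It shows the handler holding the port as `Arc<dyn TagRepository>` and a failing tag being skipped while the others still attach.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical, synchronous stand-ins for the real (async) domain types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ThoughtId(pub u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TagId(pub u64);

#[derive(Debug, Clone)]
pub struct Tag {
    pub id: TagId,
    pub name: String,
}

#[derive(Debug)]
pub struct DomainError(pub String);

/// Simplified port; the real `TagRepository` signatures may differ.
pub trait TagRepository: Send + Sync {
    fn find_or_create(&self, name: &str) -> Result<Tag, DomainError>;
    fn attach_to_thought(&self, thought: &ThoughtId, tag: TagId) -> Result<(), DomainError>;
}

/// The handler holds the tag port behind `Arc<dyn TagRepository>`.
pub struct ApHandler {
    pub tag_repo: Arc<dyn TagRepository>,
}

impl ApHandler {
    /// Best-effort tagging: any find/create or attach failure is skipped, so
    /// note ingestion never fails because of a tag. Returns how many attached.
    pub fn attach_hashtags(&self, thought_id: &ThoughtId, names: &[String]) -> usize {
        let mut attached = 0;
        for name in names {
            let Ok(tag) = self.tag_repo.find_or_create(name) else { continue };
            if self.tag_repo.attach_to_thought(thought_id, tag.id).is_ok() {
                attached += 1;
            }
        }
        attached
    }
}

/// Hypothetical test double that refuses to create one configured tag name.
pub struct FlakyTagRepo {
    fail_on: String,
    tags: Mutex<HashMap<String, TagId>>,
    links: Mutex<Vec<(ThoughtId, TagId)>>,
}

impl FlakyTagRepo {
    pub fn new(fail_on: &str) -> Self {
        Self {
            fail_on: fail_on.to_string(),
            tags: Mutex::new(HashMap::new()),
            links: Mutex::new(Vec::new()),
        }
    }
}

impl TagRepository for FlakyTagRepo {
    fn find_or_create(&self, name: &str) -> Result<Tag, DomainError> {
        if name == self.fail_on {
            return Err(DomainError(format!("cannot create tag {name}")));
        }
        let mut tags = self.tags.lock().unwrap();
        // New tags get the next sequential ID; existing names reuse theirs.
        let next = TagId(tags.len() as u64 + 1);
        let id = *tags.entry(name.to_string()).or_insert(next);
        Ok(Tag { id, name: name.to_string() })
    }

    fn attach_to_thought(&self, thought: &ThoughtId, tag: TagId) -> Result<(), DomainError> {
        self.links.lock().unwrap().push((*thought, tag));
        Ok(())
    }
}

fn main() {
    let repo = Arc::new(FlakyTagRepo::new("broken"));
    let handler = ApHandler { tag_repo: repo.clone() };
    let names = vec!["rust".to_string(), "broken".to_string(), "fediverse".to_string()];
    let attached = handler.attach_hashtags(&ThoughtId(7), &names);
    println!("attached {attached} of {} tags", names.len()); // prints "attached 2 of 3 tags"
}
```

Returning the attached count is one way to keep skipped tags observable (for example, in a log line) without failing ingestion; the spec itself only requires that tag failures not propagate.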
### Dependency injection

The AP handler struct gains a `tag_repo: Arc<dyn TagRepository>` field. It is wired up in `crates/bootstrap/src/factory.rs` alongside the existing handler dependencies.

---

## Files Changed
| File | Change |
|---|---|
| `crates/domain/src/ports.rs` | `ActivityPubRepository::accept_note` return type: `() → ThoughtId` |
| `crates/adapters/postgres/src/activitypub.rs` | Return `ThoughtId` from `accept_note` impl |
| `crates/adapters/activitypub/src/handler.rs` | Add `tag_repo` field; extract + attach hashtags after `accept_note` |
| `crates/bootstrap/src/factory.rs` | Inject `TagRepository` into AP handler |

---

## What This Does Not Cover
- Backfilling existing remote thoughts already in the DB (only new incoming notes get tagged)
- Updating tags when a remote edit (an `Update` activity) arrives for a previously accepted note
- Federated search (search still queries local thoughts only; this only fixes tag feeds)