The architecture of an ActivityPub link aggregator

I'm getting closer to the final design of #brutalinks (né #littrme), the federated link aggregator that started my journey with ActivityPub, so I thought I'd write down a few ideas about it.

Phase 1: Synchronous application using vocabulary extensions

Initially the aggregator was planned to be a thin layer on top of ActivityPub collections, and all operations were done synchronously at user access time.

This worked well at first: the application didn’t need any local storage and everything was computed on the fly. But as we increased the complexity of the information we wanted to show users, it became more and more difficult to keep the latency of the resulting pages low.

The ActivityPub vocabulary is pretty extensive and allows for a lot of models of interaction. However, not all the information relevant to a link aggregator can be accessed easily.

The most important missing piece was the score for submissions and users.

In vanilla ActivityPub this is usually done by tracking the likes (and maybe dislikes) of these objects. However, the spec is not detailed enough in this respect. As an example, the /likes collection exists on objects and the /liked collection on actors, but they’re only supposed to track Likes, not Dislikes. This makes a meaningful score hard to compute, since aggregating the Dislike count needs a full inbox scan for all Dislike activities with the item as object.

Because of this, one option I considered for a long time was adding a custom extension to ActivityPub objects to hold this computed value in a property called score.

Another issue with the ActivityPub model is that the naive way of building a reply object is to set the parent of the current reply as its inReplyTo value. This is problematic if one wants to build a threaded model of the discussion, as the reply lacks context about anything higher up in the thread.

What we ended up doing was adding all the grand-parent objects to the inReplyTo property, which makes us incompatible with a lot of the Fediverse, where that property is expected to hold a single IRI. Additionally, we store the top object in the context property, which is a behaviour that was later duplicated by other “threadiverse” applications.

With this behaviour, the replies collections of the grand-parents will hold links to all their descendants, and to build a full thread one only needs to request the top post object’s replies.
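The reply-building rule described above can be sketched as follows, using plain Python dictionaries in place of real ActivityPub objects. The `build_reply` and `replies_of` helpers and the example IRIs are hypothetical and exist only for illustration; only the `inReplyTo` and `context` property names come from the vocabulary.

```python
def build_reply(reply_id, content, parent):
    """Build a reply whose inReplyTo lists the parent and all of its ancestors."""
    # The parent's inReplyTo may be missing, a single IRI, or a list of IRIs.
    ancestors = parent.get("inReplyTo") or []
    if isinstance(ancestors, str):
        ancestors = [ancestors]
    return {
        "id": reply_id,
        "type": "Note",
        "content": content,
        # Direct parent first, then every ancestor up to the top post.
        "inReplyTo": [parent["id"], *ancestors],
        # The top post of the thread: inherited from the parent, or the parent itself.
        "context": parent.get("context", parent["id"]),
    }


def replies_of(object_id, objects):
    """Emulate a replies collection: every object naming object_id in inReplyTo."""
    return [o["id"] for o in objects if object_id in o.get("inReplyTo", [])]


top = {"id": "https://example.com/o/1", "type": "Page", "name": "A link"}
first = build_reply("https://example.com/o/2", "First!", top)
second = build_reply("https://example.com/o/3", "Reply to first", first)

print(second["inReplyTo"])
print(replies_of(top["id"], [top, first, second]))
```

In this sketch the second-level reply shows up in the top post's replies, so a single request for that collection is enough to collect the whole thread.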

Phase 2: asynchronous independent application

As I’ve already mentioned, over time this synchronous method of modelling the link aggregator on top of ActivityPub proved to be much too slow, as the number of requests needed to compile an entire page was getting too large.

Recently we migrated to an asynchronous model, where the instance actor’s collections are scanned periodically for new activities and added to local storage.

Additionally, when actors get created locally, their collections also get added to the list of collections being scanned.

From all of these collections we build a local model, which includes local indexes that speed up searches and object filtering.
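As a rough illustration of this scanning model, here is a minimal sketch of a single pass over the tracked collections. Everything in it is an assumption made for the example: `LocalStore` is an in-memory stand-in for the real storage, `fetch_collection` is a hypothetical callable that returns the activities of a collection, and the only index kept is one keyed by object type.

```python
from collections import defaultdict


class LocalStore:
    """In-memory stand-in for the aggregator's local storage (hypothetical)."""

    def __init__(self):
        self.activities = {}                     # activity IRI -> activity
        self.by_object_type = defaultdict(set)   # object type -> activity IRIs

    def add(self, activity):
        """Store an activity once; return True only if it was new."""
        if activity["id"] in self.activities:
            return False
        self.activities[activity["id"]] = activity
        obj = activity.get("object")
        if isinstance(obj, dict) and "type" in obj:
            self.by_object_type[obj["type"]].add(activity["id"])
        return True


def scan(collections, fetch_collection, store):
    """Run one pass over the tracked collections; return how many activities were new."""
    added = 0
    for iri in collections:
        for activity in fetch_collection(iri):
            added += store.add(activity)
    return added


# A fake collection instead of real HTTP requests.
fake_inbox = [
    {"id": "https://example.com/a/1", "type": "Create",
     "object": {"type": "Page", "name": "A link"}},
    {"id": "https://example.com/a/2", "type": "Like",
     "object": "https://example.com/o/1"},
]
store = LocalStore()
tracked = ["https://example.com/inbox"]
print(scan(tracked, lambda iri: fake_inbox, store))  # first pass stores what it finds
print(scan(tracked, lambda iri: fake_inbox, store))  # a repeated pass adds nothing new
```

A periodic job would call `scan` on a timer and append the collections of newly created local actors to `tracked`.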

Data model

I've touched on the subject a little, but I want to go into more detail about how we model the link aggregator on top of the ActivityPub vocabulary. This model corresponds to the latest version of the code that we’re using, and as such doesn’t use any object types or extensions outside of vanilla ActivityPub.

An overview of the following detailed ideas can be seen at a glance on the concepts page.

User and Object scores

Currently we compute scores for objects by aggregating over the list of Like and Dislike activities for that object and applying the original Hacker News ranking algorithm for the hot sort.
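A minimal sketch of that computation, under stated assumptions: votes are counted as Likes minus Dislikes, and the ranking uses the commonly cited form of the early Hacker News formula, `(votes - 1) / (age_hours + 2) ** 1.8`. The function names, the gravity constant and the way votes are combined are illustrative and may differ from what the aggregator actually does.

```python
def vote_total(activities, object_id):
    """Likes minus Dislikes among the activities that target object_id."""
    total = 0
    for activity in activities:
        if activity.get("object") != object_id:
            continue
        if activity["type"] == "Like":
            total += 1
        elif activity["type"] == "Dislike":
            total -= 1
    return total


def hot_score(votes, age_hours, gravity=1.8):
    """Commonly cited early Hacker News ranking: (votes - 1) / (age + 2) ** gravity."""
    return (votes - 1) / (age_hours + 2) ** gravity


item = "https://example.com/o/1"
activities = [
    {"type": "Like", "object": item},
    {"type": "Like", "object": item},
    {"type": "Like", "object": item},
    {"type": "Dislike", "object": item},
    {"type": "Like", "object": "https://example.com/o/2"},  # a different item
]
votes = vote_total(activities, item)
print(votes, hot_score(votes, age_hours=1), hot_score(votes, age_hours=24))
```

With the same vote total, an older submission always ranks lower than a newer one, which is the property the hot sort relies on.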

For actors we defaulted to a simpler method, where we use only their /liked collection to compute a “positivity” score.

Federation model

We consider that in order to foster good quality communities the federation mechanism needs to be one which requires explicit opt-in from both ends of the relationship.

As such, brutalinks instances that want to federate with other brutalinks instances need to send an explicit Follow to that instance.

When the Follow gets accepted, all suitable activities of the followed instance get propagated to the current one.

When, in turn, the current instance receives such a Follow, it can Accept or Reject it.

If the follow relationship is mutual the communication will be bidirectional[1].

[1] The caveat to this bidirectionality is that when a remote user comments on an item of the current instance, that user gets appended to the list of recipients for future child comments, irrespective of whether their instance has a follow relationship with the current one.
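The opt-in handshake and the caveat from the footnote can be sketched like this. Both helpers are hypothetical and operate on plain dictionaries: `respond_to_follow` wraps an incoming Follow in an Accept or Reject, and `reply_recipients` builds the audience of a new comment from the accepted followers plus everyone who already commented in the thread.

```python
def respond_to_follow(follow, local_actor, accept):
    """Wrap an incoming Follow in an Accept or Reject addressed to its sender."""
    return {
        "type": "Accept" if accept else "Reject",
        "actor": local_actor,
        "object": follow,
        "to": [follow["actor"]],
    }


def reply_recipients(thread_authors, accepted_followers):
    """Recipients of a new comment: accepted followers plus thread participants."""
    recipients = list(accepted_followers)
    for author in thread_authors:
        # Participants are added whether or not their instance follows ours.
        if author not in recipients:
            recipients.append(author)
    return recipients


follow = {"type": "Follow",
          "actor": "https://remote.example/actors/instance",
          "object": "https://example.com/actors/instance"}
print(respond_to_follow(follow, "https://example.com/actors/instance", accept=True)["type"])

followers = ["https://remote.example/actors/instance"]
authors = ["https://other.example/actors/carol", "https://remote.example/actors/instance"]
print(reply_recipients(authors, followers))
```

In this sketch the commenter from `other.example` ends up in the recipient list even though no follow relationship exists with that instance.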

Basic mapping between link aggregator and ActivityPub concepts

For the purpose of this document we will map the following social media concepts to the ActivityPub vocabulary:

Instance and Service

An instance represents an Actor object on an ActivityPub service with the type Application or Service.

In the GoActivityPub libraries, we use the following conventions:

A Service actor represents an ActivityPub capable server (for our use case, it is required for it to be C2S capable). Its inbox can be used as a global inbox for the other actors belonging to it.
An Application actor represents a “frontend” to present the ActivityPub content to its users. It can be a web or standalone application.

An instance can belong to either of these two categories, but for the purpose of this document we will refer to the former as the “service” and to the latter as the “instance”.

Application actor

The link aggregator itself is represented by an Application actor. The collections for this actor will make up the bulk of the data that we use to build the main pages of the aggregator.

The main page is composed of all the Create activities whose objects have a type that we know how to represent, have a Name property, and are not replies to other objects. The types of these objects are Page for link submissions, Note and Article for textual submissions, and Video, Audio, Image for media submissions.

We support some additional filtering for building the other main pages:

the /self tab will contain only submissions coming from actors on the same instance as the aggregator itself.
the /federated tab will contain only submissions coming from actors on other instances.
the “discussions” filter is only for textual objects.
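The selection rules for the main page and its tabs can be sketched as a pair of predicates over plain dictionaries. This is only an illustration: the function names are hypothetical, actors are assumed to be given as IRI strings, and "same instance" is assumed to mean that the actor IRI shares its host with the instance IRI.

```python
from urllib.parse import urlsplit

LINK_TYPES = {"Page"}
TEXT_TYPES = {"Note", "Article"}
MEDIA_TYPES = {"Video", "Audio", "Image"}
KNOWN_TYPES = LINK_TYPES | TEXT_TYPES | MEDIA_TYPES


def is_submission(activity):
    """A Create of a known, named object that is not a reply."""
    obj = activity.get("object")
    return (
        activity.get("type") == "Create"
        and isinstance(obj, dict)
        and obj.get("type") in KNOWN_TYPES
        and bool(obj.get("name"))
        and not obj.get("inReplyTo")
    )


def is_local(activity, instance_iri):
    """Assumption: a local actor's IRI shares its host with the instance IRI."""
    return urlsplit(activity["actor"]).netloc == urlsplit(instance_iri).netloc


def listing(activities, instance_iri, tab="main"):
    """Filter activities for the main page or one of its tabs."""
    items = [a for a in activities if is_submission(a)]
    if tab == "self":
        items = [a for a in items if is_local(a, instance_iri)]
    elif tab == "federated":
        items = [a for a in items if not is_local(a, instance_iri)]
    elif tab == "discussions":
        items = [a for a in items if a["object"]["type"] in TEXT_TYPES]
    return items


activities = [
    {"type": "Create", "actor": "https://example.com/actors/alice",
     "object": {"type": "Page", "name": "A link", "url": "https://example.org/"}},
    {"type": "Create", "actor": "https://remote.example/actors/bob",
     "object": {"type": "Note", "name": "A discussion", "content": "Some text"}},
    {"type": "Create", "actor": "https://example.com/actors/alice",
     "object": {"type": "Note", "content": "A reply",
                "inReplyTo": ["https://example.com/o/1"]}},
]
instance = "https://example.com"
for tab in ("main", "self", "federated", "discussions"):
    print(tab, [a["object"]["name"] for a in listing(activities, instance, tab)])
```

The reply in the sample data is excluded from every tab because it has no name and carries an `inReplyTo` value.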

The service actor

The underlying ActivityPub server needs to be a Client to Server capable service where the initial actor for the BrutaLinks application can be created using OAuth2 dynamic user creation, or has been created manually and configured in BrutaLinks.

Currently the server that matches this is FedBOX, which in fairness was developed specifically for the purpose, and only progressively became useful as an independent ActivityPub service.

Users

A regular user represents an Actor on an ActivityPub service with the type Person. Its lifetime is most likely related to that of the Instance, which represents the intermediary for creating, modifying and possibly removing the user.

So the users on an instance of BrutaLinks are independent actors that can access and operate in the wider fediverse, but have their origin with the instance itself.

Moderation

In BrutaLinks the moderation is done with the help of the community.

The moderators of the instance can then operate on these requests. Currently those operations.

The members of the moderation team can be recognized by some predetermined tags. Currently we recognize two such tags: #sysop and #mod.
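A minimal sketch of that check, assuming the tags are attached to the actor through its `tag` property as objects with a `name`. Where the tags actually live is an assumption made for this example, and `is_moderator` is a hypothetical helper.

```python
MODERATION_TAGS = {"#sysop", "#mod"}


def is_moderator(actor):
    """True when any tag on the actor is named #sysop or #mod."""
    tags = actor.get("tag") or []
    return any(
        isinstance(tag, dict) and tag.get("name") in MODERATION_TAGS
        for tag in tags
    )


alice = {"type": "Person", "name": "alice", "tag": [{"name": "#mod"}]}
bob = {"type": "Person", "name": "bob", "tag": [{"name": "#golang"}]}
carol = {"type": "Person", "name": "carol"}
print([a["name"] for a in (alice, bob, carol) if is_moderator(a)])
```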

For additional technical details see the expanded document regarding moderation.