• Samuel ProulxA
    9 days ago

    So most modern ActivityPub servers backfill threads and profiles. My single-user instance processes 30,000 notes a day. If I were actually trying, I’m sure it’d be easy to grab much more while appearing well behaved.

    • r00ty@kbin.life
      8 days ago

      That’s not how ActivityPub works, at least on Lemmy/*bin servers. As far as I’ve ever seen, there is no API within ActivityPub that allows for this. (The Lemmy/*bin implementations do have the API that browsers and apps use, which must provide this, but that’s not ActivityPub.) The protocol actually looks to be cleverly designed to prevent it. It might look like backfilling is happening because old content appears, but there are reasons for that.

      Here is how it works, in my experience (I did some work on federation in kbin a year or so ago):

      • Instance A subscribes to community B hosted on Instance C.
      • Instance C notes this and does nothing. No previous content is sent, only future activities will be.
      • User on Instance D already subscribed to community B upvotes a comment on a post in community B.
      • Instance D sends the activity to Instance C.
      • Instance C sends the activity to Instance A.
      • Instance A gets the notice of the upvote, but realises it has no context for it. Luckily, the upvote carries the ID of the comment it relates to, so Instance A requests that comment from Instance C.
      • Instance A receives the response from Instance C. It turns out that comment was a reply to another comment, but it contains the ID of its parent. So Instance A requests the parent comment (and each parent above it, until it reaches the post).
      • By now Instance A has the like and every comment from the liked comment up to the post. These are saved to the database and will appear on the local system.
      • For each of the likes, comments and posts, if the user isn’t known locally, the profile is also fetched from their instance and stored locally.
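
To make the steps above concrete, here is a rough Python sketch of the parent-chain walk. It is not kbin or Lemmy code. The field names (`object`, `actor`, `attributedTo`, `inReplyTo`) are ActivityStreams properties, but the function name `backfill_from_activity`, the `fetch` callback, the `example` URLs and the use of a plain dict in place of real HTTP requests and a database are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Obj = Dict[str, object]


def backfill_from_activity(
    activity: Obj,
    fetch: Callable[[str], Obj],
    store: Dict[str, Obj],
) -> List[str]:
    """Fetch whatever is missing to give an incoming activity its context.

    Walks from the activity's object up the inReplyTo chain, and also
    fetches unknown user profiles. Returns the IDs fetched, in order.
    """
    fetched: List[str] = []

    def ensure(object_id: object) -> None:
        # Fetch and store an object only if it is not known locally.
        if isinstance(object_id, str) and object_id not in store:
            store[object_id] = fetch(object_id)
            fetched.append(object_id)

    ensure(activity.get("actor"))          # profile of whoever sent the like
    next_id = activity.get("object")       # the liked comment
    while isinstance(next_id, str) and next_id not in store:
        ensure(next_id)
        obj = store[next_id]
        ensure(obj.get("attributedTo"))    # profile of the author
        next_id = obj.get("inReplyTo")     # parent; missing on the post itself
    return fetched


# Hypothetical remote data standing in for Instances C and D.
REMOTE: Dict[str, Obj] = {
    "https://c.example/post/1": {
        "type": "Page", "attributedTo": "https://c.example/u/alice"},
    "https://c.example/comment/2": {
        "type": "Note", "attributedTo": "https://d.example/u/bob",
        "inReplyTo": "https://c.example/post/1"},
    "https://c.example/comment/3": {
        "type": "Note", "attributedTo": "https://c.example/u/alice",
        "inReplyTo": "https://c.example/comment/2"},
    # A sibling comment that no activity points at.
    "https://c.example/comment/4": {
        "type": "Note", "attributedTo": "https://c.example/u/alice",
        "inReplyTo": "https://c.example/post/1"},
    "https://c.example/u/alice": {"type": "Person"},
    "https://d.example/u/bob": {"type": "Person"},
}

like: Obj = {
    "type": "Like",
    "actor": "https://d.example/u/bob",
    "object": "https://c.example/comment/3",
}

local_store: Dict[str, Obj] = {}
fetched = backfill_from_activity(like, REMOTE.__getitem__, local_store)
print(fetched)
```

In this toy run the like on `comment/3` pulls in `comment/2`, `post/1` and the two profiles, but `comment/4` is never requested, because nothing in the chain points down to it. That matches the parent-but-not-children behaviour described above.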

      And so old posts and comments begin to appear as activities linked to them happen. But there isn’t a method to ask for “all the posts in community X” using ActivityPub. I remember because I was specifically looking for this a year or so ago. It lets you see the parent object but not any children.

      Maybe Mastodon etc. do it differently? No idea.

      And all of this is moot, because if I block a User Agent, or an AS number/IP block, they’re not getting anything, either by ActivityPub or by scraping, unless they change User Agent, AS number, or both.
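
As a rough illustration of that kind of blocking, here is a small Python sketch of the check a server or reverse proxy might apply to each request. It is only a sketch: the function name `is_blocked`, the `ExampleScraper` agent string and the address ranges (reserved documentation ranges) are made up, and it assumes the AS number has already been resolved to its IP prefixes, since that mapping needs external routing data.

```python
import ipaddress
from typing import Iterable


def is_blocked(
    user_agent: str,
    remote_ip: str,
    blocked_agents: Iterable[str],
    blocked_networks: Iterable[str],
) -> bool:
    """Return True if the request matches a blocked User Agent or IP block."""
    # Case-insensitive substring match on the User-Agent header.
    ua = user_agent.lower()
    if any(token.lower() in ua for token in blocked_agents):
        return True
    # Membership test against each blocked prefix (e.g. those of one AS).
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in ipaddress.ip_network(net) for net in blocked_networks)


# Hypothetical block lists.
BLOCKED_AGENTS = ["ExampleScraper"]
BLOCKED_NETWORKS = ["203.0.113.0/24", "2001:db8::/32"]

print(is_blocked("ExampleScraper/1.0", "192.0.2.10", BLOCKED_AGENTS, BLOCKED_NETWORKS))
print(is_blocked("Mozilla/5.0", "203.0.113.7", BLOCKED_AGENTS, BLOCKED_NETWORKS))
print(is_blocked("Mozilla/5.0", "192.0.2.10", BLOCKED_AGENTS, BLOCKED_NETWORKS))
```

The same check applies whether the request is an ActivityPub object fetch or a page scrape, which is the point being made: a client gets through only by changing its User Agent, its network, or both.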