With all the talk about content being lost as people delete their posts and comments and as subs go private or otherwise protest, I wondered whether people have an appetite for doing some work to move key bits of Reddit history over to kbin/Lemmy for posterity?
Some people were already doing archival of Reddit over at r/DataHoarder.
(This link does not give Reddit traffic.)
I was a member of the Civcraft community for a long time. We had a pretty extensive wiki set up within Reddit, and when they switched over to new Reddit our Wiki got nuked. We lost sooo much history and archival work for Civ/Ancap Minecraft in general. Just wanted to bring this up so hopefully you can try to safeguard against a similar situation.
Would it be possible for you to replicate the initial steps of this post to get a torrent of a Reddit archive and pull the things you’ve lost?
Don’t we already have one over at /m/Copy ?
Where is it? Can you give a link?
I think @Copy is how to link it.
Here’s also a web browser link - https://kbin.social/m/Copy
I have to admit I find all this talk of copying and backing up other people’s content to be a bit odd.
When reddit restores our content after we delete it, we get angry - it was my content to delete as I wish, how dare they reverse my decision? When our content is monetized for more than server costs+a bit of profit, we get angry - we made that content, we don’t mind you covering your costs and earning a living, but why do you get to get filthy rich from it? When our posts are quoted for pseudonews articles mining comments for opinions, we’re taken aback – excuse you, you could have at least asked if you could quote my story about my most embarrassing whathaveyou.
We get angry about these things because we feel a sense of ownership over what we post. And I don’t really see the difference here - this is another form of removing content from the control of its creators. If people want to copy their own stuff over, that’s fine! But I would never dream of removing that choice from the original poster.
Yes, reddit’s collective knowledge was valuable, and yes, there are some large-scale operations to preserve the internet in things like the wayback machine, the datahoarders backup, and other efforts… but that doesn’t mean we should also start a grassroots movement to disrespect ownership of content on top of all that’s already in place.
That’s a fair assessment. I guess it would be hard to store whatever was considered important without some sort of GDPR-style impact.
Too late for me I guess, I’ve already deleted all of my user content in order to “deflate” reddit.
But I still have fingers and a few neurons connecting so, onward to new content!
A dedicated instance sounds better than a magazine, though I’m not sure who’d be willing to take on the expense of hosting that volume of data.
The easiest grab method would be using the API, which leaves about a week to get dev approval and to copy all Reddit data via the API without getting banned.
Actually, even subreddits are affected by the 1000 indexing limit, IIUC. So we would have limitations on what content we could discover without an external source.
I guess we could grab from the pushshift torrents and use API access to grab as much as we can of the last couple of months? (Pushshift lost access at the start of May iirc so that’s where the gap would start.) Also getting stuff from subs still protesting as private would be a problem.
Basically not a fan of the API approach.
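To make the 1000-item limit mentioned above a bit more concrete, here is a rough Python sketch. It makes no real network calls: `make_fake_listing` is a hypothetical stand-in for a subreddit listing, and the cap of 1000 items and page size of 100 are assumptions based on the discussion here, not verified numbers. It only illustrates why cursor-style paging alone would not surface older posts.

```python
from typing import Callable, Iterator, List, Optional, Tuple

LISTING_CAP = 1000  # assumed: the "1000 indexing limit" discussed in the thread
PAGE_SIZE = 100     # assumed maximum number of items per page

Page = Tuple[List[str], Optional[str]]
Fetcher = Callable[[Optional[str], int], Page]


def crawl_listing(fetch_page: Fetcher) -> Iterator[str]:
    """Follow 'after' cursors until the listing stops handing out new ones."""
    after: Optional[str] = None
    while True:
        ids, after = fetch_page(after, PAGE_SIZE)
        yield from ids
        if after is None:
            break


def make_fake_listing(total_posts: int) -> Fetcher:
    """Hypothetical listing: only the first LISTING_CAP posts are reachable."""
    visible = [f"t3_{i}" for i in range(total_posts)][:LISTING_CAP]

    def fetch_page(after: Optional[str], limit: int) -> Page:
        # Resume just past the cursor, or from the top on the first call.
        start = 0 if after is None else visible.index(after) + 1
        chunk = visible[start:start + limit]
        has_more = start + limit < len(visible)
        next_after = chunk[-1] if (chunk and has_more) else None
        return chunk, next_after

    return fetch_page


if __name__ == "__main__":
    found = list(crawl_listing(make_fake_listing(25_000)))
    print(len(found), "of 25000 posts discoverable via the listing alone")
```

If the cap really works like this, anything older than the newest 1000 items in a listing would have to come from an external index such as the pushshift torrents.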
Yes! https://lemmit.online mirrors only posts, not comments.