Rose here. Also @umbraroze for non-kbin stuff.

  • 7 Posts
  • 37 Comments
Joined 1 year ago
Cake day: June 14th, 2023




  • Reddit has a user data checkout feature (IIRC, check the user settings or maybe the Reddit help pages to find it).

    It’s a bit crap though.

    It takes a long time to process, especially if you happened to post in the era when Reddit’s data infrastructure was horribly terrible instead of merely ordinarily terrible, and apparently in the worst cases it involves some manual work on the staff’s part.

    Some data may be missing or truncated. It doesn’t give you data from privated/banned subreddits (which was a fun thing to discover, because the last time I tried to do this the blackouts were on), and even for legit stuff, long comments/posts may be truncated. Beyond that, I’m pretty sure the dumps just straight up didn’t have all of my posts from several years ago, even ones on public subreddits. So you need to make sure the checked-out data is sensible.

    In conjunction with the official dumps, I recommend a few other tools, especially since the dumps aren’t really magnificently usable on their own. One tool I found personally invaluable is reddit-user-to-sqlite, which lets you import Reddit data dumps and the available live user data (I think it does this by scraping or something; I’m sure it worked despite the API being shut down) into an SQLite database. Datasette is a nice frontend for browsing the posts.
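    A minimal sketch of that sanity check in Python, assuming the dump’s comments come as a CSV with id and body columns, and that you’ve pulled a second copy of your comments (e.g. out of the SQLite database) into an ID-to-body mapping. The column names and helper functions are hypothetical, not the dump’s documented format or reddit-user-to-sqlite’s schema.

```python
import csv
import io


def load_bodies(csv_text: str, id_column: str = "id", body_column: str = "body") -> dict[str, str]:
    """Map item ID -> body text from a CSV export (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[id_column]: row.get(body_column) or ""
        for row in reader
        if row.get(id_column)  # skip rows with no ID at all
    }


def missing_ids(export: dict[str, str], other: dict[str, str]) -> set[str]:
    """IDs that the second source knows about but the official dump lacks."""
    return set(other) - set(export)


def truncated_ids(export: dict[str, str], other: dict[str, str]) -> set[str]:
    """IDs whose exported body is a strict prefix of the second source's body."""
    return {
        item_id
        for item_id, body in export.items()
        if item_id in other
        and len(body) < len(other[item_id])
        and other[item_id].startswith(body)
    }
```

    Read the real file with open(path, newline="", encoding="utf-8").read() and pass the text in. Anything in missing_ids never made it into the dump; anything in truncated_ids got cut short.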

    As for scrubbing, there are tools for that which are supposed to work. I think.



  • Yup. The robots.txt file is not only meant to block robots from accessing the site, it’s also meant to block bots from accessing resources that are not interesting for human readers, even indirectly.

    For example, MediaWiki installations are pretty clever in that, by default, /w/ is blocked and /wiki/ is encouraged. Nobody wants technical pages and page histories in search results; people only want the current versions of the pages.
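    To see how a well-behaved crawler reads rules like that, here’s a sketch using Python’s standard urllib.robotparser. The robots.txt content is a made-up MediaWiki-style example (not copied from any real wiki), and ExampleBot and wiki.example.org are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt in the MediaWiki style: article pages live under /wiki/,
# while the script path /w/ (edit views, page histories, diffs) is off limits.
ROBOTS_TXT = """\
User-agent: *
Allow: /wiki/
Disallow: /w/
"""


def build_parser(robots_txt: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser()
    for url in (
        "https://wiki.example.org/wiki/Main_Page",
        "https://wiki.example.org/w/index.php?title=Main_Page&action=history",
    ):
        # A polite crawler checks this before every request:
        # True for the /wiki/ article, False for the /w/ history page.
        print(url, "->", rp.can_fetch("ExampleBot", url))
```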

    Fun tidbit: in the late 1990s, there was a real epidemic of spammers scraping web pages for email addresses. Some people developed wpoison.cgi, a script whose sole purpose was to generate garbage web pages full of bogus email addresses. Real search engines ignored these pages, thanks to robots.txt. Guess what the spam bots did?

    Do the AI bros really want to go there? Are they asking for model collapse?
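    For the curious, a rough Python sketch of the wpoison idea: each generated page is a pile of bogus mailto: links plus links to more generated pages, so a harvester that ignores robots.txt never runs out of junk. This is not wpoison.cgi’s actual code; the /poison/ path is hypothetical, and you’d pair it with a “Disallow: /poison/” line in robots.txt so real search engines stay out. The addresses use the reserved .invalid top-level domain, so none of them can ever receive mail.

```python
import random
import string


def bogus_address(rng: random.Random) -> str:
    """Random user@domain under .invalid (reserved by RFC 2606, never deliverable)."""
    user = "".join(rng.choices(string.ascii_lowercase, k=8))
    domain = "".join(rng.choices(string.ascii_lowercase, k=6))
    return f"{user}@{domain}.invalid"


def poison_page(seed: int, n_addresses: int = 5, n_links: int = 3) -> str:
    """Build one garbage page; the same seed always yields the same page."""
    rng = random.Random(seed)
    lines = ["<html><body>"]
    for _ in range(n_addresses):
        addr = bogus_address(rng)
        lines.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    for _ in range(n_links):
        # Hypothetical trap path: each link leads to yet another generated page.
        lines.append(f'<a href="/poison/{rng.randrange(10**9)}">more</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)
```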


  • I’m, like, OK, nuclear power isn’t necessarily a bad thing.
    But power plants like that should probably serve wider municipal needs.

    Building a private nuclear power plant just to power a data center? Well, that’s clearly stupid.
    Building a private nuclear power plant just to power a data center focused on a niche application? Well, you know how that goes.

    Also, look up SL-1. Disturbingly few Americans I’ve talked to have heard of it. It’s a good general argument for why not every single thing should be powered by a tiny dedicated nuclear reactor.




  • Yeah, basically unsubbed from AvE over this too.

    I can’t remember who this was, but there was another engineering YouTuber who, during the pandemic, basically tweeted about being frustrated with the lockdowns from a business perspective, and whinged about being scared of talking about his political beliefs because, according to him, being anything right of a model leftist is a crucifiable offence on the bird site. And about how the horse paste actually works. I was like “…oh shit, maybe this dude is a magahatter?”


  • I used to watch iilluminaughtii several years ago, probably because I’ve been grabbing popcorn and enjoying watching people dunk on multi-level marketing since, uh, the 90s at least. Then I watched a video about a topic I was kind of in the middle of a deep dive into (I can’t remember which exactly. Elan School, probably?). And the video was bland as hell. And then I was like “yeah, most of these other videos are kind of forgettable, shallow pap too”.

    …and this year we found out about the whole landlordy corporate town fancier backstabby financial abuser helicopter-CEO situation. And the content mill situation. And the plagiarism thing. Can’t forget the plagiarism thing. …I was like, “oh, this all just makes sense now.”







  • I’m from Finland. This is how it usually goes in the winter:

    During the 2 hours of daylight we get at this latitude:

    • Ooooooh this is pretty
    • Bet I can get some nice photographs
    • …or I would, if the sky wasn’t overcast goddamn it

    Other times:

    • Rummaging through the closet for wool socks and more clothing
    • Put on the headphones, hit the metal music collection on my Nokia, and face the Darkness with a grim stare
    • Would hit the beer, but not in this economy

  • I literally just looked at Reddit for the first time in ages.

    What the fuck.

    Here’s the thing: Reddit’s UI design has always been shitty. Old Reddit was fucking garbage, so admins cheerfully asked the RES (Reddit Enhancement Suite) folks to fix their shit. (Instead of, you know, hiring them.) New Reddit? Always been shit, and nobody’s going to fix it.

    This Newer New Reddit? I… I don’t think they even know at this point. What. What’s going on.

    If they ask the community for critique, some AI bot will AI-pat the admin’s arse and AI-splain to the remaining AI-users that things will be just fine. (Now, “things actually getting better” has literally never happened as far as Reddit or its user interface is concerned, as you should well know if you’ve ever been a human Reddit user.)



  • I was a Slashdot user.

    People kept hyping Digg as a Slashdot replacement, but trying to submit posts there was actually even more futile in practice than trying to submit articles to the Slashdot editors. A much bigger hivemind, too. Boring, unfunny comment section.

    When I first joined Reddit, it seemed like it was mostly populated by Slashdot refugees. Just people posting awesome shit. Great riveting discussions, even before anyone actually read the articles. That sort of stuff.


  • It depends on the type of account, but here are some common ways this might happen:

    • The attacker could be straight up guessing the password. (One possible way to mitigate this: the website can go “wow, 10 failed login attempts from that source. I’m going to ignore all attempts from there for 24 hours.”)
    • The attacker could be using previously exposed passwords. (One possible way to mitigate this: websites should immediately require a password reset for all users when that kind of data breach happens. For users: never use the same password for multiple services, and certainly never reuse a compromised password, even for a different service. Also: haveibeenpwned.com)
    • An attacker on the same network could hijack the session. (This was a really huge problem back in the day. In this day and age, websites should be using HTTPS, which limits this a great deal. It’s still possible if the site doesn’t use HTTPS, and through some other vectors, e.g. malware or hijacked network hardware.)
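    The first mitigation could look roughly like this: a minimal in-memory Python sketch, assuming the “source” is identified by something like an IP address. The class and its parameters are hypothetical, and a real site would also need shared storage across servers and some thought about attackers spreading their attempts over many addresses.

```python
import time
from collections import defaultdict
from typing import Callable


class LoginThrottle:
    """Lock out a source (e.g. an IP address) after too many failed logins.

    Simplification: failures are counted until a successful login or until a
    lockout expires; there is no sliding time window.
    """

    def __init__(
        self,
        max_failures: int = 10,
        lockout_seconds: float = 24 * 60 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock
        self.failures: defaultdict[str, int] = defaultdict(int)
        self.locked_until: dict[str, float] = {}

    def is_blocked(self, source: str) -> bool:
        until = self.locked_until.get(source)
        if until is None:
            return False
        if self.clock() >= until:
            # Lockout expired: forget this source and start counting afresh.
            del self.locked_until[source]
            self.failures.pop(source, None)
            return False
        return True

    def record_failure(self, source: str) -> None:
        self.failures[source] += 1
        if self.failures[source] >= self.max_failures:
            self.locked_until[source] = self.clock() + self.lockout_seconds

    def record_success(self, source: str) -> None:
        self.failures.pop(source, None)
```

    Passing in a fake clock makes the 24-hour lockout testable without actually waiting a day.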

    Also: malware is a really big, scary problem in that its authors are rarely targeting you specifically. Why do that, when they can hit a million people at the same time and sift through the stolen data for the most valuable stuff, right?
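    On the previously exposed passwords point: haveibeenpwned.com also runs a Pwned Passwords range API that uses k-anonymity, so you only send the first five hex characters of the password’s SHA-1 hash and do the final comparison locally. A minimal Python sketch, assuming that endpoint’s “SUFFIX:COUNT” response format; the User-Agent string is a placeholder.

```python
import hashlib
import urllib.request

# Only the 5-character hash prefix is ever sent to this endpoint.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA-1 digest: 5 and 35 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body: str, suffix: str) -> int:
    """Look up our suffix in a 'SUFFIX:COUNT' listing; 0 means not found."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def times_pwned(password: str) -> int:
    """Ask the range API how often this password appears in known breaches."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        # Placeholder client name; identify your own tool here.
        headers={"User-Agent": "example-password-checker"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(body, suffix)
```

    A nonzero result from times_pwned(...) means the password is in a known breach corpus and shouldn’t be used anywhere.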


  • I was about to say “this reminds me of the Hot Dog Stand”.

    …but someone actually made Hot Dog Stand. Shit.

    Look, I’m a Linux nerd, and there are very few things that scare me. Linux kernel programmers, maybe - you don’t meddle with them unless the hour is truly dire and we form a delegation to seek their aid after a complex debate as the world burns around us and we climb their mountain together. …And the other thing that scares me is certain particular brands of Microsoft ultra fans, for thereover lies madness like we have not seen before.