We’ve upgraded lemmy.world to 0.18.1-rc.1 and rolled back that upgrade because of issues.

(If you posted anything in the 10 minutes between the upgrade and the rollback, that post is gone. Sorry!)

The main issue we saw was that users couldn’t log in anymore. Existing sessions still worked, but new logins failed from macOS, iOS and Android (from Linux and Windows they worked).

Also new account creation didn’t work.

I’ll create an issue for the devs and retry once it’s fixed.

Edit: Contacted the devs; they told me to try again with lemmy-ui at version 0.18.0. Will try again, brace for some downtime!

Edit 2: So we upgraded again, and it seemed to work nicely! But then it slowed down so much that it was unusable. There were many locks in the database, and people reported many JSON errors. Sorry, we won’t be on 0.18.1 any time soon, I’m afraid…

  • rrobin@lemmy.world
    arrow-up
    32
    ·
    1 year ago

    As any engineer who does ops can tell you - you did the right thing - the solution is always to roll back, never force a roll forward, ever.

    We should totally do pre and post update parties though. Even if the update fails we can have an excuse for drinks and a fun thread.

    • T156@lemmy.world
      arrow-up
      4
      ·
      1 year ago

      Although since we seem to be rolling more than a ship in a storm, I think a proportion of lemmings would end up hospitalised for alcohol poisoning.

  • OsrsNeedsF2P@lemmy.ml
    arrow-up
    18
    ·
    1 year ago

    Both Dessalines and Nutomic have been working their butts off to get 0.18.x ready for the Reddit API changes. Huge hopes they can pull through!

    Dessalines:

    Nutomic:

    • imaqtpie@sh.itjust.works
      arrow-up
      5
      arrow-down
      2
      ·
      1 year ago

      Yeah they’ve been hard at work all month. But it’s also okay if things aren’t ready in time. Most of the people who matter are already here.

      Maybe we will blow up soon, maybe later, but the quality of content here is sufficient to drive growth regardless of whether or not we get the prophesied huge migrations from Reddit.

      • rambaroo@lemmy.world
        arrow-up
        1
        ·
        edit-2
        1 year ago

        It’s adequate the way it is now. They should hold off on any major upgrades until after the next batch of new users comes in. Lemmy breaking during that period would be far worse than missing some updates and new features.

      • thegreatgarbo@lemmy.world
        arrow-up
        1
        ·
        1 year ago

        Agreed. I’m here to stay with the content from the last couple of weeks. The 3-day poop embargoes alone are worth it. And recognizing users because the community is still small enough is SO different from they who shall not be named.

    • egeres@lemmy.world
      arrow-up
      2
      ·
      1 year ago

      Oh god, so much coffee… it’s also thrilling to get a sense of what’s going on under the hood of such big social networks at a development level (not that I could understand it, but it was very interesting to see Twitter’s recommendation algorithm being open-sourced)

    • Ruud@lemmy.worldOPM
      arrow-up
      3
      ·
      1 year ago

      Yeah we have a test instance, but not sure if we could test with this kind of load…

    • Ulu-Mulu-no-die@lemmy.world
      arrow-up
      1
      ·
      1 year ago

      There are already 3 test instances where devs push updates before going live, but:

      • if not many people go there to test, they won’t catch all the bugs.
      • in this specific case, admins are trying to push a new release candidate (that is not officially released yet) because some people are stuck with mobile apps that don’t work anymore on the version we have now.
  • Jerti@lemmy.world
    arrow-up
    15
    ·
    1 year ago

    UI: 0.18.0 BE: 0.18.1-rc.1 👍

    seems to be working for me, I was already logged on

    • Ruud@lemmy.worldOPM
      arrow-up
      13
      ·
      1 year ago

      Because of a few things: really annoying bugs, and the Jerboa app not working properly with older versions.

    • dragontamer@lemmy.world
      English
      arrow-up
      9
      ·
      edit-2
      1 year ago

      Federation is completely borked with .18 servers. It’s very difficult for us to interact with https://lemmy.ca in any way: subscribing, upvoting, comments, posting… it’s all bugged.

      It’s maybe not that big a deal because Lemmy.world “has the most users”, so in some regards it’s https://lemmy.ca’s loss, but… we need to restore reliable federation… especially before the July 1st rush IMO.

      The .17 to .18 upgrade is basically a soft-defederation event, because of whatever this bug is between the two versions.

      • FearTheCron@lemmy.world
        English
        arrow-up
        2
        ·
        1 year ago

        Ah that explains it. Someone posted a cool photo to my community from lemmy.ca but didn’t interact further. Looks like my comment didn’t even show up on their end.

        Anyway, thanks to everyone working on the issue. I know these things aren’t easy.

        • dragontamer@lemmy.world
          English
          arrow-up
          2
          ·
          1 year ago

          I’m subscribed to lemmy.ca and programming.dev, both of which are .18 now. Feels kinda bad losing access to those communities while this issue is getting worked out…

    • twistedtxb@lemmy.world
      arrow-up
      7
      ·
      1 year ago

      I suspect there will be a large influx of new users in two days, and that having Jerboa not working on .world might cause a few issues

    • Antik 👾@lemmy.world
      arrow-up
      6
      ·
      1 year ago

      For example the frontpage is no longer being constantly filled with random posts when you set the filter to “active” or “top day”.

    • axzxc1236@lemmy.world
      arrow-up
      6
      ·
      edit-2
      1 year ago

      Reasons I can think of:

      1. The official Android client for Lemmy, Jerboa, only supports 0.18 and later, unless users download an older version from GitHub and sideload it manually.

      2. Sorting is broken pre-0.18: new posts keep flowing in.

      3. Performance improvement by removing WebSockets from Lemmy (which fixes 2, and which is why 1 happens).

      • JdW@lemmy.world
        arrow-up
        2
        arrow-down
        1
        ·
        1 year ago

        Jerboa works fine with lemmy.world; it just gives a warning and crashes on occasion. Not an issue to use it though.

        • average650@lemmy.world
          arrow-up
          1
          ·
          1 year ago

          The latest version only supports 0.18 because the backend works differently. Older versions of Jerboa support older backends of lemmy.
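
          A version gate like that can be sketched as follows. This is Python with hypothetical names, not Jerboa’s actual code (Jerboa is an Android app); the 0.18.0 minimum is taken from this thread, and ignoring pre-release suffixes such as -rc.1 is an assumption made to keep the sketch short.

```python
import re


def parse_version(version):
    """Return (major, minor, patch) from strings like '0.18.1-rc.1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    # A missing patch number counts as 0; anything after it is ignored.
    return tuple(int(part or 0) for part in match.groups())


def is_supported(server_version, minimum="0.18.0"):
    """True if the server is at least `minimum` (hypothetical client-side check)."""
    return parse_version(server_version) >= parse_version(minimum)


for v in ("0.17.4", "0.18.0", "0.18.1-rc.1"):
    print(v, is_supported(v))
```

          With a check like this, a client could show a "server too old" message up front rather than sending requests the backend does not understand.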

        • astanix@lemmy.world
          arrow-up
          0
          ·
          1 year ago

          Likely because Jerboa is written as a side project of the Lemmy devs, so it’s always going to be cutting edge.

          • Graphine@lemmy.world
            arrow-up
            1
            ·
            1 year ago

            I understand that. I’m not complaining about the quality. I’m just confused from a technical perspective why it doesn’t support rollback or older server versions in the event of…this.

            • Darorad@lemmy.world
              arrow-up
              1
              ·
              1 year ago

              To fix the issues with posts appearing and scrolling the page, the backend needed some pretty drastic rewrites. I’m not super familiar, but from what I’ve seen of the code, it would be a decent amount more work to support both versions.

              Would it have been worth it? Yes, but it wasn’t anticipated that devs would stay on 0.17 for more than a day or two. With the time it takes for app stores to update, servers would have been updated before lemmy 0.0.45 was updated for the vast majority of users. At most, it would be a day or two instead of a week or two.

      • WhiskyTangoFoxtrot@lemmy.world
        arrow-up
        1
        ·
        1 year ago

        The official Android client for Lemmy, Jerboa, only supports 0.18 and later, unless users download an older version from GitHub and sideload it manually.

        Even that won’t work, because Jerboa initially tries to connect to lemmy.ml which is running 0.18, which older versions of Jerboa aren’t compatible with. The app just crashes instantly without giving you the opportunity to log in to your instance.

    • woelkchen@lemmy.world
      arrow-up
      1
      ·
      1 year ago

      Just curious, why are we updating now instead of waiting for the proper 0.18.1 release?

      A Release Candidate is supposed to be past beta testing already and in a state of no major bugs.

    • Joe B@lemmy.world
      arrow-up
      1
      ·
      1 year ago

      Because if you are on the homepage for a little while, new posts start pushing the old posts down, and it keeps doing it. Something about websockets that .18 fixes. I’m assuming the admins want to get it out now so people can stop complaining about it. I get it though: just wait for the .18 instead of the rc.1, but users are impatient and can’t wait for sh*t!

  • naneek@lemmy.world
    arrow-up
    12
    ·
    edit-2
    1 year ago

    Thanks for the transparency. Maybe it’s a good idea to have a test instance and some test cases/validation done there before updating the main instance. This is a regular process in any software/tech company/stack.

    Testing should never be done directly on the prod instance.

      • Antik 👾@lemmy.world
        arrow-up
        4
        ·
        1 year ago

        Correct, there was a test instance and that worked fine, but once the live instance was upgraded everything became very slow. Can’t really emulate that kind of load.

  • Nintendo@lemmy.world
    arrow-up
    10
    ·
    1 year ago

    Appreciate the transparency. How are things looking on the back end of lemmy.world (server-wise)? Will we get to a point where a botched update won’t require a complete rollback of the state?

    • Dandroid@lemmy.world
      arrow-up
      1
      ·
      1 year ago

      Not having to do a database rollback is a really, really hard problem to solve, and it would almost certainly need to be solved on the Lemmy developers’ side, not the server owner’s side. And if I’m them, that’s a low-priority issue, and probably not something I even think about until 1.0.

      Basically, they write code that says what to do in the event of a database version change. Usually this only handles upgrade cases, because that’s what’s happening most of the time. As an example of something you might do in a db upgrade, let’s say you had a column where the data type was only numbers, but now you want to allow any alphanumeric character for some reason. You could have a line of code that converts the number to a string.

      Okay, but now you need to go back to the previous version. Your db change code runs, but it’s the old version of the db change code, not some new version that you wrote. You unfortunately didn’t have a crystal ball when you wrote this code and couldn’t predict that you were going to change the data to strings, so you didn’t write code to change it from a string to a number.

      This is why most software doesn’t support downgrades unless you wipe first. For example, if you updated your aging MacBook to the latest macOS version, then realized it slows down your laptop too much, you can only go back if you first wipe your laptop in the process. So it’s easier to just take a snapshot before an upgrade and revert to the snapshot if it fails. Some folks will even schedule “maintenance” time during the upgrade, in which the whole system goes down for a short time, so they don’t have to risk losing data written after the snapshot.
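
      To make the column example concrete, here is a minimal sketch in Python with sqlite3. The table and column names (item, code) and the upgrade/downgrade functions are made up for illustration and are not Lemmy’s actual migration code. The upgrade always succeeds, but the downgrade has to refuse as soon as a single non-numeric value has been written.

```python
import sqlite3


def upgrade(conn):
    """Forward migration: turn the numeric `code` column into text."""
    conn.execute("ALTER TABLE item RENAME TO item_old")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, code TEXT)")
    conn.execute("INSERT INTO item SELECT id, CAST(code AS TEXT) FROM item_old")
    conn.execute("DROP TABLE item_old")
    conn.commit()


def downgrade(conn):
    """Reverse migration: only possible while every value is still numeric.

    Assumes non-negative integer codes, to keep the check simple.
    """
    rows = conn.execute("SELECT code FROM item").fetchall()
    bad = [code for (code,) in rows if not str(code).isdigit()]
    if bad:
        raise ValueError(f"cannot convert back to numbers: {bad}")
    conn.execute("ALTER TABLE item RENAME TO item_new")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, code INTEGER)")
    conn.execute("INSERT INTO item SELECT id, CAST(code AS INTEGER) FROM item_new")
    conn.execute("DROP TABLE item_new")
    conn.commit()


# Hypothetical data: one row written before the upgrade, one after it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, code INTEGER)")
conn.execute("INSERT INTO item (code) VALUES (42)")
upgrade(conn)
conn.execute("INSERT INTO item (code) VALUES ('A7')")  # only valid in the new schema
try:
    downgrade(conn)
except ValueError as err:
    print("rollback needs a snapshot instead:", err)
```

      Even when someone does write the reverse step, data created under the new schema may have no representation in the old one, which is why restoring a snapshot is the usual fallback.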

  • Sterben@lemmy.world
    arrow-up
    8
    ·
    1 year ago

    I am glad I didn’t write my last post during the upgrade process.

    Maybe next time, give a warning, or maintenance notice.

    Thanks for trying though.

  • Guy_Fieris_Hair@lemmy.world
    arrow-up
    7
    ·
    1 year ago

    Suddenly my Jerboa won’t change from local and I can’t change the sort. Not sure if this is related, but it was working fine yesterday after I updated the app. But no longer this morning.

    • Sockenklaus@sh.itjust.works
      arrow-up
      6
      ·
      1 year ago

      What you’re describing is an issue that was introduced with Jerboa 0.0.36-alpha and has been fixed in 0.0.37-alpha, released today.

    • jennwiththesea@lemmy.world
      arrow-up
      2
      ·
      1 year ago

      Mine randomly quit working a few days ago. All I get is a blank screen. I’ve switched to the website for now, though I need to look into the other app options. There are a couple more now.

      • dingus@lemmy.world
        arrow-up
        1
        ·
        1 year ago

        Wefwef.app is pretty interesting. It’s browser-based and so far seems more stable to me than a lot of the current Android apps. Not sure what happened, but Jerboa and the like have all become nearly unusable because of bugs and crashes.

        Wefwef seems to unfortunately be lacking some very basic features tho.

  • snargledorf@lemmy.world
    arrow-up
    6
    ·
    1 year ago

    @ruud@lemmy.world just wondering if you have considered setting up a second, beta, instance of lemmy.world open to the public?

    The performance issues with 0.18.1 have highlighted that there needs to be a way to stress test these updates before applying them to the main instance.

    • Ruud@lemmy.worldOPM
      arrow-up
      6
      ·
      1 year ago

      Yes, considering that. But we’ll need people to use it when we do testing…

        • med@sh.itjust.works
          arrow-up
          2
          ·
          1 year ago

          One more. You find a stable way to notify about upgrades and get a test sheet to run through, and we can generate posts and activity to help test with.

          Light the beacons! lemmy.world calls for aid!

      • Finnagain@lemmy.world
        arrow-up
        2
        ·
        1 year ago

        I’m not familiar with what the server architecture looks like, but is there a possibility of using a load balancer in front of the instance’s server and swapping a “beta” server into the load balancer when you need to do testing? You could basically migrate your traffic with zero downtime, assuming Lemmy’s architecture allows for it.

        • Ruud@lemmy.worldOPM
          arrow-up
          1
          ·
          1 year ago

          Well that doesn’t differ much from what I do. I just copy the files to a second directory and test with that. Easy rollback. Downside is, that all data is lost between upgrade and rollback, which will be the same in the scenario you suggest.
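
          A rough sketch of that copy-and-roll-back flow in Python. The function name, the directory layout, and the upgrade/healthy callbacks are assumptions for illustration, not the actual lemmy.world scripts. It also shows the downside: whatever was written after the copy is gone once the backup is restored.

```python
import shutil
import tempfile
from pathlib import Path


def upgrade_with_rollback(live, backup, upgrade, healthy):
    """Snapshot `live` into `backup`, run `upgrade`, restore the snapshot on failure."""
    live, backup = Path(live), Path(backup)
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(live, backup)      # snapshot taken before the upgrade
    try:
        upgrade(live)
        ok = healthy(live)
    except Exception:
        ok = False
    if not ok:
        shutil.rmtree(live)            # drop the broken state...
        shutil.copytree(backup, live)  # ...and put the snapshot back
    return ok


# Hypothetical demo: the "upgrade" adds a post, then the health check fails.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live"
    live.mkdir()
    (live / "posts.txt").write_text("old post\n")

    def fake_upgrade(path):
        with open(path / "posts.txt", "a") as f:
            f.write("post written after the upgrade\n")

    ok = upgrade_with_rollback(live, Path(tmp) / "backup", fake_upgrade, lambda p: False)
    print(ok, (live / "posts.txt").read_text())
```

          After the failed health check, posts.txt holds only the content from before the copy, which is the data loss described above.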

          • Finnagain@lemmy.world
            arrow-up
            1
            ·
            1 year ago

            Ah, that’s fair. Best of luck either way. This is the tough part of admin work. FWIW, the stability we’ve had thus far has been pretty impressive!