At the time of writing, Lemmyworld has the second-highest number of active users of all Lemmy instances.

Also at the time of writing, Lemmyworld has >99% uptime.

By comparison, other lemmy instances with as many users as Lemmyworld keep going down.

What optimizations has Lemmyworld made to its hosting configuration that have made it more resilient than other instances?

See also "Does Lemmy cache the frontpage by default (read-only)?" on !lemmy_support@lemmy.ml.

  • wheen@lemmy.world · 1 year ago

    Can none of this scale horizontally? Every mention of scaling has been just “throw a bigger computer at it”.

    We’re already running into issues with the bigger servers being unable to handle the load. Spinning up entirely new instances technically works, but is an awful user experience and seems like it could be exploited.

    • PriorProject@lemmy.world · 1 year ago

      It’s important to recall that last week the biggest lemmy server in the world ran on a 4-core VM. Anybody who says you can scale from that to Reddit size overnight with “horizontal scaling” is selling snake oil. Scaling is hard work and there aren’t really any shortcuts. Lemmy is doing pretty well on the curve of how systems tend to handle major waves of adoption.

      But that’s not your question, you asked if Lemmy can horizontally scale. The answer is yes, but in a limited/finite way. The production docker-compose file that many lemmy installs are based on has 5 components. From the inside out, they are:

      • Postgres: The database, stores most of the data for the other components. Exposes a protocol to accept and return SQL queries and responses.
      • Lemmy: The application server, exposes websockets and http protocols for lemmy clients… also talks to the db.
      • Lemmy-ui: Talks to Lemmy over websockets (for now, they’re working to deprecate that soon) and does some fancy dynamic webpage construction.
      • Nginx: Acts as a web proxy. Handles HTTPS encryption and compression over the wire, and could potentially do some static asset caching of images, but I didn’t see that configured in my skim of the config.
      • Pict-rs: Some kind of image-hosting server.

      So… first off, there are 5 layers there that talk to each other over the Docker network, so you can definitely use 5 computers to run a lemmy instance. That’s a non-zero amount of horizontal scaling. Of those layers, I’m told that lemmy and lemmy-ui are stateless and you can run an arbitrary number of them today. There are ways of scaling nginx using round-robin DNS and other load-balancing mechanisms. So 3 out of the 5 layers scale horizontally.
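      Because lemmy and lemmy-ui are stateless, any replica can serve any request, which is what makes plain round-robin load balancing workable. The Python sketch below only illustrates that rotation; the backend names (`lemmy-1` etc.) are hypothetical, and in a real deployment this job would be done by nginx or round-robin DNS rather than by application code.

```python
import itertools
from typing import Iterator, Sequence


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation.

    This is only safe because the backends are assumed to be stateless:
    no request depends on having hit the same replica before.
    """

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._cycle: Iterator[str] = itertools.cycle(backends)

    def next_backend(self) -> str:
        # Return the next replica in the rotation, wrapping around at the end.
        return next(self._cycle)


if __name__ == "__main__":
    # Hypothetical replica names; any number of stateless lemmy (or lemmy-ui)
    # containers could be listed here.
    balancer = RoundRobinBalancer(["lemmy-1", "lemmy-2", "lemmy-3"])
    for request_id in range(5):
        # Prints lemmy-1, lemmy-2, lemmy-3, lemmy-1, lemmy-2 in turn.
        print(request_id, "->", balancer.next_backend())
```

      Adding capacity then just means appending another name to the list; nothing has to be migrated, because no replica holds state the others lack.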

      Pict-rs does not. It can be backed by object storage like S3, and there are lots of object storage systems that scale horizontally, but pict-rs itself still seems to need to be a single instance. Still, that’s just one part of lemmy, and you can throw it on a giant multicore box backed by scalable object storage. Should take you pretty far.

      Which leaves postgres. Right now I believe everyone is running a single postgres instance and scaling it bigger, which is common. But postgres has ways to scale across boxes as well. It supports “read-replicas”, where the “main” postgres copies data to the replicas and they serve reads so the leader can focus on handling just the writes. Lemmy doesn’t support this kind of advanced request routing today, but Postgres is ready for when it does. In the far future, there’s also sharding writes across multiple leaders, which is complex and has its downsides but can scale writes quite a lot.
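      Since lemmy doesn’t do this routing today, here is only a rough Python sketch of what read-replica routing could look like, under the simplifying assumption that a statement starting with SELECT is a read and everything else is a write. The connection names (`primary`, `replica-1`, `replica-2`) are hypothetical placeholders, not lemmy or postgres settings. The `shard_for` function illustrates the far-future sharding idea by hashing a key (for example a community name, also hypothetical) to pick a write leader.

```python
import hashlib
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads across replicas.

    Simplifying assumption: a statement is a read if it starts with SELECT.
    Anything else (including CTEs that start with WITH) conservatively goes
    to the primary, as does SELECT ... FOR UPDATE, which takes row locks.
    """

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Fall back to the primary if no replicas are configured.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def route(self, sql: str) -> str:
        upper = sql.lstrip().upper()
        if upper.startswith("SELECT") and "FOR UPDATE" not in upper:
            return next(self._replicas)
        return self.primary


def shard_for(key: str, leaders: Sequence[str]) -> str:
    """Pick a write leader by hashing a key (hypothetical sharding scheme).

    Uses SHA-256 instead of Python's built-in hash(), which is randomized
    per process for strings, so the mapping is stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(leaders)
    return leaders[index]


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    print(router.route("SELECT * FROM post LIMIT 20"))     # replica-1
    print(router.route("INSERT INTO comment VALUES (1)"))  # primary
    print(router.route("select 1"))                        # replica-2
    print(shard_for("lemmy_support", ["leader-a", "leader-b"]))
```

      One catch a real router would have to handle: postgres streaming replication is asynchronous by default, so a read sent to a replica right after a write may not see that write yet. The modulo-based `shard_for` also reshuffles most keys whenever the number of leaders changes, which is part of why sharding is the complex option.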

      All of which is to say… lemmy isn’t built on purely distributed primitives that can each scale horizontally to arbitrary numbers of machines. But there is quite a lot of opportunity to scale out in the current architecture. Why don’t people do it more? Because buying a bigger box is 10x-100x easier until it stops being possible, and we haven’t hit that point yet.