How can Lemmy scale?

HelloLemmySup@sh.itjust.works · 1 year ago

How can Lemmy scale?

MentalEdge@sopuli.xyz · edited 1 year ago

You’ve misunderstood. Every instance does not contain all content from every other instance. It only holds content from communities that at least one of its users has specifically requested by entering the community’s id in the !name@instan.ce format in search.

This means that the Star Trek instance will mostly only ever need to host Star Trek content. It won’t get flooded with everything else on the entire network as it grows. Some portion of it, yes, as users on the Star Trek instance will inevitably sub to at least some stuff outside it, too.

Additionally, pictures and media are cached, but not permanently federated. When you upload a picture, you may have noticed the link becoming one that points to the instance you’re posting from. This doesn’t change even when that post gets federated to other instances: they still fetch that image from the instance it was posted from (unless it’s a recent post, in which case the image may well be cached too).

This means that what gets federated is mostly just a bunch of text data, and even then, just the subset that is needed. A much lighter load.

At the smallest scale, you could have a node with just one user, perhaps that user creates a community or two. But this means that that instance will ONLY EVER store the subs of that one user, and the content of the communities they created. Not even close to the total content of the entire fediverse.
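To make the single-user case concrete, here is a minimal sketch of the idea in Python. It is a toy model, not Lemmy’s actual data model: the `Instance` class, its methods, and the `*.example` hostnames are all made up for illustration. It only shows that what an instance stores is the union of its own communities and whatever its users have subscribed to.

```python
from collections import defaultdict


class Instance:
    """Toy model of one instance (hypothetical, not Lemmy's real schema)."""

    def __init__(self, name):
        self.name = name
        self.local_communities = set()
        # user name -> set of community ids that user subscribed to
        self.subscriptions = defaultdict(set)

    def create_community(self, community):
        # Community ids follow the !name@instan.ce format from the thread.
        community_id = f"!{community}@{self.name}"
        self.local_communities.add(community_id)
        return community_id

    def subscribe(self, user, community_id):
        self.subscriptions[user].add(community_id)

    def stored_communities(self):
        # Local communities plus everything at least one user asked for.
        stored = set(self.local_communities)
        for subs in self.subscriptions.values():
            stored |= subs
        return stored


# A pretend network with 1000 remote communities on some big instance.
network = {f"!c{i}@big.example" for i in range(1000)}

solo = Instance("solo.example")
solo.create_community("gardening")
solo.subscribe("alice", "!c1@big.example")
solo.subscribe("alice", "!c2@big.example")

# The one-user instance holds 3 communities out of 1001 in total.
print(len(solo.stored_communities()), "of", len(network) + 1)
```

The other 998 communities never touch `solo.example` unless a user there asks for them.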

HelloLemmySup@sh.itjust.works · 1 year ago

Ok, that’s a bit better. I didn’t know about that detail.

Still, that only moves the problem to the future. As I understand it, you pick a community at random to sign up with and then access the rest from there. Then it’s only a matter of time before enough of the users who signed up on the Star Trek instance subscribe to enough big communities for the problem to appear, no?

maegul@lemmy.ml · 1 year ago

Yea, I think you’re right. Once any instance has enough users with enough interests and subscriptions to enough communities, you get a scenario where a good portion of the whole network is duplicated on every or many nodes of the whole network. This is how the fediverse works, and I’ve yet to see anyone seriously address what this looks like at large scales and long timelines.

Storage space isn’t too expensive I guess, so maybe it’s something we can just solve when we come to it.

But, the problem may be worse with threadiverse platforms (lemmy/kbin and any other grouped or threaded platform) for exactly the reason you highlight … the whole community and all of its discussions get duplicated. For microblogging platforms, things are more granular as it’s only single posts by people who are followed that get duplicated.

It may not be fatal and may be something we can solve when we get there, which makes sense as getting up to a significant scale of users is tough in its own right … but it’d sure be nice to see someone think through the numbers.
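As a starting point for thinking through the numbers, here is a rough back-of-envelope sketch in Python. Everything in it is an assumption: the `duplication_factor` helper is made up, and the sizes and instance counts are placeholder values, not measurements from any real instance. It just compares the total text stored across the network with the amount of unique text that exists.

```python
def duplication_factor(community_sizes, holding_instances):
    """Ratio of total stored data to unique data across the network.

    community_sizes: community id -> size of its text content (any unit)
    holding_instances: community id -> number of instances keeping a copy,
        counting the home instance (i.e. instances with >= 1 subscriber)
    """
    unique = sum(community_sizes.values())
    stored = sum(
        size * holding_instances[community]
        for community, size in community_sizes.items()
    )
    return stored / unique


# Placeholder numbers: two equally sized communities, one popular
# (copied to 9 instances) and one niche (kept only on its home instance).
sizes = {"!popular@a.example": 100, "!niche@b.example": 100}
copies = {"!popular@a.example": 9, "!niche@b.example": 1}

print(duplication_factor(sizes, copies))  # 5.0
```

The factor is dominated by the big, popular communities, which is the threadiverse concern above: a whole community gets copied once per instance that has at least one subscriber.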

MentalEdge@sopuli.xyz · 1 year ago

This is literally how the entire internet works. You are describing CDNs.

HelloLemmySup@sh.itjust.works · 1 year ago

That’s why in my mind something like a consensus algorithm with the data duplicated N times, where N < number of instances with subscribed people, would make more sense. As it is right now, I can’t see it scaling past the few instances that can afford to keep it running.
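To illustrate only the "duplicated N times" part, here is a small Python sketch that picks which N instances would hold a community. It uses rendezvous (highest-random-weight) hashing, which is a placement technique and not a consensus algorithm, and it is not something Lemmy does. The `pick_replicas` function and the hostnames are hypothetical.

```python
import hashlib


def pick_replicas(community_id, instances, n):
    """Pick n instances to hold a community via rendezvous hashing.

    Every instance gets a pseudo-random score for this community; the n
    highest scores win. Anyone can recompute the result from the same list,
    so no coordination is needed to agree on the placement itself.
    """

    def score(instance):
        digest = hashlib.sha256(f"{community_id}|{instance}".encode()).hexdigest()
        return int(digest, 16)

    return sorted(instances, key=score, reverse=True)[:n]


subscribed = [f"inst{i}.example" for i in range(10)]
holders = pick_replicas("!startrek@a.example", subscribed, 3)
print(holders)  # 3 of the 10 subscribed instances
```

With this scheme, an instance that is not one of the holders can drop out without changing which instances hold the data. It says nothing about how the other instances would then fetch content from the holders, which is the harder part.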

dragnucs@lemmy.ml · 1 year ago

Individual servers do not handle all data. They only handle data required by their users.

nivenkos@lemmy.world · 1 year ago

The high RAM requirements are a little concerning in practice - I wonder what the main cause is.