Today, as on the past few days, we have had some downtime. Apparently some script kiddies are enjoying themselves by targeting our server (and others). Sorry for the inconvenience.
Most of these ‘attacks’ are targeted at the database, but some are more DDoS-like and can be mitigated by using a CDN. Some other Lemmy servers are using Cloudflare, so we know that works. Therefore we have chosen Cloudflare as our CDN / DDoS protection platform for now. We will look into other options, but we needed something implemented ASAP.
As for the other attacks, we are using them to investigate and implement measures such as rate limiting.
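A token bucket is one common way to implement the kind of rate limiting mentioned above. The sketch below is a hypothetical Python illustration, not what this instance actually deployed: the class names `TokenBucket` and `RateLimiter`, keying buckets by client (e.g. IP address), and the injectable `clock` are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (assumed here to be an IP address)."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, self.clock)
            self.buckets[key] = bucket
        return bucket.allow()
```

With `capacity=3` and `rate=1.0`, a client can burst three requests, is then refused, and regains one request per second. Requests from other keys are unaffected.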
You obviously haven’t written one.
Simple case, without sticky sessions:
Two app servers behind a naive load balancer. Assume an actually RESTful service. Also assume a reasonable single-app design with persistent db connections and db caching. Assume a single client. The client’s first connection comes in to app server 1. App server 1 makes a db connection, grabs the relevant data out of the db, and caches the information for the client, expecting a reconnect. The client makes a second call; the load balancer places it on app server 2, and app server 2 now makes a second connection and queries the same data.
The db has now done twice the work for a single client. This pattern is surprisingly common, and as the user count grows, the duplication degrades cache performance and increases load on the db.
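To make the scenario above concrete, here is a toy Python simulation. It assumes the “naive load balancer” is round-robin and that each app server keeps its own in-process cache; `Database`, `AppServer`, `round_robin` and `sticky` are hypothetical names. It only counts queries that reach the database and says nothing about how expensive each one is.

```python
import itertools
import zlib


class Database:
    """Stand-in for the database; it only counts how many queries it serves."""

    def __init__(self):
        self.queries = 0

    def fetch_user(self, user_id: str) -> dict:
        self.queries += 1
        return {"id": user_id}


class AppServer:
    """App server with a private, in-process cache, as in the scenario."""

    def __init__(self, db: Database):
        self.db = db
        self.local_cache: dict[str, dict] = {}

    def handle(self, user_id: str) -> dict:
        if user_id not in self.local_cache:
            # Miss: this server has not seen the client yet, so it queries the db.
            self.local_cache[user_id] = self.db.fetch_user(user_id)
        return self.local_cache[user_id]


def run(requests, route) -> int:
    """Send each request through `route` and return the number of db queries."""
    db = Database()
    servers = [AppServer(db), AppServer(db)]
    for user_id in requests:
        route(servers, user_id).handle(user_id)
    return db.queries


def round_robin():
    """Naive balancer: alternate servers regardless of who the client is."""
    counter = itertools.count()
    return lambda servers, _user_id: servers[next(counter) % len(servers)]


def sticky(servers, user_id):
    """Sticky routing: a stable hash sends a client to the same server every time."""
    return servers[zlib.crc32(user_id.encode()) % len(servers)]


print(run(["alice", "alice"], round_robin()))  # 2
print(run(["alice", "alice"], sticky))  # 1
```

Note that in this model the duplication is capped at one miss per server per client: once both servers have cached the data, further round-robin requests cost nothing extra.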
Perhaps it’s a common scenario for someone who doesn’t understand the point of putting a load balancer in front of a stateful application. Not for anyone trying to solve a traffic problem.
I have no idea where you are getting your ideas from, but this is an absolutely uninformed example of exactly how NOT to do it.
I’m really interested now in which one of you is right. The other person put in some effort and gave a lot of actual information, while you just come off as arrogant. Still, maybe you’re right. Care to elaborate on why?
I’m not one of the two arguing, but in general app servers don’t do caching or state handling.
You cache things in a third, external cache such as Redis or Memcached. So if a user connects to app server 1 and then to app server 2, both servers will grab the cached info from Redis. No extra db calls required. This has been the basic way of doing things forever, even with old-school WordPress sites. You also store sessions in there or in the db.
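The pattern described above is usually called cache-aside. Below is a minimal Python sketch of it, assuming a dict-backed `SharedCache` as a stand-in for Redis or Memcached so it runs without extra dependencies (with a real client such as redis-py, values would also need to be serialized, e.g. to JSON). `Database`, `SharedCache` and `StatelessAppServer` are hypothetical names.

```python
class Database:
    """Stand-in for the database; it only counts how many queries it serves."""

    def __init__(self):
        self.queries = 0

    def fetch_user(self, user_id: str) -> dict:
        self.queries += 1
        return {"id": user_id}


class SharedCache:
    """In-memory stand-in for an external cache shared by all app servers."""

    def __init__(self):
        self._data: dict[str, dict] = {}

    def get(self, key: str):
        return self._data.get(key)

    def set(self, key: str, value: dict) -> None:
        self._data[key] = value


class StatelessAppServer:
    """Keeps no per-client state; every lookup goes through the shared cache."""

    def __init__(self, db: Database, cache: SharedCache):
        self.db = db
        self.cache = cache

    def handle(self, user_id: str) -> dict:
        key = f"user:{user_id}"
        value = self.cache.get(key)
        if value is None:
            # Cache-aside: on a miss, read from the db and populate the cache.
            value = self.db.fetch_user(user_id)
            self.cache.set(key, value)
        return value


db = Database()
cache = SharedCache()
server_1 = StatelessAppServer(db, cache)
server_2 = StatelessAppServer(db, cache)
server_1.handle("alice")  # miss: one db query
server_2.handle("alice")  # hit: served from the shared cache
print(db.queries)  # 1
```

Because both servers consult the same cache, the second request never reaches the database, whichever server the load balancer picks.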
And even if you weren’t caching externally like this, databases use a lot of memory to cache data. So even if the same query hits the db twice, the second hit would probably still be hot in memory and return very fast. It’s not double the load. At least that’s the case with Postgres, which is what Lemmy uses.
Definitely this. I use PostgreSQL (which Lemmy uses on the backend) for an enterprise-grade system that has anywhere from 700-1k users at any given point in time, and it also takes in several million messages from external systems throughout the day. PostgreSQL is excellent at caching data in memory. I’ve got the code for that system up in another window while I write this.
At this point in time, it doesn’t look like Lemmy is using any form of L2 cache like Redis or Memcached. The only single point of failure (that’s not horizontally scalable) looks like the pict-rs server that Lemmy uses for image hosting. If anything, that could easily be swapped over to something S3-compatible, hosted locally with something like MinIO, or even directly on B2 or Linode cloud storage (doesn’t charge for requests).
I’m not trying to come off as arrogant, but I’m definitely incensed when I catch armchair tech heroes throwing wildly inaccurate information out there as if it were fact. This person has a very basic understanding of some terminology here, and zero idea how it is applied in the real world. Hate to see it.