One thing Reddit dominates is search results. Whenever I look things up I see so many links to Reddit, which I guess is going to help keep that place relevant (unless those subreddits stay dark).

I wonder how Lemmy and all this fed thingy stuff works for that. With more posts, can we expect to see people arriving through search results?

  • kadu@lemmy.world · 2 years ago

    One thing to keep in mind is that Google currently penalizes sites that don’t use common top-level domains like “.com”, “.org” and similar. So something like lemmy.world, if indexed, will rank lower than a site ending in .com with the same keyword density.

    • Briongloid@aussie.zone · 2 years ago

      Google went from being the most important website on the internet to being more and more useless; it’s amazing seeing such a massive company go downhill. But they have so much money that they’ll be able to stay big forever on capital alone.

      • gun@lemmy.ml · 2 years ago

        What do you use as a search engine instead of Google? I feel like I’ve tried everything, but always end up back at Google search.

        • crt0o@lemm.ee · 2 years ago

          I’ve been using DuckDuckGo for about a year now. The results still aren’t as good as Google’s, but not having to look at ads and the better privacy outweigh that for me. It really has improved a lot over the last few years.

        • Ministar@lemmy.world · 2 years ago

          Been using Ecosia and so far it’s been very good. I haven’t needed to use Google once.

    • SkyNTP@lemmy.ml · 2 years ago (edited)

      Let Google be irrelevant. It kind of already is, in the absence of Reddit.

      The nerds always blaze a trail when boring old entrenched media ruins good things. In this case the thing being ruined is a search engine that makes the critical mistake of assuming a traditionally “prestigious” .com equates to value. Fuck the old establishment, it’s time to ditch decrepit big tech and remake the internet the way it was meant to be. It’s time to reinvent how we share and discover content.

      • CalcProgrammer1@lemmy.ml · 2 years ago

        Embracing the Fediverse now does pretty much feel like “taking back the Internet”. It reminds me of the early days, and that’s an amazing thing. I’m tired of the overcommercialized hellscape the Internet has become over the past decade and a half.

    • Joe B@lemmy.world · 2 years ago

      Fine with me. We will have a lot of users and become bigger than Reddit, and Google will still treat us like second-class citizens. Oh well, Google is the one missing out.

    • ccx@sopuli.xyz · 2 years ago

      Making it a Searx plugin would probably do better in terms of making it accessible to a lot of people.

      I wonder how good various ActivityPub instances are at searching. Having pregenerated full-text indexes of public content available for download could go a long way toward making building a search engine easy and fast.

    • Ben@lemmy.ml · 2 years ago

      Actually, the point is that if someone searched the internet for ‘fediverse sources’, they wouldn’t find a relevant thread on lemmy.world, or lemmy.ml, or wherever.

  • wpuckering@lm.williampuckering.com · 2 years ago (edited)

    There are a lot of things that factor into the answer, but I think overall it’s gonna be pretty random. Some instances are on domains without “Lemmy” in the name, some don’t include “Lemmy” in the site name configuration, and in the case of some, like my own instance, I set the X-Robots-Tag response header so that search engines that properly honor the header won’t index content on my instance. I’ve actually taken things a step further with mine and put all public paths except for the API endpoints behind authentication (so that Lemmy clients and federation still work with it), so you can’t browse my instance’s content without going through a proper client, for extra privacy. But that goes off-topic.
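
    For anyone wanting to do the same, a minimal sketch of the header part in Nginx could look like the following. It assumes Nginx already sits in front of Lemmy as a reverse proxy, and "noindex, nofollow" is just one reasonable choice of values; only crawlers that honor the header will respect it.

    ```nginx
    # Hypothetical placement: inside the instance's existing server { } block.
    # Ask compliant crawlers not to index pages or follow links on them.
    # "always" adds the header to error responses too, not only 2xx/3xx.
    add_header X-Robots-Tag "noindex, nofollow" always;
    ```

    One gotcha: Nginx only inherits add_header directives from the enclosing block if the current location defines none of its own, so any location with its own add_header would need to repeat this line.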

    Reddit is centralized, so it could be optimized for SEO as a whole. Lemmy instances are run individually, with different configuration at both the infrastructure level and the application level. If most people leave things fairly vanilla, that should result in pretty good discovery of Lemmy content across those instances, but I would think most people technical enough to host their own instances have deviated from the defaults and (hopefully) implemented some hardening, which would likely mess with SEO.

    So yeah, expect it to be pretty random, but not necessarily unworkable.

    • OrangeSlice@lemmy.ml (mod) · 2 years ago

      Easily the best answer here. I think the people who think it will work “just like Reddit” are still unfamiliar with federation and aren’t used to thinking things through in those terms.

      Not to mention that Google results in general have been pretty trash for a couple years now. I don’t expect fediverse content to be prominent for some time unless there is a dedicated service that indexes everything.

      • itty53@vlemmy.net · 2 years ago

        I mean, why couldn’t there be a dedicated service that indexes everything? Whoever makes it and gets it working in a user-friendly manner is going to have a significant level of control over the content that is shown in the results. If they don’t want it, it isn’t indexed. I don’t have to stretch my imagination to think of parties that have good reason to want to be first to do that across ActivityPub as a whole. Mastodon is already a big frontrunner in that regard.

      • jmp242@sopuli.xyz · 2 years ago

        I kind of feel like Kagi will be all over this with its forum ‘lens’ for search, but it’s paid. Maybe Boardreader would focus on this too?

        Google search isn’t as good as it used to be, and using startpage.com to break the filter bubble isn’t as effective anymore either. So we probably all need to start remembering 1999: using different search engines for different things and looking for what works best.

    • melonpunk@lemmy.world (OP) · 2 years ago

      Great answer, thanks.

      I’m not hugely familiar with SEO, but I seem to remember there being a penalty for duplicated content, as it’s seen as spammy. I might be wrong about how this works, though, and it may only apply to content duplicated within a single domain.

      I just wonder how search engines will deal with seeing the same content across a lot of instances in terms of ranking and noise.

    • fizzym4d@lemmy.fmhy.ml · 2 years ago

      Your “off-topic” sounded pretty cool to me! I love that that is something anyone can do when hosting a Lemmy instance. You get to choose whether it’s searchable on the web! Obviously there are search engines that ignore the no-scraping/no-indexing header, but the rest of what you did should counteract that, noice.

      • wpuckering@lm.williampuckering.com · 2 years ago (edited)

        Yeah, if you’re running something yourself, you can do pretty much whatever you want in order to protect it. Especially if it’s behind a reverse proxy. Firewalls are great for protecting ports, but reverse proxies can be their own form of protection, and I don’t think a lot of people associate them with “protection” so much. Why expose paths (unauthenticated) that don’t need to be? For instance, in my case with my Lemmy instance, all any other instance needs is access to the /api path which I leave open. And all the other paths are behind basic authentication which I can access, so I can still use the Lemmy web interface on my own instance if I want to. But if I don’t want others browsing to my instance to see what communities have been added, or I don’t want to give someone an easy glance into what comments or posts my profile has made across all instances (for a little more privacy), then I can simply hide that behind the curtain without losing any functionality.

        It’s easy to think of these things when you have relevant experience with things such as web development, debugging web applications, full stack development, and subject matter knowledge in those and related areas.

        • Arinshot@lemmy.world · 2 years ago

          I’d be interested in how you did this, this seems like one of the best ways I’ve seen for securing a lemmy instance.

          • maynarkh@feddit.nl · 2 years ago

            One easy way to do that is to set up something like Nginx as a reverse proxy in front and forward /api clean, but forward everything else with basic auth.

            The steps broadly would be:

            And you’re done.
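
            A minimal sketch of that split in Nginx might look like this. The upstream hostnames lemmy:8536 and lemmy-ui:1234 are assumptions based on a typical Docker Compose setup, the domain is a placeholder, and the htpasswd file has to be created separately (for example with the htpasswd tool from Apache's utilities). Besides /api, it also leaves /nodeinfo and /.well-known open, since federation discovery (NodeInfo, WebFinger) lives there. In practice you'd serve this over HTTPS, because basic auth sends credentials essentially in plain text.

            ```nginx
            # Hypothetical sketch: Lemmy UI and backend behind Nginx, with all
            # but the API/discovery paths protected by HTTP basic auth.
            server {
                listen 80;
                server_name lemmy.example.com;  # placeholder domain

                # API and discovery endpoints stay open so clients and other
                # instances can reach them without credentials.
                location ~* ^/(api|nodeinfo|\.well-known) {
                    proxy_pass http://lemmy:8536;     # assumed backend host:port
                    proxy_set_header Host $host;
                }

                # Everything else (the web UI) requires a username and password.
                location / {
                    auth_basic "Authentication Required";
                    auth_basic_user_file /etc/nginx/.htpasswd;  # assumed path
                    proxy_pass http://lemmy-ui:1234;  # assumed frontend host:port
                    proxy_set_header Host $host;
                }
            }
            ```

            Caveat: ActivityPub fetches of community, user and post URLs sent with an application/activity+json Accept header, and POSTs to inboxes, would still hit the password prompt in this sketch. The config in the next reply handles those with its extra if checks, so treat this as a starting point rather than a federation-ready setup.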

          • wpuckering@lm.williampuckering.com · 2 years ago

            I have a single Nginx container that handles reverse proxying of all my selfhosted services, and I break every service out into its own configuration file, and use include directives to share common configuration across them. For anyone out there with Nginx experience, my Lemmy configuration file should make it fairly clear in terms of how I handle what I described above:

            server {
              include ssl_common.conf;
              server_name lm.williampuckering.com;
              set $backend_client lemmy-ui:1234;
              set $backend_server lemmy-server:8536;
              
              location / {
                set $authentication "Authentication Required";
                include /etc/nginx/proxy_nocache_backend.conf;
                
                if ($http_accept = "application/activity+json") {
                  set $authentication off;
                  set $backend_client $backend_server;
                }
                if ($http_accept = "application/ld+json; profile=\"https://www.w3.org/ns/activitystreams\"") {
                  set $authentication off;
                  set $backend_client $backend_server;
                }
                if ($request_method = POST) {
                  set $authentication off;
                  set $backend_client $backend_server;
                }
                
                auth_basic $authentication;
                auth_basic_user_file htpasswd;
                proxy_pass http://$backend_client;
              }
              
              location ~* ^/(api|feeds|nodeinfo|\.well-known) {
                include /etc/nginx/proxy_nocache_backend.conf;
                proxy_pass http://$backend_server;
              }
              
              location ~* ^/pictrs {
                proxy_cache lemmy_cache;
                include /etc/nginx/proxy_cache_backend.conf;
                proxy_pass http://$backend_server;
              }
              
              location ~* ^/static {
                proxy_cache lemmy_cache;
                include /etc/nginx/proxy_cache_backend.conf;
                proxy_pass http://$backend_client;
              }
              
              location ~* ^/css {
                proxy_cache lemmy_cache;
                include /etc/nginx/proxy_cache_backend.conf;
                proxy_pass http://$backend_client;
              }
            }
            

            It’s definitely in need of some clean-up (for instance, there’s no need for multiple location blocks that do the same thing for caching; a single expression can handle all of the ones with identical configuration and reduce the number of lines required), but I’ve been too lazy to get around to it. However, it should serve as a good example and communicate the general idea of what I’m doing.
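
            The clean-up mentioned above could be sketched like this: /static and /css have identical settings, so one regex location could replace both blocks inside the same server block, reusing the $backend_client variable and the include file from the config above. /pictrs stays separate because it proxies to $backend_server rather than $backend_client. This is an untested sketch, not the author's actual config.

            ```nginx
            # Would replace the separate /static and /css locations in the server
            # block above; relies on $backend_client set there and on the
            # lemmy_cache zone the original config already uses.
            location ~* ^/(static|css) {
                proxy_cache lemmy_cache;
                include /etc/nginx/proxy_cache_backend.conf;
                proxy_pass http://$backend_client;
            }
            ```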

  • dan@lemm.ee · 2 years ago (edited)

    My guess is just that Reddit happily lets search engines crawl it, so that content is well-indexed, and because Reddit threads are often linked to from elsewhere the site is considered good quality.

    I’d imagine Lemmy would eventually get to the same point naturally if enough information is shared here. At least, assuming it doesn’t block search engines.

    Hmm, although I don’t really understand how federation will fit with that, given it basically means the same content is duplicated on a bunch of domains.

  • Ben@lemmy.ml · 2 years ago

    I actually added a custom search engine to Firefox so I can search Lemmy directly. I have the keyword ‘LW’ for Lemmy.World search right now (because Lemmy.ml was offline for a while).

    Basically, do a Lemmy search (using a placeholder term like ssss), then replace ssss with %s in the URL and copy the entire link: https://lemmy.world/search/q/%s/type/All/sort/TopAll/listing_type/All/community_id/0/creator_id/0/page/1

    Then add it using the ‘Add custom search engine’ extension for Firefox.

  • piece · 2 years ago

    I’m curious about this too, but I don’t think this can compete.

  • JohannesOliver@beehaw.org · 2 years ago

    One would hope! I can find results from lemmy instances on Google - they are definitely crawling them, but their page rank is going to start out very low.

  • Monkey With A Shell@lemmy.socdojo.com · 1 year ago

    A lot of search engines rely on backlinks to rank the reliability/validity of a site, so even if a given instance was picked up, having enough places reference it to be seen as a valid source would be a pretty heavy lift.

  • Kalkaline@lemmy.one · 2 years ago

    I guess you’d have to try it out, right? Maybe look up some topics and point Google to Lemmy. Honestly haven’t looked much into the whole community beyond setting up a Mastodon account a while back and looking into it a bit more this week.

  • theTrainMan932@infosec.pub · 2 years ago

    I imagine it’ll take a while for fediverse stuff to be high up on search results but it should still work and appear the same way as reddit posts do, just using the federated domains instead of all only being on one site. Hopefully people do start arriving for that reason though!

  • Djokkum@rammy.site · 2 years ago

    I would expect Lemmy to show up equally in the search results if there is enough relevant content. My tiny tiny instance is already showing up in search results, crawlers can definitely find stuff on here. It would be great if at some point we can append “lemmy” to search queries to get the good stuff like we could with Reddit.