A purported leak of 2,500 pages of internal documentation from Google sheds light on how Search, the most powerful arbiter of the internet, operates.

The leaked documents touch on topics like what kind of data Google collects and uses, which sites Google elevates for sensitive topics like elections, how Google handles small websites, and more. Some information in the documents appears to conflict with public statements by Google representatives, according to SEO experts Rand Fishkin and Mike King.

  • jonne@infosec.pub
    ↑ 64  ↓ 2 · 1 month ago

    You mean hosting your own crawler/indexer? That doesn’t really sound like a thing you could do cost-effectively.

    • interdimensionalmeme@lemmy.ml
      ↑ 62 · 1 month ago

      No problem, we crowdsource the crawling, torrent-style.

      We outsourced that to Google for reasonable performance reasons. But they shit the bed, so now there’s no choice but to do it ourselves.

          • wanderingmagus@lemm.ee
            ↑ 2 · 1 month ago

            Veilid is a peer-to-peer network and application framework released by the Cult of the Dead Cow on August 11, 2023, at DEF CON 31.[1][2][3][4] Described by its authors as “like Tor, but for apps”,[5] it is written in Rust, and runs on Linux, macOS, Windows, Android, iOS,[6] and in-browser WASM.[7] VeilidChat is a secure messaging application built on Veilid.[1][4]

            Veilid borrows from both the Tor anonymising router and the InterPlanetary File System (IPFS), to offer encrypted and anonymous peer-to-peer connection using a 256-bit public key as the only visible ID. Even details such as IP addresses are hidden.[4]

            Source: https://en.wikipedia.org/wiki/Veilid

    • brbposting@sh.itjust.works
      ↑ 16 · 1 month ago

      Right!

      Before his company was able to block more of Microsoft’s own tracking scripts, DuckDuckGo CEO and founder Gabriel Weinberg explained in a Reddit reply why firms like his weren’t going the full DIY route:

      “… [W]e source most of our traditional links and images privately from Bing … Really only two companies (Google and Microsoft) have a high-quality global web link index (because I believe it costs upwards of a billion dollars a year to do), and so literally every other global search engine needs to bootstrap with one or both of them to provide a mainstream search product. The same is true for maps btw – only the biggest companies can similarly afford to put satellites up and send ground cars to take streetview pictures of every neighborhood.”

      Ars