• Soyweiser@awful.systems
    4 upvotes · 6 days ago

    So, anybody know the regular linux commands to turn a pdf into markdown? I assume there is a simple command that does that for you, if there isn’t already a pdf2markdown.

    • flaviat@awful.systems
      5 upvotes · 5 days ago

      There cannot be such a thing, since a PDF does not structure its data; it mostly records where glyphs go on the page. There is an extension to the standard (PDF/UA-1, which requires tagged PDF) that would let a program do it for you, but almost nobody uses it. (Also, pandoc is vibe coded now.)

      • diz@awful.systems
        2 upvotes · 12 hours ago

        There could be, but it would be difficult to implement: something a bit like piecing shredded paper back together, heh.
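
        A rough sketch of what that reassembly might look like, assuming some extractor has already handed you text fragments as `(x, y, text)` tuples in PDF coordinates (y grows upward). The `reassemble` function and the `line_tolerance` parameter are made up for illustration; real documents with columns, tables, or rotated text would need far more than this.

```python
def reassemble(fragments, line_tolerance=2.0):
    """Group (x, y, text) fragments into lines by y, then order each line by x.

    Hypothetical helper: assumes a single column of horizontal text.
    """
    lines = []  # each entry: (y of the line's first fragment, [(x, text), ...])
    # PDF y grows upward, so the largest y is the top of the page.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - y) <= line_tolerance:
            lines[-1][1].append((x, text))  # close enough: same line
        else:
            lines.append((y, [(x, text)]))  # start a new line
    # Within each line, read left to right.
    return "\n".join(
        " ".join(text for _, text in sorted(parts)) for _, parts in lines
    )


scraps = [
    (50, 700.0, "world"),
    (10, 700.5, "Hello"),
    (10, 680.0, "second"),
    (60, 680.0, "line"),
]
print(reassemble(scraps))
```

        Even this toy version has to guess which fragments share a line, and nothing in it knows what a heading or a list is, which is the part Markdown would actually need.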

      • froztbyte@awful.systems
        4 upvotes · 4 days ago

        yeah, my answer to this also used to be pandoc until they took the prompt unto their soul

        it’s deeply fucking frustrating

        • Soyweiser@awful.systems
          3 upvotes · 4 days ago

          That sucks so much. But thanks anyway, everybody; my post was half shitpost, half serious. (And I know some things can’t be easily converted, but my regexp to match xhtml script is almost complete.)

          I’m a bit surprised (but not totally) that there actually was a proper tool for it for a while, even if it is vibe corrupted now.