• mindbleach@sh.itjust.works · 12 days ago

    The supervised fine-tuning phase employed Low-Rank Adaptation (LoRA) to efficiently adapt the base DeepSeek-R1-Distill-Qwen-7B model for extraction tasks

    So this is bolted on top of a model that cost six figures.
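    For anyone curious what that step actually looks like, here's a minimal sketch of a LoRA setup using Hugging Face's peft library. The rank, alpha, and target modules are my own guesses; the paper's actual hyperparameters aren't quoted here:

    ```python
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    # LoRA freezes the base weights and trains small low-rank matrices A and B,
    # so each adapted weight matrix becomes W + (alpha / r) * B @ A.
    config = LoraConfig(
        r=16,                                 # assumed rank, not from the paper
        lora_alpha=32,                        # assumed scaling factor
        target_modules=["q_proj", "v_proj"],  # assumed attention projections
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the 7B weights
    ```

    Which is exactly the "bolted on top" part: the six-figure base model stays frozen, and only the tiny adapter matrices get trained.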

    • Dionysus@leminal.space · 12 days ago

      And DeepSeek is based on Llama, which cost more than six figures.

      I’m not aware of any large-parameter LLM that isn’t based on one that was absurdly expensive to train.

      • mindbleach@sh.itjust.works · 12 days ago

        DeepSeek was trained from scratch. Only some variants, like the R1 distills, are built on other LLMs.

        This is a megaphone made from string, a squirrel, and a megaphone.