• Eager Eagle@lemmy.world · 1 month ago

    I bet he just wants a card to self-host models and not give companies his data, but the amount of VRAM is indeed ridiculous.

    • Jeena@piefed.jeena.net · 1 month ago

      Exactly, I’m in the same situation now, and the 8 GB in those cheaper cards doesn’t even let you run a 13B model. I’m trying to figure out whether I can run a 13B one on a 3060 with 12 GB.
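
      For a rough sense of whether it fits, here’s a back-of-envelope estimate (a minimal sketch assuming a llama.cpp-style 4-bit GGUF quant and Llama-2-13B-ish shapes; real usage varies with quant type, context length, and framework overhead):

      ```python
      # Back-of-envelope VRAM estimate for a quantized 13B model.
      # Assumes ~4.5 bits/weight (roughly a Q4_K_M quant) and an fp16 KV cache;
      # actual usage also includes compute buffers and framework overhead.
      params = 13e9                 # 13B parameters
      bits_per_weight = 4.5         # rough average for a 4-bit quant
      weights_gb = params * bits_per_weight / 8 / 1e9

      ctx = 2048                    # context window in tokens
      layers, kv_heads, head_dim = 40, 40, 128   # Llama-2-13B-ish shapes
      kv_cache_gb = 2 * layers * kv_heads * head_dim * ctx * 2 / 1e9  # K+V, 2 bytes each

      print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_cache_gb:.1f} GB")
      # roughly 7.3 GB + 1.7 GB, so a 4-bit 13B should fit in 12 GB;
      # the KV cache grows linearly if you raise the context window.
      ```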

        • Viri4thus@feddit.org · 1 month ago

          I also have a 3060. Can you detail which framework (sglang, ollama, etc.) you are using and how you got that speed? I’m having trouble reaching that level of performance. Thanks
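
          For comparing numbers, here’s a minimal sketch of how one might measure generation speed against a local Ollama server (assumes the default port 11434 and a model you’ve already pulled; the tag below is just a placeholder):

          ```python
          # Rough tokens/sec measurement via Ollama's /api/generate endpoint.
          # eval_count is the number of generated tokens, eval_duration is in nanoseconds.
          import json
          import urllib.request

          payload = {
              "model": "llama2:13b",  # placeholder tag; use whatever `ollama list` shows
              "prompt": "Explain the difference between VRAM and system RAM.",
              "stream": False,
          }
          req = urllib.request.Request(
              "http://localhost:11434/api/generate",
              data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              result = json.load(resp)

          tokens_per_sec = result["eval_count"] / (result["eval_duration"] / 1e9)
          print(f"{tokens_per_sec:.1f} tokens/sec")
          ```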

        • levzzz@lemmy.world · 1 month ago

          You need a pretty large context window to fit all the reasoning; Ollama forces 2048 tokens by default, and a larger window uses more memory.
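
          If it helps, here’s a minimal sketch of raising that limit per request through Ollama’s HTTP API with the num_ctx option (the model tag is a placeholder; keep in mind a bigger window also means a bigger KV cache in VRAM):

          ```python
          # Request a larger context window for one generation via Ollama's num_ctx option.
          import json
          import urllib.request

          payload = {
              "model": "deepseek-r1:14b",    # placeholder reasoning-model tag; substitute your own
              "prompt": "Think step by step: what is 17 * 23?",
              "stream": False,
              "options": {"num_ctx": 8192},  # raise the 2048-token default to fit long reasoning chains
          }
          req = urllib.request.Request(
              "http://localhost:11434/api/generate",
              data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              print(json.load(resp)["response"])
          ```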