• BakedCatboy@lemmy.ml · 4 days ago

      If you have a lot of RAM, you can run small models slowly on the CPU. Your integrated graphics probably won’t fit anything useful in its VRAM, so if you really want to run something locally, getting some extra sticks of RAM is probably your cheapest option.

      I have 64 GB and I run 8–14B models. 32B is pushing it (it’s just really slow).
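
      If you want to try it, here’s a minimal CPU-only sketch using llama-cpp-python, one common runner (the model path, thread count, and context size are illustrative assumptions, not recommendations):

      ```python
      # CPU-only inference sketch with llama-cpp-python
      # (pip install llama-cpp-python). The GGUF path is hypothetical;
      # any 4-bit-quantized 8-14B model matches the sizes discussed above.
      from llama_cpp import Llama

      llm = Llama(
          model_path="models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # hypothetical path
          n_gpu_layers=0,  # 0 = keep every layer on the CPU
          n_ctx=4096,      # context window; larger costs more RAM
          n_threads=8,     # roughly your physical core count
      )

      out = llm("Explain VRAM in one sentence.", max_tokens=64)
      print(out["choices"][0]["text"])
      ```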

      • Lucy :3@feddit.org · 4 days ago

        Don’t iGPUs use system RAM as VRAM directly? You’d only need to configure how much in the BIOS (e.g. by default it uses 1.5 GB of 8 GB or something, and you can set it to 6/8 GB).

        • BakedCatboy@lemmy.ml · 3 days ago (edited)

          Yes for gaming, but for LLMs I’ve heard that the bandwidth limitation of using system RAM as VRAM hurts performance more than just running on the CPU with system memory directly, since smaller models are mostly memory-bandwidth limited.
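
          A rough back-of-the-envelope check shows why bandwidth dominates: generating a token streams essentially the whole model through memory once, so peak speed is bounded by bandwidth divided by model size (both numbers below are illustrative assumptions):

          ```python
          # Rough token-rate ceiling: each generated token reads roughly the
          # whole model from memory once, so tokens/s <= bandwidth / model size.
          # Both figures are illustrative assumptions, not measurements.
          bandwidth_gb_s = 51.2  # dual-channel DDR4-3200 (assumed)
          model_gb = 4.5         # ~8B params at 4-bit quantization (assumed)

          print(f"~{bandwidth_gb_s / model_gb:.0f} tokens/s upper bound")  # ~11
          ```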

          I’ve never tried to run AI on an iGPU with system memory though, so you could try it, assuming it will let you allocate something like 32 GB or more like 64 GB. I think you’ll also need a runner that supports iGPUs; one possibility is sketched below.
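
          For example, llama.cpp has a Vulkan backend that can target iGPUs. A minimal sketch, assuming a Vulkan-enabled build of llama-cpp-python (the install flag and model path are assumptions to verify against the project docs):

          ```python
          # Sketch: offload to an iGPU through llama.cpp's Vulkan backend.
          # Assumes a Vulkan-enabled build, e.g. (check current docs):
          #   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
          from llama_cpp import Llama

          llm = Llama(
              model_path="models/llama-3.1-8b-instruct.Q4_K_M.gguf",  # hypothetical path
              n_gpu_layers=-1,  # -1 = offload all layers to the (i)GPU
          )
          print(llm("Hello", max_tokens=16)["choices"][0]["text"])
          ```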