• ArchRecord@lemm.ee · 3 days ago

    Here’s someone doing 200 tokens/s (for context, OpenAI doesn’t usually get above 100) on… A Raspberry Pi.

    Yes, the “$75-$120 micro computer the size of a credit card” Raspberry Pi.

    If all these AI models can be run directly on users' devices, or on extremely low-end hardware, who needs large quantities of top-of-the-line GPUs?
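
    For anyone curious, tokens/s here just means generated tokens divided by wall-clock time. A minimal sketch of how you could measure it yourself with llama-cpp-python (the model file and settings below are assumptions, not the exact setup from that post):

    ```python
    # Rough tokens/s measurement for a small quantized model running locally.
    # NOTE: the model file is an assumed distilled/quantized GGUF, not the exact
    # build used in the Raspberry Pi demo.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="deepseek-r1-distill-qwen-1.5b-q4.gguf", n_ctx=2048)

    start = time.time()
    out = llm("Explain why the sky is blue.", max_tokens=256)
    elapsed = time.time() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated / elapsed:.1f} tokens/s")
    ```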

    • aesthelete@lemmy.world · 3 days ago

      Thank the fucking sky fairies, actually, because even if AI continues to mostly suck, it'd be nice if it didn't swallow up every potable lake in the process. When this shit is efficient, it's only mildly annoying instead of a complete shitstorm of failure.

    • adoxographer@lemmy.world · 3 days ago

      While this is great, training is where the compute is spent. The news is also that R1 could be trained, still on an Nvidia cluster, but for about $6M instead of roughly $500M.

      • alvvayson@lemmy.dbzer0.com · 3 days ago

        True, but training is a one-off cost. And as you say, this new model cost roughly 100x less to train. Therefore Nvidia just saw 99% of their expected future demand for AI chips evaporate.

        Even if they are lying and used more compute, it's obvious they managed to train it without access to large quantities of the highest-end chips, because of export controls.

        Conservatively, I think Nvidia is definitely going to have to scale down by 50%, and they will have to reduce prices by a lot too, since VC and government billions will no longer be available to their customers.

        • adoxographer@lemmy.world · 3 days ago

          I’m not sure. That’s a very static view of the context.

          While China has an AI advantage due to wider adoption, fewer constraints, and an overall bigger market, the US has more advanced tech and more funds.

          OpenAI, Anthropic, MS, and especially X will all be getting massive amounts of backing and will reverse-engineer and adopt whatever advantages R1 has. And while there are some, it's still not a full-spectrum competitor.

          I see this as a small correction that the big players will take advantage of to buy stock, and then pump it with state funds, furthering the gap and ignoring the Chinese advances.

          Regardless, Nvidia always wins. They sell the best shovels. In any scenario, much of the world still doesn't have its Nvidia clusters: think Africa, Oceania, South America, Europe, India, and the parts of SEA that don't necessarily align with Chinese interests. Plenty to go around.

          • alvvayson@lemmy.dbzer0.com · 3 days ago

            Extra funds are only useful if they can provide a competitive advantage.

            Otherwise those investments will not have a positive ROI.

            The case until now was built on the premise that US tech was years ahead and that AI had a strong moat due to its high compute requirements.

            We now know that that isn’t true.

            If high compute enables a significant improvement in AI, then that old case could become true again. But the prospects of that happening, and of it lasting, just took a big hit.

            I think we are in for a dot-com type bubble burst, but it will take a few weeks to see if that’s gonna happen or not.

            • adoxographer@lemmy.world · 3 days ago

              Maybe, but there is incentive to not let that happen, and I wouldn’t be surprised if “they” have unpublished tech that will be rushed out.

              The ROI doesn’t matter, it wasn’t there yet it’s the potential for it. The Chinese AIs are also not there yet. The proposition is to reduce FTEs, regardless of cost, as long as cost is less.

              While I see OpenAI, and mostly startups and VC-reliant companies, taking a hit, Nvidia itself as the shovel maker will remain strong.

    • GenosseFlosse@feddit.org · 3 days ago

      Sure, you can run it on low-end hardware, but how does the performance (response time for a given prompt) compare to the other models, either local or as a service?

      • ArchRecord@lemm.ee · 3 days ago

        That tokens/s figure is the performance, or the response time if you'd like to call it that. OpenAI's o1 tends to get anywhere from 33-60 tokens/s, whereas in the example I showed previously, a Raspberry Pi can do 200 on a distilled model.

        Now, granted, a distilled model will produce lower-quality output than the full one, as seen in a benchmark comparison done by DeepSeek here. (I've outlined the most distilled version of the newest DeepSeek model, which is likely the kind being run on the Raspberry Pi, albeit probably with some changes made by the author of that post, as well as OpenAI's two highest-end models at a comparable level of distillation.)

        The gap in quality is relatively small for a model that is likely distilled far past what OpenAI's "mini" model is. And when you consider that even regular laptop/PC hardware is orders of magnitude more powerful than a Raspberry Pi, or that an external AI accelerator can be bought for as little as $60, the quality in practice could be very comparable with even slightly less distillation, especially with fine-tuning for a given use case (e.g. a local version of DeepSeek in a code development platform would be fine-tuned specifically to produce code-related results).

        If you look only at cloud-hosted instances of DeepSeek running at scale on GPUs, the way OpenAI's models are, the benchmark performance is only 1-2 percentage points off OpenAI's model, at about 3-6% of the cost, which effectively means paying for 3-6% of the GPU power OpenAI is paying for.
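
        To make that cost gap concrete, here's a toy calculation. The per-million-token prices are placeholder assumptions for illustration, not actual published rates:

        ```python
        # Toy illustration of "about 3-6% of the cost".
        # NOTE: both prices are assumed placeholder values, not real quotes.
        openai_price_per_m = 60.00    # assumed USD per 1M output tokens
        deepseek_price_per_m = 2.20   # assumed USD per 1M output tokens

        tokens = 5_000_000            # hypothetical monthly output volume

        openai_cost = openai_price_per_m * tokens / 1_000_000
        deepseek_cost = deepseek_price_per_m * tokens / 1_000_000

        print(f"OpenAI-style bill:   ${openai_cost:,.2f}")
        print(f"DeepSeek-style bill: ${deepseek_cost:,.2f}")
        print(f"Cost ratio: {deepseek_cost / openai_cost:.1%}")  # ~3.7% with these placeholders
        ```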