Avieshek@lemmy.world to Technology@lemmy.world (English) · 1 month ago
Edward Snowden slams Nvidia's RTX 50-series 'F-tier value,' whistleblows on lackluster VRAM capacity (www.tomshardware.com)
The Hobbyist@lemmy.zip · 1 month ago: You can. I’m running a 14B deepseek model on mine. It achieves 28 t/s.
Viri4thus@feddit.org · 1 month ago: I also have a 3060. Can you detail which framework (sglang, ollama, etc.) you are using and how you got that speed? I’m having trouble reaching that level of performance. Thanks!
levzzz@lemmy.world · 1 month ago: You need a pretty large context window to fit all the reasoning; ollama forces 2048 tokens by default, and a larger window uses more memory.
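For anyone hitting that limit: ollama's real `num_ctx` parameter controls the context window and can be raised with a custom Modelfile. A minimal sketch (the `deepseek-r1:14b` tag and the 8192 value are illustrative assumptions, not from the thread; pick what fits your VRAM):

```
# Illustrative Modelfile: raise the context window above ollama's 2048-token default.
# FROM tag and num_ctx value are assumptions; tune them to your model and VRAM.
FROM deepseek-r1:14b
PARAMETER num_ctx 8192
```

Build and run it with `ollama create deepseek-14b-8k -f Modelfile` followed by `ollama run deepseek-14b-8k`. Note that a larger window costs more VRAM, so a 3060's 12 GB bounds how far you can push it.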
Jeena@piefed.jeena.net · 1 month ago: Oh nice, that’s faster than I imagined.