Intel Announces Retirement of CEO Pat Gelsinger

farcaster@lemmy.world · 7 months ago

rumba@lemmy.zip · 7 months ago

Intel is totally missing the boat, honestly. The built-in GPU on their mobile i9 can share the system’s DDR5.

You can put 96 gigs of RAM in a small form factor and load in a monster model. It’s not super fast, but it works, and it’s a lot faster than leaving every layer on the CPU.

They should be selling NUC-sized PCs with built-in graphics and 128 gigs of the fastest RAM they can put on the boards.

brucethemoose@lemmy.world · 7 months ago

IMO it’s not really “enough” until the bus is 256-bit. That’s when 32B-72B class models start to look even theoretically runnable at decent speeds.
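
A rough back-of-the-envelope sketch of why the bus width matters, assuming token generation is limited by memory bandwidth (every weight gets read once per generated token). The DDR5-5600 speed and the 0.5 bytes per parameter (roughly 4-bit quantization) are assumptions for illustration, not numbers from this thread, and the function names are made up:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s for a given bus width and speed."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


def decode_ceiling_tokens_per_s(
    bandwidth_gb_s: float, params_billions: float, bytes_per_param: float
) -> float:
    """Upper bound on tokens/s if all weights are streamed once per token."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / model_gb


# Assumed values: DDR5-5600, ~4-bit weights (0.5 bytes per parameter).
for bus in (128, 256):
    bw = peak_bandwidth_gb_s(bus, 5600)
    for size in (32, 70):
        ceiling = decode_ceiling_tokens_per_s(bw, size, 0.5)
        print(f"{bus}-bit bus, {size}B model: {bw:.1f} GB/s, ceiling ~{ceiling:.1f} tok/s")
```

Under those assumptions a 128-bit bus tops out around 89.6 GB/s, which caps a 70B model at roughly 2.6 tokens per second before any overhead; doubling the bus doubles the ceiling. Real throughput lands below these numbers.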

rumba@lemmy.zip · 7 months ago

He was getting 1.4 tokens per second on a 70B model. Not setting the world on fire, but enough to load and script against a 70B.

https://www.youtube.com/watch?v=xyKEQjUzfAk

brucethemoose@lemmy.world · edit-2 7 months ago

Also that is a very low context test. A longer context will bog it down, even setting aside the prompt processing time.

…On the other hand, you could probably squeeze a bit more out of it by running OpenVINO instead of llama.cpp, so that is still respectable.

rumba@lemmy.zip · 7 months ago

Yeah, it’s definitely not good enough for user-facing work. But when I’m developing something like translations, being able to see the 70B output and compare it to other models is super useful before I send the job off to something that costs more money to run.

9/10 times, the bigger model isn’t significantly better for what I’m trying to do, but it’s really nice to confirm that.