Sam “wrong side of FOSS history” Altman must be pissing himself.
Direct Nitter Link:
https://nitter.lucabased.xyz/jiayi_pirate/status/1882839370505621655
To reference a previous sidenote, DeepSeek gives corps and randos a means to shove an LLM into their shit for dirt-cheap, so I expect they’re gonna blow up in popularity.
They fine-tuned 1.5-3B models. This is a non-story.
Yup, and it’s not even testing general reasoning. They didn’t have money for that.
fuck almighty have these DeepSeek threads been attracting a lot of LLM “experts”
Is General reasoning in the room with us now?
I heard someone say Private Reasoning was around the corner. Think they’re related?
open source behaving like open source? couldn’t be the evil scary chinese!
open weights is not open source. If it were, then nobody would have to work on trying to reproduce it. They could just run the build script.
Non-techie requesting a layman’s explanation if anyone has time!
After reading a couple of “what makes Nvidia’s H100 chips so special” articles, I’m gathering that they were supposed to have significantly more computational capability than their competitors (which I’m taking to mean more computations per second). So the question with DeepSeek and similar is something like “how are they able to get the same results with fewer computations?”, and the answer is speculated to be more efficient code/instructions for the AI model, so it can reach the same conclusions with fewer computations overall, potentially reducing the need for special jacked-up chips to run it?
i read that the chinese made alterations to the cards as well-- they dismantled them to access the chips themselves and were able to do more precise micromanagement than cuda supports, for instance… basically they took the training wheels off and used a more fine-grained, hands-on approach that gave them some serious advantages
got a source for that?
just something i read-- this isn’t the original source, but a quick search gave me: https://www.xatakaon.com/robotics-and-ai/the-secret-to-deepseeks-extreme-efficiency-is-out-it-bypasses-nvidias-cuda-standard
okay so that post’s core supposition (“using ptx instead of cuda”) is just ~~fucking wrong~~ fucking weird and I’m not going to spend time on it, but it links to this tweet, which has this: “DeepSeek customized parts of the GPU’s core computational units, called SMs (Streaming Multiprocessors), to suit their needs. Out of 132 SMs, they allocated 20 exclusively for server-to-server communication tasks instead of computational tasks”
this still reads more like simply tuning allocation than outright scheduler and execution control (which your post alluded to)
[x] doubt
e: struck the original wording because cuda still uses ptx anyway, whereas this post looks like it’s saying “they steered ptx directly”. at first I read the tweet more like “asm vs python”, but that doesn’t appear to be what that part meant to convey. still doubting the core hypothesis tho
sidebar: I definitely wouldn’t be surprised if this overall turns out to be a case of “one shop optimised by tuning, and then it suddenly turns out the entire industry has never tried to tune a thing ever”
because why try hard when the money taps are open and flowing free? velocity over everything! this is the bayfucker way.
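e2: for the curious, here’s a toy sketch of what “reserve 20 of 132 SMs for communication” could look like. this is hypothetical, not their code-- the function names and launch shape are made up, and the hardware scheduler decides where blocks land, so a real version needs one persistent block pinned per SM-- but the %smid special register it reads is real:

```cpp
// Hypothetical sketch (NOT DeepSeek's code): partition work by SM.
#include <cstdio>
#include <cuda_runtime.h>

// Read the ID of the SM this thread is currently running on (real PTX register).
__device__ unsigned my_smid() {
    unsigned id;
    asm("mov.u32 %0, %%smid;" : "=r"(id));
    return id;
}

// With one resident block per SM, blocks landing on the first `comm_sms`
// SMs would drive server-to-server communication; the rest do the math.
__global__ void partitioned_work(unsigned comm_sms) {
    if (my_smid() < comm_sms) {
        // ...poll buffers / move data between nodes...
    } else {
        // ...matrix math...
    }
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("SMs on this GPU: %d\n", prop.multiProcessorCount); // 132 on an H100
    partitioned_work<<<prop.multiProcessorCount, 128>>>(20);   // aim: one block per SM
    cudaDeviceSynchronize();
    return 0;
}
```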
well you’re always free to doubt and do your own research-- as i mentioned, it is something i read, and between believing what the US tech bros are saying when all their money and hegemony is on the line vs what the chinese have given up for free use, i am going to go out on a limb and trust the chinese. you’re free to make your own decisions in this regard, and kudos for having your own mind.
From a technical POV, from having read into it a little:
Deepseek devs worked in a very low-level language called Assembly. This language is unlike relatively newer languages like C in that it provides no guardrails at all and is basically CPU instructions in extreme shorthand. An “if” statement would be something like BEQ 1000, where execution jumps to a specific memory location (in this case address 1000) if two CPU registers are equal.
The advantage of using it is that it can be considerably faster than C. However, it also means the code is mostly locked to that specific hardware. If you add more memory or change CPUs, you have to refactor. This is one of the reasons the language was largely replaced by C and other languages.
Edit: to expound on this: “modern” languages are even slower but more flexible in terms of hardware. This would be languages like Python, Java, and C#.
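Edit 2: to make the BEQ example concrete, here’s the same check in C alongside a rough assembly equivalent in the comments (illustrative only-- the register names and the address 1000 are made up, and this is not anything from DeepSeek’s actual code):

```cpp
// The same branch written in C and, in comments, rough assembly.
int same(int a, int b) {
    if (a == b) {   // CMP r0, r1   ; compare the two registers
        return 1;   // BEQ 1000     ; branch to address 1000 if they were equal
    }
    return 0;       // fall through when not equal
}
```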
I’m sure that non techie person understood every word of this.
And I’m sure that your snide remark will both tell them what to simplify and explain how to do so.
Enjoy your free trip to the egress.