ChatGPT gets code questions wrong 52% of the time

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 year ago

theluddite@lemmy.ml · 1 year ago

The real problem with LLM coding, in my opinion, is something much more fundamental than whether it can code correctly or not. One of the biggest problems coding faces right now is code bloat. After 15 years of writing code, I write far less code now than when I started, and spend far more time bolting together existing libraries, dealing with CI/CD bullshit, and all the other hair that software projects have started to grow.

The amount of code is exploding. Nowadays, every website uses ReactJS. Every single tiny website loads god knows how many libraries. Just the other day, I forked and built an open source project that had a simple web front end (a list view, some forms – basic shit), and after building it, npm informed me that it had over a dozen critical vulnerabilities, and dozens more of high severity. I think the total was something like 70?

Until now, all code has had to be written at least once. With ChatGPT, it doesn’t even need to be written once! We can generate arbitrary amounts of code all the time whenever we want! We’re going to have so much fucking code, and we have absolutely no idea how to deal with that.

space_comrade [he/him]@hexbear.net · 1 year ago

I don’t think it’s gonna go that way. In my experience, the bigger the chunk of code you make it generate, the more wrong it’s gonna be, and not just in proportion to the size of the chunk: it’s gonna be exponentially more wrong.

It’s only good for generating small chunks of code at a time.
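
One way to see why errors could compound like that is a toy model. This is purely an illustrative assumption, not a measurement: suppose each generated line is correct independently with the same probability. Then the chance that a whole chunk is correct shrinks exponentially with its length. The function name and the 0.99 figure are made up for the sketch.

```python
def chance_all_correct(per_line_accuracy: float, num_lines: int) -> float:
    """Probability that every line in a chunk is correct.

    Toy assumption: lines are right or wrong independently of each other,
    each with the same probability. Real generated code is not like this.
    """
    if not 0.0 <= per_line_accuracy <= 1.0:
        raise ValueError("per_line_accuracy must be between 0 and 1")
    if num_lines < 0:
        raise ValueError("num_lines must be non-negative")
    return per_line_accuracy ** num_lines


if __name__ == "__main__":
    # Hypothetical 99% per-line accuracy, for increasingly large chunks.
    for n in (10, 100, 1000):
        print(n, chance_all_correct(0.99, n))
```

Under that assumption, a 10-line chunk comes out fully correct about 90% of the time, while a 100-line chunk does so only about 37% of the time.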

FunkyStuff [he/him]@hexbear.net · 1 year ago

It won’t be long (maybe 3 years max) before industry adopts some technique for automatically prompting an LLM to generate code to fulfill a certain requirement, then iteratively improving it using test data until it passes all test cases. And I’m pretty sure there already are ways to get LLMs to generate test cases. So this could go nightmarishly wrong very, very fast if industry adopts that technology and starts integrating hundreds of unnecessary libraries or pieces of code that the AI just learned to “spam” everywhere, so to speak. These things are way dumber than we give them credit for.
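
A generate-test-retry loop of the kind described above might look roughly like the sketch below. Everything in it is hypothetical: `fake_llm` is a stand-in for a real model call, and `failing_cases` and `refine_until_passing` are names invented for illustration, not any vendor's API.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (positional args, expected result)


def failing_cases(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Load candidate source and describe every test it fails."""
    namespace: dict = {}
    try:
        # WARNING: exec on generated code is only acceptable inside a sandbox.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:
        return [f"candidate did not load: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def refine_until_passing(
    generate: Callable[[str, List[str]], str],
    requirement: str,
    func_name: str,
    tests: List[TestCase],
    max_rounds: int = 5,
) -> Optional[str]:
    """Ask the generator for code, feed failures back, stop when tests pass."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        source = generate(requirement, feedback)
        feedback = failing_cases(source, func_name, tests)
        if not feedback:
            return source
    return None  # gave up: nothing passed within max_rounds


def fake_llm(requirement: str, feedback: List[str]) -> str:
    """Hypothetical model stub: wrong at first, 'fixed' once it sees failures."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


TESTS: List[TestCase] = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

if __name__ == "__main__":
    print(refine_until_passing(fake_llm, "add two numbers", "add", TESTS))
```

Note that the only stopping condition here is "the tests pass". Nothing in the loop penalizes extra dependencies or needless code, which is exactly how the bloat could slip through.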

space_comrade [he/him]@hexbear.net · edit-2 1 year ago

Oh, that’s definitely going to lead to some hilarious situations, but I don’t think we’re gonna see a complete breakdown of the whole IT sector. There’s no way companies/institutions that do really mission-critical work (kernels, firmware, automotive/aerospace software, certain kinds of banking/finance software, etc.) will let AI write that code any time soon. The rest of the stuff isn’t really that important, and it isn’t that big of a deal if it breaks for a few hours/days because the AI spazzed out.

FunkyStuff [he/him]@hexbear.net · 1 year ago

Agreed, I don’t expect it to break absolutely everything, but I do expect that software development is going to get very hairy when you have to use whatever bloated mess AI is creating.

space_comrade [he/him]@hexbear.net · 1 year ago

I’m here for it, it’s already a complete shitshow, might as well go all the way.

SmoothIsFast@citizensgaming.com · 1 year ago

If you have seen the crunch before demos for military projects, you might start to think the other way. I doubt the bigger vendors will change much, but you definitely could see contracts being won for shit that will just be AI-generated because they got some base manager to eat up their proposal filled with buzzwords. I’d be more worried about it causing more contract bloat and wasted resources in critical systems going to these vaporware solutions. Then you take general government contracts, which go to the lowest bidder, and you are gonna see a ton of AI bullshit start cropping up and bloating our systems because some high-school kid got ChatGPT to make a basic website and now thinks he is the AI website God. Plus, I work in the financial sector now, and they have been eating up all the AI buzzwords like fucking hot cakes. The devs all know it will be a shitshow, but the executives, with their egos, think it’s a great idea and won’t hear any of it, because think of the efficiency and the bonuses they could get if they cut the implementation timeline down to a quarter. They don’t realize that the vulnerabilities, the maintenance cost, and the LLM’s lack of understanding will cause massive long-term issues, regardless of whether they can get a buggy alpha created.

possibly a cat@lemmy.ml · edit-2 1 year ago

deleted by creator

theluddite@lemmy.ml · 1 year ago

Yes I agree. I meant the fundamental problem with the idea of LLMs doing more and more of our code, even if they get quite good.

BloodyDeed@feddit.ch · edit-2 1 year ago

This is so true. I feel like my main job as a senior software engineer is to keep the bloat low and delete unused code. It’s very easy to write code; maintaining it and focusing on the important bits is hard.

This will be one of the biggest and most challenging problems Computer Science will have to solve in the coming years and decades.

floofloof@lemmy.ca · edit-2 1 year ago

It’s easy and fun to write new code, and it wins management’s respect. The harder work of maintaining and improving large code bases and data goes mostly unappreciated.

DefinitelyNotAPhone [he/him]@hexbear.net · 1 year ago

There’s the other half of this problem, which is that the kind of code that LLMs are relatively good at pumping out with some degree of correctness is almost always the code that isn’t difficult to begin with. A sorting algorithm on command is nice, but if you’re working on any kind of novel implementation, then the hard bits are the business logic, which in all likelihood has never been written before and is either sensitive information or just convoluted enough to make turning it into a prompt difficult. Even with LLMs, you still need coders who understand architecture and can convert requirements into raw logic.

AlexWIWA@lemmy.ml · 1 year ago

Makes the Adeptus Mechanicus look like a realistic future. Really advanced tech, but no one knows how it works.

Antiwork [none/use name]@hexbear.net · edit-2 1 year ago

obama-spike uhhhh let me code here