They found 1 (one!) commit in git, and report that it's all over the kernel. Nice journalism.
That's ZDNET!
Checked who the author was, should have guessed... SJVN. He certainly has a flair for taking something relatively small that a solution already exists for, and suggesting something bureaucratic, unnecessary, and completely outside his technical competence. This is one of those things that the kernel devs can, and will, solve when it's a real problem. Random journalists and armchair experts can wait till they're called upon.
"it's one horse and they report that it's all over troy. nice journalism" - people living in troy
I mean, read into what they wrote about:
I'm pleased to announce the release of AUTOSEL, a complete rewrite of the stable kernel patch selection tool that Julia Lawall and I presented back in 2018[1]. Unlike the previous version that relied on word statistics and older neural network techniques, AUTOSEL leverages modern large language models and embedding technology to provide significantly more accurate recommendations.
...
Would be great to hear more. My very subjective feeling is that the last batch of AUTOSEL is much worse than the previous. Easily 50% of false positives.
Seems the newly rewritten kernel review tool wasn't the upgrade that was expected.
I guess the policy is that the code is reviewed. What does it matter if it was AI generated or not? If someone submits bullshit AI generated code he will be ignored in the future.
I would be deeply uncomfortable to work in an environment where one couldn't ask the author of a change for insights or rationale, because the author let some machine write it and therefore lacks any deeper understanding.
For me it's grounds to deny a merge request. Can't explain your code? Then it's evidently not clear enough. Come back when it is.
Should apply to all code too. Doesn't matter if you, stackoverflow or AI wrote it.
you think linux kernel problems are solved on stack overflow? app coding vs lib coding is a huge gap in what the code looks like, I don't even want to think of kernel code.
Kernel code isn't fundamentally different. Even designing hardware is still basically just coding, despite what hardware designers claim. (They think it's fundamentally different because many things happen in parallel in a single cycle.)
Volume and Moderation.
Generating slop is significantly quicker.
You get an increase in volume of people pushing slop, which then has to be reviewed. In addition to the increase in submissions you also get the increase in fidelity/general complexity of the submissions.
Reviewing a PR generated by LLMs used by amateurs is more involved than reviewing an equivalent PR written directly by said amateur.
Straight-up coding mistakes aren't most of the issue; it's the complex architectural and logical bugs that are going to be the problems.
Stuff that's functional but logically/architecturally unsound is much harder to spot and it's significantly easier to generate these kinds of issues with an LLM than to write them out by hand.
If someone submits bullshit AI generated code he will be ignored in the future.
Like this, for example: a seemingly reasonable functional argument that is relatively logically unsound, in that it focuses on a narrow "happy path" and ignores where the actual issues are.
1. To get to the stage where you can block this person you need to review the code first and identify if there is an issue.
Doing this for LLM generated code takes longer, on average.
2. It's also now possible for people less skilled to generate a higher volume of code that looks more reasonable, so that increases the total amount of reviews needed.
So the existing process of reviewing people and code is now several times more difficult and resource consuming.
Which is generally what people want addressed.
Can LLMs help? Possibly.
Are there issues that are going to become a large resource problem if we don't actually address them? Yes.
Ok, so you're suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away, and that those big, complex architectural level patches are submitted with high frequency. Somehow I doubt all of it.
I think the slop patches are small fixes suggested by some AI code analysis tools, that architectural and complex changes are part of a well-defined roadmap and don't come out of nowhere, and that code that doesn't follow conventions is easily spotted and rejected. The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches. The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me.
TL;DR;
You asked why it mattered if it's LLM generated or not, and I provided examples where it does matter. Nothing you've said in your reply seems to refute that, so I'll just assume we've agreed on this point.
The rest of this reply is just me replying to your additional arguments.
Ok, so you’re suggesting that people are submitting kernel patches that somehow modify the architecture of the kernel/its components, that the new architecture is very complex and hard to analyze, that those architectural changes are part of a roadmap and are not rejected right away, and that those big, complex architectural level patches are submitted with high frequency. Somehow I doubt all of it.
I mean, I didn't say any of that, but feel free to doubt a position you just made up.
I think the slop patches are small fixes suggested by some AI code analysis tools.
There's no reason to believe that LLM usage is limited to small patches.
that architectural and complex changes are part of a well-defined roadmap and don’t come out of nowhere and that code that doesn’t follow conventions is easily spotted and rejected.
In a well maintained project, sure, ish, but let's just say you're right about the plan/roadmap phase.
The spotting and rejection you mentioned are now significantly more time and resource consuming, for the reasons I stated in the previous reply.
Also, when I used the word architecturally I was referring to the logical domain of the patch and the things it interacts with. I wasn't implying that LLMs would get a chance at re-architecting an entire project as large as the Linux kernel.
At least I'd hope not.
The linked article talks only about marking the code as AI generated (IMHO useless but harmless) and increasing volume of AI slop patches.
I'm not sure of the usefulness of this kind of marking in practice, but I can tell you a way in which it might be useful.
The way you need to go about evaluating LLM generated code vs human code can be different.
And before you get on your high horse, I'm not saying we shouldn't be doing a good job reviewing in general, of course we should.
Review and testing resources are limited in most practical settings, so we should be focusing on utilising them as efficiently as possible.
There are tools specifically geared towards evaluating LLM generated code for specific mistakes. This marking would enable a more efficient usage/allocation of review resources, over and above the baseline code-quality tests.
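As a rough sketch of that allocation idea, assuming the marking is a commit-message trailer: the `Assisted-by:` trailer name and the two check names below are made up for illustration, since nothing in this thread says how the marking would actually be done.

```python
import re

# ASSUMPTION: marked patches carry a trailer line like "Assisted-by: <tool>".
# The trailer name is hypothetical, chosen only for this example.
AI_TRAILER = re.compile(r"^Assisted-by:\s*(?P<tool>.+)$", re.IGNORECASE | re.MULTILINE)


def review_checks(commit_message: str) -> list[str]:
    """Return the (hypothetical) review checks to run for one patch."""
    checks = ["baseline-quality"]  # every patch gets the baseline tests
    if AI_TRAILER.search(commit_message):
        # Only marked patches pay for the extra LLM-oriented checks.
        checks.append("llm-specific")
    return checks
```

Unmarked patches only get the baseline, so the extra review effort is spent where the marking says it is most likely needed. It obviously does nothing about people who don't mark their patches.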
The idea that maintainers spend time analyzing complex LLM generated code submitted by random amateurs looking for possible architectural bugs sounds like a fantasy to me
Which is clear from your answers. If you don't understand how pull request review works in practice, you're going to struggle to make a coherent argument that requires that understanding.
To answer the statement directly, there's sometimes no efficient way to tell which patches are from amateurs, even without LLMs.
The issue isn't even limited to amateurs. I would like to assume a competent dev of any skill level wouldn't be submitting patches they don't understand, but that's just not always the case.
And again, think architecture with a 'little a' rather than a 'big A'.
Logical flow and domain understanding in a relatively limited scope, rather than system-wide structural change.
The difference between tactics and strategy.
Are you a Linux kernel contributor?
No.
You?
edit: If any of my answers made it seem like I was, let me know and I'll adjust them, that was not my intention.
No. Let's wait for someone who knows what they are talking about.
You mean like a software developer who has to deal with PRs from sources that may or may not include LLM generated code?
If that's the case, i might know someone.......
Wait... unless your original assertion was very specifically about only Linux kernel development and not about the principles that apply to software PR review and LLMs as a whole?
In that case, I don't have anyone to hand and you should probably mark it "Active Linux Kernel Contributors Only".
It's clearer that way.
I think this needs to be the policy for all generated content. You can't hide behind the AI, blaming it for its errors. It's like the CEO of a multinational bank blaming the intern for a market crash.
AI is creeping on Kernel Sanders???
