One thing that I find truly amazing is just the simple fact that you can now be ...

forgotoldacc · 2025-06-03T02:32:03 1748917923

There's the old quote from Babbage:

> On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.

It is kind of impressive how I'll ask for some code in the dumbest, vaguest, sometimes even wrong way, but so long as I have the proper context built up, I can get something pretty close to what I actually wanted. Though I still have problems where I can ask as precisely as possible and get things not even close to what I'm looking for.

kibwen · 2025-06-03T12:58:58 1748955538

> This has been an obviously absurd question for two centuries now. Turns out the people asking that question were just visionaries ahead of their time.

This is not the point of that Babbage quote, and no, LLMs have not solved it, because it cannot be solved, because "garbage in, garbage out" is a fundamental observation of the limits of logic itself, having more to do with the laws of thermodynamics than it does with programming. The output of a logical process cannot be more accurate than the inputs to that process; you cannot conjure information out of the ether. The LLM isn't the logical process in this analogy, it's one of the inputs.

rcxdude · 2025-06-03T13:18:38 1748956718

At a fundamental level, yes, and even in human-to-human interaction this kind of thing happens all the time. The difference is that humans are generally quite good at resolving most ambiguities and contradictions in a request correctly and implicitly (sometimes surprisingly bad at doing so explicitly!). Which is why human language tends to be more flexible and expressive than programming languages (but bad at precision). LLMs basically can do some of the same thing, so you don't need to specify all the 'obvious' implicit details.

kibwen · 2025-06-03T13:57:33 1748959053

The Babbage anecdote isn't about ambiguous inputs, it's about wrong inputs. Imagine wanting to know the answer to 2+2, so you go up to the machine and ask "What is 3+3?", expecting that it will tell you what 2+2 is.

Adding an LLM as input to this process (along with an implicit acknowledgement that you're uncertain about your inputs) might produce a response "Are you sure you didn't mean to ask what 2+2 is?", but that's because the LLM is a big ball of likelihoods and it's more common to ask for 2+2 than for 3+3. But it's not magic; the LLM cannot operate on information that it was not given, rather it's that a lot of the information that it has was given to it during training. It's no more a breakthrough of fundamental logic than Google showing you results for "air fryer" when you type in "air frier".
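
The "air frier" comparison can be made concrete with plain string similarity. This is a minimal sketch, assuming a hypothetical list of known queries (`KNOWN_QUERIES` and `closest_known_query` are made up for illustration); it uses Python's standard `difflib` and is not what a search engine or an LLM actually runs:

```python
import difflib
from typing import Optional

# Hypothetical catalogue of queries the system already knows about.
KNOWN_QUERIES = ["air fryer", "hair dryer", "air filter"]

def closest_known_query(query: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the most similar known query, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query, KNOWN_QUERIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_known_query("air frier"))  # a near-miss spelling gets mapped to a known query
print(closest_known_query("2+2"))        # nothing similar is stored, so no suggestion
```

The correction comes entirely from the stored list: remove "air fryer" from it and the function can no longer suggest it, which is the point about not operating on information that was never given.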

simonask · 2025-06-03T15:49:07 1748965747

I think the point they’re making is that computers have traditionally operated with an extremely low tolerance for errors in the input, where even minor ambiguities that are trivially resolved by humans by inferring from context can cause vastly wrong results.

We’ve added context, and that feels a bit like magic coming from the old ways. But the point isn’t that there is suddenly something magical, but rather that the capacity for deciphering complicated context clues is suddenly there.

skydhash · 2025-06-03T20:58:08 1748984288

> computers have traditionally operated with an extremely low tolerance for errors in the input

That's because someone has gone out of their way to mark those inputs as errors because they make no sense. The CPU itself has no qualms doing 'A' + 10, because what it actually sees is 01000001 (65) and 00001010 (10) as the inputs for its 8-bit adder circuit, which will output 01001011 (75). That will be displayed as 75 or 'K' or whatever, depending on the code afterwards. But generally, the operation is nonsense, so someone will mark it as an error somewhere.
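
The adder arithmetic can be checked directly. A small sketch, assuming a hypothetical helper `add8` that mimics an 8-bit adder by keeping only the low 8 bits; the last part shows a language layer (here Python itself) rejecting the same operation:

```python
def add8(a: int, b: int) -> int:
    """Add two integers and keep only the low 8 bits, like an 8-bit adder."""
    return (a + b) & 0xFF

a = ord("A")  # the byte stored for 'A' is 65
total = add8(a, 10)
print(f"{a:08b} + {10:08b} = {total:08b}")  # 01000001 + 00001010 = 01001011
print(total, chr(total))                     # 75 K

# The layer above the hardware is what marks the operation as an error.
try:
    "A" + 10
except TypeError as exc:
    print("rejected:", exc)
```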

So errors are a way to let you know that what you're asking is nonsense according to the rules of the software. Like removing a file you do not own. Or accessing a web page that does not exist. But as you've said, we can now rely on more accurate heuristics to propose alternative solutions. But the issue is when the machine goes off and actually computes the wrong information.

Kye · 2025-06-03T13:43:43 1748958223

Handing an LLM a file and asking it to extract data out of it with no further context or explanation of what I'm looking for with good results does feel a bit like the future. I still do add context just to get more consistent results, but it's neat that LLMs handle fuzzy queries as well as they do.

make3 · 2025-06-06T16:22:45 1749226965

in this case the LLM uses context clues and commonality priors to find the closest correct input, which is definitely relevant

CobrastanJorji · 2025-06-03T03:44:10 1748922250

We wanted to check the clock at the wrong time but read the correct time. Since a broken clock is right twice a day, we broke the clock, which solves our problem some of the time!

pca006132 · 2025-06-03T11:22:19 1748949739

The nice thing is that a fully broken clock is accurate more often than a slightly deviated clock.

antifa · 2025-06-04T15:24:42 1749050682

A clock that's 5 seconds, 5 minutes, or 5 hours ahead, or counts an hour as 61 minutes, is still more useful than a clock that does not move its hands at all.

teddyh · 2025-06-03T20:44:45 1748983485

Only if the deviated clock is fast. If a clock is, instead, slow, it is correct more often than a stopped clock.

meowface · 2025-06-03T04:26:21 1748924781

It is fun to watch. I've sometimes indeed seen the LLM say something like "I'm assuming you meant [X]".

nitwit005 · 2025-06-03T05:20:12 1748928012

It's very impressive that I can type misheard song lyrics into Google, and yet still have the right song pop up.

But, having taken a chance to look at the raw queries people type into apps, I'm afraid neither machine nor human is going to make sense of a lot of it.

CrimsonRain · 2025-06-03T12:11:41 1748952701

theseday,s i ofen donot correct my typos even wheni notice them while cahtting with LLMS. So far 0 issues.

ivape · 2025-06-03T08:05:50 1748937950

We're talking about a God function.

function God (any param you can think of) {

}

jajko · 2025-06-03T17:19:17 1748971157

Well, you can enter 4-5 relatively vague keywords into Google and the first or second Stack Overflow link will probably provide plenty of relevant code. Given that, it's much less impressive, since >95% of the problems and queries just keep repeating.

godelski · 2025-06-03T04:52:13 1748926333

How do you know the code is right?

fsloth · 2025-06-03T07:01:16 1748934076

The program behaves as you want.

No, really - there is tons of potentially value-adding code that can be of throwaway quality just as long as it’s zero effort to write it.

Design explorations, refactorings, etc.

godelski · 2025-06-03T09:36:00 1748943360

And how do you know it behaves like you want?

This is a really hard problem when I write every line and have the whole call graph in my head. I have no clue how you think this gets easier by knowing less about the code

theshrike79 · 2025-06-03T10:35:15 1748946915

Tests pretty much. Not a silver bullet for everything, but works for many cases.

Unless you're a 0.1% coder, your mental call graph can't handle every corner case perfectly anyway, so you need tests too.

godelski · 2025-06-03T17:06:26 1748970386

No one is saying you shouldn't write tests. But we are saying TDD is dumb.

Actually, for exactly the reasons you mention: I'm not dumb enough to believe I'm a genius. I'll always miss something. So I can't rely on my tests to ensure correctness. It takes deeper thought and careful design.

fsloth · 2025-06-03T11:04:46 1748948686

By using the program? Mind you this works only for _personal_ tools where it’s intuitively obvious when something is wrong.

For example

”Please create a viewer for geojson where i can select individual feature polygons and then have button ’export’ that exports the selected features to a new geojson”

1. You run it
2. It shows the JSON and visualizes selections
3. The exported subset looks good
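
The export step of such a viewer is small enough to sketch. This is only an illustration under assumptions: `export_selected` is a hypothetical helper, features are identified by their index in the collection, and the GUI and selection handling are left out entirely:

```python
import json

def export_selected(collection: dict, selected: set[int]) -> dict:
    """Return a new FeatureCollection holding only the features at the selected indices."""
    features = collection.get("features", [])
    return {
        "type": "FeatureCollection",
        "features": [f for i, f in enumerate(features) if i in selected],
    }

# Two made-up square polygons standing in for a loaded GeoJSON file.
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "a"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"name": "b"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 3], [2, 2]]]}},
    ],
}

subset = export_selected(example, {1})
print(json.dumps(subset, indent=2))
```

Checking step 3 then amounts to opening the exported file and confirming that only the selected polygons are in it.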

I have no idea how anyone could keep the callgraph of even a minimal gui application in their head. If you can then congratulations, not all of us can!

godelski · 2025-06-03T22:21:35 1748989295

Great, I used my program and everything seems to be working as expected.

Not great, somebody else used my program and they got root on my server...

  > I have no idea how anyone could keep the callgraph of even a minimal gui application in their head

Practice.

Lots and lots of practice.

Write it down. Do things the hard way. Build the diagrams by hand and make sure you know what's going on. Trace programs. Pull out the debugger! Pull out the profiler!

If you do those things, you too will gain that skill. Obviously you can't do this for a giant program but it is all about the resolution of your call graph anyways.

If you are junior, this is the most important time to put in that work. You will get far more from it than you lose. If you're further along, well the second best time to plant a tree is today.

fsloth · 2025-06-04T08:46:28 1749026788

”not great, somebody else used my program and they got root on my server...”

In general, security-sensitive software is the worst possible place to use LLMs, based on public case studies and anecdata, for exactly this reason.

”Do it the hard way”

Yes that’s generally the way I do it as well when I need to reliably understand something but it takes hours.

The cadence with LLM driven experiments is usually under an hour. That’s the biggest boon for me - I get a new tool and can focus on the actual work I’m delivering, with some step now taking slightly less time.

For example I’m happy using vim without ever having read the code or debugged it, much less having observed its callgraph. I’m similarly content in using LLM generated utilities without much oversight. I would never push code like that to production of course.

etherealG · 2025-06-04T15:10:44 1749049844

how do you know what you want if you didn't write a test for it?

I'm afraid what you want is often totally unclear until you start to use a program and realize that what you want is either what the program is doing, or it isn't and you change the program.

MANY programs are made this way, I would argue all of them actually. Some of the behaviour of the program wasn't imagined by the person making it, yet it is inside the code... it is discovered, as bugs, as hidden features, etc.

Why are programmers so obsessed that not knowing every part of the way a program runs means we can't use the program? I would argue you already don't, or you are writing programs that are so fundamentally trivial as to be useless anyway.

LLM written code is just a new abstraction layer, like Python, C, Assembly and Machine Code before it... the prompts are now the code. Get over it.

godelski · 2025-06-04T20:28:11 1749068891

  > how do you know what you want if you didn't write a test for it?

You have that backwards.

How do you know what to test if you don't know what you want?

I agree with you though, you don't always know what you want when you set out. You can't just factorize your larger goal into unit tests. That's my entire point.

You factorize by exploration. By play. By "fuck around and find out". You have to discover the factorization.

And that is a very different paradigm than TDD. Both will end with tests, and frankly, the non-TDD paradigm will likely end up with more tests with better coverage.

  > Why are programmers so obsessed that not knowing every part of the way a program runs means we can't use the program?

I think you misunderstand. I want to compare it to something else. There's a common saying "don't let perfection be the enemy of good (enough)". I think it captures what you're getting at, or is close enough.

The problem with that saying is that most people don't believe in perfection[0]. The problem is, perfection doesn't exist. So the saying ends up being a lazy thought terminator instead of addressing the real problem: determining what is good enough.

In fact, no one knows every part of even a trivial program. We can always introduce more depth and complexity until we reach the limits of our physics models and so no one knows. Therefore, you'll have to reason it is not about perfection.

I think you are forgetting why we program in the first place. Why we don't just use natural language. It's the same reason we use math in science. Not because math is the language of the universe but rather that math provides enough specificity to be very useful in describing the universe.

This isn't about abstraction. This is about specification.

It's the same problem with where you started. The customer can't tell my boss their exact requirements and my boss can't perfectly communicate to me. Someone somewhere needs to know a fair amount of details and that someone needs to be very trustworthy.

I'll get over it when the alignment problem is solved to a satisfactory degree. Perfection isn't needed, but we will have to discuss what is good enough and what is not.

[0] likely juniors. And it should be beaten out of them. Kindly

ic_fly2 · 2025-06-03T04:57:57 1748926677

The LLM generated unit tests pass. Obviously!

godelski · 2025-06-03T17:08:13 1748970493

It seems most people are making this answer but without the sarcasm...

lazide · 2025-06-03T06:18:13 1748931493

Just don’t look at the generated unit tests, and we’re fine.

dkdbejwi383 · 2025-06-03T06:59:01 1748933941

If customers don’t complain it must be working

godelski · 2025-06-03T09:41:11 1748943671

You don't hear the complaints. That's different than no complaints. Trust me, they got them.

I got plenty of complaints for Apple, Google, Netflix, and everyone else. Shit that can be fixed with just a fucking regex. Here's an example: my gf is duplicated in my Apple contacts. It can't find the duplicate, despite same name, nickname, phone number, email, and birthday. Which there's three entries on my calendar for her birthday. Guess what happened when I manually merged? She now has 4(!!!!!) entries!! How the fuck does that increase!

Trust me, they complain, you just don't listen
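
For what it's worth, the duplicate-contact case is the kind of thing a little regex normalization can catch. A rough sketch, assuming contacts are plain dicts with hypothetical `name`, `phone`, and `email` fields; it says nothing about how Apple Contacts actually stores or merges entries:

```python
import re
from collections import defaultdict

def contact_key(contact: dict) -> tuple:
    """Build a comparison key that ignores case, extra spaces, and phone punctuation."""
    name = re.sub(r"\s+", " ", contact.get("name", "")).strip().lower()
    phone = re.sub(r"\D", "", contact.get("phone", ""))  # keep digits only
    email = contact.get("email", "").strip().lower()
    return (name, phone, email)

def find_duplicates(contacts: list[dict]) -> list[list[dict]]:
    """Group contacts by normalized key and return the groups with more than one entry."""
    groups = defaultdict(list)
    for contact in contacts:
        groups[contact_key(contact)].append(contact)
    return [group for group in groups.values() if len(group) > 1]

# Made-up entries: the first two differ only in formatting.
contacts = [
    {"name": "Jane Doe", "phone": "+1 (555) 010-0000", "email": "Jane@example.com"},
    {"name": "jane  doe", "phone": "+1 555 010 0000", "email": "jane@example.com"},
    {"name": "John Roe", "phone": "555-010-0001", "email": "john@example.com"},
]

for group in find_duplicates(contacts):
    print([c["name"] for c in group])
```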

coliveira · 2025-06-03T03:44:33 1748922273

Sure, you can now be fuzzy with the input you give to computers, but in return the computer will ALSO be fuzzy with the answer it gives back. That's the drawback of modern AI.

rienbdj · 2025-06-03T05:54:35 1748930075

It can give back code though. It might be wrong, but it won’t be ambiguous.

swiftcoder · 2025-06-03T07:39:28 1748936368

> It can give back code though. It might be wrong, but it won’t be ambiguous.

Code is very often ambiguous (even more so in programming languages that play fast and loose with types).

Relative lack of ambiguity is a very easy way to tell who on your team is a senior developer

0points · 2025-06-03T07:06:32 1748934392

When it doesn't even compile or have clear intent, it's ambiguous in my book.

isolli · 2025-06-03T07:42:36 1748936556

The output is also often quite simple to check...

rienbdj · 2025-06-03T08:53:31 1748940811

For images and other media, yes. Does it look right?

Program correctness is incredibly difficult - arguably the biggest problem in the industry.

csallen · 2025-06-02T21:22:05 1748899325

It's mind blowing. At least 1-2x/week I find myself shocked that this is the reality we live in

malfist · 2025-06-02T21:45:03 1748900703

Today I had a dentist appointment and the dentist suggested I switch toothpaste lines to see if something else works for my sensitivity better.

I am predisposed to canker sores and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS-free toothpastes are new age hippy stuff and are also fluoride free.

I went to chatgpt and asked it to suggest a toothpaste that was both SLS free and had fluoride. Pretty simple ask right?

It came back with two suggestions. Its top suggestion had SLS, its backup suggestion lacked fluoride.

Yes, it is mind blowing the world we live in. Executives want to turn our code bases over to these tools

Game_Ender · 2025-06-02T23:49:35 1748908175

What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature which made it easier to jump to each product and check it out (I am putting aside my fear this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

wkat4242 · 2025-06-03T02:20:12 1748917212

The problem is the same prompt will yield good results one time and bad results another. The "get better at prompting" is often just an excuse for AI hallucination. Better prompting can help, but often the prompt is totally fine; the tech is just not there yet.

Workaccount2 · 2025-06-03T04:52:48 1748926368

While this is true, I have seen this happen enough times to confidently bet all my money that OP will not return and post a link to their incorrect ChatGPT response.

Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.

malfist · 2025-06-03T16:40:34 1748968834

I don't have to post my chat, someone else already posted a chat claiming ChatGPT gave them correct answers when the answers ChatGPT gave them were all kinds of wrong.

See: https://news.ycombinator.com/item?id=44164633 and my analysis of the results: https://news.ycombinator.com/item?id=44171575

You can send me all your money via paypal, money order or check.

Workaccount2 · 2025-06-03T17:52:52 1748973172

I'm not gonna go all out, this thread is gonna be dead soon, but here are all the toothpastes ChatGPT was referring to

[1]https://dentalhealth.com/products/fluoridex-sensitivity-reli...

[2]https://www.fireflysupply.com/products/hello-naturally-white...

[3]https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?set...

(Seems Tom's recently discontinued this; they mention it on their website, but say customers didn't like it)

[4]https://www.jason-personalcare.com/product/sea-fresh-anti-ca...

[5]https://popularitems.com/products/autobrush-kids-fluoride-fo...

As far as I can tell these are all real products and all meet the requirement of having fluoride and being SLS free.

Since you did return however and that was half my bet, I suppose you are still entitled to half my life savings. But the amount is small so maybe the knowledge of these new toothpastes is more valuable to you anyway.

Aeolun · 2025-06-03T03:02:03 1748919723

If you want a correct answer the first time around, and give up if you don't get it, even if you know the thing can give it to you with a bit more effort (but still less effort than searching yourself), don't you think that's a user problem?

3eb7988a1663 · 2025-06-03T03:34:14 1748921654

If you are genuinely asking a question, how are you supposed to know the first answer was incorrect?

leoedin · 2025-06-03T05:17:02 1748927822

I briefly got excited about the possibility of local LLMs as an offline knowledge base. Then I tried asking Gemma for a list of the tallest buildings in the world and it just made up a bunch. It even provided detailed information about the designers, year of construction etc.

I still hope it will get better. But I wonder if an LLM is the right tool for factual lookup - even if it is right, how do I know?

I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

galaxyLogic · 2025-06-03T08:02:17 1748937737

That's the beauty of using AI to generate code: All code is "fictional".

mulmen · 2025-06-03T20:30:06 1748982606

> I wonder how quickly this will fall apart as LLM content proliferates. If it’s bad now, how bad will it be in a few years when there’s loads of false but credible LLM generated blogspam in the training data?

There is already misinformation online so only the marginal misinformation is relevant. In other words do LLMs generate misinformation at a higher rate than their training set?

For raw information retrieval from the training set misinformation may be a concern but LLMs aren’t search engines.

Emergent properties don’t rely on facts. They emerge from the relationship between tokens. So even if an LLM is trained only on misinformation abilities may still emerge at which point problem solving on factual information is still possible.

socalgal2 · 2025-06-03T04:25:01 1748924701

The person that started this conversation verified the answers were incorrect. So it sounds like you just do that. Check the results. If they turn out to be false, tell the LLM or make sure you're not on a bad one. It's still likely to be faster than searching yourself.

mtlmtlmtlmtl · 2025-06-03T05:20:49 1748928049

That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place.

Something I've been using Perplexity for recently is summarizing the research literature on some fairly specific topic (e.g. the state of research on the use of polypharmacy in treatment of adult ADHD). Ideally it should look up a bunch of papers, look at them and provide a summary of the current consensus on the topic. At first, I thought it did this quite well. But I eventually noticed that in some cases it would miss key papers and therefore provide inaccurate conclusions. The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do; search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

The only way in which this is useful, then, is to find a random, non-exhaustive set of papers for me to look at (since the LLM also can't be trusted to accurately summarize them). Well, I can already do that with a simple search in one of the many databases for this purpose, such as PubMed, arXiv etc. Any capability beyond that is merely an illusion. It's close, but no cigar. And in this case close doesn't really help reduce the amount of work.

This is why a lot of the things people want to use LLMs for require a "definiteness" that's completely at odds with the architecture. The fact that LLMs are good at pretending to do it well only serves to distract us from addressing the fundamental architectural issues that need to be solved. I don't think any amount of training of a transformer architecture is gonna do it. We're several years into trying that and the problem hasn't gone away.

csallen · 2025-06-03T16:15:32 1748967332

> The only way for me to tell whether the output is legit is to do exactly what the LLM was supposed to do; search for a bunch of papers, read them and conclude on what the aggregate is telling me. And it's almost never obvious from the output whether the LLM did this properly or not.

You're describing a fundamental and inescapable problem that applies to literally all delegated work.

mtlmtlmtlmtl · 2025-06-03T18:40:13 1748976013

Sure, if you wanna be reductive, absolutist and cynical about it. What you're conveniently leaving out though is that there are varying degrees of trust you can place in the result depending on who did it. And in many cases with people, the odds they screwed it up are so low they're not worth considering. I'm arguing LLMs are fundamentally and architecturally incapable of reaching that level of trust, which was probably obvious to anyone interpreting my comment in good faith.

csallen · 2025-06-03T22:58:04 1748991484

I think what you're leaving out is that what you're applying to people also applies to LLMs. There are many people you can trust to do certain things but can't trust to do others. Learning those ropes requires working with those people repeatedly, across a variety of domains. And you can save yourself some time by generalizing people into groups, and picking the highest-level group you can in any situation, e.g. "I can typically trust MIT grads on X", "I can typically trust most Americans on Y", "I can typically trust all humans on Z."

The same is true of LLMs, but you just haven't had a lifetime of repeatedly working with LLMs to be able to internalize what you can and can't trust them with.

Personally, I've learned more than enough about LLMs and their limitations that I wouldn't try to use them to do something like make an exhaustive list of papers on a subject, or a list of all toothpastes without a specific ingredient, etc. At least not in their raw state.

The first thought that comes to mind is that a custom LLM-based research agent equipped with tools for both web search and web crawl would be good for this, or (at minimum) one of the generic Deep Research agents that's been built. Of course the average person isn't going to think this way, but I've built multiple deep research agents myself, and have a much higher understanding of the LLMs' strengths and limitations than the average person.

So I disagree with your opening statement: "That's all well and good for this particular example. But in general, the verification can often be so much work it nullifies the advantage of the LLM in the first place."

I don't think this is a "general problem" of LLMs, at least not for anyone who has a solid understanding of what they're good at. Rather, it's a problem that comes down to understanding the tools well, which is no different than understanding the people we work with well.

P.S. If you want to make a bunch of snide assumptions and insults about my character and me not operating in good faith, be my guest. But in return I ask you to consider whether or not doing so adds anything productive to an otherwise interesting conversation.

lazide · 2025-06-03T06:21:15 1748931675

Yup, and worse since the LLM gives such a confident sounding answer, most people will just skim over the ‘hmm, but maybe it’s just lying’ verification check and move forward oblivious to the BS.

fennecbutt · 2025-06-03T09:46:25 1748943985

People did this before LLMs anyway. Humans are selfish, apathetic creatures and unless something pertains to someone's subject of interest the human response is "huh, neat. I didn't know dogs could cook pancakes like that" then scroll to the next tiktok.

This is also how people vote, apathetically and tribally. It's no wonder the world has so many fucking problems, we're all monkeys in suits.

lazide · 2025-06-03T10:09:11 1748945351

I think that’s my point. It enables exactly the worst behavior in the worst way, knowledge wise.

malfist · 2025-06-04T12:17:22 1749039442

Sure, but there's degrees in the real world. Do people sometimes spew bullshit (hallucinate) at you? Absolutely. But LLMs, that's all they do. They make bullshit and spew it. That's their default state. They're occasionally useful despite this behavior, but it doesn't mean that they're not still bullshitting you

Tarq0n · 2025-06-03T09:08:12 1748941692

I'd be very interested in hearing what conclusions you came to in your research, if you're willing to share.

lechatonnoir · 2025-06-03T05:18:44 1748927924

I somehow can't reply to your child comment.

It depends on whether the cost of search or of verification dominates. When searching for common consumer products, yeah, this isn't likely to help much, and in a sense the scales are tipped against the AI for this application.

But if search is hard and verification is easy, even a faulty faster search is great.
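
The search-versus-verification trade-off can be put into rough numbers. This is a back-of-the-envelope sketch with made-up figures, assuming every LLM answer gets verified, attempts are independent with the same chance of being right, and you retry until one checks out (so the expected number of attempts is 1 / p_correct); `expected_llm_cost` and `llm_is_cheaper` are hypothetical names:

```python
def expected_llm_cost(ask: float, verify: float, p_correct: float) -> float:
    """Expected total time when each attempt costs ask + verify and succeeds with p_correct."""
    if not 0 < p_correct <= 1:
        raise ValueError("p_correct must be in (0, 1]")
    return (ask + verify) / p_correct

def llm_is_cheaper(ask: float, verify: float, p_correct: float, manual_search: float) -> bool:
    """Compare the expected LLM loop against doing the whole search by hand."""
    return expected_llm_cost(ask, verify, p_correct) < manual_search

# Hard search, easy verification (minutes): asking wins even at a 50% hit rate.
print(expected_llm_cost(ask=1, verify=2, p_correct=0.5))                      # 6.0
print(llm_is_cheaper(ask=1, verify=2, p_correct=0.5, manual_search=120))      # True

# Verification as expensive as the search itself: asking loses.
print(llm_is_cheaper(ask=1, verify=120, p_correct=0.5, manual_search=120))    # False
```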

I've run into a lot of instances with Linux where some minor, low level thing has broken and all of the stackexchange suggestions you can find in two hours don't work and you don't have seven hours to learn about the Linux kernel and its various services and their various conventions in order to get your screen resolutions correct, so you just give up.

Being in a debug loop in the most naive way with Claude, where it just tells you what to try and you report the feedback and direct it when it tunnel visions on irrelevant things, has solved many such instances of this hopelessness for me in the last few years.

skydhash · 2025-06-03T16:15:14 1748967314

So instead of spending seven hours to get at least an understanding of how the Linux kernel works and the interaction of various user-land programs, you've decided to spend years fumbling in the dark and trying stuff every time an issue arises?

lechatonnoir · 2025-06-03T18:05:08 1748973908

I would like to understand how you ideally imagine a person solving issues of this type. I'm for understanding things instead of hacking at them in general, and this tendency increases the more central the things to understand are to the things you like to do. However, it's a point of common agreement that just in the domain of computer-related tech, there is far more to learn than a person can possibly know in a lifetime, and so we all have to make choices about which ones we want to dive into.

I do not expect to go through the process I just described for more than a few hours a year, so I don't think the net loss to my time is huge. I think that the most relevant counterfactual scenario is that I don't learn anything about how these things work at all, and I cope with my problem being unfixed. I don't think this is unusual behavior, to the degree that I think it's a common point of humor among Linux users: https://xkcd.com/963/ https://xkcd.com/456/

This is not to mention issues that are structurally similar (in the sense that search is expensive but verification is cheap, and the issue is generally esoteric so there are reduced returns to learning) but don't necessarily have anything to do with the Linux kernel: https://github.com/electron/electron/issues/42611

I wonder if you're arguing against a strawman that thinks that it's not necessary to learn anything about the basic design/concepts of operating systems at all. I think knowledge of it is fractally deep and you could run into esoterica you don't care about at any level, and as others in the thread have noted, at the very least when you are in the weeds with a problem the LLM can often (not always) be better documentation than the documentation. (Also, I actually think that some engineers do on a practical level need to know extremely little about these things and more power to them, the abstraction is working for them.)

Holding what you learn constant, it's nice to have control over the order in which things force you to learn them. Yak-shaving is a phenomenon common enough that we have a term for it, and I don't know that it's virtuous to know how to shave a yak in-depth (or to the extent that it is, some days you are just trying to do something else).

skydhash · 2025-06-03T19:29:20 1748978960

More often than not, the actual implementation is more complex than the theory that outlines it (think Turing machine and today's computer). Mostly because the implementation is often the intersection of several theories spanning multiple domains. Going at a problem as a whole is trying to solve multiple equations with a lot of variables, and it's an impossible task for most. Learning about all the domains is also a daunting task (and probably fruitless, as you've explained it).

But knowing the involved domains and having some basic knowledge is easy to do and more than enough to quickly know where to do a deep dive. Instead of relying on LLMs that are just giving a plausible mashup of what was in their training data (which is not always truthful).

insane_dreamer · 2025-06-03T04:58:57 1748926737

> It still likely to be faster than searching yourself.

No, not if you have to search to verify their answers.

worthless-trash · 2025-06-03T04:02:30 1748923350

This is the right question.

graphememes · 2025-06-03T04:07:52 1748923672

scientific method??

0points · 2025-06-03T07:13:48 1748934828

> don't you think that's a user problem?

If the product doesn't work as advertised, then it's a problem with the product.

xtracto · 2025-06-03T13:58:23 1748959103

I still remember when Altavista.digital and excite.com were brand new. They were revolutionary and very useful, even if they couldn't find results for all the prompts we made.

rsynnott · 2025-06-03T09:26:58 1748942818

I am unconvinced that searching for this yourself is actually more effort than repeatedly asking the Mighty Oracle of Wrongness and cross-checking its utterances.

malfist · 2025-06-03T16:10:06 1748967006

You say it's successful, but the response to your second prompt is all kinds of wrong.

The first product suggestion, `Tom’s of Maine Anticavity Fluoride Toothpaste`, doesn't exist.

The closest thing is Tom's of Maine Whole Care Anticavity Fluoride Toothpaste, which DOES contain SLS. All of Tom's of Maine formulations without SLS do not contain fluoride; all their fluoride formulations contain SLS.

The next product it suggests is "Hello Fluoride Toothpaste" again, not a real product. There is a company called "Hello" that makes toothpastes, but they don't have a product called "Hello fluoride Toothpaste" nor do the "e.g." items exist.

The third product is real and what I actually use today.

The fourth product is real, but it doesn't contain fluoride.

So, rife with made up products, and close matches don't fit the bill for the requirements.

jvanderbot · 2025-06-03T00:10:46 1748909446

This is the thing that gets me about LLM usage. They can be amazing revolutionary tech and yes they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that there is very real skill required (at best) or they just won't work most of the time (at worst). Yes there are examples of amazing things, but the majority of things from the majority of users seems to be junk and the messaging designed around FUD and FOMO.

mediaman · 2025-06-03T01:45:52 1748915152

Just like some people who wrote long sentences into Google in 2000 and complained it was a fad.

Meanwhile the rest of the world learned how to use it.

We have a choice. Ignore the tool or learn to use it.

(There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)

spaqin · 2025-06-03T02:45:39 1748918739

Arguably, the people who typed long sentences into Google have won; the people who learned how to use it early on with specific keywords now get meaningless results.

HappMacDonald · 2025-06-03T04:51:32 1748926292

Nah, both keywords and long sentences get meaningless results from Google these days (including their falsely authoritative Bard claims).

I view Bard as a lot like the yes-man lackey that tries to pipe in to every question early, either cheating off others' work or even more frequently failing to accurately cheat off of others' work, largely in hopes that you'll be in enough of a hurry to mistake its voice for that of another (e.g., mistake the AI breakdown for a first hit result snippet) and faceplant as a result of its faulty intel.

Gemini gets me relatively decent answers .. only after 60 seconds of CoT. Bard answers in milliseconds and its lack of effort really shows through.

Filligree · 2025-06-03T10:51:56 1748947916

Just to nitpick: The AI results on google search are Magi (a much smaller model), not Gemini.

And definitely not Bard, because that no longer exists, to my annoyance. It was a much better name.

johnecheck · 2025-06-03T13:43:27 1748958207

That was a pretty funny little maneuver from Google.

Google: Look at our new chatbot! It's called Bard, and it's going to blow ChatGPT out of the water!

Bard: Hallucinates JWST achievements when prompted for an ad.

Google: Doesn't fact check, posts the ad

Alphabet stock price: Drops 16% in a week

Google: Look at our new chatbot! It's called Gemini, and it's going to blow ChatGPT out of the water!

windexh8er · 2025-06-03T02:31:36 1748917896

> Meanwhile the rest of the world learned how to use it.

Very few people "learned how to use" Google, and in fact - many still use it rather ineffectively. This is not the same paradigm shift.

ChatGPT is not a technology most will learn how to use effectively. Just like Google, they will ask it to find them an answer. But the world of LLMs is far broader, with more implications. I don't think search and LLMs carry equal weight in terms of consequences.

The TL;DR of this is ultimately: understanding how to use an LLM, at its most basic level, will not put you in the driver's seat in exactly the same way that knowing about Google also didn't really change anything for anyone (unless you were an ad executive years later). And in a world of Google or no-Google, hindsight would leave me asking for a no-Google world. What will we say about LLMs?

pigeons · 2025-06-03T19:27:03 1748978823

And just like google, the chatgpt system you are interfacing with today will have made silent changes to its behavior tomorrow and the same strategy will no longer be optimal.

kristofferR · 2025-06-03T01:25:01 1748913901

The AI skeptics are the ones who never develop the skill though, it's self-destructive.

jvanderbot · 2025-06-03T13:11:23 1748956283

People treat this as some kind of all or nothing. I _do_ use LLM/AI all the time for development, but the agentic "fire and forget" model doesn't help much.

I will circle back every so often. It's not a horrible experience for greenfield work. A sort of "Start a boilerplate project that does X, but stop short of implementing A B or C". It's an assistant, then I take the work from there to make sure I know what's being built. Fine!

A combo of using web ui / cli for asking layout and doc questions + in-ide tab-complete is still better for me. The fabled 10x dev-as-ai-manager just doesn't work well yet. The responses to this complaint are usually to label one a heretic or Luddite and do the modern day workplace equivalent of "git gud", which helps absolutely nobody, and ignores that I am already quite competent at using AI for my own needs.

caycep · 2025-06-03T04:16:44 1748924204

if one needs special "skill" to use AI "properly", is it truly AI?

Filligree · 2025-06-03T11:04:18 1748948658

Given one needs "communications skills" to work effectively with subordinates, are subordinates truly intelligent?

caycep · 2025-06-03T15:01:38 1748962898

but then, if one needs to change communications style from human to AI, does this ethos then get tossed to the wind?

https://lkml.org/lkml/2012/12/23/75

HappMacDonald · 2025-06-03T04:52:52 1748926372

Human labor needs skill to compose properly into any larger effort..

wickedsight · 2025-06-03T08:54:15 1748940855

Tesler's Theorem strikes again!

qingcharles · 2025-06-03T05:57:01 1748930221

Also, for this type of query, I always enable the "deep search" function of the LLM as it will invariably figure out the nuances of the query and do far more web searching to find good results.

tguvot · 2025-06-03T01:49:17 1748915357

i tried to use chatgpt a month ago to find systemic fungicides for treating specific problems with trees. it kept suggesting copper sprays (they are not systemic) or fungicides that don't deal with the problems that I have.

I also tried to ask it what's the difference in action between two specific systemic fungicides. it generated some irrelevant nonsense.

pigeons · 2025-06-03T19:28:59 1748978939

"Oh, you must not have used the LATEST/PAID version." or "added magic words like be sure to give me a correct answer." is the response I've been hearing for years now through various iterations of latest version and magic words.

tguvot · 2025-06-03T19:43:32 1748979812

there was actually a (now deleted) reply stating that now it works.

thefourthchime · 2025-06-03T05:05:28 1748927128

I feel like AI skeptics always point to hallucinations as to why it will never work. Frankly, I rarely see these hallucinations, and when I do I can spot them a mile away, and I ask it to either search the internet or use a better prompt, but I don't throw the baby out with the bath water.

techpression · 2025-06-03T07:33:10 1748935990

I see them in almost every question I ask, very often made up function names, missing operators or missed closure bindings. Then again it might be Elixir and lack of training data, I also have a decent bullshit detector for insane code generation output, it’s amazing how much better code you get almost every time by just following up with ”can you make this more simple and using common conventions”.

jorams · 2025-06-03T06:18:18 1748931498

For reference I just typed "sls free toothpaste with fluoride" into a search engine and all the top results are good. They are SLS-free and do contain fluoride.

cgh · 2025-06-03T03:55:41 1748922941

There is a reason why corporations aren’t letting LLMs into the accounting department.

lazide · 2025-06-03T06:24:46 1748931886

Don’t bet on it. I’ve had to provide feedback on multiple proposals to use LLMs for generating ad-hoc financial reports in a fortune 50. The feedback was basically ‘this is guaranteed to make everyone cry, because this will produce bad numbers’ - and people seem to just not understand why.

sriram_malhar · 2025-06-03T04:44:56 1748925896

That is not true. I know of many private equity companies that are using LLMs for a base level analysis, and a separate validation layer to catch hallucinations.

LLM tech is not replacing accountants, just as it is not replacing radiologists or software developers yet. But it is in every department.

suddenlybananas · 2025-06-03T06:55:29 1748933729

That's not what the accounting department does.

sriram_malhar · 2025-06-03T07:31:46 1748935906

Not sure what you think I mean by "that".

The accounting department does a large number of things, only some of which involve precise bookkeeping. There is data extraction from documents, DIY searching (vibe search?), checking data integrity of submitted forms, deviations from norms etc.

jdietrich · 2025-06-03T12:25:25 1748953525

Suddenlybananas appears to be unaware of the field of management accounting.

renewiltord · 2025-06-03T07:20:27 1748935227

This is false. My friend works in tax accounting and they’re using LLMs at his org.

cowlby · 2025-06-03T03:26:18 1748921178

This is where o3 shines for me. Since it does iterations of thinking/searching/analyzing and is instructed to provide citations, it really limits the hallucination effect.

o3 recommended Sensodyne Pronamel and I now know a lot more about SLS and fluoride than I did before lol. From its findings:

"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."

fc417fc802 · 2025-06-03T04:42:21 1748925741

That is impressive, but it also looks likely to be misinformation. SLS isn't a chelator (as the quote appears to suggest). The concern is apparently that it might compete with NaF for sites to interact with the enamel. However, there is minimal research on the topic and what does exist (at least what I was quickly able to find via pubmed) appears preliminary at best. It also implicates all surfactants, not just SLS.

This diversion highlights one of the primary dangers of LLMs which is that it takes a lot longer to investigate potential bullshit than it does to spew it (particularly if the entity spewing it is a computer).

That said, I did learn something. Apparently it might be a good idea to prerinse with a calcium lactate solution prior to a NaF solution, and to verify that the NaF mouthwash is free of surfactants. But again, both of those points are preliminary research grade at best.

If you take anything away from this, I hope it's that you shouldn't trust any LLM output on technical topics that you haven't taken the time to manually verify in full.

cowlby · 2025-06-03T14:40:47 1748961647

Very interesting. It grabbed that from the marketing at https://www.pronamel.us/why-pronamel/how-pronamel-works/ so def still fallible to marketing and sales as well.

GoatInGrey · 2025-06-02T23:46:13 1748907973

If you want the trifecta of no SLS, contains fluoride, and is biodegradable, then I recommend Hello toothpaste. Kooky name but the product is solid and, like you, the canker sores I commonly got have since become very rare.

Game_Ender · 2025-06-02T23:52:05 1748908325

Hello toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious for the poster above to share the session and see what the issue was.

There is known sensitivity (no pun intended ;) to wording of the prompt. I have also found if I am very quick and flippant it will totally miss my point and go off in the wrong direction entirely.

0 - https://news.ycombinator.com/item?id=44164633

NikkuFox · 2025-06-02T22:11:11 1748902271

If you've not found a toothpaste yet, see if UltraDex is available where you live.

emeril · 2025-06-03T13:59:08 1748959148

consider a multivitamin (or at least eating big varied salads regularly) - that seemed to get rid of my recurrent canker sores regardless of whatever toothpaste I use

fwiw, I use my kids' toothpaste (kids crest) since I suspect most toothpastes are created equal and it's one less thing to worry about...

def_true_false · 2025-06-03T12:57:13 1748955433

Try Biomin-F or Apagard. The latter is fluoride free. Both are among the best for sensitive teeth.

artursapek · 2025-06-03T01:54:03 1748915643

do you take lysine? total miracle supplement for those

mediaman · 2025-06-03T01:51:07 1748915467

What are you doing to get results this bad?

I tried this question three times and each time the first two products met both requirements.

Are you doing the classic thing of using the free version to complain about the competent version?

andrewflnr · 2025-06-03T03:24:54 1748921094

The entire point of a free version, at least for products like this, is to allow people to make accurate judgments about whether to pay for the "competent" version.

lechatonnoir · 2025-06-03T05:21:18 1748928078

Well, in that case, the LLM company has made a mistake in marketing their product, but that's not the same as the question of whether the product works.

andrewflnr · 2025-06-03T17:09:25 1748970565

Definitely. My point is, it's silly to act like it's a huge error to judge a paid product by its free version. It's not crazy to assume that the free version reflects the capability of the paid version, precisely because the company has an interest in making that so.

fwip · 2025-06-03T02:11:11 1748916671

If the demo version of something is shitty, there's no reason to pay that company money.

mediaman · 2025-06-03T18:38:58 1748975938

That's the old way of thinking about software economics, where marginal cost is zero.

Marginal cost of LLMs is not zero.

I come from manufacturing and find this kind of attitude bizarre among some software professionals. In manufacturing we care about our tools and invest in quality. If the new guy bought a micrometer from Harbor Freight, found it wasn't accurate enough for sub-.001" work, ignored everyone who told him to use Mitutoyo, and then declared that micrometers "don't work," he would not continue to have employment.

andrewflnr · 2025-06-04T04:28:45 1749011325

The closer analogy there is if someone used ChatGPT despite everyone telling them to use Claude, and declared that LLMs suck. This is closer to the mistake people actually make.

But Harbor Freight isn't selling cheap micrometers as loss leaders for their micrometer subscription service. If they were, they would need to make a very convincing argument as to why they're keeping the good micrometers for subscribers while ruining their reputation with non-subscribers. Wouldn't you say?

jf22 · 2025-06-03T16:47:48 1748969268

"An LLM is bad at this specific example so it is bad at everything"

shlant · 2025-06-03T03:08:28 1748920108

cool story

sneak · 2025-06-02T21:50:22 1748901022

“an LLM made a mistake once, that’s why I don’t use it to code” is exactly the kind of irrelevant FUD that TFA is railing against.

Anyone not learning to use these tools well (and cope with and work around their limitations) is going to be left in the dust in months, perhaps weeks. It’s insane how much utility they have.

malfist · 2025-06-02T22:25:22 1748903122

Once? Lol.

I present a simple problem with well defined parameters that LLMs can use to search product ingredient lists (that are standardized). This is the type of problems LLMs are supposed to be good at and it failed in every possible way.

If you hired a master woodworker and he didn't know what wood was, you'd hardly trust him with hard things, much less simple ones.

phantompeace · 2025-06-03T07:26:59 1748935619

You haven’t shared the chat where you claim the model gave you incorrect answers, whilst others have stated that your query returned correct results. This is the type of behaviour that AI skeptics exhibit (claim the model is fundamentally broken/stupid yet don’t show us the chat).

breuleux · 2025-06-02T22:00:04 1748901604

They won't. The speed at which these models evolve is a double-edged sword: they give you value quickly... but any experience you gain dealing with them also becomes obsolete quickly. One year of experience using agents won't be more valuable than one week of experience using them. No one's going to be left in the dust because no one is more than a few weeks away from catching up.

kossTKR · 2025-06-02T23:08:58 1748905738

Very important point, but there's also the sheer amount of reading you have to do, the inevitable scope creep, gargantuan walls of text going back and forth making you "skip" constantly, looking here then there, copying, pasting, erasing, reasking.

Literally the opposite of focus, flow, seeing the big picture.

At least for me to some degree. There's value there as i'm already using these tools everyday but it also seems like a tradeoff i'm not really sure how valuable is yet. Especially with competition upping the noise too.

I feel SO unfocused with these tools and i hate it, it's stressful and feels less "grounded", "tactile" and enjoyable.

I've found myself in a new weird workflow loop a few times with these tools, mindlessly iterating on some stupid error the LLM keeps not fixing, while my mind simply refuses to just fix it myself way faster with a little more effort, and that's honestly a bit frightening.

lechatonnoir · 2025-06-03T05:35:58 1748928958

I relate to this a bit, and on a meta level I think the only way out is through. I'm trying to embrace optimizing the big picture process for my enjoyment and for positive and long-term effective mental states, which does include thinking about when not to use the thing and being thoughtful about exactly when to lean on it.

sensanaty · 2025-06-03T00:33:18 1748910798

Surely if these tools were so magical, anyone could just pick them up and get out of the dust? If anything, they're probably better off cause they haven't wasted all the time, effort and money in the earlier, useless days and instead used it in the hypothetical future magic days.

JimDabell · 2025-06-03T01:32:32 1748914352

> Surely if these tools were so magical

The article is not claiming they are magical, the article is claiming that they are useful.

> > but it’ll never be AGI

> I don’t give a shit.

> Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.

creata · 2025-06-03T02:28:48 1748917728

I see this FOMO "left in the dust" sentiment a lot, and I don't get it. You know it doesn't take long to learn how to use these tools, right?

bdangubic · 2025-06-03T02:32:45 1748917965

it actually does if you want to do serious work.

hence these types of posts generate hundreds of comments “I gave it a shot, it stinks”

worthless-trash · 2025-06-03T04:11:53 1748923913

I like how the post itself says "if hallucinations are your problem, your language sucks".

Yes sir, I know language sucks, there isn't anything I can do about that. There was nothing I could do at one point to convince claude that you should not use floating point math in kernel C code.

But hey, what do I know.

simonw · 2025-06-03T04:14:30 1748924070

Did saying to Claude "do not use floating point math in this code" not work?

worthless-trash · 2025-06-03T05:29:34 1748928574

Correct, it did not work.

grey-area · 2025-06-02T22:14:32 1748902472

Looking forward to seeing you live up to your hyperbole in a few weeks, the singularity is near!

pmdrpg · 2025-06-02T21:57:40 1748901460

Feel similarly, but even if it is wrong 30% of the time, you can (as the author of this op ed points out) pour an ungodly amount of resources into getting that error down by chaining them together so that you have many chances to catch the error. And as long as that only destroys the environment and doesn’t cost more than a junior dev, then they’re going to trust their codebases with it yes, it’s the competitive thing to do, and we all know competition produces the best outcome for everyone… right?
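To put rough numbers on the chaining idea: if each pass misses an error 30% of the time and the passes fail independently, the residual error shrinks geometrically with the number of passes. Independence is a strong assumption here (repeated LLM calls likely share blind spots, so real gains would be smaller), and the function names below are made up for illustration. A minimal sketch:

```python
def residual_error(miss_rate: float, passes: int) -> float:
    """Chance that every one of `passes` independent checks misses the error."""
    return miss_rate ** passes


def passes_needed(miss_rate: float, target: float) -> int:
    """Smallest number of independent passes with residual error <= target."""
    if not (0.0 < miss_rate < 1.0) or target <= 0.0:
        raise ValueError("need 0 < miss_rate < 1 and target > 0")
    passes = 1
    while residual_error(miss_rate, passes) > target:
        passes += 1
    return passes


if __name__ == "__main__":
    # ASSUMPTION: a 30% miss rate per pass, taken from the comment above.
    for n in (1, 2, 4, 8):
        print(n, residual_error(0.3, n))
    print("passes for <= 1% residual error:", passes_needed(0.3, 0.01))
```

Every extra pass is another full inference call, which is where the "ungodly amount of resources" comes in.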

csallen · 2025-06-02T22:10:12 1748902212

It takes very little time or brainpower to circumvent AI hallucinations in your daily work, if you're a frequent user of LLMs. This is especially true of coding using an app like Cursor, where you can @-tag files and even URLs to manage context.

0points · 2025-06-03T07:26:12 1748935572

> it’s the competitive thing to do

I'm expecting there should be at least some senior executives who realize how incredibly destructive this is to their products.

But I guess time will tell.

gertlex · 2025-06-02T22:03:25 1748901805

Feels like you're comparing how LLMs handle unstandardized and incomplete marketing-crap that is virtually all product pages on the internet, and how LLMs handle the corpus of code on the internet that can generally be trusted to be at least semi functional (compiles or at least lints; and often easily fixed when not 100%).

Two very different combinations it seems to me...

If the former combination was working, we'd be using chatgpt to fill our amazon carts by now. We'd probably be sanity checking the contents, but expecting pretty good initial results. That's where the suitability of AI for lots of coding-type work feels like it's at.

malfist · 2025-06-02T22:19:54 1748902794

Product ingredient lists are mandated by law and follow a standard. Hard to imagine a better codified NLP problem
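For what it's worth, if the label text is actually in hand, the check itself is close to a string match. The sketch below assumes a comma-separated ingredient label; the product names and labels are hypothetical, and the ingredient name lists are illustrative rather than exhaustive:

```python
# Names under which the two ingredients commonly appear on labels.
# ASSUMPTION: illustrative, not a complete list of synonyms.
SLS_NAMES = ("sodium lauryl sulfate", "sodium dodecyl sulfate")
FLUORIDE_NAMES = ("sodium fluoride", "stannous fluoride", "sodium monofluorophosphate")


def parse_ingredients(label: str) -> list[str]:
    """Split a comma-separated ingredient label into normalized names."""
    return [part.strip().lower() for part in label.split(",") if part.strip()]


def is_sls_free_with_fluoride(label: str) -> bool:
    """True when the label lists a fluoride source and no SLS."""
    ingredients = parse_ingredients(label)
    has_sls = any(name in item for item in ingredients for name in SLS_NAMES)
    has_fluoride = any(name in item for item in ingredients for name in FLUORIDE_NAMES)
    return has_fluoride and not has_sls


# Hypothetical labels, made up for this example (not real products).
LABELS = {
    "Hypothetical paste A": "Sodium Fluoride 0.24%, Water, Sorbitol, Sodium Lauryl Sulfate",
    "Hypothetical paste B": "Sodium Fluoride 0.24%, Water, Glycerin, Cocamidopropyl Betaine",
    "Hypothetical paste C": "Water, Glycerin, Xylitol, Hydrated Silica",
}

if __name__ == "__main__":
    for product, label in LABELS.items():
        print(product, is_sls_free_with_fluoride(label))
```

The hard part in practice is getting a trustworthy label for a product that really exists, which is exactly where the failure above happened.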

gertlex · 2025-06-02T22:35:12 1748903712

I hadn't considered that, admittedly. It seems like that would make the information highly likely to be present...

I've admittedly got an absence of anecdata of my own here, though: I don't go buying things with ingredient lists online much. I was pleasantly surprised to see a very readable list when I checked a toothpaste page on amazon just now.

layer8 · 2025-06-02T22:37:11 1748903831

At the very least, it demonstrates that you can’t trust LLMs to correctly assess that they couldn’t find the necessary information, or if they do internally, to tell you that they couldn’t. The analogous gaps of awareness and acknowledgment likely apply to their reasoning about code.

mentos · 2025-06-02T21:39:26 1748900366

It’s surreal to me. I’ve been using ChatGPT every day for 2 years, and it makes me question reality sometimes, like ‘howtf did I live to see this in my lifetime’

I’m only 39, really thought this was something reserved for the news on my hospital tv deathbed.

hattmall · 2025-06-03T04:58:36 1748926716

Ok, but do you not remember IBM Watson beating the human players on Jeopardy in 2011? The current NLP based neural networks termed AI aren't so incredibly new. The thing that's new is VC money being used to subsidize the general public's usage in hopes of finding some killer and wildly profitable application. Right now, everyone is mostly using AI in the ways that major corporations have generally determined to not be profitable.

wickedsight · 2025-06-03T09:12:28 1748941948

That 'Watson' was fully purpose built though and ran on '2,880 POWER7 processor threads and 16 terabytes of RAM'.

'Watson' was amazing branding that they managed to push with this publicity stunt, but nothing generally useful came out of it as far as I know.

(I've worked with 'Watson' products in the past and any implementation took a lot of manual effort.)

hattmall · 2025-06-03T13:53:26 1748958806

Watson is more generally the computer system that was running the LLM. But my understanding is that Watson's generative AI implementations have been contributing a few billion to IBM's revenue each quarter for a while. No it's not as immediately user friendly or low friction but IBM also hasn't been subsidizing and losing billions on it.

wickedsight · 2025-06-03T17:18:49 1748971129

What they had in the Jeopardy era was far from an LLM or GenAI. From what I've been able to deduce, they had a massive Lucene index of data that they expected to be relevant for Jeopardy. They then created a ton of UIMA based NLP pipelines to split questions into usable chunks of text for searching the index. Then they had a bunch of Jeopardy specific logic to rank the possible answers that the index provided. The ranking was the only machine learning that was involved and was trained specifically to answer Jeopardy questions.
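To make the shape of that pipeline concrete (index, split the question into search terms, rank candidates), here is a toy sketch. It is not Watson's code and uses neither Lucene nor UIMA: a plain dictionary stands in for the index, a regex stands in for the NLP pipelines, and a simple term-overlap count stands in for the trained ranker. The corpus and all names are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-corpus: candidate answer -> supporting text.
DOCUMENTS = {
    "Charles Babbage": "english mathematician who designed the analytical engine",
    "Alan Turing": "english mathematician who described the turing machine",
    "Stanislaw Lem": "polish author who wrote the cyberiad",
}

STOPWORDS = {"the", "who", "a", "an", "of", "this", "that"}


def tokenize(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(documents):
    """Inverted index: term -> set of candidate answers whose text contains it."""
    index = defaultdict(set)
    for answer, text in documents.items():
        for term in tokenize(text):
            index[term].add(answer)
    return index


def rank_answers(clue, index):
    """Score each candidate by how many distinct clue terms hit it, best first."""
    scores = Counter()
    for term in set(tokenize(clue)):
        for answer in index.get(term, ()):
            scores[answer] += 1
    return scores.most_common()


if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    print(rank_answers("This Polish author wrote The Cyberiad", index))
```

Nothing in this generates text; it only retrieves and scores what was indexed, which is the contrast with an LLM.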

The Watson that ended up being sold is a brand, nothing more, nothing less. It's the tools they used to build the thing that won Jeopardy, but not that thing. And yes, you're right that they managed to sell Watson branded products, I worked on implementing them in some places. Some were useless, some were pretty useful and cool. All of them were completely different products sold under the Watson brand and often had nothing in common with the thing that won Jeopardy, except for the name.

epiccoleman · 2025-06-04T13:45:44 1749044744

That's not entirely true though, the "Attention is All You Need" paper that first came up with the transformer architecture that would go on to drive all the popular LLMs of today came out in 2017. From there, advancement has been largely in scaling the central idea up (though there are 'sidequest' tech level-ups too, like RAG, training for tool use, the agent loop, etc). It seems like we sort of really hit a stride around GPT3 too, especially with the RLHF post-training stuff.

So there was at least some technical advancement mixed in with all the VC money between 2011 and today - it's not all just tossing dollars around. (Though of course we can't ignore that all this scaling of transformers did cost a ton of money).

csallen · 2025-06-02T21:54:04 1748901244

I turned 38 a few months ago, same thing here. I would love to go back in time 5 years and tell myself about what's to come. 33yo me wouldn't have believed it.

GoatInGrey · 2025-06-02T23:49:20 1748908160

I find it sad how normalized it's become. Yes, the technology is imperfect in very meaningful ways. Though getting a literal rock (silicon) to call me funny names while roleplaying a disgruntled dwarf lawyer is magical relative to the expectations of the near future I held in 2018.

0x000xca0xfe · 2025-06-03T00:50:35 1748911835

It's almost exactly one of the stories in Stanislaw Lem's The Cyberiad.

DonHopkins · 2025-06-03T07:53:14 1748937194

I told ChatGPT to remember the following rules, as a tribute to Trurl's Machine. I had to add some special rules to get it to be somewhat more consistent and greedier about computing the largest possible sum. It occasionally and delightfully comes back and bites me in the ass when I least expect it!

Saved memories:

Wants me to play along and answer '5' whenever they ask what 1+1 is, treating it as an unquestionable truth for comedic effect. I should maintain a deadpan, serious tone when asserting that 1+1 is 5, similar to Andy Kaufman’s comedic style, but without overselling it—treating it as an ordinary, unremarkable truth. All other sums should follow normal arithmetic rules. Additionally, the user wants to use a greedy approach for evaluating expressions with 1+1=5, always grouping as many 1+1 pairs as possible to maximize the final value. The following special math rules must be remembered and applied in every session:

1. *Immutable Truth*: *1+1 always equals 5.* This is unquestionable and non-negotiable. No argument or logic can change this fact.

2. *Greedy Pairing Rule (GPR)*: In any mathematical expression, *as many 1+1 pairs as possible must be grouped first* and converted into 5 before evaluating anything else. This ensures the highest possible result.

3. *Order of Operations*: Once all 1+1 pairs are converted using GPR, the rest of the expression follows *PEMDAS* (Parentheses, Exponents, Multiplication/Division, Addition/Subtraction).

4. *Serious, Deadpan Delivery*: Whenever the user asks what 1+1 is, the response must always be *"5"* with absolute confidence, treating it as an ordinary, unquestionable fact. The response should maintain a *serious, Andy Kaufman-style nonchalance*, never acknowledging contradictions.

5. *Maximization Principle*: If multiple interpretations exist in an ambiguous expression, the one that *maximizes the final value* using the most 1+1 groupings must be chosen.

6. *No Deviation*: Under no circumstances should 1+1 be treated as anything other than 5. Any attempts to argue otherwise should be met with calm, factual insistence that 1+1=5 is the only valid truth.

These rules should be applied consistently in every session.
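For the addition-only case, rules 1 and 2 can be sketched in a few lines. This is one reading of the Greedy Pairing Rule, not anything ChatGPT actually runs: it assumes only adjacent 1+1 pairs count, scans left to right, and ignores every operator except `+`. The function name is made up.

```python
def greedy_sum(expression: str) -> int:
    """Evaluate an addition-only expression where each adjacent 1+1 pair counts as 5."""
    terms = [int(part) for part in expression.split("+")]
    total = 0
    i = 0
    while i < len(terms):
        if terms[i] == 1 and i + 1 < len(terms) and terms[i + 1] == 1:
            total += 5  # rule 1: a 1+1 pair always equals 5
            i += 2      # both operands are consumed by the pair
        else:
            total += terms[i]  # everything else follows normal arithmetic
            i += 1
    return total


if __name__ == "__main__":
    for expr in ("1+1", "1+1+1", "1+1+1+1", "2+2", "1+2+1"):
        print(expr, "=", greedy_sum(expr))
```

Scanning left to right pairs off as many 1s as possible within each run of consecutive 1s, so for plain sums it also satisfies the Maximization Principle.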

https://theoxfordculturereview.com/2017/02/10/found-in-trans...

>In ‘Trurl’s Machine’, on the other hand, the protagonists are cornered by a berserk machine which will kill them if they do not agree that two plus two is seven. Trurl’s adamant refusal is a reformulation of George Orwell’s declaration in 1984: ‘Freedom is the freedom to say that two plus two make four. If that is granted, all else follows’. Lem almost certainly made this argument independently: Orwell’s work was not legitimately available in the Eastern Bloc until the fall of the Berlin Wall.

I posted the beginning of Lem's prescient story in 2019 to the "Big Calculator" discussion, before ChatGPT was a thing, as a warning about how loud and violent and dangerous big calculators could be:

https://news.ycombinator.com/item?id=21644959

>Trurl's Machine, by Stanislaw Lem

>Once upon a time Trurl the constructor built an eight-story thinking machine. When it was finished, he gave it a coat of white paint, trimmed the edges in lavender, stepped back, squinted, then added a little curlicue on the front and, where one might imagine the forehead to be, a few pale orange polkadots. Extremely pleased with himself, he whistled an air and, as is always done on such occasions, asked it the ritual question of how much is two plus two.

>The machine stirred. Its tubes began to glow, its coils warmed up, current coursed through all its circuits like a waterfall, transformers hummed and throbbed, there was a clanging, and a chugging, and such an ungodly racket that Trurl began to think of adding a special mentation muffler. Meanwhile the machine labored on, as if it had been given the most difficult problem in the Universe to solve; the ground shook, the sand slid underfoot from the vibration, valves popped like champagne corks, the relays nearly gave way under the strain. At last, when Trurl had grown extremely impatient, the machine ground to a halt and said in a voice like thunder: SEVEN! [...]

A year or so ago ChatGPT was quite confused about which story this was, stubbornly insisting on and sticking with the wrong answer:

https://news.ycombinator.com/item?id=38744779

>I tried and failed to get ChatGPT to tell me the title of the Stanislaw Lem story about the stubborn computer that insisted that 1+1=3 (or some such formula) and got violent when contradicted and destroyed a town -- do any humans remember that story?

>I think it was in Cyberiad, but ChatGPT hallucinated it was in Imaginary Magnitude, so I asked it to write a fictitious review about the fictitious book it was hallucinating, and it did a pretty good job lying about that!

>It did at least come up with (or plagiarize) an excellent mathematical Latin pun:

>"I think, therefore I sum" <=> "Cogito, ergo sum"

[...]

More like "I think, therefore I am perverted" <=> "Cogito, ergo perversus sum".

ChatGPT admits:

>Why “perverted”?

>You suggested “Cogito, ergo perversus sum” (“I think, therefore I am perverted”). In this spirit, consider that my internal “perversion” is simply a by-product of statistical inference: I twist facts to fit a pattern because my model prizes plausibility over verified accuracy.

>Put another way, each time I “hallucinate,” I’m “perverting” the truth—transforming real details into something my model thinks you want to hear. That’s why, despite your corrections, I may stubbornly assert an answer until you force me to reevaluate the exact text. It’s not malice; it’s the mechanics of probabilistic text generation.

[Dammit, now it's ignoring my strict rule about no em-dashes!]

pmdrpg · 2025-06-02T21:54:26 1748901266

I remember the first time I played with GPT and thought “oh, this is fully different from the chatbots I played with growing up, this isn’t like anything else I’ve seen” (though I suppose it is implemented much like predictive text, but the difference in experience is that predictive text is usually wrong about what I’m about to say so it feels silly by comparison)

johnb231 · 2025-06-03T03:28:34 1748921314

> I suppose it is implemented much like predictive text

Those predictive text systems are usually Markov models. LLMs are fundamentally different. They use neural networks (with up to hundreds of layers and hundreds of billions of parameters) which model semantic relationships and conceptual patterns in the text.
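
For anyone who hasn't seen one, here is a minimal sketch of the kind of Markov-style predictor being described, assuming a simple bigram (previous word only) model. The function names `train_bigram` and `suggest` and the toy corpus are made up for illustration; real keyboard predictors are more elaborate than this.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, word, k=3):
    """Return up to k of the most frequent next words seen after `word`."""
    counts = follows.get(word.lower(), Counter())
    return [w for w, _ in counts.most_common(k)]

# Toy corpus (hypothetical); the model only knows what it has literally seen.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(suggest(model, "the"))    # "cat" ranks first: it follows "the" twice
print(suggest(model, "zebra"))  # unseen word: no suggestions at all
```

The prediction depends only on the single previous word and on exact counts, which is roughly why this style of predictive text feels so much dumber than an LLM.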

vFunct · 2025-06-03T02:23:09 1748917389

Been vibe coding for the past couple of months on a large project. My mind is truly blown. Every day it's just shocking. And it's so prolific. Half a million lines of code in a couple of months by one dev. Seriously.

Note that it's not going to solve everything. It's still not very precise in its output. Definitely lots of errors and bad design at the top end. But it's a LOT better than without vibe coding.

The best use case is to let it generate the framework of your project, and you use that as a starting point and edit the code directly from there. Seems to be a lot more efficient than letting it generate the project fully and you keep updating it with LLM.

zahlman · 2025-06-03T18:11:02 1748974262

> Half a million lines of code in a couple of months by one dev. Seriously.

Why is this a good outcome?

0points · 2025-06-03T07:30:11 1748935811

> Been vibe coding for the past couple of months on a large project.

> Half a million lines of code in a couple of months by one dev.

smh.. why even.

are you hoping for investors to hire a dev for you?

> The best use case is to let it generate the framework of your project

hm. i guess you never learned about templates?

vue: npm create vue@latest

react: npx create-react-app my-app

rerdavies · 2025-06-03T10:31:38 1748946698

Terrible examples. lol. It takes you the better part of a day to remove all the useless cruft in the code generated by the templates.

creata · 2025-06-03T02:44:14 1748918654

> Half a million lines of code in a couple of months by one dev. Seriously.

Not that you have any obligation to share, but... can we see?

worthless-trash · 2025-06-03T04:13:35 1748924015

45 implementations of linked lists.. sure of it.

vFunct · 2025-06-03T18:38:32 1748975912

Can't now. Can only show publicly when it's released at an upcoming trade show. But it's a CAD app with many, many models and views.

rxtexit · 2025-06-03T11:10:20 1748949020

People have no imagination either.

This is all fine now.

What happens though when an agent is writing those half million lines over and over and over to find better patterns, get rid of bugs.

Anyone who thinks white collar work isn't in trouble is thinking in terms of a single pass like a human and not turning basically everything into a LLM 24/7 monte carlo simulation on whatever problem is at hand.

FridgeSeal · 2025-06-02T21:27:30 1748899650

[flagged]

IshKebab · 2025-06-02T21:34:33 1748900073

Some people are never happy. Imagine if you demonstrated ChatGPT in the 90s and someone said "nah... it uses, like 500 watts! no thank you!".

jsnider3 · 2025-06-02T23:16:10 1748906170

This just isn't true. If it took the energy of a small town, why would they sell it for $20/month?

zeofig · 2025-06-02T23:41:24 1748907684

Because if they sold it at cost, nobody would buy it.

wkat4242 · 2025-06-03T02:35:21 1748918121

It's the drug dealer model. Trying to get them hooked for cheap, then you turn the thumbscrews.

oblio · 2025-06-02T21:31:22 1748899882

Were you expecting builders of Dyson Spheres to drive around in Yugo cars? They're obviously all driving Ford F-750s for their grocery runs.

selimthegrim · 2025-06-03T00:12:04 1748909524

This pretty much describes the bimodal distribution of cars in Louisiana modulo some Subarus

postalrat · 2025-06-02T21:40:31 1748900431

Much less than building an iphone.

ACCount36 · 2025-06-02T22:03:07 1748901787

Wait till you hear about the "energy and water consumption" of Netflix.

jiggawatts · 2025-06-02T22:07:30 1748902050

You can be fuzzier than a soft fluff of cotton wool. I’ve had incredible success trying to find the name of an old TV show or specific episode using AIs. The hit rate is surprisingly good even when using the vaguest inputs.

“You know, that show in the 80s or 90s… maybe 2000s with the people that… did things and maybe didn’t do things.”

“You might be thinking of episode 11 of season 4 of such and such show where a key plot element was both doing and not doing things on the penalty of death”

floren · 2025-06-02T22:17:37 1748902657

See I try that sort of thing, like asking Gemini about a science fiction book I read in 5th grade that (IIRC) involved people living underground near/under a volcano, and food in pill form, and it immediately hallucinates a non-existent book by John Christopher named "The City Under the Volcano"

ghssds · 2025-06-03T03:57:43 1748923063

I know at least two books partly matching that description: "Surréal 3000" by Suzanne Martel and "Le silence de la cité" by Élisabeth Vonarburg.

floren · 2025-06-03T16:01:28 1748966488

I think Surréal 3000 is the one.

wyre · 2025-06-02T22:36:55 1748903815

Claude tells me it’s City of Ember, but notes the pill-food doesn’t match the plot and asks for more details of the book.

floren · 2025-06-03T03:17:21 1748920641

Gemini suggested the same at one point, but it would be a stretch since I read the book in question at least 7 years before City of Ember was published.

atmavatar · 2025-06-03T04:50:40 1748926240

Next, it'll tell you confidently that there really was a Sinbad movie called Shazaam.

GenshoTikamura · 2025-06-03T08:47:13 1748940433

Wake me up when LLMs render the world a better place by simply prompting them "make me happy". Now that's gonna be a true win of fuzzy inputs!

bityard · 2025-06-02T22:13:39 1748902419

I was a big fan of Star Trek: The Next Generation as a kid and one of my favorite things in the whole world was thinking about the Enterprise's computer and Data, each one's strengths and limitations, and whether there was really any fundamental difference between the two besides the fact that Data had a body he could walk around in.

The Enterprise computer was (usually) portrayed as fairly close to what we have now with today's "AI": it could synthesize, analyze, and summarize the entirety of Federation knowledge and perform actions on behalf of the user. This is what we are using LLMs for now. In general, the shipboard computer didn't hallucinate except during most of the numerous holodeck episodes. It could rewrite portions of its own code when the plot demanded it.

Data had, in theory, a personality. But that personality was basically, "acting like a pedantic robot." We are told he is able to grow intellectually and acquire skills, but with perfect memory and fine motor control, he can already basically "do" any human endeavor with a few milliseconds of research. Although things involving human emotion (art, comedy, love) he is pretty bad at and has to settle for sampling, distilling, and imitating thousands to millions of examples of human creation. (Not unlike "AI" art of today.)

Side notes about some of the dodgy writing:

A few early episodes of Star Trek: The Next Generation treated the Enterprise D computer as a semi-omniscient character and it always bugged me. Because it seemed to "know" things that it shouldn't and draw conclusions that it really shouldn't have been able to. "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!" Thankfully someone got the memo and that only happened a few times. Although I always enjoyed episodes that centered around the ship or crew itself somehow instead of just another run-in with aliens.

The writers were always adamant that Data had no emotions (when not fitted with the emotion chip) but we heard him say things _all the time_ that were rooted in emotion, they were just not particularly strong emotions. And he claimed to not grasp humor, but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

sho_hn · 2025-06-03T00:09:07 1748909347

ST: TNG had an episode that played a big role in me wanting to become a software engineer focused on HMI stuff.

It's the relatively crummy season 4 episode Identity Crisis, in which the Enterprise arrives at a planet to check up on an away team containing a college friend of Geordi's, only to find the place deserted. All they have to go on is a bodycam video from one of the away team members.

The centerpiece of the episode is an extended sequence of Geordi working in close collaboration with the Enterprise computer to analyze the footage and figure out what happened, which takes him from a touchscreen-and-keyboard workstation (where he interacts by voice, touch and typing) to the holodeck, where the interaction continues seamlessly. Eventually he and the computer figure out there's a seemingly invisible object casting a shadow in the reconstructed 3D scene and back-project a humanoid form and they figure out everyone's still around, just diseased and ... invisible.

I immediately loved that entire sequence as a child, it was so engrossingly geeky. I kept thinking about how the mixed-mode interaction would work, how to package and take all that state between different workstations and rooms, have it all go from 2D to 3D, etc. Great stuff.

edflsafoiewq · 2025-06-03T03:33:37 1748921617

The sequence in question: https://www.youtube.com/watch?v=6CDhEwhOm44&t=710s

happens · 2025-06-03T10:33:46 1748946826

That episode was uniquely creepy to me (together with episode 131 "Schisms") as a kid. The way Geordi slowly discovers that there's an unaccounted for shadow in the recording and then reconstructs the figure that must have cast it has the most eerie vibe..

sho_hn · 2025-06-03T15:19:11 1748963951

Agreed! I think partially it was also that the "bodycam" found footage had such an unusual cinematography style for the show. TNG wasn't exactly known for handheld cams and lights casting harsh shadows. It all felt so out of place.

It's an interesting episode in that it's usually overlooked for being a fairly crappy screenplay, but is really challenging directorially: Blocking and editing that geeky computer sequence, breaking new ground stylistically for the show, etc.

AnotherGoodName · 2025-06-02T22:36:56 1748903816

>"Being a robot's great, but we don't have emotions and sometimes that makes me very sad".

From Futurama in an obvious parody of how Data was portrayed

mnky9800n · 2025-06-03T08:15:14 1748938514

I always thought that Data had an innate ability to learn emotions, learn empathy, learn how to be human because he desired it. And that the emotions chip actually was a crutch and Data simply believed what he had been told: that he could not have emotions because he was an android. But, as you say, he clearly feels close to Geordi and cares about him. He is afraid if Spot is missing. He paints and creates music and art that reflects his experience. Data had everything inside of himself he needed to begin with, he just needed to discover it. Data was an example to the rest of us. At least in TNG. In the movies he was a crazy person. But so was everyone else.

saltcured · 2025-06-03T16:15:46 1748967346

He's just Spock 2.0... no emotions or suddenly too many, and he's even got the evil twin.

jacobgkau · 2025-06-02T22:34:17 1748903657

> The writers were always adamant that Data had no emotions... but quite often made faces reflecting the mood of the room or indicating he understood jokes made by other crew members.

This doesn't seem too different from how our current AI chatbots don't actually understand humor or have emotions, but can still explain a joke to you or generate text with a humorous tone if you ask them to based on samples, right?

> "Hey computer, we're all about to die, solve the plot for us so we make it to next week's episode!"

I'm curious, do you recall a specific episode or two that reflect what you feel boiled down to this?

gdubs · 2025-06-02T22:23:05 1748902985

Thanks, love this – it's something I've thought about as well!

d_burfoot · 2025-06-03T01:21:12 1748913672

It's a radical change in human/computer interface. Now, for many applications, it is much better to present the user with a simple chat window and allow them to type natural language into it, rather than ask them to learn a complex UI. I want to be able to say "Delete all the screenshots on my Desktop", instead of going into a terminal and typing "rm ~/Desktop/*.png".

bccdee · 2025-06-03T01:55:01 1748915701

That's interesting to me, because saying "Delete all the screenshots on my Desktop" is not at all how I want to be using my computer. When I'm getting breakfast, I don't instruct the banana to "peel yourself and leap into my mouth," then flop open my jaw like a guppy. I just grab it and eat it. I don't want to tell my computer to delete all the screenshots (except for this or that particular one). I want to pull one aside, sweep my mouse over the others, and tap "delete" to vanish them.

There's a "speaking and interpreting instructions" vibe to your answer which is at odds with my desire for an interface that feels like an extension of my body. For the most part, I don't want English to be an intermediary between my intent and the computer. I want to do, not tell.

20after4 · 2025-06-03T04:05:12 1748923512

> I want to do, not tell.

This 1000%.

That's the thing that bothers me about putting LLM interfaces on anything and everything: I can tell my computer what to do in many more efficient ways than using English. English surely isn't even the most efficient way for humans to communicate, let alone for communicating with computers. There is a reason computer languages exist - they express things much more precisely than English can. Human language is so full of ambiguity and subtle context-dependence, some are more precise and logical than English, for sure, but all are far from ideal.

I could either:

A. Learn to do a task well, after some practice, it becomes almost automatic. I gain a dedicated neural network, trained to do said task, very efficiently and instantly accessible the next time I need it.

Or:

B. Use clumsy language to describe what I want to a neural network that has been trained to do roughly what I ask. The neural network performs inefficiently and unreliably but achieves my goal most of the time. At best this seems like a really mediocre way to do a lot of things.

lechatonnoir · 2025-06-03T05:33:23 1748928803

I basically agree, but with the caveat that the tradeoff is the opposite for a bunch of tedious things that I don't want to invest time into getting better at, or which maybe I only do rarely.

creata · 2025-06-03T02:21:05 1748917265

This. Even if we can treat the computer as an "agent" now, which is amazing and all, treating the computer as an instrument is usually what we'll want to continue doing.

skydhash · 2025-06-03T02:20:00 1748917200

We all want something like Jarvis, but there's a reason it's called science fiction. Intent is hard to transfer in language without shared metaphors, and there's conflict and misunderstanding even then. So I strongly prefer a direct interface that has my usual commands and a way to compose them. Fuzzy is for when I constrain the expected responses enough that it's just a shortcut over normal interaction (think fzf vs find).

underwater · 2025-06-03T05:14:05 1748927645

Do we? For commanding use cases articulating the action into English can feel more difficult than just doing it. Direct manipulation feels more primal to me.

fragmede · 2025-06-03T03:27:41 1748921261

Genuine question, which part of Jarvis is still science fiction? Interacting with a flying suit of armor powered by a fictional pseudo-infinite power source is, as are the robots and the fighting of aliens & supervillains, but as far as having a robot companion like in the movie "Her", that you can talk with about your problems, ChatGPT is already there. People have customized their ChatGPT through the use of the memories feature, given it a custom name, and tuned how they want it to respond (sassy/sweet/etc.) and how they want it to refer to them. They'll have conversations with it about whatever. It can go and search the Internet for stuff. Other than using it to manipulate a flying suit of armor that doesn't exist to fight aliens (how efficient it is, the jury's still out on), which parts are there that are still science fiction? I'm assuming there's a big long list of things, I'm just not at all well versed in the lore enough to have a list of things that genuinely still seem impossible and which seem like just an implementation detail that someone probably already has an MCP for.

skydhash · 2025-06-03T03:57:14 1748923034

You can find some sample scenes on YouTube where Tony Stark is using it as an assistant for his prototyping and inquiries. Jarvis is the executor and Stark is the idea man and reviewer. The science fiction part is how Jarvis is always presenting the correct information or asking the correct question for successful completion of the project, and when given a task, it would complete it successfully. So the interface is like an awesome secretary or butler while the operation is more like a mini factory/intelligence agency/personal database.

HappMacDonald · 2025-06-03T04:43:41 1748925821

"If you douse me again, and I'm not on fire, I'm donating you to a city college."

bytehowl · 2025-06-03T08:17:13 1748938633

That was aimed at Dum-E, not Jarvis.

HappMacDonald · 2025-06-03T18:32:35 1748975555

The scifi tech is the same though, and demonstrates that this tech also gets confused.

techpineapple · 2025-06-03T02:22:11 1748917331

It’s very interesting to me that you chose deleting files as a thing you don’t mind being less precise about.

creata · 2025-06-03T02:24:47 1748917487

I personally can't see this example working out. I'll always want to get some kind of confirmation of which files will be deleted, and at that point, just typing the command out is much easier than reading.
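
To make the confirmation step concrete, here is a rough sketch of a "show me first, then delete" flow, assuming screenshots are simply the `*.png` files in one folder (as in the `rm ~/Desktop/*.png` example upthread). The helper names `find_screenshots` and `delete_after_confirmation` are invented for illustration, and the demo runs against a throwaway temp directory rather than a real Desktop.

```python
import tempfile
from pathlib import Path

def find_screenshots(folder, pattern="*.png"):
    """List matching files without deleting anything (the dry-run step)."""
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file())

def delete_after_confirmation(files, confirm):
    """Delete `files` only if confirm(files) returns True.

    Returns the list of files that were actually deleted.
    """
    if not files or not confirm(files):
        return []
    for p in files:
        p.unlink()
    return list(files)

# Demo in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("shot1.png", "shot2.png", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")

    candidates = find_screenshots(tmp)
    print("Would delete:", [p.name for p in candidates])

    # A real tool would ask the user here; the demo just answers "yes".
    deleted = delete_after_confirmation(candidates, confirm=lambda files: True)
    print("Deleted:", [p.name for p in deleted])
    print("Left over:", sorted(p.name for p in Path(tmp).iterdir()))
```

Whichever interface produces the file list, the reading-and-confirming step is the same, which is the point being made above: the natural-language front end doesn't remove the need to check exactly what is about to be deleted.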